Alchemy: Message Buffer

Dec 17

adaptability, portability, reliability, CodeProject, C++, maintainability, Alchemy, design

This is an entry in the continuing series of blog entries that documents the design and implementation process of a library. The library is called Network Alchemy. Alchemy performs data serialization and is written in C++. It is an open source project and can be found on GitHub.

Previously I posted the first prototype, which demonstrates that the concept of Alchemy is both feasible and useful. However, that article ended up being much longer than I had anticipated, and I was unable to cover serializing the user object to and from a data stream. This entry will finish the prototype by adding serialization capabilities for the basic datum fields that have already been specified.

Message Buffer

One topic that has been glossed over up to this point is how memory will be managed for the messages that are passed around with Alchemy. The Alchemy message itself is a class object that holds a composited collection of Datum fields that are convenient for a user to access, just like a struct. Unfortunately, this format is not binary compatible or portable for message transfer on a network or storage to a file.
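To make the compatibility problem concrete, here is a minimal sketch. It assumes a hypothetical two-field struct named Sample and a hypothetical pack() helper; neither is part of Alchemy. On most compilers the struct carries padding after the first field, so its in-memory size differs from the size of the fields laid end to end, which is why the object cannot simply be copied onto the wire.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical message layout, used only for illustration.
struct Sample
{
  std::uint8_t  kind;    // 1 byte
  std::uint32_t length;  // 4 bytes, usually aligned to a 4-byte boundary
};

// Size of the fields when packed back to back, with no padding.
const std::size_t k_wire_size =
  sizeof(std::uint8_t) + sizeof(std::uint32_t);

// Copy each field into the buffer at its packed offset.
// Byte order is left untouched; a real wire format must also fix that.
inline std::size_t pack(const Sample& msg, unsigned char* pBuffer)
{
  std::memcpy(pBuffer, &msg.kind, sizeof(msg.kind));
  std::memcpy(pBuffer + sizeof(msg.kind), &msg.length, sizeof(msg.length));
  return k_wire_size;
}

// Usage: the packed form is never larger than the in-memory form.
inline void example_pack()
{
  Sample msg = { 7, 1024 };
  unsigned char buffer[k_wire_size] = { 0 };
  assert(pack(msg, buffer) == k_wire_size);
  assert(sizeof(Sample) >= k_wire_size);
}
```

The exact amount of padding is implementation-defined, so the sketch only checks that the struct is at least as large as its packed fields.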

We will need a strategy to manage memory buffers. We could go with something similar to the standard BSD socket API and require that the user simply manage the memory buffer. This path is unsatisfying to me for two reasons:

BSD sockets ignore the format of the data and simply set up end-points as well as read/write capabilities.
Alchemy is an API that handles the preparation of binary data formats to create ABI compatible data-streams.

Ignoring the memory buffer used to serialize the data would provide only a marginal service to the user, not enough to make Alchemy a compelling choice whenever data must be serialized. Adding a memory management strategy to Alchemy requires only a small amount of extra effort on our part, yet provides enormous value to the user.

Considerations

It will be possible for us to create a solution that is completely transparent to the user with respect to memory management. The Message object could simply hide the allocations and management internally. A const shared_ptr could be given to the user once they call an accessor function like data(). However, experience has shown me that developers have often already tackled the memory management on their own.

Furthermore, even if they have not yet tackled the memory management problem, the abstractions that they have created around their socket and other transport protocols have forced a mechanism upon the user. Therefore, I propose that we develop a generic memory buffer, one that meets our immediate development needs and also provides the flexibility to integrate other strategies in the future.

The Basics

There are four operations that must be considered when memory management is discussed. "FOUR?! I thought there were only two!" Go ahead and silently snicker at the other readers that you know made that exclamation, because you were aware of the four operations:

Allocation
De-allocation
Read
Write

It's very easy to overlook that read and write must be considered when we discuss memory allocation. If we simply talk in terms of malloc/free, new/delete, or simply new for Java and C#, you allocate a buffer, and reads and writes are implicitly built into the language. This is only true for the fundamental types native to the language.

However, when you create an object, you control read and write access to the data with accessor functions for the specific fields of your object. In most cases we are interested in keeping the concept of raw memory abstract inside of an object. We are managing a buffer of memory, and it is important for us to be able to provide proper access to the locations within the buffer that correspond to the values advertised to the user through the Datum interfaces.

That brings to mind one last piece of information that we will want to have readily available at all times, the size of the buffer. This is true whether we choose a strategy that uses a fixed size block of buffers, dynamically allocate the buffers, or we adapt a buffer previously defined by the user.

The Policy Design Pattern

Strictly speaking, this is better known as the Strategy design pattern. I am sure there are other names as well, probably as many as there are ways to implement it. We are developing in C++, and this solution is traditionally implemented with a policy-based design. We want to create a memory buffer object that is universal to our message implementation in Alchemy. So far we have not provided any hint of a special memory object to deal with in the Alchemy interface. I do not plan on changing this either.

However, we have already established there are multiple ways that memory will be used to transfer and store data. A Policy-based design will allow us to implement a single object to perform the details of managing a memory buffer and providing the correct read/write access, and still allow the user to integrate their own memory management system with Alchemy. This design pattern is an example of the 'O' in the SOLID object-oriented methodology. The 'O' represents Open for extension, closed for modification.

In order for a user to integrate their custom component, they will be required to implement a policy class to map the four memory management functions mentioned above to a standard form that will be accessed by our memory buffer class. A policy class is a collection of constants and static member functions. Generally a struct is used because of its public by default nature. The class that is extended expects a certain set of functions to be available in the policy type. The policy class is associated with the extended class as a template parameter. The only requirement is the policy class implements all of the functions and constants accessed by the policy host.
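The relationship between a policy class and its host can be sketched with a deliberately small example. Everything here is hypothetical and unrelated to the Alchemy API: SmallBlocks and ExactFit are two policies, and SizePlanner is a host that only requires a static round_up() function from its template parameter.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical policies: each supplies the constant and the static
// function that the host expects. Neither derives from a base class.
struct SmallBlocks
{
  static const std::size_t k_block_size = 16;
  static std::size_t round_up(std::size_t n)
  {
    return ((n + k_block_size - 1) / k_block_size) * k_block_size;
  }
};

struct ExactFit
{
  static const std::size_t k_block_size = 1;
  static std::size_t round_up(std::size_t n) { return n; }
};

// The policy host: it names only the members it needs from PolicyT.
template <typename PolicyT>
class SizePlanner
{
public:
  static std::size_t plan(std::size_t requested)
  {
    // Static functions are called on the type; no policy instance exists.
    return PolicyT::round_up(requested);
  }
};

// Usage: the same host behaves differently per policy.
inline void example_policy()
{
  assert(SizePlanner<SmallBlocks>::plan(20) == 32);
  assert(SizePlanner<ExactFit>::plan(20) == 20);
}
```

Swapping the policy changes the behavior of the host without touching the host's source, which is the open/closed property described above.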

Policy Declaration

Here is the declaration for an Alchemy storage policy:

C++

struct StoragePolicy
{
  // Typedefs for generalization
  typedef unsigned char                 data_type;
  typedef data_type*                    pointer;
  typedef const data_type*              const_pointer;
  typedef std::shared_ptr< data_type >  s_pointer;

  static
    s_pointer allocate(size_t size);
  static
    void deallocate(s_pointer &spBuffer);
  static
    bool read ( const_pointer   pBuffer,
                void*           pStorage,
                size_t          size,
                std::ptrdiff_t  offset);
  static
    bool write( pointer         pBuffer,
                const void*     pStorage,
                size_t          size,
                std::ptrdiff_t  offset);
};

The typedefs can be defined as any type that makes sense for the user's storage policy. The class doesn't even need to be named StoragePolicy or derived from it, because it will be used as a parameterized input type. The only requirement is that the type supports all of the declarations defined above. When this is put to use, it becomes an example of static polymorphism. This is the foundation that most of the C++ Standard Library (formerly the STL) is built upon. The polymorphism is invoked implicitly, rather than explicitly by way of deriving from a base class and overriding virtual functions.
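As a sketch of that static polymorphism, the following assumes a hypothetical user policy named ZeroedHeapStorage and a hypothetical generic function round_trip(). The policy shares no base class with StoragePolicy; it is accepted only because it provides the same typedefs and static functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <new>

// Hypothetical user policy, not part of Alchemy.
struct ZeroedHeapStorage
{
  typedef unsigned char               data_type;
  typedef data_type*                  pointer;
  typedef const data_type*            const_pointer;
  typedef std::shared_ptr<data_type>  s_pointer;

  static s_pointer allocate(std::size_t size)
  {
    // The trailing () value-initializes the array, so every byte is zero.
    return s_pointer(new (std::nothrow) data_type[size](),
                     std::default_delete<data_type[]>());
  }

  static void deallocate(s_pointer& spBuffer) { spBuffer.reset(); }

  static bool read(const_pointer pBuffer, void* pStorage,
                   std::size_t size, std::ptrdiff_t offset)
  {
    std::memcpy(pStorage, pBuffer + offset, size);
    return true;
  }

  static bool write(pointer pBuffer, const void* pStorage,
                    std::size_t size, std::ptrdiff_t offset)
  {
    std::memcpy(pBuffer + offset, pStorage, size);
    return true;
  }
};

// Generic code resolved at compile time: any type with a matching set of
// allocate/write/read/deallocate functions can be substituted for PolicyT.
template <typename PolicyT>
bool round_trip(unsigned int value)
{
  typename PolicyT::s_pointer spBuffer = PolicyT::allocate(sizeof(value));
  if (!spBuffer)
    return false;

  unsigned int result = 0;
  bool ok = PolicyT::write(spBuffer.get(), &value, sizeof(value), 0)
         && PolicyT::read(spBuffer.get(), &result, sizeof(result), 0);

  PolicyT::deallocate(spBuffer);
  return ok && result == value;
}

// Usage: the policy is selected purely by naming the type.
inline void example_round_trip()
{
  assert(round_trip<ZeroedHeapStorage>(0xC0FFEEu));
}
```

If a policy omits one of the functions that round_trip() calls, the error is reported at compile time when the template is instantiated, rather than at run time.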

Policy Implementation

At this point, I am only concerned with leaving the door open to extensibility without major modifications in the future. That is my front-loaded excuse for why the implementations of these policy interface functions are so damn simple. Frankly, this code was originally implemented inline within the original message buffer class. I thought that it would be better to introduce this policy extension now, so that some other decisions that you will see in the near future make much more sense. Don't blink as you scroll down, or you may miss the implementation of the storage policy functions below:

Allocate:

C++

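static
  s_pointer allocate(size_t size)
  {
    // std::make_shared cannot adopt a raw pointer. Construct the
    // shared_ptr directly and supply an array deleter so that
    // delete[] is used. The pointer is empty if allocation fails.
    s_pointer spBuffer( new(std::nothrow) data_type[size],
                        std::default_delete< data_type[] >());
    return spBuffer;
  }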

Deallocate:

C++

static
  void deallocate(s_pointer &spBuffer)
  {
    // No real action for this storage_policy.
    // Clear the pointer anyway.
    spBuffer.reset();
  }

Read:

C++

static
  bool read ( const_pointer   pBuffer, 
              void*           pStorage,
              size_t          size,
              std::ptrdiff_t  offset)
  {
    ::memcpy( pStorage,
              pBuffer + offset, 
              size);
    return true;
  }

Write:

C++

static
  bool write( pointer           pBuffer, 
              const void*       pStorage,
              size_t            size,
              std::ptrdiff_t    offset)
  {
    ::memcpy( pBuffer + offset,
              pStorage,
              size);
    return true;
  }

Message Buffer (continued)

I have covered all of the important concepts related to the message buffer: basic needs, extensibility and adaptability. There isn't much left except to present the class declaration and clarify anything particularly tricky within the implementation of the actual class. Keep in mind that although this is an actual class, we don't intend to provide direct user access to this particular object. The Alchemy class Hg::Message will be the consumer of this object:

Class Definition and Typedefs

typedefs are extremely important when practicing generic programming techniques in C++. They provide the flexibility to substitute different types in the function declarations. In some cases the types defined may seem silly, such as the size_type fields used in the STL. However, in our case the definitions for data_type, pointer and const_pointer become invaluable.

If it isn't obvious, the policy class that we just created is used as the template parameter below for the MsgBuffer. You will see in the function implementations further below how the calls are made through the policy. We declared the functions static, therefore there is no need to create an instance of the policy.

One last note: Starting with C++11 the ability to alias definitions is preferred over the typedef. There are many advantages, some of which include partially defined template aliases, a more intuitive definition for function pointers, and the compiler preserves the name of the aliased type. Preservation of the type in the compiler error messages goes a long way towards improving the readability of template programming errors, especially template meta-programming errors.
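For comparison, here is a short sketch of the two forms side by side. The names (byte_typedef, read_fn_alias, buffer_map) are made up for illustration and do not appear in Alchemy.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <type_traits>
#include <vector>

// A typedef and the equivalent C++11 alias declaration.
typedef unsigned char   byte_typedef;
using   byte_alias    = unsigned char;

// Function pointer types: the alias keeps the new name on the left.
typedef bool (*read_fn_typedef)(const void*, std::size_t);
using   read_fn_alias = bool (*)(const void*, std::size_t);

// An alias template fixes some template arguments and leaves others
// open; typedef has no direct equivalent.
template <typename KeyT>
using buffer_map = std::map<KeyT, std::vector<unsigned char> >;

// Usage: both spellings name exactly the same types.
inline void example_alias()
{
  assert((std::is_same<byte_typedef, byte_alias>::value));
  assert((std::is_same<read_fn_typedef, read_fn_alias>::value));

  buffer_map<int> buffers;
  buffers[1].push_back(0xFF);
  assert(buffers[1].size() == 1);
}
```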

C++

template < typename StorageT >
class MsgBuffer
{
public:
  //  Typedefs **************************************************
  typedef StorageT                           storage_type;
  typedef typename
    storage_type::data_type                  data_type;
  typedef typename
    storage_type::s_pointer                  s_pointer;
  // The policy must also define w_pointer; the StoragePolicy
  // declaration shown earlier does not list it.
  typedef typename
    storage_type::w_pointer                  w_pointer;

  typedef data_type*                         pointer;
  typedef const data_type*                   const_pointer;

  // ...
};

Construction

C++

  //  Ctor ********************************************
  MsgBuffer();

  //  Fill Ctor ***************************************
  // Create a zeroed buffer with the requested size
  explicit
    MsgBuffer(size_t n);

  //  Copy Ctor ***************************************
  MsgBuffer(const MsgBuffer& rhs);

  //  Dtor ********************************************
  ~MsgBuffer();

  //  Assignment Operator ****************************
  MsgBuffer& operator=(const MsgBuffer& rhs);

Status

For a construct like the message buffer, I like to use functions that are consistent with the naming and behavior of the standard library. If my development fits more closely with some other API, I will instead select names that match the environment in which the code will primarily be used.

C++

bool empty() const;
 
  size_t capacity() const;
 
  size_t size() const;
 
  void clear();
 
  void resize(size_t n);
 
  void resize(size_t n, byte_t val);
 
  MsgBuffer clone() const;
 
  const_pointer data() const;

Basic Methods

There was one mistake, or rather a learning experience, that I acquired during my first attempt with this library. I did not provide a simple way for users to directly initialize an Alchemy buffer from a buffer of raw memory, even though in many cases that is how their memory was managed or made accessible to them. I encouraged and intended for users to develop StoragePolicy objects to suit their needs. Instead they would create convoluted wrappers around the main Message object to allocate and copy data into the message construct.

This time I was sure to add an assign operation that would allow the initialization of the internal buffer from raw memory.

C++

//  *************************************************
  /// Zeroes the contents of the buffer.
  void zero();
 
  //  *************************************************
  /// Assigns the contents of an incoming 
  /// raw memory buffer to the message buffer.
  void assign(const_pointer pBuffer, size_t n);
 
  //  *************************************************
  /// Returns the offset used to access the buffer.
  std::ptrdiff_t offset() const;
 
  //  *************************************************
  /// Assigns a new base offset for 
  /// memory access to this object.
  void offset(std::ptrdiff_t new_offset);

I would like to briefly mention the offset() property. This will not be used immediately, however, it becomes useful once I add nested Datum support. This will allow a message format to contain sub-message formats. The offset property allows a single MsgBuffer to be sent to the serialization of sub-structures without requiring a distinction to be made between a top-level format and a nested format. When this becomes more relevant to the project I will elaborate further on this topic.
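The idea behind the base offset, together with assign(), can be sketched as follows. This is only an illustration under assumptions: OffsetBuffer, get_byte and read_nested_id are hypothetical names, and a std::vector stands in for the storage policy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for MsgBuffer, not the Alchemy class.
class OffsetBuffer
{
public:
  OffsetBuffer() : m_offset(0) { }

  // Copy n bytes of caller-owned raw memory into the internal buffer.
  void assign(const unsigned char* pBuffer, std::size_t n)
  {
    m_data.assign(pBuffer, pBuffer + n);
  }

  std::ptrdiff_t offset() const           { return m_offset; }
  void offset(std::ptrdiff_t new_offset)  { m_offset = new_offset; }

  // Read one byte at pos, relative to the current base offset.
  bool get_byte(unsigned char& value, std::ptrdiff_t pos) const
  {
    std::ptrdiff_t total = m_offset + pos;
    if (total < 0 || static_cast<std::size_t>(total) >= m_data.size())
      return false;
    value = m_data[static_cast<std::size_t>(total)];
    return true;
  }

private:
  std::vector<unsigned char> m_data;
  std::ptrdiff_t             m_offset;
};

// A nested format reads its first field at position 0 and never needs
// to know where it sits inside the enclosing message.
inline bool read_nested_id(const OffsetBuffer& buffer, unsigned char& id)
{
  return buffer.get_byte(id, 0);
}

// Usage: the same reader works at the top level and at a nested offset.
inline void example_offset()
{
  const unsigned char raw[] = { 0x01, 0x02, 0xAA, 0xBB };
  OffsetBuffer buffer;
  buffer.assign(raw, sizeof(raw));

  unsigned char id = 0;
  assert(read_nested_id(buffer, id) && id == 0x01);

  buffer.offset(2);  // the nested format starts at byte 2
  assert(read_nested_id(buffer, id) && id == 0xAA);
}
```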

Getting Values

This function deserves an explanation. It is a template member function, which means it is a parameterized member function that requires a template type argument. An instance of this function will be generated for every type that it is called with.

This function provides two benefits beyond allowing data to be extracted.

A convenient interface is created for the user to get values without a typecast.
Type-safety is introduced with this type specific function. All operations on the value can have the appropriate type associated with it up through this function call. This call performs the typecast to a void* at the final moment when data will be read into the data type.

C++

template < typename T >
  size_t get_data(T& value, std::ptrdiff_t pos) const
  {
    if (empty())
      return 0;

    std::ptrdiff_t total_offset = offset() + pos;

    // Verify that enough space remains in the buffer.
    size_t bytes_read = 0;
    if ( total_offset >= 0
      && total_offset + sizeof(value) <= size())
    {
      bytes_read =
        storage_type::read( data(),
                            &value,
                            sizeof(T),
                            total_offset)
        ? sizeof(T)
        : 0;
    }

    return bytes_read;
  }

Setting Values

This function is similar to get_data, and provides the same advantages. The only difference is this function writes user data to the buffer rather than reading it.

C++

template < typename T >
  size_t set_data(const T& value, size_t pos)
  {
    if (empty())
      return 0;

    // Assumes offset() is never negative.
    size_t total_offset =
      static_cast< size_t >(offset()) + pos;

    // Verify that enough space remains in the buffer.
    size_t bytes_written = 0;
    size_t total_size = size();
    if ((total_offset + Hg::SizeOf< T >::value) <= total_size)
    {
      bytes_written =
        storage_type::write ( raw_data(),
                              &value,
                              Hg::SizeOf< T >::value,
                              total_offset)
        ? Hg::SizeOf< T >::value
        : 0;
    }

    return bytes_written;
  }
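To see get_data and set_data working together, here is a simplified, self-contained sketch. It is not the Alchemy implementation: SketchBuffer is a hypothetical class, a std::vector replaces the storage policy, sizeof(T) replaces Hg::SizeOf, and the base offset is left out. What it keeps is the contract of returning the number of bytes transferred, or 0 when the value does not fit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified stand-in for MsgBuffer.
class SketchBuffer
{
public:
  // Create a zeroed buffer with the requested size.
  explicit SketchBuffer(std::size_t n) : m_data(n, 0) { }

  std::size_t size() const { return m_data.size(); }

  template <typename T>
  std::size_t set_data(const T& value, std::size_t pos)
  {
    if (!fits(pos, sizeof(T)))
      return 0;
    std::memcpy(&m_data[pos], &value, sizeof(T));
    return sizeof(T);
  }

  template <typename T>
  std::size_t get_data(T& value, std::size_t pos) const
  {
    if (!fits(pos, sizeof(T)))
      return 0;
    std::memcpy(&value, &m_data[pos], sizeof(T));
    return sizeof(T);
  }

private:
  // Written so that pos + length is never computed and cannot overflow.
  bool fits(std::size_t pos, std::size_t length) const
  {
    return length <= m_data.size() && pos <= m_data.size() - length;
  }

  std::vector<unsigned char> m_data;
};

// Usage: typed values go in and come back out without a cast.
inline void example_get_set()
{
  SketchBuffer buffer(6);
  std::uint16_t id    = 0x1234;
  std::uint32_t count = 99;
  assert(buffer.set_data(id, 0) == sizeof(id));
  assert(buffer.set_data(count, 2) == sizeof(count));

  std::uint32_t count_out = 0;
  assert(buffer.get_data(count_out, 2) == sizeof(count_out));
  assert(count_out == 99);
}
```

The caller never casts to void* or passes a size; both are derived from the argument type inside the member function, which is the type-safety benefit described for get_data.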

Summary

I have just presented the internal memory management construct that will be used in an Alchemy Message. We now have the final piece that will allow us to move forward and serialize the message fields programmatically into a buffer. My next entry on Alchemy will demonstrate how this is done.

To know and not do, is to not yet know
