A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called Network Alchemy[^]. Alchemy performs low-level data serialization with compile-time reflection. It is written in C++ using template meta-programming.
The alterations required up to this point to integrate arrays and vectors into Alchemy have been relatively minor. That does not mean that the solutions were clean and simple from the beginning. The exercise of integrating serialization support for these containers was quite challenging, especially because of the possibilities they created for flexibility in data management.
Rabbit holes
Because of the foundation work that I laid out for the array and vector, I had the simple serialization scenarios working fairly quickly. It was only once I peered deeper into testing that I realized what challenges were ahead of me. Each time I worked through one problem, another one, or possibly three, appeared behind it.
I found myself on a journey down a rabbit hole.
The problem with a rabbit hole is that you start following it, and you have almost no context to look back and see how far you have actually travelled. Even worse, you have no feel for how much deeper it goes.
I followed the rabbit hole to solve two problems. One path I followed, and followed, and followed, and eventually abandoned. However, it branched off early into the other path, one that became a necessity to make these sequential containers viable and useful constructs.
Abandoned path
The path that I ended up abandoning was the situation where you create a sequence of BitList types. I initially started to follow this path because of the size of the BitList, v2 specifically. The BitList as an object took up much more space than its data does by itself. However, accessing the data alone does not provide the accessor functions that make the BitList valuable.
So I created specialized BitList sequence containers. These worked to some degree. However, they were extremely slow. They also added an enormous amount of complexity to a mostly clean solution. Moreover, it was only a partial solution. There were many more special cases that I would need to code around if I continued to follow this path.
In the end I exclaimed out loud with certainty to my computer screen, "Using an array of bit-fields would be a fool's errand! Whoever builds a message like this should reconsider and make a better design." So I left the code for these containers in place, just in case I decided to pursue this path again in the future.
It turns out that I eventually ended up on that fool's quest by accident. I was using an Alchemy message to read in a bitmap, and I wanted to perform pixel-by-pixel manipulations. I did find a better way, and you will read about it when I add my post regarding the demonstration application: "Steganography". (BTW, no! I didn't end up creating the BlitList... but I'll have to consider that.)
Necessity, the mother of invention
What I discovered chasing the BitList through these containers is that the std::vector has the same problem. Its memory footprint is slightly larger than its data payload. Dammit! So do the nested fields.
It turns out that any type in the top-level message is handled naturally by virtue of the initial implementation. This is also true for any level of traversal into the message with a nested field type. However, the trouble appears if you attempt to place an array, a nested-type, a vector, or a BitList within an array or a vector.
These types would be very cumbersome to use (and therefore probably not used) if I could not support this diverse range of nesting. I worked through the solution for these combinations:
- Array of Arrays
- Array of Nested-types
- Array of Vectors
- Vector of Arrays
- Vector of Nested-types
I handled an array or vector of BitLists by converting the BitList to its internal integral type. The full data now fits in the allotted space, but the caller will need to cast or assign the indexed value to the BitList type if they want to access a bit-field value.
As far as attempting to solve the vector of vectors case, I started to believe that I was messing with forces beyond my understanding. For all I know, attempting to wrangle in this case could disrupt the entire space-time continuum. I'll leave that to the folks at CERN.
In all seriousness, the scenario of a vector of vectors becomes a very convoluted and complex task (think vector of vectors of vectors). Since the container is a vector, it already requires a run-time based solution. For me to solve this in Alchemy would provide very little value compared to the risk it would introduce to the correctness of the code, as well as its maintainability.
If a user encounters a message where they need to support a vector of vectors, I believe the best approach would be to allocate a single vector of bytes large enough to contain all of the data, and develop a custom processor to meet that specific situation. The door is still open, and we can revisit and attempt to solve this case at any time. However, I believe there are far more valuable additions that can be made to Alchemy than this one.
How to over-stuff a data buffer
The short answer: "With meta-data".
If you have not recognized it by now, or at least stopped and said it out loud: template meta-programming is all about the type information. Type information is meta-data. It is not directly manipulated at run-time; it is the context the compiler builds the processing logic upon. All assumptions and deductions made by the compiler, code-generator, and linker are based upon the types and constant information supporting the logic of your code.
The reason I have trouble trying to store a vector inside of an array is that there is extra run-time bookkeeping necessary to operate the vector as a run-time construct. However, since Alchemy is its own code-generator of sorts, we can manipulate the meta-data to become part of the logic that is generated by the compiler.
Therefore the data can remain its raw fixed size, and the type of message to be processed provides the context we need to correctly apply the meta-data. Essentially we are using the meta-data and compiler logic to help us encode more data in a buffer than would normally fit.
Transforming the meta-data
Once I realized that it was the run-time types that were populating the sequence containers, I started to search for a solution: a method that would prevent run-time types from being the target type processed when the compile-time reflection algorithms were at work, such as the ForEachType processor.
At first I started replacing the types at the point-of-execution in the specialized functions. This proved to be the correct approach to solve the problem; however, the cost was too great. Implementing specialized templates for each potential case and performing the work inline started to make the source code in this region a cluttered mess.
I still believed this was the correct approach; I just needed to find a cleaner way to implement the solution. I decided to create a second TypeList that was appropriate for use with the Hg meta-processing, such as the serialization of data. This second list would substitute the most appropriate type within the array and vector to generate the proper logic.
Hg::make_Hg_type_list
I created this reference table to help me understand the collection of relationships that I had to deal with.
Type&lt;T&gt; | Sub-type&lt;S&gt; | Substitution Required
------------|----------------|----------------------
Fundamental | n/a | none
BitList | n/a | none
Nested | n/a | Hg::make_Hg_type_list&lt;T&gt;::type
Array | Fundamental | none
Array | BitList | BitList integral type
Array | Nested | Hg::make_Hg_type_list&lt;S&gt;::type
Array | Array | yes / Sub-type also processed
Array | Vector | vector / Sub-type also processed
Vector | Fundamental | none
Vector | BitList | BitList integral type
Vector | Nested | Hg::make_Hg_type_list&lt;S&gt;::type
Vector | Array | yes / Sub-type also processed
Vector | Vector | Warning: Potential risk of disrupting the space-time continuum
I will provide a brief overview and a few samples for how I implemented this substitution table. If you want to inspect the full implementation you can find it here on GitHub: Alchemy/Hg/make_Hg_type_list.h[^]
This is the meta-function that processes a TypeList to define a new typedef called type, which is the appropriate substitution from the table listed above.
C++
// **************************************************
// @Note: Type T must be a typelist.
//
template< class T >
struct make_Hg_type_list
{
protected:
  // The next item on the list to process.
  typedef typename
    Hg::front<T>::type                  next_type;

  // The remainder of the unprocessed types.
  typedef typename
    Hg::pop_front<T>::type              list_tail;

public:
  // Make the Hg compatible list.
  typedef typename
    detail::make_Hg_worker< next_type,
                            list_tail,
                            Hg::TypeList<Hg::MT>
                          >::type       type;
};
detail::make_Hg_worker has a base implementation and a terminating specialization. The terminator simply typedefs the last parameterized type; thus the recursion stops. The default template defines a set of types designed to build a new list that mirrors the original, with the exception that some of the types will have adjusted replacements as needed.
Let's break this one into a few chunks. The typedefs defined in this sample will be used to create the recursive expansion, and the final result:
C++
template< typename T,
          typename TailT,
          typename HgT
        >
struct make_Hg_worker
{
  // The result of inspecting a type and
  // potentially adjusting the type.
  typedef typename
    AdjustType<T>::type                 adjusted_type;

  // The next item on the list to process.
  typedef typename
    Hg::front<TailT>::type              next_type;

  // The remainder of the unprocessed types.
  typedef typename
    Hg::pop_front<TailT>::type          list_tail;

  // ...
This is the remainder of the worker template. The first typedef is the recursive call to complete the processing of the other fields. The typedef for Hg::push_front defines the final replacement TypeList.
C++
  // ...

  // A recursively built list of processed
  // types that have been adjusted to be
  // compatible for Hg message types.
  typedef typename
    make_Hg_worker< next_type,
                    list_tail,
                    HgT
                  >::type               hg_list;

  // The list of types converted for Hg.
  typedef typename
    Hg::push_front< hg_list,
                    adjusted_type
                  >::type               type;
};
There are three more templates worth describing; the remainder of the definitions are a set of template specializations used to fulfill the requirements mapped out in the table above. The first template is a simple adapter that I included for clarity. Its contents could have been placed completely inline inside of the make_Hg_worker template. However, I believe it makes the final solution much clearer. At the same time, it moves all type replacement logic to a single point outside of the main worker function.
C++
template< class T >
struct AdjustType
{
  typedef typename
    ReplaceType< T,
                 typename DeduceTypeTrait<T>::type
               >::type                  type;
};
The next sample is one specialization of the ReplaceType template. A specialization of this template exists for each Alchemy-supported data type. This sample demonstrates the vector replacement.
C++
template< class VectorT >
struct ReplaceType< VectorT, vector_trait >
{
  // Extract the value type declaration
  // from inside the vector declaration.
  typedef typename
    VectorT::value_type                 value_type;

  // Deduce the type trait of value_type
  // for declaration of the type sequence.
  typedef typename
    DeduceTypeTrait<value_type>::type   type_trait;

  typedef typename
    DeclareTypeSequence< VectorT,
                         type_trait
                       >::type          type;
};
The final typedef in the previous sample uses the last template meta-function, DeclareTypeSequence, to define the replacement type. DeclareTypeSequence is actually a simple declaration, which redefines an array or vector to use a type that will manage the expanded type, rather than the type specified in the original TypeList. This is where the meta-data becomes useful; it is applied to construct the proper run-time structures for managing the serialized data.
C++
template< typename T,
          typename A,
          template <class, class> class VectorT
        >
struct DeclareTypeSequence< VectorT<T,A>, nested_trait >
{
  // Define a typedef that represents the actual data-type
  // for the type-list T passed into this vector.
  typedef typename
    FieldTypes<T>::value_type           value_type;

  typedef
    std::vector< value_type, A >        type;
};
Hg::make_Hg_type_list is instantiated with the same TypeList that is used to define a new HG_BEGIN_FORMAT definition. The Hg::make_Hg_type_list version of the replaced TypeList is then used in the field_data_t specialization that is created for every Alchemy data-type, which defines the value_type to be used for storing message data.
The substitute TypeList is a solution that evolved after a number of iterative attempts to elegantly solve serialization for the new sequential containers. What I demonstrate in the next section is the current serialization implementation used in Alchemy, the result of that evolution.
pack_message
pack_message is the implementation that is called by the Hg::message object to take the run-time data stored within the object, and pack it into a raw-storage buffer in the format specified by the message's TypeList definition.
With the addition of a variable-sized field, two specializations of this function exist. The compiler tag-dispatches to the correct function based on the message's SizeTrait characteristic.
Fixed-size operation
C++
template< typename MsgT,
          typename BufferT,
          typename SizeTraitT
        >
bool
  pack_message( MsgT&    msg_values,
                BufferT& fixed_buffer)
{
  return detail::pack_fixed_size_message( msg_values,
                                          fixed_buffer,
                                          SizeTraitT());
}
Variable-size operation
C++
template< typename MsgT,
          typename BufferT,
          typename SizeTraitT
        >
size_t pack_message( MsgT&    msg_values,
                     BufferT& buffer,
                     size_t   offset)
{
  return detail::pack_message< MsgT,
                               BufferT
                             >( msg_values,
                                buffer,
                                offset,
                                SizeTraitT());
}
The unpack_message template functions are similarly defined.
detail::pack_message
Both versions of Hg::pack_message call a different version of the internal detail implementation. The difference between these two functions is in how they initialize the message buffer that receives the serialized data. The variable-sized implementation prepares for a process that may require dynamic offsets, based on the placement and size of vectors defined in the middle of the message. Once both versions initialize the PackMessageWorker, the recursive process of serializing the message data begins.
C++
template< typename MsgT,
          typename BufferT
        >
BufferT&
  pack_message( MsgT&    msg_values,
                size_t   size,
                BufferT& buffer,
                const dynamic_size_trait&)
{
  // Memory is only allocated for
  // the variable-sized message.
  buffer.resize(size);

  detail::PackMessageWorker< 0,
                             Hg::length<typename MsgT::format_type>::value,
                             MsgT,
                             BufferT
                           > pack;
  // This additional parameter tracks the dynamic
  // adjustments of variable-sized messages.
  size_t dynamic_offset = 0;
  pack(msg_values, buffer, dynamic_offset);
  return buffer;
}
Both versions call the function operator of the same PackMessageWorker object. The only difference is that the variable-sized implementation provides an extra parameter called dynamic_offset, which propagates the total number of bytes that have accumulated from variable-length fields. This indicates to the detail-level functions to make appropriate adjustments with respect to the sizes pre-calculated at compile-time.
C++
template< size_t   Idx,
          size_t   Count,
          typename MsgT,
          typename BufferT
        >
struct PackMessageWorker
{
  void operator()(MsgT&    message,
                  BufferT& buffer)
  {
    size_t dynamic_offset = 0;
    WriteDatum< Idx, MsgT, BufferT>( message,
                                     buffer,
                                     dynamic_offset);

    PackMessageWorker< Idx+1, Count, MsgT, BufferT> pack;
    pack(message, buffer);
  }

  void operator()(MsgT&    message,
                  BufferT& buffer,
                  size_t&  dynamic_offset)
  { /* ... */ }
};
Where is ForEachType&lt;T&gt;?
The ForEachType construct is not used at this level because of the depth and diversity of the combinations of message structures that are possible. This construct is convenient when the same operation must be performed on all fields, and the meta-function supplied to ForEachType does not contain mutable inputs, such as dynamic_offset.
Therefore, the familiar recursive processing of each indexed field is performed through the end of the TypeList of parameters. The recursive loop first writes the current Datum, then makes the recursive call to the next field. There is a PackDatum and UnpackDatum specialization created for each type.
PackDatum / UnpackDatum
The PackDatum specializations for the array and vector types contain the more complicated implementations. This is because once they begin serializing their data, they must inspect each sub-field and determine if it can be written directly, or must be placed in a new round of pack_message calls. Then, based upon the size_trait of the sub-type of the sequence container, the most efficient implementation is selected.
Here is the PackDatum functor for a fundamental type:
C++
template< size_t   IdxT,
          typename MsgT,
          typename BufferT,
          typename TraitT
        >
struct PackDatum
{
  void operator()(MsgT&    msg,
                  BufferT& buffer,
                  size_t   dynamic_offset)
  {
    // Removed typedefs for brevity
    value_type &value =
      msg.template FieldAt<IdxT>().get();
    size_t offset = Hg::OffsetOf< IdxT,
                                  typename MsgT::format_type
                                >::value
                  + dynamic_offset;
    buffer.set_data(value, offset);
  }
};
PackDatum for Array and Vector
I am not going to go into much deeper detail regarding the array and vector PackDatum implementations, because they could fill an entire blog entry on their own. However, I will elucidate the basic flow for these constructs.
The primary difference in their functors is another layer of indirection, where a function called SerializeArray or SerializeVector is called depending upon the type. The number of bytes written is captured and added to the dynamic_offset count.
The SerializeSequence function invokes a specialized functor based upon the sub-type contained within the sequence container. There is an implementation specialized for each Datum type supported in Alchemy. This is the single point of code in the entire library that seems to be too tightly coupled, as the array and vector require knowledge and processing specific to each type supported by the library.
These implementations choose between item-by-item transfers or bulk transfers based on the type. I have this section of code marked to return to, and I will attempt to further de-couple the type of processing required to properly serialize these types.
For now, I find a little bit of solace knowing that this coupled behavior is found at the end of a processing chain, as opposed to the central processing hub for how messages are structured. Therefore, I feel comfortable letting the library continue to grow for a short while before I decide the best course of corrective action.
If you would like to view the pack / unpack implementations for the array or vector, you can find them here on GitHub:
- pack_array: Alchemy/Hg/detail/pack_array.h[^]
- pack_vector: Alchemy/Hg/detail/pack_vector.h[^]
- unpack_array: Alchemy/Hg/detail/unpack_array.h[^]
- unpack_vector: Alchemy/Hg/detail/unpack_vector.h[^]
Summary
Reworking the serialization system to support a sequence of types has been one of the most challenging tasks that I have faced in Alchemy up to this point. In part, this is because I was forced to rethink conventions that I had already put in place. Another reason is the fundamental change in how buffers are accounted for and managed in order to support messages of variable size.
I am content with its current state, although, as I stated, there are some portions of the sequence serializer code that I want to revisit. I would like to further decouple the solution, as well as work in further optimizations.
Speaking of optimizations, I am excited for the next post that I will be writing about Alchemy: the performance benchmarks. I compare a small set of scenarios defined in Alchemy with a naïve, hand-written serializer that uses 1-byte-aligned packed structures and the byte-order conversion functions that are included with network libraries. As a teaser, my first comparison run was abysmal. I even contemplated discontinuing the project, until I did a little bit of analysis to find the root cause of my bottlenecks.
Needless to say, I would not still be developing Alchemy if I had not been able to overcome the limitations of my first implementation. I will describe what I found, what I changed, and what I learned about improving execution speed.