Alchemy: Nested Types

Apr 22

A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called Network Alchemy. Alchemy performs low-level data serialization with compile-time reflection. It is written in C++ using template meta-programming.

I am almost done describing the first set of features that I was targeting when I set out to create Alchemy. The only remaining feature to be documented is the ability to have nested types. Basically, structs within structs. This entry describes the approach that I took as well as some of the challenges that I had to conquer in order to create a usable solution.

The Concept

The concept for adding nested field types seemed straightforward, and it actually is. Recursion with templates and template specialization were the primary tools that I had employed to get as far as I had up to this point. Creating a form of compile-time reflection allowed me to iterate over all of the fields of a top-level structure. The elegance of the solution was to simply reuse the existing processing on a second-layer structure.
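As a rough illustration of that idea, here is a minimal sketch of a compile-time walk over a field list that re-enters itself when it meets a nested struct. The names type_list, is_nested, size_of_field and size_of_list are hypothetical stand-ins, not Alchemy's actual types, and the sketch assumes a nested format advertises its fields through a member typedef called fields.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical minimal type list; Alchemy's real TypeList differs.
template <typename... Ts>
struct type_list {};

// Helper that maps any valid type to void (a C++11 stand-in for std::void_t).
template <typename T>
struct make_void { typedef void type; };

// A type counts as "nested" when it exposes a member typedef named fields.
template <typename T, typename Enable = void>
struct is_nested : std::false_type {};

template <typename T>
struct is_nested<T, typename make_void<typename T::fields>::type>
    : std::true_type {};

template <typename List>
struct size_of_list;

// Fundamental field: its size is simply sizeof(T).
template <typename T, bool Nested = is_nested<T>::value>
struct size_of_field {
  static constexpr std::size_t value = sizeof(T);
};

// Nested field: reuse the same list processing one level down.
template <typename T>
struct size_of_field<T, true> {
  static constexpr std::size_t value = size_of_list<typename T::fields>::value;
};

// End of the list terminates the recursion.
template <>
struct size_of_list<type_list<> > {
  static constexpr std::size_t value = 0;
};

// Size of a list is the size of the head plus the size of the tail.
template <typename Head, typename... Tail>
struct size_of_list<type_list<Head, Tail...> > {
  static constexpr std::size_t value =
      size_of_field<Head>::value + size_of_list<type_list<Tail...> >::value;
};

// Example formats: Outer contains Inner as its second field.
struct Inner { typedef type_list<std::uint8_t, std::uint16_t> fields; };
struct Outer { typedef type_list<std::uint32_t, Inner, std::uint8_t> fields; };

static_assert(size_of_list<Outer::fields>::value == 8,
              "4 + (1 + 2) + 1 bytes");
```

The only nested-specific code is the one partial specialization of size_of_field; everything else is the same recursion that handles a flat list.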

This code is rotting from the inside

I was able to get the basic recursive calls into place rather quickly. However, some of my design decisions caused a great deal of resistance to some of the new changes. One of the challenges was caused by the difference in parameterized-type baggage that each type of data element required. This raised some red flags for me because there were only three datatypes: fundamental types, bit-lists and nested structs.

The other issue stemmed from my original approach to memory management. I had envisioned the entire message to be a data-structure that handled memory management and allowed the user to concentrate on the data fields. The data-fields would be serialized as they were accessed, so the buffer would always be ready to send off at a moment’s notice.

The trouble arose because I didn’t want the messages to always be initialized. There are plenty of temporary copies that are created, and performance would have taken an enormous hit if buffers were allocated and immediately destroyed without ever being used. As I was saying, the trouble arose because there were still plenty of times that I had to have the buffers initialized with the proper data.

Templates are supposed to reduce the amount of code that a developer must create and maintain, not multiply it. These are observations that I kept in mind for any possible future revisions.

Fighting the internal structure

In order to initialize the buffers, I found myself creating numerous template specializations to handle each type of data, especially the nested structure. Furthermore, these specializations were located deep inside of the Datum object, which was supposed to be a very generic adapter for all of the data types.

The conundrum I faced was an internal data field that was stored within each Datum. When I encountered a nested data-type, however, I didn’t need to create any internal data, because the nested structure held a collection of its own Datum objects. These Datum objects already had their own data. I needed to find a way to avoid the creation of redundant data when I encountered nested fields.

On Shadow Buffers

For the temporary messages that were never going to have memory allocated for them, I usually still needed a way to pass data. So in each Datum, a value_type variable was defined to act as temporary storage. These shadow buffers exist in the current form of Alchemy as well, albeit a bit refined for efficiency and convenience.

When a message did not have an allocated buffer, and all of the value data was stored in these shadow buffers, scenarios would occur where the data would not be properly flushed to the new object once memory was allocated.
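To make that failure mode concrete, here is a minimal sketch of a field that keeps a shadow value until storage is attached. ShadowedField and its member names are hypothetical and are not Alchemy's Datum; the sketch also assumes a trivially copyable T and does no byte-order conversion.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a Datum; Alchemy's real class is more involved.
template <typename T>
class ShadowedField {
public:
  ShadowedField() : m_shadow(), m_buffer(nullptr) {}

  // Always update the shadow copy; mirror into the buffer when one exists.
  void set(const T& value) {
    m_shadow = value;
    flush();
  }

  // Read from the buffer when attached, otherwise from the shadow copy.
  T get() const {
    T value = m_shadow;
    if (m_buffer) std::memcpy(&value, m_buffer, sizeof(T));
    return value;
  }

  // Attaching storage flushes the shadow value into it. Skipping this
  // flush would leave the new buffer holding whatever bytes it had before.
  void attach(std::uint8_t* storage) {
    m_buffer = storage;
    flush();
  }

private:
  void flush() {
    if (m_buffer) std::memcpy(m_buffer, &m_shadow, sizeof(T));
  }

  T m_shadow;
  std::uint8_t* m_buffer;
};

// Usage: a value written before allocation survives the attach.
inline std::uint32_t shadow_demo() {
  ShadowedField<std::uint32_t> field;
  field.set(0x11223344u);
  std::vector<std::uint8_t> storage(sizeof(std::uint32_t), 0);
  field.attach(storage.data());
  std::uint32_t from_buffer = 0;
  std::memcpy(&from_buffer, storage.data(), sizeof(from_buffer));
  assert(from_buffer == field.get());
  return from_buffer;
}
```

Every path that hands a field new memory has to remember the flush, which is exactly the kind of obligation that is easy to miss once several specializations each manage their own storage.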

Abstracting the aberration

To combat the ever-growing need for more specializations, which I was also dispatching by hand, I reached for a natural tool to solve this: more template specializations. Actually, I created some adapter classes that encapsulated the type information that I needed, including the data type required for the shadow buffer. I then used the type defined by the user to specialize each data type and allow for the correct functions to be called with tag-dispatching.
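For readers who have not met tag dispatching before, the general technique can be sketched as follows. This is not Alchemy's code: fundamental_tag, nested_tag, field_traits, wire_size and the Point struct are all names invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Empty tag types; they exist only to select an overload.
struct fundamental_tag {};
struct nested_tag {};

// Traits adapter: by default a field is treated as a fundamental type.
template <typename T>
struct field_traits { typedef fundamental_tag category; };

// A hypothetical nested format with its own notion of packed size.
struct Point {
  std::int16_t x;
  std::int16_t y;
  std::size_t packed_size() const { return sizeof(x) + sizeof(y); }
};

// One specialization of the adapter marks Point as nested.
template <>
struct field_traits<Point> { typedef nested_tag category; };

// Overload chosen for fundamental fields.
template <typename T>
std::size_t wire_size_impl(const T&, fundamental_tag) { return sizeof(T); }

// Overload chosen for nested fields; it defers to the nested type.
template <typename T>
std::size_t wire_size_impl(const T& value, nested_tag) {
  return value.packed_size();
}

// Single entry point: the tag object picks the overload at compile time.
template <typename T>
std::size_t wire_size(const T& value) {
  return wire_size_impl(value, typename field_traits<T>::category());
}

// Usage: the caller never names a tag.
inline void tag_dispatch_demo() {
  Point p = {1, 2};
  assert(wire_size(std::uint32_t(7)) == 4);
  assert(wire_size(p) == 4);
}
```

The type-specific knowledge lives in the small traits adapter, so the generic code needs one entry point instead of a specialization per data type.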

Let me explain.

Static vs. Dynamic Polymorphism

When using template-heavy solutions, you learn to take advantage of static polymorphism. This is a type of polymorphism that takes advantage of implicit declarations defined within an object. Contrast this with dynamic polymorphism (public inheritance), where the interface declarations must be explicit. There is a direct relationship explicitly defined when inheritance is used.

The implied relationship that is employed with static polymorphism occurs simply because we expect a field named ‘X’ or a function named ‘Y’. When the code is compiled, the template is instantiated, and as long as the implied resource is present, our code will compile. The polymorphism then becomes fixed, or static.

If you have ever wondered why there are so many typedefs defined within the STL containers (value_type, pointer, const_pointer, key_type, size_type and so on), it is because the algorithms used throughout the implementation are using static polymorphism to make the containers interchangeable, as well as the different functions used to implement the basic functionality.

When you have a consistent type-convention to build upon, you are then able to create objects with orthogonal relationships that can still naturally interact with one another.
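A small sketch of that convention at work: the function template below relies only on a nested value_type, a const_iterator, and begin()/end(), so standard containers and an unrelated user type can all be passed to it. The sum function and the Triple struct are made up for this example.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Works with any type that provides value_type, const_iterator,
// begin() and end(). No common base class is required.
template <typename ContainerT>
typename ContainerT::value_type sum(const ContainerT& c) {
  typename ContainerT::value_type total = typename ContainerT::value_type();
  for (typename ContainerT::const_iterator it = c.begin(); it != c.end(); ++it) {
    total += *it;
  }
  return total;
}

// An unrelated user type joins in simply by following the convention.
struct Triple {
  typedef int value_type;
  typedef const int* const_iterator;
  int items[3];
  const_iterator begin() const { return items; }
  const_iterator end() const { return items + 3; }
};

// Usage: three unrelated types, one algorithm.
inline void static_polymorphism_demo() {
  std::vector<int> v;
  v.push_back(1);
  v.push_back(2);
  v.push_back(3);
  std::list<double> l;
  l.push_back(0.5);
  l.push_back(1.5);
  Triple t = {{4, 5, 6}};
  assert(sum(v) == 6);
  assert(sum(l) == 2.0);
  assert(sum(t) == 15);
}
```

If Triple lacked the value_type typedef, the failure would appear at compile time, when sum is instantiated for it.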

Static Polymorphism in use

I created a FieldTypes struct as an abstraction that defines the data type I required for each type of Datum that is declared within an Hg message structure.

C++

template< typename FieldT > 
struct field_data_t 
{
  // The matching data type for the index.
  // The default implementation uses 
  // the same type as the index type. 
  typedef FieldT value_type; 
};

Notice there are only declarations in this structure. There are no member variable definitions. This is the first part of my abstraction. This next structure has a default template implementation, and each nested definition gets its own specialization:

C++

template< typename FieldT, 
          size_t kt_offset = 0 
        > 
struct FieldTypes 
{ 
  // The type at the index of the 
  // parent type container. 
  typedef FieldT index_type; 
 
  // The specified value type for 
  // the current Datum. 
  typedef typename 
    field_data_t< index_type >::value_type 
                  value_type; 
  value_type      m_shadow_data; 
};

This struct definition allows me to take the original type extracted from the defined TypeList and statically map it to the replacement type that is best suited for the situation. The fundamental types simply map to themselves. On the other hand, the nested types map to a specialization similar to the definition below. You will notice that a reference to the value_type is defined in this specialization as opposed to an actual data member like the default template. This is how I was able to overcome the duplication of resources.

The ‘F’ in the definition below is actually extracted from a MACRO that generates this specialization for each nested field. ‘F’ represents the nested format type.

C++

template< typename storage_type, 
          size_t kt_offset 
        > 
struct field_types < F, storage_type,kt_offset > 
  : F##_payload< storage_type, kt_offset > 
{ 
  typedef F index_type; 
  typedef F##_payload < storage_type, kt_offset > value_type; 
 
  field_types() 
    : m_shadow_data(This()) 
  { } 
 
  value_type& This() 
  { return *this; } 
 
  value_type &  m_shadow_data; 
};

It is important to note that defining references inside of classes or structs is perfectly legal and can provide value. However, these references do come with risks. The references must be initialized in the constructor, which implies you know where you want to point them towards. Also, you most likely will need to create a copy constructor and an assignment operator to ensure that you point your new object instances to the correct reference.
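The copy hazard can be sketched in isolation. The Payload and NestedField types below are hypothetical and only mirror the shape of the specialization above: an object that derives from its payload and keeps a reference back to itself.

```cpp
#include <cassert>

// Hypothetical payload standing in for a generated nested format.
struct Payload {
  int value;
  Payload() : value(0) {}
};

// The object is its own payload, and m_shadow_data refers back to it.
struct NestedField : Payload {
  NestedField() : m_shadow_data(self()) {}

  // Without this, the implicit copy constructor would bind the new
  // object's reference to the payload of the object being copied.
  NestedField(const NestedField& rhs) : Payload(rhs), m_shadow_data(self()) {}

  // References cannot be reseated, so assignment copies the payload only.
  // (The implicit copy assignment is deleted when a reference member exists.)
  NestedField& operator=(const NestedField& rhs) {
    Payload::operator=(rhs);
    return *this;
  }

  Payload& self() { return *this; }

  Payload& m_shadow_data;
};

// Usage: each copy refers to its own payload, not to the original's.
inline void reference_member_demo() {
  NestedField a;
  a.value = 5;
  NestedField b(a);
  a.value = 9;
  assert(&b.m_shadow_data == static_cast<Payload*>(&b));
  assert(b.m_shadow_data.value == 5);
}
```

With the compiler-generated copy constructor instead, b.m_shadow_data would alias a's payload and would dangle once a was destroyed.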

Finally, here is a convenience template that I created to simplify the definitions that I had to use when defining the Datum objects. This struct encapsulates all of the definitions that I require for every field. When I finally arrived at this solution, things started to fall into place because I only needed to know the TypeList, Index, and relative offset for the field in the final buffer.

I have been able to refine this declaration even further; the most current version of Alchemy no longer needs the kt_offset declaration.

C++

template< size_t Idx, 
          typename format_t, 
          size_t kt_offset 
        > 
struct DefineFieldType 
{ 
  // The type extracted at the current 
  // index defined in the parent TypeList. 
  typedef typename 
    Hg::TypeAt< Idx, format_t >::type index_type; 
 
  // The field type definition that maps 
  // a field type with its value_type. 
  typedef typename 
    detail::FieldTypes< index_type, 
                        OffsetOf::value + kt_offset 
                      > type; 
};
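To show how an index, a type list and a base offset are enough to describe a field, here is a self-contained sketch with simplified stand-ins. type_list, type_at, offset_of and field_info are hypothetical names that play the roles of the TypeList, Hg::TypeAt, OffsetOf and DefineFieldType above; Alchemy's real templates are not reproduced here, and the sketch assumes flat, tightly packed fields.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical minimal type list.
template <typename... Ts>
struct type_list {};

// type_at<Idx, List>::type is the Idx-th type in the list.
template <std::size_t Idx, typename List>
struct type_at;

template <typename Head, typename... Tail>
struct type_at<0, type_list<Head, Tail...> > {
  typedef Head type;
};

template <std::size_t Idx, typename Head, typename... Tail>
struct type_at<Idx, type_list<Head, Tail...> >
    : type_at<Idx - 1, type_list<Tail...> > {};

// offset_of<Idx, List>::value is the summed size of the fields before Idx.
template <std::size_t Idx, typename List>
struct offset_of;

template <typename Head, typename... Tail>
struct offset_of<0, type_list<Head, Tail...> > {
  static constexpr std::size_t value = 0;
};

template <std::size_t Idx, typename Head, typename... Tail>
struct offset_of<Idx, type_list<Head, Tail...> > {
  static constexpr std::size_t value =
      sizeof(Head) + offset_of<Idx - 1, type_list<Tail...> >::value;
};

// Everything a field needs, derived from index, format and base offset.
template <std::size_t Idx, typename Format, std::size_t BaseOffset = 0>
struct field_info {
  typedef typename type_at<Idx, Format>::type index_type;
  static constexpr std::size_t offset =
      offset_of<Idx, Format>::value + BaseOffset;
};

// Example format: 1-byte, 4-byte and 2-byte fields, tightly packed.
typedef type_list<std::uint8_t, std::uint32_t, std::uint16_t> header_format;

static_assert(
    std::is_same<field_info<2, header_format>::index_type, std::uint16_t>::value,
    "field 2 is the 16-bit value");
static_assert(field_info<2, header_format>::offset == 5, "1 + 4 bytes precede it");
```

A non-zero BaseOffset is what lets the same format be placed inside a parent message: field_info<1, header_format, 10>::offset evaluates to 11.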

Other Mis-steps

There are a few issues that were discovered as my API was starting to be put to use. However, the problems that were uncovered were fundamental issues with the original structure and design. Therefore, I would have to take note of these issues and address them at a later time.

Easy to use incorrectly, Difficult to use… at all

The original users of the Hg messages were also confused about how the buffers were intended to be managed, and they created some clever wrapper classes and typedefs to make sure they always had memory allocated for the message.

When I saw this, I knew that I had failed in my design because the interface was not intuitive, it created surprising behavior, and most of all, I found myself fumbling to explain how it was supposed to work. I was able to add a few utility functions similar to the C++ Standard Library’s std::make_shared< T >, but this only masked the problem. This wasn’t an acceptable fix to the actual problem.

Redundant Declarations

There was one more painful mistake that I discovered during my integration of nested data types. The mechanism that I used to declare a top-level message made them incompatible with becoming a nested field of some other top-level message. In order to achieve this, the user would have had to create a duplicate definition with a different name.

This was by no means a maintainable solution. I did keep telling myself this was a proof-of-concept and I would resolve it once I could demonstrate the entire concept was feasible.

Requiring two different MACROs to declare a top-level message and a nested field should have been a red flag to me that I was going to run into trouble later on, because I was treating two fundamentally similar structures differently:

C++

// Top-level message MACRO
DECLARE_PAYLOAD_HEADER(F)
 
// A struct intended to be used as 
// a nested field inside of a payload header.
DECLARE_NESTED_HEADER(F)

Summary

If I had known what I was building from the start, and had put it to use in more realistic situations during testing, I may have been able to avoid the trouble that I encountered while adding the ability to support nested structures. The solution really should have been an elegant recursive call to support the next structure on the queue.

I am actually glad it didn’t work out so smoothly though, because I tried many different approaches and learned a bit more each time I tried a technique where it would be most valuable. Every now and then I still find myself struggling to wrap my head around a problem and find a clean solution. In this context, they now come much easier for me.

Oh yeah, and I eventually did reach the elegant recursive nested fields, which you will learn about in my next Alchemy entry describing my use of proxy objects.

To know and not do, is to not yet know

Tags: Alchemy, CPP, Design
