A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called Network Alchemy[^]. Alchemy performs low-level data serialization with compile-time reflection. It is written in C++ using template meta-programming.
The alterations required up to this point to integrate arrays and vectors into Alchemy have been relatively minor. That does not mean that the solutions were clean and simple from the beginning. The exercise of integrating serialization support for these containers was quite challenging, especially because of the possibilities they created for flexibility in data management.
Rabbit holes
Because of the foundation work that I laid out for the array and vector, I had the simple serialization scenarios working fairly quickly. It was only once I peered deeper into testing that I realized what challenges were ahead of me. Each time I worked through one problem, another one, or possibly three, appeared behind it.
I found myself on a journey down a rabbit hole.
The problem with a rabbit hole is that you start following it, and you have almost no context to look back and see how far you have actually travelled. Even worse, you have no feel for how much deeper it goes.
I followed the rabbit hole to solve two problems. One path I followed, and followed, and followed, and eventually abandoned. However, it branched off early into the other path, one that became a necessity to make these sequential containers viable and useful constructs.
Abandoned path
The path that I ended up abandoning was the situation where you create a sequence of BitList types. I initially started to follow this path because of the size of the BitList, v2 specifically. The BitList as an object took up much more space than its data does by itself. However, accessing the data alone does not provide the accessor functions that make the BitList valuable.
So I created specialized BitList sequence containers. These worked to some degree. However, they were extremely slow. They also added an enormous amount of complexity to a mostly clean solution. Moreover, it was only a partial solution. There were many more special cases that I would need to code around if I continued to follow this path.
In the end I exclaimed out loud with certainty to my computer screen, "Using an array of bit-fields would be a fool's errand! Whoever builds a message like this should reconsider and make a better design." So I left the code for these containers in place, just in case I decided to pursue this path again in the future.
It turns out that I eventually ended up on that fool's quest by accident. I was using an Alchemy message to read in a bitmap, and I wanted to perform pixel-by-pixel manipulations. I did find a better way, and you will read about it when I add my post regarding the demonstration application: "Steganography". (BTW, no! I didn't end up creating the BlitList... but I'll have to consider that.)
Necessity, the mother of invention
What I discovered chasing the BitList through these containers is that the std::vector has the same problem. Its memory footprint is slightly larger than its data payload. Dammit! So do the nested fields.
It turns out that any type in the top-level message is handled naturally by virtue of the initial implementation. This is also true for any level of traversal into the message with a nested field type. However, the trouble appears if you attempt to place an array, a nested-type, a vector, or a BitList within an array or a vector.
These types would be very cumbersome to use (and therefore probably not used) if I could not support this diverse range of nesting. I worked through the solution for these combinations:
- Array of Arrays
- Array of Nested-types
- Array of Vectors
- Vector of Arrays
- Vector of Nested-types
I handled an array or vector of BitLists by converting the BitList to its internal integral type. The full data now fits in the allotted space, but the caller will need to cast or assign the indexed value to the BitList type if they want to access a bit-field value.
As far as attempting to solve the vector of vectors case, I started to believe that I was messing with forces beyond my understanding. For all I know, attempting to wrangle in this case could disrupt the entire space-time continuum. I'll leave that to the folks at CERN.
In all seriousness, the scenario of a vector of vectors becomes a very convoluted and complex task (think vector of vectors of vectors). Since the container is a vector, it already requires a run-time based solution. For me to solve this in Alchemy would provide very little value compared to the risk it would introduce to the correctness of the code, as well as its maintainability.
If a user encounters a message where they need to support a vector of vectors, I believe the best approach would be to allocate a single vector of bytes large enough to contain all of the data, and develop a custom processor to meet that specific situation. The door is still open, and we can revisit and attempt to solve this case at any time. However, I believe there are far more valuable additions that can be made to Alchemy than this one.
How to over-stuff a data buffer
The short answer: "With meta-data".
If you have not recognized it by now, or at least stopped and said it out loud: template meta-programming is all about the type information. Type information is meta-data. It is not directly manipulated at run-time; it is the context the compiler builds the processing logic upon. All assumptions and deductions made by the compiler, code-generator, and linker are based upon the types and constant information supporting the logic of your code.
The reason I have trouble trying to store a vector inside of an array is that there is extra run-time bookkeeping necessary to operate the vector as a run-time construct. However, since Alchemy is its own code-generator of sorts, we can manipulate the meta-data to become part of the logic that is generated by the compiler.
Therefore the data can remain its raw fixed size, and the type of message to be processed provides the context we need to correctly apply the meta-data. Essentially we are using the meta-data and compiler logic to help us encode more data in a buffer than would normally fit.
Transforming the meta-data
Once I realized that it was the run-time types that were populating the sequence containers, I started to search for a solution: a method that would prevent run-time types from being the target type processed when the compile-time reflection algorithms were at work, such as the ForEachType processor.
At first I started replacing the types at the point-of-execution in the specialized functions. This proved to be the correct approach to solve the problem; however, the cost was too great. Implementing specialized templates for each potential case and performing the work inline started to make the source code in this region a cluttered mess.
I still believed this was the correct approach; I just needed to find a cleaner way to implement the solution. I decided to create a second TypeList that was appropriate for use with the Hg meta-processing, such as the serialization of data. This second list would substitute the most appropriate type within the array and vector to generate the proper logic.
Hg::make_Hg_type_list
I created this reference table to help me understand the collection of relationships that I had to deal with.
Type&lt;T&gt; | Sub-type&lt;S&gt; | Substitution Required
------------|----------------|----------------------
Fundamental | n/a | none
BitList | n/a | none
Nested | n/a | Hg::make_Hg_type_list&lt;T&gt;::type
Array | Fundamental | none
Array | BitList | BitList integral type
Array | Nested | Hg::make_Hg_type_list&lt;S&gt;::type
Array | Array | yes / Sub-type also processed
Array | Vector | vector / Sub-type also processed
Vector | Fundamental | none
Vector | BitList | BitList integral type
Vector | Nested | Hg::make_Hg_type_list&lt;S&gt;::type
Vector | Array | yes / Sub-type also processed
Vector | Vector | Warning: Potential risk of disrupting the space-time continuum
I will provide a brief overview and a few samples for how I implemented this substitution table. If you want to inspect the full implementation you can find it here on GitHub: Alchemy/Hg/make_Hg_type_list.h[^]
This is the meta-function that processes a TypeList to define a new typedef called type, which is the appropriate substitution from the table listed above.
C++
// **************************************************
// @Note: Type T must be a typelist.
//
template< class T >
struct make_Hg_type_list
{
protected:
  // The next item on the list to process.
  typedef typename
    Hg::front<T>::type                  next_type;

  // The remainder of the unprocessed types.
  typedef typename
    Hg::pop_front<T>::type              list_tail;

public:
  // Make the Hg compatible list.
  typedef typename
    detail::make_Hg_worker< next_type,
                            list_tail,
                            Hg::TypeList<Hg::MT>
                          >::type       type;
};
detail::make_Hg_worker has a base implementation and a terminating specialization. The terminator simply typedefs the last parameterized type; thus the recursion stops. The default template defines a set of types designed to build a new list that mirrors the original, with the exception that some of the types will have adjusted replacements as needed.
Let's break this one into a few chunks. The typedefs defined in this sample will be used to create the recursive expansion, and the final result:
C++
template< typename T,
          typename TailT,
          typename HgT
        >
struct make_Hg_worker
{
  // The result of inspecting a type and
  // potentially adjusting the type.
  typedef typename
    AdjustType<T>::type                 adjusted_type;

  // The next item on the list to process.
  typedef typename
    Hg::front<TailT>::type              next_type;

  // The remainder of the unprocessed types.
  typedef typename
    Hg::pop_front<TailT>::type          list_tail;

  // ...
This is the remainder of the worker template. The first typedef is the recursive call to complete the processing of the other fields. The typedef for Hg::push_front defines the final replacement TypeList.
C++
  // ...

  // A recursively built list of processed
  // types that have been adjusted to be
  // compatible for Hg message types.
  typedef typename
    make_Hg_worker< next_type,
                    list_tail,
                    HgT
                  >::type               hg_list;

  // The list of types converted for Hg.
  typedef typename
    Hg::push_front< hg_list,
                    adjusted_type
                  >::type               type;
};
There are three more templates worth describing; the remainder of the definitions are a set of template specializations used to fulfill the requirements mapped out in the table above. The first template is a simple adapter that I included for clarity. Its contents could have been placed completely inline inside of the make_Hg_worker template. However, I believe it makes the final solution much clearer. At the same time, it moves all type replacement logic to a single point outside of the main worker function.
C++
template< class T >
struct AdjustType
{
  typedef typename
    ReplaceType< T,
                 typename DeduceTypeTrait<T>::type
               >::type                  type;
};
The next sample is one specialization of the ReplaceType template. A specialization of this template exists for each Alchemy-supported data type. This sample demonstrates the vector replacement.
C++
template< class VectorT >
struct ReplaceType< VectorT, vector_trait >
{
  // Extract the value type declaration
  // from inside the vector declaration.
  typedef typename
    VectorT::value_type                 value_type;

  // Deduce the type trait of value_type
  // for declaration of the type sequence.
  typedef typename
    DeduceTypeTrait<value_type>::type   type_trait;

  typedef typename
    DeclareTypeSequence< VectorT,
                         type_trait
                       >::type          type;
};
The final typedef in the previous sample uses the last template meta-function, DeclareTypeSequence, to define the replacement type. DeclareTypeSequence is actually a simple declaration, which redefines an array or vector to use a type that will manage the expanded type, rather than the type specified in the original TypeList. This is where the meta-data becomes useful; it is applied to construct the proper run-time structures for managing the serialized data.
C++
template< typename T,
          typename A,
          template <class, class> class VectorT
        >
struct DeclareTypeSequence< VectorT<T,A>, nested_trait >
{
  // Define a typedef that represents the actual data-type
  // for the type-list T passed into this vector.
  typedef typename
    FieldTypes<T>::value_type           value_type;

  typedef
    std::vector< value_type, A >        type;
};
Hg::make_Hg_type_list is instantiated with the same TypeList that is used to define a new HG_BEGIN_FORMAT definition. The Hg::make_Hg_type_list version of the replaced TypeList is then used in the field_data_t specialization that is created for every Alchemy data-type, which defines the value_type to be used for storing message data.
The substitute TypeList is a solution that evolved after a number of iterative attempts to elegantly solve serialization for the new sequential containers. What I demonstrate in the next section is the current serialization implementation used in Alchemy, the result of that evolution.
pack_message
pack_message is the implementation that is called by the Hg::message object to take the run-time data stored within the object, and pack it into a raw-storage buffer in the format specified by the message's TypeList definition.
With the addition of a variable-sized field, two specializations of this function exist. The compiler tag-dispatches to the correct function based on the message's SizeTrait characteristic.
Fixed-size operation
C++
template< typename MsgT,
          typename BufferT,
          typename SizeTraitT
        >
bool
  pack_message( MsgT&    msg_values,
                BufferT& fixed_buffer)
{
  return detail::pack_fixed_size_message( msg_values,
                                          fixed_buffer,
                                          SizeTraitT());
}
Variable-size operation
C++
template< typename MsgT,
          typename BufferT,
          typename SizeTraitT
        >
size_t pack_message( MsgT&    msg_values,
                     BufferT& buffer,
                     size_t   offset)
{
  return detail::pack_message< MsgT,
                               BufferT
                             >( msg_values,
                                buffer,
                                offset,
                                SizeTraitT());
}
The unpack_message template functions are similarly defined.
detail::pack_message
Both versions of Hg::pack_message call a different version of the internal detail implementation. The difference between these two functions is in how they initialize the message buffer that receives the serialized data. The variable-sized implementation prepares for a process that may require dynamic offsets, based on the placement and size of vectors defined in the middle of the message. Once both versions initialize the PackMessageWorker, the recursive process of serializing the message data begins.
C++
template< typename MsgT,
          typename BufferT
        >
BufferT&
  pack_message( MsgT&    msg_values,
                size_t   size,
                BufferT& buffer,
                const dynamic_size_trait&)
{
  // Memory is only allocated for
  // the variable-sized message.
  buffer.resize(size);

  detail::PackMessageWorker< 0,
                             Hg::length<typename MsgT::format_type>::value,
                             MsgT,
                             BufferT
                           > pack;
  // This additional parameter tracks the dynamic
  // adjustments of variable-sized messages.
  size_t dynamic_offset = 0;
  pack(msg_values, buffer, dynamic_offset);
  return buffer;
}
Both versions call the function operator of the same PackMessageWorker object. The only difference is that the variable-sized implementation provides an extra parameter called dynamic_offset, which propagates the total number of bytes that have accumulated from variable-length fields. This indicates to the detail-level functions to make appropriate adjustments with respect to the sizes pre-calculated at compile-time.
C++
template< size_t   Idx,
          size_t   Count,
          typename MsgT,
          typename BufferT
        >
struct PackMessageWorker
{
  void operator()(MsgT&    message,
                  BufferT& buffer)
  {
    size_t dynamic_offset = 0;
    WriteDatum< Idx, MsgT, BufferT>( message,
                                     buffer,
                                     dynamic_offset);

    PackMessageWorker< Idx+1, Count, MsgT, BufferT> pack;
    pack(message, buffer);
  }

  void operator()(MsgT&    message,
                  BufferT& buffer,
                  size_t&  dynamic_offset)
  { /* ... */ }
};
Where is ForEachType&lt;T&gt;?
The ForEachType construct is not used at this level because of the depth and diversity of the combinations of message structures that are possible. This construct is convenient when the same operation must be performed on all fields, and the meta-function supplied to ForEachType does not contain mutable inputs, such as dynamic_offset.
Therefore, the familiar recursive processing of each indexed field is performed through the end of the TypeList of parameters. The recursive loop first writes the current Datum, then makes the recursive call to the next field. There is a PackDatum and UnpackDatum specialization created for each type.
PackDatum / UnpackDatum
The PackDatum specializations for the array and vector types contain the more complicated implementations. This is because once they begin serializing their data, they must inspect each sub-field and determine if it can be written directly, or must be placed in a new round of pack_message calls. Then, based upon the size_trait of the sub-type of the sequence container, the most efficient implementation is selected.
Here is the PackDatum functor for a fundamental type:
C++
template< size_t   IdxT,
          typename MsgT,
          typename BufferT,
          typename TraitT
        >
struct PackDatum
{
  void operator()(MsgT&    msg,
                  BufferT& buffer,
                  size_t   dynamic_offset)
  {
    // Removed typedefs for brevity
    value_type &value =
      msg.template FieldAt<IdxT>().get();
    size_t offset = Hg::OffsetOf< IdxT,
                                  typename MsgT::format_type
                                >::value
                  + dynamic_offset;
    buffer.set_data(value, offset);
  }
};
PackDatum for Array and Vector
I am not going to go into much deeper detail regarding the array and vector PackDatum implementations, because they could fill an entire blog entry on their own. However, I will elucidate the basic flow for these constructs.
The primary difference in their functors is another layer of indirection, where a function called SerializeArray or SerializeVector is called depending upon the type. The number of bytes written is captured and added to the dynamic_offset count.
The SerializeSequence function invokes a specialized functor based upon the sub-type contained within the sequence container. There is an implementation specialized for each Datum type supported in Alchemy. This is the single point of code in the entire library that seems to be too tightly coupled, as the array and vector require knowledge and processing specific to each type supported by the library.
These implementations choose between item-by-item transfers or bulk transfers based on the type. I have this section of code marked to return to, and I will attempt to further de-couple the type of processing required to properly serialize these types.
For now, I find a little bit of solace knowing that this coupled behavior is found at the end of a processing chain, as opposed to the central processing hub for how messages are structured. Therefore, I feel comfortable letting the library continue to grow for a short while before I decide the best course of corrective action.
If you would like to view the pack / unpack implementations for the array or vector, you can find them here on GitHub:
- pack_array: Alchemy/Hg/detail/pack_array.h[^]
- pack_vector: Alchemy/Hg/detail/pack_vector.h[^]
- unpack_array: Alchemy/Hg/detail/unpack_array.h[^]
- unpack_vector: Alchemy/Hg/detail/unpack_vector.h[^]
Summary
Reworking the serialization system to support a sequence of types has been one of the most challenging tasks that I have faced in Alchemy up to this point. In part, this is because I was forced to rethink conventions that I had already put in place. Another reason is the fundamental change in how buffers are accounted for and managed in order to support messages of variable size.
I am content with its current state, although, as I stated, there are some portions of the sequence serializer code that I want to revisit. I would like to further decouple the solution, as well as work in further optimizations.
Speaking of optimizations, I am excited for the next post that I will be writing about Alchemy: the performance benchmarks. I compare a small set of scenarios defined in Alchemy with a naïve, hand-written serializer that uses 1-byte-aligned packed structures and the byte-order conversion functions that are included with network libraries. As a teaser, my first comparison run was abysmal. I even contemplated discontinuing the project, until I did a little bit of analysis to find the root cause of my bottlenecks.
Needless to say, I would not still be developing Alchemy if I had not been able to overcome the limitations of my first implementation. I will describe what I found, what I changed, and what I learned about improving execution speed.