|« How I Avoid Making Mistakes||Coliru Test »|
This is an entry for the continuing series of blog entries that documents the design and implementation process of a library. This library is called, Network Alchemy[^]. Alchemy performs data serialization and it is written in C++.
I have written about many of the concepts that are required to implement Alchemy. However, up to this point Alchemy has only remained an idea. It's time to use the concepts that I have demonstrated in previous entries and create a prototype of the library. This will allow us to evaluate its value and determine if it has the potential to fulfill its intended goals.
Notice how I referred to the code I am presenting in this entry as a Prototype. That's because I am careful to refer to my "proof of concept " work only as a prototype. In fact, it is common for me to develop an idea this far during the design phase of a project to determine of an idea is feasible. I want to know that a path of development is likely to succeed before I commit to it.
Building a prototype provides a plethora of information that can help complete the design phase as well as succeed in the stages that follow. For example, I can examine how difficult this approach has been, how much time I have spent to reach this point, how well the result matches my needs and expectations, and finally, I can compare this approach to similar problems that I have solved previously. Sometimes it is not even possible to understand the problem at hand unless you experiment.
I typically develop my prototypes in a test harness. However, I do not write the tests as thoroughly. The test harness acts more like a sandbox to play in, rather than a set of regression test's to protect me in the future. Individual tests can be used to logically separate and document different ideas as you develop them. with a little bit of discipline, this sandbox implementation can be a chronicIe of how you reached this outcome. The best benefit of all is the prototype runs on a test harness, it hardly resembles a complete program that management may perceive as almost complete.
I will expand on this last point in the future post. Remember, this is just a prototype. If you do decide to follow the approach you took with this experimental code, resist the urge to start you actual project from the point of the prototype. Feel free to reference it, but do not continue to build upon it.
How can you arrive at a destination, if you don't know where you're going? You need to have a plan, or at least a set of goals to define what you are working towards.
As a proof of the concept, I want to be able to demonstrate these things:
- define a message format
- populate the data with the same syntax as the public fields in a
- Convert the byte-order for the appropriate fields in the message
- All of these must require minimal effort for the end-user
The Typelist is the foundation, which this solution is built upon. The Typelist is simply a type. Instantiating it as an object at run-time provides no value. There is not any data in the object, nor are there any functions defined within it. The value is in the type information encoded within its definition.
The Typelist alone cannot achieve any of the goals that I laid out above. However, it is a key contributor to solving the first problem, define a message format.
The message interface object is another important component to solve the first goal. The
Message object is a
template class, and the parameterized types are a
Typelist and a byte-order. The
Message is the object that will be instantiated and manage the users data. These two components combined can demonstrate the first goal.
Datum object is a proxy container that will provide the access that we need to interact with the data fields defined in our message. Value constructors, conversion operators, and assignment operators are used to allow the user to interact with the
Datum as if it were the type specified in the
Message. This provides the ability to both get> and set the values of these fields naturally, like a
This takes care of the second goal populate the data with the same syntax as the public fields in a
When I first started this blog, I created a post demonstrating some basic
template programming techniques, and even a little template meta-programming. The entry described the issue of byte-order processing when you want to have portable code. Most platforms that support network programming also have a few functions to convert the byte-order of integer values. However, the functions are designed for types of a specific size. This can lead to maintenance problems if the type of a value is changed, but the conversion function is not.
The solution that I presented will handle all of the details. You simply call the byte-order conversion function for each and every field in your message, and the compiler will select the correct function specialization for each value. Furthermore, the constructs are designed to not perform any operations if the byte-order of your variable is already in the order requested by the caller.
For example, if you call
to_host with a data value that is already in host order, the function will return the same value passed in. Fields that do not require byte-order conversions, such as
char, are given a no-op implementation as well. Your optimizing compiler will simply eliminate these function calls in your release build.
This byte-order conversion code can solve may of the conversion issues elegantly. But it cannot provide enough functionality by itself to demonstrate any of the goals.
I described how to navigate through the different types defined in the
Typelist with meta-functions. These operations are the key to allow use to programmatically process a user-defined message. We have the ability to calculate the required size of the buffer, the offset of each field, the size of each field, and most importantly, the type.
The challenge will be to create a loop that knows how to loop through each of the fields. Because remember, all of the types must be known at compile-time. We cannot rely on runtime decisions to create the correct solution. Moreover, it will be most beneficial if we can keep all of the decisions static, because this will give the optimizer more information to work with, and hopefully the result will be a faster program.
The Typelist operations combined with the byte-order conversion templates should be able to easily accomplish goal 3 Convert the byte-order for the appropriate fields in the message.
The only way to verify the final goal, All of these must require minimal effort for the end-user, is to put these components together and evaluate the experience for ourselves.
Assembling the pieces
Let's start by defining a simple message to work with. I am going to use three parameters of different types and sizes for this demo to keep the examples short, and still provide variety.
We are only defining the types of the fields, and their position in the message at this point, that is why there are no names associated with the fields. Otherwise, with this syntax it is similar to how
structs are defined. So far, so good. Next we need to define the message object and populate it with the data fields.
This will satisfy goal 1 and 2, and so far it wasn't that much work; more than a
struct definition, but it seems we are on a good path. I have learned through experience, and looking through a lot of code, mostly Boost, that it is helpful to provide alias declarations to your parameterized types. Now, we need to consider what is it going to take to use the Typelist operators on this
We have specified a template parameter for the Typelist, but it has not been incorporated into the implementation in any way. My entry about the Message Interface describes many utility functions and derives from a message Typelist class. This is how the connection will be made between the parameters defined by the user, the Typelist, and the main Message object.
In my actual definition of the
Message outer class, I have specified
HostByteOrder as the default type. This is because most users of the class will only be concerned with the
HostByteOrder type. The
NetByteOrder messages should only be encountered right before the message is serialized and sent over the network or saved to a file.
Demonstration of goals 1 and 2
Programmatic byte-order conversion
We have the message structure, data fields, a
Typelist to define the format, and even
Typelist operations and a defined index into the
Typelist. But there is not yet a way to associate a
Datum entry with a type entry in the
Typelist. Since we're prototyping, let's come up with something quick-and-dirty.
Woah, woah, woah... WTF is that? (I also said that when I previewed the code first pasted into the blog). Let's break it down one line at a time. As the comment indicates, this is a template member function of the user defined class, which requires an unsigned number that I have designated as an index (line 6). it returns a reference to a
Datum object that requires an index, and a
Typelist (line 7). The function is called
FieldAt(), (line 8). Finally, this function does not have an implementation.
There is a new template function declaration that does not have an implementation. This is not a big change, however, this path is slowly becoming more complicated. So I am becoming a little weary of this solution. But I will cautiously continue to see what we can learn and accomplish.
FieldAt() function is intended to return a reference to the actual
Datum at the specified index. This will make the connection between the Typelist operations and the actual message fields. We only provide a declaration of the template, because there will be a unique implementation of
FieldAt() defined for each
Datum in the message. If an invalid index is requested, the program will not compile. Here is the definition that is required for each field:
This is not much extra work, but it is definitely more complicated than defining a single field in a
struct. It is also prone to copy-paste errors. We have managed to bridge the gap, and I would at least like to see if the concept works. So what would it take to programmatically process each field of the message? Keep in mind, I want everything to be determined at compile-time as much as possible. Otherwise a simple for-loop would solve the problem. I like the generic algorithms in STL, especially
std::for_each. I think that it would be possible to use the standard version if I had an iterator mechanism to feed to the function. But I don't, yet. So I set out to create a static version of this function because I think it will be the simpler path.
What do I mean by "Static ForEach?" I want to provide a functor to a meta-function that can compile and optimize as much of the processing as possible. The other challenge is that our code would require a lot of dispatch code to handle the variety of types that could potentially be processed if we simply used a callback function or hand-written loop. The reason is because template functions can only be called when every parameterized value is defined. A function like this will not compile:
main.cpp:13:5: error: no matching function for call to 'print_t' print_t< i >(); ^~~~~~~~~~~~ main.cpp:4:6: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'IdxT' void print_t() ^ 1 error generated.
Go ahead. It's ok, give it a whirl.
Now, the code in the block below is what the compiler needs to successfully compile that code. That is why a regular for loop will not solve this problem. If you want to see it work, replace the for-loop in the block above with this code:
What we need to do, is figure out a way to get the compiler to generate the code block above with explicitly specified parameterized values. The equivalent of a loop in meta-programming is recursion and a meta-function must be implemented as a
struct. With this knowledge, we can create a static and generic for_each function to process our Typelists.
We want to make the library calls as simple as possible. Because if our library is too much work to setup and use, it won't be used. Therefore, the first thing to define is a template function that will be called by the user. Except in this case, we are the user, because we will be wrapping this function call in the byte-order conversion logic.
Next let's define the basic structure of our template functor, and work our way inward:
The object above contains a constructor to allow use to store a reference, avoiding unnecessary copies. It also implements the function
operator()(). This operator could take parameters if we desired, but in this case it is not necessary. The parameters would be placed in the second set of parenthesis.
Now we need to define a function that operates like this:
It's actually a pretty simple and elegant solution, at least on paper or pseudo-code. The syntax of templates in C++, and differences in how each compiler handles templates obscures the elegance of this solution quite a bit. Nonetheless, here is the most standards compliant version as compiled with G++.
Admittedly, that is just ugly to look at. That is why I recommend to new developers, learning how to use STL, to not step into the STL containers. It is riddled with syntax like this, except they use a lot of underscores and variable names that only have two letters. The real implementation of this in my source files have many comments to help break up the morass of compound statements and odd-syntax.
How does this work?
I wish I knew, but it compiles so it must be good right?!
I am joking, mostly. The syntax originally started out much simpler. As I added more complex capabilities the syntax became stricter. The syntax alterations that were required by each compiler gradually morphed into the block above. Look at the block above, can you deduce what the code starting at line 9 is doing?
This block decomposes the code, including the odd use of
template that seems out of place:
Why is that odd
template keyword is required to help disambiguate the intention of your call for the parser. Look at the tokens that would be generated without the user of
After grouping a collection of the symbols, the compiler may end up with this:
The compiler might think you are trying to perform a less-than operation between a member function call and the first template parameter. That is the reason. In this case, it may seem obvious that the only conclusion is that we are calling a parameterized function, however, I am sure there are examples that exist that are much more ambiguous. Unfortunately I do not have any of them. Here is a list of the tokens with the template hint for the parser to continue looking for the right angle-bracket to close the template definition:
ftor. template operator()
< IndexT, ... >
In case you are curious, Visual Studio would not accept the code with the
template hint. This is one of the only places in Alchemy that requires a compiler
#ifdef to solve the problem. I'm very proud of that.
One last thing to explain with the static functor, the terminating case. Here is the code I am referring to:
First, the required range to be called when instantiating this static for_each loop, is the first-index through the actual last-index. Not one passed the last-index as with STL algorithms. Therefore, while the processed index is not the last index, the function will continue to recurse. The last time the process function will be called is when the index is incremented and becomes equal to the last index. The function will be called, the last index functor will be called, then the if-statement is evaluated.
Since the current index is the same as the last index, no more recursion occurs and the chain ends. However, if the value_if statement were not in place for the call to the last index, a call would have been declared to process one passed the last index, and this would be invalid. Even though there is no logical chance that function will be called, the compiler will still attempt to generate that instantiation. The value_if test prevents this illegal declaration from occurring.
Return to Byte-Order Conversion
I wasn't thinking that I would have to explain the static for-each loop to complete this entry, but it's over now. We need to things to complete the byte-order conversion functionality, a functor to pass to the for-each processor, and a user-accessible function. Here is the implementation of the functor that calls the previous byte-order swap implementation:
This is the function object processing function. This function only processes one item then it exits.
This is the meta-function that finally invokes the byte-order conversion code. It is invoked on (line 10) above.
Let's give the user a simple function to invoke byte-order conversion on an Alchemy message. This first block is a generic function written to reduce redundant code.
Here are the actual functions intended for the user.
Demonstration of goal 3
When I reached this point, my enthusiasm was restored. This is exactly what I was aiming for when I set out to create Alchemy. Once the user has defined a message, they are natural and easy to work with. All of that hidden work for programmatic byte-order conversion is automatically handled for the user now. Again, it all depends on the user successfully defining a message, which has turned out to me more complicated than I wanted. So let's turn to another technique that I am fond of, at least for complicated definitions.
At the bottom of my preprocessor post, I mention one of my favorite techniques that are used quite extensively in ATL and WTL to create table definitions. That is almost exactly what we need. Let's see what it would take to simplify the work required for a user to define an Alchemy message.
As a temporary usage convention through development, whatever name is used for the
Typelist definition, the text 'Msg' will be appended to it.
Demonstration of Goal 4
That was straight-forward. Here is a sample message definition with this set of MACROs.
I think that qualifies as simple to use. I think adding the actual index is annoying, and it should be possible to remove it, and it is possible. Because I have done it. However, I will save that for a future post, because this one has already passed 4000 words.
We have the type information from the
Typelist; why do we have to specify the type in the
HG_DATUM entry? It would be possible to remove them, however, I think they act as a nice reference to have right next to the name of the field. It would also be possible to keep the user 'honest' and double-check that the type specified in the MACRO matches the type they define in the
Typelist. I am not going to worry about that for now, because I think it is possible to actually define the
Typelist inside of the message definition above. There will definitely be an entry posted when I do that.
Now we have seen that it is possible to define a message, and interact with it as simply as if it were a
struct. It is always nice to reach a point like this and see a basic working proof-of-concept. There were some surprising challenges to overcome to reach this point. However, in creating these few pieces, my skills with the functional programming paradigm have started to improve my problem-solving skills with the other programming paradigms.
More work needs to be done before this library can be used for anything. In the near future I will describe how I implemented serialization to buffers, and I will add these additional types: Packed-bits, nested-fields, arrays and vectors.
When I started writing about Alchemy, I had intended to publish code with each posting that represented the progress up to that point. However, I have developed the library far beyond this point, and I am factoring in some of lessons that I have learned and used to improve the code. Even with source version control, it is a lot of work to create a downloadable set of source that matches what is demonstrated, compiles, and is generally useful.
So instead, I would like to let you know that Network Alchemy[^] is hosted on GitHub, I encourage you to take a look and send some feedback. At this point I am optimizing components, and fixing a few pain points in its usage. While I program primarily on Windows, another contributor is in the process of integrating auto-tools so that an install and build setup can be created for all of the *nix-based systems.