Embedded Alchemy

Alchemy is a collection of independent library components that specifically relate to efficient low-level constructs used with embedded and network programming.

The latest version of Embedded Alchemy[^] can be found on GitHub.

C++: Template Meta-Programming 2.0

I decided to convert my Alchemy API to Modern C++ in order to truly learn the nuances by applying the acquired knowledge, and I was amazed after I had converted only the first few portions of the TypeList from the C++98 implementation. The code of these meta-programs is elegant and completely readable. This really does feel like a new language, and an entirely new version of meta-programming.

The elegance enabled by the new constructs will allow me to present a complete TypeList implementation for Modern C++ in this entry.

I have corrected errors and uploaded a new implementation file.


Compare and Contrast

My primary goal is to compare the differences between template meta-programming with C++98 and Modern C++ (C++11 and beyond). While the concepts remain similar, the newer language has been expanded in a way that makes this a more natural way to compose functional programming logic with C++.

To be clear, I want to explicitly state that meta-programming has its places, and generally it will not be in core-application logic. Libraries and utilities that will be developed and tested, but are less likely to require maintenance work, are good candidates for these types of solutions. Application programmers can take advantage of these complex machinations, yet the caller may never even realize what magic exists behind the curtain. A great example would be the C++ Standard Library itself.

Frame of reference

Andrei Alexandrescu made the concept of meta-programming accessible to the masses in his book Modern C++ Design, published in 2001. He demonstrates and develops many useful components for meta-programming that he created as part of his library, Loki.

This is the style of meta-programming that I first learned, and the type of structure that Alchemy is built around. I have altered the implementation and interfaces for my meta-constructs to suit my needs, compared to what is presented in the book. The next few sections demonstrate how the same tasks are accomplished in both versions of the language.

Then

The most important of these constructs is the TypeList. The TypeList is a work-horse construct that can be used in a variety of unique ways, yet does not contain any internal data or run-time code. The struct becomes the natural type to act as a functional container, which performs all of its compile-time, or static, operations on types, and stores values in static const values or enums.

To simplify expressions, I made liberal use of typedefs. This helped me avoid the repetition of verbose template expressions and at the same time give a label to the purpose of that template. Sometimes there is no way to simplify expressions other than turning to the pre-processor. I prefer to avoid the pre-processor at all costs in my application logic. However, I have grown accustomed to leaning on the pre-processor to generate code for me for repetitive definitions that appear in a class or at the global scope.

Here is an example of how Alchemy's TypeList is constructed. Internally, a TypeNode provides the declarations for the head and tail types.

C++

// A trait class to assist with tag-dispatching.
struct container_trait{ };
// Represents the end of a type list.
struct empty{};
 
template< typename H,
          typename T
        >
struct TypeNode
{
  typedef H head;
  typedef T tail;
};

Now the definition of the TypeList to show the organization of the structure:

C++

template< class T0, class T1, class T2, class T3,
          class T4, class T5, class T6, class T7,
          class T8, class T9, class T10,class T11
        >
struct TypeList
  : container_trait
{  
  typedef
    TypeNode<T0,  TypeNode<T1,  TypeNode<T2,
    TypeNode<T3,  TypeNode<T4,  TypeNode<T5,  
    TypeNode<T6,  TypeNode<T7,  TypeNode<T8,  
    TypeNode<T9,  TypeNode<T10, TypeNode<T11, empty>
    > > > > > > > > > > >                type;
    // Alchemy continues on to 32
};

Composing a structure should be much simpler than this nested definition. Therefore, I decided to wrap the inner declaration with a simpler outer definition. Unfortunately, there are only a few facilities available to customize template declarations. The best option in my particular situation was template specialization.

I wanted to provide a natural interaction with my TypeList object, and still allow support for a variable number of parameters. Thirty-two was my initial number of parameters that I would support. I can live with writing thirty-two specializations once. However, I had many operations that I would also implement, and each of those would require specialized implementations as well. So I resorted to the preprocessor to generate the code for me.

Here is the definition of the MACRO, and how it was used. It generates the code in the block from above:

C++

#define tmp_ALCHEMY_TYPELIST_DEF(S) \
template<TMP_ARRAY_##S(typename T)> \
struct TypeList<TMP_ARRAY_##S(T)> \
  : container_trait \
{ \
typedef TMP_ARRAY_##S(TypeNode<T), empty TMP_REPEAT_##S(>)  type; \
}
 
// Define specializations of this array from 1 to 31 elements
tmp_ALCHEMY_TYPELIST_DEF(1);
tmp_ALCHEMY_TYPELIST_DEF(2);
tmp_ALCHEMY_TYPELIST_DEF(3);

Yes, I definitely left out many of the gory details for the definitions of the MACROs. But why would you want them? We're moving forward into the future, but you can still access them from Alchemy on GitHub.

The direct usage of the TypeList was then much more accessible to the user. Also, there was no need for them to use any MACROs to define a new TypeList:

C++

typedef TypeList
        <
          int,
          long,
          float,
          char
        > types;

Now

There are two primary additions to Modern C++ that make template programming in general a pleasure to use, and that is before meta-programming itself is even considered:

  1. Variadic templates: Similar to variadic function parameters, this feature allows a variable number of arguments to be used in a template definition.

  2. Template aliases: This allows the using keyword to be used in situations similar to typedef. However, unlike typedef, using can be defined as a template. Therefore, it is compatible with partially-specialized templates.

Here are the definitions that I required when I ported my code to Modern C++ (don't worry, I will explain the syntax afterwards):

C++

// Forward Declarations
struct empty { };
struct container_trait { };
 
template< typename... T >
struct type_node;
 
template< typename... NodesT >
struct type_list;

And an implementation for these types:

C++

// An empty terminating node.
template< >
struct type_node<>
{
  using head = empty;
  using tail = empty;
};
 
// Recursive parameter pack node
template< typename    H,
           typename... T
        >
struct type_node<H, T...>
  : type_node<T...>
{
  using head = H;
  using tail = type_node<T...>;
};
 
template< typename... NodesT >
struct type_list
  : container_trait
{
  using nodes = type_node<NodesT...>;
};

No, really! That's it! We get the exact same usage as the code from above, and I'm not even sure that I need to explain this last chunk of code.

I admit, I had a few false-starts trying to get a grasp on the parameter pack. Not to reach this point; neither of the code samples above is good for anything except defining a list of types. My first challenge appeared when I tried to create a meta-function to give me the type of the parameter at a specific index in the list.

Let me introduce the new constructs, then I will demonstrate some of the elegant solutions that barely scratch the surface of their capabilities.

The parameter pack...

The variable defined within a variadic template is called the parameter pack.

The parameter pack is essentially a list of types delimited by commas. To define one as a parameterized type in a template definition, use the ellipsis between the typename or class declaration and the name that you assign your type. There can be whitespace before and after the ellipsis if you desire...or not. However, the general style that you are likely to see places the ellipsis attached to the type-declaration and a space before the type-name.

C++

// Most common style
template<typename... T> struct common;
 
// These are all legal too.
template<typename ...T>  struct spaces_before;
template<typename ... T> struct spaces_both;
template<typename...T>   struct spaces_none;
 
// Template parameter packs are legal for
// use with template functions as well.
template<typename... T>
void function(T... params);

You may have noticed in my type_list implementation and the declaration of the template function that I placed the ellipsis after the declared name. This is how you invoke the parameter pack in your logic.

Invoke the parameter pack

What does it mean to invoke the parameter pack?

Nothing really. You're setting it where you want to apply it, and the compiler goes to work ripping apart the parameter pack and generating your code. However, the compiler does need a little bit of help. You will need two things if you are generating code from the parameter pack:

  1. Recursive definition: This is a definition that will be implicitly called by the compiler as many times as necessary until it reaches your terminating case. If you refer to the definition of the type_list, you will see that the parameter pack is applied in a context where another type is placed before it, separated with a comma. This essentially peels one or more types away from the parameter pack at a time. In this sense, the template parameter pack is similar to variadic MACRO usage.

  2. Terminating condition: A condition that will handle the case of an empty list, or at least terminate the recursion before the compiler attempts to go beyond the end of the parameter pack. It is not necessary for this to be an entirely different definition.

Size of a parameter pack

A convenient sizeof... operator has been provided to match the syntax of the parameter pack. This version of the operator is not related in any way to the classic sizeof operator.

C++

template< typename... T >
struct length
  : std::integral_constant<std::size_t, sizeof...(T)>
{ };

The parameter pack cannot be a variable

The parameter pack must be decomposed completely at compile-time. It cannot be the sole definition of a typedef or a using alias.

C++

template< typename... T >
struct param_pack
{
  using save_for_later = T...;   // Error: a bare parameter pack cannot be aliased
};

However, that does not mean that we are helpless. There is an idiom that exists with template programming that allows us to extract the type from a template parameter. I am not sure if it has a name.

Let me demonstrate it for you. This is the definition you are most likely to find on the Internet for a TypeList:

C++

template< typename... T > struct typelist { };

The previous code is completely legal because the parameter pack expansion is defined and terminated with this definition. With another template that is given the right specialization, we can extract the parameter pack from the original type definition.

To demonstrate this, let's create a length meta-function that will report the number of elements in the type_list that I defined above. We need to declare a default version of the length meta-function. This function does not necessarily need to be implemented.

C++

// Default declaration
// This does not require an implementation
template< typename... T > struct length;
 
// This specialization allows us to identify and access
// the parameter pack defined within our type_list.
template< typename... T >
struct length <type_list<T...>>
  : std::integral_constant<std::size_t, sizeof...(T)>
{ };

We can use the parameter pack from the type_list because we specialized this template solely for this type. The compiler does a best-fit comparison when attempting to resolve types, and finds this version.
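
For example, a quick compile-time check (my own illustration, not from the Alchemy sources) confirms that the specialization is selected and the count is correct:

C++

static_assert(length<type_list<int, char, float>>::value == 3,
              "length should report three elements");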

Template Aliases

Up until this point, we have had the typedef, which has served us well. However, it does have its shortcomings. I believe the most notable is that partial template specialization is not supported by a typedef. The template alias does provide this support.

C++

// I want to provide a simple type to create
// a map of strings to another type.
template< typename T >
using StringMap = std::map<std::string, T>;
 
// Can now be used as:
StringMap<int>     named_ints;

Here's a more complex example:

C++

// I want to map names to lists of things.
template< typename T >
using NamedListMap = std::map<std::string, std::list<T>>;
 
NamedListMap<unsigned int> lotto_picks_by_state;

Improves readability of templates

There is one other feature of template aliases that I did not fully appreciate until I started to use them. Most code examples do not get complex enough to allow you to fully appreciate the second feature. Let me demonstrate, then I will get into the gory details.

This is an example of an additional declaration that was added to C++14, but is possible in C++11. I am not sure if this technique wasn't discovered until after C++11, or they left it out to keep it from becoming C++12.

C++

// The C++ Standard Library contains useful
// meta-functions to manipulate types.
// This one converts type T into T*
template< class T >
struct add_pointer;
 
// This is used like this:
typedef typename std::add_pointer<void>::type   void_ptr;
// eww! ^^^^^^^^                         ^^^^
 
// Or directly...
typename std::add_pointer<void>::type   p_void = 0;
// I think I just threw up in my mouth...
// No wonder templates have a bad reputation.

These definitions appear in C++14, but you can use this technique in C++11.

C++

// Template alias for std::add_pointer
template< class T >
using add_pointer_t = typename std::add_pointer<T>::type;
 
// New usage:
typedef add_pointer_t<void>   void_ptr;
 
// And directly...
add_pointer_t<void>   p_void = nullptr;

Detailed explanation

typedefs are a common way to reduce clutter in code, primarily with templates, because template type declarations require you to qualify the type with typename when you are using a dependent-type.

What is a dependent-type?

That is a very good question. To help with the explanation, dependent type is a shortened version of the name template parameter dependent type. I'm surprised the C++ community hasn't just adopted TPDT, but I digress. A dependent type is a sub-type declared within a template class or struct.

typename is required when referencing sub-items in a template. I say sub-items because other things can be defined within a struct that are accessed in the same manner as a dependent type, such as a static variable. typename is a clue to the compiler that you want the name to be interpreted as a type.

The capabilities of the template alias allow us to clearly specify beforehand that we mean a type. Therefore both the typename qualifier and sub-type required to access the dependent name are managed by the alias. This greatly simplifies code when there are many template types to deal with. Template meta-programming is a prime example.
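
Here is a small sketch of what the alias manages inside another template, assuming the add_pointer_t alias from the earlier block:

C++

template< typename T >
struct wrapper
{
  // Dependent name: both the typename qualifier
  // and the ::type sub-type access are required.
  using manual = typename std::add_pointer<T>::type;
 
  // The alias manages both on our behalf.
  using simple = add_pointer_t<T>;
};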

One Last Tip

In the fall of 2014, N4115 Parameter Pack Searching[^] was proposed with some additions to the utility library. This would add a common form of the idiom that I described above to gain access to a parameter pack. The name proposed for the type is packer.

I was trying to modify an existing parameter pack, and I just couldn't put the pieces together. That is when I went searching and found N4115, along with N4144 Searching and Manipulation of Parameter Packs[^], by Bill Seymour and Stephan T. Lavavej. This is an amended version of the first document and it adds manipulation utilities. One in particular is add_to.

I already demonstrated the concept of packer; however, in my code I refer to it as param_pack. Here is how add_to is implemented. Multiple specializations are declared to handle the possibility of adding a parameter pack to a parameter pack.

C++

template<class T, class U> struct add_to;
// Add to the front of the list
template<typename T, typename... ArgsU>
struct add_to<T, param_pack<ArgsU...>>
{
  using type = param_pack<T, ArgsU...>;
};
 
// Add to the back of the list
template<typename... ArgsT, typename U>
struct add_to<param_pack<ArgsT...>, U>
{
  using type = param_pack<ArgsT..., U>;
};
 
// Combine two lists
template<class... ArgsT, class... ArgsU>
struct add_to<param_pack<ArgsT...>, param_pack<ArgsU...>>
{
  using type = param_pack<ArgsT..., ArgsU...>;
};
 
// And the template alias
template<typename T, typename U>
using add_to_t = typename add_to<T,U>::type;

You will see a demonstration of the usage in the next section.
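
In the meantime, here is a quick compile-time sanity check of add_to_t. It assumes param_pack is defined like the minimal typelist shown earlier, template< typename... T > struct param_pack { };

C++

#include <type_traits>
 
using list_a = param_pack<int, long>;
 
static_assert(std::is_same<add_to_t<char, list_a>,
                           param_pack<char, int, long>>::value,
              "add to the front");
static_assert(std::is_same<add_to_t<list_a, char>,
                           param_pack<int, long, char>>::value,
              "add to the back");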


A Modern C++ TypeList

I searched the Internet, albeit briefly, and I did not find any Modern C++ implementations of a TypeList that expand beyond this definition:

C++

template< typename... T > struct typelist { };

I found the fundamentals to convert the code that I already have into modern form. I want to convert and get it integrated first. If there are better practices, I can adjust the implementation in a working test harness.

I have already shown the definition of the basic type_list structure that I use as well as a demonstration of the length and param_pack, and the implementation for add_to. In the code below, I have omitted the forward declarations and template aliases that I define in the type list header file.

I am going to blast through the different operations that I have built so I do not take up too much more of your time. If something is not clear, please drop a comment and I can further explain or even add more detail to the description.

I have posted a link to the single header file that contains all of these definitions at the bottom.

make_type_list

I wanted to be able to make a type_list from an existing set of internal type_nodes, and then later, a param_pack.

C++

template< typename... T>
struct make_type_list< type_node<T...>>
  : container_trait
{
   using type  = type_list<T...>;
};
 
template< typename... T>
struct make_type_list< param_pack<T...>>
  : container_trait
{
   using type  = type_list<T...>;
};

type_at

Query the type of element at a specified index in the type_list. This item required a helper template that I called type_of_node.

C++

template< std::size_t    IdxT,
          typename  NodesT
        >
struct type_of_node
{
  using type =
    typename type_of_node<IdxT-1, typename NodesT::tail>::type;
  using tail =
    typename type_of_node<IdxT-1, typename NodesT::tail>::tail;
};
 
// Terminating specialization
template< typename  NodesT >
struct type_of_node<0, NodesT>
{
  using type = typename NodesT::head;
  using tail = typename NodesT::tail;
};

Now for the actual implementation of type_at.

C++

template< std::size_t IdxT,
          typename    T  
        >
struct type_at
{
  using nodes = typename T::nodes;
  using type  = typename type_of_node<IdxT, nodes>::type;
  using rest  =
    typename make_type_list<typename type_of_node<IdxT, nodes>::tail>::type;
};
 
// A terminating case for type_at
template< std::size_t IdxT >
struct type_at<IdxT, empty>
{
  using nodes = empty;
  using type  = empty;
  using rest  = empty;
};

I added the declaration of nodes to simplify the declaration for type. This wasn't strictly necessary. I added rest for convenience in other solutions. rest returns a type_list of the elements remaining after the specified index.

For example, if there were 10 elements in a type list and index 6 was specified, a type list with elements [7,8,9] would be returned.
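
The same result can be verified at compile time; the four-element list below is hypothetical:

C++

#include <type_traits>
 
using list_t = type_list<int, char, float, double>;
 
static_assert(std::is_same<type_at<1, list_t>::type,
                           char>::value,
              "type at index 1");
static_assert(std::is_same<type_at<1, list_t>::rest,
                           type_list<float, double>>::value,
              "elements after index 1");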


Stop me if I go too fast for the rest of these.

front

C++

template< typename T >
struct front
{
  /// Type of the first element in the list.
  using type = type_at_t<0, T>;
  using rest = typename type_at<0, T>::rest;
};

back

C++

template< typename T >
struct back
{
  /// Type of the last element in the list.
  using type = type_at_t<length<T>::value-1, T>;
};

pop_front

C++

template< typename T >
struct pop_front
{
  using type = typename front<T>::rest;
};

push_front

C++

template< typename F, typename L >
struct push_front
{
private:
   using params = typename to_param_pack<typename L::nodes>::type;
   using sum    = typename add_to<F, params>::type;
 
public:
   using type   = typename make_type_list<sum>::type;
};

push_back

C++

template<typename L, typename B>
struct push_back
{
private:
   using params = typename to_param_pack<typename L::nodes>::type;
   using sum    = typename add_to<params, B>::type;
 
public:
   using type   = typename make_type_list<sum>::type;
};
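
Here is a quick sketch of both operations in use; it assumes the to_param_pack helper from the omitted header behaves as its name implies:

C++

#include <type_traits>
 
using base = type_list<int, char>;
 
static_assert(std::is_same<typename push_front<long, base>::type,
                           type_list<long, int, char>>::value,
              "push_front");
static_assert(std::is_same<typename push_back<base, long>::type,
                           type_list<int, char, long>>::value,
              "push_back");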

New functionality updated since the original post

pop_back

C++

template<typename T>
struct pop_back
{
   using type = typename split<length<T>::value - 1, T>::type;
};
 
// Terminating specialization
template< >
struct pop_back<type_list<>>
{
   using type = empty;
};

move_item

Move a specified number of elements from the front of the second list to the end of the first list. This meta-function is used to implement split, which is then used to implement pop_back.

C++

template<std::size_t CountT, typename T, typename R>
struct move_item
{
private:
  using first = typename push_back<T, front_t<R>>::type;
  using last  = typename pop_front<R>::type;
 
public:
   using type = typename move_item<CountT-1, first, last>::type;
   using rest = typename move_item<CountT-1, first, last>::rest;
};
 
// Terminating specialization
template<typename T, typename R>
struct move_item<0, T, R>
{
   using type = T;
   using rest = R;
};

split

Splits the list into two separate lists at the specified pivot index.

C++

template<std::size_t PivotT, typename T>
struct split
{
  static_assert(PivotT <= length<T>::value,
                "The split pivot index is out of range");
 
   using type = typename move_item<PivotT, type_list<>, T>::type;
   using rest = typename move_item<PivotT, type_list<>, T>::rest;
};
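
For example, splitting a hypothetical four-element list at pivot 2 produces these two lists:

C++

using list_t = type_list<int, char, float, double>;
 
using lhs = split<2, list_t>::type;   // type_list<int, char>
using rhs = split<2, list_t>::rest;   // type_list<float, double>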

Summary

Again, I am pleased at how much simpler my code has become with these new additions. It's still C++. It's like C++ with the Hemi. Statements can be expressed more tersely without losing meaning, which actually increases the readability of the code. Repetitive typing and redundant code can also be reduced.

If you frequently program with templates, or are even a big fan of the C++ Standard Library, you owe it to yourself to become familiar with these two features.

As promised, here is a link to the full type list implementation:


C++: auto

The keyword auto has been given a new behavior since the C++11 Standard was ratified. Instantly I could appreciate the value of its new function when I considered things like declaring an iterator for a container. However, I was skeptical of any value that auto could provide for general purpose use.

Now that I have had a chance to study this new keyword, I believe that it is quite a valuable addition to Modern C++. While there are situations that could still cause some grief, they are not difficult to detect, and the solutions to the problem are straight-forward to apply.

Basic Usage

Here are a few examples for reference that will give context to this discussion. auto has been repurposed to allow developers to take advantage of the type information known to the compiler. This provides the potential to make code easier to read and write.

I have added a comment that shows the type that each auto variable will be assigned. This is also the same declaration that would be required if auto were not used.

C++

std::map<unsigned long, std::string>    keys;
auto value = 101;             // int
auto other = 101ul;           // unsigned long
 
// std::map<unsigned long, std::string>::iterator
auto  iter = keys.find(value);
 
// std::string
auto  entry = iter != keys.end()
            ? iter->second
            : "";
 
// std::string::size_type
auto len = entry.size();
if (len > 0)
{
  auto first = entry[0];      // char
  auto data = entry.c_str();  // const char*
  auto first_data = data[0];  // char
}

Initial Doubts

I am not the only programmer to have doubts about allowing the system to automatically choose the types of my variables. There are a number of questions asked around the web like this one from StackExchange and other sites.

There were two fundamental sources of my doubt.

Flashbacks of Visual Basic

I spent a brief moment of my career developing with Visual Basic. (Hey! I only did it to pay for tuition while I was in college. sigh).

To declare a new variable you would use the keyword Dim. If you hadn't defined a variable explicitly with Dim, then the compiler would give you an error. Variables that were not given an explicit type in Visual Basic used a type called Variant.

If you ever have programmed with Microsoft's COM, then you are aware that a Variant could be coerced into different types at run-time. Essentially, this allowed a strongly-typed static language to bend the rules when using the Variant.

It did not take me long to realize that auto in Modern C++ is nothing like a Variant.

The chaos of using unknown types

This is a misconception. The type is well known by the compiler after it verifies the correctness of your code and before it generates any of the final object code. The declarations and statements have well-defined rules to determine these types. This is especially true for C++; any version of C++.

There are subtle coercion rules that have always existed to allow C code to be more compatible with the stronger type-checking that is present in C++. These rules mostly deal with the width of an integer or converting an object from one type to a compatible type without cluttering the syntax.

If you still have doubts, inspect the type assigned to your variable within your IDE or debugger. There is a definitive type that has been deduced and assigned for every auto variable. This type is fixed and will not change.

Automatic type deduction

Are you familiar with templates and how the types are deduced for parametric variables? If so, then you know almost all there is to know about how the type is determined for auto variables. If you aren't familiar with all of the rules, it's OK, because the compiler is.

The rules are the same between the two forms of variable declaration with one exception, which is when a curly-braced initializer list is considered in the type deduction. In this case, the auto declared variable assumes the type to be an std::initializer_list that was also added to Modern C++. A template-deduced type in this situation requires you to explicitly specify the std::initializer_list type. The type T contained within the std::initializer_list<T> will then be deduced by the template.
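
Here is a small sketch of that one exception; the function names are hypothetical:

C++

#include <initializer_list>
 
auto values = {1, 2, 3};     // OK: deduced as std::initializer_list<int>
 
template< typename T >
void by_template(T param);
// by_template({1, 2, 3});   // Error: a braced list is not deduced for T
 
template< typename T >
void by_list(std::initializer_list<T> param);
// by_list({1, 2, 3});       // OK: T is deduced as int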

I also want to remind you about the existence of Type Decay[^]; it seems probable to me that you will encounter this at some point when using auto. This coercion rule applies to both template and auto type deduction.
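
Arrays are one of the most common places to observe the decay; this small example is illustrative rather than exhaustive:

C++

const char text[] = "decay";
 
auto  a = text;   // const char*: the array decays to a pointer
auto& r = text;   // const char (&)[6]: a reference prevents the decay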

C++ 14

C++14 has added the possibility for auto to be used as the return type for a function.

C++

// Let's wrap the map lookup code
// from above into a convenient function.
// The return type will be a std::string.
auto createName(unsigned index)
{
  auto  iter = keys.find(index);
 
  return iter != keys.end()
         ? iter->second
         : "";
}

It is important to note that every return point from the function must result in the same deduced return type; conversions may be applied within each return expression to arrive at that type. This is more evidence that demonstrates there is nothing magical about auto. The final code that it generates follows the same rules as if you had explicitly defined the values yourself.
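
For instance, this sketch will not compile, because the two return statements deduce different types:

C++

auto pick(bool flag)
{
  if (flag)
    return 1;    // deduced as int
 
  return 2L;     // Error: 'long' conflicts with the deduced 'int'
}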

There is a catch

The auto type must use the rules for template type-deduction. This means that it is not possible to use auto when you would like to return a std::initializer_list.

C++

// auto return types use the same
// type deduction rules as templates.
auto Fibonacci_5()
{
  // This type of use is not legal.
  return {1,1,2,3,5};
}

This rule also applies to the type deduction for the evaluation of lambda expressions.

C++

// This version cannot be deduced by the
// compiler and results in an error.
std::vector<int> numbers;
auto initSequence =
  [&numbers](const auto& sequence)
  {
    numbers = sequence;
  };
 
initSequence({1,1,2,3,5});

To make this expression valid, the std::initializer_list needs to be specified explicitly in the parameter's declaration; for a lambda, its element type must be spelled out as well.

C++

// The initializer_list is explicitly specified
// to make this lambda expression legal.
std::vector<int> numbers;
auto initSequence =
  [&numbers](const std::initializer_list<int>& sequence)
  {
    numbers = sequence;
  };
 
initSequence({1,1,2,3,5});

Advantages

The advantages of auto are immediately apparent when you consider the amount of clutter that is removed when you use iterators or a moderately complicated template statement.

C++

// Something like this:
std::shared_ptr<std::vector<std::pair<T, U>>> sp_values
  = std::make_shared<std::vector<std::pair<T, U>>>();
  ...
typename std::vector<std::pair<T, U>>::iterator iter
  = std::find(sp_values->begin(), sp_values->end(), 100);
 
// becomes:
auto sp_values = std::make_shared<std::vector<std::pair<T, U>>>();
auto iter      = std::find(sp_values->begin(), sp_values->end(), 100);

However, there are other advantages that are not immediately obvious.

Less typing (explicitly and with your keyboard)

Yeah, this is the low-hanging fruit. I should mention, there is one exception, and that is if the type you need is an int. I also suppose there exist sadistic programmers who typedef or alias one- and two-character type names; so that would be another exception.

Along the same theme of less typing is refactoring: changing data-field types or function signatures. If you use auto effectively, you may only need to update the declaration in the header file and the function's definition to complete your refactor.

auto defined variables will be initialized

When a variable is defined with auto, it must be assigned an initial value. Otherwise there would be no way to determine its type:

C++

auto count;      // This is illegal, must be initialized
 
auto value = 10; // OK, assigned type int

One thing that I try to teach developers that I mentor, is to not define the variable until you need it. Many developers still seem to forward declare all of their variables at the top of a function. That isn't necessary. Furthermore, objects with constructors may not be cheap, and if you drop out of the function before you use the constructed object, well that's just wasteful.

I think declaring your variables at the point they become necessary using auto will help advance this style.

Portability

This next set of declarations is a common occurrence, where the programmer explicitly selected the type for the variable, and they almost got it right, or maybe not at all. Selecting the incorrect type may be the difference between correct behavior and a security vulnerability. Another consequence that is less severe, depending on who you ask, would be truncation, or data loss.

C++

// No, std::size_t
int len = ::strlen(buffer);
 
// Maybe, if you're on a 32-bit system.
// Narrowing truncation is possible on a 64-bit system.
// std::string::size_type, which is generally defined as
// std::size_t
std::string name = "Code of the Damned";
unsigned long size = name.size();
 
// No not even close, are you even trying?
// Of course I'll still accept it...
double how_big = ::strlen(buffer);

It seems that I am always correcting warnings such as this: warning: '<' : signed/unsigned mismatch. While learning secure coding practices I adopted the habit of using unsigned types for values that will be used as an index to reference memory. In fact, std::size_t has become my de facto unsigned type. It is guaranteed to be large enough to reference the largest address on the system.

As it turns out, it's not always possible, or even correct for an unsigned type to be used in comparison operations. Then I am left with a cast of some sort. auto solves this problem by selecting a compatible type with your initialization value. The most frequent place this seems to occur is within for loops:

C++

// Yes, 'i' is signed and size() returns an unsigned value
for (int i = 0; i < name.size(); ++i)
{ ... }
 
// auto doesn't fix things alone.
// However, adding a type-specifier to 0 helps.
for (auto i = 0ul; i < name.size(); ++i)
{ ... }
 
// This is Modern C++.
// Use a range-based for loop and be done with it.
for (auto i : name)
{ ... }

Efficiency

That is correct. Consider all of the coercion and transformation techniques that may be used by the compiler to allow your program to compile successfully and run efficiently. Techniques such as move-semantics, const-correctness, temporary copies, the list goes on.

I have already argued that we don't always succeed in selecting the correct type just to hold the size of a container. Selecting the best type for efficiency is an even more difficult decision. The most probable place an incorrectly selected type could affect performance is within a loop. However, another time to consider replacing your type declaration with auto is when you are running a profiler trying to optimize your code, and you identify a hot-spot that seems to be making unnecessary copies. Using an auto declaration may give the compiler the freedom it needs to select a more efficient type and eliminate those copies.
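
Here is one hypothetical illustration of the copies a profiler might reveal; the map's value_type is the detail that is easy to get wrong by hand:

C++

#include <map>
#include <string>
 
std::map<std::string, int> counts;
 
void tally()
{
  // Subtle: the value_type is std::pair<const std::string, int>.
  // This declaration converts, and therefore copies, every element.
  for (const std::pair<std::string, int>& entry : counts)
  { /* ... */ }
 
  // auto deduces the exact value_type; no temporary copies are made.
  for (const auto& entry : counts)
  { /* ... */ }
}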


Be aware of proxy classes

Yes, there is another potential gotcha that is important to know exists. Proxy classes are useful in a variety of ways. The list below is not exhaustive, and it is likely that a proxy class exists for more than one reason found on the list:

  • Transparently provide remote access
  • Delay expensive calculations until required
  • Encapsulate logic for efficiency
  • Protect direct access to encapsulated data
  • Formulate expression templates

Libraries commonly use proxy classes to simplify the syntax required to interface with the library objects. Expression templates are very common with math-based libraries to retain the natural mathematical syntax that is possible in C++. A Matrix class for example, or even a multi-dimensional array.

Example: Alchemy Proxies

I use proxy[^] classes in Alchemy to make the data fields of a message appear to the caller to be a fundamental type that is part of a struct. Each field is contained within its own proxy object, which knows how to efficiently manage the different operations supported in Alchemy.

Each proxy class contains a type-conversion function to extract the actual value from the proxy object. Here is an example definition of a 3-dimensional point.

C++

// Define an Alchemy message structure
// to represent a 3-d point.
ALCHEMY_STRUCT(pt3d_t,
  ALCHEMY_DATUM (double, X),
  ALCHEMY_DATUM (double, Y),
  ALCHEMY_DATUM (double, Z)
);
 
// Create an initialized instance of the message with:
// X=1.0, Y=2.0, Z=3.0
Hg::basic_msg<pt3d_t>  pt_msg = {1.0, 2.0, 3.0};

Here is a comparison of the results between an explicit type declaration and an auto type declaration.

C++

// x_val is assigned 1.0
double x_val = pt_msg.X;
 
// ax_val is given the type:
// Hg::detail::DataProxy
//   < Hg::fundamental_trait, 0, Hg::TypeList<double, double, double>>
auto ax_val = pt_msg.X;

Clearly, ax_val is not assigned the type double, as would be desired in most circumstances. That is because the compiler deduces the actual type of the expression, the proxy, for the auto declared variable; for the explicitly declared double x_val, a conversion is applied instead.

If deduction were the end of the process, the explicit declaration would result in a compiler error. However, the proxy class implements a type-conversion function, operator value_type() const, where value_type is defined as typedef double value_type;. Now the compiler has the leeway to coerce the proxy object into the declared type, double, and the statement becomes valid.

Proxy classes that have conversion operators like this are also referred to as invisible proxies.
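
Here is a minimal sketch of an invisible proxy; this is not Alchemy's actual implementation, only the shape of the technique:

C++

class double_proxy
{
public:
  typedef double value_type;
 
  // The "invisible" conversion operator.
  operator value_type() const   { return value_; }
 
private:
  value_type value_ = 1.0;
};
 
double_proxy field;
 
double x  = field;   // conversion invoked: x == 1.0
auto   ax = field;   // ax is a double_proxy, not a double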

What is the solution?

First, recognize that auto is doing exactly what it is supposed to do in this case. Then we can admit that it does not actually arrive at the desired type, a double.

Item 6 from Scott Meyers' Effective Modern C++ offers a solution that he calls the explicitly typed initializer idiom. The solution is to explicitly specify the desired type with a static_cast.

C++

// ax_val is now a double and will be assigned the value 1.0
auto ax_val = static_cast<double>(pt_msg.X);

This scenario is most likely to occur when the library you are using allows for a terse and expressive syntax. When you suspect auto is not assigning the type that you expected, take a look at the definition of the class that owns the expression. If the return type is a proxy object you can either explicitly cast to the desired type, or revert to the classic form of type declaration.


Summary

auto has been re-purposed in Modern C++. A cursory glance at the change reveals some benefits. However, the benefits run much deeper than the novelty applications that you can imagine. Adopting a consistent use of auto to declare your variable types will most likely improve your code's correctness, portability, efficiency, and most of all readability. If you have previously considered auto, and decided that it wasn't for you, take a moment to reconsider. auto has great potential.

Make Your Comments Matter

There are many different philosophies with regards to how source code should be commented. The gamut of these philosophies ranges from "Every single statement must have a comment." to "Comments are useless; avoid them at all costs!" I am not even going to attempt to explain the disparity of this range. However, I will attempt to address the reasons that are often cited for these extreme approaches to commenting code, and suggest potential solutions.

To make a decision

I think that it is important to remind everyone that judicious decision making is required to write high-quality software. These are the types of decisions that cannot be easily made by following a flow-chart or a rote memorized rule. If it were that simple to make a decision, programmers would not be in such great demand.

As skilled professionals we often try to distill the rules to simpler pieces of advice to help less-experienced colleagues become better, faster. Unfortunately, information is often lost when we attempt to simplify our experiences into a neat set of rules.

Rules cannot always be reduced to a simple choice to help make the final decision.

Context

A general purpose rule tends to lack context. Context becomes very important once you move beyond the most trivial decisions. Events, statements, and ideas cannot always be fully understood without the settings or circumstances involved.

An action such as self-defense cannot be judged without understanding the circumstances and settings involved, the context. Literal jokes are funny because a meaning is interpreted out of context. Context allows us to interpret meanings in forms that differ from their strict literal meaning.

Context adds clarity. Laws attempt to create rules for society and omit the context as much as possible to make the law more generally applicable. A law that is written with more specific terms is limited to fewer situations where it can be applied. This leaves loop-holes for criminals to take advantage of, and when interpreted most literally, innocent people may be punished.

Context cannot always provide the answer.

Everybody knows that!

Skills that once seemed impossible to learn often become natural once we master them. As a beginner, the slightest distraction could interrupt our train of thought. Added confusion tends to make decisions that are not familiar more difficult, which could lead to failure.

Skilled masters may not even consciously realize all of the decisions they are making as they perform their skill. Most average adult humans have mastered the skill of walking. Walking and talking, no problem (viral internet videos demonstrate that walking and texting is entirely different). Distracting a toddler learning to walk, or a child learning to ride a bike without training wheels often leads to a crash or fall.

My point is, as we become more skilled we can forget the things that we actually know and naturally do without consciously thinking about it. We may have spent thousands of hours learning and understanding something. Often we walk away with the well earned conclusion, and possibly a few memorable mistakes we made along the way. For most people, it is difficult to remember or imagine not knowing their mastered skill.

This leads to gaps in explanations, or assumptions about when a rule should be applied. Because, "that's just common knowledge" to the skilled master.

Experts are not always the best people to create guidelines for making decisions. This is especially true when the guidelines are aimed at a group with a wide variation of skill level.

Where's the value?

This is a question that I learned to ask while I was practicing TDD and learning how to unit-test. I soon learned that I could ask this question in many different contexts. Asking and answering this question can help simplify the decision-making process.

For example, when considering a particular approach to solving a code problem, I try to compare the value of the solution to the amount of risk the solution may add. If the amount of risk out-weighs any perceived value, the decision becomes simple; find another way.

Answering this question will not make every decision simpler. However, it often can be used to reduce the set of choices that are available for you to select from.

This question alone has changed how I approach commenting source code. If a modification or addition of code does not add value, why does the code need to be modified? Keep in mind that deleting code can also add value to your program; in the form of improved readability and less complexity.

Criticisms of Code Commenting

Just about all of the opinions that I have encountered for code comments are valid, for certain contexts. There is no opinion or rule that I have heard which I believe is absolutely true for all scenarios. Let's take some time to consider these opinions.

Every line/statement must be commented

In all fairness, the commenting rules for this guideline originate from the aeronautics industry. I will not argue with a policy for code that is used in critical applications where major bodily harm or death could occur. I should also add organizations like NASA, where hundreds of millions of dollars are spent and you then have to wait nine years for your spacecraft to reach its destination. You want to be sure that it will work properly when it gets there.

Remember though, extra comments are not necessarily a good thing. When they don't add value, sometimes they add clutter, other times they use up space, and in the worst case they are misleading.

Comments do not get maintained along with the code

I have seen this many times. To make matters worse, it is not always easy to tell which item is incorrect, the comments or the code. For comments to provide value, they must be accurate. In a situation where the comments are completely inaccurate, there may be more value in simply removing the comments rather than trying to update them.

If this argument is considered when paired with the previous guideline, then there is definitely a strong case to avoid comments altogether. Prohibiting the use of a language feature also prevents the possibility of any benefit that feature could have provided. It does, however, require that your organization allows and expects your engineers to think.

The code should be clear and self-documenting

I agree with this statement.

However, descriptive names still have a limit to the amount of information that they can convey. At some point, long names become obnoxious. If not for the inconvenience of the extra typing, then for the clutter that is added, reducing the readability of the code.

This type of policy does not translate well to all programming paradigms. Structured and imperative languages will have the most success with this approach. Each step is part of a well-defined sequence. Furthermore, while the code sequence may be readable, the purpose for the entire sequence of commands may not be immediately clear. I tend to encounter this most frequently in monolithic functions.

Yes, these are comments... useless comments

Pertinent examples can illustrate this point much better than I could ever describe with words.

Cut-and-Paste to comment

C++

if (SUCCESS != error)
{
  ...
} // if (SUCCESS != error)

The context of the curly brace is already a good indicator that something is ending here. Many developer IDEs can help with the rest. For example, Ctrl + ] will jump between matching braces and parentheses.

There is one exception where I actually do like ending a closing curly brace with a comment. That is at the end of a namespace. I find this is especially useful with deeply nested namespaces. This type of comment is valuable to me because I use it as a visual cue to help navigate namespace-scope.

Captain Obvious

Sky-diving without a parachute is a once in a lifetime experience!

C++

// Increment the index.
index++;

What is the purpose of this code... ?

My first instinct is to look at the header file.

C++

/// @SteamPunk.h
/// This is the declaration of the SteamPunk class.

I am an optimistic-realist, so I will try looking at the class declaration.

C++

/// The SteamPunk class specializes the BasicMachine class
/// for the steam-punk functionality.
class SteamPunk
  : public BasicMachine
{  
  ...
};

Fine. I will figure out what this class does on my own. "What happens when I set a mode and what modes can I set?"

C++

/// Sets the mode.
/// @param[in] mode   - The value of the mode to set
void SetMode(int mode);

None of the comments from this section gave me any useful information. All that they did was duplicate explanations that were clear from the self-documented code. This is a situation where the comments were added solely to satisfy a code-quality checkbox.

We have always done it this way

Source control has been around for decades. However, before it was truly widespread there was a common practice of adding your edit history as a comment to the top of the source file. The way I see it, this is just more clutter. You can glean all of the history information from your version control tools and actually see the changes that were made by viewing a diff-comparison between the versions.

C++

/// 1/15/2002 - ABC: Initial implementation
/// 3/28/2004 - ZED: Added that crazy feature
/// 4/02/2004 - TEL: Fixed ZED's crazy feature
/// 5/14/2007 - PJT: Upgraded the logging system
/// 2/28/2010 - ABC: Fixed that memory leak
/// 8/02/2011 - LOG: Changed to another logging system

Increase the value of your code with useful comments

Again! Remember to ask what value you are providing with each modification. If the comment you are about to write can easily be gleaned from reading the next few lines, your comment becomes less valuable.

Comments should be used to annotate and document your source code in ways that are not possible with code alone.

Document purpose

If you are familiar with the domain for which your program is being written, the purpose of an object may be obvious to you. However, new programmers working with the same code may not possess the correct vocabulary to work effectively with the source code. Moreover, this could slow down or even prevent other developers from being capable of working within your code.

Document usage

Self-documenting code does not always provide enough information for someone to appropriately use your class or function. The meaning of a function name could be misinterpreted, or the purpose of each input variable may be ambiguous.

There are a number of automated tools that extract specially formatted comments to generate external documentation that you can use for reference independently of your code. Some of these tools are capable of providing so much more than documentation. For instance, Doxygen can generate relationship diagrams for your class definitions. You can also incorporate Mark-down definitions directly in your code to generate up-to-date design documentation.

Clarify intentions

Sometimes the intended purpose for a sequence of logic is just not obvious (probably more than sometimes). When it is not obvious, summarize your intentions. If you make a mistake in your code, clearly stated intentions may make it easier to identify and test the corner case that you missed.

Document rationale

We build poor quality code one statement at a time. Later we may try to repent for our sins and clean up the code. However, sometimes the reason code is structured the way it is will not be clearly apparent.

Perhaps atomic operations could replace more expensive synchronization objects like a mutex, if the statements were re-ordered. Since the order of the statements is important, documenting this code structure would be valuable for any future developer that works with this code.
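
For instance, here is a hypothetical one-way handoff where the statement order is everything; the comment carries the rationale that the code alone cannot:

C++

#include <atomic>
 
std::atomic<bool> ready{false};
int payload = 0;
 
void producer()
{
  // Rationale: 'payload' must be written before 'ready' is set.
  // The release store pairs with the acquire load in the consumer,
  // which lets us publish the value without a mutex.
  payload = 42;
  ready.store(true, std::memory_order_release);
}
 
void consumer()
{
  while (!ready.load(std::memory_order_acquire))
    ;  // spin until the payload is published
 
  // 'payload' is now guaranteed to be visible here.
}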

Sometimes similar statements placed in proximity to each other improve the readability of the code. The prettiest code in the world becomes a useless program if it does not function properly.

Summarize blocks of behavior

I believe one of the most valuable places to comment code is at the beginning of a statement block. Even if the block contains 10 lines of the simple statements that can be easily followed, summarize what the 10 lines accomplish.

If the comments can be trusted, this allows the reader to combine that collection of logic into one abstract notion while reading the rest of the code.

What does the snippet of code in the example below do?

C++

// Creates a linear gradient at a specified angle
bool AngularGradient(...)
{
...
  if (0 == (offset % 2))
  {
    std::swap(cC[0], cC[1]);
    std::swap(alpha[0], alpha[1]);
  }
 
  offset = std::abs(offset - 4) % 4;
  COLORREF clr[4] = { c1, cC[0], c2, cC[1]};
  std::rotate(clr, clr + offset, clr + 4);
 
  USHORT   alphaV[4] = { alpha1, alpha[0], alpha2, alpha[1]};
  std::rotate(alphaV, alphaV + offset, alphaV + 4);
...
}

Here is the comment that I have placed above this section of code. It summarizes an entire block of logic as well as documents my rationale and intent of the behavior for the code:

C++

bool AngularGradient(...)
{
...
  // As the angle's start point changes quadrants, the colors will need to
  // rotate around the vertices to match the starting position.
  if (0 == (offset % 2))
  {
    std::swap(cC[0], cC[1]);
    std::swap(alpha[0], alpha[1]);
  }
  // Remainder of snippet excluded
  ...
}

The code snippet above is from one of my favorite articles, discussing the AlphaBlend and GradientFill functions from the Win32 API. I posted it at CodeProject.com a few years ago. Here is a link if you are interested in reading more from that article: Win32 Image Composition[^]

Summary

Programming source code is an attempt to express an abstract thought into a concrete idea. The intended audience for this expressed idea is usually a compiler or interpreter. This can lead to some very cryptic commands, because all that matters in this context is following discrete rules of the programming language. A secondary audience are the humans that will continue to develop and maintain the program.

Code comments are a way for us to express intent, rationale for a decision, instructions of usage, or just a simple reminder. Comments can provide value. However, value is not usually achieved by blindly following a set of rules in a coding guideline. Carefully thought-through and conscious decision-making is generally required, as is the case for the source code itself.

Preparing to Know Modern C++

"To know and not do, is to not yet know"

This Zen mantra has been the signature that I have placed at the end of every entry since I started this blog. This mantra is the impetus of this entry, my decision to know how to use Modern C++.

For the past 12 years I have worked at companies where backwards compatibility and portable software have been crucial for success. This often precludes newer technologies from being adopted for the sake of compatibility. This is unfortunate when considering the recently adopted C++ standards, C++11 and C++14, which have become collectively known as Modern C++. Portability is one of the most important qualities that I value in software; that is why my personal projects, such as Alchemy, have not yet incorporated any of the features available in Modern C++.

I believe that this is a mistake, and that I must evolve just as C++ has begun to evolve. This entry is a collection of my thoughts on the features, techniques and challenges that I believe I will need to learn to become proficient with Modern C++. As much as I can recognize them at this point, I also catalogue some of the rules and idioms from the classic approach that I will need to unlearn to fully take advantage of this new language.

Preparing for the journey

There will definitely be some challenges in this transition. First, some of the topics, like Universal References and Reference Collapsing, seem convoluted on the surface. Then, just as I think I am beginning to understand, I don't think about them for a day or so, and I have lost what I learned. That is why it is important that I apply these techniques while I become familiar with them. Otherwise there is no reasonable hope that I will ever master them, or even use them correctly.

It is important that I have a project on which I can practice these new techniques. I plan on wrapping up some final issues in Alchemy using C++ with TR1 (Classic C++) and then creating a tag. From that point on, Alchemy will be the instrument that I use to learn and master the techniques in Modern C++. I have studied a few resources with regards to Modern C++, such as Scott Meyers' Effective Modern C++ as well as Stroustrup's The C++ Programming Language (4th edition), which covers C++11.

The second challenge is rooted in the fact that my current success depends on my ability to remain sharp with the use of Classic C++. This does not seem like a difficult task on the surface. For instance, simply remember that range-based for loops, auto and decltype are not available in Classic C++. This is an example of when we tend to over-simplify and abstract away the details of the more subtle differences that exist. A glaring example is the difference in the preferred use of references between the two languages in order to maximize your chances of success with Move Semantics and Perfect Forwarding.

New language, new mindset

Thoughts on C++11[^]

Surprisingly, C++11 feels like a new language: The pieces just fit together better than they used to and I find a higher-level style of programming more natural than before and as efficient as ever...

...My ideal is to use programming language facilities to help programmers think differently about system design and implementation. I think C++11 can do that...
    --Bjarne Stroustrup

I have been tentatively studying the additions to Modern C++ since 2011. I tend to agree more and more with Stroustrup's comments on C++11 with each level that I delve deeper into the additional features added to the language. Programming, in general, can be viewed as a very simple task and skill. At its base, you are only commanding a machine to perform specific tasks one step at a time to achieve a goal.

Nuances

That notion is where we all make the mistake from time to time, and forget about the underlying nuances that differentiate how each language is used most effectively. Also the nuances that appear based upon the programming domain that you are operating within; application development differs from database development, which differs from low-level algorithm development, which in turn differs from system-level and device driver development. Then there is the world of web-development, automated scripting and the forms of programming that many may not consider programming, such as developing spreadsheets in Excel.

My skills as a software developer have become so natural that I can effectively pick up the syntax for most other programming languages within a few days. Ask me to make some adjustments to a Java application, no problem; develop this test utility in C#, sure thing; augment this Python program, shell script, or makefile system, I'm on it. The thing that differentiates me from a well-versed Java developer is that I do not yet understand the nuances of Java well enough to effortlessly create elegant solutions with the language, unlike the seasoned Java developer. It is foolish to think otherwise.

Previous Experience

From time to time I think back to an automated GUI test tool that I developed in C# for the GUI team at work. I used my knowledge of Windows, and hooked into its User Interface Automation library to create flexible UI tests from generated code, in a way similar to the UI test features that first appeared in Visual Studio 2010 Ultimate. My solution was robust, and it followed roughly the same principles that I use for object-oriented development with C++, yet my implementation felt clumsy. It was not elegant. Much of the code was duplicated, or required adjustments in multiple locations with each additional feature type that I added.

I was not accustomed to the nuances of C#. I know best how to develop in C++. My solutions started with the structure of a C++ program, and I adapted the locations where there was no C++ equivalent. If I stumbled upon a practice in C# that was different from C++, I adapted those practices as well. However, when I finally handed the tool off to an experienced C# developer who would continue to maintain it, I felt the need to apologize for my inelegant solutions due to my lack of experience with C#.

I believed there was a better solution than what I had been able to create in C#, because I knew how to steer clear of the troubles that I was running into, had I been able to develop that tool in C++. I knew how to avoid the code duplication, and the multiple locations that must be modified for each new type that is added.

Learning the features and syntax available in a new language is simple. Learning how to use them appropriately is the challenge.

Learning a "different" way with familiar tools

Learning a new way to do things is not always better; sometimes it is just different. In some cases, the better method really depends on the context in which it is used. There are many features in Modern C++ that appeal to me. I have wanted to start practicing with them for a very long time (in technological time). However, my primary IDE and compiler, Visual Studio, did not provide support for most of these compelling features until VS2013. With the anticipated release of VS2015, I will be able to use all of these desired features.

I realize that most of these features have existed in other compilers for quite some time, such as GCC and Clang. Once again, the type of software that I develop for a living can rarely make use of the latest technology and newest compilers. Generally I have a parallel make project set up for GCC on another platform, like Linux, that I compile and test alongside my Visual Studio projects. Now that my favored tools more completely support Modern C++, I can continue to develop in ways that are familiar to my current processes, as I learn and evolve to be effective with a new form of this language.

What do I find compelling?

If it's not obvious from the theme of my site and the majority of the topics that I choose to discuss, I prefer maintainability in most circumstances; robustness, security, and portability are also qualities that I highly value. All of this goes without saying that correctness is an absolute. An incorrect program is a worthless program.

Many of these topics are chosen directly from Effective Modern C++, because the C++ community helped Scott Meyers select these topics as the features that are most likely to be misunderstood. The other features are simply welcome additions that will instantly clean up some of my code, especially my use of templates. Here is the list of topics I plan to explore and document in the near future, in no particular order (a short sketch follows the list):

  • Range-based for
  • Uniform initialization and initializer lists
  • Template aliases
  • Variadic Templates
  • auto, decltype and template type-deduction rules
  • constexpr
  • Scoped and strongly-typed enums
  • Delegating constructors
  • Explicit conversion operators
  • Universal references (&&) and reference collapsing
  • Move semantics and perfect forwarding
  • Lambda expressions
  • Query and control of alignment (especially for Alchemy)
  • Threading facilities
    • Multi-threaded memory model
    • Atomics
    • Synchronization constructs
    • Futures and Promises
    • Thread-local storage
  • Chrono
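Before the individual write-ups, here is a minimal sketch of my own (not taken from any of the references above) that touches a few of the listed items: uniform initialization, lambda expressions, range-based for, auto, and scoped enums.

C++

#include <algorithm>
#include <iostream>
#include <vector>

// Scoped, strongly-typed enum; no implicit conversion to int.
enum class Severity { info, warning, error };

int main()
{
  std::vector<int> values{ 3, 1, 4, 1, 5, 9 };   // Uniform initialization

  // A lambda expression passed to a standard algorithm.
  std::sort(values.begin(), values.end(),
            [](int lhs, int rhs) { return lhs < rhs; });

  // Range-based for with auto type-deduction.
  for (auto v : values)
  {
    std::cout << v << ' ';
  }
  std::cout << '\n';

  Severity s = Severity::warning;
  return s == Severity::error ? 1 : 0;
}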

There are some features that I am already familiar with and have used for quite some time. These are primarily the extensions to the C++ Standard Library that were added as a part of the Technical Report 1 (TR1) specification. In other cases, they are features that could previously be approximated, but were not integrated into the language itself:

  • static assertions
  • shared_ptr, weak_ptr and unique_ptr
  • type_traits

Some of these topics I have written about previously, and I may still decide to write about the others in the future. Although the name shared_ptr implies that it is only useful for managing pointers, it can be applied in many more contexts to enforce robust resource management in your programs.
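For example, the custom-deleter constructor turns shared_ptr into a general-purpose resource guard. A minimal sketch, using a FILE handle instead of plain memory:

C++

#include <cstdio>
#include <memory>

int main()
{
  // The deleter runs when the last shared_ptr reference goes away,
  // so the file is closed on every exit path.
  std::shared_ptr<FILE> file(std::fopen("log.txt", "w"),
                             [](FILE* f) { if (f) std::fclose(f); });
  if (file)
  {
    std::fputs("resource managed by shared_ptr\n", file.get());
  }
  return 0;
}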

Summary

In a way I feel like I am living in the past, watching the future pass me by as C++ continues to evolve, when in fact I have actually been practicing modern C++ development for many years now. Only now there exist better tools and forms of the language to express these same techniques. The advantages gained by employing these new tools are more readable code, more explicit intent by the author, and improved forms of syntax that make things that have already existed more convenient to use; constructs like the std::algorithm library.

I have enjoyed using C++ throughout my entire career. I enjoy having one foot touching the system, where each byte counts, and the other at a much higher, more abstract level, creating expressive constructs that feel naturally integrated within the language. The designers of C++ have always made a conscious effort to remain backward compatible to support existing programs, a history that, without even considering C, stretches back almost 35 years. I think it is important to understand how to effectively apply the latest capabilities of C++, especially when considering their integration into existing programs.

That is exactly what I intend to do.

When are we gonna learn?

communication, CodeProject, design, knowledge, engineering Send feedback »

As you gain expertise you begin to realize how little you actually know and understand. I have found this to be true of most skills. It's easy to fall into the trap of believing that you continue to grow your expertise each year, and thus have less and less to learn. Read on and I will demonstrate what I mean.

Books are statically-typed

(With the exception of e-books)

Books are fixed once they’re typeset and sent off to the publisher for printing. When you acquire one of these tomes, you hold a collection of thoughts, ideas, and discoveries taken as a snap-shot in time. On the other hand, the Internet is the present, continually moving towards the future. News stories are updated in real-time. Ideas, typos, and corrections are updated instantly.

We are producing more content than we can consume. Any idiot with access to the Internet (I am one of them) can publish their own thoughts on a blog. Some of us are less ignorant than others. Some blogs, news articles, tutorials or even YouTube videos are informative and worth consuming if you want to improve your craft.

Keeping Current

I like to try to keep current. I am amazed by all of the technologies or ideas that I learn about, that I thought were fairly new, but it turns out they are 15 years old now. It also amazes me when I have discussions with colleagues and some of the information and technologies I have known about for years seem brand-new to them.

A brief digression to pique your curiosity

Something that is similar, but actually a bit off-topic, is learning that one of your favorite band's songs is actually a cover of another band's song. The original work may have existed for two years or twenty years before the cover was recorded. Yet you were unaware that the earlier song existed. Here are a few examples:

  • Alabama Song (Whiskey Bar) by Kurt Weill (1930);
        covered by The Doors (1967)
  • Knockin' on Heaven's Door by Bob Dylan (1973);
        covered by Guns N’ Roses (1991)
  • Jolene by Dolly Parton (1974);
        covered by The White Stripes (2004)
  • School Day by Chuck Berry (1957);
        covered by AC/DC (1975)
  • Sweet Dreams by Eurythmics (1983);
        covered by Marilyn Manson (1995)
  • Tainted Love by Gloria Jones (1964);
        covered by Soft Cell (1981)

  • … And one of my favorites:
  • Hurt by Nine Inch Nails (1994);
        covered by Johnny Cash (2003)

Time to learn

There is great content on the Internet. Social networks like Twitter help bring some of these gems into the spot-light, albeit for a very short time. However, the resources do exist.

Here's where I shock myself (metaphorically; I'm a Watt, I'm well grounded): after almost two decades of experience, I have recognized many patterns of behavior, implementation techniques, and development processes that seem to be more successful than others. I may still be trying to summarize and articulate an abstract conclusion, and then I read a book and discover the author is describing the exact conclusion that I had reached! Only I hadn't yet been able to put my thoughts into words.

This is crazy, let’s check the printing date of these books:

  • Design Patterns: Elements of Reusable Object-Oriented Software by Gamma, Helm, Johnson, Vlissides (1995)
  • The C++ Programming Language (Special Edition) by Bjarne Stroustrup (2000)
  • The Mythical Man Month by Fred Brooks (1975)!

What’s so special about these books?

Stroustrup's C++ book and the Gang of Four's book both have some extremely insightful chapters. They summarize a number of conclusions that I have either reached on my own, or that I did not have enough context to properly absorb when I read them the first time. The Mythical Man Month is a collection of essays on software development, in which the wisdom has proven to be timeless.

Let me give a brief overview of the valuable insights that are documented in these books. These are concepts that, if they were taught to me in college, I simply do not remember. My best guess is that I didn't have enough context to relate to all of the practical advice that is given, especially compared to the overwhelming majority of theory that is taught at universities.

Fast forward one decade... Yes, I started to recognize some of these patterns, and I picked up new books to read; yet these three books stand out. I am surprised that the information in these books is not better internalized and practiced in our industry. I know that my trade would be much better off if everyone understood these concepts. This information is not only for software developers, but for software project managers and anyone that is involved in making decisions during a development project.

Design Patterns (The Gang of Four)

The most insightful chapter in this book is Chapter 1. How could I miss that?

Here’s how. If you are like me, you cracked open the book, saw the nicely drawn diagrams, well organized descriptions for each of the 23 design patterns described in the book, and simply skimmed through the patterns when searching for a clean solution.

It's also very likely that you started reading Chapter 1, and saw that the first five sections were primarily bullet-point lists of patterns and a few diagrams. Each of these first five sections is very brief. Then section 1.6 starts, and it is dense with text. Very few diagrams or bullet-point lists; just a lot of text.

So you skipped to the next chapter and started learning about A Case Study: Designing a Document Editor. Scratch that, after looking at the first few pages, you skipped chapter two as well and moved on to the next section entirely, The Design Pattern Catalog. Finally, in chapter 3 you start learning about the creational patterns.

If you did read all of chapter 1 and absorbed every iota of valuable wisdom, you can skip to the next book, but be sure to read the summary. For the rest of us: unless you had a fair amount of experience with object-oriented software development when you read this chapter, the majority of the value did not have a chance to sink in.

Here is what you may have missed, and why it is worth your time to revisit these few sections of Chapter 1 in the Gang of Four’s Design Patterns.

Section 1.6: How design patterns solve problems

The first few pages of this section describe how design patterns will help you solve problems. If you have done even a small amount of object-oriented development this material will seem obvious to you, and you will have that urge to skim and skip. Don't do it!

About five or so pages into this section, pg. 16 in my copy, there is a section titled Class versus Interface Inheritance. This is where the good stuff begins. The authors make a distinction between inheriting from a class to reuse existing functionality and inheriting from an interface to become compatible with an existing implementation. While both can facilitate polymorphism, inheriting from an interface is more likely to produce code that is truly polymorphic.

With regards to C++, public inheritance is the equivalent of inheriting from an interface. However, the lines are blurred, because if you inherit publicly from a concrete class you get both benefits: code reuse and polymorphic compatibility. The trouble begins when you override the functions of the base class and alter the meaning of the interface. This breaks the concept described in Liskov's Substitution Principle (LSP).

Public Inheritance leads to a subtype

LSP helps guide you to a solution that allows you to replace a base-class instance with any of its derived instances, with no special processing required. This is the ultimate goal of polymorphism. When it is achieved, the payoff is enormous; it takes the form of elegant solutions that require very little code. If you adhere to LSP when you create a derived class, you are creating a subtype of the base class.

Robust and elegant polymorphism depends on creating proper subtypes. It is only possible to create a subtype of a base class when your derivation is public. That should be an immediate clue for how to proceed. When you use protected or private derivation, you are only using derivation for the benefit of code reuse from the base class.

Taken to the extreme, this means it is a good practice to publicly derive from abstract base classes: classes in which every virtual function is declared pure virtual. This has the downside that you lose the benefit of immediate code reuse through derivation. However, your intention that this new class be a subtype of the base class is all the more clear.
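As a concrete illustration (my own sketch, not code from the book), here is interface inheritance producing a proper subtype in C++: every virtual function in the base is pure, and the derivation is public.

C++

#include <iostream>

class Shape                        // Abstract base: interface only
{
public:
  virtual ~Shape() { }
  virtual double Area() const = 0;
};

class Square
  : public Shape                   // Public derivation => subtype
{
public:
  explicit Square(double side) : m_side(side) { }
  double Area() const { return m_side * m_side; }

private:
  double m_side;
};

// Callers work against the interface; any proper subtype substitutes
// with no special-case handling, which is the point of LSP.
void Report(const Shape& shape)
{
  std::cout << "area: " << shape.Area() << '\n';
}

int main()
{
  Square sq(3.0);
  Report(sq);
  return 0;
}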

This section has much more to offer. It continues to discuss how to put the various reuse mechanisms to work; recommends choosing composition over inheritance, and even mentions inheritance versus parameterized types (generic programming). I recommend that you revisit this chapter, especially this section at least once a year.

As for the other two sections, 1.7 and 1.8, they briefly describe how to select and then use a design pattern. These are brief and worth reviewing as well. Remember, a design pattern is simply that: a pattern. They are less effective if they are used as pre-written solutions expected to be dropped into place and used.

Software Design Patterns are not the solution to your problem. Rather, they provide guidance on how to create a robust solution. It’s important to remember that.

The C++ Programming Language (Special Edition)

That's right! The special edition is very important. This is the only version of the book that I am aware of that contains the three chapters that eloquently and succinctly summarize how to improve your software design, development, and project management skills.

Part IV: Design Using C++

There are three chapters in this section. They start from a high level, and each chapter moves its focus to topics that are more closely related to code, specifically C++:

  • 23. Development and Design
  • 24. Design and Programming
  • 25. Roles of Classes

Every time I return to these chapters, I need to pick up a new color of highlighter because I latch on to something that I missed in all of the previous times that I read these chapters. I am certain that eventually every word will be highlighted as I continue to return and read them.

[Image: my highlighted copy of these chapters]

It is also helpful to lend out your copy and allow others to highlight sections that they find salient.

[Image: sections highlighted by others]

Scott Meyers' Effective C++ is one of the most valued books on my shelf. Yet I would dare say that these three chapters, considered alone, are the most valuable collection of pages regarding software development that I would refer others to. Thank you, Dr. Stroustrup.

There is too much information for me to even attempt to scratch the surface. Overall, there are only about 100 pages total. I will, however, indicate some of the important concepts; Dr. Stroustrup gave words to things I had been thinking for years, but did not know how to organize so succinctly. Chapter 23 is by far the most abstract, yet most valuable, content of this section.

Chapter 23: Development and Design

This chapter is a collection of observations on software development for organizations and individuals. One of the most important concepts to help guide any designer and developer is "to be clear about what you are trying to build." So many projects fail because they never establish that goal. There is no cookbook, or, as Fred Brooks states, no silver bullet, to replace intelligence and experience in design and development. Iteration is important in software development, especially experimentation and prototyping.

This chapter is what inspired a previous blog entry that I wrote Software Maintenance is a Myth[^]. Software does not fall apart or need regular maintenance to stay in good working order. Every time you make a change to a functional piece of software, you are performing a mini-engineering and development cycle.

There are brief discussions on the use of models, experimentation, the reuse of code, testing, and efficiency. Best of all, there are a few comments regarding project management. My favorite statement of all is added as an annotation at the bottom of the page:

An organization that treats its programmers as morons will soon have programmers that are willing and able to act like morons only.

section 23.5 Management
I smile every time I think of that comment; I shake my head; then I go back to acting like a moron.

There is one last priceless piece of advice and direction from this chapter for anyone that is interested in improving their organization. I cannot paraphrase it, or state it any better than the words directly from the book:

Where innovation is needed, senior technical people, analysts, designers, programmers, etc., have a critical and difficult role to play in the introduction of new techniques. These are the people who must learn new techniques and in many cases unlearn old habits. This is not easy. These individuals have typically made great professional investments in the old ways of doing things and rely on successes achieved using these ways of operating for their technical reputation. So do many technical managers.

section 23.5.3 Individuals

The thought continues to emphasize the importance of communication. Communicate both to alleviate the fear of change and to avoid underestimating the problems of sticking with the old ways, as well as to avoid over-estimating the benefits of the proposed new ways.

Chapter 24: Design and Programming

Chapter 25: Roles of Classes

There is not much that I can really add that will be revolutionary for chapters 24 and 25, except that they are brief overviews of C++ and how to begin applying the concepts that are presented earlier in the book. Regardless of the number of other sources that you have read to learn and understand object-oriented development, I am almost certain you will find something new to add to your wisdom and experience.

The Mythical Man Month

This book is a collection of essays on software engineering. Some of the essays are based on case studies, such as Mr. Brooks' experience at IBM as he took over the development of the OS/360. Other essays are based on observations and conclusions related to personal experience or studies performed by others.

Your obligation

If you have not read The Mythical Man Month you owe it to yourself, and the rest of us, to read it. Just multiply all of the monetary values by 8 to account for the total inflation rate of roughly 700% from 1960 to 2014. With those adjustments, it will feel like you are reading a book that was printed just last year (for the first time, at least).

Brooks' Law

"Adding manpower to a late software project makes it later."
    --Frederick P. Brooks

Brooks' Law originates from the chapter with the same title as the book, The Mythical Man Month. The rationale behind the concept is that the trained engineers who are currently productive on the project must stop and focus their efforts on training the new engineers to be productive. It is also important to consider the number of sequential versus independent tasks when creating a product schedule. Sequential tasks inherently limit the speed of the schedule, based on their dependencies.

An Enormous Source of Wisdom

This book contains many ideas, analogies, observations, conclusions, and empirical evidence to help educate software professionals to become better at their craft. Conceptual Integrity in the design architecture is a common theme repeated throughout the book. Brooks describes the concept of a development team that is structured like, and as effective as, a surgical team. The essay No Silver Bullet discusses how no project will duplicate the exact circumstances of previously successful projects. Therefore, you must always be willing to adjust and adapt to the differences if you are to succeed.

There are also commentaries on the "Second-System Effect" and on the advice to build one to throw away. In the 20th anniversary edition of the book, Brooks re-evaluates the topics and his ideas from 20 years earlier, and states that quicker feedback loops are a better solution. This means smaller development cycles, and continual feedback through testing and integration while the system is still under construction; ideas very similar to agile thinking.

I think most importantly, Brooks continues to underscore how important effective communication is between the management, technical leadership, software developers and every other role in the project. Communication relates to another well-known law:

Conway’s Law:

“Organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations.”
–Melvin Conway

It appears that this statement was coined in 1968 at the National Symposium on Modular Programming, the same conference at which Larry Constantine introduced the concept of Coupling and Cohesion[^].

… And now to the point

I read and hear over and over, "There has got to be a better way of doing things, and here it is…". I have read quite frequently that Computer Science and Software Engineering are in their infancy compared to other engineering disciplines. Every now and then Software Engineering is proclaimed to not even be an engineering discipline; it is an art or a craft, anything but science. I don't want to defend or dispute any of these claims.

What I really want to say:

WTF!

We continue to repeat the same mistakes that we were making 50 years ago. To add insult to injury, it is costing companies more money, and there are many more of us making these exact same mistakes. In fact, labor statistics forecast a shortage of software professionals; companies need more of us to repeat these mistakes and waste their money.

Software is a profession where a little bit of knowledge, even less experience, and absolutely no wisdom can be dangerous; probably any mixture of those three qualities. It seems that we spend entirely too much time inventing the Next Big Technology, and less time learning sound practices and skills. Some programmer or company decides that the existing programming languages are the problem: "Things would be better if we left out this feature, but added this other feature, and then slapped some lipstick on it along with a snazzy name."

I am not a civil engineer, but even in my studies of computer science we were taught the lesson of the Tacoma Narrows bridge. I would imagine that civil engineers perform an even deeper case study of that engineering anomaly as well as others. Possibly even perform experiments and study other projects that were both successful and unsuccessful.

Much of the knowledge and wisdom imparted by these resources is wasted if the reader does not have enough experience to provide context for the concepts that are being described. That is why I suggest you revisit the books in your personal and corporate libraries. Keep track of the authors of blogs that you find valuable, even if all of their content doesn't seem to apply to you. You may happen to revisit an entry in the future that you are ready to absorb, because you have new experiences that apply.

Summary

Basically, I'm frustrated for two reasons: 1) I stumbled upon this information many years too late; 2) I may have actually been presented with this information earlier in my career, and I was not mature or wise enough to recognize its value.

For those of you that I said could skip to the summary, please do a better job of spreading this knowledge around. Work at becoming a mentor. If you don't like mentoring (I know, sometimes it can be frustrating), point out the info to an engineer that does enjoy mentoring. The surprise to me in all of this is that I stumbled upon these nuggets of knowledge even though some of them had been in my library for at least a decade. It was not until I had enough experience, and some context with which to associate the advice, that I was ready to absorb this wisdom.

Unit Testing a Singleton in C++

adaptability, CodeProject, C++, unittest Send feedback »

I have written about the Singleton[^] before. As a quick review of what I previously stated: I don't think the Singleton is misunderstood; I think it is the only software design pattern that most people do understand. Those that call the Singleton an anti-pattern believe that it is overused. It is simple enough in concept, compared to the other patterns, which may explain why it is used so often. Another criticism that I hear is that it is difficult to unit-test, or at least unit-test properly with a fresh fixture for each test. No, it's not, and I will demonstrate how.

Note: There are two solutions in this entry. The first is a convoluted exploration of searching for a solution that works. The second is a simple solution that is created by approaching the problem from a new direction.

What's the problem

To create robust unit-tests, a best practice is to entirely tear down any resources related to your System Under Test (SUT) before you run another test. Each new test is run in what is referred to as a fresh fixture: a clean test environment where you control every aspect of the environment for your test. Therefore, you can reduce the possibility of side-effects interfering with your tests. You are basically independently testing every aspect of your SUT.

The difficulty that we encounter when attempting to test a Singleton is that quite often it is created as a global variable at startup; that is, even before your main entry-point function is called. With the improved version of the Singleton, in which you declare a non-public destructor, the problem becomes even more difficult.

Testability starts with your implementation

One of the most important factors that affects your chances of successfully creating a robust test harness around your software is your software's implementation. This includes the public interface, the details that you abstract from the user, and the data that you encapsulate within your implementation. One of the strengths of object-oriented programming that I think is often overlooked is that your ability to encapsulate the data is what gives you the power to enforce the invariants of your classes.

One of the invariants of the classic Singleton is that at most one instance will exist for the system. How we choose to enforce that invariant will have a great effect on our ability to properly test the Singleton. Here is a classic implementation of the Singleton:

C++

// Source: Design Patterns (GoF)
// Page:   129
class Singleton
{
public:
  static Singleton* Instance();
protected:
  Singleton();
private:
  static Singleton* m_pInstance;
};

Since there is a static member variable, the data field must be defined in a .cpp file, or in some other way that guarantees only one instance exists when the linker combines the object code. Here is the rest of the implementation:

C++

Singleton* Singleton::m_pInstance = 0;
 
Singleton* Singleton::Instance()
{
  if (0 == m_pInstance)
  {
    m_pInstance = new Singleton;
  }
 
  return m_pInstance;
}

Demonstration of Usage

Before I demonstrate the testing portion, let's demonstrate the usage of this object so that we can better understand the problems we will be faced with when we attempt to test the object.

C++

// When a caller wants to reference the object,
// they use the static member function Instance().
Singleton* pObj = Singleton::Instance();
 
// However, these usages will generate compiler errors,
// because the constructor is not declared public:
 
Singleton second;  // Error    
Singleton* pThird  // Error
  = new Singleton();

There are problems with this implementation

This implementation has problems, and we haven't even reached the part where we see if this can be tested. No explicit destructor has been declared; therefore, the user is free to call delete on the Singleton. An explicitly declared destructor is a necessity to create a safe Singleton implementation.

C++

// Without a destructor explicitly defined,
// the user can do this:
std::shared_ptr<Singleton>
  p_sharedObj(Singleton::Instance());
 
// As well as this.
delete Singleton::Instance();

In both of the previous scenarios, the compiler is fine with the way the user interacted with the object. However, you will be left with a dangling instance pointer and no way to recover from it. You have a few options.

  • Declare the destructor public, and assign the instance pointer back to zero within it. In this case, testing with a fresh fixture is once again possible, and this article becomes a moot point.
  • Declare the destructor non-public, and the user will not be able to explicitly delete the Singleton instance. I would suggest protected access, because it leaves open the possibility to subclass your Singleton if you choose to.
  • Use the C++11 way to remove the destructor altogether:

    C++

    ~Singleton() = delete;

This implementation can be tested if...

If you selected the second option above, declaring your destructor with protected scope, there is hope of performing best-practice unit-testing on this Singleton. However, it's not pretty. Remember, "Testability starts with your implementation."

Since the destructor is protected, we can derive a TestSingleton class to give us access to it. Now for the ugly part. You must define your destructor to be of this form:

C++

~Singleton()
{
  if (0 != m_pInstance)
  {
    // Add all of your other destruction
    // responsibilities at this point.
 
    // Hold a copy of the singleton pointer,
    // and zero out the static member-data
    // before calling delete on the object.
    Singleton* pTemp = m_pInstance;
    m_pInstance = 0;
 
    delete pTemp;
  }
}

That's a curious bit of code isn't it? Why not just call delete m_pInstance?

Because we are dealing with a paradox. Think about it.

What type of pointer is m_pInstance?

A Singleton class pointer.

To which class does the destructor with this code belong?

The Singleton class.

... and we typically get into a class's destructor by calling...

delete!

So how did we get into the destructor for the class Singleton if we haven't yet called delete on the only instance of this class? Furthermore, what would happen if we just deleted the pointer, then zeroed it out?

The Reveal

This is the part where I give you the Ocean's 11 style breakdown. We'll start with what you saw, or at least think you saw.

As I mentioned, we will start with a sub-class derived from the original Singleton.

C++

class TestSingleton
  : Singleton
{
public:
  // We don't even need a default constructor
  // or destructor because the compiler gives us one.
};

We create a Setup and Teardown routine for this test suite:

C++

TestSingleton *pTest = 0;
void Setup()
{
  pTest = new TestSingleton;
}
 
void Teardown()
{
  delete pTest;
  pTest = 0;
}

Here is the typical logical flow, if a few tests in our test harness were structured like this:

C++

// Test 1
Setup();
Singleton* pObj = Singleton::Instance();
// Testy things occur here
Teardown();
 
// Test 2
Setup();
pObj = Singleton::Instance();
// Now testier things occur
Teardown();

If we instrumented the constructor and destructor for both the Singleton and TestSingleton objects, we would see something like this:

Test 1
  Construct Singleton.
  Construct TestSingleton
  Construct Singleton.
  Destroy TestSingleton
  Destroy Singleton.
Test 2
  Construct Singleton.
  Construct TestSingleton
  Construct Singleton.
  Destroy TestSingleton
  Destroy Singleton.

Why is the constructor for the Singleton getting called twice? Because it is called once when the TestSingleton is created, and once when the real singleton is created by calling Singleton::Instance(). As simple as the class is now, my guess is that this would not work so well on a production-class singleton, especially if the object was constrained due to system resources.

We simply cannot allow the constructor of a singleton to be called twice. Furthermore, all of that fancy guard code I told you to put into place in the destructor is there to prevent an endless loop of destruction from being called. Honestly, it is an endless recursive loop that will continue until a stack-overflow occurs. We can't really have the destructor of a singleton called twice either. The way we structured the Singleton::~Singleton() function is about as clean as we will be able to get.

So how do we avoid double construction, enter the paradox, and get away before Terry Benedict realizes we got away with all of his money? We'll use a pinch. Just kidding; we are going to use the TestSingleton as a Trojan Horse to gain access to the protected destructor of the base Singleton class. However, we will not ever construct the TestSingleton. Also, the way I instructed you to organize the destructor of the Singleton will make it safe, for the time being.

The Trick

We can allocate memory for an object, and avoid calling its constructor, by calling operator new directly. Yes, you can do that. It essentially performs the same action as calling malloc. To make this work, we need to make three adjustments.

  1. Change Setup to allocate memory without calling the constructor:

     C++

     void Setup()
     {
       pTest = (TestSingleton*)operator new(sizeof(TestSingleton));
     }

  2. Change Teardown to free the memory without calling the destructor:

     C++

     void Teardown()
     {
       pTest->DestroySingleton();
       operator delete(pTest);

       pTest = 0;
     }

  3. Publish a method in TestSingleton to delete the Singleton:

     C++

     void DestroySingleton()
     {
       // Here is where the paradox originates...
       // We are invoking the destructor directly,
       // which in turn will invoke ~Singleton().
       this->~TestSingleton();
     }

Watch the solution for yourself:

C++

int main(int argc, _TCHAR* argv[])
{  
  cout << "Test 1\n";
  {
    Setup();
    Singleton* pObj = Singleton::Instance();
 
    cout << "pObj is " << pObj << ".\n";
 
    Teardown();
  }
 
  cout << "Test 2\n";
  {
    Setup();
    Singleton* pObj = Singleton::Instance();
 
    cout << "pObj is " << pObj << ".\n";
 
    Teardown();
  }
 
  return 0;
}

Output:

Test 1
  Construct Singleton.
  pObj is 00F9E348.
  Destroy TestSingleton
  Instance is 00F9E348.
  Destroy Singleton.
  Instance is 00000000.
Test 2
  Construct Singleton.
  pObj is 00F95CA0.
  Destroy TestSingleton
  Instance is 00F95CA0.
  Destroy Singleton.
  Instance is 00000000.

Admittedly, that is a very convoluted solution, and it even made our original code more fragile for maintenance. There are much better ways. In order to find this solution, we will need to approach the problem from a different direction.

Testability starts with your implementation

One of the most important factors that affects... yes, yes, yes; I already explained this, and hopefully I have just proved my point. The implementation for that Singleton sucks (it rates low on the testability scale). Here is a much simpler implementation and testing solution, one which does not resort to using friend.

If you are considering adding friend to make a class testable, don't do it. There is always a better way when testing.

Start with the same header implementation for your Singleton. Place the implementation for the Singleton's static members from above in their own file, separate from the main Singleton.cpp implementation; for example, a file called Singleton_instance.cpp, structured similar to the code below.
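For reference, the production Singleton_instance.cpp holds nothing but the static-member logic shown earlier:

C++

// Singleton_instance.cpp
// This compilation unit contains only the static-instance
// definitions; Singleton.cpp holds everything else.
Singleton* Singleton::m_pInstance = 0;
 
Singleton* Singleton::Instance()
{
  if (0 == m_pInstance)
  {
    m_pInstance = new Singleton;
  }
 
  return m_pInstance;
}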

A better way, and simpler too

When it comes time to test, replace the file Singleton_instance.cpp with one that provides flexibility to erase and create new instances of the Singleton. For example, Test_singleton_instance.cpp:

C++

enum TestPhase
{
  k_unknown = 0,
  k_setup,
  k_testing,
  k_teardown
};
 
TestPhase  g_testPhase = k_unknown;
// ... Same definition for m_pInstance
 
Singleton* Singleton::Instance()
{
  if (k_teardown == g_testPhase)
  {
    delete m_pInstance;
    m_pInstance = 0;
 
    return 0;
  }
  // ... continue with normal Instance logic.
}

Only two last adjustments to Setup and Teardown:

C++

void Setup()
{
  g_testPhase = k_setup;
  // Add any other necessary setup code.
  g_testPhase = k_testing;
}
 
void Teardown()
{
  g_testPhase = k_teardown;
  Singleton* pEmpty = Singleton::Instance();
 
  assert(pEmpty == 0);
}

We have just introduced a way to get access to the Singleton's non-public destructor, and we didn't even need to resort to some of the subversive techniques that are possible in C++. All that was required was to use two compilation units to implement the Singleton, rather than one as is traditionally done.

If you're concerned that we didn't test the Singleton::Instance() function, simply create a new test harness for this portion of the object implementation. With the original implementation only two tests are required.

  • Verify the object is created correctly on the first call.
  • Verify the same object is returned on each successive call.
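A minimal sketch of those two tests, using plain asserts rather than a particular unit-test framework:

C++

#include <cassert>

void TestInstanceIsCreated()
{
  Singleton* pFirst = Singleton::Instance();
  assert(0 != pFirst);                 // Created on the first call
}

void TestInstanceIsReused()
{
  Singleton* pFirst  = Singleton::Instance();
  Singleton* pSecond = Singleton::Instance();
  assert(pFirst == pSecond);           // Same object on every call
}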

Summary

We solved a problem that seemed difficult, if not impossible, at first, without compromising the integrity of the SUT. With an open mind and a bit of ingenuity, we were able to find a very clean solution. More importantly, we did not arrive at our clean solution on the first attempt. Software development is an iterative process. If you are not satisfied with your first solution, explore other possibilities. Especially with C++.

Alchemy: Benchmarks and Optimizations

general, CodeProject, C++, Alchemy, engineering, optimize Send feedback »

A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called Network Alchemy[^]. Alchemy performs automated data serialization with compile-time reflection. It is written in C++ using template meta-programming.

Benchmark testing and code profiling is a phase that can often be avoided for the majority of development; that is, if you develop wisely, selecting appropriate data structures and algorithms for the task at hand. Avoiding premature optimization is about not getting caught up in the minute details before you even have a working system. That doesn't mean to throw out good decision making altogether. Well, I have reached the point in Alchemy where I have a feature-set that is rich enough to make this a useful library. This entry chronicles my discoveries about how well Alchemy performs, and the steps I have taken to find and improve the areas where improvement has been required.

Benchmark Tests

To get started, I simply wanted to capture the performance of a collection of the different Alchemy-supported types in a few arranged structures.

Each of the tests is run across a pre-allocated buffer of 512 MB. The structures for each test will extract as many messages as they can from the 512 MB buffer. Each message that is extracted will (a hand-written sketch of these steps follows the list):

  1. Deserialize the message from the buffer
  2. Convert the byte-order
  3. Serialize the message into a different buffer
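For a frame of reference, here is a sketch of what the hand-written baseline does for a single message; the struct and function names are illustrative only, not code from the benchmark suite.

C++

#include <cstdint>
#include <cstring>

struct Msg { uint32_t u32; uint16_t u16; };

inline uint32_t swap32(uint32_t v)
{
  return (v >> 24) | ((v >> 8) & 0xFF00u)
       | ((v << 8) & 0xFF0000u) | (v << 24);
}

inline uint16_t swap16(uint16_t v)
{
  return static_cast<uint16_t>((v >> 8) | (v << 8));
}

void process_one(const uint8_t* p_in, uint8_t* p_out)
{
  Msg msg;
  std::memcpy(&msg, p_in, sizeof(msg));    // 1) deserialize from the buffer
  msg.u32 = swap32(msg.u32);               // 2) convert the byte-order
  msg.u16 = swap16(msg.u16);
  std::memcpy(p_out, &msg, sizeof(msg));   // 3) serialize to a different buffer
}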

This is a summary of the size of each structure and the number of iterations that were performed over the 512 MB block of memory:

Test Name        Struct Size   Iterations
basic                     14     28247922
packed                     7     76695844
unaligned                 19     28256363
complex                   86      6242685
array                   1024       524288
no_conversion             64      8388608

Here is a brief overview of the tests that I selected to exercise the basic features of Alchemy. I will also continue to add more tests over time.

1) Basic

This message structure contains a collection of fundamental types, where all of the types are aligned on a minimum of a two-byte boundary.

C++

//  *******************************
struct Basic
{
  int32_t         i32;
  uint32_t        u32;
  int16_t         i16;
  uint16_t        u16;
  char            ch;
  uint8_t         u8;
};

C++

//  *******************************
HG_BEGIN_FORMAT(Basic,
  HG_DATUM(int32_t,   i32),
  HG_DATUM(uint32_t,  u32),
  HG_DATUM(int16_t,   i16),
  HG_DATUM(uint16_t,  u16),
  HG_DATUM(char,      ch),
  HG_DATUM(uint8_t,   u8)
);

2) Packed

The PackedBits test is not complete. Currently I only define the bit-fields for Alchemy, because I wanted to see how well they performed for basic serialization. Also, all of my test implementations follow a similar pattern with a simple loop. I will need to devise a test that accesses the different bit-field values from these structures to test the access speed of the data. For now, they are simply processed in bulk as integers.

C++

//  *******************************
struct Packed
{
  uint32_t        set_a;
  uint16_t        set_b;
  uint8_t         set_c;
};

C++

//  *******************************
HG_BEGIN_PACKED (uint32_t, SetA)
  HG_BIT_FIELD   (0,   fifteen, 15)
  HG_BIT_FIELD   (1,   two,     2)
  HG_BIT_FIELD   (2,   five,    5)
  HG_BIT_FIELD   (3,   eight,   8)
  HG_BIT_FIELD   (4,   one,     1)
HG_END_PACKED
 
//  *******************************
HG_BEGIN_PACKED (uint16_t, SetB) // ...
HG_BEGIN_PACKED (uint8_t, SetC)   // ...
 
//  *******************************
HG_BEGIN_FORMAT(Packed,
  HG_DATUM(SetA,            set_a),
  HG_DATUM(SetB,            set_b),
  HG_DATUM(SetC,            set_c)
);

3) Unaligned

While most message structures I have seen take care to properly align the different parameter sizes on appropriate boundaries, it is still possible to receive data that is not properly aligned in memory. This test purposely offsets the alignment of the data structure so that none of the fields are properly aligned.

For those that are unaware, the x86 architecture is very accommodating and will not crash on unaligned memory access. It will happily perform two partial reads, then OR your data together. Of course, there is a huge speed penalty when you rely on this behavior.
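A portable alternative, for what it's worth, is to copy through memcpy rather than dereference a misaligned pointer. A small sketch:

C++

#include <cstddef>
#include <cstdint>
#include <cstring>

// Read a uint32_t from an arbitrary (possibly misaligned) offset
// in a byte buffer, without relying on the hardware to patch up
// the access.
uint32_t read_u32(const uint8_t* p_buffer, size_t offset)
{
  uint32_t value = 0;
  std::memcpy(&value, p_buffer + offset, sizeof(value));
  return value;
}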

C++

//  *******************************
struct Unaligned
{
  char            ch;
  uint32_t        u32_a;
  uint32_t        u32_b;
  uint32_t        u32_c;
  int16_t         i16_a;
  int16_t         i16_b;
  int16_t         i16_c;
};

C++

//  *******************************
HG_BEGIN_FORMAT(Unaligned,
  HG_DATUM(char,            ch),
  HG_DATUM(uint32_t,        u32_a),
  HG_DATUM(uint32_t,        u32_b),
  HG_DATUM(uint32_t,        u32_c),
  HG_DATUM(int16_t,         i16_a),
  HG_DATUM(int16_t,         i16_b),
  HG_DATUM(int16_t,         i16_c)
);

4) Complex

Complex is the test that contains a bit of everything, except a vector. All of the previous types are included in the format, as well as an extra 32-bit integer. Furthermore, the Basic structure defined in a previous test is included as an array. The current number of elements in the array is 10. This can be changed at the top of the benchmark type-definition file.

C++

//  *******************************
struct Complex
{
  uint32_t        seq;
  Basic           basic[k_complex_basic_count];
  Packed          bits;
  Unaligned       unaligned;
};

C++

//  *******************************
HG_BEGIN_FORMAT(Complex,
  HG_DATUM(uint32_t,                  seq),
  HG_ARRAY(Basic, k_complex_basic_count, basic),
  HG_DATUM(Packed,                    bits),
  HG_DATUM(Unaligned,                 unaligned)
);

5) Array

This array test was added fairly recently. It contains a single field that is an array of 32-bit integers. The current size of this array is 256 elements. Similar to the array in the complex test, this value can be changed by adjusting the global variable.

C++

//  *******************************
struct Array
{
  uint32_t        items[k_array_test_count];
};

C++

//  *******************************
HG_BEGIN_FORMAT(Array_test,
  HG_ARRAY(uint32_t, k_array_test_count,   items)
);

6) No Conversion

This scenario contains 16 individual 32-bit integers, and the byte-order conversion step is omitted from the test. However, the data is still read in, copied, then written out to a separate buffer. As you will see in the results, this is one area where Alchemy has a lot of room for improvement.

C++

//  *******************************
struct NoConversion
{
  uint32_t            ch_0;
  uint32_t            ch_1;
  // ...
  uint32_t            ch_14;
  uint32_t            ch_15;
};

C++

//  *******************************
HG_BEGIN_FORMAT(NoConversion,
  HG_DATUM(uint32_t, ch_0),
  HG_DATUM(uint32_t, ch_1),
  // ...
  HG_DATUM(uint32_t, ch_14),
  HG_DATUM(uint32_t, ch_15)
);

What a waste of time!

I completed the first set of features, and even received feedback from a production environment, which I used to improve my design and implementation. It was time to create some benchmarks. I wanted to verify that I had created something unique and useful, and that any trade-offs that existed were well worth the cost.

Alchemy was designed to be very simple. It should be simple to define a packed buffer format for binary wire transfer, portably and safely, between a variety of different computer architectures. Maintainability was the most valued quality for this library. For those qualities I felt it would easily be worth a 10% premium over a convoluted, hand-written solution that was possibly more efficient.

I was not prepared for the results that I received the first time I ran my 4 simple benchmark tests.

Denial led me to run the same program without any changes at least three or four more times, “It’ll straighten itself out, right?!”

What?

The results were consistent each and every time.

I calmly stood up from my desk, said out loud “What a fucking waste of 18 months!", and I walked away in disgust.

Alchemy was over 100 times slower than the naïve implementations that I wrote by hand. 100 times slower than the type of implementations that I had grown sick of maintaining for at least 10 years.

Take a walk, breathe, relax

I admit, I got a little (extremely) worked up. I was anticipating great things. I gave myself a little bit of room for disappointment just in case Alchemy was slower, but that margin was only 10%, not two orders of magnitude. Cognitive dissonance doesn't sit well with me. So the new, relaxed persona that I brought back to my desk sat down and simply dug in to see what could possibly be the issue.

Was it a debug build?

No.

I already experienced that in the first phase of my grief. I had initially compiled it in debug, then watched and waited, and waited, as it slowly chugged through the tests. Then I thought, "Whoa, I can't wait to see how long the hand-written version takes." I blinked, and it was done.

Yes, debug builds of Alchemy are extremely slow because of all of the layers of one-statement abstractions. I will give more details in the summary.

I reconfigured for release, and I quickly worked past denial. I already told you about my experience with anger.

Like comparing apples and oranges…

I really can’t tell you what happened to both bargaining and depression. They’re probably repressed memories now. Either way, when I returned, I had already accepted that I would need to investigate. I have also learned through years of experience that there is always a better way. Better in this case would mean faster.

What did I find?

After only five minutes into my search, I recognized that I had brought dynamic memory management of the buffers completely into the Hg::Message object, while I was using completely static buffers on the stack for the hand-written variant. I reasoned that this had to account for a decent portion of the discrepancy. So yes, I had basically tested a comparison of two different circumstances: apples vs. oranges, static vs. dynamic memory.

What did I do?

I did not have any mechanism in place to actually allow the user to provide the buffer for the Hg::Message to work with. A use-case example would be a buffer stored on the stack that the caller wanted the packed result to be copied directly into. I did have the concept of an Hg::StoragePolicy, which allows the user to manage the source of allocated memory as well as provide rules for how to interact with buffers. However, to implement that solution would have been a kludge that resulted in an awkward API for the user.

I decided to simply overload the data member function of the message object. This second version allows the caller to pass in a buffer and its size. With a few internal adjustments to the Hg::Message implementation, I was able to make both versions transparently call the function that performs all of the actual work.
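The shape of the overload is roughly sketched below; the names and exact signatures in Alchemy may differ, this is only an illustration.

C++

#include <cstddef>
#include <cstdint>

// Illustrative only; not Alchemy's actual declarations.
class Message
{
public:
  // Original form: the message manages its own (dynamic) buffer.
  const uint8_t* data() const;

  // Overload: the caller supplies a buffer and its size, and the packed
  // result is written directly into it. Both versions forward to the
  // same internal function that performs the actual work.
  size_t data(uint8_t* p_buffer, size_t size) const;
};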

It turned out that I did need to create a specialized Hg::StoragePolicy for static data, and a specialized Hg::MessageBuffer implementation. However, with the overloaded member accepting static buffers supplied by the caller, I saw an immediate increase in performance; an order of magnitude, to be precise.

Fantastic! Alchemy is only 10 times slower than the naïve approach. Ignorance truly is bliss, because I was thinking “Hey, it could be worse! Like, 100x as slow.”

A Closer Inspection

After comparing the relative percentage differences in performance, it was clear my BitList implementation was absolutely killing performance. It was like an anchor holding back a drag-racer. Since I used BitList types in two of my tests, I decided to improve that implementation next.

This is the teaser; the full details will be revealed in the write-up for my third, and current, implementation of the BitList type. Along the way I also decided to rename it PackedBits, because essentially that is what the data-type does, and it helps separate it from the notion of bit-fields altogether.

I discovered that the virtual anchor halting performance was the growing collection of references that were added to the PackedBits structure. There was one new reference for each sub-field that was added. I allow for a maximum of 32 sub-fields in the PackedBits structure. Therefore, the number of unique data fields packed into a single structure was directly related to the decrease in performance.
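A simplified illustration of the problem (not Alchemy's actual code): every declared sub-field stored a proxy holding a reference, so each one added per-object size and per-construction work, whether the field was ever accessed or not.

C++

#include <cstdint>

struct BitProxy
{
  explicit BitProxy(uint32_t& v) : value(v) { }
  uint32_t& value;              // reference back to the packed storage
};

struct PackedBitsBefore
{
  uint32_t value;
  BitProxy fifteen;             // one proxy per sub-field; each adds
  BitProxy two;                 // size and construction cost, used
  BitProxy five;                // or not

  PackedBitsBefore()
    : value(0), fifteen(value), two(value), five(value)
  { }
};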

After I resolved this issue, things were starting to look up again, because now I was only 4x slower. In percentages, 400% slower is 40 times slower than I was expecting and hoping for.

Time to profile

You know you are in the optimization phase when you turn to a code profiler to determine what your program is spending all of its time doing. It turns out the majority of the time for both runs was spent in standard library calls, especially ::memcpy. That wasn't a big surprise to me, although I was surprised that the amount of time was 60%.

Excluding standard calls and digging deeper what did I find?

  • Copies, lots of copies; made by the copy constructor, no less.
  • Empty object creation and initialization for fields that were never used.
  • More copies. This time by other methods of duplicating data in Alchemy.

There was a single cause for all three inefficiencies

What was the cause you ask?

Stupidity? No, and that’s harsh.

More like ignorance, or not completely understanding a language feature. There were a few cases of oversight as well.

My use of references

I like references, because they simplify code in many circumstances. Just like pointers, they have limitations and some scenarios to be aware of. However, if your data will live on the stack while it is passed into a function as a parameter, they are fantastic. I have even converted verified pointers to references to make my syntax and code cleaner.

Now realize that a reference looks and behaves exactly like a pointer in hardware. It is simply a bit of syntactic sugar that eases the semantics of our code. I was well aware of this during my development.

My slip-up occurred in the template functions that I created to request fields by index. The compiler wasn't happy with me simply passing in a single number, with no parameters from which to deduce which template instantiation I was actually after. So I relented, and added an unused parameter to the function declaration. I have noticed that in this context pointers-to-the-type are generally used.

I was stubborn and stuck with the references, because I like them better when I can use them. I initialized the parameter with a default instance of the field-type, and moved on to the next task. Here is the problem: a Datum entry is defined similarly to this inside of a message format structure:

C++

// For this example:
//   Hg::TypeAt<0,format_type>::type is uint32_t.
typedef
  Hg::detail::DeduceProxyType
    < 0,
      format_type
    >::type                   Proxylen;
typedef Proxylen::datum_type  datum_len;
 
Proxylen                      len;
 
datum_len& FieldAtIndex(const datum_len&)
{
  return *static_cast<datum_len *>(&len);
}

There are a few typedefs that simplify the declarations for data types and functions. One of them is the named variable the user interacts with, which is actually a proxy object. Then there is a function called FieldAtIndex that facilitates the compile-time reflection, and allows the address of the field at this index in the TypeList to be returned. For a uint32_t, like in this example, the default constructor will basically create a zero-filled buffer when called by the parent container.

My reasoning failed to account for the large nested structure types and arrays for which I added support. FieldAtIndex was another function that was reported as a bottleneck by my profiler. When I stepped through the code in the debugger, I realized my mistake.

References cannot be NULL. When I instantiated the function as shown in the example below, a temporary object was being created, every time. This is very inefficient for something that is never even referenced.

C++

template< size_t IDX>
Datum<IDX, format_type>& FieldAt()
{
  typedef Datum   < IDX,
                    format_type>    datum_type_t;
 
  return this_type::FieldAtIndex(datum_type_t());
}
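
The cost is easy to demonstrate in isolation. Here is a small, self-contained sketch; Probe and both free functions are hypothetical stand-ins for the Alchemy code above, not part of the library:

C++

#include <iostream>

struct Probe
{
  Probe() { std::cout << "temporary constructed\n"; }
};

// The reference parameter must bind to a live object,
// so the caller is forced to construct a temporary.
void by_reference(const Probe&) { }

// The pointer parameter is satisfied by a null pointer;
// no object is ever constructed.
void by_pointer(const Probe*)   { }

int main()
{
  by_reference(Probe());              // prints "temporary constructed"
  by_pointer(static_cast<Probe*>(0)); // prints nothing
  return 0;
}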

Now I understand why pointers are used in this context. I modified my implementation to use a pointer in this declaration and I also changed the way this function is invoked.

C++

// New Declaration
datum_len& FieldAtIndex(const datum_len*)
{ /* ... */ }
 
// New Instantiation
template< size_t IDX>
Datum<IDX, format_type>& FieldAt()
{
  // ...
  return this_type::FieldAtIndex((datum_type_t*)0);
}

I searched for any unnecessary copies

I fired up the benchmark test again, and saw another enormous boost in performance. I continued to comb through the code looking for inefficiencies, then running the tests, slowly improving the overall performance. In each iteration I made sure that values were bound to references when they were only temporaries, or were going to be passed into another function that requires a reference. I found a handful of cases. These were primarily located in my specialized implementations of the byte_order, pack, and unpack calls.

Finally, I could see Alchemy’s potential

I was much more pleased with the results after all of the adjustments described above. Some of them I was directed towards with the help of a profiler; others were realizations of inefficient techniques as I was perusing my code.

These are the first benchmarks that I released:

[Image: previous benchmark results]

Three of the four tests perform very well now. The only thing that troubled me was that the nested-type fields were consistently 25%-33% slower. This is the performance that I lived with for the next few months. I figured that I would revisit optimizations after some time away from these improvements. The benchmark/optimization process up to this point took about three weeks.

One last change

For about four months I just could not improve the performance of nested types beyond about 25% slower than the hand-written implementation. I tried many modifications, hoping to find something. As I started writing this article I briefly looked over my code once more while adding a few more tests, and I found it!

There was one location where I was making an unnecessary copy rather than taking a reference. This code is located in the byte-order conversion details. The functor that I create and pass into the ForEachType meta-function is initialized with an input message and an output message that receives the result. I was making a copy of the const input message rather than storing it as a reference.

The change is shown below:

C++

// The offending declaration
  const from_message_type        input;
 
  // The input field is passed in as a reference ...
  explicit
    ByteOrderConversionFunctor(
      const from_message_type& from,
            to_message_type&   to)
    : input(from)
    , output(to)
  { }
 
  // Modify the data member to be a const reference:
  const from_message_type&       input;
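
To see the effect of that single ampersand in isolation, here is a small, self-contained demonstration; Payload and the two functors are illustrative stand-ins, not Alchemy types:

C++

#include <iostream>

struct Payload
{
  Payload()               { }
  Payload(const Payload&) { std::cout << "payload copied\n"; }
};

// Stores a copy; the copy constructor runs on initialization.
struct CopiesInput
{
  const Payload input;
  explicit CopiesInput(const Payload& from) : input(from) { }
};

// Stores a reference; initialization is free.
struct ReferencesInput
{
  const Payload& input;
  explicit ReferencesInput(const Payload& from) : input(from) { }
};

int main()
{
  Payload message;
  CopiesInput     copied(message);   // prints "payload copied"
  ReferencesInput bound(message);    // prints nothing
  return 0;
}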

Generally the performance is now comparable to, if not better than, the hand-written implementation.

Alchemy FTW!

Current Results

[Image: current benchmark results]

This test reveals an area where Alchemy will need to focus at some point in the future. The extra processing that is performed just for the copy adds an enormous amount of overhead.

Under-performing

Slow Debug mode

As far as debug mode goes: processing large sets of data is extremely slow because of all of the layers of abstraction. In release builds, most of the layers simply disappear as variables are directly assigned inline, and the code compiles almost to the level of a dedicated hand-optimized solution for each new type.

I have plans to make it simple to generate an Alchemy library that contains message data types and their conversion functions. These objects could then be compiled into a release build for use when debugging other regions of your applications.

Summary

I realize that the benchmarks that I have created are not a comprehensive test of the library. That is why I still have a few doubts about claiming to have discovered the Cold Fusion of software. One of my colleagues suggested I simply claim to have discovered Luke-warm Fusion for now. I would like to add quite a few more tests, as well as include some examples that are adapted from real-world situations. I will continue to update the status on GitHub, as well as post any notable improvements here at Code of the Damned.

I am working on a dedicated landing page for Alchemy to consolidate all of the information that I have catalogued up to this point. I will also place the latest performance results there when I activate that page.

Alchemy: Array / Vector Serialization

C++, maintainability, Alchemy, design Send feedback »

A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called, Network Alchemy[^]. Alchemy performs low-level data serialization with compile-time reflection. It is written in C++ using template meta-programming.

The alterations required up to this point to integrate arrays and vectors into Alchemy have been relatively minor. That does not mean that the solutions were clean and simple from the beginning. The exercise of integrating serialization support for these containers was quite challenging, especially because of the possibilities they created for flexibility in data management.

Rabbit holes

Because of the foundation work that I laid out for the array and vector, I had the simple serialization scenarios working fairly quickly. It was only once I peered deeper into testing that I realized what challenges were ahead of me. After I worked through each problem, another one, or possibly three, appeared behind it.

I found myself on a journey down a rabbit hole.

The problem with a rabbit hole is that you start following it, and you have almost no context to look back and see how far you have actually travelled. Even worse, you have no feel for how much deeper it goes.

I followed the rabbit hole to solve two problems. One path I followed, and followed, and followed, and eventually abandoned. However, it branched off early to the other path, which became a necessity to make these sequential containers viable and useful constructs.

Abandoned path

The path that I ended up abandoning addressed the situation where you create a sequence of BitList types. I initially started down this path because of the size of the BitList, v2 specifically. The BitList as an object takes up much more space than its data does by itself. However, accessing the data alone does not provide the accessor functions that make the BitList valuable.

So I created specialized BitList sequence containers. These worked to some degree. However, they were extremely slow. They also added an enormous amount of complexity to a mostly clean solution. Moreover, it was only a partial solution. There were many more special cases that I would need to code around if I continued to follow this path.

In the end I exclaimed out loud with certainty to my computer screen, "Using an array of bit-fields would be a fool's errand! Whoever builds a message like this should reconsider and make a better design." So I left the code for these containers in place, just in case I decided to pursue this path again in the future.

It turns out that I eventually, and accidentally, ended up on that fool's quest. I was using an Alchemy message to read in a bitmap, and I wanted to perform pixel-by-pixel manipulations. I did find a better way, and you will read about it when I add my post regarding the demonstration application: "Steganography". (BTW, no! I didn't end up creating the BlitList... but I'll have to consider that.)

Necessity, the mother of invention

What I discovered chasing the BitList through these containers is that the std::vector has the same problem. Its memory footprint is slightly larger than its data payload. Damn it! So do the nested fields.

It turns out that any type in the top-level message is handled naturally by virtue of the initial implementation. This is also true for any level of traversal into the message with a nested field type. However, the trouble appears if you attempt to place an array, a nested-type, a vector, or a BitList within an array or a vector.

These types would be very cumbersome to use (and therefore probably not used) if I could not support the diverse range of nesting. I worked through the solution for these combinations:

  • Array of Arrays
  • Array of Nested-types
  • Array of Vectors
  • Vector of Arrays
  • Vector of Nested-types

I handled an array or vector of BitLists by converting the BitList to its internal integral type. The full data now fits in the allotted space, but the caller will need to cast or assign the indexed value to the BitList type if they want to access a bit-field value.

As far as attempting to solve the vector of vectors case, I started to believe that I was messing with forces beyond my understanding. For all I know attempting to wrangle in this case could disrupt the entire space-time continuum. I'll leave that to the folks at CERN.

In all seriousness, the vector-of-vectors scenario becomes a very convoluted and complex task (think vector of vectors of vectors). Since the container is a vector, it already requires a run-time based solution. For me to solve this in Alchemy would provide very little value compared to the risk it would introduce to the correctness of the code, as well as its maintainability.

If a user encounters a message where they need to support a vector of vectors, I believe the best approach would be to allocate a single vector of bytes large enough to contain all of the data, and develop a custom processor to meet that specific situation. The door is still open, and we can revisit and attempt to solve this case at any time. However, I believe there are far more valuable additions that can be made to Alchemy than this one.

How to over-stuff a data buffer

The short answer: "With meta-data".

If you have not recognized it by now, or at least stopped and said it out loud: template meta-programming is all about the type information. Type information is meta-data. It is not directly manipulated at run-time; it is the context the compiler builds the processing logic upon. All assumptions and deductions made by the compiler, code-generator, and linker are based upon the types and constant information supporting the logic of your code.

The reason I have trouble trying to store a vector inside of an array is that there is extra run-time bookkeeping necessary to operate the vector. However, since Alchemy is its own code-generator of sorts, we can manipulate the meta-data to become part of the logic that is generated by the compiler.

Therefore the data can remain its raw fixed size, and the type of message to be processed provides the context we need to correctly apply the meta-data. Essentially we are using the meta-data and compiler logic to help us encode more data in a buffer than would normally fit.

Transforming the meta-data

Once I realized that it was the run-time types that were populating the sequence containers, I started to search for a solution. I was searching for a method that would allow me to prevent run-time types from being the target type processed when the compile-time reflection algorithms were at work, such as the ForEachType processor.

At first I started replacing the types at the point-of-execution in the specialized functions. This proved to be the correct approach to solve the problem, however, the cost was too great. Implementing specialized templates for each potential case and performing the work inline started to make the source code in this region a cluttered mess.

I still believed this was the correct approach, I just needed to find a cleaner way to implement the solution. I decided to create a second TypeList that was appropriate for use with the Hg meta-processing, such as the serialization of data. This second list would substitute the most appropriate type within the array and vector to generate the proper logic.

Hg::make_Hg_type_list

I created this reference table to help me understand the collection of relationships that I had to deal with.

Type <T>      Sub-type <S>   Substitution Required
-----------   ------------   -----------------------------------------
Fundamental   n/a            none
BitList       n/a            none
Nested        n/a            Hg::make_Hg_type_list<T>::type

Array         Fundamental    none
Array         BitList        BitList integral type
Array         Nested         Hg::make_Hg_type_list<S>::type
Array         Array          yes    / sub-type also processed
Array         Vector         vector / sub-type also processed

Vector        Fundamental    none
Vector        BitList        BitList integral type
Vector        Nested         Hg::make_Hg_type_list<S>::type
Vector        Array          yes    / sub-type also processed
Vector        Vector         Warning: potential risk of disrupting
                             the space-time continuum

I will provide a brief overview and a few samples of how I implemented this substitution table. If you want to inspect the full implementation you can find it here on GitHub: Alchemy/Hg/make_Hg_type_list.h[^]

This is the meta-function that processes a TypeList to define a new typedef called type, which is the appropriate substitution according to the table listed above.

C++

//  **************************************************
//  @Note:  Type T must be a typelist.
//
template< class T >
struct make_Hg_type_list
{
protected:
  // The next item on the list to process.
  typedef typename                
    Hg::front<T>::type             next_type;
 
  // The remainder of the unprocessed types.
  typedef typename
    Hg::pop_front<T>::type         list_tail;
 
public:                                        
  // Make the Hg compatible list.
  typedef typename                      
    detail::make_Hg_worker< next_type,
                            list_tail,
                            Hg::TypeList<Hg::MT>
                          >::type       type;
};

detail::make_Hg_worker has a base implementation and a terminating specialization. The terminator simply typedefs the last parameterized type, and the recursion stops. The default template defines a set of types designed to build a new list that mirrors the original, with the exception that some of the types will have adjusted replacements as needed:

Let's break this one into a few chunks. The typedefs defined in this sample will be used to create the recursive expansion, and the final result:

C++

template< typename T,
          typename TailT,
          typename HgT
        >
struct make_Hg_worker
{
  // The result of inspecting a type and
  // potentially adjusting the type.
  typedef typename
    AdjustType<T>::type           adjusted_type;
 
  // The next item on the list to process.
  typedef typename
    Hg::front<TailT>::type        next_type;
 
  // The remainder of the unprocessed types.
  typedef typename
    Hg::pop_front<TailT>::type    list_tail;
 
  // ...

This is the remainder of the worker template. The first typedef is the recursive call to complete the processing of the other fields. The typedef for Hg::push_front defines the final replacement TypeList.

C++

// ...
 
  // A recursively built list of processed
  // types that have been adjusted to be
  // compatible for Hg message types.
  typedef typename                      
    make_Hg_worker< next_type,          
                    list_tail,          
                    HgT                
                  >::type               hg_list;
 
  // The list of types converted for Hg.
  typedef typename  
    Hg::push_front< hg_list,
                    adjusted_type      
                  >::type               type;
};
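
The terminating specialization itself is not shown in these excerpts. A minimal sketch of its likely shape, assuming Hg::MT is the marker for the empty list, would be:

C++

// Hedged sketch, not the verbatim Alchemy source: when the
// end-of-list marker is reached, expose the accumulated list
// and the recursion stops.
template< typename TailT,
          typename HgT
        >
struct make_Hg_worker< Hg::MT, TailT, HgT >
{
  typedef HgT type;
};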

There are three more templates worth describing; the remainder of the definitions are a set of template specializations that fulfill the requirements mapped out in the table above. The first template is a simple adapter that I included for clarity. Its contents could have been placed completely inline inside of the make_Hg_worker template. However, I believe it makes the final solution much clearer. At the same time, it moves all type-replacement logic to a single point outside of the main worker template.

C++

template< class T >
struct AdjustType
{
  typedef typename
    ReplaceType < T,
                  typename DeduceTypeTrait<T>::type
                >::type                 type;
};

The next sample is one specialization of a ReplaceType template. A specialization of this template exists for each Alchemy supported data type. This sample demonstrates the vector replacement.

C++

template< class VectorT >
struct ReplaceType< VectorT, vector_trait>
{
  // Extract the value type declaration from inside the vector declaration.
  typedef typename
    VectorT::value_type                 value_type;
 
  // Deduce the type trait of value_type
  // for declaration of the type sequence.
  typedef typename
    DeduceTypeTrait<value_type>::type   type_trait;
 
  typedef typename
    DeclareTypeSequence < VectorT,
                          type_trait
                        >::type         type;
};

The final typedef in the previous sample uses the last template meta-function, DeclareTypeSequence, to define the replacement type. DeclareTypeSequence is actually a simple declaration, which redefines an array or vector to use a type that will manage the expanded type, rather than the type specified in the original TypeList. This is where the meta-data becomes useful, and is applied to construct the proper run-time structures for managing the serialized data.

C++

template< typename T,
          typename A,
          template <class, class> class VectorT
        >
struct DeclareTypeSequence < VectorT<T,A>, nested_trait  >
{
  // Define a typedef that represents the actual data-type
  // that represents the type-list T passed into this container.
  typedef typename
    FieldTypes <T>::value_type          value_type;
 
  typedef
    std::vector< value_type, A>         type;
};

Hg::make_Hg_type_list is instantiated with the same TypeList that is used in each new HG_BEGIN_FORMAT definition. The replaced TypeList produced by Hg::make_Hg_type_list is then used in the field_data_t specialization that is created for every Alchemy data-type, which defines the value_type to be used for storing message data.

The substitute TypeList is a solution that evolved after a number of iterative attempts to elegantly solve serialization for the new sequential containers. What I demonstrate in the next section is the current serialization implementation used in Alchemy, the result of that evolution.

pack_message

pack_message is the implementation that is called by the Hg::message object to take the run-time data stored within the object, and pack it into a raw-storage buffer in the format specified by the message's TypeList definition.

With the addition of a variable-sized field, two specializations of this function exist. The compiler tag-dispatches to the correct function based on the message's SizeTrait characteristic.
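
Tag-dispatch itself is a simple mechanism. The sketch below is a generic illustration of the technique, not Alchemy's actual code; the unused trait parameter exists only to steer overload resolution at compile time:

C++

#include <iostream>

// Illustrative traits, mirroring the static/dynamic size traits.
struct static_size_trait  { };
struct dynamic_size_trait { };

// The overload is selected purely by the trait type.
void pack(const static_size_trait&)  { std::cout << "fixed-size path\n";    }
void pack(const dynamic_size_trait&) { std::cout << "variable-size path\n"; }

int main()
{
  pack(static_size_trait());   // resolved at compile time
  pack(dynamic_size_trait());
  return 0;
}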

Fixed-size operation

C++

template< typename MsgT,
          typename BufferT,
          typename SizeTraitT
        >
bool
  pack_message( MsgT& msg_values,
                BufferT & fixed_buffer)
{
  return   detail::pack_fixed_size_message( msg_values,
                                            fixed_buffer,
                                            SizeTraitT());
}

Variable-size operation

C++

template< typename MsgT,
          typename BufferT,
          typename SizeTraitT
        >
size_t pack_message(MsgT &msg_values,
                    BufferT  &buffer,
                    size_t    offset)
{
  return detail::pack_message < MsgT,
                                BufferT
                              >(msg_values,
                                buffer,
                                offset,
                                SizeTraitT());
}

The unpack_message template functions are similarly defined.

detail::pack_message

Both versions of Hg::pack_message call a different version of the internal detail implementation. The difference between these two functions is in how they initialize the message buffer that receives the serialized data. The variable-sized implementation prepares for a process that may require dynamic offsets, based on the placement and size of vectors defined in the middle of the message. Once both versions initialize the PackMessageWorker, the recursive process of serializing the message data begins.

C++

template< typename MsgT,
          typename BufferT
        >
BufferT&
  pack_message( MsgT      &msg_values,
                size_t     size,
                BufferT  & buffer,
                const dynamic_size_trait&)
{
  // Memory is only allocated for
  // the variable-sized message.
  buffer.resize(size);
 
  detail::PackMessageWorker < 0,
                              Hg::length<typename MsgT::format_type>::value,
                              MsgT,
                              BufferT
                            > pack;
  // This additional parameter tracks the dynamic
  // adjustments of variable-sized messages.
  size_t dynamic_offset = 0;
  pack(msg_values,  buffer, dynamic_offset);
  return buffer;
}

Both versions call the function operator of the same PackMessageWorker object. The only difference is that the variable-sized implementation provides an extra parameter called dynamic_offset, which propagates the total number of bytes accumulated from variable-length fields. This indicates to the detail-level functions to make the appropriate adjustments with respect to the sizes pre-calculated at compile-time.

C++

template< size_t    Idx,
          size_t    Count,
          typename  MsgT,
          typename  BufferT
        >
struct PackMessageWorker
{
  void operator()(MsgT &message,
                  BufferT  &buffer)
  {
    size_t dynamic_offset = 0;
    WriteDatum< Idx, MsgT, BufferT>(
      message,
      buffer,
      dynamic_offset);
 
    PackMessageWorker < Idx+1, Count, MsgT, BufferT> pack;
    pack(message, buffer);
  }
 
  void operator()(MsgT &message,
                  BufferT  &buffer,
                  size_t   &dynamic_offset)
  { /* ... */ }
};
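
The recursion requires a stopping point, which is not shown in the excerpt above. A plausible terminating partial specialization, a hedged sketch rather than the actual source, would look something like this:

C++

// Hedged sketch: when Idx reaches Count there are no
// fields left to pack, so both operators do nothing.
template< size_t    Idx,
          typename  MsgT,
          typename  BufferT
        >
struct PackMessageWorker< Idx, Idx, MsgT, BufferT >
{
  void operator()(MsgT&, BufferT&)          { }
  void operator()(MsgT&, BufferT&, size_t&) { }
};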

Where is ForEachType<T>?

The ForEachType construct is not used at this level because of the depth and diversity of the combinations of message structures that are possible. This construct is convenient when the same operation must be performed on all fields, and the meta-function supplied to ForEachType does not contain mutable inputs, such as dynamic_offset.

Therefore, the familiar recursive processing of each indexed field is performed through the end of the TypeList of parameters. The recursive loop first writes the current Datum, then makes the recursive call to the next field. There is a PackDatum and UnpackDatum specialization created for each type.

PackDatum / UnpackDatum

The PackDatum specializations for the array and vector types contain the more complicated implementations. This is because once they begin serializing their data, they must inspect each sub-field and determine if it can be written directly, or must be placed in a new round of pack_message calls. Then, based upon the size_trait of the sub-type of the sequence container, the most efficient implementation is selected.

Here is the PackDatum functor for a fundamental-type:

C++

template< size_t   IdxT,      
          typename MsgT,
          typename BufferT,
          typename TraitT
        >
struct PackDatum
{
  void operator()(MsgT     &msg,
                  BufferT  &buffer,
                  size_t    dynamic_offset)
  {
    // Removed typedefs for brevity
    value_type &value =
      msg.template FieldAt<IdxT>().get();
    size_t offset = Hg::OffsetOf
                       < IdxT,
                         typename MsgT::format_type
                       >::value
                     + dynamic_offset;
    buffer.set_data(value, offset);
  }
};

PackDatum for Array and Vector

I am not going to go into much deeper detail regarding the array and vector PackDatum implementations, because they could fill an entire blog entry on their own. However, I will elucidate the basic flow for these constructs.

The primary difference in their functors is another layer of indirection, where a function called SerializeArray or SerializeVector is invoked depending upon the type. The number of bytes written is captured and added to the dynamic_offset count.

The SerializeSequence function invokes a specialized functor based upon the sub-type contained within the sequence container. There is an implementation specialized for each Datum type supported in Alchemy. This is the single point of code in the entire library that seems too tightly coupled, as the array and vector require knowledge and processing specific to each type supported by the library.

These implementations do choose between item-by-item transfers and bulk transfers based on the type. I have this section of code marked to return to, so that I can attempt to further de-couple the type of processing required to properly serialize these types.

For now I find a little bit of solace knowing that this coupled behavior sits at the end of a processing chain, as opposed to the central processing hub for how messages are structured. Therefore, I feel comfortable letting the library continue to grow for a short while before I decide on the best course of corrective action.

If you would like to view the pack / unpack implementations for the array or vector, you can find them here on GitHub.

Summary

Reworking the serialization system to support a sequence of types has been one of the most challenging tasks that I have faced in Alchemy up to this point. In part this is because I was forced to rethink conventions that I had already put in place. Another reason is the fundamental change in how buffers are accounted for and managed in order to support messages of variable size.

I am content with its current state, although, as I stated, there are some portions of the sequence serializer code that I want to revisit. I would like to further decouple the solution, as well as work in further optimizations.

Speaking of optimizations, I am excited for the next post that I will be writing about Alchemy, the performance benchmarks. I compare a small set of scenarios defined in Alchemy with a naïve, hand-written serializer that uses 1-byte aligned packed structures, and the byte-order conversion functions that are included with network libraries. As a teaser, my first comparison run was abysmal. I even contemplated discontinuing the project until I did a little bit of analysis to find the root cause of my bottlenecks.

Needless to say, I would not still be developing Alchemy if I had not been able to overcome the limitations of my first implementation. I will describe what I found, what I changed, and what I learned about improving execution speed during development.

Alchemy: Vectors

adaptability, C++, Alchemy, design Send feedback »

A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called, Network Alchemy[^]. Alchemy performs low-level data serialization with compile-time reflection. It is written in C++ using template meta-programming.

It's time to break a barrier that has existed within Alchemy since its inception: messages of fixed size. While the storage policy concept that I use with the message buffer allows Alchemy to dynamically allocate memory for messages, the current structure of the library only allows messages whose size is known at compile-time.

There is already so much value in what Alchemy is capable of accomplishing, even with the static size limitation. However, the only way for Alchemy to expand and reach its potential, is to remove this limitation and provide support for dynamically sized messages. This entry will demonstrate the changes that were required to achieve this goal.

Why size matters

Size is a fundamental property in the realm of data serialization. This is especially true for wire-transfer protocols. All of the meta-functions originally developed for Alchemy depend upon the size of the field type in one way or another. Therefore, opening up the possibility for field lengths to differ from message to message means that basic assumptions about the amount of available space are no longer valid.

The size of fundamental types is known at compile-time through the sizeof operator. The offset of a field is known because the sizes of all of the fields located before the current field in the TypeList are added together. The total message size is calculated by adding all of the message's field sizes together.
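
Alchemy performs these calculations with meta-functions such as Hg::SizeOf and Hg::OffsetOf. The minimal, self-contained illustration below shows the principle; FixedFormat is a hypothetical name, not the library's implementation:

C++

#include <stddef.h>
#include <stdint.h>
#include <iostream>

// Offsets and the total size fall out of the type
// information alone; nothing is stored at run-time.
template< typename A, typename B, typename C >
struct FixedFormat
{
  static const size_t offset_a = 0;
  static const size_t offset_b = offset_a + sizeof(A);
  static const size_t offset_c = offset_b + sizeof(B);
  static const size_t k_size   = offset_c + sizeof(C);
};

int main()
{
  typedef FixedFormat<uint32_t, uint16_t, uint8_t> msg_format;
  std::cout << msg_format::offset_b << ", "   // 4
            << msg_format::offset_c << ", "   // 6
            << msg_format::k_size   << "\n";  // 7
  return 0;
}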

To be completely accurate, it is not the actual size of the message that we're concerned with, it is its ability to vary in size that concerns us.

These assumptions allow us to write the code in such a way that basic checks are placed up-front, and error handling can be largely ignored through the remainder of the processing. This has the effect of simplifying the code in the interior. Less clutter could mean more readable and maintainable code. Furthermore, less code would presumably mean faster execution times.

Finally, it is entirely possible for the compiler to take advantage of the fixed length and position of each field to perform optimizations. The only way one can be certain is to investigate what the compiler actually generates, which I have not done yet; at least with regards to optimizations. If a field inside the message could have a variable size, the compiler can no longer use fixed-length assumptions to optimize code. This also holds true for fields that are considered optional in a message; their presence is not required.

While working through this endeavor, I will remain completely cognizant of what needs to be altered and what can remain unchanged. A task such as this can easily become a shotgun-surgery operation, and that only invites chaos. Chaos is not welcome in computing unless you're performing simulations, and even then only pseudo-chaos is welcome.

std::vector

I would like to make a few brief comments regarding the std::vector. This is the most flexible container in the standard library. If you are new to C++, or have had an aversion to angle-brackets through your C++ career, you owe it to yourself to learn this container.

The std::vector has a very small memory footprint. It is extremely efficient in practice due to its guarantee of contiguous storage, which improves cache coherence through spatial locality. It should already feel natural to anyone that is familiar with basic arrays; in fact, it can be used any place where an array can be used. Most of all, it can improve the reliability of your programs, because it carries the added bonus of managing its own dynamic memory allocation and can perform checked access operations.

Before I started my investigation of how to introduce a variable-sized field in Alchemy, I tried to consider how it would be used. I had only one or two simple use cases in mind, prompted by questions colleagues had asked me about its capabilities. With very little known about what to expect, I quickly thought of the std::vector and its versatility.

Although a few unknowns still remained, I imagined that an interface modelled after the std::vector would provide all of the flexibility that one would need for interacting with a variable-sized field. Modelling the std::vector would also provide the additional bonus that most developers would already be familiar with the interface, which makes this decision seem all the more prudent.

Other considerations

When working with a native array or a std::vector, the size of the structure is generally obtainable by some means associated with the data type itself. However, when a raw data buffer is received, there are very few assumptions that can be made about the contents of the data. In most cases, the data needs to be de-serialized and verified as it is read into useful data types.

How is Alchemy supposed to determine how many bytes to assign to a std::vector field before populating the fields that follow afterwards in the message?

I looked at many different message structure formats that I was familiar with. I recognized that there are generally two different methods by which the size of a variable-length payload is specified.

Explicit message field

Supplying a field that indicates the size of a payload or a specific field in a message structure is a method that appears frequently. This type of solution should be simple enough to accommodate.

To implement this, I imagined that I would supply a third field in the vector type declaration MACRO, similar to the HG_ARRAY MACRO. This third field would allow the user to specify the name of a field that has been defined earlier in the structure. Then, rather than using a fixed number like the array, a value could be read from a field at run-time. The value specified in the field would need to indicate the number of elements in the vector field, and not its raw size in bytes.

One possible complication that I can foresee: what if the specified count is actually in a parent structure that contains a nested field with the vector type?

Run-time parsing

I have used many objects and libraries in my career where the developer implemented the bare minimum necessary to be able to make their library useful. However, they stopped short of providing a complete solution. To simply stop at the pre-defined field and state, "This is how the API must be used", would be succumbing to that same mentality. There are quite a few applications Alchemy could not support if it did not provide a way to determine the size of a dynamic buffer at run-time.

Well I don't want to create a half-assed API. There is no simple way to automatically handle this detail for the user; there are simply too many possibilities to attempt to provide a solution in this manner. While the original goal of Alchemy is to simplify and automate data serialization, that does not mean that every corner case must be handled automatically.

Therefore, I will not try to automatically solve all of the potential ways a user's buffer size may be specified. Instead, I will complete the solution by providing a hook that the user can use to register a callback function if they want to supply this logic themselves. This still allows Alchemy to provide a complete solution. Furthermore, if the field size is actually specified in bytes, the user can supply a callback that converts the buffer size to an element count.

Dynamic-length declaration in Alchemy

In many ways the declaration for a vector in Alchemy is much simpler than the array, even though so much more is happening. The reason is that there is only one way to define a vector: with the HG_DYNAMIC MACRO. This MACRO has the same usage as HG_ARRAY; there are three fields: 1) type, 2) count, and 3) name.

The only complication is the count field must either map to a previous field in the same structure that is an integer type, or it must be a callback function.

Definition MACROS

The vector field definition MACROs are used similarly to the HG_ARRAY.

C++

HG_BEGIN_FORMAT(Msg_t)
  ...
  HG_DATUM  (size_t,     len)
  HG_DYNAMIC(char,  len, path)
  ...
HG_END_FORMAT(Msg_t)

I have also provided a version of the MACRO that allows the user to specify an allocator to use with the dynamic field. The sample below depicts the user-defined vector-size callback being used to report the size.

C++

//  ***************************************************
//  The function signature of the callback used to
//  determine the size of a dynamic-length field.
//
//  @param p   The entire input message buffer.
//  @param n   The length of the input message.
//
//  @return    The calculated length of the dynamic field.
//
typedef
  size_t (*pfnGetDatumSize)(const uint8_t* p, size_t n);

This sample demonstrates extracting a collection of NULL terminated strings from a buffer, where the last string in the buffer is double-NULL terminated.

C++

//  ***************************************************
size_t StringCount(const uint8_t* pBuffer, size_t length)
{
  size_t count = 0;
 
  char const* pCur =  reinterpret_cast<char const*>(pBuffer);
  char const* pLast = pCur;
  std::advance(pLast, length);
 
  size_t  len = 1;
  while (pCur < pLast)
  {
    len = std::char_traits<char>::length(pCur);
    if (0 == len)
      break;
 
    ++count;
    // Advance past this current item, including the NULL.
    std::advance(pCur, len+1);
  }
 
  return count;
}

C++

typedef std::vector<char>         char_str;
 
HG_BEGIN_FORMAT(Msg_t)
  ...
  HG_DYNAMIC(char_str,  StringCount, items)
  ...
HG_END_FORMAT(Msg_t)
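
As noted earlier, if a message specifies the dynamic field's length in bytes rather than elements, the same callback hook can perform the conversion. The hypothetical example below assumes a field of 4-byte elements that occupies everything after a fixed 8-byte header:

C++

//  ***************************************************
//  Hypothetical callback: converts the remaining byte
//  count into an element count for uint32_t elements.
size_t U32ElementCount(const uint8_t* /*pBuffer*/, size_t length)
{
  const size_t k_header_size  = 8;
  const size_t k_element_size = sizeof(uint32_t);
 
  return (length > k_header_size)
         ? (length - k_header_size) / k_element_size
         : 0;
}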

MACRO implementation

How does one MACRO handle both definitions?

A pair of private function templates are defined within the message structure that is created by the HG_BEGIN_FORMAT definition. These functions are defined as template functions for two reasons:

  1. The correct function type will be instantiated based on the type of field that is declared.
  2. An instance of these functions will only be instantiated if a dynamic field is declared in the message.

Here are the definitions of the functions that are added to the end of the message declaration:

C++

private:
  template< typename T,
            typename U
          >
  size_t DatumSize(T value, U*)
  {
    return value;
  }
 
  template< typename U >
  size_t DatumSize(pfnGetDatumSize ftor, U* buffer)
  {
    if (buffer->empty())
    {
      return 0;
    }
 
    return ftor(buffer->data(), buffer->size());
  }

Delicate integration

What could be so delicate?

In my experience, when facing a feature addition such as this, where an entirely new fundamental concept is added to existing software, it can quickly turn into shotgun surgery. I didn't want to jump in altering every file in order to weave in the ability to manage dynamic allocation. As I mentioned earlier, I thought of a way that I could break my big problem into two smaller problems, one of which had already been solved: static-sized fields.

The question that remains is how I can best differentiate between the different portions of my message. As much as possible, I still want to allow the compiler to resolve actions when the code is generated. Once again, a solution based on tag-dispatching seemed the most appropriate.

I could create a meta-function to determine if a field was a statically-sized field, or dynamically-sized. Furthermore, I could test each field in a message to determine if the entire message could have a variable size, or if it was always a fixed size. This solution would allow me to still take advantage of every possible optimization the compiler could muster for fixed-size messages, and still be able to accommodate dynamic messages.

Differentiate between static and dynamic sizes

If you recall my description of the tag-dispatch system I set up for the array type, I specified a static_size_trait for the array. I have also declared a similar type-trait for dynamic fields like the vector, called dynamic_size_trait. I am not going to repeat all of the tag-dispatch definitions that I created for the vector, because they are very similar.

Instead, here are the meta-functions that are used to determine the size type of a message, starting with the top-level meta-function that reports the size trait:

C++

//  ***************************************************
/// Indicates whether the specified message
/// has a static or dynamic size.
///
/// @tparam T  A TypeList definition.
///
/// @return   A typedef called *type* is defined
///              to return the size trait.
///               - static_size_trait: a fixed-size message.
///               - dynamic_size_trait: Dynamically-sized
///                 message. At least some portion
///                 requires runtime processing to  
///                 determine the full size of the message.
///
template< typename T >
struct message_size_trait
  : std::conditional
    < has_dynamic<T>::value,
      dynamic_size_trait,
      static_size_trait
    >
{ };

The previous meta-function is used in many locations where it is important for the selection of an algorithm or other meta-processing functions. Inside of the std::conditional statement, you can see a meta-function called has_dynamic. This function is simple enough. It actually calls another meta-function that I hide in the Hg::detail namespace, and this function performs a recursive survey of the types in the TypeList for dynamically-sized fields.

C++

//  ***************************************************
template< typename T >
struct has_dynamic
  : detail::HasDynamic_Impl< T,
                             type_container<T>::value
                           >
{ };

HasDynamic Implementation

We are working with a diverse set of types. Some of the types are basic fundamental types, while others can be containers of types. Let's start with a simple default template that will handle all non-container types:

C++

template< typename T,
          bool isContainer = false
        >
struct HasDynamic_Impl
  : std::false_type
{ };

Let's handle the next simplest case: true for the vector, as the vector is currently the only type that is sized dynamically.

C++

// Explicit HasDynamic implementation for vectors
template< class T,
          typename A,
          bool  isContainerT
        >
struct HasDynamic_Impl< std::vector<T,A>,
                        isContainerT
                      >
  : std::true_type
{ };

The last portion of this implementation requires a test for all other type containers. The only two type containers that remain are the array and the TypeList. The template below tests the type at the front of the list to determine if it is a vector. If so, true is returned, and the evaluation is complete. Otherwise, the first element in the list is removed, and the remainder of the list is tested for dynamically sized fields.

C++

// HasDynamic implementation for type_containers
template< typename T>
struct HasDynamic_Impl<T, true>
  : value_if< vector_value< typename front<T>::type>::value,
              bool,
              true,
              has_dynamic<typename pop_front<T>::type>::value
            >
{ };
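
The value_if meta-function used above is not shown in these excerpts. A plausible minimal definition, a compile-time conditional over values in the spirit of std::conditional for types, would be:

C++

// Hedged sketch: selects one of two compile-time values
// of type T based on a boolean condition.
template< bool CondT,
          typename T,
          T TrueValue,
          T FalseValue
        >
struct value_if
  : std::integral_constant<T, TrueValue>
{ };

// Specialization that selects the false value.
template< typename T,
          T TrueValue,
          T FalseValue
        >
struct value_if< false, T, TrueValue, FalseValue >
  : std::integral_constant<T, FalseValue>
{ };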

Calculate the size of a message

Now we have the ability to discriminate between fixed and variably sized messages. Previously, when a buffer was required for a fixed-size message, the size() member function could be called to return the constant value pre-calculated from the TypeList for which the message was instantiated. A new mechanism is required to incorporate dynamic size calculations without disrupting the existing interface.

We can use the has_dynamic meta-function to select the type of size calculation to use. The first step for integration is to replace the size() member function with a call to a new function selected by the result of has_dynamic for the current message type.

C++

size_t Message::size() const
{
  return Msg_size<this_type,
                  k_has_dynamic>::calculate(*this);
}

Message::size() calls one of two external meta-functions that are implemented to calculate the size of a fixed-length message or a variable-length message. Here is the implementation for the fixed-length calculation. It is basically the implementation that was previously used in Message::size(), when all messages were a fixed length.

C++

template< class HgMessageT >
struct Msg_size<HgMessageT, false>
{
  typedef HgMessageT message_t;
 
  static size_t calculate(const message_t &msg)
  {
    return Hg::SizeOf<typename HgMessageT::format_type>::value;
  }
};

The variable-length function looks quite a bit more complicated. However, it starts with the same implementation that is used to calculate a fixed-length message. A few typedefs are created for clarity. Finally, a function designed to accumulate the sum of the lengths of all variable-length fields is called. This function is called dynamic_size_of. The sum of the fixed-length and variable-length calculations is returned as the total size of this message.

C++

template< class HgMessageT,
          bool  has_dynamic
        >
struct Msg_size
{
  typedef HgMessageT message_t;
 
  static size_t calculate(const message_t &msg)
  {
    typedef typename
      message_t::message_type     message_type;
    typedef typename
      message_t::byte_order_type  byte_order_type;
    typedef typename
      message_t::storage_type     storage_type;
 
    size_t fixed_size   =
      Hg::SizeOf<typename HgMessageT::format_type>::value;
    size_t dynamic_size = dynamic_size_of<message_type,
                                          byte_order_type,
                                          storage_type
                                        >(msg);
    return fixed_size + dynamic_size;
  }
};

Hg::dynamic_size_of

dynamic_size_of is a function template, and not a meta-function. This function must be executed at run-time because it is used to inspect incoming data buffers to determine how much space to allocate for the vector fields.

Two implementations of this function exist. Both implementations end up calling the same internal worker function with the same set of parameters. Two versions of this function are required to be able to properly handle nested messages. A top-level message is instantiated inside of the Hg::Message class. Hg::Message handles the memory management and provides the natural syntax interface for interacting with the child fields of a message.

When a message contains nested messages, there is no need to wrap the nested version in its own Hg::Message object. In fact, not only would doing so further complicate most of the logic, it would also be less efficient, because the internal memory buffer allocated by Hg::Message would be a collection of smaller message buffers, rather than one contiguous buffer shared by all message fields.

This concept is one of the trickier problems that I encountered in Alchemy. The problem did not appear until I started to really dig deep and create pathological unit-tests for the library. Furthermore, the solution was not immediately apparent. While I knew that a problem existed, because I was not getting the expected calculated sizes, I had a difficult time tracking down its cause.

The two definitions for Hg::dynamic_size_of are listed below:

C++

//  ***************************************************
template< typename T >
size_t dynamic_size_of(const T& msg)
{
  return detail::DynamicSizeWorker< T,
                                    has_dynamic<T>::value
                                  >().size(msg);
}
 
//  ***************************************************
template< typename MessageT,
          typename ByteOrderT,
          typename StorageT
        >
size_t dynamic_size_of(const Message< MessageT,
                                      ByteOrderT,
                                      StorageT
                                    >& msg)
{
  return detail::DynamicSizeWorker
           < MessageT,
             has_dynamic<
               typename MessageT::format_type>::value
           >().size(msg);
}

DynamicSizeWorker

Finally we have reached the workhorse for calculating the length of all of the dynamically sized fields in a message. This function works very similarly to the byte-order conversion function that I demonstrated in a previous post. A functor designed to calculate the size of each variable-length field is initialized. Then the meta-function ForEachType is used to invoke this functor on each field in the message.

This function will only be called if a message is determined to have a dynamic-length field. This is another reason why there are two versions of Hg::dynamic_size_of. A top-level Hg::Message may have been determined to contain a variable-length field, and it may also contain nested fields. However, if there are no variable-length fields in any of the nested fields, it is not necessary to inspect them for the dynamic size calculation.

This layering approach allows the best function to be selected at every level of the calculation, based on its type-traits. The extra calculations are only paid for where the feature is used. Because the compiler sets up the dispatch of these calls, there is not even a compare-and-branch instruction to pay, as there would be with a run-time if statement.

This is the worker function that invokes the ForEachType meta-function.

C++

template< typename MessageT,
          bool     IsDynamicT
        >
struct DynamicSizeWorker
{
  static
    size_t size(const MessageT& msg)
  {
    typedef MessageT                      message_type;
    typedef typename  
      message_type::format_type           format_type;
 
    // Initialize a functor to query for the
    // dynamic size of each field.
    detail::DynamicSizeFunctor< message_type > ftor(msg);
    Hg::ForEachType < 0,
                      Hg::length<format_type>::value - 1,
                      format_type
                    > (ftor);
 
    return ftor.size();  
  }
};

I am not going to demonstrate the contents of the functor, because it is very similar in structure and content to the ByteOrderConversionFunctor shown earlier.

One important difference to note is that it is possible for a vector to contain a collection of nested message fields. This required me to introduce another layer of indirection to first detect this scenario, then properly process each nested message, but only if it contained variable-length parameters itself.

Here is a link to the file that holds this logic if you would like to take a closer look for yourself: Alchemy/Hg/detail/message_dynamic_detail.h[^]

Results

I am very pleased with the final results that I was able to achieve with the vector. The path to a robust solution was not immediately clear to me at first, but with a little bit of thought I was able to discover a suitable approach. As always when solving programming problems, the problem must be broken down into a smaller set of problems.

The solution felt very attainable once I recognized that I would only need to create a break in my current implementation at each location in the message where a dynamically sized field occurred. Compare this to my original fears that I would need to convert the entire system to a run-time calculated solution.

Summary

Integrating memory support for a dynamically-sized field turned out to be simpler than I had feared. I was able to create an elegant solution that did not require any modifications to the other data types. This demonstrated to me that this portion of my library truly is orthogonal; a good sign for a maintainable code-base into the future.

There is one more step that I had to complete before I had full support for both arrays and vectors. This was the addition of pack and unpack serialization calls. The problems were not necessarily difficult to solve. The challenges started when I recognized the depth to which elements could be nested inside of both messages and the new sequence containers that I had created. Both containers required the same solution to this problem. Therefore, I will tackle the description for both the array and vector serialization logic in the next Alchemy entry.

Alchemy: Arrays

adaptability, C++, Alchemy, design Send feedback »

A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called, Network Alchemy[^]. Alchemy performs low-level data serialization with compile-time reflection. It is written in C++ using template meta-programming.

Once Alchemy was functional and supported a fundamental set of types, I had other development teams in my department approach me about using Alchemy on their product. Unfortunately, there was one type I had not given any consideration to up to this point: arrays. This group needed the ability to have variable-sized messages, where the array payload started at the last byte of the fixed-format message. At the time, I had no clean solution to deal with that problem.

Expanding the type support

This request brought two new issues to my attention that I had never considered before. Both of these features needed to be added to Alchemy for it to be a generally useful library.

  1. Support for arrays or sequences of the same type
  2. Support for variable-sized structures. Every message type in Alchemy is currently fixed in size.

I chose to handle these issues as two separate types, primarily because I believed that allowing fixed-size fields to remain as fixed-size definitions would have a better chance of being optimized by the compiler. I wanted to use a completely separate data type for message definitions that required a dynamically-sized buffer.

I like the C++ Standard Library very much. I also use it as a model when I am designing interfaces or new constructs. Therefore, I decided to mimic the array and vector types from the standard library. I will cover the challenges that I encountered adding support for the array in this entry. The next post will focus on the vector and how I chose to solve the new problem of dynamically sized message objects.

Defining an Alchemy Array

Even before I could figure out how the code would need to be modified, and what specializations I would need to accommodate an array type, I needed a way to declare an array in both the TypeList and the Hg message definition. The simple and natural candidate was the basic array syntax:

C++

char data[10];

This is when I discovered the concept of Type Decay[^], and that I had lost the extra array information from my declaration. The problem was simple enough to solve; however, because of the obscure work-around involved, I didn't think this would be a good enough solution. It would be fraught with potential for mistakes.

C++

char (&data)[10];
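
The decay, and the reference work-around, can be demonstrated in a few lines; this is an illustrative sketch, not Alchemy code:

C++

#include <stddef.h>
#include <iostream>

// Deduction by value: the array decays to a
// pointer and the extent is lost.
template< typename T >
size_t decayed_size(T)
{
  return sizeof(T);       // sizeof(char*)
}

// Deduction by reference-to-array: the extent N is preserved.
template< typename T, size_t N >
size_t preserved_size(T (&)[N])
{
  return N;
}

int main()
{
  char data[10];
  std::cout << decayed_size(data)   << "\n";  // pointer size, e.g. 8
  std::cout << preserved_size(data) << "\n";  // 10
  return 0;
}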

This is when I decided that while I would allow traditional array declaration syntax, the syntax I would promote would be based on the std::array class.

C++

std::array<char, 10>

Again, this caused one new wrinkle. The std::array definition is not legal in the HG_DATUM MACRO that is used to declare a field entry within the Hg message format, due to the extra comma. The pre-processor is not smart enough to parse through angle-brackets; therefore, it sees three parameters for a MACRO that only expects two.

C++

//       1              2    3
HG_DATUM(std::array<char, 10>, name)
//               -------^    ^-------

[sigh] Once again, this has a simple solution: simply wrap the template argument of the MACRO within parentheses.

And once again [sigh], I thought this would lead to simple mistakes that would not have an intuitive solution. We are working with the pre-processor, after all. This is one area where I do believe it is best to avoid the pre-processor whenever possible. However, it would just be too painful not to take advantage of its code-generating abilities.

The best solution then was to add a new HG_DATUM MACRO type to support arrays.

C++

// Error-prone
// Also creates unintelligible compiler errors
// if the parenthesis are left off the template.
//
// However, both formats are still accepted.
HG_DATUM((std::array<char, 10>), name)
HG_DATUM(char (&)[10], name)
 
// The official array declaration MACRO
// that will be demonstrated throughout
// the documentation and demonstrations.
HG_ARRAY(char, 10, name)

With a simple MACRO to create the correct declaration of the template form for a std::array, I can re-use the error-prone form internally, hidden behind the preferred array MACRO.

C++

#define DECLARE_ARRAY(T,N) (std::array<T ,N>)
#define HG_ARRAY(T, N, P)  HG_DATUM(DECLARE_ARRAY(T,N), P)

Blazing new trails

The difficult first step was now behind me. The next step was to create a DataProxy specialization that would allow the array to behave like an array, yet still interact properly with the underlying Datum that manages a single parameter.

Tag-dispatch

There are many more ways an array can be classified as far as types are concerned for tag-dispatching. Therefore, I created a set of type-trait definitions to identify and dispatch arrays. I was also dealing with the addition of vectors at the same time, so I will include the definitions that help distinguish arrays.

C++

//  **********************************************
// Indicates the field or message has a fixed
// static size at compile-time.
struct static_size_trait { };
 
//  **********************************************
// Sequence types are a class of types that
// contain more than one element of the same
// type in series.
struct sequence_trait
  : nested_trait      { };
 
//  **********************************************
/// A sequence type that has a fixed size.
struct array_trait
  : sequence_trait    
  , static_size_trait
{ };
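
To see why these empty structs earn their keep, consider how a deduced trait can steer overload resolution at compile-time. The sketch below is my own, not Alchemy code; process_field is a hypothetical function, and the commented dispatch line assumes the DeduceTypeTrait meta-function that appears later in this entry:

C++

// Hypothetical overloads selected by the trait tags defined above.
template< typename T >
void process_field(T& field, const array_trait&)
{
  // Fixed-size sequence handling...
}

template< typename T >
void process_field(T& field, const nested_trait&)
{
  // Nested message handling...
}

// A caller deduces the trait for T, then dispatches on an instance.
// An array_trait tag binds to the more derived (exact) overload:
//
//   process_field(field, typename DeduceTypeTrait<T>::type());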

These were only the type-definitions that I would use to distinguish the types to be processed. I still needed some discerning meta-functions to identify the traits of a Datum.

C++

//  **********************************************
//  Fixed-Length Homogeneous Containers
//  **********************************************
//  Detect a std::array type.
//  This is the default false case.
template< typename T >
struct is_std_array
  : std::false_type
{ };
 
//  **********************************************
//  Affirmative std::array type.
template< typename T,
          size_t   N
        >
struct is_std_array<std::array<T,N> >
  : std::true_type
{ };

We're not done yet. We still need a value test to identify natively-defined array types. For this one, I created a few utility meta-functions that simplify the overall syntax and make the expression easier to read. I believe I have mentioned them before; if not, this is déjà vu. The utility templates are And, Or, and Not.
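
In case they are unfamiliar, here is a minimal sketch of what these utilities could look like; Alchemy's actual definitions may differ. Each one models a std::integral_constant so a nested ::value is always available:

C++

// Possible And/Or/Not definitions (sketch only):
template< typename A, typename B >
struct And
  : std::integral_constant< bool, A::value && B::value >
{ };

template< typename A, typename B >
struct Or
  : std::integral_constant< bool, A::value || B::value >
{ };

template< typename A >
struct Not
  : std::integral_constant< bool, !A::value >
{ };

Note that this eager form instantiates both operands; a lazier formulation is possible, but this is enough to read the expressions that follow. With utilities like these in place, the array test composes into a readable boolean expression of traits: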

C++

//  **********************************************
//  Classifies T as an array: either derived
//  from array_trait or a std::array,
//  but never a vector.
template< typename T >
struct array_value
  : And < Or < std::is_base_of<array_trait, T>,
               is_std_array<T>
             >,
          Not < std::is_base_of<vector_trait, T> >
        >
{ };

One final set of tests was required. I needed to be able to distinguish the sequence-types from the fundamental-types and the type-containers.

C++

//  **********************************************
//  Multi-variable types are containers of
//  homogeneous entries
template< typename T >
struct sequence_value
  : std::integral_constant
      < bool,
        Or< std::is_base_of<sequence_trait, T>,
            Or< vector_value<T>,
                array_value<T>
              >
          >::value
      >
{ };
 
//  **********************************************
template< >
struct sequence_value<MT>
  : std::integral_constant<bool, false>
{ };
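
A few compile-time sanity checks show the classifiers in action. These asserts are my own, not part of the original implementation, and they assume that vector_value, defined alongside these traits, reports false for non-vector types:

C++

// My own compile-time checks of the classifiers defined above:
static_assert( is_std_array< std::array<char, 10> >::value,
               "std::array should be detected");
static_assert( !is_std_array< char[10] >::value,
               "native arrays take a different path");
static_assert( array_value< std::array<char, 10> >::value,
               "std::array classifies as an array");
static_assert( !sequence_value< int >::value,
               "a fundamental type is not a sequence");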

Are you looking back at the tag-dispatch identifiers thinking what I thought when I reached this point?

"Shit just got real!"

If this code were a NASA astronaut candidate, it would have just passed the first phase of High-G training[^]. Most likely with 'eyeballs out', and to top it all off, we avoided G-LOC without wearing G-suits.

Back to Earth

A new piece of information was required: the extent of the array (its number of elements). By this point, I had become quite proficient at determining how best to extract this kind of information. Given a single type T, how can I extract a number from the original definition?

This is why we went through the effort to avoid Type Decay: somewhere inside that definition of type T hides all of the information that we need.

C++

//  Forward Declarations *************************
//  No Default Implementation
template< class ArrayT >
struct array_size;
 
//  **********************************************
//  Extracts the array extent from a
//  std::array definition.
template< class T, size_t N>
struct array_size< std::array<T, N> >
  : std::integral_constant<size_t, N>
{ };

We have declared a default template, without an implementation, called array_size. If that were the end of it, any attempt to instantiate array_size would result in a compiler error indicating that no matching definition exists.

Therefore, we create a specialization that matches the default template for our array type. Furthermore, we define the specialization in such a way that the extent (element count) is extracted from the type and becomes a parameter of the template definition. The specialization occurs at this point in the declaration:

C++

...
struct array_size< std::array<T, N> >
...

Because std::array<T, N> is itself a single type, it qualifies as a valid specialization of the default declaration that has no implementation. The specialized definition explicitly indicates that we only want matches for std::array. As a result, we are able to extract both the original element type and the extent of the array. In this case, the extent is captured in a std::integral_constant and the type T is discarded.

This technique will appear many more times throughout the solution, where we will use the type and discard the extent, or even use both values.
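
As a quick demonstration, the same pattern extends naturally to native array types. The native-array specialization and the asserts below are my own additions, not part of the original listing:

C++

// A companion specialization for native arrays (my own addition):
template< class T, size_t N >
struct array_size< T[N] >
  : std::integral_constant< size_t, N >
{ };

// Compile-time verification of both forms:
static_assert( array_size< std::array<char, 10> >::value == 10,
               "extent extracted from std::array");
static_assert( array_size< char[10] >::value == 10,
               "extent extracted from a native array");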

DataProxy<array_trait, IdxT, FormatT>

It's time to demonstrate some of the ways that the array's DataProxy differs from the other types that have already been integrated into Alchemy. First, the standard typedefs have been added to the array's proxy to match the other types: format_type, datum_type, field_type, index_type and data_type.

The extent is referenced many times throughout the implementation; therefore, I wanted a clean way to access this value without requiring a call to the previously defined array_size template:

C++

//  Constants ************************************
static
  const size_t k_extent = array_size<index_type>::value;

One more new type definition is required. Up until now, the type that we needed to process in the structured message was the same type that was defined in the message. A slight exception is the nested type; however, we were able to handle that with a recursive call.

These new sequence containers, both the array and the vector, are containers that hold a collection of elements of a single type. Those element types are what we actually want to consider when we programmatically process the data with our generic algorithms. Here is the array proxy's definition of the value_type:

C++

//  **********************************************
typedef typename
  std::conditional
  <
    std::is_base_of<array_trait,
                    index_type>::value,
    index_type,                  
    typename field_type::value_type                  
  >::type                               value_type;

Only a few remaining pieces are new to the array's DataProxy implementation.

First are some new ways to query for structure sizes: one method queries the extent of the array, while the other queries the number of bytes required to store this construct in a buffer.

C++

//  **********************************************
// Returns the extent of the array.
size_t size() const                            
{
  return this->get().size();
}
 
//  **********************************************
// Returns the number of bytes that are
// required to hold this array in a buffer.
size_t size_of() const                          
{
  return sizeof(this->get());
}
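
The distinction mirrors std::array itself. For example (an illustration only; the byte count assumes a typical implementation):

C++

// Extent versus storage size for a plain std::array:
std::array<uint32_t, 10> readings;

readings.size();    // 10 -- the extent (element count)
sizeof(readings);   // 40 -- bytes of storage (10 * sizeof(uint32_t))
                    //       on a typical implementation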

The set operations get optimized implementations that rely on standard library algorithms. There is also an overload to accept natively defined arrays as well as the std::array object.

C++

//  **********************************************
void set(const value_type& value)
{
  std::copy( value.begin(),
             value.end(),
             begin());
}
 
//  **********************************************
void set(const data_type (&value)[k_extent])
{
  std::copy( &value[0],
            (&value[0]) + k_extent,
              begin());
}
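
To see the two overloads side-by-side, here is a stripped-down, self-contained model that exercises both. MockArrayField is my own mock, not Alchemy's proxy; it only mirrors the shapes of the two set() signatures above:

C++

#include <algorithm>
#include <array>

// A mock field (mine, for illustration) with the same two set() shapes:
struct MockArrayField
{
  std::array<char, 10> storage;

  void set(const std::array<char, 10>& value)
  {
    storage = value;
  }

  void set(const char (&value)[10])
  {
    std::copy(&value[0], &value[0] + 10, storage.begin());
  }
};

int main()
{
  MockArrayField field;
  std::array<char, 10> from_std = { };
  char from_raw[10]             = { };

  field.set(from_std);  // std::array overload
  field.set(from_raw);  // native-array overload
}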

Finally, here is a small sample of the proxy pass-through functions, which simply forward calls to the internal array instance.

C++

//  **********************************************
reference at(size_t idx)                        
{
  return this->get().at(idx);
}
 
//  **********************************************
const_reference operator[](size_t idx) const    
{
  return this->get()[idx];
}
 
//  **********************************************
reference operator[](size_t idx)                
{
  return this->get()[idx];
}
 
//  **********************************************
const_reference front() const                  
{
  return this->get().front();
}

Now let's start the real work

Believe it or not, all of the previous code was just the infrastructure required to support a new type.

And, believe it or not, because of the orthogonal structure, generic interfaces, type generators, and the large amount of Mountain Dew that went into building this framework, there are only a few small specializations left to implement in order to have a fully-supported array data type in Alchemy.

It may be helpful to review my previous topic on how I have structured the Serialization[^] operations for Alchemy. However, it isn't necessary in order to understand how the individual data elements within the array are processed.

Here is the declaration of the array byte-order conversion meta-function.

C++

//  **********************************************
//  The array's specialized byte-order meta-function
template< typename T,
          typename StorageT
        >
struct ConvertEndianess<T, StorageT, array_trait>
{
  template <typename ArrayValueT>
  void operator()(const ArrayValueT &input,
                        ArrayValueT &output)
  {
    // ...
  }
};

Here is the actual implementation that converts the byte-order of the elements in the array:

C++

template <typename ArrayValueT>
void operator()(const ArrayValueT &input,
                      ArrayValueT &output)
{
  // Convenience typedefs
  typedef typename
    ArrayValueT::value_type   value_type;

  typedef typename
    DeduceTypeTrait
      <value_type>::type      type_trait;

  // Create an endian converter for the
  // array's defined value_type.
  ConvertEndianess< value_type,
                    StorageT,
                    type_trait
                  > swap_order;

  for (size_t index = 0; index < input.size(); ++index)
  {
    swap_order(input[index], output[index]);
  }
}

I would like to draw your attention to where the real work happens in that function. The majority of the code in the previous snippet is declarations and type definitions. The snippet below contains only the code that actually performs the work.

C++

ConvertEndianess< value_type,
                  StorageT,
                  type_trait
                > swap_order;

for (size_t index = 0; index < input.size(); ++index)
{
  swap_order(input[index], output[index]);
}

"That's all I have to say about that."

Summary

Generic programming is powerful. There is no reason C++ must be written as verbosely as C often is. Alchemy is production-quality code: the error handling is in place, and in most cases it occurs at compile-time. If a message definition won't compile, the program is not logically correct and would have failed at run-time as well.

I did not demonstrate the pack and unpack operations for the array. They are slightly more complicated because of the nature of this container. When I started to explore real-world scenarios, I realized that I may run into arrays-of-arrays and arrays-of-nested-types-that-contain-arrays.

The solution isn't much more complicated than the byte-order converter. However, it does require a few more specialized templates to account for this Cat-in-the-Hat-absurd, yet likely, scenario. The next entry will describe how I expanded Alchemy to support dynamically sized messages with vectors. A follow-up article will then demonstrate how the array and vector types are serialized in Alchemy, and how they are capable of handling deep nesting of types.
