Embedded Alchemy

Alchemy

Alchemy is a collection of independent library components that specifically relate to efficient low-level constructs used with embedded and network programming.

The latest version of Embedded Alchemy[^] can be found on GitHub.
The most recent entries as well as Alchemy topics to be posted soon:
Alchemy: Documentation[^]

I just completed my Master's degree in Cybersecurity at Johns Hopkins University. I plan to resume Alchemy's development, and to use my newly acquired knowledge to add constructs that will help improve the security of devices built for the Internet of (Insecure) Things.

When are we gonna learn?

communication, CodeProject, design, knowledge, engineering

As you gain expertise you begin to realize how little you actually know and understand. I have found this to be true of most skills. It’s easy to fall into the trap where you believe that you continue to grow your expertise each year, and thus have less and less to learn. Read on and I will demonstrate what I mean.

Books are statically-typed

(With the exception of e-books)

Books are fixed once they’re typeset and sent off to the publisher for printing. When you acquire one of these tomes, you hold a collection of thoughts, ideas, and discoveries taken as a snap-shot in time. On the other hand, the Internet is the present, continually moving towards the future. News stories are updated in real-time. Ideas, typos, and corrections are updated instantly.

We are producing more content than we can consume. Any idiot with access to the Internet (I am one of them) can publish their own thoughts on a blog. Some of us are less ignorant than others. Some blogs, news articles, tutorials or even YouTube videos are informative and worth consuming if you want to improve your craft.

Keeping Current

I like to try to keep current. I am amazed by how many of the technologies and ideas that I learn about, which I thought were fairly new, turn out to be 15 years old. It also amazes me when I have discussions with colleagues and some of the information and technologies I have known about for years seem brand-new to them.

A brief digression to pique your curiosity

Something that is similar, but actually a bit off-topic, is learning that one of your favorite band's songs is actually a cover of another band's song. The original may have existed for two years or twenty years before the cover was recorded. Yet, you were unaware that the earlier song existed. Here are a few examples:

  • Alabama Song (Whiskey Bar) by Kurt Weill (1930);
        covered by The Doors (1967)
  • Knockin’ On Heaven’s Door by Bob Dylan (1973);
        covered by Guns N’ Roses (1991)
  • Jolene by Dolly Parton (1974);
        covered by The White Stripes (2004)
  • School Day by Chuck Berry (1957);
        covered by AC/DC (1975)
  • Sweet Dreams by Eurythmics (1983);
        covered by Marilyn Manson (1995)
  • Tainted Love by Gloria Jones (1964);
        covered by Soft Cell (1981)

  • … And one of my favorites:
  • Hurt by Nine Inch Nails (1994);
        covered by Johnny Cash (2003)

Time to learn

There is great content on the Internet. Social networks like Twitter help bring some of these gems into the spotlight, albeit for a very short time. However, the resources do exist.

Here’s where I shock myself (metaphorically; I’m a Watt, I’m well grounded): after almost two decades of experience I have recognized many patterns of behavior, implementation techniques, and development processes that seem to be more successful than others. I may still be struggling to summarize and articulate an abstract conclusion when I read a book and discover the author describing the exact conclusion that I had reached! Only I hadn’t yet been able to put my thought into words.

This is crazy, let’s check the printing date of these books:

  • Design Patterns: Elements of Reusable Object-Oriented Software by Gamma, Helm, Johnson, Vlissides (1995)
  • The C++ Programming Language (Special Edition) by Bjarne Stroustrup (2000)
  • The Mythical Man Month by Fred Brooks (1975)!

What’s so special about these books?

Stroustrup’s C++ book and the Gang of Four book both have some extremely insightful chapters. They summarize a number of conclusions that I have either reached on my own, or that I did not have enough context to properly absorb when I read them the first time. The Mythical Man Month is a collection of essays on software development, in which the wisdom has proven to be timeless.

Let me give a brief overview of the valuable insights that are documented in these books. These are concepts that, if they were taught to me in college, I simply do not remember. My best guess is that I didn’t have enough context to relate to all of the practical advice that is given, especially compared to the overwhelming majority of theory that is taught at universities.

Fast forward one decade… Yes, I started to recognize some of these patterns, and I picked up new books to read; yet these three books stand out. I am surprised that the information in these books is not better internalized and practiced in our industry. I know that our trade would be much better off if everyone understood these concepts. This information is not only for software developers, but for software project managers and anyone that is involved in making decisions during a development project.

Design Patterns (The Gang of Four)

The most insightful chapter in this book is Chapter 1. How could I miss that?

Here’s how. If you are like me, you cracked open the book, saw the nicely drawn diagrams and well-organized descriptions for each of the 23 design patterns, and simply skimmed through the patterns when searching for a clean solution.

It’s also very likely that you started reading Chapter 1 and saw that the first five sections were primarily bullet-point lists of patterns and a few diagrams. Each of these first five sections is very brief. Then section 1.6 starts, and it is dense with text. Very few diagrams or bullet-point lists, just a lot of text.

So you skipped to the next chapter and started learning about A Case Study: Designing a Document Editor. Scratch that, after looking at the first few pages, you skipped chapter two as well and moved on to the next section entirely, The Design Pattern Catalog. Finally, in chapter 3 you start learning about the creational patterns.

If you did read all of Chapter 1, and absorbed every iota of valuable wisdom, you can skip to the next book, but be sure to read the summary. For the rest of us: unless you had a good amount of experience with object-oriented software development when you read this chapter, the majority of the value did not have a chance to sink in.

Here is what you may have missed, and why it is worth your time to revisit these few sections of Chapter 1 in the Gang of Four’s Design Patterns.

Section 1.6: How design patterns solve problems

The first few pages of this section describe how design patterns will help you solve problems. If you have done even a small amount of object-oriented development this material will seem obvious to you, and you will have that urge to skim and skip. Don’t do it!

About five or so pages into this section, pg. 16 in my copy, there is a section titled Class versus Interface Inheritance. This is where the good stuff begins. The authors make a distinction between inheriting from a class to reuse existing functionality and inheriting from an interface to become compatible with an existing implementation. While both can facilitate polymorphism, inheriting from an interface is more likely to produce code that is truly polymorphic.

With regards to C++, public inheritance is the equivalent of inheriting from an interface. However, the lines are blurred, because if you inherit publicly from a concrete class you get both benefits: code reuse and polymorphic compatibility. The trouble begins when you override the functions of the base class and alter the meaning of the interface. This breaks the Liskov Substitution Principle (LSP).

Public Inheritance leads to a subtype

LSP helps guide you to a solution that allows you to easily replace a base-class instance with any of its derived instances, with no special processing required. This is the ultimate goal of polymorphism. When it is achieved, the payoff is enormous; it takes the form of elegant solutions that require very little code. If you adhere to LSP when you create a derived class, you are creating a subtype of the base class.

Robust and elegant polymorphism depends on creating proper subtypes. It is only possible to create a subtype of a base class when your derivation is public. That should be an immediate clue for how to proceed. When you use protected or private derivation, you are only deriving for the benefits of code reuse from the base class.

Taken to the extreme, this means it is a good practice to publicly derive from abstract base classes, classes in which every virtual function is declared pure virtual. This has the downside that you lose the benefit of immediate code reuse through derivation. However, your intention is all the more clear that you intend for this new class to be a subtype of the base class.
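To make the distinction concrete, here is a minimal sketch of my own (an illustration, not an example from the book): public derivation from an abstract base creates a subtype that can stand in for the base anywhere, while private derivation only borrows an implementation.

C++

#include <cstddef>
#include <vector>
 
// Abstract base: every virtual function is pure virtual.
class Codec
{
public:
  virtual ~Codec() { }
  virtual std::size_t encode(const char* src, std::size_t len, char* dst) = 0;
};
 
// Public derivation: RleCodec is a subtype, and may be used
// anywhere a Codec& is expected (LSP holds).
class RleCodec
  : public Codec
{
public:
  virtual std::size_t encode(const char* src, std::size_t len, char* dst)
  {
    // ... real encoding would go here
    return len;
  }
};
 
// Private derivation: reuse std::vector's implementation
// internally, without claiming to be a vector.
class Buffer
  : private std::vector<char>
{
public:
  using std::vector<char>::size;
};
 
void transmit(Codec& codec);   // accepts any proper subtype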

This section has much more to offer. It continues to discuss how to put the various reuse mechanisms to work; it recommends choosing composition over inheritance, and even touches on inheritance versus parameterized types (generic programming). I recommend that you revisit this chapter, especially this section, at least once a year.

As for the other two sections, 1.7 and 1.8, they briefly describe how to select and then use a design pattern. These are brief and worth reviewing as well. Remember, a design pattern is simply that, a pattern. Patterns are less effective when they are used as pre-written solutions expected to be dropped into place and used.

Software Design Patterns are not the solution to your problem. Rather, they provide guidance on how to create a robust solution. It’s important to remember that.

The C++ Programming Language (Special Edition)

That’s right! The special edition version is very important. This is the only version of the book that I am aware of that contains the three chapters that eloquently and succinctly summarize how to improve your software design, development and project management skills.

Part IV: Design Using C++

There are three chapters in this section. They start from a high-level, and each chapter moves its focus to topics that are more closely related to code, specifically C++:

  • 23. Development and Design
  • 24. Design and Programming
  • 25. Roles of Classes

Every time I return to these chapters, I need to pick up a new color of highlighter because I latch on to something that I missed in all of the previous times that I read these chapters. I am certain that eventually every word will be highlighted as I continue to return and read them.

[Image: my highlighted pages]

It is also helpful to lend out your copy and allow others to highlight sections that they find salient.

[Image: others' highlights]

Scott Meyers’ Effective C++ is one of the most valued books on my shelf. Still, I would dare say that these three chapters, considered alone, are the most valuable collection of pages regarding software development that I would refer others to. Thank you, Dr. Stroustrup.

There is too much information for me to even attempt to scratch the surface; overall, there are only about 100 pages total. I will, however, point out some of the important concepts where Dr. Stroustrup gave words to things I had been thinking for years but did not know how to organize so succinctly. Chapter 23 is by far the most abstract, yet most valuable, content of this section.

Chapter 23: Development and Design

This chapter is a collection of observations on software development for organizations and individuals. One of the most important concepts to help guide any designer and developer is “to be clear about what you are trying to build.” So many projects fail because they never establish that goal. There is no cookbook, or, as Fred Brooks states, no silver bullet to replace intelligence and experience in design and development. Iteration is important in software development, especially experimentation and prototyping.

This chapter is what inspired a previous blog entry that I wrote Software Maintenance is a Myth[^]. Software does not fall apart or need regular maintenance to stay in good working order. Every time you make a change to a functional piece of software, you are performing a mini-engineering and development cycle.

There are brief discussions on the use of models, experimentation, the reuse of code, testing, and efficiency. Best of all, there are a few comments regarding project management. My favorite statement of all is added as an annotation at the bottom:

An organization that treats its programmers as morons will soon have programmers that are willing and able to act like morons only.

section 23.5 Management
I smile every time I think of that comment; I shake my head; then I go back to acting like a moron.

There is one last priceless piece of advice and direction from this chapter for anyone that is interested in improving their organization. I cannot paraphrase it, or state it any better than the words directly from the book:

Where innovation is needed, senior technical people, analysts, designers, programmers, etc., have a critical and difficult role to play in the introduction of new techniques. These are the people who must learn new techniques and in many cases unlearn old habits. This is not easy. These individuals have typically made great professional investments in the old ways of doing things and rely on successes achieved using these ways of operating for their technical reputation. So do many technical managers.

section 23.5.3 Individuals

The thought continues to emphasize the importance of communication: communicate both to alleviate the fear of change and to avoid underestimating the problems of sticking with the old ways, while also not over-estimating the benefits of the proposed new ways.

Chapter 24: Design and Programming

Chapter 25: Roles of Classes

There is not much that I can really add that will be revolutionary for chapters 24 and 25, except that they are brief overviews of C++ and how to begin applying the concepts that are presented earlier in the book. Regardless of the number of other sources that you have read to learn and understand object-oriented development, I am almost certain you will find something new to add to your wisdom and experience.

The Mythical Man Month

This book is a collection of essays on software engineering. Some of the essays are based on case studies, such as Mr. Brooks’ experience at IBM as he took over the development of OS/360. Other essays are based on observations and conclusions related to personal experience or studies performed by others.

Your obligation

If you have not read The Mythical Man Month, you owe it to yourself, and the rest of us, to read it. Just multiply all of the monetary values by 8 to account for the total inflation of roughly 700% from 1960 to 2014. With those adjustments, it will feel like you are reading a book that was printed just last year (for the first time, at least).

Brooks’ Law

Brooks’ Law:

“Adding manpower to a late software project makes it later”
Frederick P. Brooks

Brooks’ Law originates from the chapter with the same title as the book, The Mythical Man Month. The rationale behind the concept is that the trained engineers currently producing on the project must stop and focus their efforts on training the new engineers before those engineers can become productive. It is also important to consider the number of sequential versus independent tasks when creating a product schedule. Sequential tasks inherently limit the speed of the schedule because each depends on the one before it; as Brooks famously observes, the bearing of a child takes nine months, no matter how many women are assigned.

An Enormous Source of Wisdom

This book contains many ideas, analogies, observations, conclusions, and empirical evidence to help educate a software professional to become better in their craft. Conceptual Integrity in the design architecture is a common theme repeated throughout the book. Brooks describes the concept of a development team that is structured like, and as effective as, a surgical team. The essay No Silver Bullet discusses how no project will duplicate the exact circumstances of previously successful projects; therefore, you must always be willing to adjust and adapt to the differences if you are to succeed.

There are also commentaries on the pilot system, in which you build one to throw away. In the 20th anniversary edition of the book, Brooks re-evaluates the topics and his ideas from 20 years earlier, and states that quicker feedback loops are a better solution. This means smaller development cycles, and continual feedback through testing and integration while the system is still under construction. These are ideas very similar to agile thinking.

Most importantly, I think, Brooks continues to underscore how important effective communication is between management, technical leadership, software developers, and every other role in the project. Communication relates to another well-known law:

Conway’s Law:

“Organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations.”
–Melvin Conway

It appears that this statement dates to 1968 and the National Symposium on Modular Programming, the same conference in which Larry Constantine introduced the concept of Coupling and Cohesion[^].

… And now to the point

I read and hear over and over, “There has got to be a better way of doing things, and here it is…”. I have read quite frequently that Computer Science and Software Engineering are in their infancy compared to other engineering disciplines. Every now and then Software Engineering is proclaimed to not even be an engineering discipline; it is an art or a craft, anything but science. I don’t want to defend or dispute any of these claims.

What I really want to say:

WTF!

We continue to repeat the same mistakes that we were making 50 years ago. To add insult to injury, it is costing companies more money, and there are many more of us making these exact same mistakes. In fact, labor statistics forecast a shortage of software professionals; companies need more of us to repeat these mistakes and waste their money.

Software is a profession where a little bit of knowledge, even less experience, and absolutely no wisdom can be dangerous; probably any mixture of those three qualities. It seems that we spend entirely too much time inventing the Next Big Technology and too little time learning sound practices and skills. Some programmer or company decides that the existing programming languages are the problem: “Things would be better if we left out this feature, added that feature, and then slapped some lipstick on it along with a snazzy name.”

I am not a civil engineer, but even in my studies of computer science we were taught the lesson of the Tacoma Narrows Bridge. I would imagine that civil engineers perform an even deeper case study of that engineering anomaly, as well as others; possibly they even perform experiments and study other projects that were both successful and unsuccessful.

Much of the knowledge and wisdom imparted by these resources is wasted if the reader does not have enough experience to provide context for the concepts being described. That is why I suggest you revisit the books in your personal and corporate libraries. Keep track of the authors of blogs that you find valuable, even if all of their content doesn’t seem to apply to you. You may happen to revisit an entry in the future that you are finally ready to absorb, because you have new experiences that apply.

Summary

Basically, I’m frustrated for two reasons: 1) I stumbled upon this information many years too late; 2) I may have actually been presented with this information earlier in my career and was not mature or wise enough to recognize its value.

For those of you that I said could probably skip to the summary, please do a better job of spreading this knowledge around. Work at becoming a mentor. If you don’t like mentoring (I know, sometimes it can be frustrating), point out the info to an engineer that does enjoy mentoring. The surprise to me in all of this is that I stumbled upon these nuggets of knowledge even though some of them had been in my library for at least a decade. It was not until I had enough experience that I was ready to absorb this wisdom, or even had some context with which to associate the advice.

Unit Testing a Singleton in C++

adaptability, CodeProject, C++, unittest

I have written about the Singleton[^] before. As a quick review of what I previously stated: I don't think the Singleton is misunderstood; I think it is the only software design pattern that most people do understand. Those that call the Singleton an anti-pattern believe that it is overused. It's simple enough in concept compared to the other patterns, which itself may explain why it is used so often. Another criticism that I hear is that it is difficult to unit-test, or at least unit-test properly with a fresh fixture for each test. No, it's not, and I will demonstrate how.

Note: There are two solutions in this entry. The first is a convoluted exploration of searching for a solution that works. The second is a simple solution that is created by approaching the problem from a new direction.

What's the problem

To create robust unit-tests, a best practice is to entirely tear down any resources related to your System Under Test (SUT) before you run another test. Each new test is run in what is referred to as a fresh fixture: a clean test environment where you control every aspect of the environment for your test. This reduces the possibility of side-effects interfering with your tests; you are basically testing every aspect of your SUT independently.

The difficulty that we encounter when attempting to test a Singleton is that quite often it is created as a global variable at startup; that is, even before your main entry-point function is called. With the improved version of the Singleton, in which you declare a destructor, the problem becomes even more difficult.

Testability starts with your implementation

One of the most important factors that affects your chances of successfully creating a robust test harness around your software is your software's implementation. This includes the public interface, the details that you abstract from the user, and the data that you encapsulate within your implementation. One strength of object-oriented programming that I think is often overlooked: your ability to encapsulate the data is what gives you the power to enforce the invariants of your classes.

One of the invariants of the classic Singleton is that at most one instance will exist for the system. How we choose to enforce that invariant will have a great effect on our ability to properly test the Singleton. Here is a classic implementation of the Singleton:

C++

// Source: Design Patterns (GoF)
// Page:   129
class Singleton
{
public:
  static Singleton* Instance();
protected:
  Singleton();
private:
  static Singleton* m_pInstance;
};

Since there is a static member variable, the data field must be defined in a .cpp file, or in some other way that guarantees only one instance exists when the linker combines the object code. Here is the rest of the implementation:

C++

Singleton* Singleton::m_pInstance = 0;
 
Singleton* Singleton::Instance()
{
  if (0 == m_pInstance)
  {
    m_pInstance = new Singleton;
  }
 
  return m_pInstance;
}

Demonstration of Usage

Before I demonstrate the testing portion, let's demonstrate the usage of this object so that we can better understand the problems we will be faced with when we attempt to test the object.

C++

// When a caller wants to reference the object,
// they use the static member function Instance().
Singleton* pObj = Singleton::Instance();
 
// However, these usages will generate compiler errors,
// because the constructor is not declared public:
 
Singleton second;  // Error    
Singleton* pThird  // Error
  = new Singleton();

There are problems with this implementation

This implementation has problems, and we haven't even reached the part where we see if it can be tested. No explicit destructor has been declared; therefore the user is free to call delete on the Singleton. An explicitly declared destructor is a necessity for a safe Singleton implementation.

C++

// Without a destructor explicitly defined,
// the user can do this:
std::shared_ptr<Singleton>
  p_sharedObj(Singleton::Instance());
 
// As well as this.
delete Singleton::Instance();

In both of the previous scenarios, the compiler is fine with the way the user interacted with the object. However, you will have a dangling pointer and no way to recover from it. You have a few options.

  • Declare the destructor public, and assign the instance pointer back to zero inside of it. In this case, testing with a fresh fixture is once again possible and this article becomes a moot point.
  • Declare the destructor non-public, and the user will not be able to explicitly delete the Singleton instance. I would suggest protected access, because it leaves open the possibility of subclassing your Singleton if you choose to (a sketch of this form follows the list).
  • Use the C++11 way to remove the destructor altogether:

    C++

    ~Singleton() = delete;
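
For the second option, the classic declaration from above simply gains an explicitly declared, protected destructor; a minimal sketch:

C++

class Singleton
{
public:
  static Singleton* Instance();
protected:
  Singleton();
  ~Singleton();  // protected: no external delete; subclassing still possible
private:
  static Singleton* m_pInstance;
};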

This implementation can be tested if...

If you selected the second option above, declaring your destructor with protected scope, there is hope of performing best-practice unit-testing on this Singleton. However, it's not pretty. Remember, "Testability starts with your implementation."

If you have declared your destructor protected, we can derive a TestSingleton class to give us access to the destructor. Now for the ugly part. You must define your destructor to be of this form:

C++

~Singleton()
{
  if (0 != m_pInstance)
  {
    // Add all of your other destruction
    // responsibilities at this point.
 
    // Hold a copy of the singleton pointer,
    // and zero out the static member-data
    // before calling delete on the object.
    Singleton* pTemp = m_pInstance;
    m_pInstance = 0;
 
    delete pTemp;
  }
}

That's a curious bit of code, isn't it? Why not just call delete m_pInstance?

Because we are dealing with a paradox. Think about it.

What type of pointer is m_pInstance?

A Singleton class pointer.

To which class does the destructor with this code belong?

The Singleton class.

... and we typically get into a class's destructor by calling...

delete!

So how did we get into the destructor for the class Singleton if we haven't yet called delete on the only instance of this class? Furthermore, what would happen if we just deleted the pointer, then zeroed it out?

The Reveal

This is the part where I give you the Ocean's 11 style breakdown. We'll start with what you saw, or at least think you saw.

As I mentioned, we will start with a sub-class derived from the original Singleton.

C++

class TestSingleton
  : Singleton
{
public:
  // We don't even need a default constructor
  // or destructor because the compiler gives us one.
};

We create a Setup and Teardown routine for this test suite:

C++

TestSingleton *pTest = 0;
void Setup()
{
  pTest = new TestSingleton;
}
 
void Teardown()
{
  delete pTest;
  pTest = 0;
}

Here is the typical logical flow if a few tests in our test harness were structured like this:

C++

// Test 1
Setup();
Singleton* pObj = Singleton::Instance();
// Testy things occur here
Teardown();
 
// Test 2
Setup();
pObj = Singleton::Instance();
// Now testier things occur
Teardown();

If we instrumented the constructor and destructor for both the Singleton and TestSingleton objects, we would see something like this:

Test 1
  Construct Singleton.
  Construct TestSingleton
  Construct Singleton.
  Destroy TestSingleton
  Destroy Singleton.
Test 2
  Construct Singleton.
  Construct TestSingleton
  Construct Singleton.
  Destroy TestSingleton
  Destroy Singleton.

Why is the constructor for the Singleton getting called twice? Because it is called once when the TestSingleton is created, and once when the real singleton is created by calling Singleton::Instance(). As simple as the class is now, my guess is that this would not work so well on a production-class singleton, especially if the object was constrained due to system resources.

We simply cannot allow the constructor of a singleton to be called twice. Furthermore, all of that fancy guard code I told you to put into place in the destructor is there to prevent an endless loop of destruction. Honestly, without the guard it is an endless recursive loop that will continue until a stack-overflow occurs. We can't really allow the destructor of a singleton to be called twice either. The way we structured the Singleton::~Singleton() function is about as clean as we will be able to get.

So how do we avoid double construction, enter the paradox, and get away before Terry Benedict realizes we got away with all of his money? We'll use a pinch. Just kidding; we are basically going to use the TestSingleton as a Trojan Horse to get access to the protected destructor of the base Singleton class. However, we will not ever construct the TestSingleton. Also, the way I instructed you to organize the destructor of the Singleton will make it safe, for the time being.

The Trick

We can allocate memory for an object and avoid calling its constructor by calling operator new directly. Yes, you can do that. It essentially performs the same action as calling malloc. To make this work, we need to make three adjustments:

  1. Change Setup to allocate memory without calling the constructor:

     C++

     void Setup()
     {
       pTest = (TestSingleton*)operator new(sizeof(TestSingleton));
     }

  2. Change Teardown to free memory without calling the destructor:

     C++

     void Teardown()
     {
       pTest->DestroySingleton();
       operator delete(pTest);
      
       pTest = 0;
     }

  3. Publish a method in TestSingleton to delete the Singleton:

     C++

     void DestroySingleton()
     {
       // Here is where the paradox originates...
       // We are invoking the destructor directly.
       // Which in turn will invoke ~Singleton().
       this->~TestSingleton();
     }

Watch the solution for yourself:

C++

int main(int argc, _TCHAR* argv[])
{  
  cout << "Test 1\n";
  {
    Setup();
    Singleton* pObj = Singleton::Instance();
 
    cout << "pObj is " << pObj << ".\n";
 
    Teardown();
  }
 
  cout << "Test 2\n";
  {
    Setup();
    Singleton* pObj = Singleton::Instance();
 
    cout << "pObj is " << pObj << ".\n";
 
    Teardown();
  }
 
  return 0;
}

Output:

Test 1
  Construct Singleton.
  pObj is 00F9E348.
  Destroy TestSingleton
  Instance is 00F9E348.
  Destroy Singleton.
  Instance is 00000000.
Test 2
  Construct Singleton.
  pObj is 00F95CA0.
  Destroy TestSingleton
  Instance is 00F95CA0.
  Destroy Singleton.
  Instance is 00000000.

Admittedly, that is a very convoluted solution, and it even made our original code more fragile to maintain. There are much better ways. In order to find one, we will need to approach the problem from a different direction.

Testability starts with your implementation

One of the most important factors that affects, yes, yes, yes; I already explained this, and hopefully I just proved my point. The implementation of that Singleton sucks (it rates low on the testability scale). Here is a much simpler implementation, and solution for testing, which does not resort to using friend.

If you are considering adding friend to make a class testable, Don't do it. There is always a better way when testing.

Start with the same header for your Singleton. Place the implementation of the Singleton's static members from above in their own file, separate from the main Singleton.cpp implementation; for example, a file called Singleton_instance.cpp. Now structure your code similar to what is shown below.

A better way, and simpler too

When it comes time to test, replace the file Singleton_instance.cpp with one that provides flexibility to erase and create new instances of the Singleton. For example, Test_singleton_instance.cpp:

C++

enum TestPhase
{
  k_unknown = 0,
  k_setup,
  k_testing,
  k_teardown
};
 
TestPhase  g_testPhase = k_unknown;
// ... Same definition for m_pInstance
 
Singleton* Singleton::Instance()
{
  if (k_teardown == g_testPhase)
  {
    delete m_pInstance;
    m_pInstance = 0;
 
    return 0;
  }
  // ... continue with normal Instance logic.
}

Only two last adjustments to Setup and Teardown:

C++

void Setup()
{
  g_testPhase = k_setup;
  // Add any other necessary setup code.
  g_testPhase = k_testing;
}
 
void Teardown()
{
  g_testPhase = k_teardown;
  Singleton* pEmpty = Singleton::Instance();
 
  assert(pEmpty == 0);
}

We have just introduced a way to get access to the Singleton's non-public destructor, and we didn't even need to resort to some of the subversive techniques that are possible in C++. All that was required was to use two compilation units to implement the Singleton, rather than one as is traditionally done.

If you're concerned that we didn't test the Singleton::Instance() function, simply create a new test harness for this portion of the object implementation. With the original implementation, only two tests are required; a sketch follows the list.

  • Verify the object is created correctly on the first call.
  • Verify the same object is returned on each successive call.
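
Here is a minimal sketch of those two tests, using a plain assert rather than any particular test framework, and reusing the Setup and Teardown shown above:

C++

#include <cassert>
 
void TestInstanceCreatesObject()
{
  Setup();
  Singleton* pFirst = Singleton::Instance();
  assert(0 != pFirst);            // created on the first call
  Teardown();
}
 
void TestInstanceReturnsSameObject()
{
  Setup();
  Singleton* pFirst  = Singleton::Instance();
  Singleton* pSecond = Singleton::Instance();
  assert(pFirst == pSecond);      // the same instance every call
  Teardown();
}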

Summary

We solved a problem that seemed difficult, if not impossible, at first, without compromising the integrity of the SUT. With an open mind and a bit of ingenuity, we were able to find a very clean solution. More importantly, we did not arrive at our clean solution on the first attempt. Software development is an iterative process. If you are not satisfied with your first solution, explore other possibilities. Especially with C++.

Alchemy: Benchmarks and Optimizations

general, CodeProject, C++, Alchemy, engineering, optimize

A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called, Network Alchemy[^]. Alchemy performs automated data serialization with compile-time reflection. It is written in C++ using template meta-programming.

Benchmark testing and code profiling is a phase that can often be avoided for the majority of development; that is, if you develop wisely, selecting appropriate data structures and algorithms for the task at hand. Avoiding premature optimization is about not getting caught up in the minute details before you even have a working system. That doesn't mean to throw out good decision-making altogether. Well, I have reached the point in Alchemy where the feature-set is rich enough to make this a useful library. This entry chronicles my discoveries about how well Alchemy performs, and the steps I have taken to find and improve the areas where improvement was required.

Benchmark Tests

To get started, I simply wanted to capture the performance of a collection of the Alchemy-supported types arranged in a few structures.

Each of the tests is run across a pre-allocated buffer of 512 MB. The structures for each test will extract as many messages as they can from the 512 MB buffer. Each message that is extracted will:

  1. Deserialize the message from the buffer
  2. Convert the byte-order
  3. Serialize the message into a different buffer

This is a summary of the size of each structure and the number of iterations that were performed over the 512 MB block of memory:

Test Name        Struct Size (bytes)    Iterations
basic                     14             28247922
packed                     7             76695844
unaligned                 19             28256363
complex                   86              6242685
array                   1024               524288
no_conversion             64              8388608

Here is a brief overview of the tests that I selected to exercise the basic features of Alchemy. I will also continue to add more tests over time.

1) Basic

This message structure contains a collection of fundamental types, where all of the types are aligned on a minimum of a two-byte boundary.

C++

//  *******************************
struct Basic
{
  int32_t         i32;
  uint32_t        u32;
  int16_t         i16;
  uint16_t        u16;
  char            ch;
  uint8_t         u8;
};

C++

//  *******************************
HG_BEGIN_FORMAT(Basic,
  HG_DATUM(int32_t,   i32),
  HG_DATUM(uint32_t,  u32),
  HG_DATUM(int16_t,   i16),
  HG_DATUM(uint16_t,  u16),
  HG_DATUM(char,      ch),
  HG_DATUM(uint8_t,   u8)
);
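
For comparison, here is roughly what one iteration of the hand-written baseline does for this structure. This is my reconstruction of the naïve approach, not the actual benchmark code, and it assumes a packed layout so that sizeof(Basic) matches the 14-byte wire size:

C++

#include <cstring>
#include <arpa/inet.h>  // ntohl/ntohs; WinSock supplies these on Windows
 
//  *******************************
//  Reconstruction of one naive hand-written pass:
//  deserialize, convert byte-order, serialize.
void process_basic(const unsigned char* src, unsigned char* dst)
{
  Basic msg;
  std::memcpy(&msg, src, sizeof(msg));   // 1) deserialize
 
  msg.i32 = ntohl(msg.i32);              // 2) convert byte-order
  msg.u32 = ntohl(msg.u32);
  msg.i16 = ntohs(msg.i16);
  msg.u16 = ntohs(msg.u16);              //    ch and u8 are single bytes
 
  std::memcpy(dst, &msg, sizeof(msg));   // 3) serialize
}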

2) Packed

The PackedBits test is not complete. Currently, I only define the bit-fields for the Alchemy version, because I wanted to see how well they performed for basic serialization. Also, all of my test implementations follow a similar pattern with a simple loop. I will need to devise a test that accesses the different bit-field values from these structures to measure the access speed of the data. For now, they are simply processed in bulk as integers.

C++

//  *******************************
struct Packed
{
  uint32_t        set_a;
  uint16_t        set_b;
  uint8_t         set_c;
};

C++

//  *******************************
HG_BEGIN_PACKED (uint32_t, SetA)
  HG_BIT_FIELD   (0,   fifteen, 15)
  HG_BIT_FIELD   (1,   two,     2)
  HG_BIT_FIELD   (2,   five,    5)
  HG_BIT_FIELD   (3,   eight,   8)
  HG_BIT_FIELD   (4,   one,     1)
HG_END_PACKED
 
//  *******************************
HG_BEGIN_PACKED (uint16_t, SetB) // ...
HG_BEGIN_PACKED (uint8_t, SetC)   // ...
 
//  *******************************
HG_BEGIN_FORMAT(Packed,
  HG_DATUM(SetA,            set_a),
  HG_DATUM(SetB,            set_b),
  HG_DATUM(SetC,            set_c)
);

3) Unaligned

While most message structures I have seen take care to properly align the different parameter sizes on appropriate boundaries, it is still possible to receive data that is not properly aligned in memory. This test purposely offsets the alignment of the data structure so that none of the fields are properly aligned.

For those that are unaware, the x86 architecture is very accommodating and will not crash on unaligned memory access. It will happily perform two partial reads, then OR your data together. Of course, there is a huge speed penalty if you rely on this behavior.
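
On architectures that are less forgiving, and as a portable idiom in general, the usual way to read a potentially unaligned field is to memcpy it into a properly aligned local. A minimal sketch, not taken from Alchemy's internals:

C++

#include <cstdint>
#include <cstring>
 
// Read a uint32_t from a buffer position that may not be
// 4-byte aligned. memcpy lets the compiler choose the safest
// (and usually fastest) instruction sequence for the target.
uint32_t read_unaligned_u32(const unsigned char* p)
{
  uint32_t value;
  std::memcpy(&value, p, sizeof(value));
  return value;
}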

C++

//  *******************************
struct Unaligned
{
  char            ch;
  uint32_t        u32_a;
  uint32_t        u32_b;
  uint32_t        u32_c;
  int16_t         i16_a;
  int16_t         i16_b;
  int16_t         i16_c;
};

C++

//  *******************************
HG_BEGIN_FORMAT(Unaligned,
  HG_DATUM(char,            ch),
  HG_DATUM(uint32_t,        u32_a),
  HG_DATUM(uint32_t,        u32_b),
  HG_DATUM(uint32_t,        u32_c),
  HG_DATUM(int16_t,         i16_a),
  HG_DATUM(int16_t,         i16_b),
  HG_DATUM(int16_t,         i16_c)
);

4) Complex

Complex is the test that contains a bit of everything except a vector. All of the previous types are included in the format, as well as an extra 32-bit integer. Furthermore, the Basic structure defined in a previous test appears as an array. The current number of elements in the array is 10; this can be changed at the top of the benchmark type-definition file.

C++

//  *******************************
struct Complex
{
  uint32_t        seq;
  Basic           basic[k_complex_basic_count];
  Packed          bits;
  Unaligned       unaligned;
};

C++

//  *******************************
HG_BEGIN_FORMAT(Complex,
  HG_DATUM(uint32_t,                  seq),
  HG_ARRAY(Basic, k_complex_basic_count, basic),
  HG_DATUM(Packed,                    bits),
  HG_DATUM(Unaligned,                 unaligned)
);

5) Array

This array test was added fairly recently. It contains a single field that is an array of 32-bit integers. The current size of this array is 256 elements. Similar to the array in the complex test, this value can be changed by adjusting the global variable.

C++

//  *******************************
struct Array
{
  uint32_t        items[k_array_test_count];
};

C++

//  *******************************
HG_BEGIN_FORMAT(Array_test,
  HG_ARRAY(uint32_t, k_array_test_count,   items)
);

6) No Conversion

This scenario contains 16 individual 32-bit integers, and the byte-order conversion step is omitted from the test. However, the data is still read in, copied, then written out to a separate buffer. As you will see in the results, this is one area where Alchemy has a lot of room for improvement.

C++

//  *******************************
struct NoConversion
{
  uint32_t            ch_0;
  uint32_t            ch_1;
  // ...
  uint32_t            ch_14;
  uint32_t            ch_15;
};

C++

//  *******************************
HG_BEGIN_FORMAT(NoConversion,
  HG_DATUM(uint32_t, ch_0),
  HG_DATUM(uint32_t, ch_1),
  // ...
  HG_DATUM(uint32_t, ch_14),
  HG_DATUM(uint32_t, ch_15)
);

What a waste of time!

I completed the first set of features, and even received feedback from a production environment, which I used to improve my design and implementation. It was time to create some benchmarks. I wanted to verify that I had created something unique and useful, and that any trade-offs that existed were well worth the cost.

Alchemy was designed to be very simple: it should be easy to define a packed buffer format for binary wire transfer, portably and safely, between a variety of different computer architectures. Maintainability was the most valued quality for this library. For those qualities, I felt it would easily be worth a 10% premium over a convoluted, hand-written solution that was possibly more efficient.

I was not prepared for the results that I received the first time I ran my 4 simple benchmark tests.

Denial led me to run the same program without any changes at least three or four more times, “It’ll straighten itself out, right?!”

What?

The results were consistent each and every time.

I calmly stood up from my desk, said out loud “What a fucking waste of 18 months!", and I walked away in disgust.

Alchemy was over 100 times slower than the naïve implementations that I wrote by hand; 100 times slower than the type of implementations that I had grown sick of maintaining for at least 10 years.

Take a walk, breathe, relax

I admit, I got a little extremely worked up. I was anticipating great things. I gave myself a little bit of room for disappointment just in case Alchemy was slower, but that margin was only 10%, not two orders of magnitude. Cognitive dissonance doesn't sit well with me. So the new, relaxed persona that I brought back to my desk sat down and simply dug in to see what could possibly be the issue.

Was it a debug build?

No.

I already experienced that in the first phase of my grief. I had initially compiled it in debug, then watched and waited, and waited, as it slowly chugged through the tests. Then I thought, “woah, I can’t wait to see how long the hand-written version takes.” I blinked, and it was done.

Yes, debug builds of Alchemy are extremely slow because of all of the layers of one-statement abstractions. I will give more details in the summary.

I reconfigured for release, and quickly worked past denial. I have already told you about my experience with anger.

Like comparing apples and oranges…

I really can’t tell you what happened to bargaining and depression; they’re probably repressed memories now. Either way, when I returned, I had already accepted that I would need to investigate. I have also learned through years of experience that there is always a better way. Better, in this case, would mean faster.

What did I find?

Only five minutes into my search, I recognized that I had moved the buffers entirely into dynamic memory inside the Hg::Message object, while I was using completely static buffers on the stack for the hand-written variant. I reasoned that had to account for a decent portion of the discrepancy. So yes, I had basically compared two different circumstances: apples vs. oranges, static vs. dynamic memory.

What did I do?

I did not have any mechanism in place to allow the user to provide the buffer for the Hg::Message to work with. A use-case example would be a buffer stored on the stack into which the caller wanted the packed result copied directly. I did have the concept of an Hg::StoragePolicy, which allows the user to manage the source of allocated memory as well as provide rules for how to interact with buffers. However, implementing the solution that way would have been a kludge resulting in an awkward API for the user.

I decided to simply overload the data member function of the message object. This second version would allow the caller to pass in a buffer and its size. With a few internal adjustments to the Hg::Message implementation, I was able to make both versions transparently call the function that performs all of the actual work.
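
Here is a hypothetical sketch of the shape of that overload pair; the names and signatures are my illustration, not the actual Hg::Message interface:

C++

#include <cstddef>
 
// Hypothetical sketch only -- not the real Hg::Message API.
class Message
{
public:
  // Original form: serialize into internally managed storage.
  std::size_t data();
 
  // Overload: serialize directly into a caller-supplied buffer,
  // such as one on the stack; returns the bytes written. Both
  // versions forward to the same function that does the work.
  std::size_t data(unsigned char* buffer, std::size_t size);
};
 
// Usage with a static buffer on the stack:
//   unsigned char packed[64];
//   msg.data(packed, sizeof(packed));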

It turned out that I did need to create a specialized Hg::StoragePolicy for static data and a specialized Hg::MessageBuffer implementation. However, with the overloaded member accepting static buffers supplied by the caller, I saw an immediate increase in performance; an order of magnitude, to be precise.

Fantastic! Alchemy was now only 10 times slower than the naïve approach. Ignorance truly is bliss, because I was thinking, “Hey, it could be worse! Like, 100x as slow.”

A Closer Inspection

After comparing the relative percentage differences in performance, it was clear my BitList implementation was absolutely killing performance. It was like an anchor holding back a drag-racer. Since I used BitList types in two of my tests, I decided to improve that implementation next.

This is the teaser; the full details will be revealed in the write-up for my third, and current, implementation of the BitList type. Along the way I also decided to rename it PackedBits, because essentially that is what the data-type does, and it helps separate it from the notion of bit-fields altogether.

I discovered that the virtual anchor halting performance was the growing collection of references added to the PackedBits structure: one new reference for each sub-field. I allow a maximum of 32 sub-fields in the PackedBits structure; therefore, the number of unique data fields packed into a single structure was directly related to the decrease in performance.
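
To see why those references were such an anchor, here is a small, self-contained demonstration. This is my reconstruction of the pattern, not Alchemy's actual code:

C++

#include <cstdint>
#include <iostream>
 
// Reconstruction: one reference member per named sub-field.
// Each reference occupies pointer-sized storage in the object.
struct PackedBitsWithRefs
{
  uint32_t  value;
  uint32_t& fifteen;
  uint32_t& two;
  uint32_t& five;
  uint32_t& eight;
  uint32_t& one;
 
  PackedBitsWithRefs()
    : value(0)
    , fifteen(value), two(value), five(value)
    , eight(value), one(value)
  { }
};
 
int main()
{
  std::cout << sizeof(uint32_t) << "\n";           // 4-byte payload
  std::cout << sizeof(PackedBitsWithRefs) << "\n"; // ~48 bytes on a typical
                                                   // 64-bit target
  return 0;
}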

After I resolved this issue, things were starting to look up again, because now I was only 4x slower. In percentages, 400% slower is 40 times the 10% premium I was expecting and hoping for.

Time to profile

You know you are in the optimization phase when you turn to a code profiler to determine what your program is spending all of its time doing. It turns out the majority of the time for both runs was spent in standard library calls, especially ::memcpy. That wasn't a big surprise to me, although I was surprised that the amount of time was 60%.

Excluding standard calls and digging deeper what did I find?

  • Copies, lots of copies; by the copy constructor no-less.
  • Empty object creation and initialization for fields that were never used.
  • More copies. This time by other methods of duplicating data in Alchemy.

There was a single cause for all three inefficiencies

What was the cause you ask?

Stupidity? No, and that’s harsh.

More like ignorance, or not completely understanding a language feature. There were a few cases of oversight as well.

My use of references

I like references, because they simplify code in many circumstances. Just like pointers, they have limitations and some scenarios to be aware of. However, if your data will live on the stack while it is passed into a function as a parameter, they are fantastic. I have even converted verified pointers to references to make my syntax and code cleaner.

Now realize that a reference looks and behaves exactly like a pointer in the generated machine code. It is simply a bit of syntactic sugar that eases the semantics of our code. I was well aware of this during my development.
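
A quick way to convince yourself of this: compile these two functions and compare the generated assembly. On mainstream compilers, they produce identical instructions:

C++

// The reference is a pointer under the hood; both functions
// compile to the same machine code on typical compilers.
int increment_by_pointer(int* p)   { return *p + 1; }
int increment_by_reference(int& r) { return r + 1; }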

My slip-up occurred in the template functions that I created to request fields by index. The compiler wasn't happy with me simply passing in a single index and no parameters from which to deduce which template instantiation I was actually after. So I relented, and added an unused parameter to the function declaration. I have noticed that in this context pointers-to-the-type are generally used.

I was stubborn and stuck with the references, because I like them better when I can use them. I initialized the unused parameter with a default instance of the field-type, and moved on to the next task. Here is the problem. A Datum entry is defined similarly to this inside of a message format structure:

C++

// For this example:
//   Hg::TypeAt<0,format_type>::type is uint32_t.
typedef
  Hg::detail::DeduceProxyType
    < 0,
      format_type
    >::type                   Proxylen;
typedef Proxylen::datum_type  datum_len;
 
Proxylen                      len;
 
datum_len& FieldAtIndex(const datum_len&)
{
  return *static_cast<datum_len *>(&len);
}

There are a few typedefs that simplify the declarations for data types and functions. One of them is the named-variable the user interacts with, which is actually a proxy object. Then there is a function called FieldAtIndex that facilitates the compile-time reflection, and allows the address of the field for this index in the TypeList to be returned. For a uint32_t, like in this example, the default constructor will basically create a zero filled buffer when called by the parent container.

My reasoning failed to account for the large nested structure types and arrays for which I added support. FieldAtIndex was another function reported as a bottleneck by my profiler. When I stepped through the code in the debugger, I realized my mistake.

References cannot be NULL. When I instantiated the function as shown in the example below, a temporary object was created, every time. This is very inefficient for something that is never even referenced.

C++

template< size_t IDX>
Datum<IDX, format_type>& FieldAt()
{
  typedef Datum   < IDX,
                    format_type>    datum_type_t;
 
  return this_type::FieldAtIndex(datum_type_t());
}

Now I understand why pointers are used in this context. I modified my implementation to use a pointer in this declaration and I also changed the way this function is invoked.

C++

// New Declaration
datum_len& FieldAtIndex(const datum_len*)
{ /* ... */ }
 
// New Instantiation
template< size_t IDX>
Datum<IDX, format_type>& FieldAt()
{
  // ...
  return this_type::FieldAtIndex((datum_type_t*)0);
}

I searched for any unnecessary copies

I fired up the benchmark test again, and saw another enormous boost in performance. I continued to comb through the code looking for inefficiencies, then running the tests, slowly improving the overall performance. In each iteration I made sure that I assigned values to references if they were only temporaries, or were going to be passed into another function that requires a reference. I found a handful of cases, primarily located in my specialized implementations of the byte_order, pack, and unpack calls.

Finally, I could see Alchemy’s potential

I was much more pleased with the results after all of the adjustments that I made above. Some of them I was directed towards with the help of a profiler, others were realizations of inefficient techniques as I was perusing my code.

These are the first benchmarks that I released:

[Chart: benchmark results from the first release]

Three of the four tests performed very well at this point. The only thing that troubled me was that the nested-type fields were consistently 25%-33% slower. This is the performance that I lived with for the next few months. I figured that I would revisit optimizations after some time away from these improvements. The benchmark/optimization process up to this point took about 3 weeks.

One last change

For about four months I just could not improve the performance of nested types beyond about 25% slower than the hand-written implementation. I tried many modifications, hoping to find something. As I started writing this article I briefly looked over my code once more while adding a few more tests, and I found it!

There was one location where I was making an unnecessary copy rather than taking a reference. This code is located in the byte-order conversion details. The functor that I create and pass into the ForEachType meta-function is initialized with an input message and an output message that accepts the result. I was making a copy of the const input message rather than storing it as a reference.

The change is shown below:

C++

// The offending declaration
  const from_message_type        input;
 
  // The input field is passed in as a reference ...
  explicit
    ByteOrderConversionFunctor(
      const from_message_type& from,
            to_message_type&   to)
    : input(from)
    , output(to)
  { }
 
  // Modify the data member to be a const reference:
  const from_message_type&       input;

Generally, the performance is now comparable to, if not better than, the hand-written implementation.

Alchemy FTW!

Current Results

[Chart: current benchmark results]

This test reveals an area where Alchemy will need to focus at some point in the future. The extra processing that is performed just for the copy adds an enormous amount of overhead.

Under-performing

Slow Debug mode

As for debug mode: processing large sets of data is extremely slow because of all of the layers of abstraction. In release builds, most of the layers simply disappear as variables are assigned directly inline, and the code compiles almost to the level of a dedicated hand-optimized solution for each new type.

I have plans to make it simple to generate an Alchemy library that contains message data types and their conversion functions. These objects could then be compiled into a release build for use when debugging other regions of your applications.

Summary

I realize that the benchmarks I have created are not a comprehensive test of the library. That is why I still have a few doubts about claiming to have discovered the process of Cold Fusion for software; one of my colleagues suggested I simply claim to have discovered Luke-warm Fusion for now. I would like to add quite a few more tests, as well as include some examples that are adapted from real-world situations. I will continue to update the status on GitHub, and post any notable improvements here at Code of the Damned.

I am working on a dedicated landing page for Alchemy to consolidate all of the information that I have catalogued up to this point. I will also place the latest performance results there when I activate that page.

Alchemy: Array / Vector Serialization

C++, maintainability, Alchemy, design

A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called, Network Alchemy[^]. Alchemy performs low-level data serialization with compile-time reflection. It is written in C++ using template meta-programming.

The alterations required up to this point to integrate arrays and vectors into Alchemy have been relatively minor. That does not mean that the solutions were clean and simple from the beginning. The exercise of integrating serialization support for these containers was quite challenging, especially because of the possibilities they created for flexible data management.

Rabbit holes

Because of the foundation work that I laid out for the array and vector, I had the simple serialization scenarios working fairly quickly. It was only once I peered deeper into testing that I realized what challenges were ahead of me. After I worked through each problem, another one, or possibly three, appeared behind it.

I found myself on a journey down a rabbit hole.

The problem with a rabbit hole is that you start following it, and you have almost no context to look back and see how far you have actually travelled. Even worse, you have no feel for how much deeper it goes.

I followed the rabbit hole to solve two problems. One path I followed, and followed, and followed, and eventually abandoned. However, it branched off early into the other path, which became a necessity to make these sequential containers viable and useful constructs.

Abandoned path

The path that I ended up abandoning was the situation where you create a sequence of BitList types. I initially started down this path because of the size of the BitList; v2 specifically. The BitList as an object takes up much more space than its data does by itself. However, accessing the data alone does not provide the accessor functions that make the BitList valuable.

So I created specialized BitList sequence containers. These worked to some degree. However, they were extremely slow. They also added an enormous amount of complexity to a mostly clean solution. Moreover, it was only a partial solution. There were many more special cases that I would need to code around if I continued to follow this path.

In the end I exclaimed out loud with certainty to my computer screen: "Using an array of bit-fields would be a fool's errand! Whoever builds a message like this should reconsider and make a better design." So I left the code for these containers in place, just in case I decided to pursue this path again in the future.

It turns out that I eventually ended up on that fool's quest by accident. I was using an Alchemy message to read in a bitmap, and I wanted to perform pixel-by-pixel manipulations. I did find a better way, and you will read about it when I add my post regarding the demonstration application: "Steganography". (BTW, no! I didn't end up creating the BlitList... but I'll have to consider that.)

Necessity, the mother of invention

What I discovered chasing the BitList through these containers is that the std::vector has the same problem. Its memory footprint is slightly larger than its data payload. Damnit! So do the nested fields.

It turns out that any type in the top-level message is handled naturally by virtue of the initial implementation. This is also true for any level of traversal into the message with a nested field type. However, the trouble appears if you attempt to place an array, a nested-type, a vector, or a BitList within an array or a vector.

These types would be very cumbersome to use (and therefore probably not used) if I could not support the diverse range of nesting. I worked through the solution for these combinations:

  • Array of Arrays
  • Array of Nested-types
  • Array of Vectors
  • Vector of Arrays
  • Vector of Nested-types

I handled an array or vector of BitLists by converting the BitList to its internal integral type. The full data now fits in the allotted space, but the caller will need to cast or assign the indexed value to the BitList type if they want to access a bit-field value.

As far as attempting to solve the vector of vectors case, I started to believe that I was messing with forces beyond my understanding. For all I know attempting to wrangle in this case could disrupt the entire space-time continuum. I'll leave that to the folks at CERN.

In all seriousness, the scenario of a vector of vectors quickly becomes a very convoluted and complex task (think vector of vectors of vectors). Since the container is a vector, it already requires a run-time based solution. For me to solve this in Alchemy would provide very little value compared to the risk it would introduce to the correctness of the code, as well as its maintainability.

If a user encounters a message where they need to support a vector of vectors, I believe the best approach would be to allocate a single vector of bytes large enough to contain all of the data, and develop a custom processor to meet that specific situation. The door is still open, and we can revisit and attempt to solve this case at any time. However, I believe there are far more valuable additions that can be made to Alchemy than this one.

How to over-stuff a data buffer

The short answer: "With meta-data".

If you have not recognized this by now, or at least stopped and said it out loud: template meta-programming is all about the type information. Type information is meta-data. It is not directly manipulated at run-time; it is the context upon which the compiler builds the processing logic. All assumptions and deductions made by the compiler, code-generator, and linker are based upon the types and constant information supporting the logic of your code.

The reason I have trouble trying to store a vector inside of an array is that extra run-time bookkeeping is necessary to operate the vector as a run-time construct. However, since Alchemy is its own code-generator of sorts, we can make the meta-data part of the logic that is generated by the compiler.

Therefore the data can remain its raw fixed size, and the type of message to be processed provides the context we need to correctly apply the meta-data. Essentially we are using the meta-data and compiler logic to help us encode more data in a buffer than would normally fit.

Transforming the meta-data

Once I realized that it was the run-time types populating the sequence containers, I started to search for a solution: a method that would prevent run-time types from becoming the target type processed when the compile-time reflection algorithms, such as the ForEachType processor, were at work.

At first I started replacing the types at the point-of-execution in the specialized functions. This proved to be the correct approach to solve the problem; however, the cost was too great. Implementing specialized templates for each potential case and performing the work inline turned the source code in this region into a cluttered mess.

I still believed this was the correct approach; I just needed to find a cleaner way to implement the solution. I decided to create a second TypeList that was appropriate for use with the Hg meta-processing, such as the serialization of data. This second list would substitute the most appropriate type within the array and vector to generate the proper logic.

Hg::make_Hg_type_list

I created this reference table to help me understand the collection of relationships that I had to deal with.

Type<T>       Sub-type<S>    Substitution Required
-----------   ------------   --------------------------------------------
Fundamental   n/a            none
BitList       n/a            none
Nested        n/a            Hg::make_Hg_type_list<T>::type

Array         Fundamental    none
Array         BitList        BitList integral type
Array         Nested         Hg::make_Hg_type_list<S>::type
Array         Array          yes / sub-type also processed
Array         Vector         vector / sub-type also processed

Vector        Fundamental    none
Vector        BitList        BitList integral type
Vector        Nested         Hg::make_Hg_type_list<S>::type
Vector        Array          yes / sub-type also processed
Vector        Vector         Warning: potential risk of disrupting the space-time continuum

I will provide a brief overview and a few samples for how I implemented this substitution table. If you want to inspect the full implementation you can find it here on GitHub: Alchemy/Hg/make_Hg_type_list.h[^]

This is the meta-function that processes a TypeList to define a new typedef called type, which holds the appropriate substitutions prescribed by the table listed above.

C++

//  **************************************************
//  @Note:  Type T must be a typelist.
//
template< class T >
struct make_Hg_type_list
{
protected:
  // The next item on the list to process.
  typedef typename                
    Hg::front<T>::type             next_type;
 
  // The remainder of the unprocessed types.
  typedef typename
    Hg::pop_front<T>::type         list_tail;
 
public:                                        
  // Make the Hg compatible list.
  typedef typename                      
    detail::make_Hg_worker< next_type,
                            list_tail,
                            Hg::TypeList<Hg::MT>
                          >::type       type;
};

detail::make_Hg_worker has a base implementation and a terminating specialization. The terminator simply typedefs the last parameterized type; thus the recursion stops (a sketch of the terminator appears after the worker definition below). The default template defines a set of types designed to build a new list that mirrors the original, except that some of the types will be replaced with adjusted versions as needed:

Let's break this one into a few chunks. The typedefs defined in this sample will be used to create the recursive expansion, and the final result:

C++

template< typename T,
          typename TailT,
          typename HgT
        >
struct make_Hg_worker
{
  // The result of inspecting a type and
  // potentially adjusting the type.
  typedef typename
    AdjustType<T>::type           adjusted_type;
 
  // The next item on the list to process.
  typedef typename
    Hg::front<TailT>::type        next_type;
 
  // The remainder of the unprocessed types.
  typedef typename
    Hg::pop_front<TailT>::type    list_tail;
 
  // ...

This is the remainder of the worker template. The first typedef is the recursive call to complete the processing of the other fields. The typedef for Hg::push_front defines the final replacement TypeList.

C++

// ...
 
  // A recursively built list of processed
  // types that have been adjusted to be
  // compatible for Hg message types.
  typedef typename                      
    make_Hg_worker< next_type,          
                    list_tail,          
                    HgT                
                  >::type               hg_list;
 
  // The list of types converted for Hg.
  typedef typename  
    Hg::push_front< hg_list,
                    adjusted_type      
                  >::type               type;
};
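
For completeness, here is a sketch of the terminating specialization mentioned earlier. This is an assumption based on the description above; the actual definition lives in make_Hg_type_list.h on GitHub. It assumes that the end-of-list marker Hg::MT signals termination, at which point the accumulated list becomes the result:

C++

// Sketch: when the next type to process is the
// empty-list marker, the recursion stops and the
// last parameterized type is the final result.
template< typename TailT,
          typename HgT
        >
struct make_Hg_worker< Hg::MT, TailT, HgT >
{
  typedef HgT                             type;
};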

There are three more templates worth describing; the remaining definitions are a set of template specializations that are used to fulfill the requirements mapped out in the table above. The first template is a simple adapter that I included for clarity. Its contents could have been placed completely inline inside of the make_Hg_worker template. However, I believe it makes the final solution much clearer. At the same time, it moves all type replacement logic to a single point outside of the main worker function.

C++

template< class T >
struct AdjustType
{
  typedef typename
    ReplaceType < T,
                  typename DeduceTypeTrait<T>::type
                >::type                 type;
};

The next sample is one specialization of a ReplaceType template. A specialization of this template exists for each Alchemy supported data type. This sample demonstrates the vector replacement.

C++

template< class VectorT >
struct ReplaceType< VectorT, vector_trait>
{
  // Extract the value type declaration from inside the vector declaration.
  typedef typename
    VectorT::value_type                 value_type;
 
  // Deduce the type trait of value_type
  // for declaration of the type sequence.
  typedef typename
    DeduceTypeTrait<value_type>::type   type_trait;
 
  typedef typename
    DeclareTypeSequence < VectorT,
                          type_trait
                        >::type         type;
};

The final typedef in the previous sample uses the final template meta-function DeclareTypeSequence to define the replacement type. DeclareTypeSequence is actually a simple declaration, which redefines an array or vector to use a type that will manage the expanded type, rather than the type specified in the original TypeList. This is where the meta-data becomes useful, and is applied to construct the proper run-time structures for managing the serialized data.

C++

template< typename T,
          typename A,
          template <class, class> class VectorT
        >
struct DeclareTypeSequence < VectorT<T,A>, nested_trait  >
{
  // Define a typedef for the actual data-type that
  // represents the type-list T passed into this vector.
  typedef typename
    FieldTypes <T>::value_type          value_type;
 
  typedef
    std::vector< value_type, A>         type;
};

Hg::make_Hg_type_list is instantiated with the same TypeList that is used in a HG_BEGIN_FORMAT definition. The substituted TypeList produced by Hg::make_Hg_type_list is then used in the field_data_t specialization that is created for every Alchemy data-type, which defines the value_type to be used for storing message data.

The substitute TypeList is a solution that evolved over a number of iterative attempts to elegantly solve serialization for the new sequential containers. The next section demonstrates the current serialization implementation used in Alchemy, the result of that evolution.

pack_message

pack_message is the implementation that is called by the Hg::message object to take the run-time data stored within the object, and pack it into a raw-storage buffer in the format specified by the message's TypeList definition.

With the addition of a variable-sized field, two specializations of this function exist. The compiler tag-dispatches to the correct function based on the message's SizeTrait characteristic.

Fixed-size operation

C++

template< typename MsgT,
          typename BufferT,
          typename SizeTraitT
        >
bool
  pack_message( MsgT& msg_values,
                BufferT & fixed_buffer)
{
  return   detail::pack_fixed_size_message( msg_values,
                                            fixed_buffer,
                                            SizeTraitT());
}

Variable-size operation

C++

template< typename MsgT,
          typename BufferT,
          typename SizeTraitT
        >
size_t pack_message(MsgT &msg_values,
                    BufferT  &buffer,
                    size_t    offset)
{
  return detail::pack_message < MsgT,
                                BufferT
                              >(msg_values,
                                buffer,
                                offset,
                                SizeTraitT());
}

The unpack_message template functions are similarly defined.
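
As a sketch of that symmetry, the fixed-size unpack overload would look something like the following, assuming the unpack side mirrors the pack side shown above (detail::unpack_fixed_size_message is an assumed name, the counterpart of detail::pack_fixed_size_message):

C++

template< typename MsgT,
          typename BufferT,
          typename SizeTraitT
        >
bool
  unpack_message( MsgT    & msg_values,
                  BufferT & fixed_buffer)
{
  // Assumed counterpart of detail::pack_fixed_size_message.
  return detail::unpack_fixed_size_message( msg_values,
                                            fixed_buffer,
                                            SizeTraitT());
}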

detail::pack_message

Both versions of Hg::pack_message call a different version of the internal detail implementation. The difference between these two functions is in how they initialize the message buffer that receives the serialized data. The variable-sized implementation prepares for a process that may require dynamic offsets, based on the placement and size of vectors defined in the middle of the message. Once either version initializes the PackMessageWorker, the recursive process of serializing the message data begins.

C++

template< typename MsgT,
          typename BufferT
        >
BufferT&
  pack_message( MsgT      &msg_values,
                size_t     size,
                BufferT  & buffer,
                const dynamic_size_trait&)
{
  // Memory is only allocated for
  // the variable-sized message.
  buffer.resize(size);
 
  detail::PackMessageWorker < 0,
                              Hg::length<typename MsgT::format_type>::value,
                              MsgT,
                              BufferT
                            > pack;
  // This additional parameter tracks the dynamic
  // adjustments of variable-sized messages.
  size_t dynamic_offset = 0;
  pack(msg_values,  buffer, dynamic_offset);
  return buffer;
}

Both versions call the function operator of the same PackMessageWorker object. The only difference is that the variable-sized implementation provides an extra parameter called dynamic_offset, which propagates the total number of bytes accumulated from variable-length fields. This tells the detail-level functions to make the appropriate adjustments with respect to the sizes pre-calculated at compile-time.

C++

template< size_t    Idx,
          size_t    Count,
          typename  MsgT,
          typename  BufferT
        >
struct PackMessageWorker
{
  void operator()(MsgT &message,
                  BufferT  &buffer)
  {
    size_t dynamic_offset = 0;
    WriteDatum< Idx, MsgT, BufferT>(
      message,
      buffer,
      dynamic_offset);
 
    PackMessageWorker < Idx+1, Count, MsgT, BufferT> pack;
    pack(message, buffer);
  }
 
  void operator()(MsgT &message,
                  BufferT  &buffer,
                  size_t   &dynamic_offset)
  { /* ... */ }
};

Where is ForEachType<T>?

The ForEachType construct is not used at this level because of the depth and diversity of the combination of message structures that are possible. This construct is convenient when the same operation must be performed on all fields, and the meta-function supplied to ForEachType does not contain mutable inputs, such as dynamic_offset.

Therefore, the familiar recursive processing of each indexed field is performed through the end of the TypeList of parameters. The recursive loop first writes the current Datum, then makes the recursive call to the next field. There is a PackDatum and UnpackDatum specialization created for each type.
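
The recursion also needs a terminating point once every field has been processed. A specialization along these lines would accomplish that; this is a sketch whose exact form is an assumption based on the worker shown above:

C++

// Sketch: Idx has reached Count; no fields remain
// to write, so both overloads do nothing.
template< size_t    Count,
          typename  MsgT,
          typename  BufferT
        >
struct PackMessageWorker< Count, Count, MsgT, BufferT >
{
  void operator()(MsgT&, BufferT&)
  { }

  void operator()(MsgT&, BufferT&, size_t&)
  { }
};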

PackDatum / UnpackDatum

The PackDatum specializations for the array and vector types contain the more complicated implementations. This is because once they begin serializing their data, they must inspect each sub-field and determine whether it can be written directly, or must be placed into a new round of pack_message calls. Then, based upon the size_trait of the sequence container's sub-type, the most efficient implementation is selected.

Here is the PackDatum functor for a fundamental-type:

C++

template< size_t   IdxT,      
          typename MsgT,
          typename BufferT,
          typename TraitT
        >
struct PackDatum
{
  void operator()(MsgT     &msg,
                  BufferT  &buffer,
                  size_t    dynamic_offset)
  {
    // Removed typedefs for brevity
    value_type &value =
      msg.template FieldAt<IdxT>().get();
    size_t offset = Hg::OffsetOf
                       < IdxT,
                         typename MsgT::format_type
                       >::value
                     + dynamic_offset;
    buffer.set_data(value, offset);
  }
};

PackDatum for Array and Vector

I am not going to go into much deeper detail regarding the array and vector PackDatum implementations, because they could fill an entire blog entry on their own. However, I will elucidate the basic flow for these constructs.

The primary difference in their functor is another layer of indirection, where a function called SerializeArray or SerializeVector is called depending upon the type. The number of bytes written is captured and added to the dynamic_offset count.

The SerializeSequence function invokes a specialized functor based upon the sub-type contained within the sequence container. There is an implementation specialized for each Datum type supported in Alchemy. This is the single point of code in the entire library that seems too tightly coupled, as the array and vector require knowledge and processing specific to each type supported by the library.

These implementations do choose between item-by-item transfers or bulk-transfers based on the type. I have this section of code marked to return to and attempt to further de-couple the type of processing required to properly serialize these types.
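
To make that flow concrete, here is a rough sketch of the dispatch just described. SerializeSequenceWorker is a hypothetical name invented for this illustration; the actual SerializeVector and SerializeArray implementations on GitHub are more involved:

C++

// Illustrative only; the worker name is hypothetical.
template< typename VectorT,
          typename BufferT
        >
size_t SerializeVector( const VectorT &value,
                        BufferT       &buffer,
                        size_t         offset)
{
  typedef typename VectorT::value_type  value_type;
  typedef typename
    DeduceTypeTrait<value_type>::type   type_trait;

  // A worker specialization exists for each supported
  // sub-type; each returns the number of bytes written.
  SerializeSequenceWorker< value_type,
                           BufferT,
                           type_trait
                         > worker;
  return worker(value, buffer, offset);
}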

For now I find a little bit of solace knowing that this coupled behavior is found at the end of a processing chain as opposed to the central processing hub for how messages are structured. Therefore, I feel comfortable letting the library continue to grow for a short while before I decide the best course of corrective action.

If you would like to view the pack / unpack implementations for the array or vector, you can find them here on GitHub.

Summary

Reworking the serialization system to support a sequence of types has been one of the most challenging tasks that I have faced in Alchemy up to this point; in part because I was forced to rethink conventions that I had already put in place, and in part because of the fundamental change in how buffers are accounted for and managed in order to support messages of variable size.

I am content with its current state, although, as I stated, there are some portions of the sequence serializer code that I want to revisit. I would like to further decouple the solution, as well as work in additional optimizations.

Speaking of optimizations, I am excited for the next post that I will be writing about Alchemy, the performance benchmarks. I compare a small set of scenarios defined in Alchemy with a naïve, hand-written serializer that uses 1-byte aligned packed structures, and the byte-order conversion functions that are included with network libraries. As a teaser, my first comparison run was abysmal. I even contemplated discontinuing the project until I did a little bit of analysis to find the root cause of my bottlenecks.

Needless to say, I would not still be developing Alchemy if I had not been able to overcome the limitations of my first implementation. I will describe what I found, what I changed, and what I learned about improving the speed of execution.

Alchemy: Vectors

adaptability, C++, Alchemy, design Send feedback »

A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called Network Alchemy[^]. Alchemy performs low-level data serialization with compile-time reflection. It is written in C++ using template meta-programming.

It's time to break a barrier that has existed within Alchemy since its inception: messages with fixed sizes. While the storage policy concept that I use with the message buffer allows Alchemy to dynamically allocate memory for messages, the current structure of the library only allows messages whose size is known at compile-time.

There is already so much value in what Alchemy is capable of accomplishing, even with the static size limitation. However, the only way for Alchemy to expand and reach its potential is to remove this limitation and provide support for dynamically sized messages. This entry will demonstrate the changes that were required to achieve this goal.

Why size matters

Size is a fundamental property in the realm of data serialization. This is especially true for wire-transfer protocols. All of the meta-functions originally developed for Alchemy depend upon the size of the field type in one way or another. Therefore, opening up the possibility for field lengths to differ from message to message means that basic assumptions about the amount of available space are no longer valid.

The size of fundamental types is known at compile-time thanks to the sizeof operator. The offset of a field is known because the sizes of all fields located before the current field in the TypeList are added together. The total message size is calculated by adding all of the message's field sizes together.
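
That reasoning can be expressed directly as a recursive meta-function. The sketch below is a simplification; Alchemy's actual Hg::OffsetOf differs in detail, and TypeAt is an assumed helper that yields the type at a given index of a TypeList:

C++

// Sketch: the offset of field N is the offset of
// field N-1 plus the size of field N-1.
template< size_t IdxT, typename FormatT >
struct OffsetOf
{
  static const size_t value =
      OffsetOf<IdxT - 1, FormatT>::value
    + Hg::SizeOf<typename TypeAt<IdxT - 1, FormatT>::type>::value;
};

// The first field always starts at offset zero.
template< typename FormatT >
struct OffsetOf<0, FormatT>
{
  static const size_t value = 0;
};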

To be completely accurate, it is not the actual size of the message that concerns us; it is its ability to vary in size.

These assumptions allow us to write the code in such a way that basic checks are placed up-front, and error handling can be largely ignored through the remainder of the processing. This has the effect of simplifying the code in the interior. Less clutter could mean more readable and maintainable code. Furthermore, less code would presumably mean faster execution times.

Finally, it is entirely possible for the compiler to take advantage of the fixed length and position of each field to perform optimizations. The only way one can be certain is to investigate what the compiler actually generates, which I have not done yet; at least with regards to optimizations. If a field inside the message could have a variable size, the compiler can no longer use fixed-length assumptions to optimize code. This also holds true for fields that are considered optional in a message; their presence is not required.

While working through this endeavor, I will remain completely cognizant of what needs to be altered, and what can remain unchanged. A task such as this can easily become a shotgun surgery operation, and that only invites chaos. Chaos is not welcomed in computing unless you're performing simulations, even then only pseudo-chaos is welcomed.

std::vector

I would like to make a few brief comments regarding the std::vector. This is the most flexible container in the standard library. If you are new to C++, or have had an aversion to angle-brackets through your C++ career, you owe it to yourself to learn this container.

The std::vector has a very small memory foot-print. It is extremely efficient in practice due to its guarantee to be stored in contiguous memory, which improves cache coherence through spatial locality. It should feel natural to anyone that is already familiar with basic arrays; in fact it can be used any place where an array can be used. Most of all, it can improve the reliability of your programs because it carries the added bonus of managing its own dynamic memory allocation, and it can perform checked access operations.

Before I started my investigation for how to introduce a variable-sized field in Alchemy, I tried to consider how it would be used. I had only one or two simple use cases in mind because of questions colleagues had asked me about its capabilities. With very little known about what to expect, I quickly thought of the std::vector and its versatility.

Although a few unknowns still remained, I imagined that an interface modelled after the std::vector would provide all of the flexibility that one would need for interacting with a variable-sized field. Modelling the std::vector would also provide the additional bonus that most developers would already be familiar with the interface, which makes this decision seem all the more prudent.

Other considerations

When working with a native array or a std::vector, the size of the structure is generally obtainable by some means associated with the data type itself. However, when a raw data buffer is received, there are very few assumptions that can be made about the contents of the data. In most cases, the data needs to be de-serialized and verified as it is read into useful data types.

How is Alchemy supposed to determine how many bytes to assign to a std::vector field before populating the fields that follow afterwards in the message?

I looked at many different message structure formats that I was familiar with. I recognized that there are generally two different methods by which the size of a variable-length payload is specified.

Explicit message field

Supplying a field that indicates the size of a payload, or of a specific field in the message structure, is a method that appears frequently. This type of solution should be simple enough to accommodate.

To implement this, I imagined that I would supply a third field in the vector type declaration MACRO, similar to the HG_ARRAY MACRO. This third field would allow the user to specify the name of a field that was defined earlier in the structure. Then, rather than using a fixed number like the array, a value could be read from that field at run-time. The value specified in the field would need to indicate the number of elements in the vector field, and not its raw size in bytes.

One possible complication that I can foresee: what if the specified count actually lives in a parent structure that contains a nested field with the vector type?

Run-time parsing

I have used many objects and libraries in my career where the developer implemented the bare minimum necessary to be able to make their library useful. However, they stopped short of providing a complete solution. To simply stop at the pre-defined field and state, "This is how the API must be used", would be succumbing to that same mentality. There are quite a few applications Alchemy could not support if it did not provide a way to determine the size of a dynamic buffer at run-time.

Well I don't want to create a half-assed API. There is no simple way to automatically handle this detail for the user; there are simply too many possibilities to attempt to provide a solution in this manner. While the original goal of Alchemy is to simplify and automate data serialization, that does not mean that every corner case must be handled automatically.

Therefore, I will not try to automatically solve all of the potential ways a user's buffer size may be specified. Instead, I will provide a hook that the user can use to register a callback function if they want to supply this type of solution themselves. This still allows Alchemy to provide a complete solution. Furthermore, if the field size is actually specified in bytes, the user can supply a callback that converts the buffer size to an element count.

Dynamic-length declaration in Alchemy

In many ways the declaration for a vector in Alchemy is much simpler than the array, even though so much more is happening. The reason is that there is only one possible way to define a vector: the HG_DYNAMIC MACRO. This MACRO has the same usage as HG_ARRAY; there are three fields: 1) type, 2) count and 3) name.

The only complication is that the count field must either map to a previous integer-typed field in the same structure, or it must be a callback function.

Definition MACROS

The vector field definition MACROs are used similarly to the HG_ARRAY.

C++

HG_BEGIN_FORMAT(Msg_t)
  ...
  HG_DATUM  (size_t,     len)
  HG_DYNAMIC(char,  len, path)
  ...
HG_END_FORMAT(Msg_t)

I have also provided a version of the MACRO that allows the user to specify an allocator to use with the dynamic field. The sample below depicts the user-defined vector size callback being used to report the size.

C++

//  ***************************************************
//  The function signature of the callback used to
//  determine the size of a dynamic-length field.
//
//  @param p   The entire input message buffer.
//  @param n   The length of the input message.
//
//  @return    The calculated length of the dynamic field.
//
typedef
  size_t (*pfnGetDatumSize)(const uint8_t* p, size_t n);

This sample demonstrates extracting a collection of NULL terminated strings from a buffer, where the last string in the buffer is double-NULL terminated.

C++

//  ***************************************************
size_t StringCount(const uint8_t* pBuffer, size_t length)
{
  size_t count = 0;
 
  char const* pCur =  reinterpret_cast<char const*>(pBuffer);
  char const* pLast = pCur;
  std::advance(pLast, length);
 
  size_t  len = 1;
  while (pCur < pLast)
  {
    len = std::char_traits<char>::length(pCur);
    if (0 == len)
      break;
 
    ++count;
    // Advance past this current item, including the NULL.
    std::advance(pCur, len+1);
  }
 
  return count;
}

C++

typedef std::vector<char>         char_str;
 
HG_BEGIN_FORMAT(Msg_t)
  ...
  HG_DYNAMIC(char_str,  StringCount, items)
  ...
HG_END_FORMAT(Msg_t)

MACRO implementation

How does one MACRO handle both definitions?

A pair of private function templates are defined within the message structure that is created by the HG_BEGIN_FORMAT definition. These functions are defined as template functions for two reasons:

  1. The correct function type will be instantiated based on the type of field that is declared.
  2. An instance of these functions will only be instantiated if a dynamic field is declared in the message.

Here are the definitions of the functions that are added to the end of the message declaration:

C++

private:
  template< typename T,
            typename U
          >
  size_t DatumSize(T value, U*)
  {
    return value;
  }
 
  template< typename U >
  size_t DatumSize(pfnGetDatumSize ftor, U* buffer)
  {
    if (buffer->empty())
    {
      return 0;
    }
 
    return ftor(buffer->data(), buffer->size());
  }

Delicate integration

What could be so delicate?

In my experience, when facing a feature addition such as this, where an entirely new fundamental concept is added to existing software, it can quickly turn into shotgun surgery. I didn't want to jump in and alter every file in order to weave in the ability to manage dynamic allocation. As I mentioned earlier, I thought of a way to break my big problem into two smaller problems, one of which had already been solved: static-sized fields.

The question that remains is how I can best differentiate between the different portions of my message. As much as possible, I still want to allow the compiler to resolve actions when the code is generated. Once again, a solution based on tag-dispatching seemed the most appropriate.

I could create a meta-function to determine if a field was a statically-sized field, or dynamically-sized. Furthermore, I could test each field in a message to determine if the entire message could have a variable size, or if it was always a fixed size. This solution would allow me to still take advantage of every possible optimization the compiler could muster for fixed-size messages, and still be able to accommodate dynamic messages.

Differentiate between static and dynamic sizes

If you recall my description of the tag-dispatch system I set up for the array type, I specified a static_size_trait for the array type. I have also declared a similar type-trait for dynamic fields like the vector, called dynamic_size_trait. I am not going to repeat all of the tag-dispatch definitions that I created for the vector, because they are very similar.

Instead, here is the collection of meta-functions that are used to determine the size type of a message, beginning with the top-level meta-function that reports the size trait for the message:

C++

//  ***************************************************
/// Indicates the type size the specified
/// message is, static or dynamic.
///
/// @tparam T  A TypeList definition.
///
/// @return   A typedef called *type* is defined
///              to return the size trait.
///               - static_size_trait: a fixed-size message.
///               - dynamic_size_trait: Dynamically-sized
///                 message. At least some portion
///                 requires runtime processing to  
///                 determine the full size of the message.
///
template< typename T >
struct message_size_trait
  : std::conditional
    < has_dynamic<T>::value,
      dynamic_size_trait,
      static_size_trait
    >
{ };

The previous meta-function is used in many locations where it is important for the selection of an algorithm or other meta-processing functions. Inside of the std::conditional statement, you can see a meta-function called has_dynamic. This function is simple enough. It actually calls another meta-function that I hide in the Hg::detail namespace, and this function performs a recursive survey of the types in the TypeList for dynamically-sized fields.

C++

//  ***************************************************
template< typename T >
struct has_dynamic
  : detail::HasDynamic_Impl<T,
                             type_container<T>::value
                           >
{ };

HasDynamic Implementation

We are working with a diverse set of types. Some of the types are basic fundamental types, while others are containers of types. Let's start with a simple default template that will handle all non-container types:

C++

template< typename T,
          bool isContainer = false
        >
struct HasDynamic_Impl
  : std::false_type
{ };

Let's handle the next simplest case: true for the vector, as the vector is currently the only type that is sized dynamically.

C++

// Explicit HasDynamic implementation for vectors
template< class T,
          typename A,
          bool  isContainerT
        >
struct HasDynamic_Impl< std::vector<T,A>,
                        isContainerT
                      >
  : std::true_type
{ };

The last portion of this implementation requires a test for all other type containers. The only two type containers that remain are the array and the TypeList. The function below tests the type at the front of the list to determine if it is a vector. If so, true is returned, and the function is complete. Otherwise, the first element in the list is removed, and the remainder of the list is tested for dynamically sized fields.

C++

// HasDynamic implementation for type_containers
template< typename T>
struct HasDynamic_Impl<T, true>
  : value_if< vector_value< typename front<T>::type>::value,
              bool,
              true,
              has_dynamic<typename pop_front<T>::type>::value
            >
{ };
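
The value_if helper used above selects between two compile-time values based on a predicate. A minimal sketch consistent with that usage is shown below; this is an assumption, and the library's actual definition may differ:

C++

// Sketch: yields the integral constant TrueV when the
// predicate holds, otherwise FalseV.
template< bool     PredT,
          typename T,
          T        TrueV,
          T        FalseV
        >
struct value_if
  : std::conditional< PredT,
                      std::integral_constant<T, TrueV>,
                      std::integral_constant<T, FalseV>
                    >::type
{ };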

Calculate the size of a message

Now we have the ability to discriminate between fixed and variably sized messages. Previously, when a buffer was required for a fixed-size message, the size() member function could be called to return the constant value that was pre-calculated from the TypeList for which the message was instantiated. A new mechanism is required to incorporate dynamic size calculations, without disrupting the existing interface.

We can use the has_dynamic meta-function to select the type of size calculation function to use when calculating the message size. The first step for integration is to replace the size() member function to call a new function based on the result of has_dynamic for the current message type.

C++

size_t Message::size() const
{
  return Msg_size<this_type,
                  k_has_dynamic>::calculate(*this);
}
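
The constant k_has_dynamic is presumably a compile-time boolean defined within the message type, along the lines of this assumed definition (for illustration only):

C++

// Assumed definition: true when any field in the
// message format is dynamically sized.
static const bool k_has_dynamic =
  Hg::has_dynamic<format_type>::value;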

Message::size() calls one of two external meta-functions that are implemented to calculate the size of a fixed-length message and a variable-length message. Here is the implementation for the fixed-length calculation. It is basically the implementation that was previously used in Message::size(), when all messages were a fixed-length.

C++

template< class HgMessageT >
struct Msg_size<HgMessageT, false>
{
  typedef HgMessageT message_t;
 
  static size_t calculate(const message_t &msg)
  {
    return Hg::SizeOf<typename HgMessageT::format_type>::value;
  }
};

The variable-length function looks like it is quite a bit more complicated. However, it starts with the same implementation that is used to calculate a fixed-length message. A few typedefs are created for clarity. Finally, a function designed to accumulate the sum of the length for all variable-length fields is called. This function is called dynamic_size_of. The sum of the fixed-length and variable-length calculations is returned as the total size of this message.

C++

template< class HgMessageT,
          bool  has_dynamic
        >
struct Msg_size
{
  typedef HgMessageT message_t;
 
  static size_t calculate(const message_t &msg)
  {
    typedef typename
      message_t::message_type     message_type;
    typedef typename
      message_t::byte_order_type  byte_order_type;
    typedef typename
      message_t::storage_type     storage_type;
 
    size_t fixed_size   =
      Hg::SizeOf<typename HgMessageT::format_type>::value;
    size_t dynamic_size = dynamic_size_of<message_type,
                                          byte_order_type,
                                          storage_type
                                        >(msg);
    return fixed_size + dynamic_size;
  }
};

Hg::dynamic_size_of

dynamic_size_of is a function template, and not a meta-function. This function must be executed at run-time because it is used to inspect incoming data buffers to determine how much space to allocate for the vector fields.

Two implementations of this function exist. Both implementations end up calling the same internal worker function with the same set of parameters. Two versions of this function are required to be able to properly handle nested messages. A top-level message is instantiated inside of the Hg::Message class. Hg::Message handles the memory management and provides the natural syntax interface for interacting with the child fields of a message.

When a message contains nested-messages, there is no need to wrap the nested version in its own Hg::Message object. In fact, not only would it further complicate most of the logic to do so, it would also be less efficient because the internal memory buffer allocated by Hg::Message would be a collection of smaller message buffers, rather than one continuous buffer shared by all message fields.

This concept is one of the trickier problems that I have encountered in Alchemy. The problem did not appear until I started to really dig deep and create pathological unit-tests for the library. Furthermore, the solution was not immediately apparent. While I knew that a problem existed, because I was not getting the size results that I had calculated, I had a difficult time tracking down the cause of the problem.

The two definitions for Hg::dynamic_size_of are listed below:

C++

//  ***************************************************
template< typename T >
size_t dynamic_size_of(const T& msg)
{
  return detail::DynamicSizeWorker< T,
                                    has_dynamic<T>::value
                                  >().size(msg);
}
 
//  ***************************************************
template< typename MessageT,
          typename ByteOrderT,
          typename StorageT
        >
size_t dynamic_size_of(const Message< MessageT,
                                      ByteOrderT,
                                      StorageT
                                    >& msg)
{
  return detail::DynamicSizeWorker
           < MessageT,
             has_dynamic<
               typename MessageT::format_type>::value
           >().size(msg);
}

DynamicSizeWorker

Finally we have reached the workhorse for calculating the length of all the dynamically sized fields in a message. This function works very similarly to the byte-order conversion function that I demonstrated in a previous post. A functor designed to calculate the size of each variable-length field is initialized. Then the meta-function ForEachType is used to invoke this functor on each field in the message.

This function will only be called if a message is determined to have a dynamic-length field. This is another reason why there are two versions of Hg::dynamic_size_of. A top-level Hg::Message may have been determined to contain a variable-length field, and it may also contain nested fields. However, if there are no variable-length fields in any of the nested fields, it is not necessary to inspect them for the dynamic size calculation.

This layering approach allows the best function to be selected at every level of the calculation, based on its type-traits. The extra calculations are only paid for where the feature is used. Because the compiler sets up the dispatch of these calls, there is not even a compare-and-branch instruction to pay for, as there would be with a run-time if statement.

This is the worker function that invokes the ForEachType meta-function.

C++

template< typename MessageT,
          bool     IsDynamicT
        >
struct DynamicSizeWorker
{
  static
    size_t size(const MessageT& msg)
  {
    typedef MessageT                      message_type;
    typedef typename  
      message_type::format_type           format_type;
 
    // Initialize a functor to query for the
    // dynamic size of each field.
    detail::DynamicSizeFunctor< message_type > ftor(msg);
    Hg::ForEachType < 0,
                      Hg::length<format_type>::value - 1,
                      format_type
                    > (ftor);
 
    return ftor.size();  
  }
};
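
Presumably a companion specialization short-circuits the calculation for messages without any dynamic fields, along these lines (an assumption consistent with the layering described above):

C++

template< typename MessageT >
struct DynamicSizeWorker< MessageT, false >
{
  // No dynamically-sized fields exist; nothing
  // contributes beyond the fixed message size.
  static
    size_t size(const MessageT&)
  {
    return 0;
  }
};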

I am not going to demonstrate the contents of the functor because it is very similar in structure and content to the ByteOrderConverter functor.

One important difference to note is that it is possible for a vector to contain a collection of nested message fields. This required me to introduce another layer of indirection to first detect this scenario, then properly process each nested message, but only if it contains variable-length parameters itself.

Here is a link to the file that holds this logic if you would like to take a closer look for yourself: Alchemy/Hg/detail/message_dynamic_detail.h[^]

Results

I am very pleased with the final results that I was able to achieve with the vector. The path to a robust solution was not immediately clear to me. With a little bit of thought I was able to discover a suitable approach. As always when solving programming problems, the problem must be broken down into a smaller set of problems.

The solution felt very attainable once I recognized that I would only need to create a break in my current implementation at each location in the message where a dynamically sized field occurred. Compare this to my original fears that I would need to convert the entire system to a run-time calculated solution.

Summary

Integrating memory support for a dynamically-sized field turned out to be simpler than I had feared. I was able to create an elegant solution that did not require any modifications to the other data types. This demonstrated to me that this portion of my library truly is orthogonal; a good sign for a maintainable code-base into the future.

There is one more step that I had to complete before I had full support for both arrays and vectors. This was the addition of pack and unpack serialization calls. The problems were not necessarily difficult to solve. The challenges started when I recognized the depth to which elements could be nested inside of both messages and the new sequence containers that I had created. Both containers required the same solution to this problem. Therefore, I will tackle the description for both the array and vector serialization logic in the next Alchemy entry.

Alchemy: Arrays

adaptability, C++, Alchemy, design Send feedback »

A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called Network Alchemy[^]. Alchemy performs low-level data serialization with compile-time reflection. It is written in C++ using template meta-programming.

Once Alchemy was functional and supported a fundamental set of types, other development teams in my department approached me about using Alchemy on their product. Unfortunately, there was one type I had not given any consideration to up to this point: arrays. This group needed the ability to have variable-sized messages, where the array payload started at the last byte of the fixed-format message. At that point, I had no clean solution to help deal with that problem.

Expanding the type support

This request brought two new issues to my attention that I had never considered before. Both of these features needed to be added to Alchemy for it to be a generally useful library.

  1. Support for arrays or sequences of the same type
  2. Support for variable-sized structures. Every message type in Alchemy is currently fixed in size.

I chose to handle these issues as two separate types, primarily because I believed that allowing fixed-size fields to remain fixed-size definitions would give them a better chance of being optimized by the compiler. I wanted to use a completely separate data type for message definitions that required a dynamically-sized buffer.

I like the C++ Standard Library very much. I also use it as a model when I am designing interfaces or new constructs. Therefore, I decided to mimic the array and vector types from the standard library. I will cover the challenges that I encountered adding support for the array in this entry. The next post will focus on the vector and how I chose to solve the new problem of dynamically sized message objects.

Defining an Alchemy Array

Even before I could figure out how the code would need to be modified, and what specializations I needed to accommodate an array type, I needed a way to declare an array in both the TypeList and the Hg message definition. The simple and natural candidate was to jump to the basic array syntax:

C++

char data[10];

This is when I discovered the concept of Type Decay[^], and that I had lost the extra array information from my declaration. This problem was simple enough to solve; however, because of the obscure work-around involved, I didn't think this would be a good enough solution. It would be fraught with potential for mistakes.

C++

char (&data)[10];

This is when I decided that while I would allow traditional array declaration syntax, the syntax I would promote would be based on the std::array class.

C++

std::array<char, 10>

Again, this caused one new wrinkle. The std::array definition is not legal in the HG_DATUM MACRO that is used to declare a field entry within the Hg message format, due to the extra comma. The pre-processor is not smart enough to parse through angle-brackets; therefore, it sees three parameters for a MACRO that expects only two.

C++

//       1              2    3
HG_DATUM(std::array<char, 10>, name)
//               -------^    ^-------

[sigh] Once again, this has a simple solution; simply wrap the template argument of the MACRO within parentheses.

And once again [sigh], I thought this would lead to simple errors that would not have an intuitive solution. We are working with the pre-processor, after all. This is one area where I do believe it is best to avoid the pre-processor whenever possible. However, it would just be too painful not to take advantage of its code generating abilities.

The best solution then was to add a new HG_DATUM MACRO type to support arrays.

C++

// Error-prone
// Also creates unintelligible compiler errors
// if the parenthesis are left off the template.
//
// However, both formats are still accepted.
HG_DATUM((std::array<char, 10>), name)
HG_DATUM(char (&)[10], name)
 
// The official array declaration MACRO
// that will be demonstrated throughout
// the documentation and demonstrations.
HG_ARRAY(char, 10, name)

With a simple MACRO to create the correct declaration of the template form for a std::array, I can re-use the error-prone form internally, hidden behind the preferred array MACRO.

C++

#define DECLARE_ARRAY(T,N) (std::array<T ,N>)
#define HG_ARRAY(T, N, P)  HG_DATUM(DECLARE_ARRAY(T,N), P)

Blazing new trails

The difficult first step was now behind me. The next step was to create a DataProxy specialization that would allow the array to behave like an array, yet still interact properly with the underlying Datum that manages a single parameter.

Tag-dispatch

There are many more ways an array can be classified as far as types are concerned for tag-dispatching. Therefore, I then created a set of type-trait definitions to identify and dispatch arrays. I was also dealing with the addition of vectors at the same time, so I will include the definitions that help distinguish arrays.

C++

//  **********************************************
// Indicates the field or message has a fixed
// static size at compile-time.
struct static_size_trait { };
 
//  **********************************************
// Sequence types are a class of types that
// contain more than one element of the same
// type in series.
struct sequence_trait
  : nested_trait      { };
 
//  **********************************************
/// A sequence type that has a fixed size.
struct array_trait
  : sequence_trait    
  , static_size_trait
{ };

These were only the type-definitions that I would use to distinguish the types to be processed. I still needed some discerning meta-functions to identify the traits of a Datum.

C++

//  **********************************************
//  Fixed-Length Homogenous Containers
//  **********************************************
//  Detect a std::array type.
//  This is the default false case.
template< typename T >
struct is_std_array
  : std::false_type
{ };
 
//  **********************************************
//  Affirmative std::array type.
template< typename T,
          size_t   N
        >
struct is_std_array<std::array<T,N> >
  : std::true_type
{ };

We're not done yet. We still need a value test to identify natively-defined array types. For this one, I created a few utility meta-functions that simplified the overall syntax and made the expression easier to read. I believe I have mentioned them before; if not, this is déjà vu. The utility templates are And, Or, and Not; minimal sketches of them appear after the next sample.

C++

//  **********************************************
//  This is the default false case.
//  Detect native array types.
template< typename T>
struct array_value
  : And < Or < std::is_base_of<array_trait, T>,
               is_std_array<T>
             >,
          Not < std::is_base_of<vector_trait, T> >
        >
{ };
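
For reference, here are minimal sketches of those helpers built upon std::integral_constant. The library's actual versions may differ in detail, for example by supporting more than two operands:

C++

// Logical AND of two boolean meta-values.
template< typename T,
          typename U
        >
struct And
  : std::integral_constant<bool, T::value && U::value>
{ };

// Logical OR of two boolean meta-values.
template< typename T,
          typename U
        >
struct Or
  : std::integral_constant<bool, T::value || U::value>
{ };

// Logical negation of a boolean meta-value.
template< typename T >
struct Not
  : std::integral_constant<bool, !T::value>
{ };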

One final set of tests was required. I needed to be able to distinguish the sequence-types from the fundamental-types and the type-containers.

C++

//  **********************************************
//  Multi-variable types are containers of
//  homogenous entries
template< typename T >
struct sequence_value
  : std::integral_constant
      < bool,
        Or< typename std::is_base_of<sequence_trait, T>,
          Or< vector_value<T>,
              array_value<T>
            >
          >::value
      >
{ };
 
//  **********************************************
template< >
struct sequence_value<MT>
  : std::integral_constant<bool, false>
{ };

Are you looking back at the tag-dispatch identifiers thinking what I thought when I reached this point?

"Shit just got real!"

If this code were a NASA astronaut candidate, it would have just passed the first phase of High-G training[^]. Most likely with 'eyeballs out', and to top it all off, we avoided G-LOC without wearing G-suits.

Back to Earth

A new piece of information that is required is the extent of the array (the number of elements). By this point in time, I had become quite proficient at determining how best to extract this kind of information. Given a single type T, how can I extract a number out of the original definition?

This is why we went through the effort to avoid Type Decay: somewhere inside of that definition for type T hides all of the information that we need.

C++

//  Forward Declarations *************************
//  No Default Implementation
template< class ArrayT >
struct array_size;
 
//  **********************************************
//  Extracts the array extent from a
//  std::array definition.
template< class T, size_t N>
struct array_size< std::array<T, N> >
  : std::integral_constant<size_t, N>
{ };

We have declared a default template without an implementation called array_size. If that were the end, any attempt to instantiate array_size would result in a compiler error that indicated no matches.

Therefore, we create a specialization that will match the default template for our array type. Furthermore, we define the specialization in such a way that the extent (element count) is extracted from the type and becomes a parameterized entry in the template definition. The specialization of the template occurs at this point in the declaration:

C++

...
struct array_size< std::array<T, N> >
...

Because the std::array<T, N> becomes a single type, it qualifies as a valid specialization of the default declaration without an implementation. The std::array in the specialized definition explicitly indicates that we only want matches for the array. Therefore, we are able to extract both the original type and the extent of the array. In this case the extent is captured in a std::integral_constant and the type T is discarded.
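
For example, a quick compile-time check confirms the extraction (illustrative):

C++

typedef std::array<char, 10>            buffer_type;

// The extent is recovered entirely at compile-time.
static_assert(array_size<buffer_type>::value == 10,
              "The extent of buffer_type should be 10");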

This technique will appear many more times throughout the solution, where we will use the type and discard the extent, or even use both values.

DataProxy<array_trait, IdxT, FormatT>

It's time to demonstrate some of the ways that the array's DataProxy differs from the other types that have already been integrated into Alchemy. First, the standard typedefs have been added to the array's proxy to match the other types: format_type, datum_type, field_type, index_type and data_type.

The extent is referenced many times throughout the implementation; therefore, I wanted a clean way to access this value without requiring a call to the previously defined array_size template:

C++

//  Constants ************************************
static
  const size_t k_extent = array_size<index_type>::value;

One more new type definition is required. Up until now, the type that we needed to process in the structured message was the same type that was defined in the message. A slight exception is the nested-type; however, we were able to handle that with a recursive call.

These new sequence containers, both the array and the vector, are themselves containers of a set of their own element types. These element types are what we actually want to consider when we programmatically process the data with our generic algorithms. Here is the array proxy's definition of the value_type:

C++

//  **********************************************
typedef typename
  std::conditional
  <
    std::is_base_of<array_trait,
                    index_type>::value,
    index_type,                  
    typename field_type::value_type                  
  >::type                               value_type;

Only a few remaining pieces are new to the array's DataProxy implementation.

First, some new ways to query for structure sizes: one method queries the extent of the array; the other queries the number of bytes required to store this construct in a buffer.

C++

//  **********************************************
// Returns the extent of the array.
size_t size() const                            
{
  return this->get().size();
}
 
//  **********************************************
// Returns the number of bytes that are
// required to hold this array in a buffer.
size_t size_of() const                          
{
  return sizeof(this->get());
}

The set operations get optimized implementations that rely on standard library algorithms. There is also an overload to accept natively defined arrays as well as the std::array object.

C++

//  **********************************************
void set(const value_type& value)
{
  std::copy( value.begin(),
             value.end(),
             begin());
}
 
//  **********************************************
void set(const data_type (&value)[k_extent])
{
  std::copy( &value[0],
            (&value[0]) + k_extent,
              begin());
}

Finally, here is a small sample set of how the proxy pass-through functions are implemented to forward calls to the internal array instantiation.

C++

//  **********************************************
reference at(size_t idx)                        
{
  return this->get().at(idx);
}
 
//  **********************************************
const_reference operator[](size_t idx) const    
{
  return this->get()[idx];
}
 
//  **********************************************
reference operator[](size_t idx)                
{
  return this->get()[idx];
}
 
//  **********************************************
const_reference front() const                  
{
  return this->get().front();
}

Now let's start the real work

Believe it or not, all of the previous code was just the infrastructure required to support a new type.

And, believe it or not, because of the orthogonal structure, generic interfaces, type generators, and the large amount of Mountain Dew that went into building this framework, there are only a few small specializations left to implement in order to have a fully-supported array data type in Alchemy.

It may be helpful to review this previous topic on how I have structured the Serialization[^] operations for Alchemy. However, it isn't necessary in order to understand how the individual data elements within the array are processed.

Here is the declaration of the array byte-order conversion meta-function.

C++

//  **********************************************
//  The array's specialized byte-order meta-function
template< typename T,
          typename StorageT
        >
struct ConvertEndianess<T, StorageT, array_trait>
{
  template <typename ArrayValueT>
  void operator()(const ArrayValueT &input,
                        ArrayValueT &output)
  {
    // ...
  }
};

Here is the actual implementation for converting the byte-order elements in the array:

C++

template <typename ArrayValueT>
  void operator()(const ArrayValueT &input,
                        ArrayValueT &output)
  {
    // Convenience typedefs
    typedef typename
      ArrayValueT::value_type   value_type;
 
    typedef typename
      DeduceTypeTrait
        <value_type>::type      type_trait;
 
    // Create an endian converter for the
    // arrays defined value_type.
    ConvertEndianess< value_type,
                      StorageT,
                      type_trait
                    > swap_order;
    for (size_t index = 0; index < input.size(); ++index)
    {
      swap_order(input[index], output[index]);
    }
  }

I would like to draw your attention to the real work in the previous function. The majority of the code in the previous snippet is used for declarations and type definitions. The snippet below only contains the code that is performing work.

C++

ConvertEndianess< value_type,
                  StorageT,
                  type_trait
                > swap_order;

for (size_t index = 0; index < input.size(); ++index)
{
  swap_order(input[index], output[index]);
}
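
To make that dispatch concrete, here is a small self-contained sketch of a byte-swap functor selected by a trait tag. The names are simplified stand-ins, not Alchemy's actual implementation:

C++

#include <cstddef>
#include <cstdint>

// Simplified stand-in for one of Alchemy's trait tags.
struct fundamental_trait { };

// A byte-swap functor selected by trait tag; it plays the role
// of the swap_order object in the loop above.
template <typename T, typename TraitT>
struct SwapOrder;

template <typename T>
struct SwapOrder<T, fundamental_trait>
{
  void operator()(const T& input, T& output) const
  {
    // Reverse the bytes of a single fundamental value.
    const unsigned char* src = reinterpret_cast<const unsigned char*>(&input);
    unsigned char*       dst = reinterpret_cast<unsigned char*>(&output);
    for (size_t i = 0; i < sizeof(T); ++i)
      dst[i] = src[sizeof(T) - 1 - i];
  }
};

// Example: given uint32_t out;
//   SwapOrder<uint32_t, fundamental_trait>()(0x11223344u, out);
// leaves out == 0x44332211.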

"That's all I have to say about that."

Summary

Generic programming is powerful. There is no reason C++ must be written as verbosely as C often is. Alchemy is production-quality code. The error handling is in place, and in most cases it occurs at compile-time: if a message definition won't compile, the program is not logically correct and would have failed at run-time as well.

I did not demonstrate the pack and unpack operations for the array. They are slightly more complicated because of the nature of this container. When I started to explore real-world scenarios, I realized that I may run into arrays-of-arrays and arrays-of-nested-types-that-contain-arrays.

The solution isn't much more complicated than the byte-order converter. However, it does require a few more specialized templates to account for this potential Cat-in-the-Hat absurd, yet likely, scenario. The next entry will describe how I expanded Alchemy to support dynamically sized messages for vectors. Then a follow-up article will demonstrate how the array and vector types are serialized in Alchemy, and are capable of handling a deep nesting of types.

C++: Type Decay

general, reliability, CodeProject, C++, design, security Send feedback »

I have previously written about code rot (code decay). This post is about decay in a different context. Essentially, there are three sets of types in C++ that will decay, that is, lose information. This entry will describe the concept, the circumstances, and in some cases ways to avoid type decay from occurring. This is an important topic for me to cover because the addition of support for arrays in Alchemy would have been much more difficult without knowledge of this concept.

Type Decay

Why do certain types decay? Maybe because they have a short half-life?! I actually do not know the reasoning behind all of the rules. I suspect they exist mostly to make common expressions smoother to write. Type decay is a form of syntactic sugar: the original type, T1, is being used in a context that does not accept that type, but does accept a type T2 that T1 can be converted to.

Generally, the circumstances involve using a type T1 in an expression, as an operand, or to initialize an object that expects a type T2. There are other special cases, such as a switch statement where T2 is an integral type, or an expression that must reduce to a bool.

The rules are quite involved. For details on the rules for order of conversion I recommend the page on Implicit Conversions[^] at cppreference.com.

Lvalue transformations

I am only going to delve into the implicit cast scenarios that relate to Lvalue transformations. This may sound redundant, but an Lvalue transformation is applied when an lvalue argument is used in a context where an rvalue is expected. Well, it's a lot more redundant if you substitute T1 for lvalue and T2 for rvalue.

Briefly, an lvalue is an expression that can appear on the left-hand side of an assignment. In order to qualify as an lvalue, it must refer to a non-temporary object or a non-temporary member function. This basically says that the object will still have storage when the time comes to write to it.

An rvalue is just the opposite. It is an expression that identifies a temporary object, or a value that is not associated with any object. Literal values and function call return values are examples of rvalues, as long as the return type is not a reference.

L-value to R-value

This type of conversion occurs in order to allow an lvalue to be used where a value is expected, such as in a series of assignment expressions, or in situations where an rvalue is required but only an lvalue is present, such as the result of a function that returns a reference.

For this implicit conversion scenario, the lvalue is effectively copy-constructed into a temporary object so that it qualifies as an rvalue type. Other potential conversion adjustments may be made as well such as removing the cv-qualifiers (const and volatile).

This is a fairly benign scenario of type decay, unless your lvalue type has an extremely expensive copy-constructor.
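
Here is a trivial example of where this conversion is applied:

C++

int count = 42;        // count is an lvalue with storage

// The initializer needs a value; count undergoes an
// lvalue-to-rvalue conversion, and its value is read out.
int copy = count;

// Both operands of + are also read as rvalues here.
int sum = copy + count;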

Function to pointer

The second scenario is another simple case. If the lvalue is a function type, not the actual expression of a function call, just the type, it can be implicitly converted to a pointer to the function. This explains why you can assign a function to an expression that requires a function pointer, yet you are not required to use & to take the address of the function. If you do, you will still get the same result, because you already have a pointer to the function and the implicit conversion simply no longer applies.
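
A short example demonstrates the equivalence:

C++

#include <iostream>

void announce()
{
  std::cout << "called\n";
}

int main()
{
  // Both initializations are equivalent. The first relies on the
  // implicit function-to-pointer conversion; the second takes the
  // address explicitly.
  void (*first)()  = announce;
  void (*second)() = &announce;

  first();
  second();
  return 0;
}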

Array to pointer conversion

This is the case that I needed to understand in order to successfully add support for arrays to Alchemy. If an lvalue is an array-type T with a rank of N, the lvalue can be implicitly converted to a pointer to T. This pointer refers to the first element in the original array.

I have been using C++ for almost two decades, and I am surprised that I did not discover this before now. Take a look at the following code. What will it print when compiled and run on a 64-bit system?

C++

void decaying(char value[24])
{
  std::cout << "value contains " << sizeof(value) << " bytes\n";
}

Hopefully you surmised that since value is subject to the implicit conversion to a pointer to char, the sizeof expression will return the size of a 64-bit pointer. Therefore, this string would be printed: "value contains 8 bytes".

I discovered this when I was building my Alchemy unit-tests to verify that the size of an array data type was properly calculated from a TypeList definition. It only took a little bit of research to discover there is actually a special declarator that can be used to force the compiler to prevent the implicit conversion of the array. Depending on your compiler and settings, you may get a helpful warning when this conversion is applied.

This declarator is called a noptr-declarator. To invoke this declarator, use a *, & or && in front of the name of the array. Parentheses will also need to be placed around the operator and the name of the array. The resulting definition becomes a pointer or a reference to an array of type T, rather than simply a pointer to T. The sample below shows the declaration that is required to avoid the implicit conversion.

C++

void preserving(char (&value)[24])
{
  std::cout << "value contains " << sizeof(value) << " bytes\n";
}

Here is a brief example to demonstrate the syntax and differences:

C++

int main(int argc, char* argv[])
{
  char input[24];
 
  std::cout << "input contains " << sizeof(input) << " bytes\n";
 
  decaying(input);
  preserving(input);
 
  return 0;
}

Output:

main:          input contains 24 bytes
decaying:      value contains 8 bytes
preserving:    value contains 24 bytes

This simple modification allowed me to preserve the type information that I needed to properly process array data types in Alchemy. In my next entry, I will demonstrate how template specialization can be used to dismantle the array to determine the type and the rank (number of elements) that are part of its definition.

std::decay< T >

This type trait is part of the C++ Standard Library starting with C++11 (the std::decay_t helper alias arrived in C++14). It can be used to programmatically perform the implicit conversions of type decay on a type. This trait will also remove any cv-qualifiers (const and volatile). Basically, the original type T will be stripped down to its basic type.

I haven't had a need to use this trait in Alchemy. However, it is helpful to know that these utilities exist and what is possible if I ever find the need to extract only the underlying type.
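
A few compile-time checks illustrate its effects:

C++

#include <type_traits>

// std::decay applies the lvalue-to-rvalue, array-to-pointer, and
// function-to-pointer conversions, and strips cv-qualifiers.
static_assert(std::is_same<std::decay<const int&>::type, int>::value,
              "references and cv-qualifiers are removed");
static_assert(std::is_same<std::decay<char[24]>::type, char*>::value,
              "arrays decay to pointers");
static_assert(std::is_same<std::decay<void(int)>::type, void(*)(int)>::value,
              "functions decay to function pointers");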

Summary

The C++ compiler is a very powerful tool. Sometimes it attempts to coerce types and data into similar forms in order to compile a program. In most cases this is a very welcome feature because it allows for much simpler expressions and reduces clutter. However, there are some cases where the implicit casts can cause grief.

I stumbled upon the array-to-pointer type decay during my development of Alchemy. Fortunately, there are ways to prevent this automatic conversion, and I was able to work through the issue. Subtleties like this rarely appear during development. It is definitely nice to be aware that these behaviors exist, so you can determine how to work around them if you ever encounter one.

Alchemy: Proxy Fields

adaptability, C++, Alchemy, design Send feedback »

A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called, Network Alchemy[^]. Alchemy performs low-level data serialization with compile-time reflection. It is written in C++ using template meta-programming.

After I had completed my initial targeted set of features for Alchemy, demonstrated the library to my colleagues, and received the initial round of feedback, I was ready to correct some mistakes. The completion of nested structures in my API was very challenging for many reasons; it required each object to know entirely too much about the other constructs in the system. I was very motivated to find an elegant and effective solution, because the next feature I had decided to tackle would be very challenging: support for arrays. I turned to proxies to solve this problem.

Proxy Design Pattern

You can learn more about the Proxy design pattern in the Gang of Four pattern book. This design pattern is also referred to as the Surrogate. There are many types of proxies listed, such as the Remote Proxy, Virtual Proxy, Protection Proxy and Smart Reference. To be technical, my use of proxies most closely relates to both the Virtual and Protection Proxy. However, in order to make all of the pieces fit together as smoothly as they do now, it also resembles the Adapter pattern, with a bit of Strategy, or as we refer to them in C++, policies.

Consequences

The benefits that were realized by the mixed use of these patterns are summarized below:

  • The adapter qualities of the solution helped eliminate the slight incompatibilities between the different types. This allowed the different data types to remain completely orthogonal and the code to be written in a very generic fashion.
  • The proxy provided a natural location to hide the extra type definitions that were required to properly maintain the variety of Datum that is supported.
  • Policies are used to control the behavior and implementation selected to process byte-order management, storage access rules, and memory allocation.

Restructuring

I didn't actually arrive at the use of the proxy by judicious analysis, nor by randomly selecting a design pattern to "see if I can make this one fit". Before I moved on to arrays, I wanted to restructure the definition MACROs that were required to define a Hg message and its sub-fields. As I mentioned in the previous post, different MACROs were required by the nested type, based on its use.

Also, BitLists required their own MACRO inside of the nested field declarations because their structure differs slightly from a regular data field. In essence, the BitList contains data like any other Datum; however, I used a virtual proxy in my original implementation to allow the sub-fields to be accessed with natural member-data syntax.

I continued to rework the structure of the objects, relying on my strong set of unit-tests to refactor the library until I arrived at a solution that was clean and robust. Each iterative refactor cycle brought me closer and closer to a solution where every data type was held in a proxy.

Here is a sample of what a message definition looked like before the restructured definitions:

C++

typedef TypeArray
<
  uint8_t,
  uint16_t,
  mixed_bits,
  uint32_t,
  int16_t
> nested_format_t;
// A nested data-type can not behave as a top-level message.
// A separate definition is required.
HG_BEGIN_NESTED_FORMAT(nested_format_t)
  HG_MSG_FIELD(0,uint8_t   , zero)
  HG_MSG_FIELD(1,uint16_t  , one)
  HG_MSG_FIELD(2,mixed_bits, two)
  HG_MSG_FIELD(3,uint32_t  , three)
  HG_MSG_FIELD(4,int16_t   , four)
HG_END_NESTED_FORMAT
 
// A sample message declaration
HG_BEGIN_PAYLOAD(Hg::base_format_t)
    HG_MSG_FIELD(0, uint32_t,             word_0)
    HG_MSG_FIELD(1, Hg::nested_format_t,  nested)
HG_END_PAYLOAD

Here are the results of my simplified definitions:

C++

// The TypeList definition is the same
HG_BEGIN_FORMAT(nested_format_t)
  HG_DATUM(0,uint8_t   , zero)
  HG_DATUM(1,uint16_t  , one)
  HG_DATUM(2,mixed_bits, two)
  HG_DATUM(3,uint32_t  , three)
  HG_DATUM(4,int16_t   , four)
HG_END_FORMAT
// The nested formats are now compatible
HG_BEGIN_FORMAT(Hg::base_format_t)
    HG_DATUM(0, uint32_t,             word_0)
    HG_DATUM(1, Hg::nested_format_t,  nested)
HG_END_FORMAT

Finally, I thought it was idiotic that the index had to be explicitly stated for each field. Therefore I created a way to auto-count with MACROs using template specialization, and I was able to eliminate the need to specify the index for each Datum.

Unfortunately, the nature of my particular use of the auto-count MACRO is not compatible with the standard. This is because I need the template specializations to be defined within a class. The standard prohibits this and requires template specializations to be defined at a namespace scope.

The template-specialization approach did work in Visual Studio, because its compiler is very lax about this restriction. Nonetheless, I was still able to use my simplified MACROs, because I adapted the non-standard __COUNTER__ MACRO to achieve my goal. I would have used __COUNTER__ in the first place, but I was trying to create a 100% portable solution.
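
To illustrate the idea, here is a minimal sketch with hypothetical names, not Alchemy's actual MACROs, of how __COUNTER__ can auto-number fields relative to a base value captured at the start of the definition:

C++

#include <cstdint>

// Each BEGIN captures the current __COUNTER__ value in an enum, and
// each DATUM computes its index relative to that captured base.
// __COUNTER__ is non-standard, but supported by MSVC, GCC, and Clang.
#define SKETCH_BEGIN_FORMAT(NAME)                                   \
  struct NAME {                                                     \
    enum { k_base_count = __COUNTER__ };

#define SKETCH_DATUM(TYPE, MEMBER)                                  \
    TYPE MEMBER;                                                    \
    enum { MEMBER##_index = __COUNTER__ - k_base_count - 1 };

#define SKETCH_END_FORMAT                                           \
  };

SKETCH_BEGIN_FORMAT(demo_format_t)
  SKETCH_DATUM(uint8_t,  zero)    // zero_index  == 0
  SKETCH_DATUM(uint16_t, one)     // one_index   == 1
  SKETCH_DATUM(uint32_t, two)     // two_index   == 2
SKETCH_END_FORMAT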

I will most likely revisit this again in the future in search of another way. In the meantime, here is what the final Hg message definition looked like:

C++

HG_BEGIN_FORMAT(nested_format_t)
  HG_DATUM(uint8_t   , zero)
  HG_DATUM(uint16_t  , one)
  HG_DATUM(mixed_bits, two)
  HG_DATUM(uint32_t  , three)
  HG_DATUM(int16_t   , four)
HG_END_FORMAT

Application

There is only one difference between how I integrated the BitList proxy in my original implementation and the implementation that is currently used: I inverted the order of inheritance. In my original version the Datum was the top-level object, and was able to adjust which proxy or other object type was the base with SFINAE[^].

The new implementation starts with a Datum for all object types, and DataProxy is the derived class that hides, adapts and optimizes all of the specialized behavior of the different data types supported in Alchemy. Therefore, the message collection object stores a set of DataProxy objects with a single Datum type that behaves as the base class for all types. This is exactly the type of clean solution that I was searching for.

Basic Data Proxy

The goal for all of the DataProxy objects is to provide an implied interface to access the underlying storage for the data type. The implicit interfaces allow static polymorphism to be used to cleanly associate each type of data with a Datum object.

The DataProxy objects provide constructors, assignment operators, and value conversion operators for the underlying value_type, as well as a reference to the value_type instance. Beyond the methods that provide access to the actual data, the DataProxy also supplies a set of typedefs that allow the generic algorithms in Alchemy and the C++ Standard Library to effectively process these objects.

Here is a partial sample of the basic DataProxy objects class declaration, and object typedefs:

C++

template< typename  datum_trait,
          size_t    kt_idx,
          typename  format_t
        >
struct DataProxy
  : public Hg::Datum< kt_idx, format_t >
{
  typedef Hg::Datum < kt_idx,
                      format_t
                    >              datum_type;
  typedef typename
          datum_type::value_type   value_type;
  typedef datum_type&              reference;
  // ...

Nested Data Proxy

The adjustments that I made during my refactor went incredibly well. So well, in fact, that not only could I use any nested data type both at the top-level and as a sub-field, I was also able to use the same generic DataProxy that all of the fundamental data types use. There was no need for further specialization.

Deduce Proxy Type

I gained a new sense of confidence when I recognized this. From that point, the only trouble I would occasionally run into was how to keep the internal implementation clean and maintainable. I started to collect all of the messy compile-time type selection into separate templates that I named DeduceX.

Given a set of inputs, the template would execute std::conditional statements, and define a single public typedef type that I would use in my final definition. Here is the type-deduction object that is used to declare the correct DataProxy type. This object determines the correct type by calling another type-deduction object to determine the type-trait that best defines the current data type.

C++

template< size_t    IdxT,
          typename  FormatT
        >
struct DeduceProxyType
{
  // Deduce the traits from the value_type
  // in order to select the most
  // appropriate Proxy handler for value
  // management.
  typedef typename
    DeduceTypeAtTrait < IdxT,
                        FormatT
                      >::type   selected_type;

  // The selected DataProxy type for the
  // specified input type.
  typedef DataProxy < selected_type,
                      IdxT,
                      FormatT
                    >           type;
};
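
The general shape of these DeduceX templates can be shown in isolation. This sketch uses hypothetical names, not Alchemy's code; it selects a storage type based on a size threshold and publishes the single typedef type:

C++

#include <cstddef>
#include <cstdint>
#include <type_traits>

// Given a payload size, deduce a storage type, then publish a
// single public typedef 'type', in the style of the DeduceX templates.
template <size_t SizeT>
struct DeduceStorageType
{
  typedef typename std::conditional
          <
            (SizeT <= sizeof(uint32_t)),
            uint32_t,   // small payloads fit in one word
            uint64_t    // otherwise use a double word
          >::type                     type;
};

static_assert(std::is_same<DeduceStorageType<2>::type, uint32_t>::value,
              "two bytes fit in a 32-bit word");
static_assert(std::is_same<DeduceStorageType<8>::type, uint64_t>::value,
              "eight bytes require a 64-bit word");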

Bit-list Proxy

If it were not for the child data fields published by the BitList, this object would also be able to use the basic DataProxy. The BitList differs from a nested data type in that the BitList does not process each of its child elements during data processing actions; the data is stored internally in the same format in which it will be serialized. The nested type doesn't process its own data; rather, it defers and recursively commands its children to process their data.

The functionality required of the BitListProxy is very minimal. A default constructor, copy constructor, assignment operator, and value conversion operator are required to provide access to the actual storage data. A portion of the BitListProxy is listed below:

C++

template< size_t    kt_idx,
          typename  format_t
        >
struct DataProxy< packed_trait,
                  kt_idx,
                  format_t>
  : public Datum<kt_idx , format_t>
{
  // ...
  DataProxy(DataProxy& proxy)
    : datum_type(proxy)
  {
    this->set(proxy.get());  
  }
  operator reference()
  {
    return *static_cast< datum_type* >(this);  
  }
  operator value_type()
  {
    return
      static_cast<datum_type*>(this)->
        operator value_type();
  }
  // ...
};

Array / Vector Proxy

I have not yet introduced the array and vector types and how they relate to Alchemy. However, there is not much discussion required to describe their implementations of the DataProxy object. Besides, the sample code is probably starting to look a bit redundant by now. The complication created by these two types is the many new ways that data can be accessed within the data structure itself.

I wanted the syntax to remain as similar as possible to the versions of these containers in the C++ Standard Library. Therefore, in addition to the specialized constructors and value operators, I mimicked the corresponding interfaces for each container type.

Implementations for the subscript operator were provided for both classes, as well as other accessor functions such as at, size, front, back, begin, and end in all of their forms. Using iterators through these proxy objects is transparent and just as efficient as using the containers themselves.

Since the vector manages dynamically allocated memory, a few more functions are required to handle the allocation and deallocation operations. These additional functions were added to the Hg::vector's DataProxy interface: clear, capacity, reserve, resize, erase, push_back and pop_back.

Results

I was very pleased with the results. For a while I felt like I was fumbling with a 3-D puzzle or some brain-teaser. I could visualize the result I wanted, and how the pieces should fit together. It just remained a challenge to find the right combination of definitions for the gears to interlock for this data processing contraption.

As I moved forward and added the implementations for arrays and vectors, I was able to pass a consistent definition to most of the algorithms, and allow the type-trait associated with the DataProxy object to properly dispatch my calls to the most appropriate implementation. Generally only two parameterized types were required for most of the functions, the TypeList that describes the format, and the Index of the data field within the TypeList.

One of the most valuable things that I took away from this development exercise is an improved understanding of generic programming principles. I also gained a better understanding of how to apply functional programming principles such as currying and lazy evaluation of data.

Summary

Overall, the modification to Alchemy to separate the different supported data types with proxies has been the most valuable effort spent on this library. The differences between the proxy objects' interfaces are somewhat radical. However, they have worked very well at providing the proper abstraction for adding additional features for the expanding set of data types supported by Alchemy.

Alchemy: Nested Types

C++, Alchemy, design Send feedback »

A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called, Network Alchemy[^]. Alchemy performs low-level data serialization with compile-time reflection. It is written in C++ using template meta-programming.

I am almost done describing the first set of features that I was targeting when I set out to create Alchemy. The only remaining feature to be documented is the ability to have nested types. Basically, structs within structs. This entry describes the approach that I took as well as some of the challenges that I had to conquer in order to create a usable solution.

The Concept

The concept for adding nested field types seemed straight-forward, and it actually is. Recursion with templates and template specialization were the primary tools that I had employed to get as far as I have up to this point. Creating a form of compile-time reflection allowed me to iterate over all of the fields of a top-level structure. The elegance of the solution was to simply reuse the existing processing on a second-layer structure.

This code is rotting from the inside

I was able to get the basic recursive calls into place rather quickly. However, some of my design decisions caused a great deal of resistance to the new changes. One of the challenges was caused by the difference in parameterized-type baggage that each type of data element required. This raised some red flags for me because there were only three data types: fundamental, bit-lists, and nested structs.

The other issue stemmed from my original approach to memory management. I had envisioned the entire message to be a data structure that handled memory management and allowed the user to concentrate on the data fields. The data fields would be serialized as they were accessed; therefore, the buffer would always be ready to send off at a moment's notice.

The trouble arose because I didn't want the messages to always be initialized. Plenty of temporary copies are created, and performance would have taken an enormous hit if buffers were allocated and immediately destroyed without ever being used. At the same time, there were still plenty of occasions when I had to have the buffers initialized with the proper data.

Templates are supposed to reduce the amount of code that a developer must create and maintain, not multiply it. These are observations that I kept in mind for any possible future revisions.

Fighting the internal structure

In order to initialize the buffers, I found myself creating numerous template specializations to handle each type of data, especially the nested structure. Furthermore, these specializations were located deep inside of the Datum object, which was supposed to be a very generic adapter for all of the data types.

The conundrum I faced, was an internal data field that was stored within each Datum. However, when I would encounter a nested data-type, I didn’t need to create any internal data, because the nested structure held a collection of its own Datum objects. These Datum objects already had their own data. I needed to find a way to avoid the creation of redundant data when I encountered nested fields.

On Shadow Buffers

Temporary messages were never going to have memory allocated for them, yet I still needed a way to pass data. So in each Datum, a value_type variable was defined to act as temporary storage. These shadow buffers exist in the current form of Alchemy as well, albeit a bit refined for efficiency and convenience.

When a message did not have an allocated buffer, and all of the value data was stored in these shadow buffers, scenarios would occur where the data would not be properly flushed to the new object once memory was allocated.

Abstracting the aberration

To combat the ever-growing need for more specializations, which I was also dispatching by hand, I reached for a natural tool to solve this: more template specializations. Actually, I created some adapter classes that encapsulated the type information that I needed, including the data type required for the shadow buffer. I then used the type defined by the user to specialize each data type and allow the correct functions to be called with tag-dispatching.

Let me explain.

Static vs. Dynamic Polymorphism

When using template-heavy solutions, you learn to take advantage of static polymorphism. This is a type of polymorphism that takes advantage of implicit declarations defined within an object. Contrast this with dynamic polymorphism, public inheritance, where the interface declarations must be explicit. There is a direct relationship explicitly defined when inheritance is used.

The implied relationship that is employed with static polymorphism occurs simply because we expect a field named ‘X’ or a function named ‘Y’. When the code is compiled, the template is instantiated, and as long as the implied resource is present, our code will compile. The polymorphism then becomes fixed, or static.

If you have ever wondered why there are so many typedefs defined within the STL containers (value_type, pointer, const_pointer, key_type, size_type and so on), it is because the algorithms used throughout the implementation rely on static polymorphism to make the containers interchangeable, as do the different functions used to implement the basic functionality.

When you have a consistent type-convention to build upon, you are then able to create objects with orthogonal relationships that can still naturally interact with one another.
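
As a brief illustration, my own sketch rather than Alchemy code, any container that publishes the conventional typedefs can be processed by the same generic algorithm:

C++

#include <list>
#include <vector>

// Works with any container that publishes the conventional
// value_type and const_iterator typedefs; the relationship is
// implied, never declared with inheritance.
template <typename ContainerT>
typename ContainerT::value_type sum(const ContainerT& c)
{
  typename ContainerT::value_type total = typename ContainerT::value_type();
  for (typename ContainerT::const_iterator it = c.begin(); it != c.end(); ++it)
  {
    total += *it;
  }
  return total;
}

// sum works for std::vector<int> and std::list<double> alike,
// from this single definition.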

Static Polymorphism in use

I created a field-type abstraction to define the data type I required for each type of Datum that is declared within a Hg message structure.

C++

template< typename FieldT >
struct field_data_t
{
  // The matching data type for the index.
  // The default implementation uses
  // the same type as the index type.
  typedef FieldT value_type;
};

Notice there are only declarations in this structure. There are no member variable definitions. This is the first part of my abstraction. This next structure has a default template implementation, and each nested definition gets its own specialization:

C++

template< typename FieldT,
          size_t kt_offset = 0
        >
struct FieldTypes
{
  // The type at the index of the
  // parent type container.
  typedef FieldT index_type;
 
  // The specified value type for
  // the current Datum.
  typedef typename
    field_data_t< index_type >::value_type
                  value_type;
  value_type      m_shadow_data;
};

This struct definition allows me to take the original type extracted from the defined TypeList and statically map it to a replacement type that is best suited for the situation. The fundamental types simply map to themselves. On the other hand, the nested types map to a specialization similar to the definition below. You will notice that a reference to the value_type is defined in this specialization, as opposed to an actual data member like the default template. This is how I was able to overcome the duplication of resources.

The ‘F’ in the definition below is actually extracted from a MACRO that generates this specialization for each nested field. ‘F’ represents the nested format type.

C++

template< typename storage_type,
          size_t kt_offset
        >
struct field_types< F, storage_type, kt_offset >
  : F##_payload< storage_type, kt_offset >
{
  typedef F index_type;
  typedef F##_payload < storage_type, kt_offset > value_type;
 
  field_types()
    : m_shadow_data(This())
  { }
 
  value_type& This()
  { return *this; }
 
  value_type &  m_shadow_data;
};

It is important to note that defining references inside of classes or structs is perfectly legal and can provide value. However, these references do come with risks. The references must be initialized in the constructor, which implies you know what you want them to refer to at construction time. Also, you will most likely need to create a copy constructor and an assignment operator to ensure that new object instances refer to the correct target.
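
Here is a small self-contained sketch, my own example rather than Alchemy code, of those risks:

C++

// A reference member must be bound in the constructor's initializer
// list, and a compiler-generated copy keeps referring to the
// original target.
struct Wrapper
{
  explicit Wrapper(int& target)
    : m_ref(target)       // must be initialized here
  { }

  // Without a user-defined assignment operator, copy-assignment is
  // implicitly deleted because m_ref cannot be reseated. This
  // version copies the value through the reference instead.
  Wrapper& operator=(const Wrapper& rhs)
  {
    m_ref = rhs.m_ref;    // assigns the value; does not rebind
    return *this;
  }

  int& m_ref;
};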

Finally, here is a convenience template that I created to simplify the definitions that I had to use when defining the Datum objects. This struct encapsulates all of the definitions that I require for every field. When I finally arrived at this solution, things started to fall into place because I only needed to know the TypeList, Index, and relative offset for the field in the final buffer.

I have since been able to refine this declaration even further; the most current version of Alchemy no longer needs the kt_offset declaration.

C++

template< size_t Idx,
          typename format_t,
          size_t kt_offset
        >
struct DefineFieldType
{
  // The type extracted at the current
  // index defined in the parent TypeList.
  typedef typename
    Hg::TypeAt< Idx, format_t >::type index_type;
 
  // The field type definition that maps
  // a field type with its value_type.
  typedef typename
    detail::FieldTypes< index_type,
                        OffsetOf::value + kt_offset
                      > type;
};

Other Mis-steps

There are a few issues that were discovered as my API was starting to be put to use. However, the problems that were uncovered were fundamental issues with the original structure and design. Therefore, I would have to take note of these issues and address them at a later time.

Easy to use incorrectly, Difficult to use… at all

The original users of the Hg messages were also confused about how the buffers were intended to be managed, and they created some clever wrapper classes and typedefs to make sure they always had memory allocated for the message.

When I saw this, I knew that I had failed in my design because the interface was not intuitive, it created surprising behavior, and, most of all, I found myself fumbling to explain how it was supposed to work. I was able to add a few utility functions similar to the C++ Standard Library's std::make_shared<T>, but this only masked the problem. It wasn't an acceptable fix to the actual problem.

Redundant Declarations

There was one more painful mistake that I discovered during my integration of nested data types. The mechanism that I used to declare a top-level message made them incompatible with becoming a nested field of some other top-level message. In order to achieve this, the user would have had to create a duplicate definition with a different name.

This was by no means a maintainable solution. I did keep telling myself this was a proof-of-concept and I would resolve it once I could demonstrate the entire concept was feasible.

Requiring two different MACROs to declare a top-level message and a nested field should have been a red flag that I was going to run into trouble later on, because I was treating two fundamentally similar structures differently:

C++

// Top-level message MACRO
DECLARE_PAYLOAD_HEADER(F)
 
// A struct intended to be used as
// a nested field inside of a payload header.
DECLARE_NESTED_HEADER(F)

Summary

If I had known what I was building from the start, and had put it to use in more realistic situations during test, I may have been able to avoid the trouble that I encountered while adding the ability to support nested structures. The solution really should have been an elegant recursive call to support the next structure on the queue.

I am actually glad it didn't work out so smoothly, though, because I tried many different approaches and learned a bit more each time about where each technique would be most valuable. Every now and then I still find myself struggling to wrap my head around a problem and find a clean solution; solutions in this context now come much easier for me.

Oh yeah, and I eventually did reach the elegant recursive nested fields, which you will learn about in my next Alchemy entry describing my use of proxy objects.

Alchemy: BitLists Mk2

CodeProject, C++, maintainability, Alchemy, design, engineering Send feedback »

A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called, Network Alchemy[^]. Alchemy performs low-level data serialization with compile-time reflection. It is written in C++ using template meta-programming.

With my first attempt at creating Alchemy, I created an object that emulated the behavior of bit-fields, yet still resulted in a packed-bit format that was ABI compatible for portable wire transfer protocols. You can read about my design and development experiences regarding the first attempt here Alchemy: BitLists Mk1[^].

My first attempt truly was the epitome of Make it work, because I didn't even know if what I was attempting was possible. After I released it, I quickly received feedback regarding defects, additional feature requests, and even reported problems with its poor performance. This pass represents the Make it right phase.

Feedback

Feedback is an important aspect of any software development project that provides value and would like to continue providing value. Feedback appears in many formats, including:

  • Bug Reports
  • Feature Requests
  • Usability
  • Performance Reports
  • Resource Usage

I realized there were limitations with the first implementation of the BitLists in Alchemy. However, I had created a simple proof-of-concept implementation for version 1, and I needed to finish the serialization and reflection features for the library to provide any value at all.

Imagine my surprise when I received the first requests to expand the capabilities of the BitList from supporting 8 sub-fields to 32. This was already the most expensive structure in the library as far as size, and expanding its bit-field support to 32 quadrupled that size. Therefore, I knew I would have to revisit the design and implementation of this feature rather quickly.

Re-engineering the BitList

My primary concern with the first implementation of the BitList was all of the predefined data fields that it contained. For each possible bit-field managed by the BitList, there was an instance of that variable in the object, whether that field was required or not. The primary reason: all of this work occurs at compile-time. My initial choices appeared to be:

  1. Predeclare all of the fields up-front
  2. Create a different implementation for each BitList length

I often mull over projects in my head while I'm driving, walking and during other idle situations. It didn't take long for me to realize that I had already solved this problem of variable field length in other contexts of Alchemy. I solved it using list objects, just like the TypeList.

With the TypeList, I still only had a single implementation that reserves slots for up to 32 fields. However, the use of code-generating MACROs and default template parameters allowed me to make this solution transparent to the user. My situation with the BitList is slightly different than the TypeList; however, I did have an idea to account for this difference.

BTW, if you are wondering why a list-based approach did not occur to me earlier, I had originally called this object a BitSet. I changed the name because of this new approach.

The new approach

My new approach would be to create a host, or head, list object that managed all of the external interfaces and details. Internally, I would build up the list one node at a time until the desired number of entries were defined within the BitList itself. The difference between this BitList and the TypeList is the TypeList does not contain any data or variables that can be referenced by the user. The BitList on the other hand, is an object that is fully defined at compile-time and performs real work at run-time.

The final structure is an embedded linked-list. Similar to the TypeList, each node contains a way to reference the next node in the list. This could possibly become a problem that would impede performance for a run-time based solution. However, once the compiler processes this structure, the code will perform as if it is a flattened array.

I estimated the modifications would be relatively benign to all of the objects outside of the new BitList, because it was not dependent on any external objects. Moreover, the meta-functions that I created to calculate the type, size, and offsets of the message fields were generic templates that would recognize the new BitList just as they had the previous BitSet.

The implementation

Modifying the code to use this new format was not terribly difficult. I did struggle with the recursive nature of the algorithm to achieve the same results as the original BitSet. The primary issue that I encountered was how to correctly terminate the list. I would end up with one too many sub-fields, or the bit-count would not be correct.

This is a situation where one can truly experience the value of having a strong collection of unit-tests written for an object. Very few of my tests required any type of adjustment, and I could quickly tell if my refactoring efforts made the system at least as good as it was before, or worse.

Let's take a brief look at the internals of this implementation.

The BitList host

Similar to my first attempt, I created a base class to root the variable that holds the data and the basic construction logic for these fields. This type also helps determine when the end of the list has been reached. However, this class cannot be the sole terminator, as I will demonstrate momentarily.

C++

template< typename T,
          typename FieldsT >
struct BasicBitList
  : public bitfield_trait
{
  typedef T                                value_type;
  typedef BitField< 0, 0, value_type >     nil_bits_t;

protected:
  value_type            *m_pData;
  value_type            temp_data;

  // Can only be instantiated by a derived class.
  BasicBitList()
    : m_pData(&temp_data)
    , temp_data(0)
  { }

  BasicBitList(value_type &data_field)
    : temp_data(0)
  {
    m_pData = &data_field;
  }
};

One more thing I would like to point out, which never sat well with me: the two data fields that I defined. The first is a pointer called m_pData, which points to the real storage location for the bit-field data. This pointer is then passed into each bit-field node to allow the nodes to all reference the same piece of data.

The second defined value is called temp_data, and acts as a place-holder for an uninitialized BitList. This allows the constructor to always assign the field m_pData to a valid memory location. It also allows the use of references in the bit-field node objects, which avoids the need to check for NULL pointers. Still, this part of my implementation always felt like slapping a band-aid on a symptom rather than solving the actual problem. I was eventually able to remove this band-aid, but not until my third iteration of the BitList.

Let's move from the base class to the top-level host for the BitList, BitFieldList. This first section of logic shows the declaration of the class, common typedefs, and constants. This is the class that acts as the head of the linked-list structure. I am able to make all of the activity resolve itself at compile-time by deriving from a BitFieldNode, rather than declaring the BitFieldNode as an actual data member of this class.

The parameterized type, RootT, accepts the BasicBitList shown previously. This is passed through each BitFieldNode in the linked list until the terminator. This allows each BitFieldNode to have access to the same member data that all of the BitFieldNodes share.

C++

template< typename RootT,
          typename SeqT >
struct BitFieldList
  : public BitFieldNode< RootT, 0, 0, SeqT >
{
  typedef typename
    RootT::value_type                      value_type;
  typedef
    BitFieldNode< RootT, 0, 0, SeqT >      base_type;

  //  Constants *********************************
  static
    const size_t k_size  = sizeof(value_type);

The only other definitions for this class are the set of constructors that initiate the cascading initialization process that occurs as each derived BitFieldNode is initialized.

C++

  // Default constructor
  BitFieldList()
    : base_type()
  {
    RootT::temp_data = 0;
  }

  // Const value constructor,
  // typically used for temporary instances.
  // This call will use internal memory to store values.
  BitFieldList(const value_type &data_field)
    : base_type()
  {
    RootT::temp_data = data_field;
  }

  // Value constructor.
  BitFieldList(value_type &data_field)
    : base_type(data_field)
  { }

The BitFieldNode

This class is the innovation of the second iteration of the BitList object. Rather than storing a fixed collection of these types in the container object, a compile-time linked-list is created by deriving the current node from the next node in the sequence. This would most likely cause a very inefficient implementation if it were a solution based on dynamic polymorphism. However, I found through analysis of the compiled code that the overall structure is flattened out by the compiler, and becomes as efficient as if they were all declared as peers in the same container.

C++

template< typename  RootT,
          size_t    IndexT,
          size_t    OffsetT,
          typename  SeqT >
struct BitFieldNode
  : BitFieldNode< RootT,
                  IndexT+1,
                  OffsetT+Front< SeqT >::value,
                  typename GetTail< SeqT >::type
                >
{
  //  Typedefs **************************************
  typedef typename
    GetTail< SeqT >::type                  tail_type;

  typedef
    BitFieldNode< RootT,
                  IndexT+1,
                  OffsetT+Front< SeqT >::value,
                  tail_type
                >                          base_type;

  typedef typename
    RootT::value_type                      value_type;

While the BitFieldNode derives from the next node in the sequence, each instance defines a BitField object to manage the mask-and-shift operations on the appropriate bits in the common storage data. The BitField object remains unchanged from the version I used in the first BitList implementation. It is also the same version that I have previously written about.
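
For reference, here is a minimal sketch of the mask-and-shift idea. This is illustrative only, assuming an unsigned integral storage type, and is not Alchemy's actual BitField:

C++

#include <cstddef>

// Masks off CountT bits located OffsetT bits into the attached
// storage, and shifts values into and out of position.
// Assumes T is an unsigned integral type and OffsetT + CountT
// is less than the bit-width of T.
template <size_t OffsetT, size_t CountT, typename T>
struct BitFieldSketch
{
  static const T k_mask = ((T(1) << CountT) - 1) << OffsetT;

  void attach(T* p)    { m_data = p; }
  T    read() const    { return (*m_data & k_mask) >> OffsetT; }
  void write(T value)  { *m_data = (*m_data & ~k_mask)
                                 | ((value << OffsetT) & k_mask); }

  T*   m_data;
};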

The salient detail to recognize in the constructors is that the base class is initialized first, then the local BitField. Remember, the base type will eventually reach the BasicBitList that houses and initializes the actual storage data for this entire structure. Therefore, on the return pass, each BitFieldNode is initialized with valid data, starting from the tail node and finishing at the head.

C++

  // Copy constructor.
  BitFieldNode(const BitFieldNode &rhs)
    : base_type(rhs)
    , m_field( GetFieldAddress< IndexT >(m_field) )
  {
    m_field.attach((value_type*)&rhs.m_field);
  }

  // Value constructor
  BitFieldNode(value_type &data_field)
    : base_type(data_field)
    , m_field(GetFieldAddress< IndexT >(m_field))
  {
    // assigns the reference of data_field
    // to the data member, m_data.
    m_field.attach(&data_field);
  }

private:
  BitField< OffsetT,
            Front< SeqT >::value,
            value_type >            &m_field;
};

The BitFieldNode Terminator

At the end of the set of nodes in the linked-list is a specialization of the BitFieldNode that derives from the passed-in BasicBitList rather than the next node in the chain. I cannot recall the exact troubles that I ran into while trying to find the correct set of definitions to properly terminate this list. I do remember that I continued to switch back and forth between having an incorrect number of sub-nodes in the list and an incorrect offset for each field.

The code below is the final correct solution at which I arrived. The first block contains the declaration and the constructors.

C++

template< typename  RootT,
          size_t    IndexT,
          size_t    OffsetT
        >
struct BitFieldNode< RootT, IndexT, OffsetT, MT >
  : public RootT
{
  //  Typedefs *******************************
  typedef typename
    RootT::value_type              value_type;
 
  BitFieldNode(const BitFieldNode &rhs)
    : RootT(rhs)
  { }
 
  BitFieldNode(value_type &data_field)
    : RootT(data_field)
  { }

The function below is something that I carried over from the first implementation as well. It allows internal references to the actual BitField objects to be queried by index. This is the basic mechanism that allows compile-time reflection to occur over this type.

C++

// A parameterized method that allows the
// BitList hierarchy to programmatically access
// the different bit-fields defined in the list
// without knowledge of the name of the field.
template< size_t    Idx,
          typename  field_t >
field_t& GetFieldAddress(const field_t&)
{
  typedef FieldIndex< Idx, RootT, field_t::k_size> field_index_t;
 
  // Calls a template member function "GetField"
  //  found in the base class "RootT".
  //
  // The static_cast from this to a type "RootT"
  // is required to give a hint to the compiler where
  // to look for the "GetField" function.
  return
    static_cast< RootT* >(this)->template
                     GetField< field_index_t >(field_t());
}
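
The ->template syntax in that snippet is easy to trip over, so here is a stripped-down, self-contained illustration with hypothetical names of why it is required:

C++

// A member function template called through a base class that
// depends on a template parameter requires the 'template' keyword;
// without it, the '<' would parse as a less-than operator.
struct Host
{
  template <typename FieldIndexT>
  int GetField(int) { return 0; }
};

template <typename RootT>
struct Node
  : RootT
{
  int Demo()
  {
    return static_cast<RootT*>(this)->template GetField<int>(0);
  }
};

// Node<Host>().Demo() compiles; without 'template', a conforming
// compiler rejects the call.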

The results

Overall I was pleased with this solution, especially compared to my first-pass attempt. This version felt more like a Make it Right type of solution, rather than the starting phase of Make it Work. A few portions of the implementation still didn't sit right with me; for example, the extra dead-weight temporary to ensure that I never dereferenced a NULL pointer.

There is one other source of pain that I experienced with this implementation. This is a result of the deep-derivation that I used to create the compile-time linked list. When I was working through the defects and creating new tests, I spent a fair amount of time in the debugger inspecting the contents of BitList variables and the data found in each node. Unfortunately, this data structure was not very convenient to troubleshoot.

Here are two snapshots of what the following BitList definition looks like when expanded in the Visual Studio debugger:

C++

HG_BEGIN_BITLIST (uint32_t, mixed_set)
  HG_BIT_FIELD   (0,   first,   5)
  HG_BIT_FIELD   (1,   second,  4)
  HG_BIT_FIELD   (2,   third,   3)
  HG_BIT_FIELD   (3,   fourth,  2)
  HG_BIT_FIELD   (4,   fifth,   1)
  HG_BIT_FIELD   (5,   sixth,   16)
  HG_BIT_FIELD   (6,   seventh, 1)
HG_END_BITLIST

It is easy to see that the value stored in the entire BitList is 0xfeedc0df. However, that is also the value displayed when the bit-fields are expanded. Therefore, the debug data viewer provides much less value than I had hoped it would.

The second view expands just the type names for each field in the derived BitFieldNode chain.

Needless to say this irritated me enough that I eventually created some custom Visual Studio visualizers that I added to Alchemy to correct this inconvenience.

Summary

The usage of the BitList changed from the point-of-view of the final user. However, I was much more satisfied with this solution than the original solution. This solution expands gracefully to any length the user defines (within reason). The amount of space required for each field is reduced to only require space if a BitFieldNode is defined. Therefore, only as much space is allocated as is required for the data structure.

Unfortunately, this data type still remains the most expensive data type that is supported by Alchemy. One pointer is added for each BitField specified in the BitList. This amounts to 4*32 = 128 bytes on a 32-bit system for a BitList with 32 1-bit flags. This is in addition to the other overhead incurred for each type.

I was putting off my worries about performance, as I was not yet ready to write the benchmark tests. I will admit I was concerned, and as you will soon find out, I discovered some very compelling reasons to make a third attempt at implementing the BitList.
