Embedded Alchemy

Alchemy

Alchemy is a collection of independent library components that specifically relate to efficient low-level constructs used with embedded and network programming.

The latest version of Embedded Alchemy can be found on GitHub.
The most recent entries as well as Alchemy topics to be posted soon:
Alchemy: Documentation

I just completed my Master's degree in Cybersecurity at Johns Hopkins University. I plan to resume Alchemy's development, and to use my newly acquired knowledge to add constructs that will help improve the security of devices built for the Internet of (Insecure) Things.

Learn Hexadecimal and Binary

general, beginner

The binary, octal and hexadecimal number systems pervade all of computing. Every command is reduced to a sequence of 1s and 0s for the computer to interpret. These sequences can look like noise, especially given the sheer length of the information. Becoming familiar with binary and other number systems can make it much simpler to interpret the data.

Once you become familiar with the relationships between the number systems, you can manipulate the data in more convenient forms. Your ability to reason, solve and implement solid programs will grow. Patterns will begin to emerge when you view the raw data in hexadecimal. Some of the most efficient algorithms are based on the powers of two. So do yourself a favor and become more familiar with hexadecimal and binary.

Number Base Conversion and Place Value

I am going to skip this since you can refer to my previous entry for a detailed review of number system conversions.

Continuous Data

Binary and hexadecimal are more natural number systems for use with computers because they both have a base of 2 raised to some exponent; binary is 2^1 and hexadecimal is 2^4. We can easily convert from binary to decimal. However, decimal is not always the most useful form. Especially when we consider that we don't always have a nice organized view of our data. Learning to effectively navigate between the formats, especially in your head, increases your ability to understand programs loaded into memory, as well as streams of raw data that may be found in a debugger or analyzers like Wireshark.

The basic unit of data is a byte. While it is true that a byte can be broken down into bits, we tend to work with these bundled collections and process bytes. Modern computer architectures process multiple bytes at a time, specifically 8 for 64-bit computers. And then there are graphics cards, which use 128-, 192- and even 256-bit wide memory accesses. While these large data widths could represent extremely large numbers in decimal, the values tend to have encodings that only use a fraction of the space.

Recognize Your Surroundings

What is the largest binary value that can fit in an 8-bit field?

It will be a value with eight ones: 1111 1111. Placing a space after every four bits helps with readability.

What is the largest hexadecimal value that can fit in an 8-bit field?

We can take advantage of the power of 2 relationship between binary and hexadecimal. Each hexadecimal digit requires four binary bits. It would be very beneficial to you to commit the following table to memory:

Dec   0      1      2      3      4      5      6      7      8      9      10     11     12     13     14     15
Bin   0000   0001   0010   0011   0100   0101   0110   0111   1000   1001   1010   1011   1100   1101   1110   1111
Hex   0x0    0x1    0x2    0x3    0x4    0x5    0x6    0x7    0x8    0x9    0xA    0xB    0xC    0xD    0xE    0xF

Now we can simply take each grouping of four bits, 1111 1111, and convert them into hex-digits, FF.

What is the largest decimal value that can fit in an 8-bit field? This isn't as simple; we must convert the binary value into decimal. Using the number base conversion algorithm, we know that eight bits can represent 2^8, or 256, distinct values. Since zero must be represented, the largest value is 2^8 - 1, or 255.
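
A quick way to confirm these values is to print the same 8-bit quantity in all three forms. Here is a minimal sketch using std::bitset:

C++

#include <bitset>
#include <iostream>

int main()
{
  unsigned int value = 0xFF;   // the largest value that fits in 8 bits

  std::cout << "binary:      " << std::bitset<8>(value)               << '\n';
  std::cout << "decimal:     " << std::dec << value                   << '\n';
  std::cout << "hexadecimal: " << std::hex << std::uppercase << value << '\n';
}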

Navigating the Sea of Data

Binary and hexadecimal (ok, and octal) are all number systems whose base is a power of two. Considering that computers work with binary, representations that fit nicely into the logical data blocks, bytes, become much more meaningful. Through practice, and because of the relatively small scale of the values that can be represented with 8 bits, converting a byte to a decimal value feels natural to me. However, when the data is represented with 2, 4 and 8 bytes, the values grow quickly; decimal form loses its meaning, and my intuition about what value I am dealing with becomes useless.

For example:

What is the largest binary value that can fit in a 16-bit field?

It will be a value with sixteen ones: 1111 1111 1111 1111.

What is the largest hexadecimal value that can fit in a 16-bit field? Again, let's convert each of the blocks of four bits into a hex-digit: FFFF.

What is the largest decimal value that can fit in a 16-bit field?

Let's see, is it 2^16 - 1, so that makes it 65355, 65535, or 65555, it's somewhere around there.

Here's a real-world example that I believe many people are familiar with: the RGB color encoding for pixels. You could add a fourth channel to represent an alpha channel and encode RGBA. If we use one byte per channel, we can encode all four channels in a single 32-bit word, which can be processed very efficiently.

Imagine we are looking at pixel values in memory and the bytes appear in this format: RR GG BB. It takes two hexadecimal digits to represent a single byte. Therefore, the value of pure green could be represented as 00 FF 00. Viewing this same value as a 24-bit decimal number, 65,280, is much less helpful.

If we were to change the value to 8,388,608, what has changed? We can tell the value has changed by roughly 8.3M. Since a 16-bit value can hold ~65K, we know that the third byte has been modified, and we can guess that it has been increased to 120 or more (8.3M / 65K). But what is held in the lower two bytes now? Our ability to deduce information is not much better than an estimate. The value in hexadecimal is 80 00 00.
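
When the channels are packed as 0xRRGGBB, a shift and a mask will pull each byte back out. Here is a small sketch of that idea:

C++

#include <cstdint>
#include <iostream>

int main()
{
  // A 24-bit RGB value packed as 0xRRGGBB; pure green in this case.
  std::uint32_t pixel = 0x00FF00;

  std::uint32_t red   = (pixel >> 16) & 0xFF;
  std::uint32_t green = (pixel >>  8) & 0xFF;
  std::uint32_t blue  =  pixel        & 0xFF;

  std::cout << "R=" << red << " G=" << green << " B=" << blue << '\n';
}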

The difference between 8,388,608 and 8,388,607 is enormous with respect to data encoded at the byte level:

  8,388,608    80 00 00
  8,388,607    7F FF FF

Now consider that we are dealing with 24-bit values in a stream of pixels. For every 12 bytes, we will have encoded 4 pixels. Here is a representation of what we would see in the data, since most computer views of memory are grouped into 32-bit words:

  4,260,948,991   2,568,312,378   2,360,417,546
  FD F8 EB FF     99 15 56 3A     8C B1 1D 0A

Binary

I typically try to use binary only up to 8 bits. Anything larger than that, I simply skip the first 8 bits (1 byte) and focus on the next 8 bits. For example: 1111 1111 1111 1111. As I demonstrated with the RGB color encoding, values do not always represent a single number. In fact, it is a stream of data. So whether there is one byte, four bytes, or a gigabyte's worth of data, we usually process it either one byte or one word at a time. We actually break decimal numbers into groups by using a comma (or some other regional punctuation) to separate thousands, e.g. 4,294,967,295. 294 of what? Millions.

Binary manipulation can be found in many contexts. One of the most common is storing a collection of flags in an unsigned buffer. These flags can be flipped on and off with the Boolean operations of your programming language. Using a mask with multiple bits allows an enumerated value with more than two options to be encoded within the binary buffer. I'm not going to describe the full mechanics here; I simply want to demonstrate that data is encoded in many ways, and there are many reasons for you to become proficient at picking apart byte-streams down to the bit.
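
As a rough sketch of those mechanics (the flag names below are hypothetical), setting, clearing, and testing flags comes down to the bitwise OR, AND, and NOT operators, and a multi-bit mask carves out room for an enumerated value:

C++

#include <cstdint>
#include <iostream>

// Hypothetical flag definitions; each flag occupies one bit of the buffer.
const std::uint8_t k_enabled   = 0x01;   // 0000 0001
const std::uint8_t k_visible   = 0x02;   // 0000 0010
const std::uint8_t k_mode_mask = 0x0C;   // 0000 1100 - a 2-bit enumerated field

int main()
{
  std::uint8_t flags = 0;

  flags |= k_enabled;                                // set a flag
  flags |= (2u << 2) & k_mode_mask;                  // store the value 2 in the 2-bit field
  flags &= static_cast<std::uint8_t>(~k_visible);    // clear a flag

  bool     enabled = (flags & k_enabled) != 0;       // test a flag
  unsigned mode    = (flags & k_mode_mask) >> 2;     // extract the enumerated value

  std::cout << std::boolalpha
            << "enabled=" << enabled << " mode=" << mode << '\n';
}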

Hexadecimal

It is just as beneficial to be able to convert between hexadecimal and decimal in your head from 1 to 255, especially if you ever deal with selecting colors for web-pages or you edit images in programs like Photoshop. It only takes two hex-digits to represent an 8-bit byte. If you memorize the values that are defined by the high-order hex-digit, reading hexadecimal byte values becomes almost as easy as reading decimal values. There are two tables listed below. The first indicates the value mapping for the high-order and low-order hex-digits of a byte. The second contains a set of landmark values that you will most certainly encounter and find useful:

  16^1 digit   Decimal     16^0 digit   Decimal
  0x00           0         0x0            0
  0x10          16         0x1            1
  0x20          32         0x2            2
  0x30          48         0x3            3
  0x40          64         0x4            4
  0x50          80         0x5            5
  0x60          96         0x6            6
  0x70         112         0x7            7
  0x80         128         0x8            8
  0x90         144         0x9            9
  0xA0         160         0xA           10
  0xB0         176         0xB           11
  0xC0         192         0xC           12
  0xD0         208         0xD           13
  0xE0         224         0xE           14
  0xF0         240         0xF           15

  Decimal   Hexadecimal
    32        0x20
    64        0x40
   100        0x64
   127        0x7F
   128        0x80
   168        0xA8
   192        0xC0
   200        0xC8
   224        0xE0
   239        0xEF
   250        0xFA

Some of the numbers listed in the second table are more obvious than others as to why I chose to include them. For example, 100: that's a nice round number that we commonly deal with daily. When you run into 0x64, now you can automatically map that in your mind to 100 and have a reference point for its value. Alternatively, if you have a value such as 0x6C, you could start by calculating the difference: 0x6C - 0x64 = 8; add that to 100 to know the decimal value is 108.
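
A couple of hex literals printed through iostream make it easy to spot-check this mental math:

C++

#include <iostream>

int main()
{
  // Spot-check the landmark values and the offset trick described above.
  std::cout << "0x64 = " << 0x64 << '\n';   // 100
  std::cout << "0x6C = " << 0x6C << '\n';   // 100 + 8 = 108
  std::cout << "0x80 = " << 0x80 << '\n';   // 128
  std::cout << "0xC8 = " << 0xC8 << '\n';   // 200
}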

Some of you will recognize the values 127, 168, 192, 224 and 239. For those that do not see the relevance of these values, they are common octets found in landmark network IP addresses. The table below shows the name and common dotted-decimal form, as well as the hexadecimal form of the address in both big-endian and little-endian format:

  Name                      IPv4 Address       Hexadecimal (big-endian)   Hexadecimal (little-endian)
  local-host                127.0.0.1          0x7F000001                 0x0100007F
  private range             192.168.x.x        0xC0A80000                 0x0000A8C0
  multicast base address    224.0.0.0          0xE0000000                 0x000000E0
  multicast last address    239.255.255.255    0xEFFFFFFF                 0xFFFFFFEF

One additional fact related to the IPv4 multicast address range is that the official definition declares any IPv4 address with the leading four bits set to 1110 to be a multicast address. 0xE is the hex-digit that maps to binary 1110. This explains why the full multicast range of addresses is from 0xE0000000 to 0xEFFFFFFF, or written in dotted-decimal notation, 224.0.0.0 to 239.255.255.255.
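
In code, that classification is a single mask-and-compare against the top four bits. Here is a small sketch, assuming the address is held as a 32-bit value in big-endian form:

C++

#include <cstdint>
#include <iostream>

// True when the leading four bits of the (big-endian) IPv4 address are 1110.
bool is_multicast(std::uint32_t ipv4_addr)
{
  return (ipv4_addr & 0xF0000000u) == 0xE0000000u;
}

int main()
{
  std::cout << std::boolalpha
            << is_multicast(0xE0000000u) << '\n'    // 224.0.0.0       -> true
            << is_multicast(0xEFFFFFFFu) << '\n'    // 239.255.255.255 -> true
            << is_multicast(0x7F000001u) << '\n';   // 127.0.0.1       -> false
}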

Octal

I'm sure octal has its uses, but I have never run into a situation where I have needed it.

Actually, there is one place, which is to demonstrate this equality:

Oct 31 = Dec 25

Landmark Values

Binary has landmark values, similar to the way decimal does; for example, 1's up to 10, 10's up to 100, then 1000, 1M, 1B... These values are focused on powers of 10, and arranged in groups that provide a significant enough range to be meaningful when you round off or estimate. Becoming proficient with the scale of each binary bit up to 2^10, which is equal to 1024, or 1K, is a worthwhile goal. At this point, we can use the powers of two in multiple contexts: 1) simple binary counting, 2) measuring data byte lengths.

I have constructed a table below that shows landmark values with the powers of 2. On the left I have indicated the name of the unit if we are discussing bits and the size of a single value; for example, 8 bits is a byte. I haven't had the need to explore data values larger than 64 bits yet, but it's possible that some of you have. To the right I have indicated the units of measure used in computers when we discuss sizes. At 1024 bytes we have a kilobyte, and 1024 kilobytes is officially a megabyte (not if you're a hard drive manufacturer...). I continued the table up through 2^80, which is known as a "yottabyte." Becoming familiar with the landmarks up to 32 bits is probably enough for most people. To the far right I converted the major landmark values into decimal to give you a sense of the size and scale of these numbers.

  Unit (bits)    Binary Exponent   Unit (bytes)     Place Value      Largest Value
  Bit (b)        2^0               Byte (B)         1
                 2^1               Word             2
                 2^2               Double-word      4
  Nibble         2^3               Quad-word        8
                 2^4                                16
                 2^5                                32
                 2^6                                64
  Byte (B)       2^7                                128              255
                 2^8                                256
                 2^9                                512
                 2^10              Kilobyte (KB)    1024
                 2^20              Megabyte (MB)    1024^2           1,048,576
                 2^30              Gigabyte (GB)    1024^3           1,073,741,824
  Double-word    2^31                               1024^3 · 2       4,294,967,295
                 2^32                               1024^3 · 2^2
                 2^40              Terabyte (TB)    1024^4
                 2^50              Petabyte (PB)    1024^5
                 2^60              Exabyte (EB)     1024^6
  Quad-word      2^63                               1024^6 · 2^3     9,223,372,036,854,775,807
                 2^70              Zettabyte (ZB)   1024^7
                 2^80              Yottabyte (YB)   1024^8

Summary

Looking at a large mass of symbols, such as a memory dump from a computer, can be overwhelming. The tools that we use to develop and debug software help us organize and make sense of this data. However, these tools cannot always display the data in formats that are helpful for particular situations. In these cases, understanding the relationships between the numbers and their different representations can be extremely helpful. This is especially true when the data is encoded at irregular offsets. I presented a few common forms that you are likely to encounter when looking at the values stored in computer memory. Hopefully you can use these identification and navigation tips to improve your development and debugging abilities.

Number Base Conversion

general, beginner, math

I have started writing an entry that discusses the value of becoming familiar with the binary (base-2) and hexadecimal (base-16) number systems because they are generally more useful to a programmer than decimal (base-10). My daughter is currently in high-school and she is taking a programming course. One of the things that she is currently learning how to do is count in binary. So I decided to expand my explanation of conversion between number systems as a reference for her and all of those who would like a refresher. The entry following this one will describe how binary and hexadecimal will make you a more effective programmer.

Place Value: coefficient · radix^place

To be able to convert numbers between different numerical bases, it is important to review the concept of place value. Each place, or column, in a number represents a value equal to the number system's base raised to the power of its place index, starting at 0. The official name of the base value is the radix. For example, consider the first three place values for a number system that has a radix of b.

b^2 + b^1 + b^0

If we are dealing with decimal (base-10), the radix = 10 and we would have place values of:

10^2 + 10^1 + 10^0

Now we have the 1's column, 10's column, and 100's column:

100 + 10 + 1

The number stored at each column in the number is called the coefficient. To construct the final number, we multiply the coefficient by its place-value and add the results at each place together. Here is the decimal number 237 broken down by place value:

2·b^2 + 3·b^1 + 7·b^0

2·10^2 + 3·10^1 + 7·10^0

200 + 30 + 7

237

Hopefully decimal form is so natural to you that 237 seems like a single number, rather than the sum of place values that are multiplied by their coefficient.

The Binary Number System

There are 10 types of people: those that understand binary and those that don't.

If you are wondering what the eight other types of people are, continue reading.

Binary is a base-2 number system. This means that each column in a binary number represents a value of 2 raised to the power of its place index. A number system requires a number of symbols to represent each place value that is equal to its Base value, and zero is always the first symbol to include in this set. For instance, decimal (base ten) requires ten symbols: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Therefore, binary (base two) only requires two symbols: {0, 1}.

Adding the radix as a subscript after a number is a common notation to indicate the base-value of a number when there is a possibility for confusion. Unless the context specifically indicates a different number system, decimal form is assumed. For example, the value 2 in binary:

10₂ = 2₁₀

If we had used that notation in the joke at the beginning... well, it just wouldn't have been as funny.

Another place that you will see the subscript notation is when you study logarithms:

log_a(x) = log_b(x) / log_b(a)

I'll leave the details for logarithms for another time.

Counting in Binary

When learning anything new, it can be helpful to map something that you already know to the new topic. For a new number system, counting with both number systems can be a helpful exercise. Counting in all number systems uses the same process:

  1. Start with zero in the least significant column
  2. Count up until you have used all of the symbols in increasing order in the least significant, 1's, column
  3. When the limit is reached, increment the value in the next column, and reset the current column to zero.
  4. If the next column has used all of the symbols, increment the column after that and reset the current column.
  5. Once no further columns reach their limit, return to step 2 to continue counting.

Starting with decimal, if we count up from zero to 9, we get:

0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 ...roll over

We are now at step 3, we have reached the limit, so we increment the next column from an implied 0 to 1, and reset the current column for the result of:

10

Continuing to count, we increment the 1's column and rolling over the successive columns as necessary:

11 - 12 - 13 ... 98 - 99 ... roll over

100

Here is 0-15 in binary. When working with computers, and binary in general, you will typically see zeroes explicitly written for the more significant columns. We require 4 binary digits to represent 15.

  Binary    Sum of Columns    Decimal
  0000      0 + 0 + 0 + 0       0
  0001      0 + 0 + 0 + 1       1
  0010      0 + 0 + 2 + 0       2
  0011      0 + 0 + 2 + 1       3
  0100      0 + 4 + 0 + 0       4
  0101      0 + 4 + 0 + 1       5
  0110      0 + 4 + 2 + 0       6
  0111      0 + 4 + 2 + 1       7
  1000      8 + 0 + 0 + 0       8
  1001      8 + 0 + 0 + 1       9
  1010      8 + 0 + 2 + 0      10
  1011      8 + 0 + 2 + 1      11
  1100      8 + 4 + 0 + 0      12
  1101      8 + 4 + 0 + 1      13
  1110      8 + 4 + 2 + 0      14
  1111      8 + 4 + 2 + 1      15

The Hexadecimal Number System

Hexadecimal is a base-16 number system. Therefore, we will need sixteen symbols to represent the place values. We can start with the ten numbers used in decimal, and we use letters of the alphabet to represent the remaining six symbols. Although letter-case can matter in programming, the letters used in hexadecimal are case-insensitive. Here is a mapping of the hexadecimal values to decimal:

Decimal:  { 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 }
Hexadecimal:  { 0 1 2 3 4 5 6 7 8 9 A B C D E F }

Number Base Conversion

Let's discuss how to convert between number systems with different base values. Specifically, we will describe how to convert from "Decimal to Base-x" and from "Base-x to Decimal".

Decimal to Base-x

Here is the algorithm to convert a number in decimal to a different base:

  1. Divide the decimal number by the radix of the target base
  2. The remainder from step 1 becomes the value for the current column.
  3. Use the quotient (answer) from step 1 as the decimal value to calculate the next column.
  4. Return to step 1 and repeat until the quotient is zero.

Let's return to the number 237 and convert it to a binary number:

  decimal: 237
  radix:   2

  Decimal / Radix    Quotient    Remainder
  237 / 2            118         1  (2^0)
  118 / 2             59         0  (2^1)
   59 / 2             29         1  (2^2)
   29 / 2             14         1  (2^3)
   14 / 2              7         0  (2^4)
    7 / 2              3         1  (2^5)
    3 / 2              1         1  (2^6)
    1 / 2              0         1  (2^7)

  binary: 11101101

Here is 237 converted to a hexadecimal number:

  decimal: 237
  radix:   16

  Decimal / Radix    Quotient    Remainder
  237 / 16           14          D (13)  (16^0)
   14 / 16            0          E (14)  (16^1)

  hexadecimal: ED

A common notation to represent hexadecimal when dealing with computers and in programming languages themselves, is to prepend an 'x' in front of the number like so: xED.

Here is one more decimal-to-hexadecimal example:

  decimal: 3,134,243,038
  radix:   16

  Decimal / Radix       Quotient       Remainder
  3,134,243,038 / 16    195,890,189    E (14)  (16^0)
    195,890,189 / 16     12,243,136    D (13)  (16^1)
     12,243,136 / 16        765,196    0 ( 0)  (16^2)
        765,196 / 16         47,824    C (12)  (16^3)
         47,824 / 16          2,989    0 ( 0)  (16^4)
          2,989 / 16            186    D (13)  (16^5)
            186 / 16             11    A (10)  (16^6)
             11 / 16              0    B (11)  (16^7)

  hexadecimal: xBAD0C0DE

Base-x to Decimal

Actually, I have already demonstrated how to convert a number from a base different than ten into decimal. Once again, here is the complete formula, where c_n represents the coefficient at each place-column.

c_n·b^n + ... + c_2·b^2 + c_1·b^1 + c_0·b^0

As an example, let's convert the binary answer back into decimal:

1·2^7 + 1·2^6 + 1·2^5 + 0·2^4 + 1·2^3 + 1·2^2 + 0·2^1 + 1·2^0

1·128 + 1·64 + 1·32 + 0·16 + 1·8 + 1·4 + 0·2 + 1·1

128 + 64 + 32 + 8 + 4 + 1

237
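
The same formula is easy to express in code by accumulating coefficient · place-value one digit at a time. Here is a sketch of a hypothetical ConvertToDecimal helper, the inverse of the ConvertToBase demonstration further below, which assumes the digits are limited to 0-9 and A-Z:

C++

#include <cctype>
#include <iostream>
#include <string>

// Accumulates coefficient * place-value, starting from the most significant digit.
unsigned long ConvertToDecimal(const std::string& digits, unsigned long radix)
{
  unsigned long result = 0;

  for (std::string::size_type i = 0; i < digits.size(); ++i)
  {
    char symbol = static_cast<char>(std::toupper(digits[i]));
    unsigned long value = (symbol <= '9') ? (symbol - '0')
                                          : (symbol - 'A' + 10);
    result = (result * radix) + value;
  }

  return result;
}

int main()
{
  std::cout << ConvertToDecimal("11101101", 2)  << '\n';   // 237
  std::cout << ConvertToDecimal("ED", 16)       << '\n';   // 237
  std::cout << ConvertToDecimal("BAD0C0DE", 16) << '\n';   // 3134243038
}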

Base-x to Base-y

Is it possible to convert a number from Base-x directly to Base-y without converting to decimal (base-10) first?

Yes; however, you will need to perform all of your math operations in either Base-x or Base-y. The algorithms that I have presented are performed with base-10, since that is the number system most people are familiar with.

Demonstration

Here is a short demonstration program to convert a decimal number into a value of any numeral base between 2 and 36.

Why those two limits?

Try to imagine how a base-1 number system would work with only the symbol {0} to work with. Alternatively, we can combine the numerical set: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} with the alphabetic set: {A, B, C ... X, Y, Z} to create a set of 36 symbols.

C++

// The set of possible symbols for representing other bases.
const char symbols[] = {'0', '1', '2', '3', '4', '5',
                        '6', '7', '8', '9', 'A', 'B',
                        'C', 'D', 'E', 'F', 'G', 'H',
                        'I', 'J', 'K', 'L', 'M', 'N',
                        'O', 'P', 'Q', 'R', 'S', 'T',
                        'U', 'V', 'W', 'X', 'Y', 'Z'};

Here is the base-conversion algorithm from above written in C++.

C++

#include <iostream>

using std::cout;

void ConvertToBase( const unsigned long decimal,
                    const unsigned long radix)
{
  unsigned long remainder[32] = {0};
  unsigned long quotient = decimal;
  unsigned char place    = 0;
 
  while (0 != quotient)
  {
    unsigned long value = quotient;
    remainder[place] = value % radix;
    quotient = (value - remainder[place]) / radix;
 
    place++;
  }
 
  cout << decimal << " in base " << radix << " is ";
 
  // Print the digits from most-significant to least-significant.
  for (unsigned char index = 1; index <= place; index++)
  {
    cout << symbols[remainder[place - index]];
  }
 
  cout << "\n";
}

The values are from the examples above. You can modify the values that are used with the function calls to ConvertToBase in the program below:

C++

int main(int argc, char* argv[])
{
  ConvertToBase(237,        2);
  ConvertToBase(237,        10);
  ConvertToBase(237,        16);
  ConvertToBase(3134243038, 16);
  ConvertToBase(3134243038, 36);
 
  return 0;
}

Output:

237 in base 2 is 11101101
237 in base 10 is 237
237 in base 16 is ED
3134243038 in base 16 is BAD0C0DE
3134243038 in base 36 is 1FU1PEM

Summary

Ever since you learned to count in decimal you have been using exponents with a base of 10; it's just that no one ever made a big deal of this fact. To use a different numeral system, such as binary (base-2) or hexadecimal (base-16), you simply need to determine a set of symbols to represent the values, and then you can use the same counting rules that you use in base-10, except with a different number of symbols. Converting between any two number systems is possible; however, it is simplest to convert to decimal first if you want to continue to use base-10 arithmetic operations.

Rvalue References Applied

C++, Alchemy, engineering

A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called Network Alchemy. Alchemy performs automated data serialization with compile-time reflection. It is written in C++ using template meta-programming.

My previous entry was a condensed overview on rvalue references. I described the differences between value expressions and types. I also summarized as much wisdom as I could collect regarding how to effectively use move semantics and perfect-forwarding. After I completed the essay, I was eager to integrate move semantics for my serialization objects in Alchemy. This entry is a journal of my experience optimizing my library with rvalue references.

Motivation

My initial motivation for learning and writing about rvalue references was that I was finding it difficult to improve the performance of Alchemy with them. In fact, I tended to make the performance of my library much worse than without move semantics.

The first thing this says to me, is that the compilers are doing a pretty damn good job of optimizing my code without the move constructors. The next conclusion was shrouded in mystery.

"Am I using this correctly?"
"... I followed the advice in Effective Modern C++, why is it getting worse?"
"... Maybe I have done so well that all of the copied have been elided by the compiler?!"

After a few fruitless attempts to improve the performance in the evenings, I decided I better dig in and truly understand what is happening before I revisit Alchemy and move semantics. I learned a lot simply trying to write a few compelling programs to demonstrate the concepts.

Now it was time to return back to integrate move semantics into my library.

Measuring Success

I have developed a benchmark application that is included as part of the Alchemy source on GitHub. As part of my initial tuning of Alchemy I was able to achieve a modicum of success in Classic C++ (C++98). If you are unfamiliar with the details of the benchmark application, you can learn more from this previous post: Alchemy: Benchmarks.

Here is a summary of the machine and compiler settings displayed in this post:

  • Machine:
    • Intel Core i7-4790L @ 4.00GHz
    • 16 GB RAM
    • Windows 8.1
  • Compiler - 64-bit Visual Studio 2013:
    • Maximize Speed: /O2
    • Enable Intrinsic (Yes): /Oi
    • Favor Speed: /Ot
    • Omit Frame Pointers (Yes): /Oy
    • Whole Program Optimization (Yes): /GL
  • Data set size: 512 MB

The numbers below reflect what I started with when I began to add the move operations to Hg.

Test         Control      Hg           Diff          Percent
Basic:       0.4133 s     0.3193 s     -0.0940 s      22.74%
Packed:      0.3959 s     0.3519 s     -0.0440 s      11.12%
Unaligned:   0.4391 s     0.4425 s      0.0034 s      -0.773%
Complex:     0.7485 s     0.7654 s      0.0169 s      -2.261%
Array:       0.5141 s     0.1409 s     -0.3732 s      72.59%
Total:       2.5110 s     2.0574 s     -0.4536 s      18.06%

The thing that was most challenging to improve up to this point was the speed of the Complex test. This test uses a nested message structure that contains an array and all of the message structures from the other tests. I found many temporary copies, which I eliminated to reach this point. However, the relatively low performance of this test compared to the others indicated that I most likely had other temporary copies that were lurking within the code.

What did I discover?

The very first thing that I did when I returned to Alchemy, was search every class for a destructor, copy constructor, or assignment operator. The new rules the compiler uses to automatically generate special member-functions will halt the use of move semantics dead in its tracks if you are not paying attention.

There are very few destructors in Alchemy. The majority of the data is transient, and actually managed by the caller. There is the Hg::MsgBuffer, which uses a std::vector to store serialized message data. This class has a destructor. All of the others manage a single field of data that is composited within the class.

On the other hand, Alchemy has plenty of copy constructors and assignment operators. Alchemy provides value semantics for all of the message sub-fields. These fields behave much like the fundamental types that they encapsulate. To provide a natural syntax, there are generally two types of assignment operators in each Datum and Proxy class. The first is the assignment operator for the object, and the second accepts the value_type the object represents.

I discovered places where I missed supplying a move constructor for all of the value types that had some form of copy operation. I also found a few places where I supplied copy and move constructors for sub-types of a class, but not for the class itself.

C++

//  ******************************************************
/// Copy Constructor
Message(const message_type& rhs)
{
  *static_cast< message_type *>(this) = rhs;
}
 
//  ******************************************************
/// Move Constructor
Message(message_type&& rhs)
  : base_type(std::move(rhs))
{ }
 
//  ******************************************************
//  Discovered that I was missing these versions:
//  Message(const Message& rhs)
//  Message(Message&& rhs)
//

Results

These changes had very little effect on the results. Setting break points within my code showed that the move constructors were now being called. So why didn't the results change? I kept searching.

FieldTypes

I started to scrutinize the inheritance hierarchy of the Hg::Message class and the Hg::Datum types that it contained. I verified that I was calling the proper base class operations and moving the rvalue expressions into these function calls.

Then I reached the end of the inheritance hierarchy, where a very simple class, FieldTypes, exists. FieldTypes can be found in ./Alchemy/Hg/Datum/basic_datum.h.

This class provides a generic structure to provide a common way to hold and access the actual data storage created for each message datum. The TraitT type allows for tag-dispatching for special cases such as a std::vector or a nested message type.

C++

template< typename FieldT,
          typename TraitT =
            typename Hg::detail::deduce_type_trait<FieldT>::type
        >
struct FieldTypes
{
  using index_type = FieldT;
  using value_type = typename field_data_t<index_type>::value_type;
 
  value_type& reference()
  {
    return m_data;
  }
 
  const value_type& data() const
  {
    return m_data;
  }
 
protected:
  value_type            m_data;        
};

I took a look at the specialization that I had created for the nested message type. Here is the declaration of that class:

C++

template< typename FieldT >
struct FieldTypes <FieldT, nested_trait>
  : public field_data_t<FieldT>::value_type
{
  using index_type = FieldT;
  using value_type = typename field_data_t<index_type>::value_type;
 
// ...
};

There is one interesting line that caught my attention, and helped me narrow down the cause of the issue:

C++

: public field_data_t<FieldT>::value_type

The reason this is interesting, is because I was searching for how the storage was represented and accessed in a nested type. In this case, rather than containing a member data field, the data is provided by derivation. For a nested type, the base class is the Alchemy format definition. This class then contains a ProxyDatum, which derives from a Datum, which derives from a FieldType and brings us back to the level we are currently inspecting.

It's not the nested type after all...

After looking at this, it occurred to me that the default generated move operations were clearly being used by the compiler. I had not added any code that would prevent this in the message definitions and nested fields. However, that did not prevent entire sets of fields from being copied rather than moved.

I had created plenty of value constructors and assignment operators to accept value types, but I had not made any attempt to optimize the movement of these fields within the most basic structures that managed these data fields. So I added copy and move constructor implementations to the FieldTypes class to allow every possible conversion to be considered when moving the fundamental, and container data types.

This brought me my first big gain in performance.

... or maybe it is the nested type?

Unfortunately, there still seemed to be a major problem with the nested types.

I followed the execution path of the benchmark program in the debugger. I was looking for any clues where a copy was being performed rather than a move. Then I located the next big gain in performance.

In the byte-order processing code, I discovered a difference in processing for the nested field-types compared to all of the other types of data. The implementation constructs a Hg::Message object from the raw Alchemy message format that is used to represent the nested field type. This is repackaged in order to recursively use the same processing logic that converts the other field-types, performed by convert_byte_order.

C++

template< typename T, typename StorageT >
struct ConvertEndianess<T, StorageT, nested_trait>
{
  template <typename NestedValueT>
  void operator()(const NestedValueT &input,
                        NestedValueT &output)
  {
    // Construct a shallow message wrapper around the nested data.
    from_type  from(input);
    to_type    to;
 
    // Pass this message to be byte-order swapped.
    output = convert_byte_order<from_type,
                                to_order>(from, to).values();
  }
};

You can see that I pass the output field as an input parameter to this function call. In order to improve the speed of this function I needed to figure out how to more efficiently construct this temporary message, or alter the processing logic to accept this nested field-type.

Moving local resources... don't do it

My first attempt to improve the performance of this temporary instance of the message looked like this:

C++

template <typename NestedValueT>
void operator()(const NestedValueT &input,
                      NestedValueT &output)
{
  // Pass this message to be byte-order swapped.
  output = convert_byte_order<from_type,
                              to_order>(from_type(input), to_type()).values();
}

That helped, but not like I had hoped. Then I made this adjustment:

C++

output = std::move(convert_byte_order(/*params*/));

That was what I needed!

Then I ran the unit-tests, and a number of them failed. The ones that contained a vector actually crashed. The output of byte-order conversion was showing me that the data was being destroyed. I followed the logic in the debugger and discovered my problem.

I am creating the instance of to_type locally. This is passed into the conversion function and all is well. Then the data is moved into output and what appears is garbage. I was confused at first. Then I watched the addresses of all of the data items.

to_type is being created on the stack, and that portion of the stack is destroyed before it has a chance to be moved to output. I tried many different variations of this approach. My final conclusion is that I would not be able to achieve what I was after without moving the creation of the to_type object outside of the function call.

However, I could not do that because that would change the usage of the library. I want to keep the interaction as simple as possible for the user. Therefore, I reworked the structure of the code just a little and this is the final implementation:

C++

template <typename NestedValueT>
void operator()(const NestedValueT &input,
                      NestedValueT &output)
{
    // Pass this message to be byte-order swapped.
    to_type to(output);
    convert_byte_order<from_type, to_order>(from_type(input),
                                            to);
    output = std::move(to);
}

convert_byte_order no longer returns a value. That is because the value it would return is the to_type object that is created, and we still have access to that instance because it is passed-by-reference. Therefore, when I am done with it, it is moved into the output parameter, which also is passed-by-reference into the current function.

Final Results

Here are where the current performance results stand based on the adjustments that I made to optimize with move semantics:

Test         Control      Hg           Diff          Percent
Basic:       0.4133 s     0.2423 s     -0.1710 s      41.37%
Packed:      0.3959 s     0.3403 s     -0.0556 s      14.05%
Unaligned:   0.4391 s     0.2295 s     -0.2096 s      47.74%
Complex:     0.7485 s     0.5573 s     -0.1912 s      25.54%
Array:       0.5141 s     0.1376 s     -0.3765 s      73.23%
Total:       2.5109 s     1.5071 s     -1.0039 s      39.98%

Here is a table that compares the final times for each test with the previous implementation of Alchemy, as well as the results after adding move semantics. I think you will agree that the small amount of effort required to add move semantics is well worth spending.

Test         Alchemy      w/ Move Semantics    Diff          Overall Change
Basic:       0.3193 s     0.2423 s             -0.0770 s     24%
Packed:      0.3519 s     0.3403 s             -0.0116 s      3%
Unaligned:   0.4425 s     0.2295 s             -0.2130 s     48%
Complex:     0.7654 s     0.5573 s             -0.2081 s     27%
Array:       0.1409 s     0.1376 s             -0.0033 s      2%
Total:       2.0574 s     1.5071 s             -0.5503 s     27%

Summary

So far I have learned that move semantics is a somewhat finicky feature that is new with Modern C++. However, the rules are not difficult to learn. After an afternoon's worth of effort applying the principles of rvalue references to Alchemy I was able to realize a 25% increase in performance. This was time well spent.

C++: Rvalue References

CodeProject, C++

Rvalue references were introduced with C++11, and they are used to implement move semantics and perfect-forwarding. Both of these techniques are ways to eliminate copies of data parameters for efficiency. There is much confusion around this new feature that uses the && operator, because its meaning often depends on the context in which it is used. It is important to understand the subtleties around rvalue references in order for them to be effective. This entry will teach you how to use the rvalue reference with plenty of live demonstrations.

Move it!

When I first learned of move semantics, I expected that this feature would be more or less automatic, much like the copy constructor. As it turns out, there are common programming practices that will actually hinder the compiler's ability to generate and use move operations. The concept of move semantics and perfect-forwarding are very simple. However, without understanding a few of the nuances of rvalue references, these idioms will seem fickle when you try to put them to use.

It is important to have a basic understanding of the fundamental components of C++ that have shaped how this new feature was added to the language, and why the explicit steps are required. Therefore, let's start with some background information and vocabulary, then work our way to the main topic.

Lvalue and Rvalue

Syntax expressions are evaluated and assigned both a type and a value category. We are concerned with the differences between the value categories as we try to understand rvalue references. Specifically, we are interested in the lvalue and rvalue categories. These terms are derived from the arguments on each side of the assignment operator: 'L' for left, to which values are assigned, and 'R' for right, which contains the value to be assigned. However, this is only a simplification of their definition.

Another way to look at these terms is how they manifest in the final program. Lvalues are expressions that identify non-temporary objects. Essentially, they have addressable storage for loading and storing data. An rvalue is an expression that refers to a temporary object, or a value that is not associated with any object.

An lvalue is not necessarily modifiable. A good example is a constant expression qualified with the const keyword. After its initialization, the expression has storage that can be addressed, but the value cannot be modified. Therefore, lvalues are further distinguished by modifiable lvalues and non-modifiable lvalues.

Here is a list of items that are lvalue expressions:

  • Non-modifiable:
    • String literals
    • Constant expressions
  • Modifiable:
    • The name of a variable
    • Function calls that return lvalue references
    • Pre-increment and pre-decrement operators
    • Dereference and assignments
    • Expressions cast to lvalue reference type

Here is a list of items that are rvalue expressions:

  • Literal values: true, 27ul, 3.14 (except string literals)
  • Function call expressions that do not return a reference
  • Expressions composed from arithmetic, relational, logical and bit-wise operators
  • The post-fix increment and decrement operators
  • Cast expression to any type other than a reference type
  • Lambda expressions

Does it have a name?

There is a simple way that can help you determine if you are dealing with an lvalue or an rvalue.

Can you refer to the expression by name?

A value that can be referenced by name is an lvalue. This is not an absolute, but it is a good rule of thumb to help you generally reason about your data values. An example of an exception is a member-function. Also, this does not cover all expressions that are considered lvalues. Examples of lvalue expressions that do not have names are string literals and function call expressions that return an lvalue reference.

xvalues, prvalues, glvalues...

In the cursory overview of expression values, I have left out the description of some of the exceptions to the rules and the sub-categories for lvalues and rvalues. These other categories capture the remaining situations. However, going even deeper into the nuances digresses from the original topic, and will only add more confusion. Therefore I will simply leave you with the knowledge that these other categories exist, and a reference to where you can learn more about them: Value categories at cppreference.com

& (lvalue reference)

An lvalue reference is what we generally call a reference. It is also important to note that it is a type. This is in contrast to value categories, which I described in the previous section. Here is a brief review of the concepts associated with references:

  • A reference is an alias to an object or a function that already exists
  • A reference must be initialized when it is defined
  • It cannot be re-seated (reassigned) after it is created
  • It is not legal to create arrays, pointers or references to references (except with templates)

The most common use for an lvalue reference is to pass parameters by-reference in function calls.

C++

void LogMessage(std::string const &msg)
{
  // msg is an alias for the input parameter at the call site.
  // Therefore, a copy of the string is avoided.
}

I prefer to use references over pointers, except when there is a possibility of receiving a null pointer. The logic becomes much simpler when writing safe production-quality code. The need to verify pointer input parameters is eliminated. In some cases, after I verify a pointer parameter, I will dereference it and assign it to a reference. A similar situation is when I perform some type of cast on a pointer; I usually dereference it and assign it to a reference of the new type.

C++

// I know what you're thinking...
// I interface with a lot of C and legacy C++
int process_state(
  const SystemInputs *p_inputs,
  void* p_context
)
{
  if ( !p_inputs
    || !p_context)
  {
    return k_error_invalid_parameter;
  }
 
  SystemInputs& input = *p_inputs;
  SystemState&  state = *reinterpret_cast< SystemState* >(p_context);
 
  // ...
}

If a function returns an lvalue reference, then the function call expression will be an lvalue expression. This use of references is how the at and operator[] member-functions of std::vector are implemented.

C++

// Where reference is an alias for T&
reference operator[]( size_type pos )
{
  // Return the requested element
  // from the heap-allocated data array
  return p_data[pos];
}

Dangling References

Although references do make code easier to work and reason with, they are not perfect. Similar to a pointer, the possibility still exists for the object that was used to initialize a reference to be destroyed before the reference itself. This leaves you with a dangling reference, which puts your code in undefined behavior territory.

The stack is one of the safest places to create a reference. That is with the assumption that the new reference will go out of scope before or at the same time as the object used to initialize the reference.

This is the reason why you do not return a reference from a function call in which you return a locally created variable. Either your object was created on the stack and will be destroyed after the return statement is evaluated, or your object was dynamically allocated, in which case the caller has no way to free the memory when done.

C++

std::string& FormatError(int errCode)
{
  std::string errorText;
  // Populate the string with the proper error message.
 
  return errorText;
  // errorText is now destroyed.
  // The caller receives a dangling reference.
}

&& (rvalue reference)

Prior to C++11, it was not possible to declare an rvalue as a reference. The only place it was legal to declare a reference was with an lvalue expression. C++11 introduces the && operator, which now allows references to be defined for rvalue expressions. An rvalue reference is a type.

Remember that one type of rvalue is an expression that refers to a temporary object. An rvalue reference is used to extend the lifetime of a temporary object. The most compelling place to apply rvalue references are with object construction and assignment. This allows compilers to replace expensive copy operations with less expensive moves. The formal name given to this feature is move semantics. Another exciting use is applied to template function parameters, in which the technique known as perfect-forwarding is used.

In overload resolution for function calls, the rvalue reference type is given precedence over the lvalue reference.
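
A small sketch makes that rule concrete: given one overload that accepts an lvalue reference and one that accepts an rvalue reference, a named object selects the first and a temporary selects the second:

C++

#include <iostream>
#include <string>

void store(const std::string& s)  { std::cout << "lvalue reference overload\n"; }
void store(std::string&& s)       { std::cout << "rvalue reference overload\n"; }

int main()
{
  std::string name("alchemy");

  store(name);                   // lvalue argument:    lvalue reference overload
  store(std::string("temp"));    // temporary (rvalue): rvalue reference overload
}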

Move Semantics

Allows you to control the semantics of moving your user-defined types. It is actually possible to accomplish this with classic C++. However, you would have to forego the copy constructor. With the rvalue reference, it is now possible to provide both a move constructor and a copy constructor within the same object.

Perfect-Forwarding

Makes it possible to create function templates that are able to pass their arguments to other functions in a way that allows the target function to receive the exact same objects.

[Intermission]

I presented that long and detail-oriented introduction up front so you would have the context needed to understand why these moves aren't always automatic. Also, hopefully I have presented the details in a memorable order to help you remember the proper actions required for each situation. We will continue to introduce details gradually, and I will summarize with a set of rules to lead you in a successful direction.

Reference Collapsing

Reference collapsing is part of the type deduction rules used for function templates. The rules are applied based upon the context of the function call. The type of argument passed to the specific instantiation is considered when determining the type for the final function call. This is necessary to protect against unintentional errors from occurring where lvalues and rvalues are concerned.

I mentioned earlier in the section regarding references that it was not legal to create a reference to a reference, with the exception of templates. It's time to demonstrate what I mean:

C++

int   value      = 0;     // OK: Fundamental type
int&  ref        = value; // OK: Reference to type
int& &ref_to_ref = ref;   // Error: Reference to reference not allowed
 
// Now we have rvalue references
int&& rvalue_bad = ref;   // Error: Rvalue reference cannot bind to lvalue
                          // Remember, if it has a name, it is an lvalue
int&& rvalue_ref = 100;   // OK: A literal value is an rvalue

Templates follow a set of type-deduction rules to determine what type should be assigned to each parameterized value of the template. Scott Meyers provides a very thorough description of these rules in Item 1 of "Effective Modern C++". Suffice to say, the important rules to note are:

  • If an argument is a reference, the reference is not considered during type deduction
  • Lvalue arguments are given special consideration in certain circumstances (this is where reference collapsing applies)

The rules of reference collapsing

The rules are actually very simple. The rules have the same output as the AND truth table, where an lvalue reference, &, is 0 and an rvalue reference, &&, is 1. I think it is subtly fitting, given the other meaning of the && operator. This should make it easier to remember the rules as well.

  Reference Collapsing Rules (& := 0, && := 1):

    &  + &   ->  &
    &  + &&  ->  &
    && + &   ->  &
    && + &&  ->  &&
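
The collapsed types can be verified with a pair of alias templates and static_assert. This is just a quick sketch to confirm the table:

C++

#include <type_traits>

template< typename T > using add_lref = T&;    // forms T&  (may collapse)
template< typename T > using add_rref = T&&;   // forms T&& (may collapse)

//  &  + &   -> &           &  + &&  -> &
static_assert(std::is_same<add_lref<int&>,  int&>::value,  "collapses to &");
static_assert(std::is_same<add_rref<int&>,  int&>::value,  "collapses to &");
//  && + &   -> &           && + &&  -> &&
static_assert(std::is_same<add_lref<int&&>, int&>::value,  "collapses to &");
static_assert(std::is_same<add_rref<int&&>, int&&>::value, "collapses to &&");

int main()
{ }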

New Rules for compiler generated functions

Hopefully you are well aware that the compiler may generate four special member-functions for a class as needed with classic C++. If not, it's never too late to learn. The four functions are:

  • Default Constructor (If no other constructor has been defined)
  • Destructor
  • Copy Constructor
  • (Copy) Assignment Operator

Two additional functions have been added to this set to properly manage the new concept of move operations.

  • Move Constructor
  • Move Assignment Operator

The default behavior of a generated move function is similar to its copy-based counterpart: a move operation is performed for each member of the object. However, the compiler is much more conservative about automatically generating these new functions when compared to the others. The primary reason is the notion that if the default move behavior is not sufficient and you elect to implement your own, then the default copy behavior most likely will not be sufficient either. Therefore it will not automatically generate the copy-based functions when you implement either of the move functions.

Furthermore, if you implement only one of the move operations, it will not automatically implement the other move operation, by the same logic. In fact, no compiler-generated move operations will be created if the user-defined type has implemented its own destructor or copy operation. When move operations are not defined, the copy operations will be used instead.

If you are in the habit, or even feel the compulsion, to always define a destructor, even if it is an empty destructor, you may want to try to change that behavior. There is now actually a better alternative. Similar to how you can delete the compiler-generated defaults, you can also explicitly specify that you would like to use the defaults. The syntax is the same as delete, except you use the default keyword.

C++

class UserType
{
public:
// This declaration will not preclude
// a user-type from receiving a
// compiler-generated move constructor.
~UserType() = default;
};

Specifying default will also allow you to continue to use the compiler's copy operations even when you implement your own move operations. If you would like to read a full account of the rules and the reasoning for the changes, refer to Item 17 in "Effective Modern C++".

std::move("it!");

std::move is a new function that has been added to the Standard Library, and it can be found in the <utility> header. std::move does not add any actual executable code to your program because it is implemented as a single cast operation. Yet this function is very important because it serves two purposes:

  1. Explicitly communicates your intentions to move an object
  2. Provides a hint to (actually, enables) the compiler to apply move semantics

Here is the implementation of std::move:

C++

template< class T >
typename std::remove_reference<T>::type&& move(T&& t)
{
  return static_cast<typename std::remove_reference<T>::type&&>(t);
}

This function is a convenient wrapper around a cast that will unconditionally convert rvalue references to rvalue expressions when passing them to other functions. This makes them capable of participating in move operations. std::move is the explicit nudge that you supply to the compiler when you want to perform a move assignment rather than a copy assignment.

It is necessary to use std::move inside of a move constructor, because all of your values in the rhs object that you will move from are lvalues. As I mentioned, std::move unconditionally converts these lvalues into rvalue references. This is the only way the compiler would be able to differentiate between a move assignment and a copy assignment in this context.

The only operations that are valid to perform on an argument that has been supplied to std::move are a call to its destructor or the assignment of a new value to it. Therefore, it is best to only use std::move on the last use of its input for the current scope.

Extremely Important!

If your class is a derived class, and implements a move operation, it is very important that you use std::move on the parameters that you pass to the base class. Otherwise the copy operations will be called in the base implementation.

Why?

Because the input parameters to your move operation are lvalue expressions. Maybe you are objecting with "no they're not! They are rvalue references!" The parameters are rvalue references, however, your arguments have been given a name for you to refer to. That makes them lvalue expressions, which refer to rvalue references.

The bottom line is that calling a base class implementation requires the same attention that is required for all other move operations in this context. Just because you happen to be in a move operation for your derived class, does not mean that the compiler can tell that it needs to call the same move operation for the base class. In fact, you may not want it to call the move operation. This now allows you to choose which version is called.

Move operation implementations

We now have enough knowledge to be able to constructively apply the principles of move semantics. Let's apply them to implement a move constructor for an object.

The functions below implement the copy operations for a class called ComplexData that is derived from a base class called BasicData.

Derived Move Constructor

C++

ComplexData(ComplexData&& rhs)
  : BasicData(std::move(rhs))
{
  // Move operations on ComplexData data members
  complex_info = std::move(rhs.complex_info);
}

Derived Move Assignment Operator

C++

ComplexData& operator=(ComplexData&& rhs)
{
  BasicData::operator=(std::move(rhs));
 
  // Move operations on ComplexData data members
  complex_info = std::move(rhs.complex_info);
 
  return *this;
}

Observe move operations

Number

The class used in this demonstration is called Number. It implements each of the special member-functions of the class. It also provides a way to set and get the value of the object. This example lets you observe when move operations are performed versus copy operations.

The implementation is very simple only holding a single data member int m_value. I do not use the call to std::move inside of the move operations because it is not necessary. Similarly, if we had allocated pointers and were moving them between two objects, we would copy the pointer to the destination class, and set the source class pointer to nullptr. I will set the number to -1 in this version to differentiate an invalid state from 0.
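
The Number class only holds an int, so there is no resource to steal. To illustrate the pointer scenario I just described, here is a hypothetical resource-owning class (separate from the Number example) whose move constructor takes the allocation and resets the source:

C++

#include <cstddef>
#include <utility>

// Hypothetical class that owns a raw allocation.
class Buffer
{
public:
  explicit Buffer(std::size_t len)
    : m_data(new char[len])
    , m_len(len)
  { }

  // Move constructor: steal the allocation and leave the source empty.
  Buffer(Buffer&& rhs)
    : m_data(rhs.m_data)
    , m_len(rhs.m_len)
  {
    rhs.m_data = nullptr;   // the source no longer owns the memory
    rhs.m_len  = 0;
  }

  ~Buffer()
  {
    delete[] m_data;        // safe: delete[] of nullptr is a no-op
  }

private:
  char*       m_data;
  std::size_t m_len;
};

int main()
{
  Buffer a(64);
  Buffer b(std::move(a));   // a's pointer is now nullptr; b owns the allocation
}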

Number move assignment operator

C++

Number& operator=(Number&& rhs)
{
  cout << "Move  Assignment Operator\n";
  m_value = rhs.m_value;
  rhs.m_value = -1;
  return *this;
}

Move Example

C++

int main(int argc, char* argv[])
{
  std::cout << "Construct three Numbers:\n";
  Number value(100);
  Number copied(value);
  Number moved(std::move(value));
 
  std::cout << "\nvalue:  " << value
            << "\ncopied: " << copied
            << "\nmoved:  " << moved;
  std::cout << "\n\nCopy and move:\n";
  value  = 202;
  moved = std::move(value);
  copied = value;
  std::cout << "\nvalue:  " << value
            << "\nmoved:  " << moved
            << "\ncopied: " << copied;
}

Output:

Construct three Numbers:
Value Constructor
Copy  Constructor
Move  Constructor

value:  -1
copied: 100
moved:  100

Copy and move:
Value Assignment Operator
Move  Assignment Operator
Copy  Assignment Operator

value:  -1
moved:  202
copied: -1

std::forward<perfect>("it!");

As the title of this section implies, there is a new function in the Standard Library called std::forward. However, it is important to understand why it exists, because it is designed to be used in a special situation. The situation is when you have a type-deduced function argument that you would like to move, also called forward, as the return value, or to another sub-routine.

In classic C++, the way to move function arguments through function calls is by using call-by-reference. This works, but it is inconvenient because you must make a decision on trade-offs. You either choose to give up flexibility on the type of arguments that can be used to call your function, or you must provide overloads to expand the range of argument types that can be used with your function call.

Can be called by lvalues, however, rvalues are excluded:

C++

template< typename T >
T& process(T& param);
 
int val = 5;
process(val);       // OK: lvalue
process(10);        // Error: Initial value to ref of
                    //        non-const must be lvalue
process(val + val); // Error: Initial value to ref of
                    //        non-const must be lvalue

This version now supports rvalues; however, move semantics are no longer possible:

C++

template< typename T >
T& process(T const& param);
 
int val = 5;
process(val);       // OK: lvalue
process(10);        // OK: Temporary object is constructed
process(val + val); // OK: Temporary object is constructed
 
                    // However, none of these instances
                    // can participate in move operations.

Furthermore, if the two solutions above are combined with overloads, and the function in question takes multiple arguments, the number of overloads required to capture every combination of argument types grows exponentially; this does not scale. With the addition of the rvalue reference in Modern C++, it is possible to compact this solution back into a single function, and rvalues will remain viable candidates for move operations.
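To illustrate, a single function template that takes an rvalue reference to a deduced type (a forwarding reference, described below) accepts all of the cases from the previous snippets; the name process is reused here purely as a sketch:

C++

template< typename T >
void process(T&& param);  // forwarding reference: binds to lvalues and rvalues

int val = 5;
process(val);        // OK: T deduced as int&, param binds as an lvalue reference
process(10);         // OK: T deduced as int, param is an rvalue reference
process(val + val);  // OK: still an rvalue, remains eligible for move operations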

Where does std::forward apply in this situation?

Since lvalue expressions are already capable of passing through function calls in this situation, we actually want to avoid applying move semantics on these arguments because we could cause unexpected side-effects, such as moving local parameters.

However, in order to make rvalues eligible for move operations, we need to indicate this to the compiler with something like std::move. std::forward provides a conditional cast: the argument is cast to an rvalue reference only when it was initialized with an rvalue. Once again the rules of reference collapsing are used to build this construct, only in a slightly different way.

Implementation of std::forward:

C++

template< class T >
T&& forward( typename std::remove_reference<T>::type& t )
{
  return static_cast<T&&>(t);
}

What is a forwarding/universal reference?

This is a special instance of an rvalue reference that is an instantiation of a function template parameter. Forwarding reference is the name that I have read in some standards proposal documents, and universal reference is the name that Scott Meyers used first in "Effective Modern C++".

It is important to identify this type when the situation occurs, because the type-deduction rules for this particular reference allow the type to become an lvalue reference if an lvalue expression was used to initialize the template parameter. Remember the important type-deduction rules I pointed out above? References are not usually considered as part of type deduction. This is the one exception.

If an rvalue expression is used to initialize the template parameter, then the type becomes an rvalue reference, which will make it qualify for move operations. Therefore, std::forward should be used to inform the compiler that you want a move operation to be performed on this type if an rvalue was used to initialize the parameter.
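Here is a brief sketch of those deduction rules in action; wrapper and the consume overloads are hypothetical names used only for illustration:

C++

#include <string>
#include <utility>

void consume(const std::string&);  // overload selected for lvalues
void consume(std::string&&);       // overload selected for rvalues

template< typename T >
void wrapper(T&& arg)
{
  // Called with an lvalue std::string:
  //   T deduces to std::string&, T&& collapses to std::string&  -> lvalue overload, no move
  // Called with an rvalue std::string:
  //   T deduces to std::string,  T&& is std::string&&           -> rvalue overload, move allowed
  consume(std::forward<T>(arg));
}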

Observe forward vs. move

This next program allows you to observe the side-effects that could occur by using std::move when std::forward is most likely what was intended. This program is adapted from Item 25 in "Effective Modern C++". I have expanded the program to provide two methods to set the name of the test class, Item.

Item class

The class has a single value called name. name is set equal to "no name" in the default constructor. The name can be set in two different ways:

  1. Item::fwd_name:

    Sets the name value with an input forwarding-reference string, which is forwarded to the internal storage of name in the class.

    C++

    template< typename T >
      void Item::fwd_name(T&& n)
      {
        m_name = std::forward<T>(n);
      }
  2. Item::move_name:

    Sets the name value with an input forwarding-reference string, which is unconditionally moved to the internal storage of name in the class.

    C++

    template< typename T >
      void Item::move_name(T&& n)
      {
        m_name = std::move(n);
      }

Notice that both of these functions are function templates. This provides the differentiating factor that makes std::forward necessary, moving a type-deduced argument.

Forward Example

C++

int main(int argc, char* argv[])
{
  std::cout << "Forward 'only' moves rvalues:\n\n";
  string fwd("Forward Text");
 
  Item fwd_item;
  fwd_item.fwd_name(fwd);
  cout << "fwd_name:   " << fwd_item.name() << "\n";
  cout << "fwd(local): " << fwd << "\n";
 
  std::cout << "\nMove 'always' moves:\n\n";
  string mv("Move Text");
 
  Item move_item;
  move_item.move_name(mv);
  cout << "move_name:  " << move_item.name() << "\n";
  cout << "mv(local):  " << mv << "\n";
 
  return 0;
}

Output:

Forward 'only' moves rvalues:

fwd_name:   Forward Text
fwd(local): Forward Text

Move 'always' moves:

move_name:  Move Text
mv(local):  no name

Move semantics and exceptions

A new keyword, noexcept, has been added to Modern C++ regarding exception handling. The compiler can use this information provided by the programmer to enable certain optimizations for non-throwing functions. One potential optimization is to not generate stack-unwinding logic for the functions that specify noexcept.

It is possible to test if a function specifies noexcept by using the noexcept operator. This may be necessary if you can conditionally provide an optimization, but only if the functions you want to call are specified as non-throwing. The move operations in std::vector and std::swap are two good examples of functions that are conditionally optimized based on a non-throwing specification.

Finally, there is a function called std::move_if_noexcept that will conditionally obtain an rvalue reference if the move constructor of the type to be moved does not throw. Refer to Item 14 in "Effective Modern C++" for a thorough description of noexcept.
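A minimal sketch of these two ideas, using a hypothetical Buffer type: a move constructor marked noexcept, and std::move_if_noexcept falling back to a copy when the move constructor could throw:

C++

#include <utility>
#include <vector>

struct Buffer
{
  std::vector<char> data;

  Buffer() = default;
  Buffer(const Buffer&) = default;

  // Declared noexcept so containers such as std::vector
  // will prefer the move over a copy during reallocation.
  Buffer(Buffer&& rhs) noexcept
    : data(std::move(rhs.data))
  { }
};

Buffer source;
// Moves when Buffer's move constructor is noexcept,
// otherwise silently falls back to a copy.
Buffer destination(std::move_if_noexcept(source));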

What you need to know

This is the section that I promised at the beginning of the essay. The most important concepts that you need to take away and know. Not remember, know, if you are going to employ move semantics and perfect-forwarding when you practice Modern C++.

  • Remember the difference between a value category and a type:

    Lvalue and rvalue are value categories that classify expressions. Lvalue reference and rvalue reference are both types; because the names are so similar, it is easy to confuse the two.

  • Rvalues are the only expression types valid for move operations:

    std::move and std::forward explicitly attempt to convert arguments to rvalue references. This is performed by casting the argument to an rvalue reference (a type), which yields an rvalue expression that is eligible for move operations.

  • If it has a name, then it is an lvalue (expression):

    This is true even if the type of the lvalue is an rvalue reference. Refer to the previous guideline for the ramifications.

  • Use std::move to explicitly request a move operation:

    std::move unconditionally makes the argument eligible for a move operation. I say request, because a move is not always possible, and the compiler may still elect to use a copy operation.

    After you pass a value to std::move, the only operations you should rely on are calling the argument's destructor or assigning a new value to it.

  • Use std::forward to move type-deduced arguments in templates:

    std::forward conditionally casts its argument to an rvalue reference and is only required to be used in this context. This is important because an lvalue that is inadvertently moved could have unintended and unsafe side-effects, such as moving local values.

  • There are additional rules for the special compiler generated functions for a class:

    The move constructor and move assignment operator now can be generated automatically by the compiler if your class does not implement them. However, there are even stricter rules that dictate when these functions can be generated. Primarily, if you implement or delete any of the following functions, the compiler will not generate any unimplemented move operations:

    • Destructor
    • Copy Constructor
    • Copy Assignment Operator
    • Move Constructor
    • Move Assignment Operator
  • Strive to create exception free move operations and specify noexcept for them:

    Your move operations are more likely to be selected by the compiler if you can provide an exception-free move operation. noexcept tells the compiler that your function does not throw exceptions. Therefore, it can optimize the generated code further, for example by omitting the code required to unwind the stack.

Summary

Rvalue references were added to Modern C++ to help solve the problem of eliminating unnecessary temporary copies of expensive objects when possible. The addition of this value category for expressions has made two new idioms possible with the language.

Many restrictions had to be put in place regarding when the compiler can safely perform these operations automatically. Moreover, it is important to use the functions added to the Standard Library because they express intent and provide the extra hints the compiler needs to utilize these operations safely.

Here is one last piece of advice as you look for locations to use move semantics. Do not try to outwit the compiler because it is already capable of some amazing optimizations such as the Return Value Optimization (RVO) for local objects returned by value. Practice the knowledge in the previous section and your C++ programs will keep that svelte contour envied by all other languages.


References

Effective Modern C++ by Scott Meyers (O'Reilly). Copyright 2015 Scott Meyers, 978-1-491-90399-1

C++ Rvalue References Explained by Thomas Becker, http://thbecker.net/articles/rvalue_references/section_01.html, March 2013.

A Brief Introduction to Rvalue References by Howard E. Hinnant, Bjarne Stroustrup, and Bronek Kozicki, http://www.artima.com/cppsource/rvalue.html, March 10, 2008.

Value Categories at CppReference.com, http://en.cppreference.com/w/cpp/language/value_category, June 2015

Why Does a CS Degree Require So Much Math?

general, CodeProject, engineering Send feedback »

Over the years I have heard this question or criticism many times:

Why is so much math required for a computer science degree?

I never questioned the amount of math that was required to earn my degree. I enjoy learning, especially math and science. Although, a few of the classes felt like punishment. I remember the latter part of the semester in Probability was especially difficult at the time. Possibly because I was challenged with a new way of thinking that is required for these problems, which can be counter-intuitive.

What is computer science?

Let's start with computer science, and work our way towards math and how the two are interconnected. A definition of what is considered computer science may be helpful before we discuss its foundation in math. Here is an abridged definition from Wikipedia[^]

Computer science is the scientific and practical approach to computation and its applications. It is the systematic study of the feasibility, structure, expression, and mechanization of the methodical procedures...

A computer scientist specializes in the theory of computation and the design of computational systems.

I have had many discussions over the years with students and professionals educated from all different realms. One common sentiment that I hear from all of these groups is this:

Theory is nice, but I need to have relevant skills to get a job

This is a very valid point. It is also true that the majority of college graduates do not continue their careers in the world of academics. Therefore, we can assume that most people that attend college do so with the intent to join the general workforce, i.e., get a job. Let's evaluate both paths.

The role of academia

Much of what I have learned about the academic realm is that it has a fundamental focus on expanding the limits of our collective knowledge. Here is a great explanation of the purpose of a Ph.D. by Professor Matt Might at the University of Utah, The illustrated guide to a Ph.D.[^] As these new discoveries are better understood, they get further developed. Some discoveries may continue to live and exist solely in the academic realm. While other theories are applied and practical uses are developed from these discoveries.

Application of theory

The concepts taught in math, physics and engineering are based on theory. A theory itself is a model or a collection of ideas that is intended to explain something. The theories taught in these disciplines have been developed and refined over time. It is relatively simple to observe the application of the theories that are required for the engineering disciplines based upon physics.

Some of our engineering disciplines start to push the boundaries of what is possible for us to observe, such as electrical engineering and nuclear sciences. However, we are still able to verify these hypotheses through repeatable experiments and observations, even if we can only observe the indirect results of these experiments.

Math and computer science are purely theoretical. They are abstract mental models that describe the relationships and behaviors of the interactions between these models. The application of these theories is a result of mapping these conceptual models into a context that can be represented and observed in physical form.

Yeah, yeah that's great (in theory), but...

I need skills to get a job

Trade schools are designed to help students learn the skills of a trade needed to get a job in that profession. The curriculum of trade schools is focused on teaching students the programming languages that are in highest demand, used to solve the most common problems that occur today.

This typically means that you would spend a semester learning and solving problems with Java, then move on and use a scripting language such as Python for a semester. Interfacing with relational databases is also likely to be in the curriculum, as it is a very marketable skill.

As for college degrees in computer science; yes, you will learn a variety of computer languages that may help you get a job. However, these languages are used as the basis to teach the principles and give you the opportunity to apply these theories. The end goal is to teach critical thinking, to help students of computer science learn to reason, experiment, observe and understand the theories.

I believe that what you get out of your education depends on the effort that you put into it. This is true for either path. The difference between the two is typically where the instructors focus your attention. Neither path guarantees a job at the end. However, if you do not have what it takes in a university computer science program, they will flunk you out. The for-profit trade schools are a bit more forgiving.

You can program computers with minimal math skills

This is a true statement.

In many cases, all that is needed to be able to program a computer is a computer, and the programming tools to interface with the computer. With basic arithmetic skills and knowledge of a computer language, you will be able to command that computer.

The same thing goes for many other skills: amateur auto mechanics, home plumbing, renovations, do-it-yourself haircuts. Some of the world's most successful people dropped out of school and started a tech company. For example, John Carmack, who dropped out of college, co-founded id Software and went on to pioneer some of the game industry's ground-breaking graphics technologies; when he needed to know the math, he taught it to himself.

I don't have any numbers, but I would imagine that most computer science graduates go on to become software developers of some sort. Many of the software engineers that I work with graduated college with an Electrical Engineering or Mechanical Engineering degree. Two guys I know graduated with law degrees, not degrees in engineering.

Where does math factor in?

When math is mentioned, most people generally think of topics that require numbers and are focused on calculating values of some sort. The one notable exception is Geometry. Here is a small sampling of those types of topics:

  • Arithmetic
  • Geometry
  • Algebra
  • Calculus
  • Linear Algebra
  • Fourier Transforms
  • Ordinary Differential Equations
  • Partial Differential Equations
  • etc.

However, it is important to recognize that mathematics deals with more than just numbers. Mathematics spans a wide range of topics:

  • Logic
  • Algorithms
  • Fundamental theorems
  • Mathematical proofs and models
  • Discrete math
  • Theory of computation
  • Information theory
  • Combinatorics
  • Set theory
  • Graph theory
  • Abstract algebra
  • etc.

Math is a foundation, which is built upon


“Mathematics is the gate and key to the sciences.”
     – Roger Bacon

For those of you that have taken Analytic Geometry, Calculus or even Trigonometry, consider how the material is taught. They start with a mathematical foundation and proofs are demonstrated that expand on what you should know going into the class. Each step is then used to derive the next step in the curriculum.

At the end of the course, many of the interim steps are forgotten or rarely used. However, they were a very necessary stepping stone to make a leap to the remainder of the material.

An example that comes to mind appears in Calculus: the concept of the limit of a function. The concept of finding limits is then used to define derivatives and integration.

You could simply memorize the formulas to plug in your own numbers, and you would be able to use the equations. However, when you run into a problem that doesn't fit neatly within the formulas that you have learned, you would be stuck. The critical-thinking skills that develop as these math courses progress give you the tools and ability to transform problems into a simpler form that does fit the mold of solvable equations.

This is the same structure that is used in computer science related courses. Data structures and algorithms that have been studied for decades have been optimized and we are taught how we reached this point. Again, it is possible to use an algorithm called sort, to organize your data and you do not need to understand how it works.

However, if you find that your sort routine runs much slower as the data-sets that you work with become much larger, you will need to know how to analyze your algorithm to be able to build something better. With a solid foundation you will understand how to compare the trade-offs and build something based on previous proven work.

Without this foundation, you may still be able to improve upon your original algorithm. However, you will only be guided by previous experience. Making that inspirational leap to a radically different, yet far more efficient design is not always easy.

Critical Thinking

Critical thinking is essential in professional fields as well as academia. Well, essential to maximize success. Critical thinking is more than analyzing and reasoning well thought-out decisions. It involves evaluating, conceptualizing, applying, observing, and communicating your conclusions.

Along the way, you also need to become self-critical looking for any biases that could add prejudice to your conclusions. Reflection and scientific skepticism help guide you towards accurate observations and well-justified conclusions.

It may seem as though we have meandered away from computer science into philosophy or scientific theory. We may now be discussing concepts that are central to these other topics. However, we have not wandered away from computer science. These skills and concepts are just as important to computer science as they are with every other engineering discipline.

Programming != Computer Science

It is confirmed then, you do not need advanced math to be able to program a computer, nor do you need a degree in computer science.

Hopefully, it is also obvious that there is a limit to what you will be able to program without knowledge of certain topics, especially math.

IT departments need a lot of programmers to create applications that automate or simplify business logic. These are line-of-business applications that give the company's users the information they need to do their jobs more efficiently. Much of this work centers around large databases and reducing the data set to the pertinent information required by the user.

Software engineering for machines and cutting-edge technologies often does require knowledge of advanced math. All of the wireless technologies that have been developed and become mainstream over the last 20 years are heavily dependent upon signal processing algorithms. Signal processing in itself could cover waveform synthesis and analysis, compression, encryption, forward-error-correction, and modulation/demodulation.

Now consider artificial intelligence, computer vision, image processing, natural-language processing, simulations, robotics, general-purpose GPU programming, distributed computing; the list continues to grow each year. With the exception of robotics, I believe that today's smartphones depend in some form upon all of the technologies that I listed, all of which require knowledge of advanced math to do well.

Most web-sites do not receive an amount of traffic that cannot be solved by throwing more hardware at the problem. However, even with the commodity prices of hardware these days, more hardware is not always the best or even an easy solution. Experienced developers with strong critical-thinking and reasoning skills are a much better investment.

For some companies like Facebook, more hardware simply is not an acceptable solution. With hundreds of millions of interactions constantly occurring throughout the world, highly specialized algorithms are required to operate their site as efficiently as possible.

Summary

Math is good. For some people, math is hard. Math is also more than just numbers. Mathematics is a logical foundation that most of our sciences are built upon. Analysis, critical thinking, and reasoning are just a few abstract skills that are developed by studying and applying mathematical concepts. These are also very important skills to develop in order to become a good software developer. This is true whether you work in a corporate IT department or are developing the next-generation wireless communications network.

I was required to take one year of numerically based math past Calculus, the type most people think of when considering math. However, the majority of my math classes were about more abstract topics like Discrete Mathematics and the Theory of Computation. These are the building blocks that set the foundation for the more advanced topics taught in the senior year of my degree. They are also the building blocks required for those students that choose to move on to graduate school.

Math is not necessary to be able to program a computer. However, math is very important to engineering and science, and there are just some places that you cannot access with a computer unless you have a solid foundation in logic, critical-thinking and abstract mathematics.

Accidental Complexity

general, CodeProject, maintainability, design Send feedback »

Accidental complexity is the entropy that exists in your system that it is possible to eliminate. The opposite of this is essential complexity: the parts of a system that are required and cannot be simplified. These two concepts were discussed by Fred Brooks in his essay No Silver Bullet -- Essence and Accidents of Software Engineering. Many systems today are extremely complex, and any complexity that can be eliminated should be.

There are a lot of should be's...

I get very irritated with excuses, because excuses just keep allowing the code to decay with each new statement that is added. I started writing a list of these excuses for entertainment, but I started to get irritated, so I stopped. Here is one of them: "In a perfect world..." There's really only one way that sentence ever ends.

Add value, not garbage

Just because you have an excuse, even if it's a valid excuse, doesn't mean that you are off the hook for cleaning up messes. I remember a situation when I ran across a piece of code, and I thought that it looked peculiar.

C++

(void)Sleep(k_one_second);

Casting a function call to void. I had never seen that before. There were no comments, and I couldn't think of anything valuable that the cast added to the code. The Sleep function did return an integer, but the value is not being assigned. I scratched my head, and I deleted the cast.

Soon after, my changes were reviewed. A question was entered asking why I deleted the void cast. The comment then went on to explain that the cast was added to fix a defect reported by our static analysis tool.

I thought, "This is a fix?"

One of the rules in the coding standards we used, JSF++, is that all return values must be checked. This hardly qualifies as checking the return value, but it appeases the analysis tool.

I replied with "The reason why the tool is reporting an error is because all return values are supposed to be tested. This looks odd, there are no comments, and it's not even a fix." I immediately had two better ideas that would have required the same or less work, and be valuable changes:

  1. Change the function's signature to return void
  2. Encapsulate the original function in an inline function that returns void (a sketch of this option appears below)
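Here is a sketch of the second option; SleepQuietly is a name of my own invention, and the point is that the intentionally ignored return value now lives in exactly one commented location:

C++

// Hypothetical wrapper: the caller no longer has a return value to check,
// and the single discarded result is documented here.
inline void SleepQuietly(unsigned int duration)
{
  (void)Sleep(duration);  // return value intentionally ignored
}
 
// Call sites become:
SleepQuietly(k_one_second);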

Changes like this only add garbage to the code which reduces its readability. Even worse, it covers up a valid problem reported by the code analysis tool. Using subversive code tricks to prevent the tool from reporting a valid issue negates the value of using the tool. You should strive to make every statement add value to the code.

Several hundred instances of void casts were added in that situation. This only added more clutter that covered up a problem rather than focusing on finding a valid solution to fix the problem.

Simplify

Imagine that there is a specific feature that exists, and it will require a finite amount of logic to implement and use:

Logic required to implement and use a feature

Now imagine that functionality needs to be used in multiple locations. Generally we can simplify by factoring the common code into a function, or if it is more complicated logic it may even become a class.

Encapsulating part of the implementation

One particular problem that I encounter frequently is an abstraction that handles only the minimum amount required for the implementation. This tends to leave much more logic to the caller than is necessary. More logic outside of this re-usable abstraction means more duplicated logic; logic that can be written incorrectly, or that does not properly initialize the feature.

Code left to caller opens door for errors

Can you make it simpler?

After you have simplified the code inside of your feature's abstraction, put it to the test and use it. Better yet, put it in a test, a unit test. See if there is anything that you could actually take care of for the user with the known input.

Abstract more for the caller when possible

It is not always apparent, but this is accidental complexity. This is an example of a situation that could eliminate code that is duplicated.

This is unfortunate, because duplication is often performed with cut-and-paste, which is notoriously error-prone. It also adds more code to be maintained, and more code to read in order to understand what purpose a section of logic serves.

Consider the trade-offs

It is not always the best choice to continue to simplify. Simplifying the interface to a function usually means giving up flexibility. In most cases there is no reason the choice must be either/or. Sometimes it is feasible and advantageous to do both.

Create the basic abstraction that leaves all of the flexibility and error-prone or cumbersome logic to the user. Then create simplified versions of the feature that handle commonly used cases.

Example

The cumbersome abstraction that I use as an example is the ::GradientFill function from the Win32 API. I explored what this function is capable of a few years ago, and I learned that it is quite powerful. The interface provides a lot of flexibility, and it does not look too bad from a cursory inspection.

C++

BOOL GdiGradientFill(
  __in  HDC hdc,            // Handle to the DC
  __in  PTRIVERTEX pVertex, // Array of points of the polygon
  __in  ULONG dwNumVertex,  // Size of the vertices array
  __in  PVOID pMesh,        // Array of mesh triangles to fill
  __in  ULONG dwNumMesh,    // Size of the mesh array
  __in  ULONG dwMode        // Gradient fill mode
);

However, this function requires a lot of repetitive setup code. This is also the reason that I hardly ever used ::GradientFill up to that point. Here is the code from the MSDN documentation page that is required to paint a horizontal and a vertical gradient. I believe it would be simpler to write a for-loop than to perform the setup that is required for this function:

C++

TRIVERTEX vertex[2] ;
vertex[0].x     = 0;
vertex[0].y     = 0;
vertex[0].Red   = 0x0000;
vertex[0].Green = 0x8000;
vertex[0].Blue  = 0x8000;
vertex[0].Alpha = 0x0000;
 
vertex[1].x     = 300;
vertex[1].y     = 80;
vertex[1].Red   = 0x0000;
vertex[1].Green = 0xd000;
vertex[1].Blue  = 0xd000;
vertex[1].Alpha = 0x0000;
 
// Create a GRADIENT_RECT structure that
// references the TRIVERTEX vertices.
GRADIENT_RECT gRect;
gRect.UpperLeft  = 0;
gRect.LowerRight = 1;
 
::GdiGradientFill(hdc, vertex, 2, &gRect, 1, GRADIENT_FILL_RECT_H);
::GdiGradientFill(hdc, vertex, 2, &gRect, 1, GRADIENT_FILL_RECT_V);


The code is even worse for a triangle.

I wanted to make these functions simpler to use in my code. To me, it should be as simple as a single function call to fill a rectangle. So I encapsulated the required code in the function below:

C++

bool RectGradient(
  HDC hDC,        // The device context to write to.
  const RECT &rc, // The rectangle coordinates to fill with the gradient.
  COLORREF c1,    // The color to use at the start of the gradient.
  COLORREF c2,    // The color to use at the end of the gradient.
  BOOL isVertical); // true creates a vertical gradient, false a horizontal

As I mentioned earlier, you often give up flexibility for convenience. One of the features that is given up from the previous function is the ability to control alpha-blend levels. Therefore, I created a second version of this rectangle gradient that allows alpha blending levels to be specified.

C++

bool RectGradient(
  HDC hDC,
  const RECT &rc,
  COLORREF c1,
  COLORREF c2,
  BOOL isVertical,
  BYTE alpha1,    // Starting alpha level to associate with the gradient.
  BYTE alpha2     // Ending alpha level to associate with the gradient.
  );

These two functions could be used much more simply. Here is an example of how much simpler the code becomes:

C++

// Horizontal Gradient
RECT rc = {0,0,300,80};
COLORREF black = RGB(0,0,0);
COLORREF blue  = RGB(0,0,0xff);
RectGradient(hdc, rc, black, blue, false);
 
// Let's draw a vertical gradient right beside the horizontal gradient:
::OffsetRect(&rc, 310, 0);
RectGradient(hdc, rc, black, blue, true);

The value of many small abstractions adds up

Sometimes the flexibility of the original code can still be accessible even when the code is simplified. This can be done with a collection of smaller abstractions. Utility functions like std::make_pair from the C++ Standard Library are one example. These functions can be used together to simplify a series of declarations and initialization statements.

A series of small abstractions is valuable
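As a small illustration of this kind of utility, here is a sketch using std::make_pair; the map and the surrounding function are invented for the example:

C++

#include <map>
#include <string>
#include <utility>

void record_ages()
{
  std::map<std::string, int> ages;

  // Without the helper, the value type must be spelled out at each call:
  ages.insert(std::pair<std::string, int>("Ada", 36));

  // With the helper, the types are deduced from the arguments:
  ages.insert(std::make_pair("Grace", 45));
}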

In some cases this collection of utility abstractions can be combined into a compound abstraction.

Abstractions can be grouped together

There are many ways that code can be simplified. It doesn't always need to be code that would be duplicated otherwise. If I run across a super-function, I will try to break it down into a few sub-routines. Even though this new function will only be called in a single location, I have abstracted the complexity of that logic at the call site.

It is even more feasible to give a function that is called only once a longer but descriptive name. When reading the original super-function, it is now much easier to ignore the large blocks of code that previously obfuscated its intended purpose.

Abstraction simplifies code, as long as it provides value, continue to abstract

While the code may be necessary, that does not mean that it must make the code around it more difficult to read and maintain.

Summary

Accidental complexity is the code that exists in our programs that we can simplify or eliminate. Duplicated code and code that exists only to get rid of warnings are two examples of accidental complexity. In the best-case scenario, the code becomes just a little bit more difficult to maintain. However, in the worst cases, legitimate issues could be covered up. Worse still, the manner in which they were covered up makes them that much more difficult to find if they are the cause of a real problem.

I witness developers exerting a great deal of effort to work around problems, fix the symptoms or even make excuses and ignore the problems. This time could just as easily be focused on finding a fix that actually provides value. Much more quality software would exist if this were always the case.

...but I suppose this is not a perfect world.

Alchemy: PackedBits (BitLists Mk3)

C++, Alchemy, design Send feedback »

A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called, Network Alchemy[^]. Alchemy performs low-level data serialization with compile-time reflection. It is written in C++ using template meta-programming.

My second attempt to create a bit-field type was more successful. The size of the container only grew linearly with each sub-field that was added, and the implementation was cleaner. However, I showed an image of what this implementation looked like in the debugger, and it was very inconvenient. The thing I was most concerned with was the pitiful performance revealed by my benchmark tests.

This entry describes my discoveries and the steps that I took to re-invent the bit-field type in Alchemy for the third time. This is also the current implementation in use by Alchemy, which is about 10% faster than a hand-coded collection of packed bits.

Overview

Let's start with a brief review. The PackedBits type in Alchemy essentially replaces C-style bit-fields. Bit-fields are very common with network protocols, which attempt to make every bit count that is sent across the wire.

C and C++ both have a syntax to access values at the bit level, the bit-field. Unfortunately, the standard for both languages leaves the layout rules implementation-defined. Therefore, your code may not be portable to a different compiler, even on the same architecture.
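For reference, this is the kind of declaration in question; the widths are part of the language, but how the fields are packed and padded within the underlying storage is left to the compiler (the struct below is a made-up example):

C++

// A typical C-style bit-field declaration for a protocol header.
// The bit widths are specified, but the packing order and padding
// are compiler-specific.
struct Flags
{
  unsigned int version  : 3;
  unsigned int priority : 4;
  unsigned int is_last  : 1;
};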

Alchemy's PackedBits provides an abstraction to portably allow bit-level access for message protocols and storage structures.

The poor performance of PackedBits v2

The implementation for version 2 of PackedBits was a simplification of the original design. The first implementation had a fixed size of 32 entries, which always existed. Therefore, they always took up space, and they were initialized and destroyed with each PackedBits instance. This made it a very expensive type in both space and speed.

Version 2 solved the hard-coded limit by using recursion and inheritance to handle each BitFieldNode. Unfortunately, there are two aspects of this design that are bad.

Inconvenient Debugging

Since each bit-field inherits from the next field in the chain, you had to navigate to each sub-class to inspect the value of a particular node. They were not all displayed at once. This is actually a minor inconvenience, because some tools like Visual Studio allow you to create custom visualizers for your types.

Design does not scale

There is a challenge in creating a type like this. The abstraction provides an illusion to the user that they are interacting with a single type with the expressive syntax of value members in a struct. Yet these are distinct, composited objects that are structured to interact with and change specific bits of a single data value.

I achieved this in the original design by passing a reference to the data field in the PackedBits container to each BitFieldNode. The mask and shift operations are then defined by the template instantiation and used on the referenced value.

This created a valid solution. The drawback is that for every new bit-field added to the type, a new reference was added as well. This is a reference that will take up additional space, and will need to be initialized. It turns out, the initialization of these references was killing performance.

Whenever a copy of the type was required, the reference for each node had to be initialized. The worst part is that this data type only manages up to the word-size of the platform you are running on. A 32-bit integer specified to have 32 one-bit flags would require 32 references, plus the data value, to be initialized for each copy.

  The Structure of PackedBits v2

This implementation had to go!

The Solution

I had known of and considered an alternative solution many times. However, for my original implementation of Alchemy I steered clear of it. The reason was the strict coding guidelines that we followed; my first implementation of what is now called Hg followed the JSF++ coding guidelines.

A quandary

The quandary was simple. Even though I am now developing this independent open-source library, do I want to use the solution that I had often considered?

The solution was to use the offsetof MACRO defined in the standard library. This MACRO can become tenuous, especially as your object hierarchy becomes more complicated. Luckily I haven't run into any problems yet. Hopefully with the controlled implementation I can keep it that way.

What does offsetof do, and how does it work? This solution must be a macro because it splices the name of a sub-field of a struct or class into an expression that calculates the offset of that data element's address from the base address of the parent type.
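A minimal example of the standard usage, with a made-up struct:

C++

#include <cstddef>
#include <cstdio>

struct Header
{
  char tag;
  int  length;
};

int main()
{
  // Byte offset of 'length' from the start of Header;
  // typically 4 on platforms where int is aligned to 4 bytes.
  std::printf("offset of length: %zu\n", offsetof(Header, length));
  return 0;
}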

My final decision

Yes, of course I decided to use offsetof. Basically, I can now group all of the bit-field children at the same level in the PackedBits parent type. I now pass in the parent type to the BitField object as part of its instantiation. This allows the child to compare itself to its parent and find its location relative to the base pointer.

C++

template< class  OwnerT,
          class  TagT,
          size_t OffsetT,
          size_t CountT,
          class  T = uint8_t
        >
struct BitField
{
  // Generally the same logic in previous
  // implementations for shift, mask, read and store.
  // ...
};

A member-function of PackedBits, called value(), is used to reference the correct data element. The trick is how to get a pointer to the child type's parent object without requiring any per-instance initialization.

C++

// Continued implementation of BitField:
 
  value_type& value()
  {
    return owner()->value();
  }
 
  // This function returns a pointer to the parent object.
  OwnerT* owner()
  {
    return reinterpret_cast<OwnerT*>(
             reinterpret_cast<char*>(this) - TagT::offset());
  }

This adjustment, as well as the change in code generation performed by the pre-processor, forms the solution for version 3 of the PackedBits type.

  The Structure of PackedBits v3

A challenge with cross-compiling

When I went to cross-compile and verify the implementation on Linux with gcc, the compiler complained. It believed that it was de-referencing a pointer to 0. My guess is that since it was a template and not an actual value-type, the calculations were performed at the base address of 0, which worked out quite nicely for me.

How did I appease gcc? Look away if you are a purist:

C++

#ifdef  __GNUC__
 
// Yes, we're redefining this MACRO...
#ifdef offsetof
#undef offsetof
#endif
 
// GCC does not approve of the way in which
// Alchemy employs this MACRO.
// This is a slight alteration:
// Performing the calculation on a non-null value,
// then re-adjust back to zero after the offset calculation.
// Take the member's address within a fake object at address 1,
// then subtract the fake base to recover the offset.
#define offsetof(type,member) \
  ((size_t)(reinterpret_cast<const char*>(&(((type*)1)->member)) - 1))
 
#endif

Are you ready for my rationalization? It's just like the C++ Standard Library and its treatment of addresses and iterators. It's safe to calculate and store an address one past the end, but it's not safe to dereference that address. We only get an offset from this calculation, and we apply it to a known-valid base pointer.

Summary

After an abysmal first showing, I was very pleased when I saw the results from this re-implementation of the PackedBit type. Not only is the design simpler and easier to maintain, but it also performs about 10 to 12% faster than the hand-coded implementation in the benchmark tests that I have created.

I did resort to using the offsetof MACRO, which is prohibited by some coding standards. I do not expect any issues to arise because of the simple context where this calculation is used. There are no virtual functions or complex class hierarchies to consider. However, I am keeping an eye on its behavior.

C++: Template Meta-Programming 2.0

CodeProject, C++ 2 feedbacks »

I was amazed after I had converted only the first few portions of the TypeList from the C++98 implementation to Modern C++. I have decided to convert my Alchemy API to use Modern C++ in order to truly learn the nuances by application of acquired knowledge. The code of these meta-programs is very elegant and completely readable. This really does feel like a new language, and an entirely new version of meta-programming.

The elegance enabled by the new constructs will allow me to present a complete TypeList implementation for Modern C++ in this entry.

I have corrected errors and uploaded a new implementation file.


Compare and Contrast

My primary goal is to compare the differences between template meta-programming with C++98 and Modern C++ (C++11 and beyond). While the concepts remain similar, the newer language has been expanded in a way that makes this a more natural way to compose functional programming logic with C++.

To be clear, I want to explicitly state that meta-programming has its places, and generally it will not be in core-application logic. Libraries and utilities that will be developed and tested, but are less likely to require maintenance work, are good candidates for these types of solutions. Application programmers can take advantage of these complex machinations, yet the caller may never even realize what magic exists behind the curtain. A great example would be the C++ Standard Library itself.

Frame of reference

Andrei Alexandrescu made the concept of meta-programming accessible to the masses in his book Modern C++ Design, published in 2001. He demonstrates and develops many useful components for meta-programming that he created as part of his library, Loki.

This is the style of meta-programming that I first learned, and the type of structure that, Alchemy, is built around. I have altered the implementation and interfaces for my meta-constructs to suit my needs, compared to what is presented in the book. The next few sections demonstrate how the same tasks are accomplished in both versions of the language.

Then

The most important of these constructs is the TypeList. The TypeList is a work-horse construct that can be used in a variety of ways, yet does not contain any internal data or run-time code. structs become the natural type to act as a functional container, which performs all of its compile-time, or static, operations on types and stores values in static const members or enums.

To simplify expressions, I made liberal use of typedefs. This helped me avoid the repetition of verbose template expressions and at the same time give a label to the purpose of that template. Sometimes there are no ways to simplify expressions other than turning to the pre-processor. I prefer to avoid the pre-processor at all costs in my application logic. However, I have grown accustomed to leaning on the pre-processor to generate code for me for repetitive definitions that appear in a class or at the global scope.

Here is an example of how Alchemy's TypeList is constructed. Internally, a TypeNode provides the declarations for the head and tail types.

C++

// A trait class to assist with tag-dispatching.
struct container_trait{ };
// Represents the end of a type list.
struct empty{};
 
template< typename H,
          typename T
        >
struct TypeNode
{
  typedef H head;
  typedef T tail;
};

Now the definition of the TypeList to show the organization of the structure:

C++

template< class T0, class T1, class T2, class T3,
          class T4, class T5, class T6, class T7,
          class T8, class T9, class T10,class T11
        >
struct TypeList
  : container_trait
{
  typedef
    TypeNode<T0,  TypeNode<T1,  TypeNode<T2,
    TypeNode<T3,  TypeNode<T4,  TypeNode<T5,
    TypeNode<T6,  TypeNode<T7,  TypeNode<T8,
    TypeNode<T9,  TypeNode<T10, TypeNode<T11, empty>
    > > > > > > > > > > >                type;
    // Alchemy continues on to 32
};

Composing a structure should be much simpler than this nested definition. Therefore, I decided to wrap the inner declaration with a simpler outer definition. Unfortunately, there are only a few facilities available to customize template declarations. The best option in my particular situation was template specialization.

I wanted to provide a natural interaction with my TypeList object, and still allow support for a variable number of parameters. Thirty-two was my initial number of parameters that I would support. I can live with writing thirty-two specializations once. However, I had many operations that I would also implement, and each of those would require specialized implementations as well. So I resorted to the preprocessor to generate the code for me.

Here is the definition of the MACRO, and how it was used. It generates the code in the block from above:

C++

// It seems my syntax highlighter for MACRO requires some attention
#define tmp_ALCHEMY_TYPELIST_DEF(S) \
template<TMP_ARRAY_##S(typename T)> \
struct TypeList<TMP_ARRAY_##S(T)> \
  : container_trait \
{ \
typedef TMP_ARRAY_##S(TypeNode<T), empty TMP_REPEAT_##S(>)  type; \
}
 
// Define specializations of this array from 1 to 31 elements
tmp_ALCHEMY_TYPELIST_DEF(1);
tmp_ALCHEMY_TYPELIST_DEF(2);
tmp_ALCHEMY_TYPELIST_DEF(3);

Yes, I definitely left out many of the gory details for the definitions of the MACROs. But why would you want them? We're moving forward into the future; but you can still access them from Alchemy on GitHub.

The direct usage of the TypeList was then much more accessible to the user. Also, there was no need for them to use any MACROs to define a new TypeList:

C++

typedef TypeList
        <
          int,
          long,
          float,
          char
        > types;

Now

There are two primary additions to Modern C++ that make template programming in general a pleasure to use, and that is without even mentioning meta-programming itself:

  1. Variadic templates:

    Similar to variadic function parameters, this feature allows a variable number of arguments to be used in a template definition.

  2. Template aliases:

    This allows the using keyword to be used in situations similar to typedef. However, unlike typedef, using can be defined as a template. Therefore, it is compatible with partially-specialized templates.

Here are the definitions that I required when I ported my code to Modern C++ (don't worry, I will explain the syntax afterwards):

C++

// Forward Declarations
struct empty { };
struct container_trait { };
 
template< typename... T >
struct type_node;
 
template< typename... NodesT >
struct type_list;

And an implementation for these types:

C++

// An empty terminating node.
template<>
struct type_node<>
{
  using head = empty;
  using tail = empty;
};
 
// Recursive parameter pack node
template< typename    H,
           typename... T
        >
struct type_node<H, T...>
  : type_node<T...>
{
  using head = H;
  using tail = type_node<T...>;
};
 
template< typename... NodesT >
struct type_list
  : container_trait
{
  using nodes = type_node<NodesT...>;
};

No, really! That's it! We get the exact same usage as the code from above, and I'm not even sure that I need to explain this last chunk of code.

I admit, I had a few false-starts trying to get a grasp on the parameter pack. No, not to reach this point, neither of the code samples above are good for anything except defining a list of types. My first challenge appeared when I tried to create a meta-function to give me the type of parameter at a specific index in the list.

Let me introduce the new constructs, then I will demonstrate some of the elegant solutions that barely scratch the surface of their capabilities.

The parameter pack...

The variable defined within a variadic template is called the parameter pack.

The parameter pack is essentially a list of types delimited by commas. To define one as a parameterized type in a template definition, use the ellipsis between the typename or class declaration and the name that you assign your type. There can be whitespace before and after the ellipsis if you desire...or not. However, the general style that you are likely to see places the ellipsis attached to the type-declaration and a space before the type-name.

C++

// Most common style
template<typename... T> struct common;
 
// These are all legal too.
template<typename ...T>  struct spaces_before;
template<typename ... T> struct spaces_both;
template<typename...T>   struct spaces_none;
 
// Template parameter packs are legal for
// use with template functions as well.
template<typename... T>
void function(T... params);  // the pack itself cannot be the return type

You may have noticed in my type_list implementation and the declaration of the template function that I placed the ellipsis after the declared name. This is how you invoke the parameter pack in your logic.

Invoke the parameter pack

What does it mean to invoke the parameter pack?

Nothing really. You're setting it where you want to apply it, and the compiler goes to work ripping apart the parameter pack and generating your code. However, the compiler does need a little bit of help. You will need two things if you are generating code from the parameter pack:

  1. Recursive definition:

    This is a definition that will be implicitly instantiated by the compiler as many times as necessary until it reaches your terminating case. If you refer to the definition of the type_list, you will see that the parameter pack is applied in a context where another type is placed before it, separated with a comma. This essentially peels one or more types away from the parameter pack at a time. In this sense, the template parameter pack is similar to variadic MACRO usage.

  2. Terminating condition:

    A condition that will handle the case of an empty list, or at least terminate the recursion before the compiler attempts to go beyond the end of the parameter pack. It is not necessary for this to be an entirely different definition. A small sketch follows this list.
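Here is that sketch: a hypothetical count_types meta-function (not part of Alchemy) with a recursive case and a terminating specialization:

C++

#include <cstddef>

// Primary declaration.
template< typename... T >
struct count_types;

// Terminating condition: the empty pack.
template<>
struct count_types<>
{
  static constexpr std::size_t value = 0;
};

// Recursive definition: peel one type off the front of the pack.
template< typename Head, typename... Tail >
struct count_types<Head, Tail...>
{
  static constexpr std::size_t value = 1 + count_types<Tail...>::value;
};

static_assert(count_types<int, char, double>::value == 3,
              "three types in the pack");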

Size of a parameter pack

A convenient sizeof... operator has been provided to match the syntax of the parameter pack. This version of the operator is not related in any way to the classic sizeof operator; it yields the number of elements in the pack rather than a size in bytes.

C++

template< typename... T >
struct length
  : std::integral_constant<std::size_t, sizeof...(T)>
{ };

The parameter pack cannot be a variable

The parameter pack must be decomposed completely at compile-time. It cannot be the sole definition of a typedef or a using alias.

C++

template< typename... T >
struct param_pack
{
  using save_for_later = T...;  // Error: a parameter pack cannot be stored in an alias
};

However, that does not mean that we are helpless. There is an idiom that exists with template programming that allows us to extract the type from a template parameter. I am not sure if it has a name.

Let me demonstrate it for you. This is the definition you are most likely to find on the Internet for a TypeList:

C++

template< typename... T > struct typelist { };

The previous code is completely legal because the parameter pack expansion is defined and terminated with this definition. With another template that is given the right specialization, we can extract the parameter pack from the original type definition.

To demonstrate this, let's create a length meta-function that will report the number of elements in the type_list that I defined above. We need to declare a default version of the length meta-function. This function does not necessarily need to be implemented.

C++

// Default declaration
// This does not require an implementation
template< typename... T > struct length;
 
// This specialization allows us to identify and access
// the parameter pack defined within our type_list.
template< typename... T >
struct length <type_list<T...>>
  : std::integral_constant<std::size_t, sizeof...(T)>
{ };

We can use the parameter pack from the type_list because we specialized this template solely for this type. The compiler does a best-fit comparison when attempting to resolve types, and finds this version.
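A quick sanity check of the specialization, assuming the type_list and length definitions above are in scope:

C++

using types = type_list<int, long, float, char>;

static_assert(length<types>::value == 4,
              "the specialization extracts the pack from type_list");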

Template Aliases

Up until this point, we have had the typedef, which has served us well. However, it does have its shortcomings. I believe the most notable is that partial template specialization is not supported by a typedef. The template alias does provide this support.

C++

// I want to provide a simple type to create
// a map of strings to another type.
template< typename T >
using StringMap = std::map<std::string, T>;
 
// Can now be used as:
StringMap<int>     named_ints;

Here's a more complex example:

C++

// I want to map names to lists of things.
template< typename T >
using NamedListMap = std::map<std::string, std::list<T>>;
 
NamedListMap<unsigned int> lotto_picks_by_state;

Improves readability of templates

There is one other feature of template aliases that I did not fully appreciate until I started to use them. Most code examples do not get complex enough to allow you to fully appreciate the second feature. Let me demonstrate, then I will get into the gory details.

This is an example of an additional declaration that was added to C++14, but is possible in C++11. I am not sure if this technique wasn't discovered until after C++11, or they left it out to keep it from becoming C++12.

C++

// The C++ Standard Library contains useful
// meta-functions to manipulate types.
// This one converts type T into T*
template< class T >
struct add_pointer;
 
// This is used like this:
typedef typename std::add_pointer<void>::type   void_ptr;
// eww! ^^^^^^^^                         ^^^^
 
// Or directly...
typename std::add_pointer<void>::type   p_void = 0;
// I think I just threw up in my mouth...
// No wonder templates have a bad reputation.

These definitions appear in C++14, but you can use this technique in C++11.

C++

// Template alias for std::add_pointer
template< class T >
using add_pointer_t = typename std::add_pointer<T>::type;
 
// New usage:
typedef add_pointer_t<void>   void_ptr;
 
// And directly...
add_pointer_t<void>   p_void = nullptr;

Detailed explanation

typedefs are a common way to reduce clutter in code, primarily with templates, because template type declarations require you to qualify the type with typename if you are using a dependent type.

What is a dependent-type?

That is a very good question. To help with the explanation, dependent type is a shortened version of the name template parameter dependent type. I'm surprised the C++ community hasn't just adopted TPDT, but I digress. A dependent type is a sub-type declared within a template class or struct.

typename is required when referencing sub-items in a template. I say sub-items because other things can be defined within a struct, that are accessed in the same manner as a dependent type, like a static variable. typename is a clue to the compiler that you want it to be interpreted as a type.

The capabilities of the template alias allow us to clearly specify beforehand that we mean a type. Therefore both the typename qualifier and sub-type required to access the dependent name are managed by the alias. This greatly simplifies code when there are many template types to deal with. Template meta-programming is a prime example.
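A short sketch of the difference; value_holder and the function names are invented for the example:

C++

template< typename T >
struct value_holder
{
  using value_type = T;
};

// Without an alias, the dependent name requires 'typename':
template< typename T >
typename value_holder<T>::value_type first_of(T value)
{
  return value;
}

// With a template alias, the qualification is handled once:
template< typename T >
using held_t = typename value_holder<T>::value_type;

template< typename T >
held_t<T> second_of(T value)
{
  return value;
}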

One Last Tip

In the fall of 2014, N4115 Parameter Pack Searching[^] was proposed with some additions to the utility library. This would add a common form of the idiom that I described above to gain access to a parameter pack. The name proposed for the type is packer.

I was trying to modify an existing parameter pack, and I just couldn't put the pieces together. That is when I searched and found N4115, along with N4144 Searching and Manipulation of Parameter Packs[^], by Bill Seymour and Stephan T. Lavavej. This is an amended version of the first document, and it adds manipulation utilities. One in particular is add_to.

I already demonstrated the concepts of packer, however, in my code I refer to it as param_pack. Here is how add_to is implemented. Multiple specializations are declared to handle the possibility of adding a parameter pack to a parameter pack.

C++

template<class T, class U> struct add_to;
// Add to the front of the list
template<typename T, typename... ArgsU>
struct add_to<T, param_pack<ArgsU...>>
{
  using type = param_pack<T, ArgsU...>;
};
 
// Add to the back of the list
template<typename... ArgsT, typename U>
struct add_to<param_pack<ArgsT...>, U>
{
  using type = param_pack<ArgsT..., U>;
};
 
// Combine two lists
template<class... ArgsT, class... ArgsU>
struct add_to<param_pack<ArgsT...>, param_pack<ArgsU...>>
{
  using type = param_pack<ArgsT..., ArgsU...>;
};
 
// And the template alias
template<typename T, typename U>
using add_to_t = typename add_to<T,U>::type;

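As a quick compile-time sanity check of my own (assuming the param_pack definition shown earlier), each specialization can be exercised with a static_assert:

C++

#include <type_traits>
 
// Prepend a type to a pack.
static_assert(
  std::is_same< add_to_t<int, param_pack<char, long>>,
                param_pack<int, char, long> >::value,
  "prepend failed");
 
// Combine two packs.
static_assert(
  std::is_same< add_to_t<param_pack<int>, param_pack<char>>,
                param_pack<int, char> >::value,
  "combine failed");
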
You will see a demonstration of the usage in the next section.


A Modern C++ TypeList

I searched the Internet, albeit briefly, and I did not find any Modern C++ implementations of a TypeList that expanded much beyond this definition:

C++

template< typename... T > struct typelist { };

I found enough of the fundamentals to convert the code that I already have into a modern form. I want to convert it and get it integrated first. If there are better practices, I can adjust the implementation later within a working test harness.

I have already shown the definition of the basic type_list structure that I use as well as a demonstration of the length and param_pack, and the implementation for add_to. In the code below, I have omitted the forward declarations and template aliases that I define in the type list header file.
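
Since those declarations are omitted, here is a rough sketch of my own of the shape implied by the members the operations below rely on (head, tail, and nodes); the actual definitions in the header may differ:

C++

// Sentinel type for 'nothing here'.
struct empty { };
 
// Tag for the container metafunctions.
struct container_trait { };
 
// Recursive node chain; the algorithms walk head and tail.
template< typename... T >
struct type_node
{
  using head = empty;
  using tail = empty;
};
 
template< typename H, typename... T >
struct type_node<H, T...>
{
  using head = H;
  using tail = type_node<T...>;
};
 
// The type_list exposes its node chain for traversal.
template< typename... T >
struct type_list
  : container_trait
{
  using nodes = type_node<T...>;
};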

I am going to blast through the different operations that I have built so I do not take up too much more of your time. If something is not clear, please drop a comment and I can further explain or even add more detail to the description.

I have posted a link to the single header file that contains all of these definitions at the bottom.

make_type_list

I wanted to be able to make a type_list from an existing set of internal type_nodes, and then later, a param_pack.

C++

template< typename... T>
struct make_type_list< type_node<T...>>
  : container_trait
{
   using type  = type_list<T...>;
};
 
template< typename... T>
struct make_type_list< param_pack<T...>>
  : container_trait
{
   using type  = type_list<T...>;
};

type_at

Queries the type of the element at a specified index in the type_list. This operation required a helper template that I called type_of_node.

C++

template< std::size_t    IdxT,
          typename  NodesT
        >
struct type_of_node
{
  using type =
    typename type_of_node<IdxT-1, typename NodesT::tail>::type;
};
 
// Terminating specialization
template< typename  NodesT >
struct type_of_node<0, NodesT>
{
  using type = typename NodesT::head;
};

Now for the actual implementation of type_at.

C++

template< std::size_t IdxT,
          typename    T  
        >
struct type_at
{
  using nodes = typename T::nodes;
  using type  = typename type_of_node<IdxT, nodes>::type;
  using rest  = typename make_type_list<typename nodes::tail>::type;
};
 
// A terminating case for type_at
template< std::size_t IdxT >
struct type_at<IdxT, empty>
{
  using nodes = empty;
  using type  = empty;
  using rest  = empty;
};

I added the declaration of nodes to simplify the declaration for type. This wasn't strictly necessary. I added rest for convenience in other solutions. rest returns a type_list of the elements remaining after the specified index.

For example, if there were 10 elements in a type list and index 6 was specified, a type list with elements [7, 8, 9] would be returned.
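
Here is a quick example of my own that uses the type_at_t alias (one of the aliases omitted above) to verify the lookup:

C++

#include <type_traits>
 
using list_t = type_list<char, short, int, long>;
 
// The element at index 2 is int.
static_assert(std::is_same<type_at_t<2, list_t>, int>::value,
              "unexpected type at index 2");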


Stop me if I go too fast for the rest of these.

front

C++

template< typename T >
struct front
{
  /// Type of the first element in the list.
  using type = type_at_t<0, T>;
  using rest = typename type_at<0, T>::rest;
};

back

C++

template< typename T >
struct back
{
  /// Type of the last element in the list.
  using type = type_at_t<length<T>::value-1, T>;
};

pop_front

C++

template< typename T >
struct pop_front
{
  using type = typename front<T>::rest;
};

push_front

C++

template< typename F, typename L >
struct push_front
{
private:
   using params = typename to_param_pack<typename L::nodes>::type;
   using sum    = typename add_to<F, params>::type;
 
public:
   using type   = typename make_type_list<sum>::type;
};

push_back

C++

template<typename L, typename B>
struct push_back
{
private:
   using params = typename to_param_pack<typename L::nodes>::type;
   using sum    = typename add_to<params, B>::type;
 
public:
   using type   = typename make_type_list<sum>::type;
};
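
A short exercise of my own that strings these operations together (again assuming the omitted aliases and forward declarations from the header):

C++

#include <type_traits>
 
using two_t   = type_list<char, int>;
 
// Inspect both ends of the list.
static_assert(std::is_same<front<two_t>::type, char>::value, "front");
static_assert(std::is_same<back<two_t>::type,  int>::value,  "back");
 
// push and pop produce new lists; the original list is unchanged.
using three_t = push_back<two_t, long>::type;  // type_list<char, int, long>
using tail_t  = pop_front<three_t>::type;      // type_list<int, long>
 
static_assert(std::is_same<tail_t, type_list<int, long>>::value, "pop_front");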

New functionality updated since the original post

pop_back

C++

template<typename T>
struct pop_back
{
   using type = typename split<length<T>::value - 1, T>::type;
};
 
// Specialization for an empty list
template< >
struct pop_back<type_list<>>
{
   using type = empty;
};

move_item

Move a specified number of elements from the front of the second list, to the end of the first list. This function is used to implement split, which is then used to implement pop_back.

C++

template<std::size_t CountT, typename T, typename R>
struct move_item
{
private:
  using first = typename push_back<T, front_t<R>>::type;
  using last  = typename pop_front<R>::type;
 
public:
   using type = typename move_item<CountT-1, first, last>::type;
   using rest = typename move_item<CountT-1, first, last>::rest;
};
 
// Terminating specialization
template<typename T, typename R>
struct move_item<0, T, R>
{
   using type = T;
   using rest = R;
};

split

Splits the list into two separate lists at the specified pivot index.

C++

template<std::size_t PivotT, typename T>
struct split
{
  static_assert(PivotT <= length<T>::value,
                "The split pivot index is out of range");
 
   using type = typename move_item<PivotT, type_list<>, T>::type;
   using rest = typename move_item<PivotT, type_list<>, T>::rest;
};
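
And a quick illustration of my own for split, dividing a four-element list after the second element:

C++

#include <type_traits>
 
using list_t = type_list<char, short, int, long>;
using halves = split<2, list_t>;
 
// type holds the elements before the pivot, rest holds the remainder.
static_assert(std::is_same<halves::type, type_list<char, short>>::value,
              "front half");
static_assert(std::is_same<halves::rest, type_list<int, long>>::value,
              "back half");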

Summary

Again, I am pleased at how much simpler my code has become with these new additions. It's still C++. It's like C++ with the Hemi. Statements can be expressed more tersely, which actually increases the readability of the code rather than losing meaning. Repetitive typing and redundant code can also be reduced.

If you frequently program with templates, or are even a big fan of the C++ Standard Library, you owe it to yourself to become familiar with these two features.

As promised, here is a link to the full type list implementation:


C++: auto

CodeProject, C++ Send feedback »

The keyword auto has been given a new behavior since the C++11 Standard was ratified. Instantly I could appreciate the value of its new function when I considered things like declaring an iterator for a container. However, I was skeptical of any value that auto could provide for general purpose use.

Now that I have had a chance to study this new keyword, I believe that it is quite a valuable addition to Modern C++. While there are situations that could still cause some grief, they are not difficult to detect, and the solutions to the problem are straightforward to apply.

Basic Usage

Here are a few examples for reference that will give context to this discussion. auto has been repurposed to allow developers to take advantage of the type information known to the compiler. This has the potential to make code easier to read and write.

I have added a comment that shows the type that each auto variable will be assigned. This is also the same declaration that would be required if auto were not used.

C++

std::map<unsigned long, std::string>    keys;
auto value = 101;             // int
auto other = 101ul;           // unsigned long
 
// std::map<unsigned long, std::string>::iterator
auto  iter = keys.find(value);
 
// std::string
auto  entry = iter != keys.end()
            ? iter->second
            : "";
 
// std::string::size_type
auto len = entry.size();      // std::string::size_type
if (len > 0)
{
  auto first = entry[0];      // char
  auto data = entry.c_str();  // const char*
  auto first_data = data[0];  // char
}

Initial Doubts

I am not the only programmer to have doubts about allowing the system to automatically choose the types of my variables. There are a number of questions asked around the web, like this one from StackExchange, as well as on other sites.

There were two fundamental sources of my doubt.

Flashbacks of Visual Basic

I spent a brief moment of my career developing with Visual Basic. (Hey! I only did it to pay for tuition while I was in college. sigh).

To declare a new variable you would use the keyword Dim. If you hadn't defined a variable explicitly with Dim, then the compiler would give you an error. Variables that were not given an explicit type in Visual Basic used a type called Variant.

If you have ever programmed with Microsoft's COM, then you are aware that a Variant could be coerced into different types at run-time, essentially allowing this statically typed language to bend the rules when using the Variant.

It did not take me long to realize that auto in Modern C++ is nothing like a Variant.

The chaos of using unknown types

This is a misconceived notion. The type is well known by the compiler after it verifies the correctness of your code and before it generates any of the final object code. The syntax used in the declarations and statements has well-defined rules to determine these types. This is especially true for C++; any version of C++.

There are subtle coercion rules that have always existed to allow C code to be more compatible with the stronger type-checking that is present in C++. These rules mostly deal with the width of an integer or converting an object from one type to a compatible type without cluttering the syntax.

If you still have doubts, inspect the type assigned to your variable within your IDE or debugger. There is a definitive type that has been deduced and assigned for every auto variable. This type is fixed and will not change.

Automatic type deduction

Are you familiar with templates and how the types are deduced for parametric variables? If so, then you know almost all there is to know about how the type is determined for auto variables. If you aren't familiar with all of the rules, it's OK, because the compiler is.

The rules are the same between the two forms of variable declaration with one exception, which is when a curly-braced initializer list is considered in the type deduction. In this case, the auto declared variable assumes the type std::initializer_list, which was also added in Modern C++. A template-deduced type in this situation requires you to explicitly specify the std::initializer_list type. The type T contained within the std::initializer_list<T> will then be deduced by the template.

I also want to remind you about the existence of Type Decay[^]; it seems probable to me that you will encounter it at some point when using auto. This coercion rule applies to both template and auto type deduction.
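
A few contrived declarations of my own show the most common forms of decay with auto:

C++

const int answer = 42;
auto a = answer;        // int: the top-level const is dropped
 
int values[8] = {};
auto b = values;        // int*: the array decays to a pointer
auto& c = values;       // int (&)[8]: a reference preserves the array type
 
auto message = "decay"; // const char*: string literals decay as well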

C++14

C++14 has added the possibility for auto to be used as the return type for a function.

C++

// Let's wrap the map lookup code
// from above into a convenient function.
// The return type will be a std::string.
auto createName(unsigned index)
{
  auto  iter = keys.find(index);
 
  return iter != keys.end()
         ? iter->second
         : "";
}

It is important to note that every return point in the function must result in the same deduced return type, although type conversions may still bring the returned values to a common type. This is more evidence that there is nothing magical about auto. The final code that is generated follows the same rules as if you had explicitly defined the types yourself.
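
For example, this contrived function of my own will not compile, because the two return statements deduce different types:

C++

// error: inconsistent deduction for the auto return type
auto scale(bool use_double)
{
  if (use_double)
    return 1.0;   // deduced as double
 
  return 1;       // deduced as int: conflicts with double
}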

There is a catch

The auto return type must use the rules for template type-deduction. This means that it is not possible to use auto when you would like to return a braced initializer list, such as a std::initializer_list.

C++

// auto return types use the same
// type deduction rules as templates.
auto Fibonacci_5()
{
  // This type of use is not legal.
  return {1,1,2,3,5};
}

This rule also applies to the type deduction for the evaluation of lambda expressions.

C++

// The braced argument in the call below cannot be
// deduced by the compiler and results in an error.
std::vector<int> numbers;
auto initSequence =
  [&numbers](const auto& sequence)
  {
    numbers = sequence;
  };
 
initSequence({1,1,2,3,5});

To make this expression valid, the std::initializer_list<T> needs to be specified explicitly. A C++14 lambda provides no way to name the template parameter T, so the element type must be spelled out as well.

C++

// The initializer_list is explicitly specified
// to make this lambda expression legal.
std::vector<int> numbers;
auto initSequence =
  [&numbers](const std::initializer_list<int>& sequence)
  {
    numbers = sequence;
  };

Advantages

The advantages of auto are immediately apparent when you consider the amount of clutter that is removed when you use iterators or a moderately complicated template statement.

C++

// Something like this:
std::shared_ptr<std::vector<std::pair<T, U>>> sp_values
  = std::make_shared<std::vector<std::pair<T, U>>>();
  ...
std::vector<std::pair<T, U>>::iterator iter
  = std::find(sp_values->begin(), sp_values->end(), 100);
 
// becomes:
auto sp_values = std::make_shared<std::vector<std::pair<T, U>>>();
auto iter      = std::find(sp_values->begin(), sp_values->end(), 100);

However, there are other advantages that are not immediately obvious.

Less typing (explicitly and with your keyboard)

Yeah, this is the low-hanging fruit. I should mention there is one exception, and that is if the type you need is an int. I also suppose there exist sadistic programmers that typedef or alias one- and two-character type names; so that would be another exception.

Along the same theme of less typing is refactoring data-fields, changing types, or changing function signatures. If you use auto effectively, you may only need to update the declaration in the header file and the function's definition to complete your refactor.

auto defined variables will be initialized

When a variable is defined with auto, it must be assigned an initial value. Otherwise there would be no way to determine its type:

C++

auto count;      // This is illegal, must be initialized
 
auto value = 10; // OK, assigned type int

One thing that I try to teach developers that I mentor is to not define a variable until you need it. Many developers still seem to forward declare all of their variables at the top of a function. That isn't necessary. Furthermore, objects with constructors may not be cheap to create, and if you drop out of the function before you use the constructed object, well, that's just wasteful.

I think declaring your variables at the point they become necessary using auto will help advance this style.

Portability

This next set of declarations is a common occurrence, where the programmer explicitly selected the type for the variable and almost got it right, or maybe missed entirely. Selecting the incorrect type may be the difference between correct behavior and a security vulnerability. Another consequence that is less severe, depending on who you ask, would be truncation, or data loss.

C++

// No, std::size_t
int len = ::strlen(buffer);
 
// Maybe, if you're on a 32-bit system.
// Narrowing truncation is possible on a 64-bit system.
// std::string::size_type, which is generally defined as
// std::size_t
std::string name = "Code of the Damned";
unsigned long size = name.size();
 
// No not even close, are you even trying?
// Of course I'll still accept it...
double how_big = ::strlen(buffer);

It seems that I am always correcting warnings such as this: warning: '<' : signed/unsigned mismatch. While learning secure coding practices I adopted the habit of using unsigned types for values that will be used as an index to reference memory. In fact, std::size_t has become my de facto unsigned type. It is guaranteed to be large enough to hold the size of the largest possible object on the system.

As it turns out, it's not always possible, or even correct, for an unsigned type to be used in comparison operations. Then I am left with a cast of some sort. auto solves this problem by selecting a type compatible with your initialization value. The most frequent place this seems to occur is within for loops:

C++

// Yes, 'i' is signed and size() returns an unsigned value
for (int i = 0; i < name.size(); ++i)
{ ... }
 
// auto doesn't fix things alone.
// However, adding a type-specifier to 0 helps.
for (auto i = 0ul; i < name.size(); ++i)
{ ... }
 
// This is Modern C++.
// Use a range-based for loop and be done with it.
for (auto i : name)
{ ... }

Efficiency

That is correct. Consider all of the coercion and transformation techniques that may be used by the compiler to allow your program to compile successfully and run efficiently. Techniques such as move-semantics, const-correctness, temporary copies, the list goes on.

I have already argued that we don't always succeed in selecting the correct type just to hold the size of a container. Selecting the best type for efficiency is an even more difficult decision. The most probable place an incorrectly selected type could affect performance is within a loop. Another time to consider replacing your type declaration with auto is when you are running a profiler trying to optimize your code, and you identify a hot-spot that seems to be making unnecessary copies. Using an auto declaration may give the compiler the freedom it needs to select a more efficient type and eliminate those copies.
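
A common case where this shows up is iterating a map. The sketch below is my own (process is a placeholder function); it avoids an easy-to-miss conversion and copy on every iteration:

C++

#include <map>
#include <string>
#include <vector>
 
void process(const std::vector<int>& values);   // placeholder for real work
 
void consume(const std::map<std::string, std::vector<int>>& samples)
{
  // Spelling out std::pair<std::string, std::vector<int>> here would be
  // subtly wrong: the map's value_type is pair<const std::string, ...>,
  // so every element would be converted and copied on each iteration.
  for (const auto& entry : samples)
  {
    // entry refers to the map's actual value_type; no copies are made.
    process(entry.second);
  }
}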


Be aware of proxy classes

Yes, there is another potential gotcha that is important to know exists. Proxy classes are useful in a variety of ways. The list below is not exhaustive, and it is likely that a proxy class exists for more than one of the reasons found on the list:

  • Transparently provide remote access
  • Delay expensive calculations until required
  • Encapsulate logic for efficiency
  • Protect direct access to encapsulated data
  • Formulate expression templates

Libraries commonly use proxy classes to simplify the syntax required to interface with the library objects. Expression templates are very common in math-based libraries to retain the natural mathematical syntax that is possible in C++; a Matrix class for example, or even a multi-dimensional array.

Example: Alchemy Proxies

I use proxy[^] classes in Alchemy to make the data fields of a message appear to the caller to be a fundamental type that is part of a struct. Each field is contained within its own proxy object, which knows how to efficiently manage the different operations supported in Alchemy.

Each proxy class contains a type-conversion function to extract the actual value from the proxy object. Here is an example definition of a 3-dimensional point.

C++

// Define an Alchemy message structure
// to represent a 3-d point.
ALCHEMY_STRUCT(pt3d_t,
  ALCHEMY_DATUM (double, X),
  ALCHEMY_DATUM (double, Y),
  ALCHEMY_DATUM (double, Z)
);
 
// Create an initialized instance of the message with:
// X=1.0, Y=2.0, Z=3.0
Hg::basic_msg<pt3d_t>  pt_msg = {1.0, 2.0, 3.0};

Here is a comparison of the results between an explicit type declaration and an auto type declaration.

C++

// x_val is assigned 1.0
double x_val = pt_msg.X;
 
// ax_val is given the type:
// Hg::detail::DataProxy
//   < Hg::fundamental_trait, 0, Hg::TypeList<double, double, double>>
auto ax_val = pt_msg.X;

Clearly, ax_val is not assigned the type double, as would be desired in most circumstances. That is because the compiler deduces the type of the initializing expression itself, the proxy, rather than the double that the explicitly declared x_val receives through a conversion.

If that were the end of the story, the result would be a compiler error. However, the proxy class implements a type-conversion function, operator value_type() const, where value_type is defined as typedef double value_type;. Now the compiler has the leeway to coerce the proxy object into the declared type, double, and the statement becomes valid.

Proxy classes that have conversion operators like this are also referred to as invisible proxies.
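
A stripped-down invisible proxy of my own (not the actual Alchemy DataProxy) makes the behavior easy to reproduce:

C++

class DoubleProxy
{
public:
  typedef double value_type;
 
  explicit DoubleProxy(value_type value) : m_value(value) { }
 
  // The type-conversion function that lets the proxy stand in for a double.
  operator value_type() const { return m_value; }
 
private:
  value_type m_value;
};
 
DoubleProxy proxy(1.0);
 
double x_val  = proxy;  // the conversion operator is invoked: 1.0
auto   ax_val = proxy;  // ax_val is a DoubleProxy, not a double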

What is the solution?

First, recognize that auto is doing exactly what it is supposed to do in this case. Then we can admit that it does not actually arrive at the desired type, a double.

Item 6 from Scott Meyers' Effective Modern C++ offers a solution that he calls the explicitly typed initializer idiom. The solution is to explicitly specify the desired type with a static_cast.

C++

// ax_val is now a double and will be assigned the value 1.0
auto ax_val = static_cast<double>(pt_msg.X);

This scenario is most likely to occur when the library you are using allows for a terse and expressive syntax. When you suspect auto is not assigning the type that you expected, take a look at the definition of the class that owns the expression. If the return type is a proxy object you can either explicitly cast to the desired type, or revert to the classic form of type declaration.


Summary

auto has been re-purposed in Modern C++. A cursory glance at the change reveals some benefits. However, the benefits run much deeper than the novelty applications that you can imagine. Adopting a consistent use of auto to declare your variable types will most likely improve your code's correctness, portability, efficiency, and most of all, readability. If you have previously considered auto and decided that it wasn't for you, take a moment to reconsider. auto has great potential.

Make Your Comments Matter

general, communication, CodeProject, maintainability Send feedback »

There are many different philosophies with regards to how source code should be commented. The gamut of these philosophies ranges from "Every single statement must have a comment." to "Comments are useless; avoid them at all costs!" I am not even going to attempt to explain this disparity. However, I will attempt to address, and suggest potential solutions for, the reasons that are often cited for these extreme approaches to commenting code.

To make a decision

I think that it is important to remind everyone that judicious decision making is required to write high-quality software. These are the types of decisions that cannot be easily made by following a flow-chart or a rote memorized rule. If it were that simple to make a decision, programmers would not be in such great demand.

As skilled professionals we often try to distill the rules to simpler pieces of advice to help less-experienced colleagues become better, faster. Unfortunately, information is often lost when we attempt to simplify our experiences into a neat set of rules.

Rules cannot always be reduced to a simple choice to help make the final decision.

Context

A general purpose rule tends to lack context. Context becomes very important once you move beyond the most trivial decisions. Events, statements, and ideas can not always be fully understood without the settings or circumstances involved.

Whether an action counts as self-defense cannot be determined without understanding the circumstances and settings involved, the context. Many jokes are funny because a meaning is interpreted out of context. Context allows us to interpret meanings in forms that differ from their strict literal meaning.

Context adds clarity. Laws attempt to create rules for society and omit the context as much as possible to make the law more generally applicable. A law that is written with more specific terms is limited to fewer situations where it can be applied. This leaves loopholes for criminals to take advantage of, and when interpreted too literally, innocent people may be punished.

Context cannot always provide the answer.

Everybody knows that!

Skills that once seemed impossible to learn often become natural once we master them. As a beginner, the slightest distraction could interrupt our train of thought. Added confusion tends to make unfamiliar decisions more difficult, which could lead to failure.

Skilled masters may not even consciously realize all of the decisions they are making as they perform their skill. Most average adult humans have mastered the skill of walking. Walking and talking, no problem (viral internet videos demonstrate that walking and texting is entirely different). Distracting a toddler learning to walk, or a child learning to ride a bike without training wheels often leads to a crash or fall.

My point is, as we become more skilled we can forget the things that we actually know and naturally do without consciously thinking about them. We may have spent thousands of hours learning and understanding something. Often we walk away with a well-earned conclusion, and possibly a few memorable mistakes we made along the way. For most people, it is difficult to remember or imagine not knowing their mastered skill.

This leads to gaps in explanations, or assumptions about when a rule should be applied. Because, "that's just common knowledge" to the skilled master.

Experts are not always the best people to create guidelines for making decisions. This is especially true when the guidelines are aimed at a group with a wide variation of skill level.

Where's the value?

This is a question that I learned to ask while I was practicing TDD and learning how to unit-test. I soon learned that I could ask this question in many different contexts. Asking and answering this question can help simplify the decision-making process.

For example, when considering a particular approach to solving a code problem, I try to compare the value of the solution to the amount of risk the solution may add. If the amount of risk out-weighs any perceived value, the decision becomes simple; find another way.

Answering this question will not make every decision simpler. However, it can often reduce the set of choices that are available for you to select from.

This question alone has changed how I approach commenting source code. If a modification or addition of code does not add value, why does the code need to be modified? Keep in mind that deleting code can also add value to your program, in the form of improved readability and reduced complexity.

Criticisms of Code Commenting

Just about all of the opinions that I have encountered for code comments are valid, for certain contexts. There is no opinion or rule that I have heard which I believe is absolutely true for all scenarios. Let's take some time to consider these opinions.

Every line/statement must be commented

In all fairness, the commenting rules for this guideline originate from the aeronautics industry. I will not argue with a policy for code that is used in critical applications where major bodily harm or death could occur. I should also add organizations like NASA, where hundreds of millions of dollars are spent and you then wait nine years for your spacecraft to reach its destination. You want to be sure that it will work properly when it gets there.

Remember, though, extra comments are not necessarily a good thing. When they don't add value, sometimes they add clutter, other times they use up space, and in the worst case they are misleading.

Comments do not get maintained along with the code

I have seen this many times. To make matters worse, it is not always easy to tell which item is incorrect, the comments or the code. For comments to provide value, they must be accurate. In a situation where the comments are completely inaccurate, there may be more value in simply removing the comments rather than trying to update them.

If this argument is considered when paired with the previous guideline, then there is definitely a strong case to avoid comments altogether. Prohibiting the use of a language feature also prevents the possibility of any benefit that feature could have provided. It does, however, require that your organization allows and expects your engineers to think.

The code should be clear and self-documenting

I agree with this statement.

However, descriptive names still have a limit to the amount of information that they can convey. At some point, long names become obnoxious; if not for the inconvenience of the extra typing, then for the clutter that is added, reducing the readability of the code.

This type of policy does not translate well to all programming paradigms. Structured and imperative languages will have the most success with this approach. Each step is part of a well-defined sequence. Furthermore, while the code sequence may be readable, the purpose for the entire sequence of commands may not be immediately clear. I tend to encounter this most frequently in monolithic functions.

Yes, these are comments... useless comments

Pertinent examples can illustrate this point much better than I could ever describe with words.

Cut-and-Paste to comment

C++

if (SUCCESS != error)
{
  ...
} // if (SUCCESS != error)

The context of the curly brace is already a good indicator that something is ending here. Many developer IDEs can help with the rest. For example, Ctrl + ] will jump between matching braces and parentheses.

There is one exception where I actually do like ending a closing curly brace with a comment. That is at the end of a namespace. I find this is especially useful with deeply nested namespaces. This type of comment is valuable to me because I use it as a visual cue to help navigate namespace-scope.
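
For example:

C++

namespace alchemy {
namespace detail {
 
// ... many screens of declarations ...
 
} // namespace detail
} // namespace alchemy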

Captain Obvious

Sky-diving without a parachute is a once in a lifetime experience!

C++

// Increment the index.
index++;

What is the purpose of this code... ?

My first instinct is to look at the header file.

C++

/// @SteamPunk.h
/// This is the declaration of the SteamPunk class.

I am an optimistic-realist, so I will try looking at the class declaration.

C++

/// The SteamPunk class specializes the BasicMachine class
/// for the steam-punk functionality.
class SteamPunk
  : public BasicMachine
{  
  ...
};

Fine. I will figure out what this class does on my own. "What happens when I set a mode and what modes can I set?"

C++

/// Sets the mode.
/// @param[in] mode   - The value of the mode to set
void SetMode(int mode);

None of the comments from this section gave me any useful information. All that they did was duplicate explanations that were clear from the self-documented code. This is a situation where the comments were added solely to satisfy a code-quality checkbox.

We have always done it this way

Source control has been around for decades. However, before it was truly widespread there was a common practice of adding your edit history as a comment to the top of the source file. The way I see it, this is just more clutter. You can glean all of the history information from your version control tools and actually see the changes that were made by viewing a diff-comparison between the versions.

C++

/// 1/15/2002 - ABC: Initial implementation
/// 3/28/2004 - ZED: Added that crazy feature
/// 4/02/2004 - TEL: Fixed ZED's crazy feature
/// 5/14/2007 - PJT: Upgraded the logging system
/// 2/28/2010 - ABC: Fixed that memory leak
/// 8/02/2011 - LOG: Changed to another logging system

Increase the value of your code with useful comments

Again! Remember to ask what value you are providing with each modification. If the comment you are about to write can easily be gleaned from reading the next few lines, your comment becomes less valuable.

Comments should be used to annotate and document your source code in ways that are not possible with code alone.

Document purpose

If you are familiar with the domain for which your program is being written, the purpose of an object may be obvious to you. However, new programmers working with the same code may not possess the correct vocabulary to work effectively with it. Missing that context could slow down, or even prevent, other developers from working within your code.

Document usage

Self-documenting code does not always provide enough information for someone to appropriately use your class or function. The meaning of a function name could be misinterpreted, or the purpose of each input variable may be ambiguous.

There are a number of automated tools that extract specially formatted comments to generate external documentation that you can use for reference independently of your code. Some of these tools are capable of providing much more than documentation. For instance, Doxygen can generate relationship diagrams for your class definitions. You can also incorporate Markdown directly in your comments to generate up-to-date design documentation.
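
Contrast the SetMode example above with a version of my own where the comments carry information the signature cannot express (the constants and behavior described here are hypothetical):

C++

/// Sets the operating mode of the machine.
///
/// @param[in] mode   One of the k_mode_* constants; any other value
///                   leaves the current mode unchanged.
///
/// @return    true if the mode was accepted, false otherwise.
///
/// @note      Must not be called while the machine is processing;
///            call Stop() first.
bool SetMode(int mode);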

Clarify intentions

Sometimes the intended purpose for a sequence of logic is just not obvious (probably more than sometimes). When it is not obvious, summarize your intentions. If you make a mistake in the code, a clear statement of intent makes it easier to add a test for the corner case that you missed.

Document rationale

We build poor quality code one statement at a time. Later we may try to repent for our sins and clean up the code. However, sometimes there is no clearly apparent reason for why the code is structured the way it is.

Perhaps atomic operations could replace more expensive synchronization objects like a mutex, if the statements were re-ordered. Since the order of the statements is important, documenting this code structure would be valuable for any future developer that works with this code.

Sometimes placing similar statements in proximity to each other improves the readability of the code. The prettiest code in the world becomes a useless program if it does not function properly.

Summarize blocks of behavior

I believe one of the most valuable places to comment code is at the beginning of a statement block. Even if the block contains 10 lines of simple statements that can be easily followed, summarize what those 10 lines accomplish.

If the comments can be trusted, this allows the reader to combine that collection of logic into one abstract notion while reading the rest of the code.

What does the snippet of code in the example below do?

C++

// Creates a linear gradient at a specified angle
bool AngularGradient(...)
{
...
  if (0 == (offset % 2))
  {
    std::swap(cC[0], cC[1]);
    std::swap(alpha[0], alpha[1]);
  }
 
  offset = std::abs(offset - 4) % 4;
  COLORREF clr[4] = { c1, cC[0], c2, cC[1]};
  std::rotate(clr, clr + offset, clr + 4);
 
  USHORT   alphaV[4] = { alpha1, alpha[0], alpha2, alpha[1]};
  std::rotate(alphaV, alphaV + offset, alphaV + 4);
...
}

Here is the comment that I placed above this section of code. It summarizes an entire block of logic, and documents my rationale and the intended behavior of the code:

C++

bool AngularGradient(...)
{
...
  // As the angle's start point changes quadrants, the colors will need to
  // rotate around the vertices to match the starting position.
  if (0 == (offset % 2))
  {
    std::swap(cC[0], cC[1]);
    std::swap(alpha[0], alpha[1]);
  }
  // Remainder of snippet excluded
  ...
}

The code snippet above is from one of my favorite articles, discussing the AlphaBlend and GradientFill functions from the Win32 API. I posted it at CodeProject.com a few years ago. Here is a link if you are interested in reading more from that article: Win32 Image Composition[^]

Summary

Programming source code is an attempt to express an abstract thought as a concrete idea. The intended audience for this expressed idea is usually a compiler or interpreter. This can lead to some very cryptic commands, because all that matters in this context is following the rules of the programming language. A secondary audience is the humans that will continue to develop and maintain the program.

Code comments are a way for us to express intent, the rationale for a decision, instructions for usage, or just a simple reminder. Comments can provide value. However, value is not usually achieved by blindly following a set of rules in a coding guideline. Careful thought and conscious decision-making are generally required, as is the case for the source code itself.
