Alchemy is a collection of independent library components that specifically relate to efficient low-level constructs used with embedded and network programming.
The latest version of Embedded Alchemy can be found on GitHub. Below are the most recent entries, as well as Alchemy topics to be posted soon:
- Steganography
- Coming Soon: Alchemy: Data View
- Coming Soon: Quad (Copter) of the Damned
I just completed my Master's degree in Cybersecurity at Johns Hopkins University, and I plan to resume Alchemy's development. I intend to use my newly acquired knowledge to add constructs that will help improve the security of devices built for the Internet of (Insecure) Things.
I previously introduced how to perform the basic matrix operations commonly used in 3D computer graphics. However, we haven't yet reached anything that resembles graphics. I now introduce the concepts of geometry as they relate to computer graphics. I intend to provide you with the remaining foundation necessary to mathematically visualize the concepts at work. Unfortunately, we will still not reach the "computer graphics" stage with this post; that must wait until the next entry.
This post covers a basic polygonal mesh for representing models, as well as points and Euclidean vectors. With the previous entry and this one, we will have all of the background necessary for me to cover matrix transforms in the next post, specifically the 3D projection transform, which provides the illusion of perspective. That will be everything we need to create a wireframe viewer. For now, let's focus on the topics of this post.
Modeling the Geometry
We will use models composed of a mesh of polygons. All of the polygons will be defined as triangles. It is important to consistently define the points of each triangle either clockwise or counterclockwise. This is because the definition order affects the direction a surface faces. If a single model mixes the order of point definitions, your rendering will have missing polygons.
The order of point definition for the triangle can be ignored if a surface normal is provided for each polygon. However, to keep things simple, I will only demonstrate basic models that are built from an index array structure. This data structure contains two arrays. The first array lists all of the vertices (points) in the model. The second array contains one three-element structure per triangle, holding the indices of that triangle's points in the first array. I define all of my triangles counterclockwise when looking at the front of the surface.
Generally, your models should be defined at or near the origin. The rotation transformation that we will explore in the next post requires a pivot point, which is defined at the origin. Model geometries defined near the origin are also easier to work with when building models from smaller components and using instancing to create multiple copies of a single model.
Here is a definition for a single triangle starting at the origin and adjacent to the Y plane. The final corner decreases for Y as it moves diagonally along the XZ plane:


This definition represents a unit cube starting at the origin. A cube has 8 corners that bound six square faces. It requires two triangles to represent each square. Therefore, there are 8 points used to define 12 triangles:
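The index-array structure described above might be sketched in C++ as follows. This is a minimal sketch rather than a definitive listing: the names Point3, IndexedMesh, and makePartialUnitCube are hypothetical, the corner coordinates are the ones used in the worked examples later in this post (pt_3 = (0, 1, 0) is inferred as the only unit-cube corner those examples never mention), and only the three triangles used in those examples (surfaces A, D, and J) are included.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// A point (vertex) in 3D space.
struct Point3
{
  double x, y, z;
};

// Hypothetical index-array mesh: one array of unique vertices and one
// array of triangles, where each triangle stores three vertex indices.
struct IndexedMesh
{
  std::vector<Point3> vertices;
  std::vector<std::array<std::size_t, 3>> triangles;

  // Returns corner 0, 1, or 2 of the triangle at triangleIndex.
  const Point3& corner(std::size_t triangleIndex, std::size_t cornerIndex) const
  {
    return vertices.at(triangles.at(triangleIndex).at(cornerIndex));
  }
};

// Builds a partial unit cube: all 8 corners, but only the three
// counterclockwise triangles that the worked examples use.
IndexedMesh makePartialUnitCube()
{
  IndexedMesh cube;
  cube.vertices = {
    {0, 0, 0},  // pt_0
    {1, 0, 0},  // pt_1
    {1, 1, 0},  // pt_2
    {0, 1, 0},  // pt_3 (inferred; not used in the examples)
    {0, 1, 1},  // pt_4
    {1, 1, 1},  // pt_5
    {1, 0, 1},  // pt_6
    {0, 0, 1},  // pt_7
  };
  cube.triangles = {
    {7, 6, 4},  // surface A (front, z = 1)
    {5, 1, 2},  // surface D (side, x = 1)
    {6, 0, 1},  // surface J (bottom, y = 0)
  };
  return cube;
}
```

Each triangle stores indices rather than copies of its points, so a corner shared by several triangles is stored only once; the remaining nine triangles of the cube would follow the same counterclockwise convention.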


Vectors
Recall from my previous post that a vector is a special type of matrix, in which there is only a single row or column. You may be aware of a different definition for the term "vector"; something like
NOUN
A quantity having direction as well as magnitude, especially determining the position of one point in space relative to another.
Oxford Dictionary
From this point forward, when I use the term "vector", I mean a Euclidean vector. We will even use single-row and single-column matrices to represent these vectors; we're just going to refer to those structures as matrices to minimize the confusion.
Notational Convention
In computer graphics references, it is common to see vectors represented as single-row matrices, as opposed to the standard mathematical notation, which uses column matrices to represent vectors. It is important to remember this difference in notation based on which type of reference you are following.
The reason for this difference is that the row form makes composing the transformation matrix more natural, because the operations are performed left to right. I discuss the transformation matrix in the next entry. As you will see, properly constructing transformation matrices is crucial to achieving the expected results, and anything that we can do to simplify this sometimes complex topic will improve our chances of success.
Representation
As the definition indicates, a vector has a direction and magnitude. There is a lot of freedom in that definition for representation, because it is like saying "I don’t know where I am, but I know which direction I am heading and how far I can go." Therefore, if we think of our vector as starting at the origin of a Cartesian grid, we could then represent a vector as a single point.
This is because, starting from the origin, the direction of the vector is towards the point. The magnitude can be calculated by using Pythagoras' Theorem to calculate the distance between the two points. All of this information can be encoded in a compact, single-row matrix. As the definition indicates, we derive a vector from two points by subtracting one point from the other, which is matrix addition with the negated starting point.
This gives us the direction to the destination point, relative to the starting point. The subtraction effectively translates the vector's starting point to the origin.
For these first few vector examples, I am going to use 2D vectors because they are simpler to demonstrate and diagram; the only difference when moving to 3D vectors is an extra field.
Here is a simple example with two points (2, 1) and (5, 5). The first step is to encode them in a matrix:
\( {\bf S} = \left\lbrack \matrix{2 & 1} \right\rbrack, {\bf D} = \left\lbrack \matrix{5 & 5} \right\rbrack \)
Let’s create two vectors, one that determines how to get to point D starting from S, and one to get to point S starting from D. The first step is to subtract the starting point from the destination point. Notice that if we were to subtract the starting point from itself, the result is the origin, (0, 0).
\( \eqalign{ \overrightarrow{SD} &= [ \matrix{5 & 5} ] - [ \matrix{2 & 1} ] \cr &= [ \matrix{3 & 4} ] } \)
\( \eqalign{ \overrightarrow{DS} &= [ \matrix{2 & 1} ] - [ \matrix{5 & 5} ] \cr &= [ \matrix{-3 & -4} ] } \)
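The subtraction above might look like this in C++. This is a minimal sketch, assuming points and vectors are stored in a std::vector<double> so that the same helper works for 2D and 3D; vectorBetween is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds the vector from point `from` to point `to`
// by subtracting component-wise (to - from).
std::vector<double> vectorBetween(const std::vector<double>& from,
                                  const std::vector<double>& to)
{
  assert(from.size() == to.size());
  std::vector<double> result(to.size());
  for (std::size_t i = 0; i < to.size(); ++i)
  {
    result[i] = to[i] - from[i];
  }
  return result;
}
```

For example, vectorBetween({2, 1}, {5, 5}) produces {3, 4}, and swapping the arguments produces {-3, -4}.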
Notice that the only difference between the two vectors is the direction. The magnitude of each vector is the same, as expected. Here is how to calculate the magnitude of a vector:
Given:
\( \eqalign{ v &= \left\lbrack \matrix{a_1 & a_2 & \ldots & a_n} \right\rbrack \cr \|v\| &= \sqrt{a_1^2 + a_2^2 + \ldots + a_n^2} } \)
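The magnitude formula translates directly into a loop. This is a minimal sketch, assuming the same std::vector<double> representation; magnitude is a hypothetical helper name.

```cpp
#include <cmath>
#include <vector>

// Euclidean magnitude (length): the square root of the sum of the
// squared components, i.e. Pythagoras' Theorem in n dimensions.
double magnitude(const std::vector<double>& v)
{
  double sumOfSquares = 0.0;
  for (double component : v)
  {
    sumOfSquares += component * component;
  }
  return std::sqrt(sumOfSquares);
}
```

Both \(\overrightarrow{SD}\) and \(\overrightarrow{DS}\) from the example above have a magnitude of 5.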
Operations
I covered scalar multiplication and matrix addition in my previous post. However, I wanted to briefly revisit the topic to demonstrate one way to think about vectors, which may help you develop a better intuition for the math. For the coordinate systems that we are working with, vectors are straight lines.
When you add two or more vectors together, you can think of it as placing the tail of each successive vector at the head of the previous vector. Numerically, you simply add the corresponding components.
\( \eqalign{ v &= A+B \cr \|v\| &= \sqrt{3^2 + 4^2} \cr &= 5 }\)
Regardless of the orientation of the vector, it can be decomposed into its individual contributions per axis. In our model for the vector, the contribution for each axis is the raw value defined in our matrix.
\( v = A+B+C \)
\( \eqalign{ \|v\| &= \sqrt{(3+2-3)^2 + (1+3+2)^2} \cr &= \sqrt{2^2 + 6^2} \cr &= 2 \sqrt{10} }\)
Unit Vector
We can normalize a vector to create a unit vector, which is a vector that has a length of 1. The concept of the unit vector is introduced to reduce a vector to simply indicate its direction. This type of vector can then be used as a unit of measure to which other vectors can be compared. This also simplifies the task of deriving other important information when necessary.
To normalize a vector, divide the vector by its magnitude. In terms of the matrix operations that we have discussed, this is simply scalar multiplication of the vector by the reciprocal of its magnitude.
\( u = \cfrac{v}{\|v\|} \)
\(u\) now contains the information for the direction of \(v\). With algebraic manipulation we can also have:
\( v = u \|v\| \)
This says that the vector \(v\) is equal to the unit vector \(u\), which has the direction of \(v\), multiplied by the magnitude of \(v\).
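Normalization as scalar multiplication by the reciprocal of the magnitude might be sketched like this. It assumes a std::vector<double> representation; normalize is a hypothetical name, and the zero vector is rejected because it has no direction.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Returns the unit vector with the same direction as v by scaling v
// by the reciprocal of its magnitude. v must not be the zero vector.
std::vector<double> normalize(const std::vector<double>& v)
{
  double sumOfSquares = 0.0;
  for (double c : v)
  {
    sumOfSquares += c * c;
  }
  const double length = std::sqrt(sumOfSquares);
  assert(length > 0.0);

  const double inverseLength = 1.0 / length;
  std::vector<double> unit(v);
  for (double& c : unit)
  {
    c *= inverseLength;
  }
  return unit;
}
```

For example, normalize({3, 4}) yields approximately {0.6, 0.8}.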
Dot Product
The dot product is used for detecting visibility and shading of a surface. The dot product operation is shown below:
\( \eqalign{ w &= u \cdot v \cr &= u_1 v_1 + u_2 v_2 + \cdots + u_n v_n }\)
This formula may look familiar. That is because this is a special case in matrix multiplication called the inner product, which looks like this (remember, we are representing vectors as a row matrix):
\( \eqalign{ u \cdot v &= u v^T \cr &= \left\lbrack \matrix{u_1 & u_2 & \cdots & u_n} \right\rbrack \left\lbrack \matrix{v_1 \\ v_2 \\ \vdots \\ v_n } \right\rbrack }\)
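Because the dot product is just a sum of component products, the code is short. This is a minimal sketch with a hypothetical dot helper over std::vector<double>; the standard library's std::inner_product from <numeric> computes the same sum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot (inner) product: the sum of the products of corresponding components.
// Equivalent to multiplying the row vector u by the column vector v^T.
double dot(const std::vector<double>& u, const std::vector<double>& v)
{
  assert(u.size() == v.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < u.size(); ++i)
  {
    sum += u[i] * v[i];
  }
  return sum;
}
```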
We will use the dot product to help us determine the angle between two vectors; that's right, the angle. We can do this because of the law of cosines from trigonometry, which leads to the following equation (when \(u\) and \(v\) are unit vectors, the denominator is simply 1):
\( \cos \Theta = \cfrac{u \cdot v}{\|u\| \|v\|} \)
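To convert this to the angle \(\Theta\) itself, apply the inverse cosine function, arccos, to both sides. In C and C++ the name of this function is acos().
\( \Theta = \arccos \left( \cfrac{u \cdot v}{\|u\| \|v\|} \right) \text{, since } \arccos(\cos \Theta) = \Theta \text{ for } 0 \le \Theta \le \pi \)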
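Putting the formula and acos() together gives a small helper for the angle between two vectors. This is a sketch under stated assumptions: angleBetween is a hypothetical name, both vectors must be non-zero, and the ratio is clamped to [-1, 1] because floating-point rounding can push it just outside the domain of acos().

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Angle in radians (0 to pi) between two non-zero vectors, using
// cos(theta) = (u . v) / (|u| |v|).
double angleBetween(const std::vector<double>& u, const std::vector<double>& v)
{
  assert(u.size() == v.size());
  double uv = 0.0, uu = 0.0, vv = 0.0;
  for (std::size_t i = 0; i < u.size(); ++i)
  {
    uv += u[i] * v[i];
    uu += u[i] * u[i];
    vv += v[i] * v[i];
  }
  assert(uu > 0.0 && vv > 0.0);

  // Keep the ratio inside [-1, 1], the valid domain of acos().
  const double ratio = uv / (std::sqrt(uu) * std::sqrt(vv));
  const double cosTheta = std::max(-1.0, std::min(1.0, ratio));
  return std::acos(cosTheta);
}
```

For example, angleBetween({1, 0}, {0, 1}) returns approximately \(\pi / 2\).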
So what is actually occurring when we calculate the dot product that allows us to get an angle?
We are essentially calculating the projection of u onto v. This gives us enough information to manipulate the Trigonometric properties to derive \( \Theta \).
I would like to make one final note regarding the dot product. Many times it is only necessary to know the sign of the dot product, which tells us how the angle between the two vectors, \( \Theta \), compares to \(90°\). The angle is in the range \( 0° \text{ to } 180° \); if you like radians, that is \( 0 \text{ to } \pi \). The relationship is:
\( w = u \cdot v \)
\( \cases{ w \gt 0 & \text{if } \Theta \lt 90° \cr w = 0 & \text{if } \Theta = 90° \cr w \lt 0 & \text{if } \Theta \gt 90° }\)
Cross Product
The cross product of two vectors produces a third vector that is perpendicular to both of them. When the two vectors lie on a surface, the result is perpendicular to that surface, and it is called the surface normal. The surface normal indicates which direction the surface is facing; therefore, it is also used quite extensively for shading and visibility.
We have reached the point where we need to start using examples and diagrams with three dimensions. We will use the three points that define our triangular surface to create two vectors. Remember, it is important to use consistent definitions when creating models.
The first step is to calculate our planar vectors. What does that mean? We want to create two vectors that are coplanar (on the same plane), so that we can use them to calculate the surface normal. All that we need to do this are three non-collinear points, and each triangle definition in our polygonal mesh model provides exactly that.
If we have defined a triangle \(\triangle ABC\), we will define two vectors, \(u = \overrightarrow{AB}\) and \(v = \overrightarrow{AC}\).
If we consistently defined the triangle surfaces in our model, we should now be able to take the cross product of \(u \times v\) to receive the surface normal, \(N_p\), of \(\triangle ABC\). Again, the normal vector is perpendicular to its surface.
In practice, this tells us which direction the surface is facing. We combine this with the dot product to determine whether we are interested in this surface for the current view.
So how do you calculate the cross product?
Here is what you need to know in order to calculate the surface normal vector:
\(\eqalign { N_p &= u \times v \cr &= [\matrix{(u_2v_3 - u_3v_2) & (u_3v_1 - u_1v_3) & (u_1v_2 - u_2v_1)}] }\)
To help you remember how to perform the calculation, I have rewritten it in parts, and used \(x\), \(y\), and \(z\) rather than the index values.
\(\eqalign { x &= u_yv_z - u_zv_y \cr y &= u_zv_x - u_xv_z \cr z &= u_xv_y - u_yv_x }\)
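The three component formulas map one-to-one onto code. This is a minimal sketch, assuming 3D vectors stored in a std::array<double, 3> aliased as the hypothetical Vec3.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;

// Cross product u x v, written with the x/y/z component formulas above.
// The result is perpendicular to both u and v.
Vec3 cross(const Vec3& u, const Vec3& v)
{
  return {
    u[1] * v[2] - u[2] * v[1],  // x = u_y v_z - u_z v_y
    u[2] * v[0] - u[0] * v[2],  // y = u_z v_x - u_x v_z
    u[0] * v[1] - u[1] * v[0],  // z = u_x v_y - u_y v_x
  };
}
```

cross({1, 0, 0}, {0, 1, 0}) returns {0, 0, 1}; swapping the arguments flips the sign of the result, which is why a consistent winding order matters.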
Let's work through an example to demonstrate the cross product. We will use surface A from the axis-aligned cube model that we constructed at the beginning of this post. Surface A lies in a plane parallel to the XY plane. How do we know this? Because all of its points have the same \(z\)-coordinate value of 1. Therefore, we should expect our calculation of \(N_p\) for surface A to produce a vector aligned with the \(z\) axis and pointing in the positive direction.
Step 1: Calculate our surface description vectors \(u\) and \(v\):
\( Surface_A = pt_7: (0,0,1), pt_6: (1,0,1), \text{ and } pt_4: (0,1,1) \)
\( \eqalign { u &= pt_6 - pt_7 \cr &= [\matrix{1 & 0 & 1}] - [\matrix{0 & 0 & 1}] \cr &=[\matrix{\color{red}{1} & \color{green}{0} & \color{blue}{0}}] }\)
\( \eqalign { v &= pt_4 - pt_7 \cr &= [\matrix{0 & 1 & 1}] - [\matrix{0 & 0 & 1}] \cr &=[\matrix{\color{red}{0} & \color{green}{1} & \color{blue}{0}}] }\)
Step 2: Calculate \(u \times v\):
\(N_p = [ \matrix{x & y & z} ]\text{, where:}\)
\( \eqalign { x &= \color{green}{u_y}\color{blue}{v_z} - \color{blue}{u_z}\color{green}{v_y} \cr &= \color{green}{0} \cdot \color{blue}{0} - \color{blue}{0} \cdot \color{green}{1} \cr &= 0 }\)
\( \eqalign { y &= \color{blue}{u_z}\color{red}{v_x} - \color{red}{u_x}\color{blue}{v_z} \cr &= \color{blue}{0} \cdot \color{red}{0} - \color{red}{1} \cdot \color{blue}{0} \cr &= 0 }\)
\( \eqalign { z &= \color{red}{u_x}\color{green}{v_y} - \color{green}{u_y}\color{red}{v_x} \cr &= \color{red}{1} \cdot \color{green}{1} - \color{green}{0} \cdot \color{red}{0} \cr &= 1 }\)
\(N_p = [ \matrix{0 & 0 & 1} ]\)
This is a vector aligned with the \(z\) axis and pointing in the positive direction, which is what we expected to receive.
Notation Formality
If you look at other resources to learn more about the cross product, you will find that it is usually written as a sum of three terms, in which each scalar component from the formula above multiplies one of the standard basis vectors \(i=[\matrix{1&0&0}]\), \(j=[\matrix{0&1&0}]\), and \(k=[\matrix{0&0&1}]\); adding these three scaled vectors produces the final vector. I omitted that step by placing each component's result directly into a new vector. I wanted you to be aware of this difference between what I have demonstrated and what is typically taught in the classroom.
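For reference, the classroom form is often written as a symbolic 3×3 determinant whose first row holds the standard basis vectors; expanding it along that row yields the same three components used above:

\( u \times v = \left| \matrix{ i & j & k \cr u_1 & u_2 & u_3 \cr v_1 & v_2 & v_3 } \right| = (u_2v_3 - u_3v_2)\,i + (u_3v_1 - u_1v_3)\,j + (u_1v_2 - u_2v_1)\,k \)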
Apply What We Have Learned
We have covered quite a few concepts in the two posts that I have written regarding the basic math and geometry involved with computer graphics. Let's apply this knowledge to solve a useful problem. Detecting whether a surface is visible from a specific point of view is a fundamental problem in this domain. Since we have all of the tools necessary to solve it, let's run through two examples.
For the following examples we will specify the viewpoint as \(pt_{eye} = [\matrix{3 & 2 & 3}]\).
Also, refer to this diagram for the surfaces that we will test for visibility from the viewpoint. The side surface is surface \(D\) from the cube model, and the hidden surface is the bottom-facing surface \(J\).
Detect a visible surface
Step 1: Calculate our surface description vectors \(u\) and \(v\):
\( \eqalign{ Surface_D: &pt_5: (1,1,1), \cr &pt_1: (1,0,0), \cr &pt_2: (1,1,0) } \)
\( \eqalign { u &= pt_1 - pt_5 \cr &= [\matrix{1 & 0 & 0}] - [\matrix{1 & 1 & 1}] \cr &=[\matrix{0 & -1 & -1}] }\)
\( \eqalign { v &= pt_2 - pt_5 \cr &= [\matrix{1 & 1 & 0}] - [\matrix{1 & 1 & 1}] \cr &=[\matrix{0 & 0 & -1}] }\)
Step 2: Calculate \(u \times v\):
\(N_{pD} = [ \matrix{x & y & z} ]\text{, where:}\)
\( \eqalign { x &= (-1) \cdot (-1) - (-1) \cdot 0 \cr &= 1 }\)
\( \eqalign { y &= (-1) \cdot 0 - 0 \cdot (-1) \cr &= 0 }\)
\( \eqalign { z &= 0 \cdot 0 - (-1) \cdot 0 \cr &= 0 }\)
\(N_{pD} = [ \matrix{1 & 0 & 0} ]\)
Step 3: Calculate the vector to the eye:
To calculate a vector to the eye, we need to select a point on the target surface and create a vector from that point to the eye. For this example, we will select \(pt_5\).
\( \eqalign{ \overrightarrow{view_D} &= pt_{eye}  pt_5\cr &= [ \matrix{3 & 2 & 3} ]  [ \matrix{1 & 1 & 1} ]\cr &= [ \matrix{2 & 1 & 2} ] }\)
Step 4: Normalize the view vector:
Before we can use this vector in a dot product, we must normalize the vector:
\( \eqalign{ \|\overrightarrow{view_D}\| &= \sqrt{2^2 + 1^2 + 2^2} \cr &= \sqrt{4 + 1 + 4} \cr &= \sqrt{9} \cr &= 3 \cr \cr eye_u &= \cfrac{\overrightarrow{view_D}}{\|\overrightarrow{view_D}\|} \cr &= \cfrac{1}3 [\matrix{2 & 1 & 2}] \cr &= \left\lbrack\matrix{\cfrac{2}3 & \cfrac{1}3 & \cfrac{2}3} \right\rbrack }\)
Step 5: Calculate the dot product of the view vector and surface normal:
\( \eqalign{ w &= eye_{u} \cdot N_{pD} \cr &= \cfrac{2}3 \cdot 1 + \cfrac{1}3 \cdot 0 + \cfrac{2}3 \cdot 0 \cr &= \cfrac{2}3 }\)
Step 6: Test for visibility:
\(w \gt 0\), therefore \(Surface_D\) is visible. 
Detect a surface that is not visible
Step 1: Calculate our surface description vectors \(u\) and \(v\):
\( \eqalign{ Surface_J: &pt_6: (1,0,1), \cr &pt_0: (0,0,0), \cr &pt_1: (1,0,0) } \)
\( \eqalign { u &= pt_0 - pt_6 \cr &= [\matrix{0 & 0 & 0}] - [\matrix{1 & 0 & 1}] \cr &=[\matrix{-1 & 0 & -1}] }\)
\( \eqalign { v &= pt_1 - pt_6 \cr &= [\matrix{1 & 0 & 0}] - [\matrix{1 & 0 & 1}] \cr &=[\matrix{0 & 0 & -1}] }\)
Step 2: Calculate \(u \times v\):
\(N_{pJ} = [ \matrix{x & y & z} ]\text{, where:}\)
\( \eqalign { x &= 0 \cdot (-1) - (-1) \cdot 0 \cr &= 0 }\)
\( \eqalign { y &= (-1) \cdot 0 - (-1) \cdot (-1) \cr &= -1 }\)
\( \eqalign { z &= (-1) \cdot 0 - 0 \cdot 0 \cr &= 0 }\)
\(N_{pJ} = [ \matrix{0 & -1 & 0} ]\)
Step 3: Calculate the vector to the eye:
For this example, we will select \(pt_6\).
\( \eqalign{ \overrightarrow{view_J} &= pt_{eye}  pt_6\cr &= [ \matrix{3 & 2 & 3} ]  [ \matrix{1 & 0 & 1} ]\cr &= [ \matrix{2 & 2 & 2} ] }\)
Step 4: Normalize the view vector:
Before we can use this vector in a dot product, we must normalize the vector:
\( \eqalign{ \|\overrightarrow{view_J}\| &= \sqrt{2^2 + 2^2 + 2^2} \cr &= \sqrt{4 + 4 + 4} \cr &= \sqrt{12} \cr &= 2 \sqrt{3} \cr \cr eye_u &= \cfrac{\overrightarrow{view_J}}{\|\overrightarrow{view_J}\|} \cr &= \cfrac{1}{2 \sqrt 3} [\matrix{2 & 2 & 2}] \cr &= \left\lbrack\matrix{\cfrac{1}{\sqrt 3} & \cfrac{1}{\sqrt 3} & \cfrac{1}{\sqrt 3}} \right\rbrack }\)
Step 5: Calculate the dot product of the view vector and surface normal:
\( \eqalign{ w &= eye_{u} \cdot N_{pJ} \cr &= \cfrac{1}{\sqrt 3} \cdot 0 + \cfrac{1}{\sqrt 3} \cdot (-1) + \cfrac{1}{\sqrt 3} \cdot 0 \cr &= -\cfrac{1}{\sqrt 3} }\)
Step 6: Test for visibility:
\(w \lt 0\), therefore \(Surface_J\) is not visible. 
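The six steps above can be collected into a single back-face test. This is a minimal sketch, not a definitive implementation: it assumes triangles are wound counterclockwise when viewed from the front, uses the triangle's first corner as the point for the view vector (as both examples did), and the names Vec3, subtract, cross, dot, and isVisible are hypothetical.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Component-wise difference a - b (the vector from b to a).
Vec3 subtract(const Vec3& a, const Vec3& b)
{
  return {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
}

Vec3 cross(const Vec3& u, const Vec3& v)
{
  return {u[1] * v[2] - u[2] * v[1],
          u[2] * v[0] - u[0] * v[2],
          u[0] * v[1] - u[1] * v[0]};
}

double dot(const Vec3& u, const Vec3& v)
{
  return u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
}

// Back-face test for triangle (a, b, c). Returns true when the
// surface faces the eye.
bool isVisible(const Vec3& a, const Vec3& b, const Vec3& c, const Vec3& eye)
{
  const Vec3 normal = cross(subtract(b, a), subtract(c, a));  // Steps 1-2
  const Vec3 view = subtract(eye, a);                         // Step 3
  // Step 4: dividing by a positive length cannot change the sign of
  // the dot product; the division is kept here to mirror the text.
  const double length = std::sqrt(dot(view, view));
  const Vec3 viewUnit = {view[0] / length, view[1] / length, view[2] / length};
  return dot(viewUnit, normal) > 0.0;                         // Steps 5-6
}
```

With the eye at (3, 2, 3), isVisible reports surface D (pt_5, pt_1, pt_2) as visible and surface J (pt_6, pt_0, pt_1) as hidden, matching the hand calculations.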
What has this example demonstrated?
Surprisingly, this basic task demonstrates every concept that we have learned from the two posts up to this point:
- Matrix Addition: Euclidean vector creation.
- Scalar Multiplication: Unit vector calculation, multiplying by \(1 / \|v\|\).
- Matrix Multiplication: Dot product calculation; square-matrix multiplication will be used in the next post.
- Geometric Model Representation: The triangle surface representation provided the points for our calculations.
- Euclidean Vector: Calculation of each surface's planar vectors, as well as the view vector.
- Cross Product: Calculation of the surface normal, \(N_p\).
- Unit Vector: Preparation of vectors for the dot product calculation.
- Dot Product: Used the sign from this calculation to test for visibility of a surface.
Summary
Math can be difficult sometimes, especially with all of the foreign notation. However, with a bit of practice the notation will soon become second nature, and many calculations can be performed in your head, similar to basic arithmetic with integers. While many graphics APIs hide these calculations from you, don't be fooled: they still exist and must be performed to display three-dimensional images on your screen. By understanding the basic concepts that I have demonstrated, you will be better able to troubleshoot your graphics programs when something goes wrong, even if you rely solely on a graphics library to abstract the complex mathematics.
There is only one more entry to go, and then we will have enough basic knowledge of linear algebra, Euclidean geometry, and computer displays to create, manipulate, and display 3D models programmatically. The next entry discusses how to translate, scale, and rotate geometric models. It also describes what must occur to create the illusion of a three-dimensional image on a two-dimensional display with the projection transform.
Three-dimensional computer graphics is an exciting aspect of computing because of the amazing visual effects that can be created for display. All of this is created from an enormous number of calculations that manipulate virtual models, which are constructed from some form of geometric definition. While the math involved in some aspects of computer graphics and animation can become quite complex, the fundamental mathematics that is required is very accessible. Beyond learning some new mathematical notation, a basic three-dimensional view can be constructed with algorithms that require only basic arithmetic and a little trigonometry.
I demonstrate how to perform some basic operations with two key constructs for 3D graphics, without a rigorous mathematical introduction. Hopefully the level of detail that I use is enough for anyone who doesn't have a strong background in math but would very much like to play with 3D APIs. I introduce the math that you must be familiar with in this entry, and in two future posts I demonstrate how to manipulate geometric models and apply the math to the display of a 3D image.
Linear Algebra
The type of math that forms the basis of 3D graphics is called linear algebra. In general, linear algebra is quite useful for solving systems of linear equations. Rather than go into the details of linear algebra and all of its capabilities, we are going to simply focus on two related constructs that are used heavily within the topic: the matrix and the vector.
Matrix
The Matrix provides an efficient mechanism to translate, scale, rotate, and convert between different coordinate systems. This is used extensively to manipulate the geometric models for environment calculations and display.
These are all examples of valid matrices:
\( \left\lbrack \matrix{a & b & c \cr d & e & f} \right\rbrack, \left\lbrack \matrix{10 & 27 \cr 0 & 13 \cr 7 & 17} \right\rbrack, \left\lbrack \matrix{x^2 & 2x \cr 0 & e^x} \right\rbrack\)
Square matrices, that is, matrices with the same number of rows and columns, are particularly important. There are some operations that can only be performed on a square matrix, which I will introduce shortly. The notation for matrices generally uses capital letters as the variable names. Subscripts serve as a shorthand to specify a matrix's dimensions and to identify an indexed position within the matrix. By convention, the first index represents the row, and the second represents the column.
\( A= [a_{ij}]= \left\lbrack \matrix{a_{11} & a_{12} & \ldots & a_{1n} \cr a_{21} & a_{22} & \ldots & a_{2n} \cr \vdots & \vdots & \ddots & \vdots \cr a_{m1} & a_{m2} & \ldots & a_{mn} } \right\rbrack \)
Vector
The vector is a special case of a matrix, where there is only a single row or a single column; also a common practice is to use lowercase letters to represent vectors:
\( u= \left\lbrack \matrix{u_1 & u_2 & u_3} \right\rbrack \), \( v= \left\lbrack \matrix{v_1\cr v_2\cr v_3} \right\rbrack\)
Operations
The mathematical shorthand notation for matrices is very elegant. It simplifies equations that would otherwise be quite cumbersome. There are only a few operations that we need to get started. Each operation has a basic algorithm to follow to perform a calculation, and the algorithm easily scales with the dimensions of a matrix.
Furthermore, the relationship between math and programming is not always clear. I have a number of colleagues who are excellent programmers, yet they do not consider their math skills very strong. I think it could help many people to see the matrix operations converted from mathematical notation into code. Once I demonstrate how to perform an operation with the mathematical notation and algorithmic steps, I will also show a basic C++ implementation of the concept to help you understand how these concepts map to actual code.
The operations that I implement below assume a Matrix class with the following interface:
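```cpp
#include <cstddef>
#include <vector>

class Matrix
{
public:
  // Ctor and Dtor omitted

  // Calculate and return a reference to the specified element.
  double& element(std::size_t row, std::size_t column);

  // Resizes this Matrix to have the specified size.
  void resize(std::size_t rows, std::size_t columns);

  // Returns the number of rows.
  std::size_t rows() const;

  // Returns the number of columns.
  std::size_t columns() const;

  // Operators implemented in the sections below.
  void operator+=(const Matrix& rhs);
  void operator*=(double scalar);

private:
  std::size_t m_rows = 0;     // assumed storage backing rows()
  std::size_t m_columns = 0;  // assumed storage backing columns()
  std::vector<double> data;
};
```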
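The interface above leaves the storage layout unspecified. One possible implementation, assuming the elements are stored row by row (row-major order) in a single std::vector<double>, is sketched below; the constructor and the members m_rows and m_columns are assumptions. Note that the mathematical notation \(a_{ij}\) is 1-based, while this C++ sketch uses zero-based indices, so \(a_{ij}\) corresponds to element(i - 1, j - 1).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical implementation of the Matrix interface, ASSUMING the
// elements are stored row by row (row-major order) in one std::vector.
class Matrix
{
public:
  Matrix(std::size_t rowCount, std::size_t columnCount)
  {
    resize(rowCount, columnCount);
  }

  // Zero-based indices: element (row, column) lives at
  // row * columns() + column in the flat data array.
  double& element(std::size_t row, std::size_t column)
  {
    assert(row < m_rows && column < m_columns);
    return data[row * m_columns + column];
  }

  // Sets the new size; every element is reset to zero.
  void resize(std::size_t rowCount, std::size_t columnCount)
  {
    m_rows = rowCount;
    m_columns = columnCount;
    data.assign(rowCount * columnCount, 0.0);
  }

  std::size_t rows() const { return m_rows; }
  std::size_t columns() const { return m_columns; }

private:
  std::size_t m_rows = 0;
  std::size_t m_columns = 0;
  std::vector<double> data;
};
```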
Transpose
Transpose is a unary operation that is performed on a single matrix, and it is represented by adding a superscript T to the target matrix.
For example, the transpose of a matrix A is represented with \(A^T\).
The transpose "flips" the orientation of the matrix so that each row becomes a column, and each original column becomes a row. A few examples will make this concept clearer.
\( A= \left\lbrack \matrix{a & b & c \cr d & e & f \cr g & h & i} \right\rbrack \)
\(A^T= \left\lbrack \matrix{a & d & g \cr b & e & h \cr c & f & i} \right\rbrack \)
The resulting matrix will contain the same set of values as the original matrix, only their position in the matrix changes.
\( B= \left\lbrack \matrix{1 & 25 & 75 & 100\cr 0 & 5 & 50 & 25\cr 0 & 0 & 10 & 22} \right\rbrack \)
\(B^T= \left\lbrack \matrix{1 & 0 & 0 \cr 25 & 5 & 0 \cr 75 & 50 & 10 \cr 100 & 25 & 22} \right\rbrack \)
It is very common to transpose a matrix, including vectors, before performing other operations. The reason will become clear when we discuss matrix multiplication.
\( u= \left\lbrack \matrix{u_1 & u_2 & u_3} \right\rbrack \)
\(u^T= \left\lbrack \matrix{u_1 \cr u_2 \cr u_3} \right\rbrack \)
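Transposition only swaps the two indices of each element. This is a minimal sketch, assuming a simplified vector-of-rows representation (the hypothetical alias Rows) rather than the Matrix class above, and assuming every row has the same length.

```cpp
#include <cstddef>
#include <vector>

// Simplified matrix type for this sketch: a vector of equal-length rows.
using Rows = std::vector<std::vector<double>>;

// Returns the transpose of a: element (i, j) of the input becomes
// element (j, i) of the output, so an m x n matrix becomes n x m.
Rows transpose(const Rows& a)
{
  if (a.empty())
  {
    return {};
  }

  const std::size_t rows = a.size();
  const std::size_t columns = a.front().size();
  Rows result(columns, std::vector<double>(rows));
  for (std::size_t i = 0; i < rows; ++i)
  {
    for (std::size_t j = 0; j < columns; ++j)
    {
      result[j][i] = a[i][j];
    }
  }
  return result;
}
```

Transposing the single-row vector \(u\) this way produces the single-column \(u^T\).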
Matrix Addition
Addition can only be performed between two matrices that are the same size. By size, we mean that the number of rows and columns are the same for each matrix. Addition is performed by adding the values of the corresponding positions of both matrices. The sum of the values creates a new matrix that is the same size as the original two matrices provided to the add operation.
If
\( A= \left\lbrack \matrix{0 & -2 & 4 \cr -6 & 8 & 0 \cr 2 & -4 & 6} \right\rbrack, B= \left\lbrack \matrix{1 & 3 & 5 \cr 7 & 9 & 11 \cr 13 & 15 & 17} \right\rbrack \)
Then
\(A+B=\left\lbrack \matrix{1 & 1 & 9 \cr 1 & 17 & 11 \cr 15 & 11 & 23} \right\rbrack \)
The sizes of these matrices do not match in their current form:
\(U= \left\lbrack \matrix{4 & 8 & 5}\right\rbrack, V= \left\lbrack \matrix{-1 \\ -5 \\ 4}\right\rbrack \)
However, if we take the transpose of one of them, their sizes will match and they can then be added together. The size of the result matrix depends on which of the original matrices we transpose:
\(U^T+V= \left\lbrack \matrix{3 \\ 3 \\ 9} \right\rbrack \)
Or
\(U+V^T= \left\lbrack \matrix{3 & 3 & 9} \right\rbrack \)
Matrix addition has the same algebraic properties as with the addition of two scalar values:
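Commutative Property: \( A + B = B + A \)
Associative Property: \( (A + B) + C = A + (B + C) \)
Identity Property: \( A + 0 = A \), where \(0\) is the zero matrix of the same size
Inverse Property: \( A + (-A) = 0 \)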
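The code required to implement matrix addition is relatively simple. Here is an example for the Matrix class definition that I presented earlier:
```cpp
#include <cstddef>
#include <stdexcept>

void Matrix::operator+=(const Matrix& rhs)
{
  // Addition requires matrices of the same size. Comparing element
  // counts is a simplification: a complete check would also compare
  // rows() and columns(), because a 2x3 and a 3x2 matrix hold the
  // same number of elements.
  if (rhs.data.size() != data.size())
  {
    throw std::invalid_argument("Matrix sizes must match for addition");
  }

  // We can simply add each corresponding element
  // in the matrix element data array.
  for (std::size_t index = 0; index < data.size(); ++index)
  {
    data[index] += rhs.data[index];
  }
}

Matrix operator+(const Matrix& lhs, const Matrix& rhs)
{
  Matrix result(lhs);
  result += rhs;

  return result;
}
```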
Scalar Multiplication
Scalar multiplication allows a single scalar value to be multiplied with every entry within a matrix. The result matrix is the same size as the matrix provided to the scalar multiplication expression:
If
\( A= \left\lbrack \matrix{3 & 6 & 9 \cr 12 & 15 & 18} \right\rbrack \)
Then
\( \frac{1}3 A= \left\lbrack \matrix{1 & 2 & 3 \cr 4 & 5 & 6} \right\rbrack, 0A= \left\lbrack \matrix{0 & 0 & 0 \cr 0 & 0 & 0} \right\rbrack, -A= \left\lbrack \matrix{-3 & -6 & -9 \cr -12 & -15 & -18} \right\rbrack \)
Scalar multiplication with a matrix exhibits these properties, where c and d are scalar values:
Distributive Property: \( c(A + B) = cA + cB \) and \( (c + d)A = cA + dA \)
Identity Property: \( 1A = A \)
The implementation for scalar multiplication is even simpler than addition.
Note: this implementation only allows the scalar value to appear before the Matrix object in multiplication expressions, which is how the operation is represented in math notation:
```cpp
#include <cstddef>

void Matrix::operator*=(const double scalar)
{
  // Multiply every element by the scalar value.
  for (std::size_t index = 0; index < data.size(); ++index)
  {
    data[index] *= scalar;
  }
}

// Only the scalar-first form (c * A) is provided, matching the notation cA.
Matrix operator*(const double scalar, const Matrix& rhs)
{
  Matrix result(rhs);
  result *= scalar;

  return result;
}
```
Matrix Multiplication
Everything seems very simple with matrices, at least once you get used to the new structure. Then you are introduced to matrix multiplication. The algorithm for multiplication is not difficult; however, it is much more labor intensive compared to the other operations that I have introduced to you. There are also a few more restrictions on the operands for multiplication to be valid. Finally, unlike the addition operator, the matrix multiplication operator does not have all of the same properties as the multiplication operator for scalar values; specifically, the order of the operands matters, and in general \(AB \ne BA\).
Input / Output
Let's first address what you need to be able to multiply matrices, and what type of matrix you can expect as output. Once we have addressed the structure, we will move on to the process.
Given the following two matrices:
\( A= \left\lbrack \matrix{a_{11} & \ldots & a_{1n} \cr \vdots & \ddots & \vdots \cr a_{m1} & \ldots & a_{mn} } \right\rbrack, B= \left\lbrack \matrix{b_{11} & \ldots & b_{1v} \cr \vdots & \ddots & \vdots \cr b_{u1} & \ldots & b_{uv} } \right\rbrack \)
A valid product for \( AB=C \) is only possible if the number of columns \( n \) in \( A \) is equal to the number of rows \( u \) in \( B \). The resulting matrix \( C \) will have the dimensions \( m \times v \).
\( AB=C= \left\lbrack \matrix{c_{11} & \ldots & c_{1v} \cr \vdots & \ddots & \vdots \cr c_{m1} & \ldots & c_{mv} } \right\rbrack, \)
Let's summarize this in a different way; hopefully this arrangement will make the concept more intuitive:
One last form of the rules for matrix multiplication:
- The number of columns, \(n\), in matrix \( A \) must be equal to the number of rows, \(u\), in matrix \( B \):
\( n = u \)
- The output matrix \( C \) will have the number of rows, \(m\), in \(A\), and the number of columns, \(v\), in \(B\):
\( m \times v \)
- \(m\) and \(v\) do not have to be equal. The only requirement is that they are both greater than zero:
\( m \gt 0, v \gt 0 \)
How to Multiply
To calculate a single entry \(c_{ij}\) in the output matrix, multiply each element in row \(i\) of the first matrix, \(A\), by the corresponding element in column \(j\) of the second matrix, \(B\), and add all of these products together. In other words, the row used from \(A\) matches the row of the entry being calculated in \(C\), and the column used from \(B\) matches its column.
More succinctly, we can say we are multiplying rows into columns.
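The same rule can be written as a single formula. This is the standard textbook definition, restated with the symbols already used above (\(A\) is \(m \times n\), \(B\) is \(n \times v\)); with \(i = j = 1\) and \(n = 3\) it expands to the \(c_{11}\) formula in the example that follows.

```latex
c_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj}, \qquad i = 1, \ldots, m, \quad j = 1, \ldots, v
```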
For example:
\( A= \left\lbrack \matrix{a_{11} & a_{12} & a_{13} \cr a_{21} & a_{22} & a_{23}} \right\rbrack, B= \left\lbrack \matrix{b_{11} & b_{12} & b_{13} & b_{14} \cr b_{21} & b_{22} & b_{23} & b_{24} \cr b_{31} & b_{32} & b_{33} & b_{34} } \right\rbrack \)
The number of columns in \(A\) is \(3\) and the number of rows in \(B\) is \(3\), therefore, we can perform this operation. The size of the output matrix will be \(2 \times 4\).
This is the formula to calculate the element \(c_{11}\) in \(C\), with the row used from \(A\) and the column used from \(B\) highlighted:
\( \left\lbrack \matrix{\color{#B11D0A}{a_{11}} & \color{#B11D0A}{a_{12}} & \color{#B11D0A}{a_{13}} \cr a_{21} & a_{22} & a_{23}} \right\rbrack \times \left\lbrack \matrix{\color{#B11D0A}{b_{11}} & b_{12} & b_{13} & b_{14} \cr \color{#B11D0A}{b_{21}} & b_{22} & b_{23} & b_{24} \cr \color{#B11D0A}{b_{31}} & b_{32} & b_{33} & b_{34} } \right\rbrack = \left\lbrack \matrix{\color{#B11D0A}{c_{11}} & c_{12} & c_{13} & c_{14}\cr c_{21} & c_{22} & c_{23} & c_{24} } \right\rbrack \)
\( c_{11}= (a_{11}\times b_{11}) + (a_{12}\times b_{21}) + (a_{13}\times b_{31}) \)
To complete the multiplication, we need to calculate the other seven values \( c_{12}, c_{13}, c_{14}, c_{21}, c_{22}, c_{23}, c_{24}\). Here is another example for the element \(c_{23}\):
\( \left\lbrack \matrix{ a_{11} & a_{12} & a_{13} \cr \color{#B11D0A}{a_{21}} & \color{#B11D0A}{a_{22}} & \color{#B11D0A}{a_{23}} } \right\rbrack \times \left\lbrack \matrix{b_{11} & b_{12} & \color{#B11D0A}{b_{13}} & b_{14} \cr b_{21} & b_{22} & \color{#B11D0A}{b_{23}} & b_{24} \cr b_{31} & b_{32} & \color{#B11D0A}{b_{33}} & b_{34} } \right\rbrack = \left\lbrack \matrix{c_{11} & c_{12} & c_{13} & c_{14}\cr c_{21} & c_{22} & \color{#B11D0A}{c_{23}} & c_{24} } \right\rbrack \)
\( c_{23}= (a_{21}\times b_{13}) + (a_{22}\times b_{23}) + (a_{23}\times b_{33}) \)
Notice how the size of the output matrix changes. Based on this and the size of the input matrices you can end up with some interesting results:
\( \left\lbrack \matrix{a_{11} \cr a_{21} \cr a_{31} } \right\rbrack \times \left\lbrack \matrix{ b_{11} & b_{12} & b_{13} } \right\rbrack = \left\lbrack \matrix{c_{11} & c_{12} & c_{13} \cr c_{21} & c_{22} & c_{23} \cr c_{31} & c_{32} & c_{33} } \right\rbrack \)
\( \left\lbrack \matrix{ a_{11} & a_{12} & a_{13} } \right\rbrack \times \left\lbrack \matrix{b_{11} \cr b_{21} \cr b_{31} } \right\rbrack = \left\lbrack \matrix{c_{11} } \right\rbrack \)
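To see these shape rules in action, here is a minimal sketch of the row-into-column algorithm that checks the dimensions first. The `Grid` alias and the `multiply` function are hypothetical names for this illustration only (a vector of rows, indexed from 0), not part of any library discussed in this series.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical representation for this sketch: a vector of rows, 0-based.
using Grid = std::vector<std::vector<double>>;

// Multiply an m x n matrix by an n x v matrix, producing an m x v matrix.
Grid multiply(const Grid& a, const Grid& b)
{
    const std::size_t m = a.size();
    const std::size_t n = m ? a[0].size() : 0;
    if (b.size() != n)
        throw std::invalid_argument("columns of A must equal rows of B");
    const std::size_t v = n ? b[0].size() : 0;

    Grid c(m, std::vector<double>(v, 0.0));
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < v; ++j)
            for (std::size_t k = 0; k < n; ++k)
                c[i][j] += a[i][k] * b[k][j];   // row i of A into column j of B
    return c;
}
```

With this sketch, `multiply({{1}, {2}, {3}}, {{4, 5, 6}})` produces a 3 × 3 matrix, while swapping the arguments, `multiply({{4, 5, 6}}, {{1}, {2}, {3}})`, produces the 1 × 1 matrix containing 4·1 + 5·2 + 6·3 = 32.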
Tip:
To help you keep track of which row to use from the first matrix and which column from the second matrix, create your result matrix of the proper size, then methodically calculate the value for each individual element.
The algebraic properties for the matrix multiplication operator do not match those of the scalar multiplication operator. These are the most notable:
Not Commutative:
The order of the factor matrices definitely matters.
I think it is very important to illustrate this fact. Here is a simple \(2 \times 2 \) multiplication performed two times with the order of the input matrices switched. I have highlighted the only two terms that the two resulting answers have in common:
\( \left\lbrack \matrix{a & b \cr c & d } \right\rbrack \times \left\lbrack \matrix{w & x\cr y & z } \right\rbrack = \left\lbrack \matrix{(\color{red}{aw}+by) & (ax+bz)\cr (cw+dy) & (cx+\color{red}{dz}) } \right\rbrack \)
\( \left\lbrack \matrix{w & x\cr y & z } \right\rbrack \times \left\lbrack \matrix{a & b \cr c & d } \right\rbrack = \left\lbrack \matrix{(\color{red}{aw}+cx) & (bw+dx)\cr (ay+cz) & (by+\color{red}{dz}) } \right\rbrack \)
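Plugging numbers into the two products above makes the difference concrete. This sketch uses a = 1, b = 2, c = 3, d = 4 and w = 5, x = 6, y = 7, z = 8; `Mat2` and `multiply2` are hypothetical helpers for the illustration.

```cpp
#include <array>

// Hypothetical 2 x 2 integer matrix for this sketch.
using Mat2 = std::array<std::array<int, 2>, 2>;

// Row-into-column product of two 2 x 2 matrices.
Mat2 multiply2(const Mat2& p, const Mat2& q)
{
    Mat2 r{};   // zero-initialized accumulator
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k)
                r[i][j] += p[i][k] * q[k][j];
    return r;
}
```

Multiplying {{1, 2}, {3, 4}} by {{5, 6}, {7, 8}} gives {{19, 22}, {43, 50}}, while the reversed order gives {{23, 34}, {31, 46}}. Only the highlighted terms carry over: the top-left entries both contain aw = 5 (19 = 5 + 14, 23 = 5 + 18) and the bottom-right entries both contain dz = 32 (50 = 18 + 32, 46 = 14 + 32).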
Product of Zero:
\( AB = 0 \) does not necessarily imply \( A = 0 \) or \( B = 0 \).
Scalar Multiplication is Commutative:
\( (kA)B = k(AB) = A(kB) \)
Associative Property:
\( (AB)C = A(BC) = ABC \)
Multiplication is associative; however, take note that the relative order of all of the matrices remains the same.
Transpose of a Product:
\( (AB)^T = B^T A^T \)
The transpose of a product is equivalent to the product of the transposed factors multiplied in reverse order.
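A quick numerical check of three of these properties, using small 2 × 2 integer matrices. `Mat2`, `multiply2` and `transpose2` are hypothetical helpers for this sketch; the zero-product case uses the standard example of a nonzero matrix whose square is zero.

```cpp
#include <array>

// Hypothetical 2 x 2 integer matrix for this sketch.
using Mat2 = std::array<std::array<int, 2>, 2>;

// Row-into-column product of two 2 x 2 matrices.
Mat2 multiply2(const Mat2& p, const Mat2& q)
{
    Mat2 r{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k)
                r[i][j] += p[i][k] * q[k][j];
    return r;
}

// Swap rows and columns.
Mat2 transpose2(const Mat2& p)
{
    return Mat2{{{p[0][0], p[1][0]}, {p[0][1], p[1][1]}}};
}
```

For example, with N = {{0, 1}, {0, 0}}, `multiply2(N, N)` is the zero matrix even though N itself is not zero, and `transpose2(multiply2(A, B))` equals `multiply2(transpose2(B), transpose2(A))` for any pair of matrices.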
Code
We are going to use a two-step approach to create a general-purpose matrix multiplication function. The first step is to create a function that calculates a single element of the output matrix:
C++
double Matrix::multiply_element(
    const Matrix& lhs,
    const Matrix& rhs,
    const size_t i,
    const size_t j
)
{
    double product = 0;

    // Multiply across the specified row, i, of the left matrix
    // and down the specified column, j, of the right matrix.
    // Each lhs(i, k) pairs with rhs(k, j); accumulate the total
    // of the products to return as the calculated result.
    for (size_t k = 1; k <= lhs.columns(); ++k)
    {
        product += lhs.element(i, k)
                 * rhs.element(k, j);
    }

    return product;
}
Now create the outer function that performs the multiplication and populates each element of the output matrix:
C++
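#include <stdexcept>

// Because the product may have an entirely new size,
// it does not make sense to have a *= operator for this
// general-purpose solution. operator* returns a new matrix
// and leaves both operands unchanged.
Matrix Matrix::operator*(const Matrix& rhs) const
{
    if (columns() != rhs.rows())
    {
        throw std::invalid_argument("lhs.columns() must equal rhs.rows()");
    }

    // Size the result: rows of the left matrix,
    // columns of the right matrix.
    Matrix result;
    result.resize(rows(), rhs.columns());

    // Calculate the value for each element in the result matrix.
    // multiply_element only reads its arguments; calling it through
    // result keeps this const function valid even if
    // multiply_element is not itself declared const.
    for (size_t i = 1; i <= result.rows(); ++i)
    {
        for (size_t j = 1; j <= result.columns(); ++j)
        {
            result.element(i, j) = result.multiply_element(*this, rhs, i, j);
        }
    }

    return result;
}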
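Because the full Matrix class is not shown in this post, the fragments above do not compile on their own. Here is a self-contained sketch of the same two-step approach. `SimpleMatrix` is a hypothetical stand-in whose 1-based `element(i, j)`, `rows()`, `columns()` and `resize()` mirror the calls used above; it is not the actual class from this series.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical minimal matrix for this sketch: row-major storage, 1-based access.
class SimpleMatrix
{
public:
    explicit SimpleMatrix(std::size_t row_count = 0, std::size_t column_count = 0)
        : m_rows(row_count), m_columns(column_count),
          m_data(row_count * column_count, 0.0) { }

    std::size_t rows() const    { return m_rows; }
    std::size_t columns() const { return m_columns; }

    // Resizing discards the previous contents and zero-fills.
    void resize(std::size_t row_count, std::size_t column_count)
    {
        m_rows = row_count;
        m_columns = column_count;
        m_data.assign(row_count * column_count, 0.0);
    }

    double& element(std::size_t i, std::size_t j)
    { return m_data[(i - 1) * m_columns + (j - 1)]; }
    double element(std::size_t i, std::size_t j) const
    { return m_data[(i - 1) * m_columns + (j - 1)]; }

    // Step 1: one entry of the product, row i of lhs into column j of rhs.
    static double multiply_element(const SimpleMatrix& lhs, const SimpleMatrix& rhs,
                                   std::size_t i, std::size_t j)
    {
        double sum = 0.0;
        for (std::size_t k = 1; k <= lhs.columns(); ++k)
            sum += lhs.element(i, k) * rhs.element(k, j);
        return sum;
    }

    // Step 2: size the result and fill in every entry.
    SimpleMatrix operator*(const SimpleMatrix& rhs) const
    {
        if (columns() != rhs.rows())
            throw std::invalid_argument("columns of lhs must equal rows of rhs");

        SimpleMatrix result(rows(), rhs.columns());
        for (std::size_t i = 1; i <= result.rows(); ++i)
            for (std::size_t j = 1; j <= result.columns(); ++j)
                result.element(i, j) = multiply_element(*this, rhs, i, j);
        return result;
    }

private:
    std::size_t m_rows;
    std::size_t m_columns;
    std::vector<double> m_data;
};
```

Filling a 2 × 3 matrix with 1 through 6 and a 3 × 4 matrix with 1 through 12 (row by row) and multiplying them yields a 2 × 4 result with \(c_{11} = 38\) and \(c_{23} = 113\). Making `multiply_element` static in this sketch is a design choice: it uses only its arguments, so it needs no object state.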
Summary
3D computer graphics relies heavily on concepts from the branch of math called linear algebra. I have introduced two basic constructs from linear algebra that we will need to move forward and perform the fundamental calculations for rendering a three-dimensional display. So far I have only scratched the surface of what is possible, and I have only demonstrated how these operations work. Along the path to helping you work with three-dimensional graphics libraries, I will provide context and demonstrate, to a degree, what they are used for and why, even if math is not one of your strongest skills.
References
Kreyszig, Erwin; "Chapter 7" from Advanced Engineering Mathematics, 7th ed., New York: John Wiley & Sons, 1993
The binary, octal and hexadecimal number systems pervade all of computing. Every command is reduced to a sequence of 1s and 0s for the computer to interpret. These sequences can look like noise, especially given the sheer length of the information. Becoming familiar with binary and other number systems can make it much simpler to interpret the data.
Once you become familiar with the relationships between the number systems, you can manipulate the data in more convenient forms. Your ability to reason, solve and implement solid programs will grow. Patterns will begin to emerge when you view the raw data in hexadecimal. Some of the most efficient algorithms are based on the powers of two. So do yourself a favor and become more familiar with hexadecimal and binary.
Number Base Conversion and Place Value
I am going to skip this since you can refer to my previous entry for a detailed review of number system conversions[^].
Continuous Data
Binary and hexadecimal are more natural number systems for use with computers because both have a base of 2 raised to some exponent; binary is 2^{1} and hexadecimal is 2^{4}. We can easily convert from binary to decimal; however, decimal is not always the most useful form, especially when we don't have a nice, organized view of our data. Learning to navigate effectively between the formats, especially in your head, increases your ability to understand programs loaded into memory, as well as streams of raw data found in a debugger or in analyzers like Wireshark.
The basic unit of data is a byte. While it is true that a byte can be broken down into bits, we tend to work with these bundled collections and process bytes. Modern computer architectures process multiple bytes at a time, specifically 8 for 64-bit computers. Graphics cards go further, using 128-, 192- and even 256-bit-wide memory accesses. While these large data widths could represent extremely large numbers in decimal, the values tend to have encodings that use only a fraction of the space.
Recognize Your Surroundings
What is the largest binary value that can fit in an 8-bit field?
It will be a value that has eight ones: 1111 1111. Placing a space after every four bits helps with readability.
What is the largest hexadecimal value that can fit in an 8-bit field?
We can take advantage of the power of 2 relationship between binary and hexadecimal. Each hexadecimal digit requires four binary bits. It would be very beneficial to you to commit the following table to memory:
Dec  0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15 
Bin  0000  0001  0010  0011  0100  0101  0110  0111  1000  1001  1010  1011  1100  1101  1110  1111 
Hex  0x0  0x1  0x2  0x3  0x4  0x5  0x6  0x7  0x8  0x9  0xA  0xB  0xC  0xD  0xE  0xF 
Now we can simply take each group of four bits, 1111 1111, and convert each one into a hex digit: FF.
What is the largest decimal value that can fit in an 8-bit field? This isn't as simple; we must convert the binary value into decimal. Using the number base conversion algorithm, we know that the eighth bit has a place value of 2^{7}, or 128, and that the next place value, 2^{8}, is 256. Since zero must also be represented, the largest value is 2^{8} - 1, or 255.
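A small sketch that prints one byte in all three forms at once, which is a handy way to check conversions like these. `describe_byte` and `kLargestByte` are hypothetical names for this illustration; the formatting relies only on standard `std::bitset` and iostream manipulators.

```cpp
#include <bitset>
#include <cstdint>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Describe one byte in binary, hexadecimal and decimal,
// e.g. 255 -> "11111111 = 0xFF = 255".
std::string describe_byte(std::uint8_t value)
{
    const unsigned number = value;           // avoid printing uint8_t as a char
    std::ostringstream out;
    out << std::bitset<8>(number)           // eight binary digits
        << " = 0x" << std::uppercase << std::hex
        << std::setw(2) << std::setfill('0') << number   // two hex digits
        << " = " << std::dec << number;     // plain decimal
    return out.str();
}

// The largest 8-bit value: the next place value, 2^8, minus one.
constexpr unsigned kLargestByte = (1u << 8) - 1;
static_assert(kLargestByte == std::numeric_limits<std::uint8_t>::max(),
              "2^8 - 1 must equal 255");
```

For instance, `describe_byte(255)` returns "11111111 = 0xFF = 255" and `describe_byte(10)` returns "00001010 = 0x0A = 10".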
Navigating the Sea of Data
Binary and hexadecimal (ok, and octal) are all number systems whose base is a power of two. Because computers work with binary, representations that fit neatly into the logical data blocks, bytes, become much more meaningful. Through practice, and because of the relatively small range of values that can be represented with 8 bits, converting a byte to a decimal value feels natural to me. However, when the data spans 2, 4 or 8 bytes, the values grow quickly, decimal form loses its meaning, and my intuition about the value I am dealing with becomes useless.
For example:
What is the largest binary value that can fit in a 16-bit field?
It will be a value that has sixteen ones: 1111 1111 1111 1111.
What is the largest hexadecimal value that can fit in a 16-bit field? Again, let's convert each block of four bits into a hex digit: FFFF.
What is the largest decimal value that can fit in a 16-bit field?
Let's see, it is 2^{16} - 1, so that makes it 65355, 65535, or 65555... it's somewhere around there.
Here's a real-world example that I believe many people are familiar with: the RGB color encoding for pixels. You could add a fourth channel to represent alpha and encode RGBA. If we use one byte per channel, we can encode all four channels in a single 32-bit word, which can be processed very efficiently.
Imagine we are looking at pixel values in memory and the bytes appear in this format: RR GG BB. It takes two hexadecimal digits to represent a single byte. Therefore, the value of pure green could be represented as 00 FF 00. Viewing this same value as a 24-bit decimal, 65,280, is much less helpful.
If we were to change the value to 8,388,608, what has changed? We can tell the value has changed by roughly 8.3M. Since a 16-bit value can hold ~65K, we know that the third byte has been modified, and we can guess that it has increased to 120 or more (8.3M / 65K). But what is held in the lower two bytes now? Our ability to deduce information is not much better than an estimate. The value in hexadecimal is 80 00 00.
The difference between 8,388,608 and 8,388,607 is enormous with respect to data encoded at the byte level:
| Decimal | Hexadecimal bytes |
| 8,388,608 | 80 00 00 |
| 8,388,607 | 7F FF FF |
Now consider that we are dealing with 24-bit values in a stream of pixels. Every 12 bytes encode 4 pixels. Here is a representation of what we would see in the data, since most computer views of memory group bytes into 32-bit words:
| Decimal | 4,260,948,991 | 2,568,312,378 | 2,360,417,546 |
| Hexadecimal | FD F8 EB FF | 99 15 56 3A | 8C B1 1D 0A |
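A minimal sketch of reading the same 12 bytes both ways: as three 32-bit words (what a memory view shows) and as four 3-byte pixels (what the data actually encodes). `Pixel`, `pack_rgb`, `word_at` and `split_pixels` are hypothetical names; the sketch assumes the first byte of each group is the most significant, matching the RR GG BB layout above. Note that the third word, 0x8CB11D0A, works out to 2,360,417,546 in decimal.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// One 24-bit RGB pixel (hypothetical struct for this sketch).
struct Pixel { std::uint8_t r, g, b; };

// Pack three bytes into a 24-bit value, red in the high-order byte (RR GG BB).
constexpr std::uint32_t pack_rgb(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    return (std::uint32_t{r} << 16) | (std::uint32_t{g} << 8) | b;
}

// Read four bytes as one 32-bit word, first byte most significant.
constexpr std::uint32_t word_at(const std::uint8_t* p)
{
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8)  |  std::uint32_t{p[3]};
}

// Split a 12-byte stream into the four 3-byte pixels it encodes.
std::array<Pixel, 4> split_pixels(const std::array<std::uint8_t, 12>& bytes)
{
    std::array<Pixel, 4> pixels{};
    for (std::size_t i = 0; i < pixels.size(); ++i)
        pixels[i] = Pixel{bytes[3 * i], bytes[3 * i + 1], bytes[3 * i + 2]};
    return pixels;
}
```

Here `pack_rgb(0x00, 0xFF, 0x00)` is the 65,280 from the pure-green example, and `split_pixels` recovers the fourth pixel as B1 1D 0A even though its bytes straddle the boundary between the second and third 32-bit words.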
Binary
I typically work in binary only up to 8 bits. For anything larger, I simply skip the first 8 bits (1 byte) and focus on the next 8 bits. For example: 1111 1111 1111 1111. As I demonstrated with the RGB color encoding, the bytes do not always represent a single number; often they are a stream of data. So whether there is one byte, four bytes, or a gigabyte's worth of data, we usually process it either one byte or one word at a time. We break decimal numbers into groups in a similar way, using a comma (or some other regional punctuation) to separate thousands, e.g. 4,294,967,295: 294 of what? Millions.
Binary manipulation can be found in many contexts. One of the most common is storing a collection of flags in an unsigned buffer. These flags can be turned on and off with the bitwise operators of your programming language. Using a mask with multiple bits allows an enumerated value with more than two options to be encoded within the binary buffer. I'm not going to describe the mechanics in depth here; I simply want to demonstrate that data is encoded in many ways, and there are many reasons for you to become proficient at picking apart byte streams down to the bit.
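For a concrete picture, here is a small sketch of both ideas in one byte: single-bit flags, plus a two-bit field holding an enumerated value. The register layout (`kPower`, `kError`, `kModeMask`, `Mode`) is entirely hypothetical and chosen only for illustration.

```cpp
#include <cstdint>

// Hypothetical status byte layout for this sketch:
//   bit 0     power flag      0000 0001
//   bit 1     error flag      0000 0010
//   bits 4-5  Mode (4 values) 0011 0000
constexpr std::uint8_t kPower     = 0x01;
constexpr std::uint8_t kError     = 0x02;
constexpr std::uint8_t kModeMask  = 0x30;
constexpr unsigned     kModeShift = 4;

enum class Mode : std::uint8_t { Idle = 0, Run = 1, Test = 2, Sleep = 3 };

constexpr std::uint8_t set_flag(std::uint8_t reg, std::uint8_t flag)
{ return static_cast<std::uint8_t>(reg | flag); }

constexpr std::uint8_t clear_flag(std::uint8_t reg, std::uint8_t flag)
{ return static_cast<std::uint8_t>(reg & ~flag); }

constexpr bool has_flag(std::uint8_t reg, std::uint8_t flag)
{ return (reg & flag) != 0; }

// Replace the two mode bits without disturbing the other six bits.
constexpr std::uint8_t set_mode(std::uint8_t reg, Mode mode)
{
    return static_cast<std::uint8_t>((reg & ~kModeMask) |
        ((static_cast<unsigned>(mode) << kModeShift) & kModeMask));
}

// Mask off the mode bits and shift them down to read the value back.
constexpr Mode get_mode(std::uint8_t reg)
{ return static_cast<Mode>((reg & kModeMask) >> kModeShift); }
```

Starting from 0, setting the power flag gives 0000 0001, and then setting `Mode::Test` gives 0010 0001 (0x21); the mask keeps the two operations from interfering with each other.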
Hexadecimal
It is just as beneficial to be able to convert between hexadecimal and decimal in your head from 1 to 255, especially if you ever select colors for web pages or edit images in programs like Photoshop. It only takes two hex digits to represent an 8-bit byte. If you memorize the values that are defined by the high-order hex digit, reading hexadecimal byte values becomes almost as easy as reading decimal values. There are two tables listed below. The table on the left indicates the value mapping for the high-order and low-order hex digits of a byte. The table on the right contains a set of landmark values that you will most certainly encounter and find useful:
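High-order and low-order hex digit values:
| Hex digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
| High-order (×16) | 0 | 16 | 32 | 48 | 64 | 80 | 96 | 112 | 128 | 144 | 160 | 176 | 192 | 208 | 224 | 240 |
| Low-order (×1) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
Landmark values:
| Decimal | 100 | 127 | 168 | 192 | 224 | 239 |
| Hexadecimal | 0x64 | 0x7F | 0xA8 | 0xC0 | 0xE0 | 0xEF |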
Some of the numbers listed in the table on the right are more obvious than others as to why I chose to include them. For example, 100 is a nice round number that we deal with daily. When you run into 0x64, you can now automatically map it in your mind to 100 and have a reference point for its value. If you have a value such as 0x6C, you can start by calculating the difference, 0x6C - 0x64 = 8, and add that to 100 to know the decimal value is 108.
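The high-order/low-order mapping boils down to one line of arithmetic: a two-digit hex byte is (high digit × 16) + low digit. Here is a minimal sketch; `hex_digit_value` and `hex_byte_value` are hypothetical helper names.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Value of a single hex digit, accepting either letter case.
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    const int upper = std::toupper(static_cast<unsigned char>(c));
    if (upper >= 'A' && upper <= 'F') return upper - 'A' + 10;
    throw std::invalid_argument("not a hexadecimal digit");
}

// A two-digit hex byte is (high digit x 16) + low digit,
// e.g. "6C" -> 6 * 16 + 12 = 108.
int hex_byte_value(const std::string& two_digits)
{
    if (two_digits.size() != 2)
        throw std::invalid_argument("expected exactly two hex digits");
    return hex_digit_value(two_digits[0]) * 16 + hex_digit_value(two_digits[1]);
}
```

`hex_byte_value("64")` gives the landmark 100, and `hex_byte_value("6C")` gives 108, matching the difference trick above.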
Some of you will recognize the values 127, 168, 192, 224 and 239. For those who do not see the relevance of these values, they are common octets found in landmark network IP addresses. The table below shows the name and common dotted-decimal form, as well as the hexadecimal form of the address in both big-endian and little-endian format:
| Name | IPv4 Address | Big-endian | Little-endian |
| localhost | 127.0.0.1 | 0x7F000001 | 0x0100007F |
| private range | 192.168.x.x | 0xC0A80000 | 0x0000A8C0 |
| multicast base address | 224.0.0.0 | 0xE0000000 | 0x000000E0 |
| multicast last address | 239.255.255.255 | 0xEFFFFFFF | 0xFFFFFFEF |
One additional fact related to the IPv4 multicast address range is that the official definition declares any IPv4 address with the leading four bits set to 1110 to be a multicast address. 0xE is the hex digit that maps to binary 1110. This explains why the full multicast range of addresses is from 0xE0000000 to 0xEFFFFFFF, or, written in dotted-decimal notation, 224.0.0.0 to 239.255.255.255.
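The values in the table above can be reproduced with a few shifts and masks. This sketch packs dotted-decimal octets with the first octet in the high-order byte, reverses the byte order to get the little-endian column, and applies the "leading four bits are 1110" multicast test. `ipv4`, `byte_swap32` and `is_multicast` are hypothetical helper names; they manipulate numeric values only, and the order bytes actually appear in memory depends on the host.

```cpp
#include <cstdint>

// Pack four dotted-decimal octets, first octet in the high-order byte
// (the big-endian column of the table).
constexpr std::uint32_t ipv4(std::uint8_t a, std::uint8_t b,
                             std::uint8_t c, std::uint8_t d)
{
    return (std::uint32_t{a} << 24) | (std::uint32_t{b} << 16) |
           (std::uint32_t{c} << 8)  |  std::uint32_t{d};
}

// Reverse the byte order (big-endian <-> little-endian).
constexpr std::uint32_t byte_swap32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

// Multicast addresses have 1110 (hex digit E) in the four high-order bits.
constexpr bool is_multicast(std::uint32_t address)
{
    return (address >> 28) == 0xEu;
}
```

For example, `ipv4(127, 0, 0, 1)` is 0x7F000001 and `byte_swap32` of it is 0x0100007F, while `is_multicast` is true for both 224.0.0.0 and 239.255.255.255 and false for 240.0.0.0.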
Octal
I'm sure octal has uses, but I have never run into a situation where I have used it.
Actually, there is one place, which is to demonstrate this equality:
Oct 31 = Dec 25
Landmark Values
Binary has landmark values similar to the way decimal does. For example, 1's up to 10's, 10's through 100, then 1000, 1M, 1B ... These values are based on powers of 10, and arranged in groups that span a range large enough to be meaningful when you round off or estimate. It is worth becoming proficient with the scale of each binary place up to 2^{10}, which is equal to 1024, or 1K. At this point, we can use the powers of two in multiple contexts: 1) simple binary counting, and 2) measuring data lengths in bytes.
I have constructed a table below that shows landmark values for the powers of 2. On the left I have indicated the name of the unit if we are discussing bits and the size of a single value; for example, 8 bits is a byte. I haven't had the need to explore data values larger than 64 bits yet, but it's possible that some of you have. To the right I have indicated the units of measure used in computers when we discuss sizes. 1024 bytes is a kilobyte, and 1024 kilobytes is officially a megabyte (not if you're a hard drive manufacturer...). I continued the table up through 2^{80}, which is known as a "yottabyte." Becoming familiar with the landmarks up to 32 bits is probably enough for most people. To the far right I converted the major landmark values into decimal to give you a sense of size and scale for these numbers.
| Unit (bits) | Binary Exponent | Unit (bytes) | Place Value | Largest Value |
| Bit (b) | 2^{0} | Byte (B) | 1 | |
| | 2^{1} | Word | 2 | |
| | 2^{2} | Doubleword | 4 | |
| Nibble | 2^{3} | Quadword | 8 | |
| | 2^{4} | | 16 | |
| | 2^{5} | | 32 | |
| | 2^{6} | | 64 | |
| Byte (B) | 2^{7} | | 128 | 255 |
| | 2^{8} | | 256 | |
| | 2^{9} | | 512 | |
| | 2^{10} | Kilobyte (KB) | 1024 | |
| | 2^{20} | Megabyte (MB) | 1024^{2} = 1,048,576 | |
| | 2^{30} | Gigabyte (GB) | 1024^{3} = 1,073,741,824 | |
| Doubleword | 2^{31} | | 1024^{3}·2 | 4,294,967,295 |
| | 2^{32} | | 1024^{3}·2^{2} | |
| | 2^{40} | Terabyte (TB) | 1024^{4} | |
| | 2^{50} | Petabyte (PB) | 1024^{5} | |
| | 2^{60} | Exabyte (EB) | 1024^{6} | |
| Quadword | 2^{63} | | 1024^{6}·2^{3} | 18,446,744,073,709,551,615 |
| | 2^{70} | Zettabyte (ZB) | 1024^{7} | |
| | 2^{80} | Yottabyte (YB) | 1024^{8} | |
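The Largest Value column follows one rule: an n-bit unsigned field holds at most 2^{n} - 1. This sketch computes it and cross-checks it against `std::numeric_limits` at compile time; `largest_unsigned`, `KB`, `MB` and `GB` are hypothetical names for the illustration. It also settles the 16-bit question from earlier without guessing.

```cpp
#include <cstdint>
#include <limits>

// Largest unsigned value of an n-bit field: 2^n - 1.
// Shifting a 64-bit 1 left by 64 is undefined, so n >= 64 is handled separately.
constexpr std::uint64_t largest_unsigned(unsigned bits)
{
    return bits >= 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << bits) - 1;
}

static_assert(largest_unsigned(8)  == std::numeric_limits<std::uint8_t>::max(),  "byte");
static_assert(largest_unsigned(16) == std::numeric_limits<std::uint16_t>::max(), "word");
static_assert(largest_unsigned(32) == std::numeric_limits<std::uint32_t>::max(), "doubleword");
static_assert(largest_unsigned(64) == std::numeric_limits<std::uint64_t>::max(), "quadword");

// Binary size units: each step up is a factor of 1024 (2^10).
constexpr std::uint64_t KB = std::uint64_t{1} << 10;
constexpr std::uint64_t MB = KB << 10;
constexpr std::uint64_t GB = MB << 10;
```

`largest_unsigned(16)` is 65,535, and `largest_unsigned(64)` is 18,446,744,073,709,551,615.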
Summary
Looking at a large mass of symbols, such as a memory dump from a computer, can be overwhelming. The tools we use to develop and debug software help us organize and make sense of this data. However, these tools cannot always display the data in a format that is helpful for a particular situation. In these cases, understanding the relationships between the numbers and their different representations can be extremely helpful. This is especially true when the data is encoded at irregular offsets. I presented a few common forms that you are likely to encounter when looking at the values stored in computer memory. Hopefully you can use these identification and navigation tips to improve your development and debugging abilities.
I have started writing an entry that discusses the value of becoming familiar with the binary (base-2) and hexadecimal (base-16) number systems, because they are generally more useful to a programmer than decimal (base-10). My daughter is currently in high school and she is taking a programming course. One of the things that she is currently learning how to do is count in binary. So I decided to expand my explanation of conversion between number systems as a reference for her and for all of those who would like a refresher. The entry following this one will describe how binary and hexadecimal will make you a more effective programmer.
Place Value: coefficient·radix^{place}
To be able to convert numbers between different numerical bases, it is important to review the concept of place value. Each place, or column, in a number represents a value equal to the number system's base raised to the power of its place index, starting at 0. The official name of the base value is the radix. For example, consider the first three place values for a number system that has a radix of b:
b^{2}+b^{1}+b^{0}
If we are dealing with decimal (base-10), the radix = 10 and we would have place values of:
10^{2} + 10^{1} + 10^{0}
Now we have the 1's column, 10's column, and 100's column:
100 + 10 + 1
The number stored at each column in the number is called the coefficient. To construct the final number, we multiply each coefficient by its place value and add the results at each place together. Here is the decimal number 237 broken down by place value:
2·b^{2} +3·b^{1} +7·b^{0}
2·10^{2} + 3·10^{1} + 7·10^{0}
200 + 30 + 7
237
Hopefully decimal form is so natural to you that 237 seems like a single number, rather than the sum of place values that are multiplied by their coefficient.
The Binary Number System
There are 10 types of people, those that understand binary and those that don't
If you are wondering what the eight other types of people are, continue reading.
Binary is a base-2 number system. This means that each column in a binary number represents a value of 2 raised to the power of its place index. A number system requires as many symbols as its base value to represent each place, and zero is always the first symbol in this set. For instance, decimal (base ten) requires ten symbols: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Therefore, binary (base two) requires only two symbols: {0, 1}.
Adding the radix as a subscript after a number is a common notation to indicate the base value of a number when there is a possibility for confusion. Unless the context specifically indicates a different number system, decimal form is assumed. For example, the value 2 in binary:
10_{2} = 2_{10}
If we had used that notation in the joke at the beginning... well, it just wouldn't have been as funny.
Another place that you will see the subscript notation is when you study logarithms:
log_{a}x = log_{b}x / log_{b}a
I'll leave the details for logarithms for another time.
Counting in Binary
When learning anything new, it can be helpful to map something that you already know to the new topic. For a new number system, counting with both number systems can be a helpful exercise. Counting in all number systems uses the same process:
 Start with zero in the least significant column
 Count up until you have used all of the symbols in increasing order in the least significant, 1's, column
 When the limit is reached, increment the value in the next column, and reset the current column to zero.
 If the next column has used all of the symbols, increment the column after that and reset the current column.
 Once no further columns reach their limit, return to step 2 to continue counting.
Starting with decimal, if we count up from zero to 9, we get:
0  1  2  3  4  5  6  7  8  9 ...roll over
We are now at step 3: we have reached the limit, so we increment the next column from an implied 0 to 1 and reset the current column, for the result of:
10
Continuing to count, we increment the 1's column and roll over the successive columns as necessary:
11  12  13 ... 98  99 ... roll over
100
Here is 0 through 15 in binary. When working with computers, and binary in general, you will typically see zeroes explicitly written for the more significant columns. We require 4 binary digits to represent 15.
Binary  Sum of Columns  Decimal  
0000  0 + 0 + 0 + 0  0  
0001  0 + 0 + 0 + 1  1  
0010  0 + 0 + 2 + 0  2  
0011  0 + 0 + 2 + 1  3  
0100  0 + 4 + 0 + 0  4  
0101  0 + 4 + 0 + 1  5  
0110  0 + 4 + 2 + 0  6  
0111  0 + 4 + 2 + 1  7  
1000  8 + 0 + 0 + 0  8  
1001  8 + 0 + 0 + 1  9  
1010  8 + 0 + 2 + 0  10  
1011  8 + 0 + 2 + 1  11  
1100  8 + 4 + 0 + 0  12  
1101  8 + 4 + 0 + 1  13  
1110  8 + 4 + 2 + 0  14  
1111  8 + 4 + 2 + 1  15 
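The counting steps listed earlier work the same way in every base, so they translate directly into a small function. This is a sketch under stated assumptions: `increment` is a hypothetical name, digits are written as text using '0'-'9' followed by uppercase 'A'-'Z', and the radix is limited to 2 through 36.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Add one to a number written in the given radix, following the counting
// steps: bump the least significant column; when a column runs out of
// symbols, reset it to '0' and carry into the next column.
std::string increment(std::string digits, unsigned radix)
{
    const std::string symbols = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    if (radix < 2 || radix > symbols.size())
        throw std::invalid_argument("radix must be between 2 and 36");

    for (std::size_t pos = digits.size(); pos-- > 0; )
    {
        const std::size_t index = symbols.find(digits[pos]);
        if (index == std::string::npos || index >= radix)
            throw std::invalid_argument("digit out of range for radix");

        if (index + 1 < radix)
        {
            digits[pos] = symbols[index + 1];   // no roll over: done
            return digits;
        }
        digits[pos] = '0';                      // roll over, carry left
    }
    return "1" + digits;                        // every column rolled over
}
```

Starting from "0000" and calling `increment(value, 2)` ten times yields "1010", matching row 10 of the table, while `increment("99", 10)` rolls over to "100" exactly as in the decimal walk-through.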
The Hexadecimal Number System
Hexadecimal is a base-16 number system. Therefore, we need sixteen symbols to represent the place values. We can start with the ten numerals used in decimal, and use letters of the alphabet to represent the remaining six symbols. Although letter case can matter in programming, the letters used in hexadecimal are case-insensitive. Here is a mapping of the hexadecimal values to decimal:
Decimal:  {  0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  } 
Hexadecimal:  {  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  } 
Number Base Conversion
Let's discuss how to convert between number systems with different base values. Specifically, we will describe how to convert from "Decimal to Base_{x}" and "Base_{x} to Decimal".
Decimal to Base_{x}
Here is the algorithm to convert a number in decimal to a different base:
 Divide the decimal number by the radix of the target base
 The remainder from step 1 becomes the value for the current column.
 Use the quotient (answer) from step 1 as the decimal value to calculate the next column.
 Return to step 1 and repeat until the quotient is zero.
Let's return to the number 237 and convert it to a binary number:
decimal:  237  
radix:  2  
Decimal  Radix  Quotient  Remainder  
237  /  2  118  1 (2^{0})  
118  /  2  59  0 (2^{1})  
59  /  2  29  1 (2^{2})  
29  /  2  14  1 (2^{3})  
14  /  2  7  0 (2^{4})  
7  /  2  3  1 (2^{5})  
3  /  2  1  1 (2^{6})  
1  /  2  0  1 (2^{7})  
binary:  11101101 
Here is 237 converted to a hexadecimal number:
decimal:  237  
radix:  16  
Decimal  Radix  Quotient  Remainder  
237  /  16  14  D_{16} (13) (16^{0})  
14  /  16  0  E_{16} (14) (16^{1})  
hexadecimal:  ED_{16} 
A common notation to represent hexadecimal when dealing with computers, and in programming languages themselves, is to prefix the number with 0x, like so: 0xED.
Here is one more decimaltohexadecimal example:
decimal:  3,134,243,038  
radix:  16  
Decimal  Radix  Quotient  Remainder  
3,134,243,038  /  16  195,890,189  E (14) (16^{0})  
195,890,189  /  16  12,243,136  D (13) (16^{1})  
12,243,136  /  16  765,196  0 ( 0) (16^{2})  
765,196  /  16  47,824  C (12) (16^{3})  
47,824  /  16  2,989  0 ( 0) (16^{4})  
2,989  /  16  186  D (13) (16^{5})  
186  /  16  11  A (10) (16^{6})  
11  /  16  0  B (11) (16^{7})  
hexadecimal:  0xBAD0C0DE 
Base_{x} to Decimal
Actually, I have already demonstrated how to convert a number from a base other than ten into decimal. Once again, here is the complete formula, where c_{x} represents the coefficient at each place column:
c_{n}·b^{n} + ... + c_{2}·b^{2} + c_{1}·b^{1} + c_{0}·b^{0}
As an example, let's convert the binary answer back into decimal:
1·2^{7} + 1·2^{6} + 1·2^{5} + 0·2^{4} + 1·2^{3} + 1·2^{2} + 0·2^{1} + 1·2^{0}
1·128 + 1·64 + 1·32 + 0·16 + 1·8 + 1·4 + 0·2 + 1·1
128 + 64 + 32 + 8 + 4 + 1
237
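The formula above can be evaluated without computing any powers explicitly by using Horner's rule, a standard rearrangement of the same sum: value = ((c_{n}·b + c_{n-1})·b + ...)·b + c_{0}. Here is a minimal sketch; `to_decimal` is a hypothetical name, it accepts uppercase digits only, and it does not check for overflow of a 64-bit result.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Evaluate c_n*b^n + ... + c_1*b^1 + c_0*b^0 for a digit string in radix b,
// scanning left to right (Horner's rule): multiply the running total by the
// radix, then add the next coefficient.
unsigned long long to_decimal(const std::string& digits, unsigned radix)
{
    const std::string symbols = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    if (radix < 2 || radix > symbols.size())
        throw std::invalid_argument("radix must be between 2 and 36");

    unsigned long long value = 0;
    for (char c : digits)
    {
        const std::size_t coefficient = symbols.find(c);
        if (coefficient == std::string::npos || coefficient >= radix)
            throw std::invalid_argument("digit out of range for radix");
        value = value * radix + coefficient;
    }
    return value;
}
```

`to_decimal("11101101", 2)` and `to_decimal("ED", 16)` both return 237, and `to_decimal("BAD0C0DE", 16)` returns 3,134,243,038.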
Base_{x} to Base_{y}
Is it possible to convert a number from Base_{x} directly to Base_{y} without converting to decimal (base-10) first?
Yes; however, you will need to perform all of your math operations in either Base_{x} or Base_{y}. The algorithms that I have presented are performed in base-10, since that is the number system most people are familiar with.
Demonstration
Here is a short demonstration program to convert a decimal number into a value in any numeral base from 2 to 36.
Why those two limits?
Try to imagine how a base-1 number system would work with only the symbol {0} to work with. At the other end, we can combine the numerical set: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} with the alphabetic set: {A, B, C ... X, Y, Z} to create a set of 36 symbols.
C++
// The set of possible symbols for representing other bases.  
const char symbols[] = {'0', '1', '2', '3', '4', '5',  
'6', '7', '8', '9', 'A', 'B',  
'C', 'D', 'E', 'F', 'G', 'H',  
'I', 'J', 'K', 'L', 'M', 'N',  
'O', 'P', 'Q', 'R', 'S', 'T',  
'U', 'V', 'W', 'X', 'Y', 'Z'}; 
Here is the base-conversion algorithm from above written in C++.
C++
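#include <iostream>
using namespace std;

// radix is expected to be between 2 and 36 (the size of symbols[]).
void ConvertToBase( const unsigned long decimal,
                    const unsigned long radix)
{
    // Enough room for every digit of a 64-bit value in base 2.
    unsigned long remainder[64] = {0};
    unsigned long quotient = decimal;
    unsigned char place = 0;

    // do/while so that a value of 0 still produces one digit.
    do
    {
        unsigned long value = quotient;
        remainder[place] = value % radix;
        quotient = (value - remainder[place]) / radix;

        place++;
    } while (0 != quotient);

    cout << decimal << " in base " << radix << " is ";

    // The remainders were produced least significant first,
    // so print them in reverse order.
    for (unsigned char index = 1; index <= place; index++)
    {
        cout << symbols[remainder[place - index]];
    }

    cout << '\n';
}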
The values are from the examples above. You can modify the values used in the calls to ConvertToBase in the program below:
C++
int main(int argc, char* argv[])  
{  
ConvertToBase(237, 2);  
ConvertToBase(237, 10);  
ConvertToBase(237, 16);  
ConvertToBase(3134243038, 16);  
ConvertToBase(3134243038, 36);  
 
return 0;  
} 
Output:
237 in base 2 is 11101101
237 in base 10 is 237
237 in base 16 is ED
3134243038 in base 16 is BAD0C0DE
3134243038 in base 36 is 1FU1PEM
Summary
Ever since you learned to count in decimal, you have been using exponents with a base of 10; it's just that no one ever made a big deal of this fact. To use a different numeral system, such as binary (base-2) or hexadecimal (base-16), you simply need to determine a set of symbols to represent the values, and you can use the same counting rules that you use in base-10, except with a different number of symbols. Converting between any two number systems is possible; however, it is simplest to convert to decimal first if you want to continue to use base-10 arithmetic.
A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called Network Alchemy[^]. Alchemy performs automated data serialization with compile-time reflection. It is written in C++ using template metaprogramming.
My previous entry was a condensed overview of rvalue references. I described the differences between value expressions and types. I also summarized as much wisdom as I could collect regarding how to effectively use move semantics and perfect forwarding. After I completed the essay, I was eager to integrate move semantics into my serialization objects in Alchemy. This entry is a journal of my experience optimizing my library with rvalue references.
Motivation
My initial motivation for learning and writing about rvalue references was that I was finding it difficult to improve the performance of Alchemy with them. In fact, I tended to make the performance of my library much worse than without move semantics.
The first thing this says to me is that the compilers are doing a pretty damn good job of optimizing my code without the move constructors. The next conclusion was shrouded in mystery.
"Am I using this correctly?"
"... I followed the advice in Effective Modern C++; why is it getting worse?"
"... Maybe I have done so well that all of the copies have been elided by the compiler?!"
After a few fruitless attempts to improve the performance in the evenings, I decided I had better dig in and truly understand what is happening before I revisit Alchemy and move semantics. I learned a lot simply trying to write a few compelling programs to demonstrate the concepts.
Now it was time to return to integrating move semantics into my library.
Measuring Success
I have developed a benchmark application that is included as part of the Alchemy source on GitHub. As part of my initial tuning of Alchemy I was able to achieve a modicum of success in Classic C++ (C++98). If you are unfamiliar with the details of the benchmark application, you can learn more from this previous post: Alchemy: Benchmarks[^].
Here is a summary of the machine and compiler settings displayed in this post:
 Machine:
 Intel Core i7-4790L @ 4.00GHz
 16 GB RAM
 Windows 8.1
 Compiler (64-bit Visual Studio 2013):
 Maximize Speed: /O2
 Enable Intrinsic (Yes): /Oi
 Favor Speed: /Ot
 Omit Frame Pointers (Yes): /Oy
 Whole Program Optimization (Yes): /GL
 Data set size: 512 MB
The numbers below reflect what I started with when I began to add the move operations to Hg.
| Test | Control | Hg | Diff | Percent |
| Basic: | 0.4133 s | 0.3193 s | 0.0940 s | 22.74% |
| Packed: | 0.3959 s | 0.3519 s | 0.0440 s | 11.12% |
| Unaligned: | 0.4391 s | 0.4425 s | -0.0034 s | -0.773% |
| Complex: | 0.7485 s | 0.7654 s | -0.0169 s | -2.261% |
| Array: | 0.5141 s | 0.1409 s | 0.3732 s | 72.59% |
| Total: | 2.511 s | 2.0574 s | 0.4536 s | 18.06% |
The thing that was most challenging to improve up to this point was the speed of the Complex test. This test uses a nested message structure that contains an array and all of the message structures from the other tests. I found many temporary copies, which I eliminated to reach this point. However, the relatively poor result of this test compared to the others indicated that other temporary copies were most likely still lurking within the code.
What did I discover?
The very first thing that I did when I returned to Alchemy was search every class for a destructor, copy constructor, or assignment operator. If you are not paying attention, the new rules the compiler uses to automatically generate special member-functions will stop move semantics dead in their tracks.
There are very few destructors in Alchemy. The majority of the data is transient, and actually managed by the caller. The exception is Hg::MsgBuffer, which uses a std::vector to store serialized message data; this class has a destructor. All of the other classes manage a single field of data that is composited within the class.
On the other hand, Alchemy has plenty of copy constructors and assignment operators. Alchemy provides value semantics for all of the message subfields. These fields behave much like the fundamental types that they encapsulate. To provide a natural syntax, there are generally two types of assignment operators in each Datum and Proxy class. The first is the assignment operator for the object, and the second accepts the value_type the object represents.
I discovered places where I had not supplied a move constructor for value types that had some form of copy operation. I also found a few places where I had supplied copy and move constructors for subtypes of a class, but not for the class itself.
C++
// ******************************************************
/// Copy Constructor
Message(const message_type& rhs)
{
    *static_cast< message_type *>(this) = rhs;
}

// ******************************************************
/// Move Constructor
Message(message_type&& rhs)
    : base_type(std::move(rhs))
{ }

// ******************************************************
// Discovered that I was missing these versions:
//   Message(const Message& rhs)
//   Message(Message&& rhs)
//
Results
These changes had very little effect on the results. Setting breakpoints within my code showed that the move constructors were now being called. So why didn't the results change? I kept searching.
FieldTypes
I started to scrutinize the inheritance hierarchy of the Hg::Message class and the Hg::Datum types that it contained. I verified that I was calling the proper base class operations and moving the rvalue expressions into these function calls.
Then I reached the end of the inheritance hierarchy, where a very simple class exists: FieldTypes. FieldTypes can be found in ./Alchemy/Hg/Datum/basic_datum.h.
This class provides a common way to hold and access the actual data storage created for each message datum. The TraitT type allows for tag-dispatching for special cases such as a std::vector or a nested message type.
C++
template< typename FieldT,
          typename TraitT =
              typename Hg::detail::deduce_type_trait<FieldT>::type
        >
struct FieldTypes
{
    using index_type = FieldT;
    using value_type = typename field_data_t<index_type>::value_type;

    value_type& reference()
    {
        return m_data;
    }

    const value_type& data() const
    {
        return m_data;
    }

protected:
    value_type m_data;
};
I took a look at the specialization that I had created for the nested message type. Here is the declaration of that class:
C++
template< typename FieldT >
struct FieldTypes <FieldT, nested_trait>
    : public field_data_t<FieldT>::value_type
{
    using index_type = FieldT;
    using value_type = typename field_data_t<index_type>::value_type;

    // ...
};
There is one interesting line that caught my attention, and helped me narrow down the cause of the issue:
C++
: public field_data_t<FieldT>::value_type 
This is interesting because I was searching for how the storage was represented and accessed in a nested type. In this case, rather than containing a member data field, the data is provided by derivation. For a nested type, the base class is the Alchemy format definition. This class then contains a ProxyDatum, which derives from a Datum, which derives from a FieldTypes, and that brings us back to the level we are currently inspecting.
It's not the nested type after all...
After looking at this, it occurred to me that the compiler was clearly using the default generated move operations. I had not added any code that would prevent this in the message definitions and nested fields. However, that did not prevent entire sets of fields from being moved and copied.
I had created plenty of value constructors and assignment operators to accept value types, but I had not made any attempt to optimize the movement of these fields within the most basic structures that managed these data fields. So I added copy and move constructor implementations to the FieldTypes class to allow every possible conversion to be considered when moving the fundamental and container data types.
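To make that change concrete, here is a simplified sketch of what adding copy and move constructors to a FieldTypes-style storage class might look like. It is not Alchemy's actual code: the name FieldStorage and its layout are hypothetical, and the trait-based tag-dispatching is omitted. Each constructor accepts either another field or the raw value_type, so both lvalues and rvalues find a matching overload.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a FieldTypes-style class.
// Not Alchemy's implementation; trait dispatching is omitted.
template <typename T>
struct FieldStorage
{
    using value_type = T;

    FieldStorage() = default;

    // Copy and move from another field of the same type.
    FieldStorage(const FieldStorage& rhs) : m_data(rhs.m_data) { }
    FieldStorage(FieldStorage&& rhs) : m_data(std::move(rhs.m_data)) { }

    // Copy and move from the raw value_type.
    explicit FieldStorage(const value_type& value) : m_data(value) { }
    explicit FieldStorage(value_type&& value) : m_data(std::move(value)) { }

    // A user-declared move constructor deletes the implicit copy
    // assignment operator, so both assignment operators are spelled out.
    FieldStorage& operator=(const FieldStorage& rhs)
    {
        m_data = rhs.m_data;
        return *this;
    }

    FieldStorage& operator=(FieldStorage&& rhs)
    {
        m_data = std::move(rhs.m_data);
        return *this;
    }

    value_type&       reference()  { return m_data; }
    const value_type& data() const { return m_data; }

protected:
    value_type m_data;
};
```

With these overloads in place, a container such as std::vector can travel through the field by move instead of by copy whenever the source is an rvalue.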
This brought me my first big gain in performance.
... or maybe it is the nested type?
Unfortunately, there still seemed to be a major problem with the nested types.
I followed the execution path of the benchmark program in the debugger. I was looking for any clues where a copy was being performed rather than a move. Then I located the next big gain in performance.
In the byte-order processing code, I discovered that nested field types are processed differently from all of the other types of data. The implementation constructs an Hg::Message object from the raw Alchemy message format that represents the nested field type. The data is repackaged this way so that convert_byte_order can recursively apply the same processing logic it uses for the other field types.
C++
template <typename T, typename StorageT>
struct ConvertEndianess<T, StorageT, nested_trait>
{
    template <typename NestedValueT>
    void operator()(const NestedValueT &input,
                    NestedValueT &output)
    {
        // Construct a shallow message wrapper around the nested data.
        from_type from(input);
        to_type   to;

        // Pass this message to be byte-order swapped.
        output = convert_byte_order<from_type,
                                    to_order>(from, to).values();
    }
};
You can see that the output field is passed into this function as a parameter. To improve the speed of this function, I needed to figure out how to construct this temporary message more efficiently, or alter the processing logic to accept this nested field type.
Moving local resources... don't do it
My first attempt to improve the performance of this temporary message instance looked like this:
C++
template <typename NestedValueT>
void operator()(const NestedValueT &input,
                NestedValueT &output)
{
    // Pass this message to be byte-order swapped.
    output = convert_byte_order<from_type,
                                to_order>(from_type(input), to_type()).values();
}
That helped, but not like I had hoped. Then I made this adjustment:
C++
output = std::move(convert_byte_order(/*params*/)); 
That was what I needed!
Then I ran the unit tests, and a number of them failed. The tests that contained a vector actually crashed. The output of the byte-order conversion showed me that the data was being destroyed. I followed the logic in the debugger and discovered my problem.
I am creating the instance of to_type locally. This is passed into the conversion function and all is well. Then the data is moved into output, and what appears is garbage. I was confused at first. Then I watched the addresses of all of the data items.
to_type is being created on the stack, and that portion of the stack is destroyed before it has a chance to be moved to output. I tried many different variations of this approach. My final conclusion was that I would not be able to achieve what I was after without moving the creation of the to_type object outside of the function call.
However, I could not do that, because it would change how the library is used. I want to keep the interaction as simple as possible for the user. Therefore, I reworked the structure of the code just a little, and this is the final implementation:
C++
template <typename NestedValueT>
void operator()(const NestedValueT &input,
                NestedValueT &output)
{
    // Pass this message to be byte-order swapped.
    to_type to(output);
    convert_byte_order<from_type, to_order>(from_type(input),
                                            to);
    output = std::move(to);
}
convert_byte_order no longer returns a value. The value it would have returned is the to_type object that is created, and we still have access to that instance because it is passed by reference. Therefore, when I am done with it, it is moved into the output parameter, which is also passed by reference into the current function.
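The shape of that final implementation (fill a local object through a reference, then move it into the output only after the work is finished) can be sketched in isolation. This is not Alchemy's code: Message, swap_words, and convert_nested_field are hypothetical stand-ins, and "byte-order conversion" here simply swaps the bytes of 16-bit words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in: a "message" is just a sequence of 16-bit words.
using Message = std::vector<std::uint16_t>;

// Writes its result into 'to' (passed by reference) instead of returning it.
void swap_words(const Message& from, Message& to)
{
    to.resize(from.size());
    for (std::size_t i = 0; i < from.size(); ++i)
    {
        const std::uint16_t v = from[i];
        to[i] = static_cast<std::uint16_t>((v >> 8) | (v << 8));
    }
}

// Mirrors the structure of the final operator() shown above.
void convert_nested_field(const Message& input, Message& output)
{
    Message to;                // local object, alive until the end of this scope
    swap_words(input, to);     // filled in place through the reference
    output = std::move(to);    // moved only after the work is done; not used again
}
```

Because the local object outlives the conversion call, the move into output reads from storage that still exists.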
Final Results
Here is where the current performance results stand after the adjustments that I made to optimize with move semantics:
| Test | Control | Hg | Diff | Percent |
|---|---|---|---|---|
| Basic | 0.4133 s | 0.2423 s | 0.1710 s | 41.37% |
| Packed | 0.3959 s | 0.3403 s | 0.0556 s | 14.05% |
| Unaligned | 0.4391 s | 0.2295 s | 0.2096 s | 47.74% |
| Complex | 0.7485 s | 0.5573 s | 0.1912 s | 25.54% |
| Array | 0.5141 s | 0.1376 s | 0.3765 s | 73.23% |
| Total | 2.5109 s | 1.5071 s | 1.0039 s | 39.98% |
The next table compares the final times for each test from the previous implementation of Alchemy with the results after adding move semantics. I think you will agree that the small amount of effort required to add move semantics is well worth it.
| Test | Alchemy | w/ Move Semantics | Diff | Overall Change |
|---|---|---|---|---|
| Basic | 0.3193 s | 0.2423 s | 0.0770 s | 24% |
| Packed | 0.3519 s | 0.3403 s | 0.0116 s | 3% |
| Unaligned | 0.4425 s | 0.2295 s | 0.2130 s | 48% |
| Complex | 0.7654 s | 0.5573 s | 0.2081 s | 27% |
| Array | 0.1409 s | 0.1376 s | 0.0033 s | 2% |
| Total | 2.0574 s | 1.5071 s | 0.5503 s | 27% |
Summary
So far I have learned that move semantics is a somewhat finicky feature that is new with Modern C++. However, the rules are not difficult to learn. After an afternoon's worth of effort applying the principles of rvalue references to Alchemy, I was able to realize a 25% increase in performance. This was time well spent.
Rvalue references were introduced with C++11, and they are used to implement move semantics and perfect-forwarding. Both of these techniques are ways to eliminate copies of data parameters for efficiency. There is much confusion around this new feature that uses the && operator, because its meaning often depends on the context in which it is used. It is important to understand the subtleties around rvalue references in order to use them effectively. This entry will teach you how to use the rvalue reference with plenty of live demonstrations.
Move it!
When I first learned of move semantics, I expected that this feature would be more or less automatic, much like the copy constructor. As it turns out, there are common programming practices that will actually hinder the compiler's ability to generate and use move operations. The concepts of move semantics and perfect-forwarding are very simple. However, without understanding a few of the nuances of rvalue references, these idioms will seem fickle when you try to put them to use.
It is important to have a basic understanding of the fundamental components of C++ that have shaped how this new feature was added to the language, and why the explicit steps are required. Therefore, let's start with some background information and vocabulary, then work our way to the main topic.
Lvalue and Rvalue
Every expression is evaluated and assigned both a type and a value category. We are concerned with the differences between the value categories as we try to understand rvalue references; specifically, we are interested in the lvalue and rvalue categories. These terms are derived from the arguments on each side of the assignment operator: 'L' for the left side, to which values are assigned, and 'R' for the right side, which contains the value to be assigned. However, this is only a simplification of their definition.
Another way to look at these terms is how they manifest in the final program. Lvalues are expressions that identify non-temporary objects. Essentially, they have addressable storage for loading and storing data. An rvalue is an expression that refers to a temporary object, or a value that is not associated with any object.
An lvalue is not necessarily modifiable. A good example is a variable declared with the const keyword. After its initialization, the expression has storage that can be addressed, but the value cannot be modified. Therefore, lvalues are further divided into modifiable lvalues and non-modifiable lvalues.
Here is a list of items that are lvalue expressions:
 Non-modifiable:
 String literals
 Names of const-qualified variables
 Modifiable:
 The name of a variable
 Function calls that return lvalue references
 Pre-increment and pre-decrement operators
 Dereference and assignment expressions
 Expressions cast to lvalue reference type
Here is a list of items that are rvalue expressions:
 Literal values: true, 27ul, 3.14 (except string literals)
 Function call expressions that do not return a reference
 Expressions composed from arithmetic, relational, logical and bitwise operators
 The postfix increment and decrement operators
 Cast expressions to any type other than an lvalue reference type
 Lambda expressions
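One way to check the lists above is to let the compiler classify an expression for you. The sketch below relies on a standard decltype rule: for a parenthesized expression, decltype yields T& for an lvalue, T&& for an xvalue (such as the result of std::move), and plain T for a prvalue. The macro names and the pick helper are my own, chosen only for illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// decltype((expr)) encodes the value category of expr:
//   lvalue -> T&,   xvalue -> T&&,   prvalue -> T
// decltype does not evaluate its operand, so side-effects such as ++x never run.
#define IS_LVALUE(expr)  (std::is_lvalue_reference<decltype((expr))>::value)
#define IS_XVALUE(expr)  (std::is_rvalue_reference<decltype((expr))>::value)
#define IS_PRVALUE(expr) (!std::is_reference<decltype((expr))>::value)

// Hypothetical helper: a function call that returns an lvalue reference.
int& pick(int& a, int& b, bool first)
{
    return first ? a : b;
}
```

Both xvalues and prvalues belong to the broader rvalue category, which is why std::move(x) is usable wherever an rvalue is expected.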
Does it have a name?
There is a simple way that can help you determine if you are dealing with an lvalue or an rvalue.
Can you refer to the expression by name?
A value that can be referenced by name is an lvalue. This is not an absolute, but it is a good rule of thumb to help you generally reason about your data values. An example of an exception is a member-function. Also, this rule does not cover all expressions that are considered lvalues. Examples of lvalue expressions that do not have names are string literals and function call expressions that return an lvalue reference.
xvalues, prvalues, glvalues...
In this cursory overview of expression values, I have left out some of the exceptions to the rules, as well as the subcategories of lvalues and rvalues. These other categories capture the remaining situations. However, going even deeper into the nuances digresses from the original topic and will only add more confusion. Therefore, I will simply leave you with the knowledge that these other categories exist, and a reference where you can learn more about them: Value categories at cppreference.com[^]
& (lvalue reference)
An lvalue reference is what we generally call a reference. It is also important to note that it is a type. This is in contrast to value categories, which I described in the previous section. Here is a brief review of the concepts associated with references:
 A reference is an alias to an object or a function that already exists
 A reference must be initialized when it is defined
 It cannot be reseated (reassigned) after it is created
 It is not legal to create arrays of references, pointers to references, or references to references (except with templates)
The most common use for an lvalue reference is to pass parameters by-reference in function calls.
C++
void LogMessage(std::string const &msg)
{
    // msg is an alias for the input parameter at the call site.
    // Therefore, a copy of the string is avoided.
}
I prefer to use references over pointers, except when there is a possibility of receiving a null pointer. The logic becomes much simpler when writing safe, production-quality code, because the need to verify pointer input parameters is eliminated. In some cases, after I verify a pointer parameter, I will dereference it and assign it to a reference. Similarly, when I perform some type of cast on a pointer, I usually dereference the result and assign it to a reference of the new type.
C++
// I know what you're thinking...
// I interface with a lot of C and legacy C++
int process_state(
    const SystemInputs *p_inputs,
    void* p_context
    )
{
    if ( !p_inputs
      || !p_context)
    {
        return k_error_invalid_parameter;
    }

    // p_inputs points to const data, so the reference must be const as well.
    const SystemInputs& input = *p_inputs;
    SystemState& state = *reinterpret_cast< SystemState* >(p_context);

    // ...
}
If a function returns an lvalue reference, then the function call expression will be an lvalue expression. This use of references is how the at and operator[] member-functions of std::vector are implemented.
C++
// Where reference is an alias for T&
reference operator[]( size_type pos )
{
    // Return the requested element
    // from the heap-allocated data array
    return p_data[pos];
}
Dangling References
Although references make code easier to work with and reason about, they are not perfect. As with a pointer, it is still possible for the object that was used to initialize a reference to be destroyed before the reference is. This leaves you with a dangling reference, and using it puts your code in undefined behavior territory.
The stack is one of the safest places to create a reference, assuming the new reference goes out of scope before, or at the same time as, the object used to initialize it.
This is why you should not return a reference to a locally created variable from a function. Either your object was created on the stack and will be destroyed after the return statement is evaluated, or it was dynamically allocated, in which case the caller has no natural way to free the memory when it is done.
C++
std::string& FormatError(int errCode)
{
    std::string errorText;
    // Populate the string with the proper error message.

    return errorText;
    // errorText is now destroyed.
    // The caller receives a dangling reference.
}
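A safer version of FormatError simply returns the string by value. The sketch below assumes two hypothetical error codes and message texts; they are placeholders, not part of any real API. Returning a named local by value is cheap in Modern C++: the copy is usually elided, and when it is not, the local is moved rather than copied.

```cpp
#include <cassert>
#include <string>

// Returns by value instead of by reference. The message texts are placeholders.
std::string FormatError(int errCode)
{
    std::string errorText;
    switch (errCode)
    {
    case 0:  errorText = "Success";           break;
    case 1:  errorText = "Invalid parameter"; break;
    default: errorText = "Unknown error (" + std::to_string(errCode) + ")"; break;
    }

    // The caller receives its own string: the copy is usually elided (NRVO),
    // and if it is not, errorText is moved rather than copied.
    return errorText;
}
```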
&& (rvalue reference)
Prior to C++11, the only reference that could bind to an rvalue was an lvalue reference to const; there was no way to declare a reference that specifically identified an rvalue. C++11 introduces the && operator, which now allows references to be defined for rvalue expressions. An rvalue reference is a type.
Remember that one type of rvalue is an expression that refers to a temporary object. Binding an rvalue reference to a temporary extends the lifetime of that temporary. The most compelling places to apply rvalue references are object construction and assignment. This allows compilers to replace expensive copy operations with less expensive moves. The formal name given to this feature is move semantics. Another exciting use is with template function parameters, where the technique known as perfect-forwarding is applied.
During overload resolution for function calls, when the argument is an rvalue, an overload that takes an rvalue reference is given precedence over one that takes an lvalue reference to const.
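A pair of overloads makes this precedence easy to observe. The function name which below is hypothetical; both overloads only report which one was selected.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Two overloads that differ only in the kind of reference they accept.
std::string which(const std::string&) { return "lvalue reference"; }
std::string which(std::string&&)      { return "rvalue reference"; }
```

A named string selects the const lvalue reference overload, while temporaries, concatenation results, and std::move expressions all select the rvalue reference overload. Without the && overload, every one of those calls would have bound to the const& version instead.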
Move Semantics
Move semantics allow you to control how your user-defined types are moved. It is actually possible to accomplish this with classic C++; however, you would have to forgo the copy constructor. With the rvalue reference, it is now possible to provide both a move constructor and a copy constructor within the same object.
Perfect-Forwarding
Perfect-forwarding makes it possible to create function templates that pass their arguments to other functions in a way that allows the target function to receive exactly the same objects, with the same value categories, that were passed to the template.
[Intermission]
I presented that long and detail-oriented introduction up front so you would have the context needed to understand why this movement isn't always automatic. Hopefully I have also presented the details in a memorable order, to help you remember the proper actions required for each situation. We will continue to introduce details gradually, and I will summarize with a set of rules to lead you in a successful direction.
Reference Collapsing
Reference collapsing is part of the type deduction rules used for function templates. The rules are applied based upon the context of the function call. The type of argument passed to the specific instantiation is considered when determining the type for the final function call. This is necessary to protect against unintentional errors from occurring where lvalues and rvalues are concerned.
I mentioned earlier in the section regarding references that it was not legal to create a reference to a reference, with the exception of templates. It's time to demonstrate what I mean:
C++
int value = 0; // OK: Fundamental type  
int& ref = value; // OK: Reference to type  
int& &ref_to_ref = ref; // Error: Reference to reference not allowed  
 
// Now we have rvalue references  
int&& rvalue_bad = ref; // Error: Rvalue reference cannot bind to lvalue  
// Remember, if it has a name, it is an lvalue  
int&& rvalue_ref = 100; // OK: A literal value is an rvalue 
Templates follow a set of type-deduction rules to determine what type should be assigned to each parameterized value of the template. Scott Meyers provides a very thorough description of these rules in Item 1 of "Effective Modern C++". Suffice it to say, the important rules to note are:
 If an argument is a reference, the reference is not considered during type deduction
 Lvalue arguments are given special consideration in certain circumstances (this is where reference collapsing applies)
The rules of reference collapsing
The rules are actually very simple. They produce the same output as an AND truth table, where an lvalue reference, &, is 0 and an rvalue reference, &&, is 1. I think it is subtly fitting, given the other meaning of the && operator. This should make the rules easier to remember as well.
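Truth Table: Reference Collapsing Rules (& := 0, && := 1)

| First | Second | Collapses to |
|---|---|---|
| & (0) | & (0) | & (0) |
| & (0) | && (1) | & (0) |
| && (1) | & (0) | & (0) |
| && (1) | && (1) | && (1) |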
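The table can be verified by the compiler itself. Writing int& & directly is still an error, but alias templates let us form a "reference to a reference" indirectly, which is exactly where collapsing applies. The alias names lref and rref are my own, for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Alias templates form a "reference to a reference" indirectly;
// writing int& & directly is still an error.
template <typename T> using lref = T&;    // applies &  to T
template <typename T> using rref = T&&;   // applies && to T

// Collapsing behaves like AND, with & as 0 and && as 1.
static_assert(std::is_same<lref<int&>,  int&>::value,  "&  + &  -> &");
static_assert(std::is_same<rref<int&>,  int&>::value,  "&  + && -> &");
static_assert(std::is_same<lref<int&&>, int&>::value,  "&& + &  -> &");
static_assert(std::is_same<rref<int&&>, int&&>::value, "&& + && -> &&");
```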
New Rules for compiler generated functions
Hopefully you are well aware that, with classic C++, the compiler may generate four special member-functions for a class as needed. If not, it's never too late to learn. The four functions are:
 Default Constructor (If no other constructor has been defined)
 Destructor
 Copy Constructor
 (Copy) Assignment Operator
Two additional functions have been added to this set to properly manage the new concept of move operations.
 Move Constructor
 Move Assignment Operator
The default behavior of a generated move function is similar to that of its copy-based counterpart: it performs a move operation on each base class and data member of the object. However, the compiler is much more conservative about automatically generating these new functions than it is with the others. The primary reason is the notion that if the default move behavior is not sufficient, and you elect to implement your own, then the default copy behavior most likely will not be sufficient either. Therefore, when you declare either of the move functions, the compiler will not generate usable copy-based functions; they are defined as deleted instead.
Furthermore, if you implement only one of the move operations, the compiler will not generate the other one, for the same reason. In fact, no compiler-generated move operations will be created if the user-defined type has declared its own destructor or copy operation. When move operations are not available, the copy operations will be used instead.
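These suppression rules can be observed with standard type traits. In the sketch below, the three struct names are hypothetical. std::unique_ptr serves as a member that can be moved but not copied, so each type is move constructible only if it ends up with a usable move constructor.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical types. std::unique_ptr is movable but not copyable.
struct Plain
{
    std::unique_ptr<int> p;            // implicit move operations are generated
};

struct WithDestructor
{
    std::unique_ptr<int> p;
    ~WithDestructor() { }              // user-declared: no implicit moves, and the
};                                     // implicit copy constructor is deleted

struct WithDefaultedMoves
{
    std::unique_ptr<int> p;
    WithDefaultedMoves() = default;
    ~WithDefaultedMoves() { }
    // Explicitly requesting the moves restores them despite the destructor.
    WithDefaultedMoves(WithDefaultedMoves&&) = default;
    WithDefaultedMoves& operator=(WithDefaultedMoves&&) = default;
};
```

WithDestructor is the trap described above: because its implicit move constructor is never declared, a request to move it falls back to the (deleted) copy constructor, and the type cannot be moved at all.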
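If you are in the habit of, or even feel the compulsion to, always define a destructor, even an empty one, you may want to change that behavior. Be aware that any user-declared destructor, including one declared with = default, suppresses the compiler-generated move operations. Similar to how you can delete the compiler-generated defaults, you can also explicitly request the defaults. The syntax is the same as delete, except you use the default keyword. If you do need to declare a destructor, explicitly default the move (and copy) operations as well:
C++
class UserType
{
public:
    UserType() = default;

    // Any user-declared destructor, even a defaulted one,
    // suppresses the compiler-generated move operations...
    ~UserType() = default;

    // ...so explicitly request them, along with the copy operations.
    UserType(const UserType&) = default;
    UserType& operator=(const UserType&) = default;
    UserType(UserType&&) = default;
    UserType& operator=(UserType&&) = default;
};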
Specifying default will also allow you to continue to use the compiler's copy operations even when you implement your own move operations. If you would like to read a full account of the rules and reasoning for these changes, refer to Item 17 in "Effective Modern C++".
std::move("it!");
std::move is a new function that has been added to the Standard Library, and it can be found in the <utility> header. std::move does not add any actual executable code to your program, because it is implemented as a single cast operation. Yet this function is very important because it serves two purposes:
 Explicitly communicates your intention to move an object
 Provides a hint to (actually, enables) the compiler to apply move semantics
Here is a typical implementation of std::move:
C++
template< class T >  
typename std::remove_reference<T>::type&& move(T&& t)  
{  
return static_cast<typename std::remove_reference<T>::type&&>(t);  
} 
This function is a convenient wrapper around a cast that unconditionally converts its argument into an rvalue reference expression, regardless of whether an lvalue or an rvalue was passed in. This makes the argument capable of participating in move operations. std::move is the explicit nudge that you supply to the compiler when you want to perform a move assignment rather than a copy assignment.
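Because std::move is only a cast, nothing is moved until the result reaches something that actually performs a move. The two hypothetical functions below illustrate this: length_of binds its argument to a const reference and never moves from it, while take accepts its parameter by value, so an rvalue argument is move-constructed into it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Binds to its argument; never moves from it, even when handed an rvalue.
std::size_t length_of(const std::string& s) { return s.size(); }

// Takes its parameter by value, so an rvalue argument is move-constructed into it.
std::string take(std::string s) { return s; }
```

Calling length_of(std::move(a)) leaves a untouched; only take(std::move(a)) actually transfers the contents, after which a should not be relied upon.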
It is necessary to use std::move inside of a move constructor, because all of the values in the rhs object that you will move from are lvalues. As I mentioned, std::move unconditionally converts these lvalues into rvalue references. This is the only way the compiler would be able to differentiate between a move assignment and a copy assignment in this context.
After an object has been passed to std::move and moved from, the only operations you should perform on it are calling its destructor or assigning it a new value. Therefore, it is best to use std::move only on the last use of its input within the current scope.
Extremely Important!
If your class is a derived class and implements a move operation, it is very important that you use std::move on the parameters that you pass to the base class. Otherwise, the copy operations will be called in the base implementation.
Why?
Because the input parameters to your move operation are lvalue expressions. Maybe you are objecting, "No they're not! They are rvalue references!" The parameters do have rvalue reference types; however, each has been given a name for you to refer to it by. That makes them lvalue expressions whose type is an rvalue reference.
The bottom line is that calling a base class implementation requires the same attention that is required for all other move operations in this context. Just because you happen to be in a move operation for your derived class does not mean that the compiler can tell that it needs to call the same move operation for the base class. In fact, you may not want it to call the move operation; this now allows you to choose which version is called.
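The difference is easy to demonstrate. In the sketch below, Base, Forgetful, and Careful are hypothetical classes: Base records which of its constructors ran, and the two derived classes differ only in whether they apply std::move to rhs before handing it to the base.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical base class that records which constructor built it.
struct Base
{
    std::string how = "default";
    Base() = default;
    Base(const Base&) : how("copied") { }
    Base(Base&&) : how("moved") { }
};

// Forgets std::move: 'rhs' has a name, so it is an lvalue and
// Base's copy constructor is selected.
struct Forgetful : Base
{
    Forgetful() = default;
    Forgetful(Forgetful&& rhs) : Base(rhs) { }
};

// Casts 'rhs' back to an rvalue, so Base's move constructor is selected.
struct Careful : Base
{
    Careful() = default;
    Careful(Careful&& rhs) : Base(std::move(rhs)) { }
};
```

Both derived classes compile without a warning, which is why this mistake is so easy to make.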
Move operation implementations
We now have enough knowledge to be able to constructively apply the principles of move semantics. Let's apply them to implement a move constructor for an object.
The functions below implement the move operations for a class called ComplexData that is derived from a base class called BasicData.
Derived Move Constructor
C++
ComplexData(ComplexData&& rhs)
    : BasicData(std::move(rhs))
{
    // Move operations on ComplexData data members
    complex_info = std::move(rhs.complex_info);
}
Derived Move Assignment Operator
C++
ComplexData& operator=(ComplexData&& rhs)
{
    BasicData::operator=(std::move(rhs));

    // Move operations on ComplexData data members
    complex_info = std::move(rhs.complex_info);

    return *this;
}
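ComplexData moves members that already know how to move themselves. When a class owns a raw resource directly, its move operations must transfer ownership by hand: copy the pointer to the destination, then set the source's pointer to nullptr. Below is a sketch of that idea using a hypothetical Buffer class; copying is deleted only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical class that owns a heap buffer, to show pointer stealing.
class Buffer
{
public:
    explicit Buffer(std::size_t size) : m_size(size), m_data(new int[size]()) { }
    ~Buffer() { delete[] m_data; }

    Buffer(const Buffer&) = delete;               // copying omitted for brevity
    Buffer& operator=(const Buffer&) = delete;

    // Move constructor: take the pointer, leave the source empty.
    Buffer(Buffer&& rhs) noexcept : m_size(rhs.m_size), m_data(rhs.m_data)
    {
        rhs.m_size = 0;
        rhs.m_data = nullptr;
    }

    // Move assignment: release our buffer first, then take the source's.
    Buffer& operator=(Buffer&& rhs) noexcept
    {
        if (this != &rhs)
        {
            delete[] m_data;
            m_size = rhs.m_size;
            m_data = rhs.m_data;
            rhs.m_size = 0;
            rhs.m_data = nullptr;
        }
        return *this;
    }

    std::size_t size() const { return m_size; }
    const int*  data() const { return m_data; }

private:
    std::size_t m_size;
    int*        m_data;
};
```

Leaving the source with a null pointer keeps its destructor safe, since delete[] on nullptr does nothing.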
Observe move operations
Number
The class used in this demonstration is called Number. It implements each of the special member-functions of the class. It also provides a way to set and get the value of the object. This example lets you observe when move operations are performed versus copy operations.
The implementation is very simple, holding only a single data member, int m_value. I do not call std::move inside of the move operations because it is not necessary for an int. Similarly, if we had allocated pointers and were moving them between two objects, we would copy the pointer to the destination object and set the source object's pointer to nullptr. In this version, I set the moved-from number to -1 to differentiate an invalid state from 0.
Number move assignment operator
C++
Number& operator=(Number&& rhs)
{
    std::cout << "Move Assignment Operator\n";
    m_value = rhs.m_value;
    rhs.m_value = -1;
    return *this;
}
Move Example
C++
int main(int argc, char* argv[])
{
    std::cout << "Construct three Numbers:\n";
    Number value(100);
    Number copied(value);
    Number moved(std::move(value));

    std::cout << "\nvalue: "  << value
              << "\ncopied: " << copied
              << "\nmoved: "  << moved;

    std::cout << "\n\nCopy and move:\n";
    value  = 202;
    moved  = std::move(value);
    copied = value;

    std::cout << "\nvalue: "  << value
              << "\nmoved: "  << moved
              << "\ncopied: " << copied;
}
Output:
Construct three Numbers:
Value Constructor
Copy Constructor
Move Constructor

value: -1
copied: 100
moved: 100

Copy and move:
Value Assignment Operator
Move Assignment Operator
Copy Assignment Operator

value: -1
moved: 202
copied: -1
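Only the move assignment operator of Number is shown above. To compile and run the example, the class needs its remaining special member-functions and a stream operator. The sketch below fills them in consistently with the output listed above; the accessor names get() and set() are assumptions, since the original class is not shown in full.

```cpp
#include <cassert>
#include <iostream>
#include <utility>

// Sketch of a complete Number class. get()/set() are assumed names.
class Number
{
public:
    explicit Number(int value) : m_value(value)
    {
        std::cout << "Value Constructor\n";
    }

    Number(const Number& rhs) : m_value(rhs.m_value)
    {
        std::cout << "Copy Constructor\n";
    }

    Number(Number&& rhs) : m_value(rhs.m_value)
    {
        std::cout << "Move Constructor\n";
        rhs.m_value = -1;    // mark the moved-from object as invalid
    }

    Number& operator=(int value)
    {
        std::cout << "Value Assignment Operator\n";
        m_value = value;
        return *this;
    }

    Number& operator=(const Number& rhs)
    {
        std::cout << "Copy Assignment Operator\n";
        m_value = rhs.m_value;
        return *this;
    }

    Number& operator=(Number&& rhs)
    {
        std::cout << "Move Assignment Operator\n";
        m_value = rhs.m_value;
        rhs.m_value = -1;
        return *this;
    }

    int  get() const    { return m_value; }
    void set(int value) { m_value = value; }

private:
    int m_value;
};

// Lets the example stream a Number directly with std::cout.
std::ostream& operator<<(std::ostream& os, const Number& n)
{
    return os << n.get();
}
```

Paired with the main() shown above, this class should print the same sequence of constructor and operator calls listed in the output.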
std::forward<perfect>("it!");
As the title of this section implies, there is a new function in the Standard Library called std::forward. However, it is important to understand why it exists, because it is designed to be used in a special situation: when you have a type-deduced function argument that you would like to move, also called forward, as the return value or to another subroutine.
In classic C++, the way to move function arguments through function calls is by using call-by-reference. This works, but it is inconvenient because you must decide between tradeoffs. You either give up flexibility in the types of arguments that can be used to call your function, or you must provide overloads to expand the range of argument types that your function accepts.
Can be called by lvalues; however, rvalues are excluded:
C++
template< typename T >
T& process(T& param);

int val = 5;
process(val);        // OK: lvalue
process(10);         // Error: Initial value to ref of
                     //        non-const must be lvalue
process(val + val);  // Error: Initial value to ref of
                     //        non-const must be lvalue
Now supports rvalues; however, move semantics are no longer possible:
C++
template< typename T >
T& process(T const& param);

int val = 5;
process(val);        // OK: lvalue
process(10);         // OK: Temporary object is constructed
process(val + val);  // OK: Temporary object is constructed

// However, none of these instances
// can participate in move operations.
Furthermore, if the two solutions above are combined with overloads, and the function in question takes multiple arguments, the number of overloads required to capture every possible combination of argument states grows exponentially, which is not scalable. With the addition of the rvalue reference in Modern C++, it is possible to compact this solution back into a single function, and rvalues will remain viable candidates for move operations.
Where does std::forward apply in this situation?
Since lvalue expressions are already capable of passing through function calls in this situation, we actually want to avoid applying move semantics to these arguments, because doing so could cause unexpected side-effects, such as moving from the caller's local variables.
But, in order to make rvalues capable of using the move operations, we need to indicate this to the compiler with something like std::move. std::forward provides a conditional cast to an rvalue reference that is applied only to rvalue arguments. Once again, the rules of reference collapsing are used to build this construct, except in a slightly different way.
A typical implementation of std::forward (the overload that accepts lvalues):
C++
template< class T >  
T&& forward( typename std::remove_reference<T>::type& t )  
{  
return static_cast<T&&>(t);  
} 
What is a forwarding/universal reference?
This is a special case of an rvalue reference: a function template parameter declared as T&&, where T is deduced. Forwarding reference is the name that I have read in some standards proposal documents, and universal reference is the name that Scott Meyers coined and uses in "Effective Modern C++".
It is important to identify this type when the situation occurs, because the type-deduction rules for this particular reference allow the type to become an lvalue reference if an lvalue expression was used to initialize the template parameter. Remember the important type-deduction rules I pointed out above? References are not usually considered as part of type-deduction. This is the one exception.
If an rvalue expression is used to initialize the parameter, then T is deduced as a non-reference type and the parameter becomes an rvalue reference, which makes the argument qualify for move operations. Therefore, std::forward should be used to tell the compiler that you want a move operation performed on this argument only if an rvalue was used to initialize the parameter.
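The deduction rule can be observed directly. The probe function below is a hypothetical helper written for this illustration. It reports whether T was deduced as an lvalue reference for several kinds of arguments.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical probe: returns true when T was deduced as an lvalue reference.
template <typename T>
bool deduced_as_lvalue_reference(T&&)
{
    return std::is_lvalue_reference<T>::value;
}

void probe_examples()
{
    std::string s("named");
    std::string&& r = std::string("temporary");

    assert(deduced_as_lvalue_reference(s));                     // lvalue: T = std::string&
    assert(!deduced_as_lvalue_reference(std::string("temp")));  // prvalue: T = std::string
    assert(!deduced_as_lvalue_reference(std::move(s)));         // xvalue: T = std::string
    assert(deduced_as_lvalue_reference(r));                     // a named rvalue reference is an lvalue
}
```

The last case shows why the name matters: r has an rvalue reference type, but as an expression it is an lvalue, so T deduces to an lvalue reference.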
Observe forward vs. move
This next program allows you to observe the side effects that could occur by using std::move when std::forward is most likely what was intended. This program is adapted from Item 25 in "Effective Modern C++". I have expanded the program to provide two methods to set the name of the test class, Item.
Item class
The class has a single value called name, which is set to "no name" in the default constructor. The name can be set in two different ways:
Item::fwd_name: Sets the name value from a forwarding reference to a string, which is forwarded to the internal storage of name in the class.
C++
template< typename T >
void Item::fwd_name(T&& n)
{
m_name = std::forward<T>(n);
}
Item::move_name: Sets the name value from a forwarding reference to a string, which is unconditionally moved to the internal storage of name in the class.
C++
template< typename T >
void Item::move_name(T&& n)
{
m_name = std::move(n);
}
Notice that both of these functions are function templates. This is the differentiating factor that makes std::forward necessary: we are moving a type-deduced argument.
Forward Example
C++
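#include <iostream>
#include <string>
#include <utility>
using std::cout;
using std::string;

// Minimal Item class assembled from the description and the two member templates above.
class Item
{
public:
  Item() : m_name("no name") { }
  template< typename T > void fwd_name(T&& n)  { m_name = std::forward<T>(n); }
  template< typename T > void move_name(T&& n) { m_name = std::move(n); }
  const string& name() const { return m_name; }
private:
  string m_name;
};

int main()
{
  cout << "Forward 'only' moves rvalues:\n\n";
  string fwd("Forward Text");
  Item fwd_item;
  fwd_item.fwd_name(fwd);    // lvalue argument: copied, fwd is untouched
  cout << "fwd_name: " << fwd_item.name() << "\n";
  cout << "fwd(local): " << fwd << "\n";

  cout << "\nMove 'always' moves:\n\n";
  string mv("Move Text");
  Item move_item;
  move_item.move_name(mv);   // lvalue argument: moved anyway
  cout << "move_name: " << move_item.name() << "\n";
  cout << "mv(local): " << mv << "\n";  // valid but unspecified contents
  return 0;
}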
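Output:

Forward 'only' moves rvalues:

fwd_name: Forward Text
fwd(local): Forward Text

Move 'always' moves:

move_name: Move Text
mv(local): no name

A moved-from std::string is left in a valid but unspecified state. The "no name" shown above suggests that the standard library used for this run implemented move assignment by swapping the two strings. Other implementations commonly leave mv empty.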
Move semantics and exceptions
A new keyword related to exception handling has been added to Modern C++: noexcept. The compiler can use this information provided by the programmer to enable certain optimizations for non-throwing functions. One potential optimization is to not generate stack unwinding logic for functions that specify noexcept.
It is possible to test whether an expression is specified as non-throwing by using the noexcept operator. This may be necessary if you can conditionally provide an optimization, but only if the functions you want to call are specified as non-throwing. Two good examples of functions that are conditionally optimized based on a non-throwing specification are:
 the reallocation logic in std::vector, which moves its elements instead of copying them when their move constructor is non-throwing
 std::swap, whose own noexcept specification depends on the type being swapped
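Here is a small sketch of the noexcept operator and a conditional noexcept specification. The functions may_throw, wont_throw and assign_from are hypothetical names used only for illustration.

```cpp
#include <utility>

// Hypothetical functions, declared only: the noexcept operator does not
// evaluate its operand, so no definitions are needed.
void may_throw();
void wont_throw() noexcept;

static_assert(noexcept(wont_throw()), "specified as non-throwing");
static_assert(!noexcept(may_throw()), "may throw");

// Hypothetical helper with a conditional specification: it is noexcept
// exactly when move-assigning a T is noexcept.
template <typename T>
void assign_from(T& dst, T& src) noexcept(noexcept(dst = std::move(src)))
{
    dst = std::move(src);
}
```

A caller, or a container like std::vector, can query noexcept(assign_from(a, b)) and choose a faster path when the answer is true.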
Finally, there is a function called std::move_if_noexcept. It conditionally obtains an rvalue reference if the move constructor of the type to be moved does not throw, or if the type cannot be copied at all. Otherwise it returns a const lvalue reference, so a copy is made instead. Refer to Item 14 in "Effective Modern C++" for a thorough description of noexcept.
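The difference can be checked at compile time. NothrowMove and ThrowingMove are hypothetical types written for this illustration. Both are copyable, and they differ only in whether their move constructor is declared noexcept.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical types for illustration only.
struct NothrowMove
{
    std::string data;
    NothrowMove() = default;
    NothrowMove(const NothrowMove&) = default;
    NothrowMove(NothrowMove&& other) noexcept : data(std::move(other.data)) {}
};

struct ThrowingMove
{
    std::string data;
    ThrowingMove() = default;
    ThrowingMove(const ThrowingMove&) = default;
    ThrowingMove(ThrowingMove&& other) : data(std::move(other.data)) {}  // not noexcept
};

// Non-throwing move: std::move_if_noexcept yields an rvalue reference.
static_assert(std::is_same<decltype(std::move_if_noexcept(std::declval<NothrowMove&>())),
                           NothrowMove&&>::value, "moved");

// Possibly throwing move on a copyable type: it yields a const lvalue
// reference, so a copy is made instead.
static_assert(std::is_same<decltype(std::move_if_noexcept(std::declval<ThrowingMove&>())),
                           const ThrowingMove&>::value, "copied");
```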
What you need to know
This is the section that I promised at the beginning of the essay: the most important concepts that you need to take away and know. Not remember, know, if you are going to employ move semantics and perfect forwarding when you practice Modern C++.
 Remember the difference between a value category and a type:
Lvalue and rvalue are categories of expressions. Lvalue reference and rvalue reference are both types; because the names are so similar, it is easy to confuse the two.
 Rvalues are the only expressions that are valid for move operations:
std::move and std::forward explicitly attempt to convert their arguments to rvalues. They do this by casting to an rvalue reference type. The result of that cast is an rvalue expression, which is eligible for move operations.
 If it has a name, then it is an lvalue (expression):
This is true even if the type of the lvalue is an rvalue reference. Refer to the previous guideline for the ramifications.
 Use std::move to explicitly request a move operation:
std::move unconditionally makes the argument eligible for a move operation. I say request, because a move is not always possible, and the compiler may still elect to use a copy operation. After you use a value as an input to std::move, the only things you should do with it are call its destructor or assign a new value to it.
 Use std::forward to move type-deduced arguments in templates:
std::forward conditionally casts its argument to an rvalue reference, and it is only required in this context. This matters because an lvalue that is inadvertently moved could have unintended and unsafe side effects, such as moving local values.
 There are additional rules for the special compiler-generated functions for a class:
The compiler can now generate the move constructor and move assignment operator automatically if your class does not declare them. However, even stricter rules dictate when these functions can be generated. Primarily, if you declare any of the following functions yourself (even as = default or = delete), the compiler will not generate the move operations that you have not declared:
 Destructor
 Copy Constructor
 Copy Assignment Operator
 Move Constructor
 Move Assignment Operator
 Strive to create exception-free move operations and specify noexcept for them:
Your move operations are more likely to be selected by the compiler if you can provide an exception-free move operation. noexcept tells the compiler your function will not throw exceptions. Therefore, it can optimize the generated code further, without maintaining code to unwind the stack.
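The rule about declared special members can be observed with std::is_move_constructible. The three classes below are hypothetical. Each holds a move-only std::unique_ptr, so a class whose move operations are suppressed cannot fall back to copying either.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical classes for illustration only.
struct Implicit
{
    std::unique_ptr<int> p;   // move-only member
    // No special member functions declared: the compiler generates the moves.
};

struct Suppressed
{
    std::unique_ptr<int> p;
    ~Suppressed() {}          // a user-declared destructor suppresses the implicit
                              // moves; the implicit copy constructor is deleted
                              // because unique_ptr cannot be copied
};

struct Restored
{
    std::unique_ptr<int> p;
    ~Restored() {}
    Restored() = default;
    Restored(Restored&&) = default;             // explicitly request the moves back
    Restored& operator=(Restored&&) = default;
};

static_assert(std::is_move_constructible<Implicit>::value, "moves generated");
static_assert(!std::is_move_constructible<Suppressed>::value, "moves suppressed");
static_assert(std::is_move_constructible<Restored>::value, "moves restored");
```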
Summary
Rvalue references were added to Modern C++ to help solve the problem of eliminating unnecessary temporary copies of expensive objects when possible. The addition of rvalue references has made two new idioms possible in the language: move semantics and perfect forwarding.
Many restrictions had to be put in place to define when the compiler can safely perform these operations automatically. Moreover, it is important to use the functions added to the Standard Library, because they express intent and provide the extra hints the compiler needs to use these operations safely.
Here is one last piece of advice as you look for locations to use move semantics. Do not try to outwit the compiler because it is already capable of some amazing optimizations such as the Return Value Optimization (RVO) for local objects returned by value. Practice the knowledge in the previous section and your C++ programs will keep that svelte contour envied by all other languages.
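As one illustration of not outwitting the compiler, consider returning a local object by value. make_names is a hypothetical function. Writing return std::move(names) looks like an optimization, but it prevents the named return value optimization. Even without elision, the compiler already treats the returned local as an rvalue.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical function: returns a local object by value.
std::vector<std::string> make_names()
{
    std::vector<std::string> names;
    names.push_back("alpha");
    names.push_back("beta");
    return names;                 // preferred: the copy may be elided entirely,
                                  // and if not, names is moved implicitly
    // return std::move(names);   // "outwitting" the compiler: this prevents NRVO
}
```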
References
Effective Modern C++ by Scott Meyers (O'Reilly). Copyright 2015 Scott Meyers, 9781491903991
C++ Rvalue References Explained by Thomas Becker, http://thbecker.net/articles/rvalue_references/section_01.html, March 2013.
A Brief Introduction to Rvalue References by Howard E. Hinnant, Bjarne Stroustrup, and Bronek Kozicki, http://www.artima.com/cppsource/rvalue.html, March 10, 2008.
Value Categories at CppReference.com, http://en.cppreference.com/w/cpp/language/value_category, June 2015
Over the years I have heard this question or criticism many times:
Why is so much math required for a computer science degree?
I never questioned the amount of math that was required to earn my degree. I enjoy learning, especially math and science. However, a few of the classes felt like punishment. I remember that the latter part of the semester in Probability was especially difficult at the time. That was possibly because these problems required a new way of thinking, which can be counterintuitive.
What is computer science?
Let's start with computer science, and work our way towards math and how the two are interconnected. A definition of what is considered computer science may be helpful before we discuss its foundation in math. Here is an abridged definition from Wikipedia:
Computer science is the scientific and practical approach to computation and its applications. It is the systematic study of the feasibility, structure, expression, and mechanization of the methodical procedures...
A computer scientist specializes in the theory of computation and the design of computational systems.
I have had many discussions over the years with students and professionals educated from all different realms. One common sentiment that I hear from all of these groups is this:
Theory is nice, but I need to have relevant skills to get a job
This is a very valid point. It is also true that the majority of college graduates do not continue their careers in the world of academics. Therefore, we can assume that most people who attend college do so with the intent to join the general workforce, i.e., get a job. Let's evaluate this.
The role of academia
Much of what I have learned about the academic realm is that it has a fundamental focus on expanding the limits of our collective knowledge. Professor Matt Might at the University of Utah gives a great explanation of the purpose of a Ph.D. in The illustrated guide to a Ph.D. As these new discoveries are better understood, they are developed further. Some discoveries may continue to exist solely in the academic realm, while other theories are applied and practical uses are developed from them.
Application of theory
The concepts taught in math, physics and engineering are based on theory. A theory itself is a model or a collection of ideas that is intended to explain something. The theories taught in these disciplines have been developed and refined over time. It is relatively simple to observe the application of the theories that are required for the engineering disciplines based upon physics.
Some of our engineering disciplines start to push the boundaries of what is possible for us to observe, such as electrical engineering and nuclear sciences. However, we are still able to verify these hypotheses through repeatable experiments and observations, even if we can only observe the indirect results of these experiments.
Math and computer science are purely theoretical. They are abstract mental models that describe the relationships and behaviors of the interactions between these models. The application of these theories is a result of mapping these conceptual models into a context that can be represented and observed in physical form.
Yeah, yeah that's great (in theory), but...
I need skills to get a job
Trade schools are designed to help students learn the skills of a trade needed to get a job in that profession. The curriculum of trade schools is focused on teaching students the programming languages that are in highest demand, used to solve the most common problems that occur today.
This typically means that you would spend a semester learning and solving problems with Java, then move on and use a scripting language such as Python for a semester. Interfacing with relational databases is also likely to be in the curriculum, as it is a very marketable skill.
As for college degrees in computer science; yes, you will learn a variety of computer languages that may help you get a job. However, these languages are used as the basis to teach the principles and give you the opportunity to apply these theories. The end goal is to teach critical thinking, to help students of computer science learn to reason, experiment, observe and understand the theories.
I believe that what you get out of your education depends on the effort that you put into it. This is true for either path. The difference between the two is typically where the instructors focus your attention. Neither path guarantees a job at the end. However, if you do not have what it takes in a university computer science program, they will flunk you out of the program. The for-profit trade schools are a bit more forgiving.
You can program computers with minimal math skills
This is a true statement.
In many cases, all that is needed to be able to program a computer is a computer, and the programming tools to interface with the computer. With basic arithmetic skills and knowledge of a computer language, you will be able to command that computer.
The same is true for many other skills: amateur auto mechanics, home plumbing, renovations, do-it-yourself haircuts. Some of the world's most successful people dropped out of school and started a tech company. For example, John Carmack dropped out of college, co-founded id Software and went on to pioneer some of the game industry's groundbreaking graphics technologies. When he needed to know the math, he taught it to himself.
I don't have any numbers, but I would imagine that most computer science graduates go on to become software developers of some sort. Many of the software engineers that I work with graduated college with an Electrical Engineering or Mechanical Engineering degree. Two guys I know graduated with law degrees, not degrees in engineering.
Where does math factor in?
When math is mentioned, most people generally think of topics that require numbers and are focused on calculating values of some sort. The one notable exception is Geometry. Here is a small sampling of those types of topics:
 Arithmetic
 Geometry
 Algebra
 Calculus
 Linear Algebra
 Fourier Transforms
 Ordinary Differential Equations
 Partial Differential Equations
 etc.
However, it is important to recognize that mathematics deals with more than just numbers. Mathematics spans a wide range of topics:
 Logic
 Algorithms
 Fundamental theorems
 Mathematic proofs and models
 Discrete math
 Theory of computation
 Information theory
 Combinatorics
 Set theory
 Graph theory
 Abstract algebra
 etc.
Math is a foundation, which is built upon
“Mathematics is the gate and key to the sciences.”
– Roger Bacon
For those of you that have taken Analytic Geometry, Calculus or even Trigonometry, consider how the material is taught. They start with a mathematical foundation and proofs are demonstrated that expand on what you should know going into the class. Each step is then used to derive the next step in the curriculum.
At the end of the course, many of the interim steps are forgotten or rarely used. However, they were a very necessary stepping stone to make a leap to the remainder of the material.
An example that comes to mind appears in Calculus: the concept of the limit of a function. The concept of finding limits is then used to define derivatives and integrals.
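For example, the derivative of a function f is defined directly in terms of a limit:

$$ f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} $$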
You could simply memorize the formulas to plug in your own numbers, and you would be able to use the equations. However, when you run into a problem that doesn't fit neatly within the formulas that you have learned, you would be stuck. The critical thinking skills that develop as these math courses progress give you the tools and ability to transform problems into a simpler form that does fit the mold of solvable equations.
This is the same structure that is used in computer science related courses. Data structures and algorithms that have been studied for decades have been optimized, and we are taught how we reached this point. Again, it is possible to use an algorithm called sort to organize your data without needing to understand how it works.
However, if you find that your sort routine runs much slower as the data sets that you work with become much larger, you will need to know how to analyze your algorithm to be able to build something better. With a solid foundation you will understand how to compare the trade-offs and build something based on previous proven work.
Without this foundation, you may still be able to improve upon your original algorithm. However, you will only be guided by previous experience. Making that inspirational leap to a radically different, yet far more efficient design is not always easy.
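As a small illustration of that kind of analysis, the hypothetical helper below counts the comparisons made by a simple insertion sort. On reverse-ordered input it performs n(n-1)/2 comparisons, so doubling the input size roughly quadruples the work. That quadratic growth is exactly what an O(n log n) algorithm, such as the one required of std::sort, avoids.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sorts a copy of v with insertion sort and returns
// the number of element comparisons it performed.
std::size_t insertion_sort_comparisons(std::vector<int> v)
{
    std::size_t comparisons = 0;
    for (std::size_t i = 1; i < v.size(); ++i)
    {
        const int key = v[i];
        std::size_t j = i;
        while (j > 0)
        {
            ++comparisons;
            if (v[j - 1] <= key)
                break;          // key is already in position
            v[j] = v[j - 1];    // shift the larger element right
            --j;
        }
        v[j] = key;
    }
    return comparisons;
}

// Builds n, n-1, ..., 1: the worst case for insertion sort.
std::vector<int> descending(int n)
{
    std::vector<int> v;
    for (int i = n; i > 0; --i)
        v.push_back(i);
    return v;
}
```

For 1,000 descending values this counts 499,500 comparisons, and for 2,000 values it counts 1,999,000.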
Critical Thinking
Critical thinking is essential in professional fields as well as academia. Well, essential to maximize success. Critical thinking is more than analyzing problems and reasoning through well-thought-out decisions. It involves evaluating, conceptualizing, applying, observing, and communicating your conclusions.
Along the way, you also need to become self-critical, looking for any biases that could add prejudice to your conclusions. Reflection and scientific skepticism help guide you towards accurate observations and well-justified conclusions.
It may seem as though we have meandered away from computer science into philosophy or scientific theory, and we may now be discussing concepts that are central to those topics. However, we have not wandered away from computer science. These skills and concepts are just as important to computer science as they are to every other engineering discipline.
Programming != Computer Science
It is confirmed then, you do not need advanced math to be able to program a computer, nor do you need a degree in computer science.
Hopefully, it is also obvious that there is a limit to what you will be able to program without knowledge of certain topics, especially math.
IT departments need a lot of programmers to create applications that automate or simplify business logic. These are line-of-business applications that give a company's users the information they need to do their jobs more efficiently. Much of this work centers around large databases and reducing the data set to the pertinent information required by the user.
Software engineering for machines and cutting-edge technologies often does require knowledge of advanced math. All of the wireless technologies that have been developed and become mainstream over the last 20 years are heavily dependent upon signal processing algorithms. Signal processing in itself could cover waveform synthesis and analysis, compression, encryption, forward error correction, and modulation/demodulation.
Now consider artificial intelligence, computer vision, image processing, natural-language processing, simulations, robotics, general-purpose GPU programming, and distributed computing; the list continues to grow each year. With the exception of robotics, I believe that today's smartphones depend in some form upon all of the technologies that I listed, all of which require knowledge of advanced math to do well.
Most websites do not receive so much traffic that the problem cannot be solved by throwing more hardware at it. However, even with the commodity prices of hardware these days, more hardware is not always the best or even an easy solution. Experienced developers with strong critical-thinking and reasoning skills are a much better investment.
For some companies like Facebook, more hardware simply is not an acceptable solution. With hundreds of millions of interactions constantly occurring throughout the world, highly specialized algorithms are required to operate their site as efficiently as possible.
Summary
Math is good. For some people, math is hard. Math is also more than just numbers. Mathematics is a logical foundation that most of our sciences are built upon. Analysis, critical thinking, and reasoning are just a few abstract skills that are developed by studying and applying mathematical concepts. These are also very important skills to develop in order to become a good software developer. This is true whether you work in a corporate IT department or are developing the next-generation wireless communications network.
I was required to take one year of numerically based math past Calculus, the type most people think of when considering math. However, the majority of my math classes were about more abstract topics like Discrete Mathematics and the Theory of Computation. These are the building blocks that set the foundation for the more advanced topics taught in the senior year of my degree. These are also the building blocks that are required for those students who choose to move on to graduate school.
Math is not necessary to be able to program a computer. However, math is very important to engineering and science. There are just some places that you cannot reach with a computer unless you have a solid foundation in logic, critical thinking, and abstract mathematics.
Accidental complexity is the entropy in your system that it is possible to eliminate. The opposite of this is essential complexity: the parts of a system that are required and cannot be simplified. Fred Brooks discussed these two concepts in his essay No Silver Bullet - Essence and Accidents of Software Engineering. Many systems today are extremely complex, and any complexity that can be eliminated should be.
There are a lot of should be's...
I get very irritated with excuses, because excuses just keep allowing the code to decay with each new statement that is added. I started writing a list of these excuses for entertainment, but I started to get irritated, so I stopped. Here is one of them: "In a perfect world..." There's really only one way that sentence ever ends.
Add value, not garbage
Just because you have an excuse, even if it's a valid excuse, doesn't mean that you are off the hook for cleaning up messes. I remember a situation when I ran across a piece of code, and I thought that it looked peculiar.
C++
(void)Sleep(k_one_second); 
Casting a function call to void. I've never seen that before. There were no comments, and I couldn't think of anything of value that the cast added to the code. The Sleep function did return an integer, but the value was not being assigned. I scratched my head, and I deleted the cast.
Soon after, my changes were reviewed. A question was entered asking why I deleted the void cast. The comment then went on to explain that the cast was added to fix a defect reported by our static analysis tool.
I thought, "This is a fix?"
One of the rules in the coding standards we used, JSF++, is that all return values must be checked. This hardly qualifies as checking the return value, but it appeases the analysis tool.
I replied with "The reason why the tool is reporting an error is because all return values are supposed to be tested. This looks odd, there are no comments, and it's not even a fix." I immediately had two better ideas that would have required the same or less work, and would have been valuable changes:
 Change the function's signature to return void
 Encapsulate the original function in an inline function that returns void
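Here is a sketch of the second idea. legacy_sleep is a hypothetical stand-in for the original function, and the std::cerr error policy is an assumption. A JSF++ codebase would use its own reporting mechanism. The point is that the return value is checked once, inside the wrapper, instead of being silenced at every call site.

```cpp
#include <iostream>
#include <type_traits>

// Hypothetical stand-in for the original function, which returns a status
// code (0 means success in this sketch). The parameter is unnamed because
// this stub does not actually sleep.
inline int legacy_sleep(unsigned /* milliseconds */)
{
    return 0;
}

// The inline wrapper returns void, so call sites have no return value to
// discard. The status is checked once, here. Writing to std::cerr is an
// assumed error policy for this sketch.
inline void sleep_ms(unsigned milliseconds)
{
    if (legacy_sleep(milliseconds) != 0)
    {
        std::cerr << "sleep_ms: legacy_sleep(" << milliseconds << ") failed\n";
    }
}

// Compile-time reminder that callers have nothing to ignore.
static_assert(std::is_void<decltype(sleep_ms(0u))>::value,
              "callers have no return value to ignore");
```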
Changes like this only add garbage to the code which reduces its readability. Even worse, it covers up a valid problem reported by the code analysis tool. Using subversive code tricks to prevent the tool from reporting a valid issue negates the value of using the tool. You should strive to make every statement add value to the code.
Several hundred instances of void casts were added in that situation. This only added more clutter that covered up a problem, rather than focusing on finding a valid solution to fix the problem.
Simplify
Imagine that there is a specific feature, and it will require a finite amount of logic to implement and use.
Now imagine that functionality needs to be used in multiple locations. Generally we can simplify by factoring the common code into a function, or if it is more complicated logic it may even become a class.
One particular problem that I encounter frequently is an abstraction that handles only the minimum amount required for the implementation. This tends to leave much more logic for the caller than is necessary. More logic outside of this reusable abstraction means more duplicated logic: logic that can be written incorrectly, or even logic that does not properly initialize the feature.
Can you make it simpler?
After you have simplified your code inside of your feature's abstraction, put it to the test and use it. Better yet, put it in a test, a unit test. See if there is anything that you could take care of for the user with the known input.
It is not always apparent, but this is accidental complexity. This is an example of a situation that could eliminate code that is duplicated.
This is unfortunate, because duplicated code is often created with cut-and-paste, which is notoriously error-prone. This also adds more code to be maintained, which is also more code to read in order to understand what purpose a section of logic serves.
Consider the tradeoffs
It is not always the best choice to continue to simplify. Simplifying the interface to a function usually means giving up flexibility. In most cases there is no reason the choice must be either/or. Sometimes it is feasible and advantageous to do both.
Create the basic abstraction that leaves all of the flexibility and error-prone or cumbersome logic to the user. Then create simplified versions of the feature that handle commonly used cases.
Example
The cumbersome abstraction that I use as an example is the ::GradientFill function from the Win32 API. I explored what this function is capable of a few years ago, and I learned that it is quite powerful. The interface provides a lot of flexibility, and it does not look too bad from a cursory inspection.
C++
BOOL GdiGradientFill(  
__in HDC hdc, // Handle to the DC  
__in PTRIVERTEX pVertex, // Array of points of the polygon  
__in ULONG dwNumVertex, // Size of the vertices array  
__in PVOID pMesh, // Array of mesh triangles to fill  
__in ULONG dwNumMesh, // Size of the mesh array  
__in ULONG dwMode // Gradient fill mode  
); 
However, this function requires a lot of repetitive setup code. This is also the reason that I hardly ever used ::GradientFill up to that point. Here is the code from the MSDN documentation page for this function that is required to paint a horizontal and a vertical gradient. I believe it would be simpler to write a for-loop than the setup that is required for this function:
C++
TRIVERTEX vertex[2] ;  
vertex[0].x = 0;  
vertex[0].y = 0;  
vertex[0].Red = 0x0000;  
vertex[0].Green = 0x8000;  
vertex[0].Blue = 0x8000;  
vertex[0].Alpha = 0x0000;  
 
vertex[1].x = 300;  
vertex[1].y = 80;  
vertex[1].Red = 0x0000;  
vertex[1].Green = 0xd000;  
vertex[1].Blue = 0xd000;  
vertex[1].Alpha = 0x0000;  
 
// Create a GRADIENT_RECT structure that  
// references the TRIVERTEX vertices.  
GRADIENT_RECT gRect;  
gRect.UpperLeft = 0;  
gRect.LowerRight = 1;  
 
::GdiGradientFill(hdc, vertex, 2, &gRect, 1, GRADIENT_FILL_RECT_H);  
::GdiGradientFill(hdc, vertex, 2, &gRect, 1, GRADIENT_FILL_RECT_V); 
The code is even worse for a triangle.
I wanted to make these functions simpler to use in my code. To me, it should be as simple as a single function call to fill a rectangle. So I encapsulated the required code in the function below:
C++
bool RectGradient(  
HDC hDC, // The device context to write to.  
const RECT &rc, // The rectangle coordinates to fill with the gradient.  
COLORREF c1, // The color to use at the start of the gradient.  
COLORREF c2, // The color to use at the end of the gradient.  
BOOL isVertical); // TRUE creates a vertical gradient, FALSE a horizontal one.
As I mentioned earlier, you often give up flexibility for convenience. One of the features that is given up from the previous function is the ability to control alpha-blend levels. Therefore, I created a second version of this rectangle gradient that allows alpha-blending levels to be specified.
C++
bool RectGradient(  
HDC hDC,  
const RECT &rc,  
COLORREF c1,  
COLORREF c2,  
BOOL isVertical,  
BYTE alpha1, // Starting alpha level to associate with the gradient.  
BYTE alpha2 // Ending alpha level to associate with the gradient.  
);
These two functions could be used much more simply. Here is an example of how much simpler the code becomes:
C++
// Horizontal Gradient  
RECT rc = {0,0,300,80};  
COLORREF black = RGB(0,0,0);  
COLORREF blue = RGB(0,0,0xff);  
RectGradient(hdc, rc, black, blue, false);  
 
// Let's draw a vertical gradient right beside the horizontal gradient:  
::OffsetRect(&rc, 310, 0);  
RectGradient(hdc, rc, black, blue, true); 
The value of many small abstractions adds up
Sometimes the flexibility of the original code can still be accessible even when the code is simplified. This can be done with a collection of smaller abstractions. Utility functions like std::make_pair from the C++ Standard Library are one example. These functions can be used in series to simplify a series of declarations and initialization statements.
In some cases this collection of utility abstractions can be combined into a compound abstraction.
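Here is a small sketch of std::make_pair removing repeated type names. The make_port_table function and its contents are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical example: build a small lookup table.
std::map<std::string, int> make_port_table()
{
    std::map<std::string, int> ports;

    // Spelling out the pair type by hand:
    ports.insert(std::pair<std::string, int>("http", 80));

    // std::make_pair deduces the pair type from its arguments. The resulting
    // std::pair<const char*, int> converts to the map's value_type on insert.
    ports.insert(std::make_pair("https", 443));

    return ports;
}
```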
There are many ways that code can be simplified, and it doesn't always need to be code that would otherwise be duplicated. If I run across a super-function, I will try to break it down into a few subroutines. Even though each new function will only be called in a single location, I have abstracted the complexity of that logic away from the call site.
It is even more feasible to give a function that is called only once a more cumbersome but descriptive name. When reading the code of the original super-function, it is now much easier to skip over large blocks of code that obscure the intended purpose of the function.
While the code may be necessary, that does not mean that it must make the code around it more difficult to read and maintain.
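Here is a minimal sketch of that approach using hypothetical names. Each helper is called from exactly one place, but its descriptive name lets the caller read as a summary of the steps.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers, each called from exactly one place.
static std::string strip_comment(const std::string& line)
{
    // substr(0, npos) keeps the whole line when there is no '#'.
    return line.substr(0, line.find('#'));
}

static std::string trim_whitespace(const std::string& text)
{
    const char* const whitespace = " \t\r\n";
    const std::string::size_type first = text.find_first_not_of(whitespace);
    if (first == std::string::npos)
        return std::string();
    const std::string::size_type last = text.find_last_not_of(whitespace);
    return text.substr(first, last - first + 1);
}

// The caller reads as a summary of the steps it performs.
std::string clean_config_line(const std::string& line)
{
    return trim_whitespace(strip_comment(line));
}
```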
Summary
Accidental complexity is the code that exists in our programs that we can simplify or eliminate. Duplicated code and code that exists only to get rid of warnings are two examples of accidental complexity. In the best case, the code becomes just a little more difficult to maintain. However, in the worst cases, legitimate issues could be covered up. Worse still, the manner in which they were covered up makes them that much more difficult to find if they are the cause of a real problem.
I witness developers exerting a great deal of effort to work around problems, fix the symptoms or even make excuses and ignore the problems. This time could just as easily be focused on finding a fix that actually provides value. Much more quality software would exist if this were always the case.
...but I suppose this is not a perfect world.
A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called Network Alchemy. Alchemy performs low-level data serialization with compile-time reflection. It is written in C++ using template metaprogramming.
My second attempt to create a bit-field type was more successful. The size of the container only grew linearly with each sub-field that was added, and the implementation was cleaner. However, I showed an image of what this implementation looked like in the debugger, and it was very inconvenient. What concerned me most was the pitiful performance revealed by my benchmark tests.
This entry describes my discoveries and the steps that I took to reinvent the bit-field type in Alchemy for the third time. This is also the current implementation in use by Alchemy, which is about 10% faster than a hand-coded collection of packed bits.
Overview
Let's start with a brief review. The PackedBits type in Alchemy essentially replaces C-style bit-fields. Bit-fields are very common in network protocols, which attempt to make every bit that is sent across the wire count.
C and C++ both have a syntax to access values within a byte at the bit level: the bit-field. Unfortunately, the standards for both languages leave many of the layout rules implementation-defined, such as the order in which fields are allocated within their storage unit. Therefore, your code may not be portable to a different compiler, even for the same architecture.
Alchemy's PackedBits provides an abstraction that allows portable, bit-level access for message protocols and storage structures.
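The sketch below contrasts the two approaches. BitFieldSpec is a hypothetical mask-and-shift template, not Alchemy's PackedBits API. Each field is described by an explicit bit offset and width, so the layout no longer depends on how a compiler allocates bit-fields.

```cpp
#include <cstdint>

// A C-style bit-field: the order in which these fields are allocated
// within the byte is implementation-defined.
struct FlagsBitField
{
    std::uint8_t version : 4;
    std::uint8_t type    : 3;
    std::uint8_t urgent  : 1;
};

// Hypothetical portable alternative: explicit offset and width per field.
// Width must be less than 32 for the mask computation to be valid.
template <unsigned Offset, unsigned Width>
struct BitFieldSpec
{
    static constexpr std::uint32_t mask = ((1u << Width) - 1u) << Offset;

    static constexpr std::uint32_t get(std::uint32_t word)
    {
        return (word & mask) >> Offset;
    }

    static constexpr std::uint32_t set(std::uint32_t word, std::uint32_t value)
    {
        return (word & ~mask) | ((value << Offset) & mask);
    }
};

using Version = BitFieldSpec<4, 4>;  // bits 4-7
using Type    = BitFieldSpec<1, 3>;  // bits 1-3
using Urgent  = BitFieldSpec<0, 1>;  // bit 0
```

For example, the byte 0x45 (0100 0101) decodes to Version 4, Type 2 and Urgent 1 on every compiler.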
The poor performance of PackedBits v2
The implementation for version 2 of PackedBits was a simplification of the original design. The first implementation had a fixed size of 32 entries, which always existed. Therefore, they always took up space, and they were initialized and destroyed with each PackedBits instance. This made it a very expensive type in both space and speed.
Version 2 removed the hard-coded limit by using recursion and inheritance to handle each BitFieldNode. Unfortunately, there are two aspects of this design that are bad.
Inconvenient Debugging
Since each bit field inherits from the next field in the chain, you had to navigate down through each level of the hierarchy to inspect the value of a particular node; they were not all displayed at once. This is actually a minor inconvenience, because some tools, like Visual Studio, allow you to create custom visualizers for your types.
Design does not scale
There is a challenge in creating a type like this. The abstraction gives the user the illusion of interacting with a single type that provides the expressive syntax of value types in a struct. Yet these are distinct objects, composed and structured to interact with and change specific bits of a single data value.
I achieved this in the original design by passing a reference to the data field in the PackedBits container to the BitFieldNodes. The mask and shift operations are defined by the template instantiation and applied to the referenced value.
This created a valid solution. The drawback is that for every new bitfield added to the type, a new reference was added as well. Each reference takes up additional space and must be initialized. It turns out the initialization of these references was killing performance.
Whenever a copy of the type was required, the reference for each node had to be initialized. The worst part is that this data type only manages up to the word size of the platform you are running on. A 32-bit integer specified to hold 32 1-bit flags would require 32 references, plus the data value, to be initialized for each copy.
This implementation had to go!
The Solution
I had known of, and considered, an alternative solution many times. However, for my original implementation of Alchemy I steered clear of it. The reason was the strict coding guidelines that we followed: my first implementation of what is now called Hg followed the JSF++ coding guidelines.
A quandary
The quandary was simple. Even though I am now developing this independent open-source library, do I want to use the solution that I had often considered?
The solution was to use the offsetof MACRO defined in the standard library (<cstddef>). This MACRO can become tenuous, especially as your object hierarchy becomes more complicated; the C++ standard only guarantees its behavior for standard-layout types. Luckily, I haven't run into any problems yet. Hopefully, with the controlled implementation, I can keep it that way.
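What does offsetof do, and how does it work? offsetof(type, member) yields, as a size_t, the byte offset of member from the start of type. It must be a macro because it takes the name of a member of a struct or a class, which cannot be passed as an ordinary function argument, and calculates the offset of that data element's address from the base address of the parent type.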
My final decision
Yes, of course I decided to use offsetof. I can now group all of the bitfield children at the same level in the PackedBits parent type. I now pass the parent type to the BitField object as part of its instantiation. This allows the child to compare its own address with its parent's and find its location relative to the base pointer.
C++
template< class OwnerT,
          class TagT,
          size_t OffsetT,
          size_t CountT,
          class T = uint8_t
        >
struct BitField
{
  // Generally the same logic in previous
  // implementations for shift, mask, read and store.
  // ...
};
A member function of PackedBits, called value(), is used to reference the correct data element. The trick is how to get a pointer to the child type's parent object without requiring any initialization.
C++
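// Continued implementation of BitField:

value_type& value()
{
  // Forward to the data element owned by the parent PackedBits.
  return owner()->value();
}

// This function returns a pointer to the parent object:
// step back from this sub-object's address by its offset
// within the parent.
OwnerT* owner()
{
  return reinterpret_cast<OwnerT*>(
    reinterpret_cast<char*>(this) - TagT::offset());
}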
This adjustment, as well as the change in code generation performed by the pre-preprocessor, forms the solution for version 3 of the PackedBits type.
A challenge with cross-compiling
When I went to cross-compile and verify the implementation on Linux with gcc, the compiler complained. It believed that the code was dereferencing a pointer to 0. My guess is that, since it was a template and not an actual value type, the calculations were performed at the base address of 0, which worked out quite nicely for me.
How did I appease gcc? Look away if you are a purist:
C++
#ifdef __GNUC__

// Yes, we're redefining this MACRO...
#ifdef offsetof
#undef offsetof
#endif

// GCC does not approve of the way in which
// Alchemy employs this MACRO.
// This is a slight alteration:
// Perform the calculation on a non-null base address (1),
// then readjust back to zero after the offset calculation.
// (No trailing semicolon, so it can be used inside expressions.)
#define offsetof(type, member) \
  ((size_t)reinterpret_cast<const char*>(&(((type*)1)->member)) - 1)

#endif
Are you ready for my rationalization? It's just like the C++ Standard Library and its treatment of addresses and iterators: it's safe to calculate and store an address one past the end, but it's not safe to dereference that address. We get an offset from this calculation, and we apply it to a known valid base pointer.
Summary
After an abysmal first showing, I was very pleased when I saw the results from this reimplementation of the PackedBits type. Not only is the design simpler and easier to maintain, but it also performs about 10 to 12% faster than the hand-coded implementation in the benchmark tests that I have created.
I did resort to using the offsetof MACRO, which is prohibited by some coding standards. I do not expect any issues to arise because of the simple context in which this calculation is used: there are no virtual functions or complex class hierarchies to consider. However, I am keeping an eye on its behavior.
I was amazed after I had converted only the first few portions of the TypeList from the C++98 implementation to Modern C++. I have decided to convert my Alchemy API to Modern C++ in order to truly learn the nuances by applying what I have learned. The code of these metaprograms is very elegant and completely readable. This really does feel like a new language, and an entirely new version of metaprogramming.
The elegance enabled by the new constructs will allow me to present a complete TypeList implementation for Modern C++ in this entry.
I have corrected errors and uploaded a new implementation file.
Compare and Contrast
My primary goal is to compare the differences between template metaprogramming with C++98 and with Modern C++ (C++11 and beyond). While the concepts remain similar, the newer language has been expanded in a way that makes this a more natural way to compose functional programming logic in C++.
To be clear, I want to explicitly state that metaprogramming has its places, and generally they will not be in core application logic. Libraries and utilities that will be developed and tested, but are less likely to require maintenance work, are good candidates for these types of solutions. Application programmers can take advantage of these complex machinations, yet the caller may never even realize what magic exists behind the curtain. A great example would be the C++ Standard Library itself.
Frame of reference
Andrei Alexandrescu made the concept of metaprogramming accessible to the masses in his book Modern C++ Design, published in 2001. He demonstrates and develops many useful components for metaprogramming that he created as part of his library, Loki.
This is the style of metaprogramming that I first learned, and the type of structure that Alchemy is built around. Compared to what is presented in the book, I have altered the implementation and interfaces of my meta-constructs to suit my needs. The next few sections demonstrate how the same tasks are accomplished in both versions of the language.
Then
The most important of these constructs is the TypeList. The TypeList is a workhorse construct that can be used in a variety of unique ways, yet does not contain any internal data or runtime code. structs become the natural type to act as a functional container, which performs all of its compile-time, or static, operations on types and stores values in static const values or enums.
To simplify expressions, I made liberal use of typedefs. This helped me avoid the repetition of verbose template expressions and, at the same time, give a label to the purpose of that template. Sometimes there is no way to simplify an expression other than turning to the preprocessor. I prefer to avoid the preprocessor at all costs in my application logic. However, I have grown accustomed to leaning on the preprocessor to generate code for repetitive definitions that appear in a class or at global scope.
Here is an example of how Alchemy's TypeList is constructed. Internally, a TypeNode provides the declarations for the head and tail types.
C++
// A trait class to assist with tag-dispatching.
struct container_trait { };
// Represents the end of a type list.
struct empty { };

template< typename H,
          typename T
        >
struct TypeNode
{
  typedef H head;
  typedef T tail;
};
Now the definition of the TypeList, to show the organization of the structure:
C++
template< class T0, class T1, class T2,  class T3,
          class T4, class T5, class T6,  class T7,
          class T8, class T9, class T10, class T11
        >
struct TypeList
  : container_trait
{
  typedef
    TypeNode<T0, TypeNode<T1, TypeNode<T2,
    TypeNode<T3, TypeNode<T4, TypeNode<T5,
    TypeNode<T6, TypeNode<T7, TypeNode<T8,
    TypeNode<T9, TypeNode<T10, TypeNode<T11, empty>
    > > > > > > > > > > > type;
  // Alchemy continues on to 32
};
Composing a structure should be much simpler than this nested definition. Therefore, I decided to wrap the inner declaration with a simpler outer definition. Unfortunately, there are only a few facilities available to customize template declarations. The best option in my particular situation was template specialization.
I wanted to provide a natural interaction with my TypeList object, and still allow support for a variable number of parameters. Thirty-two was my initial number of parameters to support. I can live with writing thirty-two specializations once. However, I had many operations that I would also implement, and each of those would require specialized implementations as well. So I resorted to the preprocessor to generate the code for me.
Here is the definition of the MACRO, and how it was used. It generates the code in the block from above:
C++
#define tmp_ALCHEMY_TYPELIST_DEF(S) \
  template<TMP_ARRAY_##S(typename T)> \
  struct TypeList<TMP_ARRAY_##S(T)> \
    : container_trait \
  { \
    typedef TMP_ARRAY_##S(TypeNode<T), empty TMP_REPEAT_##S(>) type; \
  }

// Define specializations of this array from 1 to 31 elements
tmp_ALCHEMY_TYPELIST_DEF(1);
tmp_ALCHEMY_TYPELIST_DEF(2);
tmp_ALCHEMY_TYPELIST_DEF(3);
// ...
Yes, I definitely left out many of the gory details for the definitions of the MACROs. But why would you want them? We're moving forward into the future, but you can still find them in Alchemy on GitHub.
The direct usage of the TypeList was then much more accessible to the user. Also, there was no need for them to use any MACROs to define a new TypeList:
C++
typedef TypeList  
<  
int,  
long,  
float,  
char  
> types; 
Now
There are two primary additions to Modern C++ that make template programming in general a pleasure to use, not to mention metaprogramming itself:
 Variadic templates: Similar to variadic function parameters, this feature allows a variable number of arguments to be used in a template definition.
 Template aliases: This allows the using keyword to be used in situations similar to typedef. However, unlike typedef, using can be declared as a template. Therefore, an alias can fix some of another template's arguments while leaving the rest as parameters.
Here are the definitions that I required when I ported my code to Modern C++ (don't worry, I will explain the syntax afterwards):
C++
// Forward Declarations  
struct empty { };  
struct container_trait { };  
 
template< typename... T >  
struct type_node;  
 
template< typename... NodesT >  
struct type_list; 
And an implementation for these types:
C++
// An empty terminating node.
template< >
struct type_node<>
{
  using head = empty;
  using tail = empty;
};

// Recursive parameter pack node
template< typename H,
          typename... T
        >
struct type_node<H, T...>
  : type_node<T...>
{
  using head = H;
  using tail = type_node<T...>;
};

template< typename... NodesT >
struct type_list
  : container_trait
{
  using nodes = type_node<NodesT...>;
};
No, really! That's it! We get the exact same usage as the code from above, and I'm not even sure that I need to explain this last chunk of code.
I admit, I had a few false starts trying to get a grasp on the parameter pack. Not in reaching this point, though; neither of the code samples above is good for anything except defining a list of types. My first challenge appeared when I tried to create a metafunction to give me the type of the parameter at a specific index in the list.
Let me introduce the new constructs, then I will demonstrate some of the elegant solutions that barely scratch the surface of their capabilities.
The parameter pack...
A template parameter declared with an ellipsis in a variadic template is called a parameter pack.
The parameter pack is essentially a list of types delimited by commas. To declare one in a template definition, place the ellipsis between the typename or class keyword and the name that you assign to the pack. There can be whitespace before and after the ellipsis if you desire...or not. However, the style that you are most likely to see attaches the ellipsis to the keyword and places a space before the name.
C++
// Most common style
template<typename... T> struct common;

// These are all legal too.
template<typename ...T> struct spaces_before;
template<typename ... T> struct spaces_both;
template<typename...T> struct spaces_none;

// Template parameter packs are legal for
// use with template functions as well.
// The pack must be expanded with `...` wherever it is used,
// so it cannot serve as a single return type.
template<typename... T>
void function(T... params);
You may have noticed in my type_list implementation, and in the declaration of the template function, that I placed the ellipsis after the name of the pack. This is how you invoke the parameter pack in your logic; the standard calls this a pack expansion.
Invoke the parameter pack
What does it mean to invoke the parameter pack?
Nothing, really. You place it where you want it applied, and the compiler goes to work ripping apart the parameter pack and generating your code. However, the compiler does need a little bit of help. You will need two things if you are generating code from the parameter pack:
 Recursive definition: This is a definition that the compiler will implicitly instantiate as many times as necessary until it reaches your terminating case. If you refer to the definition of type_node, you will see that the parameter pack is applied in a context where another type is placed before it, separated with a comma. This essentially peels one or more types away from the parameter pack at a time. In this sense, the template parameter pack is similar to the variadic MACRO usage.
 Terminating condition: A condition that will handle the case of an empty list, or at least terminate the recursion before the compiler attempts to go beyond the end of the parameter pack. It is not necessary for this to be an entirely different definition.
Size of a parameter pack
A convenient sizeof... operator has been provided to match the syntax of the parameter pack. It returns the number of elements in the pack, and it is not related in any way to the classic sizeof operator, which reports a size in bytes.
C++
#include <cstddef>
#include <type_traits>

template< typename... T >
struct length
  : std::integral_constant<std::size_t, sizeof...(T)>
{ };
The parameter pack cannot be a variable
The parameter pack must be expanded completely at compile time; it is not a type in its own right. Therefore, it cannot be the sole definition of a typedef or a using alias.
C++
template< typename... T >
struct param_pack
{
  // Error: ill-formed. A parameter pack cannot be
  // stored in an alias for later use.
  using save_for_later = T...;
};
However, that does not mean that we are helpless. There is an idiom in template programming that lets us capture a parameter pack as the argument list of another template, and then extract it again later. I am not sure if it has a name.
Let me demonstrate it for you. This is the definition you are most likely to find on the Internet for a TypeList:
C++
template< typename... T > struct typelist { }; 
The previous code is completely legal because the parameter pack is never stored on its own; it simply becomes the template argument list of typelist. With another template that is given the right specialization, we can extract the parameter pack from the original type definition.
To demonstrate this, let's create a length metafunction that will report the number of elements in the type_list that I defined above. We need to declare a default version of the length metafunction. This declaration does not need to be implemented.
C++
// Default declaration  
// This does not require an implementation  
template< typename... T > struct length;  
 
// This specialization allows us to identify and access  
// the parameter pack defined within our type_list.  
template< typename... T >  
struct length <type_list<T...>>  
: std::integral_constant<std::size_t, sizeof...(T)>  
{ }; 
We can use the parameter pack from the type_list because we specialized this template solely for this type. When resolving the template, the compiler selects the most specialized version that matches, which is this one.
Template Aliases
Up until this point, we have had the typedef, which has served us well. However, it does have its shortcomings. I believe the most notable is that a typedef cannot itself be a template, so it cannot fix some of a template's arguments and leave the others open. The template alias does provide this support.
C++
// I want to provide a simple type to create  
// a map of strings to another type.  
template< typename T >  
using StringMap = std::map<std::string, T>;  
 
// Can now be used as:  
StringMap<int> named_ints; 
Here's a more complex example:
C++
// I want to map names to lists of things.  
template< typename T >  
using NamedListMap = std::map<std::string, std::list<T>;  
 
NamedListMap<unsigned int> lotto_picks_by_state; 
Improves readability of templates
There is one other feature of template aliases that I did not fully appreciate until I started to use them. Most code examples do not get complex enough to allow you to fully appreciate the second feature. Let me demonstrate, then I will get into the gory details.
This is an example of an additional declaration that was added to C++14, but is possible in C++11. I am not sure if this technique wasn't discovered until after C++11, or they left it out to keep it from becoming C++12.
C++
// The C++ Standard Library contains useful
// metafunctions to manipulate types.
// This one converts type T into T*
template< class T >
struct add_pointer;

// This is used like this:
typedef typename std::add_pointer<void>::type void_ptr;
//      ^^^^^^^^                         ^^^^ eww!

// Or directly...
typename std::add_pointer<void>::type p_void = 0;
// I think I just threw up in my mouth...
// No wonder templates have a bad reputation.
These definitions appear in C++14, but you can use this technique in C++11.
C++
// Template alias for std::add_pointer
template< class T >
using add_pointer_t = typename std::add_pointer<T>::type;

// New usage:
typedef add_pointer_t<void> void_ptr;

// And directly...
add_pointer_t<void> p_void = nullptr;
Detailed explanation
typedefs are a common way to reduce clutter in code, primarily with templates, because template type declarations require you to qualify the type with typename if you are using a dependent type.
What is a dependent type?
That is a very good question. To help with the explanation: dependent type is a shortened version of the name template parameter dependent type. I'm surprised the C++ community hasn't just adopted TPDT, but I digress. A dependent type is a type whose meaning depends on a template parameter; the case that concerns us here is a subtype declared within a template class or struct, such as T::type.
typename is required when referencing sub-items of a dependent template type. I say sub-items because other things can be defined within a struct and accessed in the same manner as a dependent type, like a static variable. typename is a clue to the compiler that you want the name to be interpreted as a type.
The capabilities of the template alias allow us to clearly specify beforehand that we mean a type. Therefore, both the typename qualifier and the nested ::type access required to reach the dependent name are managed by the alias. This greatly simplifies code when there are many template types to deal with; template metaprogramming is a prime example.
One Last Tip
In the fall of 2014, N4115 Parameter Pack Searching[^] was proposed with some additions to the utility library. This would add a common form of the idiom that I described above to gain access to a parameter pack. The name proposed for the type is packer.
I was trying to modify an existing parameter pack, and I just couldn't put the pieces together. That is when I searched and found N4115, and then N4144 Searching and Manipulation of Parameter Packs[^], by Bill Seymour and Stephan T. Lavavej. This is an amended version of the first document, and it adds manipulation utilities. One in particular is add_to.
I already demonstrated the concept behind packer; however, in my code I refer to it as param_pack. Here is how add_to is implemented. Multiple specializations are declared to handle the possibility of adding a parameter pack to a parameter pack.
C++
template<class T, class U> struct add_to;
// Add to the front of the list
template<typename T, typename... ArgsU>
struct add_to<T, param_pack<ArgsU...>>
{
  using type = param_pack<T, ArgsU...>;
};

// Add to the back of the list
template<typename... ArgsT, typename U>
struct add_to<param_pack<ArgsT...>, U>
{
  using type = param_pack<ArgsT..., U>;
};

// Combine two lists
template<class... ArgsT, class... ArgsU>
struct add_to<param_pack<ArgsT...>, param_pack<ArgsU...>>
{
  using type = param_pack<ArgsT..., ArgsU...>;
};

// And the template alias
template<typename T, typename U>
using add_to_t = typename add_to<T,U>::type;
You will see a demonstration of the usage in the next section.
A Modern C++ TypeList
I searched the Internet, albeit briefly, and I did not find any Modern C++ implementations of a TypeList that went beyond this definition:
C++
template< typename... T > struct typelist { }; 
I found the fundamentals to convert the code that I already have into modern form. I want to convert and get it integrated first. If there are better practices, I can adjust the implementation in a working test harness.
I have already shown the definition of the basic type_list structure that I use, as well as demonstrations of length and param_pack, and the implementation of add_to. In the code below, I have omitted the forward declarations and template aliases that I define in the type list header file.
I am going to blast through the different operations that I have built so I do not take up too much more of your time. If something is not clear, please drop a comment and I can further explain or even add more detail to the description.
I have posted a link to the single header file that contains all of these definitions at the bottom.
make_type_list
I wanted to be able to make a type_list from an existing set of internal type_nodes, and then later, from a param_pack.
C++
template< typename... T>
struct make_type_list< type_node<T...>>
  : container_trait
{
  using type = type_list<T...>;
};

template< typename... T>
struct make_type_list< param_pack<T...>>
  : container_trait
{
  using type = type_list<T...>;
};
type_at
Query the type of the element at a specified index in the type_list. This item required a helper template that I called type_of_node.
C++
template< std::size_t IdxT,
          typename NodesT
        >
struct type_of_node
{
  // Step one node down the chain and look for index IdxT - 1.
  using type =
    typename type_of_node<IdxT - 1, typename NodesT::tail>::type;
};

// Terminating specialization
template< typename NodesT >
struct type_of_node<0, NodesT>
{
  using type = typename NodesT::head;
};
Now for the actual implementation of type_at.
C++
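// Helper: the chain of nodes that follows index IdxT.
template< std::size_t IdxT,
          typename NodesT
        >
struct nodes_after
{
  using type =
    typename nodes_after<IdxT - 1, typename NodesT::tail>::type;
};

template< typename NodesT >
struct nodes_after<0, NodesT>
{
  using type = typename NodesT::tail;
};

template< std::size_t IdxT,
          typename T
        >
struct type_at
{
  using nodes = typename T::nodes;
  using type  = typename type_of_node<IdxT, nodes>::type;
  using rest  =
    typename make_type_list<typename nodes_after<IdxT, nodes>::type>::type;
};

// A terminating case for type_at
template< std::size_t IdxT >
struct type_at<IdxT, empty>
{
  using nodes = empty;
  using type  = empty;
  using rest  = empty;
};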
I added the declaration of nodes to simplify the declaration for type. This wasn't strictly necessary. I added rest for convenience in other solutions. rest returns a type_list of the elements remaining after the specified index.
For example, if there were 10 elements in a type list and index 6 was specified, a type list with elements [7, 8, 9] would be returned.
Stop me if I go too fast for the rest of these.
front
C++
template< typename T >
struct front
{
  /// Type of the first element in the list.
  using type = type_at_t<0, T>;
  using rest = typename type_at<0, T>::rest;
};
back
C++
template< typename T >
struct back
{
  /// Type of the last element in the list.
  using type = type_at_t<length<T>::value - 1, T>;
};
pop_front
C++
template< typename T >
struct pop_front
{
  using type = typename front<T>::rest;
};
push_front
C++
template< typename F, typename L >
struct push_front
{
private:
  using params = typename to_param_pack<typename L::nodes>::type;
  using sum    = typename add_to<F, params>::type;

public:
  using type = typename make_type_list<sum>::type;
};
push_back
C++
template<typename L, typename B>
struct push_back
{
private:
  using params = typename to_param_pack<typename L::nodes>::type;
  using sum    = typename add_to<params, B>::type;

public:
  using type = typename make_type_list<sum>::type;
};
New functionality updated since the original post
pop_back
C++
template<typename T>
struct pop_back
{
  // Keep everything before the last element.
  using type = typename split<length<T>::value - 1, T>::type;
};

// Terminating specialization: an empty list has nothing to pop.
template< >
struct pop_back<type_list<>>
{
  using type = empty;
};
move_item
Move a specified number of elements from the front of the second list to the end of the first list. This function is used to implement split, which is then used to implement pop_back.
C++
template<std::size_t CountT, typename T, typename R>
struct move_item
{
private:
  // Append the front of R to T, then drop it from R.
  using first = typename push_back<T, front_t<R>>::type;
  using last  = typename pop_front<R>::type;

public:
  using type = typename move_item<CountT - 1, first, last>::type;
  using rest = typename move_item<CountT - 1, first, last>::rest;
};

// Terminating specialization
template<typename T, typename R>
struct move_item<0, T, R>
{
  using type = T;
  using rest = R;
};
split
Splits the list into two separate lists at the specified pivot index.
C++
template<std::size_t PivotT, typename T>
struct split
{
  static_assert(PivotT <= length<T>::value,
                "The split pivot index is out of range");

  using type = typename move_item<PivotT, type_list<>, T>::type;
  using rest = typename move_item<PivotT, type_list<>, T>::rest;
};
Summary
Again, I am pleased at how much simpler my code has become with these new additions. It's still C++. It's like C++ with the Hemi. Statements can be expressed more tersely, which actually increases the readability of the code rather than obscuring its meaning. Repetitive typing and redundant code can also be reduced.
If you frequently program with templates, or are even a big fan of the C++ Standard Library, you owe it to yourself to become familiar with these two features.
As promised, here is a link to the full type list implementation:
 Download type_list.h - 19.8 KB
 Update June 6, 2015 at 8:49 PM MST