Embedded Alchemy

Alchemy Send feedback »

Alchemy is a collection of independent library components that specifically relate to efficient low-level constructs used with embedded and network programming.

The latest version of Embedded Alchemy[^] can be found on GitHub.
The most recent entries as well as Alchemy topics to be posted soon:
  • Steganography[^]
  • Coming Soon: Alchemy: Data View
  • Coming Soon: Quad (Copter) of the Damned
Alchemy: Documentation[^]

I just completed my Masters Degree in Cybersecurity at Johns Hopkins University. I plan to resume Alchemy's development. I plan to use my newly acquired knowledge to add constructs that will help improve the security of devices built for the Internet of (Insecure) Things.

Geometry in 3D Computer Graphics

math Send feedback »

I previously introduced how to perform the basic matrix operations commonly used in 3D computer graphics. However, we haven’t yet reached anything that resembles graphics. I now introduce the concepts of geometry as it’s related to computer graphics. I intend to provide you with the remaining foundation necessary to be able to mathematically visualize the concepts at work. Unfortunately, we will still not be able to reach the "computer graphics" stage with this post; that must wait until the next entry.

This post covers a basic form of a polygonal mesh to represent models, points and Euclidean vectors. With the previous entry and this one, we will have all of the background necessary for me to cover matrix transforms in the next post. Specifically the 3D projection transform, which provides the illusion of perspective. This will be everything that we need to create a wireframe viewer. For now, let's focus on the topics of this post.

Modeling the Geometry

We will use models composed of a mesh of polygons. All of the polygons will be defined as triangles. It is important to consistently define the points of each triangle either clockwise or counter-clockwise. This is because the definition order affects the direction a surface faces. If a single model mixes the order of point definitions, your rendering will have missing polygons.

The order of point definition for the triangle can be ignored if a surface normal is provided for each polygon. However, to keep things simple, I will only be demonstrating with basic models that are built from an index array structure. This data structure contains two arrays. The first array is a list of all of the vertices (points) in the model. The second array contains a three-element structure with the index of the point that can be found in the first array. I define all of my triangles counter-clockwise while looking at the front of the surface.

Generally, your models should be defined at or near the origin. The rotation transformation that we will explore in the next post requires a pivot point, which is defined at the origin. Model geometries defined near the origin are also easier to work with when building models from smaller components and using instancing to create multiple copies of a single model.

Here is a definition for a single triangle starting at the origin and adjacent to the Y plane. The final corner decreases for Y as it moves diagonally along the XZ plane:

  Points
0: 0 0 0
1: 1 0 0
2: 0 1 0
  Triangles
A 0 1 2
  
Model
Single Triangle Model

This definition represents a unit cube starting at the origin. A cube has 8 corners to hold six square-faces. It requires two triangles to represent each square. Therefore, there are 8 points used to define 12 triangles:

  Points
0: 0 0 0
1: 1 0 0
2: 1 1 0
3: 0 1 0
4: 0 1 1
5: 1 1 1
6: 1 0 1
7: 0 0 1
  Triangles
A 7 6 4
B 4 6 5
C 5 6 1
D 5 1 2
E 5 2 3
F 5 3 4
G 4 3 7
H 0 7 3
I 7 0 6
J 6 0 1
K 0 3 1
L 1 3 2
  
Model
Cube Model (Front)
Cube Model (Rear)

Vectors

Recall from my previous post that a vector is a special type of matrix, in which there is only a single row or column. You may be aware of a different definition for the term "vector"; something like

NOUN
A quantity having direction as well as magnitude, especially determining the position of one point in space relative to another.

Oxford Dictionary

From this point forward, when I use the term "vector", I mean a Euclidean vector. We will even use single-row and single-column matrices to represent these vectors; we’re just going to refer to those structures as matrices to minimize the confusion.

Notational Convention

It is most common to see vectors represented as single-row matrices, as opposed to the standard mathematical notation which uses column matrices to represent vectors. It is important to remember this difference in notation based on what type of reference you are following.

The reason for this difference, is the row-form makes composing the transformation matrix more natural as the operations are performed left-to-right. I discuss the transformation matrix in the next entry. As you will see, properly constructing transformation matrices is crucial to achieve the expected results, and anything that we can do to simplify this sometimes complex topic, will improve our chances of success.

Representation

As the definition indicates, a vector has a direction and magnitude. There is a lot of freedom in that definition for representation, because it is like saying "I don’t know where I am, but I know which direction I am heading and how far I can go." Therefore, if we think of our vector as starting at the origin of a Cartesian grid, we could then represent a vector as a single point.

This is because starting from the origin, the direction of the vector is towards the point. The magnitude can be calculated by using Pythagoras' Theorem to calculate the distance between two points. All of this information can be encoded in a compact, single-row matrix. As the definition indicates, we derive a vector from two points, by subtracting one point from the other using inverse of matrix addition.

This will give us the direction to the destination point, relative to the starting point. The subtraction effectively acts as a translation of the vectors starting point to the origin.

For these first few vector examples, I am going to use 2D vectors. Because they are simpler to demonstrate and diagram, and the only difference to moving to 3D vectors is an extra field has been added.

Here is a simple example with two points (2, 1) and (5, 5). The first step is to encode them in a matrix:

\( {\bf S} = \left\lbrack \matrix{2 & 1} \right\rbrack, {\bf D} = \left\lbrack \matrix{5 & 5} \right\rbrack \)

Two points

Let’s create two vectors, one that determines how to get to point D starting from S, and one to get to point S starting from D. The first step is to subtract the starting point from the destination point. Notice that if we were to subtract the starting point from itself, the result is the origin, (0, 0).

\( \eqalign{ \overrightarrow{SD} &= [ \matrix{5 & 5} ] - [ \matrix{2 & 1} ] \cr &= [ \matrix{3 & 4} ] } \)

S-to-D translated Vector SD

\( \eqalign{ \overrightarrow{DS} &= [ \matrix{2 & 1} ] - [ \matrix{5 & 5} ] \cr &= [ \matrix{-3 & -4} ] } \)

D-to-S translated Vector DS

Notice that the only difference between the two vectors is the direction. The magnitude of each vector is the same, as expected. Here is how to calculate the magnitude of a vector:

Given:

\( \eqalign{ v &= \matrix{ [a_1 & a_2 & \ldots & a_n]} \cr |v| &= \sqrt{\rm (a_1^2 + a_2^2 + \ldots + a_n^2)} } \)


Operations

I covered scalar multiplication and matrix addition in my previous post. However, I wanted to briefly revisit the topic to demonstrate one way to think about vectors, which may help you develop a better intuition for the math. For the coordinate systems that we are working with, vectors are straight lines.

Two vectors

When you add two or more vectors together, you can think of them as moving the tail of each successive vector to the head of the previous vector. Then add the corresponding components.

\( \eqalign{ v &= A+B \cr |v| &= \sqrt{\rm 3^2 + 4^2} \cr &= 5 }\)

Visualization of adding vectors

Regardless of the orientation of the vector, it can be decomposed into its individual contributions per axis. In our model for the vector, the contribution for each axis is the raw value defined in our matrix.

3 vectors to add together Arrange vectors head-to-tail

\( v = A+B+C \)

The original vectors decomposed into axis aligned vectors The sum of the original 3 vectors

\( \eqalign{ |v| &= \sqrt{\rm (3+2-3)^2 + (1+3+2)^2} \cr &= \sqrt{\rm 2^2 + 6^2} \cr &= 2 \sqrt{\rm 10} }\)

Unit Vector

We can normalize a vector to create unit vector, which is a vector that has a length of 1. The concept of the unit vector is introduced to reduce a vector to simply indicate its direction. This type of vector can now be used as a unit of measure to which other vectors can be compared. This also simplifies the task of deriving other important information when necessary.

To normalize a vector, divide the vector by its magnitude. In terms of the matrix operations that we have discussed, this is simply scalar multiplication of the vector with the inverse of its magnitude.

\( u = \cfrac{v}{ |v| } \)

\(u\) now contains the information for the direction of \(v\). With algebraic manipulation we can also have:

\( v = u |v| \)

This basically says the vector \(v\) is equal to the unit vector that has the direction of \(u\) multiplied by the magnitude of \(v\).

Dot Product

The dot product is used for detecting visibility and shading of a surface. The dot product operation is shown below:

\( \eqalign{ w &= u \cdot v \cr &= u_1 v_1 + u_2 v_2 + \cdots + u_n v_n }\)

This formula may look familiar. That is because this is a special case in matrix multiplication called the inner product, which looks like this (remember, we are representing vectors as a row matrix):

\( \eqalign{ u \cdot v &= u v^T \cr &= \matrix {[u_1 & u_2 & \cdots & u_n]} \left\lbrack \matrix{v_1 \\ v_2 \\ \vdots \\ v_n } \right\rbrack }\)

We will use the dot product on two unit vectors to help us determine the angle between these vectors; that’s right, angle. We can do this because of the cosine rule from Trigonometry. The equation that we can use is as follows:

\( \cos \Theta = \cfrac{u \cdot v} {|u||v|} \)

To convert this to the raw angle, \(\Theta\), you can use the inverse function of cosine, arccos(\( \Theta \)). In C and C++ the name of the function is acos().

\( \cos {\arccos \Theta} = \Theta \)

So what is actually occurring when we calculate the dot product that allows us to get an angle?

Extract angle from dot product

We are essentially calculating the projection of u onto v. This gives us enough information to manipulate the Trigonometric properties to derive \( \Theta \).

Projection on unit vectors

I would like to make one final note regarding the dot product. Many times it is only necessary to know the sign of the angle between two vectors. The following table shows the relationship between the dot product of two vectors and the value of the angle, \( \Theta \), which is in the range \( 0° \text{ to } 180° \). If you like radians, that is \( 0 \text{ to } \pi \).

\( w = u \cdot v = f(u,w) \)

\( f(u,w) = \cases{ w \gt 0 & \text { if } \Theta \lt 90° \cr w = 0 & \text { if } \Theta = 90° \cr w \lt 0 & \text { if } \Theta \gt 90° }\)

Theta < 90 degrees, therefore w is positive Theta = 90 degrees, therefore w is zero Theta /> 90 degrees, therefore w is negative

Cross Product

The cross product is used to calculate a vector that is perpendicular to a surface. The name for this vector is the surface normal. The surface normal indicates which direction the surface is facing. Therefore, it is used quite extensively for shading and visibility as well.

We have reached the point where we need to start using examples and diagrams with three dimensions. We will use the three points that define our triangular surface to create two vectors. Remember, it is important to use consistent definitions when creating models.

The first step is to calculate our planar vectors. What does that mean? We want to create two vectors that are coplanar (on the same plane), so that we may use them to calculate the surface normal. All that we need to be able to do this are three points that are non-collinear, and we have this with each triangle definition in our polygonal mesh model.

If we have defined a triangle \(\triangle ABC\), we will define two vectors, \(u = \overrightarrow{AB}\) and \(v = \overrightarrow{AC}\).

A single triangle surface planar vectors derived from the surface definition

If we consistently defined the triangle surfaces in our model, we should now be able to take the cross product of \(u \times v\) to receive the surface normal, \(N_p\), of \(\triangle ABC\). Again, the normal vector is perpendicular to its surface.

Counter-clockwise cross product Clockwise cross product

In practice, this tells us which direction the surface is facing. We would combine this with the dot product to help us determine if are interested in this surface for the current view.

So how do you calculate the cross product?

Here is what you need to know in order to calculate the surface normal vector:

\(\eqalign { N_p &= u \times v \cr &= [\matrix{(u_2v_3 – u_3v_2) & (u_3v_1 – u_1v_3) & (u_1v_2 – u_2v_1)}] }\)

To help you remember how to perform the calculation, I have re-written it in parts, and used \(x\), \(y\), and \(z\) rather than the index values.

\(\eqalign { x &= u_yv_z – u_zv_y \cr y &= u_zv_x – u_xv_z \cr z &= u_xv_y – u_yv_x }\)

Let's work through an example to demonstrate the cross product. We will use surface A from our axis-aligned cube model that we constructed at the beginning of this post. Surface A is defined completely within the XY plane. How do we know this? Because all of the points have the same \(z\)-coordinate value of 1. Therefore, we should expect that our calculation for \(N_p\) of surface A will be an \(z\) axis-aligned vector pointing in the positive direction.

Step 1: Calculate our surface description vectors \(u\) and \(v\):


\( Surface_A = pt_7: (0,0,1), pt_6: (1,0,1), \text{ and } pt_4: (0,1,1) \)

\( \eqalign { u &= pt_6 - pt_7 \cr &= [\matrix{1 & 0 & 1}] - [\matrix{0 & 0 & 1}] \cr &=[\matrix{\color{red}{1} & \color{green}{0} & \color{blue}{0}}] }\)

\( \eqalign { v &= pt_4 - pt_7 \cr &= [\matrix{0 & 1 & 1}] - [\matrix{0 & 0 & 1}] \cr &=[\matrix{\color{red}{0} & \color{green}{1} & \color{blue}{0}}] }\)

Step 2: Calculate \(u \times v\):


\(N_p = [ \matrix{x & y & z} ]\text{, where:}\)

\( \eqalign { x &= \color{green}{u_y}\color{blue}{v_z} – \color{blue}{u_z}\color{green}{v_y} \cr &= \color{green}{0} \cdot \color{blue}{0} - \color{blue}{0} \cdot \color{green}{1} \cr &= 0 }\)

\( \eqalign { y &= \color{blue}{u_z}\color{red}{v_x} – \color{red}{u_x}\color{blue}{v_z} \cr &= \color{blue}{0} \cdot \color{red}{0} - \color{red}{1} \cdot \color{blue}{0} \cr &= 0 }\)

\( \eqalign { z &= \color{red}{u_x}\color{green}{v_y} – \color{green}{u_y}\color{red}{v_x} \cr &= \color{red}{1} \cdot \color{green}{1} - \color{green}{0} \cdot \color{red}{0} \cr &= 1 }\)


\(N_p = [ \matrix{0 & 0 & 1} ]\)

This is a \(z\) axis-aligned vector pointing in the positive direction, which is what we expected to receive.

Notation Formality

If you look at other resources to learn more about the cross product, it is actually defined as a sum in terms of three axis-aligned vectors \(i\), \(j\), and \(k\), which then must be multiplied by the vectors \(i=[\matrix{1&0&0}]\), \(j=[\matrix{0&1&0}]\), and \(k=[\matrix{0&0&1}]\) to get the final vector. The difference between what I presented and the form with the standard vectors, \(i\), \(j\), and \(k\), is I omitted a step by placing the result calculations directly in a new vector. I wanted you to be aware of this discrepancy between what I have demonstrated and what is typically taught in the classroom.

Apply What We Have Learned

We have discovered quite a few concepts between the two posts that I have written regarding the basic math and geometry involved with computer graphics. Let's apply this knowledge and solve a useful problem. Detecting if a surface is visible from a specific point-of-view is a fundamental problem that is used extensively in this domain. Since we have all of the tools necessary to solve this, let's run through two examples.

For the following examples we will specify the view-point as \(pt_{eye} = [\matrix{3 & 2 & 3}]:\)

View-point origin (eye)

Also, refer to this diagram for the surfaces that we will test for visibility from the view-point. The side surface is surface \(D\) from the cube model, and the hidden surface is the bottom-facing surface \(J\).

Surfaces to test for visibility

Detect a visible surface

Step 1: Calculate our surface description vectors \(u\) and \(v\):


\( \eqalign{ Surface_D: &pt_5: (1,1,1), \cr &pt_1: (1,0,0), \cr &pt_2: (1,1,0) } \)

\( \eqalign { u &= pt_1 - pt_5 \cr &= [\matrix{1 & 0 & 0}] - [\matrix{1 & 1 & 1}] \cr &=[\matrix{0 & -1 & -1}] }\)

\( \eqalign { v &= pt_2 - pt_5 \cr &= [\matrix{1 & 1 & 0}] - [\matrix{1 & 1 & 1}] \cr &=[\matrix{0 & 0 & -1}] }\)

Step 2: Calculate \(u \times v\):


\(N_{pD} = [ \matrix{x & y & z} ]\text{, where:}\)

\( \eqalign { x &= -1 \cdot (-1) - (-1) \cdot 0 \cr &= 1 }\)

\( \eqalign { y &= -1 \cdot 0 - (-1) \cdot 0 \cr &= 0 }\)

\( \eqalign { z &= 0 \cdot 0 - (-1) \cdot 0 \cr &= 0 }\)

\(N_pD = [ \matrix{1 & 0 & 0} ]\)

Step 3: Calculate vector to the eye:


To calculate a vector to the eye, we need to select a point on the target surface and create a vector to the eye. For this example, we will select \(pt_5\)

\( \eqalign{ \overrightarrow{view_D} &= pt_{eye} - pt_5\cr &= [ \matrix{3 & 2 & 3} ] - [ \matrix{1 & 1 & 1} ]\cr &= [ \matrix{2 & 1 & 2} ] }\)

Step 4: Normalize the view-vector:


Before we can use this vector in a dot product, we must normalize the vector:

\( \eqalign{ |eye_D| &= \sqrt{\rm 2^2 + 1^2 + 2^2} \cr &= \sqrt{\rm 4 + 1 + 4} \cr &= \sqrt{\rm 9} \cr &= 3 \cr \cr eye_u &= \cfrac{eye_D}{|eye_D|} \cr &= \cfrac{1}3 [\matrix{2 & 1 & 2}] \cr &= \left\lbrack\matrix{\cfrac{2}3 & \cfrac{1}3 & \cfrac{2}3} \right\rbrack }\)

Step 5: Calculate the dot product of the view-vector and surface normal:


\( \eqalign{ w &= eye_{u} \cdot N_{pD} \cr &= \cfrac{2}3 \cdot 1 + \cfrac{1}3 \cdot 0 + \cfrac{2}3 \cdot 0 \cr &= \cfrac{2}3 }\)

Step 6: Test for visibility:


\(w \gt 0\), therefore \(Surface_D\) is visible.

Side Surface D

Detect a surface that is not visible

Step 1: Calculate our surface description vectors \(u\) and \(v\):


\( \eqalign{ Surface_J: &pt_6: (1,1,1), \cr &pt_0: (0,0,0), \cr &pt_1: (1,0,0) } \)

\( \eqalign { u &= pt_0 - pt_6 \cr &= [\matrix{0 & 0 & 0}] - [\matrix{1 & 0 & 1}] \cr &=[\matrix{-1 & 0 & -1}] }\)

\( \eqalign { v &= pt_1 - pt_6 \cr &= [\matrix{1 & 0 & 0}] - [\matrix{1 & 0 & 1}] \cr &=[\matrix{0 & 0 & -1}] }\)

Step 2: Calculate \(u \times v\):


\(N_{pJ} = [ \matrix{x & y & z} ]\text{, where:}\)

\( \eqalign { x &= 0 \cdot (-1) - (-1) \cdot 0 \cr &= 0 }\)

\( \eqalign { y &= -1 \cdot 0 - (-1) \cdot (-1) \cr &= -1 }\)

\( \eqalign { z &= (-1) \cdot 0 - 0 \cdot 0 \cr &= 0 }\)

\(N_pJ = [ \matrix{0 & -1 & 0} ]\)

Step 3: Calculate vector to the eye:


For this example, we will select \(pt_6\)

\( \eqalign{ \overrightarrow{view_J} &= pt_{eye} - pt_6\cr &= [ \matrix{3 & 2 & 3} ] - [ \matrix{1 & 0 & 1} ]\cr &= [ \matrix{2 & 2 & 2} ] }\)

Step 4: Normalize the view-vector:


Before we can use this vector in a dot product, we must normalize the vector:

\( \eqalign{ |eye_J| &= \sqrt{\rm 2^2 + 2^2 + 2^2} \cr &= \sqrt{\rm 4 + 4 + 4} \cr &= \sqrt{\rm 12} \cr &= 2 \sqrt{\rm 3} \cr \cr eye_u &= \cfrac{eye_J}{|eye_J|} \cr &= \cfrac{1}{2 \sqrt 3} [\matrix{2 & 2 & 2}] \cr &= \left\lbrack\matrix{\cfrac{1}{\sqrt 3} & \cfrac{1}{\sqrt 3} & \cfrac{1}{\sqrt 3}} \right\rbrack }\)

Step 5: Calculate the dot product of the view-vector and surface normal:


\( \eqalign{ w &= eye_{u} \cdot N_{pJ} \cr &= \cfrac{1}{\sqrt 3} \cdot 0 + \cfrac{1}{\sqrt 3} \cdot (-1) + \cfrac{1}{\sqrt 3} \cdot 0 \cr &= -\cfrac{1}{\sqrt 3} }\)

Step 6: Test for visibility:


\(w \lt 0\), therefore \(Surface_J\) is not visible.

Bottom Surface J

What has this example demonstrated?

Surprisingly, this basic task demonstrates every concept that we have learned from the two posts up to this point:

  • Matrix Addition: Euclidean vector creation
  • Scalar Multiplication: Unit vector calculation, multiplying by \(1 / |v| \).
  • Matrix Multiplication: Dot Product calculation, square-matrix multiplication will be used in the next post.
  • Geometric Model Representation: The triangle surface representation provided the points for our calculations.
  • Euclidean Vector: Calculation of each surfaces planar vectors, as well as the view vector.
  • Cross Product: Calculation of the surface normal, \(N_p\).
  • Unit Vector: Preparation of vectors for dot product calculation.
  • Dot Product: Used the sign from this calculation to test for visibility of a surface.

Summary

Math can be difficult sometimes, especially with all of the foreign notation. However, with a bit of practice the notation will soon become second nature and many calculations can be performed in your head, similar to basic arithmetic with integers. While many graphics APIs hide these calculations from you, don't be fooled because they still exist and must be performed to display the three-dimensional images on your screen. By understanding the basic concepts that I have already demonstrated, you will be able to better troubleshoot your graphics programs when something wrong occurs. Even if you rely solely on a graphics library to abstract the complex mathematics.

There is only one more entry to go, and we will have enough basic knowledge of linear algebra, Euclidean geometry, and computer displays to be able to create, manipulate and display 3D models programmatically. The next entry discusses how to translate, scale and rotate geometric models. It also describes what must occur to create the illusion of a three-dimensional image on a two-dimensional display with the projection transform.

References

Watt, Alan, "Three-dimensional geometry in computer graphics," in 3D Computer Graphics, Harlow, England: Addison-Wesley, 1993

Introduction to the Math of Computer Graphics

CodeProject, beginner, math 2 feedbacks »

Three-dimensional computer graphics is an exciting aspect of computing because of the amazing visual effects that can be created for display. All of this is created from an enormous number of calculations that manipulate virtual models, which are constructed from some form of geometric definition. While the math involved for some aspects of computer graphics and animation can become quite complex, the fundamental mathematics that is required is very accessible. Beyond learning some new mathematic notation, a basic three-dimensional view can be constructed with algorithms that only require basic arithmetic and a little bit of trigonometry.

I demonstrate how to perform some basic operations with two key constructs for 3D graphics, without the rigorous mathematic introduction. Hopefully the level of detail that I use is enough for anyone that doesn't have a strong background in math, but would very much like to play with 3D APIs. I introduce the math that you must be familiar with in this entry, and in two future posts I demonstrate how to manipulate geometric models and apply the math towards the display of a 3D image.

Linear Algebra

The type of math that forms the basis of 3D graphics is called linear algebra. In general, linear algebra is quite useful for solving systems of linear equations. Rather than go into the details of linear algebra, and all of its capabilities, we are going to simply focus on two related constructs that are used heavily within the topic, the matrix and the vertex.

Matrix

The Matrix provides an efficient mechanism to translate, scale, rotate, and convert between different coordinate systems. This is used extensively to manipulate the geometric models for environment calculations and display.

These are all examples of valid matrices:

\( \left\lbrack \matrix{a & b & c \cr d & e & f} \right\rbrack, \left\lbrack \matrix{10 & -27 \cr 0 & 13 \cr 7 & 17} \right\rbrack, \left\lbrack \matrix{x^2 & 2x \cr 0 & e^x} \right\rbrack\)

Square matrices are particularly important, that is a matrix with the same number of columns and rows. There are some operations that can only be performed on a square matrix, which I will introduce shortly. The notation for matrices, in general, uses capital letters as the variable names. Matrix dimensions can be specified as a shorthand notation, and to also identify an indexed position within the matrix. As far as I am aware, row-major indexing is always used for consistency, that is, the first index represents the row, and the second represents the column.

\( A= [a_{ij}]= \left\lbrack \matrix{a_{11} & a_{12} & \ldots & a_{1n} \cr a_{21} & a_{22} & \ldots & a_{2n} \cr \vdots & \vdots & \ddots & \vdots \cr a_{m1} & a_{m2} & \ldots & a_{mn} } \right\rbrack \)

Vector

The vector is a special case of a matrix, where there is only a single row or a single column; also a common practice is to use lowercase letters to represent vectors:

\( u= \left\lbrack \matrix{u_1 & u_2 & u_3} \right\rbrack \), \( v= \left\lbrack \matrix{v_1\cr v_2\cr v_3} \right\rbrack\)

Operations

The mathematical shorthand notation for matrices is very elegant. It simplifies these equations that would be quite cumbersome, otherwise. There are only a few operations that we are concerned with to allow us to get started. Each operation has a basic algorithm to follow to perform a calculation, and the algorithm easily scales with the dimensions of a matrix.

Furthermore, the relationship between math and programming is not always clear. I have a number of colleagues that are excellent programmers and yet they do not consider their math skills very strong. I think that it could be helpful to many to see the conversion of the matrix operations that I just described from mathematical notation into code form. Once I demonstrate how to perform an operation with the mathematical notation and algorithmic steps, I will also show a basic C++ implementation of the concept to help you understand how these concepts map to actual code.

The operations that I implement below assume a Matrix class with the following interface:

C++

class Matrix
{
public:
  // Ctor and Dtor omitted
 
  // Calculate and return a reference to the specified element.
  double& element(size_t row, size_t column);
 
  // Resizes this Matrix to have the specified size.
  void resize(size_t row, size_t column);
 
  // Returns the number rows.
  size_t rows();
 
  // Returns the number of columns.
  size_t columns();
private:
  std::vector<double>    data;
};

Transpose

Transpose is a unary operation that is performed on a single matrix, and it is represented by adding a superscript T to target matrix.

For example, the transpose of a matrix A is represented with AT.

The transpose "flips" the orientation of the matrix so that each row becomes a column, and each original column is transformed into a row. I think that a few examples will make this concept more clear.

\( A= \left\lbrack \matrix{a & b & c \cr d & e & f \cr g & h & i} \right\rbrack \)

\(A^T= \left\lbrack \matrix{a & d & g \cr b & e & h \cr c & f & i} \right\rbrack \)

The resulting matrix will contain the same set of values as the original matrix, only their position in the matrix changes.

\( B= \left\lbrack \matrix{1 & 25 & 75 & 100\cr 0 & 5 & -50 & -25\cr 0 & 0 & 10 & 22} \right\rbrack \)

\(B^T= \left\lbrack \matrix{1 & 0 & 0 \cr 25 & 5 & 0 \cr 75 & -50 & 10 \cr 100 & -25 & 22} \right\rbrack \)

It is very common to transpose a matrix, including vertices, before performing other operations. The reason will become clear for matrix multiplication.

\( u= \left\lbrack \matrix{u_1 & u_2 & u_3} \right\rbrack \)

\(u^T= \left\lbrack \matrix{u_1 \cr u_2 \cr u_3} \right\rbrack \)

Matrix Addition

Addition can only be performed between two matrices that are the same size. By size, we mean that the number of rows and columns are the same for each matrix. Addition is performed by adding the values of the corresponding positions of both matrices. The sum of the values creates a new matrix that is the same size as the original two matrices provided to the add operation.

If

\( A= \left\lbrack \matrix{0 & -2 & 4 \cr -6 & 8 & 0 \cr 2 & -4 & 6} \right\rbrack, B= \left\lbrack \matrix{1 & 3 & 5 \cr 7 & 9 & 11 \cr 13 & 15 & 17} \right\rbrack \)

Then

\(A+B=\left\lbrack \matrix{1 & 1 & 9 \cr 1 & 17 & 11 \cr 15 & 11 & 23} \right\rbrack \)

The size of these matrices do not match in their current form.

\(U= \left\lbrack \matrix{4 & -8 & 5}\right\rbrack, V= \left\lbrack \matrix{-1 \\ 5 \\ 4}\right\rbrack \)

However, if we take the transpose of one of them, their sizes will match and they can then be added together. The size of the result matrix depends upon which matrix we perform the transpose operation on from the original expression.:

\(U^T+V= \left\lbrack \matrix{3 \\ -3 \\ 9} \right\rbrack \)

Or

\(U+V^T= \left\lbrack \matrix{3 & -3 & 9} \right\rbrack \)

Matrix addition has the same algebraic properties as with the addition of two scalar values:

Commutative Property:

\( A+B=B+A \)

Associative Property:

\( (U+V)+W=U+(V+W) \)

Identity Property:

\( A+0=A \)

Inverse Property:

\( A+(-A)=0 \)

The code required to implement matrix addition is relatively simple. Here is an example for the Matrix class definition that I presented earlier:

C++

void Matrix::operator+=(const Matrix& rhs)
{
  if (rhs.data.size() == data.size())
  {
    // We can simply add each corresponding element
    // in the matrix element data array.
    for (size_t index = 0; index < data.size(); ++index)
    {
      data[index] += rhs.data[index];
    }
  }
}
 
Matrix operator+( const Matrix& lhs,
                  const Matrix& rhs)
{
  Matrix result(lhs);
  result += rhs;
 
  return result;
}

Scalar Multiplication

Scalar multiplication allows a single scalar value to be multiplied with every entry within a matrix. The result matrix is the same size as the matrix provided to the scalar multiplication expression:

If

\( A= \left\lbrack \matrix{3 & 6 & -9 \cr 12 & -15 & -18} \right\rbrack \)

Then

\( \frac{1}3 A= \left\lbrack \matrix{1 & 2 & -3 \cr 4 & -5 & -6} \right\rbrack, 0A= \left\lbrack \matrix{0 & 0 & 0 \cr 0 & 0 & 0} \right\rbrack, -A= \left\lbrack \matrix{-3 & -6 & 9 \cr -12 & 15 & 18} \right\rbrack, \)

Scalar multiplication with a matrix exhibits these properties, where c and d are scalar values:

Distributive Property:

\( c(A+B)=cA+cB \)

Identity Property:

\( 1A=A \)

The implementation for scalar multiplication is even simpler than addition.
Note: this implementation only allows the scalar value to appear before the Matrix object in multiplication expressions, which is how the operation is represented in math notation.:

C++

void Matrix::operator*=(const double lhs)
{
  for (size_t index = 0; index < data.size(); ++index)
  {
    data[index] *= rhs;
  }
}
 
Matrix operator*( const double scalar,
                  const Matrix& rhs)
{
  Matrix result(rhs);
  result *= scalar;
 
  return result;
}

Matrix Multiplication

Everything seems very simple with matrices, at least once you get used to the new structure. Then you are introduced to matrix multiplication. The algorithm for multiplication is not difficult, however, it is much more labor intensive compared to the other operations that I have introduced to you. There are also a few more restrictions on the parameters for multiplication to be valid. Finally, unlike the addition operator, the matrix multiplication operator does not have all of the same properties as the multiplication operator for scalar values; specifically, the order of parameters matters.

Input / Output

Let's first address what you need to be able to multiply matrices, and what type of matrix you can expect as output. Once we have addressed the structure, we will move on to the process.

Given an the following two matrices:

\( A= \left\lbrack \matrix{a_{11} & \ldots & a_{1n} \cr \vdots & \ddots & \vdots \cr a_{m1} & \ldots & a_{mn} } \right\rbrack, B= \left\lbrack \matrix{b_{11} & \ldots & b_{1v} \cr \vdots & \ddots & \vdots \cr b_{u1} & \ldots & b_{uv} } \right\rbrack \)

A valid product for \( AB=C \) is only possible if number of columns \( n \) in \( A \) is equal to the number of rows \( u \) in \( B \). The resulting matrix \( C \) will have the dimensions \( m \times v \).

\( AB=C= \left\lbrack \matrix{c_{11} & \ldots & c_{1v} \cr \vdots & \ddots & \vdots \cr c_{m1} & \ldots & c_{mv} } \right\rbrack, \)

Let's summarize this in a different way, hopefully this arrangement will make the concept more intuitive:

Matrix multiplication dimensions

One last form of the rules for matrix multiplication:

  • The number of columns, \(n\), in matrix \( A \) must be equal to the number of rows, \(u\), in matrix \( B \):
    \( n = u \)

  • The output matrix \( C \) will have the number of rows, \(m\), in \(A\), and the number of columns, \(v\), in \(B\):
    \( m \times v \)

  • \(m\) and \(v\) do not have to be equal. The only requirement is that they are both greater-than zero:
    \( m \gt 0,\)
    \(v \gt 0 \)

How to Multiply

To calculate a single entry in the output matrix, we must multiply the element from each column in the first matrix, with the element in the corresponding row in the second matrix, and add all of these products together. We use the same row in the first matrix, \(A\), for which we are calculating the row element in \(C\). Similarly, we use the column in the second matrix, \(B\) that corresponds with the calculating column element in \(C\).

More succinctly, we can say we are multiplying rows into columns.

For example:

\( A= \left\lbrack \matrix{a_{11} & a_{12} & a_{13} \cr a_{21} & a_{22} & a_{23}} \right\rbrack, B= \left\lbrack \matrix{b_{11} & b_{12} & b_{13} & b_{14} \cr b_{21} & b_{22} & b_{23} & b_{24} \cr b_{31} & b_{32} & b_{33} & b_{34} } \right\rbrack \)

The number of columns in \(A\) is \(3\) and the number of rows in \(B\) is \(3\), therefore, we can perform this operation. The size of the output matrix will be \(2 \times 4\).

This is the formula to calculate the element \(c_{11}\) in \(C\) and the marked rows used from \(A\) and the columns from \(B\):

\( \left\lbrack \matrix{\color{#B11D0A}{a_{11}} & \color{#B11D0A}{a_{12}} & \color{#B11D0A}{a_{13}} \cr a_{21} & a_{22} & a_{23}} \right\rbrack \times \left\lbrack \matrix{\color{#B11D0A}{b_{11}} & b_{12} & b_{13} & b_{14} \cr \color{#B11D0A}{b_{21}} & b_{22} & b_{23} & b_{24} \cr \color{#B11D0A}{b_{31}} & b_{32} & b_{33} & b_{34} } \right\rbrack = \left\lbrack \matrix{\color{#B11D0A}{c_{11}} & c_{12} & c_{13} & c_{14}\cr c_{21} & c_{22} & c_{23} & c_{24} } \right\rbrack \)

\( c_{11}= (a_{11}\times b_{11}) + (a_{12}\times b_{21}) + (a_{13}\times b_{31}) \)

To complete the multiplication, we need to calculate these other seven values \( c_{12}, c_{13}, c_{14}, c_{21}, c_{22}, c_{23}, c_{24}\). Here is another example for the element \(c_{23}\):

\( \left\lbrack \matrix{ a_{11} & a_{12} & a_{13} \cr \color{#B11D0A}{a_{21}} & \color{#B11D0A}{a_{22}} & \color{#B11D0A}{a_{23}} } \right\rbrack \times \left\lbrack \matrix{b_{11} & b_{12} & \color{#B11D0A}{b_{13}} & b_{14} \cr b_{21} & b_{22} & \color{#B11D0A}{b_{23}} & b_{24} \cr b_{31} & b_{32} & \color{#B11D0A}{b_{33}} & b_{34} } \right\rbrack = \left\lbrack \matrix{c_{11} & c_{12} & c_{13} & c_{14}\cr c_{21} & c_{22} & \color{#B11D0A}{c_{23}} & c_{24} } \right\rbrack \)

\( c_{23}= (a_{21}\times b_{13}) + (a_{22}\times b_{23}) + (a_{23}\times b_{33}) \)

Notice how the size of the output matrix changes. Based on this and the size of the input matrices you can end up with some interesting results:

\( \left\lbrack \matrix{a_{11} \cr a_{21} \cr a_{31} } \right\rbrack \times \left\lbrack \matrix{ b_{11} & b_{12} & b_{13} } \right\rbrack = \left\lbrack \matrix{c_{11} & c_{12} & c_{13} \cr c_{21} & c_{22} & c_{23} \cr c_{31} & c_{32} & c_{33} } \right\rbrack \)

\( \left\lbrack \matrix{ a_{11} & a_{12} & a_{13} } \right\rbrack \times \left\lbrack \matrix{b_{11} \cr b_{21} \cr b_{31} } \right\rbrack = \left\lbrack \matrix{c_{11} } \right\rbrack \)

Tip:

To help you keep track of which row to use from the first matrix and which column from the second matrix, create your result matrix of the proper size, then methodically calculate the value for each individual element.

The algebraic properties for the matrix multiplication operator do not match those of the scalar multiplication operator. These are the most notable:

Not Commutative:

The order of the factor matrices definitely matters.

\( AB \ne BA \) in general

I think it is very important to illustrate this fact. Here is a simple \(2 \times 2 \) multiplication performed two times with the order of the input matrices switched. I have highlighted the only two terms that the two resulting answers have in common:

\( \left\lbrack \matrix{a & b \cr c & d } \right\rbrack \times \left\lbrack \matrix{w & x\cr y & z } \right\rbrack = \left\lbrack \matrix{(\color{red}{aw}+by) & (ax+bz)\cr (cw+dy) & (cx+\color{red}{dz}) } \right\rbrack \)

\( \left\lbrack \matrix{w & x\cr y & z } \right\rbrack \times \left\lbrack \matrix{a & b \cr c & d } \right\rbrack = \left\lbrack \matrix{(\color{red}{aw}+cx) & (bw+dx)\cr (ay+cz) & (by+\color{red}{dz}) } \right\rbrack \)

Product of Zero:

\( AB=0 \) does not necessarily imply \( A=0 \) or \( B=0 \)

Scalar Multiplication is Commutative:

\( (kA)B = k(AB) = A(kB) \)

Associative Property:

Multiplication is associative, however, take note that the relative order for all of the matrices remains the same.

\( (AB)C=A(BC) \)

Transpose of a Product:

The transpose of a product is equivalent to the product of transposed factors multiplied in reverse order

\( (AB)^T=B^TA^T \)

Code

We are going to use a two-step solution to create a general purpose matrix multiplication solution. The first step is to create a function that properly calculates a single element in the output matrix:

C++

double Matrix::multiply_element(
  const Matrix& rhs,
  const Matrix& rhs,
  const size_t i,
  const size_t j
)
{
  double product = 0;
 
  // Multiply across the specified row, i, for the left matrix
  // and the specified column, j, for the right matrix.
  // Accumulate the total of the products
  // to return as the calculated result.
  for (size_t col_index = 1; col_index <= lhs.columns(); ++col_index)
  {
    for (size_t row_index = 1; row_index <= rhs.rows(); ++row_index)
    {
      product += lhs.element(i, col_index)
               * rhs.element(row_index, j);
    }
  }
 
  return product;
}

Now create the outer function that performs the multiplication to populate each field of the output matrix:

C++

// Because we may end up creating a matrix with
// an entirely new size, it does not make sense
// to have a *= operator for this general-purpose solution.
Matrix& Matrix::operator*( const Matrix& lhs,
                           const Matrix& rhs)
{
  if (lhs.columns() == rhs.rows())
  {
    // Resize the result matrix to the proper size.
    this->resize(lhs.row(), rhs.columns());
 
    // Calculate the value for each element
    // in the result matrix.
    for (size_t i = 1; i <= this->rows(); ++i)
    {
      for (size_t j = 1; j <= this->columns(); ++j)
      {
        element(i,j) = multiply_element(lhs, rhs, i, j);
      }
    }
  }
 
  return *this;
}

Summary

3D computer graphics relies heavily on the concepts found in the branch of math called, Linear Algebra. I have introduced two basic constructs from Linear Algebra that we will need to move forward and perform the fundamental calculations for rendering a three-dimensional display. At this point I have only scratched the surface as to what is possible, and I have only demonstrated how. I will provide context and demonstrate the what and why, to a degree, on the path helping you begin to work with three-dimensional graphics libraries, even if math is not one of your strongest skills.

References

Kreyszig, Erwin; "Chapter 7" from Advanced Engineering Mathematics, 7th ed., New York: John Wiley & Sons, 1993

Learn Hexadecimal and Binary

general, beginner 1 feedback »

The binary, octal and hexadecimal number systems pervade all of computing. Every command is reduced to a sequence of strings of 1s and 0s for the computer to interpret. These commands seem like noise, garbage, especially with the sheer length of the information. Becoming familiar with binary and other number systems can make it much simpler to interpret the data.

Once you become familiar with the relationships between the number systems, you can manipulate the data in more convenient forms. Your ability to reason, solve and implement solid programs will grow. Patterns will begin to emerge when you view the raw data in hexadecimal. Some of the most efficient algorithms are based on the powers of two. So do yourself a favor and become more familiar with hexadecimal and binary.

Number Base Conversion and Place Value

I am going to skip this since you can refer to my previous entry for a detailed review of number system conversions[^].

Continuous Data

Binary and hexadecimal are more natural number systems for use with computers because they both have a base of 2 raised to some exponent; binary is 21 and hexadecimal is 24. We can easily convert from binary to decimal. However, decimal is not always the most useful form. Especially when we consider that we don't always have a nice organized view of our data. Learning to effectively navigate between the formats, especially in your head, increases your ability to understand programs loaded into memory, as well as streams of raw data that may be found in a debugger or analyzers like Wireshark.

The basic unit of data is a byte. While it is true that a byte can be broken down into bits, we tend to work with these bundled collections and process bytes. Modern computer architectures process multiple bytes, specifically 8 for 64-bit computers. And then there's graphics cards, which are using 128, 192 and even 256-bit width memory accesses. While these large data widths could represent extremely large numbers in decimal, the values tend to have encodings that only use a fraction of the space.

Recognize Your Surroundings

What is the largest binary value that can fit in an 8-bit field?

It will be a value that has eight, ones: 1111 1111. Placing a space after every four bits helps with readability.

What is the largest hexadecimal value that can fit in an 8-bit field?

We can take advantage of the power of 2 relationship between binary and hexadecimal. Each hexadecimal digit requires four binary bits. It would be very beneficial to you to commit the following table to memory:

Dec 0123 4567 891011 12131415
Bin 0000000100100011 0100010101100111 1000100110101011 1100110111101111
Hex 0x00x10x20x3 0x40x50x60x7 0x80x90xA0xB 0xC0xD0xE0xF

Now we can simply take each grouping of four bits, 1111 1111, and convert them into hex-digits, FF.

What is the largest decimal value that can fit in an 8-bit field? This isn't as simple, we must convert the binary value into decimal. Using the number base conversion algorithm, we know that the eighth bit is equal to 28, or 256. Since zero must be represented the largest value is 28 - 1, or 255.

Navigating the Sea of Data

Binary and hexadecimal (ok, and octal) are all number systems whose base is a power of two. Considering that computers work with binary, the representation of binary and other number systems that fit nicely into the logical data blocks, bytes, become much more meaningful. Through practice and the relatively small scale of the values that can be represented with 8-bits, converting a byte a to decimal value feels natural to me. However, when the data is represented with 2, 4 and 8 bytes, the values can grow quickly, and decimal form quickly loses its meaning and my intuition becomes useless as to what value I am dealing with.

For example:

What is the largest binary value that can fit in an 16-bit field?

It will be a value that has sixteen, ones: 1111 1111 1111 1111.

What is the largest hexadecimal value that can fit in an 16-bit field? Again, let's convert each of the blocks of four-bits into a hex-digit, FFFF.

What is the largest decimal value that can fit in an 16-bit field?

Let's see, is it 216 - 1, so that makes it 65355, 65535, or 65555, it's somewhere around there.

Here's a real-world example that I believe many people are familiar with is the RGB color encoding for pixels. You could add a fourth channel to represent an alpha channel an encode RGBA. If we use one-byte per channel, we can encode all four channels in a single 32-bit word, which can be processed very efficiently.

Imagine we are looking at pixel values in memory and the bytes appear in this format: RR GG BB. It takes two hexadecimal digits to represent a single byte. Therefore, the value of pure green could be represented as 00 FF 00. To view this same value as a 24-bit decimal, is much less helpful, 65,280‭.

If we were to change the value to this, 8,388,608‬, what has changed? We can tell the value has changed by roughly 8.3M. Since a 16-bit value can hold ~65K, we know that the third byte has been modified, and we can guess that it has been increased to 120 or more (8.3M / 65K). But what is held in the lower two bytes now? Our ability to deduce information is not much greater than an estimate. The value in hexadecimal is 80 00 00.

The difference between 8,388,608‬ and 8,388,607‬ are enormous with respect to data encoded at the byte-level:

8,388,608‬‬ 8,388,607‬‬‬
00 80 00 00 7F FF
 

Now consider that we are dealing with 24-bit values in a stream of pixels. For every 12-bytes, we will have encoded 4 pixels. Here is a representation of what we would see in the data as most computer views of memory are grouped into 32-bit groupings:

‭4,260,948,991‬‬   ‭2,568,312,378‬‬‬   ‭3,954,253,066
FD F8 EB FF   99 15 56 3A   8C B1 1D 0A
     

Binary

I typically try to use binary only up to 8-bits. Anything larger than that, I simply skip the first 8-bits (1-byte), and focus on the next 8-bits. For example: 1111 1111 1111 1111. As I demonstrated with the RGB color encoding, values do not always represent a single number. In fact, it is a stream of data. So whether there is one byte, four bytes, or a gigabytes worth of data, we usually process it either one byte or one word at a time. We actually break down decimal numbers into groups by using a comma (or some other regional punctuation) to separate thousands, e.g. ‭4,294,967,295, 294 of what? Millions.

Binary manipulation can be found in many contexts. One of the most common is storing a collection of flags in an unsigned buffer. These flags can be flipped on and off with the Boolean flag operations of your programming language. Using a mask with multiple bits allows an enumerated value with more than two options to be encoded within the binary buffer. I'm not going to go to describe the mechanics here, I simply want to demonstrate that data is encoded in many ways, and there are many reasons for you to become proficient at picking apart byte-streams down to the bit.

Hexadecimal

It is just as beneficial to be able to convert between hexadecimal and decimal in your head from 1 to 255, especially if you ever deal with selecting colors for web-pages or you edit images in programs like Photoshop. It only takes two hex-digits to represent an 8-bit byte. If you memorize the values that are defined by the high-order hex-digit, reading hexadecimal byte values becomes almost as easy as reading decimal values. There are two tables listed below. The table on the left indicates the value mapping for the high-order and low-order hex-digits of a byte. The table on the right contains a set of landmark values that you will most certainly encounter and find useful:

161Decimal  160Decimal
0x0000x00
0x10160x11
0x20320x22
0x30480x33
0x40640x44
0x50800x55
0x60960x66
0x701120x77
0x801280x88
0x901440x99
0xA01600xA10
0xB01760xB11
0xC01920xC12
0xD02080xD13
0xE02240xE14
0xF02400xF15
Decimal  Hexadecimal
320x20
640x40
1000x64
1270x7F
1280x80
1680xA8
1920xC0
2000xC8
2240xE0
2390xEF
2500xFA
 
 
 
 
 

Some of the numbers listed in the table on the right are more obvious than others for why I chose to included them in the map. For example, 100, that's a nice round number that we commonly deal with daily. When you run into 0x64, now you can automatically map that in your mind to 100 and have a reference-point for its value. Alternatively, if you have a value such as 0x6C, you could start by calculating the difference: 0x6C - 0x64 = 8; add that to 100 to know the decimal value is 108.

Some of you will recognize the values 127, 168, 192, 224 and 238. For those that do not see the relevance of these values, they are common octets found in landmark network IP addresses. The table below shows the name and common dotted-decimal for as well as the hexadecimal form of the address in both big-endian and little-endian format:

Name   IPv4 Address   
Hexadecimal
 big-endianlittle-endian
local-host127.0.0.10x7F0000010x0100007F
private range192.168.x.x0xC0A800000x0000A8C0
multicast base address224.0.0.00xE00000000x000000E0
multicast last address239.255.255.2550xEFFFFFFF0xFFFFFFEF

One additional fact related to the IPv4 multicast address range, is the official definition declares any IPv4 address with the leading four bits set to 1110 to be a multicast address. 0xE is the hex-digit that maps to binary 1110. This explains why the full multicast range of addresses is from 0xE0000000 to 0xEFFFFFFF, or written in decimal dotted notation as 224.0.0.0 to 239.255.255.255.

Octal

I'm sure octal has uses, but I have never ran into a situation that I have used it.

Actually, there is one place, which is to demonstrate this equality:

Oct 31 = Dec 25

Landmark Values

Binary has landmark values similar to the way decimal does. For example, 1's up to 10's, 10's through 100, then 1000, 1M, 1B ... These values are focused on powers of 10, and arranged in groups that provide a significant range to be meaningful when you round-off or estimate. Becoming proficient with the scale of each binary bit up to 210, which is equal to 1024, or 1K. At this point, we can use the powers of two in multiple contexts; 1) Simple binary counting, 2) measuring data byte lengths.

I have constructed a table below that shows landmark values with the power of 2. On the left I have indicated the name of the unit if we are discussing bits and the size of a single value; for example, 8-bits is a byte. I haven't had the need to explore data values larger than 64-bits yet, but it's possible that some of you have. To the right I have indicated the units of measure used in computers when we discuss sizes. At 1024 is a kilobyte, and 1024 kilobytes is officially a megabyte (not if you're a hard drive manufacturer...). I continued the table up through 280, which is known as a "yotta-byte." Becoming familiar with the landmarks up to 32-bits is probably enough for most people. To the far right I converted the major landmark values into decimal to give you a sense of size and scale for these numbers.

Unit
(bits)
Binary
Exponent
Unit
(bytes)
  Place
Value
  Largest
Value
Bit (b)20Byte (B) 1
21Word 2
22Double-word 4
Nibble23Quad-word 8
24 16
25 32
26 64
Byte (B)27 128255
28 256
29 512
210Kilobyte (KB) 1024
220Megabyte (MB) 102421,048,576
230Gigabyte (GB) 102431,073,741,824
Double-word231 10243·24,294,967,295
232 10243·22
240Terabyte (TB) 10244
250Petabyte (PB) 10245
260Exabyte (EB) 10246
Quad-word263 10246·239,223,372,036,854,775,807
270Zettabyte (ZB) 10247
280Yottabyte (YB)10248

Summary

Looking at a large mass of symbols, such as a memory dump from a computer, can appear to be overwhelming. We do have tools that we use to develop and debug software help us organize and make sense of this data. However, these tools cannot always display this data in formats that are helpful for particular situations. In these cases, understanding the relationships between the numbers and their different representations can be extremely helpful. This is especially true when the data is encoded at irregular offsets. I presented a few common forms that you are likely to encounter when looking at the values stored in computer memory. Hopefully you can use these identity and navigation tips to improve your development and debugging abilities.

Number Base Conversion

general, beginner, math 5 feedbacks »

I have started writing an entry that discusses the value of becoming familiar with the binary (base-2) and hexadecimal (base-16) number systems because they are generally more useful to a programmer than decimal (base-10). My daughter is currently in high-school and she is taking a programming course. One of the things that she is currently learning how to do is count in binary. So I decided to expand my explanation of conversion between number systems as a reference for her and all of those who would like a refresher. The entry following this one will describe how binary and hexadecimal will make you a more effective programmer.

Place Value: coefficient·radixplace

To be able to convert numbers between different numerical bases, it is important to review the concept of place value. Each place, or column, in a number represents a value equal to number system base raised to the power of its place index starting at 0. The official name of the base-value is the radix. For example, consider the first three place values for a number system that has a radix of b.

b2+b1+b0

If we are dealing with decimal (base-10), the radix = 10 and we would have place values of:

102 + 101 + 100

Now we have the 1's column, 10's column, and 100's column:

100 + 10 + 1

The number stored at each column in the number is called the coefficient. To construct the final number, we multiply the coefficient by its place-value and add the results at each place together. Here is the decimal number 237 broken down by place value:

2·b2 +3·b1 +7·b0

2·102 + 3·101 + 7·10

200 + 30 + 7

237

Hopefully decimal form is so natural to you that 237 seems like a single number, rather than the sum of place values that are multiplied by their coefficient.

The Binary Number System

There are 10 types of people, those that understand binary and those that don't

If you are wondering what the eight other types of people are, continue reading.

Binary is a base-2 number system. This means that each column in a binary number represents a value of 2 raised to the power of its place index. A number system requires a number of symbols to represent each place value that is equal to its Base value, and zero is always the first symbol to include in this set. For instance, decimal (base ten) requires ten symbols: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Therefore, binary (base two) only requires two symbols: {0, 1}.

Adding the radix as a subscript after a number is a common notation to indicate the base-value of a number when there is a possibility for confusion. Unless the context specifically indicates a different number system, decimal form is assumed. For example, the value 2 in binary:

102 = 210

If we had used that notation in the joke at the beginning... well, it just wouldn't have been as funny.

Another place that you will see the subscript notation is when you study logarithms:

logax = logbx / logba

I'll leave the details for logarithms for another time.

Counting in Binary

When learning anything new, it can be helpful to map something that you already know to the new topic. For a new number system, counting with both number systems can be a helpful exercise. Counting in all number systems uses the same process:

  1. Start with zero in the least significant column
  2. Count up until you have used all of the symbols in increasing order in the least significant, 1's, column
  3. When the limit is reached, increment the value in the next column, and reset the current column to zero.
  4. If the next column has used all of the symbols, increment the column after that and reset the current column.
  5. Once no further columns reach their limit, return to step 2 to continue counting.

Starting with decimal, if we count up from zero to 9, we get:

0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 ...roll over

We are now at step 3, we have reached the limit, so we increment the next column from an implied 0 to 1, and reset the current column for the result of:

10

Continuing to count, we increment the 1's column and rolling over the successive columns as necessary:

11 - 12 - 13 ... 98 - 99 ... roll over

100

Here is 0-15 in binary. When working with computers, and binary in general, you will typically see zeroes explicitly written for the more significant columns. We require 4 binary digits to represent 15.

Binary  Sum of Columns  Decimal
00000 + 0 + 0 + 00
00010 + 0 + 0 + 11
00100 + 0 + 2 + 02
00110 + 0 + 2 + 13
01000 + 4 + 0 + 04
01010 + 4 + 0 + 15
01100 + 4 + 2 + 06
01110 + 4 + 2 + 17
10008 + 0 + 0 + 08
10018 + 0 + 0 + 19
10108 + 0 + 2 + 010
10118 + 0 + 2 + 111
11008 + 4 + 0 + 012
11018 + 4 + 0 + 113
11108 + 4 + 2 + 014
11118 + 4 + 2 + 115

The Hexadecimal Number System

Hexadecimal is a base-16 number system. Therefore, we will need sixteen symbols to represent the place values. We can start with the ten numbers used in decimal, and we use letters of the alphabet to represent the remaining six symbols. Although letter-case can matter in programming, the letters used in hexadecimal are case-insensitive. Here is a mapping of the hexadecimal values to decimal:

Decimal:  { 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 }
Hexadecimal:  { 0 1 2 3 4 5 6 7 8 9 A B C D E F }

Number Base Conversion

Let's discuss how to convert between number systems with different base values. Specifically, we will describe how to convert from "Decimal to Basex" and "Basex to Decimal".

Decimal to Basex

Here is the algorithm to convert a number in decimal to a different base:

  1. Divide the decimal number by the radix of the target base
  2. The remainder from step 1 becomes the value for the current column.
  3. Use the quotient (answer) from step 1 as the decimal value to calculate the next column.
  4. Return to step 1 and repeat until the quotient is zero.

Let's return to the number 237 and convert it to a binary number:

decimal:237
radix:2
DecimalRadix  Quotient  Remainder
237/2  118  1 (20)
118/2  59  0 (21)
59/2  29  1 (22)
29/2  14  1 (23)
14/2  7  0 (24)
7/2  3  1 (25)
3/2  1  1 (26)
1/2  0  1 (27)
binary:11101101

Here is 237 converted to a hexadecimal number:

decimal:237
radix:16
DecimalRadix  Quotient  Remainder
237/16  14  D16 (13) (160)
14/16  0  E16 (14) (161)
hexadecimal:ED16

A common notation to represent hexadecimal when dealing with computers and in programming languages themselves, is to prepend an 'x' in front of the number like so: xED.

Here is one more decimal-to-hexadecimal example:

decimal:3,134,243,038
radix:16
DecimalRadix  Quotient  Remainder
3,134,243,038/16  195,890,189  E (14) (160)
195,890,189/16  12,243,136  D (13) (161)
12,243,136/16  765,196  0 ( 0) (162)
765,196/16  47,824  C (12) (163)
47,824/16  2,989  0 ( 0) (164)
2,989/16  186  D (13) (165)
186/16  11  A (10) (166)
11/16  0  B (11) (167)
hexadecimal:xBAD0C0DE

Basex to Decimal

Actually, I have already demonstrated how to convert a number from a base different than ten, into decimal. Once again, here is the complete formula, where cx represents the coefficients at each place-column.

cn·bn + ... + c2·b2 + c1·b1 + c0·b0

As an example, let's convert the binary answer back into decimal:

1·27 + 1·26 + 1·25 + 0·24 + 1·23 + 1·22 + 0·21 + 1·20

1·128 + 1·64 + 1·32 + 0·16 + 1·8 + 1·4 + 0·2 + 1·1

128 + 64 + 32 + 8 + 4 + 1

237

Basex to Basey

Is it possible to convert a number from Basex directly to Basey without converting to decimal (base-10) first?

Yes, however, you will need to perform all of your math operations in either Basex or Basey. The algorithms that I have presented are performed with base-10 since that is the number system most people are familiar with.

Demonstration

Here is a short demonstration program to convert a decimal number into a value of any numeral base between 2-36.

Why between those two ranges?

Try to imagine how a base-1 number system would work with only the symbol {0} to work with. Alternatively, we can combine the numerical set: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} with the alphabetic set: {A, B, C ... X, Y, Z} to create a set of 36 symbols.

C++

// The set of possible symbols for representing other bases.
const char symbols[] = {'0', '1', '2', '3', '4', '5',
                        '6', '7', '8', '9', 'A', 'B',
                        'C', 'D', 'E', 'F', 'G', 'H',
                        'I', 'J', 'K', 'L', 'M', 'N',
                        'O', 'P', 'Q', 'R', 'S', 'T',
                        'U', 'V', 'W', 'X', 'Y', 'Z'};

Here is the base-conversion algorithm from above written in C++.

C++

void ConvertToBase( const unsigned long decimal,
                    const unsigned long radix)
{
  unsigned long remainder[32] = {0};
  unsigned long quotient = decimal;
  unsigned char place    = 0;
 
  while (0 != quotient)
  {
    unsigned long value = quotient;
    remainder[place] = value % radix;
    quotient = (value - remainder[place]) / radix;
 
    place++;
  }
 
  cout << decimal << " in base " << radix << " is ";
 
  for (unsigned char index = 1; index <= place; index++)
  {
    cout << symbols[remainder[place - index]];
  }
}

The values are from the examples above. You can modify the values that are used with the function calls to ConvertToBase in the program below:

C++

int main(int argc, char* argv[])
{
  ConvertToBase(237,        2);
  ConvertToBase(237,        10);
  ConvertToBase(237,        16);
  ConvertToBase(3134243038, 16);
  ConvertToBase(3134243038, 36);
 
  return 0;
}
Run this code

Output:

237 in base 2 is 11101101
237 in base 10 is 237
237 in base 16 is ED
3134243038 in base 16 is BAD0C0DE
3134243038 in base 36 is 1FU1PEM

Summary

Ever since you learned to count in decimal you have been using exponents with a base-10; it's just that no one ever made a big deal of this fact. To use a different numeral system, such as binary (base-2) or hexadecimal (base-16), you simply need to determine a set of symbols to represent the values and you can use the same counting rules that you use in base-10, except that you have a different number of symbols. Converting between any number system is possible, however, it is simplest to convert to decimal first if you want to continue to use base-10 arithmetic operations.

Rvalue References Applied

C++, Alchemy, engineering Send feedback »

A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called, Network Alchemy[^]. Alchemy performs automated data serialization with compile-time reflection. It is written in C++ using template meta-programming.

My previous entry was a condensed overview on rvalue references. I described the differences between value expressions and types. I also summarized as much wisdom as I could collect regarding how to effectively use move semantics and perfect-forwarding. After I completed the essay, I was eager to integrate move semantics for my serialization objects in Alchemy. This entry is a journal of my experience optimizing my library with rvalue references.

Motivation

My initial motivation for learning and writing about rvalue references was because I was finding it difficult to improve the performance of Alchemy with them. In fact, I tended to make the performance of my library much worse that without move semantics.

The first thing this says to me, is that the compilers are doing a pretty damn good job of optimizing my code without the move constructors. The next conclusion was shrouded in mystery.

"Am I using this correctly?"
"... I followed the advice in Effective Modern C++, why is it getting worse?"
"... Maybe I have done so well that all of the copied have been elided by the compiler?!"

After a few fruitless attempts to improve the performance in the evenings, I decided I better dig in and truly understand what is happening before I revisit Alchemy and move semantics. I learned a lot simply trying to write a few compelling programs to demonstrate the concepts.

Now it was time to return back to integrate move semantics into my library.

Measuring Success

I have developed a benchmark application that is included as part of the Alchemy source on GitHub. As part of my initial tuning of Alchemy I was able to achieve a modicum of success in Classic C++ (C++98). If you are unfamiliar with the details of the benchmark application, you can learn more from this previous post: Alchemy: Benchmarks[^].

Here is a summary of the machine and compiler settings displayed in this post:

  • Machine:
    • Intel Core i7-4790L @ 4.00GHz
    • 16 GB RAM
    • Windows 8.1
  • Compiler - 64-bit Visual Studio 2013:
    • Maximize Speed: /O2
    • Enable Intrinsic (Yes): /Oi
    • Favor Speed: /Ot
    • Omit Frame Pointers (Yes): /Oy
    • Whole Program Optimization (Yes): /GL
  • Data set size: 512 MB

The numbers below, reflect what I started with when began to add the move operations to Hg.

Test Control Hg Diff Percent
Basic: 0.4133 s 0.3193 s -0.0940 s 22.74%
Packed: 0.3959 s 0.3519 s -0.0440 s 11.12%
Unaligned: 0.4391 s 0.4425 s 0.0034 s -0.773%
Complex: 0.7485 s 0.7654 s 0.0169 s -2.261%
Array: 0.5141 s 0.1409 s -0.3732 s 72.59%
Total: 2.511 s 2.0574 s -0.3732 s 18.06%

The thing that was most challenging to improve up to this point was the speed of the Complex test. This test uses a nested message structure, that contains an array and all of the message structures from the other tests. I found many temporary copies, which I eliminated to reach this point. However, the relatively low performance number of this test compared to the others indicated that I most likely had other temporary copies the were lurking within the code.

What did I discover?

The very first thing that I did when I returned to Alchemy, was search every class for a destructor, copy constructor, or assignment operator. The new rules the compiler uses to automatically generate special member-functions will halt the use of move semantics dead in its tracks if you are not paying attention.

There are very few destructors in Alchemy. The majority of the data is transient, and actually managed by the caller. There is the Hg::MsgBuffer, which uses a std::vector to store serialized message data. This class has a destructor. All of the others manage a single field of data that is composited within the class.

On the other hand, Alchemy has plenty of copy constructors and assignment operators. Alchemy provides value semantics for all of the message sub-fields. These fields behave much like the fundamental types that they encapsulate. To provide a natural syntax, there are generally two types of assignment operators in each Datum and Proxy class. The first is the assignment operator for the object, and the second accepts the value_type the object represents.

I discovered places where I missed supplying a move constructor for all of the value types that had some form of copy operation. I also found a few places where I supplied copy and move constructors for sub-types of a class, but not for the class itself.

C++

//  ******************************************************
/// Copy Constructor
Message(const message_type&amp; rhs)
{
  *static_cast< message_type *>(this) = rhs;
}
 
//  ******************************************************
/// Move Constructor
Message(message_type&amp;&amp; rhs)
  : base_type(std::move(rhs))
{ }
 
//  ******************************************************
//  Discovered that I was missing these versions:
//  Message(const Message&amp; rhs)
//  Message(Message&amp;&amp; rhs)
//

Results

These changes had very little effect on the results. Setting break points within my code showed that the move constructors were now being called. So why didn't the results change? I kept searching.

FieldTypes

I started to scrutinize the inheritance hierarchy of the Hg::Message class and the Hg::Datum types that it contained. I verified that I was calling the proper base class operations and moving the rvalue expressions into these function calls.

Then I reached the end of the inheritance hierarchy, which existed a very simple class, FieldTypes. FieldTypes can be found in ./Alchemy/Hg/Datum/basic_datum.h.

This class provides a generic structure to provide a common way to hold and access the actual data storage created for each message datum. The TraitT type allows for tag-dispatching for special cases such as a std::vector or a nested message type.

C++

template< typename FieldT,
          typename TraitT =
            typename Hg::detail::deduce_type_trait<FieldT>::type
        >
struct FieldTypes
{
  using index_type = FieldT;
  using value_type = typename field_data_t<index_type>::value_type;
 
  value_type& reference()
  {
    return m_data;
  }
 
  const value_type& data() const
  {
    return m_data;
  }
 
protected:
  value_type            m_data;        
};

I took a look at the specialization that I had created for the nested message type. Here is the declaration of that class:

C++

template< typename FieldT >
struct FieldTypes <FieldT, nested_trait>
  : public field_data_t<FieldT>::value_type
{
  using index_type = FieldT;
  using value_type = typename field_data_t<index_type>::value_type;
 
// ...
};

There is one interesting line that caught my attention, and helped me narrow down the cause of the issue:

C++

: public field_data_t<FieldT>::value_type

The reason this is interesting, is because I was searching for how the storage was represented and accessed in a nested type. In this case, rather than containing a member data field, the data is provided by derivation. For a nested type, the base class is the Alchemy format definition. This class then contains a ProxyDatum, which derives from a Datum, which derives from a FieldType and brings us back to the level we are currently inspecting.

It's not the nested type after all...

After looking at this, it occurred to me that the default generated move operations were clearly being used by the compiler. I have not added any code that would prevent this in the message definitions and nested fields. However, that did not prevent entire sets of fields from being moved and copied.

I had created plenty of value constructors and assignment operators to accept value types, but I had not made any attempt to optimize the movement of these fields within the most basic structures that managed these data fields. So I added copy and move constructor implementations to the FieldTypes class to allow every possible conversion to be considered when moving the fundamental, and container data types.

This brought me my first big gain in performance.

... or maybe it is the nested type?

Unfortunately, there still seemed to be a major problem with the nested types.

I followed the execution path of the benchmark program in the debugger. I was looking for any clues where a copy was being performed rather than a move. Then I located the next big gain in performance.

In the byte-order processing code, I discovered a difference in processing for the nested field-types compared to all of the other types of data. The implementation constructs a Hg::Message object from that raw Alchemy message format that is used to represent the nested field type. This is repackaged in order to recursively use the same processing logic that converts the other field-types performed by convert_byte_order.

C++

struct ConvertEndianess<T, StorageT, nested_trait>
{
  template <typename NestedValueT>
  void operator()(const NestedValueT &input,
                        NestedValueT &output)
  {
    // Construct a shallow message wrapper around the nested data.
    from_type  from(input);
    to_type    to;
 
    // Pass this message to be byte-order swapped.
    output = convert_byte_order<from_type,
                                to_order>(from, to).values();
  }
};

You can see that I pass the output field as an input parameter to this function call. In order to improve the speed of this function I needed to figure out how to more efficiently construct this temporary message, or alter the processing logic to accept this nested field-type.

Moving local resources... don't do it

This was my first attempt to improve the performance of this temporary instance of the message looked like this:

C++

template <typename NestedValueT>
void operator()(const NestedValueT &input,
                      NestedValueT &output)
{
  // Pass this message to be byte-order swapped.
  output = convert_byte_order<from_type,
                              to_order>(from_type(input), to_type()).values();
}

That helped, but not like I had hoped. Then I made this adjustment:

C++

output = std::move(convert_byte_order(/*params*/));

That was what I needed!

Then I ran the unit-tests, and a number of them failed. The ones that contained a vector actually crashed. The output of byte-order conversion was showing me that the data was being destroyed. I followed the logic in the debugger and discovered my problem.

I am creating the instance of to_type locally. This is passed into the conversion function and all is well. Then the data is moved into output and what appears is garbage. I was confused at first. Then I watched the addresses of all of the data items.

to_type is being created on the stack, and that portion of the stack is destroyed before it has a chance to be moved to output. I tried many different variations of the this approach. My final conclusion is that I would not be able to achieve what I was after without moving the creation of the to_type object outside of the function call.

However, I could not do that because that would change the usage of the library. I want to keep the interaction as simple as possible for the user. Therefore, I reworked the structure of the code just a little and this is the final implementation:

C++

template <typename NestedValueT>
void operator()(const NestedValueT &input,
                      NestedValueT &output)
{
    // Pass this message to be byte-order swapped.
    to_type to(output);
    convert_byte_order<from_type, to_order>(from_type(input),
                                            to);
    output = std::move(to);
}

convert_byte_order no longer returns a value. That is because the value it would return is the to_type object that is created, and we still have access to that instance because it is passed-by-reference. Therefore, when I am done with it, it is moved into the output parameter, which also is passed-by-reference into the current function.

Final Results

Here are where the current performance results stand based on the adjustments that I made to optimize with move semantics:

Test Control Hg Diff Percent
Basic: 0.4133 s 0.2423 s -0.1710 s 41.37%
Packed: 0.3959 s 0.3403 s -0.0556 s 14.05%
Unaligned: 0.4391 s 0.2295 s -0.2096 s 47.74%
Complex: 0.7485 s 0.5573 s -0.1912 s 25.54%
Array: 0.5141 s 0.1376 s -0.3765 s 73.23%
Total: 2.5109 s 1.5071 s -1.0039 s 39.98%

Here is a table that compares the final times for each test with the previous implementation of Alchemy, as well as the results after adding move semantics. I think you will agree that the small amount of effort required to add move semantics is well worth spending.

Test Alchemy w/ Move
Semantics
Diff Overall
Change
Basic: 0.3193 s 0.2423 s -0.0770 s 24%
Packed: 0.3519 s 0.3403 s -0.0116 s 3%
Unaligned: 0.4425 s 0.2295 s -0.2130 s 48%
Complex: 0.7654 s 0.5573 s -0.2081 s 27%
Array: 0.1409 s 0.1376 s -0.0033 s 2%
Total: 2.0574 s 1.5071 s -0.5503 s 27%

Summary

So far I have learned that move semantics is a somewhat finicky feature that is new with Modern C++. However, the rules are not difficult to learn. After an afternoon's worth of effort applying the principles of rvalue references to Alchemy I was able to realize a 25% increase in performance. This was time well spent.

C++: Rvalue References

CodeProject, C++ 2 feedbacks »

Rvalue references were introduced with C++11, and they are used to implement move semantics and perfect-forwarding. Both of these techniques are ways to eliminate copies of data parameters for efficiency. There is much confusion around this new feature that uses the && operator, because its meaning is often based on the context it is used. It is important to understand the subtleties around rvalue references in order for them to be effective. This entry will teach you how to use the rvalue reference with plenty of live-demonstrations.

Move it!

When I first learned of move semantics, I expected that this feature would be more or less automatic, much like the copy constructor. As it turns out, there are common programming practices that will actually hinder the compiler's ability to generate and use move operations. The concept of move semantics and perfect-forwarding are very simple. However, without understanding a few of the nuances of rvalue references, these idioms will seem fickle when you try to put them to use.

It is important to have a basic understanding of the fundamental components of C++ that have shaped how this new feature was added to the language, and why the explicit steps are required. Therefore, let's start with some background information and vocabulary, then work our way to the main topic.

Lvalue and Rvalue

Syntax expressions are evaluated and assigned both a type and a value category. We are concerned with the differences between the different value categories as we try to understand rvalue references. Specifically we are interested in the lvalue and rvalue categories.These terms are derived from the arguments on each side of the assignment operator. 'L' for left, to which values are assigned, and 'R' for right that contains the value to be assigned. However, this is only a simplification of their definition.

Another way to look at these terms is how they manifest in the final program. Lvalues are expressions that identify non-temporary objects. Essentially, they have addressable storage for loading and storing data. An rvalue is an expression that refers to a temporary object, or a value that is not associated with any object.

An lvalue is not necessarily modifiable. A good example is a constant expression qualified with the const keyword. After its initialization, the expression has storage that can be addressed, but the value cannot be modified. Therefore, lvalues are further distinguished by modifiable lvalues and non-modifiable lvalues.

Here is a list of items that are lvalue expressions:

  • Non-modifiable:
    • String literals
    • Constant expressions
  • Modifiable:
    • The name of a variable
    • Function calls that return lvalue references
    • Pre-increment and pre-decrement operators
    • Dereference and assignments
    • Expressions cast to lvalue reference type

Here is a list of items that are rvalue expressions:

  • Literal values: true, 27ul, 3.14 (except string literals)
  • Function call expressions that do not return a reference
  • Expressions composed from arithmetic, relational, logical and bit-wise operators
  • The post-fix increment and decrement operators
  • Cast expression to any type other than a reference type
  • Lambda expressions

Does it have a name?

There is a simple way that can help you determine if you are dealing with an lvalue or an rvalue.

Can you refer to the expression by name?

A value that can be referenced by name is an lvalue. This is not an absolute, but it is a good rule of thumb to help you generally reason about your data values. An example of an exception is a member-function. Also, this does not cover all expressions that are considered lvalues. Examples of lvalue expressions that do not have names are string literals and function call expressions that return an lvalue reference.

xvalues, prvalues, glvalues...

In the cursory overview of expression values, I have left out the description of some of the exceptions to the rules and sub-categories for lvalues and rvalues. These other categories that capture the remaining situations. However, going even deeper into the nuances digresses from the original topic, and will only add more confusion. Therefore I will simply leave you with the knowledge that these other categories exist, and a reference of where you can learn more about them. Value categories at cppreference.com[^]

& (lvalue reference)

An lvalue reference is what we generally call a reference. It is also important to note that it is a type. This is in contrast to value categories, which I described in the previous section. Here is a brief review of the concepts associated with references:

  • A reference is an alias to an object or a function that already exists
  • A reference must be initialized when it is defined
  • It cannot be re-seated (reassigned) after it is created
  • It is not legal to create arrays, pointers or references to references (except with templates)

The most common use for an lvalue reference is to pass parameters by-reference in function calls.

C++

void LogMessage(std::string const &msg)
{
  // msg is an alias for the input parameter at the call site.
  // Therefore, a copy of the string is avoided.
}

I prefer to use references over pointers, except when there is a possibility to receive an empty pointer. The logic becomes much simpler when writing safe production-quality code. The need to verify pointer input parameters is eliminated. In some cases, after I verify a pointer parameter, I will dereference it and assign it to a reference. A similar situation is when I perform some type of cast on a pointer I usually dereference and assign it to a reference of the new type.

C++

// I know what you're thinking...
// I interface with a lot of C and legacy C++
int process_state(
  const SystemInputs *p_inputs,
  void* p_context
)
{
  if ( !p_inputs
    || !p_context)
  {
    return k_error_invalid_parameter;
  }
 
  SystemInputs& input = *p_inputs;
  SystemState&  state = *reinterpret_cast< SystemState* >(p_context);
 
  // ...
}

If a function returns an lvalue reference, then the function call expression will be an lvalue expression. This is use of references is used to implement the at and operator[] member-functions of std:: vector.

C++

// Where reference is an alias for T&
reference operator[]( size_type pos )
{
  // Return the requested element
  // from the heap-allocated data array
  return p_data[pos];
}

Dangling References

Although references do make code easier to work and reason with, they are not perfect. Similar to a pointer, the possibility still exists for the object that was used to initialize a reference is destroyed before the reference is destroyed. This leaves you with a dangling reference, which leaves your code executing in the unspecified behavior territory.

The stack is one of the safest places to create a reference. That is with the assumption that the new reference will go out of scope before or at the same time as the object used to initialize the reference.

This is the reason why you do not return a reference from a function call, in which you return a locally created variable. Either your object was created on the stack and will be destroyed after the return statement is evaluated, or your object was dynamically allocated, which you would have no way to free the memory when you were done.

C++

std::string&amp; FormatError(int errCode)
{
  std::string errorText;
  // Populate the string with the proper error message.
 
  return errorText;
  // errorText is now destroyed.
  // The caller receives a dangling reference.
}

&& (rvalue reference)

Prior to C++11, it was not possible to declare an rvalue as a reference. The only place it was legal to declare a reference was with an lvalue expression. C++11 introduces the && operator, which now allows references to be defined for rvalue expressions. An rvalue reference is a type.

Remember that one type of rvalue is an expression that refers to a temporary object. An rvalue reference is used to extend the lifetime of a temporary object. The most compelling place to apply rvalue references are with object construction and assignment. This allows compilers to replace expensive copy operations with less expensive moves. The formal name given to this feature is move semantics. Another exciting use is applied to template function parameters, in which the technique known as perfect-forwarding is used.

In overload resolution for function calls, the rvalue reference type is given precedence of lvalue reference.

Move Semantics

Allows you to control the semantics of moving your user-defined types. It is actually possible to accomplish this with classic C++. However, you would have to forego the copy constructor. With the rvalue reference, it is now possible to provide both a move constructor and a copy constructor within the same object.

Perfect-Forwarding

Makes it possible to create function templates that are able to pass their arguments to other functions in a way that allows the target function to receive the exact same objects.

[Intermission]

I presented that long and detail-oriented introduction up front so you would have context with most of the details to understand why this movement isn't always automatic. Also, hopefully I have presented the details in a memorable order to help you remember the proper actions required for each situation. We will continue to introduce details gradually, and I will summarize with a set of rules to lead you in a successful direction.

Reference Collapsing

Reference collapsing is part of the type deduction rules used for function templates. The rules are applied based upon the context of the function call. The type of argument passed to the specific instantiation is considered when determining the type for the final function call. This is necessary to protect against unintentional errors from occurring where lvalues and rvalues are concerned.

I mentioned earlier in the section regarding references that it was not legal to create a reference to a reference, with the exception of templates. It's time to demonstrate what I mean:

C++

int   value      = 0;     // OK: Fundamental type
int&  ref        = value; // OK: Reference to type
int& &ref_to_ref = ref;   // Error: Reference to reference not allowed
 
// Now we have rvalue references
int&& rvalue_bad = ref;   // Error: Rvalue reference cannot bind to lvalue
                          // Remember, if it has a name, it is an lvalue
int&& rvalue_ref = 100;   // OK: A literal value is an rvalue

Templates follow a set of type-deduction rules to determine what type should be assigned to each parameterized value of the template. Scott Meyers provides a very thorough description of these rules in Item 1 of "Effective Modern C++". Suffice to say, the important rules to note are:

  • If an argument is a reference, the reference is not considered during type deduction
  • Lvalue arguments are given special consideration in certain circumstances (this is where reference collapsing applies)

The rules of reference collapsing

The rules are actually very simple. The rules have the same output of the AND truth table; where an lvalue reference, &, is 0 and an rvalue reference, &&, is 1. I think it is subtly fitting, given the other meaning of the && operator. This should make it easier to remember the rules as well.

  Reference Collapsing Truth Table
  Truth Table: Reference Collapsing Rules - & := 0, && := 1

New Rules for compiler generated functions

Hopefully you are well aware that the compiler may generate four special member-functions for a class as needed with classic C++. If not, it's never too late to learn. The four functions are:

  • Default Constructor (If no other constructor has been defined)
  • Destructor
  • Copy Constructor
  • (Copy) Assignment Operator

Two additional functions have been added to this set to properly manage the new concept of move operations.

  • Move Constructor
  • Move Assignment Operator

The default behavior of a generated move function is similar to the copy-based counterparts. A move operation for each member of the object. However, the compiler is much more conservative about automatically choosing to generate these new functions when compared to the others. The primary reason is the notion that if the default move behavior is not sufficient that you elect to implement your own, then the default copy behavior most likely will not be sufficient either. Therefore it will not automatically generate the copy-based functions when you implement either of the move functions.

Furthermore, if you implement only one of the move operations, it will not automatically implement the other move operation for the same logic. In fact, no compiler generated move operations will be created if the user-defined type has implemented its own destructor or copy operation. When move operations are not defined, the copy operations will be used instead.

If you are in the habit or even feel the compulsion to alwaysdefine a destructor, even if it is an empty destructor, you may want to try to change that behavior. There is now actually a better alternative. Similar to how you can delete the compiler generated defaults, you can also explicitly specify that you would like to use the defaults. The syntax is the same as delete, except you use the default keyword.

C++

class UserType
{
public:
// This declaration will not preclude
// a user-type from receiving a
// compiler-generated move constructor.
~UserType() = default;
};

Specifying default, will also allow you to continue to use the compilers copy operations even when you implement your own move operations. If you would like to read a full account of the rules and reasoning for changes, refer to Item 17 in "Effective Modern C++".

std::move("it!");

std::move is a new function has been added to the Standard Library and it can be found in the <utility> header. std::move does not add any actual executable code to your program because it is implemented as a single cast operation. Yet this function is very important because it serves two purposes:

  1. Explicitly communicates your intentions to move an object
  2. Provides hint (enables actually) the compiler to apply move semantics

Here is the implementation of std::move:

C++

template< class T >
typename std::remove_reference<T>::type&& move(T&& t)
{
  return static_cast<typename std::remove_reference<T>::type&&>(t);
}

This function is a convenient wrapper around a cast that will unconditionally convert rvalue references to rvalue expressions when passing them to other functions. This makes them capable of participating in move operations. std::move is the explicit nudge that you supply to the compiler when you want to perform a move assignment rather than a copy assignment.

It is necessary to use std::move inside of a move constructor, because all of your values in the rhs object that you will move from are lvalues. As I mentioned, std::move unconditionally converts these lvalues into rvalue references. This is the only way the compiler would be able to differentiate between a move assignment and a copy assignment in this context.

The only operations that are valid to perform on an argument that has been supplied to std::move, is a call to its destructor, or to assign a new value to it. Therefore, it is best to only use std::move on the last use of its input for the current scope.

Extremely Important!

If your class is a derived class, and implements a move operation, it is very important that you use std::move on the parameters that you pass to the base class. Otherwise the copy operations will be called in the base implementation.

Why?

Because the input parameters to your move operation are lvalue expressions. Maybe you are objecting with "no they're not! They are rvalue references!" The parameters are rvalue references, however, your arguments have been given a name for you to refer to. That makes them lvalue expressions, which refer to rvalue references.

The bottom line is that calling a base class implementation requires the same attention that is required for all other move operations in this context. Just because you happen to be in a move operation for your derived class, does not mean that the compiler can tell that it needs to call the same move operation for the base class. In fact, you may not want it to call the move operation. This now allows you to choose which version is called.

Move operation implementations

We now have enough knowledge to be able to constructively apply the principles of move semantics. Let's apply them to implement a move constructor for an object.

The functions below implement the copy operations for a class called ComplexData that is derived from a base class called BasicData.

Derived Move Constructor

C++

ComplexData(ComplexData&& rhs)
  : BasicData(std::move(rhs))
{
  // Move operations on ComplexData data members
  complex_info = std::move(rhs.complex_info);
}

Derived Move Assignment Operator

C++

ComplexData& operator=(ComplexData&& rhs)
{
  BasicData::operator=(std::move(rhs));
 
  // Move operations on ComplexData data members
  complex_info = std::move(rhs.complex_info);
 
  return *this;
}

Observe move operations

Number

The class used in this demonstration is called Number. It implements each of the special member-functions of the class. It also provides a way to set and get the value of the object. This example lets you observe when move operations are performed versus copy operations.

The implementation is very simple only holding a single data member int m_value. I do not use the call to std::move inside of the move operations because it is not necessary. Similarly, if we had allocated pointers and were moving them between two objects, we would copy the pointer to the destination class, and set the source class pointer to nullptr. I will set the number to -1 in this version to differentiate an invalid state from 0.

Number move assignment operator

C++

Number& operator=(Number&& rhs)
{
  cout << "Move  Assignment Operator\n";
  m_value = rhs.m_value;
  rhs.m_value = -1;
  return *this;
}

Move Example

C++

int main(int argc, char* argv[])
{
  std::cout << "Construct three Numbers:\n";
  Number value(100);
  Number copied(value);
  Number moved(std::move(value));
 
  std::cout << "\nvalue:  " << value
            << "\ncopied: " << copied
            << "\nmoved:  " << moved;
  std::cout << "\n\nCopy and move:\n";
  value  = 202;
  moved = std::move(value);
  copied = value;
  std::cout << "\nvalue:  " << value
            << "\nmoved:  " << moved
            << "\ncopied: " << copied;
}
Run this code

Output:

Construct three Numbers:
Value Constructor
Copy  Constructor
Move  Constructor

value:  -1
copied: 100
moved:  100

Copy and move:
Value Assignment Operator
Move  Assignment Operator
Copy  Assignment Operator

value:  -1
moved:  202
copied: -1

std::forward<perfect>("it!");

As the title of this section implies, there is a new function in the Standard Library called std::forward. However, it is important to understand why it exists, because it is designed to be used in a special situation. The situation is when you have a type-deduced function argument that you would like to move, also called forward, as the return value, or to another sub-routine.

In classic C++, the way to move function arguments through function calls is by using call-by-reference. This works, but it is inconvenient because you must make a decision on trade-offs. You either choose to give up flexibility on the type of arguments that can be used to call your function, or you must provide overloads to expand the range of argument types that can be used with your function call.

Can be called by lvalues, however, rvalues are excluded:

C++

template< typename T >
T& process(T& param);
 
int val = 5;
process(val);       // OK: lvalue
process(10);        // Error: Initial value to ref of
                    //        non-const must be lvalue
process(val + val); // Error: Initial value to ref of
                    //        non-const must be lvalue

Now supports rvalues, however, move semantics are no longer possible:

C++

template< typename T >
T& process(T const& param);
 
int val = 5;
process(val);       // OK: lvalue
process(10);        // OK: Temporary object is constructed
process(val + val); // OK: Temporary object is constructed
 
                    // However, none of these instances
                    // can participate in move operations.

Furthermore, if the two solutions above are combined with overloads and the function in question contains multiple arguments the number of overloads required to capture all of the possible states of arguments causes an exponential explosion of overloads which is not scalable. With the addition of the rvalue reference in Modern C++, it is possible to compact this solution back into a single function, and rvalues will remain as viable candidates for move operations.

Where does std::forward apply in this situation?

Since lvalue expressions are already capable of passing through function calls in this situation, we actually want to avoid applying move semantics on these arguments because we could cause unexpected side-effects, such as moving local parameters.

But, in order to make rvalues capable of using the move operations, we need to indicate this to the compiler with something like std::move. std::forward provides a conditional cast to an rvalue reference, only to rvalue types. Once again the rules of reference collapsing are used to build this construct, except in a slightly different way.

Implementation of std::forward:

C++

template< class T >
T&& forward( typename std::remove_reference<T>::type& t )
{
  return static_cast<T&&>(t);
}

What is a forwarding/universal reference?

This is a special instance of an rvalue reference that is an instantiation of a function template parameter. Forwarding reference is the name that I have read in some standards proposal documents, and universal reference is the name that Scott Meyers used first in "Effective Modern C++".

It is important to identify this type when the situation occurs. Because the type deduction rules for this particular reference allows the type to become an lvalue reference, if an lvalue expression was used to initialize the template parameter. Remember the important type-deduction rules I pointed out above? References are not usually considered as part of type-deduction. This is the one exception.

If an rvalue expression is used to initialize the template parameter, then the type becomes an rvalue reference, which will make it qualify for move operations. Therefore, std::forward should be used to inform the compiler that you want a move operation to be performed on this type if an rvalue was used to initialize the parameter.

Observe forward vs. move

This next program allows you to observe the side-effects that could occur by using std::move when std::forward is most likely what was intended. This program is adapted from Item 25 in "Effective Modern C++". I have expanded the program to provide two methods to set the name of the test class, Item.

Item class

The class has a single value called name. name is set equal to "no name" in the default constructor. The name can be set in two different ways:

  1. Item::fwd_name:

    Sets the name value with an input rvalue reference string, which is forwarded to the internal storage of name in the class.

    C++

    template< typename T >
      void Item::fwd_name(T&& n)
      {
        m_name = std::forward<T>(n);
      }
  2. Item::move_name:

    Sets the name value with an input rvalue reference string, which is moved to the internal storage of name in the class.

    C++

    template< typename T >
      void Item::move_name(T&& n)
      {
        m_name = std::move(n);
      }

Notice that both of these functions are function templates. This provides the differentiating factor that makes std::forward necessary, moving a type-deduced argument.

Forward Example

C++

int main(int argc, char* argv[])
{
  std::cout << "Forward 'only' moves rvalues:\n\n";
  string fwd("Forward Text");
 
  Item fwd_item;
  fwd_item.fwd_name(fwd);
  cout << "fwd_name:   " << fwd_item.name() << "\n";
  cout << "fwd(local): " << fwd << "\n";
 
  std::cout &lt;&lt; "\nMove 'always' moves:\n\n";
  string mv("Move Text");
 
  Item move_item;
  move_item.move_name(mv);
  cout << "move_name:  " << move_item.name() << "\n";
  cout << "mv(local):  " << mv << "\n";
 
  return 0;
}
Run this code

Output:

Forward 'only' moves rvalues:

fwd_name:   Forward Text
fwd(local): Forward Text

Move 'always' moves:

move_name:  Move Text
mv(local):  no name

Move semantics and exceptions

A new keyword has been added to Modern C++ regarding exception handling, the keyword is noexcept. noexcept can use this information provided by the programmer to enable certain optimizations for non-throwing functions. One potential optimization is to not generate stack unwinding logic for the functions that specify noexcept.

It is possible to test if a function specifies noexcept by using the noexcept operator. This may be necessary if you can conditionally provide an optimization, but only if the functions you want to call are specified as non-throwing. The move operations in std::vector and std::swap are two good examples of functions that are conditionally optimized based on a non-throwing specification.

Finally, there is a function called std::move_if_noexcept that will conditionally obtain an rvalue reference if the move constructor of the type to be moved does not throw. Refer to Item 14 in "Effective Modern C++" for a thorough description of noexcept.

What you need to know

This is the section that I promised at the beginning of the essay. The most important concepts that you need to take away and know. Not remember, know, if you are going to employ move semantics and perfect-forwarding when you practice Modern C++.

  • Remember the difference between a value category and a type:

    Lvalue and rvalue are expressions that identify certain categories of values. Lvalue reference and rvalue reference are both types; because the names are so similar, it is easy to confuse the two.

  • Rvalues are the only expression types valid for move operations:

    std::move and std::forward explicitly attempt to convert arguments to rvalue references. This is performed by using the rvalue reference (type) in an rvalue expression, which is eligible for move operations.

  • If it has a name, then it is an lvalue (expression):

    This is true even if the type of the lvalue is an rvalue reference. Refer to the previous guideline for the ramifications.

  • Use std::move to explicitly request a move operation:

    std::move unconditionally makes the argument eligible for a move operation. I say request, because a move is not always possible, and the compiler may still elect to use a copy operation.

    After you use a value as an input to std::move, it is valid to only call an arguments destructor, or assign a new value to it.

  • Use std::forward to move type-deduced arguments in templates:

    std::forward conditionally casts its argument to an rvalue reference and is only required to be used in this context. This is important because an lvalue that is inadvertently moved could have unintended and unsafe side-effects, such as moving local values.

  • There are additional rules for the special compiler generated functions for a class:

    The move constructor and move assignment operator now can be generated automatically by the compiler if your class does not implement them. However, there are even stricter rules that dictate when these functions can be generated. Primarily, if you implement or delete any of the following functions, the compiler will not generate any unimplemented move operations:

    • Destructor
    • Copy Constructor
    • Copy Assignment Operator
    • Move Constructor
    • Move Assignment Operator
  • Strive to create exception free move operations and specify noexcept for them:

    Your move operations are more likely to be selected by the compiler if you can provide an exception free move operation. noexcept tells the compiler your function will does not require exceptions. Therefore, it can optimize the generated code further to not worry about maintaining code to unwind the stack.

Summary

Rvalue references were added to Modern C++ to help solve the problem of eliminating unnecessary temporary copies of expensive objects when possible. The addition of this value category for expressions has made two new idioms possible with the language.

Many restrictions had to be put in place when the compiler could safely perform the operations automatically. Moreover, it is important to use the functions added to the Standard Library because they express intent and provide the extra hints the compiler needs to utilize these operations safely.

Here is one last piece of advice as you look for locations to use move semantics. Do not try to outwit the compiler because it is already capable of some amazing optimizations such as the Return Value Optimization (RVO) for local objects returned by value. Practice the knowledge in the previous section and your C++ programs will keep that svelte contour envied by all other languages.


References

Effective Modern C++ by Scott Meyers (O'Reilly). Copyright 2015 Scott Meyers, 978-1-491-90399-1

C++ Rvalue References Explained by Thomas Becker, http://thbecker.net/articles/rvalue_references/section_01.html, March 2013.

A Brief Introduction to Rvalue References by Howard E. Hinnant, Bjarne Stroupstrup, and Bronek Kozicki, http://www.artima.com/cppsource/rvalue.html, March 10, 2008.

Value Categories at CppReference.com, http://en.cppreference.com/w/cpp/language/value_category, June 2015

Why Does a CS Degree Require So Much Math?

general, CodeProject, engineering Send feedback »

Over the years I have heard this question or criticism many times:

Why is so much math required for a computer science degree?

I never questioned the amount of math that was required to earn my degree. I enjoy learning, especially math and science. Although, a few of the classes felt like punishment. I remember the latter part of the semester in Probability was especially difficult at the time. Possibly because I was challenged with a new way of thinking that is required for these problems, which can be counter-intuitive.

What is computer science?

Let's start with computer science, and work our way towards math and how the two are interconnected. A definition of what is considered computer science may be helpful before we discuss its foundation in math. Here is an abridged definition from Wikipedia[^]

Computer science is the scientific and practical approach to computation and its applications. It is the systematic study of the feasibility, structure, expression, and mechanization of the methodical procedures...

A computer scientist specializes in the theory of computation and the design of computational systems.

I have had many discussions over the years with students and professionals educated from all different realms. One common sentiment that I hear from all of these groups is this:

Theory is nice, but I need to have relevant skills to get a job

This is a very valid point. It is also true that the majority of college graduates do not continue their career focused in the world of academics. Therefore, we can assume that most people that attend college do so with the intent to join the general workforce, ie. get a job. Let's evaluate

The role of academia

Much of what I have learned about the academic realm is that it has a fundamental focus on expanding the limits of our collective knowledge. Here is a great explanation of the purpose of a Ph.D. by Professor Matt Might at the University of Utah, The illustrated guide to a Ph.D.[^] As these new discoveries are better understood, they get further developed. Some discoveries may continue to live and exist solely in the academic realm. While other theories are applied and practical uses are developed from these discoveries.

Application of theory

The concepts taught in math, physics and engineering are based on theory. A theory itself is a model or a collection of ideas that is intended to explain something. The theories taught in these disciplines have been developed and refined over time. It is relatively simple to observe the application the theories that are required for the engineering disciplines based upon physics.

Some of our engineering disciplines start to push the boundaries of what is possible for us to observe, such as electrical engineering and nuclear sciences. However, we are still able to verify these hypothesis through repeatable experiments and observations, even if we can only observe the indirect results of these experiments.

Math and computer science are purely theoretical. They are abstract mental models that describe the relationships and behaviors of the interactions between these models. The application of these theories is a result of mapping these conceptual models into a context that can be represented and observed in physical form.

Yeah, yeah that's great (in theory), but...

I need skills to get a job

Trade schools are designed to help students learn the skills of a trade needed to get a job in that profession. The curriculum of trade schools is focused on teaching students the programming languages that are in highest demand, used to solve the most common problems that occur today.

This typically means that you would study a semester learning and solving problems with JAVA. Then move on an use a scripting language such as Python for a semester. Interfacing with relational databases is also likely to be in the curriculum as it is a very marketable skill.

As for college degrees in computer science; yes, you will learn a variety of computer languages that may help you get a job. However, these languages are used as the basis to teach the principles and give you the opportunity to apply these theories. The end goal is to teach critical thinking, to help students of computer science learn to reason, experiment, observe and understand the theories.

I believe that what you get out of from your education, depends on what type of effort that you put into your education. This is true for either path. The difference between the two is typically where the instructors focus your attention. Neither path guarantees a job at the end. However, if you do not have what it takes at a university computer science program, they will flunk you out of the program. The for profit trade-schools are bit more forgiving.

You can program computers with minimal math skills

This is a true statement.

In many cases, all that is needed to be able to program a computer is a computer, and the programming tools to interface with the computer. With basic arithmetic skills and knowledge of a computer language, you will be able to command that computer.

The same thing goes for many other skills, amateur auto mechanic, home plumbing, renovations, do-it-yourself haircuts. Some of the worlds most successful people dropped out of school and started a tech company. For example, John Carmack, who dropped out of college, co-founded Id Software and went on to pioneer some of the game industries ground-breaking graphics technologies; when he needed to know the math, he taught to himself.

I don't have any numbers, but I would imagine that most computer science graduates go on to become software developers of some sort. Many of the software engineers that I work with graduated college with an Electrical Engineering or Mechanical Engineering degree. Two guys I know graduated with law degrees, not a degrees in engineering.

Where does math factor in?

When math is mentioned, most people generally think of topics that require numbers and are focused on calculating values of some sort. The one notable exception is Geometry. Here is a small sampling of those types of topics:

  • Arithmetic
  • Geometry
  • Algebra
  • Calculus
  • Linear Algebra
  • Fourier Transforms
  • Ordinary Differential Equations
  • Partial Differential Equations
  • etc.

However, it is important to recognize that mathematics deals with more than just numbers. Mathematics spans a wide range topics:

  • Logic
  • Algorithms
  • Fundamental theorems
  • Mathematic proofs and models
  • Discrete math
  • Theory of computation
  • Information theory
  • Combinatorics
  • Set theory
  • Graph theory
  • Abstract algebra
  • etc.

Math is a foundation, which is built upon


“Mathematics is the gate and key to the sciences.”
     – Roger Bacon

For those of you that have taken Analytic Geometry, Calculus or even Trigonometry, consider how the material is taught. They start with a mathematical foundation and proofs are demonstrated that expand on what you should know going into the class. Each step is then used to derive the next step in the curriculum.

At the end of the course, many of the interim steps are forgotten or rarely used. However, they were a very necessary stepping stone to make a leap to the remainder of the material.

An example that comes to mind appears in Calculus, and is the concept of a limit of an equation. The concept of finding the limits of equations is then used to create derivatives and integration.

You could simply memorize the formulas to plug in your own numbers, and you would be able to use the equations. However, when you run into a problem that doesn't fit neatly within the formulas that you have learned, you would be stuck. The critical thinking skills that developed as these math courses progress give you the tools and ability to transform problems into a simpler form that does fit the mold of solvable equations.

This is the same structure that is used in computer science related courses. Data structures and algorithms that have been studied for decades have been optimized and we are taught how we reached this point. Again, it is possible to use an algorithm called sort, to organize your data and you do not need to understand how it works.

However, if you find that your sort routine runs much slower as the data-sets that you work with become much larger, you will need to know how to analyze your algorithm to be able to build something better. With a solid foundation you will understand how to compare the trade-offs and build something based on previous proven work.

Without this foundation, you may still be able to improve upon your original algorithm. However, you will only be guided by previous experience. Making that inspirational leap to a radically different, yet far more efficient design is not always easy.

Critical Thinking

Critical thinking is essential in professional fields as well as academia. Well, essential to maximize success. Critical thinking is more than analyzing and reasoning well thought-out decisions. It involves evaluating, conceptualizing, applying, observing, and communicating your conclusions.

Along the way, you also need to become self-critical looking for any biases that could add prejudice to your conclusions. Reflection and scientific skepticism help guide you towards accurate observations and well-justified conclusions.

It may seem as though we have meandered away from computer science into philosophy or scientific theory. We may now be discussing concepts that are central to these other topics. However, we have not wandered away from computer science. These skills and concepts are just as important to computer science as they are with every other engineering discipline.

Programming != Computer Science

It is confirmed then, you do not need advanced math to be able to program a computer, nor do you need a degree in computer science.

Hopefully, it is also obvious that there is a limit to what you will be able to program without knowledge of certain topics, especially math.

IT departments need a lot of programmers to create applications that automate or simplify business logic. These are line-of-business applications that give the companies users the information they need to do their jobs more efficiently. Much of this work centers around large databases and reducing the data set to the pertinent information required by the user.

Software engineering for machines and cutting-edge technologies often does require knowledge of advanced math. All of the wireless technologies that have been developed and become mainstream over the last 20 years are heavily dependent upon signal processing algorithms. Signal processing in itself could cover waveform synthesis and analysis, compression, encryption, forward-error-correction, and modulation/demodulation.

Now consider artificial intelligence, computer vision, image processing, natural-language processing, simulations, robotics, general-purpose GPU programming, distributed computing, the continues to grow each year. With the exception of robotics, I believe that today's smartphones are in some form dependent upon all of those technologies that I listed, all of which require knowledge of advanced math to do well.

Most web-sites do not receive an amount of traffic that cannot be solved by throwing more hardware at the problem. However, even with the commodity prices of hardware these days, more hardware is not always the best or even an easy solution. Experienced developers with strong critical-thinking and reasoning skills are a much better investment.

For some companies like Facebook, more hardware simply is not an acceptable solution. With hundreds of millions of interactions constantly occurring through-out the world, highly specialized algorithms are required to operate their site as efficiently as possible.

Summary

Math is good. For some people, math is hard. Math is also more than just numbers. Mathematics is a logical foundation that most of our sciences are built upon. Analysis, critical-thinking, and reasoning are just a few abstract skills that are developed by studying and applying mathematic concepts. These are also very important skills to develop in order to become a good software developer. This is true whether you work in a corporate IT department, or are developing the next-generation wireless communications network.

I was required to learn one-year of numerical-based math passed Calculus. The type most people think of when considering math. However, the majority of my math classes were about more abstract topics like Discrete Mathematics, and the Theory of Computation. These are the building blocks that set the foundation for the more advanced topics taught the senior year of my degree. These are also the building blocks that are required for those students that choose to move on to graduate school.

Math is not necessary to be able to program a computer. However, math is very important to engineering and science, and there are just some places that you cannot access with a computer unless you have a solid foundation in logic, critical-thinking and abstract mathematics.

Accidental Complexity

general, CodeProject, maintainability, design Send feedback »

Accidental complexity is the entropy that exists in your system that is possible to eliminate. The opposite of this is essential complexity; the parts of a system that are required and cannot be simplified. These two concepts were discussed by Fred Brooks in his essay No Silver Bullet -- Essence and Accidents of Software Engineering.. Many systems today are extremely complex, and any effort that can be done to eliminate complexity, should be.

There are a lot of should be's...

I get very irritated with excuses. Because excuses just keep allowing the code to decay with each new statement that is added. I started writing a list of these excuses for entertainment, but I started to get irritated, then I stopped. Here is one of them, "In a perfect world..." There's really on one way that sentence ever ends.

Add value, not garbage

Just because you have an excuse, even if it's a valid excuse, doesn't mean that you are off the hook for cleaning up messes. I remember a situation when I ran across a piece of code, and I thought that it looked peculiar.

C++

(void)Sleep(k_one_second);

Casting a function call to void. I've never seen that before. There were no comments and I couldn't think of anything valuable to the code that is added by that cast. The Sleep function did return an integer, but the value is not being assigned. I scratched my head, and I deleted the cast.

Soon after my changes were being reviewed. There was a question entered that asked why I deleted the void cast. The comment then goes on to explain that cast was added to fix a defect reported by our static analysis tool.

I thought, "This is a fix?"

One of the rules in the coding standards we used, JSF++, is that all return values must be checked. This hardly qualifies as checking the return value, but it appeases the analysis tool.

I replied with "The reason why the tool is reporting an error is because all return values are supposed to be tested. This looks odd, there are no comments, and it's not even a fix." I immediately had two better ideas that would have required the same or less work, and be valuable changes:

  1. Change the functions signature to return void
  2. Encapsulate the original function in an inline function that returns void

Changes like this only add garbage to the code which reduces its readability. Even worse, it covers up a valid problem reported by the code analysis tool. Using subversive code tricks to prevent the tool from reporting a valid issue negates the value of using the tool. You should strive to make every statement add value to the code.

Several hundred instances of void casts were added in that situation. This only added more clutter that covered up a problem rather than focusing on finding a valid solution to fix the problem.

Simplify

Imagine that there is a specific feature that exists, and it will require a finite amount of logic to implement and use:

Logic required to implement and use a feature

Now imagine that functionality needs to be used in multiple locations. Generally we can simplify by factoring the common code into a function, or if it is more complicated logic it may even become a class.

Encapsulating part of the implementation

One particular problem that I encounter frequently is an abstraction that handles a minimum amount required for the implementation. This tends to leave much more logic for the caller than is necessary. More logic outside of this re-usable abstraction, means more duplicated logic; logic that can be written incorrectly, or even logic that does not properly initialize the feature.

Code left to caller opens door for errors

Can you make it simpler?

After you have simplified your code inside if your features abstraction, put it to the test and use it. Better yet, put it in a test, a unit-test. See if there is anything that you could actually take care of for the user with the known input.

Abstract more for the caller when possible

It is not always apparent, but this is accidental complexity. This is an example of a situation that could eliminate code that is duplicated.

This is unfortunate, because duplicated code is often performed with cut-and-paste, which is notoriously error-prone. This also adds more code to be maintained, which is also more code to read in order to understand what purpose a section of logic serves.

Consider the trade-offs

It is not always the best choice to continue to simplify. Simplifying the interface to a function usually means giving up flexibility. In most cases there is no reason the choice must be either/or. Sometimes it is feasible and advantageous to do both.

Create the basic abstraction that leaves all of the flexibility and error-prone or cumbersome logic to the user. Then create simplified versions of the feature that handle commonly used cases.

Example

The cumbersome abstraction that I use as an example, is the ::GradientFill function from the Win32 API. I explored what this function is capable of a few years ago, and I learned that it is quite powerful. The interface provides a lot of flexibility, and it does not look too bad from a cursory inspection.

C++

BOOL GdiGradientFill(
  __in  HDC hdc,            // Handle to the DC
  __in  PTRIVERTEX pVertex, // Array of points of the polygon
  __in  ULONG dwNumVertex,  // Size of the vertices array
  __in  PVOID pMesh,        // Array of mesh triangles to fill
  __in  ULONG dwNumMesh,    // Size of the mesh array
  __in  ULONG dwMode        // Gradient fill mode
);

However, this function requires a lot of repetitive setup code. This is also the reason that I hardly ever used ::GradientFill up to that point. Here code from the MSDN documentation page for this function that is required to paint a horizontal and vertical gradient. I believe it would be simpler to write a for-loop than the setup that is required for this function:

C++

TRIVERTEX vertex[2] ;
vertex[0].x     = 0;
vertex[0].y     = 0;
vertex[0].Red   = 0x0000;
vertex[0].Green = 0x8000;
vertex[0].Blue  = 0x8000;
vertex[0].Alpha = 0x0000;
 
vertex[1].x     = 300;
vertex[1].y     = 80;
vertex[1].Red   = 0x0000;
vertex[1].Green = 0xd000;
vertex[1].Blue  = 0xd000;
vertex[1].Alpha = 0x0000;
 
// Create a GRADIENT_RECT structure that
// references the TRIVERTEX vertices.
GRADIENT_RECT gRect;
gRect.UpperLeft  = 0;
gRect.LowerRight = 1;
 
::GdiGradientFill(hdc, vertex, 2, &gRect, 1, GRADIENT_FILL_RECT_H);
::GdiGradientFill(hdc, vertex, 2, &gRect, 1, GRADIENT_FILL_RECT_V);


The code is even worse for a triangle.

I wanted to make these functions simpler to use in my code. To me, it should be as simple as a single function call to fill a rectangle. So I encapsulated the required code in the function below:

C++

bool RectGradient(
  HDC hDC,        // The device context to write to.
  const RECT &rc, // The rectangle coordinates to fill with the gradient.
  COLORREF c1,    // The color to use at the start of the gradient.
  COLORREF c2,    // The color to use at the end of the gradient.
  BOOL isVertical)// true creates a vertical gradient, false a horizontal

As I mentioned earlier, you often give up flexibility for convenience. One of the features that is given up from the previous function is the ability to control alpha-blend levels. Therefore, I created a second version of this rectangle gradient that allows alpha blending levels to be specified.

C++

bool RectGradient(
  HDC hDC,                  
  const RECT &amp;rc,          
  COLORREF c1,              
  COLORREF c2,              
  BOOL isVertical,          
  BYTE alpha1,    // Starting alpha level to associate with the gradient.
  BYTE alpha2     // Ending alpha level to associate with the gradient.
  )

These two functions could be used much more simply. Here is an example of how much simpler the code becomes:

C++

// Horizontal Gradient
RECT rc = {0,0,300,80};
COLORREF black = RGB(0,0,0);
COLORREF blue  = RGB(0,0,0xff);
RectGradient(hdc, rc, black, blue, false);
 
// Let's draw a vertical gradient right beside the horizontal gradient:
::OffsetRect(&amp;rc, 310, 0);
RectGradient(hdc, rc, black, blue, true);

The value of many small abstractions adds up

Sometime the flexibility of the original code can still be accessible even when the code is simplified. This can be done with a collection of smaller abstractions. Utility functions like std::make_pair from the C++ Standard Library is one example. These functions can be used in series to simplify a series of declarations and initialization statements.

A series of small abstractions is valuable

In some cases this collection of utility abstractions can be combined into a compound abstraction.

Abstractions can be grouped together

There are many ways that code can be simplified. It doesn't always need to be code that would be duplicated otherwise. If I run across a super-function, I will try to break it down into a few sub-routines. Even though this new function will only be called in a single location, I have abstracted the complexity of that logic at the call site.

It is even more feasible to give this function that is called only once a more cumbersome but descriptive name. When reading code from the original super-function it is now much easier to ignore large blocks of code that obfuscate the intended purpose of the function.

Abstraction simplifies code, as long as it provides value, continue to abstract

While the code may be necessary, that does not mean that it must make the code around it more difficult to read and main.

Summary

Accidental Complexity is the code that exists in our programs that we can simplify or eliminate. Duplicated code and code that exists only to get rid of warnings are two examples of accidental complexity. The best case scenario, the code becomes just a little bit more difficult to maintain. However, in the worst cases, legitimate issues could be covered up. Worse still, the manner that they were covered up makes them that much more difficult to find if they are the cause of a real problem.

I witness developers exerting a great deal of effort to work around problems, fix the symptoms or even make excuses and ignore the problems. This time could just as easily be focused on finding a fix that actually provides value. Much more quality software would exist if this were always the case.

...but I suppose this is not a perfect world.

Alchemy: PackedBits (BitLists Mk3)

C++, Alchemy, design Send feedback »

A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called, Network Alchemy[^]. Alchemy performs low-level data serialization with compile-time reflection. It is written in C++ using template meta-programming.

My second attempt to create a bit-field type was more successful. The size of the container only grew linearly with each sub-field that was added, and the implementation was cleaner. However, I showed an image of what this implementation looked like in the debugger and it was very in convenient. The thing I was concerned with the most was the pitiful performance that was revealed by my benchmark tests.

This entry describes my discoveries and the steps that I took to re-invent the bit-field type in Alchemy for the third time. This is also the current implementation in use by Alchemy, which is about 10% faster than hand-coded collection of packed-bits.

Overview

Let's start with a brief review. The PackedBits type in Alchemy essentially replaces C-style bit-fields. Bit-fields are very common with network protocols, which attempt to make every bit count that is sent across the wire.

C and C++ both have a syntax to access values within a byte at the bit level, the bit_field. Unfortunately, the standard for both languages leave the rules for implementation undefined. Therefore, your code may not be portable to a different compiler for your same architecture.

Alchemy's PackedBits provides an abstraction to portably allow bit-level access for message protocols and storage structures.

The poor performance of PackedBits v2

The implementation for version 2 of PackedBits was a simplification of the original design. The first implementation had a fixed size of 32 entries, which always existed. Therefore they always took up space, and they were initialized and destroyed with each PackedBits type. This made them a very expensive type in space and speed.

Version 2 solved the hard-coded limit by using recursion and inheritance to handle each BitFieldNode. Unfortunately, there are two aspects of this design that are bad.

Inconvenient Debugging

Since each but field inherits from the next field in the chain, you had to navigated to each sub-class to inspect the value of a particular node. They were not all displayed at once. This is actually a minor inconvenience because some tools like Visual Studio allow you to create custom visualizers for your types.

Design does not scale

There is a challenge in creating a type like this. Abstraction provides an illusion to the user that they are interacting with a single type that provides the expressive syntax of value types in a struct. Yet, these are distinct objects composited that are structured to interact and change specific bits on a single data type.

I achieved this in the original design, by passing a reference to the data field in the PackedBits container, to the BitFieldNodes. The mask and shift operations are then defined by the template instantiation and used on the referenced value.

This created a valid solution. The drawback is that for every new bit-field added to the type, a new reference was added as well. This is a reference that will take up additional space, and will need to be initialized. It turns out, the initialization of these references was killing performance.

Whenever a copy of the type was required, the reference for each node had to be initialized. The worst part is that this data type only manages up to the word-size of the platform you are running on. A 32-bit integer specified to have 32 1-bit flags, would required 32 references and the data value to be initialized for each copy.

  The Structure of PackedBits v2

This implementation had to go!

The Solution

I had know of and considered an alternative solution many times. However, for my original implementation of Alchemy I steered clear of this solution. The reasons were the strict coding guidelines that we followed. My first implementation of what is now called Hg, followed the JSF++ coding guidelines.

A quandary

The quandary was simple. Even though I am now developing this independent open-source library, do I want to use the solution that I had often considered?

The solution was to use the offsetof MACRO defined in the standard library. This MACRO can become tenuous, especially as your object hierarchy becomes more complicated. Luckily I haven't run into any problems yet. Hopefully with the controlled implementation I can keep it that way.

What does offsetof do, and how does it work? This solution must be a macro because it inserts the names of sub-fields from a struct or a class to calculate the offset of that data elements address from the base address of the parent type.

My final decision

Yes, of course I decided to use offsetof. Basically, I can now group all of the bit-field children at the same level in the PackedBits parent type. I now pass in the parent type to the BitField object as part of its instantiation. This allows the child to compare itself to its parent and find its location relative to the base pointer.

C++

template< class  OwnerT,
          class  TagT,
          size_t OffsetT,
          size_t CountT,
          class  T = uint8_t
        >
struct BitField
{
  // Generally the same logic in previous
  // implementations for shift, mask, read and store.
  // ...
};

A member-function of PackedBits is called in order to be able to reference the correct data element. This function is called value(). The trick is, "how to get a pointer to the child types parent object without requiring initialization.

C++

// Continued implementation of BitField:
 
  value_type& value()
  {
    return owner()->value();
  }
 
  // This function returns a pointer to the parent object.
  OwnerT* owner()
  {
    return reinterpret_cast<OwnerT*>(
             reinterpret_cast<char*>(this) - TagT::offset());
  }

This adjustment, as well as the change in code-generation performed by the pre-preprocessor form the solution for version 3 of the PackedBits type.

  The Structure of PackedBits v3

A challenge with cross-compiling

When I went to cross-compile and verify the implementation on Linux with gcc, the compiler complained. It believed that it was de-referencing a pointer to 0. My guess is that since it was a template and not an actual value-type, the calculations were performed at the base address of 0, which worked out quite nicely for me.

How did I appease gcc? Look away if you are a purist:

C++

#ifdef  __GNUC__
 
// Yes, we're redefining this MACRO...
#ifdef offsetof
#undef offsetof
#endif
 
// GCC does not approve of the way in which
// Alchemy employs this MACRO.
// This is a slight alteration:
// Performing the calculation on a non-null value,
// then re-adjust back to zero after the offset calculation.
#define offsetof(type,member) \
  (size_t)reinterpret_cast<const char*>((((type*)1)->member)-1);
 
#endif

Are you ready for my rationalization? It's just like the C++ Standard Library and its treatment of addresses and iterators. It's safe to calculate and store and address one past the end, but it's not safe to dereference that address. We get an offset from this calculation, and we apply it to a known valid base-pointer.

Summary

After an abysmal first showing, I was very pleased when I saw the results from this re-implementation of the PackedBit type. Not only is the design simpler and easier to maintain, but it also performs about 10 to 12% faster than the hand-coded implementation in the benchmark tests that I have created.

I did resort to using the offsetof MACRO, which is prohibited by some coding standards. I do not expect an issues to arise because of the simple context where this calculation is used. There are no virtual functions, or complex class hierarchies to consider. However, I am keeping an eye on its behavior.

C++: Template Meta-Programming 2.0

CodeProject, C++ 2 feedbacks »

I was amazed after I had converted only the first few portions of the TypeList from the C++98 implementation to Modern C++. I have decided to convert my Alchemy API to use Modern C++ in order to truly learn the nuances by application of acquired knowledge. The code of these meta-programs are very elegant and completely readable. This really does feel like a new language, and an entirely new version of meta-programming.

The elegance enabled by the new constructs will allow me to present a complete TypeList implementation for Modern C++ in this entry.

I have corrected errors and uploaded a new implementation file.


Compare and Contrast

My primary goal is to compare the differences between template meta-programming with with C++98 and Modern C++ (C++11 and beyond). While the concepts remain similar, the newer language has been expanded in a way that makes this a more natural way to compose functional programming logic with C++.

To be clear, I want to explicitly state that meta-programming has its places, and generally it will not be in core-application logic. Libraries and utilities that will be developed and tested, but are less likely to require maintenance work are good candidates for these types of solutions. Application programmers can take advantage of these complex machinations, yet the caller may never even realize what magic exists behind the curtain. A great example would the the C++ Standard Library itself.

Frame of reference

Andrei Alexandrescu made the concept of meta-programming accessible to the masses in his book Modern C++ Design, published in 2001. He demonstrates and develops many useful components for meta-programming that he created as part of his library, Loki.

This is the style of meta-programming that I first learned, and the type of structure that, Alchemy, is built around. I have altered the implementation and interfaces for my meta-constructs to suit my needs, compared to what is presented in the book. The next few sections demonstrate how the same tasks are accomplished in both versions of the language.

Then

The most important of these constructs is the TypeList. The TypeList is a work-horse construct that can be in a variety of unique ways, yet does not contain any internal data or run-time code. structs become the natural type to act as a functional container, which performs all of its compile-time, or static, operations on types, and stores values in static const values or enums.

To simplify expressions, I made liberal use of typedefs. This helped me avoid the repitition of verbose template expressions and at the same time give a label to the purpose of that template. Sometimes there are no ways to simplify expressions other than turning to the pre-processor. I prefer to avoid the pre-processor at all costs in my application logic. However, I have grown accustomed to leaning on the pre-processor to generate code for me for repetitive definitions that appear in a class or at the global scope.

Here is an example of how Alchemy's TypeList is constructed. Internally, a TypeNode provides the declarations for the head and tail types.

C++

// A trait class to assist with tag-dispatching.
struct container_trait{ };
// Represents the end of a type list.
struct empty{};
 
template< typename H,
          typename T
        >
struct TypeNode
{
  typedef H head;
  typedef T tail;
};

Now the definition of the TypeList to show the organization of the structure:

C++

template< class T0, class T1, class T2, class T3,
          class T4, class T5, class T6, class T7,
          class T8, class T9, class T10,class T11,
        >
struct TypeList
  : container_trait
{  
  typedef
    TypeNode<T1,  TypeNode<T2,  TypeNode<T3,
    TypeNode<T4,  TypeNode<T5,  TypeNode<T6,  
    TypeNode<T7,  TypeNode<T8,  TypeNode<T9,  
    TypeNode<T10, TypeNode<T11, TypeNode<T12, MT>
    > > > > > > > > > > >                type;
    // Alchemy continues on to 32
};

Composing a structure should be much simpler than this nested definition. Therefore, I decided to wrap the inner declaration with a simpler outer definition. Unfortunately, there are only a few facilities available to customize template declarations. The best option in my particular situation was template specialization.

I wanted to provide a natural interaction with my TypeList object, and still allow support for a variable number of parameters. Thirty-two was my initial number of parameters that I would support. I can live with writing thirty-two specializations once. However, I had many operations that I would also implement, and each of those would require specialized implementations as well. So I resorted to the preprocessor to generate the code for me.

Here is the definition of the MACRO, and how it was used. It generates the code in the block from above:

C++

// It seems my syntax highlighter for MACRO requires some attention
#define tmp_ALCHEMY_TYPELIST_DEF(S) \
template<TMP_ARRAY_##S(typename T)> \
struct TypeList<TMP_ARRAY_##S(T)> \
  : container_trait \
{ \
typedef TMP_ARRAY_##S(TypeNode<T), empty TMP_REPEAT_##S(>)  type; \
}
 
// Define specializations of this array from 1 to 31 elements
tmp_ALCHEMY_TYPELIST_DEF(1);
tmp_ALCHEMY_TYPELIST_DEF(2);
tmp_ALCHEMY_TYPELIST_DEF(3);

Yes, I definitely left out many of the gory details for the definitions of the MACROs. But why would you want them? We're moving forward into the future; but you can still access them from Alchemy on GitHub.

The direct usage of the TypeList was then much more accessible to the user. Also, there was no need for them to use any MACROs to define a new TypeList:

C++

typedef TypeList
        <
          int,
          long,
          float,
          char
        > types;

Now

There are two primary additions to Modern C++ that make template programming in general a pleasure to use, and that is to not even mention meta-programming itself:

  1. Variadic templates:
  2. Similar to variadic function parameters, this feature allows a variable number of arguments to be used in a template definition.

  3. Template aliases:
  4. This allows the using keyword to be used in situations similar to typedef. However, unlike typedef, using can be defined as a template. Therefore, it is compatible with partially-specialized templates.

Here are the definitions that I required when I ported my code to Modern C++ (don't worry, I will explain the syntax afterwards):

C++

// Forward Declarations
struct empty { };
struct container_trait { };
 
template< typename... T >
struct type_node;
 
template< typename... NodesT >
struct type_list;

And an implementation for these types:

C++

// An empty terminating node.
template< typename... T >
struct type_node<>
{
  using head = empty;
  using tail = empty;
};
 
// Recursive parameter pack node
template< typename    H,
           typename... T
        >
struct type_node<H, T...>
  : type_node<T...>
{
  using head = H;
  using tail = type_node<T...>;
};
 
template< typename... NodesT >
struct type_list
  : container_trait
{
  using nodes = type_node<NodesT...>
};

No, really! That's it! We get the exact same usage as the code from above, and I'm not even sure that I need to explain this last chunk of code.

I admit, I had a few false-starts trying to get a grasp on the parameter pack. No, not to reach this point, neither of the code samples above are good for anything except defining a list of types. My first challenge appeared when I tried to create a meta-function to give me the type of parameter at a specific index in the list.

Let me introduce the new constructs, then I will demonstrate some of the elegant solutions that barely scratch the surface of their capabilities.

The parameter pack...

The variable defined within a variadic template is called the parameter pack.

The parameter pack is essentially a list of types delimited by commas. To define one as a parameterized type in a template definition, use the ellipsis between the typename or class declaration and the name that you assign your type. There can be whitespace before and after the ellipsis if you desire...or not. However, the general style that you are likely to see places the ellipsis attached to the type-declaration and a space before the type-name.

C++

// Most common style
template<typename... T> struct common;
 
// These are all legal too.
template<typename ...T>  struct spaces_before;
template<typename ... T> struct spaces_both;
template<typename...T>   struct spaces_none;
 
// Template parameter packs are legal for
// use with template functions as well.
template<typename... T>
T function(T... params);

You may have noticed in my type_list implementation and the declaration of the template function that I placed the ellipsis after the declared name. This is how you invoke the parameter pack in your logic.

Invoke the parameter pack

What does it mean to invoke the parameter pack?

Nothing really. You're setting it where you want to apply it, and the compiler goes to work ripping apart the parameter pack and generating your code. However, the compiler does need a little bit of help. You will need two things if you are generating code from the parameter pack:

  1. Recursive definition:
  2. This is a definition that will be implicitly called by the compiler as many times as necessary until it reaches your terminating case. If you refer to the definition of the type_list, you will see that the parameter pack is applied in a context where another type is placed before it, separated with a common. This essentially peels one or more types away from the parameter pack at a time. In this sense, the template parameter pack is similar to the variadic MACRO usage.[/codespan]

  3. Terminating condition:
  4. A condition that will handle the case of an empty list, or at least terminate the recursion before the compiler attempts to go beyond the end of the parameter pack. It is not necessary for this to be an entirely different definition.

Size of a parameter pack

A convenient sizeof... operator has been provided to match the syntax of the parameter pack. This version of the operator is not related in anyway to the classic sizeof operator.

C++

template< typename... T >
struct length
  : std::integral_constant<std::size_t, sizeof...(T)>
{ };

The parameter pack cannot be a variable

The parameter pack must be decomposed completely at compile-time. It cannot be the sole definition of a typedef or a using alias.

C++

template< typename... T >
struct param_pack
{
  using save_for_later = T...;
};

However, that does not mean that we are helpless. There is an idiom that exists with template programming that allows us to extract the type from a template parameter. I am not sure if it has a name.

Let me demonstrate it for you. This is the definition you are most likely to find on the Internet for a TypeList:

C++

template< typename... T > struct typelist { };

The previous code is completely legal because the parameter pack expansion is defined and terminated with this definition. With another template that is given the right specialization, we can extract the parameter pack from the original type definition.

To demonstrate this, let's create a length meta-function that will report the number of elements in the type_list that I defined above. We need to declare a default version of the length meta-function. This function does not necessarily need to be implemented.

C++

// Default declaration
// This does not require an implementation
template< typename... T > struct length;
 
// This specialization allows us to identify and access
// the parameter pack defined within our type_list.
template< typename... T >
struct length <type_list<T...>>
  : std::integral_constant<std::size_t, sizeof...(T)>
{ };

We can use the parameter pack from the type_list because we specialized this template solely for the this type. The compiler does a best fit comparison when attempting to resolve types, and finds this version.

Template Aliases

Up until this point, we have had the typedef, which has served us well. However, it does have its shortcomings. I believe the most notable is that partial template specialization is not supported by a typedef. The template alias does provide this support.

C++

// I want to provide a simple type to create
// a map of strings to another type.
template< typename T >
using StringMap = std::map<std::string, T>;
 
// Can now be used as:
StringMap<int>     named_ints;

Here's a more complex example:

C++

// I want to map names to lists of things.
template< typename T >
using NamedListMap = std::map<std::string, std::list<T>;
 
NamedListMap<unsigned int> lotto_picks_by_state;

Improves readability of templates

There is one other feature of template aliases that I did not fully appreciate until I started to use them. Most code examples do not get complex enough to allow you to fully appreciate the second feature. Let me demonstrate, then I will get into the gory details.

This is an example of an additional declaration that was added to C++14, but is possible in C++11. I am not sure if this technique wasn't discovered until after C++11, or they left it out to keep it from becoming C++12.

C++

// The C++ Standard Library contains useful
// meta-functions to manipulate types.
// This one converts type T into T*
template< class T >
struct add_pointer;
 
// This is used like this:
typedef typename std::add_pointer<void>::type   void_ptr;
// eww! ^^^^^^^^                         ^^^^
 
// Or directly...
typename std::add_pointer<void>::type   p_void = 0;
// I think I just threw up in my mouth...
// No wonder templates have a bad reputation.

These definitions appear in C++14, but you can use this technique in C++11.

C++

// Template alias for std::add_pointer
template< class T >
using add_pointer_t = typename std::add_pointer<void>::type;
 
// New usage:
typedef add_pointer_t<void>   void_ptr;
 
// And directly...
add_pointer_t<void>   p_void = nullptr;

Detailed explanation

typedefs are a common way to reduce clutter in code. Primarily with templates because the use of template type declarations require you to qualify the type with typename if you are using a dependent-type.

What is a dependent-type?

That is a very good question. To help with the explanation dependent type is a shortened version of the name template parameter dependent type. I'm surprised the C++ community hasn't just adopted TPDT, but I digress. A dependent type is a sub-type declared within a template class or struct.

typename is required when referencing sub-items in a template. I say sub-items because other things can be defined within a struct, that are accessed in the same manner as a dependent type, like a static variable. typename is a clue to the compiler that you want it to be interpreted as a type.

The capabilities of the template alias allow us to clearly specify beforehand that we mean a type. Therefore both the typename qualifier and sub-type required to access the dependent name are managed by the alias. This greatly simplifies code when there are many template types to deal with. Template meta-programming is a prime example.

One Last Tip

In the fall of 2014, N4115 Parameter Pack Searching[^] was proposed with some additions to the utility library. This would add a common form of the idiom that I described above to gain access to a parameter pack. The name proposed for the type is packer.

I was trying to modify an existing parameter pack, and I just couldn't put the pieces together. So that is when I searched and found N4115 when I found N4144 Searching and Manipulation of Parameter Packs[^], by Bill Seymour and Stephan T. Lavavej. This is an amended version of the first document and it adds manipulation utilities. One in particular is add_to.

I already demonstrated the concepts of packer, however, in my code I refer to it as param_pack. Here is how add_to is implemented. Multiple specializations are declared to handle the possibility of adding a parameter pack to a parameter pack.

C++

template<class T, class U> struct add_to;
// Add to the front of the list
template<typename T, typename... ArgsU>
struct add_to<T, param_pack<ArgsU...>>
{
  using type = param_pack<T, ArgsU...>;
};
 
// Add to the back of the list
template<typename... ArgsT, typename U>
struct add_to<param_pack<ArgsT...>, U>
{
  using type = param_pack<ArgsT..., U>;
};
 
// Combine two lists
template<class... ArgsT, class... ArgsU>
struct add_to<param_pack<ArgsT...>, param_pack<ArgsU...>>
{
  using type = param_pack<ArgsT..., ArgsU...>;
};
 
// And the template alias
template<typename T, typename U>
using add_to_t = typename add_to<T,U>::type;

You will see a demonstration of the usage in the next section.


A Modern C++ TypeList

I searched the Internet, albeit briefly, and I did not find any Modern C++ implementations of a TypeList that did not expand beyond this definition:

C++

template< typename... T > struct typelist { };

I found the fundamentals to convert the code that I already have into modern form. I want to convert and get it integrated first. If there are better practices, I can adjust the implementation in a working test harness.

I have already shown the definition of the basic type_list structure that I use as well as a demonstration of the length and param_pack, and the implementation for add_to. In the code below, I have omitted the forward declarations and template aliases that I define in the type list header file.

I am going to blast through the different operations that I have built so I do not take up too much more of your time. If something is not clear, please drop a comment and I can further explain or even add more detail to the description.

I have posted a link to the single header file that contains all of these definitions at the bottom.

make_type_list

I wanted to be able to make a type_list from an existing set of internal type_nodes, and then later, a param_pack.

C++

template< typename... T>
struct make_type_list< type_node<T...>>
  : container_trait
{
   using type  = type_list<T...>;
};
 
template< typename... T>
struct make_type_list< param_pack<T...>>
  : container_trait
{
   using type  = type_list<T...>;
};

type_at

Query the type of element at a specified index in the type_list. This item required a helper template that I called type_of_node.

C++

template< std::size_t    IdxT,
          typename  NodesT
        >
struct type_of_node
{
  using type =
    typename type_of_node<IdxT-1, typename NodesT::tail>::type;
};
 
// Terminating specialization
template< typename  NodesT >
struct type_of_node<0, NodesT>
{
  using type = typename NodesT::head;
};

Now for the actual implementation of type_at.

C++

template< std::size_t IdxT,
          typename    T  
        >
struct type_at
{
  using nodes = typename T::nodes;
  using type  = typename type_of_node<IdxT, nodes>::type;
  using rest  = typename make_type_list<typename nodes::tail>::type;
};
 
// A terminating case for type_at
template< std::size_t IdxT,
          typename    T  
        >
struct type_at<IdxT, empty>
{
  using nodes = empty;
  using type  = empty;
  using rest  = empty;
};

I added the declaration of nodes to simplify the declaration for type. This wasn't strictly necessary. I added rest for convenience in other solutions. rest returns a type_list of the elements remaining after the specified index.

For example, if there were 10 elements in a type list and index 6 was specified. A type list with elements [7,8,9] would be returned.


Stop me if I go too fast for the rest of these.

front

C++

template< typename T >
struct front
{
  /// Type of the first element in the list.
  using type = type_at_t<0, T>;
  using rest = typename type_at<0, T>::rest;
};

back

C++

template< typename T >
struct back
{
  /// Type of the last element in the list.
  using type = type_at_t<length<T>::value-1, T>;
};

pop_front

C++

template< typename T >
struct pop_front
{
  using type = typename front<T>::rest;
};

push_front

C++

template< typename F, typename L >
struct push_front
{
private:
   using params = typename to_param_pack<typename L::nodes>::type;
   using sum    = typename add_to<F, params>::type;
 
public:
   using type   = typename make_type_list<sum>::type;
};

push_back

C++

template<typename L, typename B>
struct push_back
{
private:
   using params = typename to_param_pack<typename L::nodes>::type;
   using sum    = typename add_to<params, B>::type;
 
public:
   using type   = typename make_type_list<sum>::type;
};

New functionality updated since the original post

pop_back

C++

template<typename T>
struct pop_back
{
   using type = typename split_t<length<T>::value - 1, T>::type;
};
 
// Terminating specialization
template< >
struct pop_back
{
   using type = empty;
};

move_item

Move a specified number of elements from the front of the second list, to the end of the first list. This function is used to implement split, which is then used to implement pop_back.

C++

template<std::size_t CountT, typename T, typename R>
struct move_item
{
private:
  using first = push_back<T, front_t<R>>;
  using last  = pop_front<R>;
 
public:
   using type = typename move_item<CountT-1, first, last>::type;
   using rest = typename move_item<CountT-1, first, last>::rest;
};
 
// Terminating specialization
template<typename T, typename R>
struct move_item&lt;0, T, R>
{
   using type = T;
   using type = R;
};

split

Splits the list into two separate lists at the specified pivot index.

C++

template<std::size_t PivotT, typename T>
struct split
{
  static_assert(PivotT <= length<T>::value,
                "The split pivot index is out of range");
 
   using type = typename move_item<PivotT, type_list<>, T>::type;
   using rest = typename move_item<PivotT, type_list<>, T>::rest;
};

Summary

Again, I am pleased at how much simpler my code has become with these new additions. It's still C++. It's like C++ with the Hemi. Statements can be expressed more tersely, which actually increases the readability of the code as opposed to the lose meaning. Repetitive typing and redundant code can also be reduced.

If you frequently program with templates, or are even a big fan of the C++ Standard Library, you owe it to yourself to become familiar with these two features.

As promised, here is a link to the full type list implementation:


Contact / Help. ©2017 by Paul Watt; Charon adapted from work by daroz. CMS / cheap web hosting / adsense.
Design & icons by N.Design Studio. Skin by Tender Feelings / Skin Faktory.