|« The Purpose of a Unit Test||From Good to Great »|
The concept of selecting descriptive variable names is a lesson that seems to start almost the moment you pick up your first programming book. This is sound advice, and I do not contest this. However, I think that the basis could be improved by creating and using the most appropriate type for the task at hand. Do you believe that you already use the most appropriate type for each job? Read on and see if there is possibly more that you could do to improve the readability, maintainability or your programs, as well as more simply express your original intent.
Before I demonstrate my primary point, I would like to first discuss a few of the other popular styles that exist as an attempt to introduce clarity into our software.
First a quick note about the Hungarian Notation naming convention. Those of us who started our careers developing Windows applications are all aware of this convention. This convention encodes the type of the variable in the name using the first few letters to mean a variable type code. Here is an example list of the prefixes, the types they represent and a sample variable:
Some of the prefixes duplicate, such as the bool and byte types, which both use b. It's quite common to see n used as the prefix for an integer when the author would like to create a variable to hold a count. Then we reach the types that have historical names, that no longer apply. LPCSTR, LPWSTR and all of the other types that start with LP. The LP stands for Long Pointer, and was a necessary discriminator with 16-bit Windows and the segmented memory architecture of the x86 chips. This is an antiquated term that is no longer relevant with the 32-bit and 64-bit systems. If you want more details this article on x86 memory segmentation should be a good starting point.
I used to develop with Hungarian Notation. Over time I found that variables were littered through the code marked with the incorrect type prefix. I would find that a variable would be better suited as a different type. This meant that a global search and replace was required to properly change the type, because the name of the variable would need to be changed as well.
Why is the Type Part of the Name?
This thought finally came to mind when I was recovering from a variable type change. Why do I need to change every instance of the name, simply because I change its type? I suppose this makes sense when I think back to what the development tools were like when I first started programming. IDE's were a little more than syntax highlighting editors that also had hooks to compile and debug software.
It was not until the last decade that features like Intellisense and programs like VisualAssist appeared that improved our programming proficiency. We now have the ability to move the cursor over a variable and have both its type and value be displayed in-place in the editor. This is such a simple and yet valuable addition. These advancements have made the use of Hungarian Notation an antiquated practice. If you still prefer notepad, may God have mercy on your soul.
Wow! Naming conventions huh?! My instinct desperately inclines me to simply skip this topic. This is a very passionate subject for almost every developer. Even people that do not write code feel the need to weigh in with an opinion. Let's simply say for this discussion, variable conventions should be simple with a limited number of rules.
Even thought I no longer use Hungarian Notation, I still like to lightly prefix variables in specific contexts, such as a 'p' prefix to indicate a pointer, 'm_' for my member variables, and 'k_' for constants. The 'm_' gives a hint to ownership in an object context, and it simplifies the naming of sets of variables. Anything that helps eliminate superfluous choices can help me focus on the important problems that I am trying to solve. One last prefix I almost forgot is the use of 'sp' for a smart or shared pointer. These are nice little hints for how the object will be used or behaviors that you can expect. The possibility always remains that these types will change, however I have found, in fact, that variables in these contexts rarely do change.
Increase the Clarity of Intent
Developing code is the expression of some abstract idea that is translated into a form the computer can comprehend. Before it even reaches that point, we the developers need to understand the intention of the idea that has been coded. Using simple descriptive names for variables is a good start. However there are other issues to consider as well.
There is a potential problem lurking within all code that performs physical calculations. The unit type of a variable must be carefully tracked, otherwise a function expecting meters may receive a value in millimeters. This problem can be nefarious, and generally elusive. When your lucky, you catch the factor of 1000 error, and make the proper adjustment. When things do not work out well, you may find that one team used the metric system and the other team used the imperial system for their calculations, then an expensive piece of equipment could crash onto mars. Hey! It can happen.
One obvious solution to help avoid this is to include the units in the name of the variable.
I believe this method falls a bit short. We are now encoding information in the variable once again. It definitely is a step in the right direction, because the name of the variable is less likely to be ignored compared to a comment at its declaration that indicates the units of the variable. It is also possible for the unit of the variable to change, but the variable name is not updated to reflect the correct unit.
Unfortunately, the best solution to this problem is only available in C++ 11. It is called user-defined specifiers. The suffixes that we are allowed to add to literal numbers to specify the actual type that we desire, can now be defined by the user. We are no longer limited to unsigned, float, short, long etc... This sort of natural mathematical expression is possible with user-defined specifiers:
It is now possible to define a units-based system along with conversion operations to allow a meter type to be divided by a time type, and the result is a velocity type. I will most likely write another entry soon to document the full capabilities of this feature. I should also mention that Boost has a units library that provides very similar functionality for managing unit data with types. However, the user-defined specifiers are not part of the Boost implementation, it's just not possible without a change to the language.
Additional Context Information for Types
The other method to improve your type selection for your programs is to use typedef to create types that are descriptive and help indicate your intent. I have come across this simple for loop statement thousands of times:
Although there is nothing logically incorrect with the previous block, because of the type of index is declared as a signed integer, future problems could creep in over time with maintenance changes. In this next sample, I have added two modifications that are particularly risky when using a signed variable for indexing into an array. One of these examples modifies the index counter, which is always dangerous. The other change does not initialize the index explicitly, rather a function call a function call whose return value is not verified initializes the index. Both changes are demonstrated below:
The results could be disastrous. If the find call were to return a negative value, an out of bounds access would occur. This may not crash the application, however it could definitely corrupt memory in a subtle way. This creates a bug that is very difficult to track down because the origin of the cause is usually no where near the actual manifestation of the negative side-effects. The other possibility is that modifying the counting index could also result in a negative index based on how WorkOffset is defined. A corollary conclusion to take away from this example is that is it not good practice to modify the counter in the middle of an active loop.
If a developer was stubborn and wanted to keep the integer type for their index, the loop terminator test should be written to protect from spurious negative values from corrupting the data:
Since the indexing into the data array will always be positive, why not simply choose a type that is unsigned?! This will explicitly enforce the invariant > 0. Unless an explicit test is put in place to enforce a desired invariant, at best, it can only remain an assumption that the invariant will hold true. Assumptions leave the door open for risk to turn into problems. Here is a better definition of the loop above that now uses an unsigned integer type:
For this loop I have chosen the size_t type, which is defined in many header files, the most common is <cstddef>. The type, size_t, represents the largest addressable number on your compiled platform. Therefore if you have a 32-bit address space, size_t will be a 32-bit number. This type is also just a typedef, an alias, for whatever type is required to make it the appropriate size for your platform. This type is a portable way to specify sizes and use numbers that will succeed on different platforms. The very act of using this type also declares your intent to have a count or size of some item. While I was using the variable name index in the loop examples above, that only indicates what the variable is. The type size_t gives an extra bit of context information to indicate what can be expected from the use of the index.
Use a Meaningful Name for Types
Let's elaborate on the Improved Approach from the previous section. Let's consider the size_t type for just a moment. It's true purpose is to provide portability across platforms by defining a type based on the address size of the platform. However, we found a new use for it, represent any type of variable that is a count of size of something. This should also be considered a valid reason for declaring a new type. Consider this list of variables declared somewhere in a function. Is it immediately clear what they might be used for? Are all of the types chosen correctly?
Here is a simple example for how to add extra context to a set of variables that could easily be lost in the mix. Consider IP addresses and port id pairs. I have seen both of these variables be called a number of different names. In some cases, it seems just convenient to reassign a value to a variable and use that variable in a function call that makes no sense. This adds to the confusion that is experienced when trying to understand a block of logic. To prevent this sort of abuse, make the types seem special.
Now when this block of code is encountered, it may help keep control of what your variables are used for, and how they are used. Here is an example of a jumbled block of code with the new types:
Simplify Template Types
I recently wrote an entry on Template Meta-Programming, and I have been pleasently surprised how well it has been received. This includes the number of times the article has been read. Up until recently, I was starting to believe that developers had an aversion to the angle brackets < >. I know from historical experience that the first introduction of templates was not as smooth and portable as desired. Over the years the designers of C++ have recognized how templates could be improved. Now they are an indispensable part of C++.
There is still one thing that seems to vex many developers, the damn < >. There is a very simple solution to tuck those away and still benefit from the generality and power of templates, give them an alias with typedef.
C/C++ // Need to add template syntax highlighting typedef std::vector<int> IntVector; typedef std::map<int, std::string> StringMap;
Unfortunately it is not possible to typedef partially specialized templates until C++11 and greater. I am very grateful for this feature in the new specification of the language. Up until this point it has not been possible to create simplified typedef aliases for partially specified templates such as this::
C/C++ template <typename T, typename U, size_t SizeT> class CompoundArray; // This syntax is illegal in C++03 template <typename T> typedef CompoundArray<t , int, 10> CompoundIntArray;
The new standard now allows this alias to be defined with the using keyword for the template above to simplify usage:
C/C++ // The way to define a template alias with C++11 template <typename T> using CompoundIntArray = CompoundArray<t , int, 10> // The new usage: CompoundIntArray<double> object; // Creates an object instance equivalent to: CompoundArray<double , int, 10> object;
Continuous development in software is a cumulative effort. Every change that is made to the software is built upon the exist set of code. If the code is difficult to understand and modify to begin with, chances are that it will only continue to get worse. One way to simply and clarify your intentions is by choosing both meaning types and names for the variables that you use. We have been primarily taught to use variables to communicate intent. In this article I showed how it is possible to use the type system and the typedef operator to create aliases for your variable types. This gives you another tool to write clean and maintainable code.