Category: "general"

| (Absolute) Mentality |

general, communication, knowledge Send feedback »

When you know nothing, it's easy to think you know everything

I haven't yet decided if I think this is a universal flaw among people or that it is a trait more prone to the practitioners of our profession; I am referring to the absolute mentality that so many people seem to use when making decisions. This is a closed-minded approach to solving problems. This is the basic concept:

"There is one absolutely best practice, and you are an idiot if you don't subscribe to the same approach that I take."

It is important to leave your comfort zone to expand your abilities. You can increase your potential by learning to spot this cognitive bias, recognize when you are limiting your potential by falling prey to it, and learn to evaluate objectively as you make decisions in your day-to-day activities.

Full story »

Copyrights

general, knowledge 1 feedback »
Surprisingly, to me at least, as a software developer I have had a decent amount of experience relating to software patents and copyrights. I wanted to share my experiences, knowledge, and bring attention to current events.

Copyright Infringement

Let's start with my personal experience with copyright infringement.

Before I created my own site, I posted many articles at codeproject.com. I like to write on topics that I am interested in to learn them better. Around 2002 I wrote a basic implementation of the game Tetris to learn DirectX and similar technologies. I posted this on CodeProject.com.

The Tetris Company is a very litigious entity, and ownership of game itself is has been surrounded in controversy throughout its existence. In 2007 I received a letter from CodeProject informing me that they had to remove my article from their site because they received a DMCA copyright notice from The Tetris Company.

Then about nine months later, CodeProject sent me another letter indicating that if I were to remove all references to "Tetris" from my article and code, they could repost it on the site. They also included a few legal references for me to help educate myself on copyright law, which I will share in a moment.

After a bit of research I settled on a new name for my game, Quadrino. I removed references to the name "Tetris" and cleaned up my article. CodeProject then reposted it to their site, and I haven't been harrassed by The Tetris Company since then.

If you are interested, you can checkout Quadrino[^] at CodeProject. This version uses DirectX 7 and 8, but it still works. I have a port that I updated to use Direct 2D and added support for Xbox 360 controllers, however, I never polished it up enough to release it.

What Does a Copyright Protect?

(Disclaimer: The following is my understanding and experiences with copyright law. I'm not lawyer and the courts and legal system do not always seem to play out logically to me. Also, it seems to me that what you can prove in a court of law tends to be more valuable than the truth.)

It turns out that a copyright only protects the original expression of an idea, but not the idea itself (which really does lead to a lot of confusion and misinterpretations).

For example:

  • Books
  • Poems
  • Lyrics to a song (written or otherwise)
  • Written sheet music for a melody
  • A recorded version of the singer singing or a musician playing
  • Paintings
  • Sculptures
are all fairly straight-forward examples to understand as having copyright protection.

Other examples of creations that are protected:

  • Software Source Code as well as compiled binaries
  • Hardware design documents
  • Research Papers
  • Blog entries
  • Internet forum comments
  • Contracts
  • Technical Manuals
  • Parker Brothers's written rules to Monopoly

The name of the game Monopoly is trademarked (a different form of protection, which is also different from a "registered trademark"). The written rules to Monopoly have copyright protection, however, the concept of the game of Monopoly itself cannot be protected in any way. That is why you will see games similar to Monopoly from time to time. Such as a local city themed version of the game with local landmarks, events, celebrities. As long as they write their own version of the rules and avoid the name Monopoly, they can legally sell their game.

This is the aspect of copyrights that allowed me to change the name of my game and avoid any further infringement violations.

Then issues start to arise such as the "look and feel" of a product and so on.

And yes, works published on the Internet are publically accessible, however, they are not considered in the public domain, which means you still hold the copyright to whatever you post. Terms of service for a website may state that by posting content on their site that you give them ownership, a limited use copyright license, or many other things (damn fine print.)

How do you Copyright something?

Step 1: You create it

Congratulations! You have just completed the copyright process to the expression of your idea!

That's it!

There is only one step. You do not need to put a copyright symbol on the creative work, no date is required, and the "poor man's" copyright is a myth. That is, sending yourself a sealed copy of your work in the mail doesn't gain you anything (you'll actually be out the cost of the envelope and price of shipping, not to mention the "opportunity cost" of what you could have done with your time instead of mailing something to yourself).

Adding the symbols, date, signing with your name etc. helps establish ownership and disambiguate that you are claiming your copy rights. Otherwise, if you can prove that you are the creator of a work, then you automatically own the copyright of that work (and it's what you can prove in a court of law that actually matters.)

This is for works created after 1989, because copyright notices were required before this point. For more details on this, look up the Berne Convention Implementation Act, which makes copyright ownership automatic. If you created your work before 1989 and forgot to properly mark your creative work, you may still be able to claim it. You should consult a lawyer if it matters that much.

Fair Use

I am not going to go into full detail on this, but there is a concept of fair use on copyrights. For the purpose of reviews, references and citations you can use a portion of a creation that is under copy protection. You can also use this content for satire and parodies and to create derivative works.

Supreme Court decisions

  • 1994 Campbell v. Acuff-Rose Music, Inc. [Copyright - Fair Use - Parody]
  • 1984Sony Corp. of Am. v. Universal City Studios, Inc. [Copyright - Fair Use - Sale and Use of VCRs]

Derivative Works

Derivative works are a sticky issue. These works can be derivations of an existing work, but they must be more your work than the original. Beyond that basic notion, my understanding is limited. This is a very gray area. Hell, Google avoided a $9.2B lawsuit against Oracle that has been raging in our courts since 2011, because the jury ruled Google had Fair Use rights to create a derivative work. Many analysts are expecting Oracle to appeal. We'll have to wait and see what happens.

Digital Millennium Copyright Act(DMCA)

The Digital Millennium Copyright Act (DMCA) is a four-letter word for security researchers and hobbyists, especially section 1201. It was enacted in 1998 and was aimed at protecting the actors, authors, artists and musicians (more accurately studios, publishers, and recording companies) creative works (distributed content) from being illegally pirated on the Internet and other forms of digital media that began to evolve.

One of the clauses and subsequent side-effects of this law (in the United States) prohibits a lawful owner from reverse engineering anti-circumvention provisions in most cases. This has brought John Deere and auto manufacturer's into the spot light recently as they are trying to use this law to prevent security researchers from looking for vulnerabilities in their equipment and maintain a monopoly on the support and repair of these complex systems.

It's some of the side-effects of the DCMA that make me a little jumpy at the threat of being sued. The penalties could reach a fine of $5M and 5 years in prison. For this reason, the Electronic Frontier Foundation (EFF) is suing the federal government on behalf of Andrew Huang, and Matthew Green. You can read the press release made by the EFF here: EFF Lawsuit Takes on DMCA Section 1201: Research and Technology Restrictions Violate the First Amendment[^].

Wait! What are those sub-clauses in section 1201?

There are a number of sub-clauses in section 1201 that actually give owners of lawfully acquired (i.e., not pirated or stolen) copy written material, to reverse-engineer and circumvent the copyright protection mechanism in a few select instances:

  • f. Reverse Engineering for achieving interoperability
  • g. Encryption Research
  • i. Protection of Personally Identifying Information
  • j. Security Testing (Research)

I mentioned this to Matt Green through Twitter, and his response was:

Matthew Green's Response

Matt wrote a blog entry that details why he is doing this. You can read that here: Matthew Green's Statement on DMCA lawsuit[^]

After I read his blog post I asked myself this question:

Even with the law on my side, do I really want to risk getting taken to court by a mega-corporation with deep pockets?

My conclusion:

Nope!

Summary

Copyright and patent infringement are civil offenses and are likely only to become a concern for hackers if the goal is to duplicate and manufacture their own product for profit. Regardless of their moral view on if they are entitled to hack systems, violation of one of these IP legal protections is likely to only affect a hacker if their activities will end in a lawsuit and probable loss in an infringement case with the original manufacturer.

Otherwise, the criminal penalties for hacking are much more severe with penalties that could include both jail time and monetary fines. When the topic moves into espionage, a death sentence is even a potential outcome. Therefore, I doubt that any hackers (with the exception of corporate reverse-engineers) even consider the legal violations of IP protection that they are committing.

Ode to the Anagramic Poem

general, communication, C++ Send feedback »

Twitter is an... interesting way to spend one's time. It can be quite a challenge to cram a thought into 140 characters. However, I have found there are many ways to be entertained by this micro-blogging site. One of them includes interacting and learning from a wide variety of people. During an interaction with a creative-writer, I was a bit challenged by a poem.

Inspiration from Twitter

This is a brief entry that includes my implementation of that "Poem Anagram Generator."

The Original Poem

The first thing that is required, is the poem. You can find the original at what_adri_writes[^].

Here is an excerpt:

Stanley the fishmonger
told me how to know it
when I saw it
without a poet
And we fished out drowned Phlebas,
Patron saint of Unconsidered Phoenicians
and failed Changers of Minds.
The Highlander Art double-checked the veil
of Reichenbach Falls
three days later
and found
A beekeeper.

Strategy

When I thought about it, I realized it wouldn't take much to actually write a program to reorganize the words of the poem. Especially if I were to use the algorithms that are available in the C++ Standard Library. That is, of course, creating the naïve implementation that ignores such possibly desirable features such as sentence structure or proper grammar.

Then again, this is poetry!

So let's go ahead and throw those concerns out the window. This will be a post-modern avant-garde cyber-experience.

Tokenize

My original goal was to tokenize all of the words from the poem, then randomize with something like next_permutation. However, when I really started to look at the text I saw all of the punctuation. Did I want to remove the punctuation and just live with the words? Well then there are also the new-lines that give the text form and clues the reader in that "pssst, this is probably a poem"

So I decided that I would include both the punctuation and new-lines as tokens to be generated for the poem generator.

To do this I put a space between ever word, punctuation mark, and new-line in the poem; like so:

C++

const std::string poem( "Oh but I too want to be a poet ! \n "
                        "and yet \n "
                        "Tell me where is meaning bred \n "
                        "In my heart or in your head ? \n "
 
                         // omitted for brevity
 
                        "And so the poem \n "
                        "was not to be . \n");

Here is a simple function to add each token into a vector for future modification:

C++

typedef vector<string>    words_t;
 
//  ************************************************************
words_t tokenize(const string &poem)
{
  words_t words;
  size_t  next = 0;
 
  do
  {
    size_t cur = next;
    next = poem.find(' ', cur);
 
    size_t count = next == string::npos
                  ? string::npos
                  : next - cur;
    words.push_back(poem.substr(cur, count));
 
    if (next != string::npos)
      next += 1;
 
  } while (next != string::npos);
 
  return words;
}

If I missed a potential algorithm from the standard library that would perform this task I would be interested to learn how this function could be simplified.

The Generator

The generator code is found below. It contains three algorithms from the standard library; A random number generator, shuffle and copy. Then of course the call to tokenize.

You can run the code below to generate a new poem each time.

C++

//  ************************************************************
int main()
{
  // Tokenize the poem.
  words_t words(tokenize(poem));
 
  // Jumble the words.
  random_device rdev;
  mt19937 rng(rdev());
 
  shuffle(words.begin(), words.end(), rng);
 
  // Print the results.
  copy(words.begin(), words.end(), ostream_iterator<string>(cout, " "));
 
  cout << "\n";
 
  return 0;
}
Run this code

Output:

Twitter and Poetry...

Instant art!

Maybe to improve upon this I could pre-tokenize based on particular phrases.

Summary

Twitter is fun!

C++ is fun!

Combining Twitter and C++ makes poetry fun even for a left-brained analytic like myself.

If you end up generating an interesting poem, post it in the comments.

Segmented Computer Memory

general, beginner Send feedback »

I wanted to start writing about secure coding practices as well as more instructive posts related to security topics such as encryption and hacking. You probably already have a conceptual understanding of things like the "stack", "heap" and "program counter". However, it's difficult to have concrete discussions regarding security unless you have a solid grasp on the computer memory model. This post is intended to provide a concrete foundation of the memory model, and my future posts related to security will build on this foundation.

It is easy to take for granted the complexity of computer memory because of the many layers of abstraction that programmers work through today. The same basic memory design has existed for all computers that use a paged memory structure since the early 60's. These are some of the areas where the knowledge of the memory layout plays a crucial role in application portability and embedded resources, program security, code optimization. The diagrams I present will also help you understand where the different activities occur in a program during runtime.

Full story »

Learn Hexadecimal and Binary

general, beginner 1 feedback »

The binary, octal and hexadecimal number systems pervade all of computing. Every command is reduced to a sequence of strings of 1s and 0s for the computer to interpret. These commands seem like noise, garbage, especially with the sheer length of the information. Becoming familiar with binary and other number systems can make it much simpler to interpret the data.

Once you become familiar with the relationships between the number systems, you can manipulate the data in more convenient forms. Your ability to reason, solve and implement solid programs will grow. Patterns will begin to emerge when you view the raw data in hexadecimal. Some of the most efficient algorithms are based on the powers of two. So do yourself a favor and become more familiar with hexadecimal and binary.

Number Base Conversion and Place Value

I am going to skip this since you can refer to my previous entry for a detailed review of number system conversions[^].

Continuous Data

Binary and hexadecimal are more natural number systems for use with computers because they both have a base of 2 raised to some exponent; binary is 21 and hexadecimal is 24. We can easily convert from binary to decimal. However, decimal is not always the most useful form. Especially when we consider that we don't always have a nice organized view of our data. Learning to effectively navigate between the formats, especially in your head, increases your ability to understand programs loaded into memory, as well as streams of raw data that may be found in a debugger or analyzers like Wireshark.

The basic unit of data is a byte. While it is true that a byte can be broken down into bits, we tend to work with these bundled collections and process bytes. Modern computer architectures process multiple bytes, specifically 8 for 64-bit computers. And then there's graphics cards, which are using 128, 192 and even 256-bit width memory accesses. While these large data widths could represent extremely large numbers in decimal, the values tend to have encodings that only use a fraction of the space.

Recognize Your Surroundings

What is the largest binary value that can fit in an 8-bit field?

It will be a value that has eight, ones: 1111 1111. Placing a space after every four bits helps with readability.

What is the largest hexadecimal value that can fit in an 8-bit field?

We can take advantage of the power of 2 relationship between binary and hexadecimal. Each hexadecimal digit requires four binary bits. It would be very beneficial to you to commit the following table to memory:

Dec 0123 4567 891011 12131415
Bin 0000000100100011 0100010101100111 1000100110101011 1100110111101111
Hex 0x00x10x20x3 0x40x50x60x7 0x80x90xA0xB 0xC0xD0xE0xF

Now we can simply take each grouping of four bits, 1111 1111, and convert them into hex-digits, FF.

What is the largest decimal value that can fit in an 8-bit field? This isn't as simple, we must convert the binary value into decimal. Using the number base conversion algorithm, we know that the eighth bit is equal to 28, or 256. Since zero must be represented the largest value is 28 - 1, or 255.

Navigating the Sea of Data

Binary and hexadecimal (ok, and octal) are all number systems whose base is a power of two. Considering that computers work with binary, the representation of binary and other number systems that fit nicely into the logical data blocks, bytes, become much more meaningful. Through practice and the relatively small scale of the values that can be represented with 8-bits, converting a byte a to decimal value feels natural to me. However, when the data is represented with 2, 4 and 8 bytes, the values can grow quickly, and decimal form quickly loses its meaning and my intuition becomes useless as to what value I am dealing with.

For example:

What is the largest binary value that can fit in an 16-bit field?

It will be a value that has sixteen, ones: 1111 1111 1111 1111.

What is the largest hexadecimal value that can fit in an 16-bit field? Again, let's convert each of the blocks of four-bits into a hex-digit, FFFF.

What is the largest decimal value that can fit in an 16-bit field?

Let's see, is it 216 - 1, so that makes it 65355, 65535, or 65555, it's somewhere around there.

Here's a real-world example that I believe many people are familiar with is the RGB color encoding for pixels. You could add a fourth channel to represent an alpha channel an encode RGBA. If we use one-byte per channel, we can encode all four channels in a single 32-bit word, which can be processed very efficiently.

Imagine we are looking at pixel values in memory and the bytes appear in this format: RR GG BB. It takes two hexadecimal digits to represent a single byte. Therefore, the value of pure green could be represented as 00 FF 00. To view this same value as a 24-bit decimal, is much less helpful, 65,280‭.

If we were to change the value to this, 8,388,608‬, what has changed? We can tell the value has changed by roughly 8.3M. Since a 16-bit value can hold ~65K, we know that the third byte has been modified, and we can guess that it has been increased to 120 or more (8.3M / 65K). But what is held in the lower two bytes now? Our ability to deduce information is not much greater than an estimate. The value in hexadecimal is 80 00 00.

The difference between 8,388,608‬ and 8,388,607‬ are enormous with respect to data encoded at the byte-level:

8,388,608‬‬ 8,388,607‬‬‬
00 80 00 00 7F FF
 

Now consider that we are dealing with 24-bit values in a stream of pixels. For every 12-bytes, we will have encoded 4 pixels. Here is a representation of what we would see in the data as most computer views of memory are grouped into 32-bit groupings:

‭4,260,948,991‬‬   ‭2,568,312,378‬‬‬   ‭3,954,253,066
FD F8 EB FF   99 15 56 3A   8C B1 1D 0A
     

Binary

I typically try to use binary only up to 8-bits. Anything larger than that, I simply skip the first 8-bits (1-byte), and focus on the next 8-bits. For example: 1111 1111 1111 1111. As I demonstrated with the RGB color encoding, values do not always represent a single number. In fact, it is a stream of data. So whether there is one byte, four bytes, or a gigabytes worth of data, we usually process it either one byte or one word at a time. We actually break down decimal numbers into groups by using a comma (or some other regional punctuation) to separate thousands, e.g. ‭4,294,967,295, 294 of what? Millions.

Binary manipulation can be found in many contexts. One of the most common is storing a collection of flags in an unsigned buffer. These flags can be flipped on and off with the Boolean flag operations of your programming language. Using a mask with multiple bits allows an enumerated value with more than two options to be encoded within the binary buffer. I'm not going to go to describe the mechanics here, I simply want to demonstrate that data is encoded in many ways, and there are many reasons for you to become proficient at picking apart byte-streams down to the bit.

Hexadecimal

It is just as beneficial to be able to convert between hexadecimal and decimal in your head from 1 to 255, especially if you ever deal with selecting colors for web-pages or you edit images in programs like Photoshop. It only takes two hex-digits to represent an 8-bit byte. If you memorize the values that are defined by the high-order hex-digit, reading hexadecimal byte values becomes almost as easy as reading decimal values. There are two tables listed below. The table on the left indicates the value mapping for the high-order and low-order hex-digits of a byte. The table on the right contains a set of landmark values that you will most certainly encounter and find useful:

161Decimal  160Decimal
0x0000x00
0x10160x11
0x20320x22
0x30480x33
0x40640x44
0x50800x55
0x60960x66
0x701120x77
0x801280x88
0x901440x99
0xA01600xA10
0xB01760xB11
0xC01920xC12
0xD02080xD13
0xE02240xE14
0xF02400xF15
Decimal  Hexadecimal
320x20
640x40
1000x64
1270x7F
1280x80
1680xA8
1920xC0
2000xC8
2240xE0
2390xEF
2500xFA
 
 
 
 
 

Some of the numbers listed in the table on the right are more obvious than others for why I chose to included them in the map. For example, 100, that's a nice round number that we commonly deal with daily. When you run into 0x64, now you can automatically map that in your mind to 100 and have a reference-point for its value. Alternatively, if you have a value such as 0x6C, you could start by calculating the difference: 0x6C - 0x64 = 8; add that to 100 to know the decimal value is 108.

Some of you will recognize the values 127, 168, 192, 224 and 238. For those that do not see the relevance of these values, they are common octets found in landmark network IP addresses. The table below shows the name and common dotted-decimal for as well as the hexadecimal form of the address in both big-endian and little-endian format:

Name   IPv4 Address   
Hexadecimal
 big-endianlittle-endian
local-host127.0.0.10x7F0000010x0100007F
private range192.168.x.x0xC0A800000x0000A8C0
multicast base address224.0.0.00xE00000000x000000E0
multicast last address239.255.255.2550xEFFFFFFF0xFFFFFFEF

One additional fact related to the IPv4 multicast address range, is the official definition declares any IPv4 address with the leading four bits set to 1110 to be a multicast address. 0xE is the hex-digit that maps to binary 1110. This explains why the full multicast range of addresses is from 0xE0000000 to 0xEFFFFFFF, or written in decimal dotted notation as 224.0.0.0 to 239.255.255.255.

Octal

I'm sure octal has uses, but I have never ran into a situation that I have used it.

Actually, there is one place, which is to demonstrate this equality:

Oct 31 = Dec 25

Landmark Values

Binary has landmark values similar to the way decimal does. For example, 1's up to 10's, 10's through 100, then 1000, 1M, 1B ... These values are focused on powers of 10, and arranged in groups that provide a significant range to be meaningful when you round-off or estimate. Becoming proficient with the scale of each binary bit up to 210, which is equal to 1024, or 1K. At this point, we can use the powers of two in multiple contexts; 1) Simple binary counting, 2) measuring data byte lengths.

I have constructed a table below that shows landmark values with the power of 2. On the left I have indicated the name of the unit if we are discussing bits and the size of a single value; for example, 8-bits is a byte. I haven't had the need to explore data values larger than 64-bits yet, but it's possible that some of you have. To the right I have indicated the units of measure used in computers when we discuss sizes. At 1024 is a kilobyte, and 1024 kilobytes is officially a megabyte (not if you're a hard drive manufacturer...). I continued the table up through 280, which is known as a "yotta-byte." Becoming familiar with the landmarks up to 32-bits is probably enough for most people. To the far right I converted the major landmark values into decimal to give you a sense of size and scale for these numbers.

Unit
(bits)
Binary
Exponent
Unit
(bytes)
  Place
Value
  Largest
Value
Bit (b)20Byte (B) 1
21Word 2
22Double-word 4
Nibble23Quad-word 8
24 16
25 32
26 64
Byte (B)27 128255
28 256
29 512
210Kilobyte (KB) 1024
220Megabyte (MB) 102421,048,576
230Gigabyte (GB) 102431,073,741,824
Double-word231 10243·24,294,967,295
232 10243·22
240Terabyte (TB) 10244
250Petabyte (PB) 10245
260Exabyte (EB) 10246
Quad-word263 10246·239,223,372,036,854,775,807
270Zettabyte (ZB) 10247
280Yottabyte (YB)10248

Summary

Looking at a large mass of symbols, such as a memory dump from a computer, can appear to be overwhelming. We do have tools that we use to develop and debug software help us organize and make sense of this data. However, these tools cannot always display this data in formats that are helpful for particular situations. In these cases, understanding the relationships between the numbers and their different representations can be extremely helpful. This is especially true when the data is encoded at irregular offsets. I presented a few common forms that you are likely to encounter when looking at the values stored in computer memory. Hopefully you can use these identity and navigation tips to improve your development and debugging abilities.

Number Base Conversion

general, beginner, math 5 feedbacks »

I have started writing an entry that discusses the value of becoming familiar with the binary (base-2) and hexadecimal (base-16) number systems because they are generally more useful to a programmer than decimal (base-10). My daughter is currently in high-school and she is taking a programming course. One of the things that she is currently learning how to do is count in binary. So I decided to expand my explanation of conversion between number systems as a reference for her and all of those who would like a refresher. The entry following this one will describe how binary and hexadecimal will make you a more effective programmer.

Place Value: coefficient·radixplace

To be able to convert numbers between different numerical bases, it is important to review the concept of place value. Each place, or column, in a number represents a value equal to number system base raised to the power of its place index starting at 0. The official name of the base-value is the radix. For example, consider the first three place values for a number system that has a radix of b.

b2+b1+b0

If we are dealing with decimal (base-10), the radix = 10 and we would have place values of:

102 + 101 + 100

Now we have the 1's column, 10's column, and 100's column:

100 + 10 + 1

The number stored at each column in the number is called the coefficient. To construct the final number, we multiply the coefficient by its place-value and add the results at each place together. Here is the decimal number 237 broken down by place value:

2·b2 +3·b1 +7·b0

2·102 + 3·101 + 7·10

200 + 30 + 7

237

Hopefully decimal form is so natural to you that 237 seems like a single number, rather than the sum of place values that are multiplied by their coefficient.

The Binary Number System

There are 10 types of people, those that understand binary and those that don't

If you are wondering what the eight other types of people are, continue reading.

Binary is a base-2 number system. This means that each column in a binary number represents a value of 2 raised to the power of its place index. A number system requires a number of symbols to represent each place value that is equal to its Base value, and zero is always the first symbol to include in this set. For instance, decimal (base ten) requires ten symbols: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Therefore, binary (base two) only requires two symbols: {0, 1}.

Adding the radix as a subscript after a number is a common notation to indicate the base-value of a number when there is a possibility for confusion. Unless the context specifically indicates a different number system, decimal form is assumed. For example, the value 2 in binary:

102 = 210

If we had used that notation in the joke at the beginning... well, it just wouldn't have been as funny.

Another place that you will see the subscript notation is when you study logarithms:

logax = logbx / logba

I'll leave the details for logarithms for another time.

Counting in Binary

When learning anything new, it can be helpful to map something that you already know to the new topic. For a new number system, counting with both number systems can be a helpful exercise. Counting in all number systems uses the same process:

  1. Start with zero in the least significant column
  2. Count up until you have used all of the symbols in increasing order in the least significant, 1's, column
  3. When the limit is reached, increment the value in the next column, and reset the current column to zero.
  4. If the next column has used all of the symbols, increment the column after that and reset the current column.
  5. Once no further columns reach their limit, return to step 2 to continue counting.

Starting with decimal, if we count up from zero to 9, we get:

0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 ...roll over

We are now at step 3, we have reached the limit, so we increment the next column from an implied 0 to 1, and reset the current column for the result of:

10

Continuing to count, we increment the 1's column and rolling over the successive columns as necessary:

11 - 12 - 13 ... 98 - 99 ... roll over

100

Here is 0-15 in binary. When working with computers, and binary in general, you will typically see zeroes explicitly written for the more significant columns. We require 4 binary digits to represent 15.

Binary  Sum of Columns  Decimal
00000 + 0 + 0 + 00
00010 + 0 + 0 + 11
00100 + 0 + 2 + 02
00110 + 0 + 2 + 13
01000 + 4 + 0 + 04
01010 + 4 + 0 + 15
01100 + 4 + 2 + 06
01110 + 4 + 2 + 17
10008 + 0 + 0 + 08
10018 + 0 + 0 + 19
10108 + 0 + 2 + 010
10118 + 0 + 2 + 111
11008 + 4 + 0 + 012
11018 + 4 + 0 + 113
11108 + 4 + 2 + 014
11118 + 4 + 2 + 115

The Hexadecimal Number System

Hexadecimal is a base-16 number system. Therefore, we will need sixteen symbols to represent the place values. We can start with the ten numbers used in decimal, and we use letters of the alphabet to represent the remaining six symbols. Although letter-case can matter in programming, the letters used in hexadecimal are case-insensitive. Here is a mapping of the hexadecimal values to decimal:

Decimal:  { 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 }
Hexadecimal:  { 0 1 2 3 4 5 6 7 8 9 A B C D E F }

Number Base Conversion

Let's discuss how to convert between number systems with different base values. Specifically, we will describe how to convert from "Decimal to Basex" and "Basex to Decimal".

Decimal to Basex

Here is the algorithm to convert a number in decimal to a different base:

  1. Divide the decimal number by the radix of the target base
  2. The remainder from step 1 becomes the value for the current column.
  3. Use the quotient (answer) from step 1 as the decimal value to calculate the next column.
  4. Return to step 1 and repeat until the quotient is zero.

Let's return to the number 237 and convert it to a binary number:

decimal:237
radix:2
DecimalRadix  Quotient  Remainder
237/2  118  1 (20)
118/2  59  0 (21)
59/2  29  1 (22)
29/2  14  1 (23)
14/2  7  0 (24)
7/2  3  1 (25)
3/2  1  1 (26)
1/2  0  1 (27)
binary:11101101

Here is 237 converted to a hexadecimal number:

decimal:237
radix:16
DecimalRadix  Quotient  Remainder
237/16  14  D16 (13) (160)
14/16  0  E16 (14) (161)
hexadecimal:ED16

A common notation to represent hexadecimal when dealing with computers and in programming languages themselves, is to prepend an 'x' in front of the number like so: xED.

Here is one more decimal-to-hexadecimal example:

decimal:3,134,243,038
radix:16
DecimalRadix  Quotient  Remainder
3,134,243,038/16  195,890,189  E (14) (160)
195,890,189/16  12,243,136  D (13) (161)
12,243,136/16  765,196  0 ( 0) (162)
765,196/16  47,824  C (12) (163)
47,824/16  2,989  0 ( 0) (164)
2,989/16  186  D (13) (165)
186/16  11  A (10) (166)
11/16  0  B (11) (167)
hexadecimal:xBAD0C0DE

Basex to Decimal

Actually, I have already demonstrated how to convert a number from a base different than ten, into decimal. Once again, here is the complete formula, where cx represents the coefficients at each place-column.

cn·bn + ... + c2·b2 + c1·b1 + c0·b0

As an example, let's convert the binary answer back into decimal:

1·27 + 1·26 + 1·25 + 0·24 + 1·23 + 1·22 + 0·21 + 1·20

1·128 + 1·64 + 1·32 + 0·16 + 1·8 + 1·4 + 0·2 + 1·1

128 + 64 + 32 + 8 + 4 + 1

237

Basex to Basey

Is it possible to convert a number from Basex directly to Basey without converting to decimal (base-10) first?

Yes, however, you will need to perform all of your math operations in either Basex or Basey. The algorithms that I have presented are performed with base-10 since that is the number system most people are familiar with.

Demonstration

Here is a short demonstration program to convert a decimal number into a value of any numeral base between 2-36.

Why between those two ranges?

Try to imagine how a base-1 number system would work with only the symbol {0} to work with. Alternatively, we can combine the numerical set: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} with the alphabetic set: {A, B, C ... X, Y, Z} to create a set of 36 symbols.

C++

// The set of possible symbols for representing other bases.
const char symbols[] = {'0', '1', '2', '3', '4', '5',
                        '6', '7', '8', '9', 'A', 'B',
                        'C', 'D', 'E', 'F', 'G', 'H',
                        'I', 'J', 'K', 'L', 'M', 'N',
                        'O', 'P', 'Q', 'R', 'S', 'T',
                        'U', 'V', 'W', 'X', 'Y', 'Z'};

Here is the base-conversion algorithm from above written in C++.

C++

void ConvertToBase( const unsigned long decimal,
                    const unsigned long radix)
{
  unsigned long remainder[32] = {0};
  unsigned long quotient = decimal;
  unsigned char place    = 0;
 
  while (0 != quotient)
  {
    unsigned long value = quotient;
    remainder[place] = value % radix;
    quotient = (value - remainder[place]) / radix;
 
    place++;
  }
 
  cout << decimal << " in base " << radix << " is ";
 
  for (unsigned char index = 1; index <= place; index++)
  {
    cout << symbols[remainder[place - index]];
  }
}

The values are from the examples above. You can modify the values that are used with the function calls to ConvertToBase in the program below:

C++

int main(int argc, char* argv[])
{
  ConvertToBase(237,        2);
  ConvertToBase(237,        10);
  ConvertToBase(237,        16);
  ConvertToBase(3134243038, 16);
  ConvertToBase(3134243038, 36);
 
  return 0;
}
Run this code

Output:

237 in base 2 is 11101101
237 in base 10 is 237
237 in base 16 is ED
3134243038 in base 16 is BAD0C0DE
3134243038 in base 36 is 1FU1PEM

Summary

Ever since you learned to count in decimal you have been using exponents with a base-10; it's just that no one ever made a big deal of this fact. To use a different numeral system, such as binary (base-2) or hexadecimal (base-16), you simply need to determine a set of symbols to represent the values and you can use the same counting rules that you use in base-10, except that you have a different number of symbols. Converting between any number system is possible, however, it is simplest to convert to decimal first if you want to continue to use base-10 arithmetic operations.

Why Does a CS Degree Require So Much Math?

general, CodeProject, engineering Send feedback »

Over the years I have heard this question or criticism many times:

Why is so much math required for a computer science degree?

I never questioned the amount of math that was required to earn my degree. I enjoy learning, especially math and science. Although, a few of the classes felt like punishment. I remember the latter part of the semester in Probability was especially difficult at the time. Possibly because I was challenged with a new way of thinking that is required for these problems, which can be counter-intuitive.

Full story »

Accidental Complexity

general, CodeProject, maintainability, design Send feedback »

Accidental complexity is the entropy that exists in your system that is possible to eliminate. The opposite of this is essential complexity; the parts of a system that are required and cannot be simplified. These two concepts were discussed by Fred Brooks in his essay No Silver Bullet -- Essence and Accidents of Software Engineering.. Many systems today are extremely complex, and any effort that can be done to eliminate complexity, should be.

Full story »

Make Your Comments Matter

general, communication, CodeProject, maintainability Send feedback »

There are many different philosophies with regards to how source code should be commented. The gamut of these philosophies range from "Every single statement must have a comment." to "Comments are useless; avoid them at all costs!" I am not even going to attempt to explain the disparity of range. However, I will attempt to address and suggest potential solutions the reasons that are often cited for these extreme approaches to commenting code.

Full story »

Preparing to Know Modern C++

general, CodeProject, C++, maintainability Send feedback »

"To know and not do, is to not yet know"

This Zen mantra has been the signature that I have placed at the end of every entry since I started this blog. This mantra is the impetus of this entry, my decision to know how to use Modern C++.

Full story »

Contact / Help. ©2017 by Paul Watt; Charon adapted from work by daroz. blog software / web hosting / monetize.
Design & icons by N.Design Studio. Skin by Tender Feelings / EvoFactory.