Alchemy is a collection of independent library components that specifically relate to efficient lowlevel constructs used with embedded and network programming.
The latest version of Embedded Alchemy[^] can be found on GitHub.The most recent entries as well as Alchemy topics to be posted soon:
 Steganography[^]
 Coming Soon: Alchemy: Data View
I just completed my Masters Degree in Cybersecurity at Johns Hopkins University. I plan to resume Alchemy's development. I plan to use my newly acquired knowledge to add constructs that will help improve the security of devices built for the Internet of (Insecure) Things.
Copyright Infringement
Let's start with my personal experience with copyright infringement.
Before I created my own site, I posted many articles at codeproject.com. I like to write on topics that I am interested in to learn them better. Around 2002 I wrote a basic implementation of the game Tetris to learn DirectX and similar technologies. I posted this on CodeProject.com.
The Tetris Company is a very litigious entity, and ownership of game itself is has been surrounded in controversy throughout its existence. In 2007 I received a letter from CodeProject informing me that they had to remove my article from their site because they received a DMCA copyright notice from The Tetris Company.
Then about nine months later, CodeProject sent me another letter indicating that if I were to remove all references to "Tetris" from my article and code, they could repost it on the site. They also included a few legal references for me to help educate myself on copyright law, which I will share in a moment.
After a bit of research I settled on a new name for my game, Quadrino. I removed references to the name "Tetris" and cleaned up my article. CodeProject then reposted it to their site, and I haven't been harrassed by The Tetris Company since then.
If you are interested, you can checkout Quadrino[^] at CodeProject. This version uses DirectX 7 and 8, but it still works. I have a port that I updated to use Direct 2D and added support for Xbox 360 controllers, however, I never polished it up enough to release it.
What Does a Copyright Protect?
(Disclaimer: The following is my understanding and experiences with copyright law. I'm not lawyer and the courts and legal system do not always seem to play out logically to me. Also, it seems to me that what you can prove in a court of law tends to be more valuable than the truth.)
It turns out that a copyright only protects the original expression of an idea, but not the idea itself (which really does lead to a lot of confusion and misinterpretations).
For example:
 Books
 Poems
 Lyrics to a song (written or otherwise)
 Written sheet music for a melody
 A recorded version of the singer singing or a musician playing
 Paintings
 Sculptures
Other examples of creations that are protected:
 Software Source Code as well as compiled binaries
 Hardware design documents
 Research Papers
 Blog entries
 Internet forum comments
 Contracts
 Technical Manuals
 Parker Brothers's written rules to Monopoly
The name of the game Monopoly is trademarked (a different form of protection, which is also different from a "registered trademark"). The written rules to Monopoly have copyright protection, however, the concept of the game of Monopoly itself cannot be protected in any way. That is why you will see games similar to Monopoly from time to time. Such as a local city themed version of the game with local landmarks, events, celebrities. As long as they write their own version of the rules and avoid the name Monopoly, they can legally sell their game.
This is the aspect of copyrights that allowed me to change the name of my game and avoid any further infringement violations.
Then issues start to arise such as the "look and feel" of a product and so on.
And yes, works published on the Internet are publically accessible, however, they are not considered in the public domain, which means you still hold the copyright to whatever you post. Terms of service for a website may state that by posting content on their site that you give them ownership, a limited use copyright license, or many other things (damn fine print.)
How do you Copyright something?
Step 1: You create it
Congratulations! You have just completed the copyright process to the expression of your idea!
That's it!
There is only one step. You do not need to put a copyright symbol on the creative work, no date is required, and the "poor man's" copyright is a myth. That is, sending yourself a sealed copy of your work in the mail doesn't gain you anything (you'll actually be out the cost of the envelope and price of shipping, not to mention the "opportunity cost" of what you could have done with your time instead of mailing something to yourself).
Adding the symbols, date, signing with your name etc. helps establish ownership and disambiguate that you are claiming your copy rights. Otherwise, if you can prove that you are the creator of a work, then you automatically own the copyright of that work (and it's what you can prove in a court of law that actually matters.)
This is for works created after 1989, because copyright notices were required before this point. For more details on this, look up the Berne Convention Implementation Act, which makes copyright ownership automatic. If you created your work before 1989 and forgot to properly mark your creative work, you may still be able to claim it. You should consult a lawyer if it matters that much.
Fair Use
I am not going to go into full detail on this, but there is a concept of fair use on copyrights. For the purpose of reviews, references and citations you can use a portion of a creation that is under copy protection. You can also use this content for satire and parodies and to create derivative works.
Supreme Court decisions
 1994 Campbell v. AcuffRose Music, Inc. [Copyright  Fair Use  Parody]
 1984Sony Corp. of Am. v. Universal City Studios, Inc. [Copyright  Fair Use  Sale and Use of VCRs]
Derivative Works
Derivative works are a sticky issue. These works can be derivations of an existing work, but they must be more your work than the original. Beyond that basic notion, my understanding is limited. This is a very gray area. Hell, Google avoided a $9.2B lawsuit against Oracle that has been raging in our courts since 2011, because the jury ruled Google had Fair Use rights to create a derivative work. Many analysts are expecting Oracle to appeal. We'll have to wait and see what happens.
Digital Millennium Copyright Act(DMCA)
The Digital Millennium Copyright Act (DMCA) is a fourletter word for security researchers and hobbyists, especially section 1201. It was enacted in 1998 and was aimed at protecting the actors, authors, artists and musicians (more accurately studios, publishers, and recording companies) creative works (distributed content) from being illegally pirated on the Internet and other forms of digital media that began to evolve.
One of the clauses and subsequent sideeffects of this law (in the United States) prohibits a lawful owner from reverse engineering anticircumvention provisions in most cases. This has brought John Deere and auto manufacturer's into the spot light recently as they are trying to use this law to prevent security researchers from looking for vulnerabilities in their equipment and maintain a monopoly on the support and repair of these complex systems.
It's some of the sideeffects of the DCMA that make me a little jumpy at the threat of being sued. The penalties could reach a fine of $5M and 5 years in prison. For this reason, the Electronic Frontier Foundation (EFF) is suing the federal government on behalf of Andrew Huang, and Matthew Green. You can read the press release made by the EFF here: EFF Lawsuit Takes on DMCA Section 1201: Research and Technology Restrictions Violate the First Amendment[^].
Wait! What are those subclauses in section 1201?
There are a number of subclauses in section 1201 that actually give owners of lawfully acquired (i.e., not pirated or stolen) copy written material, to reverseengineer and circumvent the copyright protection mechanism in a few select instances:
 f. Reverse Engineering for achieving interoperability
 g. Encryption Research
 i. Protection of Personally Identifying Information
 j. Security Testing (Research)
I mentioned this to Matt Green through Twitter, and his response was:
Matt wrote a blog entry that details why he is doing this. You can read that here: Matthew Green's Statement on DMCA lawsuit[^]
After I read his blog post I asked myself this question:
Even with the law on my side, do I really want to risk getting taken to court by a megacorporation with deep pockets?
My conclusion:
Nope!
Summary
Copyright and patent infringement are civil offenses and are likely only to become a concern for hackers if the goal is to duplicate and manufacture their own product for profit. Regardless of their moral view on if they are entitled to hack systems, violation of one of these IP legal protections is likely to only affect a hacker if their activities will end in a lawsuit and probable loss in an infringement case with the original manufacturer.
Otherwise, the criminal penalties for hacking are much more severe with penalties that could include both jail time and monetary fines. When the topic moves into espionage, a death sentence is even a potential outcome. Therefore, I doubt that any hackers (with the exception of corporate reverseengineers) even consider the legal violations of IP protection that they are committing.
A software library provides no value if it does not simplify the task of creating your application. At the very least we would like to show that the library contains all of the tools required to complete the intended goal. Ideally, the library is complete, easy to use, and is efficient. The only way to learn how well the library is designed and implemented is to use it.
Furthermore, it is useful and sometimes necessary to provide an exemplar for others to see how the library is intended to be used. The Steganography sample program included with Alchemy is this exemplar. I chose steganography to demonstrate that Alchemy is much more useful than the serialization of data for networking. In the process of developing this application I discovered some painpoints with the library and added tools to Alchemy to eliminate this pain.
Steganography
What is steganography?
Steganography is the hiding of messages within plainsight. This should not be confused with "Stenography," which is the recording of dictation. Steganography can be performed in may ways. Normal words can be given special meaning and included within a message that appears to be mundane. The location of words relative to others in the message can have a significant meaning. The second letter of every other word can be extracted to form the message. The possibilities are endless.
The form of steganography that I have implemented with Alchemy embeds a text message within a bitmap image. This can be achieved by taking advantage of the fact that the loworder bits for the color channels in an image affect the final color much less compared to the highorder bits.
The table below shows a sample for each color channel, with and without the two lowerbits set. The row with binary indicates the values of the four lowerbits for each 8bit color. For demonstration purposes, the alpha channel is represented with grayscale.
Red  Green  Blue  Alpha  
FF  FC  FF  FC  FF  FC  FF  FC  
1111 
1100 
1111 
1100 
1111 
1100 
1111 
1100 

Compare this to the result if we substitute only the single highbit for each color channel:
Red  Green  Blue  Alpha  
7F  FF  7F  FF  7F  FF  7F  FF  
0111 
1111 
0111 
1111 
0111 
1111 
0111 
1111 

The only caveat is the image should have a sufficient amount of entropy, otherwise the noise added by the encoded data may become visible; if not to a human, then most certainly to computer searching for such anomalies. Photographs with a range of gradients are good candidates for this form of steganography.
Why Use Steganography as a Sample?
Through the development of the base set of features for Alchemy, I focused solely on the serializing of data for network data transfer protocols. However, Alchemy is a flexible serialization library that is not restricted to network communication. Portable file formats also require serialization capabilities similar to the capabilities found in Alchemy. To this end, loading and storing a bitmap from a file is a good serialization task; bitmaps are relatively easy to acquire, and the format is simple enough to be implemented in a small sample program.
I wanted to keep the program simple. Writing a portable network communication program is not simple; especially since Alchemy does not provide functionality directly related to network communication. I also felt that if I were to use a network related exemplar, potential user of Alchemy would assume it can only be used for network related tasks. Moreover, I did not want to add extra support code to the application that would hide or confuse the usage of Alchemy.
Strategy
In keeping with simplicity, the sample program requires 32bit bitmaps. For this type of encoding, there are four color channels (Red, Green, Blue, and Alpha) for each pixel, where each channel is onebyte in size. We will encode a onebyte of data within each pixel. To accomplish this, we will assign twobits of the encoded byte into the two lowerbits of each color channel. This results in a 25% encoding rate within the image.
Consider an example where we combine the orange color 0xFF9915 with the letter i
, 0x69:
Channel 1  Channel 2  Channel 3  Channel 4  
Input  0xFF  0x99  0x15  0x00  
Value  1111 
1111 
1001 
1001 
0001 
0101 
0000 
0000 

Data  01 
10 
10 
01 

Result  1111 
1101 
1001 
1010 
0001 
0110 
0000 
0001 

Output  0xFD  0x9A  0x16  0x01 
This is not a very complex encoding strategy. However, it will allow me to demonstrate the serialization of data for both input and output, as well as the packeddata bit (bitfield) functionality provided by Alchemy.
Bitmap Format
The bitmap file format has many different definitions. The variety of formats are a result of its inception on IBM's OS/2 platform, migration to Windows, and evolution through the years. Additionally, the format allows for an index 8bit color table, RunLength Encoded (RLE) compression, gamma correction, color profiles and many other features.
The sample application simply uses the bitmap format introduced with Windows 3.0. It contains a file header that indicates the file is of type BITMAP, a bitmap information section, and the pixel data. The Alchemy definitions for each section are found below. These definitions provide the fundamental structure for the data; the goal was to provide a tablebased definition that looks very similar to the definition of a struct. This declaration is also for generating the majority of the serialization logic for Alchemy:
File Header
The bitmap file header is a short constructor that is only 14bytes large. The first two bytes will contain the letters "BM" to indicate that this is a bitmap. The length of the file, and the offset to the first pixel data are also encoded in this structure:
C++
// *************************************************************  
ALCHEMY_STRUCT(bitmap_file_header_t,  
ALCHEMY_DATUM(uint16_t, type),  
ALCHEMY_DATUM(uint32_t, length),  
ALCHEMY_DATUM(uint16_t, reserved_1),  
ALCHEMY_DATUM(uint16_t, reserved_2),  
ALCHEMY_DATUM(uint32_t, offset)  
) 
Bitmap Information Header
The bitmap information section is 40bytes of data that defines the dimensions and colordepth of the encoded bitmap:
C++
// *************************************************************  
ALCHEMY_STRUCT(bitmap_info_header_t,  
ALCHEMY_DATUM(uint32_t, size),  
ALCHEMY_DATUM(int32_t, width),  
ALCHEMY_DATUM(int32_t, height),  
ALCHEMY_DATUM(uint16_t, planes),  
ALCHEMY_DATUM(uint16_t, bit_depth),  
ALCHEMY_DATUM(uint32_t, compression),  
ALCHEMY_DATUM(uint32_t, sizeImage),  
ALCHEMY_DATUM(int32_t, x_pixels_per_meter),  
ALCHEMY_DATUM(int32_t, y_pixels_per_meter),  
ALCHEMY_DATUM(uint32_t, color_count),  
ALCHEMY_DATUM(uint32_t, important_color)  
) 
Bitmap Information
This is a utility definition to combine the information header and the color data from the buffer for convenience:
C++
// *************************************************************  
ALCHEMY_STRUCT(bitmap_info_t,  
ALCHEMY_DATUM(bitmap_info_header_t, header),  
ALCHEMY_ALLOC(byte_t, header.sizeImage, pixels)  
) 
Pixel Definition
This is a convenience structure to access each colorchannel independently in a pixel:
C++
// *************************************************************  
ALCHEMY_STRUCT(rgba_t,  
ALCHEMY_DATUM(byte_t, blue),  
ALCHEMY_DATUM(byte_t, green),  
ALCHEMY_DATUM(byte_t, red),  
ALCHEMY_DATUM(byte_t, alpha)  
) 
Alchemy Declarations
Storage Buffer
Alchemy supports both static and dynamic memory management for its internal buffers; dynamic allocation is the default. However, the storage policy can easily be changed to a static policy with a new typedef
. The definition below shows the static buffer definitions used by the sample program:
C++
namespace detail  
{  
typedef Hg::basic_msg<Hg::bitmap_file_header_t,  
Hg::BufferedStaticStoragePolicy> hg_file_t;  
 
typedef Hg::basic_msg<Hg::bitmap_info_t,  
Hg::BufferedStaticStoragePolicy> hg_info_t;  
} 
Alchemy Message
For convenience, we also predefine a type for the message format type.
C++
typedef Hg::Message< detail::hg_file_t> file_t;  
typedef Hg::Message< detail::hg_info_t> info_t; 
Bitmap Abstraction
As I mentioned previously, I wanted to keep this sample application as simple as possible. One of the things that I was able to do is encapsulate the bitmap data details into the following Bitmap
abstraction. This class provides storage for a loaded bitmap, loads and stores the contents, and provides a generic processing function on each pixel:
C++
class Bitmap  
{  
public:  
bool Load (const std::string &name);  
bool Store(const std::string &name);  
 
void process( std::string &msg,  
pixel_ftor ftor);  
private:  
std::string m_file_name;  
 
file_t m_file_header;  
info_t m_info;  
}; 
The processing function takes a functionpointer as an argument that specifies the processing operation to be performed each time the function is called. This is the definition for that functionpointer.
C++
typedef void (*pixel_ftor) ( Hg::rgba_t& pixel,  
Hg::byte_t& data); 
Load and Store
This section shows the implementation for both the Load
and Store
operations of the bitmap. The implementation uses the Standard C++ Library to open a file, and read or write the contents directly into the Hg::Message
type with the stream operators.
C++
// *************************************************************  
bool Bitmap::Load (const std::string &name)  
{  
m_file_name = name;  
 
std::ifstream input(m_file_name, std::ios::binary);  
if (input.bad())  
{  
return false;  
}  
 
input >> m_file_header;  
 
const size_t k_info_len = 0x36ul;  
if (k_info_len != m_file_header.offset)  
{  
return false;  
}  
 
input >> m_info;  
 
return true;  
} 
And the implementation for Store
:
C++
// ************************************************************  
bool Bitmap::Store (const std::string &name)  
{  
std::ofstream output(name, std::ios::binary);  
if (output.bad())  
{  
return false;  
}  
 
output << m_file_header;  
output << m_info;  
 
return true;  
} 
Process
I mentioned at the beginning that it is important to implement programs that perform realwork with your libraries to verify that your library is easy to use and provides the desired functionality as expected. With my first pass implementation of this program, both of those qualities were true for Alchemy, except the performance was quite slow. The cause turned out to be the load and initialization of every single pixel into my implementation for Hg::packed_bits
.
The problem is that the bytes that represent the pixel data are normally read into an array as a bulk operation. Afterwards, the proper address for each pixel is indexed, rather than reading the data into an independent object that represents the pixel. When I recognized this, I came up with the idea for the data_view<T>
construct. This allows a large buffer to be loaded as raw memory, and a view of the data can be mapped to any type desired, even a complex data structure such as the rgba_t
type that I defined.
The data_view
is an object that provides nonowning access to the underlying raw buffer. If this sounds familiar that is because it is very similar to the string_view
construct that is slated for C++17. It was shortly after I implemented data_view
that discovered that string_view
existed. So I was a bit shocked, and delighted when I realized how similar the concepts and implementations are to each other. It was a bit of validation that I had chosen a good path to solve this problem.
I plan to write an entry that describes the data_view
in detail at a later time. Until then, if you would like to learn more about the approach, I encourage you to check out its implementation in Alchemy, or the documentation for the string_view
object.
The purpose of process
is to sequentially execute the supplied operation on a single message byte and source image pixel. This is continued until the entire message has been processed, or there are no more available pixels.
C++
// *************************************************************  
void Bitmap::process( std::string &msg,  
pixel_ftor ftor)  
{  
auto t = Hg::make_view<Hg::rgba_t>(m_info.pixels.get());  
auto iter = t.begin();  
 
// Calculate the number of bytes that can be encoded or extracted  
// from the image and ensure the the message buffer is large enough.  
size_t length = t.end()  iter;  
msg.resize(length);  
 
for (size_t index = 0; iter != t.end(); ++iter, ++index)  
{  
ftor(*iter, (Hg::byte_t&)(msg[index]));  
}  
} 
Weave and Extract
These are the two functions that provide the pixellevel operations to encode a message byte into a pixel with the strategy that was previously mentioned. Weave
combines the message byte with the supplied pixel, and Extract
reconstructs the message byte from the pixel.
I am investigating the possibility of implementing a uniontype for Alchemy. If I end up doing this I will most likely revisit this sample and provide an alternative implementation that incorporates the Hg::packed_bits
type. This will completely eliminate the manual bittwiddling logic that is present in both of these functions:
C++
// *************************************************************  
void weave_data ( Hg::rgba_t& pixel,  
Hg::byte_t& data)  
{  
using Hg::s_data;  
 
s_data value(data);  
 
pixel.blue = (pixel.blue & ~k_data_mask)  
 (value.d0 & k_data_mask);  
pixel.green = (pixel.green & ~k_data_mask)  
 (value.d1 & k_data_mask);  
pixel.red = (pixel.red & ~k_data_mask)  
 (value.d2 & k_data_mask);  
pixel.alpha = (pixel.alpha & ~k_data_mask)  
 (value.d3 & k_data_mask);  
} 
Extract implementation:
C++
// *************************************************************  
void extract_data ( Hg::rgba_t& pixel,  
Hg::byte_t& data)  
{  
using Hg::s_data;  
 
s_data value;  
 
value.d0 = (pixel.blue & k_data_mask);  
value.d1 = (pixel.green & k_data_mask);  
value.d2 = (pixel.red & k_data_mask);  
value.d3 = (pixel.alpha & k_data_mask);  
 
data = value;  
} 
The Main Program
The main program body is straightforward. Input parameters are parsed to determine if an encode or decode operation should be performed, as well as the names of the files to use.
C++
// *************************************************************  
int main(int argc, char* argv[])  
{  
if (!ParseCmdParams(argc, argv)) {  
PrintHelp();  
return 0;  
}  
 
string message;  
sgraph::Bitmap bmp;  
bmp.Load(input_file);  
if (is_encode) {  
message = ReadFile(msg_file);  
bmp.process(message, weave_data);  
bmp.Store(output_file);  
}  
else {  
bmp.process(message, extract_data);  
WriteFile(output_file, message);  
}  
 
return 0;  
} 
Results
To demonstrate the behavior of this application I ran sgraph
to encode the readme.txt file from its project. Here is the first portion of the file:
======================================================================== CONSOLE APPLICATION : sgraphy Project Overview ======================================================================== AppWizard has created this sgraphy application for you. This file contains a summary of what you will find in each of the files that make up your sgraphy application.
Into this image:
This is the result image:
For comparison, here is a sample screencapture from a Beyond Compare diff of the two files:
Summary
I implemented a basic application that performs steganography to demonstrate how to use the serialization features of my library, Alchemy. I chose a unique application like this to make the demonstration application a bit more interesting and to show the library can be used for much more than just serialization of data for network transfer.
Twitter is an... interesting way to spend one's time. It can be quite a challenge to cram a thought into 140 characters. However, I have found there are many ways to be entertained by this microblogging site. One of them includes interacting and learning from a wide variety of people. During an interaction with a creativewriter, I was a bit challenged by a poem.
This is a brief entry that includes my implementation of that "Poem Anagram Generator."
The Original Poem
The first thing that is required, is the poem. You can find the original at what_adri_writes[^].
Here is an excerpt:
Stanley the fishmonger
told me how to know it
when I saw it
without a poet
And we fished out drowned Phlebas,
Patron saint of Unconsidered Phoenicians
and failed Changers of Minds.
The Highlander Art doublechecked the veil
of Reichenbach Falls
three days later
and found
A beekeeper.
Strategy
When I thought about it, I realized it wouldn't take much to actually write a program to reorganize the words of the poem. Especially if I were to use the algorithms
that are available in the C++ Standard Library. That is, of course, creating the naïve implementation that ignores such possibly desirable features such as sentence structure or proper grammar.
Then again, this is poetry!
So let's go ahead and throw those concerns out the window. This will be a postmodern avantgarde cyberexperience.
Tokenize
My original goal was to tokenize all of the words from the poem, then randomize with something like next_permutation
. However, when I really started to look at the text I saw all of the punctuation. Did I want to remove the punctuation and just live with the words? Well then there are also the newlines that give the text form and clues the reader in that "pssst, this is probably a poem"
So I decided that I would include both the punctuation and newlines as tokens to be generated for the poem generator.
To do this I put a space between ever word, punctuation mark, and newline in the poem; like so:
C++
const std::string poem( "Oh but I too want to be a poet ! \n "  
"and yet \n "  
"Tell me where is meaning bred \n "  
"In my heart or in your head ? \n "  
 
// omitted for brevity  
 
"And so the poem \n "  
"was not to be . \n"); 
Here is a simple function to add each token into a vector
for future modification:
C++
typedef vector<string> words_t;  
 
// ************************************************************  
words_t tokenize(const string &poem)  
{  
words_t words;  
size_t next = 0;  
 
do  
{  
size_t cur = next;  
next = poem.find(' ', cur);  
 
size_t count = next == string::npos  
? string::npos  
: next  cur;  
words.push_back(poem.substr(cur, count));  
 
if (next != string::npos)  
next += 1;  
 
} while (next != string::npos);  
 
return words;  
} 
If I missed a potential algorithm from the standard library that would perform this task I would be interested to learn how this function could be simplified.
The Generator
The generator code is found below. It contains three algorithms from the standard library; A random number generator, shuffle
and copy
. Then of course the call to tokenize
.
You can run the code below to generate a new poem each time.
C++
// ************************************************************  
int main()  
{  
// Tokenize the poem.  
words_t words(tokenize(poem));  
 
// Jumble the words.  
random_device rdev;  
mt19937 rng(rdev());  
 
shuffle(words.begin(), words.end(), rng);  
 
// Print the results.  
copy(words.begin(), words.end(), ostream_iterator<string>(cout, " "));  
 
cout << "\n";  
 
return 0;  
} 
Output:
Twitter and Poetry...
Instant art!
Maybe to improve upon this I could pretokenize based on particular phrases.
Summary
Twitter is fun!
C++ is fun!
Combining Twitter and C++ makes poetry fun even for a leftbrained analytic like myself.
If you end up generating an interesting poem, post it in the comments.
The "Little Planet" effect is the colloquial name often used to refer to the mathematical concept of a stereographic projection. The end result is quite impressive, especially considering the little amount of code that is actually required to create the image. All that is required is a panoramic image with a 360° view from side to side, or a photosphere, such as used with Google Earth to provide an immersive view of a location.
Stereographic Projection
This is a mapping from a spherical position onto a plane. You will commonly see this type of projection in cartography; two examples are mapping the earth and planispheres (celestial charts). This projection is useful because it is not possible to map from a sphere to a plane without some type of distortion. The stereographic projection preserves angles and distorts areas. This tradeoff is preferred for navigation, which is typically performed with angles.
The projection is typically performed from one of the poles. However, it can originate from any point on the sphere. For simplicity, unless otherwise stated I will refer to projections that originate at the North Pole of the sphere. For a unitsphere located at the origin, this would be at \(\matrix{[0 & 0 & 1]}\)
The distortion of the image depends on the placement of the plane relative to the sphere. The upper hemisphere exhibits most of the distortion. The distortion becomes more extreme the closer the point in the sphere's surface approaches the origin of the projection. The projection extends to infinity as the projection's origin is undefined.
If the plane bisects the sphere at the equator, the lower hemisphere will be projected within an area the size of the circumference of the sphere, and the upper hemisphere is projection on the plane outside of the sphere.
When the plane is located at the surface of the sphere, opposite from the projection's origin, the lower hemisphere will project over an area that is twice that of the sphere's equator. The image below illustrates this configuration.
We can reference any point on the spheres surface with two angles representing the latitude \(\phi\) and longitude \(\lambda\), where:
\(\pi \lt \lambda \lt \pi, \cfrac{\pi}{2} \lt \phi \lt \cfrac{\pi}{2} \)
The following image is a scaled down version of the panorama that I used to generate the stereographic projection of the Golden Gate Bridge at the beginning of the article. Normally we would index the pixels in this image with two variables, \(x\) (width) and \(y\) (height).
We can simplify the math to map a fullview panorama to a sphere by normalizing the dimensions for both the sphere and our surface map; that is, to reduce the scale to the unit scale of one. This means we will perform the surface map operation on a unitsphere, and the dimensions of our panorama will then span from: \(1 \lt x \lt 1, 1 \lt y \lt 1\).
The following image shows how the coordinate system from the sphere maps to the image:
Project the sphere onto the Plane
We will create a ray, to calculate the projection of a single point from the sphere to the plane. This ray begins at the projective origin, passes through the sphere and points at any other point on the surface of the sphere. This ray continues and will intersect with the projective plane. This intersection is the point of projection. Alternatively, we can calculate the intersection point on the sphere's surface, given a ray that points between the projective origin and the projection plane. This demonstrates that the stereographic projection is a bijective operation; there is a onetoone correspondence for the values on both surfaces.
The diagram below depicts the projection in twodimensions:
If \(\lambda_0\) as the central longitude and (\phi_1\) as the central latitude, this relationship can be stated mathematically as:
\( \eqalign{ u &= k \cos \phi \sin(\lambda  \lambda_0) \cr v &= k [ \cos \phi_1 \sin \phi  \sin \phi_1 \cos \phi \cos(\lambda  \lambda_0)]\cr & where \cr k &= \cfrac{2R}{1+ \sin \phi_1 \sin \phi + \cos \phi_1 \cos \phi \cos(\lambda  \lambda_0)} } \)
The term \(k\) determines where the projected plane is located.
Codez Plz
That is enough theory and explanation. Here is the code that I used to generate the stereographic projections of the Golden Gate Bridge and Las Vegas. The code presented below is adapted from a project I just completed that used the Computer Vision library OpenCV. The only important thing to note in the code below is that Mat
objects are used to store images and pixels are represented with a type of std::vector
. You should have no problem converting the pixel access operations from the code below to whatever image processing API that you are using.
If you do run into problems, leave a comment and I will help you work through porting the code.
There are three functions:
Main Projection
This function works by creating a ray between the projection origin and a pixel location on the projection plane. The intersection of the sphere's surface is calculated, which indicates the location to sample from the sphere's surface map. Because we are dealing with discrete, pixelated digital images, this sampling process creates visual artifacts. To help improve the smoothness of the image, we use a bilinear filter to average the values of four surrounding pixels of the sample location from the sphere.
C++
void RenderProjection(Mat &pano, long len, Mat &output) {  
output.create(len, len, CV_16UC3)  
long half_len = len / 2;  
Size sz = pano.size();  
 
for (long indexX = 0; indexX < len; ++indexX) {  
for (long indexY = 0; indexY < len; ++indexY) {  
double sphereX = (indexX  half_len) * 10.0 / len;  
double sphereY = (indexY  half_len) * 10.0 / len;  
double Qx, Qy, Qz;  
 
if (GetIntersection(sphereX, sphereY, Qx, Qy, Qz))  
{  
double theta = std::acos(Qz);  
double phi = std::atan2(Qy, Qx) + k_pi;  
theta = theta * k_pi_inverse;  
phi = phi * (0.5 * k_pi_inverse);  
double Sx = min(sz.width 2.0, sz.width * phi);  
double Sy = min(sz.height2.0, sz.height * theta);  
 
output.at<Vec3s>(indexY, indexX) = BilinearSample(pano, Sx, Sy);  
}  
}  
}  
} 
Calculate the Intersection
This calculation is an optimized reduction of the quadratic equation to calculate the intersection point on the surface of the sphere.
C++
bool GetIntersection(double u, double v,  
double &x, double &y, double &z)  
{  
double Nx = 0.0;  
double Ny = 0.0;  
double Nz = 1.0;  
double dir_x = u  Nx;  
double dir_y = v  Ny;  
double dir_z = 1.0  Nz;  
 
double a = (dir_x * dir_x) + (dir_y * dir_y) + (dir_z * dir_z);  
double b = (dir_x * Nx) + (dir_y * Ny) + (dir_z * Nz);  
 
b *= 2;  
double d = b*b;  
double q = 0.5 * (b  std::sqrt(d));  
 
double t = q / a;  
 
x = (dir_x * t) + Nx;  
y = (dir_y * t) + Ny;  
z = (dir_z * t) + Nz;  
 
return true;  
} 
Bilinear Filter
The bilinear filter calculates a weightedsum of the four surrounding pixels for a digital image sample.
C++
Vec3s BilinearSample(Mat &image, double x, double y) {  
Vec3s c00 = image.at<vec3s>(int(y), int(x));  
Vec3s c01 = image.at<vec3s>(int(u), int(x)+1);  
Vec3s c10 = image.at<vec3s>(int(y)+1, int(x));  
Vec3s c11 = image.at<vec3s>(int(y)+1, int(x)+1);  
 
double X0 = x  floor(x);  
double X1 = 1.0  X0;  
double Y0 = y  floor(y);  
double Y1 = 1.0  Y0;  
 
double w00 = X0 * Y0;  
double w01 = X1 * Y0;  
double w10 = X0 * Y1;  
double w11 = X1 * Y1;  
 
short r = short(c00[2] * w00 + c01[2] * w01  
+ c10[2] * w10 + c11[2] * w11);  
short g = short(c00[1] * w00 + c01[1] * w01  
+ c10[1] * w10 + c11[1] * w11);  
short b = short(c00[0] * w00 + c01[0] * w01  
+ c10[0] * w10 + c11[0] * w11);  
 
return make_BGR(b, g, r);  
} 
...and a helper function:
C++
Vec3s make_BGR(short blue, short green, short red)  
{  
Vec3s result;  
result[0] = blue;  
result[1] = green;  
result[2] = red;  
 
return result;  
} 
Here is another sample of a stereographic projection and the panorama that I used to create it:
Summary
The stereographic projection has been known and used since the time of the ancient Greeks. It was heavily used in the Age of Exploration to create maps of the world where the distortion was applied to distances, and the relative angles between local points is preserved. When it is applied to fullview panoramas a neat effect is created called, "The Little Planet Effect." With just a little bit of theory and some concepts from computer graphics we were able to turn this concept into code with less than 100 lines of code.
I wanted to start writing about secure coding practices as well as more instructive posts related to security topics such as encryption and hacking. You probably already have a conceptual understanding of things like the "stack", "heap" and "program counter". However, it's difficult to have concrete discussions regarding security unless you have a solid grasp on the computer memory model. This post is intended to provide a concrete foundation of the memory model, and my future posts related to security will build on this foundation.
It is easy to take for granted the complexity of computer memory because of the many layers of abstraction that programmers work through today. The same basic memory design has existed for all computers that use a paged memory structure since the early 60's. These are some of the areas where the knowledge of the memory layout plays a crucial role in application portability and embedded resources, program security, code optimization. The diagrams I present will also help you understand where the different activities occur in a program during runtime.
Memory (RAM)
Random Access Memory (RAM) is where the majority of your program will live while it is actively running. As the name indicates any position in your application is capable of being addressed by the hardware. When your program is executed, it will be loaded at some starting address. This address is typically called the Base Address. The value of the baseoffset differs from system to system. A technique called, Address Space Layout Randomization (ASLR), loads your program's modules at random address locations to make hacking a bit more difficult.
RAM addresses start at zero and reach whatever limit is imposed by the system on which you are running. However, not every address is accessible by your program; some areas are reserved for the operating system. The hardware and operating system work together to abstract the details for accessing locations in RAM. Therefore, RAM can be thought of as one continuous linear array of bytes. Let this be basic representation of the layout of RAM for this discussion.
All of the sections described below are located in RAM.
Segmentation
Systems that utilized segmented RAM started to appear in the 60's with operating systems written with higher level languages. Originally, segmentation provided many services such as creating the ability address memory address higher than the word size of the processor. For Windows developers, the FAR specifier is a remnant of the limited ability to access addresses on 16bit Intel chips, such as the 80386.
Now, the primary feature implemented with segmented memory is memory paging (segment) and virtual addressing. I do not want to get into the technical details regarding virtual memory here, but I can revisit that topic in the future. For now, realize that these features allow the system to store and order all of the active resources in the most efficient way possible while hiding the details from the programmer.
From this point on, I will call a memory segment a page. A page is a predetermined number of bytes that are considered a single group. Memory is most efficiently managed as pages. All of the modern architectures that I have worked with have used a page size of 4Kb (4096 bytes). Why stop there? Because 4097 bytes would have been just too many?!
Now that memory segmentation has been introduced, let's alter the representation of RAM just a bit to simplify the illustrations. This new abstraction will help us visualize the concepts in a more manageable way:
Program structure
The descriptions that I give below are in general terms because there are subtle nuances between the different executable formats defined on each platform. For instance:
 Unix/Linux:
 COFF is an older format
 ELF is the current format
 Mac OS X:
 MachO
 Windows:
 NE, derived from DOS formats for 16bit Windows
 PE, 32bit Windows
 PE32+, introduced for 64bit Windows
While the actual file definitions vary between different platforms, executable programs are generally organized with different memory sections as described below.
Program header
There are basically two types of sections to consider when discussing executable file formats: 1) ReadOnly segments, 2) ReadWrite segments.Why isn't the entire program just made ReadOnly, we don't want the program to be changed do we?
The actual file the program is store in is not used directly. The program header directs the loader to the different segments, which are described below.
ReadOnly
The ReadOnly segments can be loaded into shared memory of the current system. Therefore, space can be saved when multiple instances of the same module are executed.
ReadWrite
Each ReadWrite segment must be given its own copy relative to the process that uses it. These segments cannot be safely shared between processes unless special care is taken by the developer.
Section header
Each type of program segment is described below. I will indicate whether it is typically a readonly or readwrite segment, as well as the role it plays in the execution of the program. Most module formats do not restrict the segment types to a single instance. I will describe some of the possibilities in the relevant sections below.
.text (RO)
Another name used for this segment is the code segment. This is where the sequence of hardware commands are encoded sequentially to command the processor. It is important for this segment to be loaded into a readonly region of the system's RAM to prevent selfmodifying code. Selfmodifying code is extremely difficult to debug, and it creates a great potential for security vulnerabilities.
It is also important to mention that most systems require memory segments to be marked with the "executable" privilege in order to process the segment as executable instructions.
.rodata (RO)
Some file layouts include a readonly data section to store constants and other readonly data that needs to reference an address by the program. This segment type is found in the ELF format. Again, this segment type is not used in all program file formats.
.data (RW)
The data section contains preallocated storage space for all of the global and static variables defined in your program. These values are packed efficiently for the target platform so they can be accessed on the proper memory boundary for the size of the data type. The commands generated in the .text segment reference the values in this segment at a relative offset during runtime. First the address of the segment is found, then the relative offset of the desired variable is accessed as needed.
A program may want to define multiple .data segments to create a simple mechanism to share data between two processes. This is called a shared data section. I have only used this technique in Windows, so I do not know the specifics for its use on other platforms.
The shared segment is loaded into globally accessible system memory. Any process that loads this module will have access to the same variables. If one process changes a value, this will be instantly reflected in all other processes. I have used this mechanism to synchronize unique data among different programs on a single system.
.bss (RW)
The .bss section is unique compared to the other segments. There is only one .bss per module, and it is not actually included in the module. That is because the .bss is the segment where all of the uninitialized data fields of a module are located.
The size required to store the uninitialized fields is all that is required in the definition for the .bss. The initialized fields defined below would be placed in the .bss address space:
C++
int classId;  
char courseName[32];  
std::vector< student > classRoster;  
 
// Static member variables and local static variables  
// will also be allocated in the BSS. 
Runtime Structure
Now that we have defined each of the major types of segments that exist in all modern computer architectures, let's describe how they are structured in a running program. There are three types of memory that are directly referenced by machine instructions.
Heap
The Heap is where all dynamically allocated memory is located. Another name for this collection of memory is the FreeStore. The heap is essentially all of the remaining accessible RAM that isn't used by your program modules or isn't reserved for the stack.
CallStack
The Stack is a nickname for the CallStack. The callstack is where parameters are passed into functions and storage is allocated for local variables. That is why it is referred to as "creating the variable on the stack". A stackframe is used to represent each instance of a function call. Each time a function is called, the current function state is recorded at the current location on the stack, and a new stackframe is pushed onto the stack to track the next function. When a function call returns, the current stackframe is popped off the stack, and the previous function state is restored to continue execution.
One callstack is allocated for each thread that is started in the program. In order to efficiently use the available address space on the system, a stacksize is usually defined for each thread's callstack. Factors that you want to consider when selecting the size of your callstack is the size of the local variables created in your functions and how deep your function calls execute.
I am going to leave the specific details to how the callstack is managed for a later post; because callstack processing is a complex topic that requires an entire post on its own. What is important to understand for now is that function call management and local variable allocations are the primary responsibilities of the callstack.
System Registers
The hardware registers are memory locations that are actually housed within the CPU. The registers are the same size as the wordsize for the CPU. Word has taken on a different meaning over the years with computers. In the context of hardware, wordsize always means the bitwidth of the processing pipeline of the CPU. For instance, the wordsize on a 32bit processor is 32bits, and the wordsize for a 64bit processor is 64bits.
They are crucial to the structure of a computer system. The registers are the CPUs only way to operate upon the values when a command is executed. Values are loaded into the registers from addressable RAM locations via the system bus. If the CPU has an internal memory cache, large chunks of data can be preloaded from RAM. This will ensure the data is ready when the CPU is ready to process an instruction. The CPU cache can provide an enormous boost in performance.
Fundamental x86 and AMD64 Registers
The types of registers that are available depend upon the CPU's ISA (Industry Standard Architecture). I am going to briefly introduce the commonly used registers for the x86 ISA, because all three major desktop operating systems (Apple, Linux, Windows) support this platform.
There are eight general purpose registers and the Instruction Pointer, which holds the program counter. The registers are named based upon a purpose that was originally envisioned for each register. There are special operations that are designed for specific registers; such as the Accumulator, EAX, has an efficient opcode to add a byte index specified for it. However, the other registers support these operations as well; only the opcodes are longer generalpurpose commands.
16bit  32bit  64bit  Purpose  
AX  EAX  RAX  Accumulator  
BX  EBX  RBX  Base index (arrays)  
CX  ECX  RCX  Counter (loops)  
DX  EDX  RDX  Extends the precision of the accumulator  
SI  ESI  RSI  Source Index for string operations  
DI  EDI  RDI  Destination Index for string operations  
SP  ESP  RSP  Stack Pointer  
BP  EBP  RBP  Base Pointer  
IP  EIP  RIP  Instruction Pointer 
The first four registers of the previous table have additional registers defined to access the loworder and highorder bytes of the 16bit register. The x86 and AMD64 instruction sets use an 8bit byte. Here are their names:
16bit  Low Byte  High Byte  
AX  AL  AH  
BX  BL  BH  
CX  CL  CH  
DX  DL  DH 
The stackpointer typically points to the top of the call stack to indicate the next address to use for parameter passing or local variable allocation. The instructionpointer points to the address of the current instruction to execute. This address should always be pointing to an address that is found in the .text segment of your program, or the system's libraries.
Interactions
Ignoring restrictions due to permissions, RAM can be addressed as one continuous sequence from the lowest address to the highest address. The system will move segments inandout of RAM one page of memory at a time. There are typically regions of the address space that is reserved for the operating system (kernel). If shared memory between processes is allowed on the system, then a region will also be reserved for globally accessible address space.
Your program itself is loaded into available address space. There is no universally common addressing scheme for the layout of the .text segment (executable code), .data segment (variables) and other program segments. However, the layout of the program itself is wellformed according to the system that will execute the program. This allows the system's program loader to navigate the program file and properly load the program into RAM. Jump tables and other internal structures are fixed up to allow the different memory segments to be properly referenced based on their final address.
The diagram below depicts a simplistic view of a single program's elements loaded into memory on a system. The CPU accesses the RAM through the system bus.
The callstack and heap are usually situated at opposite ends of the address space and they grow towards each other. If the program allocates too much dynamic memory, or a recursive call continues unbounded the system will run out of address space. For the recursive function call scenario, the stack will have used all of its allotted space and cause a stackoverflow.
The only hard boundaries in this continuous address space is typically at the pagelevel. Therefore if an operation attempts to access memory across a page boundary a segmentationfault or segfault will occur. If permissions are set to restrict access to specific pages and a program attempts to access the space, some type of access violation is raised.
Summary
Most programmers do not need to have a deep understanding of a computer's memory architecture in order to complete their jobs. However, having a solid understanding of this memory model can help you make better design decisions and improve your debugging skills. As you move closer to the hardware in your development, it becomes more necessary to truly understand this structure. Finally, there are some tasks that are simply not possible to accomplish (or at least them become extremely difficult) if you do not have a clear picture of the memory structure for computers.
Security is one of the concepts with computer programming where it becomes necessary to have a better understanding of this structure. Even though you may never look at a disassembled program or manually access the registers, it is important to understand how what causes a security vulnerability, as well as the qualities that make a vulnerability exploitable. So with this foundation of memory structure, I will be able to write about secure programming practices and also demonstrate some of the techniques used to exploit these flaws.
The criticism of strongencryption by lawenforcement has been an interesting topic to follow in the news and politics for the last nine months. It became even more interesting in February when the shortlived court battle between Apple and the FBI made headlines. Now looming on the horizons is a piece of legislation that proposes to give judges the authority to order makers of products with encryption to help lawenforcement.
Why can’t these companies help lawenforcement and give them a backdoor that they can only get the keys with a court order?
A Word about Encryption
Message encryption has existed for millennia, primarily used for communication in the military. It is now a ubiquitous tool of the Information Age and the Internet. Encryption is a versatile tool that goes well beyond privacy services such as online banking, the protection of personal medical records or even the endtoend message encryption recently added to WhatsApp. Encryption techniques are also used to create digital signatures to verify our digital content has not been tampered. Both aspects are fundamental to computer security.
How does encryption work?
Math is Hard
Specifically, for a computer to efficiently factor extremely large integers and to compute discrete logarithms. These problems are considered “intractable” or hard to deal with. They are difficult problems that cannot be solved quickly. Other types of encryption rely on concepts learned from information theory, computational complexity and statistics.
Essentially, an encryption algorithm is a set of instructions that scramble a plaintext message so there is no discernable pattern. Statistically, it should look like a truly random sequence of numbers, noise. The scrambled message is called a cipher message. There is also an algorithm that is used to decrypt the cipher message.
To make encryption algorithms more useful, a key is used with the algorithm to modify the scrambling instructions in a way that is unique to each key.
It takes relatively little time for a computer to encode and decode a message with an encryption algorithm and the associated key. However, the intractable math problems and encryption concepts used to design the algorithm make decryption take a very long time without the key.
This reddit comment [^] calculates the time required to crack an AES 256bit message at 9 e50 years. To put this in perspective, the universe is believed to be 14 billion (1.4 e10) years old.
What’s the takeaway from all of this?
It is far simpler and faster to search for a vulnerability in the system that employs encryption than it is to attempt to bruteforce crack a message.
Well, demanding that a backdoor be engineered into products that are normally secure is also faster and simpler than a bruteforce attack.
Why Can't We Give Them a Backdoor?
Vulnerabilities are regularly discovered in computer systems. The chances are that there are vulnerabilities in the system. It may not be with the encryption algorithm itself, but it may be with how the keys are transferred, or how the data is copied. Creating a secure system is difficult even when we aren’t trying to create backdoors in the system.
In fact, the National Vulnerability Database (NVD)[^] reports an average of 19 new vulnerabilities were reported each day in 2014. At the time of this writing, NVD vulnerability workload index workload index[^] was 4.62. This is a calculation of the number of important vulnerabilities that information technology operations staff need to address each day.
It seems as though there is no need for a backdoor.
Now these vulnerabilities are spread across a wide variety of networked systems, software and devices. So it is not likely that a new vulnerability is discovered for your iPhone or Android device each day. Moreover, it is likely that many vulnerabilities only become exploitable when you combine two specific systems configured in specific way. Otherwise they would be considered to be secure by themselves.
It is difficult enough to secure the systems that we have when we intend for them to be 100% secure. Imagine what happens when we start adding secret access methods to these designs. Once the secret access method is uncovered by less than honorable groups, the secret access feature becomes the front door… with no locks … a huge welcome mat out front… and a note on the door that says “Let yourself in and make yourself comfortable.”
Summary
So, we actually can add backdoors to these products, but that defeats the purpose of trying to secure a product in the first place. Networked computers cannot communicate securely over a public network without encryption. Adding an alternate method to access a computer that bypasses strongencryption is not an acceptable solution.
My previous two entries have presented a mathematical foundation for the development and presentation of 3D computer graphics. Basic matrix operations were presented, which are used extensively with Linear Algebra. Understanding the mechanics and limitations of matrix multiplication is fundamental to the focus of this essay. This entry demonstrates three ways to manipulate the geometry defined for your scene, and also how to construct the projection transform that provides the threedimensional effect for the final image.
Transforms
There are three types of transformations that are generally used to manipulate a geometric model, translate, scale, and rotate. Your models and be oriented anywhere in your scene with these operations. Ultimately, a single square matrix is composed to perform an entire sequence of transformations as one matrix multiplication operation. We will get to that in a moment.
For the illustration of these transforms, we will use the twodimensional unitsquare shown below. However, I will demonstrate the math and algorithms for three dimensions.
\( \eqalign{ pt_{1} &= (\matrix{0 & 0})\\ pt_{2} &= (\matrix{1 & 0})\\ pt_{3} &= (\matrix{1 & 1})\\ pt_{4} &= (\matrix{0 & 1})\\ }\) 
Translate
Translating a point in 3D space is as simple as adding the desired offset for the coordinate of each corresponding axes.
\( V_{translate}=V+D \)
For instance, to move our sample unit cube from the origin by \(x=2\) and \(y=3\), we simply add \(2\) to the xcomponent for every point in the square, and \(3\) to every ycomponent.
\( \eqalign{ pt_{1D} &= pt_1 + [\matrix{2 & 3}] \\ pt_{2D} &= pt_2 + [\matrix{2 & 3}] \\ pt_{3D} &= pt_3 + [\matrix{2 & 3}] \\ pt_{4D} &= pt_4 + [\matrix{2 & 3}] \\ }\) 
There is a better way, though.
That will have to wait until after I demonstrate the other two manipulation methods.
Identity Matrix
The first thing that we need to become familiar with is the identity matrix. The identity matrix has ones set along the diagonal with all of the other elements set to zero. Here is the identity matrix for a \(3 \times 3\) matrix.
\( \left\lbrack \matrix{1 & 0 & 0\cr 0 & 1 & 0\cr 0 & 0 & 1} \right\rbrack\)
When any valid input matrix is multiplied with the identity matrix, the result will be the unchanged input matrix. Only square matrices can identity matrices. However, nonsquare matrices can be used for multiplication with the identity matrix if they have compatible dimensions.
For example:
\( \left\lbrack \matrix{a & b & c \\ d & e & f \\ g & h & i} \right\rbrack \left\lbrack \matrix{1 & 0 & 0\cr 0 & 1 & 0\cr 0 & 0 & 1} \right\rbrack = \left\lbrack \matrix{a & b & c \\ d & e & f \\ g & h & i} \right\rbrack \)
\( \left\lbrack \matrix{7 & 6 & 5} \right\rbrack \left\lbrack \matrix{1 & 0 & 0\cr 0 & 1 & 0\cr 0 & 0 & 1} \right\rbrack = \left\lbrack \matrix{7 & 6 & 5} \right\rbrack \)
\( \left\lbrack \matrix{1 & 0 & 0\cr 0 & 1 & 0\cr 0 & 0 & 1} \right\rbrack \left\lbrack \matrix{7 \\ 6 \\ 5} \right\rbrack = \left\lbrack \matrix{7 \\ 6 \\ 5} \right\rbrack \)
We will use the identity matrix as the basis of our transformations.
Scale
It is possible to independently scale geometry along each of the standard axes. We first start with a \(2 \times 2\) identity matrix for our twodimensional unit square:
\( \left\lbrack \matrix{1 & 0 \\ 0 & 1} \right\rbrack\)
As we demonstrated with the identity matrix, if we multiply any valid input matrix with the identity matrix, the output will be the same as the input matrix. This is similar to multiplying by \(1\) with scalar numbers.
\( V_{scale}=VS \)
Based upon the rules for matrix multiplication, notice how there is a onetoone correspondence between each row/column in the identity matrix and for each component in our point..
\(\ \matrix{\phantom{x} & \matrix{x & y} \\ \phantom{x} & \phantom{x} \\ \matrix{x \\ y} & \left\lbrack \matrix{1 & 0 \\ 0 & 1} \right\rbrack } \)
We can create a linear transformation to scale point, by simply multiplying the \(1\) in the identity matrix, with the desired scale factor for each standard axis. For example, let's make our unitsquare twice as large along the \(xaxis\) and half as large along the \(yaxis\):
\(\ S= \left\lbrack \matrix{2 & 0 \\ 0 & 0.5} \right\rbrack \)
This is the result, after we multiply each point in our square with this transformation matrix; I have also broken out the mathematic calculations for one of the points, \(pt_3(1, 1)\):
\( \eqalign{ pt_{3}S &= \\ &= [\matrix{1 & 1}] \left\lbrack\matrix{2 & 0 \\ 0 & 0.5} \right\rbrack\\ &= [\matrix{(1 \cdot 2 + 1 \cdot 0) & (1 \cdot 0 + 1 \cdot 0.5)}] \\ &= [\matrix{2 & 0.5}] }\) 
Here is the scale transformation matrix for three dimensions:
\(S = \left\lbrack \matrix{S_x & 0 & 0 \\ 0 & S_y & 0 \\ 0 & 0 & S_z }\right\rbrack\)
Rotate
A rotation transformation can be created similarly to how we created a transformation for scale. However, this transformation is more complicated than the previous one. I am only going to present the formula, I am not going to explain the details of how it works. The matrix below will perform a counterclockwise rotation about a pivotpoint that is placed at the origin:
\(\ R= \left\lbrack \matrix{\phantom{} \cos \Theta & \sin \Theta \\ \sin \Theta & \cos \Theta} \right\rbrack \)
\(V'=VR\)
As I mentioned, the rotation transformation pivots the point about the origin. Therefore, if we attempt to rotate our unit square, the result will look like this:
The effects become even more dramatic if the object is not located at the origin:
In most cases, this may not be what we would like to have happen. A simpler way to reason about rotation is to place the pivot point in the center of the object, so it appears to rotate in place.
To rotate an object in place, we will need to first translate the object so the desired pivot point of the rotation is located at the origin, rotate the object, then undo the original translation to move it back to its original position.
While all of those steps must be taken in order to rotate an object about its center, there is a way to combine those three steps to create a single matrix. Then only one matrix multiplication operation will need to be performed on each point in our object. Before we get to that, let me show you how we can expand this rotation into three dimensions.
Imagine that our flat unitsquare object existed in three dimensions. It is still defined to exist on the \(xy\) plane, but it has a third component added to each of its points to mark its location along the \(zaxis\). We simply need to adjust the definition of our points like this:
\( \eqalign{ pt_{1} &= (\matrix{0 & 0 & 0})\\ pt_{2} &= (\matrix{1 & 0 & 0})\\ pt_{3} &= (\matrix{1 & 1 & 0})\\ pt_{4} &= (\matrix{0 & 1 & 0})\\ }\) 
So, you would be correct to surmise that in order to perform this same \(xy\) rotation about the \(zaxis\) in three dimensions, we simply need to begin with a \(3 \times 3\) identity matrix before we add the \(xy\) rotation.
\( R_z = \left\lbrack \matrix{ \phantom{}\cos \Theta & \sin \Theta & 0 \\ \sin \Theta & \cos \Theta & 0 \\ 0 & 0 & 1 } \right\rbrack \) 
We can also adjust the rotation matrix to perform a rotation about the \(x\) and \(y\) axis, respectively:
R_x = \( \left\lbrack \matrix{ 1 & 0 & 0 \\ 0 & \phantom{}\cos \Theta & \sin \Theta \\ 0 & \sin \Theta & \cos \Theta } \right\rbrack \) 
\( R_y = \left\lbrack \matrix{ \cos \Theta & 0 & \sin \Theta\\ 0 & 1 & 0 \\ \sin \Theta & 0 & \phantom{}\cos \Theta } \right\rbrack \) 
Composite Transform Matrices
I mentioned that it is possible to combine a sequence of matrix transforms into a single matrix. This is highly desirable, considering that every point in an object model needs to be multiplied by the transform. What I demonstrated up to this point is how to individually create each of the transform types, translate, scale and rotate. Technically, the translate operation, \(V'=V+D\) is not a linear transformation because it is not composed within a matrix. Before we can continue, we must find a way to turn the translation operation from a matrix add to a matrix multiplication.
It turns out that we can create a translation operation that is a linear transformation by adding another row to our transformation matrix. However, a transformation matrix must be square because it starts with the identity matrix. So we must also add another column, which means we now have a \(4 \times 4\) matrix:
\(T = \left\lbrack \matrix{1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ t_x & t_y & t_z & 1 }\right\rbrack\)
A general transformation matrix is in the form:
\( \left\lbrack \matrix{ a_{11} & a_{12} & a_{13} & 0\\ a_{21} & a_{22} & a_{23} & 0\\ a_{31} & a_{32} & a_{33} & 0\\ t_x & t_y & t_z & 1 }\right\rbrack\)
Where \(t_x\),\(t_y\) and \(t_z\) are the final translation offsets, and the submatrix \(a\) is a combination of the scale and rotate operations.
However, this matrix is not compatible with our existing point definitions because they only have three components.
Homogenous Coordinates
The final piece that we need to finalize our geometry representation and coordinate management logic is called Homogenous Coordinates. What this basically means, is that we will add one more parameter to our vector definition. We refer to this fourth component as, \(w\), and we initially set it to \(1\).
\(V(x,y,z,w) = [\matrix{x & y & z & 1}]\)
Our vector is only valid when \(w=1\). After we transform a vector, \(w\) has often changed to some other scale factor. Therefore, we must rescale our vector back to its original basis by dividing each component by \(w\). The equations below show the threedimensional Cartesian coordinate representation of a vector, where \(w \ne 0\).
\( \eqalign{ x&= X/w \\ y&= Y/w \\ z&= Z/w }\)
Creating a Composite Transform Matrix
We now possess all of the knowledge required to compose a linear transformation sequence that is stored in a single, compact transformation matrix. So let's create the rotation sequence that we discussed earlier. We will a compose a transform that translates an object to the origin, rotates about the \(zaxis\) then translates the object back to its original position.
Each individual transform matrix is shown below in the order that it must be multiplied into the final transform.
\( T_c R_z T_o = \left\lbrack \matrix{ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ t_x & t_y & t_z & 1 \\ } \right\rbrack \left\lbrack \matrix{ \phantom{}\cos \Theta & \sin \Theta & 0 & 0 \\ \sin \Theta & \cos \Theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ } \right\rbrack \left\lbrack \matrix{ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ t_x & t_y & t_z & 1 \\ } \right\rbrack \)
Here, values are selected and the result matrix is calculated:
\(t_x = 0.5, t_y = 0.5, t_z = 0.5, \Theta = 45°\)
\( \eqalign{ T_c R_z T_o &= \left\lbrack \matrix{ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0.5 & 0.5 & 0.5 & 1 \\ } \right\rbrack \left\lbrack \matrix{ 0.707 & 0.707 & 0 & 0 \\ 0.707 & 0.707 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ } \right\rbrack \left\lbrack \matrix{ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0.5 & 0.5 & 0.5 & 1 \\ } \right\rbrack \\ \\ &=\left\lbrack \matrix{ \phantom{}0.707 & 0.707 & 0 & 0 \\ 0.707 & 0.707 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0.5 & 0.207 & 0 & 1 \\ } \right\rbrack }\)
It is possible to derive the final composite matrix by using algebraic manipulation to multiply the transforms with the unevaluated variables. The matrix below is the simplified definition for \(T_c R_z T_o\) from above:
\( T_c R_z T_o = \left\lbrack \matrix{ \phantom{}\cos \Theta & \sin \Theta & 0 & 0 \\ \sin \Theta & \cos \Theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ (t_x \cos \Theta + t_y \sin \Theta + t_x) & (t_x \sin \Theta  t_y \cos \Theta + t_y) & 0 & 1 \\ } \right\rbrack\)
Projection Transformation
The last thing to do, is to convert our 3D model into an image. We have threedimensional coordinates, that must be mapped to a twodimensional surface. To do this, we will project a view of our worldspace onto a flat twodimensional screen. This is known as the "projection transformation" or "projection matrix".
Coordinate Systems
It may be already be apparent, but if not I will state it anyway; a linear transformation is a mapping between two coordinate systems. Whether we translate, scale or rotate we are simply changing the location and orientation of the origin. We have actually used the origin as our frame of reference, so our transformations appear to be moving our geometric objects. This is a valid statement, because the frame of reference determines many things.
We need a frame of reference that can be used as the eye or camera. We call this frame of reference the view space and the eye is located at the viewpoint. In fact, up to this point we have also used two other vectorspaces, local space (also called model space and world space. To be able to create our projection transformation, we need to introduce one last coordinate space called screen space. Screen space is a 2D coordinate system defined by the \(uvplane\), and it maps directly to the display image or screen.
The diagram below depicts all four of these different coordinate spaces:
The important concept to understand, is that all of these coordinate systems are relatively positioned, scaled, and oriented to each other. We construct our transforms to move between these coordinate spaces. To render a view of the scene, we need to construct a transform to go from world space to view space.
View Space
The view space is the frame of reference for the eye. Basically, our view of the scene. This coordinate space allows us to transform the world coordinates to a position relative to our view point. This makes it possible for us to create a projection of this visible scene onto our view plane.
We need a point and two vectors to define the location and orientation of the view space. The method I demonstrate here is called the "eye, at, up" method. This is because we specify a point to be the location of the eye, a vector from the eye to the focalpoint, at, and a vector that indicates which direction is up. I like this method because I think it is an intuitive model for visualizing where your viewpoint is located, how it is oriented, and what you are looking at.
Right or Left Handed?
The view space is a lefthanded coordinate system, which differs from the righthanded coordinate systems that we have used up to this point. In math and physics, righthanded systems are the norm. This places the \(yaxis\) up, and the \(xaxis\) pointing to the right (same as the classic 2D Cartesian grid), and the resulting \(zaxis\) will point toward you. For a lefthanded space, the positive \(zaxis\) points into the paper. This makes things very convenient in computer graphics, because \(z\) is typically used to measure and indicate depth.
You can tell which type of coordinate system that you have created by taking your open palm and pointing it perpendicularly to the \(xaxis\), and your palm facing the \(yaxis\). The positive \(zaxis\) will point in the direction of your thumb, based upon the direction your coordinate space is oriented.
Let's define a simple view space transformation.
\( \eqalign{ eye&= [\matrix{2 & 3 & 3}] \\ at&= [\matrix{0 & 0 & 0}] \\ up&= [\matrix{0 & 1 & 0}] }\)
Simple Indeed.
From this, we can create a view located at \([\matrix{2 & 3 & 3}]\) that is looking at the origin and is oriented vertically (no tilt to either side). We start by using the vector operations, which I demonstrated in my previous post, to define vectors for each of the three coordinate axes.
\( \eqalign{ z_{axis}&= \ ateye \ \\ &= \ [\matrix{2 & 3 & 3}] \ \\ &= [\matrix{0.4264 & 0.6396 & 0.6396}] \\ \\ x_{axis}&= \ up \times z_{axis} \ \\ &=\ [\matrix{3 & 0 & 2}] \\\ &=[\matrix{0.8321 & 0 & 0.5547}]\\ \\ y_{axis}&= \ z_{axis} \times x_{axis} \ \\ &= [\matrix{0.3547 & 0.7687 & 0.5322}] }\)
We now take these three vectors, and plug them into this composite matrix that is designed to transform world space to a new coordinate space defined by axisaligned unit vectors.
\( \eqalign { T_{view}&=\left\lbrack \matrix{ x_{x_{axis}} & x_{y_{axis}} & x_{z_{axis}} & 0 \\ y_{x_{axis}} & y_{y_{axis}} & y_{z_{axis}} & 0 \\ z_{x_{axis}} & z_{y_{axis}} & z_{z_{axis}} & 0 \\ (x_{axis} \cdot eye) & (y_{axis} \cdot eye) & (z_{axis} \cdot eye) & 1 \\ } \right\rbrack \\ \\ &=\left\lbrack \matrix{ 0.8321 & 0.3547 & 0.4264 & 0 \\ 0 & 0.7687 & 0.6396 & 0 \\ 0.5547 & 0.5322 & 0.6396 & 0 \\ 3.3283 & 4.6121 & 4.6904 & 1 \\ } \right\rbrack }\)
Gaining Perspective
The final step is to project what we can see from the \(eye\) in our view space, onto the viewplane. As I mentioned previous, the viewplane is commonly referred to as the \(uvplane\). The \(uv\) coordinatesystem refers to the 2D mapping on the \(uvplane\). The projection mapping that I demonstrate here places the \(uvplane\) some distance \(d\) from the \(eye\), so that the viewspace \(zaxis\) is parallel with the plane's normal.
The \(eye\) is commonly referred to by another name for the context of the projection transform, it is called the center of projection. That is because this final step, basically, projects the intersection of a projected vector onto the \(uvplane\). The projected vector originates at the center of projection and points to each point in the viewspace.
The cues that let us perceive perspective with stereoscopic vision is called, foreshortening. This effect is caused by objects appearing much smaller as they are further from our vantagepoint. Therefore, we can recreate the foreshortening in our image by scaling the \(xy\) values based on their \(z\)coordinate. Basically, the farther away an object is placed from the center of projection, the smaller it will appear.
The diagram above shows a side view of the \(y\) and \(z\) axes, and is parallel with the \(xaxis\). Notice the have similar triangles that are created by the view plane. The smaller triangle on the bottom is the distance \(d\), which is the distance from the center of projection to the view plane. The larger triangle on the top extends from the point \(Pv\) to the \(yaxis\), this distance is the \(z\)component value for the point \(Pv\). Using the property of similar triangles we now have a geometric relationship that allows us to scale the location of the point for the view plane intersection.
\(\frac{x_s}d = \frac{x_v}{z_v}\) 
\(\frac{y_s}d = \frac{y_v}{z_v}\) 
\(x_s = \frac{x_v}{z_v/d}\) 
\(y_s = \frac{y_v}{z_v/d}\) 
With this, we can define a perspectivebased projection transform:
\( T_{project}=\left\lbrack \matrix{ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1/d \\ 0 & 0 & 0 & 0} \right\rbrack \)
This is now where the homogenizing component \(w\) returns. This transform will have the affect of scaling the \(w\) component to create:
\( \eqalign{ X&= x_v \\ Y&= y_v \\ Z&= z_v \\ w&= z_v/d }\)
Which is why we need to access these values as shown below after the following transform is used:
\( \eqalign{ [\matrix{X & Y & Z & w}] &= [\matrix{x_v & y_v & z_v & 1}] T_{project}\\ \\ x_v&= X/w \\ y_v&= Y/w \\ z_v&= Z/w }\)
Scale the Image Plane
Here is one last piece of information that I think you will appreciate. The projection transform above defines a \(uv\) coordinate system that is \(1 \times 1\). Therefore, if you make the following adjustment, you will scale the unitplane to match the size of your final image in pixels:
\( T_{project}=\left\lbrack \matrix{ width & 0 & 0 & 0 \\ 0 & height & 0 & 0 \\ 0 & 0 & 1 & 1/d \\ 0 & 0 & 0 & 0} \right\rbrack \)
Summary
We made it! With a little bit of knowledge from both Linear Algebra and Trigonometry, we have been able to construct and manipulate geometric models in 3D space, concatenate transform matrices into compact and efficient transformation definitions, and finally create a projection transform to create a twodimensional image from a threedimensional definition.
As you can probably guess, this is only the beginning. There is so much more to explore in the creation of computer graphics. Some of the more interesting topics that generally follow after the 3D projection is created are:
 Shading Models: Flat, Gouraud and Phong.
 Shadows
 Reflections
 Transparency and Refraction
 Textures
 Bumpmapping ...
The list goes on and on. I imagine that I will continue to write on this topic and address polygon fill algorithms and the basic shading models. However, I think things will be more interesting if I have code to demonstrate and possibly an interactive viewer for the browser. I have a few demonstration programs written in C++ for Windows, but I would prefer to create something that is a bit more portable.
Once I decide how I want to proceed I will return to this topic. However, if enough people express interest, I wouldn't mind continuing forward demonstrating with C++ Windows programs. Let me know what you think.
References
Watt, Alan, "Threedimensional geometry in computer graphics," in 3D Computer Graphics, Harlow, England: AddisonWesley, 1993
Foley, James, D., et al., "Geometrical Transformations," in Computer Graphics: Principles and Practice, 2nd ed. in C, Reading: AddisonWesley, 1996
I previously introduced how to perform the basic matrix operations commonly used in 3D computer graphics. However, we haven’t yet reached anything that resembles graphics. I now introduce the concepts of geometry as it’s related to computer graphics. I intend to provide you with the remaining foundation necessary to be able to mathematically visualize the concepts at work. Unfortunately, we will still not be able to reach the "computer graphics" stage with this post; that must wait until the next entry.
This post covers a basic form of a polygonal mesh to represent models, points and Euclidean vectors. With the previous entry and this one, we will have all of the background necessary for me to cover matrix transforms in the next post. Specifically the 3D projection transform, which provides the illusion of perspective. This will be everything that we need to create a wireframe viewer. For now, let's focus on the topics of this post.
Modeling the Geometry
We will use models composed of a mesh of polygons. All of the polygons will be defined as triangles. It is important to consistently define the points of each triangle either clockwise or counterclockwise. This is because the definition order affects the direction a surface faces. If a single model mixes the order of point definitions, your rendering will have missing polygons.
The order of point definition for the triangle can be ignored if a surface normal is provided for each polygon. However, to keep things simple, I will only be demonstrating with basic models that are built from an index array structure. This data structure contains two arrays. The first array is a list of all of the vertices (points) in the model. The second array contains a threeelement structure with the index of the point that can be found in the first array. I define all of my triangles counterclockwise while looking at the front of the surface.
Generally, your models should be defined at or near the origin. The rotation transformation that we will explore in the next post requires a pivot point, which is defined at the origin. Model geometries defined near the origin are also easier to work with when building models from smaller components and using instancing to create multiple copies of a single model.
Here is a definition for a single triangle starting at the origin and adjacent to the Y plane. The final corner decreases for Y as it moves diagonally along the XZ plane:


This definition represents a unit cube starting at the origin. A cube has 8 corners to hold six squarefaces. It requires two triangles to represent each square. Therefore, there are 8 points used to define 12 triangles:


Vectors
Recall from my previous post that a vector is a special type of matrix, in which there is only a single row or column. You may be aware of a different definition for the term "vector"; something like
NOUN
A quantity having direction as well as magnitude, especially determining the position of one point in space relative to another.
Oxford Dictionary
From this point forward, when I use the term "vector", I mean a Euclidean vector. We will even use singlerow and singlecolumn matrices to represent these vectors; we’re just going to refer to those structures as matrices to minimize the confusion.
Notational Convention
It is most common to see vectors represented as singlerow matrices, as opposed to the standard mathematical notation which uses column matrices to represent vectors. It is important to remember this difference in notation based on what type of reference you are following.
The reason for this difference, is the rowform makes composing the transformation matrix more natural as the operations are performed lefttoright. I discuss the transformation matrix in the next entry. As you will see, properly constructing transformation matrices is crucial to achieve the expected results, and anything that we can do to simplify this sometimes complex topic, will improve our chances of success.
Representation
As the definition indicates, a vector has a direction and magnitude. There is a lot of freedom in that definition for representation, because it is like saying "I don’t know where I am, but I know which direction I am heading and how far I can go." Therefore, if we think of our vector as starting at the origin of a Cartesian grid, we could then represent a vector as a single point.
This is because starting from the origin, the direction of the vector is towards the point. The magnitude can be calculated by using Pythagoras' Theorem to calculate the distance between two points. All of this information can be encoded in a compact, singlerow matrix. As the definition indicates, we derive a vector from two points, by subtracting one point from the other using inverse of matrix addition.
This will give us the direction to the destination point, relative to the starting point. The subtraction effectively acts as a translation of the vectors starting point to the origin.
For these first few vector examples, I am going to use 2D vectors. Because they are simpler to demonstrate and diagram, and the only difference to moving to 3D vectors is an extra field has been added.
Here is a simple example with two points (2, 1) and (5, 5). The first step is to encode them in a matrix:
\( {\bf S} = \left\lbrack \matrix{2 & 1} \right\rbrack, {\bf D} = \left\lbrack \matrix{5 & 5} \right\rbrack \)
Let’s create two vectors, one that determines how to get to point D starting from S, and one to get to point S starting from D. The first step is to subtract the starting point from the destination point. Notice that if we were to subtract the starting point from itself, the result is the origin, (0, 0).
\( \eqalign{ \overrightarrow{SD} &= [ \matrix{5 & 5} ]  [ \matrix{2 & 1} ] \cr &= [ \matrix{3 & 4} ] } \)
\( \eqalign{ \overrightarrow{DS} &= [ \matrix{2 & 1} ]  [ \matrix{5 & 5} ] \cr &= [ \matrix{3 & 4} ] } \)
Notice that the only difference between the two vectors is the direction. The magnitude of each vector is the same, as expected. Here is how to calculate the magnitude of a vector:
Given:
\( \eqalign{ v &= \matrix{ [a_1 & a_2 & \ldots & a_n]} \cr v &= \sqrt{\rm (a_1^2 + a_2^2 + \ldots + a_n^2)} } \)
Operations
I covered scalar multiplication and matrix addition in my previous post. However, I wanted to briefly revisit the topic to demonstrate one way to think about vectors, which may help you develop a better intuition for the math. For the coordinate systems that we are working with, vectors are straight lines.
When you add two or more vectors together, you can think of them as moving the tail of each successive vector to the head of the previous vector. Then add the corresponding components.
\( \eqalign{ v &= A+B \cr v &= \sqrt{\rm 3^2 + 4^2} \cr &= 5 }\)
Regardless of the orientation of the vector, it can be decomposed into its individual contributions per axis. In our model for the vector, the contribution for each axis is the raw value defined in our matrix.
\( v = A+B+C \)
\( \eqalign{ v &= \sqrt{\rm (3+23)^2 + (1+3+2)^2} \cr &= \sqrt{\rm 2^2 + 6^2} \cr &= 2 \sqrt{\rm 10} }\)
Unit Vector
We can normalize a vector to create unit vector, which is a vector that has a length of 1. The concept of the unit vector is introduced to reduce a vector to simply indicate its direction. This type of vector can now be used as a unit of measure to which other vectors can be compared. This also simplifies the task of deriving other important information when necessary.
To normalize a vector, divide the vector by its magnitude. In terms of the matrix operations that we have discussed, this is simply scalar multiplication of the vector with the inverse of its magnitude.
\( u = \cfrac{v}{ v } \)
\(u\) now contains the information for the direction of \(v\). With algebraic manipulation we can also have:
\( v = u v \)
This basically says the vector \(v\) is equal to the unit vector that has the direction of \(u\) multiplied by the magnitude of \(v\).
Dot Product
The dot product is used for detecting visibility and shading of a surface. The dot product operation is shown below:
\( \eqalign{ w &= u \cdot v \cr &= u_1 v_1 + u_2 v_2 + \cdots + u_n v_n }\)
This formula may look familiar. That is because this is a special case in matrix multiplication called the inner product, which looks like this (remember, we are representing vectors as a row matrix):
\( \eqalign{ u \cdot v &= u v^T \cr &= \matrix {[u_1 & u_2 & \cdots & u_n]} \left\lbrack \matrix{v_1 \\ v_2 \\ \vdots \\ v_n } \right\rbrack }\)
We will use the dot product on two unit vectors to help us determine the angle between these vectors; that’s right, angle. We can do this because of the cosine rule from Trigonometry. The equation that we can use is as follows:
\( \cos \Theta = \cfrac{u \cdot v} {uv} \)
To convert this to the raw angle, \(\Theta\), you can use the inverse function of cosine, arccos(\( \Theta \)). In C and C++ the name of the function is acos().
\( \cos {\arccos \Theta} = \Theta \)
So what is actually occurring when we calculate the dot product that allows us to get an angle?
We are essentially calculating the projection of u onto v. This gives us enough information to manipulate the Trigonometric properties to derive \( \Theta \).
I would like to make one final note regarding the dot product. Many times it is only necessary to know the sign of the angle between two vectors. The following table shows the relationship between the dot product of two vectors and the value of the angle, \( \Theta \), which is in the range \( 0° \text{ to } 180° \). If you like radians, that is \( 0 \text{ to } \pi \).
\( w = u \cdot v = f(u,w) \)
\( f(u,w) = \cases{ w \gt 0 & \text { if } \Theta \lt 90° \cr w = 0 & \text { if } \Theta = 90° \cr w \lt 0 & \text { if } \Theta \gt 90° }\)
Cross Product
The cross product is used to calculate a vector that is perpendicular to a surface. The name for this vector is the surface normal. The surface normal indicates which direction the surface is facing. Therefore, it is used quite extensively for shading and visibility as well.
We have reached the point where we need to start using examples and diagrams with three dimensions. We will use the three points that define our triangular surface to create two vectors. Remember, it is important to use consistent definitions when creating models.
The first step is to calculate our planar vectors. What does that mean? We want to create two vectors that are coplanar (on the same plane), so that we may use them to calculate the surface normal. All that we need to be able to do this are three points that are noncollinear, and we have this with each triangle definition in our polygonal mesh model.
If we have defined a triangle \(\triangle ABC\), we will define two vectors, \(u = \overrightarrow{AB}\) and \(v = \overrightarrow{AC}\).
If we consistently defined the triangle surfaces in our model, we should now be able to take the cross product of \(u \times v\) to receive the surface normal, \(N_p\), of \(\triangle ABC\). Again, the normal vector is perpendicular to its surface.
In practice, this tells us which direction the surface is facing. We would combine this with the dot product to help us determine if are interested in this surface for the current view.
So how do you calculate the cross product?
Here is what you need to know in order to calculate the surface normal vector:
\(\eqalign { N_p &= u \times v \cr &= [\matrix{(u_2v_3 – u_3v_2) & (u_3v_1 – u_1v_3) & (u_1v_2 – u_2v_1)}] }\)
To help you remember how to perform the calculation, I have rewritten it in parts, and used \(x\), \(y\), and \(z\) rather than the index values.
\(\eqalign { x &= u_yv_z – u_zv_y \cr y &= u_zv_x – u_xv_z \cr z &= u_xv_y – u_yv_x }\)
Let's work through an example to demonstrate the cross product. We will use surface A from our axisaligned cube model that we constructed at the beginning of this post. Surface A is defined completely within the XY plane. How do we know this? Because all of the points have the same \(z\)coordinate value of 1. Therefore, we should expect that our calculation for \(N_p\) of surface A will be an \(z\) axisaligned vector pointing in the positive direction.
Step 1: Calculate our surface description vectors \(u\) and \(v\):
\( Surface_A = pt_7: (0,0,1), pt_6: (1,0,1), \text{ and } pt_4: (0,1,1) \)
\( \eqalign { u &= pt_6  pt_7 \cr &= [\matrix{1 & 0 & 1}]  [\matrix{0 & 0 & 1}] \cr &=[\matrix{\color{red}{1} & \color{green}{0} & \color{blue}{0}}] }\) 
\( \eqalign { v &= pt_4  pt_7 \cr &= [\matrix{0 & 1 & 1}]  [\matrix{0 & 0 & 1}] \cr &=[\matrix{\color{red}{0} & \color{green}{1} & \color{blue}{0}}] }\) 
Step 2: Calculate \(u \times v\):
\(N_p = [ \matrix{x & y & z} ]\text{, where:}\)
\( \eqalign { x &= \color{green}{u_y}\color{blue}{v_z} – \color{blue}{u_z}\color{green}{v_y} \cr &= \color{green}{0} \cdot \color{blue}{0}  \color{blue}{0} \cdot \color{green}{1} \cr &= 0 }\) 
\( \eqalign { y &= \color{blue}{u_z}\color{red}{v_x} – \color{red}{u_x}\color{blue}{v_z} \cr &= \color{blue}{0} \cdot \color{red}{0}  \color{red}{1} \cdot \color{blue}{0} \cr &= 0 }\) 
\( \eqalign { z &= \color{red}{u_x}\color{green}{v_y} – \color{green}{u_y}\color{red}{v_x} \cr &= \color{red}{1} \cdot \color{green}{1}  \color{green}{0} \cdot \color{red}{0} \cr &= 1 }\) 
\(N_p = [ \matrix{0 & 0 & 1} ]\)
This is a \(z\) axisaligned vector pointing in the positive direction, which is what we expected to receive.
Notation Formality
If you look at other resources to learn more about the cross product, it is actually defined as a sum in terms of three axisaligned vectors \(i\), \(j\), and \(k\), which then must be multiplied by the vectors \(i=[\matrix{1&0&0}]\), \(j=[\matrix{0&1&0}]\), and \(k=[\matrix{0&0&1}]\) to get the final vector. The difference between what I presented and the form with the standard vectors, \(i\), \(j\), and \(k\), is I omitted a step by placing the result calculations directly in a new vector. I wanted you to be aware of this discrepancy between what I have demonstrated and what is typically taught in the classroom.
Apply What We Have Learned
We have discovered quite a few concepts between the two posts that I have written regarding the basic math and geometry involved with computer graphics. Let's apply this knowledge and solve a useful problem. Detecting if a surface is visible from a specific pointofview is a fundamental problem that is used extensively in this domain. Since we have all of the tools necessary to solve this, let's run through two examples.
For the following examples we will specify the viewpoint as \(pt_{eye} = [\matrix{3 & 2 & 3}]:\)
Also, refer to this diagram for the surfaces that we will test for visibility from the viewpoint. The side surface is surface \(D\) from the cube model, and the hidden surface is the bottomfacing surface \(J\).
Detect a visible surface
Step 1: Calculate our surface description vectors \(u\) and \(v\):
\( \eqalign{ Surface_D: &pt_5: (1,1,1), \cr &pt_1: (1,0,0), \cr &pt_2: (1,1,0) } \)
\( \eqalign { u &= pt_1  pt_5 \cr &= [\matrix{1 & 0 & 0}]  [\matrix{1 & 1 & 1}] \cr &=[\matrix{0 & 1 & 1}] }\) 
\( \eqalign { v &= pt_2  pt_5 \cr &= [\matrix{1 & 1 & 0}]  [\matrix{1 & 1 & 1}] \cr &=[\matrix{0 & 0 & 1}] }\) 
Step 2: Calculate \(u \times v\):
\(N_{pD} = [ \matrix{x & y & z} ]\text{, where:}\)
\( \eqalign { x &= 1 \cdot (1)  (1) \cdot 0 \cr &= 1 }\) 
\( \eqalign { y &= 1 \cdot 0  (1) \cdot 0 \cr &= 0 }\) 
\( \eqalign { z &= 0 \cdot 0  (1) \cdot 0 \cr &= 0 }\) 
\(N_pD = [ \matrix{1 & 0 & 0} ]\)
Step 3: Calculate vector to the eye:
To calculate a vector to the eye, we need to select a point on the target surface and create a vector to the eye. For this example, we will select \(pt_5\)
\( \eqalign{ \overrightarrow{view_D} &= pt_{eye}  pt_5\cr &= [ \matrix{3 & 2 & 3} ]  [ \matrix{1 & 1 & 1} ]\cr &= [ \matrix{2 & 1 & 2} ] }\)
Step 4: Normalize the viewvector:
Before we can use this vector in a dot product, we must normalize the vector:
\( \eqalign{ eye_D &= \sqrt{\rm 2^2 + 1^2 + 2^2} \cr &= \sqrt{\rm 4 + 1 + 4} \cr &= \sqrt{\rm 9} \cr &= 3 \cr \cr eye_u &= \cfrac{eye_D}{eye_D} \cr &= \cfrac{1}3 [\matrix{2 & 1 & 2}] \cr &= \left\lbrack\matrix{\cfrac{2}3 & \cfrac{1}3 & \cfrac{2}3} \right\rbrack }\)
Step 5: Calculate the dot product of the viewvector and surface normal:
\( \eqalign{ w &= eye_{u} \cdot N_{pD} \cr &= \cfrac{2}3 \cdot 1 + \cfrac{1}3 \cdot 0 + \cfrac{2}3 \cdot 0 \cr &= \cfrac{2}3 }\)
Step 6: Test for visibility:
\(w \gt 0\), therefore \(Surface_D\) is visible. 
Detect a surface that is not visible
Step 1: Calculate our surface description vectors \(u\) and \(v\):
\( \eqalign{ Surface_J: &pt_6: (1,1,1), \cr &pt_0: (0,0,0), \cr &pt_1: (1,0,0) } \)
\( \eqalign { u &= pt_0  pt_6 \cr &= [\matrix{0 & 0 & 0}]  [\matrix{1 & 0 & 1}] \cr &=[\matrix{1 & 0 & 1}] }\) 
\( \eqalign { v &= pt_1  pt_6 \cr &= [\matrix{1 & 0 & 0}]  [\matrix{1 & 0 & 1}] \cr &=[\matrix{0 & 0 & 1}] }\) 
Step 2: Calculate \(u \times v\):
\(N_{pJ} = [ \matrix{x & y & z} ]\text{, where:}\)
\( \eqalign { x &= 0 \cdot (1)  (1) \cdot 0 \cr &= 0 }\) 
\( \eqalign { y &= 1 \cdot 0  (1) \cdot (1) \cr &= 1 }\) 
\( \eqalign { z &= (1) \cdot 0  0 \cdot 0 \cr &= 0 }\) 
\(N_pJ = [ \matrix{0 & 1 & 0} ]\)
Step 3: Calculate vector to the eye:
For this example, we will select \(pt_6\)
\( \eqalign{ \overrightarrow{view_J} &= pt_{eye}  pt_6\cr &= [ \matrix{3 & 2 & 3} ]  [ \matrix{1 & 0 & 1} ]\cr &= [ \matrix{2 & 2 & 2} ] }\)
Step 4: Normalize the viewvector:
Before we can use this vector in a dot product, we must normalize the vector:
\( \eqalign{ eye_J &= \sqrt{\rm 2^2 + 2^2 + 2^2} \cr &= \sqrt{\rm 4 + 4 + 4} \cr &= \sqrt{\rm 12} \cr &= 2 \sqrt{\rm 3} \cr \cr eye_u &= \cfrac{eye_J}{eye_J} \cr &= \cfrac{1}{2 \sqrt 3} [\matrix{2 & 2 & 2}] \cr &= \left\lbrack\matrix{\cfrac{1}{\sqrt 3} & \cfrac{1}{\sqrt 3} & \cfrac{1}{\sqrt 3}} \right\rbrack }\)
Step 5: Calculate the dot product of the viewvector and surface normal:
\( \eqalign{ w &= eye_{u} \cdot N_{pJ} \cr &= \cfrac{1}{\sqrt 3} \cdot 0 + \cfrac{1}{\sqrt 3} \cdot (1) + \cfrac{1}{\sqrt 3} \cdot 0 \cr &= \cfrac{1}{\sqrt 3} }\)
Step 6: Test for visibility:
\(w \lt 0\), therefore \(Surface_J\) is not visible. 
What has this example demonstrated?
Surprisingly, this basic task demonstrates every concept that we have learned from the two posts up to this point:
 Matrix Addition: Euclidean vector creation
 Scalar Multiplication: Unit vector calculation, multiplying by \(1 / v \).
 Matrix Multiplication: Dot Product calculation, squarematrix multiplication will be used in the next post.
 Geometric Model Representation: The triangle surface representation provided the points for our calculations.
 Euclidean Vector: Calculation of each surfaces planar vectors, as well as the view vector.
 Cross Product: Calculation of the surface normal, \(N_p\).
 Unit Vector: Preparation of vectors for dot product calculation.
 Dot Product: Used the sign from this calculation to test for visibility of a surface.
Summary
Math can be difficult sometimes, especially with all of the foreign notation. However, with a bit of practice the notation will soon become second nature and many calculations can be performed in your head, similar to basic arithmetic with integers. While many graphics APIs hide these calculations from you, don't be fooled because they still exist and must be performed to display the threedimensional images on your screen. By understanding the basic concepts that I have already demonstrated, you will be able to better troubleshoot your graphics programs when something wrong occurs. Even if you rely solely on a graphics library to abstract the complex mathematics.
There is only one more entry to go, and we will have enough basic knowledge of linear algebra, Euclidean geometry, and computer displays to be able to create, manipulate and display 3D models programmatically. The next entry discusses how to translate, scale and rotate geometric models. It also describes what must occur to create the illusion of a threedimensional image on a twodimensional display with the projection transform.
References
Watt, Alan, "Threedimensional geometry in computer graphics," in 3D Computer Graphics, Harlow, England: AddisonWesley, 1993
Threedimensional computer graphics is an exciting aspect of computing because of the amazing visual effects that can be created for display. All of this is created from an enormous number of calculations that manipulate virtual models, which are constructed from some form of geometric definition. While the math involved for some aspects of computer graphics and animation can become quite complex, the fundamental mathematics that is required is very accessible. Beyond learning some new mathematic notation, a basic threedimensional view can be constructed with algorithms that only require basic arithmetic and a little bit of trigonometry.
I demonstrate how to perform some basic operations with two key constructs for 3D graphics, without the rigorous mathematic introduction. Hopefully the level of detail that I use is enough for anyone that doesn't have a strong background in math, but would very much like to play with 3D APIs. I introduce the math that you must be familiar with in this entry, and in two future posts I demonstrate how to manipulate geometric models and apply the math towards the display of a 3D image.
Linear Algebra
The type of math that forms the basis of 3D graphics is called linear algebra. In general, linear algebra is quite useful for solving systems of linear equations. Rather than go into the details of linear algebra, and all of its capabilities, we are going to simply focus on two related constructs that are used heavily within the topic, the matrix and the vertex.
Matrix
The Matrix provides an efficient mechanism to translate, scale, rotate, and convert between different coordinate systems. This is used extensively to manipulate the geometric models for environment calculations and display.
These are all examples of valid matrices:
\( \left\lbrack \matrix{a & b & c \cr d & e & f} \right\rbrack, \left\lbrack \matrix{10 & 27 \cr 0 & 13 \cr 7 & 17} \right\rbrack, \left\lbrack \matrix{x^2 & 2x \cr 0 & e^x} \right\rbrack\)
Square matrices are particularly important, that is a matrix with the same number of columns and rows. There are some operations that can only be performed on a square matrix, which I will introduce shortly. The notation for matrices, in general, uses capital letters as the variable names. Matrix dimensions can be specified as a shorthand notation, and to also identify an indexed position within the matrix. As far as I am aware, rowmajor indexing is always used for consistency, that is, the first index represents the row, and the second represents the column.
\( A= [a_{ij}]= \left\lbrack \matrix{a_{11} & a_{12} & \ldots & a_{1n} \cr a_{21} & a_{22} & \ldots & a_{2n} \cr \vdots & \vdots & \ddots & \vdots \cr a_{m1} & a_{m2} & \ldots & a_{mn} } \right\rbrack \)
Vector
The vector is a special case of a matrix, where there is only a single row or a single column; also a common practice is to use lowercase letters to represent vectors:
\( u= \left\lbrack \matrix{u_1 & u_2 & u_3} \right\rbrack \), \( v= \left\lbrack \matrix{v_1\cr v_2\cr v_3} \right\rbrack\)
Operations
The mathematical shorthand notation for matrices is very elegant. It simplifies these equations that would be quite cumbersome, otherwise. There are only a few operations that we are concerned with to allow us to get started. Each operation has a basic algorithm to follow to perform a calculation, and the algorithm easily scales with the dimensions of a matrix.
Furthermore, the relationship between math and programming is not always clear. I have a number of colleagues that are excellent programmers and yet they do not consider their math skills very strong. I think that it could be helpful to many to see the conversion of the matrix operations that I just described from mathematical notation into code form. Once I demonstrate how to perform an operation with the mathematical notation and algorithmic steps, I will also show a basic C++ implementation of the concept to help you understand how these concepts map to actual code.
The operations that I implement below assume a Matrix class with the following interface:
C++
class Matrix  
{  
public:  
// Ctor and Dtor omitted  
 
// Calculate and return a reference to the specified element.  
double& element(size_t row, size_t column);  
 
// Resizes this Matrix to have the specified size.  
void resize(size_t row, size_t column);  
 
// Returns the number rows.  
size_t rows();  
 
// Returns the number of columns.  
size_t columns();  
private:  
std::vector<double> data;  
}; 
Transpose
Transpose is a unary operation that is performed on a single matrix, and it is represented by adding a superscript T to target matrix.
For example, the transpose of a matrix A is represented with A^{T}.
The transpose "flips" the orientation of the matrix so that each row becomes a column, and each original column is transformed into a row. I think that a few examples will make this concept more clear.
\( A= \left\lbrack \matrix{a & b & c \cr d & e & f \cr g & h & i} \right\rbrack \)
\(A^T= \left\lbrack \matrix{a & d & g \cr b & e & h \cr c & f & i} \right\rbrack \)
The resulting matrix will contain the same set of values as the original matrix, only their position in the matrix changes.
\( B= \left\lbrack \matrix{1 & 25 & 75 & 100\cr 0 & 5 & 50 & 25\cr 0 & 0 & 10 & 22} \right\rbrack \)
\(B^T= \left\lbrack \matrix{1 & 0 & 0 \cr 25 & 5 & 0 \cr 75 & 50 & 10 \cr 100 & 25 & 22} \right\rbrack \)
It is very common to transpose a matrix, including vertices, before performing other operations. The reason will become clear for matrix multiplication.
\( u= \left\lbrack \matrix{u_1 & u_2 & u_3} \right\rbrack \)
\(u^T= \left\lbrack \matrix{u_1 \cr u_2 \cr u_3} \right\rbrack \)
Matrix Addition
Addition can only be performed between two matrices that are the same size. By size, we mean that the number of rows and columns are the same for each matrix. Addition is performed by adding the values of the corresponding positions of both matrices. The sum of the values creates a new matrix that is the same size as the original two matrices provided to the add operation.
If
\( A= \left\lbrack \matrix{0 & 2 & 4 \cr 6 & 8 & 0 \cr 2 & 4 & 6} \right\rbrack, B= \left\lbrack \matrix{1 & 3 & 5 \cr 7 & 9 & 11 \cr 13 & 15 & 17} \right\rbrack \)
Then
\(A+B=\left\lbrack \matrix{1 & 1 & 9 \cr 1 & 17 & 11 \cr 15 & 11 & 23} \right\rbrack \)
The size of these matrices do not match in their current form.
\(U= \left\lbrack \matrix{4 & 8 & 5}\right\rbrack, V= \left\lbrack \matrix{1 \\ 5 \\ 4}\right\rbrack \)
However, if we take the transpose of one of them, their sizes will match and they can then be added together. The size of the result matrix depends upon which matrix we perform the transpose operation on from the original expression.:
\(U^T+V= \left\lbrack \matrix{3 \\ 3 \\ 9} \right\rbrack \)
Or
\(U+V^T= \left\lbrack \matrix{3 & 3 & 9} \right\rbrack \)
Matrix addition has the same algebraic properties as with the addition of two scalar values:
Commutative Property:
Associative Property:
Identity Property:
Inverse Property:
The code required to implement matrix addition is relatively simple. Here is an example for the Matrix
class definition that I presented earlier:
C++
void Matrix::operator+=(const Matrix& rhs)  
{  
if (rhs.data.size() == data.size())  
{  
// We can simply add each corresponding element  
// in the matrix element data array.  
for (size_t index = 0; index < data.size(); ++index)  
{  
data[index] += rhs.data[index];  
}  
}  
}  
 
Matrix operator+( const Matrix& lhs,  
const Matrix& rhs)  
{  
Matrix result(lhs);  
result += rhs;  
 
return result;  
} 
Scalar Multiplication
Scalar multiplication allows a single scalar value to be multiplied with every entry within a matrix. The result matrix is the same size as the matrix provided to the scalar multiplication expression:
If
\( A= \left\lbrack \matrix{3 & 6 & 9 \cr 12 & 15 & 18} \right\rbrack \)
Then
\( \frac{1}3 A= \left\lbrack \matrix{1 & 2 & 3 \cr 4 & 5 & 6} \right\rbrack, 0A= \left\lbrack \matrix{0 & 0 & 0 \cr 0 & 0 & 0} \right\rbrack, A= \left\lbrack \matrix{3 & 6 & 9 \cr 12 & 15 & 18} \right\rbrack, \)
Scalar multiplication with a matrix exhibits these properties, where c and d are scalar values:
Distributive Property:
Identity Property:
The implementation for scalar multiplication is even simpler than addition.
Note: this implementation only allows the scalar value to appear before the Matrix object in multiplication expressions, which is how the operation is represented in math notation.:
C++
void Matrix::operator*=(const double lhs)  
{  
for (size_t index = 0; index < data.size(); ++index)  
{  
data[index] *= rhs;  
}  
}  
 
Matrix operator*( const double scalar,  
const Matrix& rhs)  
{  
Matrix result(rhs);  
result *= scalar;  
 
return result;  
} 
Matrix Multiplication
Everything seems very simple with matrices, at least once you get used to the new structure. Then you are introduced to matrix multiplication. The algorithm for multiplication is not difficult, however, it is much more labor intensive compared to the other operations that I have introduced to you. There are also a few more restrictions on the parameters for multiplication to be valid. Finally, unlike the addition operator, the matrix multiplication operator does not have all of the same properties as the multiplication operator for scalar values; specifically, the order of parameters matters.
Input / Output
Let's first address what you need to be able to multiply matrices, and what type of matrix you can expect as output. Once we have addressed the structure, we will move on to the process.
Given an the following two matrices:
\( A= \left\lbrack \matrix{a_{11} & \ldots & a_{1n} \cr \vdots & \ddots & \vdots \cr a_{m1} & \ldots & a_{mn} } \right\rbrack, B= \left\lbrack \matrix{b_{11} & \ldots & b_{1v} \cr \vdots & \ddots & \vdots \cr b_{u1} & \ldots & b_{uv} } \right\rbrack \)
A valid product for \( AB=C \) is only possible if number of columns \( n \) in \( A \) is equal to the number of rows \( u \) in \( B \). The resulting matrix \( C \) will have the dimensions \( m \times v \).
\( AB=C= \left\lbrack \matrix{c_{11} & \ldots & c_{1v} \cr \vdots & \ddots & \vdots \cr c_{m1} & \ldots & c_{mv} } \right\rbrack, \)
Let's summarize this in a different way, hopefully this arrangement will make the concept more intuitive:
One last form of the rules for matrix multiplication:
 The number of columns, \(n\), in matrix \( A \) must be equal to the number of rows, \(u\), in matrix \( B \):
\( n = u \)
 The output matrix \( C \) will have the number of rows, \(m\), in \(A\), and the number of columns, \(v\), in \(B\):
\( m \times v \)
 \(m\) and \(v\) do not have to be equal. The only requirement is that they are both greaterthan zero:
\( m \gt 0,\)
\(v \gt 0 \)
How to Multiply
To calculate a single entry in the output matrix, we must multiply the element from each column in the first matrix, with the element in the corresponding row in the second matrix, and add all of these products together. We use the same row in the first matrix, \(A\), for which we are calculating the row element in \(C\). Similarly, we use the column in the second matrix, \(B\) that corresponds with the calculating column element in \(C\).
More succinctly, we can say we are multiplying rows into columns.
For example:
\( A= \left\lbrack \matrix{a_{11} & a_{12} & a_{13} \cr a_{21} & a_{22} & a_{23}} \right\rbrack, B= \left\lbrack \matrix{b_{11} & b_{12} & b_{13} & b_{14} \cr b_{21} & b_{22} & b_{23} & b_{24} \cr b_{31} & b_{32} & b_{33} & b_{34} } \right\rbrack \)
The number of columns in \(A\) is \(3\) and the number of rows in \(B\) is \(3\), therefore, we can perform this operation. The size of the output matrix will be \(2 \times 4\).
This is the formula to calculate the element \(c_{11}\) in \(C\) and the marked rows used from \(A\) and the columns from \(B\):
\( \left\lbrack \matrix{\color{#B11D0A}{a_{11}} & \color{#B11D0A}{a_{12}} & \color{#B11D0A}{a_{13}} \cr a_{21} & a_{22} & a_{23}} \right\rbrack \times \left\lbrack \matrix{\color{#B11D0A}{b_{11}} & b_{12} & b_{13} & b_{14} \cr \color{#B11D0A}{b_{21}} & b_{22} & b_{23} & b_{24} \cr \color{#B11D0A}{b_{31}} & b_{32} & b_{33} & b_{34} } \right\rbrack = \left\lbrack \matrix{\color{#B11D0A}{c_{11}} & c_{12} & c_{13} & c_{14}\cr c_{21} & c_{22} & c_{23} & c_{24} } \right\rbrack \)
\( c_{11}= (a_{11}\times b_{11}) + (a_{12}\times b_{21}) + (a_{13}\times b_{31}) \)
To complete the multiplication, we need to calculate these other seven values \( c_{12}, c_{13}, c_{14}, c_{21}, c_{22}, c_{23}, c_{24}\). Here is another example for the element \(c_{23}\):
\( \left\lbrack \matrix{ a_{11} & a_{12} & a_{13} \cr \color{#B11D0A}{a_{21}} & \color{#B11D0A}{a_{22}} & \color{#B11D0A}{a_{23}} } \right\rbrack \times \left\lbrack \matrix{b_{11} & b_{12} & \color{#B11D0A}{b_{13}} & b_{14} \cr b_{21} & b_{22} & \color{#B11D0A}{b_{23}} & b_{24} \cr b_{31} & b_{32} & \color{#B11D0A}{b_{33}} & b_{34} } \right\rbrack = \left\lbrack \matrix{c_{11} & c_{12} & c_{13} & c_{14}\cr c_{21} & c_{22} & \color{#B11D0A}{c_{23}} & c_{24} } \right\rbrack \)
\( c_{23}= (a_{21}\times b_{13}) + (a_{22}\times b_{23}) + (a_{23}\times b_{33}) \)
Notice how the size of the output matrix changes. Based on this and the size of the input matrices you can end up with some interesting results:
\( \left\lbrack \matrix{a_{11} \cr a_{21} \cr a_{31} } \right\rbrack \times \left\lbrack \matrix{ b_{11} & b_{12} & b_{13} } \right\rbrack = \left\lbrack \matrix{c_{11} & c_{12} & c_{13} \cr c_{21} & c_{22} & c_{23} \cr c_{31} & c_{32} & c_{33} } \right\rbrack \)
\( \left\lbrack \matrix{ a_{11} & a_{12} & a_{13} } \right\rbrack \times \left\lbrack \matrix{b_{11} \cr b_{21} \cr b_{31} } \right\rbrack = \left\lbrack \matrix{c_{11} } \right\rbrack \)
Tip:
To help you keep track of which row to use from the first matrix and which column from the second matrix, create your result matrix of the proper size, then methodically calculate the value for each individual element.
The algebraic properties for the matrix multiplication operator do not match those of the scalar multiplication operator. These are the most notable:
Not Commutative:
The order of the factor matrices definitely matters.
I think it is very important to illustrate this fact. Here is a simple \(2 \times 2 \) multiplication performed two times with the order of the input matrices switched. I have highlighted the only two terms that the two resulting answers have in common:
\( \left\lbrack \matrix{a & b \cr c & d } \right\rbrack \times \left\lbrack \matrix{w & x\cr y & z } \right\rbrack = \left\lbrack \matrix{(\color{red}{aw}+by) & (ax+bz)\cr (cw+dy) & (cx+\color{red}{dz}) } \right\rbrack \)
\( \left\lbrack \matrix{w & x\cr y & z } \right\rbrack \times \left\lbrack \matrix{a & b \cr c & d } \right\rbrack = \left\lbrack \matrix{(\color{red}{aw}+cx) & (bw+dx)\cr (ay+cz) & (by+\color{red}{dz}) } \right\rbrack \)
Product of Zero:
Scalar Multiplication is Commutative:
Associative Property:
Multiplication is associative, however, take note that the relative order for all of the matrices remains the same.
Transpose of a Product:
The transpose of a product is equivalent to the product of transposed factors multiplied in reverse order
Code
We are going to use a twostep solution to create a general purpose matrix multiplication solution. The first step is to create a function that properly calculates a single element in the output matrix:
C++
double Matrix::multiply_element(  
const Matrix& rhs,  
const Matrix& rhs,  
const size_t i,  
const size_t j  
)  
{  
double product = 0;  
 
// Multiply across the specified row, i, for the left matrix  
// and the specified column, j, for the right matrix.  
// Accumulate the total of the products  
// to return as the calculated result.  
for (size_t col_index = 1; col_index <= lhs.columns(); ++col_index)  
{  
for (size_t row_index = 1; row_index <= rhs.rows(); ++row_index)  
{  
product += lhs.element(i, col_index)  
* rhs.element(row_index, j);  
}  
}  
 
return product;  
} 
Now create the outer function that performs the multiplication to populate each field of the output matrix:
C++
// Because we may end up creating a matrix with  
// an entirely new size, it does not make sense  
// to have a *= operator for this generalpurpose solution.  
Matrix& Matrix::operator*( const Matrix& lhs,  
const Matrix& rhs)  
{  
if (lhs.columns() == rhs.rows())  
{  
// Resize the result matrix to the proper size.  
this>resize(lhs.row(), rhs.columns());  
 
// Calculate the value for each element  
// in the result matrix.  
for (size_t i = 1; i <= this>rows(); ++i)  
{  
for (size_t j = 1; j <= this>columns(); ++j)  
{  
element(i,j) = multiply_element(lhs, rhs, i, j);  
}  
}  
}  
 
return *this;  
} 
Summary
3D computer graphics relies heavily on the concepts found in the branch of math called, Linear Algebra. I have introduced two basic constructs from Linear Algebra that we will need to move forward and perform the fundamental calculations for rendering a threedimensional display. At this point I have only scratched the surface as to what is possible, and I have only demonstrated how. I will provide context and demonstrate the what and why, to a degree, on the path helping you begin to work with threedimensional graphics libraries, even if math is not one of your strongest skills.
References
Kreyszig, Erwin; "Chapter 7" from Advanced Engineering Mathematics, 7th ed., New York: John Wiley & Sons, 1993
The binary, octal and hexadecimal number systems pervade all of computing. Every command is reduced to a sequence of strings of 1s and 0s for the computer to interpret. These commands seem like noise, garbage, especially with the sheer length of the information. Becoming familiar with binary and other number systems can make it much simpler to interpret the data.
Once you become familiar with the relationships between the number systems, you can manipulate the data in more convenient forms. Your ability to reason, solve and implement solid programs will grow. Patterns will begin to emerge when you view the raw data in hexadecimal. Some of the most efficient algorithms are based on the powers of two. So do yourself a favor and become more familiar with hexadecimal and binary.
Number Base Conversion and Place Value
I am going to skip this since you can refer to my previous entry for a detailed review of number system conversions[^].
Continuous Data
Binary and hexadecimal are more natural number systems for use with computers because they both have a base of 2 raised to some exponent; binary is 2^{1} and hexadecimal is 2^{4}. We can easily convert from binary to decimal. However, decimal is not always the most useful form. Especially when we consider that we don't always have a nice organized view of our data. Learning to effectively navigate between the formats, especially in your head, increases your ability to understand programs loaded into memory, as well as streams of raw data that may be found in a debugger or analyzers like Wireshark.
The basic unit of data is a byte. While it is true that a byte can be broken down into bits, we tend to work with these bundled collections and process bytes. Modern computer architectures process multiple bytes, specifically 8 for 64bit computers. And then there's graphics cards, which are using 128, 192 and even 256bit width memory accesses. While these large data widths could represent extremely large numbers in decimal, the values tend to have encodings that only use a fraction of the space.
Recognize Your Surroundings
What is the largest binary value that can fit in an 8bit field?
It will be a value that has eight, ones: 1111 1111
. Placing a space after every four bits helps with readability.
What is the largest hexadecimal value that can fit in an 8bit field?
We can take advantage of the power of 2 relationship between binary and hexadecimal. Each hexadecimal digit requires four binary bits. It would be very beneficial to you to commit the following table to memory:
Dec  0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15 
Bin  0000  0001  0010  0011  0100  0101  0110  0111  1000  1001  1010  1011  1100  1101  1110  1111 
Hex  0x0  0x1  0x2  0x3  0x4  0x5  0x6  0x7  0x8  0x9  0xA  0xB  0xC  0xD  0xE  0xF 
Now we can simply take each grouping of four bits, 1111 1111
, and convert them into hexdigits, FF
.
What is the largest decimal value that can fit in an 8bit field? This isn't as simple, we must convert the binary value into decimal. Using the number base conversion algorithm, we know that the eighth bit is equal to 2^{8}, or 256. Since zero must be represented the largest value is 2^{8}  1, or 255
.
Navigating the Sea of Data
Binary and hexadecimal (ok, and octal) are all number systems whose base is a power of two. Considering that computers work with binary, the representation of binary and other number systems that fit nicely into the logical data blocks, bytes, become much more meaningful. Through practice and the relatively small scale of the values that can be represented with 8bits, converting a byte a to decimal value feels natural to me. However, when the data is represented with 2, 4 and 8 bytes, the values can grow quickly, and decimal form quickly loses its meaning and my intuition becomes useless as to what value I am dealing with.
For example:
What is the largest binary value that can fit in an 16bit field?
It will be a value that has sixteen, ones: 1111 1111 1111 1111
.
What is the largest hexadecimal value that can fit in an 16bit field? Again, let's convert each of the blocks of fourbits into a hexdigit, FFFF
.
What is the largest decimal value that can fit in an 16bit field?
Let's see, is it 2^{16}  1, so that makes it 65355
, 65535
, or 65555
, it's somewhere around there.
Here's a realworld example that I believe many people are familiar with is the RGB color encoding for pixels. You could add a fourth channel to represent an alpha channel an encode RGBA. If we use onebyte per channel, we can encode all four channels in a single 32bit word, which can be processed very efficiently.
Imagine we are looking at pixel values in memory and the bytes appear in this format: RR GG BB
. It takes two hexadecimal digits to represent a single byte. Therefore, the value of pure green could be represented as 00 FF 00
. To view this same value as a 24bit decimal, is much less helpful, 65,280
.
If we were to change the value to this, 8,388,608
, what has changed? We can tell the value has changed by roughly 8.3M. Since a 16bit value can hold ~65K, we know that the third byte has been modified, and we can guess that it has been increased to 120 or more (8.3M / 65K). But what is held in the lower two bytes now? Our ability to deduce information is not much greater than an estimate. The value in hexadecimal is 80 00 00
.
The difference between 8,388,608
and 8,388,607
are enormous with respect to data encoded at the bytelevel:
8,388,608 
8,388,607 

00  80  00  00  7F  FF  
Now consider that we are dealing with 24bit values in a stream of pixels. For every 12bytes, we will have encoded 4 pixels. Here is a representation of what we would see in the data as most computer views of memory are grouped into 32bit groupings:
4,260,948,991 
2,568,312,378 
3,954,253,066 

FD  F8  EB  FF  99  15  56  3A  8C  B1  1D  0A  
Binary
I typically try to use binary only up to 8bits. Anything larger than that, I simply skip the first 8bits (1byte), and focus on the next 8bits. For example: 1111 1111
1111 1111
. As I demonstrated with the RGB color encoding, values do not always represent a single number. In fact, it is a stream of data. So whether there is one byte, four bytes, or a gigabytes worth of data, we usually process it either one byte or one word at a time. We actually break down decimal numbers into groups by using a comma (or some other regional punctuation) to separate thousands, e.g. 4,294,967,295
, 294
of what? Millions.
Binary manipulation can be found in many contexts. One of the most common is storing a collection of flags in an unsigned buffer. These flags can be flipped on and off with the Boolean flag operations of your programming language. Using a mask with multiple bits allows an enumerated value with more than two options to be encoded within the binary buffer. I'm not going to go to describe the mechanics here, I simply want to demonstrate that data is encoded in many ways, and there are many reasons for you to become proficient at picking apart bytestreams down to the bit.
Hexadecimal
It is just as beneficial to be able to convert between hexadecimal and decimal in your head from 1 to 255, especially if you ever deal with selecting colors for webpages or you edit images in programs like Photoshop. It only takes two hexdigits to represent an 8bit byte. If you memorize the values that are defined by the highorder hexdigit, reading hexadecimal byte values becomes almost as easy as reading decimal values. There are two tables listed below. The table on the left indicates the value mapping for the highorder and loworder hexdigits of a byte. The table on the right contains a set of landmark values that you will most certainly encounter and find useful:


Some of the numbers listed in the table on the right are more obvious than others for why I chose to included them in the map. For example, 100, that's a nice round number that we commonly deal with daily. When you run into 0x64, now you can automatically map that in your mind to 100 and have a referencepoint for its value. Alternatively, if you have a value such as 0x6C, you could start by calculating the difference: 0x6C  0x64 = 8; add that to 100 to know the decimal value is 108.
Some of you will recognize the values 127, 168, 192, 224 and 238. For those that do not see the relevance of these values, they are common octets found in landmark network IP addresses. The table below shows the name and common dotteddecimal for as well as the hexadecimal form of the address in both bigendian and littleendian format:
Name  IPv4 Address  
bigendian  littleendian  
localhost  127.0.0.1  0x7F000001  0x0100007F  
private range  192.168.x.x  0xC0A80000  0x0000A8C0  
multicast base address  224.0.0.0  0xE0000000  0x000000E0  
multicast last address  239.255.255.255  0xEFFFFFFF  0xFFFFFFEF 
One additional fact related to the IPv4 multicast address range, is the official definition declares any IPv4 address with the leading four bits set to 1110
to be a multicast address. 0xE is the hexdigit that maps to binary 1110
. This explains why the full multicast range of addresses is from 0xE0000000 to 0xEFFFFFFF, or written in decimal dotted notation as 224.0.0.0 to 239.255.255.255.
Octal
I'm sure octal has uses, but I have never ran into a situation that I have used it.
Actually, there is one place, which is to demonstrate this equality:
Oct 31 = Dec 25
Landmark Values
Binary has landmark values similar to the way decimal does. For example, 1's up to 10's, 10's through 100, then 1000, 1M, 1B ... These values are focused on powers of 10, and arranged in groups that provide a significant range to be meaningful when you roundoff or estimate. Becoming proficient with the scale of each binary bit up to 2^{10}, which is equal to 1024, or 1K. At this point, we can use the powers of two in multiple contexts; 1) Simple binary counting, 2) measuring data byte lengths.
I have constructed a table below that shows landmark values with the power of 2. On the left I have indicated the name of the unit if we are discussing bits and the size of a single value; for example, 8bits is a byte. I haven't had the need to explore data values larger than 64bits yet, but it's possible that some of you have. To the right I have indicated the units of measure used in computers when we discuss sizes. At 1024 is a kilobyte, and 1024 kilobytes is officially a megabyte (not if you're a hard drive manufacturer...). I continued the table up through 2^{80}, which is known as a "yottabyte." Becoming familiar with the landmarks up to 32bits is probably enough for most people. To the far right I converted the major landmark values into decimal to give you a sense of size and scale for these numbers.
Unit (bits)  Binary Exponent  Unit (bytes)  Place Value  Largest Value  
Bit (b)  2^{0}  Byte (B)  1  
2^{1}  Word  2  
2^{2}  Doubleword  4  
Nibble  2^{3}  Quadword  8  
2^{4}  16  
2^{5}  32  
2^{6}  64  
Byte (B)  2^{7}  128  255  
2^{8}  256  
2^{9}  512  
2^{10}  Kilobyte (KB)  1024  
2^{20}  Megabyte (MB)  1024^{2}  1,048,576  
2^{30}  Gigabyte (GB)  1024^{3}  1,073,741,824  
Doubleword  2^{31}  1024^{3}·2  4,294,967,295  
2^{32}  1024^{3}·2^{2}  
2^{40}  Terabyte (TB)  1024^{4}  
2^{50}  Petabyte (PB)  1024^{5}  
2^{60}  Exabyte (EB)  1024^{6}  
Quadword  2^{63}  1024^{6}·2^{3}  9,223,372,036,854,775,807  
2^{70}  Zettabyte (ZB)  1024^{7}  
2^{80}  Yottabyte (YB)  1024^{8} 
Summary
Looking at a large mass of symbols, such as a memory dump from a computer, can appear to be overwhelming. We do have tools that we use to develop and debug software help us organize and make sense of this data. However, these tools cannot always display this data in formats that are helpful for particular situations. In these cases, understanding the relationships between the numbers and their different representations can be extremely helpful. This is especially true when the data is encoded at irregular offsets. I presented a few common forms that you are likely to encounter when looking at the values stored in computer memory. Hopefully you can use these identity and navigation tips to improve your development and debugging abilities.
Recent Comments