Alchemy
Alchemy is an open-source compile-time data serialization library
The Alchemy library is designed and implemented to improve the portability, robustness and maintainability of serialization code. Alchemy is written in C++ and composed of loosely-coupled, non-invasive components that allow it to be integrated into existing projects and even work alongside other serialization libraries.
Features
- Simple format definitions
- Most code is generated and optimized at compile-time
- Provides compile-time reflection for Alchemy data structures
- Abstracts the details to create portable, robust and maintainable data I/O
- Creates ABI compatible serialized data
Supported Types
- All C++ fundamental types (although no special processing for floating-point types)
- Packed Bits: A type that simulates bit-fields in a portable way.
- Fixed-size arrays of any supported type.
- Dynamically-sized arrays (vector) of any supported type.
- Nested sub-structures.
- Opaque view: Allows a message-format view to be mapped on an existing buffer of raw bytes.
Types that will soon be supported:
- Unions: A type can be interpreted as any of a defined set.
- Optional: Fields that may or may not be present in the message buffer.
Clone Alchemy at GitHub
You can clone the source and participate in the project at GitHub: Alchemy.
Performance
Since Alchemy uses the compiler to generate most of the code at compile-time, all of the type information that exists is available to the compiler for optimizations. In most cases Alchemy messages can be serialized as fast as, if not faster than, a safe and portable hand-written version.
Here is a summary of the current benchmarks that exist:
Each of the tests is run across a pre-allocated buffer of 512 MB. The structure for each test extracts as many messages as it can from the 512 MB buffer. For each message that is extracted, the test will:
- Deserialize the message from the buffer
- Convert the byte-order
- Serialize the message into a different buffer
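The three steps above can be sketched as a hand-written loop. This is only an illustration of what the benchmark measures, not Alchemy's API: the Sample structure, its 7-byte wire layout, and every function name here are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical message; not one of the actual benchmark structures.
struct Sample {
    std::uint32_t id;
    std::uint16_t length;
    std::uint8_t  flags;
};

// Size of one message in the buffer: 4 + 2 + 1 bytes, no padding.
constexpr std::size_t kWireSize = 7;

// Step 1: deserialize one message from the raw buffer, field by field.
inline Sample deserialize(const std::uint8_t* in) {
    Sample s;
    std::memcpy(&s.id, in, 4);
    std::memcpy(&s.length, in + 4, 2);
    s.flags = in[6];
    return s;
}

// Step 2: convert the byte order of every multi-byte field.
inline std::uint16_t swap16(std::uint16_t v) {
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}
inline std::uint32_t swap32(std::uint32_t v) {
    return (v << 24) | ((v & 0x0000FF00u) << 8) |
           ((v >> 8) & 0x0000FF00u) | (v >> 24);
}
inline void convert(Sample& s) {
    s.id = swap32(s.id);
    s.length = swap16(s.length);
}

// Step 3: serialize the message into a different buffer.
inline void serialize(const Sample& s, std::uint8_t* out) {
    std::memcpy(out, &s.id, 4);
    std::memcpy(out + 4, &s.length, 2);
    out[6] = s.flags;
}

// Runs all three steps over every whole message that fits in the input
// buffer and returns the number of messages processed.
inline std::size_t run_benchmark(const std::vector<std::uint8_t>& in,
                                 std::vector<std::uint8_t>& out) {
    const std::size_t count = in.size() / kWireSize;
    out.resize(count * kWireSize);
    for (std::size_t i = 0; i < count; ++i) {
        Sample s = deserialize(in.data() + i * kWireSize);
        convert(s);
        serialize(s, out.data() + i * kWireSize);
    }
    return count;
}
```

The number of iterations is simply the buffer size divided by the size of one message, rounded down. For a 7-byte message and a 512 MB buffer that is 536870912 / 7 = 76695844 whole messages.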
This is a summary of the size of each structure and the number of iterations that were performed over the 512 MB block of memory:
Test Name | Struct Size | Iterations |
basic | 14 | 38347922 |
packed | 7 | 76695844 |
unaligned | 19 | 28256363 |
complex | 86 | 6242685 |
array | 1024 | 524288 |
no_conversion | 64 | 8388608 |
The no_conversion test reveals an area that Alchemy will need to focus on at some point in the future: the extra processing that is performed just for the copy adds an enormous amount of overhead.
Data Serialization
Let's be honest, serializing data is not an interesting or sexy problem. In fact, it usually becomes a necessary evil that is required as a means to an end. Here are some examples:
- Inter-process Communication
- Network transfer (sockets)
- Pipes
- Mail-slots
- Shared memory
- Remote procedure calls
- File I/O
- Configuration settings
- Program state
- Error Logging
- Fault Tolerance (backup / restore)
- Transform data
- Import and Export formats
- Adapt between protocols
- Maintain or add compatibility
- Interoperability of platforms
All of the problems above relate to collecting or moving data to a final location so that something interesting can be done with it. As a result, serialization tends to be put off until development cannot move forward without it.
Interfaces are how you control access at the boundaries. When you succeed in designing a solid interface, maintaining your software becomes much simpler. However, poorly designed interfaces lead to a lot of pain for maintenance and feature additions in the future. Failing to create a solid serialization process leads to the same type of development pain for the very same reasons.
Value
Alchemy's greatest value is solving problems that most developers do not even realize exist until it's too late.
Transferring data between two machines can be accomplished naïvely with a very small amount of code. To transfer data reliably, accurately and safely requires much more thought and paying careful attention to many mundane details. Alchemy abstracts these mundane details for developers so more effort can be focused on the goals of their software.
A decade ago, a programmer could be blissfully ignorant of other platforms, because there were so few that were in wide use among the masses. Fast-forward to now, and the world is all about mobility, the Internet of Things and integrating devices of all types.
As you develop and test between two devices for a single platform, these issues will remain hidden. In essence, these problems do not exist between homogeneous devices. However, each new platform or processor type that is supported increases the probability that a program will encounter these portability issues.
Memory alignment
Each processor has its own requirements for how data should be aligned in memory to process instructions. Quite often, packed structures are used in C and C++ to build a serialized buffer. The data is then copied in bulk with memcpy to transfer it to and from the structure. Even if care is taken to define your data structures to place your fields on the appropriate boundaries, you will eventually run into a processor that has different requirements.
A bulk memcpy approach is simple and efficient. It is also not portable or robust. The language standards leave padding and alignment largely up to the compiler, so there are few guarantees for how the data will be laid out. This is especially true with bit-fields.
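As a minimal sketch of the alternative to a bulk copy, each field can be copied to an explicit offset so that the serialized layout no longer depends on the padding a particular compiler inserts. The Header structure and the pack and unpack functions are hypothetical names used only for illustration; they are not part of Alchemy. Byte order is deliberately left untouched here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical structure. Most compilers insert padding after `type`
// so that `length` lands on a 4-byte boundary, which makes
// sizeof(Header) larger than the sum of its fields.
struct Header {
    std::uint8_t  type;    // 1 byte
    std::uint32_t length;  // 4 bytes
    std::uint16_t crc;     // 2 bytes
};

// Size of the serialized form: the fields only, with no padding.
constexpr std::size_t kHeaderWireSize =
    sizeof(std::uint8_t) + sizeof(std::uint32_t) + sizeof(std::uint16_t);

// Copies each field to its own offset in the output buffer.
// A per-field memcpy is safe even when the destination is unaligned.
// Returns the number of bytes written.
inline std::size_t pack(const Header& h, std::uint8_t* out) {
    std::size_t off = 0;
    std::memcpy(out + off, &h.type, sizeof h.type);
    off += sizeof h.type;
    std::memcpy(out + off, &h.length, sizeof h.length);
    off += sizeof h.length;
    std::memcpy(out + off, &h.crc, sizeof h.crc);
    off += sizeof h.crc;
    return off;
}

// Reads the fields back from the same offsets.
inline Header unpack(const std::uint8_t* in) {
    Header h{};
    std::size_t off = 0;
    std::memcpy(&h.type, in + off, sizeof h.type);
    off += sizeof h.type;
    std::memcpy(&h.length, in + off, sizeof h.length);
    off += sizeof h.length;
    std::memcpy(&h.crc, in + off, sizeof h.crc);
    return h;
}
```

The serialized size is always 7 bytes, whatever sizeof(Header) happens to be on a given compiler. Writing this by hand for every message format is exactly the kind of mundane detail that is easy to get wrong.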
Byte-order management
The majority of devices used by consumers use little-endian byte order. However, most network protocols transfer data in a predefined order called network byte-order, which is big-endian byte-order. If you build your software to run on a new platform that has a different byte-order than your data protocols assume, the new platform may still appear to run. However, you will experience data corruption issues, and the software will be anything but reliable. The best-case scenario is that the software immediately crashes on this new platform, letting you know there is a problem.
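One common way to avoid depending on the host's byte order is to read and write values one byte at a time with shifts. The sketch below is a generic illustration of that technique, not Alchemy's implementation, and the function names are made up for the example.

```cpp
#include <cstdint>

// Writes a 32-bit value in big-endian (network) byte order.
// Shifts operate on the value, not on its memory representation,
// so the result is the same on little- and big-endian hosts.
inline void store_be32(std::uint32_t v, std::uint8_t* out) {
    out[0] = static_cast<std::uint8_t>(v >> 24);
    out[1] = static_cast<std::uint8_t>(v >> 16);
    out[2] = static_cast<std::uint8_t>(v >> 8);
    out[3] = static_cast<std::uint8_t>(v);
}

// Reads a 32-bit big-endian value back into host representation.
inline std::uint32_t load_be32(const std::uint8_t* in) {
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8)  |
            static_cast<std::uint32_t>(in[3]);
}

// The same pair for 16-bit values.
inline void store_be16(std::uint16_t v, std::uint8_t* out) {
    out[0] = static_cast<std::uint8_t>(v >> 8);
    out[1] = static_cast<std::uint8_t>(v);
}

inline std::uint16_t load_be16(const std::uint8_t* in) {
    return static_cast<std::uint16_t>((in[0] << 8) | in[1]);
}
```

Storing 0x11223344 with store_be32 always produces the bytes 11 22 33 44 in that order, so a buffer written on one platform can be read correctly on another.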
Emulated bit-fields
The standards for both C and C++ do not place any requirements on how bits should be packed in a data field when bit-fields are specified. Therefore, there is no guarantee that a program compiled by two different compilers will pack the bits in the same order. This essentially breaks ABI compatibility.
The PackedBits type in Hg provides named fields in a sub-field and performs the proper mask and shift operations that you would otherwise have to perform manually in order to get the proper values. This format is more portable than the language's built-in bit-fields, and less error-prone than a manual implementation of a packed-bits data field.
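To make the mask and shift operations concrete, here is a minimal sketch of a named bit-field over a 32-bit word. The BitField template and the Version, Type and Urgent fields are hypothetical and exist only for this example; this is not the PackedBits interface.

```cpp
#include <cstdint>

// Hypothetical helper: describes a field of `Width` bits starting at bit
// `Offset` (bit 0 is the least significant bit) of a 32-bit word.
template <unsigned Offset, unsigned Width>
struct BitField {
    static_assert(Width > 0 && Width < 32 && Offset + Width <= 32,
                  "field must fit inside a 32-bit word");

    // Mask with ones in exactly the bits that belong to this field.
    static constexpr std::uint32_t mask() {
        return ((1u << Width) - 1u) << Offset;
    }

    // Extracts the field: mask off the other bits, then shift down.
    static constexpr std::uint32_t get(std::uint32_t word) {
        return (word & mask()) >> Offset;
    }

    // Returns `word` with the field replaced by `value`.
    // Bits of `value` that do not fit in the field are discarded.
    static constexpr std::uint32_t set(std::uint32_t word,
                                       std::uint32_t value) {
        return (word & ~mask()) | ((value << Offset) & mask());
    }
};

// Example layout of one flags byte, with explicit bit positions.
using Version = BitField<0, 4>;  // bits 0-3
using Type    = BitField<4, 3>;  // bits 4-6
using Urgent  = BitField<7, 1>;  // bit 7
```

Because the bit positions are written out explicitly, the layout is the same for every compiler, which a built-in bit-field does not guarantee. For example, setting Version to 5, Type to 3 and Urgent to 1 on an empty word yields 0xB5.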
Alchemy is not a communication protocol
Alchemy only serializes and structures the data to be transferred by the protocol. Alchemy will help you create and maintain the definitions required to communicate with any protocol.
These are examples of serialization APIs that are also protocols:
- FlatBuffers (Google)
- Protocol Buffers (Google)
- Thrift (Apache)
Alchemy does not create a proprietary format
The format that Alchemy creates for your data is the same structured format that you define. This allows you to create ABI compatible formats regardless of the medium: network transfer, IPC, or file formats.
Components
Alchemy is composed of a set of components that are named after elements on the Periodic Table of Elements, or possibly terms derived from alchemy.
Hg: Mercury (Messenger of the Gods)
Hg is a message and structured data processing framework that provides the ability to define portable structured message formats. Hg creates ABI consistent definitions that are compatible down to the bit level. These structured definitions facilitate the population of the data in the memory buffers for inter-process communication, file and network transfer, and even direct memory access mapping.
C: Carbon (Copy)
Carbon uses the Alchemy messages defined for Hg, and creates C-linkable structs and function calls. These structures and APIs can be compiled into a library, which can then be imported into any language that is capable of interfacing with C libraries.
Bi: Bismuth (Big Integer)
Bismuth (Bi) is a big-integer library that handles operations on numbers larger than the native processor can support. The primary goals of Bismuth are ease of use and security. A wide variety of operations from number theory are implemented in Bismuth, including modular arithmetic and discrete logarithms. The library is intended to become a building block for other robust applications, such as cryptosystems. It is written using C++11 features, with an emphasis on clarity and simplicity of implementation to make security more realistic.
More information to follow...
Sample Code
Documentation
Catalog of development entries posted for Alchemy.
Participation and Feedback
If you would like to participate in Alchemy, have suggestions, or find some tasks difficult, please contact me. This library has served me well, and the limited feedback that I have received from others has also been valuable.