Create a simple application that translates a binary file to CSV and vice versa using ANSI-C and Macrocoding

CSV and binary files

CSV (Comma-Separated Values) is a very simple yet versatile data format. CSV files can be edited natively by spreadsheets like Excel or OpenOffice Calc, loaded into RDBMS tables, processed by automated scripts or even modified with a bare text editor. On the other hand, proprietary binary files can be edited only by the application that created them. Sometimes such applications are not well suited to all the operations we need, so why not export the binary data to CSV files and edit them with other tools?

My binary file

Screenshot from the CPS editing software.

Being a licensed amateur radio operator, I recently purchased a new DMR transceiver, a Retevis RT-3 (also known as Tytera MD-380). Transceivers of this kind are designed for utility use and are normally programmed once on the assigned channels. When used for amateur radio operations, instead, the huge number of repeaters, networks and zones means that literally hundreds of settings have to be typed in. Moreover, the ham radio world evolves at a frantic pace, so these settings need to be continuously updated and extended. They are stored in a binary file called a codeplug, which is edited on a PC with a program named "CPS"; the same program also transfers the codeplug from and to the radio over a USB connection.

The problem is that the editing capabilities of the "CPS" software are very basic: you can't import anything from external sources, you can't change the order of the items (to do so you must delete them and re-enter everything in the desired order!), and not even CTRL-V works: you have to right-click and select "Paste"! It is inconceivable that in 2016 a human operator should be forced to copy hundreds of numbers by hand from one window to another! This is why I decided to translate the codeplug binary files into CSV files: once in CSV, the data becomes easily accessible.

Is this binary format suited to CSV export?

Not every binary file is suitable for translation to CSV. The CSV format is inherently tabular, so it suits data that can be represented as one or more series of "records" having identical fields. The codeplug fits this model well: internally it consists of a dozen or so different series of records, each of which can be conveniently represented as a CSV table. For example, one table can contain the list of channels, another the contacts (i.e. the "phonebook" of the radio), another the zones and so on.

Know your binary format

The translation process, i.e. converting the binary file to CSV and vice versa, starts from knowledge of the internal format of the binary file. Without an official format definition, the only way is to deduce it by observing which bits change in the binary file when each field is edited with the supplied editor. In my case, I found that check-box fields were represented by a single bit in octets shared with other settings. Fields offering more choices were represented by multiple bits or by sequences of entire octets. Some of them were encoded as little-endian packed BCD, others as big-endian packed BCD, and others again as little-endian pure binary (don't ask me the reason for this variety of formats in the same file!). Most strings were saved as little-endian 16-bit Unicode, while others were stored as plain 8-bit ASCII. At the end of this process, I had a fairly complete specification of the binary file.
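To make the two packed BCD variants concrete, here is a minimal sketch of a decoder in ANSI C. The function name bcd_decode and its signature are hypothetical and are not taken from the rdt2csv sources; the sketch only assumes that each octet carries two decimal digits, high nibble first, and that "little-endian" means the last octet holds the most significant digits.

```c
#include <stddef.h>

/* Decode a packed BCD field of `len` octets into an unsigned value.
   Each octet holds two decimal digits, high nibble first.
   big_endian != 0: the first octet holds the most significant digits.
   big_endian == 0: the last octet holds the most significant digits.
   Returns 0 on success, -1 if a nibble is not a decimal digit.
   (Hypothetical helper, not the rdt2csv implementation.) */
int bcd_decode(const unsigned char *buf, size_t len, int big_endian,
               unsigned long *out)
{
    unsigned long value = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        /* Always walk from the most significant octet to the least. */
        unsigned char octet = big_endian ? buf[i] : buf[len - 1 - i];
        unsigned hi = octet >> 4;
        unsigned lo = octet & 0x0F;

        if (hi > 9 || lo > 9)
            return -1;              /* not a valid decimal digit */
        value = value * 100 + hi * 10 + lo;
    }
    *out = value;
    return 0;
}
```

With the octets 12 34 56 78 (hexadecimal), the big-endian reading gives 12345678 and the little-endian reading gives 78563412: the same four octets, two different values, which is why each field must record its own encoding.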

The translation process

The concept behind the translation is quite simple: it is just a matter of extracting every field value from the binary file and writing it to the related CSV file, and vice versa. The complexity is hidden in the number of ways the same type of data can be encoded. A simple unsigned integer value can be found as BCD, reverse BCD, BCD with extensions, little-endian binary, decimal ASCII number (e.g. 123) or quoted decimal ASCII number (e.g. "123"). Strings can be found as fixed-size Unicode, fixed-size ASCII, unquoted ASCIIZ or quoted ASCIIZ with escapes (for example "15"" monitor", as it appears in CSV). Furthermore, the values must be validated: most have valid ranges and steps to be enforced; others accept only values from a given set; others again have ranges that change according to the setting of other fields. All these rules must be enforced by a validation function to make sure that no invalid binary files are generated.
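The quoted form with doubled quotes mentioned above can be produced by a small routine like the following. This is only a sketch: the name csv_quote and its buffer-based interface are hypothetical, and it assumes plain 8-bit strings (the Unicode fields would first need a conversion step that is not shown here).

```c
#include <stddef.h>
#include <string.h>

/* Write `s` into `out` (capacity `cap`) as one CSV field.
   The field is quoted only if it contains a comma, a double quote,
   CR or LF; embedded double quotes are doubled.
   Returns the number of characters written (excluding the NUL),
   or -1 if `out` is too small.
   (Hypothetical helper, not the rdt2csv implementation.) */
int csv_quote(const char *s, char *out, size_t cap)
{
    int quoted = strpbrk(s, ",\"\r\n") != NULL;
    size_t need = strlen(s) + (quoted ? 2 : 0) + 1;   /* + NUL */
    size_t n = 0;
    const char *p;

    for (p = s; *p; p++)
        if (*p == '"')
            need++;                 /* each quote is doubled */
    if (need > cap)
        return -1;

    if (quoted)
        out[n++] = '"';
    for (p = s; *p; p++) {
        if (*p == '"')
            out[n++] = '"';         /* escape by doubling */
        out[n++] = *p;
    }
    if (quoted)
        out[n++] = '"';
    out[n] = '\0';
    return (int)n;
}
```

Given the input 15" monitor, the routine produces "15"" monitor", while a value such as plain is written unchanged.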

The normalized structure

The best approach to reduce the impact of this variety of encodings is to implement a "normalized structure". A normalized structure is a structure where all the fields of the same "logical" type are implemented with the same C type. For example, all numeric fields, no matter whether they are saved as BCD, binary or ASCII, are represented in the normalized structure by the same unsigned C integer type.

Every record type found in the binary file has its own normalized structure. In my binary file, for example, we have up to 1000 digital contacts (the "phonebook" of the radio): they are mapped to the T_DigitalContact normalized structure. We also have up to 1000 Channel Information records (the channel list of the radio): they are mapped to the T_ChannelInformation normalized structure. And so on for all the other record types.

For example, this is the definition of the "Digital Contact" binary format:

Name             Type     Offset (bits)  Length (bits)  Notes
CallId           Binary   0              24
CallReceiveTone  Binary   26             1              0=off, 1=on
CallType         Binary   30             2              1=group, 2=private, 3=all
Name             Unicode  32             256            Maximum length: 16 characters.

The normalized structure for "digital contacts" will look like this:

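/* Logical types shared by all the normalized structures */
typedef unsigned short t_unicode;  /* one 16-bit Unicode character */
typedef unsigned long t_numeric;   /* any numeric field, whatever its binary encoding */

/* Structure for record DigitalContact */
typedef struct {
	t_numeric CallId;
	t_numeric CallReceiveTone;  /* 0=off, 1=on */
	t_numeric CallType;         /* 1=group, 2=private, 3=all */
	t_unicode Name [17];        /* 16 characters plus a terminating zero */
} T_DigitalContact;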

For every record type in the binary file I had to provide its normalized structure, and for every field in the record type I had to add the related member to the normalized structure.
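As an illustration of how a binary record could be mapped onto this structure, here is a sketch of a decoder for the Digital Contact record. Several points are assumptions rather than facts taken from the rdt2csv sources: the function names are hypothetical, bit offsets are assumed to count from the most significant bit of the first octet, CallId is assumed to be little-endian binary, and Name is assumed to be little-endian 16-bit Unicode. The record size of 36 octets follows from the table (32 + 256 bits).

```c
typedef unsigned short t_unicode;
typedef unsigned long t_numeric;

typedef struct {
    t_numeric CallId;
    t_numeric CallReceiveTone;
    t_numeric CallType;
    t_unicode Name[17];
} T_DigitalContact;

#define DIGITAL_CONTACT_SIZE 36     /* 32 + 256 bits = 288 bits */

/* Extract `len` bits (1..8) starting at bit offset `off`.
   ASSUMPTION: bit 0 is the most significant bit of octet 0 and
   the field does not cross an octet boundary. */
static t_numeric get_bits(const unsigned char *rec, unsigned off, unsigned len)
{
    unsigned shift = 8 - (off % 8) - len;
    return (rec[off / 8] >> shift) & ((1u << len) - 1u);
}

/* Read `n` octets as a little-endian unsigned binary number. */
static t_numeric get_le(const unsigned char *p, unsigned n)
{
    t_numeric v = 0;
    while (n > 0)
        v = (v << 8) | p[--n];
    return v;
}

/* Hypothetical binary-to-normalized conversion for one record. */
void DigitalContact_fromBinary(const unsigned char *rec, T_DigitalContact *out)
{
    unsigned i;

    out->CallId          = get_le(rec, 3);        /* bits 0..23 */
    out->CallReceiveTone = get_bits(rec, 26, 1);
    out->CallType        = get_bits(rec, 30, 2);
    for (i = 0; i < 16; i++)                      /* bits 32..287 */
        out->Name[i] = (t_unicode)get_le(rec + 4 + 2 * i, 2);
    out->Name[16] = 0;                            /* always terminated */
}
```

The point of the normalized structure is visible here: once the record is decoded, the rest of the program sees only t_numeric and t_unicode values and no longer needs to know how each field was packed.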

The support functions

Besides the normalized structure, I needed the following items:

  1. a binary-to-normalized conversion function: it reads a record in binary form and writes the related fields to the normalized structure;
  2. a normalized-to-binary conversion function: it reads the normalized structure and writes the binary record;
    these two functions must keep track of which encoding/decoding format (BCD, binary, etc.) is to be used on each binary field;
  3. a normalized-to-CSV conversion function: it reads the normalized structure and writes the CSV column header and the following lines with the values, correctly formatted according to RFC 4180;
  4. a CSV-to-normalized conversion function: it reads the CSV file and fills the normalized structure;
  5. a validation function: it checks the values in the normalized structure to see whether they violate any rule (values out of range, references to nonexistent records, etc.);
  6. a user manual, i.e. an HTML page describing the allowed CSV titles and the values expected for them;
  7. a format specification, i.e. another HTML page describing the binary format to allow other people to take advantage of it.
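The validation function of item 5 boils down to a handful of reusable checks. The following is a minimal sketch under stated assumptions: the names in_range_step, in_set and validate_contact are hypothetical, and the limits used for the contact fields are simply derived from the field widths and value lists of the Digital Contact table, not from the real rdt2csv rules.

```c
#include <stddef.h>

/* Return 1 if min <= v <= max and v lies on the grid min, min+step, ...
   A step of 0 means "any value in the range". */
int in_range_step(unsigned long v, unsigned long min,
                  unsigned long max, unsigned long step)
{
    if (v < min || v > max)
        return 0;
    return step == 0 || (v - min) % step == 0;
}

/* Return 1 if v is one of the `count` values in `set`. */
int in_set(unsigned long v, const unsigned long *set, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (set[i] == v)
            return 1;
    return 0;
}

/* Hypothetical check of the numeric fields of one digital contact.
   Returns the number of rules violated (0 means valid). */
int validate_contact(unsigned long call_id, unsigned long receive_tone,
                     unsigned long call_type)
{
    static const unsigned long call_types[] = {1, 2, 3}; /* group, private, all */
    int errors = 0;

    if (!in_range_step(call_id, 0, 16777215UL, 0))  /* must fit in 24 bits */
        errors++;
    if (!in_range_step(receive_tone, 0, 1, 0))      /* 0=off, 1=on */
        errors++;
    if (!in_set(call_type, call_types, 3))
        errors++;
    return errors;
}
```

Rules whose range depends on other fields would need the whole normalized structure as input; they are left out of this sketch.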

This is the outline of the application:

[Figure: outline of the application (rdt2csv_norm)]

The macrocoding solution

The code of all the functions and documents above depends on a single source: the format specification of the binary file. To understand what that means, let's see the consequences of simply adding a new field. When a new field is added to a record "R", we have to:

  1. add the new field to the normalized structure for the record of type "R"
  2. modify the binary-to-normalized function for the record of type "R"
  3. modify the normalized-to-binary function
  4. modify the normalized-to-CSV function
  5. modify the CSV-to-normalized function
  6. modify the user-manual HTML file
  7. modify the format specification HTML file

The drawbacks are evident: the sequence of required changes is long, and it is easy to make mistakes and leave some parts misaligned. Furthermore, when changes are made at a later time, it gets harder to remember exactly what the required steps were.

This is why I decided to implement the code for this project by means of a macrocoding solution. Using Macrocoder, I created a simple description language which contains, in one place, all the information related to the binary records and fields: their position in the binary file, their length, their type, the list of valid values, the allowed ranges, the validation rules and textual annotations where required.

[Figure: the Macrocoder description source (rdt2csv_mc)]

Once the code generation rules have been defined in Macrocoder, the development of the entire application and related documentation is done entirely by editing the source seen in the picture above.
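Macrocoder is an external code generator with its own language, so the following is not how rdt2csv is built. It is only a rough analogy in plain C, using the preprocessor "X-macro" idiom, to show what "one field list, several generated artifacts" means. The macro and type names (DIGITAL_CONTACT_FIELDS, T_ContactNumbers and so on) are hypothetical, and only the numeric fields of the Digital Contact record are listed.

```c
#include <stddef.h>
#include <stdio.h>

/* The single source: name, bit offset, bit length of each field. */
#define DIGITAL_CONTACT_FIELDS(X) \
    X(CallId,           0, 24)    \
    X(CallReceiveTone, 26,  1)    \
    X(CallType,        30,  2)

/* Artifact 1: the structure members. */
#define AS_MEMBER(name, off, len) unsigned long name;
typedef struct { DIGITAL_CONTACT_FIELDS(AS_MEMBER) } T_ContactNumbers;

/* Artifact 2: the CSV column titles. */
#define AS_TITLE(name, off, len) #name,
static const char *const contact_titles[] = { DIGITAL_CONTACT_FIELDS(AS_TITLE) };

/* Write the CSV header line (without newline) into `out`.
   The caller must provide a buffer large enough for all the titles. */
int contact_header(char *out)
{
    size_t i;
    int n = 0;

    for (i = 0; i < sizeof contact_titles / sizeof contact_titles[0]; i++)
        n += sprintf(out + n, "%s%s", n ? "," : "", contact_titles[i]);
    return n;
}

/* Artifact 3: one CSV line with the values, in the same order.
   The caller must provide a buffer large enough for all the values. */
#define AS_VALUE(name, off, len) \
    n += sprintf(out + n, "%s%lu", n ? "," : "", c->name);
int contact_to_csv(const T_ContactNumbers *c, char *out)
{
    int n = 0;

    DIGITAL_CONTACT_FIELDS(AS_VALUE)
    return n;
}
```

Adding a line to DIGITAL_CONTACT_FIELDS updates the structure, the header and the value line together. A dedicated generator goes much further than the preprocessor can, since it can also emit validation code and the HTML documentation from the same description.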

The best way to show macrocoding in action is probably a video. The video shows a simple fix: renaming a field erroneously called "WorkerLone" instead of "LoneWorker".

Mixing generated and manual code

The complete rdt2csv application consists of several .c source files. The files implementing single-purpose functions (e.g. the function that converts one binary BCD value into a C unsigned) are written manually, as regular C files. These files are general purpose and make no reference to the actual fields: they simply serve as a "library" for the other sources.

The files containing the structure definitions and the translation functions, which strictly depend on the binary file format, are generated by a Macrocoder project. In this way, they are always aligned with a single source, which is the only source file that has to be maintained.

The two sets of source files, manual and generated, are then compiled and linked together in the executable.

This approach is also used with web pages. For example, the web page that serves as the user manual for the application contains text that was written manually with the regular WordPress editor. However, the table containing all the field specifications is included from an external file:

[includeme file="../images/rdt2csv/rdt2csv_table.html"]

The rdt2csv_table.html file is generated by the Macrocoder project and included at runtime in the main page: in this way the format page can be updated at any time, keeping the online documentation always aligned with the common source specification.

Conclusions

The rdt2csv application converts a DMR radio .rdt binary file to a set of CSV files and vice versa. Although this application is dedicated to that specific binary format, the same approach can be used for any other similar application.

The presence of precise coding patterns (i.e. "every time a field of this type is found, implement the code in this way") makes it possible to employ macrocoding very effectively.

Source download

The archives below contain the complete source files for the rdt2csv application, licensed under GPL v.3.

Source files can be downloaded from GitHub at the address https://github.com/Macrocoding/rdt2csv.

To build the executable, enter the c directory and launch make.

To edit the macrocoded definition, download and install Macrocoder (free for non-commercial use), enter the mc directory and open MD380.fct.