Mr. Spock with a fictional computer in the "Star Trek" series

In the movies of the 1960s, futuristic computers were machines that interacted with humans by means of natural speech. People simply explained what they wanted and the computers diligently did it. However, fifty years later, well into the 21st century, we are still manually writing billions of lines of code, and the phenomenon keeps growing. Something must have gone badly wrong.

Focus on "explaining"

For decades, the computer world has attempted to create a direct link between the end user and the computer, without the need for human programmers in the middle. The blocking problem has never been whether to use a microphone instead of a keyboard, but the content of what is said.

Go to an average customer, let him explain in his own words what he wants, and silently write everything down. Then try to create a satisfactory software application using only your notes. Very likely, even taking advantage of all your knowledge and experience, you won't be able to compensate for all the wrong, missing, implied or poorly explained information in his analysis.

A matter of domains

The key point is that the creation of a software application involves the collaboration of at least two domains, two different spheres of knowledge: one concerns the problem being addressed (the application domain), the other the technology used to solve it (the software domain). For example, the creation of an accounting application requires two roles: one who knows the rules of accounting and another who knows about SQL and computer programming.

An attempt to reduce the impact of the programming role was made with the introduction of domain-specific languages. The idea was to create a description language specifically designed for a given application domain. For example, the development of an accounting application would consist in the creation of a language, and a related compiler, well suited to describing accounting rules. The accounting expert would then write the software specification (i.e. the accounting rules and requirements) in that language. A dedicated compiler would verify the congruence, completeness and coherence of the specification and produce the expected executable.

This approach has historically proved rather unsuccessful, for several reasons, including:

  • designing a language that is genuinely suitable for application domain experts is, most of the time, very difficult;
  • developing the related parser and compiler requires specialized knowledge, and it can easily be more complex and expensive than developing the entire application the traditional way.

The only success stories concern cases where the application domain was already accustomed to expressing its requirements formally: process control, math, communication protocols and a few more.

SDL, a domain specific language for telephone protocols

Successful languages

If "strategic" domain languages have been of limited success, "tactical" languages, instead, have been very successful. Such languages are not meant for the domain expert to define the entire application, but they are targeted to the software developers to address specific software issues. We can list a few examples:

  • SQL – a language to manipulate and query relational databases;
  • HTML/CSS – a set of languages to describe web pages;
  • make – a domain specific language that defines a build process;
  • Macrocoder – a domain specific language designed to create domain specific languages.

These languages are totally unsuitable for anything other than their main purpose, but within their field they are extremely effective. No one would ever write a video game in SQL, but when it comes to dealing with relational databases, the language is ubiquitous.

These languages are so effective because they are aimed at programmers, who are already used to the concept of a "programming language". Their scope is also limited and well delineated: it is easy to define what they take as input and what they must produce as output.

Taking advantage of DSLs

We already take advantage of Domain Specific Languages (DSLs) quite a lot: if you can read this text, it is because of HTML and CSS, which are both DSLs. However, there is a way to gain even more from DSLs than what the market offers: you can create your own.

In my experience, the use of DSLs for limited and well-defined tasks within a software project can be very rewarding. Obviously not every software project is suited to this approach, but when it is, the results are astonishing.

The usual domain-specific languages like SQL, HTML, make and so on have their own compiler or interpreter. An HTML page is interpreted and rendered by a dedicated piece of software called a web browser, and that's it. In our case, instead, the best solution is to have a tool generate, under our guidance, some source code in the general-purpose language we are using for our project. In other words, are we writing a Java program? We can have a helper program that writes some of the Java for us.

I shall use an example to explain better what I mean. In the rdt2csv project I needed to write a C program able to convert binary files to a CSV file and vice versa. The binary files consisted of records composed of several fields, each with its own name, size, encoding (binary, BCD, Unicode, etc.) and validation rules (i.e. valid ranges, valid values, etc.).

To implement it, I needed a C function able to decode each field with its own format and save the values in a C structure. I needed another C function able to read that structure and write the data into a CSV file, plus two more functions to do the opposite job (CSV to structure, structure to binary). Finally, I needed a fifth function whose role was to read the C structure and verify that all values were within range according to the validation rules. Thanks to the intermediate structure, the same function could be used to validate both the CSV and the binary file.
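
To make the division of labor concrete, here is a minimal sketch of what the intermediate structure and the five function prototypes could look like in C. The names and field layout are hypothetical, not the actual rdt2csv code.

    #include <stdio.h>   /* FILE */
    #include <stdint.h>  /* fixed-size integer types */
    #include <stddef.h>  /* size_t */
    #include <stdbool.h> /* bool */

    /* Hypothetical intermediate representation of one record:
       every field, whatever its binary encoding, is decoded into a plain C int. */
    typedef struct {
        int account_id;   /* e.g. stored as 4-byte big-endian binary */
        int amount_cents; /* e.g. stored as packed BCD */
        /* ... one member per field, 158 in total ... */
    } record_t;

    /* The five roles described above (signatures are illustrative only). */
    bool binary_to_record(const uint8_t *buf, size_t len, record_t *rec);
    bool record_to_binary(const record_t *rec, uint8_t *buf, size_t len);
    bool csv_to_record(const char *line, record_t *rec);
    bool record_to_csv(const record_t *rec, FILE *out);
    bool record_validate(const record_t *rec); /* range and value checks */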

So I manually developed some encode/decode functions that could convert one field in each packed binary format into a standard C "int" value and vice versa; a sketch of one such helper appears right after the list below. I also manually wrote a couple of functions that could parse a CSV and produce standard C "int" values, and vice versa. Then the rest of the job was to:

  • for every field, define the corresponding member in the intermediate C structure
  • for every field, call the right encode/decode functions among those I prepared
  • for every field, call the right CSV parser/writer function
  • for every field, write the related validation rule in the validation function
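
As an illustration of the kind of hand-written helper involved, here is a minimal sketch of a packed-BCD decoder that produces a standard C "int". The function name and the error convention are assumptions, not the actual rdt2csv code.

    #include <stdint.h>

    /* Decode a packed-BCD field (two decimal digits per byte) into an int.
       Returns -1 on an invalid digit; a real helper would report errors
       through a dedicated status channel. */
    static int bcd_decode(const uint8_t *buf, int nbytes)
    {
        int value = 0;
        for (int i = 0; i < nbytes; i++) {
            int hi = (buf[i] >> 4) & 0x0F;
            int lo = buf[i] & 0x0F;
            if (hi > 9 || lo > 9)
                return -1;              /* not a valid BCD digit */
            value = value * 100 + hi * 10 + lo;
        }
        return value;
    }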

With 158 different fields, all those "for every field, do…" steps called for some automation. Using Macrocoder, I developed a simple DSL in which I could list the records and, within them, the name, size, position, format and validation rules of each field. Then I programmed it to write the C code that I would otherwise have written by hand for each case.
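
To give an idea of the outcome, the fragment below sketches the kind of repetitive C the generator might emit for two of the fields. All identifiers are hypothetical; record_t is the intermediate structure sketched earlier, and be32_decode()/bcd_decode() stand for the hand-written helpers.

    /* GENERATED FILE - DO NOT EDIT (illustrative sketch, hypothetical names) */

    bool binary_to_record(const uint8_t *buf, size_t len, record_t *rec)
    {
        if (len < RECORD_SIZE)           /* RECORD_SIZE: total record length */
            return false;
        /* field "account_id": offset 0, 4 bytes, big-endian binary */
        rec->account_id = be32_decode(buf + 0);
        /* field "amount_cents": offset 4, 5 bytes, packed BCD */
        rec->amount_cents = bcd_decode(buf + 4, 5);
        /* ... one block per field, repeated for all 158 fields ... */
        return true;
    }

    bool record_validate(const record_t *rec)
    {
        /* field "account_id": valid range 1..999999 */
        if (rec->account_id < 1 || rec->account_id > 999999)
            return false;
        /* ... one check per field that has a validation rule ... */
        return true;
    }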

With this technique, I had some .c files written manually and others generated by Macrocoder, the latter never to be edited by hand. When new entries are added to the binary format, I just edit the definition source, regenerate the files, rebuild, and that's it.

Macrocoding

This technique of using DSLs to create some of the source files in a generic language is called macrocoding because it is an extended form of what is normally done with language macros or templates. While the code-generation ability of macros and templates is limited to searching for and replacing keywords in a template text, with more powerful tools like Macrocoder we can take complex decisions about what is to be generated. Unlike macros and templates, with this approach we can use the same source information to address multiple targets at once. In the example above, from the same list of fields I generate not only the five different functions, but also two HTML tables, included in a bigger web page, that describe the program usage and document the binary format.
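
The multi-target idea can be illustrated with a deliberately simplified generator written in plain C (Macrocoder works differently; this is only a conceptual sketch with made-up field data): the same table of field descriptions drives both the members of the C structure and the rows of an HTML documentation table.

    #include <stdio.h>

    /* One entry of the (hypothetical) field list that drives the generation. */
    typedef struct {
        const char *name;
        int         size;      /* bytes in the binary record */
        const char *format;    /* "binary", "BCD", ... */
    } field_def;

    static const field_def fields[] = {
        { "account_id",   4, "binary" },
        { "amount_cents", 5, "BCD"    },
    };

    int main(void)
    {
        size_t n = sizeof fields / sizeof fields[0];

        /* Target 1: members of the intermediate C structure. */
        printf("typedef struct {\n");
        for (size_t i = 0; i < n; i++)
            printf("    int %s; /* %d bytes, %s */\n",
                   fields[i].name, fields[i].size, fields[i].format);
        printf("} record_t;\n\n");

        /* Target 2: rows of an HTML table documenting the binary format. */
        printf("<table>\n");
        for (size_t i = 0; i < n; i++)
            printf("  <tr><td>%s</td><td>%d</td><td>%s</td></tr>\n",
                   fields[i].name, fields[i].size, fields[i].format);
        printf("</table>\n");
        return 0;
    }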

Conclusions

Coding complexity is usually reduced by splitting concepts horizontally into multiple independent coding artifacts, such as functions, classes and libraries.

With macrocoding, complexity can be further reduced by splitting vertically across multiple abstraction levels: the division occurs between a concept and the way it is to be implemented. We get two formal places: one where the concepts to be implemented are specified (the domain-specific language) and one where the way they are to be implemented is specified (the code generation rules).

Identifying where and how macrocoding can be applied is, unfortunately, not a trivial task. The right application must be sensed and implemented on a case-by-case basis. In this blog I will report and describe in detail many real-life cases that I hope will serve as sources of inspiration for many other projects. Stay tuned: if you wish to receive a notification when new articles are added to this blog, just type your email address below. No spam, no ads.