Discovering Docker, Python, LLVM, and Emscripten

As 2020 was finally coming to an end, I began listing the tools that came into my day-to-day work during this period, and reviewing what I learnt from them. Interested? Let’s go through that together.

Note: This is mostly a translation of my post on LinuxFR.org. If your native language is French or similar, you may as well read it there.

Docker

I knew Docker only by its name and I had heard here and there that it was a great tool to isolate your programs from the system. For sure it is a great tool.

If you develop an application for Linux, it is a real gain to have a Docker image with all the dependencies required by the application. On one hand it makes it easy to retrieve the list of dependencies as well as their versions, and on the other hand it gives the ability to launch the builds in a clean environment. Did you ever have the problem of a build linking with libfoo 2.4 while it actually needed 3.7, because both were available on the system? Isolating the build will help you with this.

It is especially practical on the CI. There you launch your build in a Docker image containing only what is required to compile. If you forgot a dependency, you will see it immediately. If you put too many dependencies, mmh…, you still don’t see it. Well, yes, it is still necessary to have proper dependency management, it just makes that easier. For example, if you have a project that requires awesome-tool 2.3 and another one that requires awesome-tool 3.2, you create an image with each version and there you go.

In the general case of a project having specific dependencies, you can also easily handle dependencies without Docker by configuring the build properly and by installing the dependencies in a project-specific directory rather than at the system level. But there are still some tools that do not like being duplicated, for example Python or Ruby. With Python you can get away with virtualenv, even though it feels a bit fragile to me as it creates links to the original Python installation. With Ruby, if I recall correctly, it is incredibly painful, maybe even impossible, to have multiple installations of Ruby Version Manager. That being said, I have not looked into it for two years. Maybe it has improved since.

Nevertheless, I would not use a Docker image for day-to-day development. First because it takes forever to download the images, then because it is as impractical as possible to compile in the image. You can possibly share your project folder to code on the host and compile in the image, but it is not quite smooth.

In the end, we still need something to handle dependencies properly.

Finally, the main problem of Docker is that it cannot run an OSX image, and can barely run a Windows image. When your project targets all these platforms, it is very limiting. So you end up using ad-hoc dependency management for OSX, with build scripts and stuff, and while you are there you may as well use these scripts on Windows and Linux. Then the problem is solved, and why would you add Docker to the equation?

Now I appreciate using Docker on the CI to have a fresh environment on each build, but I think that for dependency management I still prefer to use something at the project level.

Python

I did very little Python programming during my career; I must say that I had a very poor picture of it. First it is an interpreted language, so you cannot do anything serious with it, and moreover it is slow.

Now that I have practiced a bit, I can say it is not so bad.

Coming from C++, I especially appreciate the facilities for text formatting, list manipulations, etc. There is a concise and direct aspect in some operations that makes writing the code quite pleasant.

Coming from C++, I do not appreciate duck typing and the interpreted side of the language. On the first code iteration it is quite nice, the script is clean and we have everything in our mind, but when we have to come back to it later, for example to add a parameter to a function, then it becomes a pain. We start searching for all callers to update them, then the callers of the callers… And if you forget some of them you won’t know until the execution path needs them. What a damn pain in the neck.

I see you coming, you are going to say something like “yeaaaaah, unit testiiiiiing, blah blah blah, it forces the developers to actually cover their code.” Yes, but no. It is still a pain.

In my opinion, code must be correct syntactically, semantically and algorithmically. Tests written by the developer cover the last point. For the two others we have tools that have proven their worth to do it better. Validating the semantics at the unit-test level is mixing up problems, it’s not very single responsibility principle.
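The forgotten-caller problem described above can be sketched in a few lines. This is only an illustration: the functions greet, rarely_used_path and main are hypothetical names invented for the example, and the scenario assumes a second parameter was added to greet after the first version of the script was written.

```python
def greet(name, punctuation):
    # 'punctuation' is the parameter added later in the script's life.
    return f"Hello, {name}{punctuation}"


def rarely_used_path():
    # Forgotten caller: it still uses the old one-argument signature.
    # Nothing complains until this line is actually executed.
    return greet("world")


def main(take_rare_path=False):
    if take_rare_path:
        return rarely_used_path()
    # Updated caller: works fine, so the script looks healthy.
    return greet("world", "!")
```

Calling main() works as expected, while main(take_rare_path=True) only fails, with a TypeError, once the execution reaches the outdated call. A C++ compiler would have rejected the equivalent program before it ever ran.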

One thing I found quite peculiar in Python is that there is some kind of resistance to algorithms — the negative side of concision. Like, if you write a good old for-loop, with initialization, stop condition and increment, you feel dirty. For example, to extract the entries of a list with respect to two independent properties, in Python we will prefer “filtering” the list twice with a predicate for each property, rather than iterating once and testing both properties at each iteration. That, I find debatable as, with the filter approach, the second filter will test entries already selected by the first one, which is useless.
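To make the comparison concrete, here is a small sketch of both styles, under the assumption that each property yields its own list. The predicates is_even and is_large, the helper split_once and the sample values are all invented for the example.

```python
def is_even(n):
    return n % 2 == 0


def is_large(n):
    return n > 10


values = [4, 7, 12, 15, 18, 21]

# The style usually preferred in Python: one filtering pass per property,
# so the list is traversed twice.
evens = [n for n in values if is_even(n)]
larges = [n for n in values if is_large(n)]


def split_once(items):
    # The "good old loop": a single traversal testing both properties.
    even_items, large_items = [], []
    for n in items:
        if is_even(n):
            even_items.append(n)
        if is_large(n):
            large_items.append(n)
    return even_items, large_items
```

Both versions produce the same two lists. The only difference is that the first one walks the input twice, which is the cost being questioned here.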

Well, in the end it is just the opinion of a C++ developer. Regardless of these small troubles it is really quite good. I even begin to appreciate writing my scripts with Python rather than Bash.

LLVM

Aaaah LLVM, what a nice project. This is the basis for Clang, one of today’s best C++ compilers. In addition to using the tool, I began working on the components of LLVM itself, meaning the API for the intermediate language.

During the last decade LLVM’s popularity went through the roof. A young and accessible code base, in C++ and object oriented, it’s refreshing. Add to that a nice plug-in system and a license allowing you not to share your patches or your plug-ins, it is not surprising that it became popular. Incidentally, they also incredibly improved the state of the error messages printed by the compiler, compared to what we got with GCC. Meanwhile, on the side of the old GCC, people had been complaining about the code complexity for years. LLVM offered hope for renewal, and everybody went to it, abandoning the ancestor who, even though it has more than 30 years of experience, was put on the shelf in favor of the young and fresh newcomer.

One of the great strengths of LLVM is also the collection of tools it provides to transform and manipulate code, especially clang-tidy and clang-format, to enforce some properties on the code and to format the code according to some rules, respectively.

Did I tell you that LLVM itself was coded in C++? For sure it must be very clean.

Argh… Well, no, it is absolutely not clean, at all. LLVM is like the shoemaker’s child who goes barefoot. You will find source files of tens of thousands of lines, and objects everywhere, even when it does not fit at all. In which object model is it acceptable for an instance to downcast itself? Also, the documentation is excellent on one side (the language reference, for example) and on the other side it is just useless crap (see the Doxygen documentation, filled with useless “collaboration diagrams” as large as a football field, and where types and functions are just listed with no explanation of the intent).

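Aside from that we have templates everywhere and never-ending compilations. I hoped, a bit, to avoid that by using unity builds, but obviously, between the using namespace directives at file scope and the #define directives that are never followed by an #undef, it cannot work: a unity build merges many source files into a single translation unit, so all of these leak from one file into the next.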

Add to that the source files with tens of thousands of lines and a chaotic repository, and you’re done.

In short, the tools provided by LLVM are really awesome but it is very disappointing internally.

Emscripten

Emscripten is a compiler targeting JavaScript from C or C++ code, based on LLVM and aiming to be a direct replacement for GCC or Clang. The goal is to allow compiling any project to WebAssembly.

The project had three main versions: the first one, I don’t know anything about it. The second version was a fork of LLVM, with custom patches. And the current version relies on the official version of LLVM, with no extra modifications. Which is awesome.

I did try to use Emscripten eight years ago to create a web version of a C++ game of mine, which used SDL, OpenGL, Boost… I don’t remember how it went but I think it did not end well. There are still traces on StackOverflow of the difficulties I had compiling Boost with Emscripten. To be honest, combining the annoying BJam with the young Emscripten was kind of asking for trouble.

Personally I think that combining C and JavaScript is a bit like trying to unify two opposite worlds. Apart from the fun of the technical aspects, I have doubts about the usefulness of the tool. In my opinion it mainly lets you have neither the performance of native programs nor the simplicity of JavaScript, while also not benefiting from the existing developer tools of either side.

I had to come back to Emscripten recently and discovered WASI along the way, which offers among other things a way to launch WebAssembly programs without a web browser. And seriously, the idea of being able to code my software in C++, to compile it into WebAssembly, to plug it into an Electron app, installed via Flatpak, to finally launch that in a Docker running on a virtual machine… All these indirections, it is the stuff of dreams.

Okay, okay, I’ll stop kidding about that.

So, I used WebAssembly at work, and it is actually quite accessible. Indeed, Emscripten quite simply replaces another compiler, and it also provides many customization points. One can easily, for example, replace the LLVM compiler used in the backend by another version. Easily with some limitations though, as Emscripten uses hard-coded compilation options that must thus be supported by the backend compiler.

On the C++ to WebAssembly transformation, we can convert almost everything as long as there are no threads, no exceptions, and other subtleties.

Finally, there is also one thing I literally loved when using Emscripten: how welcoming its developers are. I sent some small patches and opened a couple of bug reports, and every time the feedback was clear, stimulating and constructive. A project that puts so much care into newcomers, seeeeeeriously. It’s awesome.

That’s all

These four projects are the largest ones I discovered this year. Maybe my opinion and my comments are a bit misinformed? Please keep in mind that, as a matter of fact, I have very little practical knowledge of each.

If you know more or if you see important points I missed, feel free to correct me in the comments!