Comparing Memory Sanitizers
Some time ago I had to compare some memory sanitizers for our programs at work. You probably already know this kind of tool, for example Valgrind or Address Sanitizer. These two are quite well known but there are others, like Dr. Memory (which I only know by name) or Intel Inspector (which I have barely used).
In simple terms, these tools work by keeping a map of the allocated memory regions (on the heap and on the stack) and their state (initialized or not). Each memory access is then instrumented to check that it uses a valid memory region with usable data. There are two strategies to implement the monitoring: either the instrumentation is injected during compilation, which is the Address Sanitizer way, or it is applied a posteriori to any compiled program, which is the Valgrind and Inspector way.
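To make the idea of "a map of regions plus their state" concrete, here is a toy sketch in C++. It is only an illustration under loud assumptions: the `ShadowMap` class, its methods and the `Access` enum are invented for this example, and real tools use compact shadow memory and compiler or binary instrumentation rather than a `std::map` and explicit calls.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Result of checking one memory access (hypothetical names).
enum class Access { Ok, OutOfBounds, Uninitialized };

// Toy bookkeeping: one entry per allocated region, one flag per byte.
class ShadowMap {
    struct Region {
        std::size_t size;
        std::vector<bool> initialized;  // one flag per byte of the region
    };
    std::map<std::uintptr_t, Region> regions_;

    // Returns the region fully containing [addr, addr + len), or nullptr.
    Region* find(std::uintptr_t addr, std::size_t len, std::size_t& offset) {
        auto it = regions_.upper_bound(addr);  // first region starting after addr
        if (it == regions_.begin()) return nullptr;
        --it;                                  // last region starting at or before addr
        offset = addr - it->first;
        if (offset > it->second.size || len > it->second.size - offset) return nullptr;
        return &it->second;
    }

public:
    // Called when a region is allocated: valid, but not yet initialized.
    void on_alloc(std::uintptr_t base, std::size_t size) {
        regions_[base] = Region{size, std::vector<bool>(size, false)};
    }

    // Called when a region is released: any later access is invalid.
    void on_free(std::uintptr_t base) { regions_.erase(base); }

    // A write must stay inside a region; it marks the bytes as initialized.
    Access on_write(std::uintptr_t addr, std::size_t len) {
        std::size_t offset = 0;
        Region* region = find(addr, len, offset);
        if (region == nullptr) return Access::OutOfBounds;
        for (std::size_t i = 0; i < len; ++i) region->initialized[offset + i] = true;
        return Access::Ok;
    }

    // A read must stay inside a region and touch only initialized bytes.
    Access on_read(std::uintptr_t addr, std::size_t len) {
        std::size_t offset = 0;
        Region* region = find(addr, len, offset);
        if (region == nullptr) return Access::OutOfBounds;
        for (std::size_t i = 0; i < len; ++i)
            if (!region->initialized[offset + i]) return Access::Uninitialized;
        return Access::Ok;
    }
};
```

With this model, after `on_alloc(1000, 5)` a read of one byte at address 1000 is flagged as uninitialized until something writes there, and any access touching address 1005 is out of bounds. The two strategies described above differ only in who inserts the calls to `on_read` and `on_write`: the compiler, or a runtime that rewrites the binary.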
At work we used the tool from Intel, the version from 2016. The process is simple: all the developer has to do is pass the program binary to the tool with the correct parameters, then read the reported errors. As you can expect, no errors are reported on our programs since we are excellent developers {{Citation needed}}. Anyway, this kind of tool fits very well in a CI.
2016 is a bit old though, so one day, just like that, seeing a version from 2018 waiting in a dusty folder, I decided to update the tool on our CI. And bam! The analysis time exploded. It should be pointed out that these tools are already very slow by default, but with this update it became unbearable. Six hours to get the results for a PR is too much.
That’s when I decided to check what the other tools were up to.
Testing protocol
In order to test the tools I gathered thirty-plus programs with intentional errors, like this one which reads after the end of a global array:
#include <cstdio>

int values[5] = {1, 2, 3, 4, 5};

// Launch this program with four arguments to trigger an out of
// bounds read.
int main(int argc, char** argv)
{
  printf("%d\n", values[argc]);
  return 0;
}
The test set contains:
- 11 dynamic memory allocation errors (leaks, mismatching calls to new and delete, …),
- 7 out of bounds reads in arrays,
- 7 out of bounds writes,
- 4 uses of uninitialized variables,
- 1 use of a stack variable after return,
- and 1 legit move of uninitialized memory (which should not be reported as an error).
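The last item deserves an illustration. The sketch below shows one plausible shape for a "legit move of uninitialized memory"; it is an assumption made for this explanation, not the actual test from the repository, and the `Record` type and both functions are hypothetical. The structure has padding bytes that are never written, and copying the whole object with `memcpy` moves them around without ever making a decision based on their value.

```cpp
#include <cstring>

// Hypothetical record: on common ABIs the compiler inserts padding bytes
// after `tag` so that `value` is suitably aligned.
struct Record {
    char tag;
    int value;
};

// Builds a record field by field; the padding bytes are never written.
Record make_record(char tag, int value) {
    Record record;
    record.tag = tag;
    record.value = value;
    return record;
}

// Copies every byte of the record, padding included. Uninitialized bytes are
// moved, but nothing ever branches on them or prints them.
Record copy_record(const Record& source) {
    Record copy;
    std::memcpy(&copy, &source, sizeof(Record));
    return copy;
}
```

A tool that flags the `memcpy` here would be reporting a false positive: the program only ever reads `tag` and `value`, which are both initialized.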
Everything is compiled without optimizations, to prevent the compiler from removing everything because it understood that the code was meaningless, and with the adequate flags for the intrusive tools.
Then I run each test with each of the tested tools, namely
- Address Sanitizer, with GCC,
- Memory Sanitizer, with Clang,
- Valgrind,
- Intel Inspector.
For Inspector I test all versions from 2016 to 2020, even the oldest ones: given the increase in processing time we had with 2018, I prefer to cast a wide net, even if it means settling on an older version to get a faster analysis for the same service.
Finally everything is launched on two environments. One computer is in the cloud, running CentOS 7.8, GCC 4.8 (modern C++ for the win!), Inspector 2016-2020, Valgrind 3.15, with an Intel Xeon at 2.30GHz and with 16 cores. The other computer is my laptop, running Ubuntu 21.04, GCC 10.3, Clang 12, Inspector 2020, Valgrind 3.17, all of that in Hyper-V with 100% of the CPUs dedicated to the VM, and an i7-8665U at 1.90GHz with 8 cores.
The results
For the results I will keep it short, but you can find the details for each test in our public repository, as well as all the code needed to reproduce the benchmark.
The least effective tool on this test set is Memory Sanitizer, which finds only two errors, both also reported by other tools.
Address Sanitizer is excellent. Its impact on the execution speed is low and it finds many errors, including errors not found by the other tools.
Valgrind and Inspector 2020 are as good as each other. They find the same errors, including errors not found by Address Sanitizer. Both are also very slow, in the same order of magnitude.
Inspector 2020 is the only good version of this tool. 2016 and 2017 could not manage to start the test; 2018 and 2019 report an error on the test where there is none.
When running the tools on non-toy use cases, the program ran 3 to 13 times slower with Address Sanitizer, and 163 to 565 times slower with Inspector or Valgrind. For these two we pay a lot for the fact that they execute the binary on a single core (our program is heavily multi-threaded). I also observed that an increase in dynamic memory allocations had a huge negative impact on the processing time, as did an increase in mutex uses.
Some errors from the test set are never detected, like out of bounds accesses in an array aggregated in a structure. From my understanding of the way these sanitizers work, it seems very unlikely that they will be able to detect this kind of error some day. They would have to insert padding between the struct members during compilation, which is not trivial and impacts other compilation units. On the other hand, this memory access error is reported with a warning during compilation with GCC 10 and later.
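To see why this case is so hard, consider the following sketch. The `Packet` structure is hypothetical and only serves the explanation; the code deliberately avoids performing the invalid access and instead computes where it would land.

```cpp
#include <cstddef>

// Hypothetical structure: an array directly followed by another member.
struct Packet {
    int payload[4];
    int checksum;
};

// Offset, within Packet, of the first byte past the end of `payload`.
constexpr std::size_t end_of_payload =
    offsetof(Packet, payload) + sizeof(Packet::payload);

// True when an access to payload[4] would land exactly on `checksum`.
constexpr bool overflow_lands_on_checksum =
    end_of_payload == offsetof(Packet, checksum);
```

Since both members have the same alignment, there is no gap between them, so `overflow_lands_on_checksum` is true. An access to `payload[4]` therefore touches bytes that belong to the same object, are allocated, and are most likely initialized: from the point of view of a map of valid memory regions, nothing is wrong.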
Final words
In the end we moved to Valgrind in our CI, mostly because we had a history of false positives with previous versions of Inspector. Still, I invite you to check this tool which, even though it is not free and open source software, is available for free on Intel's website.
All the tests and the report are available on our GitHub.
Now is the time to think about tools to find errors in our threading. Thread Sanitizer? Helgrind or DRD (from Valgrind)? Intel Inspector? Many things to check :)