I don’t claim to be a master of linking and ELF (Linux) executables, but there are some tricks I’ve learned that I wish someone had explained to me when I started.
There are two problems to solve to make cleanly distributable binaries: dependency hell and libc compatibility. By solving both, we can get an executable to run on any recent Linux system, regardless of the distribution and installed packages.
Compiling to a binary is a two-step process: the actual compiling, then the linking. In the first step, each source file gets turned into an object file (.o extension). Then all the .o and .a files are put together into a binary and the ELF meta-information is added. .a files are static library files. If your code uses an external library and that library has previously been statically compiled, then the .a file can be directly embedded into the binary. The ELF headers contain data such as “where is the main?” and “where should I look for the dynamically linked libraries (.so files)?”
Back in the old UNIX days, the thought of embedding .a files into binaries was considered a bit crazy. Even if a library is only 50 KB, it would be duplicated into ALL of the executables on your system that use it. It was also thought that if executables relied on a central .so file, then that file could be updated with bug fixes once, and all binaries would benefit without having to be recompiled. Dynamic linking means that on startup, an executable will look for the required .so files on your system. In practice, the massive complexity incurred by dynamic linking makes it a nightmare for binary distribution. You might have an application that needs version 0.9.x and another one that needs 0.10.6, and now you’re stuck in “dll hell” on Linux. The versions might not be backward or forward compatible. Package managers exist in part to track dependencies like that and make sure that all packages installed through them agree on the version of a shared dependency. But for people who need to distribute an executable that will work on ANY Linux system, it means we just can’t rely on the central libraries.
It also means we can’t ask our users to apt-get install or yum install or even “manually compile” anything, because it could clash with another version currently installed that other programs rely on. So what about statically embedding everything into the binary? We could go through every dependency our program has and manually modify its Makefile, autoconf setup, CMakeLists.txt, etc., to make it output .a files that we can then use. It’s painful, but it works… except that if our dependencies use .so dependencies themselves (and they frequently do), then it doesn’t work and we’re still stuck with dynamic linking.
In my opinion, the easiest and safest way to fix all these problems is to make our dynamic dependencies behave like static ones by shipping the .so files and making the binary use them instead of the system’s copies. First, I compile everything using the defaults. Then I examine the resulting binary with the ldd tool.
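Running ldd on a dynamically linked binary lists each .so it needs and where the loader resolved it. As an illustration (using /bin/ls as a stand-in for the binary being shipped; exact paths vary by system):

```shell
# List the shared libraries a binary depends on and where they resolve.
# /bin/ls is just a stand-in; run this on your own executable.
ldd /bin/ls
# Each resolved dependency prints as a line like:
#   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x...)
# A dependency the loader cannot locate shows up as "not found".
```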
The dependencies that start with /usr/ are those that will need to be shipped, except for libstdc++, because that’s another can of worms. Notice how ldd couldn’t find one of the dependencies? Running the executable in that state fails immediately with the dynamic loader’s “cannot open shared object file: No such file or directory” error.
It happens because the system went through /etc/ld.so.conf to get the list of directories to search for libev.so.4 and in the end couldn’t find it. So I copy it from my dev laptop, put it in a folder named lib, and distribute that along with the executable itself. But then I need to instruct my users to execute it like so: LD_LIBRARY_PATH=/current/working/directory/lib ./myexecutable. Of course they’re going to get it wrong or forget, and then blame us. Remember how at the beginning I mentioned that the ELF headers contain a field that says where to look for the dependencies? That field can be edited with the patchelf tool:
patchelf --set-rpath '$ORIGIN/lib/' myexecutable
That solves the first problem: the executable will look for its dependencies in the local lib folder before looking anywhere else on the system.
Libc is the C standard library. It provides an interface between the code and the kernel. Virtually every binary will dynamically link against it, even non-C ones. The problem is that libc is 100% forward compatible, but absolutely not backwards compatible. In other words, binaries compiled on a system with libc version 2.13 will work on any system with version 2.13 and up, but will not work at all on any lower versions. On Reddit, NotUniqueOrSpecial mentioned that symbol versioning can also be used to solve this exact problem.
Compiling with an older libc is not an issue unless your program happens to need new functionality or bug fixes, which has never been an issue for me. Downgrading libc is seriously not a good idea. The best and safest way to compile with an old libc is to install an older version of Debian or CentOS in a virtual machine. Then I install all the development tools and build the application for release there. I personally build on Debian Squeeze, which is old enough to guarantee compatibility with all the major distributions still in use, but recent enough to support my development toolchain.
As I said at the beginning, I’m not a low-level expert, and there is probably a cleaner way than the one described here to achieve truly portable applications on Linux without making users compile from source; I’m looking forward to learning it. Please leave a comment if you have any suggestions or criticisms.