This article was originally published on the Red Hat Customer Portal. The information may no longer be current.
Posted here for posterity, 1/1/2023
The Fedora Engineering Steering Committee maintains a conservative list of packages that must be built and hardened using security features of GCC. Packages not on this list have these security features enabled at the packagers’ discretion. There is currently no consensus in the community as to when security-hardened binaries are necessary, so their use can be a controversial topic. Most arguments can be reduced to whether the security benefit outweighs the performance overhead involved in using the feature.
Position Independent Executables (PIE) are an output of the hardened package build process. A PIE binary and all of its dependencies are loaded into random locations within virtual memory each time the application is executed. This makes Return Oriented Programming (ROP) attacks much more difficult to execute reliably. These blog posts showcase the results of a study I did recently on the effect of building applications as PIE. In the study I investigated the overhead incurred in the loader during program startup, with the aim of helping distributions make better security decisions based on technical analysis. I focused on program startup because that is where PIE has the largest performance impact. Once startup is complete, performance is largely comparable to that of standard Dynamic Shared Objects (DSOs) on x86_64 machines, depending on how well the program and shared libraries have been designed. As this is a security blog I am biased towards functionality that increases security. However, in the tests that I performed, the start times of a PIE application and a regular application were comparable.
One of the more interesting things for me personally whilst doing this work was looking at how compiling with PIE enabled affects the resultant binary. Consider the following “Hello World” program:
To reduce other influences, I used my own implementation of the standard library functions during compilation:
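As a rough illustration of what a libc-free replacement routine can look like, here is a minimal sketch. It is not the author's implementation: the notc_* names are hypothetical (borrowed from the libnotc.so name used later in this post), and it assumes x86_64 Linux, where write is system call number 1. On any other target the sketch falls back to libc's write() purely so that it still compiles.

```c
/* Hypothetical libc-free output routines. The notc_* names are
 * illustrative only and are not taken from the author's code. */
#include <stddef.h>

#if defined(__linux__) && defined(__x86_64__)
/* Raw write system call: number 1 on x86_64 Linux.
 * Arguments go in rdi, rsi, rdx; the kernel clobbers rcx and r11. */
static long notc_write(int fd, const void *buf, size_t len)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"(1L), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;   /* negative errno value on failure */
}
#else
/* Fallback so the sketch builds elsewhere; this path DOES use libc. */
#include <unistd.h>
static long notc_write(int fd, const void *buf, size_t len)
{
    return (long)write(fd, buf, len);
}
#endif

/* Length of a NUL-terminated string, without libc's strlen(). */
size_t notc_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Write the string and a trailing newline to standard output.
 * Returns 0 on success and -1 on failure. */
int notc_puts(const char *s)
{
    if (notc_write(1, s, notc_strlen(s)) < 0)
        return -1;
    if (notc_write(1, "\n", 1) < 0)
        return -1;
    return 0;
}
```

A main program built against routines like these would call notc_puts("Hello World") where it would otherwise call puts().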
The ELF binary that is produced by this build has no dependencies on libc or the loader in order to run. This means that it can be loaded into memory and run without depending on the linker to find and bind dynamically with dependencies. This makes sharing and reusing routines difficult, however. The common solution to this problem is to create a shared library:
The next step is to recompile the main binary indicating that some symbol definitions exist within an external shared library:
The resultant binary has a smaller .text section, as that code is now contained within the shared library libnotc.so. There are some other significant differences:
In order for the program to execute correctly, the ELF binary needs to be constructed in such a way that it allows the loader to resolve symbols at runtime. As the address of the symbol in memory is not a part of the main binary, a level of indirection is added through the procedure linkage table (the .plt section). Instead of calling puts() directly, the program calls a special entry in the .plt section that initially points to the loader. The loader then has to resolve the actual address of the function. Once it has done that, it updates an entry in the Global Offset Table (GOT). Subsequent calls to the same routine jump through that GOT entry straight to the resolved address.
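The lazy binding described above can be mimicked in plain C with a function pointer. This is only a model of the idea, not how the real loader is implemented: got_puts, plt_puts, resolver_stub and lib_puts are hypothetical names, and the "resolution" step is a simple assignment rather than a symbol lookup.

```c
#include <string.h>

typedef int (*puts_fn)(const char *);

static int resolve_count;   /* times the mock "loader" was entered */

/* Stands in for the real routine that lives in a shared library.
 * It returns the string length so that calls are easy to check. */
static int lib_puts(const char *s)
{
    return (int)strlen(s);
}

static int resolver_stub(const char *s);

/* Mock GOT slot: it starts out pointing back at the resolver. */
static puts_fn got_puts = resolver_stub;

/* Mock loader: "resolve" the symbol once, patch the GOT slot,
 * then forward the original call to the real routine. */
static int resolver_stub(const char *s)
{
    resolve_count++;
    got_puts = lib_puts;
    return got_puts(s);
}

/* Mock PLT entry: always an indirect jump through the GOT slot. */
static int plt_puts(const char *s)
{
    return got_puts(s);
}
```

The first call to plt_puts() goes through resolver_stub(), which patches the slot. Every later call lands directly in lib_puts(), so the cost of resolution is paid once per symbol.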
A standard ELF binary is typically loaded at the same base address in virtual memory each time it is executed. The linker takes advantage of this in non-relocatable code by jumping to absolute addresses of symbols. This turns out to have a slight performance benefit, as it is quicker to jump to an absolute address than to use relative addressing. This is especially true for i386 applications, as relative addressing there requires another register.
To see the difference between the dynamic and PIE applications we need to recompile the example program as a PIE. This simply requires the addition of the -fpic -pie flags to what we had previously:
Note that the address listed by the size command for each of the ELF sections is a relative address, whilst the addresses listed for dynamic-example are absolute. This is necessary because the program and all of its dependencies will be loaded into random locations in virtual memory upon execution. This includes prelinked libraries, and as such PIE serves as an effective exploit mitigation technology for attacks that rely on returning to known addresses of standard system libraries. The overhead that is incurred by this defense mechanism, and ways in which the number of relative relocations can be reduced, will be covered in the next post of this series.
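One way to observe the difference from inside a running program is to compare the ELF header on disk with what the kernel reports at runtime. The sketch below is an illustration and not part of the original study: inspect_self() and struct load_info are hypothetical names, and it assumes Linux with /proc mounted and a C library that provides getauxval(). A fixed-address executable has e_type ET_EXEC, whereas a PIE (like a shared library) has e_type ET_DYN.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/auxv.h>

/* Pick the ELF header layout that matches this process. */
#if UINTPTR_MAX > 0xffffffffu
typedef Elf64_Ehdr native_ehdr;
#else
typedef Elf32_Ehdr native_ehdr;
#endif

struct load_info {
    unsigned type;    /* e_type from the on-disk header */
    uintptr_t bias;   /* runtime entry point minus link-time entry point */
};

/* Fill *out for the running executable.
 * Returns 0 on success, -1 if the header cannot be read or is not ELF. */
int inspect_self(struct load_info *out)
{
    native_ehdr eh;
    FILE *f = fopen("/proc/self/exe", "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(&eh, sizeof eh, 1, f);
    fclose(f);
    if (got != 1 || memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;

    out->type = eh.e_type;
    /* AT_ENTRY is the entry point after the kernel has placed the image. */
    out->bias = (uintptr_t)getauxval(AT_ENTRY) - (uintptr_t)eh.e_entry;
    return 0;
}
```

Built without -pie, this would be expected to report ET_EXEC and a bias of zero. Built with -pie, it would be expected to report ET_DYN and a bias that changes between runs when address space randomization is enabled.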