Harsh Kapadia's talks and interviews.

Executable and Linkable Format

Introduction

We’re going to learn about ELF, the primary executable file format on Linux. We’ll explore its structure, walk through a few examples using an ELF parser that I built, and look at utilities for inspecting binaries. To get started with ELF, though, we need some high-level knowledge of how programs are compiled and loaded into memory (RAM).

Compiling a Program

To execute a program, we usually do the following:

The high level view of the compilation and execution of a 'Hello World' program in C.
Image credits: Harsh Kapadia (me)

The a.out file generated after compilation, as shown in the above image, is an ELF file.

In reality, a lot goes on behind the scenes at each step in the above image.

To compile a program, i.e., to create an executable binary, the high level steps are:

Once the executable has been created, the Loader loads the program into memory. This process is called ‘Loading’.

The high level view of the compilation and loading steps of a program.
Image source: Compiler, Assembler, Linker and Loader: A Brief Story

More information on each of the compilation steps and examples to illustrate each can be found at github.com/HarshKapadia2/compilation-examples.

Process Memory Layout

The Loader loads a program into memory in a specific manner to execute it.

The high level view of a process' memory layout.
Image source: Anatomy of a Program in Memory

At a high level, from top to bottom (virtual address 0x0), a process’ memory layout consists of:

NOTE: The process’ memory space appears contiguous in the above representation because it shows the process’ virtual address space. In physical memory, each segment may be mapped to a different, non-adjacent location. Virtual addressing makes the process’ memory space appear contiguous, for security and convenience reasons.

A mapping showcasing segments in multiple virtual address spaces mapping to different locations in physical memory.
Image source: Compiler, Assembler, Linker and Loader: A Brief Story

The Executable and Linkable Format

The Executable and Linkable Format (ELF) is a file format that defines how each type of data in a file is laid out and how the metadata associated with the file is stored in it.

ELF was adopted from UNIX System V and has remained largely unchanged since the early 2000s, which is impressive! ELF initially stood for ‘Extensible Linking Format’.

ELF is not just a file format for executables. Some file types that use ELF:

The following images show the output of the file command on various types of ELF files.

ELF file types.
1. 64-bit unstripped dynamically linked ELF executable for the x86-64 architecture
2. 64-bit stripped dynamically linked ELF executable for the x86-64 architecture
3. 64-bit unstripped statically linked ELF executable for the x86-64 architecture
4. 64-bit unstripped dynamically linked ELF executable for the RISC-V architecture
Image credits: Harsh Kapadia (me)

ELF file types.
1. 64-bit stripped dynamically linked ELF shared object for the x86-64 architecture
2. 64-bit unstripped ELF object (relocatable) file for the x86-64 architecture
3. 64-bit ELF core (dump) file for the x86-64 architecture
Image credits: Harsh Kapadia (me)

ELF comes in two formats: 32-bit and 64-bit.

Most modern machines are 64-bit, so this article mainly looks at the 64-bit format. Most of what follows also holds for the 32-bit format.
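
As a small illustration of how the two formats can be told apart, here is a minimal Python sketch that reads the identification bytes at the start of a file. It assumes the standard layout of those bytes: the magic number `0x7f 'E' 'L' 'F'`, then a class byte (1 for 32-bit, 2 for 64-bit) and a data-encoding byte (1 for little-endian, 2 for big-endian). The function name `describe_ident` and the sample bytes are made up for this example.

```python
# Sketch: classify a file as ELF32 or ELF64 from its identification bytes.
ELF_MAGIC = b"\x7fELF"      # first four bytes of every ELF file
EI_CLASS, EI_DATA = 4, 5    # positions of the class and data-encoding bytes
CLASS_NAMES = {1: "ELF32", 2: "ELF64"}
DATA_NAMES = {1: "little-endian", 2: "big-endian"}


def describe_ident(ident: bytes) -> str:
    """Return a short description of the ELF identification bytes."""
    if len(ident) < 6 or ident[:4] != ELF_MAGIC:
        return "not an ELF file"
    elf_class = CLASS_NAMES.get(ident[EI_CLASS], "unknown class")
    byte_order = DATA_NAMES.get(ident[EI_DATA], "unknown byte order")
    return f"{elf_class}, {byte_order}"


# Hypothetical sample: the first 16 bytes of a 64-bit little-endian ELF file.
sample_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
print(describe_ident(sample_ident))
print(describe_ident(b"#!/bin/sh\n"))
```

To try this on a real binary, the sample bytes could be replaced with something like `open("a.out", "rb").read(16)`.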

ELF File Structure

A high level view of the ELF format.
Image source: ELF structure

The main parts of an ELF file:

Views of an ELF File

ELF is an abbreviation for Executable and Linkable Format, which implies that the format it describes has something to do with Execution and Linking.

The same ELF file can be looked at in two different ways, depending on the file handler (Linker or Loader):

NOTE: It is important to realise that these are two representations of the same file, not two different files.

Linking and Execution views of the same ELF file.
Image source: Compiler, Assembler, Linker and Loader: A Brief Story
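
Both views are located through the ELF header at the start of the file: it stores the offset, entry size and entry count of the section header table (used for the Linking View) and of the program header table (used for the Execution View). The following Python sketch unpacks a 64-bit little-endian ELF header. The header bytes are synthetic: they are built from the values in the `readelf` output shown later in this article, and fields that the output does not show (such as `e_flags`) are assumptions.

```python
import struct

# Elf64_Ehdr layout (little-endian, no padding): 64 bytes in total.
EHDR_FORMAT = "<16sHHIQQQIHHHHHH"
EHDR_FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
    "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
)


def parse_elf64_header(data: bytes) -> dict:
    """Unpack the first 64 bytes of a 64-bit little-endian ELF file."""
    return dict(zip(EHDR_FIELDS, struct.unpack_from(EHDR_FORMAT, data)))


# Synthetic header (an assumption, not a dump of the real a.out):
# type 3 (DYN), machine 62 (x86-64), entry 0x1060, 13 program headers
# at offset 64, 29 section headers at offset 0x3148.
ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
sample_header = struct.pack(
    EHDR_FORMAT, ident, 3, 62, 1, 0x1060, 64, 0x3148, 0, 64, 56, 13, 64, 29, 28
)

header = parse_elf64_header(sample_header)
ph_end = header["e_phoff"] + header["e_phnum"] * header["e_phentsize"]
sh_end = header["e_shoff"] + header["e_shnum"] * header["e_shentsize"]
print(f"Execution View: program headers at {header['e_phoff']:#x}-{ph_end:#x}")
print(f"Linking View:   section headers at {header['e_shoff']:#x}-{sh_end:#x}")
```

With these values, the program header table ends at 0x318, which is exactly where the `.interp` section begins in the `readelf -S` output.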

Linking View of an ELF File

The Linking View of an ELF file.
Image source: Compiler, Assembler, Linker and Loader: A Brief Story

Exploring ELF Sections

The original file (hello.c):

#include <stdio.h>

int main() {
    printf("Hello World!\n");

    return 0;
}

The above code is compiled using GCC to produce the a.out ELF executable file that will be examined below.

$ gcc hello.c
# Produces 'a.out' as its output. This is an ELF executable file.

The Section Header Table of the compiled executable:

$ readelf -S a.out
There are 29 section headers, starting at offset 0x3148:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000000318  00000318
       000000000000001c  0000000000000000   A       0     0     1
  [ 2] .note.gnu.pr[...] NOTE             0000000000000338  00000338
       0000000000000030  0000000000000000   A       0     0     8
  [ 3] .note.gnu.bu[...] NOTE             0000000000000368  00000368
       0000000000000024  0000000000000000   A       0     0     4
  [ 4] .note.ABI-tag     NOTE             000000000000038c  0000038c
       0000000000000020  0000000000000000   A       0     0     4
  [ 5] .gnu.hash         GNU_HASH         00000000000003b0  000003b0
       0000000000000024  0000000000000000   A       6     0     8
  [ 6] .dynsym           DYNSYM           00000000000003d8  000003d8
       00000000000000a8  0000000000000018   A       7     1     8
  [ 7] .dynstr           STRTAB           0000000000000480  00000480
       000000000000008d  0000000000000000   A       0     0     1
  [ 8] .gnu.version      VERSYM           000000000000050e  0000050e
       000000000000000e  0000000000000002   A       6     0     2
  [ 9] .gnu.version_r    VERNEED          0000000000000520  00000520
       0000000000000030  0000000000000000   A       7     1     8
  [10] .rela.dyn         RELA             0000000000000550  00000550
       00000000000000c0  0000000000000018   A       6     0     8
  [11] .rela.plt         RELA             0000000000000610  00000610
       0000000000000018  0000000000000018  AI       6    24     8
  [12] .init             PROGBITS         0000000000001000  00001000
       000000000000001b  0000000000000000  AX       0     0     4
  [13] .plt              PROGBITS         0000000000001020  00001020
       0000000000000020  0000000000000010  AX       0     0     16
  [14] .plt.got          PROGBITS         0000000000001040  00001040
       0000000000000010  0000000000000010  AX       0     0     16
  [15] .plt.sec          PROGBITS         0000000000001050  00001050
       0000000000000010  0000000000000010  AX       0     0     16
  [16] .text             PROGBITS         0000000000001060  00001060
       0000000000000107  0000000000000000  AX       0     0     16
  [17] .fini             PROGBITS         0000000000001168  00001168
       000000000000000d  0000000000000000  AX       0     0     4
  [18] .rodata           PROGBITS         0000000000002000  00002000
       0000000000000011  0000000000000000   A       0     0     4
  [19] .eh_frame_hdr     PROGBITS         0000000000002014  00002014
       0000000000000034  0000000000000000   A       0     0     4
  [20] .eh_frame         PROGBITS         0000000000002048  00002048
       00000000000000ac  0000000000000000   A       0     0     8
  [21] .init_array       INIT_ARRAY       0000000000003db8  00002db8
       0000000000000008  0000000000000008  WA       0     0     8
  [22] .fini_array       FINI_ARRAY       0000000000003dc0  00002dc0
       0000000000000008  0000000000000008  WA       0     0     8
  [23] .dynamic          DYNAMIC          0000000000003dc8  00002dc8
       00000000000001f0  0000000000000010  WA       7     0     8
  [24] .got              PROGBITS         0000000000003fb8  00002fb8
       0000000000000048  0000000000000008  WA       0     0     8
  [25] .data             PROGBITS         0000000000004000  00003000
       0000000000000010  0000000000000000  WA       0     0     8
  [26] .bss              NOBITS           0000000000004010  00003010
       0000000000000008  0000000000000000  WA       0     0     1
  [27] .comment          PROGBITS         0000000000000000  00003010
       000000000000002b  0000000000000001  MS       0     0     1
  [28] .shstrtab         STRTAB           0000000000000000  0000303b
       000000000000010a  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)
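
Each row in the table above corresponds to one 64-byte entry in the Section Header Table. As a rough sketch of what a parser does with such an entry, the Python snippet below unpacks the 64-bit section header layout and decodes the three most common flag bits. The two entries are rebuilt by hand from the `.text` and `.bss` rows above. The `sh_name` value (an offset into `.shstrtab`) is not shown by `readelf`, so it is set to a placeholder of 0. The helper names are made up for this example.

```python
import struct

# Elf64_Shdr layout (little-endian): 64 bytes per entry.
SHDR_FORMAT = "<IIQQQQIIQQ"
SHDR_FIELDS = (
    "sh_name", "sh_type", "sh_flags", "sh_addr", "sh_offset",
    "sh_size", "sh_link", "sh_info", "sh_addralign", "sh_entsize",
)
SHT_PROGBITS, SHT_NOBITS = 1, 8
# SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR, in the order readelf prints them.
SECTION_FLAGS = ((0x1, "W"), (0x2, "A"), (0x4, "X"))


def parse_section_header(data: bytes, offset: int = 0) -> dict:
    """Unpack one Elf64_Shdr entry starting at the given offset."""
    return dict(zip(SHDR_FIELDS, struct.unpack_from(SHDR_FORMAT, data, offset)))


def flag_letters(sh_flags: int) -> str:
    """Render the W/A/X flag bits the way the readelf table shows them."""
    return "".join(letter for bit, letter in SECTION_FLAGS if sh_flags & bit)


def file_bytes(section: dict) -> int:
    """NOBITS sections (such as .bss) occupy no space in the file."""
    return 0 if section["sh_type"] == SHT_NOBITS else section["sh_size"]


# Entries rebuilt from the table above; sh_name = 0 is a placeholder.
raw_text = struct.pack(SHDR_FORMAT, 0, SHT_PROGBITS, 0x6, 0x1060, 0x1060, 0x107, 0, 0, 16, 0)
raw_bss = struct.pack(SHDR_FORMAT, 0, SHT_NOBITS, 0x3, 0x4010, 0x3010, 0x8, 0, 0, 1, 0)

text = parse_section_header(raw_text)
bss = parse_section_header(raw_bss)
print(".text", flag_letters(text["sh_flags"]), file_bytes(text))
print(".bss ", flag_letters(bss["sh_flags"]), file_bytes(bss))
```

The zero file size of `.bss` is consistent with the table above, where `.bss` and the section after it, `.comment`, share the file offset 0x3010.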

The contents of the .rodata section:

$ objdump -sj ".rodata" a.out

a.out:     file format elf64-x86-64

Contents of section .rodata:
 2000 01000200 48656c6c 6f20576f 726c6421  ....Hello World!
 2010 00                                   .
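
The hex dump above can be decoded by hand. The short Python sketch below copies the bytes from the `objdump` output and splits them into the four leading bytes and the NUL-terminated string that follows. What the leading bytes represent is not stated by `objdump`. On glibc-based toolchains they likely belong to the `_IO_stdin_used` symbol, but that is an assumption here.

```python
# Bytes copied from the objdump output above.
rodata = bytes.fromhex("01000200 48656c6c 6f20576f 726c6421 00")

# The string starts at offset 4, as the ASCII column of the dump shows.
prefix, string_bytes = rodata[:4], rodata[4:]

# A C string ends at the first NUL byte.
message = string_bytes.split(b"\x00", 1)[0].decode("ascii")

print(len(rodata))    # total size of the section contents in bytes
print(prefix.hex())   # the four bytes before the string
print(repr(message))  # the string literal from hello.c, as stored
```

The total length is 17 bytes, matching the size of 0x11 reported for `.rodata` in the Section Header Table. The stored string has no trailing newline even though the source has one. A likely explanation is that the compiler replaced the `printf` call with `puts`, which appends the newline itself.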

Execution View of an ELF File

The Execution View of an ELF file.
Image source: Compiler, Assembler, Linker and Loader: A Brief Story

Exploring ELF Segments

This section uses the executable generated in the ‘Exploring ELF Sections’ section above.

The Program Header Table (also called the Segment Header Table) of the compiled executable, along with the mapping of Sections to Segments:

$ readelf -l a.out

Elf file type is DYN (Position-Independent Executable file)
Entry point 0x1060
There are 13 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000002d8 0x00000000000002d8  R      0x8
  INTERP         0x0000000000000318 0x0000000000000318 0x0000000000000318
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000628 0x0000000000000628  R      0x1000
  LOAD           0x0000000000001000 0x0000000000001000 0x0000000000001000
                 0x0000000000000175 0x0000000000000175  R E    0x1000
  LOAD           0x0000000000002000 0x0000000000002000 0x0000000000002000
                 0x00000000000000f4 0x00000000000000f4  R      0x1000
  LOAD           0x0000000000002db8 0x0000000000003db8 0x0000000000003db8
                 0x0000000000000258 0x0000000000000260  RW     0x1000
  DYNAMIC        0x0000000000002dc8 0x0000000000003dc8 0x0000000000003dc8
                 0x00000000000001f0 0x00000000000001f0  RW     0x8
  NOTE           0x0000000000000338 0x0000000000000338 0x0000000000000338
                 0x0000000000000030 0x0000000000000030  R      0x8
  NOTE           0x0000000000000368 0x0000000000000368 0x0000000000000368
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_PROPERTY   0x0000000000000338 0x0000000000000338 0x0000000000000338
                 0x0000000000000030 0x0000000000000030  R      0x8
  GNU_EH_FRAME   0x0000000000002014 0x0000000000002014 0x0000000000002014
                 0x0000000000000034 0x0000000000000034  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000002db8 0x0000000000003db8 0x0000000000003db8
                 0x0000000000000248 0x0000000000000248  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.gnu.property .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
   03     .init .plt .plt.got .plt.sec .text .fini
   04     .rodata .eh_frame_hdr .eh_frame
   05     .init_array .fini_array .dynamic .got .data .bss
   06     .dynamic
   07     .note.gnu.property
   08     .note.gnu.build-id .note.ABI-tag
   09     .note.gnu.property
   10     .eh_frame_hdr
   11
   12     .init_array .fini_array .dynamic .got
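
The Section to Segment mapping above can be approximated by checking which segment's virtual address range contains each section. The Python sketch below does this for a handful of sections and the four LOAD segments, with all addresses and sizes copied from the two `readelf` outputs in this article. It is a simplification: the rule that `readelf` actually applies also takes file offsets and section types into account.

```python
# (name, virtual address, size) copied from the `readelf -S` output.
SECTIONS = [
    (".interp", 0x318, 0x1C),
    (".text", 0x1060, 0x107),
    (".rodata", 0x2000, 0x11),
    (".data", 0x4000, 0x10),
    (".bss", 0x4010, 0x8),
]

# (segment index, VirtAddr, FileSiz, MemSiz, flags) for the LOAD segments.
LOAD_SEGMENTS = [
    (2, 0x0000, 0x628, 0x628, "R"),
    (3, 0x1000, 0x175, 0x175, "R E"),
    (4, 0x2000, 0x0F4, 0x0F4, "R"),
    (5, 0x3DB8, 0x258, 0x260, "RW"),
]


def find_load_segment(addr: int, size: int):
    """Return (index, flags) of the LOAD segment containing the range, or None."""
    for index, vaddr, _filesz, memsz, flags in LOAD_SEGMENTS:
        if vaddr <= addr and addr + size <= vaddr + memsz:
            return index, flags
    return None


for name, addr, size in SECTIONS:
    print(f"{name:8} -> {find_load_segment(addr, size)}")

# The last LOAD segment is larger in memory than in the file.
_, _, last_filesz, last_memsz, _ = LOAD_SEGMENTS[-1]
zero_filled = last_memsz - last_filesz
print(f"bytes present only in memory: {zero_filled:#x}")
```

The difference between MemSiz and FileSiz for the last LOAD segment is 8 bytes, which equals the size of `.bss`: that section takes up space in memory but not in the file.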

Demonstrations

File Utilities

It is important to know which tools are available for working with executable files, so that their contents can be examined and information about them gathered.

GNU Binutils, maintained by the GNU Project, is an important collection of tools for working with binaries (executables). We will look at a few tools from this collection, along with some others. This list is not exhaustive.

File Inspection

File Compilation

File Manipulation

Resources