From Source Code to Bytes

Before we can talk about how Windows loads an executable, we need to talk about how that executable came to exist in the first place. Four tools, a stack of intermediate files, and a quiet handoff that almost no one watches.

This is the opening post of a four-part series on the Windows Portable Executable (PE) format and the loader that turns it into a running process. Where most guides drop you straight into the PE header on page one, we're going to start one step earlier — at the source code — and trace the path of a single program from a text file to a process. By the end of Part 4, you'll understand every layer in that stack: what the compiler emits, what the linker assembles, what the loader maps, and how the CPU finally runs your code.

The reason to start here is selfish. The PE format makes very little sense in isolation. Most fields, sections, and alignment constraints exist because some producer or consumer in the toolchain — compiler, assembler, linker, loader, memory manager, signer, debugger, runtime, or security tooling — needs them as a stable contract. If you understand the toolchain first, the PE format reads like a contract between people who already agreed on what they were doing. Skip the toolchain, and it reads like an arbitrary list of structures to memorize.

What you actually do when you "compile"

You type a command. It might look like cl /Fe:hello.exe hello.c on Windows, or gcc -o hello hello.c on Linux, or clang hello.c -o hello almost anywhere. To you, it's one step: source code goes in, executable comes out. To the toolchain, it's four steps, and three of them produce intermediate files that get deleted before you can see them.

A quick note on those three commands, since they'll show up throughout the post. cl is Microsoft's C/C++ compiler, the one that ships with Visual Studio. gcc is the GNU Compiler Collection, the default on Linux and the engine behind MinGW on Windows. clang is the LLVM project's compiler, used by Apple's toolchain, increasingly common on Linux, and also available on Windows. They produce the same kind of output — executables — and they all follow the same four-stage process internally, even though their command-line flags and intermediate file extensions differ. We'll use all three interchangeably in examples; the concepts don't change.

Those four steps are: preprocess, compile, assemble, link. Each one takes the output of the previous step and transforms it. The compiler driver (the thing you actually invoke) orchestrates all four behind a single command-line interface, but the four stages underneath are real, and you can run each one by hand. We're going to do exactly that. Three things are worth saying before we do.

First, the boundary between "compiler" and "compiler driver" is fuzzy in casual conversation. When someone says "the GCC compiler," they usually mean the driver (gcc), which is just a launcher that runs the preprocessor, cc1, as, and ld in sequence. (Modern GCC folds preprocessing into cc1 rather than running a separate cpp pass, but the stage still exists and gcc -E exposes it.) The actual compiler, the program that transforms C into assembly, is cc1, hidden inside GCC's internal directories. MSVC blurs the line further by putting the preprocessor and the compiler into a single executable (cl.exe) that internally loads c1.dll, c2.dll, and so on. Same four stages, slightly different packaging.

Second, this four-stage shape — preprocess, compile, assemble, link — is the conventional native C/C++ pipeline across every major platform. The same C source can be turned into a Windows .exe by MSVC, a Linux ELF by GCC, or a Mach-O by Clang. But each stage is already targeting a specific environment: a Windows build emits Windows calling conventions and COFF-style object files; a Linux build emits SysV calling conventions and ELF-style ones. The conceptual steps are portable; the bytes each step emits are very much not. We're going to focus on the Windows path from here on, but the shape of the conversation up through the linker is roughly the same wherever you go — only the file formats and ABIs change.

Third, the intermediate files are real and you can ask for them. Let's do that now.

Try it yourself

You can run the four stages by hand and inspect what each one produces. Save a two-line program as hello.c:

#include <stdio.h>
int main(void) { printf("hello\n"); return 0; }

Then ask the driver to stop after each stage and keep the output:

# On Linux or MinGW
gcc -E hello.c -o hello.i      # preprocess only
gcc -S hello.i -o hello.s      # compile to assembly
gcc -c hello.s -o hello.o      # assemble to object
gcc    hello.o -o hello        # link

# On Windows with MSVC
cl /P hello.c                  # preprocess → hello.i
cl /FA /c hello.c              # compile → hello.asm and hello.obj
link hello.obj                 # link → hello.exe

Open hello.i and hello.s (or hello.asm on Windows) in a text editor — those two are plain text and you can read them straight through. The object file and the final executable are binary; inspect those with dumpbin, objdump, a PE viewer, or a hex editor. We'll start reading binary in Part 2.

Stage 1: The preprocessor

The preprocessor is the simplest of the four tools, and although it is the first to read your C source, it doesn't understand C semantics. It works at the level of tokens and directives, not types or program structure. It sees #include <stdio.h> and physically pastes the contents of stdio.h into your file. It sees #define MAX 100 and replaces every later occurrence of the token MAX with 100. It sees #ifdef DEBUG and either keeps or deletes the block that follows, depending on whether DEBUG is defined.

That's it. No type checking. No syntax awareness. The preprocessor would happily expand #define POTATO if and let you write POTATO (x > 0) in your code. By the time the result reaches the compiler proper, POTATO is gone — replaced by if — and the compiler never knew it existed.
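
To make the substitution concrete, here is a small sketch built from the two macros mentioned above. The function name clamp_to_max is made up for illustration. Running the file through gcc -E (or cl /P) should show the function body with POTATO and MAX already replaced.

```c
/* Purely textual substitution: the preprocessor rewrites POTATO to `if`
   and MAX to `100` before the compiler proper ever sees this file. */
#define POTATO if
#define MAX 100

/* Clamp x to at most MAX. After preprocessing, the body reads:
       if (x > 100) return 100;
       return x;                                                    */
int clamp_to_max(int x)
{
    POTATO (x > MAX)
        return MAX;
    return x;
}
```

The compiler proper only ever sees the expanded form, which is why its error messages never mention POTATO as a keyword.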

The output of the preprocessor for a trivial hello.c is anything but trivial. A two-line program produces an expanded file (.i for C, .ii for C++) that's typically hundreds to thousands of lines long, depending on the platform's headers, because stdio.h transitively pulls in other headers, each of which can pull in more. (A C++ file that includes <iostream> routinely expands to tens of thousands of lines.) Open hello.i and scroll: you'll see a long run of function declarations, type definitions, and macro expansions before you ever reach your own main.

This matters for one specific reason that will come up again in Part 4. For external library functions like printf, the header gives the compiler a declaration, not the compiled definition. When stdio.h says int printf(const char *fmt, ...);, it's telling the compiler "this function exists somewhere, here's its signature, trust me." The compiler trusts it. The actual code for printf — the machine instructions that format a string and write it to standard output — isn't in the header. It isn't in your own source file either. It lives in a library, in a different file entirely, and the linker will go find it later. (Headers can contain other kinds of things too — macros, type definitions, inline functions, templates — but including stdio.h does not copy the C runtime's machine code for printf into your program.)

This split between declarations and definitions is the single most useful idea to internalize about how C and C++ programs come together, and it's the source of an enormous amount of confusion. It's natural to read #include <stdio.h> as "import printf" — the way a Python import statement actually works. But it doesn't. The header gives the compiler enough information to type-check calls to printf — to verify that you're passing the right argument types and using the return value correctly — but the header contains no executable code. The compiler emits a placeholder call instruction; somebody else has to fill in the actual function.

The compiler does its job knowing only what the header told it. When it emits the machine code for main, it leaves a placeholder where the call to printf goes — essentially writing "fill this in with printf's address, whatever that turns out to be." Stitching that placeholder to the actual function in the library is the linker's job, which we'll get to shortly.
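
One way to convince yourself that the header contributes only a declaration is to skip the header and write the declaration by hand. The sketch below assumes a hosted C toolchain whose default link step supplies the C runtime; greet is a hypothetical name.

```c
/* No #include <stdio.h> here. This one line is all the compiler needs
   from the header in order to type-check and emit a call to printf. */
int printf(const char *format, ...);

/* Prints "hello" and returns printf's result (the number of characters
   written). The machine code for printf itself is not in this file; the
   linker has to find it in the C runtime. */
int greet(void)
{
    return printf("hello\n");
}
```

This compiles and links even though stdio.h was never included: the hand-written line tells the compiler everything the header would have said about printf, and the linker supplies the function itself.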

One more concept worth a name. After the preprocessor runs, the file it produces is called a translation unit. A translation unit is, formally, what the compiler proper takes as input: one source file with all its includes expanded, all its macros substituted, all its conditional blocks resolved. Every .c file in your project becomes one translation unit. They're compiled independently, in parallel if you have the cores for it, and the linker stitches the results together at the end.

Stage 2: The compiler proper

This is the stage where source code stops being source code. The compiler takes a translation unit and emits assembly — a textual representation of machine instructions. Along the way it does almost everything you think of when you hear the word "compile": parsing, type checking, optimization, register allocation, instruction selection. By volume, this is where the real work happens. By word count in tutorials, it's usually where the least is said, because every step of it is its own field of study.

We're going to skip the internals. What matters for the rest of the series is what the compiler produces, not how. Open the .s file you saved earlier and you'll see something like the snippet below. The exact output depends on your compiler, your target platform, your optimization level, and which version of which header you included — the snippet here is a simplified, Windows x64, Intel-syntax listing for illustration. Real output from any specific toolchain will differ in directives, comment markers, and register choices.

        .text
        .globl  main
main:
        sub     rsp, 40                 # shadow space + alignment
        lea     rcx, [rip + .L.str]     # first argument: address of the string
        call    printf
        xor     eax, eax                # return value 0
        add     rsp, 40
        ret

        .section .rdata
.L.str:
        .ascii  "hello\n\0"

If you've never read assembly before, this looks intimidating but it's actually short on vocabulary. A few primitives:

The CPU has a small number of registers: named storage locations, each holding a single value, that the CPU operates on directly. On x86-64 there are sixteen general-purpose registers (named rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp, and r8–r15), plus one special register called rip, the instruction pointer, which holds the address of the next instruction to execute. Registers are the CPU's working surface: instructions read from them, write to them, and move data between them and memory.

Each line of assembly is one of two things: an instruction (which tells the CPU to do something — sub subtracts, mov copies a value, call jumps to a function, lea computes an address, ret returns from a function) or a directive (which gives the assembler bookkeeping information — .text, .section, .ascii all start with a dot and aren't instructions). A name followed by a colon, like main: or .L.str:, is a label — a human-readable name for the location of whatever comes immediately after it. Labels are how assembly code refers to things by name instead of by address. main is a label that names the start of the function; .L.str is a label the compiler invented to name the start of the string literal in the data section.

With that vocabulary in hand, three things are worth noticing in the listing, because they show up at every level of the stack from here on.

The code is split into sections. The directive .text says "what follows is executable code." The directive .section .rdata says "what follows is read-only data" — in this case, the string literal "hello\n". Even at the assembly level, the compiler is already separating code from data, because the two will have different requirements when the program eventually runs: code needs to be executable, read-only data needs to be readable but not writable, mutable data needs to be writable but not executable. That separation will follow these bytes all the way into the running process.

The code refers to things by name, not by numeric address. The instruction call printf does not say "call the byte at address 0x7FFB4001A234." It says "call the thing called printf, whose address somebody else will fill in later." The string literal works similarly: the compiler gave it the local label .L.str, and the lea instruction (which stands for load effective address — it computes an address and puts the result in a register) refers to the string by that label rather than by a numeric address. The compiler emits these placeholder names because, at this stage in the pipeline, it simply doesn't know where any of these things will live in memory. It hasn't decided where main goes within the final binary. It has no idea where printf lives — printf's definition isn't even part of this translation unit. Filling in real addresses is somebody else's problem: first the linker's, and then the loader's. We'll look at exactly what "filling in" means at the byte level in the next section.

The code uses relative addressing wherever it can. Look closely at lea rcx, [rip + .L.str]. Parsed literally, this says: "take an instruction-pointer-relative reference to .L.str and put the resulting address in rcx." The encoded displacement is added to RIP at the moment the CPU has finished decoding this instruction — which is to say, the address of the next instruction. That's a footnote-level detail that matters when you're hand-computing offsets; the conceptual point is simpler: the distance between this instruction and the string is a fixed property of the binary, and the compiler-and-linker pipeline computes it once and bakes it into the four-byte field of the encoded instruction. Both the instruction and the string sit inside the same image, so however the operating system shuffles that image around in memory at runtime, the two move together. Their distance never changes. This is the essence of position-independent code: instead of saying "the string is at absolute address 0x1400040A0" (which would only work if the image always loaded at the same place), the instruction says "the string is 0x2EDA bytes after me, wherever 'me' happens to be." We'll come back to it in Part 4, because it's the reason modern x86-64 binaries need much less help from the loader than older 32-bit code did.
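
The arithmetic behind that displacement is small enough to sketch. The function below is illustrative only: the instruction address 0x1400011BF is an assumption chosen to be consistent with the 0x1400040A0 and 0x2EDA figures above, and the 7-byte length is the usual encoding of lea rcx, [rip + disp32] (three bytes of prefix, opcode, and ModRM, then the four-byte field).

```c
#include <stdint.h>

/* Displacement stored in a RIP-relative operand: the target address minus
   the address of the NEXT instruction (instruction address + its length). */
int32_t rip_relative_disp(uint64_t instr_addr, uint32_t instr_len,
                          uint64_t target)
{
    uint64_t next_instr = instr_addr + instr_len;
    return (int32_t)(int64_t)(target - next_instr);
}
```

With the assumed numbers, rip_relative_disp(0x1400011BF, 7, 0x1400040A0) comes out to 0x2EDA. Adding the same delta to both addresses, which is what moving the whole image does, leaves the result unchanged.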

One Windows-specific detail is worth knowing about the sub rsp, 40 at the top of the function. Those 40 bytes aren't arbitrary scratch space; they're required by the Microsoft x64 calling convention. The first four integer or pointer arguments to any function are passed in registers (rcx, rdx, r8, r9), but the convention also requires the caller to reserve 32 bytes of stack space before making the call, positioned so that they sit immediately above the return address once the call has pushed it. That 32-byte region is called the shadow space (or "home space"), and the called function is allowed to spill its register arguments into it for debugger inspection, address-taking, or its own scratch use. The caller doesn't need to put anything specific there; the called function just needs the space to exist. The remaining 8 bytes in our sub rsp, 40 restore the 16-byte alignment of the stack pointer that the convention requires at the moment a call instruction executes: main was itself entered through a call that pushed an 8-byte return address, so rsp arrives 8 bytes off a 16-byte boundary. So: 32 bytes for shadow space + 8 bytes for alignment = 40. The Linux System V x86-64 ABI doesn't have shadow space (arguments are passed in different registers, no caller-side reservation), which is why the same C source compiled on Linux produces a tighter stack frame.
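
The 32 + 8 arithmetic generalizes to functions with local variables. The sketch below is a simplified model, not what any particular compiler is guaranteed to emit: it assumes a function that calls other functions, pushes no registers in its prologue, and needs local_bytes of locals. The name win64_frame_size is hypothetical.

```c
/* Bytes to subtract from rsp in the prologue under the Microsoft x64
   convention (simplified model, see the assumptions above).
   On entry rsp is 8 bytes off a 16-byte boundary because the caller's
   call instruction pushed an 8-byte return address. */
unsigned win64_frame_size(unsigned local_bytes)
{
    unsigned needed = 32u + local_bytes;   /* shadow space + locals */

    /* Round (return address + frame) up to a multiple of 16, then take
       the return address back out to get the frame size itself. */
    return ((needed + 8u + 15u) & ~15u) - 8u;
}
```

For a function with no locals this yields 40, matching the listing; 8 bytes of locals still fit in 40, and 16 bytes of locals push the frame to 56.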

Stage 3: The assembler

The assembler is the most mechanical of the four tools. It takes a text file of assembly mnemonics and produces a binary file of machine instructions — but it does not produce a complete program. It produces an object file.

An object file is a structured container. It holds several distinct kinds of content side by side, each with its own purpose. The machine code is the actual binary instructions the assembler produced — the bytes the CPU will eventually execute. The data is whatever initial values your program's variables need (a global integer initialized to 42, for example, becomes four bytes of 2A 00 00 00 sitting in the data section). The strings are exactly what they sound like — the text literals your program contains, like the bytes for "hello\n". So far, so concrete: these are just bytes the assembler wrote, organized into named sections.
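
The 2A 00 00 00 byte order is just little-endian encoding, which x86 and x86-64 use for multi-byte values. A minimal sketch of that encoding, with encode_le32 as a made-up helper name:

```c
#include <stdint.h>

/* Write a 32-bit value as four bytes, least significant byte first,
   the way it is laid out in an x86-64 object file's data section. */
void encode_le32(uint32_t value, unsigned char out[4])
{
    for (unsigned i = 0; i < 4u; i++)
        out[i] = (unsigned char)(value >> (8u * i));
}
```

Encoding 42 (0x0000002A) this way gives the bytes 2A 00 00 00, lowest byte first.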

The interesting parts are the metadata that travels alongside. The object file carries a symbol table — a list of every named thing the file knows about. A symbol, in this context, is just a name attached to a location: a function name like main attached to the byte where that function starts, or a variable name like g_counter attached to the byte where that variable lives. The symbol table records which symbols the file defines (we have the bytes for these; here's where they sit) and which symbols the file references but does not define (we use these by name, but someone else has to provide the actual bytes — printf is the classic example). Alongside the symbol table sits a list of relocations — bookkeeping notes telling the linker where in the machine code the addresses still need to be filled in once everything has been laid out. The next section is entirely about what those notes look like and what the linker does with them.

On Windows, this whole object-file format is called COFF, the Common Object File Format. The extension is .obj. On Linux it's ELF (.o); on macOS it's Mach-O. They differ in details but agree on the broad strokes: machine code, data, strings, symbol table, relocations.

Put another way: the object file is a half-built thing — code with deliberate holes in it, plus the metadata telling the linker where the holes are and how to fill them.

It helps to see this concretely. Let's zoom all the way in on one hole. To keep the example clean, suppose your program also defines a small function called helper in another source file (helper.c), and your main calls it: helper();. We'll trace the bytes of that call helper instruction from the assembly source, into the object file as raw bytes with the hole exposed, and into the final executable after the linker has filled the hole in. The reason for using helper here rather than printf: helper is defined in another .obj we'll be linking together with main.obj, so the linker can compute a real byte distance to its final location. printf takes a slightly different path — it lives in a DLL and the linker never sees its code at all — and we'll come back to that path at the end of the linker section.

Notice that the diagram shows two things sitting side by side inside the object file: the actual bytes of the .text section on the left, and a separate relocation entry on the right. The relocation entry doesn't live inside the .text bytes — it lives in a parallel metadata table that the assembler writes to a different part of the same .obj file, specifically so the linker can find every spot in the code that needs patching without having to disassemble the code itself.

Walk through it slowly. Panel 1 is the assembly source — a single line of source. Panel 2 is what that line becomes after the assembler runs. The byte E8 is the actual machine encoding of the x86-64 call opcode; that part is real, and it never changes again. The next four bytes are where the puzzle lives. A relative call instruction on x86-64 encodes its target as a 32-bit signed offset from the end of the instruction — four bytes' worth of distance — and the assembler has no idea what that distance should be, because nobody has decided where helper will live yet. So it writes four zeros and moves on, leaving a relocation entry alongside the section to mark the spot.

That entry is the structured note shown on the right. It records the symbol the linker needs to resolve (helper), the offset of the four placeholder bytes within the section (.text + 0x19 — one byte past the start of the instruction, skipping over the opcode), and the kind of value the linker should write (PC-relative 32, meaning a 32-bit displacement relative to the program counter). Every address-bearing reference the assembler can't resolve locally gets its own relocation entry; together, they're the linker's to-do list.

Panel 3 is the same four bytes after the linker has finished its work. The placeholders are gone; in their place are the bytes 7B 04 00 00, which on x86-64 read as the little-endian 32-bit value 0x0000047B. That's the actual byte distance from the end of this call instruction to the start of helper, computed by the linker once it knew where both ended up in the merged image. The CPU, when it runs this instruction, will add that offset to the instruction pointer and jump to the right place.
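
The patching step can be sketched in a few lines of C. This is a simplified model rather than a real linker routine: the struct is a hypothetical stand-in (a real COFF relocation entry stores a symbol-table index and a numeric type code, IMAGE_REL_AMD64_REL32 for this kind), any addend already stored in the field is ignored, and the addresses used in the usage note are assumptions picked to reproduce the 0x47B from the walkthrough.

```c
#include <stdint.h>

/* Simplified, hypothetical stand-in for one relocation entry. */
struct reloc {
    uint32_t    offset;   /* offset of the 4 placeholder bytes in the section */
    const char *symbol;   /* name the linker must resolve */
};

/* Patch one PC-relative 32-bit field.
   section_addr: address the linker assigned to the start of `section`.
   target_addr:  address the linker assigned to r->symbol. */
void apply_rel32(unsigned char *section, uint64_t section_addr,
                 const struct reloc *r, uint64_t target_addr)
{
    /* The field is the last 4 bytes of the call instruction, so the
       instruction ends 4 bytes past the start of the field. */
    uint64_t next_instr = section_addr + r->offset + 4u;
    uint32_t disp = (uint32_t)(target_addr - next_instr);

    for (unsigned i = 0; i < 4u; i++)          /* store little-endian */
        section[r->offset + i] = (unsigned char)(disp >> (8u * i));
}
```

If this object's .text lands at 0x1000 and helper lands at 0x1498 (both assumed), applying the entry {0x19, "helper"} turns E8 00 00 00 00 into E8 7B 04 00 00. The opcode byte is never touched.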

Two ideas to carry forward from this diagram. First: a "relocation" is not a mysterious linker concept — it's a literal note attached to a literal set of bytes, saying "fill these in." Second: the object file is a file with deliberate gaps, and the linker's most important job is filling those gaps with the right numeric values. Once you've seen the bytes change, the rest of this section is bookkeeping.

One terminology trap worth defusing now, because it confuses many readers later. The relocation we just walked through is a linker relocation: a note in a .obj file telling the linker how to patch a byte range when it merges the object into the final image. By the time the linker is done, every linker relocation has been resolved, and the placeholder bytes hold real values. Linker relocations don't survive into the .exe.

There's a second, separate thing also called "relocations": base relocations, which live in the .reloc section of the finished .exe and exist to handle load-time address fixups when the operating system places the image at a different base address than the linker expected (which, under ASLR, is the normal case rather than the exception). We'll cover base relocations properly in Part 4. The thing to remember now is that they're a different mechanism applied to different bytes for a different reason.

And the call helper example we just walked through doesn't generate a base relocation in the final .exe at all. The 32-bit displacement the linker wrote is PC-relative — it encodes the distance from the call instruction to the target. When ASLR moves the whole image to a different base, the instruction and the target move together by the same amount, so the relative distance between them is unchanged. No fixup needed at load time. Base relocations exist for the references that are sensitive to where the image lands: absolute addresses baked into the code, pointer constants stored in data, and so on. PC-relative calls and jumps within the image are exactly the kind of reference that doesn't need them.

Here is what the broad structure of a COFF object file looks like, alongside the PE executable it will eventually contribute to. The two formats are deliberately similar — PE is a direct descendant of COFF, with extra headers bolted on the front to make it loadable.

The structure on the left is what one .obj looks like. The structure on the right is what the linker produces after combining several .obj files together. The symbol table and per-section relocation list don't survive into the final executable, because they've already done their job by then. (A small relocation section, .reloc, does survive — but it serves a different purpose, which we'll cover in Part 4.)

The Microsoft Portable Executable and COFF specification — currently maintained at learn.microsoft.com/en-us/windows/win32/debug/pe-format, last revised in July 2025 — is the authoritative reference for both formats. It runs about a hundred pages. We're not going to read it cover to cover. We're going to read the parts that matter, in the order they matter.

Symbols, and why names get mangled

We've talked about the symbol table as if symbol names were a transparent mapping from your source code: write main in C, get a symbol named main in the object file. For C, that mapping really is almost trivial. For C++, it gets weird quickly, and the weirdness is worth a section of its own — because if you ever read the symbol table of a real Windows binary with a debugger or a hex viewer, the names you'll see don't look anything like what you wrote.

For C code, the mapping is pleasantly readable. A function called main in your source code shows up as the symbol main in the object file. A function foo shows up as the symbol foo, or, on 32-bit Windows under the default cdecl calling convention, as _foo. A calling convention is the agreement between caller and callee about how arguments are passed and who cleans up the stack; on Windows x86 it also determines how the symbol is decorated. cdecl is the standard C convention on Windows x86, and it prepends an underscore to C function names. Other Windows x86 conventions use different decorations: stdcall produces _foo@8 for a function taking eight bytes of arguments. For ordinary 64-bit Windows C/C++ code the default x64 ABI removes most of this legacy decoration, and the symbol is just foo. Specialty conventions like __vectorcall exist on x64 too, but you rarely meet them outside performance-sensitive numerics.

For C++ code, it isn't. The C++ language allows function overloading: you can have two functions called print that take different arguments. You can have functions inside namespaces, inside classes, inside templates instantiated with arbitrary type arguments. All of these need to coexist in a single symbol table where every entry must have a unique name. The solution is name mangling: the compiler encodes enough of the function's context — its namespace, its class, its parameter types, its calling convention, its template arguments, and on some compilers its return type — into the symbol name itself, using a deterministic scheme that produces unique strings.

The schemes differ by compiler, which is one of the reasons you can't mix C++ object files from MSVC and GCC. A function int foo(int, double) in a namespace ns might mangle to ?foo@ns@@YAHHN@Z under MSVC and _ZN2ns3fooEid under the Itanium ABI used by GCC and Clang. Both are well-defined, both are reversible — you can demangle them back to a human-readable signature — but they share nothing.
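
To show that the Itanium string really is just an encoded signature, here is a deliberately tiny sketch that builds the name for the simplest case only: a free function inside one namespace whose parameters are all builtin types. It is not a general mangler (no templates, classes, pointers, or substitutions), and mangle_itanium_simple is a made-up name. The parameter codes are the ABI's one-letter builtin codes, such as i for int and d for double.

```c
#include <stdio.h>
#include <string.h>

/* Compose _ZN <len><namespace> <len><name> E <parameter codes>.
   Returns snprintf's result: the length of the full mangled name. */
int mangle_itanium_simple(char *out, size_t out_size,
                          const char *ns, const char *name,
                          const char *param_codes)
{
    return snprintf(out, out_size, "_ZN%zu%s%zu%sE%s",
                    strlen(ns), ns, strlen(name), name, param_codes);
}
```

Called with "ns", "foo", and "id", this produces _ZN2ns3fooEid, the same string as above: each name is prefixed by its length, E closes the nested name, and id lists the int and double parameters.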

The names look unreadable. They are. That's the point: the mangled name carries enough information to distinguish every possible overload, namespace, and template instantiation in your program, and to do that it has to encode every type involved. Most people never look at mangled names directly. The tools that consume them — linkers, debuggers, profilers — read them as opaque strings and demangle on demand when showing them to humans.

Try it yourself

You can see the symbol table of any object file with dumpbin on Windows or nm on Linux. Compile a tiny C++ file with one overloaded function and look at what comes out:

# On Windows (Developer Command Prompt)
dumpbin /symbols hello.obj

# On Linux or MinGW
nm hello.o
nm --demangle hello.o      # show readable names

On Windows, MSVC ships undname.exe for demangling individual MSVC-style names. On Linux, c++filt does the same job for the Itanium ABI. Both are cheap demonstrations that the ugly strings are just signatures in disguise.

There's one escape hatch from mangling that you'll see in almost every piece of cross-language code: extern "C". When you declare a C++ function as extern "C", the compiler suppresses mangling and emits the symbol under its plain C name. This is the convention Windows system DLLs use for their public surface: exports like CreateFileW or VirtualAlloc appear under those plain C names regardless of how the DLL was implemented internally. That C-style boundary is what lets a Rust program, a Python extension, and a C++ application all call the same DLL without sharing a C++ name-mangling scheme. The cost is that you lose overloading, namespaces, and templates at the boundary. The benefit is that any language with a C-compatible foreign function interface, which covers practically every mainstream language, can call into your code. (DLLs can also export decorated C++ names or ordinals; public OS APIs just generally avoid making callers depend on a specific compiler's private naming.)
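
In practice the escape hatch usually appears as a guard in a shared header, so the same declarations work whether the file is compiled as C or as C++. The sketch below shows that common idiom; add_numbers is a hypothetical function standing in for whatever a library exports.

```c
/* Header-style declarations valid in both languages. Compiled as C++,
   the extern "C" wrapper switches off name mangling for what is inside;
   compiled as C, the #ifdef hides the wrapper entirely. */
#ifdef __cplusplus
extern "C" {
#endif

int add_numbers(int a, int b);   /* hypothetical exported function */

#ifdef __cplusplus
}
#endif

/* The definition picks up the linkage of the declaration above, so the
   object file's symbol is plain add_numbers (_add_numbers on 32-bit x86)
   in either language. */
int add_numbers(int a, int b)
{
    return a + b;
}
```

Because the declaration has C linkage, a second add_numbers overload with different parameter types would be rejected by a C++ compiler: that is the loss of overloading mentioned above.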

This constraint will matter directly in Part 4. When we walk through how a program imports functions from a DLL at runtime, we'll see that system DLLs such as kernel32.dll publish a flat list of unmangled, C-style names. That isn't a coincidence or a design oversight: it's the same C-style boundary at work, so that no caller has to agree with the DLL's authors on a name-mangling scheme.

Stage 4: The linker

The linker is the most underappreciated tool in the toolchain. It receives every object file the compiler produced for your program, decides what shape the final executable will take, and connects all the loose ends the compiler couldn't tie up on its own. It does its work in three broad phases: resolve symbols, combine sections, and patch the placeholders. We'll take them in order.

What the linker is working with

Before we get to the phases, it's worth being concrete about the linker's inputs, because part of the confusion around linking comes from not knowing where the linker looks for what.

The linker takes three kinds of input. First, the object files you just produced — one .obj per source file in your project. Second, any static libraries you've explicitly listed on the command line — these are .lib files on Windows, .a files on Linux, and they're essentially bundles of object files packaged into a single archive. Third, on Windows, any import libraries you've listed — also .lib files, confusingly using the same extension as static libraries, but containing something completely different. We'll get to import libraries shortly; for now, treat them as "instructions for the linker about which DLLs the final program needs to load at runtime."

The user (you, or your build system on your behalf) controls what ends up in this input set. When you run cl hello.c on Windows, the compiler driver implicitly hands the linker a default set of libraries — including the C runtime and import libraries for the most common Windows DLLs — so that simple programs link without you having to know anything about libraries. When you want to use something less common, you tell the linker by adding entries to the command line: /DEFAULTLIB:ws2_32.lib for Windows sockets, -lpthread for POSIX threads, and so on. The linker doesn't go searching the system for missing symbols; it only looks in the inputs you gave it.

Phase 1: Resolving symbols

Each object file the linker receives lists two kinds of symbols, in the sense we defined earlier: definitions (symbols the file provides — the code or initial data is here, at a known offset within one of the file's own sections) and references (symbols the file uses but does not provide). The linker's first job is to match every reference, in every input, to a definition somewhere.

It walks the inputs in order. For each reference it finds, it searches the definitions across all the object files and libraries until it finds a match. The matching is by exact symbol name — the same mangled string that appears in both the reference and the definition. There's a subtlety for libraries: when the linker scans a static library, it doesn't pull in every object file inside the archive. It pulls in only the object files that contain definitions of symbols still needed by the current resolution state, plus their transitive dependencies. This is why linking against a huge C runtime library doesn't make your tiny hello.exe bloat with thousands of unused functions — only the ones you actually depend on come along.

If every reference finds a definition, the linker moves on. If a reference can't be matched anywhere in the inputs, the linker emits the error that every C and C++ programmer has seen at least once: unresolved external symbol in MSVC's wording (reported as LNK2019 or LNK2001), undefined reference in GNU ld's. The cause is almost always one of three things: you forgot to add a source file to the build, you forgot to link against a library that provides the symbol, or you mistyped the function's name or signature so the symbol you're referencing doesn't match the one being defined.
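
The matching rule is simple enough to model directly. The sketch below is a toy, not how link.exe or ld is implemented: the struct object_file type and both function names are invented for illustration, static libraries are left out, and every input is just a pair of NULL-terminated name lists.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, minimal model of one linker input. */
struct object_file {
    const char        *name;
    const char *const *defined;     /* NULL-terminated list of names */
    const char *const *referenced;  /* NULL-terminated list of names */
};

/* Return the input that defines `symbol` (exact string match), or NULL. */
const struct object_file *find_definition(const struct object_file *inputs,
                                          size_t count, const char *symbol)
{
    for (size_t i = 0; i < count; i++)
        for (const char *const *d = inputs[i].defined; *d != NULL; d++)
            if (strcmp(*d, symbol) == 0)
                return &inputs[i];
    return NULL;
}

/* Return the first referenced name that no input defines, or NULL if
   every reference resolves. A real linker would report an error here. */
const char *first_unresolved(const struct object_file *inputs, size_t count)
{
    for (size_t i = 0; i < count; i++)
        for (const char *const *r = inputs[i].referenced; *r != NULL; r++)
            if (find_definition(inputs, count, *r) == NULL)
                return *r;
    return NULL;
}
```

Given a main.obj that defines main and references helper and printf, and a helper.obj that defines helper, this reports printf as unresolved until some input that provides it is added to the set. The comparison is strcmp on the whole string, which is why a mangled name with one wrong parameter type fails to match.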

Here's what symbol resolution looks like for a small program with two source files and a call to printf:

The diagram shows two of the three resolution paths a reference can take. help_user is defined in a sibling object file (helper.obj); the linker matches the reference to the definition, and the code for help_user gets physically pulled into the final executable. printf is different: it's not defined in any object file we have, but the import library msvcrt.lib advertises it as a function that msvcrt.dll will provide at runtime. The linker records that promise in the executable's import table and moves on. The code for printf is never copied — only the metadata that says "find this at runtime."

There's a third path we haven't shown: a reference resolved by pulling in an object file from a static library. That path looks exactly like resolving against a sibling .obj — the code gets copied physically into the executable — except the object file came from inside an archive instead of being passed on the command line directly.

Phase 2: Combining sections

Every object file has its own .text, its own .rdata, possibly its own .data. The linker's second job is to concatenate same-named sections from all the inputs into single combined sections in the output. All the .text sections become one big .text. All the .rdata sections become one big .rdata. The order is determined partly by the linker's defaults and partly by directives in the object files themselves, but the result is one section per name in the final executable.

This is where the layout of the final binary gets decided. The linker chooses how much space each section gets, where each section begins relative to the others, and — critically — what offsets within those sections each symbol ends up at. By the time this phase finishes, every symbol that ended up in the final image has a fixed location: main is at offset such-and-such inside the combined .text, the string "hello\n" is at offset such-and-such inside the combined .rdata, and so on. The compiler couldn't have known these offsets when it emitted its assembly; the linker invents them.
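As a rough illustration of how concatenation fixes every symbol's location, the following Python sketch merges same-named sections and records where each symbol lands. The section contents, the 16-byte alignment, and the zero padding are assumptions chosen for readability; real linkers apply per-section alignment and ordering rules that this toy ignores.

```python
def combine(objects, align=16):
    """Concatenate same-named sections; return (merged, symbols)."""
    merged = {}    # section name -> bytearray of combined contents
    symbols = {}   # symbol name -> (section name, offset in combined section)
    for obj in objects:
        for sec_name, sec in obj["sections"].items():
            out = merged.setdefault(sec_name, bytearray())
            out.extend(b"\x00" * (-len(out) % align))   # pad up to the alignment
            base = len(out)                             # start of this contribution
            out.extend(sec["data"])
            for sym, local_offset in sec["symbols"].items():
                # The object file only knew local_offset; the linker adds base.
                symbols[sym] = (sec_name, base + local_offset)
    return merged, symbols


# Hypothetical inputs: the byte values are stand-ins, not real compiled code.
main_obj = {"sections": {
    ".text": {"data": b"\x90" * 20, "symbols": {"main": 0}},
    ".rdata": {"data": b"hello\n\x00", "symbols": {"msg": 0}},
}}
helper_obj = {"sections": {
    ".text": {"data": b"\xc3" * 5, "symbols": {"help_user": 0}},
}}

merged, symbols = combine([main_obj, helper_obj])
print(symbols["help_user"])   # ('.text', 32)
```

In its own object file help_user sat at offset 0 of .text; after merging it sits at offset 32, a number that did not exist until the linker laid out the combined section.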

Phase 3: Patching the placeholders

This is the phase the placeholder diagram from earlier was showing in miniature. Now that the linker has merged every section in Phase 2 and knows exactly where every symbol sits within the merged image, it walks through every relocation entry in every input object file and writes the right numeric value into every placeholder. The call helper example we drew was one such operation; lea rcx, [rip + .L.str] is another. For that one, the placeholder gets the byte distance from the end of the instruction (in the merged .text) to the string (in the merged .rdata). For the thousands of similar relocations across a real program (every call, every absolute pointer to a global, every reference to a string literal) it's the same operation repeated.

The values the linker writes into those placeholders come in two kinds. The first kind is an offset: the byte distance between two locations within the image. The PC-relative 0x47B we saw earlier (in panel 3 of the placeholder diagram) is an example: it's the distance from one instruction to another, measured in bytes, with no notion of "where in memory" attached. Offsets work no matter where the operating system eventually loads the image, because they describe relationships between things that are inside the same binary, and those things always move together. (Offsets are usually measured from a defined anchor: the end of the current instruction, the start of the merged section the symbol lives in, or the image base. The linker and the CPU agree on which anchor for each relocation type.)
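The arithmetic behind a PC-relative patch is small enough to sketch. The Python below assumes the simplest case, a 32-bit displacement that is the last four bytes of its instruction, so the anchor is the end of that field. The offsets 0x10 and 0x490 are hypothetical, chosen so the result reproduces the 0x47B value mentioned above.

```python
import struct


def patch_rel32(code, field_offset, target_offset):
    """Write a 32-bit PC-relative displacement into code at field_offset."""
    next_instruction = field_offset + 4           # anchor: end of the 4-byte field
    displacement = target_offset - next_instruction
    struct.pack_into("<i", code, field_offset, displacement)  # signed, little-endian
    return displacement


image = bytearray(0x500)
image[0x10] = 0xE8                         # opcode of a near call; rel32 follows it
forward = patch_rel32(image, 0x11, 0x490)  # call a function later in the image
backward = patch_rel32(image, 0x21, 0x00)  # a reference to something earlier
print(hex(forward), backward)              # 0x47b -37
```

Nothing in the calculation mentions a load address. Adding the same base to both target_offset and next_instruction leaves the difference unchanged, which is why this kind of value survives the image being loaded anywhere.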

The second kind is a placeholder absolute address: an actual memory address, computed by adding the symbol's offset within the image to the image's preferred load address (recorded in the executable's headers). If the image happens to load at exactly that preferred address — which it often won't, for security reasons we'll see in Part 4 — the address is correct as written. If it loads somewhere else, the loader has to patch the absolute address again at runtime. Modern compilers prefer the first kind whenever possible, because position-independent code costs nothing at load time. But absolute addresses still appear in some places, particularly when a 64-bit absolute pointer needs to be stored in a data table.
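The second kind of value can be sketched the same way. The Python below is a toy model, not the loader's real algorithm: the field offset, the target offset, and the actual load address are hypothetical, and the list of fixup locations stands in for the relocation data a real image carries. The preferred base 0x140000000 is the usual MSVC default for 64-bit executables.

```python
import struct

PREFERRED_BASE = 0x140000000   # usual MSVC default for 64-bit executables
MASK64 = 0xFFFFFFFFFFFFFFFF


def write_abs64(image, field_offset, target_offset, base=PREFERRED_BASE):
    """Linker side: store base + offset as a 64-bit absolute address."""
    struct.pack_into("<Q", image, field_offset, base + target_offset)


def rebase(image, fixup_offsets, preferred_base, actual_base):
    """Loader side: shift every recorded absolute address by the load delta."""
    delta = actual_base - preferred_base
    for off in fixup_offsets:
        (value,) = struct.unpack_from("<Q", image, off)
        struct.pack_into("<Q", image, off, (value + delta) & MASK64)


image = bytearray(0x4000)
write_abs64(image, 0x3000, 0x1020)            # a pointer stored in a data table
(before,) = struct.unpack_from("<Q", image, 0x3000)

actual_base = 0x7FF6A0000000                  # hypothetical address chosen at load time
rebase(image, [0x3000], PREFERRED_BASE, actual_base)
(after,) = struct.unpack_from("<Q", image, 0x3000)
print(hex(before), hex(after))                # 0x140001020 0x7ff6a0001020
```

The contrast with the PC-relative case is the fixup list: an absolute address is only correct at one base, so the image has to remember where each one is stored.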

We'll spend half of Part 2 disentangling these two kinds of value and the file-format machinery around them. For now, the takeaway is just: when Phase 3 fills in a placeholder, it's writing either a fixed distance between two things in the same binary, or a guess at an absolute address that the loader may have to correct later.

That leaves one corner case the diagram couldn't show: what happens when the symbol being referenced isn't in the executable at all? That's the situation with printf.

What about printf?

This is where the import library does its real work. When Phase 1 resolved the printf reference against msvcrt.lib, the linker didn't have printf's actual machine code to merge in. What it had was an import record: a few bytes of metadata describing where printf will come from at runtime. The linker reserves a slot for printf in a special table inside the executable, called the Import Address Table (IAT), and routes every call printf in your code through that slot, either as an indirect call that reads its target address out of the slot or as a call to a tiny thunk that jumps through it. Then it records, in the executable's import table (which the headers point to), a request for the loader: "when this program loads, find msvcrt.dll, look up printf inside it, and write its address into the IAT slot."

So the executable, when it lands on disk, contains: machine code for main and help_user, the string "hello\n", an empty IAT slot reserved for printf, and a note in the headers saying "fill that slot from msvcrt.dll at runtime." None of those bytes are printf's actual code. printf's code lives in msvcrt.dll on the user's machine, exactly where it has always lived, and it will continue to live there long after your executable is forgotten.
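The division of labour can be modelled in a few lines of Python. This is a toy under loud assumptions: the IAT is a plain list, a DLL is a dictionary of Python functions, and fake_printf is a hypothetical stand-in that only returns a character count. Nothing here reflects the real on-disk encoding, which is Part 2's subject.

```python
def link_imports(references, import_libs):
    """Linker side: reserve one IAT slot per symbol an import library provides."""
    imports = []                            # (dll name, symbol name), one per slot
    for sym in references:
        for dll, exported in import_libs.items():
            if sym in exported:
                imports.append((dll, sym))
                break
    return {"imports": imports, "iat": [None] * len(imports)}


def load_imports(exe, installed_dlls):
    """Loader side: look each symbol up in its DLL and fill the slot."""
    for slot, (dll, sym) in enumerate(exe["imports"]):
        exe["iat"][slot] = installed_dlls[dll][sym]   # KeyError if DLL or symbol is missing


def call_import(exe, slot, *args):
    """What the program's code does: an indirect call through the slot."""
    return exe["iat"][slot](*args)


def fake_printf(text):
    """Hypothetical stand-in for the DLL's code; returns the character count."""
    return len(text)


exe = link_imports(["printf"], {"msvcrt.dll": {"printf", "malloc"}})
slots_on_disk = list(exe["iat"])            # [None]: reserved but not yet filled

load_imports(exe, {"msvcrt.dll": {"printf": fake_printf}})
print(call_import(exe, 0, "hello\n"))       # 6
```

The executable dictionary never holds fake_printf until load_imports runs, which mirrors the point above: the file on disk carries the request, not the code.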

This is the seam between linking and loading, and it's the seam we'll open up in Part 4. The linker can't reach across the process-and-DLL boundary to resolve external addresses, so it documents the gap in the executable's own headers and trusts the loader to close it.

Static and dynamic linking

Everything in the previous section described how the linker handles one specific external symbol, printf, by treating it as a runtime import. But that's only one of two options. The choice between them is the single biggest architectural decision in the link step, and you control it: partly through your code and mostly through which libraries you tell the linker to use.

Static linking pulls a library's code physically into your executable. If you link statically against a math library, the compiled bytes of every math function you use end up as part of your .exe. Your executable is self-contained — you can copy it to another machine and run it, and it doesn't need that library to exist on the target system, because your binary contains the library. The cost is size, duplication, and version lock-in: every program that links the same library carries its own copy, and updating the library means rebuilding every dependent program.

Dynamic linking does the opposite. The library lives in a separate file — a .dll on Windows, a .so on Linux — and your executable only carries a reference to it, exactly the way we described for printf above. The reference says, in effect: "at runtime, load msvcrt.dll, find the function printf, and put its address in the IAT slot I've reserved." The actual code of printf is not in your binary. It exists once on the system, and the operating system can share its read-only code pages between every process that imports it (each process still has its own private IAT and its own writable data pages for the DLL — but the actual machine instructions are shared physical memory). The cost is fragility: if the DLL is missing, the wrong version, or has been replaced by something malicious, your program loads something other than what you intended.

You choose between the two by deciding which library to link against. The C runtime is the canonical example. We've been writing msvcrt.dll as the dynamic C runtime throughout this part, and you'll see it in real binaries constantly: it's been the legacy CRT shipped with Windows for decades, it's still present on every modern Windows install, and MinGW builds that target it and some older MSVC builds link against it. But it's worth flagging that modern MSVC (Visual Studio 2015 and later) splits the runtime differently: the C standard library lives in ucrtbase.dll (the Universal CRT, now a Windows system component), reached through the ucrt.lib import library, with compiler-specific support in vcruntime{version}.dll via vcruntime.lib. The pedagogical story is the same: your build picks an import library, the linker records dependencies on a DLL, and the loader fills in the IAT. A malware analyst or reverse engineer looking at typical Windows binaries will see both msvcrt.dll (in older or MinGW-built binaries) and ucrtbase.dll (in modern MSVC-built binaries). We'll keep using msvcrt.dll as our running shorthand for the dynamic C runtime; just know that the modern MSVC name is ucrtbase.dll.

On the static side, the static C runtime is a library file like libcmt.lib (or libucrt.lib with modern MSVC) containing the actual code of printf, malloc, and friends. The default depends on the toolchain: GCC and Clang on Linux link the C library dynamically unless told otherwise, Visual Studio project templates select the dynamic CRT (/MD), and a bare cl command line selects the static one (/MT). Either way, a single flag flips the switch (/MT versus /MD on MSVC). Many other libraries ship in both forms and let the user pick. Some libraries, most notably the Windows APIs themselves in kernel32.dll, user32.dll, and ntdll.dll, are dynamic-only; there's no static version, and you reach them through their import libraries (kernel32.lib, user32.lib, and so on) at build time.

Import libraries: the file that confuses everyone

Since import libraries have appeared several times in this part already, they deserve a section of their own.

An import library on Windows is a .lib file that looks superficially identical to a static library — same file extension, same archive format inside, same way of being passed to the linker. The confusion is built into the file naming convention. If you see foo.lib on disk, you genuinely cannot tell whether it's a static library full of compiled code or an import library full of forwarding stubs without inspecting its contents. (dumpbin /headers foo.lib on Windows will reveal which kind it is.)

The difference is in what they contain. A static library is a bundle of complete object files: same format as your own .obj files, with real machine code in their .text sections. When the linker resolves a symbol against a static library, it copies the matching object file's contents into the executable, and the code becomes part of your binary. An import library, by contrast, contains no implementation of the DLL's functions. It contains import records, one per exported symbol of the DLL it represents, plus, depending on configuration, small thunks that route calls through the Import Address Table. Each import record says, in effect: "I can satisfy the linker's request for the symbol printf, but what I provide is only a pointer to msvcrt.dll!printf that the loader will resolve at runtime."
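One way to see the difference is to look at the first bytes of an archive member. The Python sketch below is a simplification, not a replacement for dumpbin: it assumes you have already extracted one member from the archive, and it recognises only the short import record layout (a 20-byte header starting with the 16-bit values 0x0000 and 0xFFFF, followed by the symbol name and DLL name as null-terminated strings). Some special object formats share that four-byte prefix, so a real tool checks more fields. The sample bytes are synthetic, built by the script itself.

```python
import struct

IMPORT_HEADER = struct.Struct("<HHHHIIHH")   # 20-byte short import header
MACHINE_AMD64 = 0x8664


def parse_short_import(member):
    """Return (symbol, dll) if member looks like a short import record, else None."""
    if len(member) < IMPORT_HEADER.size:
        return None
    sig1, sig2, _ver, _machine, _time, size_of_data, _hint, _flags = \
        IMPORT_HEADER.unpack_from(member)
    if (sig1, sig2) != (0x0000, 0xFFFF):
        return None            # an ordinary object starts with a real machine type
    strings = member[IMPORT_HEADER.size:IMPORT_HEADER.size + size_of_data]
    parts = strings.split(b"\x00")
    if len(parts) < 2:
        return None
    return parts[0].decode("ascii"), parts[1].decode("ascii")


# Synthetic import record: header followed by "printf\0msvcrt.dll\0".
names = b"printf\x00msvcrt.dll\x00"
import_member = IMPORT_HEADER.pack(0x0000, 0xFFFF, 0, MACHINE_AMD64,
                                   0, len(names), 0, 0) + names

# Synthetic ordinary object: its first field is the machine type, then filler.
object_member = struct.pack("<H", MACHINE_AMD64) + bytes(18)

print(parse_short_import(import_member))   # ('printf', 'msvcrt.dll')
print(parse_short_import(object_member))   # None
```

The import record names a symbol and a DLL and carries no code at all, which is the whole distinction in miniature.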

So when you link against kernel32.lib, you are not pulling kernel32's code into your binary — kernel32.lib doesn't contain that code. You are telling the linker to record imports on kernel32.dll for every function you reference, and to wire your code through the IAT so those references will be resolved when the program loads. The compiler-and-linker side of the contract is satisfied at build time; the loader-and-DLL side will be satisfied at runtime, by mechanisms we'll cover in Part 4.

Almost every Windows program is a hybrid. The C runtime can be statically or dynamically linked, depending on your build flags. The Windows APIs themselves are always dynamically linked. You don't usually think about any of this when you compile, because the toolchain handles it automatically. But the executable you produce carries the consequences in its headers: a list of every DLL it expects to find, and a list of every function it expects to import from each one.

That list is the bridge between Part 1 and the rest of the series. The PE file format we'll dissect in Part 2 is, in large part, a structured way of encoding the contract between your program and its dynamic dependencies. The loader's job, which we'll cover in Part 4, is to honor that contract — find each DLL, map it into the process, look up the requested functions, and fill in the IAT slots the linker left empty.

What we have at the end of Part 1

You started with source code. Four tools later, you have a file on disk. That file contains, at minimum: the machine instructions the compiler generated, the data and string literals your code referenced, a small header describing what kind of file this is and what CPU it's for, a list of dynamic dependencies it expects the loader to satisfy, and a description of how its internal structure should be laid out in memory when the time comes to run it.

None of those bytes will execute, ever, in the form they currently exist on disk. The CPU does not run files; the CPU runs instructions at memory addresses. Between the bytes-on-disk state and the running-in-memory state stands a translation layer — the Windows loader — that knows how to read that file, map its sections into the right kind of memory, connect the imports to their providers, and finally hand control to your code.

To understand the loader, we need to understand what it's reading. So that's Part 2: the structure of the file itself.