Anatomy of a PE File — PE Loading, Part 2

Headers, sections, data directories, and the three different kinds of "offset" the format uses side by side. We open a real binary in a hex editor and read it byte by byte.

In Part 1 we traced a program from source code to bytes — through the preprocessor, the compiler, the assembler, and finally the linker, which produced an executable file. That file contained sections, a symbol table that was thrown away, relocations that had been patched, imports recorded as future dependencies, and an image-base value the loader could ignore at runtime. Everything we built up to was a sketch of what comes out the other end of the toolchain.

This part fills in the sketch. We're going to open an actual .exe in a hex editor and walk through it byte by byte, structure by structure, until you can point at any region of a real PE file and say what it is. The Portable Executable format is not complicated — it has about a dozen relevant structures and a hundred or so fields total — but it does have an unusual feature that trips up almost everyone: it uses three different coordinate systems to describe where things are, and it switches between them mid-structure. Getting comfortable with those three coordinate systems is the only conceptual leap in this part. The rest is just reading.

The binary we're going to read is the same hello.c program from Part 1, compiled with MinGW's x86_64-w64-mingw32-gcc and stripped of debug information. That gives us a tidy 39 KB Windows executable with ten sections — small enough to fit on screen, large enough to contain everything we want to see. Every hex value and field shown in this post comes from that real binary; you can reproduce it yourself with the commands in the callout at the end.

One small note before we dive in. This post is about file structures, but a few of them only make full sense in the context of a running program. Terms like process, virtual address, and memory will show up below — if any of them feel under-defined here, Part 3 pins them down properly. For now, it's enough to think of a process as one running program with its own private view of memory.

Three kinds of "where"

Before we look at any structure in the file, we have to nail down the source of confusion that ambushes almost everyone learning PE internals. There are three different ways to express the location of something in a PE — three coordinate systems — and the format uses all three, sometimes within a single structure. They aren't interchangeable; converting between them requires information that isn't always at hand. If you don't keep them straight, you will read a hex dump and end up at the wrong byte.

The three coordinates are file offset, RVA, and virtual address. Let's define them concretely.

A file offset is a byte position from the start of the PE file on disk. "Go to byte 0x3C" means open the file in a hex editor, scroll to position 0x3C, and start reading there. File offsets are absolute within the file and count from zero: offset 0 is the first byte, offset 0x80 is the 129th byte, and so on. File offsets work whether the program is loaded into memory or not, because they're just positions within an on-disk file. They are the coordinate system of disk tools: hex editors, file readers, the linker writing the output.

An RVA — Relative Virtual Address — is an offset from the start of the PE image once it has been loaded into memory. "RVA 0x1410" means "the byte that ends up 0x1410 bytes from wherever Windows decided to place the image when it loaded it." RVAs are how the PE format expresses locations inside the running image without committing to where the image will actually live in memory. The same RVA is valid every time the program runs, regardless of where the loader places it that day.

A virtual address (VA) is the actual memory address inside the running process — a real number the CPU can dereference. You compute it from an RVA the moment you know where the image was loaded: VA = ImageBase + RVA. If the image happened to load at 0x140000000 (the standard preferred base for 64-bit executables) and the RVA is 0x1410, the VA is 0x140001410. The CPU sees and uses VAs; the file format mostly hides them, because the file is written long before anyone knows what they'll be.
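The arithmetic is small enough to do in your head, but a sketch in Python makes the direction of the conversion explicit. The name rva_to_va is ours for illustration, not part of any Windows API, and the numbers are the ones from the paragraph above.

```python
def rva_to_va(image_base: int, rva: int) -> int:
    """Turn an image-relative offset into an absolute address.

    Hypothetical helper: image_base is wherever the loader actually
    placed the image, which may differ from the preferred base that
    is stored in the file.
    """
    return image_base + rva


# Standard 64-bit EXE base and the RVA used in the text above.
va = rva_to_va(0x140000000, 0x1410)
print(hex(va))  # 0x140001410
```

Going the other way (VA back to RVA) is the same subtraction, which is why a debugger that knows the load address can always report image-relative offsets.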

Why does the format need all three? Because of timing. The loader's life is divided into three phases — before it has mapped the file, during the mapping, and after — and each phase has access to different information.

Before mapping, the loader is just reading bytes off disk. It hasn't allocated any memory for the image yet, doesn't know where the image will land, hasn't even decided how big the allocation needs to be. The only coordinate system it can use is file offsets. RVAs would be meaningless: there is no image-in-memory for them to be relative to.

During mapping, the loader is conceptually reading sections from the file and arranging them in newly allocated memory. This is the moment where both coordinate systems matter at once — it needs to know which bytes correspond to the file (file offset) and where they end up in the new image (RVA). The section headers, which the loader reads at this phase, contain exactly these two fields side by side. We'll see them shortly. (Part 3 shows the actual mechanism: Windows doesn't eagerly copy each section's bytes into freshly allocated memory; it installs an image mapping and lets demand paging bring the bytes in when the program touches them. The conceptual "read from file, write to memory" model is fine here — it tells you what the resulting image contains, just not exactly when the bytes move.)
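Having both numbers side by side is also what lets a disk tool translate an RVA back into a file offset. What follows is a minimal sketch in Python, not the loader's actual code. The Section tuple and rva_to_file_offset are hypothetical names, and the single .text entry uses assumed values for illustration (section RVA 0x1000 and raw size 0x6C00 in particular are guesses, not fields we have read yet).

```python
from typing import NamedTuple, Optional


class Section(NamedTuple):
    name: str
    virtual_address: int      # RVA where the section starts in memory
    virtual_size: int         # bytes the section occupies in memory
    pointer_to_raw_data: int  # file offset where its bytes start on disk
    size_of_raw_data: int     # bytes the section occupies on disk


def rva_to_file_offset(rva: int, sections: list[Section]) -> Optional[int]:
    """Return the file offset backing an RVA, or None if no section's
    on-disk bytes back it."""
    for s in sections:
        delta = rva - s.virtual_address
        if 0 <= delta < s.virtual_size:
            if delta < s.size_of_raw_data:
                return s.pointer_to_raw_data + delta
            return None  # memory-only tail of the section (zero fill)
    return None


# Assumed layout for one section; the values are illustrative.
text = Section(".text", 0x1000, 0x6B68, 0x400, 0x6C00)
print(hex(rva_to_file_offset(0x1410, [text])))  # 0x810
```

The conversion only works once the section table is in hand, which is exactly the "information that isn't always at hand" mentioned earlier.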

After mapping, the image exists in memory. Now everything is described in RVAs, because the entire layout has been built and locations relative to ImageBase are well-defined. The entry point, the import table, the export table, the relocation table — all of these are expressed as RVAs in the file's headers, because they describe locations within the mapped image. The loader reads these RVAs out of the file early (before mapping), but it doesn't use them — doesn't dereference them, doesn't follow them — until after the image is in memory.

Some PE structures look like they break this rule. The entry point's RVA, for instance, is stored in the Optional Header, which the loader reads before mapping. How can a pre-mapping structure contain a post-mapping coordinate? The trick is that the loader doesn't use the RVA at the moment it reads it. It stores the value as a number, finishes mapping, and only then computes ImageBase + RVA to find the actual entry point. The same trick applies to every data directory: read early, dereferenced late.

The rule is simple: any pointer the loader must follow before mapping has to be a file offset; any pointer that describes a location in the mapped image is an RVA. The format follows this rule consistently, with exactly one exception we'll meet when we get to data directories (the Security Directory, which points at digital signature data that doesn't get mapped into memory at all). Other than that, the rule is reliable.

A real PE, top to bottom

Before we read the bytes, here's the lay of the land. A PE file on disk is laid out in a fixed sequence: a short MS-DOS-era header at the very start, then a small DOS program (the "DOS stub"), then a four-byte signature, then a COFF File Header, then an "Optional" Header, then a table of section headers, then the sections themselves — in the order the linker arranged them. The boundaries between these regions are not negotiable. Every field that follows tells the loader, in effect, "the next thing is exactly this many bytes ahead."

The binary we'll be reading throughout this post is the stripped hello.exe we built with MinGW. It's 39,424 bytes total and contains ten sections: .text, .data, .rdata, .pdata, .xdata, .bss, .idata, .CRT, .tls, and .reloc. You met four of these in Part 1: .text (code), .rdata (read-only data), .data (writable initialized data), and .reloc (relocation fixups). The other six are runtime-support sections we'll touch on as they become relevant. The point of looking at a real binary, rather than an idealized two-section diagram, is that it shows you the actual texture of a Windows executable: real PEs routinely carry anywhere from half a dozen to fifteen sections, not two or three.

Here's the structural map. Each region is a contiguous run of bytes; the file is read sequentially from top to bottom.

The DOS header and stub: vestigial, but mandatory

Open the file at byte zero. The first 64 bytes are the DOS Header, an MS-DOS-era structure that has been preserved at the front of every Windows executable for over thirty years for one reason: backward compatibility with a 1981 operating system that almost nobody actually runs anymore.

The DOS Header is defined as a structure called IMAGE_DOS_HEADER in Windows headers. Its nineteen members (thirty 16-bit words followed by one 4-byte field) record things like the size of the original MS-DOS program in 512-byte pages, the initial values for the segment registers, the relocation table offset, and so on. Most of them hold zeros or boilerplate values in a modern PE, and the loader ignores them. Only two fields still matter on modern Windows, and they're the two we'll focus on.

The first is e_magic, the very first two bytes of the file. Its value is fixed at 0x5A4D, which is "MZ" in ASCII — the initials of Mark Zbikowski, an MS-DOS developer who designed the original executable format. You can see it plainly at the top-left of the hex dump: 4D 5A followed by the rest of the structure. The Windows loader looks at these two bytes and refuses to load anything that doesn't start with them. Every .exe, every .dll, every .sys driver on Windows starts with MZ.

The second field that matters is e_lfanew, a 4-byte field at offset 0x3C. You can read it directly from the dump: at offset 0x3C the bytes are 80 00 00 00, which as a little-endian 32-bit integer is 0x00000080. This is a file offset — the byte position where the modern PE structure begins. The loader's logic for finding the PE header is, almost literally, "read the 4 bytes at offset 0x3C, jump there, and start reading the PE signature." That tiny pointer is the bridge from the DOS-era format to the modern Windows format.

Why is e_lfanew at exactly 0x3C? Because back when this format was designed, that location was reserved space in the original MS-DOS executable header — a place where four bytes could be added without breaking compatibility with existing DOS tools. The PE format hijacked that slot to store a forwarding pointer.

Why is it a file offset rather than an RVA? Because of the timing rule from the previous section. The loader reads e_lfanew at the very start of the loading process, when it has just opened the file and hasn't mapped anything into memory yet. There is no image-in-memory for an RVA to be relative to. Worse: the loader doesn't yet know the image base (which is stored inside the Optional Header, which is what e_lfanew is helping us find). The logic would be circular — to follow an RVA, the loader would need information it can only obtain by following e_lfanew. A file offset breaks the circularity.
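The loader's first two reads can be sketched in a few lines of Python. This illustrates the logic described above and is not what the Windows loader actually runs. The name find_pe_header is hypothetical, and the buffer is synthetic: a zeroed header with only the fields we care about filled in, using the values from our binary.

```python
import struct


def find_pe_header(data: bytes) -> int:
    """Return the file offset of the PE signature, following e_lfanew."""
    if len(data) < 0x40:
        raise ValueError("too short to hold a DOS header")
    (e_magic,) = struct.unpack_from("<H", data, 0x00)
    if e_magic != 0x5A4D:  # "MZ" read as a little-endian 16-bit value
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)  # a file offset
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    return e_lfanew


# Synthetic stand-in for the first 0x84 bytes of hello.exe.
buf = bytearray(0x84)
buf[0x00:0x02] = b"MZ"
struct.pack_into("<I", buf, 0x3C, 0x80)
buf[0x80:0x84] = b"PE\0\0"

print(hex(find_pe_header(bytes(buf))))  # 0x80
```

Every position in this function is a plain index into the file's bytes. No image base appears anywhere, which is the timing rule in miniature.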

Between the DOS Header and the PE structure that e_lfanew points at, there's a small region called the DOS Stub. In our binary it is 64 bytes long, running from offset 0x40 up to 0x80. It's not a "header" in any sense. It's an actual MS-DOS program. Here are its bytes, starting at offset 0x40:

The first fourteen bytes are real x86-16 machine code. Disassembled, they read: push the code segment, pop it into the data segment, load DX with the offset of the string (14 bytes into the stub, right after the code), call the MS-DOS print-string service (interrupt 21h, function 9), then call the exit service (interrupt 21h, function 4Ch). The remaining bytes are the ASCII string the program prints, terminated with $ as the MS-DOS string convention requires. If you took just this part of the file and ran it on real DOS, it would print "This program cannot be run in DOS mode." and exit cleanly. That's the whole point of the stub: a courtesy message to anyone who tries to run a Windows executable on MS-DOS.

The Windows loader does not read or execute the DOS Stub. It jumps over it entirely using e_lfanew. The stub exists purely as a vestigial limb — useful in 1993, harmless today.

Modern toolchains sometimes hide useful information in the DOS Stub region, in the area between the end of the stub code and the start of the PE structure. The Microsoft linker writes a "Rich header" there: a small undocumented blob containing version IDs of the Microsoft toolchain components used to build the binary (cl.exe, link.exe, ml.exe, etc.). The Rich header isn't part of the official PE specification, and binaries from non-Microsoft toolchains like MinGW (including the one we're looking at) don't have one. But for MSVC-built binaries, which account for most native Windows software you'll encounter, malware analysts read it routinely because it can fingerprint the exact build environment. We won't go further into it here, but it's worth knowing that the DOS-stub region isn't quite as empty as the official spec implies.

The PE signature and the COFF File Header

Following e_lfanew takes us to file offset 0x80. Here begins the modern part of the format. The first thing we encounter is a four-byte signature, and the structure of what comes next will be familiar from Part 1.

Recall from Part 1 that the linker's output is a COFF-style file. PE is, in Microsoft's own framing, "COFF plus extra headers bolted onto the front so the operating system can load it." We've just walked past those extra headers — the DOS bits and the four-byte signature — and we're about to land on the COFF File Header itself. Once we get into the section table, the structures are the same ones we discussed at the byte level for object files in Part 1.

The first four bytes — 50 45 00 00 — spell "PE\0\0" in ASCII. This is the PE signature: the moment in the file where the loader has officially crossed the boundary from "this might just be an MS-DOS executable" to "this is a Portable Executable." If these four bytes aren't here exactly as expected, the loader rejects the file. That's the entire purpose of the signature: a sanity check at a known location.

The remaining 20 bytes are the COFF File Header, a structure called IMAGE_FILE_HEADER. It has exactly seven fields, all of them small, all of them read by the loader before mapping (though as we'll see, not all of them are still loader-relevant on modern Windows). Here's what each of those bytes encodes for our binary.

Machine (offset 0x84, 2 bytes). The bytes 64 86 read as the little-endian value 0x8664, which is IMAGE_FILE_MACHINE_AMD64 — x86-64. The loader uses this to refuse executables compiled for the wrong CPU; an ARM64 Windows machine running our x86-64 binary would either reject it or hand it to a binary translator. Other common values are 0x014C for 32-bit x86 and 0xAA64 for native ARM64.

NumberOfSections (offset 0x86, 2 bytes). 0A 00 reads as 0x000A = 10. There are ten section header entries following the Optional Header. The loader needs this count to know how many 40-byte section headers to read.

TimeDateStamp (offset 0x88, 4 bytes). 5F E1 15 6A reads as 0x6A15E15F = 1,779,818,847 seconds since the Unix epoch, which is Tuesday, May 26, 2026 at 18:07:27 UTC — the moment MinGW finished linking our binary. Historically this field was a literal build timestamp and analysts used it to correlate binaries to a build environment. That's no longer a reliable read on modern toolchains. MSVC's /Brepro switch, Rust's default behaviour, and Go's deterministic builds all overwrite this field with a hash of the binary's contents instead of a timestamp — the goal being reproducible builds, where compiling the same source twice yields byte-identical output. Many shipping Windows binaries today have TimeDateStamp values that look like timestamps from 2038 or 2106 (because hash bits happened to land in those ranges) and bear no relationship to when the linker actually ran. Treat this field as advisory at best; for malware analysis it's also frequently spoofed or zeroed outright.
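You can check the decoding yourself with Python's standard library. This is only a sketch of the byte-to-date conversion; it says nothing about whether a given binary's value is a real timestamp or a reproducible-build hash.

```python
import struct
from datetime import datetime, timezone

raw = bytes.fromhex("5F E1 15 6A")   # the four bytes at offset 0x88
(stamp,) = struct.unpack("<I", raw)  # little-endian unsigned 32-bit
when = datetime.fromtimestamp(stamp, tz=timezone.utc)

print(hex(stamp), stamp)  # 0x6a15e15f 1779818847
print(when.isoformat())   # 2026-05-26T18:07:27+00:00
```

Passing tz=timezone.utc matters: without it, fromtimestamp converts to the local time zone of whatever machine runs the script.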

PointerToSymbolTable (offset 0x8C, 4 bytes) and NumberOfSymbols (offset 0x90, 4 bytes). Both are 00 00 00 00. These fields are leftovers from the COFF object-file world we discussed in Part 1 — they pointed at the symbol table that traveled with the object file. The linker stripped the symbol table when it built the executable (it had served its purpose during linking), so both fields are zero. They are almost always zero in modern PE files; symbol information for debugging lives elsewhere now.

SizeOfOptionalHeader (offset 0x94, 2 bytes). F0 00 reads as 0x00F0 = 240 bytes. This tells the loader how many bytes the Optional Header occupies — important because the Optional Header's actual size depends on whether it's the PE32 or PE32+ variant, and the loader needs the exact count to know where the section header table starts.

Characteristics (offset 0x96, 2 bytes). 2E 02 reads as 0x022E, which is a bitfield. The bits set in this value are IMAGE_FILE_EXECUTABLE_IMAGE (0x0002, "this file is valid for execution"), IMAGE_FILE_LINE_NUMS_STRIPPED (0x0004), IMAGE_FILE_LOCAL_SYMS_STRIPPED (0x0008), IMAGE_FILE_LARGE_ADDRESS_AWARE (0x0020, "this binary can handle addresses above 2 GB"), and IMAGE_FILE_DEBUG_STRIPPED (0x0200, "debug information has been removed from this image"). Together they add up to 0x022E. The most useful bit to recognize is IMAGE_FILE_DLL (0x2000) — when that's set, the binary is a DLL rather than an EXE.
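Decoding a bitfield like this by hand gets tedious, so here is a small sketch that does it mechanically. The flag values are the ones from the PE/COFF specification (a subset, enough for this field); decode_flags is a hypothetical helper name of ours.

```python
# Flag values from the PE/COFF specification (subset).
FILE_CHARACTERISTICS = {
    0x0001: "IMAGE_FILE_RELOCS_STRIPPED",
    0x0002: "IMAGE_FILE_EXECUTABLE_IMAGE",
    0x0004: "IMAGE_FILE_LINE_NUMS_STRIPPED",
    0x0008: "IMAGE_FILE_LOCAL_SYMS_STRIPPED",
    0x0020: "IMAGE_FILE_LARGE_ADDRESS_AWARE",
    0x0100: "IMAGE_FILE_32BIT_MACHINE",
    0x0200: "IMAGE_FILE_DEBUG_STRIPPED",
    0x1000: "IMAGE_FILE_SYSTEM",
    0x2000: "IMAGE_FILE_DLL",
}


def decode_flags(value: int, table: dict[int, str]) -> list[str]:
    """List the names of every known bit set in value, lowest bit first."""
    return [name for bit, name in sorted(table.items()) if value & bit]


flags = decode_flags(0x022E, FILE_CHARACTERISTICS)
for name in flags:
    print(name)

# An EXE, not a DLL: the 0x2000 bit is clear in our value.
print("IMAGE_FILE_DLL" in flags)  # False
```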

That's the entire COFF File Header. Seven fields, 20 bytes, read by the loader before mapping. A few of them — Machine, NumberOfSections, SizeOfOptionalHeader, and parts of Characteristics — are directly loader-relevant. Others (TimeDateStamp, the two zeroed symbol-table fields, the stripping flags) are linker output or legacy debug metadata that the loader doesn't really care about. Now we step into the part of the format that's specific to executables.
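To see the seven fields as one structure rather than seven separate reads, here is a sketch using Python's struct module. The FileHeader tuple and its snake_case field names are ours; the 20 bytes are the ones quoted field by field above.

```python
import struct
from typing import NamedTuple


class FileHeader(NamedTuple):
    machine: int
    number_of_sections: int
    time_date_stamp: int
    pointer_to_symbol_table: int
    number_of_symbols: int
    size_of_optional_header: int
    characteristics: int


# Two 16-bit fields, three 32-bit fields, two 16-bit fields: 20 bytes.
FILE_HEADER = struct.Struct("<HHIIIHH")

# File offsets 0x84 through 0x97 of our binary.
raw = bytes.fromhex(
    "64 86 " "0A 00 " "5F E1 15 6A " "00 00 00 00 "
    "00 00 00 00 " "F0 00 " "2E 02"
)

hdr = FileHeader(*FILE_HEADER.unpack(raw))
print(FILE_HEADER.size)             # 20
print(hex(hdr.machine))             # 0x8664
print(hdr.number_of_sections)       # 10
print(hdr.size_of_optional_header)  # 240

# The section table begins right after the Optional Header.
section_table = 0x84 + FILE_HEADER.size + hdr.size_of_optional_header
print(hex(section_table))           # 0x188
```

The last line is the reason SizeOfOptionalHeader exists: a parser does not need to understand the Optional Header to step over it.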

The Optional Header — not optional at all

Immediately after the 20-byte COFF File Header comes the structure called the Optional Header. Despite the name, this header is not optional for executables — it's required for every .exe and .dll Windows knows how to load. The "optional" part of the name is a holdover from the COFF specification, which defined this structure as optional for object files (the .obj files from Part 1). Object files don't need an Optional Header because they don't need to be loaded; they just need to be linked. Executables, by contrast, need every byte of it.

The Optional Header is where the loader learns almost everything it needs to construct the in-memory image. Where the COFF File Header says "this is a binary, here's the CPU, here are the section count and characteristics flags," the Optional Header says "here is where to load me, here is how to align my sections, here is where my code starts, and here are sixteen pointers to the data structures inside me that you'll need to set up the process." It is by far the most information-dense structure in a PE file.

The structure is 240 bytes for a 64-bit (PE32+) executable like ours and 224 bytes for a 32-bit (PE32) executable. We're going to walk through the most important fields one at a time. There are about thirty in total; we'll look at ten in detail, then list the rest in a reference table at the end of the section.

The first field of the Optional Header is the two-byte value that distinguishes 32-bit from 64-bit PEs.

Magic (offset 0x98 in the file, the very first field of the Optional Header). For our binary, the bytes are 0B 02, which reads as 0x020B — the magic number for PE32+ (64-bit). The other value you'll see is 0x010B, the PE32 (32-bit) magic. There's a third value, 0x0107, for ROM images that you'll basically never encounter outside firmware work. Tools sometimes refer to PE32+ as "PE64" — same thing.

The Optional Header's structure is slightly different between PE32 and PE32+. Specifically, a few address-type fields are 4 bytes in PE32 and 8 bytes in PE32+ — ImageBase, the four stack/heap reserve and commit values — and PE32 has one extra 4-byte field (BaseOfData) that PE32+ does without. That's why the overall size differs by 16 bytes. The Magic field tells the loader which variant to expect so it can parse the rest correctly.
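The 16-byte difference is easy to re-derive. A quick check in Python, using only the magic values and field widths stated above; the MAGIC dictionary is just a convenient lookup of ours.

```python
# Optional Header magic values named in the text.
MAGIC = {0x010B: "PE32", 0x020B: "PE32+", 0x0107: "ROM"}

magic = int.from_bytes(bytes.fromhex("0B 02"), "little")
print(hex(magic), MAGIC[magic])  # 0x20b PE32+

# Five fields widen from 4 to 8 bytes (ImageBase plus the four
# stack/heap sizes), and PE32+ drops the 4-byte BaseOfData field.
widened_fields = 5
growth = widened_fields * (8 - 4) - 4
print(growth, 224 + growth)  # 16 240
```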

AddressOfEntryPoint (offset 0x10 within the Optional Header, so file offset 0xA8, 4 bytes). Our bytes are 10 14 00 00 = 0x00001410. This is an RVA — the offset within the mapped image where execution begins. The first instruction the CPU will execute, once the image is fully loaded and ready to run, lives at ImageBase + 0x1410. The linker chose this RVA at build time by placing the entry function (typically the C runtime's mainCRTStartup, which calls main) at that offset within the .text section.

The entry point's value is one of the clearest examples of the read-early-use-late pattern we discussed. The loader reads this 4-byte RVA out of the Optional Header while it's still parsing on-disk headers. It stores the number. Only much later — after mapping every section, applying relocations, resolving every import — does the loader actually compute ImageBase + 0x1410 and jump there. The RVA in the Optional Header is a coordinate that won't be used for a while yet.

ImageBase (offset 0x18 within the Optional Header, file offset 0xB0, 8 bytes for PE32+). Our bytes are 00 00 00 40 01 00 00 00, which reads as 0x0000000140000000. This is the preferred virtual address where the linker would like the loader to map the image. For ordinary 64-bit Windows EXEs — including the toolchains you'll meet first — the conventional preferred base is 0x0000000140000000. DLLs typically use a different convention, commonly 0x0000000180000000 (though this is a linker default, not a format requirement). Both fit comfortably in the 48 bits that x86-64 currently uses for virtual addresses.

The word preferred is doing real work in that sentence. The loader is not required to honor it. On modern Windows, if the binary opts in to ASLR (via the IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE flag we'll see below) and ships with the relocation information needed to be moved safely, the loader may choose a randomized base address instead of the preferred ImageBase. When that happens, every absolute address inside the binary that depended on ImageBase being 0x140000000 is now wrong, and the .reloc section we saw in the file layout exists specifically to tell the loader what to patch. We'll dig into how that works in Part 4. For now, the field is best understood as a hint: "if you can load me here, do; if not, fix me up."

SectionAlignment (file offset 0xB8, 4 bytes) and FileAlignment (file offset 0xBC, 4 bytes). These are the two alignment values that govern how the binary is laid out — one for memory, one for disk. Our values are 00 10 00 00 = 0x1000 (4 KB) for the section alignment, and 00 02 00 00 = 0x200 (512 bytes) for the file alignment.

These two values control the relationship between the on-disk file and the in-memory image. Every section, once mapped into memory, starts at an address that's a multiple of SectionAlignment. Every section, on disk, starts at a file offset that's a multiple of FileAlignment. Section alignment is almost always one page (4 KB on x86-64) because the operating system enforces page-level permissions — you can't make half a page executable. File alignment is typically smaller (512 bytes is common, though tools can produce smaller values) because there's no equivalent requirement on disk: the file just needs to be a stream of bytes, and packing sections closer together saves disk space. The mismatch between these two alignments is what causes the file to "stretch" when loaded — sections that sit adjacently on disk get spread out in memory to hit page boundaries. That stretch is the entire subject of Part 3.
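Rounding up to an alignment is the one piece of arithmetic this part of the format leans on constantly. Here is a minimal sketch in Python; align_up is a hypothetical helper name, it assumes the alignment is a power of two (true of both values in our binary), and 0x6B68 is simply an example size that isn't already aligned.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment.

    Assumes alignment is a power of two, so the mask trick is valid.
    """
    return (value + alignment - 1) & ~(alignment - 1)


SECTION_ALIGNMENT = 0x1000  # from our Optional Header
FILE_ALIGNMENT = 0x200

size = 0x6B68  # example content size
print(hex(align_up(size, FILE_ALIGNMENT)))     # 0x6c00 on disk
print(hex(align_up(size, SECTION_ALIGNMENT)))  # 0x7000 in memory
```

The same content takes 0x6C00 bytes of file but 0x7000 bytes of address space. Repeat that for every section and you have the stretch.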

SizeOfImage (file offset 0xD0, 4 bytes). Our bytes are 00 10 01 00 = 0x00011000 = 69,632 bytes. This is the total size of the image once it has been mapped into memory, rounded up to SectionAlignment. The loader's first concrete act, after parsing the headers, is to ask the operating system for exactly this many bytes of contiguous virtual address space. Everything from RVA 0x0000 (the start of the headers) to RVA 0x00011000 (just past the end of the last section) lives within that allocation. The file on disk is 39 KB; the image in memory is 68 KB. The difference is the alignment stretch.

SizeOfHeaders (file offset 0xD4, 4 bytes). Our bytes are 00 04 00 00 = 0x400 = 1,024 bytes. This is the total size of everything from the DOS Header through the section header table, rounded up to FileAlignment. It defines where the section bodies start on disk: the very first section's PointerToRawData will be at offset 0x400, which is exactly what we saw in the file layout earlier.
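The 0x400 figure can be re-derived from sizes we have already read. A quick check in Python, where align_up is a hypothetical helper of ours rather than anything from the PE spec:

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to a multiple of a power-of-two alignment."""
    return (value + alignment - 1) & ~(alignment - 1)


e_lfanew = 0x80          # DOS header plus stub
pe_signature = 4         # "PE\0\0"
file_header = 20         # COFF File Header
optional_header = 0xF0   # SizeOfOptionalHeader
section_table = 10 * 40  # ten 40-byte section headers

end_of_headers = (
    e_lfanew + pe_signature + file_header + optional_header + section_table
)
print(hex(end_of_headers))                   # 0x318
print(hex(align_up(end_of_headers, 0x200)))  # 0x400
```

The headers really end at 0x318; the bytes from there to 0x400 are padding that exists only to satisfy FileAlignment.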

Subsystem (file offset 0xDC, 2 bytes). Our bytes are 03 00 = 0x0003 = IMAGE_SUBSYSTEM_WINDOWS_CUI, "Windows character-mode (console) UI." When Windows launches our binary, it'll see this value and ensure the process has a console attached — if the program was double-clicked from Explorer rather than launched from a command prompt, Windows allocates a new console window for it. The two most common alternatives are IMAGE_SUBSYSTEM_WINDOWS_GUI (0x0002, for graphical applications — no console) and IMAGE_SUBSYSTEM_NATIVE (0x0001, for drivers and other kernel-mode-ish code that doesn't use the Win32 subsystem at all). This field is what determines whether running an .exe pops up a black console window or not.

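DllCharacteristics (file offset 0xDE, 2 bytes). Our bytes are 60 01 = 0x0160. Like the COFF Characteristics field, this is a bitfield, but the bits here are the ones that matter for modern security. The value 0x0160 decomposes into three flags: IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA (0x0020, the image can handle a high-entropy 64-bit address space), IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (0x0040, the image can be relocated at load time, which is the ASLR opt-in), and IMAGE_DLLCHARACTERISTICS_NX_COMPAT (0x0100, the image is compatible with non-executable data pages).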

Other bits you'll meet in real binaries: IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY (0x0080, requires a valid Authenticode signature to load), IMAGE_DLLCHARACTERISTICS_GUARD_CF (0x4000, the binary supports Control Flow Guard), and IMAGE_DLLCHARACTERISTICS_APPCONTAINER (0x1000, the binary requires the AppContainer sandbox). There's also a separate, newer "Extended DLL Characteristics" mechanism — added because the original 16-bit flag field ran out of room — that carries flags like CET shadow stack compatibility (IMAGE_DLLCHARACTERISTICS_EX_CET_COMPAT). These extended flags live in the debug directory rather than this header, so they don't appear in the DllCharacteristics field directly.

One field we'll meet again that's worth naming now: NumberOfRvaAndSizes (file offset 0x104, 4 bytes), with the value 10 00 00 00 = 0x00000010 = 16. This counts how many data directory entries follow. The PE spec defines 16 standard data-directory slots, and ordinary modern toolchains emit all 16. But the PE specification explicitly warns parsers to honor this field before probing any specific directory entry — unusual binaries, packers, and tiny-PE experiments sometimes set it to smaller values to shave bytes off the optional header. The data directories — the array of (RVA, size) pairs that comes next — are the bridge between the Optional Header and the actual section content, and they're the subject of the next section.

The remaining fields of the Optional Header are mostly bookkeeping and version metadata that the loader either uses for compatibility checks or ignores. For reference:

Of these, the four stack/heap reserve and commit values are the ones most likely to be of interest in real analysis — they affect process memory layout and occasionally show up as quirks in malware that wants unusually large or unusually small stacks. The version fields are mostly compatibility lies told by linkers and aren't enforced. CheckSum is computed over the entire image; it's required to be valid for kernel-mode drivers and a few other specific cases, but for ordinary user-mode EXEs the loader doesn't verify it, so different linkers handle it differently — MinGW writes a real checksum (ours is 0x1493D), while many other toolchains leave it at zero.

Data Directories: pointers into the sections

The final piece of the Optional Header is an array of data directories. The spec defines sixteen standard slots, and our binary, with NumberOfRvaAndSizes = 16, carries all of them: 128 bytes occupying the last region of the Optional Header. As noted above, a careful parser reads only as many entries as NumberOfRvaAndSizes declares rather than assuming sixteen are present. Each entry is a tiny structure called IMAGE_DATA_DIRECTORY, with just two 4-byte fields: VirtualAddress and Size.

Eight bytes each, sixteen of them, total 128 bytes. Each entry is, in effect, a (where, how big) pointer to an important structure somewhere inside the image's sections. VirtualAddress here is an RVA — the in-memory offset coordinate we covered in "Three kinds of 'where'" — not a file offset. The data directories are how the loader finds things like the import table or the relocation table without having to scan every section looking for them. Instead, it knows exactly which RVA to follow and how many bytes to read.

The sixteen entries have fixed meanings, assigned by index. Most modern binaries don't fill all sixteen; entries are zeroed out when not used. Here are all sixteen for our hello.exe:

Of the sixteen possible entries, our binary fills five: Import Table, Exception Table, Base Relocation Table, TLS Table, and IAT. The others are zero. This is typical for a simple native executable; large applications with resources, signatures, and CLR metadata fill more.
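Parsing the array is mechanical. The sketch below is a hedged illustration in Python, not a complete parser: parse_directories is a hypothetical name, the blob is synthetic, and while the three RVAs (0xD000, 0xA000, 0x10000) are the ones from our binary, the Size values are made up because they don't matter for the point being shown.

```python
import struct

DATA_DIRECTORY = struct.Struct("<II")  # VirtualAddress, Size (8 bytes)

# Index-to-name mapping from the PE/COFF specification.
NAMES = [
    "Export Table", "Import Table", "Resource Table", "Exception Table",
    "Certificate Table", "Base Relocation Table", "Debug", "Architecture",
    "Global Ptr", "TLS Table", "Load Config Table", "Bound Import",
    "IAT", "Delay Import Descriptor", "CLR Runtime Header", "Reserved",
]


def parse_directories(blob: bytes, count: int) -> list[tuple[str, int, int]]:
    """Decode up to count (VirtualAddress, Size) pairs from blob.

    count should come from NumberOfRvaAndSizes, never a hard-coded 16.
    """
    count = min(count, len(NAMES), len(blob) // DATA_DIRECTORY.size)
    return [
        (NAMES[i], *DATA_DIRECTORY.unpack_from(blob, i * DATA_DIRECTORY.size))
        for i in range(count)
    ]


# Synthetic 128-byte array; sizes below are placeholders.
blob = bytearray(16 * DATA_DIRECTORY.size)
DATA_DIRECTORY.pack_into(blob, 1 * 8, 0xD000, 0x100)
DATA_DIRECTORY.pack_into(blob, 3 * 8, 0xA000, 0x100)
DATA_DIRECTORY.pack_into(blob, 5 * 8, 0x10000, 0x100)

filled = [d for d in parse_directories(bytes(blob), 16) if d[1] or d[2]]
for name, rva, size in filled:
    print(f"{name}: RVA {rva:#x}, size {size:#x}")
```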

Notice that every non-zero entry's RVA falls within one of the sections we saw earlier. The Import Table at RVA 0xD000 lands inside .idata (which starts at RVA 0xD000). The Exception Table at RVA 0xA000 lands inside .pdata (which starts at RVA 0xA000). The Base Relocation Table at RVA 0x10000 lands inside .reloc. This is the key mental model: the data directories do not contain structures themselves; they're pointers into the section bodies, telling the loader where to find structures that the linker placed inside specific sections at build time.

The natural question is: where do those RVA values come from? Nothing in the source code mentions 0xD000 or 0x10000; we never asked for the import table to live at any particular address. The answer is that the linker chose those RVAs itself, during its final layout pass. After it decided which structures go in which sections (the import descriptors in .idata, the relocation blocks in .reloc, and so on) and what RVA each section would start at, it knew the RVA of every structure — and it wrote each one into the corresponding data directory entry as the last step before emitting the file. The data directories are how the linker tells the loader what it built. We'll see the precise algorithm the linker uses to assign section RVAs a few sections from now, in "How the linker assigns RVAs."

That arrangement explains the read-early-use-late pattern from earlier. The loader reads the data directories out of the Optional Header before mapping anything — they're just sixteen 8-byte records sitting in the header. The RVAs they contain don't mean anything yet, because the sections haven't been placed in memory. But once the loader has mapped the sections, those RVAs suddenly become valid pointers to live data. The loader then revisits each non-zero directory entry, computes ImageBase + RVA, and processes the structure there: parsing imports, applying relocations, registering TLS callbacks, and so on.

There's exactly one data directory that breaks the rule we established about RVAs vs file offsets. Entry 4, the Certificate Table (also called the Security Directory) — the digital signature on a code-signed binary. The "VirtualAddress" field in that one directory entry is actually a file offset, not an RVA. The reason: certificate data is not mapped into memory as part of the image; it's appended to the file after the last section, and verified by tools that read the file from disk (Authenticode signature verification happens at install time and at execution time, but always against the file). Putting an RVA there would be meaningless — the bytes aren't in the image. So Microsoft used a file offset, and documented the exception. Our binary is unsigned, so this entry is zero and the exception doesn't bite us; but if you ever inspect a signed binary and find a non-zero entry at index 4, remember that the number you're looking at is a file offset.
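In a parser, the exception amounts to one extra branch. A minimal sketch in Python, assuming a hypothetical helper named locate; the 0x9A00 offset in the second call is invented for illustration, since our binary has no certificate table.

```python
CERTIFICATE_TABLE = 4  # index of the Security Directory


def locate(index: int, virtual_address: int, image_base: int) -> tuple[str, int]:
    """Say which coordinate system a directory's VirtualAddress is in."""
    if virtual_address == 0:
        return ("absent", 0)
    if index == CERTIFICATE_TABLE:
        # Never mapped: the value is a position in the file on disk.
        return ("file offset", virtual_address)
    # Everything else is an RVA, resolved once the image base is known.
    return ("virtual address", image_base + virtual_address)


kind, where = locate(1, 0xD000, 0x140000000)
print(kind, hex(where))  # virtual address 0x14000d000

kind, where = locate(4, 0x9A00, 0x140000000)  # made-up offset
print(kind, hex(where))  # file offset 0x9a00
```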

The data directories that matter most for the rest of this series are the ones we'll dig into in Part 4: the Import Table and IAT (entries 1 and 12), which together describe what DLLs the program needs and where the loader should fill in the imported function addresses, and the Base Relocation Table (entry 5), which tells the loader what to patch if the image is loaded somewhere other than its preferred ImageBase. We're not going to dissect those structures here — that's Part 4's territory — but you now know how the loader finds them.

Section headers — where on-disk meets in-memory

Immediately after the Optional Header ends at file offset 0x188 comes the section header table. There's one entry per section — ten of them in our binary — and each entry is exactly 40 bytes. The structure is called IMAGE_SECTION_HEADER. These are the most consequential 400 bytes in the whole file, because they're what tells the loader how to actually build the in-memory image from the on-disk bytes.
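If you'd rather compute that 0x188 than take it on faith, the chain of offsets can be followed programmatically. The sketch below is an illustration under stated assumptions: locate_section_table is a made-up helper name, and instead of reading hello.exe it builds a blank 0x188-byte buffer containing only the handful of fields the walk needs (e_lfanew = 0x80, ten sections, a 240-byte Optional Header).

```python
import struct

def locate_section_table(data: bytes) -> tuple:
    """Return (file offset of the section header table, number of sections)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)       # offset of "PE\0\0"
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = e_lfanew + 4                                      # COFF File Header
    (number_of_sections,) = struct.unpack_from("<H", data, coff + 2)
    (size_of_optional_header,) = struct.unpack_from("<H", data, coff + 16)
    # The section table follows the 20-byte COFF header and the Optional Header.
    return coff + 20 + size_of_optional_header, number_of_sections

# Synthetic stand-in for the first 0x188 bytes of hello.exe: only the fields
# read above are filled in, everything else is left as zero.
header = bytearray(0x188)
header[0:2] = b"MZ"
struct.pack_into("<I", header, 0x3C, 0x80)
header[0x80:0x84] = b"PE\0\0"
struct.pack_into("<H", header, 0x84 + 2, 10)     # NumberOfSections
struct.pack_into("<H", header, 0x84 + 16, 240)   # SizeOfOptionalHeader

table_offset, count = locate_section_table(bytes(header))
print(hex(table_offset), count)
```

Using SizeOfOptionalHeader rather than a hard-coded 240 matters for real tooling, since 32-bit PE files have a smaller Optional Header.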

Each section header contains ten fields. We'll walk through them using the real bytes for our binary's first section, .text, whose 40-byte header starts at file offset 0x188:

Name (offset 0, 8 bytes). 2E 74 65 78 74 00 00 00 — the ASCII string ".text" followed by three zero bytes of padding. The field is a fixed-size 8-byte buffer; section names that don't fill it are zero-padded, and section names exactly 8 bytes long are stored without a null terminator. The loader doesn't use this name to make decisions — it's purely a human-readable label. Conventions like .text, .data, .rdata, .reloc are just that, conventions; the linker can name a section anything it wants, and you'll occasionally meet binaries with unusual section names chosen by obfuscators or specialized linkers.

VirtualSize (offset 8, 4 bytes). 68 6B 00 00 = 0x00006B68 = 27,496 bytes. This is the exact, unpadded size the section occupies in memory once mapped. It's the linker's honest count of how many bytes of real content the section contains — code, data, whatever.

VirtualAddress (offset 12, 4 bytes). 00 10 00 00 = 0x00001000. This is an RVA — where the section should be placed within the mapped image, measured from ImageBase. The loader will arrange to have the section's content visible at ImageBase + 0x1000 after mapping. For our 64-bit binary with ImageBase = 0x140000000, that's 0x140001000 — the address the CPU sees when executing code in .text.

SizeOfRawData (offset 16, 4 bytes). 00 6C 00 00 = 0x00006C00 = 27,648 bytes. This is the size of the section on disk, rounded up to FileAlignment (which is 0x200 in our binary). Note the difference from VirtualSize: 27,648 versus 27,496 — the on-disk size is 152 bytes larger, because file alignment requires rounding up to a 512-byte boundary, and the unpadded data fell short of that boundary.

PointerToRawData (offset 20, 4 bytes). 00 04 00 00 = 0x00000400. This is a file offset — where this section's bytes start in the file on disk. The loader will read SizeOfRawData bytes starting from this position in the file.

This is the moment to call out the central trick of this structure: this one 40-byte record contains both a file offset and an RVA, side by side. PointerToRawData tells the loader "read from this position in the file"; VirtualAddress tells the loader "write to this offset in the mapped image." The section header is precisely the structure that straddles the boundary between disk and memory. It is the entry in the format whose only purpose is to express the relationship between the two coordinate systems we discussed at the start of this post.

A VirtualSize larger than SizeOfRawData is rare for code and ordinary data sections (where the two typically differ only by alignment padding), but it's the rule for BSS-style data: the loader copies SizeOfRawData bytes from the file and zero-fills the rest of the section's memory. The .bss section in our binary has VirtualSize = 0xB80 and SizeOfRawData = 0, so it occupies 2,944 bytes of memory, all zero-filled, and contributes zero bytes to the file. That's how uninitialized data is stored cheaply: the file says "I want this much space, here's no content," and the loader zeroes the memory at load time.

One toolchain caveat to keep in your back pocket: a separate .bss section is what MinGW does. MSVC by default merges uninitialized data into .data, so you'll often see binaries where .data has VirtualSize > SizeOfRawData — the initialized portion sits in the file as raw bytes, and the uninitialized portion just hangs off the end as a virtual-size "tail" that the loader zeroes. There's no rule that says BSS must live in its own section; it just has to live somewhere with VirtualSize > SizeOfRawData, and the linker is free to combine it with .data if it wants. When you don't see a .bss in a binary, that's usually why.
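Both flavors, the standalone .bss and the .data with a zeroed tail, fall out of the same rule. Here is a minimal sketch of it, assuming a simplified model that ignores page alignment and permissions; map_section is a hypothetical helper, and the 0x300-byte .data example is invented to show the MSVC-style tail.

```python
def map_section(file_bytes: bytes, pointer_to_raw_data: int,
                size_of_raw_data: int, virtual_size: int) -> bytes:
    """Build one section's in-memory content: file bytes first, zeros after."""
    backed = min(size_of_raw_data, virtual_size)   # bytes that come from disk
    content = file_bytes[pointer_to_raw_data:pointer_to_raw_data + backed]
    return content + bytes(virtual_size - len(content))

fake_file = b"\xCC" * 0x1000   # stand-in for file contents

# MinGW-style .bss from this post: nothing on disk, 0xB80 bytes in memory.
bss = map_section(fake_file, 0, 0, 0xB80)

# Invented MSVC-style .data: 0x200 bytes on disk, 0x300 bytes in memory.
data = map_section(fake_file, 0x400, 0x200, 0x300)

print(len(bss), bss.count(0))            # all zeros
print(len(data), data[:0x200].count(0))  # initialized part, then zeroed tail
```

The first 0x200 bytes of the invented .data come straight from the file, and the remaining 0x100 are the virtual-size tail.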

PointerToRelocations and PointerToLinenumbers (offsets 24 and 28, 4 bytes each). Both zero in our binary. These are leftovers from the COFF object-file format we discussed in Part 1, where each section carried its own relocations and per-line debug information. In an executable, that information is consolidated elsewhere (relocations go to the .reloc section and the Base Relocation Table; debug info, if present, goes to its own section and is pointed at by the Debug Directory). For executables, these fields are always zero.

NumberOfRelocations and NumberOfLinenumbers (offsets 32 and 34, 2 bytes each). Both zero. Same reason.

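Characteristics (offset 36, 4 bytes). 60 00 00 60 = 0x60000060. This is the third bitfield we've seen so far, and it's the most important one for the loader's actual work: it determines what page permissions the section gets when it's mapped. The bits set in our .text Characteristics are:

IMAGE_SCN_CNT_CODE (0x00000020): the section contains executable code.
IMAGE_SCN_CNT_INITIALIZED_DATA (0x00000040): the section contains initialized data.
IMAGE_SCN_MEM_EXECUTE (0x20000000): the section can be executed.
IMAGE_SCN_MEM_READ (0x40000000): the section can be read.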

Four flags, OR'd together, equal 0x60000060. The combination says "code with mixed data, executable, readable" — exactly what you want for a .text section. Notice what's not set: IMAGE_SCN_MEM_WRITE (0x80000000). The page on which our entry-point code lives will be readable and executable, but not writable. That's a deliberate security choice — preventing the program from accidentally or maliciously modifying its own code at runtime.

For comparison, .data in our binary has Characteristics 0xC0000040 = INITIALIZED_DATA | MEM_READ | MEM_WRITE — readable, writable, not executable. .rdata has 0x40000040 = INITIALIZED_DATA | MEM_READ — readable only. And .reloc has 0x42000040 = INITIALIZED_DATA | MEM_DISCARDABLE | MEM_READ — readable, and the unusual MEM_DISCARDABLE flag, which tells the loader that this section can be thrown away after it's been processed, because the relocation entries inside it are only useful during loading.
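Decoding these by hand gets old quickly, so here is a small sketch that does it for the four values quoted above. The flag table covers only the bits this post mentions plus IMAGE_SCN_CNT_UNINITIALIZED_DATA, with the IMAGE_SCN_ prefix dropped; the decode and permissions helpers are illustrative names, and the "rwx" string is just a convenient notation, not something the PE format defines.

```python
FLAGS = {
    0x00000020: "CNT_CODE",
    0x00000040: "CNT_INITIALIZED_DATA",
    0x00000080: "CNT_UNINITIALIZED_DATA",
    0x02000000: "MEM_DISCARDABLE",
    0x20000000: "MEM_EXECUTE",
    0x40000000: "MEM_READ",
    0x80000000: "MEM_WRITE",
}

def decode(characteristics: int) -> list:
    """Names of the known flags set in a Characteristics value."""
    return [name for bit, name in FLAGS.items() if characteristics & bit]

def permissions(characteristics: int) -> str:
    """Summarize the three MEM_* permission bits as an rwx-style string."""
    return "".join(
        letter if characteristics & bit else "-"
        for letter, bit in (("r", 0x40000000), ("w", 0x80000000), ("x", 0x20000000))
    )

for name, value in ((".text", 0x60000060), (".data", 0xC0000040),
                    (".rdata", 0x40000040), (".reloc", 0x42000040)):
    print(f"{name:7} {permissions(value)}  {' | '.join(decode(value))}")
```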

The Characteristics field is, in effect, a compact description of what permissions the operating system should grant the memory pages that hold this section's content. The loader translates these flags directly into page-protection settings when it maps the section. That's the topic of Part 3 — the protections, the alignment stretch, the layout transformation. For now, just know that every section in every PE carries, encoded in those four bytes, the answer to "is this readable, writable, executable, or something else?"
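To tie the ten fields together, here is one way to unpack the whole 40-byte record in Python. It's a sketch: the SectionHeader class and parse_section_header function are names invented for this example, and the input is the .text header reassembled from the byte values quoted field by field above rather than read from the file.

```python
import struct
from typing import NamedTuple

class SectionHeader(NamedTuple):
    name: str
    virtual_size: int
    virtual_address: int
    size_of_raw_data: int
    pointer_to_raw_data: int
    pointer_to_relocations: int
    pointer_to_linenumbers: int
    number_of_relocations: int
    number_of_linenumbers: int
    characteristics: int

# 8-byte name, six 4-byte fields, two 2-byte fields, one 4-byte field.
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")

def parse_section_header(raw: bytes) -> SectionHeader:
    name, *rest = SECTION_HEADER.unpack(raw)
    # Names shorter than 8 bytes are zero-padded; 8-byte names have no terminator.
    return SectionHeader(name.rstrip(b"\0").decode("ascii", "replace"), *rest)

text_header = bytes.fromhex(
    "2E 74 65 78 74 00 00 00 "   # Name
    "68 6B 00 00 "               # VirtualSize
    "00 10 00 00 "               # VirtualAddress
    "00 6C 00 00 "               # SizeOfRawData
    "00 04 00 00 "               # PointerToRawData
    "00 00 00 00 00 00 00 00 "   # PointerToRelocations, PointerToLinenumbers
    "00 00 00 00 "               # NumberOfRelocations, NumberOfLinenumbers
    "60 00 00 60 "               # Characteristics
)

header = parse_section_header(text_header)
print(header.name, hex(header.virtual_address), hex(header.pointer_to_raw_data))
```

The "<" prefix in the format string asks for little-endian fields with no padding, which is why the struct comes out at exactly 40 bytes.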

How the linker assigns RVAs

The VirtualAddress values in the section headers — 0x1000 for .text, 0x8000 for .data, 0x9000 for .rdata, and so on — are not magic. They were chosen by the linker at build time using a simple sequential algorithm, the one we sketched in Part 1's discussion of the linker's Phase 2. Now that we have all the surrounding machinery in view, we can describe the algorithm precisely.

The headers occupy RVA 0x0000 through some value just past the end of the section header table. In our binary, the headers and section table together take 792 bytes — DOS Header (64) + DOS Stub (64) + PE signature (4) + COFF File Header (20) + Optional Header (240) + ten section headers at 40 bytes each (400). That ends at file offset 0x318, which becomes RVA 0x318 once the headers are mapped. But the first section can't start at 0x318 in memory — it has to align to a multiple of SectionAlignment (0x1000). So the linker rounds up: the first section gets VirtualAddress = 0x1000.

From there, each section starts where the previous one ended, rounded up to the next multiple of SectionAlignment. Let's trace it through our binary. .text has VirtualAddress = 0x1000 and VirtualSize = 0x6B68. Adding those gives 0x7B68, the first byte after .text ends. Rounding up to the next 0x1000 boundary gives 0x8000, and that's exactly the VirtualAddress of .data. .data has VirtualSize = 0xC0, so it ends at 0x80C0, which rounds up to 0x9000, the VirtualAddress of .rdata. .rdata's size is 0xDA0, ending at 0x9DA0, rounding up to 0xA000, the VirtualAddress of .pdata. The pattern continues through all ten sections.
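The trace above can be written as a short loop. This is a sketch of the layout rule as described in this post, not the source of any real linker; align_up and assign_rvas are illustrative names, and only the first three VirtualSize values are used because those are the ones quoted here.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def assign_rvas(header_size: int, virtual_sizes: list, section_alignment: int):
    """Place sections one after another, each on an alignment boundary."""
    rvas = []
    cursor = align_up(header_size, section_alignment)   # first section's RVA
    for size in virtual_sizes:
        rvas.append(cursor)
        cursor = align_up(cursor + size, section_alignment)
    return rvas, cursor   # cursor is where the next section would start

# Headers end at 0x318; VirtualSize of .text, .data, .rdata from this post.
rvas, next_rva = assign_rvas(0x318, [0x6B68, 0xC0, 0xDA0], 0x1000)
print([hex(rva) for rva in rvas], hex(next_rva))
```

The last value returned, 0xA000, is where .pdata lands.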

The math on the disk side is the same shape, with FileAlignment (0x200) substituted in. Our .text starts at PointerToRawData = 0x400 and has SizeOfRawData = 0x6C00, ending at file offset 0x7000, where .data begins — and 0x7000 is already a multiple of 0x200, no rounding needed. .data with SizeOfRawData = 0x200 ends at 0x7200, where .rdata begins. And so on, sequentially through the file.
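Here is the disk-side walk as a sketch. One assumption to flag: it derives SizeOfRawData by rounding VirtualSize up to FileAlignment, which reproduces the numbers for .text and .data in our binary but is not a rule for every section (a .bss has no raw data at all). The helper names are made up for illustration.

```python
FILE_ALIGNMENT = 0x200

def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def assign_file_offsets(header_size: int, content_sizes: list) -> list:
    """Return (PointerToRawData, SizeOfRawData) for each section, in order."""
    layout = []
    cursor = align_up(header_size, FILE_ALIGNMENT)   # headers are padded too
    for size in content_sizes:
        raw_size = align_up(size, FILE_ALIGNMENT)
        layout.append((cursor, raw_size))
        cursor += raw_size   # already aligned, so no further rounding
    return layout

# Headers end at 0x318; unpadded sizes of .text and .data from this post.
layout = assign_file_offsets(0x318, [0x6B68, 0xC0])
for pointer, raw_size in layout:
    print(f"PointerToRawData={pointer:#x} SizeOfRawData={raw_size:#x}")
```

Note that the same rounding explains why .text starts at 0x400: the 0x318 bytes of headers are padded up to the next 0x200 boundary.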

Two things follow from this. First, the section table is entirely deterministic — given a list of sections and the two alignment values, you can compute every RVA and every file offset by walking down the list once. The linker does this exactly once, at build time, and writes the results into the section headers. The loader doesn't recompute them; it just reads them and obeys.

Second, the difference between the two alignments is what makes the file smaller than the image. FileAlignment = 0x200 packs sections close together on disk; SectionAlignment = 0x1000 spreads them out in memory. Our file is 39,424 bytes (rounded to file-alignment); our image is 69,632 bytes (rounded to section-alignment). The image is 77% larger than the file, even though most of the content — every byte of code, every byte of initialized data — is the same. The difference is mostly alignment gaps between sections, plus memory-only zero-filled regions like .bss (which has 2,944 bytes of VirtualSize and zero on-disk presence).

That stretch is the central topic of Part 3. The PE format is, in this sense, a compact encoding of an image that intentionally has gaps in it once it's expanded.

Converting between RVA and file offset

Now we can finally answer the question this part has been building toward: given an RVA — say, the entry point's 0x1410 — how do you find the byte in the file? And how do you go the other direction, from a file offset to an RVA? This conversion is the single skill PE analysts perform most often, because disassemblers display RVAs and hex editors display file offsets, and you'll constantly find yourself with one when you need the other.

There is no single formula. The file and the image have different alignments, and sections sit at different relative positions in each. The section table is the bridge — every conversion goes through it.

Step 1. Which section contains 0x1410? Checking each section header: .text has VirtualAddress = 0x1000 and VirtualSize = 0x6B68, so it covers RVAs 0x1000 up to, but not including, 0x7B68. Our target 0x1410 is comfortably inside that range. The entry point lives in .text.

Step 2. The offset within .text: 0x1410 - 0x1000 = 0x410. The entry point is 0x410 bytes from the start of the .text section.

Step 3. The file offset: .text starts in the file at PointerToRawData = 0x400, so the entry point lives at 0x400 + 0x410 = 0x810 in the file.
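The three steps translate almost word for word into code. This is the simple version for well-formed binaries, sketched with a hand-written section list that holds only the two sections whose numbers appear in full in this post; rva_to_offset is an illustrative name, not a function from any library.

```python
SECTIONS = [
    # (name, VirtualAddress, VirtualSize, PointerToRawData, SizeOfRawData)
    (".text", 0x1000, 0x6B68, 0x0400, 0x6C00),
    (".data", 0x8000, 0x00C0, 0x7000, 0x0200),
]

def rva_to_offset(rva: int) -> int:
    for name, va, vsize, raw_ptr, raw_size in SECTIONS:
        if va <= rva < va + vsize:        # Step 1: find the containing section
            delta = rva - va              # Step 2: offset within the section
            return raw_ptr + delta        # Step 3: add the section's file position
    raise ValueError(f"RVA {rva:#x} is not inside any section")

print(hex(rva_to_offset(0x1410)))
```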

Let's verify by looking at the actual bytes. If we open hello.exe in a hex editor and jump to file offset 0x810, the first eight bytes are 55 48 89 E5 48 83 EC 20. Decoded as x86-64 instructions, that's:

55             push rbp
48 89 E5       mov  rbp, rsp
48 83 EC 20    sub  rsp, 0x20

That's the standard function prologue we discussed in Part 1's assembly snippet — push the old base pointer, set up a new frame, reserve stack space. The sub rsp, 0x20 reserves exactly 32 bytes, which is the Windows x64 shadow space we walked through in Part 1's calling-convention discussion. The entry point of our program is, byte for byte, the prologue we predicted it would be. The RVA-to-file-offset conversion landed on the right bytes.

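The inverse direction, file offset to RVA, works the same way in reverse. Find the section whose on-disk range, PointerToRawData up to PointerToRawData + SizeOfRawData, contains the file offset; subtract PointerToRawData to get the offset within the section; then add the section's VirtualAddress. For file offset 0x810, that's 0x810 - 0x400 = 0x410 inside .text, and 0x1000 + 0x410 = 0x1410, the entry-point RVA we started from.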
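As a sketch, the reverse lookup matches on the on-disk range instead of the in-memory one. The section list is the same hand-written pair of sections used for illustration, and offset_to_rva is a made-up name.

```python
SECTIONS = [
    # (name, VirtualAddress, VirtualSize, PointerToRawData, SizeOfRawData)
    (".text", 0x1000, 0x6B68, 0x0400, 0x6C00),
    (".data", 0x8000, 0x00C0, 0x7000, 0x0200),
]

def offset_to_rva(offset: int) -> int:
    for name, va, vsize, raw_ptr, raw_size in SECTIONS:
        if raw_ptr <= offset < raw_ptr + raw_size:   # which section's bytes?
            return va + (offset - raw_ptr)           # same delta, other base
    raise ValueError(f"file offset {offset:#x} is not inside any section body")

print(hex(offset_to_rva(0x810)))
```

An offset inside the headers, such as 0x188, raises here because it isn't part of any section body.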

A subtlety: bytes in the headers and in any file-alignment padding don't belong to any section, and don't have RVAs in the usual sense. (The headers do get mapped into memory at RVA 0, so they have RVAs by extension; padding bytes just don't exist in memory.) For locations inside section bodies the conversion is always well-defined; for locations elsewhere in the file, the question may not have a meaningful answer.

A second subtlety is for anyone writing tooling. The clean algorithm above works for ordinary well-formed binaries where each RVA falls neatly into exactly one section's VirtualAddress..VirtualAddress + VirtualSize range. Real PE parsers have to be more defensive: SizeOfRawData can be larger than VirtualSize (alignment padding on disk), or smaller (the tail of the section is zero-filled in memory and has no bytes on disk), or zero (BSS-style sections with no on-disk content). Packers and malware deliberately exploit those edge cases (overlapping section ranges, zero-sized sections, sections with VirtualSize that crosses image boundaries) to break naive parsers. If you're building tools, mirror what hardened parsers like pefile do; if you're just reading binaries, the simple algorithm covers the common case.
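To give a flavor of what "more defensive" means, here is a sketch that handles two of those cases: an RVA that lands in a zero-filled region with no file backing, and a computed offset that runs past the end of the file. It is nowhere near what a hardened parser does, and it makes assumptions: rva_to_offset_checked is an invented name, and the .bss VirtualAddress of 0xC000 is a placeholder, since this post only quotes that section's sizes.

```python
from typing import Optional

FILE_SIZE = 39424   # size of hello.exe in bytes, as quoted in this post

SECTIONS = [
    # (name, VirtualAddress, VirtualSize, PointerToRawData, SizeOfRawData)
    (".text", 0x1000, 0x6B68, 0x0400, 0x6C00),
    (".bss",  0xC000, 0x0B80, 0x0000, 0x0000),   # VirtualAddress is a placeholder
]

def rva_to_offset_checked(rva: int) -> Optional[int]:
    """Return a file offset, or None if the RVA exists only in memory."""
    for name, va, vsize, raw_ptr, raw_size in SECTIONS:
        extent = vsize if vsize else raw_size    # tolerate VirtualSize == 0
        if not va <= rva < va + extent:
            continue
        delta = rva - va
        if delta >= raw_size:
            return None                          # zero-filled, nothing on disk
        offset = raw_ptr + delta
        if offset >= FILE_SIZE:
            raise ValueError(f"{name} points past the end of the file")
        return offset
    raise ValueError(f"RVA {rva:#x} is not inside any section")

print(rva_to_offset_checked(0x1410))   # backed by file bytes
print(rva_to_offset_checked(0xC010))   # inside .bss: no file offset exists
```

Returning None rather than raising for the .bss case is a design choice: the RVA is perfectly valid, it just has no counterpart on disk.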

Every PE-inspection tool implements this algorithm internally. It is, in the end, the entire reason the section header table exists — to let anyone with the headers and a position in either coordinate system compute the matching position in the other.

Reading PE files in 2026

In practice, nobody decodes a PE file by hand for very long. Once you understand the structure, you switch to tools that parse it for you. The ones an analyst is most likely to use in 2026, roughly in order of how often they come up:

dumpbin ships with Visual Studio and is the canonical Microsoft tool. dumpbin /headers foo.exe dumps every header structure; /imports shows the import table; /exports shows what the binary exports; /all dumps everything. It's normally run from the Visual Studio developer command prompt, which puts it on the PATH.

objdump (the GNU version) and llvm-objdump / llvm-readobj are the cross-platform equivalents. They work on Linux, macOS, and Windows, and they handle PE files alongside ELF and Mach-O. objdump -p foo.exe dumps PE-specific headers; llvm-readobj --coff-load-config -r foo.exe adds the Load Configuration directory and relocation records.

PE-bear (by hasherezade) is a free graphical tool that handles malformed PE files — important when analyzing malware, which often deliberately stretches the format to confuse parsers. It's particularly good at side-by-side hex / structure views.

CFF Explorer (by Erik Pistelli) is a free PE editor that supports both PE32/PE32+ and .NET binaries. It can read, edit, and rebuild structures — useful for both analysis and patching.

PEStudio performs static malware-triage analysis: it parses the structures and flags suspicious indicators (uncommon imports, packed sections, suspicious entropy, etc.). Free for non-commercial use.

pefile is a Python library by Ero Carrera, widely used for scripting analysis at scale. If you ever need to process a thousand PE files programmatically, this is the tool.

For ad-hoc work on small files, plain xxd or any hex editor (HxD, 010 Editor, ImHex) is enough — we've been doing exactly that throughout this post.

Try it yourself — reproduce every byte

Every number quoted in this post comes from a real binary you can build and inspect in two minutes. On a Linux machine or in WSL, with the MinGW cross-compiler installed (apt install gcc-mingw-w64-x86-64 on Debian/Ubuntu):

# Write the source file
cat > hello.c << 'EOF'
#include <stdio.h>
int main(void) { printf("hello\n"); return 0; }
EOF

# Compile to Windows PE, optimized and stripped
x86_64-w64-mingw32-gcc -O2 -s -o hello.exe hello.c

# Inspect the headers
x86_64-w64-mingw32-objdump -p hello.exe
x86_64-w64-mingw32-objdump -h hello.exe

# Read bytes by file offset
xxd -s 0x00 -l 128 hello.exe    # DOS header + stub
xxd -s 0x80 -l 24  hello.exe    # PE signature + COFF File Header
xxd -s 0x98 -l 240 hello.exe    # Optional Header
xxd -s 0x188 -l 400 hello.exe   # Section header table
xxd -s 0x810 -l 16  hello.exe   # Entry point bytes

The exact byte values may differ slightly from the ones in this post — TimeDateStamp will reflect your build time, and tiny binary differences are expected across MinGW versions — but the structure and the offsets will match. The values are stable enough that the worked example in the previous section (entry point RVA 0x1410 → file offset 0x810, prologue bytes 55 48 89 E5 48 83 EC 20) reproduces reliably.

On Windows itself, the equivalent inspection commands are dumpbin /headers hello.exe from a Visual Studio developer command prompt, or any of the GUI tools listed above. The output formatting differs; the bytes don't.

A few realities you'll meet that we haven't covered in detail: many production binaries are signed (the Certificate Table at data directory index 4 is populated; verification happens against the on-disk file), and managed .NET binaries are common enough that you'll encounter them quickly in PE analysis (the COM Descriptor at index 14 is populated; the actual code is in CIL bytecode within a section, with a tiny native stub to bootstrap the CLR). Both cases preserve all the structures we've covered; they just have additional structures sitting alongside. The DOS header, PE signature, COFF File Header, Optional Header, data directories, and section headers are present in every PE file Windows can load, signed or not, managed or not.

What we have at the end of Part 2

You can now read a PE file. Open any Windows binary in a hex editor, and you can walk down its bytes naming what each one is: MZ at byte zero, the DOS stub, the PE signature, the COFF File Header with its machine type and section count, the Optional Header with its image base and alignments and data directories, the section header table with one entry per section, padding to file alignment, then the section bodies in linker-chosen order. You can convert between the three coordinate systems — file offset, RVA, virtual address — when you need to find a specific byte. You know which fields the loader consumes before mapping and which it consumes after.

What you don't yet know is what actually happens when the file becomes a process. The structures we've described are static. The loader does something specific with them — a series of steps that turns a 39 KB on-disk file into a 68 KB region of mapped memory with the right page permissions, then patches and connects and finally hands control to the entry point. The transformation is more interesting than it sounds, because the difference between the file and the running image is not just size: it's alignment, it's permissions, it's a relationship between bytes that don't move and pointers that have to be fixed up.