Headers, sections, data directories, and the three different kinds of "offset" the format insists on using interchangeably. We open a real binary in a hex editor and read it byte by byte.
In Part 1 we traced a program from source code to bytes — through the preprocessor, the compiler, the assembler, and finally the linker, which produced an executable file. That file contained sections, a symbol table that was thrown away, relocations that had been patched, imports recorded as future dependencies, and an image-base value the loader could ignore at runtime. Everything we built up to was a sketch of what comes out the other end of the toolchain.
This part fills in the sketch. We're going to open an actual .exe in a hex editor and walk through it byte by byte, structure by structure, until you can point at any region of a real PE file and say what it is. The Portable Executable format is not complicated — it has about a dozen relevant structures and a hundred or so fields total — but it does have an unusual feature that trips up almost everyone: it uses three different coordinate systems to describe where things are, and it switches between them mid-structure. Getting comfortable with those three coordinate systems is the only conceptual leap in this part. The rest is just reading.
The binary we're going to read is the same hello.c program from Part 1, compiled with MinGW's x86_64-w64-mingw32-gcc and stripped of debug information. That gives us a tidy 39 KB Windows executable with ten sections — small enough to fit on screen, large enough to contain everything we want to see. Every hex value and field shown in this post comes from that real binary; you can reproduce it yourself with the commands in the callout at the end.
Before we look at any structure in the file, we have to nail down the source of confusion that ambushes almost everyone learning PE internals. There are three different ways to express the location of something in a PE — three coordinate systems — and the format uses all three, sometimes within a single structure. They aren't interchangeable; converting between them requires information that isn't always at hand. If you don't keep them straight, you will read a hex dump and end up at the wrong byte.
The three coordinates are file offset, RVA, and virtual address. Let's define them concretely.
A file offset is a byte position from the start of the PE file on disk. "Go to byte 0x3C" means open the file in a hex editor, scroll to position 0x3C, and start reading there. File offsets are absolute within the file: offset 0 is the first byte, offset 0x80 is the 129th byte (128 bytes in), and so on. File offsets work whether the program is loaded into memory or not, because they're just positions within an on-disk file. They are the coordinate system of disk tools: hex editors, file readers, the linker writing the output.
An RVA — Relative Virtual Address — is an offset from the start of the PE image once it has been loaded into memory. "RVA 0x1410" means "the byte that ends up 0x1410 bytes from wherever Windows decided to place the image when it loaded it." RVAs are how the PE format expresses locations inside the running image without committing to where the image will actually live in memory. The same RVA is valid every time the program runs, regardless of where the loader places it that day.
A virtual address (VA) is the actual memory address inside the running process — a real number the CPU can dereference. You compute it from an RVA the moment you know where the image was loaded: VA = ImageBase + RVA. If the image happened to load at 0x140000000 (the standard preferred base for 64-bit executables) and the RVA is 0x1410, the VA is 0x140001410. The CPU sees and uses VAs; the file format mostly hides them, because the file is written long before anyone knows what they'll be.
Why does the format need all three? Because of timing. The loader's life is divided into three phases — before it has mapped the file, during the mapping, and after — and each phase has access to different information.
Before mapping, the loader is just reading bytes off disk. It hasn't allocated any memory for the image yet, doesn't know where the image will land, hasn't even decided how big the allocation needs to be. The only coordinate system it can use is file offsets. RVAs would be meaningless: there is no image-in-memory for them to be relative to.
During mapping, the loader is reading sections from the file and writing them into newly allocated memory. This is the moment where both coordinate systems matter at once — it needs to know which bytes to read from the file (file offset) and where to put them in the new image (RVA). The section headers, which the loader reads at this phase, contain exactly these two fields side by side. We'll see them shortly.
After mapping, the image exists in memory. Now everything is described in RVAs, because the entire layout has been built and locations relative to ImageBase are well-defined. The entry point, the import table, the export table, the relocation table — all of these are expressed as RVAs in the file's headers, because they describe locations within the mapped image. The loader reads these RVAs out of the file early (before mapping), but it doesn't use them — doesn't dereference them, doesn't follow them — until after the image is in memory.
Some PE structures look like they break this rule. The entry point's RVA, for instance, is stored in the Optional Header, which the loader reads before mapping. How can a pre-mapping structure contain a post-mapping coordinate? The trick is that the loader doesn't use the RVA at the moment it reads it. It stores the value as a number, finishes mapping, and only then computes ImageBase + RVA to find the actual entry point. The same trick applies to every data directory: read early, dereferenced late.
The rule is simple: any pointer the loader must follow before mapping has to be a file offset; any pointer that describes a location in the mapped image is an RVA. The format follows this rule consistently, with exactly one exception we'll meet when we get to data directories (the Security Directory, which points at digital signature data that doesn't get mapped into memory at all). Other than that, the rule is reliable.
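The two conversions can be sketched in a few lines of Python. This is a minimal illustration, not a description of how the Windows loader is implemented. The `Section` tuple and both function names are made up for the example, and the single `.text` row is an assumption pieced together from values quoted in this post (first section at RVA 0x1000 and file offset 0x400, with SizeOfCode 0x6C00 taken as its on-disk size).

```python
from typing import NamedTuple, Optional, Sequence


class Section(NamedTuple):
    """Hypothetical, trimmed-down view of one section header."""
    name: str
    virtual_address: int  # RVA of the section's first byte once mapped
    raw_pointer: int      # file offset of the section's first byte on disk
    raw_size: int         # number of bytes the section occupies on disk


def rva_to_va(image_base: int, rva: int) -> int:
    """RVA -> virtual address: needs only the base the image was loaded at."""
    return image_base + rva


def rva_to_file_offset(rva: int, sections: Sequence[Section]) -> Optional[int]:
    """RVA -> file offset: needs the section table.

    Returns None when no listed section has on-disk bytes for this RVA
    (for example, an address inside .bss).
    """
    for sec in sections:
        if sec.virtual_address <= rva < sec.virtual_address + sec.raw_size:
            return rva - sec.virtual_address + sec.raw_pointer
    return None


# Assumed values for illustration only (see the paragraph above).
SECTIONS = [Section(".text", 0x1000, 0x400, 0x6C00)]

print(hex(rva_to_va(0x140000000, 0x1410)))        # 0x140001410
print(hex(rva_to_file_offset(0x1410, SECTIONS)))  # 0x810
```

Note the asymmetry: `rva_to_va` needs one number, while `rva_to_file_offset` needs a table and can fail. That is the practical meaning of "not interchangeable."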
[Figure: ... 0x55 (push rbp). The on-disk panel uses dashed borders to mark "file" coordinates; the in-memory panels use solid borders. The conversions between the systems require external information: the section table for file offset ↔ RVA, and the image base for RVA ↔ virtual address.]
Before we read the bytes, here's the lay of the land. A PE file on disk is laid out in a fixed sequence: a short MS-DOS-era header at the very start, then a small DOS program (the "DOS stub"), then a four-byte signature, then a COFF File Header, then an "Optional" Header, then a table of section headers, then the sections themselves, in the order the linker arranged them. The boundaries between these regions are not negotiable. Every field that follows tells the loader, in effect, "the next thing is exactly this many bytes ahead."
The binary we'll be reading throughout this post is the stripped hello.exe we built with MinGW. It's 39,424 bytes total and contains ten sections: .text, .data, .rdata, .pdata, .xdata, .bss, .idata, .CRT, .tls, and .reloc. You met four of these in Part 1: .text (code), .rdata (read-only data), .data (writable initialized data), and .reloc (relocation fixups). The other six are runtime-support sections we'll touch on as they become relevant. The point of looking at a real binary, rather than an idealized two-section diagram, is that it shows you the actual texture of a Windows executable: real PEs usually carry more sections than a textbook diagram shows, and ten or more is not unusual.
Here's the structural map. Each region is a contiguous run of bytes; the file is read sequentially from top to bottom.
[Figure: hello.exe. The first 0x400 bytes are all headers and section-header table; everything from 0x400 onwards is the section bodies in linker-chosen order. The .bss section contains uninitialized zero-filled data and has no on-disk content; the loader allocates space for it at runtime. File offsets shown on the left are real values from the actual compiled binary.]
Open the file at byte zero. The first 64 bytes are the DOS Header, an MS-DOS-era structure that has been preserved at the front of every Windows executable for over thirty years for one reason: backward compatibility with a 1981 operating system that almost nobody actually runs anymore.
Here are the first 64 bytes of our hello.exe, exactly as xxd displays them:
00000000: 4d5a 9000 0300 0000 0400 0000 ffff 0000 MZ..............
00000010: b800 0000 0000 0000 4000 0000 0000 0000 ........@.......
00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0000 0000 0000 0000 0000 0000 8000 0000 ................
The DOS Header is defined as a structure called IMAGE_DOS_HEADER in Windows headers, with thirty-some fields that record things like the size of the original MS-DOS program in 512-byte pages, the initial values for the segment registers, the relocation table offset, and so on. Almost all of them are zero in any modern PE, and the loader ignores them. Only two fields still matter in 2026, and they're the two we'll focus on.
The first is e_magic, the very first two bytes of the file. Its value is fixed at 0x5A4D, which is "MZ" in ASCII — the initials of Mark Zbikowski, an MS-DOS developer who designed the original executable format. You can see it plainly at the top-left of the hex dump: 4D 5A followed by the rest of the structure. The Windows loader looks at these two bytes and refuses to load anything that doesn't start with them. Every .exe, every .dll, every .sys driver on Windows starts with MZ.
The second field that matters is e_lfanew, a 4-byte field at offset 0x3C. You can read it directly from the dump: at offset 0x3C the bytes are 80 00 00 00, which as a little-endian 32-bit integer is 0x00000080. This is a file offset — the byte position where the modern PE structure begins. The loader's logic for finding the PE header is, almost literally, "read the 4 bytes at offset 0x3C, jump there, and start reading the PE signature." That tiny pointer is the bridge from the DOS-era format to the modern Windows format.
Why is e_lfanew at exactly 0x3C? Because back when this format was designed, that location was reserved space in the original MS-DOS executable header — a place where four bytes could be added without breaking compatibility with existing DOS tools. The PE format hijacked that slot to store a forwarding pointer.
Why is it a file offset rather than an RVA? Because of the timing rule from the previous section. The loader reads e_lfanew at the very start of the loading process, when it has just opened the file and hasn't mapped anything into memory yet. There is no image-in-memory for an RVA to be relative to. Worse: the loader doesn't yet know the image base (which is stored inside the Optional Header, which is what e_lfanew is helping us find). The logic would be circular — to follow an RVA, the loader would need information it can only obtain by following e_lfanew. A file offset breaks the circularity.
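The "read the 4 bytes at offset 0x3C" step is easy to try on the dump above. The sketch below is illustrative only: `find_pe_header` is a name invented here, the 64 bytes are copied from the xxd output, and e_lfanew is read as an unsigned value for simplicity.

```python
import struct

# First 64 bytes of hello.exe, copied from the xxd dump above.
DOS_HEADER = bytes.fromhex(
    "4d5a 9000 0300 0000 0400 0000 ffff 0000 "
    "b800 0000 0000 0000 4000 0000 0000 0000 "
    "0000 0000 0000 0000 0000 0000 0000 0000 "
    "0000 0000 0000 0000 0000 0000 8000 0000"
)


def find_pe_header(data: bytes) -> int:
    """Return the file offset of the PE signature, as given by e_lfanew."""
    if len(data) < 0x40:
        raise ValueError("too short to hold a DOS header")
    if data[:2] != b"MZ":  # e_magic
        raise ValueError("missing MZ signature")
    # e_lfanew: little-endian 32-bit integer at file offset 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return e_lfanew


print(hex(find_pe_header(DOS_HEADER)))  # 0x80
```

Everything here works on raw file bytes with no notion of an image base, which is the point of e_lfanew being a file offset.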
Between the DOS Header and the PE structure that e_lfanew points at, there's a small region (64 bytes in our binary, from 0x40 through 0x7F) called the DOS Stub. It's not a "header" in any sense. It's an actual MS-DOS program. Here are its bytes, starting at offset 0x40:
00000040: 0e1f ba0e 00b4 09cd 21b8 014c cd21 5468 ........!..L.!Th
00000050: 6973 2070 726f 6772 616d 2063 616e 6e6f is program canno
00000060: 7420 6265 2072 756e 2069 6e20 444f 5320 t be run in DOS
00000070: 6d6f 6465 2e0d 0d0a 2400 0000 0000 0000 mode....$.......
The first fourteen bytes are real x86-16 machine code. Disassembled, they read: push the code segment, pop it into the data segment, load the address of an offset-14 string into DX, call MS-DOS print-string service (interrupt 21h, function 9), then call the exit service (interrupt 21h, function 4Ch). The remaining bytes are the ASCII string the program prints, terminated with $ — MS-DOS string convention. If you took just this part of the file and ran it on real DOS, it would print "This program cannot be run in DOS mode." and exit cleanly. That's the whole point of the stub: a courtesy message to anyone who tries to run a Windows executable on MS-DOS.
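You can recover the message the same way the stub's code would find it: read the operand of the `mov dx` instruction, then scan to the `$`. This is a sketch under two assumptions: that the offset in DX is relative to the first byte of the stub (file offset 0x40), and that the stub has exactly the layout shown above. The helper name `stub_message` is invented for the example.

```python
import struct

# The stub bytes from file offset 0x40, copied from the dump above
# (trailing zero padding omitted).
DOS_STUB = bytes.fromhex(
    "0e1f ba0e 00b4 09cd 21b8 014c cd21 5468 "
    "6973 2070 726f 6772 616d 2063 616e 6e6f "
    "7420 6265 2072 756e 2069 6e20 444f 5320 "
    "6d6f 6465 2e0d 0d0a 24"
)


def stub_message(stub: bytes) -> bytes:
    """Return the text the stub would print, without the '$' terminator."""
    # Byte 2 is opcode 0xBA (mov dx, imm16); the next two bytes are the
    # little-endian offset of the string.
    if stub[2] != 0xBA:
        raise ValueError("unexpected stub layout")
    (offset,) = struct.unpack_from("<H", stub, 3)
    end = stub.index(b"$", offset)  # DOS function 9 prints up to '$'
    return stub[offset:end]


print(stub_message(DOS_STUB))
# b'This program cannot be run in DOS mode.\r\r\n'
```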
The Windows loader does not read or execute the DOS Stub. It jumps over it entirely using e_lfanew. The stub exists purely as a vestigial limb — useful in 1993, harmless today.
Modern toolchains sometimes hide useful information in the DOS Stub region, in the area between the end of the stub code and the start of the PE structure. The Microsoft linker writes a "Rich header" there: a small undocumented blob containing version IDs of the Microsoft toolchain components used to build the binary (cl.exe, link.exe, ml.exe, etc.). The Rich header isn't part of the official PE specification, and binaries from non-Microsoft toolchains like MinGW (including the one we're looking at) don't have one. But for MSVC-built binaries, which is most native Windows software you'll encounter, malware analysts read it routinely because it can fingerprint the exact build environment. We won't go further into it here, but it's worth knowing that the DOS-stub region isn't quite as empty as the official spec implies.
Following e_lfanew takes us to file offset 0x80. Here begins the modern part of the format. The first thing we encounter is a four-byte signature, and the structure of what comes next will be familiar from Part 1.
Recall from Part 1 that the linker's output is a COFF-style file. PE is, in Microsoft's own framing, "COFF plus extra headers bolted onto the front so the operating system can load it." We've just walked past those extra headers — the DOS bits and the four-byte signature — and we're about to land on the COFF File Header itself. Once we get into the section table, the structures are the same ones we discussed at the byte level for object files in Part 1.
Here are the next 24 bytes, starting from 0x80:
00000080: 5045 0000 6486 0a00 5fe1 156a 0000 0000 PE..d..._..j....
00000090: 0000 0000 f000 2e02 ........
The first four bytes — 50 45 00 00 — spell "PE\0\0" in ASCII. This is the PE signature: the moment in the file where the loader has officially crossed the boundary from "this might just be an MS-DOS executable" to "this is a Portable Executable." If these four bytes aren't here exactly as expected, the loader rejects the file. That's the entire purpose of the signature: a sanity check at a known location.
The remaining 20 bytes are the COFF File Header, a structure called IMAGE_FILE_HEADER. It has exactly seven fields, all of them small, all of them read by the loader before mapping (though as we'll see, not all of them are still loader-relevant in 2026). Here's what each of those bytes encodes for our binary.
Machine (offset 0x84, 2 bytes). The bytes 64 86 read as the little-endian value 0x8664, which is IMAGE_FILE_MACHINE_AMD64 — x86-64. The loader uses this to refuse executables compiled for the wrong CPU; an ARM64 Windows machine running our x86-64 binary would either reject it or hand it to a binary translator. Other common values are 0x014C for 32-bit x86 and 0xAA64 for native ARM64.
NumberOfSections (offset 0x86, 2 bytes). 0A 00 reads as 0x000A = 10. There are ten section header entries following the Optional Header. The loader needs this count to know how many 40-byte section headers to read.
TimeDateStamp (offset 0x88, 4 bytes). 5F E1 15 6A reads as 0x6A15E15F = 1,779,818,847 seconds since the Unix epoch, which is Tuesday, May 26, 2026 at 18:07:27 UTC — the moment MinGW finished linking our binary. This is the linker's build timestamp; it can be useful for analysts trying to correlate binaries to a build environment, but it is also frequently spoofed or zeroed out by tools, so it isn't trustworthy on its own.
PointerToSymbolTable (offset 0x8C, 4 bytes) and NumberOfSymbols (offset 0x90, 4 bytes). Both are 00 00 00 00. These fields are leftovers from the COFF object-file world we discussed in Part 1 — they pointed at the symbol table that traveled with the object file. The linker stripped the symbol table when it built the executable (it had served its purpose during linking), so both fields are zero. They are almost always zero in modern PE files; symbol information for debugging lives elsewhere now.
SizeOfOptionalHeader (offset 0x94, 2 bytes). F0 00 reads as 0x00F0 = 240 bytes. This tells the loader how many bytes the Optional Header occupies — important because the Optional Header's actual size depends on whether it's the PE32 or PE32+ variant, and the loader needs the exact count to know where the section header table starts.
Characteristics (offset 0x96, 2 bytes). 2E 02 reads as 0x022E, which is a bitfield. The bits set in this value are IMAGE_FILE_EXECUTABLE_IMAGE (0x0002, "this file is valid for execution"), IMAGE_FILE_LINE_NUMS_STRIPPED (0x0004), IMAGE_FILE_LOCAL_SYMS_STRIPPED (0x0008), IMAGE_FILE_LARGE_ADDRESS_AWARE (0x0020, "this binary can handle addresses above 2 GB"), and IMAGE_FILE_DEBUG_STRIPPED (0x0200, "debug information has been removed from this image"). Together they add up to 0x022E. The most useful bit to recognize is IMAGE_FILE_DLL (0x2000) — when that's set, the binary is a DLL rather than an EXE.
That's the entire COFF File Header. Seven fields, 20 bytes, read by the loader before mapping. A few of them — Machine, NumberOfSections, SizeOfOptionalHeader, and parts of Characteristics — are directly loader-relevant. Others (TimeDateStamp, the two zeroed symbol-table fields, the stripping flags) are linker output or legacy debug metadata that the loader doesn't really care about. Now we step into the part of the format that's specific to executables.
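As a cross-check on the field-by-field reading, the same 24 bytes can be unpacked in one call. This is a minimal sketch: `parse_coff_header` and the dictionary keys are names chosen for the example, the flag table lists only the bits named in the text, and the bytes are copied from the dump at offset 0x80.

```python
import struct
from datetime import datetime, timezone

# The 24 bytes at file offset 0x80, copied from the dump above.
PE_START = bytes.fromhex(
    "5045 0000 6486 0a00 5fe1 156a 0000 0000 "
    "0000 0000 f000 2e02"
)

# Characteristics bits named in the text (not the full list).
CHARACTERISTICS = {
    0x0002: "EXECUTABLE_IMAGE",
    0x0004: "LINE_NUMS_STRIPPED",
    0x0008: "LOCAL_SYMS_STRIPPED",
    0x0020: "LARGE_ADDRESS_AWARE",
    0x0200: "DEBUG_STRIPPED",
    0x2000: "DLL",
}


def parse_coff_header(data: bytes, pe_offset: int = 0) -> dict:
    """Parse the PE signature plus IMAGE_FILE_HEADER found at pe_offset."""
    # 4-byte signature, then seven little-endian fields (20 bytes).
    sig, machine, nsect, stamp, symptr, nsyms, optsize, chars = (
        struct.unpack_from("<4sHHIIIHH", data, pe_offset)
    )
    if sig != b"PE\0\0":
        raise ValueError("missing PE signature")
    return {
        "machine": machine,
        "number_of_sections": nsect,
        "timestamp": datetime.fromtimestamp(stamp, tz=timezone.utc),
        "pointer_to_symbol_table": symptr,
        "number_of_symbols": nsyms,
        "size_of_optional_header": optsize,
        "characteristics": chars,
        "flags": [n for bit, n in CHARACTERISTICS.items() if chars & bit],
    }


hdr = parse_coff_header(PE_START)
# The section table follows the signature (4 bytes), the COFF header
# (20 bytes), and the Optional Header; 0x80 is e_lfanew for this file.
section_table = 0x80 + 4 + 20 + hdr["size_of_optional_header"]

print(hex(hdr["machine"]), hdr["number_of_sections"])
print(hdr["timestamp"].isoformat())
print(hdr["flags"])
print(hex(section_table))
```

The last line shows why SizeOfOptionalHeader matters to a parser: without it there is no way to know that the section headers start at file offset 0x188.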
[Figure: hello.exe, every field annotated with the actual bytes from offset 0x84–0x97. PointerToSymbolTable and NumberOfSymbols are pre-zeroed legacy fields; the rest carry real loader-relevant data. Bytes shown in file-order (little-endian as stored on disk).]
Immediately after the 20-byte COFF File Header comes the structure called the Optional Header. Despite the name, this header is not optional for executables: it's required for every .exe and .dll Windows knows how to load. The "optional" part of the name is a holdover from the COFF specification, which defined this structure as optional for object files (the .obj files from Part 1). Object files don't need an Optional Header because they don't need to be loaded; they just need to be linked. Executables, by contrast, need every byte of it.
The Optional Header is where the loader learns almost everything it needs to construct the in-memory image. Where the COFF File Header says "this is a binary, here's the CPU, here are the section count and characteristics flags," the Optional Header says "here is where to load me, here is how to align my sections, here is where my code starts, and here are sixteen pointers to the data structures inside me that you'll need to set up the process." It is by far the most information-dense structure in a PE file.
The structure is 240 bytes for a 64-bit (PE32+) executable like ours and 224 bytes for a 32-bit (PE32) executable. We're going to walk through the most important fields one at a time. There are about thirty in total; we'll look at ten in detail, then list the rest in a reference table at the end of the section.
The first field of the Optional Header is the two-byte value that distinguishes 32-bit from 64-bit PEs.
Magic (offset 0x98 in the file, the very first field of the Optional Header). For our binary, the bytes are 0B 02, which reads as 0x020B — the magic number for PE32+ (64-bit). The other value you'll see is 0x010B, the PE32 (32-bit) magic. There's a third value, 0x0107, for ROM images that you'll basically never encounter outside firmware work. Tools sometimes refer to PE32+ as "PE64" — same thing.
The Optional Header's structure is slightly different between PE32 and PE32+. Specifically, a few address-type fields are 4 bytes in PE32 and 8 bytes in PE32+ — ImageBase, the four stack/heap reserve and commit values — and PE32 has one extra 4-byte field (BaseOfData) that PE32+ does without. That's why the overall size differs by 16 bytes. The Magic field tells the loader which variant to expect so it can parse the rest correctly.
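The 16-byte difference can be checked by writing both layouts as `struct` format strings and letting Python add them up. This is only a size check under the field list described in this section; the grouping comments are shorthand, and the constant names are chosen for the example.

```python
import struct

DATA_DIRECTORIES = 16 * 8  # sixteen (RVA, size) pairs of two DWORDs each

# Fixed-size fields only. "<" means little-endian with no padding.
PE32_FIELDS = (
    "<HBB"     # Magic, MajorLinkerVersion, MinorLinkerVersion
    "IIIII"    # SizeOfCode, SizeOfInitializedData, SizeOfUninitializedData,
               # AddressOfEntryPoint, BaseOfCode
    "I"        # BaseOfData (PE32 only)
    "I"        # ImageBase (4 bytes)
    "II"       # SectionAlignment, FileAlignment
    "HHHHHH"   # OS, image, and subsystem version pairs
    "IIII"     # Win32VersionValue, SizeOfImage, SizeOfHeaders, CheckSum
    "HH"       # Subsystem, DllCharacteristics
    "IIII"     # stack/heap reserve and commit (4 bytes each)
    "II"       # LoaderFlags, NumberOfRvaAndSizes
)

PE32PLUS_FIELDS = (
    "<HBB"
    "IIIII"
    "Q"        # ImageBase (8 bytes); BaseOfData is gone
    "II"
    "HHHHHH"
    "IIII"
    "HH"
    "QQQQ"     # stack/heap reserve and commit (8 bytes each)
    "II"
)

pe32_size = struct.calcsize(PE32_FIELDS) + DATA_DIRECTORIES
pe32plus_size = struct.calcsize(PE32PLUS_FIELDS) + DATA_DIRECTORIES

print(pe32_size, pe32plus_size, pe32plus_size - pe32_size)  # 224 240 16
```

Five fields grow by 4 bytes each (20 bytes) and one 4-byte field disappears, which nets out to the 16 bytes mentioned above.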
AddressOfEntryPoint (offset 0x10 within the Optional Header, so file offset 0xA8, 4 bytes). Our bytes are 10 14 00 00 = 0x00001410. This is an RVA — the offset within the mapped image where execution begins. The first instruction the CPU will execute, once the image is fully loaded and ready to run, lives at ImageBase + 0x1410. The linker chose this RVA at build time by placing the entry function (typically the C runtime's mainCRTStartup, which calls main) at that offset within the .text section.
The entry point's value is one of the clearest examples of the read-early-use-late pattern we discussed. The loader reads this 4-byte RVA out of the Optional Header while it's still parsing on-disk headers. It stores the number. Only much later — after mapping every section, applying relocations, resolving every import — does the loader actually compute ImageBase + 0x1410 and jump there. The RVA in the Optional Header is a coordinate that won't be used for a while yet.
ImageBase (offset 0x18 within the Optional Header, file offset 0xB0, 8 bytes for PE32+). Our bytes are 00 00 00 40 01 00 00 00, which reads as 0x0000000140000000. This is the preferred virtual address where the linker would like the loader to map the image. For ordinary 64-bit Windows EXEs — including the toolchains you'll meet first — the conventional preferred base is 0x0000000140000000. DLLs typically use a different convention, commonly 0x0000000180000000 (though this is a linker default, not a format requirement). Both fit comfortably in the 48 bits that x86-64 currently uses for virtual addresses.
The word "preferred" is doing real work in that sentence. The loader is not required to honor it. On modern Windows, if the binary opts in to ASLR (via the IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE flag we'll see below) and ships with the relocation information needed to be moved safely, the loader may choose a randomized base address instead of the preferred ImageBase. When that happens, every absolute address inside the binary that depended on ImageBase being 0x140000000 is now wrong, and the .reloc section we saw in the file layout exists specifically to tell the loader what to patch. We'll dig into how that works in Part 4. For now, the field is best understood as a hint: "if you can load me here, do; if not, fix me up."
SectionAlignment (file offset 0xB8, 4 bytes) and FileAlignment (file offset 0xBC, 4 bytes). These are the two alignment values that govern how the binary is laid out — one for memory, one for disk. Our values are 00 10 00 00 = 0x1000 (4 KB) for the section alignment, and 00 02 00 00 = 0x200 (512 bytes) for the file alignment.
These two values control the relationship between the on-disk file and the in-memory image. Every section, once mapped into memory, starts at an address that's a multiple of SectionAlignment. Every section, on disk, starts at a file offset that's a multiple of FileAlignment. Section alignment is almost always one page (4 KB on x86-64) because the operating system enforces page-level permissions — you can't make half a page executable. File alignment is typically smaller (512 bytes is common, though tools can produce smaller values) because there's no equivalent requirement on disk: the file just needs to be a stream of bytes, and packing sections closer together saves disk space. The mismatch between these two alignments is what causes the file to "stretch" when loaded — sections that sit adjacently on disk get spread out in memory to hit page boundaries. That stretch is the entire subject of Part 3.
SizeOfImage (file offset 0xD0, 4 bytes). Our bytes are 00 10 01 00 = 0x00011000 = 69,632 bytes. This is the total size of the image once it has been mapped into memory, rounded up to SectionAlignment. The loader's first concrete act, after parsing the headers, is to ask the operating system for exactly this many bytes of contiguous virtual address space. Everything from RVA 0x0000 (the start of the headers) to RVA 0x00011000 (just past the end of the last section) lives within that allocation. The file on disk is 39 KB; the image in memory is 68 KB. The difference is the alignment stretch.
SizeOfHeaders (file offset 0xD4, 4 bytes). Our bytes are 00 04 00 00 = 0x400 = 1,024 bytes. This is the total size of everything from the DOS Header through the section header table, rounded up to FileAlignment. It defines where the section bodies start on disk: the very first section's PointerToRawData will be 0x400, which is exactly what we saw in the file layout earlier.
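The "rounded up to alignment" operation that both alignment fields rely on is a one-liner. The sketch below uses a common bit-masking idiom that assumes the alignment is a power of two (true of the 0x200 and 0x1000 values in our binary); `align_up` is a name chosen for the example, and the header sizes are the ones read earlier in this post.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


FILE_ALIGNMENT = 0x200
SECTION_ALIGNMENT = 0x1000

# Where the headers really end: DOS header plus stub (0x80 bytes),
# PE signature (4), COFF header (20), Optional Header (240),
# and ten 40-byte section headers.
headers_end = 0x80 + 4 + 20 + 240 + 10 * 40

print(hex(headers_end))                               # 0x318
print(hex(align_up(headers_end, FILE_ALIGNMENT)))     # 0x400
print(hex(align_up(headers_end, SECTION_ALIGNMENT)))  # 0x1000
```

The second result matches SizeOfHeaders. The third is consistent with the first section starting at RVA 0x1000: the same 0x318 bytes of headers occupy 0x400 bytes on disk but a full page in memory.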
Subsystem (file offset 0xDC, 2 bytes). Our bytes are 03 00 = 0x0003 = IMAGE_SUBSYSTEM_WINDOWS_CUI, "Windows character-mode (console) UI." When Windows launches our binary, it'll see this value and ensure the process has a console attached — if the program was double-clicked from Explorer rather than launched from a command prompt, Windows allocates a new console window for it. The two most common alternatives are IMAGE_SUBSYSTEM_WINDOWS_GUI (0x0002, for graphical applications — no console) and IMAGE_SUBSYSTEM_NATIVE (0x0001, for drivers and other kernel-mode-ish code that doesn't use the Win32 subsystem at all). This field is what determines whether running an .exe pops up a black console window or not.
DllCharacteristics (file offset 0xDE, 2 bytes). Our bytes are 60 01 = 0x0160. Like the COFF Characteristics field, this is a bitfield, but the bits here are the ones that matter for modern security. The flags set in our binary are:
IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA (0x0020): "this binary can handle high-entropy 64-bit ASLR." When set on a 64-bit binary, this tells the loader the image can tolerate a much wider randomized address range than older 64-bit images, allowing significantly more entropy in the chosen base address.
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (0x0040): "this binary opts in to ASLR." When this flag is set, the loader is permitted to relocate the image to a randomized base address; without it, ASLR is effectively disabled for this binary and it loads at the preferred ImageBase.
IMAGE_DLLCHARACTERISTICS_NX_COMPAT (0x0100): "this binary is compatible with data-execution prevention." DEP/NX makes data pages non-executable; this flag indicates the binary won't try anything stupid like executing code from the stack. Ordinary modern toolchains set this by default; old binaries, hand-crafted shellcode loaders, and some packed/obfuscated samples may not.
Other bits you'll meet in real binaries: IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY (0x0080, requires a valid Authenticode signature to load), IMAGE_DLLCHARACTERISTICS_GUARD_CF (0x4000, the binary supports Control Flow Guard), and IMAGE_DLLCHARACTERISTICS_APPCONTAINER (0x1000, the binary requires the AppContainer sandbox). There's also a separate, newer "Extended DLL Characteristics" mechanism, added because the original 16-bit flag field ran out of room, that carries flags like CET shadow stack compatibility (IMAGE_DLLCHARACTERISTICS_EX_CET_COMPAT). These extended flags live in the debug directory rather than this header, so they don't appear in the DllCharacteristics field directly.
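Decoding a bitfield like this by hand gets tedious, so here is a small sketch using Python's `enum.IntFlag`. The class lists only the six bits named above, with the `IMAGE_DLLCHARACTERISTICS_` prefix dropped for brevity; the class and function names are made up for the example.

```python
from enum import IntFlag


class DllCharacteristics(IntFlag):
    """Subset of the IMAGE_DLLCHARACTERISTICS_* bits named in the text."""
    HIGH_ENTROPY_VA = 0x0020
    DYNAMIC_BASE = 0x0040
    FORCE_INTEGRITY = 0x0080
    NX_COMPAT = 0x0100
    APPCONTAINER = 0x1000
    GUARD_CF = 0x4000


def set_flags(raw: int) -> list:
    """Names of the known bits set in raw, in declaration order."""
    return [flag.name for flag in DllCharacteristics if raw & flag]


# The two bytes at file offset 0xDE, read as a little-endian integer.
raw = int.from_bytes(bytes.fromhex("6001"), "little")

print(hex(raw))        # 0x160
print(set_flags(raw))  # ['HIGH_ENTROPY_VA', 'DYNAMIC_BASE', 'NX_COMPAT']
```

A triage script might use the same test in reverse, flagging binaries where `DYNAMIC_BASE` or `NX_COMPAT` is missing.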
One field we'll meet again that's worth naming now: NumberOfRvaAndSizes (file offset 0x104, 4 bytes), with the value 10 00 00 00 = 0x00000010 = 16. This counts how many data directory entries follow. The PE spec defines 16 standard data-directory slots, and ordinary modern toolchains emit all 16. But the PE specification explicitly warns parsers to honor this field before probing any specific directory entry — unusual binaries, packers, and tiny-PE experiments sometimes set it to smaller values to shave bytes off the optional header. The data directories — the array of (RVA, size) pairs that comes next — are the bridge between the Optional Header and the actual section content, and they're the subject of the next section.
The remaining fields of the Optional Header are mostly bookkeeping and version metadata that the loader either uses for compatibility checks or ignores. For reference:
Field                        Value      Notes
───────────────────────────  ─────────  ──────────────────────────────────────────────
MajorLinkerVersion           2          MinGW ld version, major
MinorLinkerVersion           0x29 (41)  MinGW ld version, minor
SizeOfCode                   0x6C00     sum of all code section sizes
SizeOfInitializedData        0x9600     sum of all initialized-data sizes
SizeOfUninitializedData      0x0C00     sum of all BSS-style sizes
BaseOfCode                   0x1000     RVA of the first code section
MajorOperatingSystemVersion  4          minimum Windows version (advisory)
MinorOperatingSystemVersion  0
MajorImageVersion            0          image-specific version (set by linker)
MinorImageVersion            0
MajorSubsystemVersion        5          minimum subsystem version
MinorSubsystemVersion        2
Win32VersionValue            0          reserved, must be zero
CheckSum                     0x1493D    PE-image checksum (MinGW writes a real one; many tools don't)
SizeOfStackReserve           0x200000   virtual memory reserved for primary thread stack
SizeOfStackCommit            0x1000     stack memory actually committed at start
SizeOfHeapReserve            0x100000   reserved for default process heap
SizeOfHeapCommit             0x1000     committed for default process heap
LoaderFlags                  0          reserved, must be zero
NumberOfRvaAndSizes          16         number of data directory entries that follow
Of these, the four stack/heap reserve and commit values are the ones most likely to be of interest in real analysis — they affect process memory layout and occasionally show up as quirks in malware that wants unusually large or unusually small stacks. The version fields are mostly compatibility lies told by linkers and aren't enforced. CheckSum is computed over the entire image; it's required to be valid for kernel-mode drivers and a few other specific cases, but for ordinary user-mode EXEs the loader doesn't verify it, so different linkers handle it differently — MinGW writes a real checksum (ours is 0x1493D), while many other toolchains leave it at zero.
[Figure: DllCharacteristics close-up shows the three security bits set in our binary, the typical modern-Windows-binary configuration.]
The final piece of the Optional Header is an array of data directories. Our binary has sixteen of them (the count recorded in NumberOfRvaAndSizes), occupying the last 128 bytes of the Optional Header. Each entry is a tiny structure called IMAGE_DATA_DIRECTORY, just two fields:
typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD VirtualAddress;  // RVA where the structure lives
    DWORD Size;            // size of the structure in bytes
} IMAGE_DATA_DIRECTORY;
Eight bytes each, sixteen of them, total 128 bytes. Each entry is, in effect, a (where, how big) pointer to an important structure somewhere inside the image's sections. VirtualAddress here is an RVA — the in-memory offset coordinate we covered in "Three kinds of 'where'" — not a file offset. The data directories are how the loader finds things like the import table or the relocation table without having to scan every section looking for them. Instead, it knows exactly which RVA to follow and how many bytes to read.
The sixteen entries have fixed meanings, assigned by index. Most modern binaries don't fill all sixteen; entries are zeroed out when not used. Here are all sixteen for our hello.exe:
Index  Name                        RVA         Size   Status
─────  ──────────────────────────  ──────────  ─────  ──────────────────────────
0      Export Table                0x00000000  0      unused (we export nothing)
1      Import Table                0x0000D000  0x6D0  points into .idata
2      Resource Table              0x00000000  0      unused
3      Exception Table             0x0000A000  0x468  points into .pdata
4      Certificate Table           0x00000000  0      unused (unsigned binary)
5      Base Relocation Table       0x00010000  0x84   points into .reloc
6      Debug Directory             0x00000000  0      unused (stripped)
7      Architecture                0x00000000  0      reserved, always zero
8      Global Pointer              0x00000000  0      unused on x86-64
9      TLS Table                   0x00009040  0x28   points into .rdata
10     Load Config Table           0x00000000  0      unused
11     Bound Import                0x00000000  0      unused (legacy mechanism)
12     IAT (Import Address Table)  0x0000D1C8  0x188  points into .idata
13     Delay Import Descriptor     0x00000000  0      unused
14     COM Descriptor (.NET)       0x00000000  0      unused (native binary)
15     Reserved                    0x00000000  0      must be zero
Of the sixteen possible entries, our binary fills five: Import Table, Exception Table, Base Relocation Table, TLS Table, and IAT. The others are zero. This is typical for a simple native executable; large applications with resources, signatures, and CLR metadata fill more.
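To make the (where, how big) layout concrete, here is a minimal Python sketch that splits a 128-byte data directory array into its sixteen entries. The names `DIRECTORY_NAMES`, `KNOWN`, and `parse_data_directories` are made up for this illustration, and the input is a synthetic blob rebuilt from the table above rather than bytes read from hello.exe.

```python
import struct

# Directory names by index, as listed in the table above.
DIRECTORY_NAMES = [
    "Export Table", "Import Table", "Resource Table", "Exception Table",
    "Certificate Table", "Base Relocation Table", "Debug Directory",
    "Architecture", "Global Pointer", "TLS Table", "Load Config Table",
    "Bound Import", "IAT", "Delay Import Descriptor", "COM Descriptor",
    "Reserved",
]

def parse_data_directories(blob: bytes):
    """Split a 128-byte array into sixteen (name, rva, size) tuples."""
    if len(blob) != 16 * 8:
        raise ValueError("expected exactly 128 bytes")
    entries = []
    # Each entry is two little-endian DWORDs: VirtualAddress, then Size.
    for index, (rva, size) in enumerate(struct.iter_unpack("<II", blob)):
        entries.append((DIRECTORY_NAMES[index], rva, size))
    return entries

# Synthetic stand-in for the directory array (assumption: rebuilt from the
# table above, index -> (RVA, Size); every other entry is zero).
KNOWN = {1: (0xD000, 0x6D0), 3: (0xA000, 0x468), 5: (0x10000, 0x84),
         9: (0x9040, 0x28), 12: (0xD1C8, 0x188)}
blob = b"".join(struct.pack("<II", *KNOWN.get(i, (0, 0))) for i in range(16))

directories = parse_data_directories(blob)
populated = [(name, hex(rva), hex(size)) for name, rva, size in directories if rva]
for row in populated:
    print(row)
```

Against the real file, the same function should work on the 128 bytes that end where the Optional Header ends, that is, the bytes from file offset 0x108 up to 0x188.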
Notice that every non-zero entry's RVA falls within one of the sections we saw earlier. The Import Table at RVA 0xD000 lands inside .idata (which starts at RVA 0xD000). The Exception Table at RVA 0xA000 lands inside .pdata (which starts at RVA 0xA000). The Base Relocation Table at RVA 0x10000 lands inside .reloc. This is the key mental model: the data directories do not contain structures themselves; they're pointers into the section bodies, telling the loader where to find structures that the linker placed inside specific sections at build time.
The natural question is: where do those RVA values come from? Nothing in the source code mentions 0xD000 or 0x10000; we never asked for the import table to live at any particular address. The answer is that the linker chose those RVAs itself, during its final layout pass. After it decided which structures go in which sections (the import descriptors in .idata, the relocation blocks in .reloc, and so on) and what RVA each section would start at, it knew the RVA of every structure — and it wrote each one into the corresponding data directory entry as the last step before emitting the file. The data directories are how the linker tells the loader what it built. We'll see the precise algorithm the linker uses to assign section RVAs a few sections from now, in "How the linker assigns RVAs."
That arrangement explains the read-early-use-late pattern from earlier. The loader reads the data directories out of the Optional Header before mapping anything — they're just sixteen 8-byte records sitting in the header. The RVAs they contain don't mean anything yet, because the sections haven't been placed in memory. But once the loader has mapped the sections, those RVAs suddenly become valid pointers to live data. The loader then revisits each non-zero directory entry, computes ImageBase + RVA, and processes the structure there: parsing imports, applying relocations, registering TLS callbacks, and so on.
There's exactly one data directory that breaks the rule we established about RVAs vs file offsets: entry 4, the Certificate Table (also called the Security Directory), which locates the digital signature on a code-signed binary. The "VirtualAddress" field in that one directory entry is actually a file offset, not an RVA. The reason: certificate data is not mapped into memory as part of the image; it's appended to the file after the last section, and verified by tools that read the file from disk (Authenticode signature verification happens at install time and at execution time, but always against the file). Putting an RVA there would be meaningless, because the bytes aren't in the image. So Microsoft used a file offset, and documented the exception. Our binary is unsigned, so this entry is zero and the exception doesn't bite us; but if you ever inspect a signed binary and find a non-zero entry at index 4, remember that the number you're looking at is a file offset.
The data directories that matter most for the rest of this series are the ones we'll dig into in Part 4: the Import Table and IAT (entries 1 and 12), which together describe what DLLs the program needs and where the loader should fill in the imported function addresses, and the Base Relocation Table (entry 5), which tells the loader what to patch if the image is loaded somewhere other than its preferred ImageBase. We're not going to dissect those structures here — that's Part 4's territory — but you now know how the loader finds them.
Immediately after the Optional Header ends at file offset 0x188 comes the section header table. There's one entry per section — ten of them in our binary — and each entry is exactly 40 bytes. The structure is called IMAGE_SECTION_HEADER. These are the most consequential 400 bytes in the whole file, because they're what tells the loader how to actually build the in-memory image from the on-disk bytes.
Each section header contains ten fields. We'll walk through them using the real bytes for our binary's first section, .text, whose 40-byte header starts at file offset 0x188:
00000188: 2e74 6578 7400 0000 686b 0000 0010 0000
00000198: 006c 0000 0004 0000 0000 0000 0000 0000
000001a8: 0000 0000 6000 0060
Reading those bytes field by field:
Name (offset 0, 8 bytes). 2E 74 65 78 74 00 00 00 — the ASCII string ".text" followed by three zero bytes of padding. The field is a fixed-size 8-byte buffer; section names that don't fill it are zero-padded, and section names exactly 8 bytes long are stored without a null terminator. The loader doesn't use this name to make decisions — it's purely a human-readable label. Conventions like .text, .data, .rdata, .reloc are just that, conventions; the linker can name a section anything it wants, and you'll occasionally meet binaries with unusual section names chosen by obfuscators or specialized linkers.
VirtualSize (offset 8, 4 bytes). 68 6B 00 00 = 0x00006B68 = 27,496 bytes. This is the exact, unpadded size the section occupies in memory once mapped. It's the linker's honest count of how many bytes of real content the section contains — code, data, whatever.
VirtualAddress (offset 12, 4 bytes). 00 10 00 00 = 0x00001000. This is an RVA — where the section should be placed within the mapped image, measured from ImageBase. The loader will arrange to have the section's content visible at ImageBase + 0x1000 after mapping. For our 64-bit binary with ImageBase = 0x140000000, that's 0x140001000 — the address the CPU sees when executing code in .text.
SizeOfRawData (offset 16, 4 bytes). 00 6C 00 00 = 0x00006C00 = 27,648 bytes. This is the size of the section on disk, rounded up to FileAlignment (which is 0x200 in our binary). Note the difference from VirtualSize: 27,648 versus 27,496 — the on-disk size is 152 bytes larger, because file alignment requires rounding up to a 512-byte boundary, and the unpadded data fell short of that boundary.
PointerToRawData (offset 20, 4 bytes). 00 04 00 00 = 0x00000400. This is a file offset — where this section's bytes start in the file on disk. The loader will read SizeOfRawData bytes starting from this position in the file.
This is the moment to call out the central trick of this structure: this one 40-byte record contains both a file offset and an RVA, side by side. PointerToRawData tells the loader "read from this position in the file"; VirtualAddress tells the loader "write to this offset in the mapped image." The section header is precisely the structure that straddles the boundary between disk and memory. It is the entry in the format whose only purpose is to express the relationship between the two coordinate systems we discussed at the start of this post.
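As a rough illustration of how those 40 bytes split into fields, the sketch below unpacks the .text header with Python's struct module. The bytes are copied from the hex dump above; the helper name `parse_section_header` and the dictionary layout are choices made for this example, and the trailing fields it decodes are the ones described just below.

```python
import struct

# The 40 bytes at file offset 0x188, copied from the hex dump above.
TEXT_HEADER = bytes.fromhex(
    "2e74657874000000"   # Name
    "686b0000"           # VirtualSize
    "00100000"           # VirtualAddress
    "006c0000"           # SizeOfRawData
    "00040000"           # PointerToRawData
    "00000000"           # PointerToRelocations
    "00000000"           # PointerToLinenumbers
    "0000"               # NumberOfRelocations
    "0000"               # NumberOfLinenumbers
    "60000060"           # Characteristics
)

# Little-endian: 8-byte name, six DWORDs, two WORDs, one DWORD = 40 bytes.
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")

def parse_section_header(raw: bytes) -> dict:
    """Unpack one 40-byte section header into a dict keyed by field name."""
    (name, vsize, vaddr, raw_size, raw_ptr,
     reloc_ptr, line_ptr, n_relocs, n_lines, flags) = SECTION_HEADER.unpack(raw)
    return {
        "Name": name.rstrip(b"\x00").decode("ascii", errors="replace"),
        "VirtualSize": vsize,
        "VirtualAddress": vaddr,          # RVA (memory side)
        "SizeOfRawData": raw_size,
        "PointerToRawData": raw_ptr,      # file offset (disk side)
        "PointerToRelocations": reloc_ptr,
        "PointerToLinenumbers": line_ptr,
        "NumberOfRelocations": n_relocs,
        "NumberOfLinenumbers": n_lines,
        "Characteristics": flags,
    }

text = parse_section_header(TEXT_HEADER)
for field, value in text.items():
    print(field, value if isinstance(value, str) else hex(value))
```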
The mapping operation for each section reduces to a single recipe:
Read SizeOfRawData bytes from file position PointerToRawData
Write them to memory at ImageBase + VirtualAddress
If VirtualSize > SizeOfRawData, zero-fill the remainder
Set page permissions according to Characteristics
The "if VirtualSize > SizeOfRawData" case is rare for code and ordinary data sections (where they're typically equal except for alignment padding), but it's the rule for BSS-style data. The .bss section in our binary has VirtualSize = 0xB80 and SizeOfRawData = 0 — it occupies 2,944 bytes of memory, all zero-filled, and contributes zero bytes to the file. That's how uninitialized data is stored cheaply: the file says "I want this much space, here's no content," and the loader zeroes the memory at load time.
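The recipe can be sketched in a few lines of Python. This is a toy model, not how the Windows loader is implemented: a bytearray stands in for the mapped image, the function name `map_section` is invented here, the permissions step is left out, and as a simplification the copy is capped at VirtualSize so that on-disk alignment padding is not copied.

```python
def map_section(file_bytes, image, virtual_address, virtual_size,
                pointer_to_raw_data, size_of_raw_data):
    """Copy one section from file_bytes into image (a bytearray)."""
    if virtual_address + virtual_size > len(image):
        raise ValueError("section does not fit in the image buffer")
    # Simplification: copy only the bytes the section occupies in memory.
    n = min(size_of_raw_data, virtual_size)
    raw = file_bytes[pointer_to_raw_data:pointer_to_raw_data + n]
    image[virtual_address:virtual_address + len(raw)] = raw
    # Zero-fill whatever part of VirtualSize the file did not supply.
    tail = virtual_size - len(raw)
    start = virtual_address + len(raw)
    image[start:start + tail] = bytes(tail)

# Synthetic file: 0x400 bytes of header space, then 0x200 bytes of 0xCC.
file_bytes = bytes(0x400) + b"\xCC" * 0x200
# Pre-fill the image with 0xFF so untouched bytes are easy to spot.
image = bytearray(b"\xFF" * 0x3000)

# A code-like section: slightly smaller in memory than on disk.
map_section(file_bytes, image, 0x1000, 0x1F0, 0x400, 0x200)
# A .bss-like section: 0xB80 bytes in memory, nothing on disk.
map_section(file_bytes, image, 0x2000, 0xB80, 0, 0)

print(image[0x1000:0x1004].hex(), image[0x2000:0x2004].hex())
```

The second call is the BSS case from the text: no bytes are read from the file, and all 0xB80 bytes come from the zero-fill step.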
Five more fields, covered in three groups, complete the structure.
PointerToRelocations and PointerToLinenumbers (offsets 24 and 28, 4 bytes each). Both zero in our binary. These are leftovers from the COFF object-file format we discussed in Part 1, where each section carried its own relocations and per-line debug information. In an executable, that information is consolidated elsewhere (relocations go to the .reloc section and the Base Relocation Table; debug info, if present, goes to its own section and is pointed at by the Debug Directory). For executables, these fields are always zero.
NumberOfRelocations and NumberOfLinenumbers (offsets 32 and 34, 2 bytes each). Both zero. Same reason.
Characteristics (offset 36, 4 bytes). 60 00 00 60 = 0x60000060. This is the third bitfield we've seen so far, and it's the most important one for the loader's actual work: it determines what page permissions the section gets when it's mapped. The bits set in our .text Characteristics are:
IMAGE_SCN_CNT_CODE (0x00000020): "this section contains executable code."
IMAGE_SCN_CNT_INITIALIZED_DATA (0x00000040): "this section contains initialized data." (MinGW marks .text with both flags because it also stores small read-only constants alongside the code; MSVC typically sets only CNT_CODE.)
IMAGE_SCN_MEM_EXECUTE (0x20000000): "these pages should be marked executable in memory."
IMAGE_SCN_MEM_READ (0x40000000): "these pages should be marked readable in memory."
Four flags, OR'd together, equal 0x60000060. The combination says "code with mixed data, executable, readable", which is exactly what you want for a .text section. Notice what's not set: IMAGE_SCN_MEM_WRITE (0x80000000). The page on which our entry-point code lives will be readable and executable, but not writable. That's a deliberate security choice: it prevents the program from accidentally or maliciously modifying its own code at runtime.
For comparison, .data in our binary has Characteristics 0xC0000040 = INITIALIZED_DATA | MEM_READ | MEM_WRITE — readable, writable, not executable. .rdata has 0x40000040 = INITIALIZED_DATA | MEM_READ — readable only. And .reloc has 0x42000040 = INITIALIZED_DATA | MEM_DISCARDABLE | MEM_READ — readable, and the unusual MEM_DISCARDABLE flag, which tells the loader that this section can be thrown away after it's been processed, because the relocation entries inside it are only useful during loading.
The Characteristics field is, in effect, a compact description of what permissions the operating system should grant the memory pages that hold this section's content. The loader translates these flags directly into page-protection settings when it maps the section. That's the topic of Part 3 — the protections, the alignment stretch, the layout transformation. For now, just know that every section in every PE carries, encoded in those four bytes, the answer to "is this readable, writable, executable, or something else?"
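Decoding the bitfield is mechanical, as this small Python sketch suggests. It knows only the six flags named in this post (the real field defines more), and the helper names `decode_characteristics` and `page_permissions` are invented for the example.

```python
# Only the flags discussed in this post; the full field defines more bits.
SECTION_FLAGS = {
    0x00000020: "IMAGE_SCN_CNT_CODE",
    0x00000040: "IMAGE_SCN_CNT_INITIALIZED_DATA",
    0x02000000: "IMAGE_SCN_MEM_DISCARDABLE",
    0x20000000: "IMAGE_SCN_MEM_EXECUTE",
    0x40000000: "IMAGE_SCN_MEM_READ",
    0x80000000: "IMAGE_SCN_MEM_WRITE",
}

def decode_characteristics(value: int) -> list:
    """Return the names of the known flags set in value."""
    names = [name for bit, name in SECTION_FLAGS.items() if value & bit]
    known = 0
    for bit in SECTION_FLAGS:
        known |= bit
    leftover = value & ~known
    if leftover:
        names.append(f"unknown bits {leftover:#x}")
    return names

def page_permissions(value: int) -> str:
    """Summarize the three MEM_* permission bits as an rwx-style string."""
    return (("r" if value & 0x40000000 else "-")
            + ("w" if value & 0x80000000 else "-")
            + ("x" if value & 0x20000000 else "-"))

for name, flags in [(".text", 0x60000060), (".data", 0xC0000040),
                    (".rdata", 0x40000040), (".reloc", 0x42000040)]:
    print(name, page_permissions(flags), decode_characteristics(flags))
```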
[Figure: the .text section header is the entry whose only purpose is to relate two coordinate systems. Disk-side fields (PointerToRawData, SizeOfRawData) tell the loader where to read; memory-side fields (VirtualAddress, VirtualSize) tell it where to write. The slight size difference between disk and memory comes from file-alignment padding on disk.]
The VirtualAddress values in the section headers (0x1000 for .text, 0x8000 for .data, 0x9000 for .rdata, and so on) are not magic. They were chosen by the linker at build time using a simple sequential algorithm, the one we sketched in Part 1's discussion of the linker's Phase 2. Now that we have all the surrounding machinery in view, we can describe the algorithm precisely.
The headers occupy RVA 0x0000 through some value just past the end of the section header table. In our binary, the headers and section table together take 792 bytes — DOS Header (64) + DOS Stub (64) + PE signature (4) + COFF File Header (20) + Optional Header (240) + ten section headers at 40 bytes each (400). That ends at file offset 0x318, which becomes RVA 0x318 once the headers are mapped. But the first section can't start at 0x318 in memory — it has to align to a multiple of SectionAlignment (0x1000). So the linker rounds up: the first section gets VirtualAddress = 0x1000.
For each subsequent section, the linker applies the same rule:
next_VA = align_up(current_VA + current_VirtualSize, SectionAlignment)
Let's trace it through our binary. .text has VirtualAddress = 0x1000 and VirtualSize = 0x6B68. Adding those gives 0x7B68, the first byte after .text ends. Rounding up to the next 0x1000 boundary gives 0x8000 — and that's exactly the VirtualAddress of .data. .data has VirtualSize = 0xC0, so it ends at 0x80C0, which rounds up to 0x9000 — the VirtualAddress of .rdata. .rdata's size is 0xDA0, ending at 0x9DA0, rounding up to 0xA000 — the VirtualAddress of .pdata. The pattern continues through all ten sections.
The math on the disk side is the same shape, with FileAlignment (0x200) substituted in. Our .text starts at PointerToRawData = 0x400 and has SizeOfRawData = 0x6C00, ending at file offset 0x7000, where .data begins — and 0x7000 is already a multiple of 0x200, no rounding needed. .data with SizeOfRawData = 0x200 ends at 0x7200, where .rdata begins. And so on, sequentially through the file.
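The same walk can be written as a short Python sketch. It is a simplified model of the rule described above, not the linker's actual code; `align_up` and `layout` are illustrative names, and the sizes are the ones quoted in this section.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def layout(sizes, first, alignment):
    """Place items back to back, starting each on an aligned boundary.

    Returns the start of every item plus the start the next one would get.
    """
    starts = []
    current = first
    for size in sizes:
        starts.append(current)
        current = align_up(current + size, alignment)
    starts.append(current)
    return starts

SECTION_ALIGNMENT = 0x1000
FILE_ALIGNMENT = 0x200

# DOS header + DOS stub + PE signature + COFF header + Optional Header
# + ten 40-byte section headers.
headers_end = 64 + 64 + 4 + 20 + 240 + 10 * 40

# Memory side: VirtualSize of .text, .data, .rdata.
rvas = layout([0x6B68, 0xC0, 0xDA0],
              align_up(headers_end, SECTION_ALIGNMENT), SECTION_ALIGNMENT)
# Disk side: SizeOfRawData of .text and .data.
offsets = layout([0x6C00, 0x200],
                 align_up(headers_end, FILE_ALIGNMENT), FILE_ALIGNMENT)

print([hex(v) for v in rvas])     # ['0x1000', '0x8000', '0x9000', '0xa000']
print([hex(v) for v in offsets])  # ['0x400', '0x7000', '0x7200']
```

The last value in each list is where the following section lands: 0xA000 is the VirtualAddress of .pdata, and 0x7200 is where .rdata begins in the file.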
Two things follow from this. First, the section table is entirely deterministic — given a list of sections and the two alignment values, you can compute every RVA and every file offset by walking down the list once. The linker does this exactly once, at build time, and writes the results into the section headers. The loader doesn't recompute them; it just reads them and obeys.
Second, the difference between the two alignments is what makes the file smaller than the image. FileAlignment = 0x200 packs sections close together on disk; SectionAlignment = 0x1000 spreads them out in memory. Our file is 39,424 bytes (rounded to file-alignment); our image is 69,632 bytes (rounded to section-alignment). The image is 77% larger than the file, even though most of the content — every byte of code, every byte of initialized data — is the same. The difference is mostly alignment gaps between sections, plus memory-only zero-filled regions like .bss (which has 2,944 bytes of VirtualSize and zero on-disk presence).
That stretch is the central topic of Part 3. The PE format is, in this sense, a compact encoding of an image that intentionally has gaps in it once it's expanded.
Now we can finally answer the question this part has been building toward: given an RVA — say, the entry point's 0x1410 — how do you find the byte in the file? And how do you go the other direction, from a file offset to an RVA? This conversion is the single skill PE analysts perform most often, because disassemblers display RVAs and hex editors display file offsets, and you'll constantly find yourself with one when you need the other.
There is no single formula. The file and the image have different alignments, and sections sit at different relative positions in each. The section table is the bridge — every conversion goes through it.
Here's the algorithm for RVA → file offset:
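1. Find the section whose range contains the RVA: VirtualAddress ≤ RVA < VirtualAddress + VirtualSize.
2. Compute the offset within that section: section_offset = RVA - section.VirtualAddress.
3. Add that offset to the section's position in the file: file_offset = section.PointerToRawData + section_offset.
Let's walk through it for our entry point, RVA 0x1410.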
Step 1. Which section contains 0x1410? Checking each section header: .text has VirtualAddress = 0x1000 and VirtualSize = 0x6B68, so it covers RVAs from 0x1000 up to, but not including, 0x7B68. Our target 0x1410 is comfortably inside that range. The entry point lives in .text.
Step 2. The offset within .text: 0x1410 - 0x1000 = 0x410. The entry point is 0x410 bytes from the start of the .text section.
Step 3. The file offset: .text starts in the file at PointerToRawData = 0x400, so the entry point lives at 0x400 + 0x410 = 0x810 in the file.
Let's verify by looking at the actual bytes. If we open hello.exe in a hex editor and jump to file offset 0x810:
00000810: 5548 89e5 4883 ec20 488b 0561 8300 00c7
00000820: 0000 0000 00e8 66fd ffff 9090 4883 c420
00000830: 5dc3
The first eight bytes are 55 48 89 E5 48 83 EC 20. Decoded as x86-64 instructions, that's:
55 push rbp
48 89 E5 mov rbp, rsp
48 83 EC 20 sub rsp, 0x20
That's the standard function prologue we discussed in Part 1's assembly snippet — push the old base pointer, set up a new frame, reserve stack space. The sub rsp, 0x20 reserves exactly 32 bytes, which is the Windows x64 shadow space we walked through in Part 1's calling-convention discussion. The entry point of our program is, byte for byte, the prologue we predicted it would be. The RVA-to-file-offset conversion landed on the right bytes.
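The three steps translate almost directly into code. Below is a minimal Python sketch of the RVA to file offset direction; the `Section` tuple and the function name are invented for the example, and the table holds only the two sections whose disk and memory fields are all quoted in this post. It also returns None when the RVA falls in a part of a section that has no bytes on disk, a case the simple walkthrough above doesn't need.

```python
from typing import NamedTuple, Optional

class Section(NamedTuple):
    name: str
    virtual_address: int      # RVA
    virtual_size: int
    pointer_to_raw_data: int  # file offset
    size_of_raw_data: int

# Values quoted in this post for hello.exe (.text and .data only).
SECTIONS = [
    Section(".text", 0x1000, 0x6B68, 0x400, 0x6C00),
    Section(".data", 0x8000, 0x00C0, 0x7000, 0x200),
]

def rva_to_file_offset(rva: int, sections) -> Optional[int]:
    """Return the file offset for rva, or None if no file byte backs it."""
    for s in sections:
        # Step 1: find the section whose memory range contains the RVA.
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            # Step 2: offset within the section.
            section_offset = rva - s.virtual_address
            if section_offset >= s.size_of_raw_data:
                return None  # memory-only, zero-filled region
            # Step 3: add the section's position in the file.
            return s.pointer_to_raw_data + section_offset
    return None

print(hex(rva_to_file_offset(0x1410, SECTIONS)))  # 0x810
```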
The inverse direction — file offset → RVA — works the same way in reverse:
1. Find the section whose on-disk range contains the offset: PointerToRawData ≤ offset < PointerToRawData + SizeOfRawData.
2. Compute the offset within that section: section_offset = file_offset - section.PointerToRawData.
3. Add that offset to the section's position in memory: RVA = section.VirtualAddress + section_offset.
A subtlety: bytes in the headers and in any file-alignment padding don't belong to any section, and don't have RVAs in the usual sense. (The headers do get mapped into memory at RVA 0, so they have RVAs by extension; padding bytes just don't exist in memory.) For locations inside section bodies the conversion is always well-defined; for locations elsewhere in the file, the question may not have a meaningful answer.
A second subtlety is for anyone writing tooling. The clean algorithm above works for ordinary well-formed binaries where each RVA falls neatly into exactly one section's VirtualAddress..VirtualAddress + VirtualSize range. Real PE parsers have to be more defensive: SizeOfRawData can be larger than VirtualSize (alignment padding on disk), or smaller (the tail of the section has no on-disk bytes and is zero-filled in memory), or zero (BSS-style sections with no on-disk content). Packers and malware deliberately exploit those edge cases (overlapping section ranges, zero-sized sections, sections with VirtualSize that crosses image boundaries) to break naive parsers. If you're building tools, mirror what hardened parsers like pefile do; if you're just reading binaries, the simple algorithm covers the common case.
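For the inverse direction, here is a hedged Python sketch that also handles the two subtleties just described for well-formed files: header bytes map to themselves, and padding has no RVA. It is nowhere near a hardened parser. The `Section` tuple, the function name, and the `HEADERS_END` constant (0x318, where the section header table ends in our binary) are choices made for this example.

```python
from typing import NamedTuple, Optional

class Section(NamedTuple):
    name: str
    virtual_address: int      # RVA
    virtual_size: int
    pointer_to_raw_data: int  # file offset
    size_of_raw_data: int

# Values quoted in this post for hello.exe (.text and .data only).
SECTIONS = [
    Section(".text", 0x1000, 0x6B68, 0x400, 0x6C00),
    Section(".data", 0x8000, 0x00C0, 0x7000, 0x200),
]
HEADERS_END = 0x318  # end of the section header table in our binary

def file_offset_to_rva(offset: int, sections) -> Optional[int]:
    """Return the RVA for a file offset, or None if it has none."""
    if 0 <= offset < HEADERS_END:
        return offset  # headers are mapped at RVA 0
    for s in sections:
        if s.size_of_raw_data == 0:
            continue  # BSS-style section: nothing on disk to match
        end = s.pointer_to_raw_data + s.size_of_raw_data
        if s.pointer_to_raw_data <= offset < end:
            section_offset = offset - s.pointer_to_raw_data
            if section_offset >= s.virtual_size:
                return None  # file-alignment padding past the real content
            return s.virtual_address + section_offset
    return None  # gap between headers and first section, or past the end

print(hex(file_offset_to_rva(0x810, SECTIONS)))  # 0x1410
```

Returning None instead of guessing keeps the caller honest about offsets that have no counterpart in memory, such as the padding between 0x318 and 0x400.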
Every PE-inspection tool implements this algorithm internally. It is, in the end, the entire reason the section header table exists — to let anyone with the headers and a position in either coordinate system compute the matching position in the other.
In practice, nobody decodes a PE file by hand for very long. Once you understand the structure, you switch to tools that parse it for you. The ones an analyst is most likely to use in 2026, roughly in order of how often they come up:
dumpbin ships with Visual Studio and is the canonical Microsoft tool. dumpbin /headers foo.exe dumps every header structure; /imports shows the import table; /exports shows what the binary exports; /all dumps everything. It's normally run from a Visual Studio developer command prompt, which sets up the environment it needs.
objdump (the GNU version) and llvm-objdump / llvm-readobj are the cross-platform equivalents. They work on Linux, macOS, and Windows, and they handle PE files alongside ELF and Mach-O. objdump -p foo.exe dumps PE-specific headers; llvm-readobj --coff-load-config -r foo.exe prints the load configuration directory and relocation records.
PE-bear (by hasherezade) is a free graphical tool that handles malformed PE files — important when analyzing malware, which often deliberately stretches the format to confuse parsers. It's particularly good at side-by-side hex / structure views.
CFF Explorer (by Erik Pistelli) is a free PE editor that supports both PE32/PE32+ and .NET binaries. It can read, edit, and rebuild structures — useful for both analysis and patching.
PEStudio performs static malware-triage analysis: it parses the structures and flags suspicious indicators (uncommon imports, packed sections, suspicious entropy, etc.). Free for non-commercial use.
pefile is a Python library by Ero Carrera, widely used for scripting analysis at scale. If you ever need to process a thousand PE files programmatically, this is the tool.
For ad-hoc work on small files, plain xxd or any hex editor (HxD, 010 Editor, ImHex) is enough — we've been doing exactly that throughout this post.
Every number quoted in this post comes from a real binary you can build and inspect in two minutes. On a Linux machine or in WSL, with the MinGW cross-compiler installed (apt install gcc-mingw-w64-x86-64 on Debian/Ubuntu):
# Write the source file
cat > hello.c << 'EOF'
#include <stdio.h>
int main(void) { printf("hello\n"); return 0; }
EOF
# Compile to Windows PE, optimized and stripped
x86_64-w64-mingw32-gcc -O2 -s -o hello.exe hello.c
# Inspect the headers
x86_64-w64-mingw32-objdump -p hello.exe
x86_64-w64-mingw32-objdump -h hello.exe
# Read bytes by file offset
xxd -s 0x00 -l 128 hello.exe # DOS header + stub
xxd -s 0x80 -l 24 hello.exe # PE signature + COFF File Header
xxd -s 0x98 -l 240 hello.exe # Optional Header
xxd -s 0x188 -l 400 hello.exe # Section header table
xxd -s 0x810 -l 16 hello.exe # Entry point bytes
The exact byte values may differ slightly from the ones in this post — TimeDateStamp will reflect your build time, and tiny binary differences are expected across MinGW versions — but the structure and the offsets will match. The values are stable enough that the worked example in the previous section (entry point RVA 0x1410 → file offset 0x810, prologue bytes 55 48 89 E5 48 83 EC 20) reproduces reliably.
On Windows itself, the equivalent inspection commands are dumpbin /headers hello.exe from a Visual Studio developer command prompt, or any of the GUI tools listed above. The output formatting differs; the bytes don't.
A few realities you'll meet that we haven't covered in detail: many production binaries are signed (the Certificate Table at data directory index 4 is populated; verification happens against the on-disk file), and managed .NET binaries are common enough that you'll encounter them quickly in PE analysis (the COM Descriptor at index 14 is populated; the actual code is in CIL bytecode within a section, with a tiny native stub to bootstrap the CLR). Both cases preserve all the structures we've covered; they just have additional structures sitting alongside. The DOS header, PE signature, COFF File Header, Optional Header, data directories, and section headers are present in every PE file Windows can load, signed or not, managed or not.
You can now read a PE file. Open any Windows binary in a hex editor, and you can walk down its bytes naming what each one is: MZ at byte zero, the DOS stub, the PE signature, the COFF File Header with its machine type and section count, the Optional Header with its image base and alignments and data directories, the section header table with one entry per section, padding to file alignment, then the section bodies in linker-chosen order. You can convert between the three coordinate systems — file offset, RVA, virtual address — when you need to find a specific byte. You know which fields the loader consumes before mapping and which it consumes after.
What you don't yet know is what actually happens when the file becomes a process. The structures we've described are static. The loader does something specific with them — a series of steps that turns a 39 KB on-disk file into a 68 KB region of mapped memory with the right page permissions, then patches and connects and finally hands control to the entry point. The transformation is more interesting than it sounds, because the difference between the file and the running image is not just size: it's alignment, it's permissions, it's a relationship between bytes that don't move and pointers that have to be fixed up.
That's Part 3: the file-to-memory stretch.