Windows Internals · Part 2 of 4

Anatomy of a PE File

Headers, sections, data directories, and the three different kinds of "offset" the format insists on using interchangeably. We open a real binary in a hex editor and read it byte by byte.

In Part 1 we traced a program from source code to bytes — through the preprocessor, the compiler, the assembler, and finally the linker, which produced an executable file. That file contained sections, a symbol table that was thrown away, relocations that had been patched, imports recorded as future dependencies, and an image-base value the loader could ignore at runtime. Everything we built up to was a sketch of what comes out the other end of the toolchain.

This part fills in the sketch. We're going to open an actual .exe in a hex editor and walk through it byte by byte, structure by structure, until you can point at any region of a real PE file and say what it is. The Portable Executable format is not complicated — it has about a dozen relevant structures and a hundred or so fields total — but it does have an unusual feature that trips up almost everyone: it uses three different coordinate systems to describe where things are, and it switches between them mid-structure. Getting comfortable with those three coordinate systems is the only conceptual leap in this part. The rest is just reading.

The binary we're going to read is the same hello.c program from Part 1, compiled with MinGW's x86_64-w64-mingw32-gcc and stripped of debug information. That gives us a tidy 39 KB Windows executable with ten sections — small enough to fit on screen, large enough to contain everything we want to see. Every hex value and field shown in this post comes from that real binary; you can reproduce it yourself with the commands in the callout at the end.

Three kinds of "where"

Before we look at any structure in the file, we have to nail down the source of confusion that ambushes almost everyone learning PE internals. There are three different ways to express the location of something in a PE — three coordinate systems — and the format uses all three, sometimes within a single structure. They aren't interchangeable; converting between them requires information that isn't always at hand. If you don't keep them straight, you will read a hex dump and end up at the wrong byte.

The three coordinates are file offset, RVA, and virtual address. Let's define them concretely.

A file offset is a byte position from the start of the PE file on disk. "Go to byte 0x3C" means open the file in a hex editor, scroll to position 0x3C, and start reading there. File offsets are absolute within the file: offset 0 is the first byte, offset 0x80 is the 129th byte (128 bytes in), and so on. File offsets work whether the program is loaded into memory or not, because they're just positions within an on-disk file. They are the coordinate system of disk tools: hex editors, file readers, the linker writing the output.

An RVA — Relative Virtual Address — is an offset from the start of the PE image once it has been loaded into memory. "RVA 0x1410" means "the byte that ends up 0x1410 bytes from wherever Windows decided to place the image when it loaded it." RVAs are how the PE format expresses locations inside the running image without committing to where the image will actually live in memory. The same RVA is valid every time the program runs, regardless of where the loader places it that day.

A virtual address (VA) is the actual memory address inside the running process — a real number the CPU can dereference. You compute it from an RVA the moment you know where the image was loaded: VA = ImageBase + RVA. If the image happened to load at 0x140000000 (the standard preferred base for 64-bit executables) and the RVA is 0x1410, the VA is 0x140001410. The CPU sees and uses VAs; the file format mostly hides them, because the file is written long before anyone knows what they'll be.

Why does the format need all three? Because of timing. The loader's life is divided into three phases — before it has mapped the file, during the mapping, and after — and each phase has access to different information.

Before mapping, the loader is just reading bytes off disk. It hasn't allocated any memory for the image yet, doesn't know where the image will land, hasn't even decided how big the allocation needs to be. The only coordinate system it can use is file offsets. RVAs would be meaningless: there is no image-in-memory for them to be relative to.

During mapping, the loader is reading sections from the file and writing them into newly allocated memory. This is the moment where both coordinate systems matter at once — it needs to know which bytes to read from the file (file offset) and where to put them in the new image (RVA). The section headers, which the loader reads at this phase, contain exactly these two fields side by side. We'll see them shortly.

After mapping, the image exists in memory. Now everything is described in RVAs, because the entire layout has been built and locations relative to ImageBase are well-defined. The entry point, the import table, the export table, the relocation table — all of these are expressed as RVAs in the file's headers, because they describe locations within the mapped image. The loader reads these RVAs out of the file early (before mapping), but it doesn't use them — doesn't dereference them, doesn't follow them — until after the image is in memory.

Some PE structures look like they break this rule. The entry point's RVA, for instance, is stored in the Optional Header, which the loader reads before mapping. How can a pre-mapping structure contain a post-mapping coordinate? The trick is that the loader doesn't use the RVA at the moment it reads it. It stores the value as a number, finishes mapping, and only then computes ImageBase + RVA to find the actual entry point. The same trick applies to every data directory: read early, dereferenced late.

The rule is simple: any pointer the loader must follow before mapping has to be a file offset; any pointer that describes a location in the mapped image is an RVA. The format follows this rule consistently, with exactly one exception we'll meet when we get to data directories (the Security Directory, which points at digital signature data that doesn't get mapped into memory at all). Other than that, the rule is reliable.

Figure: Three coordinate systems, one byte. The first byte of the entry point function, 0x55 (push rbp), appears on disk at file offset 0x810, in the mapped image at RVA 0x1410, and in the process at virtual address 0x140001410. File offset converts to RVA via the section table; RVA converts to virtual address by adding the image base.
Each panel describes the same physical byte — the first byte of the entry function, 0x55 (push rbp). The on-disk panel uses dashed borders to mark "file" coordinates; the in-memory panels use solid borders. The conversions between the systems require external information: the section table for file offset ↔ RVA, and the image base for RVA ↔ virtual address.
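
The two conversions in the figure can be sketched in a few lines of Python. This is a minimal illustration rather than a full parser: the `Section` tuple and the function names are invented here, and the only section listed is .text, using the values this post reports for hello.exe (RVA 0x1000, file offset 0x400, 0x6C00 bytes on disk).

```python
from typing import NamedTuple, Optional

class Section(NamedTuple):
    name: str
    virtual_address: int      # RVA where the section starts in the mapped image
    pointer_to_raw_data: int  # file offset where the section's bytes start
    size_of_raw_data: int     # number of bytes the section occupies on disk

# Only .text is listed; a real section table would hold all ten entries.
SECTIONS = [Section(".text", 0x1000, 0x400, 0x6C00)]

def file_offset_to_rva(offset: int, sections=SECTIONS) -> Optional[int]:
    """Map a position in the file to its position in the mapped image."""
    for s in sections:
        if s.pointer_to_raw_data <= offset < s.pointer_to_raw_data + s.size_of_raw_data:
            return offset - s.pointer_to_raw_data + s.virtual_address
    return None  # not inside any listed section body

def rva_to_file_offset(rva: int, sections=SECTIONS) -> Optional[int]:
    """Map a position in the mapped image back to the file, if it has file bytes."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.size_of_raw_data:
            return rva - s.virtual_address + s.pointer_to_raw_data
    return None  # e.g. an RVA inside .bss, which has no on-disk content

def rva_to_va(rva: int, image_base: int) -> int:
    """The only conversion that needs no table: add the load address."""
    return image_base + rva

entry_rva = file_offset_to_rva(0x810)            # 0x1410
entry_va = rva_to_va(entry_rva, 0x140000000)     # 0x140001410
```

The asymmetry is the point: the first two functions need the section table, while the third needs only one number that is unknown until load time.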

A real PE, top to bottom

Before we read the bytes, here's the lay of the land. A PE file on disk is laid out in a fixed sequence: a short MS-DOS-era header at the very start, then a small DOS program (the "DOS stub"), then a four-byte signature, then a COFF File Header, then an "Optional" Header, then a table of section headers, then the sections themselves — in the order the linker arranged them. The boundaries between these regions are not negotiable. Every field that follows tells the loader, in effect, "the next thing is exactly this many bytes ahead."

The binary we'll be reading throughout this post is the stripped hello.exe we built with MinGW. It's 39,424 bytes total and contains ten sections: .text, .data, .rdata, .pdata, .xdata, .bss, .idata, .CRT, .tls, and .reloc. You met four of these in Part 1 — .text (code), .rdata (read-only data), .data (writable initialized data), and .reloc (relocation fixups) — and the other six are runtime-support sections we'll touch on as they become relevant. The point of looking at a real binary, rather than an idealized two-section diagram, is that it shows you the actual texture of a Windows executable: most PEs have eight to fifteen sections, not three.

Here's the structural map. Each region is a contiguous run of bytes; the file is read sequentially from top to bottom.

Figure: the complete layout of hello.exe on disk (39,424 bytes, ten sections), top to bottom.

FILE OFFSET  REGION                     WHAT IT IS
0x000        DOS Header                 64 bytes · only e_magic and e_lfanew matter
0x040        DOS Stub                   x86-16 code: "This program cannot be run…"
0x080        PE Signature ("PE\0\0")    4 bytes · where the modern format begins
0x084        COFF File Header           20 bytes · machine type, section count
0x098        Optional Header (PE32+)    240 bytes · image base, alignments, entry RVA, 16 data directories
0x188        Section Header Table       10 × 40 bytes · layout instructions for each section
0x318        padding to 0x400           file alignment · end of headers
0x400        .text                      machine code · 27,648 bytes
0x7000       .data                      initialized writable globals · 512 bytes
0x7200       .rdata                     read-only constants & strings · 3,584 bytes
0x8000       .pdata                     exception-handling tables · 1,536 bytes
0x8600       .xdata                     unwind metadata for .pdata · 1,536 bytes
0x8C00       .idata                     imports: KERNEL32.dll, msvcrt.dll · 2,048 bytes
0x9400       .CRT                       C runtime initializer table · 512 bytes
0x9600       .tls                       thread-local storage template · 512 bytes
0x9800       .reloc                     base relocations if image moves · 512 bytes
0x9A00       end of file
(none)       .bss                       exists only in memory · no on-disk presence, allocated zero-filled at load
The complete on-disk layout of our 39 KB hello.exe. The first 0x400 bytes are all headers and section-header table; everything from 0x400 onwards is the section bodies in linker-chosen order. The .bss section contains uninitialized zero-filled data and has no on-disk content — the loader allocates space for it at runtime. File offsets shown on the left are real values from the actual compiled binary.
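
Every offset in that map follows from the sizes alone, which makes a handy sanity check. The sketch below, with a hypothetical `lay_out` helper, accumulates the region sizes quoted above. It assumes no padding between section bodies, which holds for this file only because every section size is already a multiple of 0x200.

```python
# Region sizes in bytes, taken from the layout map above.
HEADERS = [
    ("DOS Header", 64),
    ("DOS Stub", 64),
    ("PE Signature", 4),
    ("COFF File Header", 20),
    ("Optional Header", 240),
    ("Section Header Table", 10 * 40),
]
SECTION_BODIES = [
    (".text", 27648), (".data", 512), (".rdata", 3584),
    (".pdata", 1536), (".xdata", 1536), (".idata", 2048),
    (".CRT", 512), (".tls", 512), (".reloc", 512),
]

def lay_out(regions, start):
    """Return ({name: starting offset}, offset just past the last region)."""
    offsets = {}
    position = start
    for name, size in regions:
        offsets[name] = position
        position += size
    return offsets, position

header_offsets, headers_end = lay_out(HEADERS, 0)
section_offsets, file_end = lay_out(SECTION_BODIES, 0x400)

for name, offset in {**header_offsets, **section_offsets}.items():
    print(f"{offset:#07x}  {name}")
print(f"{file_end:#07x}  end of file ({file_end} bytes)")
```

The headers end at 0x318, short of 0x400, which is why the map shows a run of padding before .text.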

The DOS header and stub: vestigial, but mandatory

Open the file at byte zero. The first 64 bytes are the DOS Header, an MS-DOS-era structure that has been preserved at the front of every Windows executable for over thirty years for one reason: backward compatibility with a 1981 operating system that almost nobody actually runs anymore.

Here are the first 64 bytes of our hello.exe, exactly as xxd displays them:

00000000: 4d5a 9000 0300 0000 0400 0000 ffff 0000  MZ..............
00000010: b800 0000 0000 0000 4000 0000 0000 0000  ........@.......
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 8000 0000  ................

The DOS Header is defined as a structure called IMAGE_DOS_HEADER in Windows headers, with nineteen named members (thirty-one slots if you count each element of its two reserved arrays separately) that record things like the size of the original MS-DOS program in 512-byte pages, the initial values for the segment registers, the relocation table offset, and so on. Almost all of them are zero in any modern PE, and the loader ignores them. Only two fields still matter in 2026, and they're the two we'll focus on.

The first is e_magic, the very first two bytes of the file. Its value is fixed at 0x5A4D, which is "MZ" in ASCII — the initials of Mark Zbikowski, an MS-DOS developer who designed the original executable format. You can see it plainly at the top-left of the hex dump: 4D 5A followed by the rest of the structure. The Windows loader looks at these two bytes and refuses to load anything that doesn't start with them. Every .exe, every .dll, every .sys driver on Windows starts with MZ.

The second field that matters is e_lfanew, a 4-byte field at offset 0x3C. You can read it directly from the dump: at offset 0x3C the bytes are 80 00 00 00, which as a little-endian 32-bit integer is 0x00000080. This is a file offset — the byte position where the modern PE structure begins. The loader's logic for finding the PE header is, almost literally, "read the 4 bytes at offset 0x3C, jump there, and start reading the PE signature." That tiny pointer is the bridge from the DOS-era format to the modern Windows format.

Why is e_lfanew at exactly 0x3C? Because back when this format was designed, that location was reserved space in the original MS-DOS executable header — a place where four bytes could be added without breaking compatibility with existing DOS tools. The PE format hijacked that slot to store a forwarding pointer.

Why is it a file offset rather than an RVA? Because of the timing rule from the previous section. The loader reads e_lfanew at the very start of the loading process, when it has just opened the file and hasn't mapped anything into memory yet. There is no image-in-memory for an RVA to be relative to. Worse: the loader doesn't yet know the image base (which is stored inside the Optional Header, which is what e_lfanew is helping us find). The logic would be circular — to follow an RVA, the loader would need information it can only obtain by following e_lfanew. A file offset breaks the circularity.
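
The loader's first two reads can be sketched directly against the 64 bytes from the dump above. This is an illustrative sketch, not how the Windows loader is actually implemented: `find_pe_header` is a name made up for this example, and it reads e_lfanew as unsigned for simplicity even though winnt.h declares it as a signed LONG.

```python
import struct

# The 64-byte DOS Header of hello.exe, copied from the hex dump.
DOS_HEADER = bytes.fromhex(
    "4d5a 9000 0300 0000 0400 0000 ffff 0000"
    "b800 0000 0000 0000 4000 0000 0000 0000"
    "0000 0000 0000 0000 0000 0000 0000 0000"
    "0000 0000 0000 0000 0000 0000 8000 0000"
)

def find_pe_header(data: bytes) -> int:
    """Return the file offset of the PE signature, using only file offsets."""
    if len(data) < 0x40:
        raise ValueError("too short to contain a DOS header")
    (e_magic,) = struct.unpack_from("<H", data, 0x00)
    if e_magic != 0x5A4D:  # bytes 4D 5A on disk, "MZ"
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return e_lfanew

pe_offset = find_pe_header(DOS_HEADER)
print(hex(pe_offset))
```

Nothing in the function needs an image base or a section table, which is exactly why this pointer can only be a file offset.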

Between the DOS Header and the PE structure that e_lfanew points at, there's a small region (64 bytes in our binary, from offset 0x40 up to 0x80) called the DOS Stub. It's not a "header" in any sense. It's an actual MS-DOS program. Here are its bytes, starting at offset 0x40:

00000040: 0e1f ba0e 00b4 09cd 21b8 014c cd21 5468  ........!..L.!Th
00000050: 6973 2070 726f 6772 616d 2063 616e 6e6f  is program canno
00000060: 7420 6265 2072 756e 2069 6e20 444f 5320  t be run in DOS
00000070: 6d6f 6465 2e0d 0d0a 2400 0000 0000 0000  mode....$.......

The first fourteen bytes are real x86-16 machine code. Disassembled, they read: push the code segment, pop it into the data segment, load the address of an offset-14 string into DX, call MS-DOS print-string service (interrupt 21h, function 9), then call the exit service (interrupt 21h, function 4Ch). The remaining bytes are the ASCII string the program prints, terminated with $ — MS-DOS string convention. If you took just this part of the file and ran it on real DOS, it would print "This program cannot be run in DOS mode." and exit cleanly. That's the whole point of the stub: a courtesy message to anyone who tries to run a Windows executable on MS-DOS.
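
You can recover the message the same way DOS would, without a disassembler. The sketch below is tied to this particular stub: it assumes the third byte is the `mov dx, imm16` opcode (0xBA) and that the immediate is an offset from the start of the stub, as described above. The name `stub_message` is invented for the example.

```python
# The 64-byte DOS Stub of hello.exe, copied from the hex dump at 0x40.
DOS_STUB = bytes.fromhex(
    "0e1f ba0e 00b4 09cd 21b8 014c cd21 5468"
    "6973 2070 726f 6772 616d 2063 616e 6e6f"
    "7420 6265 2072 756e 2069 6e20 444f 5320"
    "6d6f 6465 2e0d 0d0a 2400 0000 0000 0000"
)

def stub_message(stub: bytes) -> str:
    """Extract the '$'-terminated string that the stub passes to int 21h, function 9."""
    # Layout assumed: push cs (0E), pop ds (1F), mov dx, imm16 (BA lo hi), ...
    if stub[2] != 0xBA:
        raise ValueError("unexpected stub layout")
    start = int.from_bytes(stub[3:5], "little")  # 0x000E = 14 in this stub
    end = stub.index(b"$", start)                # DOS strings end at '$'
    return stub[start:end].decode("ascii")

print(repr(stub_message(DOS_STUB)))
```

The carriage returns and line feed before the `$` are part of the string, so the decoded text ends with `\r\r\n`.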

The Windows loader does not read or execute the DOS Stub. It jumps over it entirely using e_lfanew. The stub exists purely as a vestigial limb — useful in 1993, harmless today.

Modern toolchains sometimes hide useful information in the DOS Stub region, in the area between the end of the stub code and the start of the PE structure. The Microsoft linker writes a "Rich header" there: a small undocumented blob containing version IDs of the Microsoft toolchain components used to build the binary (cl.exe, link.exe, ml.exe, etc.). The Rich header isn't part of the official PE specification, and binaries from non-Microsoft toolchains like MinGW (including the one we're looking at) don't have one. But for MSVC-built binaries, which is most native Windows software you'll encounter, malware analysts read it routinely because it can fingerprint the exact build environment. We won't go further into it here, but it's worth knowing that the DOS-stub region isn't quite as empty as the official spec implies.

The PE signature and the COFF File Header

Following e_lfanew takes us to file offset 0x80. Here begins the modern part of the format. The first thing we encounter is a four-byte signature, and the structure of what comes next will be familiar from Part 1.

Recall from Part 1 that the linker's output is a COFF-style file. PE is, in Microsoft's own framing, "COFF plus extra headers bolted onto the front so the operating system can load it." We've just walked past those extra headers — the DOS bits and the four-byte signature — and we're about to land on the COFF File Header itself. Once we get into the section table, the structures are the same ones we discussed at the byte level for object files in Part 1.

Here are the next 24 bytes, starting from 0x80:

00000080: 5045 0000 6486 0a00 5fe1 156a 0000 0000  PE..d..._..j....
00000090: 0000 0000 f000 2e02                      ........

The first four bytes — 50 45 00 00 — spell "PE\0\0" in ASCII. This is the PE signature: the moment in the file where the loader has officially crossed the boundary from "this might just be an MS-DOS executable" to "this is a Portable Executable." If these four bytes aren't here exactly as expected, the loader rejects the file. That's the entire purpose of the signature: a sanity check at a known location.

The remaining 20 bytes are the COFF File Header, a structure called IMAGE_FILE_HEADER. It has exactly seven fields, all of them small, all of them read by the loader before mapping (though as we'll see, not all of them are still loader-relevant in 2026). Here's what each of those bytes encodes for our binary.

Machine (offset 0x84, 2 bytes). The bytes 64 86 read as the little-endian value 0x8664, which is IMAGE_FILE_MACHINE_AMD64 — x86-64. The loader uses this to refuse executables compiled for the wrong CPU; an ARM64 Windows machine running our x86-64 binary would either reject it or hand it to a binary translator. Other common values are 0x014C for 32-bit x86 and 0xAA64 for native ARM64.

NumberOfSections (offset 0x86, 2 bytes). 0A 00 reads as 0x000A = 10. There are ten section header entries following the Optional Header. The loader needs this count to know how many 40-byte section headers to read.

TimeDateStamp (offset 0x88, 4 bytes). 5F E1 15 6A reads as 0x6A15E15F = 1,779,818,847 seconds since the Unix epoch, which is Tuesday, May 26, 2026 at 18:07:27 UTC — the moment MinGW finished linking our binary. This is the linker's build timestamp; it can be useful for analysts trying to correlate binaries to a build environment, but it is also frequently spoofed or zeroed out by tools, so it isn't trustworthy on its own.
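
Decoding the timestamp by hand is error-prone, so here is the same conversion as a short Python sketch using only the four bytes quoted above.

```python
from datetime import datetime, timezone

raw = bytes.fromhex("5fe1156a")          # TimeDateStamp bytes as stored on disk
stamp = int.from_bytes(raw, "little")    # 0x6A15E15F
link_time = datetime.fromtimestamp(stamp, tz=timezone.utc)

print(hex(stamp), stamp)
print(link_time.isoformat())
```

Passing `tz=timezone.utc` matters: without it, `fromtimestamp` converts to the local time zone of whatever machine runs the script.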

PointerToSymbolTable (offset 0x8C, 4 bytes) and NumberOfSymbols (offset 0x90, 4 bytes). Both are 00 00 00 00. These fields are leftovers from the COFF object-file world we discussed in Part 1 — they pointed at the symbol table that traveled with the object file. The linker stripped the symbol table when it built the executable (it had served its purpose during linking), so both fields are zero. They are almost always zero in modern PE files; symbol information for debugging lives elsewhere now.

SizeOfOptionalHeader (offset 0x94, 2 bytes). F0 00 reads as 0x00F0 = 240 bytes. This tells the loader how many bytes the Optional Header occupies — important because the Optional Header's actual size depends on whether it's the PE32 or PE32+ variant, and the loader needs the exact count to know where the section header table starts.

Characteristics (offset 0x96, 2 bytes). 2E 02 reads as 0x022E, which is a bitfield. The bits set in this value are IMAGE_FILE_EXECUTABLE_IMAGE (0x0002, "this file is valid for execution"), IMAGE_FILE_LINE_NUMS_STRIPPED (0x0004), IMAGE_FILE_LOCAL_SYMS_STRIPPED (0x0008), IMAGE_FILE_LARGE_ADDRESS_AWARE (0x0020, "this binary can handle addresses above 2 GB"), and IMAGE_FILE_DEBUG_STRIPPED (0x0200, "debug information has been removed from this image"). Together they add up to 0x022E. The most useful bit to recognize in general is IMAGE_FILE_DLL (0x2000), which is clear in our binary: when it's set, the file is a DLL rather than an EXE.
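
Reading a bitfield is just a series of AND tests. The sketch below decodes 0x022E against a partial table of the IMAGE_FILE_* constants from winnt.h; the table is deliberately incomplete, and `decode_flags` is a helper name made up for this example.

```python
# A subset of the IMAGE_FILE_* characteristics bits (not the full list).
FILE_CHARACTERISTICS = {
    0x0001: "IMAGE_FILE_RELOCS_STRIPPED",
    0x0002: "IMAGE_FILE_EXECUTABLE_IMAGE",
    0x0004: "IMAGE_FILE_LINE_NUMS_STRIPPED",
    0x0008: "IMAGE_FILE_LOCAL_SYMS_STRIPPED",
    0x0020: "IMAGE_FILE_LARGE_ADDRESS_AWARE",
    0x0100: "IMAGE_FILE_32BIT_MACHINE",
    0x0200: "IMAGE_FILE_DEBUG_STRIPPED",
    0x1000: "IMAGE_FILE_SYSTEM",
    0x2000: "IMAGE_FILE_DLL",
}

def decode_flags(value: int, table: dict) -> list:
    """Return the names of every known bit set in value, lowest bit first."""
    return [name for bit, name in sorted(table.items()) if value & bit]

characteristics = int.from_bytes(bytes.fromhex("2e02"), "little")  # 0x022E
for name in decode_flags(characteristics, FILE_CHARACTERISTICS):
    print(name)
is_dll = bool(characteristics & 0x2000)
```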

That's the entire COFF File Header. Seven fields, 20 bytes, read by the loader before mapping. A few of them — Machine, NumberOfSections, SizeOfOptionalHeader, and parts of Characteristics — are directly loader-relevant. Others (TimeDateStamp, the two zeroed symbol-table fields, the stripping flags) are linker output or legacy debug metadata that the loader doesn't really care about. Now we step into the part of the format that's specific to executables.

Figure: the COFF File Header, field by field (seven fields, 20 bytes, read before mapping).

OFFSET  SIZE  FIELD                 BYTES        INTERPRETED
0x84    2     Machine               64 86        0x8664 · x86-64 (AMD64)
0x86    2     NumberOfSections      0A 00        10 · ten sections follow the headers
0x88    4     TimeDateStamp         5F E1 15 6A  0x6A15E15F · May 26 2026 18:07 UTC (link time)
0x8C    4     PointerToSymbolTable  00 00 00 00  0 · stripped, legacy COFF field
0x90    4     NumberOfSymbols       00 00 00 00  0 · stripped, legacy COFF field
0x94    2     SizeOfOptionalHeader  F0 00        0xF0 = 240 · size of the Optional Header in bytes
0x96    2     Characteristics       2E 02        0x022E · EXECUTABLE | LARGE_ADDRESS_AWARE | …

PointerToSymbolTable and NumberOfSymbols are legacy fields that survive only for backward compatibility; modern linkers leave them at zero.
The COFF File Header for our hello.exe, every field annotated with the actual bytes from offset 0x84–0x97. PointerToSymbolTable and NumberOfSymbols are pre-zeroed legacy fields; the rest carry real loader-relevant data. Bytes shown in file-order (little-endian as stored on disk).
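
The whole header unpacks in one call once the field widths are written down. The following sketch checks the signature, unpacks the seven fields with Python's `struct` module, and then computes where the section header table must start. The function name `parse_coff_header` is invented for this example, and the input is just the 24 bytes from the dump above rather than the full file.

```python
import struct

# The 24 bytes at file offset 0x80: PE signature followed by IMAGE_FILE_HEADER.
PE_AND_COFF = bytes.fromhex(
    "5045 0000 6486 0a00 5fe1 156a 0000 0000"
    "0000 0000 f000 2e02"
)

COFF_FORMAT = "<HHIIIHH"  # little-endian, no padding: 2+2+4+4+4+2+2 = 20 bytes
COFF_FIELDS = (
    "Machine", "NumberOfSections", "TimeDateStamp", "PointerToSymbolTable",
    "NumberOfSymbols", "SizeOfOptionalHeader", "Characteristics",
)

def parse_coff_header(data: bytes, pe_offset: int = 0) -> dict:
    """Validate the PE signature at pe_offset and unpack the header after it."""
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    values = struct.unpack_from(COFF_FORMAT, data, pe_offset + 4)
    return dict(zip(COFF_FIELDS, values))

header = parse_coff_header(PE_AND_COFF)

# In the real file the signature sits at e_lfanew = 0x80, so the section
# table starts after the signature, the COFF header, and the Optional Header.
E_LFANEW = 0x80
section_table_offset = (
    E_LFANEW + 4 + struct.calcsize(COFF_FORMAT) + header["SizeOfOptionalHeader"]
)
print(hex(section_table_offset))
```

This is why SizeOfOptionalHeader lives in the COFF header: it is the one number needed to step over a structure whose size varies.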

The Optional Header — not optional at all

Immediately after the 20-byte COFF File Header comes the structure called the Optional Header. Despite the name, this header is not optional for executables — it's required for every .exe and .dll Windows knows how to load. The "optional" part of the name is a holdover from the COFF specification, which defined this structure as optional for object files (the .obj files from Part 1). Object files don't need an Optional Header because they don't need to be loaded; they just need to be linked. Executables, by contrast, need every byte of it.

The Optional Header is where the loader learns almost everything it needs to construct the in-memory image. Where the COFF File Header says "this is a binary, here's the CPU, here are the section count and characteristics flags," the Optional Header says "here is where to load me, here is how to align my sections, here is where my code starts, and here are sixteen pointers to the data structures inside me that you'll need to set up the process." It is by far the most information-dense structure in a PE file.

The structure is 240 bytes for a 64-bit (PE32+) executable like ours and 224 bytes for a 32-bit (PE32) executable. We're going to walk through the most important fields field-by-field. There are about thirty in total; we'll look at ten in detail, then list the rest in a reference table at the end of the section.

The first field of the Optional Header is the two-byte value that distinguishes 32-bit from 64-bit PEs.

Magic (offset 0x98 in the file, the very first field of the Optional Header). For our binary, the bytes are 0B 02, which reads as 0x020B — the magic number for PE32+ (64-bit). The other value you'll see is 0x010B, the PE32 (32-bit) magic. There's a third value, 0x0107, for ROM images that you'll basically never encounter outside firmware work. Tools sometimes refer to PE32+ as "PE64" — same thing.

The Optional Header's structure is slightly different between PE32 and PE32+. Specifically, a few address-type fields are 4 bytes in PE32 and 8 bytes in PE32+ — ImageBase, the four stack/heap reserve and commit values — and PE32 has one extra 4-byte field (BaseOfData) that PE32+ does without. That's why the overall size differs by 16 bytes. The Magic field tells the loader which variant to expect so it can parse the rest correctly.
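
The 16-byte difference is easy to verify as arithmetic. The sketch below lists only the fields whose width changes between the two variants, then shows how a parser might branch on Magic. The names `WIDTH_CHANGES` and `variant_of` are made up for the illustration.

```python
# (field, width in PE32, width in PE32+), listing only the fields that differ.
WIDTH_CHANGES = [
    ("BaseOfData",         4, 0),  # present only in PE32
    ("ImageBase",          4, 8),
    ("SizeOfStackReserve", 4, 8),
    ("SizeOfStackCommit",  4, 8),
    ("SizeOfHeapReserve",  4, 8),
    ("SizeOfHeapCommit",   4, 8),
]

PE32_SIZE = 224
growth = sum(plus - pe32 for _, pe32, plus in WIDTH_CHANGES)  # five fields gain 4, one loses 4
pe32_plus_size = PE32_SIZE + growth

OPTIONAL_HEADER_MAGIC = {0x010B: "PE32", 0x020B: "PE32+", 0x0107: "ROM"}

def variant_of(magic_bytes: bytes) -> str:
    """Name the Optional Header variant from its first two bytes."""
    magic = int.from_bytes(magic_bytes[:2], "little")
    return OPTIONAL_HEADER_MAGIC.get(magic, "unknown")

print(growth, pe32_plus_size, variant_of(bytes.fromhex("0b02")))
```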

AddressOfEntryPoint (offset 0x10 within the Optional Header, so file offset 0xA8, 4 bytes). Our bytes are 10 14 00 00 = 0x00001410. This is an RVA — the offset within the mapped image where execution begins. The first instruction the CPU will execute, once the image is fully loaded and ready to run, lives at ImageBase + 0x1410. The linker chose this RVA at build time by placing the entry function (typically the C runtime's mainCRTStartup, which calls main) at that offset within the .text section.

The entry point's value is one of the clearest examples of the read-early-use-late pattern we discussed. The loader reads this 4-byte RVA out of the Optional Header while it's still parsing on-disk headers. It stores the number. Only much later — after mapping every section, applying relocations, resolving every import — does the loader actually compute ImageBase + 0x1410 and jump there. The RVA in the Optional Header is a coordinate that won't be used for a while yet.

ImageBase (offset 0x18 within the Optional Header, file offset 0xB0, 8 bytes for PE32+). Our bytes are 00 00 00 40 01 00 00 00, which reads as 0x0000000140000000. This is the preferred virtual address where the linker would like the loader to map the image. For ordinary 64-bit Windows EXEs, including those built by the toolchains you'll meet first, the conventional preferred base is 0x0000000140000000. DLLs typically use a different convention, commonly 0x0000000180000000 (though this is a linker default, not a format requirement). Both fit comfortably in the 48 bits that x86-64 currently uses for virtual addresses.

The preferred word is doing real work in that sentence. The loader is not required to honor it. On modern Windows, if the binary opts in to ASLR (via the IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE flag we'll see below) and ships with the relocation information needed to be moved safely, the loader may choose a randomized base address instead of the preferred ImageBase. When that happens, every absolute address inside the binary that depended on ImageBase being 0x140000000 is now wrong — and the .reloc section we saw in the file layout exists specifically to tell the loader what to patch. We'll dig into how that works in Part 4. For now, the field is best understood as a hint: "if you can load me here, do; if not, fix me up."
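
To make "fix me up" concrete, here is a small sketch of the arithmetic involved. It decodes the two fields from the bytes quoted above and then shifts one absolute address by the difference between the preferred and the actual base. The actual base used below is a made-up value chosen only for illustration, and `rebase` is a hypothetical helper; the real relocation machinery is more involved than this.

```python
import struct

(preferred_base,) = struct.unpack("<Q", bytes.fromhex("00 00 00 40 01 00 00 00"))
(entry_rva,) = struct.unpack("<I", bytes.fromhex("10 14 00 00"))

def rebase(address: int, preferred: int, actual: int) -> int:
    """Shift an absolute address that was computed against the preferred base."""
    return address + (actual - preferred)

# Hypothetical base picked by ASLR for one particular run (an assumption).
hypothetical_base = 0x7FF612340000

linked_entry_va = preferred_base + entry_rva   # what the linker assumed
actual_entry_va = rebase(linked_entry_va, preferred_base, hypothetical_base)

print(hex(linked_entry_va), hex(actual_entry_va))
```

The RVA never changes between runs; only the base does, so any value stored as an RVA needs no patching at all.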

SectionAlignment (file offset 0xB8, 4 bytes) and FileAlignment (file offset 0xBC, 4 bytes). These are the two alignment values that govern how the binary is laid out — one for memory, one for disk. Our values are 00 10 00 00 = 0x1000 (4 KB) for the section alignment, and 00 02 00 00 = 0x200 (512 bytes) for the file alignment.

These two values control the relationship between the on-disk file and the in-memory image. Every section, once mapped into memory, starts at an address that's a multiple of SectionAlignment. Every section, on disk, starts at a file offset that's a multiple of FileAlignment. Section alignment is almost always one page (4 KB on x86-64) because the operating system enforces page-level permissions — you can't make half a page executable. File alignment is typically smaller (512 bytes is common, though tools can produce smaller values) because there's no equivalent requirement on disk: the file just needs to be a stream of bytes, and packing sections closer together saves disk space. The mismatch between these two alignments is what causes the file to "stretch" when loaded — sections that sit adjacently on disk get spread out in memory to hit page boundaries. That stretch is the entire subject of Part 3.

SizeOfImage (file offset 0xD0, 4 bytes). Our bytes are 00 10 01 00 = 0x00011000 = 69,632 bytes. This is the total size of the image once it has been mapped into memory, rounded up to SectionAlignment. The loader's first concrete act, after parsing the headers, is to ask the operating system for exactly this many bytes of contiguous virtual address space. Everything from RVA 0x0000 (the start of the headers) to RVA 0x00011000 (just past the end of the last section) lives within that allocation. The file on disk is 39 KB; the image in memory is 68 KB. The difference is the alignment stretch.

SizeOfHeaders (file offset 0xD4, 4 bytes). Our bytes are 00 04 00 00 = 0x400 = 1,024 bytes. This is the total size of everything from the DOS Header through the section header table, rounded up to FileAlignment. It defines where the section bodies start on disk: the very first section's PointerToRawData will be at offset 0x400, which is exactly what we saw in the file layout earlier.
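
Both "rounded up to" statements use the same operation. A minimal sketch, assuming the alignment is a power of two (true for 0x200 and 0x1000 here), with `align_up` as an illustrative helper name:

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (value + alignment - 1) & ~(alignment - 1)

FILE_ALIGNMENT = 0x200
SECTION_ALIGNMENT = 0x1000

# The headers really end at 0x318 (end of the section header table).
size_of_headers = align_up(0x318, FILE_ALIGNMENT)       # 0x400

# A 512-byte section body still occupies a whole page once mapped.
in_memory_footprint = align_up(0x200, SECTION_ALIGNMENT)  # 0x1000

# SizeOfImage must already sit on a SectionAlignment boundary.
size_of_image = 0x11000
print(hex(size_of_headers), hex(in_memory_footprint),
      size_of_image % SECTION_ALIGNMENT == 0)
```

The second result is the "stretch" in miniature: 0x200 bytes on disk become 0x1000 bytes of address space.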

Subsystem (file offset 0xDC, 2 bytes). Our bytes are 03 00 = 0x0003 = IMAGE_SUBSYSTEM_WINDOWS_CUI, "Windows character-mode (console) UI." When Windows launches our binary, it'll see this value and ensure the process has a console attached — if the program was double-clicked from Explorer rather than launched from a command prompt, Windows allocates a new console window for it. The two most common alternatives are IMAGE_SUBSYSTEM_WINDOWS_GUI (0x0002, for graphical applications — no console) and IMAGE_SUBSYSTEM_NATIVE (0x0001, for drivers and other kernel-mode-ish code that doesn't use the Win32 subsystem at all). This field is what determines whether running an .exe pops up a black console window or not.

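DllCharacteristics (file offset 0xDE, 2 bytes). Our bytes are 60 01 = 0x0160. Like the COFF Characteristics field, this is a bitfield, but the bits here are the ones that matter for modern security. The flags set in our binary are:

IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA (0x0020): the image can handle a high-entropy 64-bit virtual address space, so ASLR may place it anywhere in that larger range.
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (0x0040): the image can be relocated at load time, which is the ASLR opt-in.
IMAGE_DLLCHARACTERISTICS_NX_COMPAT (0x0100): the image is compatible with non-executable data pages (Data Execution Prevention).

Together they add up to 0x0160.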

Other bits you'll meet in real binaries: IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY (0x0080, requires a valid Authenticode signature to load), IMAGE_DLLCHARACTERISTICS_GUARD_CF (0x4000, the binary supports Control Flow Guard), and IMAGE_DLLCHARACTERISTICS_APPCONTAINER (0x1000, the binary requires the AppContainer sandbox). There's also a separate, newer "Extended DLL Characteristics" mechanism — added because the original 16-bit flag field ran out of room — that carries flags like CET shadow stack compatibility (IMAGE_DLLCHARACTERISTICS_EX_CET_COMPAT). These extended flags live in the debug directory rather than this header, so they don't appear in the DllCharacteristics field directly.
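
The value 0x0160 can be decoded mechanically against the IMAGE_DLLCHARACTERISTICS_* constants from winnt.h. The sketch below uses a table of those constants with the common prefix dropped for brevity; `decode_dll_characteristics` is a name invented for this example.

```python
# IMAGE_DLLCHARACTERISTICS_* bits, prefix omitted.
DLL_CHARACTERISTICS = {
    0x0020: "HIGH_ENTROPY_VA",
    0x0040: "DYNAMIC_BASE",
    0x0080: "FORCE_INTEGRITY",
    0x0100: "NX_COMPAT",
    0x0200: "NO_ISOLATION",
    0x0400: "NO_SEH",
    0x0800: "NO_BIND",
    0x1000: "APPCONTAINER",
    0x2000: "WDM_DRIVER",
    0x4000: "GUARD_CF",
    0x8000: "TERMINAL_SERVER_AWARE",
}

def decode_dll_characteristics(value: int) -> list:
    """Return the names of the set bits, lowest bit first."""
    return [name for bit, name in sorted(DLL_CHARACTERISTICS.items()) if value & bit]

value = int.from_bytes(bytes.fromhex("6001"), "little")  # 0x0160
print(decode_dll_characteristics(value))
supports_cfg = bool(value & 0x4000)
```

For triage, the absent bits are often as informative as the present ones: this binary opts in to ASLR and DEP but not to Control Flow Guard.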

One field we'll meet again that's worth naming now: NumberOfRvaAndSizes (file offset 0x104, 4 bytes), with the value 10 00 00 00 = 0x00000010 = 16. This counts how many data directory entries follow. The PE spec defines 16 standard data-directory slots, and ordinary modern toolchains emit all 16. But the PE specification explicitly warns parsers to honor this field before probing any specific directory entry — unusual binaries, packers, and tiny-PE experiments sometimes set it to smaller values to shave bytes off the optional header. The data directories — the array of (RVA, size) pairs that comes next — are the bridge between the Optional Header and the actual section content, and they're the subject of the next section.
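
A defensive parser checks the count before touching any slot. The sketch below shows one way to do that for an Optional Header held in memory as bytes. The function name `data_directory` and the demo header are hypothetical; the demo is an otherwise empty 240-byte buffer with a made-up import-directory entry, not the real hello.exe header. The start offsets (0x60 for PE32, 0x70 for PE32+) follow from NumberOfRvaAndSizes being the last 4-byte field before the array.

```python
import struct

# Offset of the first data directory entry within the Optional Header.
DATA_DIRECTORY_START = {0x010B: 0x60, 0x020B: 0x70}  # PE32, PE32+

def data_directory(optional_header: bytes, index: int):
    """Return (rva, size) for slot index, or None if the header doesn't carry it."""
    (magic,) = struct.unpack_from("<H", optional_header, 0)
    start = DATA_DIRECTORY_START[magic]
    # NumberOfRvaAndSizes is the 4-byte field immediately before the array.
    (count,) = struct.unpack_from("<I", optional_header, start - 4)
    if not 0 <= index < count:
        return None
    offset = start + index * 8          # each entry is two 4-byte values
    if offset + 8 > len(optional_header):
        return None                     # count claims more than the header holds
    return struct.unpack_from("<II", optional_header, offset)

# Synthetic PE32+ header for demonstration only (values are invented).
demo = bytearray(240)
struct.pack_into("<H", demo, 0x00, 0x020B)
struct.pack_into("<I", demo, 0x6C, 16)
struct.pack_into("<II", demo, 0x70 + 1 * 8, 0x9000, 0x100)  # slot 1

with_all_slots = data_directory(bytes(demo), 1)

struct.pack_into("<I", demo, 0x6C, 1)   # pretend a packer trimmed the count
with_one_slot = data_directory(bytes(demo), 1)
print(with_all_slots, with_one_slot)
```

The second length check guards against a header that declares more entries than SizeOfOptionalHeader leaves room for.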

The remaining fields of the Optional Header are mostly bookkeeping and version metadata that the loader either uses for compatibility checks or ignores. For reference:

Field                        Value      Meaning
───────────────────────────  ─────────  ─────────────────────────────────────────────
MajorLinkerVersion           0x02       MinGW ld version major
MinorLinkerVersion           0x29       MinGW ld version minor (0x29 = 41)
SizeOfCode                   0x6C00     sum of all code section sizes
SizeOfInitializedData        0x9600     sum of all initialized-data sizes
SizeOfUninitializedData      0x0C00     sum of all BSS-style sizes
BaseOfCode                   0x1000     RVA of the first code section
MajorOperatingSystemVersion  4          minimum Windows version (advisory)
MinorOperatingSystemVersion  0
MajorImageVersion            0          image-specific version (set by linker)
MinorImageVersion            0
MajorSubsystemVersion        5          minimum subsystem version
MinorSubsystemVersion        2
Win32VersionValue            0          reserved, must be zero
CheckSum                     0x1493D    PE-image checksum (MinGW writes a real one; many tools don't)
SizeOfStackReserve           0x200000   virtual memory reserved for primary thread stack
SizeOfStackCommit            0x1000     stack memory actually committed at start
SizeOfHeapReserve            0x100000   reserved for default process heap
SizeOfHeapCommit             0x1000     committed for default process heap
LoaderFlags                  0          reserved, must be zero
NumberOfRvaAndSizes          16         number of data directory entries that follow

Of these, the four stack/heap reserve and commit values are the ones most likely to be of interest in real analysis — they affect process memory layout and occasionally show up as quirks in malware that wants unusually large or unusually small stacks. The version fields are mostly compatibility lies told by linkers and aren't enforced. CheckSum is computed over the entire image; it's required to be valid for kernel-mode drivers and a few other specific cases, but for ordinary user-mode EXEs the loader doesn't verify it, so different linkers handle it differently — MinGW writes a real checksum (ours is 0x1493D), while many other toolchains leave it at zero.

[Figure: The 240-byte Optional Header in three regions. Standard COFF fields (24 bytes, offsets 0x00 – 0x17): Magic 0x020B (PE32+), SizeOfCode 0x6C00, AddressOfEntryPoint 0x1410 (RVA). Windows-specific fields (88 bytes, offsets 0x18 – 0x6F): ImageBase 0x140000000, SectionAlignment 0x1000, FileAlignment 0x200, SizeOfImage 0x11000 (68 KB), SizeOfHeaders 0x400, Subsystem 3 (Windows CUI), DllCharacteristics 0x0160, SizeOfStackReserve 0x200000 (2 MB), NumberOfRvaAndSizes 16. Data Directories (128 bytes, offsets 0x70 – 0xEF): 16 × (RVA, size) pairs. Close-up of DllCharacteristics: 0x0020 HIGH_ENTROPY_VA | 0x0040 DYNAMIC_BASE | 0x0100 NX_COMPAT = 0x0160.]
The Optional Header divided into its three logical regions: Standard COFF fields (carried over from the COFF object-file format), Windows-specific fields (everything the OS loader needs), and Data Directories (sixteen pointers into important structures). The DllCharacteristics close-up shows the three security bits set in our binary — the typical modern-Windows-binary configuration.

Data Directories: pointers into the sections

The final piece of the Optional Header is an array of data directories. The format defines sixteen slots, and our binary, like nearly everything produced by a mainstream toolchain, carries all sixteen, occupying the last 128 bytes of the Optional Header. Each entry is a tiny structure called IMAGE_DATA_DIRECTORY, just two fields:

typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD   VirtualAddress;   // RVA where the structure lives
    DWORD   Size;             // size of the structure in bytes
} IMAGE_DATA_DIRECTORY;

Eight bytes each, sixteen of them, total 128 bytes. Each entry is, in effect, a (where, how big) pointer to an important structure somewhere inside the image's sections. VirtualAddress here is an RVA — the in-memory offset coordinate we covered in "Three kinds of 'where'" — not a file offset. The data directories are how the loader finds things like the import table or the relocation table without having to scan every section looking for them. Instead, it knows exactly which RVA to follow and how many bytes to read.

The sixteen entries have fixed meanings, assigned by index. Most modern binaries don't fill all sixteen; entries are zeroed out when not used. Here are all sixteen for our hello.exe:

Index  Name                       RVA          Size       Status
─────  ─────────────────────────  ───────────  ─────────  ──────────────
   0   Export Table               0x00000000   0          unused (we export nothing)
   1   Import Table               0x0000D000   0x6D0      points into .idata
   2   Resource Table             0x00000000   0          unused
   3   Exception Table            0x0000A000   0x468      points into .pdata
   4   Certificate Table          0x00000000   0          unused (unsigned binary)
   5   Base Relocation Table      0x00010000   0x84       points into .reloc
   6   Debug Directory            0x00000000   0          unused (stripped)
   7   Architecture               0x00000000   0          reserved, always zero
   8   Global Pointer             0x00000000   0          unused on x86-64
   9   TLS Table                  0x00009040   0x28       points into .rdata
  10   Load Config Table          0x00000000   0          unused
  11   Bound Import               0x00000000   0          unused (legacy mechanism)
  12   IAT (Import Address Table) 0x0000D1C8   0x188      points into .idata
  13   Delay Import Descriptor    0x00000000   0          unused
  14   COM Descriptor (.NET)      0x00000000   0          unused (native binary)
  15   Reserved                   0x00000000   0          must be zero

Of the sixteen possible entries, our binary fills five: Import Table, Exception Table, Base Relocation Table, TLS Table, and IAT. The others are zero. This is typical for a simple native executable; large applications with resources, signatures, and CLR metadata fill more.
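To make the array layout concrete, here is a rough sketch that rebuilds the 128 bytes from the table above and parses them back, reading only as many entries as NumberOfRvaAndSizes claims. The bytes are reconstructed from the table rather than dumped from the file, and parse_directories and the short names in NAMES are labels invented for this example.

```python
import struct

# (RVA, size) pairs for hello.exe, copied from the table above. Packing them
# rebuilds the 128-byte array; this is a reconstruction, not a file dump.
TABLE = [(0, 0), (0xD000, 0x6D0), (0, 0), (0xA000, 0x468), (0, 0),
         (0x10000, 0x84), (0, 0), (0, 0), (0, 0), (0x9040, 0x28),
         (0, 0), (0, 0), (0xD1C8, 0x188), (0, 0), (0, 0), (0, 0)]
raw = b"".join(struct.pack("<II", rva, size) for rva, size in TABLE)

# Short labels for the sixteen fixed indexes (hypothetical names for this sketch).
NAMES = ["Export", "Import", "Resource", "Exception", "Certificate",
         "BaseReloc", "Debug", "Architecture", "GlobalPtr", "TLS",
         "LoadConfig", "BoundImport", "IAT", "DelayImport", "COM", "Reserved"]

def parse_directories(data, number_of_rva_and_sizes):
    """Return {name: (rva, size)} for the non-zero entries.

    Reads only as many entries as NumberOfRvaAndSizes claims (capped at 16)
    and never reads past the end of `data`.
    """
    count = min(number_of_rva_and_sizes, len(NAMES), len(data) // 8)
    found = {}
    for index in range(count):
        rva, size = struct.unpack_from("<II", data, index * 8)
        if rva or size:
            found[NAMES[index]] = (rva, size)
    return found

for name, (rva, size) in parse_directories(raw, 16).items():
    print(f"{name:<10} RVA {rva:#07x}  size {size:#x}")
```

Calling parse_directories(raw, 6) stops before the TLS and IAT slots, which is the behavior a parser needs when a binary declares fewer than sixteen entries.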

Notice that every non-zero entry's RVA falls within one of the sections we saw earlier. The Import Table at RVA 0xD000 lands inside .idata (which starts at RVA 0xD000). The Exception Table at RVA 0xA000 lands inside .pdata (which starts at RVA 0xA000). The Base Relocation Table at RVA 0x10000 lands inside .reloc. This is the key mental model: the data directories do not contain structures themselves; they're pointers into the section bodies, telling the loader where to find structures that the linker placed inside specific sections at build time.
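The "every directory lands inside a section" claim is easy to check mechanically. The sketch below uses only the section start RVAs quoted in this post; three of the ten sections aren't listed, so treat the lookup as an approximation that is only reliable for the five RVAs it is applied to here. The function name section_for is invented for this example.

```python
from bisect import bisect_right

# Section start RVAs quoted in this post. Three of the ten sections are not
# listed, so this lookup is only trustworthy for the RVAs checked below.
SECTION_STARTS = [(0x1000, ".text"), (0x8000, ".data"), (0x9000, ".rdata"),
                  (0xA000, ".pdata"), (0xB000, ".xdata"), (0xD000, ".idata"),
                  (0x10000, ".reloc")]

# The five non-zero data directory RVAs from the table.
DIRECTORIES = {"Import Table": 0xD000, "Exception Table": 0xA000,
               "Base Relocation Table": 0x10000, "TLS Table": 0x9040,
               "IAT": 0xD1C8}

def section_for(rva):
    """Name of the listed section with the greatest start RVA <= rva."""
    starts = [start for start, _ in SECTION_STARTS]
    index = bisect_right(starts, rva) - 1
    return SECTION_STARTS[index][1] if index >= 0 else "(headers)"

for name, rva in DIRECTORIES.items():
    print(f"{name:<22} {rva:#x} -> {section_for(rva)}")
```

A real tool would also check the RVA against each section's VirtualSize instead of trusting start addresses alone.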

The natural question is: where do those RVA values come from? Nothing in the source code mentions 0xD000 or 0x10000; we never asked for the import table to live at any particular address. The answer is that the linker chose those RVAs itself, during its final layout pass. After it decided which structures go in which sections (the import descriptors in .idata, the relocation blocks in .reloc, and so on) and what RVA each section would start at, it knew the RVA of every structure — and it wrote each one into the corresponding data directory entry as the last step before emitting the file. The data directories are how the linker tells the loader what it built. We'll see the precise algorithm the linker uses to assign section RVAs a few sections from now, in "How the linker assigns RVAs."

That arrangement explains the read-early-use-late pattern from earlier. The loader reads the data directories out of the Optional Header before mapping anything — they're just sixteen 8-byte records sitting in the header. The RVAs they contain don't mean anything yet, because the sections haven't been placed in memory. But once the loader has mapped the sections, those RVAs suddenly become valid pointers to live data. The loader then revisits each non-zero directory entry, computes ImageBase + RVA, and processes the structure there: parsing imports, applying relocations, registering TLS callbacks, and so on.

There's exactly one data directory that breaks the rule we established about RVAs vs file offsets: entry 4, the Certificate Table (also called the Security Directory), which locates the digital signature on a code-signed binary. The "VirtualAddress" field in that one directory entry is actually a file offset, not an RVA. The reason: certificate data is not mapped into memory as part of the image; it's appended to the file after the last section, and verified by tools that read the file from disk (Authenticode signature verification happens at install time and at execution time, but always against the file). Putting an RVA there would be meaningless, because the bytes aren't in the image. So Microsoft used a file offset, and documented the exception. Our binary is unsigned, so this entry is zero and the exception doesn't bite us; but if you ever inspect a signed binary and find a non-zero entry at index 4, remember that the number you're looking at is a file offset.

The data directories that matter most for the rest of this series are the ones we'll dig into in Part 4: the Import Table and IAT (entries 1 and 12), which together describe what DLLs the program needs and where the loader should fill in the imported function addresses, and the Base Relocation Table (entry 5), which tells the loader what to patch if the image is loaded somewhere other than its preferred ImageBase. We're not going to dissect those structures here — that's Part 4's territory — but you now know how the loader finds them.

[Figure: Data directories as pointers into sections. Directory entries on the left, sections in memory on the right (.text @ 0x1000, .data @ 0x8000, .rdata @ 0x9000, .pdata @ 0xA000, .xdata @ 0xB000, .idata @ 0xD000, .reloc @ 0x10000). [1] Import Table (RVA 0x0000D000, size 0x6D0) and [12] IAT (RVA 0x0000D1C8, size 0x188) point into .idata; [3] Exception Table (RVA 0x0000A000, size 0x468) points into .pdata; [9] TLS Table (RVA 0x00009040, size 0x28) points into .rdata; [5] Base Relocation Table (RVA 0x00010000, size 0x84) points into .reloc. Entries 0, 2, 4, 6, 7, 8, 10, 11, 13, 14, and 15 are all zero. The one exception: entry 4, the Certificate Table, uses a file offset in its "VirtualAddress" field, because signatures aren't mapped into memory.]
Five of our sixteen data directories are populated. Each one is just an (RVA, size) pair that points into a section the loader has yet to map. The loader stores these values from the Optional Header, finishes mapping the sections, and then follows each non-zero pointer to the structure it describes.

Section headers — where on-disk meets in-memory

Immediately after the Optional Header ends at file offset 0x188 comes the section header table. There's one entry per section — ten of them in our binary — and each entry is exactly 40 bytes. The structure is called IMAGE_SECTION_HEADER. These are the most consequential 400 bytes in the whole file, because they're what tells the loader how to actually build the in-memory image from the on-disk bytes.

Each section header contains ten fields. We'll walk through them using the real bytes for our binary's first section, .text, whose 40-byte header starts at file offset 0x188:

00000188: 2e74 6578 7400 0000 686b 0000 0010 0000
00000198: 006c 0000 0004 0000 0000 0000 0000 0000
000001a8: 0000 0000 6000 0060

Reading those bytes field by field:

Name (offset 0, 8 bytes). 2E 74 65 78 74 00 00 00 — the ASCII string ".text" followed by three zero bytes of padding. The field is a fixed-size 8-byte buffer; section names that don't fill it are zero-padded, and section names exactly 8 bytes long are stored without a null terminator. The loader doesn't use this name to make decisions — it's purely a human-readable label. Conventions like .text, .data, .rdata, .reloc are just that, conventions; the linker can name a section anything it wants, and you'll occasionally meet binaries with unusual section names chosen by obfuscators or specialized linkers.

VirtualSize (offset 8, 4 bytes). 68 6B 00 00 = 0x00006B68 = 27,496 bytes. This is the exact, unpadded size the section occupies in memory once mapped. It's the linker's honest count of how many bytes of real content the section contains — code, data, whatever.

VirtualAddress (offset 12, 4 bytes). 00 10 00 00 = 0x00001000. This is an RVA — where the section should be placed within the mapped image, measured from ImageBase. The loader will arrange to have the section's content visible at ImageBase + 0x1000 after mapping. For our 64-bit binary with ImageBase = 0x140000000, that's 0x140001000 — the address the CPU sees when executing code in .text.

SizeOfRawData (offset 16, 4 bytes). 00 6C 00 00 = 0x00006C00 = 27,648 bytes. This is the size of the section on disk, rounded up to FileAlignment (which is 0x200 in our binary). Note the difference from VirtualSize: 27,648 versus 27,496 — the on-disk size is 152 bytes larger, because file alignment requires rounding up to a 512-byte boundary, and the unpadded data fell short of that boundary.

PointerToRawData (offset 20, 4 bytes). 00 04 00 00 = 0x00000400. This is a file offset — where this section's bytes start in the file on disk. The loader will read SizeOfRawData bytes starting from this position in the file.

This is the moment to call out the central trick of this structure: this one 40-byte record contains both a file offset and an RVA, side by side. PointerToRawData tells the loader "read from this position in the file"; VirtualAddress tells the loader "write to this offset in the mapped image." The section header is precisely the structure that straddles the boundary between disk and memory. It is the entry in the format whose only purpose is to express the relationship between the two coordinate systems we discussed at the start of this post.
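The same reading can be done in a few lines of code. This is a minimal sketch that unpacks the 40 bytes from the hex dump above with Python's struct module; the FIELDS tuple simply mirrors the IMAGE_SECTION_HEADER member names, and the dictionary representation is a choice made for this example.

```python
import struct

# The 40 bytes at file offset 0x188, exactly as shown in the hex dump above.
raw = bytes.fromhex(
    "2e74657874000000" "686b0000" "00100000"
    "006c0000" "00040000" "00000000" "00000000"
    "0000" "0000" "60000060"
)

# IMAGE_SECTION_HEADER: 8-byte name, six DWORDs, two WORDs, one DWORD.
FIELDS = ("Name VirtualSize VirtualAddress SizeOfRawData PointerToRawData "
          "PointerToRelocations PointerToLinenumbers "
          "NumberOfRelocations NumberOfLinenumbers Characteristics").split()

values = struct.unpack("<8sIIIIIIHHI", raw)
header = dict(zip(FIELDS, values))

# The name is a fixed 8-byte buffer, zero-padded when shorter.
header["Name"] = header["Name"].rstrip(b"\0").decode("ascii")

for field in FIELDS:
    value = header[field]
    print(f"{field:<22}", value if isinstance(value, str) else hex(value))
```

PointerToRawData and VirtualAddress come out of the same unpack call, which is the "both coordinate systems in one record" point in miniature.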

The mapping operation for each section reduces to a single recipe:

Read SizeOfRawData bytes from file position PointerToRawData
Write them to memory at  ImageBase + VirtualAddress
If VirtualSize > SizeOfRawData, zero-fill the remainder
Set page permissions according to Characteristics

The "if VirtualSize > SizeOfRawData" case is rare for code and ordinary data sections (where they're typically equal except for alignment padding), but it's the rule for BSS-style data. The .bss section in our binary has VirtualSize = 0xB80 and SizeOfRawData = 0 — it occupies 2,944 bytes of memory, all zero-filled, and contributes zero bytes to the file. That's how uninitialized data is stored cheaply: the file says "I want this much space, here's no content," and the loader zeroes the memory at load time.
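Here is the recipe as a toy model, assuming the image is just a bytearray indexed by RVA. It follows the first three steps literally and leaves out page permissions entirely; the tiny "file" and the 0xCC filler are invented so that the zero-fill is visible, and the RVA 0x2000 used for the BSS-style section is arbitrary rather than our binary's real .bss address.

```python
def map_section(file_bytes, image, pointer_to_raw_data, size_of_raw_data,
                virtual_address, virtual_size):
    """Copy one section into `image`, a bytearray indexed by RVA (toy model)."""
    # Steps 1 and 2: read the raw bytes and write them at the section's RVA.
    raw = file_bytes[pointer_to_raw_data:pointer_to_raw_data + size_of_raw_data]
    image[virtual_address:virtual_address + len(raw)] = raw
    # Step 3: zero-fill whatever VirtualSize covers beyond the raw data.
    if virtual_size > len(raw):
        start = virtual_address + len(raw)
        end = virtual_address + virtual_size
        image[start:end] = bytes(end - start)
    # Step 4 (page permissions) is not modelled here.

# Hypothetical miniature file: 0x400 bytes of headers, then one 0x200-byte section.
file_bytes = bytes(0x400) + bytes.fromhex("554889e5") + bytes(0x1FC)

# Pre-fill the image with 0xCC so zero-filled regions stand out.
image = bytearray(b"\xCC" * 0x3000)

map_section(file_bytes, image, 0x400, 0x200, 0x1000, 0x200)   # code-like section
map_section(file_bytes, image, 0, 0, 0x2000, 0xB80)           # BSS-style section

print(image[0x1000:0x1004].hex())
print(set(image[0x2000:0x2B80]), hex(image[0x2B80]))
```

The second call reads nothing from the file at all, yet leaves 0xB80 zeroed bytes in the image, which is the BSS behavior described above.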

Five more fields, in three groups, complete the structure.

PointerToRelocations and PointerToLinenumbers (offsets 24 and 28, 4 bytes each). Both zero in our binary. These are leftovers from the COFF object-file format we discussed in Part 1, where each section carried its own relocations and per-line debug information. In an executable, that information is consolidated elsewhere (relocations go to the .reloc section and the Base Relocation Table; debug info, if present, goes to its own section and is pointed at by the Debug Directory). For executables, these fields are always zero.

NumberOfRelocations and NumberOfLinenumbers (offsets 32 and 34, 2 bytes each). Both zero. Same reason.

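Characteristics (offset 36, 4 bytes). 60 00 00 60 = 0x60000060. This is the third bitfield we've seen so far, and it's the most important one for the loader's actual work: it determines what page permissions the section gets when it's mapped. The bits set in our .text Characteristics are:

0x00000020  IMAGE_SCN_CNT_CODE               section contains executable code
0x00000040  IMAGE_SCN_CNT_INITIALIZED_DATA   section contains initialized data
0x20000000  IMAGE_SCN_MEM_EXECUTE            pages may be executed
0x40000000  IMAGE_SCN_MEM_READ               pages may be read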

Four flags, OR'd together, equal 0x60000060. The combination says "code with mixed data, executable, readable" — exactly what you want for a .text section. Notice what's not set: IMAGE_SCN_MEM_WRITE (0x80000000). The page on which our entry-point code lives will be readable and executable, but not writable. That's a deliberate security choice — preventing the program from accidentally or maliciously modifying its own code at runtime.

For comparison, .data in our binary has Characteristics 0xC0000040 = INITIALIZED_DATA | MEM_READ | MEM_WRITE — readable, writable, not executable. .rdata has 0x40000040 = INITIALIZED_DATA | MEM_READ — readable only. And .reloc has 0x42000040 = INITIALIZED_DATA | MEM_DISCARDABLE | MEM_READ — readable, and the unusual MEM_DISCARDABLE flag, which tells the loader that this section can be thrown away after it's been processed, because the relocation entries inside it are only useful during loading.
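The four Characteristics values quoted so far decode mechanically. A small sketch follows, listing only the section flags this post mentions; describe is a made-up helper, and its R/W/X string is shorthand for this example rather than anything the loader produces.

```python
# Section flag values from the PE format; only the ones this post mentions.
SECTION_FLAGS = {
    0x00000020: "CNT_CODE",
    0x00000040: "CNT_INITIALIZED_DATA",
    0x02000000: "MEM_DISCARDABLE",
    0x20000000: "MEM_EXECUTE",
    0x40000000: "MEM_READ",
    0x80000000: "MEM_WRITE",
}

# Bits that decide page permissions, in display order R, W, X.
PERMISSION_BITS = ((0x40000000, "R"), (0x80000000, "W"), (0x20000000, "X"))

def describe(characteristics):
    """Return (permission string, list of flag names) for a Characteristics value."""
    names = [name for bit, name in sorted(SECTION_FLAGS.items())
             if characteristics & bit]
    perms = "".join(letter if characteristics & bit else "-"
                    for bit, letter in PERMISSION_BITS)
    return perms, names

for section, value in [(".text", 0x60000060), (".data", 0xC0000040),
                       (".rdata", 0x40000040), (".reloc", 0x42000040)]:
    perms, names = describe(value)
    print(f"{section:<7} {value:#010x}  {perms}  {' | '.join(names)}")
```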

The Characteristics field is, in effect, a compact description of what permissions the operating system should grant the memory pages that hold this section's content. The loader translates these flags directly into page-protection settings when it maps the section. That's the topic of Part 3 — the protections, the alignment stretch, the layout transformation. For now, just know that every section in every PE carries, encoded in those four bytes, the answer to "is this readable, writable, executable, or something else?"

[Figure: The .text section in both coordinate systems. On disk (file offsets, dashed borders): 0x400 to 0x7000, 27,648 bytes, of which 0x6B68 bytes are real content and 152 bytes are file-alignment padding. In memory (RVAs, solid borders): RVA 0x1000 to RVA 0x7B68, 27,496 bytes mapped as R-X pages, no padding within the section, next section starts at a page boundary. The four fields that wire the two sides together: PointerToRawData 0x400 and SizeOfRawData 0x6C00 on the disk side; VirtualAddress 0x1000 (RVA) and VirtualSize 0x6B68 on the memory side.]
The .text section header is the entry whose only purpose is to relate two coordinate systems. Disk-side fields (PointerToRawData, SizeOfRawData) tell the loader where to read; memory-side fields (VirtualAddress, VirtualSize) tell it where to write. The slight size difference between disk and memory comes from file-alignment padding on disk.

How the linker assigns RVAs

The VirtualAddress values in the section headers — 0x1000 for .text, 0x8000 for .data, 0x9000 for .rdata, and so on — are not magic. They were chosen by the linker at build time using a simple sequential algorithm, the one we sketched in Part 1's discussion of the linker's Phase 2. Now that we have all the surrounding machinery in view, we can describe the algorithm precisely.

The headers occupy RVA 0x0000 through some value just past the end of the section header table. In our binary, the headers and section table together take 792 bytes — DOS Header (64) + DOS Stub (64) + PE signature (4) + COFF File Header (20) + Optional Header (240) + ten section headers at 40 bytes each (400). That ends at file offset 0x318, which becomes RVA 0x318 once the headers are mapped. But the first section can't start at 0x318 in memory — it has to align to a multiple of SectionAlignment (0x1000). So the linker rounds up: the first section gets VirtualAddress = 0x1000.

For each subsequent section, the linker applies the same rule:

next_VA = align_up(current_VA + current_VirtualSize, SectionAlignment)

Let's trace it through our binary. .text has VirtualAddress = 0x1000 and VirtualSize = 0x6B68. Adding those gives 0x7B68, the first byte after .text ends. Rounding up to the next 0x1000 boundary gives 0x8000 — and that's exactly the VirtualAddress of .data. .data has VirtualSize = 0xC0, so it ends at 0x80C0, which rounds up to 0x9000 — the VirtualAddress of .rdata. .rdata's size is 0xDA0, ending at 0x9DA0, rounding up to 0xA000 — the VirtualAddress of .pdata. The pattern continues through all ten sections.

The math on the disk side is the same shape, with FileAlignment (0x200) substituted in. Our .text starts at PointerToRawData = 0x400 and has SizeOfRawData = 0x6C00, ending at file offset 0x7000, where .data begins — and 0x7000 is already a multiple of 0x200, no rounding needed. .data with SizeOfRawData = 0x200 ends at 0x7200, where .rdata begins. And so on, sequentially through the file.
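Both walks fit in one short loop. The sketch below traces the first three sections using the sizes quoted above. One assumption to flag: it derives each section's SizeOfRawData by rounding VirtualSize up to FileAlignment, which matches the .text and .data values quoted in this post but is not a rule the format guarantees.

```python
def align_up(value, alignment):
    """Round `value` up to the next multiple of `alignment` (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

SECTION_ALIGNMENT = 0x1000
FILE_ALIGNMENT = 0x200

# DOS header, DOS stub, PE signature, COFF File Header, Optional Header,
# and ten 40-byte section headers.
headers_end = 64 + 64 + 4 + 20 + 240 + 10 * 40

# (name, VirtualSize) for the sections whose sizes the text quotes.
sections = [(".text", 0x6B68), (".data", 0xC0), (".rdata", 0xDA0)]

rva = align_up(headers_end, SECTION_ALIGNMENT)   # first section's RVA
offset = align_up(headers_end, FILE_ALIGNMENT)   # first section's file offset
layout = []
for name, virtual_size in sections:
    # Assumption: raw size is VirtualSize rounded up to FileAlignment.
    raw_size = align_up(virtual_size, FILE_ALIGNMENT)
    layout.append((name, rva, offset, raw_size))
    rva = align_up(rva + virtual_size, SECTION_ALIGNMENT)
    offset += raw_size  # already a multiple of FileAlignment

print(f"headers end at {headers_end:#x}")
for name, section_rva, section_offset, raw_size in layout:
    print(f"{name:<7} RVA {section_rva:#07x}  file offset {section_offset:#06x}  raw {raw_size:#x}")
print(f"next section RVA: {rva:#x}")
```

The loop finishes with rva equal to 0xA000, the VirtualAddress of .pdata worked out in the trace above.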

Two things follow from this. First, the section table is entirely deterministic — given a list of sections and the two alignment values, you can compute every RVA and every file offset by walking down the list once. The linker does this exactly once, at build time, and writes the results into the section headers. The loader doesn't recompute them; it just reads them and obeys.

Second, the difference between the two alignments is what makes the file smaller than the image. FileAlignment = 0x200 packs sections close together on disk; SectionAlignment = 0x1000 spreads them out in memory. Our file is 39,424 bytes (rounded to file-alignment); our image is 69,632 bytes (rounded to section-alignment). The image is 77% larger than the file, even though most of the content — every byte of code, every byte of initialized data — is the same. The difference is mostly alignment gaps between sections, plus memory-only zero-filled regions like .bss (which has 2,944 bytes of VirtualSize and zero on-disk presence).

That stretch is the central topic of Part 3. The PE format is, in this sense, a compact encoding of an image that intentionally has gaps in it once it's expanded.

Converting between RVA and file offset

Now we can finally answer the question this part has been building toward: given an RVA — say, the entry point's 0x1410 — how do you find the byte in the file? And how do you go the other direction, from a file offset to an RVA? This conversion is the single skill PE analysts perform most often, because disassemblers display RVAs and hex editors display file offsets, and you'll constantly find yourself with one when you need the other.

There is no single formula. The file and the image have different alignments, and sections sit at different relative positions in each. The section table is the bridge — every conversion goes through it.

Here's the algorithm for RVA → file offset:

  1. Find the section whose RVA range contains the target: i.e., the section where VirtualAddress ≤ RVA < VirtualAddress + VirtualSize.
  2. Compute the offset within that section: section_offset = RVA - section.VirtualAddress.
  3. Add that to the section's file position: file_offset = section.PointerToRawData + section_offset.

Let's walk through it for our entry point, RVA 0x1410.

Step 1. Which section contains 0x1410? Checking each section header: .text has VirtualAddress = 0x1000 and VirtualSize = 0x6B68, so it covers RVAs 0x1000 up to, but not including, 0x7B68. Our target 0x1410 is comfortably inside that range. The entry point lives in .text.

Step 2. The offset within .text: 0x1410 - 0x1000 = 0x410. The entry point is 0x410 bytes from the start of the .text section.

Step 3. The file offset: .text starts in the file at PointerToRawData = 0x400, so the entry point lives at 0x400 + 0x410 = 0x810 in the file.
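The three steps translate almost word for word into code. A minimal sketch, using only the two sections whose header values are all quoted in this post; the Section tuple and the function name rva_to_offset are invented for the example.

```python
from collections import namedtuple

Section = namedtuple(
    "Section",
    "name virtual_address virtual_size pointer_to_raw_data size_of_raw_data",
)

# Only the sections whose header values are fully quoted in this post.
SECTIONS = [
    Section(".text", 0x1000, 0x6B68, 0x400, 0x6C00),
    Section(".data", 0x8000, 0xC0, 0x7000, 0x200),
]

def rva_to_offset(rva, sections=SECTIONS):
    """Convert an RVA to a file offset via the section table."""
    for s in sections:
        # Step 1: find the section whose RVA range contains the target.
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            # Step 2: offset within the section.
            section_offset = rva - s.virtual_address
            # Step 3: add the section's position in the file.
            return s.pointer_to_raw_data + section_offset
    raise ValueError(f"RVA {rva:#x} is not inside any listed section")

print(hex(rva_to_offset(0x1410)))  # entry point -> 0x810
```

Raising on a miss rather than returning a guess is deliberate: an RVA that falls in an alignment gap has no file offset to return.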

Let's verify by looking at the actual bytes. If we open hello.exe in a hex editor and jump to file offset 0x810:

00000810: 5548 89e5 4883 ec20 488b 0561 8300 00c7
00000820: 0000 0000 00e8 66fd ffff 9090 4883 c420
00000830: 5dc3

The first eight bytes are 55 48 89 E5 48 83 EC 20. Decoded as x86-64 instructions, that's:

55              push   rbp
48 89 E5        mov    rbp, rsp
48 83 EC 20     sub    rsp, 0x20

That's the standard function prologue we discussed in Part 1's assembly snippet — push the old base pointer, set up a new frame, reserve stack space. The sub rsp, 0x20 reserves exactly 32 bytes, which is the Windows x64 shadow space we walked through in Part 1's calling-convention discussion. The entry point of our program is, byte for byte, the prologue we predicted it would be. The RVA-to-file-offset conversion landed on the right bytes.

The inverse direction — file offset → RVA — works the same way in reverse:

  1. Find the section whose file-offset range contains the target: PointerToRawData ≤ offset < PointerToRawData + SizeOfRawData.
  2. Compute the offset within the section: section_offset = file_offset - section.PointerToRawData.
  3. Add to the section's RVA: RVA = section.VirtualAddress + section_offset.
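And the reverse direction, sketched the same way. The section values are the ones quoted in this post; offset_to_rva is a made-up name, and it returns None for offsets that no listed section claims, such as positions inside the headers.

```python
# (name, VirtualAddress, PointerToRawData, SizeOfRawData), from this post.
SECTIONS = [
    (".text", 0x1000, 0x400, 0x6C00),
    (".data", 0x8000, 0x7000, 0x200),
]

def offset_to_rva(file_offset, sections=SECTIONS):
    """Convert a file offset to an RVA, or None if no listed section holds it."""
    for name, virtual_address, pointer_to_raw_data, size_of_raw_data in sections:
        # Step 1: find the section whose file-offset range contains the target.
        if pointer_to_raw_data <= file_offset < pointer_to_raw_data + size_of_raw_data:
            # Steps 2 and 3: offset within the section, added to its RVA.
            return virtual_address + (file_offset - pointer_to_raw_data)
    return None

print(hex(offset_to_rva(0x810)))   # back to the entry point -> 0x1410
print(offset_to_rva(0x188))        # section header table: no section -> None
```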

A subtlety: bytes in the headers and in any file-alignment padding don't belong to any section, and don't have RVAs in the usual sense. (The headers do get mapped into memory at RVA 0, so they have RVAs by extension; padding bytes just don't exist in memory.) For locations inside section bodies the conversion is always well-defined; for locations elsewhere in the file, the question may not have a meaningful answer.

A second subtlety is for anyone writing tooling. The clean algorithm above works for ordinary well-formed binaries where each RVA falls neatly into exactly one section's VirtualAddress..VirtualAddress + VirtualSize range. Real PE parsers have to be more defensive: SizeOfRawData can be larger than VirtualSize (alignment padding on disk), or smaller (the tail of the section exists only in memory, zero-filled, so RVAs in that tail have no file offset at all), or zero (BSS-style sections with no on-disk content). Packers and malware deliberately exploit those edge cases (overlapping section ranges, zero-sized sections, sections whose VirtualSize crosses image boundaries) to break naive parsers. If you're building tools, mirror what hardened parsers like pefile do; if you're just reading binaries, the simple algorithm covers the common case.
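As a rough illustration of what "more defensive" can mean, here is a sketch that handles three of those cases: RVAs inside the headers, RVAs in a memory-only tail, and overlapping sections. It is not how pefile or any particular parser does it, and the .bss RVA of 0xC000 is an assumption made for the example, since this post quotes only that section's sizes.

```python
def rva_to_offset_checked(rva, size_of_headers, sections):
    """Return a file offset, or None when the RVA has no bytes on disk.

    `sections` holds (VirtualAddress, VirtualSize, PointerToRawData,
    SizeOfRawData) tuples. Raises ValueError if several sections claim the RVA.
    """
    if 0 <= rva < size_of_headers:
        return rva  # headers are mapped at RVA 0, so offset == RVA there
    hits = [s for s in sections if s[0] <= rva < s[0] + s[1]]
    if len(hits) > 1:
        raise ValueError(f"RVA {rva:#x} falls in {len(hits)} overlapping sections")
    if not hits:
        return None  # alignment gap or outside the image
    virtual_address, _, pointer_to_raw_data, size_of_raw_data = hits[0]
    delta = rva - virtual_address
    if delta >= size_of_raw_data:
        return None  # zero-filled tail or BSS-style section: memory only
    return pointer_to_raw_data + delta

SECTIONS = [
    (0x1000, 0x6B68, 0x400, 0x6C00),  # .text, values from this post
    (0xC000, 0xB80, 0, 0),            # .bss sizes from this post; RVA assumed
]

print(rva_to_offset_checked(0x1410, 0x400, SECTIONS))  # 2064, i.e. 0x810
print(rva_to_offset_checked(0xC010, 0x400, SECTIONS))  # None: no bytes on disk
```

Returning None for memory-only bytes keeps "not on disk" distinct from "malformed", which is reserved for the exception.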

Every PE-inspection tool implements this algorithm internally. It is, in the end, the entire reason the section header table exists — to let anyone with the headers and a position in either coordinate system compute the matching position in the other.

Reading PE files in 2026

In practice, nobody decodes a PE file by hand for very long. Once you understand the structure, you switch to tools that parse it for you. The ones an analyst is most likely to use in 2026, roughly in order of how often they come up:

dumpbin ships with Visual Studio and is the canonical Microsoft tool. dumpbin /headers foo.exe dumps every header structure; /imports shows the import table; /exports shows what the binary exports; /all dumps everything. It's normally run from the Visual Studio developer command prompt, which sets up the environment it expects.

objdump (the GNU version) and llvm-objdump / llvm-readobj are the cross-platform equivalents. They work on Linux, macOS, and Windows, and they handle PE files alongside ELF and Mach-O. objdump -p foo.exe dumps PE-specific headers; llvm-readobj --coff-load-config -r foo.exe gives the most modern dump including extended characteristics.

PE-bear (by hasherezade) is a free graphical tool that handles malformed PE files — important when analyzing malware, which often deliberately stretches the format to confuse parsers. It's particularly good at side-by-side hex / structure views.

CFF Explorer (by Erik Pistelli) is a free PE editor that supports both PE32/PE32+ and .NET binaries. It can read, edit, and rebuild structures — useful for both analysis and patching.

PEStudio performs static malware-triage analysis: it parses the structures and flags suspicious indicators (uncommon imports, packed sections, suspicious entropy, etc.). Free for non-commercial use.

pefile is a Python library by Ero Carrera, widely used for scripting analysis at scale. If you ever need to process a thousand PE files programmatically, this is the tool.

For ad-hoc work on small files, plain xxd or any hex editor (HxD, 010 Editor, ImHex) is enough — we've been doing exactly that throughout this post.

Try it yourself — reproduce every byte

Every number quoted in this post comes from a real binary you can build and inspect in two minutes. On a Linux machine or in WSL, with the MinGW cross-compiler installed (apt install gcc-mingw-w64-x86-64 on Debian/Ubuntu):

# Write the source file
cat > hello.c << 'EOF'
#include <stdio.h>
int main(void) { printf("hello\n"); return 0; }
EOF

# Compile to Windows PE, optimized and stripped
x86_64-w64-mingw32-gcc -O2 -s -o hello.exe hello.c

# Inspect the headers
x86_64-w64-mingw32-objdump -p hello.exe
x86_64-w64-mingw32-objdump -h hello.exe

# Read bytes by file offset
xxd -s 0x00 -l 128 hello.exe    # DOS header + stub (64 + 64 bytes)
xxd -s 0x80 -l 24  hello.exe    # PE signature + COFF File Header
xxd -s 0x98 -l 240 hello.exe    # Optional Header
xxd -s 0x188 -l 400 hello.exe   # Section header table
xxd -s 0x810 -l 16  hello.exe   # Entry point bytes

The exact byte values may differ slightly from the ones in this post — TimeDateStamp will reflect your build time, and tiny binary differences are expected across MinGW versions — but the structure and the offsets will match. The values are stable enough that the worked example in the previous section (entry point RVA 0x1410 → file offset 0x810, prologue bytes 55 48 89 E5 48 83 EC 20) reproduces reliably.

On Windows itself, the equivalent inspection commands are dumpbin /headers hello.exe from a Visual Studio developer command prompt, or any of the GUI tools listed above. The output formatting differs; the bytes don't.

A few realities you'll meet that we haven't covered in detail: many production binaries are signed (the Certificate Table at data directory index 4 is populated; verification happens against the on-disk file), and managed .NET binaries are common enough that you'll encounter them quickly in PE analysis (the COM Descriptor at index 14 is populated; the actual code is in CIL bytecode within a section, with a tiny native stub to bootstrap the CLR). Both cases preserve all the structures we've covered; they just have additional structures sitting alongside. The DOS header, PE signature, COFF File Header, Optional Header, data directories, and section headers are present in every PE file Windows can load, signed or not, managed or not.

What we have at the end of Part 2

You can now read a PE file. Open any Windows binary in a hex editor, and you can walk down its bytes naming what each one is: MZ at byte zero, the DOS stub, the PE signature, the COFF File Header with its machine type and section count, the Optional Header with its image base and alignments and data directories, the section header table with one entry per section, padding to file alignment, then the section bodies in linker-chosen order. You can convert between the three coordinate systems — file offset, RVA, virtual address — when you need to find a specific byte. You know which fields the loader consumes before mapping and which it consumes after.

What you don't yet know is what actually happens when the file becomes a process. The structures we've described are static. The loader does something specific with them — a series of steps that turns a 39 KB on-disk file into a 68 KB region of mapped memory with the right page permissions, then patches and connects and finally hands control to the entry point. The transformation is more interesting than it sounds, because the difference between the file and the running image is not just size: it's alignment, it's permissions, it's a relationship between bytes that don't move and pointers that have to be fixed up.

That's Part 3: the file-to-memory stretch.