NTAPI Injection
June 23rd, 2023
Last updated: June 23rd, 2023
I've created a video to go along with this blog post. To watch the current installment of our "Malware Development" series, see the link below:
You may know by now that there exists the Win32 API and the native API (otherwise known as NTAPI). In this post, we're going to take a look at how the functions that you use from the standard Windows API get translated into the lower-level NTAPI/syscalls. After that, we'll program a super simple injector that will inject a DLL into our target process, except this time, instead of using only the Win32 API, we'll swap out one (1) function for its NTAPI counterpart. If you're looking for an example with all of the Win32 API swapped out for its NTAPI counterparts, check the article below:
You may be asking what the point of this is. Why are we even removing the high-level abstraction in the first place? Well, typically, the more wrappers we can remove from our malware, the stealthier it becomes. Obviously, defensive solutions caught on to this over time, and it's still easy to detect the use of these native API calls. However, we can still see that it does make a difference. Take, for example, the DLL Injection example from the previous blog. If we submit this to VirusTotal, we can see that ten (10) vendors pick it up as malicious:
If I take the exact same program but replace some of the Win32 API with the lower-level NTAPI from NTDLL, we should see this number get lower.
We can see that our program does indeed get a lower score. Also remember that we're just including a DLL as our payload; if we were to have some signatured shellcode, like a reverse shell payload from msfvenom, our detection count would be much higher. Moreover, in that program, I've only swapped out two (2) of the Win32 API functions for their NTAPI counterparts, those being NtOpenProcess and NtCreateThreadEx. Don't worry if that looks alien to you, we'll go over that soon.
You should also be wary of uploading your malware to a site like VirusTotal (VT) if it's something you don't want to be shared with or examined by security vendors.
Some websites like VT will take your malware and share it with its partners for analysis and sample submission. Their partners are contractually obliged to use the sample you submit for internal security purposes only and for the purposes of bettering their antivirus engines, as well as detection. You can read more about this from the "Sharing & Disclosure" section of their "Privacy Policy" section:
If you'd like another site that you could use for testing that doesn't submit your sample upon submission, you could look at something like the following:
Even though the score difference doesn't seem that crazy, it's still knocking points off and making our malware more stealthy. And to reiterate, we only replaced two (2) functions. Let's move on now that we know why using something like NTAPI would be a better alternative than the standard Win32 API. Before we do that, we need to understand the difference between user mode and kernel mode.
It's critical that we spend ample time and effort understanding these processor access modes (i.e., "user mode" and "kernel mode") because you'll be hearing these two terms all the time during your malware development journey. And it's not without reason. A lot of things like Anticheats, Antiviruses, EDRs, etc. run in kernel mode. They do this because running in kernel mode grants you access to all parts of the operating system, which if you're detecting and defending against malware, is a really important advantage to have.
"To protect user applications from accessing and/or modifying critical OS data, Windows uses two processor access modes (even if the processor on which Windows is running supports more than two): user mode and kernel mode. User application code runs in user mode, whereas OS code (such as system services and device drivers) runs in kernel mode. Kernel mode refers to a mode of execution in a processor that grants access to all system memory and all CPU instructions. Some processors differentiate between such modes by using the term code privilege level or ring level, while others use terms such as supervisor mode and application mode. Regardless of what it’s called, by providing the operating system kernel with a higher privilege level than user mode applications have, the processor provides a necessary foundation for OS designers to ensure that a misbehaving application can’t disrupt the stability of the system as a whole." - Windows Internals, Part I
x86 and x64 processors define four (4) privilege levels (or rings) to protect system code and data from being overwritten by accident or maliciously by code of lesser privilege. These rings help facilitate the "principle of least privilege" model.
If we take a look at the following diagram that shows the hierarchy of these privilege rings, we can see that the kernel resides in the highest-privileged level (ring 0) and the applications we run, i.e., user-mode applications, reside on the lowest-privileged ring (ring 3):
If the kernel uses ring zero (0) and user applications use ring three (3), then what are rings one (1) and two (2) used for, and why does Microsoft only use the aforementioned 0/3 rings?
"The reason why Windows uses only two levels is that some hardware architectures, such as ARM today and MIPS/Alpha in the past, implemented only two privilege levels. Settling on the lowest minimum bar allowed for a more efficient and portable architecture, especially as the other x86/x64 ring levels do not provide the same guarantees as the ring 0/ring 3 divide." - Windows Internals, Part I
To get an idea of what happens when a function gets called, let's look at a typical function's flow path.
When we use a standard Win32 API function like WriteFile, the kernel doesn't execute that function right away. Quite the opposite. The function goes on a Tolkien-esque adventure before it finally ends up in kernel mode.
We can see that the program (the User Application) starts by calling the WriteFile function, which is exported from the Kernel32.dll library. This module is also responsible for exporting most, if not all, of the Win32 API functions we've been using up to this point. We already know this from the DLL Injection blog post, specifically, the "Creating the Program" section:
After Kernel32.dll, the function goes into ntdll.dll. What exactly is that? NTDLL is the last big stop that your function makes before it crosses the threshold of kernel space via the syscall, sysenter, or int 2eh instructions. It exports the "Native API" (NTAPI) just like Kernel32 exposes the Win32 API. Let's examine a sample program like the one below:
At this point, you should know what this is doing. To resolve the API calls and see what functions our functions use, we can use an incredible tool called "API Monitor":
Start by opening the API Monitor program (it's architecture-sensitive, so make sure you open the version that corresponds with the architecture you built your program for). It's important that we specify the modules for which we'd like to monitor API calls. Because we've used the Win32 API for this program, let's choose Kernel32.dll:
Once we've selected this, we can either go in and search for the specific functions (OpenProcess and CloseHandle), or we can just select a category of functions denoted by the folder names. OpenProcess belongs to the System Services > Processes and Threads section, and CloseHandle to the Windows System Information > Handles and Objects section. You can actually see this from the MSDN:
So, let's select the necessary ones.
We also want to do the same thing for the NTDLL module:
With those finally selected, we can move on to launching our program under the monitor. To start our program and see what functions it calls, we can configure our startup options in File > Monitor New Process and press OK:
A new command prompt opens up and our program halts at the "press enter to exit" part. More importantly, we can see that our OpenProcess call gets captured, and even more importantly, we can see the NTAPI counterpart for it:
What an incredible program. We can see that our OpenProcess function gets translated into the lower-level NtOpenProcess NTAPI. Furthermore, this cool program shows us the parameters of this NtOpenProcess function as well.
"Hold on a minute. Crow, you said that NtOpenProcess was located in NTDLL.dll, but the program is showing the module of this function as KERNELBASE.dll. Have you just been lying to us?" Yes. Well, sort of. If we hover over the API call, we can 100% see that the function resides in NTDLL, or hell, even in the parameters section:
So this begs the question then, what the hell is that KERNELBASE.dll thing and where did it come from? KERNELBASE was made by Microsoft to act as a sort of "proxy" for your API calls. If we look at the documentation, we can see it being discussed here:
"As an example of functionality that we moved to low-level binaries, kernelbase.dll gets functionality from kernel32.dll and advapi32.dll. This means that the existing binary now forwards calls down to the new binary rather than handling them directly..." - MSDN
So far our function has taken this path:
OpenProcess (KERNEL32)
OpenProcess (KERNELBASE)
(Zw/Nt)OpenProcess (NTDLL)
There are many ways for us to see what the assembly stub of the KERNELBASE function looks like - and it's important for us to do this because it will also let us see how the function from KERNELBASE goes into NTDLL. I'll just use x64dbg, but you can very easily do this in IDA, WinDbg, etc. In the Symbols section, we can select the kernelbase.dll module and search for the OpenProcess symbol:
Double-clicking that address and then viewing this function in the "Graph" mode of x64dbg, we can see the following:
From this, we can see the following instruction in the stub: call qword ptr ds:[<NtOpenProcess>]. This is the reference to the NTAPI inside of NTDLL. If we click on this function name, we can finally see how these functions fundamentally work and how NTAPI functions are called using their respective "System Service Numbers" (SSNs) or syscalls:
In this case, the syscall number of this function (26h) is moved into the eax register and then the syscall instruction (or int 2e) is executed, which hands control to the kernel so that the function can run. This is incredibly high level, but we'll go into more depth about this when we get to the System Calls section. As we've seen so far, NTAPI functions are distinguished by names that begin with Nt or Zw. In user mode, they're exported from NTDLL, and in kernel mode, from the NTOSKRNL module. When these functions are executed, they will return an "NTSTATUS" value. There are a metric sh*t ton of these return values:
The biggest thing you're going to realize when you start delving into the lower-level stuff is that most of it isn't documented and most of it is extremely unstable. The NTAPI from NTDLL is not really documented; sure, the documentation does have some examples and certain references to the Native API and its functions. However, Microsoft has left it in the hands of our incredible reverse-engineering brothers and sisters to dump and figure out the inner workings of these things.
One of the more "practical" reasons why people don't just use the NTAPI is that, since it sits below the higher-level (and reliable/easy-to-use) API, it requires more fiddling to get it working right. The higher-level APIs provide a convenient and standardized interface to the underlying system. This makes it easier for developers to write software that works on multiple versions of Windows and reduces the likelihood of errors or compatibility issues. Microsoft definitely does take backward compatibility into consideration; however, things can and most likely will change from build version to build version, and almost definitely from OS version to OS version. From this site:
We can see the work that's gone into reconstructing these NTAPI functions and structures:
If you try to visit http://undocumented.ntinternals.net/, and you're not able to reach it or even if you're having issues with the link I've embedded above (i.e. if you get a "404: File not found." error), you can go on Internet Archive (wayback machine) and search for the link there and go to a snapshot to access it.
With the NTAPI covered and some of its potential issues acknowledged, let's begin the construction of our program.
Also, please note that the site above isn't the only resource you can use for finding these functions. Actually, sometimes it could be quite problematic for some if they use this, especially if they only use this. The site is rather old so you should also be aware that some things could be wrong/inaccurate. Therefore, I've linked some alternatives that you could use to get function syntaxes and kernel structures down below.
There are probably a lot more that I'm missing, but with just these, and the previous sites mentioned so far, you should be set. Again, reach out and seek other resources, don't rely on a single resource for everything - that will only come back to bite you.
As we mentioned previously, we'll just replace one of the Win32 API functions within the old DLL injection program with its NTAPI counterpart. We can do it with all of the functions, definitely. However, this might get overwhelming, and we need to learn to walk before we can run. We'll just do one for now. If you want to see an example of all of the Win32 APIs replaced by their NTAPI counterparts, check the blog post below:
A lot of these NTAPI functions require setup in the form of declaring certain structures, making function prototypes, or initializing certain things. The blog post above covers the structure setup and everything but I'll also cover it here so don't worry; I got your back.
Here's a typical Win32 API DLL Injection program that you might find floating around the musty swamps that we call the internet:
By now, you should know the mechanisms behind this code like the back of your ankles. We can start now by swapping out one of the functions from here. I'm going to swap out CreateRemoteThread for NtCreateThreadEx. Let's look at the syntax for this. I'll be getting this information from a really good resource that we've included above, the PHNT files:
From the link above, we can see the following syntax we'll use to make a function prototype:
We could just plop this into our source code, but this requires more setup from us before we can start tinkering with it. The biggest thing that we need to sort out right away is the pair of POBJECT_ATTRIBUTES and PPS_ATTRIBUTE_LIST parameters present in the function. Those parameters reference the _OBJECT_ATTRIBUTES and _PS_ATTRIBUTE_LIST structures. The "P" before them denotes that the parameter is a pointer to said structure.
So, because our function requires these structures, we have to include them in our project. You could include everything in one file, and that's what I used to do, but it would be cleaner for us to just make a header file to store all of this stuff so we can just work on our injection program without the extra bloat everywhere. So, in Visual Studio, let's create a new project and let's make a new header file and include the following for now:
If you followed the last blog post with our simple DLL injection, these macros shouldn't be alien to you. Now, you'll notice the issue of the structures I was talking about when you just put this code above into a header file:
So, where can we find these structures? How do we know what members we need to include in our definitions? Well, we have two options here. Either we manually dump them ourselves, or, the more "sane" (but less fun) approach, we get them from sites like Vergilius, ReactOS, etc. I'll be using Vergilius for the _OBJECT_ATTRIBUTES structure. First, we head to the site and find our way to our current build of Windows, which you can get by issuing the winver command in a shell:
Then we can start searching for structures. We'll query the _OBJECT_ATTRIBUTES structure, as shown here:
We can then click on this, and see what the structure looks like.
You can find this yourself in the link embedded below:
Make sure to prepend the typedef keyword before the "struct" so that we can refer to the type by a plain name in our header. We also have to give it names, since Vergilius only gives us the members and the tag at the top of the definition; we have to manually define the structure name and pointer name: OBJECT_ATTRIBUTES, *POBJECT_ATTRIBUTES. Now we've gotten rid of one red line of misery; all that's left is to abolish the other one.
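To make the typedef naming concrete, here is a minimal sketch of what such a definition could look like. It assumes the member layout that the Windows SDK headers document for this structure, and it uses stand-in typedefs (ULONG, HANDLE, PVOID, PUNICODE_STRING) so that it compiles without Windows headers; in a real Windows build those come from <windows.h> and must not be redefined. The helper make_empty_object_attributes is hypothetical and only exists to show the initializer.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Windows typedefs so this sketch compiles anywhere.
   Assumption: in a real Windows project these come from <windows.h>. */
typedef uint32_t ULONG;
typedef void *PVOID;
typedef void *HANDLE;
typedef struct _UNICODE_STRING *PUNICODE_STRING; /* left opaque here */

/* The tag (_OBJECT_ATTRIBUTES) is what a structure dump gives us; the two
   names after the closing brace are the ones we add ourselves: the
   structure name and the pointer-to-structure name. */
typedef struct _OBJECT_ATTRIBUTES {
    ULONG           Length;
    HANDLE          RootDirectory;
    PUNICODE_STRING ObjectName;
    ULONG           Attributes;
    PVOID           SecurityDescriptor;
    PVOID           SecurityQualityOfService;
} OBJECT_ATTRIBUTES, *POBJECT_ATTRIBUTES;

/* Hypothetical helper: builds an "empty" structure. Only Length and
   RootDirectory are written out; C zero-initializes every member that
   the initializer list leaves out. */
static OBJECT_ATTRIBUTES make_empty_object_attributes(void) {
    OBJECT_ATTRIBUTES oa = { (ULONG)sizeof(oa), NULL };
    return oa;
}
```

Because of the typedef, a variable can be declared as OBJECT_ATTRIBUTES and a pointer parameter as POBJECT_ATTRIBUTES, with no struct keyword at the use site.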
You may realize that my version of Windows shown, doesn't directly match with the version of Windows I yoinked this structure definition from in Vergilius. It may be the case that that'll happen to you as well. Again, use more than one resource. For us here, it doesn't matter too much. The structure's good even for my build but as we're approaching lower and lower levels, we need to be mindful of these things because if we have even one version mismatch, things might not work. Especially in the case of structures and system calls.
With this next structure, _PS_ATTRIBUTE_LIST, it unfortunately won't be as easy as searching for it on Vergilius and copy-pasting it in. If we look for this structure on Vergilius, we'll see that it's not actually there - which means we'll have to use something else. I found the following resource for this structure:
From the repository above, we can see that the _PS_ATTRIBUTE_LIST structure is as follows:
But we can't use this just yet. This structure references another structure on the line: PS_ATTRIBUTE Attributes[2];. Because of this, we also need to grab the _PS_ATTRIBUTE structure, which we can find directly above the _PS_ATTRIBUTE_LIST structure in the resource below:
Now, after including these two structures in our header file, we're all set to begin programming and we don't see any annoying red lines.
Perfect. Let's start with our actual injection program now. The one thing that I added for QoL is a function that will attempt to get a handle on the module we specify and automatically perform some "error" handling so that we don't need to spam our program with a bunch of if-else statements.
This function takes the name of the module you wish to get a handle on, and tries to get a handle on it - if it can't get a handle, i.e., if GetModuleHandleW returns NULL, it'll just return NULL. Anyways, let's continue. On to the main function:
All of this looks standard compared to what we've been doing so far, except for the hNTDLL variable, which is where we'll store the base address of the NTDLL module once we get a handle on it via our getMod function. The only other difference is the inclusion of OBJECT_ATTRIBUTES OA = { sizeof(OA), NULL };.
This is here because NtCreateThreadEx needs a pointer to this structure for one of its arguments, but we'll get there when we get there. Another thing you'll note is the CLEANUP label that we'll use for house cleaning. This provides a very nice and efficient way to close all our handles and clean up the program upon exiting and upon errors.
There's a bit of a debate about using goto in your program - the main gripe has to do with readability, but for simple house cleaning like closing handles, I think it's actually pretty OP. Anywho, let's continue.
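If the cleanup-label idea is new to you, here's a minimal sketch of the pattern in plain, portable C. It uses a file and a heap buffer instead of Windows handles so that it compiles anywhere; the function read_first_line and its parameters are hypothetical and aren't part of the injector. Every failure path jumps to the same label, so each resource gets released in exactly one place.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical example: copy the first line of a file into `out`.
   Returns 0 on success and 1 on any failure. */
static int read_first_line(const char *path, char *out, size_t out_len) {
    int   rc      = 1;     /* assume failure until every step succeeds */
    FILE *fp      = NULL;  /* resources start out "not acquired"       */
    char *scratch = NULL;

    fp = fopen(path, "r");
    if (fp == NULL) goto CLEANUP;

    scratch = malloc(out_len);
    if (scratch == NULL) goto CLEANUP;

    if (fgets(scratch, (int)out_len, fp) == NULL) goto CLEANUP;

    snprintf(out, out_len, "%s", scratch);
    rc = 0;

CLEANUP:
    /* Runs on success and on every error path. Only release what was
       actually acquired: free(NULL) is a no-op, fclose(NULL) is not. */
    free(scratch);
    if (fp != NULL) fclose(fp);
    return rc;
}
```

The trick that makes this work is initializing every resource to NULL up front, so the cleanup code can tell what was acquired no matter where the jump came from.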
We get a handle on the process identified by the PID we supply. After that, we use our getMod function to get a handle to NTDLL and Kernel32. Why these two modules, specifically?
Kernel32 -> We need this for the LoadLibrary function so that we can load our module into the process's memory and have the entry point of our DLL (DllMain) run.
NTDLL -> We need to get a handle on NTDLL because we have a function prototype. We need to populate our function prototype with the address of the function we're trying to use. We're trying to use NtCreateThreadEx, which resides in NTDLL, so we're going to look for it in that module, and once found, it's going to be assigned to our NtCreateThreadEx function prototype that we've defined in the glassBox.h header file.
After that, we populate our NtCreateThreadEx function prototype. Furthermore, we do what we've done previously, and we reach into Kernel32 to look for LoadLibraryW. We typecast that to PTHREAD_START_ROUTINE to tell the newly created thread that we want this to be the starting point for the thread.
We allocate a buffer in the process's memory with PAGE_READWRITE permissions and write the DLL path to that allocated buffer. Now, all that's left is to use our NTAPI NtCreateThreadEx function, which we've called kawCreateThreadEx and which, at this point, is primed and ready.
The NTAPI functions return an NTSTATUS. So, we can't store the return value in a HANDLE here as we did in our previous examples:
For this, we'll have to hold the return value of our kawCreateThreadEx function with a variable like:
Because this function will return an NTSTATUS, we want to see if it returns one of the values seen here:
More specifically, we want to see if the STATUS_SUCCESS NTSTATUS value gets returned, which we can see is the following value:
From the same page, we can see what this value/code actually means:
We can define this value in our header file so that we can do some error handling for our kawCreateThreadEx function. So, let's set the following line:
With that done, we can finally start filling out the arguments to our kawCreateThreadEx function.
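As a side note on the NTSTATUS check, here's a minimal sketch of how these values can be tested, written in portable C with a stand-in NTSTATUS typedef (on Windows it's a 32-bit signed LONG from the SDK headers). STATUS_SUCCESS and the NT_SUCCESS macro mirror the definitions in Microsoft's headers; describe_severity is a hypothetical helper that reads the top two bits, which encode the severity of the code.

```c
#include <stdint.h>

/* Stand-in: on Windows, NTSTATUS is a 32-bit signed LONG from the SDK. */
typedef int32_t NTSTATUS;

#define STATUS_SUCCESS ((NTSTATUS)0x00000000L)

/* Mirrors the macro in Microsoft's headers: every error and warning code
   has its top bit set, so it is negative when read as a signed value. */
#define NT_SUCCESS(Status) (((NTSTATUS)(Status)) >= 0)

/* Hypothetical helper: the top two bits of an NTSTATUS are its severity. */
static const char *describe_severity(NTSTATUS status) {
    switch (((uint32_t)status) >> 30) {
        case 0:  return "success";
        case 1:  return "informational";
        case 2:  return "warning";
        default: return "error";
    }
}
```

Comparing against STATUS_SUCCESS, like we do in this post, is the strictest check. NT_SUCCESS is looser because it also accepts informational codes. For example, STATUS_ACCESS_DENIED (0xC0000022) fails both checks, and describe_severity reports it as an "error".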
The first argument (ThreadHandle) is a pointer to the hThread variable, which will receive the thread handle when the thread gets created. The second parameter (DesiredAccess) is the access we'd like for the newly created thread. We could just specify THREAD_ALL_ACCESS as our argument.
The next argument (ObjectAttributes) we supply is a pointer to the OBJECT_ATTRIBUTES structure we created. We assigned that structure to our OA variable, so let's supply that for this argument:
Next, we supply our process handle for the ProcessHandle parameter:
StartRoutine is the next parameter, and we can supply our PTHREAD_START_ROUTINE-cast LoadLibraryW pointer:
We're almost done. Argument is the next parameter, and though it's named differently, it's just going to be the rBuffer that we've created, since that will be holding the dllPath at this point.
The next couple of parameters are all going to be NULL or 0.
CreateFlags -> FALSE
ZeroBits -> NULL
StackSize -> NULL
MaximumStackSize -> NULL
AttributeList -> NULL
After this function executes, we can do a test to see if the NTSTATUS value returned by it is STATUS_SUCCESS or not. Then, we can finally finish up and begin cleanup:
With that, we're finally finished programming our injection program. You can find these files below or in the GitHub repository.
After compiling the program, we can test it out by injecting a humble notepad or paint process. Also, note that because this is just a glorified DLL injection, the caveats we've previously mentioned in the DLL Injection blog post, are still present here - mechanics-wise. The only thing we've slightly improved here is the stealth factor. Although, again, not by much. Better than nothing, but definitely a long way from perfection as well.
Nice. Now, if we wait for the thread to finish (which will only happen after we've pressed OK), the program will begin cleanup:
There we go! You've just successfully replaced one (1) Win32 API function with its NTAPI counterpart from NTDLL. In the next blog post, we'll replace all of the Win32 API functions with their Native API counterparts. Until then, stay safe. Keep on hackin'