Shellcode Injection
June 4th, 2023
Last updated
Pork is airborne and hell hath frozen over: the second installment of our malware development series is out! In it, we learn about shellcode injection and, as a little bonus, DLL injection as well.
Once again, I'm just a normal dude trying his best to learn. I'm not claiming to be some sort of expert at programming or developing malware. Please excuse the Outlast levels of clinically insane coding practices I might subject your eyes to. I'm constantly learning and trying to improve, and as such, my coding practices will get better the longer I do this. Hopefully, the blog and the GitHub repository will reflect this.
This technique is as vanilla as it gets, but it's also quite elegant, don't get me wrong. The general steps for a shellcode injection are the following:
Get a handle on a process by attaching to, or creating one.
Allocate a buffer in the process memory with the necessary permissions.
Write the contents of your shellcode to that buffer in the process memory.
Create a thread that will run what you've surgically allocated and written into the process!
For this technique, we're sticking with the Win32 API, which at this point, you should be at least a little familiar with. If not, fret not. See the post below to get started:
Eventually, we'll get a bit more advanced in our craft but until then, we'll stick to using Win32 API. Just for now. Let's get started! We'll start by looking at which API calls we'll need to rip and tear into this technique... until it is done.
All of the documentation for these functions, as well as the entirety of the Win32 API, can be found on Microsoft's own documentation pages (commonly referred to as the "MSDN"). Remember that the Win32 API is well-documented, meaning that if you have questions about what something is doing within a function or program, more often than not, you'll be able to find the answer within the docs themselves.
On the flip side, I know how daunting this incredible resource is when you first start. However, I promise that if you take the time to actually read it, you'll really come to appreciate this resource.
The most common calls you might end up seeing for this technique are something like the following (in their respective order): OpenProcess, VirtualAllocEx, WriteProcessMemory, and CreateRemoteThreadEx.
Again, all of this probably looks really alien if you're just starting out, but worry not, I'll be holding your hand for the setup of this program. At this point, I'd like to talk about compilers, IDEs, and all of that. I'm going to be programming in Visual Studio, and I'll just be using the MSVC compiler to compile my program.
Whichever IDE you use shouldn't matter in the slightest. However, the way you compile this program definitely does. We'll get more in-depth into why it's important in the "Common Pitfalls" section. For now, just follow my lead. We start by making a new project in Visual Studio, and then you can create a C or C++ file. I'll make a file called crowinject.cpp, which will house the following contents for now:
In the video, I made a C++ file, but funnily enough, I only ever make C++ files just to fill them with mostly standard C code. So, I'll be using more and more actual C++ in the next blogs. Also, while rewriting this blog, I've learned some new "best" practices, and I'll be making my code reflect that. As a result, the code in the video and the code in this blog will look a bit different now, but you'll live.
Here, we're including the Windows header (<windows.h>) in our program, which will let us use the Win32 API, which, if we remember, is just an interface that allows us to talk to the OS. I started using EXIT_SUCCESS and EXIT_FAILURE because I really like making my code verbose. Perhaps more verbose than it should be. They're both defined in the <stdlib.h> header and they're just a glorified way of saying 0 for success and 1 for error.
Let's compile this, just to make sure everything's working.
Remember to compile a 64-bit (x64) program if your target process is 64-bit. Otherwise, you will run into issues, as we'll see in more detail in the "Common Pitfalls" section.
After compiling the program, we can run it from the command line, or we could've just pressed Ctrl+F5 to start without debugging, which automatically compiles and runs your program. So, after doing that, we get our expected output:
In the video, I defined the variables (like hProcess, hThread, PID, etc.) in the global scope. This is actually not good practice, as I've learned; it's better to define the variables in function scope, otherwise it'll come back to haunt us in the future. Also, in the video, I had mentioned the Hungarian notation that Microsoft uses for its naming convention, but only some of my variables were following it.
So, I'll try to drop this cherry-picking and follow the Hungarian naming convention a bit more strictly from now on. We'll make sure that the program was supplied with an argument for the PID; if not, we'll have it error out with the usage:
We see some familiar data types and variables (assuming you've gone through the first video). We see two HANDLE variables, hProcess and hThread, and two DWORD variables, PID and TID. We'll come back to the rBuffer in a bit, but let's continue for now. We're checking to see if the program has been supplied with an argument for the PID to attach to.
If we don't take in a PID from the CLI, we'd have to change it in the source code every single time and recompile it over and over again. And frankly, I can't think of a better example of unhinged masochism. After we get an argument for the PID, we convert it from a string into an integer type, since PIDs are numbers. Moreover, on Windows, PIDs are, in practice, always multiples of four (4). Not important here, but still pretty cool to know. At this point in our code, we're going to try to get a handle on our target process.
As you may know by now, we're going to be using the OpenProcess function to get a handle on our process.
The easiest way to grasp what this function does is by reading the "Return value" section of this function. You can find it below:
From this section, we can see that if OpenProcess succeeds, it will return an open handle to the specified process, which is what we're going to make our hProcess variable hold; this is why it was important to declare it as the HANDLE data type. If it fails, it will return NULL. Because of this, we can set up some pretty cool error handling for our program, as we'll see soon. Let's look at the arguments this function expects:
DWORD dwDesiredAccess
BOOL bInheritHandle
DWORD dwProcessId
The first argument is where we specify the access rights we'd like to have on the target process. There are various access rights that we could specify, which we can see below:
You can read more about them here:
I'll still try my best to explain what these are and why we need them, so don't worry. Basically, these process access rights determine what exactly we're allowed to do to a process. Remember the steps of this technique and how I said we'd have to allocate and write some memory within the process's memory? For us to be able to even do that, we'd at minimum need PROCESS_VM_OPERATION.
Because we're trying to tinker with the address space of the process, using functions like VirtualProtectEx and WriteProcessMemory, we'd have to supply this access right. Is that it then? Is that all we supply in this argument? PROCESS_VM_OPERATION? Well... not quite. You see, these rights are extremely particular. Actually writing to the process memory with WriteProcessMemory also requires PROCESS_VM_WRITE, and how do you expect to create a thread to run your payload without an access right like PROCESS_CREATE_THREAD?
Not to mention other rights like being able to query information about the process (PROCESS_QUERY_INFORMATION), suspending or resuming it (PROCESS_SUSPEND_RESUME), etc. Because of all of these little rights, it's easier for us to just specify an access right like PROCESS_ALL_ACCESS.
Note, though, that it's generally regarded as best practice (and it's safer) to give yourself the least amount of rights needed to do something. Since what we're trying to do is quite beefy and requires various different little access rights, we'll just supply PROCESS_ALL_ACCESS as our argument here to avoid that headache.
Now, let's get on with that second parameter, bInheritHandle. This parameter is just a boolean that specifies whether the handle we get back can be inherited; i.e., if our process creates another process, do we want that new process to inherit this handle? We'll set this to FALSE, since we don't really care about this right now:
Lastly, the dwProcessId argument is the PID of the process we'd like to open a handle to. We've already created this variable, so let's just supply it here:
Et voila! We've set up this portion of the code and we can now work on some error handling, as mentioned before! Since we know that this function returns NULL on error, we can write the following:
I've also introduced a new function here, GetLastError. One of my favourites. It's so simple but it provides so much information. Let's try an example (if you don't care, or already know what this function does, feel free to skip ahead to the next portion). The GetLastError function is defined thusly:
We can see that if a call made by our thread errors out, this function retrieves the last-error code for that thread. Let's try to supply a PID to our program which obviously wouldn't ever exist and see what this program spits out.
We can see that the program spits out an error with the following value: 0x57. "What the hell does this mean? What do we do with this?" I may hear you ask. This value, like any of the values output here, is a system error code. Furthermore, from the following page:
We can see that there are a ton of these. What we'd do now is just cross-reference the error code we got with this neat little section, and we can quickly figure out what went wrong based on the error code! Let's take that 0x57 value from our program and see what's going on.
We can see that our value of 0x57 is telling us that we've supplied an invalid parameter! That's so much information given to us! Now, you can also print this out as a decimal by changing the format specifier to %lu. I personally like the way that hexadecimal looks a bit more, but again, it's all up to you. Let's try one more example, where we try to get a handle on an elevated process. Something like the System process with PID 4:
We get an error code of 0x5. If we look this up in the error code catalogue, we can see that this tells us we don't have the necessary permissions to open a handle to this process:
Cool! You now know what these error codes are and how to help yourself debug much easier.
GetLastError, as awesome as it is, won't work in all situations. For instance, when you're working with the lower-level NT API from NTDLL, error handling is done through the NTSTATUS codes that the functions return.
Here's where we're at right now:
We can see that if we supply the PID of an actual running process, the program spits out the handle we got for it. Now, we need to allocate a region of memory in our target process, and we can do this with the ever-so-popular VirtualAllocEx function. Before doing that, we need to set up some variables, since VirtualAllocEx will be expecting them.
We're setting up our shellcode here, as well as the size of it. This isn't valid shellcode and wouldn't do anything useful, so if we tried to inject and run it, the target process would crash. We'll come back to creating the shellcode when the time comes, but for now, let's start setting up VirtualAllocEx.
The first parameter is a handle to our process. Our hProcess variable is currently holding the return value from OpenProcess, which again, is just an open handle to our target process. So, we can just put in hProcess for this argument.
The second parameter, lpAddress, is an optional argument for this function. It's just a pointer that specifies the starting address for the region of pages that we'd like to allocate. If we set this to NULL, the function will determine where to allocate the region. Therefore, we'll let the function drive itself home for this part.
The next argument, dwSize, is where we specify the size of the region of memory that we wish to allocate. This is the size of our shellcode from earlier. So, let's populate this argument as such:
Next up, we have the flAllocationType. This is the type of allocation we'd like to do.
In our case, we just want to reserve some space (MEM_RESERVE) and then actually commit that memory (MEM_COMMIT). So, let's add them both.
Last but not least, we have to select the memory protection that we want our allocated memory to have. From the documentation:
As is the case with most of Microsoft's stuff, there are a lot of things for us to choose from here. However, we need to remember the basics. We're going to be giving ourselves PAGE_EXECUTE_READWRITE (RWX) for our shellcode. If we don't have execute permissions, we're back to the whole nightmare of dealing with NX/DEP. Our shellcode won't be of any use to us if we can't execute it.
Remember that a random buffer allocated in your process memory with full RWX permissions can look extremely suspicious. There are some techniques in which a function like VirtualProtect (or VirtualProtectEx, for a remote process) gets used instead. What would happen is something like the following: you allocate a region of memory with minimal permissions initially (something like RW), and then change those permissions (to something like RX) through the flNewProtect argument supplied to this function.
With that done, we've allocated our buffer at this point! This means we're now ready to actually write the contents of our shellcode, into our recently allocated buffer inside of the process memory.
Here's where we're at right now:
Let's try running this just to make sure we're getting the expected output.
Nice. Now we can finally write the contents of our shellcode into this recently created buffer. In order to do that, we utilize the WriteProcessMemory function.
The first parameter is the handle to our process, hProcess.
The second parameter (lpBaseAddress) is the rBuffer that we've created and allocated in the process memory. As we can see from the documentation:
The lpBuffer is the next parameter. This is where we specify the actual contents of our shellcode. Earlier, you heard me say that the shellcode we have currently would shred our process memory and cause it to crash. Well... why hasn't that happened yet? It's because allocating memory with VirtualAllocEx isn't the same thing as writing the contents of your payload into that memory, let alone running it. This is why we're able to allocate this memory without the process crashing.
For those who are new, a tip to help you think about VirtualAllocEx and WriteProcessMemory would be the following:
Think of the buffer that you create with VirtualAllocEx as a canvas. You define how big it is, what permissions it has, the memory allocation type, etc. Then, you can think of WriteProcessMemory as the step in which you actually write (or paint, in this analogy) whatever you want onto that allocated buffer.
The nSize argument is the size of our shellcode, which we've already defined as crowPukeSize (so sorry for these naming conventions):
And lastly, we have an output parameter called lpNumberOfBytesWritten. This just stores the number of bytes we've written into the memory region. You can choose to add this if you want; we'll just set it to NULL, which will cause the parameter to be ignored.
And just like that, we've set up the WriteProcessMemory function. Let's add in a quick little print statement indicating such.
And now, if we try to run this, we can see the following output:
All that's left for us is to create a thread to run our payload!
In this section, we're going to be creating a thread with the CreateRemoteThreadEx function. If we take a look at the return value of this function, we can see that it's practically the same thing as our OpenProcess function, except in this case, we're dealing with threads.
Because we know that this function returns a handle to the new thread, we'll make our hThread variable hold this return value:
Let's look at the syntax for this function.
Now, I know. There are like 12 duovigintillion parameters for this function. However, fret not - most of them are going to be zero (0) or NULL. We know the drill by now: we'll fill out what we know, and then consult the documentation for what we don't know.
The lpThreadAttributes, as we can see from the documentation, is just a pointer to the SECURITY_ATTRIBUTES structure. This specifies a security descriptor for the new thread and also determines whether child processes can inherit the returned handle. If we set this to NULL, the thread will get a default SD (security descriptor) and the handle cannot be inherited.
For the dwStackSize argument, we can set it to 0 to let the thread use the default stack size for the executable.
This next section is going to take a bit to explain, but it's pretty cool, nonetheless. So, I'll write out the code here, and then we can explain what's going on here.
Okay, just relax. I know your heart rate just quadrupled, and you can practically ski on the clumps of hair you've pulled out of your scalp from seeing this random line of code seemingly come from nowhere, but just relax. We'll figure this out. So, firstly, let's discuss the parameter itself, before delving into what we're supplying as an argument. Let's consult the documentation.
This parameter is where we specify a pointer to the starting address of what we wish to run. We want execution to begin at the buffer that we've created, which at this point would've had the contents of our shellcode written into it, and we typecast this buffer to LPTHREAD_START_ROUTINE to match the type of this parameter. For the next parameter (lpParameter), we can just set it to NULL, since we don't have any variables that we're passing to the thread function (lpStartAddress).
The next parameter is the creation flags we wish to specify for our thread. dwCreationFlags could be any of these values:
We see that if we supply 0 here, the thread will run immediately after creation. The CREATE_SUSPENDED flag could also be a cool thing to mess around with, but that's left as an exercise for the reader. We'll supply 0 here since we want our thread to run immediately.
We only have 2 arguments left to supply, we're almost there! The second-to-last parameter of this function, lpAttributeList, contains additional parameters for the new thread. We don't really care about this for now, so we can just set this to zero (0):
And lastly, the final parameter (lpThreadId) is a pointer to the variable that will receive the thread ID (TID) of the newly created thread. So, let's pass the address of the dwTID variable we created when we defined dwPID.
So, at this point, we could run our program, and we'll see that the program will inject into our target process, but because of the fact that we're using gibberish as shellcode, the program crashes:
So, what we'll do here is, firstly, add in some more debugging lines for verbosity. Secondly, we'll generate some valid shellcode with msfvenom and try to perform the injection, for real.
I'll be using my Kali machine to generate the shellcode; it literally doesn't matter what OS you use, since we're only interested in one tool for now, msfvenom. You could write your own if you want. PIC shellcode has been pretty huge recently, but we're going to take the easy route for now and just generate ours with msfvenom.
msfvenom, its shellcode, stagers, all of that stuff, has been signature'd (and blasted) to high oblivion - there are peaks on Mars that are smaller than how high this has been blasted by virtually every defensive solution out there. This means that if you decide to use this shellcode without encrypting it, your payloads will get flagged by Defender as soon as you compile - and the probability that your target flags it as well is considerably high. This is the reason for us setting an exclusion path, or turning off Defender. We'll be discussing ways to bypass Defender in the later sections, but for now, this is how we're doing it.
I did say I was going to generate calc.exe shellcode for our injections, but since this is our first time, let's live a little, eh? I'll run the following command:
For the 100th time, remember to make the architecture of your shellcode match the architecture of your injection program. So, let's fix up our injection program with this as our payload, and after that, we'll set up the multi/handler listener needed to catch the callback for this reverse shell.
Now, we're all set to execute our program.
We recompile the program, and after specifying a valid PID to our injector, we can see the results:
Furthermore, if we close the meterpreter session:
We can see that the WaitForSingleObject function we were using successfully notes that our thread has finished executing!
Because we've got a reverse shell from our notepad.exe process, we'll see something interesting in the Modules tab within this amazing tool, Process Hacker 2:
There are entries for some "networking"-related modules in this process, which it would never load under normal circumstances. This would be an insanely suspicious IOC (Indicator of Compromise), since why would Notepad ever need something like sockets or anything to do with networking? If we look at the "Threads" tab within Process Hacker, we can see our newly created thread in the list:
If we double-click on the one in Process Hacker, we can see some peculiar stuff on the thread stack:
Or, better yet, in the "Modules" section of the program, we can see ws2_32.dll and mswsock.dll loaded into the process:
Now, what would a socket library be doing in our humble Notepad process? You can see this mentioned in one of the holiest resources for malware development:
And there you go! You've come such a long way and you've learned so much! Seriously, you should be proud of yourself for making it this far. We'll now discuss some common pitfalls that can prohibit you from emulating this attack. You can find the source code of this program attached below, or on the GitHub repository that'll house all the code that we end up making in these blogs/videos.
A crazed lunatic once messaged me; the message showed her following along with this guide and performing her own shellcode injection. "Yippie!", I thought to myself. But alas, the message continued, and with it, my plight:
"Strange..." I pondered the state of her process's memory, which looked like it had been injected with the generated payload:
A million different scenarios went through my head. "Could it be Defender?", "Could it be the payload?", "Could it be something to do with the build/version of Windows?", "Could it be the program itself?", etc. It turns out that she was using the same build of Windows as me, so that's out of the question.
And even so, we're using the higher-level API, so the build and version shouldn't even matter. The source code wasn't the issue either, since at one point she'd tried some code that I knew would work (super simple XOR encryption to bypass Defender). I gave her some code that had XOR-encrypted shellcode, since I wanted to test whether it was an issue with Defender as well. Turns out, nope. Windows Defender wasn't even triggered during the compilation of the program, so that's out the window.
Moreover, even with the XOR-encrypted shellcode, after the injection was run, there still wasn't a new process to show for it. So, eventually, we took a break. Then, at the time of writing this blog post, April 30th, 2023, I got yet another message about someone facing a similar issue. Their program, just for the life of them, would not spawn a new process; even though it seemingly did inject it into the target process's memory.
The user had told us that they were compiling with gcc like this:
And because of this, I thought that maybe they were using wide API functions, i.e., SomeFunctionW(), but failing to include the -municode flag for compilation. So, after consulting some amazing, amazing friends of mine, the real culprit was brought to light: architecture.
It was at this very moment that my eyes opened up wider than they ever have. I could see individual fermions whizzing past my eyes, I could grab clouds, I could smell numbers, taste vision, etc. How could we forget this? It was so painfully obvious: you must compile your program for the architecture that you wish to target, and your shellcode must match it too. Remember that x86 shellcode != x86_64 shellcode. You also can't have your 32-bit program injecting into a 64-bit process. Anyways, we sent our newly-crested warrior out to compile their program as a 64-bit program and we patiently awaited their response.
Perfection. We had finally conquered the bug that plagued us for so long, with nary a trail for us to even follow. Congratulations, l0n31yMC. May the offensive security gods bless you on the rest of your trek.
If you're using the wide functions, make sure your compiler is taking care of that. Either define preprocessor symbols like _UNICODE/UNICODE in your program (Visual Studio does this automatically), or specify a flag to do the same; for instance, with gcc: gcc -municode ....
Make sure you compile the program for the same architecture as your target process. I.e., if you've compiled a 32-bit binary, you can only inject into 32-bit processes. Furthermore, make sure your payload/shellcode is also architecture-aware. If you're compiling a 64-bit binary, use 64-bit shellcode.
Thank you to the amazingly beautiful wizards, @bakki and aqua for helping debug this stupidly simple oversight. I love you guys. And remember guys, sometimes, the solution is far simpler than you think it is. See you in the next section. Also, thank you to everyone who's given me constructive criticism of the site as a whole, the code, the videos, etc.
Again, at the time of rewriting this blog, June 4th, 2023, it's only been about 3 months since I started my malware development journey. Therefore, there's obviously still an incredible amount of stuff that I don't know. I appreciate all of you for being patient with my ignorance and teaching me new things to become better and better.
So, we're allowed to specify any one of the memory protection constants, huh? Let's go give them a visit and see what we're allowed to supply here. You can find these memory protection constants below:
The WaitForSingleObject and CloseHandle functions are also going to be left as an exercise for you to learn about. It will be really fun for you to seek out what these functions do; they're pretty straightforward from their names, but regardless, for these two functions, you're on your own.