# The Heap

<details>

<summary>Table of Contents</summary>

* [The Malloc Maleficarum](#the-malloc-maleficarum)
* [Dynamic Memory Allocation](#dynamic-memory-allocation)
* [API](#api)
  * [Allocating Memory](#allocating-memory)
    * [User Data Chunk](#user-data-chunk)
    * [Heap Chunk Header](#heap-chunk-header)
    * [The Top Chunk](#the-top-chunk)
  * [Freeing Allocated Memory](#freeing-allocated-memory)
* [Heap Chunk Incrementation](#heap-chunk-incrementation)
  * [Reallocating Memory](#reallocating-memory)
* [References](#references)

</details>

## The Malloc Maleficarum

> *"In late 2001, 'Vudo Malloc Tricks' and 'Once Upon A free()' defined the exploitation of overflowed dynamic memory chunks on Linux. In late 2004, a series of patches to GNU libc malloc implemented over a dozen mandatory integrity assertions, effectively rendering the existing techniques obsolete. It is for this reason, a small suggestion of impossiblity, that I present the Malloc Maleficarum."* - **Phantasmal Phantasmagoria**&#x20;

The [Malloc Maleficarum](https://dl.packetstormsecurity.net/papers/attack/MallocMaleficarum.txt) is a masterpiece, a grimoire of techniques, a true cornerstone of heap exploitation. <mark style="background-color:orange;">It was written by an equally fascinating</mark> <mark style="background-color:orange;"></mark>*<mark style="background-color:orange;">sorcerer of exploitation</mark>*<mark style="background-color:orange;">,</mark> <mark style="background-color:orange;"></mark>*<mark style="background-color:orange;">Phantasmal Phatasmagoria</mark>* <mark style="background-color:orange;"></mark><mark style="background-color:orange;">in 2005</mark>. The reason for its creation was that during the end of 2004, `glibc malloc` got hardened. Moreover, the techniques that were being used back then, like the [`malloc unlink exploit`](https://heap-exploitation.dhavalkapil.com/attacks/unlink_exploit) became deprecated. Phantasmal's work documents various techniques (categorized in "*houses*") that can be used to successfully exploit the heap using `malloc`, hence `Malloc Maleficarum`.&#x20;

{% hint style="info" %}
I apologize for this tangent but I think it would be pretty cool to cover the etymology of the words present in the Malloc Maleficarum. It certainly warrants that curious "tilt of the head." it's just so cool to me, the name, the technique names, the topics at hand, it's like a real-life Skyrim [Oghma Infinium](https://elderscrolls.fandom.com/wiki/Oghma_Infinium_\(Skyrim\)). Anyways, just expand the expandable below if you want to learn the meaning of the name! &#x20;
{% endhint %}

<details>

<summary>Etymology of "Maleficarum"</summary>

So, after searching for the word "maleficarum", I was presented with a couple of hits to a book of [yore to complete this word-searching lore](#user-content-fn-1)[^1]. The book was titled "[**Malleus Maleficarum**](https://en.wikipedia.org/wiki/Malleus_Maleficarum)" which gets translated to "**Hammer of Witches.**" The book was known for its treatise on witchcraft. It was at this point that I decided to research the latter word since it appeared that Malloc Maleficarum was a play on the words "Malleus Maleficarum". You know\... Malleus... *Malloc*... yeah. Moving on. After consulting some etymology sites, I finally got what I was looking for.\
\
\&#xNAN;*<mark style="background-color:orange;">"</mark>*[*<mark style="background-color:orange;">**Maleficarum**</mark>*](#user-content-fn-2)[^2]*<mark style="background-color:orange;">" is actually the feminine plural of the word "</mark><mark style="background-color:orange;">**Maleficus**</mark><mark style="background-color:orange;">."</mark>* Maleficus is derived from the Latin words, **male** (meaning "*badly*" or "*wrongly*"), and the suffix "**-ficus**" (which denotes "*making*" or "*doing*"). Therefore, in this context, it can be implied that the techniques present in the Malloc Maleficarum are intended to cause harm or damage to the program/system being targeted!

</details>

There are many houses/techniques of exploitation covered in the Malloc Maleficarum including (but not limited to, as we'll see later on):

* [ ] The House of Prime
* [ ] The House of Mind
* [x] [The House of Force](/nest/binexp/heap/house-of-force-i.md)
* [ ] The House of Lore
* [ ] The House of Spirit
* [ ] The House of Chaos
* [ ] ...

{% hint style="warning" %}
Please note that there are many more houses of exploitation out there. This isn't an all-inclusive list but I'll add more and more above the more I do. Also, the necessary details of <mark style="background-color:orange;">each technique above will be documented in the page that I create solely for that technique</mark>, to avoid this singular page getting a Guinness World Record for the world's largest blog post, of course.
{% endhint %}

## Dynamic Memory Allocation

Whenever your program needs memory dynamically allocated/deallocated to it, it will generally go to the **heap**. The heap is a segment of memory that's managed by an OS or a programming language's runtime system. Speaking of programming languages, most of them (the modern ones at least) actually already dynamically allocate and free memory. Lower-level languages like C, C++, etc. still require you to do manual memory management. If you don't a lot of bugs, vulnerabilities, etc. may arise - which is *great* for us as hackers; but not so great for developers, potential users, or what have you. As you should know by now, the stack grows downwards. The heap, however, grows upwards. Furthermore, the heap isn't a linear data structure like the stack is; no no no, it's actually hierarchical. There are also multiple parts in play with a procedure like dynamic memory allocation. We have things called arenas, heaps, and chunks.

* `Arenas`: Is the actual memory assigned to a thread and can contain multiple heaps. A main thread's arena is called the **main arena**. It also doesn't have multiple heaps.
* `Heaps`: These are called contiguous regions of memory which can be subdivided/chopped up into multiple "chunks." A heap belongs to a single arena.
* `Chunks`: These are parts of the heap that can actually be allocated, freed, reallocated, etc. Furthermore, a chunk exists in one (`1`) heap and belongs to one (`1`) arena.&#x20;

I absolutely *love* this diagram, from an incredible source of knowledge on the topic of heap exploitation, [Ch0pin](https://valsamaras.medium.com/):

<figure><img src="/files/AuMvfP9DwRvu6PjusIAr" alt=""><figcaption><p>"Arena, Heaps and Chunks in a Parking lot context", taken from <a href="https://infosecwriteups.com/the-toddlers-introduction-to-heap-exploitation-part-1-515b3621e0e8">Ch0pin's blog post</a></p></figcaption></figure>

## API

Hey, remember how we've been delving into some [Malware Development](/nest/mal/dev.md) and how we've been using the Windows API (and even the native API) to do our exploits? Yeah, well, `glibc` offers its own API which lets us do three (`3`) main things when it comes to dynamic memory. Allocating memory with `malloc`, freeing it after use with `free`, and then reallocating memory (if needed) from the *now free* chunks using `realloc`.

### Allocating Memory

With the use of the `malloc` function, we can request the number of bytes we want to allocate.

```c
void *p = malloc(size);
```

From the `man` pages, we can see the following:

{% code overflow="wrap" %}

```
(...)

The malloc() function allocates size bytes and returns a pointer to the allocated memory. The memory is not initialized. If size is 0, then malloc() returns a unique 
pointer value that can later be successfully passed to free().

(...)
```

{% endcode %}

<mark style="background-color:orange;">A thing we'll see is that no matter how much we actually request from</mark> `malloc`<mark style="background-color:orange;">, we'll end up getting a bit more</mark>, which we'll discuss. First, let's create a super simple C-program that'll just use the `malloc` API in order to request `1 byte` of memory from the heap.

```c
#include <stdlib.h>

int main(void){
    void *p = malloc(1);
    return 0;
}
```

Obviously, <mark style="background-color:orange;">you shouldn't just allocate memory and then exit the program before freeing it</mark>, but for our example, and for the purposes of learning; let's compile this thing to tinker with it. After compiling the program (with the flag for GDB: `-g`, so it's a better debugging experience), we can open it in a debugger and set a breakpoint right on the call to `malloc`:

<figure><img src="/files/DZEr1VJI3Ab5e7s8D0CA" alt=""><figcaption><p>Disassembly of <code>main</code>, breakpoint set on the call to <code>malloc</code></p></figcaption></figure>

Let's run the program and wait until we get the breakpoint:

<figure><img src="/files/zdEq6I0dP4Ir7U7sFuDu" alt=""><figcaption><p>Breakpoint hit</p></figcaption></figure>

If we inspect the virtual memory of this process with the `vmmap` command, we'll see that in this moment of time, *in this moment of the program's execution*, there isn't even a heap section (blue, as seen in the `LEGEND`) present within the target's virtual address space:

<figure><img src="/files/XK7OfRdd51GpRpkve3aH" alt=""><figcaption><p>No heap segment </p></figcaption></figure>

Furthermore, we can also verify this with the following commands:

```bash
pwndbg> heap
heap: Heap is not initialized yet.
pwndbg> vis_heap_chunks
vis_heap_chunks: Heap is not initialized yet.
```

However, if we move forward with one (`1`) instruction using the `next`  (`n`) command, we can see that we get a heap segment and that we can now view the heap chunks.

<figure><img src="/files/wjgpiqi6kMPkCpud8aEU" alt=""><figcaption><p>Next instruction after breakpoint</p></figcaption></figure>

<figure><img src="/files/F5v4gSAYqlMr8RmPISVi" alt=""><figcaption><p>Heap segment initialized in process's virtual memory </p></figcaption></figure>

{% hint style="info" %}
The heap region is typically available for use from the moment a process starts up, <mark style="background-color:orange;">but it is not initialized or mapped to physical memory until the first memory allocation request is made</mark>, which would be after our call to `malloc`.
{% endhint %}

We can see that the heap segment is now there and that its size is `0x21000` which is around `~135k` bytes:

<figure><img src="/files/HqWFLfUIUYUKeI3hFsEy" alt=""><figcaption><p>Size of the heap</p></figcaption></figure>

We could go further and actually display the heap chunks with `vis_heap_chunks` or `vis`

<figure><img src="/files/l2fWF0yr362tlpndYVf3" alt=""><figcaption><p>Start of heap (orange)</p></figcaption></figure>

This output is a little borked right now, but basically, the thing that we need to understand is the start of the heap section; which is at the address `0x555555559000`. You can ignore the blue section, it seems to be some allocation that automatically occurs and inspecting it reveals that there is quite a lot that gets allocated.  Oh well, the show must go on. So, The next section, in pink, is the part where we've actually allocated our memory!&#x20;

<figure><img src="/files/pIAlPLGNu5tmAIM493zc" alt=""><figcaption><p>Our <code>malloc()</code> allocation (pink)</p></figcaption></figure>

#### User Data Chunk

The region of our user data (which is the area of the memory that our actual data would go) starts at the address:  `0x5555555592a0`. We can prove this by looking at where we are in the code and printing the value of the variable that we assigned to the `malloc` function after it's been called and the heap's been initialized. The variable, which we know from the source code, is `p`:

<figure><img src="/files/zjCIfpMEPgkj4gNB9JQg" alt=""><figcaption><p>"<code>p</code>" variable holds the beginning address of the user data on the heap chunk</p></figcaption></figure>

```nasm
(...)
0x555555559290	0x0000000000000000	0x0000000000000021	........!.......
0x5555555592a0	0x0000000000000000	0x0000000000000000	................
0x5555555592b0	0x0000000000000000	0x0000000000020d51	........Q.......	 <-- Top chunk
pwndbg> print p
$1 = (void *) 0x5555555592a0
```

We're given three (`3`) quad words at `8 bytes` each, which is **`24 bytes`** of user data. *Hold on a second...* <mark style="background-color:orange;">we literally only requested a single byte from</mark> `malloc`<mark style="background-color:orange;">. How come we're given so much more than that? Turns out, this is actually the</mark> [<mark style="background-color:orange;">lowest chunk size for user data that we can get</mark>](#user-content-fn-3)[^3] <mark style="background-color:orange;">from</mark> `malloc`.  Even if we create a script like the following:

```c
#include <stdlib.h>

int main(void){
    void *p = malloc(1);
    malloc(0);
    return 0;
}
```

We'll see that even if we request zero (`0`) bytes from `malloc`, it'll still give us a chunk size of `24 bytes` for our user data:

<figure><img src="/files/ehDTz9hRKiT4udH0eXOq" alt=""><figcaption><p>Still get <code>24 bytes</code> of user data!</p></figcaption></figure>

I also compiled the same program for 32-bit architecture just to see the minimum size that `malloc` would allocate for us. After reaching the breakpoint of the `malloc(0)` command, we can see that we received three (`3`) double words which are four (`4`) bytes each, for a total of `12 bytes` of user data:

<figure><img src="/files/a936Fov3aoX0sDen9nLs" alt=""><figcaption><p><code>12 bytes</code> of user data, minimum <code>malloc</code> chunk size in 32-bit architecture</p></figcaption></figure>

#### Heap Chunk Header

There's this weird section here in the first part of the allocation chunk, before our user data: `0x0000000000000021`. What does this mean? If we take this value `0x21` and turn it into a decimal:

```nasm
pwndbg> print /d 0x21
$1 = 33
```

We can see that this is 33 bytes. To solve the mystery of all of these extra bytes, let's consider what we have so far:

| 0x0000000000000018 (24 bytes) |
| ----------------------------- |
| `24 bytes` of user data       |

{% hint style="info" %}
We have `24 bytes` of user data which means that `24` of those `33` bytes are accounted for. To figure out where the rest of those `9 bytes` come from, keep on reading.
{% endhint %}

These extra bytes in the header are for bookkeeping/metadata. Just like how the stack has some data like base pointers, return addresses, etc. stored on itself, the heap stores its metadata in itself as well. The chunk header (`0x0000000000000021`) stores information like the size field - which stores the sum total of bytes that make up the chunk; *including* the size field itself. So, what about those mysterious extra nine (`9`) bytes?&#x20;

| 0x0000000000000020 (32 bytes)   |
| ------------------------------- |
| `24 bytes` of user data         |
| `8 bytes` for the header itself |

We had `24 bytes` of user data, and the chunk header itself is another quadword (`8 bytes`) which brings us to `32 bytes` or `0x20`.&#x20;

However, we're not seeing `0x20`, we're seeing `0x21`, what's up with that extra byte? Without getting too far into it, that least significant byte (the rightmost byte (`1`) inside of `0x21`) is just denoting that the previous chunk of lower memory adjacent to it is in use. From the source code of `malloc.c`, we can see that this bit gets set:

{% code overflow="wrap" %}

```c
(...)
/* size field is or'ed with PREV_INUSE when previous adjacent chunk in use */
#define PREV_INUSE 0x1
(...)
```

{% endcode %}

There can also be other bits that get set instead of `0x1` as well; which we can see if we look at an amazing diagram, like the one from [Azeria Labs](https://azeria-labs.com/heap-exploitation-part-1-understanding-the-glibc-heap-implementation/):&#x20;

<figure><img src="/files/Q1ExNktRAniWeApqaYbT" alt=""><figcaption><p>"Allocated Chunks" - <a href="https://azeria-labs.com/heap-exploitation-part-1-understanding-the-glibc-heap-implementation/">Azeria Labs</a></p></figcaption></figure>

With that now explained, we can understand why we're given `33 (0x21) bytes` in the heap chunk header!

| 0x0000000000000021 (33 bytes)   |
| ------------------------------- |
| `24 bytes` of user data         |
| `8 bytes` for the header itself |
| `PREV_INUSE (0x1)` bit set      |

Great, we've solved the mystery of our heap chunk header and the user data section. Now all that's left is to explain the last section.

#### The Top Chunk

This part of the heap memory is the easiest part to understand. The last address that we see from something like the **`vis`** command is just the top chunk of the heap memory. It's simply just the amount of bytes left that can be allocated. If you recall, when the heap first got initialized, it was initialized with a size of `0x21000` bytes, which if we turn it into a decimal:

<figure><img src="/files/sjnCzZALm5XNrblo8XiK" alt=""><figcaption><p>Heap size (~135k bytes)</p></figcaption></figure>

Is around `135,000` bytes. Now, as we start allocating memory with `malloc` we'll see that this top chunk changes its value to reflect the amount of bytes that have been used and more importantly, the number of bytes that are left for allocation.&#x20;

<figure><img src="/files/bzaAepA1D4CM4jZ5nqDP" alt=""><figcaption><p>Heap memory left for allocation</p></figcaption></figure>

So far this has been extremely intuitive! Isn't the heap *amazing?* :heart::smile:

### Freeing Allocated Memory

With the use of the `free` function, we can free the previously allocated memory from `malloc`.

```c
free(p);
```

So, let's use the same script from before, except after allocating `24 bytes` on the heap, we also free them after using the `free` function:

```c
#include <stdlib.h>

int main(void){
    void *p = malloc(24);
    free(p);
    return 0;
}
```

We do the same ol' and get to the point after the call to `malloc` such that the heap chunk is visible:

```nasm
(...)
0x555555559290	0x0000000000000000	0x0000000000000021	........!.......
0x5555555592a0	0x0000000000000000	0x0000000000000000	................
0x5555555592b0	0x0000000000000000	0x0000000000020d51	........Q.......	 <-- Top chunk
```

Now, if we move forward one (`1`) instruction, we'll have called the `free` function on this allocated memory which should have unallocated it.&#x20;

<figure><img src="/files/8gynvUPMYbNO2P1ihX7v" alt=""><figcaption><p>Unallocated memory with <code>free</code> command</p></figcaption></figure>

I know it looks a bit weird right now but `tcache` would be an entire blog post on its own so I'll cover it in more detail at a later time. The important thing to know is that the memory should've now been unallocated. And indeed, if we use the `heap` command, we can see that this section of our memory in the heap is labelled a "free chunk":

<figure><img src="/files/ThDOwBQlT6BtXvbXho4y" alt=""><figcaption><p>Free chunk</p></figcaption></figure>

## Heap Chunk Incrementation&#x20;

A little bit of a tangent is in order for us to understand the incrementation of the heap chunks allotted to us. It turns out, that the heap gives out chunks in increments of `16 bytes`. So, you know how we've been getting `24 bytes` of user data this entire time for the minimum amount? Let's request `25 bytes` from `malloc` and see what happens.

```c
#include <stdlib.h>

int main(void){
    void *p = malloc(24); /* smallest user data chunk size allocatable */
    malloc(25);
    return 0;
}
```

We set a breakpoint on `main` and `run` the program. We can see that we're about to allocate the minimum chunk size we can from `malloc` which is `0x20` bytes (`32 bytes`) or `0x21 (33 bytes)` if the `PREV_INUSE` bit is set (which it's *going* to be):

<figure><img src="/files/9pC2ajDGmeUNzgWnRxt7" alt=""><figcaption><p>Our current location of the scripts execution</p></figcaption></figure>

We move forward one (`1`) instruction and see what we've been seeing this entire time, an allocation of `0x21` bytes:

<figure><img src="/files/ylbIa8hgptFAKMOXGqTA" alt=""><figcaption><p>32 bytes allocated (+1 for <code>PREV_INUSE = 33 (0x21)</code>)</p></figcaption></figure>

Now, if we move forward another instruction to try to allocate more than this minimum amount, just by a single byte, we'll see that the heap chunk header will increment by `16 bytes`.

<figure><img src="/files/3lGVqh282zRaQimy9XXK" alt=""><figcaption><p>48 bytes allocated (+1 for <code>PREV_INUSE = 49 (0x31)</code>)</p></figcaption></figure>

We can see that we've allocated `0x31` bytes for the heap chunk as opposed to the `0x21` bytes we've been seeing this entire time.&#x20;

```c
pwndbg> print /d 0x21
$1 = 33
pwndbg> print /d 0x31
$2 = 49
```

That's right! There's a difference of `16 bytes`! So, the heap gives out memory, depending on how much you want it, in increments of `16 bytes`. Let's try it with one last example just to really internalize this:

```c
#include <stdlib.h>

int main(void){
    void *p = malloc(25); /* 0x31 bytes */
    malloc(50); /* should be 0x41 bytes (+16 from 0x31) */
    return 0;
}
```

<figure><img src="/files/3A221Pq2lLMpOdns99g6" alt=""><figcaption><p>16 byte incrementation</p></figcaption></figure>

Awesome, we've proven that the heap chunks increment by `16 bytes`!

### Reallocating Memory

From the `man` pages, we can see what `realloc` does:

<figure><img src="/files/uCdjZgRw0hLWXWyQiPrg" alt=""><figcaption><p><code>realloc</code> man page entry</p></figcaption></figure>

Let's consider the following source code:

```c
#include <stdlib.h>

int main(void){
    void *p = malloc(24);
    free(p);
    realloc(p, 25);
    return 0;
}
```

So, here we do exactly the same thing as we were doing up until [freeing the allocated memory](#freeing-allocated-memory). After this part, we're going to pass the freed memory pointer, with a size of `25 bytes`. If all goes accordingly, then we should have a new chunk size of `0x31` since we know that anything over the `24` bytes minimum will increment the heap chunk size by `16 bytes` and the process will repeat for the next chunk size, and so on.&#x20;

So, what we're expecting to see at the end of this program is a chunk header with the value of `0x31` as well as the five (`5`) quadwords (`40 bytes`) of user data that comes with an allocation size this big.&#x20;

<figure><img src="/files/PPiCVdVL871ult0b0yn0" alt=""><figcaption><p>Just as expected! reallocation of heap memory with <code>realloc</code></p></figcaption></figure>

Et voila! It worked amazingly. We now understand the basic workings of the heap's functionality, its quirks, and how some of the APIs are involved in doing all of this dynamic memory allocation! Give yourselves a huge round of applause, this is some super interesting stuff and you should be ecstatic because you are now a `HEAP MASTER`.

## References

{% embed url="<https://dl.packetstormsecurity.net/papers/attack/MallocMaleficarum.txt>" %}

{% embed url="<http://phrack.org/issues/66/10.html>" %}

{% embed url="<https://heap-exploitation.dhavalkapil.com/attacks/unlink_exploit>" %}

{% embed url="<https://azeria-labs.com/heap-exploitation-part-1-understanding-the-glibc-heap-implementation/>" %}

{% embed url="<https://infosecwriteups.com/the-toddlers-introduction-to-heap-exploitation-part-1-515b3621e0e8>" %}

{% embed url="<https://0x434b.dev/overview-of-glibc-heap-exploitation-techniques/>" %}

{% embed url="<https://www.udemy.com/course/linux-heap-exploitation-part-1/>" %}

{% embed url="<https://www.youtube.com/watch?v=z33CYcMf2ug>" %}

{% embed url="<https://www.youtube.com/watch?v=s-GJ-buCGio>" %}

<details>

<summary>Etymology References</summary>

* [Malleus Maleficarum](https://en.wikipedia.org/wiki/Malleus_Maleficarum)
* [Maleficarum](https://en.wiktionary.org/wiki/maleficarum)
* [Maleficarum](https://www.wordsense.eu/maleficarum/)
* [Maleficus](https://en.wiktionary.org/wiki/maleficus#Latin)

</details>

[^1]: That was cringe, I apologize.

[^2]: *The masculine plural of this would be "**Malefici**."*

[^3]: On 64-bit platforms, that is.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://www.crow.rip/nest/binexp/heap/the-heap.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
