Post-Mortem of the Jan-07 DDoS Attack

On Saturday, Jan 07, commencing at approximately 3:30pm EST (almost the exact moment I hit “publish” on a post about Save The Elephants, which we’ll repost later), we were hit with a multi-faceted DDoS attack across three anycast constellations: dns1, dns2 and dns3.

The attack was a combination of SYN, ICMP and DNS floods, in excess of 1 Gig/sec across our anycast IPs, with packet rates ranging from 500K/sec to 1M/sec across each nameserver.

At the outset of the attack it looks like all three affected DNS constellations were rendered non-responsive for a period of 30 to 60 minutes.

We were able to identify the target of the attack and had them delegate away from our nameservers (this domain has now cycled through 8 other DNS providers in under 48 hours, bringing this DDoS with it to every one of them).

The attack traffic against us persists, but is being mitigated.

Prolexic was able to mitigate starting around 4:40pm EST, bringing parts of DNS2 back online.

We then made changes to DNS1 to direct queries toward the functional nodes of DNS2.

The attack overwhelmed DNS3, and our upstream providers dropped our BGP sessions to preserve operational integrity.

We restored native DNS1 functionality at approximately midnight on Saturday by renumbering the anycast broadcast IP address.

At approximately 3:00am Sunday morning we began routing queries for DNS3 to DNS4. We later renumbered the public anycast IPs for DNS3.EASYDNS.ORG and DNS3.EASYDNS.CA and that traffic has reverted to those anycast constellations.

Policy Response

The target domain in question was of a type we’ve seen before, one that has caused us grief in the form of other DDoS attacks. We have made additions to our domain prescreening rules to prevent similar domains from acquiring service from us in the future. (Nearly all DDoS attacks are against domains that have moved onto the system within the previous 72 hours.)
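
The 72-hour observation also suggests a quick way to shortlist likely targets when an attack begins. What follows is a minimal sketch only, not a description of any actual tooling: the `recent_arrivals` helper and the `onboarded` mapping (domain name to the time it moved onto the system) are hypothetical names introduced for illustration.

```python
from datetime import datetime, timedelta, timezone

# Window taken from the 72-hour observation in the text.
RECENT_WINDOW = timedelta(hours=72)


def recent_arrivals(onboarded, now=None):
    """Return domains onboarded within RECENT_WINDOW, newest first.

    `onboarded` is a hypothetical mapping of domain name -> timezone-aware
    datetime at which the domain moved onto the system.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    recent = [
        (when, name)
        for name, when in onboarded.items()
        if timedelta(0) <= now - when <= RECENT_WINDOW
    ]
    # Newest arrivals first: these are the most likely suspects.
    recent.sort(reverse=True)
    return [name for when, name in recent]
```

A shortlist like this only narrows the search; confirming the actual target would still depend on inspecting the attack traffic itself.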

Technical Response

We have identified DNS3 as the weakest link in our offering and will be making substantive changes to it first, followed by DNS1.

Key Takeaways

We made several unforced errors in the course of this incident which aggravated our pain:

  • While we were frantically trying to get the ccTLD registry to pull the delegation for the domain, they moved the domain themselves to a licensee website that specializes in DDoS mitigation. That site was still serving up our NS records in response to queries, because of a configuration error in the licensee site (which we maintain).
  • We had the DNS fully backed up to Amazon Route53 via our easyRoute53 interface for exactly this type of scenario, but nobody else in the company knew that except for me. I never thought to push the button until long after. Had I done that, as was supposed to be the plan, then the easyDNS and blog websites would have remained available the entire time. Some of our members did avail themselves of this for their own domains and we’re told it worked great. File under: The cobbler’s children have no shoes.
  • There was inadequate communication between systems staff (who were all handling this remotely) and support staff back in the office. This translated into additional frustration for members. So while people following all this on Twitter, or signed up to the blog mailing list, were giving us kudos for our communications savvy, the guys back at the office were wondering what the hell was going on.
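
The first error above, a site still handing out our NS records after the domain had moved, is the kind of thing a simple comparison can catch. As a rough sketch, assuming the NS hostnames a server returns have already been collected by some other means (the `stale_nameservers` helper and the `ns1.mitigation.example` hostname are hypothetical):

```python
def _normalize(hostname):
    """Lowercase a hostname and drop any trailing root dot."""
    return hostname.strip().rstrip(".").lower()


def stale_nameservers(served_ns, our_nameservers):
    """Return the served NS hostnames that still point at our nameservers.

    `served_ns` is the list of NS hostnames a server hands out for the
    domain; `our_nameservers` is the set the domain should no longer use.
    An empty result means the delegation no longer references us.
    """
    ours = {_normalize(h) for h in our_nameservers}
    served = {_normalize(h) for h in served_ns}
    return sorted(served & ours)


# Hypothetical example data: one leftover record, one mitigation-provider record.
served = ["DNS3.EASYDNS.ORG.", "ns1.mitigation.example."]
ours = ["dns3.easydns.org", "dns3.easydns.ca"]
print(stale_nameservers(served, ours))  # ['dns3.easydns.org']
```

Any non-empty result would mean attack traffic can still follow the domain back to us, whatever the registry says.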

All in all, not our proudest day. We’ve handled previous DDoS attacks better. We have been thinking on this deeply since it happened, and we are profoundly sorry for the pain this has caused our members.

There will be substantive changes here as a result of this incident.