Crackberry, DNS Disasters & Other Horrors

BlackBerry users around the world were cross at RIM over two email outages in as many weeks, which affected a fair number of BlackBerry users. (See article here.) In addition, some grinches were busy trying to steal Christmas from a number of last-minute Amazon and Walmart shoppers. (See article here.) And Google had an outage earlier this year when Michael Jackson died: the search engine received so many queries for the singer that it concluded it was under attack and stopped responding. (See article here.)

These outages reflect one of the great technology design challenges: single points of failure. In the BlackBerry's case, the basic method for getting email from a desktop to the BlackBerry requires that messages be copied from the local computer and passed through a RIM-controlled relay server to the user's BlackBerry. That relay server becomes a single point of failure for the RIM network: if it goes down, email stops reaching users even when their desktops and handsets are working fine.
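To put rough numbers on why a single relay matters, here is a minimal Python sketch of standard availability arithmetic. It assumes components fail independently, and the 99.9% figures are hypothetical, not RIM's actual numbers. When every component in a chain must be up, their availabilities multiply; adding a redundant copy of one component means it fails only when all of its copies fail at once.

```python
def series_availability(*components: float) -> float:
    """Availability of a chain where every component must be up.

    Each component in the chain is a single point of failure.
    """
    result = 1.0
    for a in components:
        result *= a
    return result


def parallel_availability(a: float, copies: int) -> float:
    """Availability of `copies` independent replicas where any one suffices."""
    return 1.0 - (1.0 - a) ** copies


# Hypothetical figures: desktop, relay server, and wireless carrier
# are each assumed to be up 99.9% of the time.
desktop, relay, carrier = 0.999, 0.999, 0.999

one_relay = series_availability(desktop, relay, carrier)
two_relays = series_availability(desktop, parallel_availability(relay, 2), carrier)

print(f"single relay: {one_relay:.6f}")
print(f"two relays:   {two_relays:.6f}")
```

Under these assumptions, doubling the relay lifts end-to-end availability from roughly 99.70% to roughly 99.80%. The remaining gap comes from the desktop and the carrier, which are still single points of failure in this model.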

The Amazon outage this holiday season was caused by a distributed denial of service (DDoS) attack aimed at the company hosting Amazon's Domain Name System (DNS) servers, which are responsible for telling users looking for http://www.amazon.com that the domain is located at the IP address 72.21.207.65. By design, a domain has only one "authoritative" group of DNS servers that can answer, for the entire Internet, queries asking for the number that goes with the name.
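The following Python sketch is a simplified simulation of that dependency. A lookup succeeds as long as any member of the authoritative group answers, but fails once an attack knocks out the whole group, even though the web servers behind the name are untouched. The server names and the `reachable` flag are hypothetical. Real resolvers speak the DNS protocol over the network and cache answers, which this toy ignores.

```python
from dataclasses import dataclass


@dataclass
class NameServer:
    name: str
    zone: dict[str, str]    # hostname -> IP address this server is authoritative for
    reachable: bool = True  # False simulates a server swamped by a DDoS attack


def resolve(hostname: str, authoritative: list[NameServer]) -> str:
    """Try each authoritative server in turn; fail only if none of them answers."""
    for server in authoritative:
        if server.reachable and hostname in server.zone:
            return server.zone[hostname]
    raise LookupError(f"no authoritative server answered for {hostname}")


zone = {"www.amazon.com": "72.21.207.65"}
servers = [
    NameServer("ns1.dns-host.example", zone),  # hypothetical server names
    NameServer("ns2.dns-host.example", zone),
]

print(resolve("www.amazon.com", servers))  # one healthy server is enough

for s in servers:  # the attack takes down the entire authoritative group
    s.reachable = False
try:
    resolve("www.amazon.com", servers)
except LookupError as err:
    print(err)
```

Adding more servers to the group helps only if they do not share the same weakness. In this sketch, every server belongs to one hosting company, so a single attack on that company disables them all.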

These single points of failure are targeted by Murphy's Law and malicious hackers alike, and network engineers and security experts have built careers designing better mousetraps to mitigate these fundamental weaknesses in computer systems. When you consider the money and talent that some of these very large companies have, it underscores for me how fragile our existing information infrastructure really is. Tremendous resources have been focused on making the Amazon.com website highly available and highly accurate, yet in spite of that extraordinary effort, there are still outages around Amazon's busiest time of year.

A challenge for the new decade will be making fundamental improvements in the reliability of our computer networks, so that "High Availability As A Service" becomes one of the new 'net offerings for computer systems of all sizes. Maybe you should all put that on your list for Santa next Christmas!

Published by

faithatlaw

Maryland technology attorney and college professor.
