We have a script which (probably inadvisably) emails us when there's a problem with the MySQL server. For a while, because of some badly optimised queries, our server was timing out with "too many connections" errors; optimising those queries removed those issues.
But, for a while, every other morning between 0600 and 0800, we seemed to see a lot of traffic; and our old friend the 'too many connections' came back again.
We SSH'd into the server, looked at the access_log, and searched for 07:00:00 to see what was going on at that time. And here's what.
94.8.20.188 - - [09/Apr/2012:07:00:00 +0100] "HEAD /content/login.muk?return=/content/login.muk?return=/tv/342788/club-zebra-inchpinchers/qr-code HTTP/1.1" 200 - "" "Opera/9.80 (Windows NT 5.1; U; en) Presto/2.10.229 Version/11.60"
94.8.20.188 - - [09/Apr/2012:07:00:00 +0100] "GET /content/login.muk?return=/content/login.muk?return=/tv/342788/club-zebra-inchpinchers/edit HTTP/1.1"
94.8.20.188 - - [09/Apr/2012:07:00:00 +0100] "GET /content/login_auth_facebook.muk&scope=email,publish_stream,offline_access&state=/discussions/new/tv/tv-chatter?mediaid=342788 HTTP/1.1"
94.8.20.188 - - [09/Apr/2012:07:00:00 +0100] "HEAD /content/login.muk?return=/content/login.muk?return=/tv/342788/club-zebra-inchpinchers/web-searches HTTP/1.1"
94.8.20.188 - - [09/Apr/2012:07:00:00 +0100] "GET /content/login.muk?return=/content/login.muk?return=/tv/342788/club-zebra-inchpinchers/edit/people HTTP/1.1"
94.8.20.188 - - [09/Apr/2012:07:00:00 +0100] "HEAD /content/login.muk?return=/content/login.muk?return=/content/login.muk?return=/tv/342788/club-zebra-inchpinchers/edit HTTP/1.1"
94.8.20.188 - - [09/Apr/2012:07:00:00 +0100] "GET /content/login.muk?return=/content/login.muk?return=/tv/342788/club-zebra-inchpinchers/qr-code HTTP/1.1"
94.8.20.188 - - [09/Apr/2012:07:00:00 +0100] "GET /content/login.muk?return=/content/login.muk?return=/tv/342788/club-zebra-inchpinchers/web-searches HTTP/1.1"
94.8.20.188 - - [09/Apr/2012:07:00:00 +0100] "HEAD /content/login_auth_google.muk&scope=https%3A//www.googleapis.com/auth/plus.me+https%3A//www.googleapis.com/auth/userinfo.profile+https%3A//www.googleapis.com/auth/userinfo.email&response_type=code&access_type=offline&state=/content/login.muk?return=/tv/342788/club-zebra-inchpinchers/edit HTTP/1.1"
94.8.20.188 - - [09/Apr/2012:07:00:00 +0100] "GET /content/login.muk?return=/content/login.muk?return=/content/login.muk?return=/tv/342788/club-zebra-inchpinchers/edit HTTP/1.1"
94.8.20.188 - - [09/Apr/2012:07:00:00 +0100] "HEAD /content/login_auth_facebook.muk&scope=email,publish_stream,offline_access&state=/content/login.muk?return=/tv/342788/club-zebra-inchpinchers/edit HTTP/1.1"
Eeek!
This is a very badly-written scraper, following literally every link on the page: requesting a HEAD then a GET for each request. And, whoever was at the end of 94.8.20.188 was asking for the equivalent of 2.6m page impressions per day (that's double the amount of page impressions we get every month).
What we've done about it...
1. We've stopped 94.8.20.188 at the hardware firewall. This is a separate Cisco Pix 501, so none of this traffic will make it to the server. Whatever the benefits of the "cloud", this has partially convinced me that remaining on dedicated hardware is important.
2. You'll spot in the logs that there's a number of ?return= parameters in the above links. We had a "login" link on the login page, which was a little silly, and if you're a badly-written scraper, you'll simply keep going, since the URL is different each time. We've removed the login link from the login page.
We've also added all links starting /content/login into the robots.txt since we don't particularly want those appearing in Google searches: not that this badly-written scraper appeared to follow much of those conventions.
Lessions learnt from this:
1. Hardware firewalls are a Good Thing
2. Infinite loops in URLs are a Bad Thing
3. Since the server was still mostly happy all day, our site's pretty happily optimised.
4. Log files are a Good Thing if you bother reading them. (Better still, tail -f them on a screen while you work.)
And... why did it cause problems between 06:00 and 08:00? Because the automated backups happen then: both the 'default' and 'mysql' backup both kick off at 06:00. We'll be changing that, too.
Coping with a DOS attack
Follow @jamescridlandWe had a bizarre fault this morning. Here's what we learnt.
James Cridland is the Managing Director of Media UK, and a radio futurologist: a consultant, writer and public speaker who concentrates on the effect that new platforms and technology are having on the radio business.
E-mail James Cridland | Visit James Cridland's website
Comments
Be the first to comment
Use a social media account you already have to log in. More info
If you're not on social media, register for a Media UK account.
By logging in, you are consenting to a cookie that personally identifies you to us. Here's more about our cookies.
Get the Media UK Daily
Get new articles, news, jobs and discussions every day into your inbox. Subscribe, free, now