MSNBot? Or something else?

Posted On 2010-09-26 by FortyPoundHead
Tags: Webmaster Related 

I've been getting a ton of hits from what appears to be MSNbot. But is it really? I've gotten thousands of hits from this little beastie, just over the last week or so. Normally I wouldn't mind getting this much attention from a search engine.

The problem is that it appears to be generating random filenames and requesting them from the web server. Each of these requests produces a 404 (file not found) response, and every time there is a 404 on the site, I get an email.

Here is an example of what it is requesting (watch for line wrap):
2010-09-26 07:06:28 W3SVC3 GET /httpkbindianaedudataagazhtml cust=94316029283131 80 - HTTP/1.1 msnbot/2.0b+(+ - - 404 0 0 6841 255 3203
So in this example (there are tons more entries in the log), the bot purports to be MSNbot 2.0b, the crawler for the Bing search engine. And the IP address does match up to a block owned by Microsoft.

Another thing I have noticed is that the bot doesn't appear to be honoring robots.txt instructions. For example, if I have a crawl delay in there, it is ignored, and the bot will crawl around the site for an hour at full speed.
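For reference, here is a rough sketch of what a well-behaved crawler would read out of a robots.txt with a crawl delay in it, using Python's standard urllib.robotparser. The robots.txt content, the 10-second delay, and the /private/ path are made up for illustration; they are not the actual rules on this site. Keep in mind that Crawl-delay is a nonstandard directive and that honoring any of this is voluntary on the bot's part.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The delay value and the excluded
# directory are assumptions for illustration only.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10
Disallow: /private/
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# What a compliant crawler identifying itself as msnbot should conclude:
delay = rules.crawl_delay("msnbot")                        # seconds between requests
allowed = rules.can_fetch("msnbot", "/private/page.html")  # is this path permitted?

print("crawl delay:", delay)
print("may fetch /private/page.html:", allowed)
```

If the timestamps in the log show requests arriving much faster than the delay reported here, the bot is not honoring the file.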

I've heard that others have experienced the bot completely ignoring directory exclusions as well; however, this is only hearsay. I haven't seen this behavior myself.

So for now, until this naggy little bot gets under control, I'll just be throwing the class C net block into the 403 list.
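As a quick sanity check before blocking anything, the "is this address inside that block?" question can be sketched with Python's standard ipaddress module. The 65.55.55.x block and the wider 65.52.x.x - 65.55.x.x range are the ones named in the follow-up comment on this post. The function name and the sample addresses are just placeholders, and this is not how the web server's 403 list itself works.

```python
import ipaddress

# The class C sized block named in the follow-up comment (65.55.55.x).
BLOCKED_NETS = [ipaddress.ip_network("65.55.55.0/24")]

# The wider range from the same comment, 65.52.x.x - 65.55.x.x,
# written as a single CIDR block.
WIDER_RANGE = ipaddress.ip_network("65.52.0.0/14")


def is_blocked(client_ip, networks=BLOCKED_NETS):
    """Return True if client_ip falls inside any of the given networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


# Sample addresses are placeholders, not taken from the log.
for ip in ("65.55.55.17", "65.55.56.1"):
    print(ip, "blocked" if is_blocked(ip) else "not blocked")

# The narrow block sits inside the wider range.
print(BLOCKED_NETS[0].subnet_of(WIDER_RANGE))
```

Keeping the block list narrow like this means only the misbehaving addresses get the 403, while anything else in the wider range still gets through.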

Anybody else got any input on this? Seen this weirdness on your web server? Or something worse?


Comments On This Post

By: FortyPoundHead
Date: 2010-09-26

I misspoke above. MS actually owns 65.52.x.x - 65.55.x.x, or four class B's. Still, I've only blocked the class C (65.55.55.x), since that is where the bot is coming from.
