Microsoft’s Live Search is really starting to irritate me.
As this log snippet from VisitorLog shows, I get about 30 separate hits per day from hosts named livebot-65-55-*-*.search.live.com. The vast majority of them are bots, not real humans: they report no screen resolution (and therefore have no screen). That is not a guarantee of botness, but it is a pretty strong sign, especially when combined with other bot-like characteristics such as having "livebot" in the hostname.
So far this is all OK. However, the bot’s USER_AGENT string is set to IE7/Win2003, which is bogus [the full string is: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)]. It’s clearly a bot, and possibly a spider, so it’s not a real IE7 browser; it should identify itself with an accurate user-agent string like a responsible internet citizen.
And what’s worse is that, unlike most bots/spiders, this one actually has a non-null HTTP_REFERER string: it claims to have come from Microsoft’s Live Search engine, searching for extremely generic single terms like "files" for which my site isn’t even in the first 10 pages of search results. The only logical conclusion I see here is that Microsoft is doing some serious referer spamming to get hits back to its Live Search pages.
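For what it's worth, the checks described above are easy to automate. Here is a rough Python sketch, not anything VisitorLog actually does: the `Hit` record, its field names, and the function names are all made up for illustration, and the hostname regex is just my reading of the livebot-65-55-*-*.search.live.com pattern.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed reading of the "livebot-65-55-*-*.search.live.com" pattern seen in the log.
LIVEBOT_HOST = re.compile(r"^livebot-65-55-\d{1,3}-\d{1,3}\.search\.live\.com$")


@dataclass
class Hit:
    """Hypothetical log record; the field names are illustrative only."""
    hostname: str
    user_agent: str
    referer: Optional[str]
    screen_resolution: Optional[str]


def bot_signals(hit: Hit) -> List[str]:
    """Collect the bot-like characteristics described in the post."""
    signals = []
    if LIVEBOT_HOST.match(hit.hostname):
        signals.append("livebot hostname")
    if not hit.screen_resolution:
        signals.append("no screen resolution")
    return signals


def is_disguised_bot(hit: Hit) -> bool:
    """True when a hit shows both bot signals yet claims to be Internet Explorer."""
    return len(bot_signals(hit)) == 2 and "MSIE" in hit.user_agent


def has_suspect_referer(hit: Hit) -> bool:
    """True when a disguised bot also sends a non-empty referer."""
    return is_disguised_bot(hit) and bool(hit.referer)
```

Feeding it a record with a livebot hostname, the IE7 user-agent string quoted above, no screen resolution, and any non-empty referer makes both `is_disguised_bot` and `has_suspect_referer` return `True`. Neither signal is proof on its own, which is why the sketch requires both before flagging anything.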
This has been going on since November 21st. For months before that, the exact same thing was happening, except that the hostname was bl2sch1082113.phx.gbl (or similar) instead of livebot-65-55-*-*.search.live.com.
Now don’t get me wrong: spiders are good. I like spiders crawling my sites, and I’d really like for Google to have some strong competition in the search space. But faking your user-agent and spoofing the referer field with bogus data aren’t good practices for a search engine. Someone please tell me there is a valid explanation for this.