Jeremy has an interesting post on search engine (SE) referrals to his blog. Essentially, Jeremy did as a one-off what Tim Bray has been doing for a while: graphing the proportion of referrals to his blog that come from each search engine. One difference is that Jeremy allows comments, so people have been posting the results of doing the same thing on their own logs.
In general, what you see is the same rank ordering that more formal studies of search market share tell you: 1) Google, 2) Y!, 3) MSN and others (though here’s an outlier from a site with a different audience). What’s different is the proportions — Google dominates much more in these referral logs than in the market share numbers. (Also, tellingly, Jeremy’s numbers for Y! are much higher than for the 3rd and 4th place engines, which is not the case for everyone else who is writing in.) This makes both Jeremy and TimB wonder if the market-share numbers are wrong.
I like this kind of study, as long as no one is pretending that it tells you much about search engine market share (Jeremy makes it clear that he is not). As I’ve grumped before, there are just too many ways in which your referrer logs diverge from the overall query stream. If you’re getting more referrals from engine A than engine B, it could be that:
1) People who search on A send different kinds of queries, which match your site better than B’s queries. (Say, techy vs. non-techy queries.)
2) Your site ranks higher on A than on B, independent of the query population.
3) Click-through levels (per search) differ in general between A and B
4) A’s users may recognize your name or site more than B’s users, causing them to click through.
5) Your site in particular is presented in a more click-attractive way on A than on B.
etc.
So here’s a proposal, for anyone who is posting numbers like this from a high-volume blog and has the patience (Jeremy? TimB?): compare how you rank on G and Y! search for some terms that appear on your site. (Both engines have publicly available APIs for this sort of thing.) You might have to generate an assortment of somewhat rare and unique terms or phrases (say, the model of glider that Jeremy flies, or a flower that TimB has photographed), but not so Googlewhack-rare that they appear only on your site. Mix the terms up so that they’re not all tech terms or specific to one aspect of your blog.
Now count the number of times you appear in the top N results on each of the two engines, and express the two counts as a ratio. (I have no idea which way this is going to go, but I’ll make one prediction: the ratio won’t be very close to 1.0.) This will give you some estimate of how wrong your referral numbers are as a market share estimate, from only one of the sources of divergence above (number 2). Repost the adjusted numbers…
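Here is a rough sketch of that counting and adjustment step in Python. Everything in it is an assumption for illustration: the function names, the shape of the input (a dict mapping each test term to the ordered result URLs an engine returned), and above all the adjustment rule itself, which simply divides each engine's referral count by its top-N hit count and renormalizes. It does not call either engine's API; you would fill in the result lists however you collect them.

```python
from typing import Dict, List


def top_n_hits(results: Dict[str, List[str]], site: str, n: int = 10) -> int:
    """Count the terms for which `site` shows up among the first n result URLs.

    `results` maps each test term to that engine's ordered result URLs.
    Matching is a plain substring test on the URL, which is crude but
    good enough for a sketch.
    """
    return sum(
        any(site in url for url in urls[:n])
        for urls in results.values()
    )


def adjusted_shares(referrals: Dict[str, int], hits: Dict[str, int]) -> Dict[str, float]:
    """Correct referral counts for ranking differences (divergence number 2 only).

    Assumed rule: divide each engine's referrals by its top-N hit count,
    then renormalize so the shares sum to 1. Engines with zero hits are
    dropped because no correction can be computed for them.
    """
    weighted = {
        engine: referrals[engine] / hits[engine]
        for engine in referrals
        if hits.get(engine, 0) > 0
    }
    total = sum(weighted.values())
    return {engine: w / total for engine, w in weighted.items()}


if __name__ == "__main__":
    # Made-up numbers, purely to show the call pattern.
    referrals = {"G": 800, "Y": 200}
    hits = {"G": 8, "Y": 4}
    print(adjusted_shares(referrals, hits))
```

With these invented inputs, a raw 80/20 split shrinks toward two-thirds/one-third, because the site ranks in G's top N twice as often as in Y!'s. The other sources of divergence are untouched by this.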
Disclaimer: I’m a Y! employee and work on Y! Search, and naturally this might be seen as driven by not liking the Google numbers from the logs. But actually, I don’t believe that Y! is as far ahead of MSN as Jeremy’s referral counts would tell you either (the published market-share numbers show MSN as much closer). Could it be that people who issue Yahoo-centric queries are both a) especially likely to find Jeremy’s blog, and b) especially likely to use Y! Search? Like … me for instance?
That might be fun to hack together over the next holiday break…
Re: Mr Bray’s Perl script.
I don’t use Perl, so I could be very wrong, but I was looking at the source code on Mr Bray’s website:
http://www.tbray.org/ongoing/When/200x/2005/05/29/Search-Engine-Rankings
the section
elsif ($agent =~ m@^http://www.google.*search@) { $e = 'Google'; }
elsif ($agent =~ m@^http://search.msn.com@) { $e = 'MSN'; }
elsif ($agent =~ m@^http://web.ask.com@) { $e = 'Ask Jeeves'; }
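looks to me as if it catches all of Google’s various domains (google.co.uk, google.ca, google.com.au, etc.), but for MSN and Ask it only catches their .com domains. If so, referrals from the non-.com versions of MSN and Ask would go uncounted, which would understate their share relative to Google.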
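To show what a more even-handed match might look like, here is a small sketch in Python (not Perl, and not Mr Bray’s code). The patterns are my guesses at how each engine’s country-specific hostnames might be laid out, and the names `ENGINE_PATTERNS`, `classify`, and `referral_shares` are made up for this example; check them against the referrers that actually show up in your own logs.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Assumed hostname layouts; each pattern allows any domain suffix
# (.com, .co.uk, .ca, ...) instead of only .com.
ENGINE_PATTERNS = [
    ("Google", re.compile(r"^https?://(www\.)?google\.[a-z.]+/search")),
    ("MSN", re.compile(r"^https?://search\.msn\.[a-z.]+/")),
    ("Ask Jeeves", re.compile(r"^https?://[a-z]+\.ask\.[a-z.]+/")),
]


def classify(referrer: str) -> Optional[str]:
    """Return the engine name for a referrer URL, or None if it is not a known engine."""
    url = referrer.lower()
    for name, pattern in ENGINE_PATTERNS:
        if pattern.match(url):
            return name
    return None


def referral_shares(referrers: Iterable[str]) -> Dict[str, float]:
    """Proportion of search-engine referrals per engine, ignoring all other referrers."""
    counts = Counter(engine for engine in map(classify, referrers) if engine)
    total = sum(counts.values())
    return {engine: c / total for engine, c in counts.items()} if total else {}
```

Note that the dots in the hostnames are escaped here, so that `.` only matches a literal dot.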