AIRWEB ‘06

We had the AIRWEB anti-webspam workshop Thursday before last, in connection with the SIGIR conference. In theory this was organized by Brian Davison, Marc Najork, and myself, but this is really Brian’s baby, and he did most of the work.

Overall, I was pleased. Unlike the first AIRWEB, the paper reviewing process was “competitive”, meaning that there were enough paper submissions that we had to reject some of them. Our early fears that we would be crammed into an un-airconditioned classroom at the University of Washington were unfounded – we were crammed into a pleasantly airconditioned classroom. It was actually a fine size for the 50 or so people who registered – all of us sitting in those elementary-school-style all-in-one desks with the writing surface bolted on to the chair. Kind of takes you back (way way back), as well as giving you a real test of how the years add girth.

Attendees were a mix of academics and industry people, with industrial folks divided between research-lab types and fighting-spam-in-the-field types, from Yahoo!, Google, Microsoft Research, Technorati, Ask.com, and a lot of others. A high point was Jan Pedersen’s overview of sponsored search (aka search ads) – slides here (PDF). I also saw a couple of techniques that we definitely have never thought of, and that are worth giving a try at Y!.

Late in the day, we had a panel on blogspam, where I subbed for Andrew Tomkins. It was sort of cute – after some intro statements, the six of us pulled our tiny age-inappropriate desks up to the front, and sat to await questions. It felt like an informal spelling bee. Most of what I talked about in the opening was the explosive and surprising growth of adoption of the nofollow standard for marking untrusted links, and the extent to which it doesn’t help.

And then, as always, the question came up: why don’t the major engines share data, and in particular, blacklists of nasty webspammers? Oh boy. Natalie Glance in particular has been an advocate of this, and at an earlier Spam Summit presented a clever voting scheme that engines might use to combine such data. I always feel both like a grumpus and a corporate tool when I say this, but I can think of two good reasons why it’s just never going to happen, and one pretty good reason why you might not even want it to. (I’m going to leave it there, but comments are welcome.)

5 thoughts on “AIRWEB ‘06”

  1. >why don’t the major engines share data, and in particular, blacklists of nasty webspammers?

    In the (presumably) competitive world of web search, isn’t this somewhat analogous to the prisoner’s dilemma?

  2. Tim –

    When you say “Most of what I talked about in the opening was the explosive and surprising growth of adoption of the nofollow standard for marking untrusted links, and the extent to which it doesn’t help”, what do you mean by “…doesn’t help”?

    Why doesn’t it help?

    – Michael

  3. Michael – Sorry, that was a bit of an overstatement. I think nofollow has helped from a websearch relevance point of view (marking some potentially spammy anchortext), and to some extent from a deterrence point of view (making the comment-spam game somewhat less lucrative).

    But while there are still publicly writable areas on the web where links can be placed, having the other places guarded with nofollow is kind of like having a four-door car with two of the doors locked…

  4. Tim –

    I am beginning to see the use of nofollow outside of the “comment spam” context, i.e. on legitimate and useful links. I presume this is done in the belief that it will somehow improve rankings. If this trend continues, will it skew the search engine results pages? One established site I have been observing has nofollowed almost every external link, several hundred thousand of them! If this is a developing meme and becomes a pandemic, what’s the future for nofollow?

    – Michael
