It starts with a scratchy throat, and (if not treated promptly) progresses to a full-blown belief that content creators everywhere will work together in harmony, and speak with one (meta-)voice. In its origins (in particular, the belief that if we understand what a name/symbol/tag means, then programs will too), it may be related to certain disorders of the AI family.
The afflicted are often unaware of its progress, since when applied to small, cohesive communities of technically informed, well-meaning individuals (early Usenet, the Well, the early Web, current RSS feeds), the beliefs actually make some sense. People will use labels correctly, both because they are acting responsibly and because the whole thing hasn’t spun wildly out of control yet in that really cool way that we all hope it will.
Working to make such cohesion _possible_ is a very fine thing. But assuming that the cohesion will persist is nuts if you’re building a search engine. Although standard web servers serving HTML do have a chance to provide a fair amount of metadata to crawlers, _none_ of it can be relied upon in practice to characterize the actual content being served — not even lastmod time or the language of the page. Crawlers must perform all the due diligence.
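
To make the due-diligence point concrete, here is a minimal sketch of one such check: deciding whether a page has changed by fingerprinting the bytes actually served rather than trusting the server's last-modified claim. The function names (`content_fingerprint`, `has_changed`) are made up for illustration, and a real crawler would presumably do something fuzzier than an exact hash (ignoring rotating ads, timestamps, and the like).

```python
import hashlib
from typing import Optional


def content_fingerprint(body: bytes) -> str:
    """Hash the bytes that were actually served, ignoring any declared metadata."""
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_fingerprint: Optional[str], body: bytes) -> bool:
    """Return True if the page content differs from what we saw last time.

    Hypothetical helper: the decision rests on the content itself,
    not on a lastmod header the server may have gotten wrong.
    A page never seen before (previous_fingerprint is None) counts as changed.
    """
    return previous_fingerprint != content_fingerprint(body)
```

The same attitude applies to language: treat the declared value as a hint at best, and run your own detection on the text.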
XML offers the promise of more reliable well-formedness, and so certain kinds of parsing issues may be easier (most of the time) if and when all the interesting content is provided that way. But I see no reason to believe yet that the self-description story will be any different this time around, particularly after there starts to be money in this stuff. So do yourself a favor, and ask your doctor about the free SWD screen when you get your next mental checkup.