Technical Hooey from the White House
New staffers claimed to find the place in the "technological dark ages" and pledged a more "open" website. Here's the reality.
January 29, 2009 - 12:10 am
The second, more technical story this week appears to have started with posts at Jason Kottke’s blog, TheNextWeb.com, and Codeulate.com, plus one by Cory Doctorow at BoingBoing. The Kottke post was helped along by a tweet from Tim O’Reilly, which was then retweeted many times.
The gist of the story was that the Obama administration had made the whitehouse.gov site much more “open,” because its robots.txt file was dramatically shorter. TheNextWeb.com said it was “hopefully a sign of greater transparency”; Tim O’Reilly, who really should know better, put it as “Transparent gov FTW” (in English, “transparent government ‘for the win’”); and from the Twitter world, “saranovotny” called it an “amazing geek metric of the openness of the obama [sic] administration.”
For people who don’t eat, sleep and breathe websites, the robots.txt file is a suggestion to “spidering” programs, like the ones Google uses to index the web for later searching. Many people don’t realize that web search engines are built on programs that visit potentially every webpage on the Internet and copy its contents back to the search host. (This is why Google can provide cached copies of webpages that have since been deleted.) By convention, the robots.txt file tells these spidering programs which files they should not copy or index. So, at first glance, it makes sense that a shorter robots.txt means “more openness”: after all, fewer pages are being blocked.
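To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python’s standard `urllib.robotparser` module. The robots.txt contents, the `example.gov` URLs, and the helper names `build_parser` and `crawlable` are all hypothetical, chosen for illustration; this is not the actual whitehouse.gov file from either administration. Note that nothing forces a crawler to run this check, which is why robots.txt is only a suggestion.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; NOT the real
# whitehouse.gov file. It blocks internal search-result pages and a
# directory of page fragments, the kind of rule a webmaster might add.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /includes/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def crawlable(parser: RobotFileParser, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the rules allow `agent` (a hypothetical crawler name) to fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in [
        "https://www.example.gov/briefing-room/",
        "https://www.example.gov/search?q=budget",
        "https://www.example.gov/includes/header.html",
    ]:
        # A polite spider skips any URL the file disallows.
        print(url, "->", "crawl" if crawlable(rp, url) else "skip")
```

A shorter file simply means fewer `Disallow` lines; whether that is “more open” or just careless depends on what those lines were blocking.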
The problem is, as Declan McCullagh pointed out at CNET, there are a lot of good reasons why a competent webmaster would block pages from search engines. In fact:
If anything, Obama’s robots.txt file is too short. It doesn’t currently block search pages, meaning they’ll show up on search engines — something that most site operators don’t want and which runs afoul of Google’s webmaster guidelines. Those guidelines say: “Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don’t add much value for users coming from search engines.”
In other words, the new “openness” of the White House website was actually poor search engine optimization compared to the Bush administration’s site.
Now, I’m the first person to recognize that no one can be an expert at everything, and expecting even a seasoned staff writer at the Washington Post to notice that what was really being described to her was another round of the Windows-versus-Macintosh religious war might be a bit much. But, honestly, would it be too much for her to ask some questions before running with a quote like “technological Dark Ages”? Since this story is squarely a geeky subject, sites like TheNextWeb and technical authorities like Cory Doctorow and Tim O’Reilly seem a little more culpable, though their posts were also published quickly. (It wouldn’t seem out of line to expect some of these websites to publish a correction.)
What it does tell us, though, is that readers who want to be well informed can’t afford to let down their guard. Clearly, the legacy media and even technical experts are perfectly capable of being led astray, and more than willing to be, as long as the story fits the “dumb Bush administration” narrative.