So far, most of the Climategate attention has been on the emails in the data dump of November 19, but the emails are only about 5 percent of the total. What does examining the other 95 percent tell us?
Here’s the short answer: it tells us that something went very wrong in the data management at the Climatic Research Unit.
We start with a file called “HARRY_READ_ME.txt.” It contains notes on someone’s three-year effort to turn a pile of existing code and data into something useful. Who is Harry, you ask? Clearly, a skilled programmer with some expertise in data reduction, statistics, and climate science. Beyond that I won’t go. I’ve seen sites attributing this file to an identifiable person, but I don’t have any corroboration, and frankly the person who wrote these years of notes has suffered enough.
The story the file tells is of a programmer who started off with a collection of code and data — and the need to be able to replicate some results. The first entry:
1. Two main filesystems relevant to the work:
Both systems copied in their entirety to /cru/cruts/
Nearly 11,000 files! And about a dozen assorted “read me” files addressing individual issues, the most useful being:
(yes, they all have different name formats, and yes, one does begin ‘_’!)
Believe it or not, this tells us quite a bit. “Harry” is starting off with two large collections of data on a UNIX or UNIX-like system (forward slashes, the word “filesystem”) and knows only in very general terms what the data might be. He has copied it from where it was to a new location and started to work on it. Almost immediately, he notices a problem:
6. Temporarily abandoned 5., getting closer but there’s always another problem to be evaded. Instead, will try using rawtogrim.f90 to convert straight to GRIM. This will include non-land cells but for comparison purposes that shouldn’t be a big problem …  noo, that’s not gonna work either, it asks for a “template grim filepath,” no idea what it wants (as usual) and a serach for files with “grim” or “template” in them does not bear useful fruit. As per usual. Giving up on this approach altogether.
Things aren’t going well. Harry is trying to reconstruct results that someone else obtained, using their files but without their help.
8. Had a hunt and found an identically-named temperature database file which did include normals lines at the start of every station. How handy — naming two different files with exactly the same name and relying on their location to differentiate! Aaarrgghh!! Re-ran anomdtb:
Okay, this isn’t so unusual, actually, but unless you document and describe your file structure, it’s pretty much opaque to a new reader. Still, Harry presses on:
11. Decided to concentrate on Norwich. Tim M uses Norwich as the example on the website, so we know it’s at (363,286). Wrote a prog to extract the relevant 1961-1970 series from the published output, the generated .glo files, and the published climatology. Prog is norwichtest.for. Prog also creates anomalies from the published data, and raw data from the generated .glo data. Then Matlab prog plotnorwich.m plots the data to allow comparisons. First result: works perfectly, except that the .glo data is all zeros. This means I still don’t understand the structure of the .glo files. Argh!
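For readers who haven’t met the term, an “anomaly” here is the departure of a value from the long-term normal for the same calendar month, and “raw” data is recovered by adding the normal back. The sketch below shows that round trip in Python. It is not Harry’s code: the function names and the numbers are made up for illustration, and it assumes simple difference anomalies, which the notes quoted here do not confirm for any particular variable.

```python
# Illustrative only: hypothetical names and made-up values, not CRU code or data.

def to_anomalies(series, climatology):
    """Subtract the monthly normal from each value.

    series: dict mapping (year, month) -> value, with month in 1..12
    climatology: list of 12 monthly normals, January first
    """
    return {(y, m): v - climatology[m - 1] for (y, m), v in series.items()}


def to_absolute(anomalies, climatology):
    """Inverse step: add the monthly normal back to each anomaly."""
    return {(y, m): a + climatology[m - 1] for (y, m), a in anomalies.items()}


def is_all_zeros(series):
    """Cheap sanity check for the failure Harry hit: every value equal to zero."""
    return all(v == 0 for v in series.values())


# Made-up monthly normals and two made-up "published" values.
climatology = [4.0, 4.5, 6.0, 8.5, 12.0, 15.0, 17.0, 17.0, 14.5, 11.0, 7.0, 5.0]
published = {(1961, 1): 3.0, (1961, 7): 18.5}

anoms = to_anomalies(published, climatology)
restored = to_absolute(anoms, climatology)
```

If the two data sources agree, converting one way and back should reproduce the starting values. An all-zeros result like the one Harry got is the kind of thing a check such as `is_all_zeros` flags before any plotting is attempted.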