Jousting with the Lancet: Pajamas Media Interviews Professor Gilbert Burnham
Tom W,
I will post the info you seem to need in order to clarify my point.
You should read it.
“The cluster sampling critique”
“There are shreds of this in the Kaplan article, but it reached its fullest and most widely-cited form in a version by Shannon Love on the Chicago Boyz website. The idea here is that the cluster sampling methodology used by the Lancet team (for reasons of economy, and of reducing the very significant personal risks for the field team) reduces the power of the statistical tests and makes the results harder to interpret. It was backed up (wayyyyy down in comments threads) by people who had gained access to a textbook on survey design; most good textbooks on the subject do indeed suggest that it is not a good idea to use cluster sampling when one is trying to measure rare effects (like violent death) in a population which has been exposed to heterogeneous risks of those rare events (i.e., some places were bombed a lot, some a little and some not at all).
There are two big problems with the cluster sampling critique, and I think that they are both so serious that this argument is now a true litmus test for hacks; anyone repeating it either does not understand what they are saying (in which case they shouldn’t be making the critique) or does understand cluster sampling and thus knows that the argument is fallacious. The problems are:
1) Although sampling textbooks warn against the cluster methodology in cases like this, they are very clear about the fact that the reason why it is risky is that it carries a very significant danger of underestimating the rare effects, not overestimating them. This can be seen with a simple intuitive illustration; imagine that you have been given the job of checking out a suspected minefield by throwing rocks into it.
This is roughly equivalent to cluster sampling a heterogeneous population; the dangerous bits are a fairly small proportion of the total field, and they’re clumped together (the mines). Furthermore, the stones that you’re throwing (your “clusters”) only sample a small bit of the field at a time. The larger each individual stone, the better, obviously, but equally obviously it’s the number of stones that you have that is really going to drive the precision of your estimate, not their size. So, let’s say that you chuck 33 stones into the field. There are three things that could happen:
a) By bad luck, all of your stones could land in the spaces between mines. This would cause you to conclude that the field was safer than it actually was.
b) By good luck, you could get a situation where most of your stones fell in the spaces between mines, but some of them hit mines. This would give you an estimate that was about right regarding the danger of the field.
c) By extraordinary chance, every single one of your stones (or a large proportion of them) might chance to hit mines, causing you to conclude that the field was much more dangerous than it actually was.
How likely is the third of these possibilities (analogous to an overestimate of the excess deaths) relative to the other two? Not very likely at all. Cluster sampling tends to underestimate rare effects, not overestimate them[2].
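To put rough numbers on the rock-throwing illustration, here is a minimal sketch using exact binomial probabilities. It rests on assumptions that are not in the original post or the Lancet paper: each stone lands independently, and `mined_share` (the fraction of the field that is mined, set to 4% here) is purely hypothetical. The function name `stone_outcomes` is likewise just for illustration.

```python
from math import comb


def stone_outcomes(n_stones=33, mined_share=0.04):
    """Exact binomial probabilities for the rock-throwing illustration.

    Assumptions (hypothetical, not from the Lancet paper): each stone
    lands independently, and mined_share is the mined fraction of the field.
    """
    # P(exactly k of the n stones hit a mine), for k = 0..n
    pmf = [
        comb(n_stones, k) * mined_share**k * (1 - mined_share) ** (n_stones - k)
        for k in range(n_stones + 1)
    ]
    # The estimate of the mined share if k stones hit is simply k / n
    estimates = [k / n_stones for k in range(n_stones + 1)]
    return {
        # case (a): every stone misses
        "no_hits": pmf[0],
        # estimate falls below the true mined share
        "underestimate": sum(p for p, e in zip(pmf, estimates) if e < mined_share),
        # estimate falls above the true mined share
        "overestimate": sum(p for p, e in zip(pmf, estimates) if e > mined_share),
        # estimate is at least twice the true mined share
        "at_least_double": sum(
            p for p, e in zip(pmf, estimates) if e >= 2 * mined_share
        ),
        # case (c): half or more of the stones hit mines
        "half_or_more_hit": sum(p for k, p in enumerate(pmf) if 2 * k >= n_stones),
    }


if __name__ == "__main__":
    for name, prob in stone_outcomes().items():
        print(f"{name}: {prob:.4g}")
```

With these assumed values, roughly 62% of the possible throws come in under the true mined share and about 26% miss every mine, while case (c) is vanishingly unlikely. The exact split between under- and overestimates shifts with the assumed `mined_share`, so treat the figures as an illustration of the skew rather than a general result.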
And 2), this problem, and other issues with cluster sampling (basically, it reduces your effective sample size to something closer to the number of clusters than the number of individuals sampled) are dealt with at length in the sampling literature. Cluster sampling ain’t ideal, but needs must and it is frequently used in bog-standard epidemiological surveys outside war zones. The effects of clustering on standard results of sampling theory are known, and there are standard pieces of software that can be used to adjust (widen) one’s confidence interval to take account of these design effects. The Lancet team used one of these procedures, which is why their confidence intervals are so wide (although, to repeat, not wide enough to include zero). I have not seen anybody making the clustering critique who has any argument at all from theory or data which might give a reason to believe that the normal procedures are wrong for use in this case. As Richard Garfield, one of the authors, said in a press interview, epidemics are often pretty heterogeneously distributed too.
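As a rough illustration of what "adjusting for design effects" means, here is a minimal sketch of one common textbook approximation, Kish's design effect, which assumes equal-sized clusters. This is not a description of the specific procedure or software the Lancet team used. The function names, the figure of 33 clusters of 30 units each, and the intra-cluster correlation values (`icc`) are all hypothetical choices for the example.

```python
from math import sqrt


def design_effect(cluster_size, icc):
    """Kish approximation: deff = 1 + (m - 1) * icc, for equal cluster size m."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_total, cluster_size, icc):
    """Size of a simple random sample carrying about the same information."""
    return n_total / design_effect(cluster_size, icc)


def widen_interval(estimate, half_width, cluster_size, icc):
    """Scale a simple-random-sample interval half-width by sqrt(deff)."""
    scale = sqrt(design_effect(cluster_size, icc))
    return (estimate - half_width * scale, estimate + half_width * scale)


if __name__ == "__main__":
    n_clusters, cluster_size = 33, 30  # hypothetical survey layout
    n_total = n_clusters * cluster_size
    for icc in (0.0, 0.1, 1.0):  # hypothetical intra-cluster correlations
        n_eff = effective_sample_size(n_total, cluster_size, icc)
        print(f"icc={icc}: deff={design_effect(cluster_size, icc):.2f}, "
              f"effective n={n_eff:.1f}")
```

Under this approximation, when units within a cluster are completely alike (`icc = 1.0`) the effective sample size collapses to the number of clusters, and when they are unrelated (`icc = 0.0`) it equals the full count, which is the sense in which clustering pulls the effective sample size toward the number of clusters.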
There is a variant of this critique which is darkly hinted at by both Kaplan and Love, but neither of them appears to have the nerve to say it in so many words[3]. This would be the critique that there is something much nastier about the sample; that it is not a random sample, but is cherry-picked in some way. In order to believe this, if you have read the paper, you have to be prepared to accuse the authors of telling a disgusting barefaced lie, and presumably to accept the legal consequences of doing so. They picked the clusters by the use of random numbers selected from a GPS grid. In the few cases in which this was logistically difficult (read: insanely dangerous), they picked locations off a map and walked to the nearest household. There is no realistic way in which a critique of this sort can get off the ground; in any case, it affected only a small minority of clusters.”
http://crookedtimber.org/2004/11/11/lancet-roundup-and-literature-review





