Thursday, December 28, 2006

10 Things I Learned in 2006 (#8)

Multiple comparisons dishonesty in science is easy, tempting, and probably rife. Let me explain. When you start any experiment, you have one or more null hypotheses about the data, which you intend to reject at a significance level of 0.05 (that is to say, 1 in 20 times you will get a false positive by chance, but we consider this a small enough risk to make the result worth reporting). Once the data are in, what often happens is that you don't find what you want, and you have to go back to the drawing board so that you haven't wasted a quarter of a million dollars. You run other regressions with plausible stories. One of them comes out significant, and you write your paper.
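
For the spreadsheet-averse, here is a toy simulation (Python with scipy; my own sketch, nothing to do with any particular lab) of what that 0.05 means in practice:

    # Both groups are drawn from the same distribution, so every "significant"
    # result here is a false positive. Expect roughly 1 in 20 of them.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2006)
    n_experiments = 10_000
    false_positives = 0
    for _ in range(n_experiments):
        a = rng.normal(0, 1, 30)          # group A, no real effect
        b = rng.normal(0, 1, 30)          # group B, same distribution as A
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:
            false_positives += 1

    print(false_positives / n_experiments)  # hovers around 0.05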

Now, the more tests you run on a single dataset, the higher your chance of coming up with a false positive. There are ways of correcting for this (and if you report all the comparisons you made in a paper, you also have to report how you did the corrections), but if it's just you, sitting in the lab playing with the spreadsheet, no one has to know what you're doing. And of course, if the model is reasonable, you get published, post hoc ergo propter hoc* be damned. Now, I'm not leveling accusations at anyone in particular, but this is so easy to perpetrate that surely a great number of findings (ones that have not been reproduced) are just plain wrong. 1 in 10? 1 in 8? The investigation continues.

* admittedly, one should get these things right before conducting public displays of idiocy. however, see #6
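
To put rough numbers on the "more tests, higher chance" claim: if the tests were independent (a simplification), the probability of at least one false positive across k tests is 1 - (1 - 0.05)^k. A quick sketch, with the bluntest of the standard corrections thrown in:

    # Back-of-the-envelope numbers for the multiple comparisons problem, assuming
    # for simplicity that the k tests are independent and each run at alpha = 0.05.
    alpha = 0.05
    for k in (1, 5, 10, 20):
        familywise = 1 - (1 - alpha) ** k   # P(at least one false positive in k tests)
        bonferroni = alpha / k              # crudest fix: shrink the per-test cutoff
        print(f"{k:2d} tests: P(>=1 false positive) = {familywise:.2f}, "
              f"Bonferroni cutoff = {bonferroni:.4f}")
    # At 20 tests you have about a 64% chance of at least one spurious "finding",
    # which is why the corrections (Bonferroni, Holm, and friends) exist.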

1 comment:

Anonymous said...

That's what academics, especially economists, call "publication bias". You can essentially run some regression to get whatever result you would like, which is a weakness of retrospective non-experimental studies. For that reason, many disciplines have started to adopt randomized experiments à la clinical trials, which are ostensibly free of such biases.