In her recent article about pizzagate, Stephanie Lee included this hilarious email from Brian Wansink, the self-styled “world-renowned eating behavior expert for over 25 years”:
OK, what grabs your attention is that last bit about “tweeking” the data to manipulate the p-value, where Wansink is proposing research misconduct (from NIH: “Falsification: Manipulating research materials, equipment, or processes, or changing or omitting data or results such that the research is not accurately represented in the research record”).
But I want to focus on a different bit:
. . . although the stickers increase apple selection by 71% . . .
This is the type M (magnitude) error problem—familiar now to us, but not so familiar a few years ago to Brian Wansink, James Heckman, and other prolific researchers.
In short: when your data are noisy, you’ll expect to get large point estimates. Also large standard errors, but as the email above illustrates, you can play games to get the standard errors down.
My point is: an unreasonably high point estimate (in this case, “stickers increase apple selection by 71%,” a claim that’s only slightly less ridiculous than the claim that women are three times more likely to wear red or pink during days 6-14 of their monthly cycle) is not a sign of a good experiment or a strong effect—it’s a sign that whatever you’re looking for is being overwhelmed by noise.
One problem, I fear, is the lesson sometimes taught by statisticians, that we should care about “practical significance” and not just “statistical significance.” Generations of researchers have been running around, petrified that they might overinterpret an estimate of 0.003 with standard error 0.001. But this distracts them from the meaninglessness of estimates that are HUGE and noisy.
I say “huge and noisy,” not “huge or noisy,” because “huge” and “noisy” go together. It’s the high noise level that allows you to get that huge estimate in the first place.
P.S. I wrote this post about 6 months ago and it happens to be appearing today. Wansink is again in the news. Just a coincidence. The important lesson here is GIGO. All the preregistration in the world won’t help you if your data are too noisy.