China air pollution regression discontinuity update

Avery writes:

There is a follow up paper for the paper “Evidence on the impact of sustained exposure to air pollution on life expectancy from China’s Huai River policy” [by Yuyu Chen, Avraham Ebenstein, Michael Greenstone, and Hongbin Li] which you have posted on a couple times and used in lectures. It seems that there aren’t much changes other than newer and better data and some alternative methods. Just curious what you think about it. The paper is called, “New evidence on the impact of sustained exposure to air pollution on life expectancy from China’s Huai River Policy” [by Avraham Ebenstein, Maoyong Fan, Michael Greenstone, Guojun He, and Maigeng Zhou].

The cleanest summary of my problems with that earlier paper is this article, “Evidence on the deleterious impact of sustained use of polynomial regression on causal inference,” written with Adam Zelizer.

Here’s the key graph, which we copied from the earlier Chen et al. paper:

The most obvious problem revealed by this graph is that the estimated effect at the discontinuity is entirely the result of the weird curving polynomial regression, which in turn is being driven by points on the edge of the dataset. Looking carefully at the numbers, we see another problem which is that life expectancy is supposed to be 91 in one of these places (check out that green circle on the upper right of the plot)—and, according to the fitted model, the life expectancy there would be 5 years higher, that is, 96 years!, if only they hadn’t been exposed to all that pollution.

As Zelizer and I discuss in our paper, and I’ve discussed elsewhere, this is a real problem, not at all resolved by (a) regression discontinuity being an identification strategy, (b) high-degree polynomials being recommended in some of the econometrics literature, and (c) the result being statistically significant at the 5% level.

Indeed, items (a), (b), (c) above represent a problem, in that they gave the authors of that original paper, and the journal reviewers and editors, a false sense of security which allowed them to ignore the evident problems in their data and fitted model.

We’ve talked a bit recently about “scientism,” defined as “excessive belief in the power of scientific knowledge and techniques.” In this case, certain conventional statistical techniques for causal inference and estimation of uncertainty have led people to turn off their critical thinking.

That said, I’m not saying, nor have I ever said, that the substantive claims of Chen et al. are wrong. It could be that this policy really did reduce life expectancy by 5 years. All I’m saying is that their data don’t really support that claim. (Just look at the above scatterplot and ignore the curvy line that goes through it.)

OK, what about this new paper? Here’s the new graph:

You can make of this what you will. One thing that’s changed is that the two places with life expectancy greater than 85 have disappeared. So that seems like progress. I wonder what happened? I did not read through every bit of the paper—maybe it’s explained there somewhere?

Anyway, I still don’t buy their claims. Or, I should say, I don’t buy their statistical claim that their data strongly support their scientific claim. To flip it around, though, if the public-health experts find the scientific claim plausible, then I’d say, sure, the data are consistent with the this claimed effect on life expectancy. I just don’t see distance north or south of the river as a key predictor, hence I have no particular reason to believe that the data pattern shown in the above figure would’ve appeared, without the discontinuity, had the treatment not been applied.

I feel like kind of a grinch saying this. After all, air pollution is a big problem, and these researchers have clearly done a lot of work with robustness studies etc. to back up their claims. All I can say is: (1) Yes, air pollution is a big problem so we want to get these things right, and (2) Even without the near-certainty implied by these 95% intervals excluding zero, decisions can and will be made. Scientists and policymakers can use their best judgment, and I think they should do this without overrating the strength of any particular piece of evidence. And I do think this new paper is an improvement on the earlier one.

P.S. If you want to see some old-school ridiculous regression discontinuity analysis, check this out:

A global fourth-degree polynomial, huh? This is almost a parody of how to make spurious discoveries. As always, try looking at the graph without the helpful curvy lines and see if you can pick out any pattern at all. Disappointing to see this in the AJPS. The person who sent it to me writes:

The paper uses an RD to estimate whether national MPs in India are more likely to allocate pork to districts represented by copartisans at the state level than by out-partisans . . . Table 1 seems to be the main empirical takeaway: 1) Uses a 4th degree polynomial in the main RD. 2) Post-treatment covariates. How can you control for project-specific covariates when the outcome of interest is the degree to which projects were funded? Presumably the profile of projects changed, too. 3) Author admits that the 2 years pre and post election are somewhat arbitrary. 4) Regression to the mean? States with copartisan representation receive 59,000 rupees more post-election than states with out-partisan representation, but states with copartisan representation received 52,000 rupees LESS pre election. So national MPs punish their co-partisans before competitive elections, but reward them after? Huh.

As usual, my problem is not that researchers use questionable statistical methods, or that they go out on a limb, making claims beyond what can be supported by their data. Those of us who develop advanced statistical methods have to be aware that, once a method is out there, it can be used “off-label” by anybody, and lots of those uses will be mistaken in some way or another. No, my problem is the false sense of certainty that appears to be engendered by the use of high-tech statistics. If fourth-degree polynomials had never been invented, researchers could look at the above graph with just the scatterplot and draw their own conclusions, with no p-values to mislead them. The trouble is that, for many purposes, we do need advanced methods—looking at scatterplots and time series is not always enough—hence we need to get into the details and explain why certain methods such as regression discontinuity with high-degree polynomials don’t work as advertised; see here and here.