“Using numbers to replace judgment”

Julian Marewski and Lutz Bornmann write:

In science and beyond, numbers are omnipresent when it comes to justifying different kinds of judgments. Which scientific author, hiring committee member, or advisory board panelist has not been confronted with page-long “publication manuals”, “assessment reports”, or “evaluation guidelines” calling for p-values, citation rates, h-indices, or other statistics in order to motivate judgments about the “quality” of findings, applicants, or institutions? Yet many of those relying on and calling for statistics do not even seem to understand what information those numbers can actually convey, and what they cannot. Focusing on the uninformed usage of bibliometrics as a worrisome outgrowth of the increasing quantification of science and society, we place the abuse of numbers into larger historical contexts and trends. These are characterized by a technology-driven bureaucratization of science, obsessions with control and accountability, and mistrust in human intuitive judgment. The ongoing digital revolution intensifies those trends. We call for bringing sanity back into scientific judgment exercises.

I agree. Vaguely along the same lines is our recent paper on the fallacy of decontextualized measurement.

This happens a lot: the things that people do specifically to make their work feel more scientific can actually pull them away from scientific inquiry.

Another way to put it is that subjective judgment is unavoidable. When Blake McShane and the rest of us were writing our paper on abandoning statistical significance, one potential criticism we had to address was: What’s the alternative? If researchers, journal editors, policymakers, etc., don’t have “statistical significance” to guide their decisions, what can they do? Our response was that decision makers are already using their qualitative judgment to make decisions. PNAS, for example, doesn’t publish every submission that is sent to them with “p less than .05”; no, they still reject most of them on other grounds (perhaps because their claims aren’t dramatic enough). Journals may use statistical significance as a screener, but they still have to make hard decisions based on qualitative judgment. We, and Marewski and Bornmann, are saying that such judgment is necessary, and it can be counterproductive to add a pseudo-objective overlay on top of it.