Here’s sociologist Jeremy Freese, writing back in 2008:
Key findings in quantitative social science are often interaction effects in which the estimated “effect” of a continuous variable on an outcome for one group is found to differ from the estimated effect for another group. An example I use when teaching is that the relationship between high school test scores and earnings is stronger for men than for women. Interaction effects are notorious for being much easier to publish than to replicate, partly because it is easy for researchers to forget (?) how they tested many dozens of possible interactions before finding one that is statistically significant and can be presented as though it was hypothesized by the researchers all along. Various things ought to heighten suspicion that a statistically significant interaction effect has a strong likelihood of not being “real.” Results that imply a plot like the one above [in Freese’s original post] practically scream “THIS RESULT WILL NOT REPLICATE.” There are so many ways of dividing a sample into subgroups, and there are so many variables in a typical dataset that have low correlation with an outcome, that it is inevitable that there will be all kinds of little pockets for high correlation for some subgroup just by chance. Examples of such findings in the published literature are left as an exercise for the reader.
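Freese’s point about little pockets of high correlation is easy to check by simulation. Here’s a minimal sketch (my code, not Freese’s, with made-up sample sizes): generate an outcome with no true interactions anywhere, then scan every subgroup-by-predictor interaction the way a researcher fishing for a publishable result might. Some of the tests will come up “statistically significant” by chance alone.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000                          # sample size (arbitrary)
n_groups, n_predictors = 10, 10   # 100 candidate interactions to scan

y = rng.normal(size=n)            # outcome: pure noise, no real effects
groups = rng.integers(0, 2, size=(n, n_groups))   # binary subgroup indicators
predictors = rng.normal(size=(n, n_predictors))   # continuous predictors

hits = []
for g in range(n_groups):
    for p in range(n_predictors):
        x, grp = predictors[:, p], groups[:, g]
        # Fit y ~ 1 + x + group + x:group; the last column is the interaction.
        X = np.column_stack([np.ones(n), x, grp, x * grp])
        fit = sm.OLS(y, X).fit()
        if fit.pvalues[3] < 0.05:
            hits.append((g, p, fit.pvalues[3]))

# With 100 independent-ish tests at alpha = 0.05, expect roughly 5 false hits.
print(f"{len(hits)} 'significant' interactions out of {n_groups * n_predictors}")

Run it with a few different seeds and you’ll typically see around five “significant” interactions, any one of which could be written up as though it had been hypothesized all along.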
Interesting to see this awareness expressed so clearly way back when, at the very beginning of what we now call the replication crisis in quantitative research. I noticed Freese’s post when it appeared, but at the time I didn’t fully recognize the importance of his points.