Can we do better than using averaged measurements?

Angus Reynolds writes:

Recently a PhD student at my university came to me for some feedback on a paper he is writing about the state of research methods in the fear extinction field. Basically you give someone an electric shock repeatedly while they stare at neutral stimuli, and then you see what happens when you start showing them the stimuli and don’t shock them anymore. Power will always be a concern here because of the ethical problems.

Most of his paper is commenting on the complete lack of consistency between and within labs in how they analyse data: plenty of garden-of-forking-paths issues, and concerns about Type 1, Type 2, and Type S and M errors. One thing I’ve been pushing him to talk about more is improved measurement. Currently fear is measured in part by taking skin conductance measurements continuously and then summarising an 8-second-or-so window between trials into averages, which are then split into blocks and ANOVA’d. I’ve commented that they must be losing information if they are summarising a continuous (and potentially noisy) measurement over time down to a single value. It seems to me that the variability within that 8-second window would be very important as well. So why not just model the continuous data?

Given that the field could be at least two steps away from where it needs to be (immature data, immature methods), I’ve suggested that he just start by making graphs of the complete data that he would like to be able to model one day, and not really bother with p-value-style analyses. In terms of developing the skills necessary to move forward: would you even bother trying to create models of the fear extinction process using the simplified, averaged data that most researchers use, or would it be better to get people accustomed to seeing the continuous data first and then develop more complex models for that later?
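To make the measurement pipeline concrete, here is a minimal sketch with simulated numbers (sampling rate, trial counts, and column names are all made up for illustration, not taken from any actual study): each trial’s continuous 8-second window collapses to a single mean, and that table of means is what typically gets split into blocks and ANOVA’d.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
fs = 10                      # assumed sampling rate in Hz
n_trials = 24                # assumed number of extinction trials
t = np.arange(0, 8, 1 / fs)  # the roughly 8-second post-stimulus window

rows = []
for trial in range(n_trials):
    # toy skin-conductance response: a slow bump plus noise
    scr = 0.5 * np.exp(-t / 3.0) * (1 - np.exp(-t)) + rng.normal(0, 0.05, t.size)
    rows.append({"trial": trial, "block": trial // 8, "mean_scr": scr.mean()})

summary = pd.DataFrame(rows)
# each 80-sample trace is now a single number per trial; the block-level
# summaries below are all that survives into the usual ANOVA
print(summary.groupby("block")["mean_scr"].agg(["mean", "std"]))
```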

My reply:

I actually don’t think it’s so horrible to average the data in this way. Yes, it should be better to model the data directly, and, yes, there has to be some information lost in the averaging, but empirical variances are themselves estimated very noisily, so it’s not like you can expect to gain lots of additional information by comparing groups based on their observed variances.
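Here’s a quick simulation sketch of that point, with all numbers invented: when the underlying trace is smooth and autocorrelated (a stand-in for an 8-second skin-conductance window), the per-trial sample variance bounces around a lot, so observed differences in variance between conditions or groups are mostly noise.

```python
import numpy as np

rng = np.random.default_rng(1)
fs, window, phi = 10, 8, 0.9   # assumed sampling rate (Hz), window (s), AR(1) autocorrelation
n = fs * window                # 80 samples per trial

def ar1_trace(n, phi, sd=0.2):
    """One smooth, autocorrelated trace with stationary standard deviation sd."""
    x = np.empty(n)
    x[0] = rng.normal(0, sd)
    innov_sd = sd * np.sqrt(1 - phi**2)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + rng.normal(0, innov_sd)
    return x

# sample variance of each simulated 8-second window
trial_vars = np.array([ar1_trace(n, phi).var(ddof=1) for _ in range(1000)])
print("per-trial variance estimates: mean %.3f, sd %.3f"
      % (trial_vars.mean(), trial_vars.std()))
# the spread of these estimates is large relative to their mean, which is
# why comparing groups on observed variances buys you little
```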

I agree 100% with your suggestion of graphing the complete data. Regarding measurement, I think the key is for it to be connected to theory where possible. Also, from the above description it sounds like the research is using within-person comparisons, which I generally recommend.
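As a hedged sketch of what “graph the complete data” might look like in practice, assuming a long-format file with hypothetical participant/trial/time_s/scr columns (the file name and layout are made up for illustration): plot every trial’s full trace, one panel per participant, with early and late extinction trials in different colors so the within-person change is visible at a glance.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# assumed long format: one row per sample, with columns
# participant, trial, time_s, scr (file name and columns are hypothetical)
df = pd.read_csv("extinction_scr_long.csv")

participants = sorted(df["participant"].unique())
fig, axes = plt.subplots(1, len(participants),
                         figsize=(3 * len(participants), 3), sharey=True)
axes = np.atleast_1d(axes)

for ax, p in zip(axes, participants):
    sub = df[df["participant"] == p]
    cutoff = sub["trial"].median()
    for trial, trace in sub.groupby("trial"):
        # early extinction trials in one color, late trials in another
        ax.plot(trace["time_s"], trace["scr"],
                color="C0" if trial < cutoff else "C1", alpha=0.4, lw=0.8)
    ax.set_title(f"participant {p}")
    ax.set_xlabel("time since stimulus (s)")

axes[0].set_ylabel("skin conductance")
fig.tight_layout()
plt.show()
```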