Guest post by Raphael Calel, Ciriacy-Wantrup Postdoctoral Fellow at the Department of Agricultural and Resource Economics at the University of California, Berkeley.
One of the most important tools for enhancing the credibility of research is the pre-analysis plan (PAP). Simply put, we feel more confident in someone’s inferences if we can verify that they weren’t data mining, engaging in motivated reasoning, or otherwise manipulating their results, knowingly or unknowingly. By publishing a PAP before collecting data, and then closely following that plan, researchers can credibly demonstrate to us skeptics that their analyses were not manipulated in light of the data they collected.
Still, PAPs are credible only when the researcher can anticipate and wait for the collection of new data. The vast majority of social science research, however, does not satisfy these conditions. For instance, while it is perfectly reasonable to test new hypotheses about the causes of the recent financial crisis, it is unreasonable to expect researchers to have pre-specified their analyses before the crisis hit. To give another example, no one analysing a time series of more than a couple of years can reasonably be expected to publish a PAP and then wait for years or decades before implementing the study. Most observational studies face this problem in one form or another.
Some have suggested that researchers could publish PAPs for these studies anyway. There are certainly no obstacles to this, although when the data has already been collected there is nothing preventing the researcher from data mining before writing the PAP. These PAPs therefore serve more as a ‘word of honour’ than a credible mechanism for committing to specific analyses.
So how can researchers credibly commit to a plan of analysis for data that is already available? This remains an open question, and I would like to suggest that another important transparency tool – the replication study – provides a partial answer.
‘Replication’ means different things to different people, so it will be useful to have a simple working definition here: a replication study asks the same research question as an existing study and conducts the same analyses.
Replication studies of this kind are usually discussed as a way of checking the validity of scientific conclusions, finding errors, or uncovering fraudulent research. These framings emphasise the value of failed replications, as evidence of scientific misunderstanding or misconduct, while dismissing successful replications as presenting little or no new evidence.
I would like to suggest a different way to think about replication studies. Conducting a replication study is conceptually the same as conducting a study that uses a PAP, except that the researcher follows the plan of analysis implemented by a previous study instead of developing a new, purpose-built PAP. For the reasons already discussed, a new PAP would not be credible anyway, but as long as the previous study was written before you gained access to your data, it offers the kind of credible commitment device that a newly written PAP cannot. By tying your hands to follow the previous study, you dramatically reduce the scope for intentional and unintentional data mining, motivated reasoning, and the other problems that plague social science research. Replication is then simply an alternative way of putting together a plan of analysis (one that also saves you the trouble of peer-reviewing and publishing a PAP yourself), and it promises to extend the benefits of PAPs to a much broader swath of social science research.
Let me try to anticipate a few objections to my proposed reframing of replication studies. Firstly, replication studies permit less scope for innovation than actual PAPs do. This is true, but a PAP gains its credibility only by tying one’s hands, and when a PAP of your own cannot credibly do that, previous studies offer perhaps the most credible ‘handcuffs’ available. And, as with any PAP, the researcher remains free to try new things. Starting with a past study only clarifies which parts of the analysis may be subject to data mining.
Secondly, replication studies do not eliminate researcher-degrees-of-freedom, since the researcher may be able to select among several past studies. It is true that the researcher can still choose among as many analysis plans as there are past studies on the particular research question, but in practice this is a much tighter constraint on researcher-degrees-of-freedom than writing a new PAP after the fact. Peer review also provides a mechanism for arbitrating whether it is more suitable to follow the analysis plan of one study or another.
Finally, one may object that my proposal is unoriginal. After all, we already reuse methods and data developed in previous research. When you study the response of national income to some event, for example, you will likely look at GDP as it is recorded in the national accounts. This is not only convenient; it also reflects a consensus among researchers about how certain things are best measured and recorded. This consensus limits researcher-degrees-of-freedom and increases the credibility of research in exactly the same way a PAP does, if to a lesser extent. Using past studies as substitutes for PAPs, as I propose, would then be only a small extension of the logic behind the ubiquitous practice of following academic conventions.
This objection is valid, but it only underscores the value of the proposed reframing (replication as a credible pre-analysis plan): it helps us leverage academic conventions and specific past studies more systematically in order to conduct more credible social science research.
Replication studies should not be confined to error-hunting. Instead, we should recognise that they use past studies as commitment devices, and follow their example to increase the credibility of current social science research. This also gives a more balanced view of replication studies: failed and successful replications alike give a new study a more credible basis for further investigation.
About the Author: Raphael Calel is currently a Ciriacy-Wantrup Postdoctoral Fellow at the Department of Agricultural and Resource Economics at the University of California, Berkeley. He is an applied economist working on environmental and climate change policy evaluation. His most recent work investigates the economic impacts of emissions trading programs, especially the EU Emissions Trading System. He has also studied the economic implications of uncertainty in climate change forecasts. Raphael recently completed his PhD in Environmental Economics at the London School of Economics and Political Science. He previously studied economics at the University of Cambridge and UCL.