Preregistration, it’s actually a really good idea!

The great preregistration challenge is here, so this is a perfect time to preregister your next study. After all, when do you get the chance to win money for your research? But some might wonder what this pre-thingy is…

Preregistration is a simple, and yet surprisingly novel (as far as I know), idea to ensure that researchers follow the scientific method. In other words, preregistering means you decide before data collection what you want to test (which phenomenon, which population) and how you want to test it (procedure, stimuli, measures, and perhaps most importantly statistics). This is the very definition of testing hypotheses, because the start of data collection should mark a point of no return when it comes to hypotheses, variables, and statistics. The exception is exploratory work; I will come back to that topic later. But back to a typical experiment: the (idealized) lifecycle is illustrated below. Note that the arrows go in only one direction, and that there is a red line between data collection and planning your analyses based on pre-specified hypotheses. You should definitely not cross it.


In its simplest, most straightforward incarnation a preregistration requires you to answer 9 questions on your hypothesis, measures, and statistics. It takes less than 5 minutes, assuming you have already thought about all those issues. It’s surprisingly easy, however, to just collect data without a clear idea of how to test the super interesting research question that you had at some point.

Preregistration forces you to figure out what your data and manipulations mean, and for example whether you are looking for a simple effect or an interaction (or, if you are feeling ambitious: a triple interaction). This should in theory all be decided before data collection, as I said before (and this cannot be stressed enough).

Unexpected advantages of preregistration

One advantage of really thinking things through and possibly even simulating your analyses with fake data (taking the shape of what you expect or simulated based on previous studies you are building on) is that your experiment design will be optimized as much as possible. You will not collect measures you don’t need, you know whether your variables are categorical or continuous, and it will already be clear which test you can do. This is all work you need to do anyhow at some point, and it’s just so much more efficient to think your study through before testing and testing and testing.
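To give an idea of what simulating your analysis with fake data can look like, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: a two-group design, normally distributed scores with a standard deviation of 1, a made-up effect size, and a z-approximation instead of a proper t-test (which is only reasonable with roughly 30 or more participants per group). The function name simulate_power is hypothetical, too. Your own simulation should of course use the test you actually plan to preregister.

```python
import random
import statistics
from statistics import NormalDist


def simulate_power(n_per_group, effect_size, n_sims=1000, alpha=0.05, seed=1):
    """Estimate how often a two-group comparison comes out significant.

    Assumptions (illustrative only): both groups are normal with SD 1,
    the true difference is `effect_size`, and the test is a two-sided
    z-approximation to Welch's t-test.
    """
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    significant = 0
    for _ in range(n_sims):
        # Fake data: one control group and one group shifted by the expected effect
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treatment = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        # Standard error of the difference between the two group means
        se = (statistics.variance(control) / n_per_group
              + statistics.variance(treatment) / n_per_group) ** 0.5
        z = (statistics.mean(treatment) - statistics.mean(control)) / se
        if abs(z) > z_crit:
            significant += 1
    return significant / n_sims


if __name__ == "__main__":
    # Compare a few candidate sample sizes for a hypothetical effect of d = 0.5
    for n in (20, 40, 80):
        print(n, simulate_power(n, effect_size=0.5))
```

Running this for a few candidate sample sizes before testing shows roughly how many participants the planned analysis would need, and it forces you to write the analysis down in a form that actually runs.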

A second advantage is that preregistration forces you to specify some stopping rule. We’ve already talked about this topic on this blog, and I cannot recommend the post enough! Stopping rules, no matter how you derive and define them, give you the ability to plan ahead! It might not sound like much when you have all the time in the world (= a contract longer than 2 years), but as a postdoc on short contracts, this is really important. After all, it means you might actually finish a project before moving on. Dorothy Bishop elaborates further on this topic on her excellent blog, with a focus on registered reports (you submit a complete paper without the results section before data collection begins).
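A stopping rule can be as simple as "stop at N participants or on a fixed date, whichever comes first". As a small sketch (the function name should_stop, the target of 40 participants, and the dates are all made up for illustration), it could be written down like this:

```python
from datetime import date


def should_stop(n_included, target_n, today, last_testing_day):
    """Return True once the (hypothetical) preregistered stopping rule is met.

    The rule only looks at the number of included participants and the
    calendar, never at the results collected so far.
    """
    return n_included >= target_n or today > last_testing_day


if __name__ == "__main__":
    # Hypothetical plan: 40 included participants, or stop after 30 June 2020
    print(should_stop(32, 40, date(2020, 5, 1), date(2020, 6, 30)))
    print(should_stop(40, 40, date(2020, 5, 1), date(2020, 6, 30)))
```

The point of the design is that nothing in the rule depends on the data, so you cannot be tempted to test "just a few more" until the result looks nice.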

What does preregistration (not) prevent?

First, I find it important to make clear that preregistration is not the cure for the current confidence “crisis” (or whatever you want to call it) in Psychology. If you really want to cheat, you can always cheat. Preregistrations are not there to offer an external monitoring mechanism (which would only work if we were all forced to preregister every single study and if we could not publish without it, a model currently being tried in clinical research). What preregistrations do is prevent you from falling victim to your own biases. We like to tell and hear stories, and the better a story sounds, the more likely we are to believe it. This is only human, and you might have the best intentions, but it’s sometimes tricky to remember later exactly what you thought and wanted today.

A tale of trying to preregister

Some time ago I felt particularly ambitious and entered a new subfield (vowel discrimination), tried out a new method (habituation/individual differences study), developed a new measure (an input questionnaire), AND wanted to answer an exploratory research question (how does variability in daily life impact early language development). First of all, I don’t recommend doing all of this at once. But I was on one of those very short postdoc contracts and felt the pressure (to be fair, I still do). So I went ahead and wrote up all I knew based on reading a lot of papers and talking to very smart and experienced colleagues. What I preregistered was an ideal scenario, but when I got my data, everything was different. First of all, the basic assumptions of my statistical tests were not met. The questionnaire data were all wonky, with a peak for low numbers and a loooong tail, definitely not normal. But I had not foreseen this case at all.

Then I was very specific about when to exclude participants, for example adding a rule that parents should not talk or point to the screen (because that’s sort of distracting). I did not expect that some parents would take off their headphones with masking music to listen to the stimuli. That’s also distracting for the little participant on their lap.

My solution

I am still in the process of writing up the paper; data collection and analysis took quite some time. And I must admit that I was held back a bit by my preregistration failure: how should I deal with this honestly and openly? I did not plan to bury my data and have already presented them to a few really cool people who gave great feedback.

So what I do, as one should, is bring up the preregistration in the paper and provide supplementary materials with extensive explanations for every. single. change.

In fact, every analysis I ever did on the data is reported in a supplementary document, so I am not even tempted to poke around in my data until something is significant. (Actually, there will be a lot of non-significant results, which are also useful in my view.) The interested reader can check out all the statistics I gathered. However, had I not written down my plans many months ago, these new analyses, which are of course all justified and logical, might not have seemed exploratory by the time I am writing the paper. This does not mean I would have wanted to deceive my colleagues; I would actually have unconsciously deceived myself.

A few handy tricks for preregistrations I wish I had known before

1. You can amend and update your preregistration

Changes to a preregistration can become necessary in a number of cases. What happened to me was that I knew less about the shape of the data I was about to collect than I expected. Looking at descriptive results (including checking the distribution of your data) is a good moment to adjust your preregistration before data analysis.

2. It is possible to add decision trees

When planning analyses, you can specify conditions under which you will do either x or y (for example parametric or non-parametric tests, or a follow-up analysis that is only possible if you get enough bilingual participants, and so on…). Of course, such a decision tree requires that you are aware of any uncertainties. So ask yourself what might go “wrong” with your data; it can’t hurt!
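Such a decision tree can even be written as a few lines of code and attached to the preregistration. The sketch below is purely illustrative: the function name choose_analyses, the use of a normality-check p-value as the branching criterion, and the thresholds (alpha of .05, at least 20 bilingual participants) are assumptions I made up for the example, not recommendations.

```python
def choose_analyses(normality_p, n_bilingual, alpha=0.05, min_bilingual=20):
    """Pick the planned analyses from criteria fixed before data collection.

    normality_p   -- p-value of whatever normality check you preregistered
    n_bilingual   -- number of bilingual participants actually recruited
    alpha, min_bilingual -- hypothetical thresholds, to be set in advance
    """
    plan = {}
    # Branch 1: parametric test only if the normality check does not reject
    if normality_p >= alpha:
        plan["main_test"] = "parametric"
    else:
        plan["main_test"] = "non-parametric"
    # Branch 2: the follow-up analysis is only run with enough bilinguals
    plan["bilingual_follow_up"] = n_bilingual >= min_bilingual
    return plan


if __name__ == "__main__":
    # Example: skewed data and only 12 bilingual participants
    print(choose_analyses(normality_p=0.01, n_bilingual=12))
```

Because every branch is spelled out before the data come in, following the "other" branch later is still part of the plan and not a deviation from it.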

3. There is now a handy guide to preregister different analysis plans

A surprisingly large number of parameters must be specified along with the statistical test you want to conduct, and some are easily forgotten. To make sure you don’t forget anything, the OSF (Open Science Framework) is so nice as to provide a very useful guide. It’s important not to leave any wiggle room to squeeze out that significant result, so better be exhaustive than quick.


The story of preregistration is full of uncertainty and misunderstandings, but it’s an important change in our habits. I hope this blog post motivated you to join team prereg!

3 thoughts on “Preregistration, it’s actually a really good idea!”

  1. Preregistration should have a framework for:
    Exploratory studies
    Sub-clusters of a research program

    Basics first:
    Any statistically coherent aggregation formula is legit if defined a priori. The ONLY limitation is that probabilities aren’t overstated (e.g. Bonferroni or min(2(min(a,b),max(a,b)) aka Weber (1985))

    A preregistration system has to allow researchers to state in advance how they cluster their studies, and which studies they run as exploratory.

    In-study exploration: Exploration can also be allowed within a study. For example: a study is intended to test outcome A, which doesn’t need much of a Bonferroni correction. But the authors state a priori that they additionally want to have a look at ten more parameters.

