The Buffet Approach to Open Science

We have written about a few open science practices, some of which are becoming the norm, such as preregistration (and whether it prevents creativity). I’ve also been invited a few times to give classes and workshops to introduce various audiences to open science and how to implement the practices associated with the overall term (which covers so much more than changing our experimental and publishing habits, but that’s another blog post). Doing so means engaging with researchers from various disciplines, who conduct dramatically different types of studies and approach science from a different angle than the prototypical theory-testing experimenter. The discussions around open science I had in these contexts have been extremely useful for me, and they led me to promote what I call the “buffet approach” to open science. In short, I think it makes the most sense to pick and choose those components and practices from open science that fit a specific project, career stage, personal skills, and institutional support.

To illustrate the buffet approach, I’ll use examples from my own research, but your own constraints will be different. I also need to preface this post by making clear that I was very lucky to have worked in very supportive environments (big shoutout to Alex Cristia and Caroline Rowland, who recognized the benefit and necessity of open science very early). With less support, adopting open science practices becomes much harder, and it becomes necessary to find your support community elsewhere (check out, for example, R-Ladies, ManyBabies, PSA, FORRT, …).

Buffet rule 1: You might not be able to try everything 

At a buffet, what you try also has to fit your diet: not all open science practices are even available to everyone; think of preregistration for iterative theory building, or data sharing for video recordings. To stick with the metaphor, some people are allergic to milk, so they will not try the yoghurt dressing with their salad. But balsamic is a good alternative.

So what if I can’t share my raw video data? Is data sharing completely out of the question then? I know we like simple distinctions (think significant vs. non-significant), but open data, for example, comes in more shades than just black and white. Of course, it is possible to share everything, from raw data, through derived data (in my own work, for example, I take video recordings, which I often cannot share, but I can share annotations, summary statistics, etc.), to meta-data (data about the data: where the videos were recorded, for what purpose, with what equipment, …). But it is also possible to keep the parts of this pipeline that contain sensitive information (here: the videos, and names and personal information in the annotations) private and share only the cleaned annotations. This is exactly what the CHILDES database of child language has been doing since 1984, and it’s still a key resource for understanding language acquisition. Even the transcripts alone are incredibly useful to researchers when video or audio recordings cannot be shared.
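To make this gradedness concrete, here is a minimal sketch of what sharing derived data plus meta-data could look like in practice. It is written in Python, and the column names, file names, and metadata fields are purely illustrative (they come from neither my actual projects nor any existing standard), so treat it as one possible starting point rather than a recipe.

```python
import csv
import json
from pathlib import Path

# Hypothetical column names; adjust to whatever your annotation tool exports.
SENSITIVE_COLUMNS = {"child_name", "caregiver_name", "home_address"}

def anonymize_annotations(raw_file: Path, shared_file: Path) -> None:
    """Copy an annotation table, dropping columns that contain personal information."""
    with raw_file.open(newline="") as src, shared_file.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        kept_columns = [c for c in (reader.fieldnames or []) if c not in SENSITIVE_COLUMNS]
        writer = csv.DictWriter(dst, fieldnames=kept_columns)
        writer.writeheader()
        for row in reader:
            writer.writerow({c: row[c] for c in kept_columns})

def write_metadata(shared_dir: Path) -> None:
    """Record data about the data: where, why, and with what the videos were made."""
    metadata = {
        "recording_location": "participants' homes",
        "purpose": "longitudinal study of infant-directed speech",
        "equipment": "handheld camera, lapel microphone",
        "shared_level": "derived data only (annotations); raw video stays private",
    }
    (shared_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))

if __name__ == "__main__":
    shared = Path("shared_data")
    shared.mkdir(exist_ok=True)
    anonymize_annotations(Path("raw_annotations.csv"), shared / "annotations.csv")
    write_metadata(shared)
```

The point is not this particular script but the separation it encodes: raw recordings stay behind, derived tables are cleaned before they leave the lab, and the meta-data travels with them.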

Hearing that you don’t have to go all in with data sharing can, in my experience, take a great burden off researchers’ shoulders. Especially when working with sensitive data, it is important to stress that open science does not have to mean access to everything for everyone. We also need to consider our participants’ rights and our obligation to behave ethically. 

I would say such gradedness exists for most open science practices. Think of preregistration: You can preregister before data collection (as I did here), or write a registered report; the two (usually) differ quite a bit in their level of detail and commitment. Or you can preregister before data analysis (as done here). Or you can preregister twice, both before data collection and before data analysis, for example when you need to amend something because you realized you knew less about your data than you thought (like how many participants of a certain population you could recruit, or whether children would make it through your wonderfully balanced but rather long experiment). We discuss the topic in more depth in this preregistration primer (open access version) your favorite bloggers co-wrote with the fantastic Naomi Havron.

Buffet rule 2: You don’t have to try everything at once

That’s basically the heart of the buffet approach: You eat what you like and feel like, but you don’t have to go for everything that is on the table. So if, for a specific project, you share materials and data but did not preregister (maybe because it was an exploratory study, or because you simply lacked time and resources), that’s already a great step forward. Open science practices are not all or none: you can pick and choose, mix and match, and do what’s most suitable to your career stage, project, lab, and level of support.

I think of trying too many things at once as stuffing yourself: You don’t get to enjoy the individual dishes, and it won’t feel good. I’ve made this mistake myself, both at buffets and with open science. For preregistration, this post shares what I learned during my first forays, which was a lot, because I made a lot of mistakes. I also think that my past data sharing efforts could have been better, because I didn’t have the time to really think through what other researchers might find useful. That is because I tried to do too much at once with little support (which, by the way, has increased a lot in the past years; but still, sharing data is hard to do right, see the next rule…).

This aspect of the buffet approach might be particularly useful for those who feel overwhelmed by the host of new things that seem to become the norm faster than you can read up on them. Just stick with what you know, but try to sample one new dish (= practice) on every visit (= paper; thanks to the wonderful Elika Bergelson for that very practical suggestion). By picking one new practice and figuring it out properly, instead of scrambling to hit a number of targets, you make it much easier to do open science well.

Buffet rule 3: Label everything

For the graded, needs- and skills-based open science approach to work, we need documentation. Only this way can we be transparent about the steps we did and did not take, and about the reasoning behind the decisions that led to the final product, usually a paper. Think of labels that list all ingredients for each dish, or that at least say whether a dish is compatible with specific diets and allergies. Say you’re vegan: then you want to avoid the brioche, but you can go for the baguette.

The need for documentation has two sides: one is institutional, and one rests on the lab’s shoulders.

In open science, we need to know what each practice should entail for it to be useful, and such guides, in my opinion, do not currently exist for most use cases. Funders, for example, mandate Data Management Plans and therein require that you follow field-specific standards. That’s just a bit circular, because you already have to know the standards (sometimes a question of having been at the right conference or knowing the right people), so it can become an in-group / out-group thing. Much better would be, for example, a link to known and reviewed field-specific standards. I know of BIDS for neuroscientific data and Psych-DS for behavioral human data, but who knows what other communities use? Should we cross-reference with Anthropology? And what about interdisciplinary research…? In short, I usually end up with more questions when reading guides to open science practices.

Now, what can we still do at the individual or lab level? We do need to know exactly what we’ve done. An emerging focus on good documentation is probably the most useful thing coming out of many open science practices. Whether this documentation is formally preregistered, added to openly shared or privately (securely, I hope) archived materials and data, and/or available as a commit history is, for me, secondary to the key change in our scientific habits: that we no longer focus only on the “end product” (a paper or thesis) but on the process.

Documentation, like commenting code and describing data, does not have to be a lonely task. It’s probably a good idea for a lab or community to develop standards or templates, not just for data but for all aspects of a study. As a consequence, the same well-documented code runs on all data sets and can therefore be re-used along with its documentation and meta-data (because column names and content will also be the same); a small sketch of such a template check follows below. So we need to stop reinventing the wheel so much…
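As a toy illustration of what a shared lab template can buy you, here is a small sketch (again in Python, with hypothetical column names that do not correspond to any existing standard) that checks whether a new data file follows the agreed template before the shared analysis code is run on it.

```python
import csv
from pathlib import Path

# Hypothetical lab-wide template: every study exports these columns, in any order.
TEMPLATE_COLUMNS = {"participant_id", "age_in_days", "trial_number", "looking_time_ms"}

def check_against_template(data_file: Path) -> list[str]:
    """Return a list of problems; an empty list means the file fits the template."""
    with data_file.open(newline="") as f:
        columns = set(csv.DictReader(f).fieldnames or [])
    problems = []
    missing = TEMPLATE_COLUMNS - columns
    extra = columns - TEMPLATE_COLUMNS
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected columns: {sorted(extra)}")
    return problems

if __name__ == "__main__":
    for issue in check_against_template(Path("study_03_looking_times.csv")):
        print(issue)
```

Running a check like this at the start of every analysis pipeline catches naming drift early, which is exactly the kind of wheel a lab should only have to invent once.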

Looking at my own papers, I would say none of them is perfect. Some are great examples where I think the authorship team did a lot of things right, for example ManyBabies 1: Infant-directed Speech Preference. We preregistered, shared (derived and anonymized!!) data and scripts, and even made walkthrough videos to make the procedure more transparent. But even with so much effort, we still find gaps in our documentation (recently fixed: stimulus files were not all in the same folder for some reason).

But is the buffet approach the right way?

Some might think that allowing researchers to sample and take their time means we will not improve at all, and that questionable research practices such as HARKing, p-hacking, or even outright falsification in response to current incentives will continue. I have three responses to this worry. First, incentives are (admittedly slowly) changing, as shown by this collection of job ads requiring open science statements, updated open science requirements by key funders such as the ERC, and the Recognition and Rewards initiative in the Netherlands. With such incentives in place, the move towards changed practices will not grind to a halt just because not everyone jumps on board right away with everything they have.

Second, one-size-fits-all approaches basically never work (happy to hear your counterexample that is not breathing air…). For starters, as I mentioned before, not all researchers operate within the specific theory-testing framework for which many practices were developed, or generate the type of data that is easily shareable. Asking them to squeeze themselves into a mold might actually harm science, as diverse approaches are beneficial if we care about expanding knowledge.

Third, new targets to manipulate (e.g. replacing citation counts and significant p-values with open science badges without any quality control) might make the situation even worse. Indeed, “open-washing” is a new term for innocently or deliberately mimicking open science practices without actually increasing transparency.

For me, the goal of open science practices such as open data and preregistration is to keep ourselves honest. This does not mean that certain practices, such as data exploration, have to stop; it means that every decision should be clear to everyone, not just the final outcome. Storing everything in the experimenter’s head is not very efficient, because, let’s face it, even the experimenter is just human and will forget things or change the story in their mind. You’ll notice this when trying to dig up old data and figure out which of the many saved versions was the one the paper was actually based on. So, be nice to future you and start somewhere…

Image credit: Igor Ovsyannykov via pexels.com, remixed with Open Science Badges retrieved from OSF.io

License: CC BY-SA 4.0