Our Kickstarter project #barbarplots reached its funding goal and will thus become reality! In the 30-day campaign, 173 backers pledged a total of 3,479 Euro to send #barbarplots t-shirts to editors of major scientific journals. We are very excited and want to thank you for the tremendous support – not only by pledging, but also by spreading the word via email, Facebook, Twitter, and by wearing and carrying tote bags and t-shirts with the following meme around the world.
Figure 1: Graphs used in original meme.
We also want to thank everyone who joined the discussion on #barbarplots during the 30 days of our campaign. These discussions happened on people’s Facebook pages, in the lab, via email, and on Twitter. So that these thoughts, scattered across the web, are not lost, I have tried to collect the main discussion points here – both critical thoughts about the campaign itself and musings on data visualization in general. Note that I borrow many snippets of the answers from our British t-shirt doctor Rory, who proved to be the most eloquent #barbarplots spokesman not only in our video but also in real life.
Before going into it, I want to draw your attention to some of the shoulders this campaign is standing on, for instance Weissgerber et al. (2009) and Allen et al. (2012), who started the plotting revolution much earlier.
So off we go – I’ll start with perhaps the most obvious critical remark about our campaign.
But barplots can be a good way to plot!
Sure. That’s why we used the following barplot for the cost breakdown of our Kickstarter project: it demonstrates that count data with a meaningful zero and no distributional properties can be wonderfully represented by a barplot.
Fig. 2: Nice example of a barplot.
This explanation, however, quite readily leads to critique number 2:
But why do you say #barbarplots if you do not actually mean it??
Some people pointed out that our hashtag #barbarplots, our slogan “Friends don’t let friends make barplots” and our campaign video with the cats-and-dogs example were catchy but partially misleading.
That’s true. But we made a conscious decision in favor of catchiness and simplicity in order to increase the likelihood people would raise their head and smile and share. Researchers take their work seriously and details are important, but they’re also humans with an often already overstrained attention span. And they’re also smart enough to look beyond the (attention-catching) headline to ponder the more nuanced argument.
We are aware that some people might still be misled by the simplified version of our message, and could even religiously stop using barplots no matter what. But this is, for sure, not a problem that our slogan is causing.
And à propos catchiness, the one hilarious instance of the catchiness-type criticism that stuck in my mind most is this one:
What are cases in which barplots are not good and why?
In general, we consider barplots not to be the most informative way to represent distributional data. For example, perhaps an effect is driven entirely by a subset of participants in the experiment? Perhaps a null effect arises because some participants have a negative effect and others have a positive effect? Perhaps some items are extremely variable? Perhaps the data are very non-normal and using inferential statistics is inappropriate? And even if your data are perfectly normally distributed, wouldn’t it make your paper even more convincing if your visualization method reflected that?
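To make the point concrete, here is a minimal sketch using only the Python standard library. The two groups of effect scores are invented for illustration (they are not campaign data), and the helper name five_number_summary is our own. Both groups have the same mean, so a barplot of means would show two identical bars, while the boxplot-style summary shows that one group splits into a negative and a positive subset.

```python
from statistics import mean, quantiles

# Hypothetical effect scores, made up for illustration only.
group_a = [-1, 0, 0, 1, -1, 1, 0, 0]      # everyone close to zero
group_b = [-8, -7, -9, -8, 8, 7, 9, 8]    # half strongly negative, half strongly positive


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a boxplot is built from."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


for name, scores in (("A", group_a), ("B", group_b)):
    print(f"group {name}: mean = {mean(scores)}, summary = {five_number_summary(scores)}")
```

The means alone cannot tell the two groups apart; the quartiles can.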
Ok, so what you want to say is that we should not be comparing means?
It is true that the issue of barplots as a data visualization technique often goes together with the issue of doing statistics by comparing means. However, they are nevertheless separate issues. The focus of the campaign is really on barplots as a visualization technique. We just want to make sure that we think about the choice rather than formulaically applying one way of analysis or visualization.
So…what IS a better plot to represent distributions?
In our campaign, we promote box plots and histograms. Both tell us way more about the distribution of data than barplots, notably about the spread and skewness of the underlying data. And there’s more: Scatter plots, violin plots, swarm plots, pirate plots. Below is an example of how the same data would look with different types of plots.
Fig. 3: The R ChickWeight data relating four different diets to chicken weight, plotted six different ways. Please note that axes and colors across graphs are not equivalent (for axes, both in terms of what is represented on the axes, see histogram, and in terms of exact y axis length, see other graphs).
Though this is not the case in the above graphs, the way we plotted in our original meme (Fig. 1) led some people to say:
Your histogram has two x axes!! That is very misleading!
It is true that the histogram differs from the other plots in this aspect. Sure, we could have superimposed the distributions so that there was only one x axis (but then the plot would differ from the others in the sense that it would not be a side-by-side but an overlaid representation). We could have also rotated the histogram 90 degrees to make the y axes match (but then it would differ from the others by being vertically as opposed to horizontally aligned). In short: Yes, there would have been other ways to do it, but it is not obvious that any of them would have been better. Also, we do not want to promote boxplots or histograms as a single alternative to barplots (see above) – all we want is for us to think about our choices.
Still, we want to emphasize that a barplot is often not the most informative alternative for distributional data. That leads me to the last frequent point of criticism:
Isn’t a plot supposed to be an abstraction of the data and a means to present them in the simplest and most understandable way possible?
To that, I would first want to state the obvious: simple isn’t always good. Take interpreting everything below p = .05 as a victory and everything above as an epic failure. It’s not good, it’s not right, and part of why it feels simple is certainly convention and habit.
But still, a summary/abstraction/simplification of the raw data cannot be a bad idea to bring your point across? This question gets at the heart of what visualization is for and what scientific publishing is supposed to do. If the visualization is to be as simple as possible to get the most basic point across, maybe just showing means is fine. (To be honest, though, if you’re just reporting two means, don’t waste space with a graph, just put that in the text.) However, it’s conventional for us to go into a lot of obsessive detail in our reporting of methods and statistical analyses – so why not with our graphs too? Showing or summarizing the distribution can be a useful way to put your cards on the table for readers to see that you are not trying to hide or obfuscate any unusual outliers or strange trends.
But of course, if you’re giving a 10-minute talk in front of an audience that is used to barplots, walking the audience through a less familiar graph may require more time than you have.
Luckily, there are many compromises that can be made – for example, plot a grand mean, along with (less visually prominent) subject means, so that the variability in the data is more apparent. Or a boxplot, which is visually less dense than other alternatives to barplots.
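As a rough sketch of the first compromise, the numbers behind such a plot could be computed as below. The subject labels and reaction times are hypothetical, invented purely for illustration; the grand mean is taken here as the mean of the subject means, which is one reasonable choice among several.

```python
from statistics import mean

# Hypothetical reaction times (ms) per subject, invented for illustration.
trials_by_subject = {
    "s1": [410, 430, 420],
    "s2": [510, 490, 500],
    "s3": [300, 320, 310],
}

# One mean per subject: these would be the small, less prominent markers.
subject_means = {subject: mean(times) for subject, times in trials_by_subject.items()}

# Mean of the subject means: this would be the single prominent marker.
grand_mean = mean(list(subject_means.values()))

print(f"GRAND MEAN: {grand_mean} ms")
for subject, value in subject_means.items():
    print(f"  {subject}: {value} ms")
```

Drawn together, the prominent grand mean keeps the slide readable, while the subject means show how much individual participants vary around it.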
Thank you again to all of our backers, tweeters, and reposters. We’ve really enjoyed working on this campaign, from filming the video to group discussions on and off social media. The campaign is by no means over though, so look for updates as we complete our aim to send out t-shirts to journal editors and continue the discussion. Happy plotting!
Your #barbarplots team,
Page Piccinini @pageinini,
Christina Bergmann @chbergma,
Sho Tsuji @CogTalesTweets,
Alexander Martin,
Rory Turnbull,
Adriana Guevara Rukoz,
M. Julia Carbajal