White label: the growth of bioRxiv

bioRxiv, the preprint server for biology, recently turned 2 years old. This seems a good point to take a look at how bioRxiv has developed over this time and to discuss any concerns sceptical people may have about using the service.

Firstly, thanks to Richard Sever (@cshperspectives) for posting the data below. The first plot shows the number of new preprints deposited and the number that were revised, per month since bioRxiv opened in Nov 2013. There are now about 200 preprints being deposited per month and this number will continue to increase. The cumulative article count (of new preprints) shows that, as of the end of last month, there are >2500 preprints deposited at bioRxiv. overall2

subject2

What is take up like across biology? To look at this, the number of articles in different subject categories can be totted up. Evolutionary Biology, Bioinformatics and Genomics/Genetics are the front-running disciplines. Obviously counting articles should be corrected for the size of these fields, but it’s clear that some large disciplines have not adopted preprinting in the same way. Cell biology, my own field, has some catching up to do. It’s likely that this reflects cultures within different fields. For example, genomics has a rich history of data deposition, sharing and openness. Other fields, less so…

So what are we waiting for?

I’d recommend that people wondering about preprinting go and read Stephen Curry’s post “just do it“. Any people who remain sceptical should keep reading…

Do I really want to deposit my best work on bioRxiv?

I’ve picked six preprints that were deposited in 2015. This selection demonstrates how important work is appearing first at bioRxiv and is being downloaded thousands of times before the papers appear in the pages of scientific journals.

  1. Accelerating scientific publishing in biology. A preprint about preprinting from Ron Vale, subsequently published in PNAS.
  2. Analysis of protein-coding genetic variation in 60,706 humans. A preprint summarising a huge effort from ExAC Exome Aggregation Consortium. 12,366 views, 4,534 downloads.
  3. TP53 copy number expansion correlates with the evolution of increased body size and an enhanced DNA damage response in elephants. This preprint was all over the news, e.g. Science.
  4. Sampling the conformational space of the catalytic subunit of human γ-secretase. CryoEM is the hottest technique in biology right now. Sjors Scheres’ group have been at the forefront of this revolution. This paper is now out in eLife.
  5. The genome of the tardigrade Hypsibius dujardini. The recent controversy over horizontal gene transfer in Tardigrades was rapidfire thanks to preprinting.
  6. CRISPR with independent transgenes is a safe and robust alternative to autonomous gene drives in basic research. This preprint concerning biosafety of CRISPR/Cas technology could be accessed immediately thanks to preprinting.

But many journals consider preprints to be previous publications!

Wrong. It is true that some journals have yet to change their policy, but the majority – including Nature, Cell and Science – are happy to consider manuscripts that have been preprinted. There are many examples of biology preprints that went on to be published in Nature (ancient genomes) and Science (hotspots in birds). If you are worried about whether the journal you want to submit your work to will allow preprinting, check this page first or the SHERPA/RoMEO resource. The journal “information to authors” page should have a statement about this, but you can always ask the Editor.

I’m going to get scooped

Preprints establish priority. It isn’t possible to be scooped if you deposit a preprint that is time-stamped showing that you were the first. The alternative is to send it to a journal where no record will exist that you submitted it if the paper is rejected, or sometimes even if they end up publishing it (see discussion here). Personally, I feel that the fear of scooping in science is overblown. In fields that are so hot that papers are coming out really fast the fear of scooping is high, everyone sees the work if its on bioRxiv or elsewhere – who was first is clear to all. Think of it this way: depositing a preprint at bioRxiv is just the same as giving a talk at a meeting. Preprints mean that there is a verifiable record available to everyone.

Preprints look ugly, I don’t want people to see my paper like that.

The depositor can format their preprint however they like! Check out Christophe Leterrier’s beautifully formatted preprint, or this one from Dennis Eckmeier. Both authors made their templates available so you can follow their example (1 and 2).

Yes but does -insert name of famous scientist- deposit preprints?

Lots of high profile scientists have already used bioRxiv. David Bartel, Ewan Birney, George Church, Ray Deshaies, Jennifer Doudna, Steve Henikoff, Rudy Jaenisch, Sophien Kamoun, Eric Karsenti, Maria Leptin, Rong Li, Andrew Murray, Pam Silver, Bruce Stillman, Leslie Vosshall and many more. Some sceptical people may find this argument compelling.

I know how publishing works now and I don’t want to disrupt the status quo

It’s paradoxical how science is all about pushing the frontiers, yet when it comes to publishing, scientists are incredibly conservative. Physics and Mathematics have been using preprinting as part of the standard route to publication for decades and so adoption by biology is nothing unusual and actually, we will simply be catching up. One vision for the future of scientific publishing is that we will deposit preprints and then journals will search out the best work from the server to highlight in their pages. The journals that will do this are called “overlay journals”. Sounds crazy? It’s already happening in Mathematics. Terry Tao, a Fields medal-winning mathematician recently deposited a solution to the Erdos discrepency problem on arXiv (he actually put them on his blog first). This was then “published” in Discrete Analysis, an overlay journal. Read about this here.

Disclaimer: other preprint services are available. F1000 Research, PeerJ Preprints and of course arXiv itself has quantitative biology section. My lab have deposited work at bioRxiv (1, 2 and 3) and I am an affiliate for the service, which means I check preprints before they go online.

Edit 14/12/15 07:13 put the scientists in alphabetical order. Added a part about scooping.

The post title comes from the term “white label” which is used for promotional vinyl copies of records ahead of their official release.

All This And More

I was looking at the latest issue of Cell and marvelling at how many authors there are on each paper. It’s no secret that the raison d’être of Cell is to publish the “last word” on a topic (although whether it fulfils that objective is debatable). Definitive work needs to be comprehensive. So it follows that this means lots of techniques and ergo lots of authors. This means it is even more impressive when a dual author paper turns up in the table of contents for Cell. Anyway, I got to thinking: has it always been the case that Cell papers have lots of authors and if not, when did that change?

I downloaded the data for all articles published by Cell (and for comparison, J Cell Biol) from Scopus. The records required a bit of cleaning. For example, SnapShot papers needed to be removed and also the odd obituary etc. had been misclassified as an article. These could be quickly removed. I then went back through and filtered out ‘articles’ that were less than three pages as I think it is not possible for a paper to be two pages or fewer in length. The data could be loaded into IgorPro and boxplots generated per year to show how author number varied over time. Reviews that are misclassified as Articles will still be in the dataset, but I figured these would be minimal.

Authors1First off: Yes, there are more authors on average for a Cell paper versus a J Cell Biol paper. What is interesting is that both journals had similar numbers of authors when Cell was born (1974) and they crept up together until the early 2000s, when the number of Cell authors kept increasing, or JCell Biol flattened off, whichever way you look at it.

I think the overall trend to more authors is because understanding biology has increasingly required multiple approaches and the bar for evidence seems to be getting higher over time. The initial creep to more authors (1974-2000) might be due to a cultural change where people (technicians/students/women) began to get proper credit for their contributions. However, this doesn’t explain the divergence between J Cell Biol and Cell in recent years. One possibility is Cell takes more non-cell biology papers and that these papers necessarily have more authors. For example, the polar bear genome was published in Cell (29 authors), and this sort of paper would not appear in J Cell Biol. Another possibility is that J Cell Biol has a shorter and stricter revision procedure, which means that multiple rounds of revision, collecting new techniques and new authors is more limited than it is at Cell. Any other ideas?

AuthorI also quickly checked whether more authors means more citations, but found no evidence for such a relationship. For papers published in the years 2000-2004, the median citation number for papers with 1-10 authors was pretty constant for J Cell Biol. For Cell, these data mere more noisy. Three-author papers tended to be cited a bit more than those with two authors, but then four author papers were also lower.

The number of authors on papers from our lab ranges from 2-9 and median is 3.5. This would put an average paper from our lab in the bottom quartile for JCB and in the lower 10% for Cell in 2013. Ironically, our 9 author paper (an outlier) was published in J Cell Biol. Maybe we need to get more authors on our papers before we can start troubling Cell with our manuscripts…


The Post title is taken from ‘All This and More’ by The Wedding Present from their LP George Best.