Waiting to happen II: Publication lag times

Following on from the last post about publication lag times at cell biology journals, I went ahead and crunched the numbers for all journals in PubMed for one year (2013). Before we dive into the numbers, a couple of points about this kind of information.

  1. Some journals “reset the clock” on the received date with manuscripts that are resubmitted. This makes comparisons difficult.
  2. The length of publication lag is not necessarily a reflection of the way the journal operates. As this comment points out, manuscripts are out of the journal’s hands (with the reviewers) for a substantial fraction of the time.
  3. The dataset is incomplete because the deposition of this information is not mandatory. About 1/3 of papers have the date information deposited (see below).
  4. Publication lag times go hand-in-hand with peer review. Moving to preprints and post-publication review would eradicate these delays.

Thanks for all the feedback on my last post, particularly those that highlighted the points above.

To see how all this was done, check out the Methods bit below, where you can download the full summary. I ended up with a list of publication lag times for 428500 papers published in 2013 (see left). To make a bit more sense of this, I split them by journal and then found the publication lag time stats for each. This had to be done per journal since PLoS ONE alone makes up 45560 of the records.

To try and visualise what these publication lag times look like for all journals, I made a histogram of the Median lag times for all journals using a 10 d bin width. It takes on average ~100 d to go from Received to Accepted and a further ~120 d to go from Accepted to Published. The whole process on average takes 239 days.

To get a feel for the variability in these numbers I plotted out the ranked Median times for each journal and overlaid Q25 and Q75 (dots). The IQR for some of the slower journals was >150 d. So the papers that they publish can have very different fates.
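As a rough illustration of the per-journal statistics described here, the sketch below groups lag times by journal and computes the median, Q25, Q75 and IQR. It is written in Python and is not the script used for this post. The function name, the input format and the quartile method (`method="inclusive"`) are assumptions, and the example numbers are made up. The cutoff of 5 articles is the one given in the Methods.

```python
from collections import defaultdict
from statistics import quantiles

MIN_ARTICLES = 5  # threshold stated in the Methods for calculating stats


def journal_stats(records):
    """Summarise lag times per journal.

    records: iterable of (journal_name, lag_in_days) pairs (assumed format).
    Returns {journal: {"n", "median", "q25", "q75", "iqr"}}.
    """
    by_journal = defaultdict(list)
    for journal, lag in records:
        by_journal[journal].append(lag)

    stats = {}
    for journal, lags in by_journal.items():
        if len(lags) < MIN_ARTICLES:
            continue  # too few articles for meaningful quartiles
        # Assumption: quartiles by linear interpolation over the observed data.
        q25, q50, q75 = quantiles(lags, n=4, method="inclusive")
        stats[journal] = {
            "n": len(lags),
            "median": q50,
            "q25": q25,
            "q75": q75,
            "iqr": q75 - q25,
        }
    return stats


# Made-up example: one journal with five papers, one with only four.
example = [("Journal A", d) for d in (100, 60, 140, 80, 120)]
example += [("Journal B", 50)] * 4
summary = journal_stats(example)
print(summary)
```

Journal B is skipped because it has fewer than 5 articles, which is why some entries in the downloadable table are blank.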

Is the publication lag time longer at higher tier journals? To look at this, I used the Rec-Acc time and the 2013 Journal Impact Factor (JIF) which, although widely derided and flawed, does correlate loosely with journal prestige. I have fewer journals in this dataset, because the lookup of JIFs didn’t find every journal in my starting set, either because the journal doesn’t have one or because there were minor differences between the PubMed name and the Thomson-Reuters name. The median of the median Rec-Acc times for each bin is shown. So on average, journals with a JIF <1 will take 1 month longer to accept your paper than journals with a JIF ranging from 1-10. After this it rises again, to ~2 months longer at journals with a JIF over 10. Why? Perhaps at the lower end, the trouble is finding reviewers; whereas at the higher end, multiple rounds of review might become a problem.
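The "median of the medians" per JIF bin can be sketched as follows. This is a Python illustration only, not the code behind the figure. The bin edges, bin labels and the example values are hypothetical, since the post does not list the exact bins used.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import median

# Hypothetical bins: JIF < 1, 1 <= JIF < 10, JIF >= 10.
BIN_EDGES = [1, 10]
BIN_LABELS = ["<1", "1 to <10", "10+"]


def median_of_medians_by_jif(journals):
    """journals: iterable of (jif, median_rec_acc_days), one pair per journal."""
    binned = defaultdict(list)
    for jif, med in journals:
        # bisect_right counts how many edges are <= jif, which is the bin index.
        label = BIN_LABELS[bisect_right(BIN_EDGES, jif)]
        binned[label].append(med)
    # One summary value per bin: the median of the journal medians in that bin.
    return {label: median(values) for label, values in binned.items()}


# Made-up values, for illustration only.
toy = [(0.5, 130), (0.8, 120), (2.0, 90), (5.0, 100), (9.9, 95), (12.0, 150), (30.0, 160)]
by_bin = median_of_medians_by_jif(toy)
print(by_bin)
```

Taking the median of journal medians gives each journal equal weight, so a very large journal does not dominate its bin.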

The executive summary is below. These are the times (in days) for delays at all journals in PubMed for 2013.

Interval                 Median   Q25   Q75
Received-to-Accepted         97    69   136
Accepted-to-Published       122    84   186
Received-to-Published       239   178   319

For comparison:

  1. Median time from ovulation to birth of a human being is 268 days.
  2. Mark Beaumont cycled around the world (29,446 km) in 194 days.
  3. Ellen MacArthur circumnavigated the globe single-handed in 72 days.

On the whole it seems that publishing in Cell Biology is quite slow compared to the whole of PubMed. Why this is the case is a tricky question. Is it because cell biologists submit papers too early and they need more revision? Are they more dogged in sending back rejected manuscripts? Is it because as a community we review too harshly and/or ask too much of the authors? Do Editors allow too many rounds of revision or not give clear guidance to expedite the time from Received-to-Accepted? It’s probably a combination of all of these factors and we’re all to blame.

Finally, this amusing tweet to show the transparency of EMBO J publication timelines raises the question: would these authors have been better off just sending the paper somewhere else?

Methods: I searched PubMed using journal article[pt] AND ("2013/01/01"[PDAT] : "2013/12/31"[PDAT]). This gave a huge xml file (~16 GB), which nokogiri balked at, so I divided the query up into subranges of those dates (1.4 GB) and ran the script on all the xml files. This gave 1425643 records. I removed records that did not have a received date or that had a value greater than 12 in the month field (leaving 428513 records). 13 of these records did not have a journal name, which left 428500 records from 3301 journals. I also filtered out negative values (papers accepted before they were received) and a couple of outliers (e.g. 6000 days!). With a bit of code it was quite straightforward to extract simple statistics for each of the journals. You can download the data here to look up the information for a journal of your choice (wordpress only allows xls, not txt/csv). The fields show the journal name and the number of valid articles. Then for Acc-Pub, Rec-Acc and Rec-Pub, the number of articles and the median, lower quartile and upper quartile times in days are given. I set a limit of 5 or more articles for calculation of the stats. Blank entries are where there was no valid data. Note that there are some differences from the table in my last post. This is because for that analysis I used a bigger date range and then filtered the year based on the published field. Here my search started out by specifying PDAT, which is slightly different.
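The extraction and filtering steps above can be sketched roughly as follows. This is a Python stand-in, not the nokogiri (Ruby) script used for the post. It assumes the standard PubMed XML layout, in which each PubmedArticle carries History dates as PubMedPubDate elements with a PubStatus attribute. The helper names, the choice of the Journal/Title element, the outlier cutoff and the sample records are all hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

MAX_LAG_DAYS = 3000  # hypothetical outlier cutoff; the post gives no exact threshold


def history_date(article, status):
    """Return the History date with the given PubStatus, or None if unusable."""
    node = article.find(f".//PubMedPubDate[@PubStatus='{status}']")
    if node is None:
        return None
    try:
        return date(
            int(node.findtext("Year")),
            int(node.findtext("Month")),
            int(node.findtext("Day")),
        )
    except (TypeError, ValueError):
        # Missing field, non-numeric text, or an impossible date (e.g. month 13).
        return None


def rec_acc_lags(xml_text):
    """Return (journal, received-to-accepted days) for every valid record."""
    rows = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        journal = article.findtext(".//Journal/Title")
        received = history_date(article, "received")
        accepted = history_date(article, "accepted")
        if not journal or received is None or accepted is None:
            continue  # no journal name or no usable dates
        lag = (accepted - received).days
        if 0 <= lag <= MAX_LAG_DAYS:  # drop negative lags and extreme outliers
            rows.append((journal, lag))
    return rows


def _sample_article(journal, **dates):
    """Build a minimal, made-up PubmedArticle record for the demo below."""
    history = "".join(
        f'<PubMedPubDate PubStatus="{status}"><Year>{y}</Year>'
        f"<Month>{m}</Month><Day>{d}</Day></PubMedPubDate>"
        for status, (y, m, d) in dates.items()
    )
    title = f"<Title>{journal}</Title>" if journal else ""
    return (
        f"<PubmedArticle><MedlineCitation><Article><Journal>{title}</Journal>"
        f"</Article></MedlineCitation><PubmedData><History>{history}</History>"
        f"</PubmedData></PubmedArticle>"
    )


SAMPLE = "<PubmedArticleSet>" + "".join([
    _sample_article("Example Journal A", received=(2013, 1, 10), accepted=(2013, 4, 20)),
    _sample_article("Example Journal A", accepted=(2013, 4, 20)),  # no received date
    _sample_article("Example Journal B", received=(2013, 13, 1), accepted=(2013, 5, 1)),  # month > 12
    _sample_article("Example Journal B", received=(2013, 6, 1), accepted=(2013, 5, 1)),  # negative lag
    _sample_article("", received=(2013, 1, 1), accepted=(2013, 2, 1)),  # no journal name
]) + "</PubmedArticleSet>"

lags = rec_acc_lags(SAMPLE)
print(lags)
```

Only the first sample record survives the filters. For files of a gigabyte or more, a streaming parser such as `xml.etree.ElementTree.iterparse` would probably be a better fit than loading the whole document at once.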

The data are OK, but the publication date needs to be taken with a pinch of salt. For many records it was missing a month or day, so the date used for some records is approximate. In retrospect, using the Entrez date or one of the other required fields would probably have been better. I liked the idea of the publication date as this is when the paper finally appears in print, which still represents a significant delay at some journals. The Received-to-Accepted dates are valid though.

0 thoughts on “Waiting to happen II: Publication lag times”

  1. Excellent work Steve. Part of the extra lag for Cell Biol papers perhaps lies in the time needed for additional experiments. Several other fields are either binary (accept/reject) or require changes in text rather than in data. Cell biology (and others, in particular, genetics) papers often require additional experimental data. For one manuscript I submitted with a mouse conditional KO (actually had 4 KOs compared) the reviewers wanted data from an additional Cre for comparison which would have required repeating *all* of the data. Not surprisingly, we moved on. Beats me that the editors (of a society journal) accepted the reviewers and recommended revision as the two were incompatible to anyone who knows anything about how to do research. But editors tend to prefer not to reject.

    1. Yes, we had a similar experience. We got a clear cut reject from a one-word-title journal replete with one completely outrageous referee’s report that I was embarrassed to show to the person in my lab who had done all the work. I wrote to the Editor to say that I thought this referee had overstepped the line. The Editor responded that they were happy to see a revised version!?!? I wasn’t even appealing the decision! I can see the temptation in continuing to pursue a half-open door (or even a closed and bolted one), but I think it’s better for the trainees to get their paper out promptly, even if it’s somewhere else.

  2. This is great – there was a table of a few journals circulating over summer, but nothing with any corroborating info and misplaced by the time I wanted to look back at it. Reuploaded the Excel file to GitHub as tsv to filter for specific journal(s) in the browser, hope that’s ok

    All: Steve-Royle-pub-lag-times.tsv
    Overall Rec-Pub summary: Steve-Royle-pub-lag-times-rp-only.tsv
    Overall Rec-Pub, ordered by median: Steve-Royle-pub-lag-times-rp-median-descending.tsv

    There’s a journal abbreviation dictionary here for the MEDLINE-defined list if anyone else is querying specific journals in that search box

  3. At PeerJ we make the date stamps of every peer-review event fully public (including the number of peer reviews an article receives in each round). Using this data, you can see that we get first decisions to authors in a median of 22 days. We also operate an (un-peer reviewed) preprint server, which allows findings to go online on the same day.

    In addition to providing a full audit trail of the dates and numbers of peer-reviews, we also encourage peer-reviewers to name themselves (>40% do) and our authors are given the option of making their peer-review comments and version history fully public (>80% do). Some data on this is at: http://blog.peerj.com/post/100580518238/whos-afraid-of-open-peer-review

    In our opinion, if a journal has nothing to be ashamed of, then it should make its peer review data accessible and public. The community pays a lot of money for this ‘service’ so they deserve to see what their money pays for in various different venues.

    1. Thanks for the comment Tom. There are many journals missing. Those are the ones that don’t submit their date information to PubMed (so nothing is calculated for them). I think the date fields are not an absolute requirement for deposition. Whether or not journals deposit this info changes over time.
