Waiting to Happen: Publication lag times in Cell Biology Journals

My interest in publication lag times continues. Previous posts have looked at how long it takes my lab to publish our work, how often trainees publish, and at the very long lag times at Oncogene. I recently read a blog post on automated calculation of publication lag times for Bioinformatics journals. I thought it would be great to do this for Cell Biology journals too. Hopefully people will find it useful and can use this list when thinking about where to send their paper.

What is publication lag time?

If you are reading this, you probably know how science publication works. Feel free to skip. Otherwise, it goes something like this. After writing up your work for publication, you submit it to a journal. Assuming that this journal will eventually publish the paper (there is usually a period of submitting, getting rejected, resubmitting to a different journal etc.), they receive the paper on a certain date. They send it out to review, they collate the reviews and send back a decision, you (almost always) revise your paper further and then send it back. This can happen several times. At some point it gets accepted on a certain date. The journal then prepares the paper for publication in a scheduled issue on a specific date (they can also immediately post papers online without formatting). All of these steps add significant delays. It typically takes 9 months to publish a paper in the biomedical sciences. In 2015 this sounds very silly, when world-wide dissemination of information is as simple as a few clicks on a trackpad. The bigger problem is that we rely on papers as a currency to get jobs or funding and so these delays can be more than just a frustration, they can affect your ability to actually do more science.

The good news is that it is very straightforward to parse the received, accepted and published dates from PubMed. So we can easily calculate the publication lags for cell biology journals. If you don’t work in cell biology, just follow the instructions below to make your own list.

The bad news is that the deposition of the date information in PubMed depends on the journal. The extra bad news is that three of the major cell biology journals do not deposit their data: J Cell Biol, Mol Biol Cell and J Cell Sci. My original plan was to compare these three journals with Traffic, Nat Cell Biol and Dev Cell. Instead, I extended the list to include other journals which take non-cell biology papers (and deposit their data).

[Figure: LagTimes1]

A summary of the last ten years

Three sets of box plots here show the publication lags for eight journals that take cell biology papers. The journals are Cell, Cell Stem Cell, Current Biology, Developmental Cell, EMBO Journal, Nature Cell Biology, Nature Methods and Traffic (see note at the end about eLife). They are shown in alphabetical order. The box plots show the median and the IQR; whiskers show the 10th and 90th percentiles. The three plots show the time from Received-to-Published (Rec-Pub), and then a breakdown of this time into Received-to-Accepted (Rec-Acc) and Accepted-to-Published (Acc-Pub). The colours are just to make it easier to tell the journals apart and don’t have any significance.

You can see from these plots that the journals differ widely in the time it takes to publish a paper there. Current Biology is very fast, whereas Cell Stem Cell is relatively slow. The time it takes the journals to move papers from acceptance to publication is pretty constant, apart from Traffic, where it takes an average of ~3 months to get something into print. Remember that the paper is often online for this period so this is not necessarily a bad thing. I was not surprised that Current Biology was the fastest. At this journal, a presubmission inquiry is required and the referees are often lined up in advance. The staff are keen to publish rapidly, hence the name, Current Biology. I was amazed at Nature Cell Biology having such a short time from Received-to-Accepted. The delay from Received-to-Accepted comes from multiple rounds of revision and from doing extra experimental work. Anecdotally, it seems that the review at Nature Cell Biol should be just as lengthy as at Dev Cell or EMBO J. I wonder if the received date is accurate… it is possible to massage this date by first rejecting the paper, but allowing a resubmission, and then using the resubmission date as the received date [Edit: see below]. One way to legitimately limit this delay is to only allow a certain time for revisions and only allow one round of corrections. This is what happens at J Cell Biol; unfortunately we don’t have the data to see how effective this is.

[Figure: lagtimes2]

How has the lag time changed over the last ten years?

Have the slow journals always been slow? When did they become slow? Again, three plots are shown (side-by-side) depicting the Rec-Pub time and then the Rec-Acc and Acc-Pub times. Now the intensity of red or blue shows the data for each year (2014 is the most intense colour). Again, you can see that the dataset is not complete, with missing date information for Traffic for many years, for example.

Interestingly, the publication lag has been pretty constant for some journals but not others. Cell Stem Cell and Dev Cell (but not the mothership, Cell) have seen increases, as have Nature Cell Biology and Nature Methods. On the whole, Acc-Pub times are stable, except for Nature Methods, which is the only journal in the list to see an increase over the time period. This just leaves us with the task of drawing up a ranked list of the fastest to the slowest journal. Then we can see which of these journals is likely to delay dissemination of our work the most.

The median times (in days) for 2013 are below. The journals are ranked in order of fastest to slowest for Received-to-Published. I had to use 2013 because EMBO J is missing data for 2014.

Journal            Rec-Pub   Rec-Acc   Acc-Pub
Curr Biol          159       99.5      56
Nat Methods        192       125       68
Cell               195       169       35
EMBO J             203       142       61
Nature Cell Biol   237       180       59
Traffic            244       161       86
Dev Cell           247       204       43
Cell Stem Cell     284       205       66

You’ll see that only Cell Stem Cell is over the threshold where it would be faster to conceive and give birth to a human being than to publish a paper there (on average). If the additional time wasted in submitting your manuscript to other journals is factored in, it is likely that most papers are at least on a par with the median gestation time.

If you are wondering why eLife is missing… as a new journal it didn’t have ten years worth of data to analyse. It did have a reasonably complete set for 2013 (but Rec-Acc only). The median time was 89 days, beating Current Biology by 10.5 days.

Methods

Please check out Neil Saunders’ post on how to do this. I did a PubMed search for (journal1[ta] OR journal2[ta] OR ...) AND journal article[pt] to make sure I didn’t get any reviews or letters etc. I limited the search from 2003 onwards to make sure I had 10 years of data for the journals that deposited it. I downloaded the file as XML and used Ruby/Nokogiri to parse it to CSV. Installing Nokogiri is reasonably straightforward, but the documentation is pretty impenetrable. The Ruby script I used was from Neil’s post (step 3) with a few lines added:


#!/usr/bin/ruby
# Parse a PubMed XML download and print one CSV line per article:
# journal, PMID, received date (Y,M,D), accepted date (Y,M,D), issue date (Y,M,D)
# usage: ruby <this script> pubmed_result.xml > dates.csv

require 'nokogiri'

f = File.open(ARGV.first)
doc = Nokogiri::XML(f)
f.close

doc.xpath("//PubmedArticle").each do |a|
  r = ["", "", "", "", "", "", "", "", "", "", ""]
  # journal abbreviation and PubMed ID
  r[0] = a.xpath("MedlineCitation/Article/Journal/ISOAbbreviation").text
  r[1] = a.xpath("MedlineCitation/PMID").text
  # received date from the publication history
  r[2] = a.xpath("PubmedData/History/PubMedPubDate[@PubStatus='received']/Year").text
  r[3] = a.xpath("PubmedData/History/PubMedPubDate[@PubStatus='received']/Month").text
  r[4] = a.xpath("PubmedData/History/PubMedPubDate[@PubStatus='received']/Day").text
  # accepted date from the publication history
  r[5] = a.xpath("PubmedData/History/PubMedPubDate[@PubStatus='accepted']/Year").text
  r[6] = a.xpath("PubmedData/History/PubMedPubDate[@PubStatus='accepted']/Month").text
  r[7] = a.xpath("PubmedData/History/PubMedPubDate[@PubStatus='accepted']/Day").text
  # issue (print publication) date -- note XPath is case-sensitive: PubDate, not Pubdate
  r[8] = a.xpath("MedlineCitation/Article/Journal/JournalIssue/PubDate/Year").text
  r[9] = a.xpath("MedlineCitation/Article/Journal/JournalIssue/PubDate/Month").text
  r[10] = a.xpath("MedlineCitation/Article/Journal/JournalIssue/PubDate/Day").text
  puts r.join(",")
end

and then executed as described. The CSV could then be imported into IgorPro and processed. Neil’s post describes a workflow for R, or you could use Excel or whatever at this point. As he notes, quite a few records are missing the date information and some of it is wrong, i.e. papers apparently published before they were accepted. These need to be cleaned up. The other problem is that the month is sometimes an integer and sometimes a three-letter code. He uses lubridate in R to get around this; a loop-replace in Igor is easy to construct and even Excel can handle it with an IF statement, e.g. IF(LEN(G2)=3,MONTH(1&LEFT(G2,3)),G2) if the month is in G2. Good luck!
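If you’d rather stay in Ruby than switch to R or Excel at this point, here is a rough sketch (not part of the original workflow) of how the clean-up and lag calculation could look. It assumes the CSV column order produced by the script above and simply skips records with missing dates or impossible orderings (e.g. published before accepted).

#!/usr/bin/ruby
# A rough clean-up sketch: read the CSV produced by the script above and print
# journal, PMID and the three lag times (Rec-Pub, Rec-Acc, Acc-Pub) in days.
require 'csv'
require 'date'

MONTHS = %w[jan feb mar apr may jun jul aug sep oct nov dec]

# month fields can be an integer ("3") or a three-letter code ("Mar")
def month_to_i(m)
  return nil if m.nil? || m.empty?
  idx = MONTHS.index(m.downcase[0, 3])
  idx ? idx + 1 : m.to_i
end

def make_date(y, m, d)
  return nil if y.nil? || y.empty?
  Date.new(y.to_i, month_to_i(m) || 1, (d.nil? || d.empty?) ? 1 : d.to_i)
rescue ArgumentError
  nil
end

CSV.foreach(ARGV.first) do |row|
  journal, pmid = row[0], row[1]
  received  = make_date(row[2], row[3], row[4])
  accepted  = make_date(row[5], row[6], row[7])
  published = make_date(row[8], row[9], row[10])
  # skip incomplete or impossible records (e.g. published before accepted)
  next unless received && accepted && published
  next if accepted < received || published < accepted
  puts [journal, pmid, (published - received).to_i,
        (accepted - received).to_i, (published - accepted).to_i].join(",")
end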

Edit 9/3/15 @ 17:17 several people (including Deborah Sweet and Bernd Pulverer from Cell Press/Cell Stem Cell and EMBO, respectively) have confirmed via Twitter that some journals use the date of resubmission as the submitted date. Cell Stem Cell and EMBO journals use the real dates. There is no way to tell (from the deposited data) whether a journal does this or not. Stuart Cantrill from Nature Chemistry pointed out that his journal does declare that they sometimes reset the clock. I’m not sure about other journals. My own feeling is that – for full transparency – journals should 1) record the actual dates of submission, acceptance and publication, and 2) deposit them in PubMed and add them to the paper. As pointed out by Jim Woodgett, scientists want the actual dates on their paper, partly because they are the real dates, but also to claim priority in certain cases. There is a conflict here, because journals might appear inefficient if they have long publication lag times. I think this should be an incentive for Editors to simplify revisions by giving clear guidance and limiting successive revision cycles. (This Edit was corrected 10/3/15 @ 11:04).

The post title is taken from “Waiting to Happen” by Super Furry Animals from the “Something 4 The Weekend” single.

If and When: publishing and productivity in the lab

I thought I’d share this piece of analysis looking at productivity of people in the lab. Here, productivity means publishing papers. This is unfortunate since some people in my lab have made some great contributions to other people’s projects or have generally got something going, but these haven’t necessarily transferred into print. Also, the projects people have been involved in have varied in toughness. I’ve had students on an 8-week rotation who just collected some data which went straight into a paper, and I’ve had postdocs toil for two years trying to purify a protein complex… I wasn’t looking to single out who was the most productive person (I knew who that was already), but I was interested to look at other things, e.g. how long is it on average from someone joining the lab to them publishing their first paper?

The information below would be really useful if it was available for all labs. When trainees are looking for jobs, it would be worth knowing the productivity of a given lab. This can be very hard to discern, since it is difficult to see how many people have worked in the lab and for how long. Often all you have to go on is the PubMed record of the PI. Two papers published per year in the field of cell biology is a fantastic output, but not if you have a lab of thirty people. How likely are you – as a future lab member – to publish your own 1st author paper? This would be very handy to know before applying to a lab.

I extracted the date of online publication for all of our papers as well as author position and other details. I had a record of start and end dates for everyone in the lab. Although as I write this, I realise that I’ve left one person off by mistake. All of this could be dumped into IgorPro and I wrote some code to array the data in a plot vs time. People are anonymous – they’ll know who they are, if they’re reading. Also we have one paper which is close to being accepted so I included this although it is not in press yet.

[Figure: RoylePapers1]

The first plot shows when people joined the lab and how long they stayed. Each person has been colour-coded according to their position. The lines represent their time spent in the lab. Some post-graduates (PG) came as a masters student for a rotation and then came back for a PhD and hence have a broken line. Publications are shown by markers according to when a paper featuring that person was published online. There’s a key to indicate a paper versus review/methods paper and if the person was 1st author or not. We have published two papers that I would call collaborative, i.e. a minor component from our lab. Not shown here are the publications that are authored by me but don’t feature anyone else working in the lab.

This plot starts when I got my first independent position. As you can see, it was ~1 year until I was able to recruit my first tech. It was almost another 2 years before we published our first paper. Our second one took almost another 2 years! What is not factored in here is the time spent waiting to get something published – see here. The early part of getting a lab going is tough; however, you can see that once we were up-and-running the papers came out more quickly.

[Figure: RoylePapers2]

In the second plot, I offset the traces to show duration in the lab and relative time to publication from the start date in the lab. I also grouped people according to their position and ranked them by duration in the lab. This plot is clearer for comparing publication rates and lag to getting the first paper etc. This plot shows quite nicely that lots of people from the lab publish “posthumously”. This is thanks to the publication lag but also to things not getting finished or results that needed further experiments to make sense etc. Luckily the people in my lab have been well organised, which has made it possible to publish things after they’ve left.

I was surprised to see that five people published within ~1.5 years of joining the lab. However, in each case the papers came about because of some groundwork by other people.

I think the number of people and the number of papers are both too low to begin to predict how long someone will take to get their first paper out, but these plots give a sense of how tough it is and how much effort and work is required to make it into print.

Methods: To recreate this for your own lab, you just need a list of lab members with start and end dates. The rest can be extracted from PubMed. Dataviz fans may be interested that the colour scheme is taken from Paul Tol’s guide.
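The time-to-first-paper calculation itself is trivial. Here is a minimal sketch in Ruby (the actual plots were made in IgorPro), assuming a hypothetical file lab.csv with one row per person containing a name, a start_date and a semicolon-separated list of online publication dates for their first-author papers, taken from PubMed.

# A minimal sketch: days from joining the lab to the first first-author paper.
# lab.csv and its columns (name, start_date, papers) are hypothetical.
require 'csv'
require 'date'

CSV.foreach("lab.csv", headers: true) do |row|
  start  = Date.parse(row["start_date"])
  papers = (row["papers"] || "").split(";").map { |d| Date.parse(d.strip) }.sort
  if papers.empty?
    puts "#{row['name']}: no first-author papers yet"
  else
    days = (papers.first - start).to_i
    puts "#{row['name']}: first paper after #{days} days (#{(days / 365.25).round(1)} years)"
  end
end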

The post title comes from “If and When” by The dB’s from Ride The Wild Tomtom

Sticky End

We have a new paper out! You can access it here.

The work was mainly done by Cristina Gutiérrez Caballero, a post-doc in the lab. We had some help from Selena Burgess and Richard Bayliss at the University of Leicester, with whom we have an ongoing collaboration.

The paper in a nutshell

We found that TACC3 binds the plus-ends of microtubules via an interaction with ch-TOG. So TACC3 is a +TIP.

What is a +TIP?

[Figure: EBTACCMitotic]
EB3 (red) and TACC3 (green) at the tips of microtubules in the mitotic spindle

This is a term used to describe proteins that bind to the plus-ends of microtubules. Microtubules are a major component of the cell’s cytoskeleton. They are polymers of alpha/beta-tubulin that grow and shrink, a feature known as dynamic instability. A microtubule has polarity, the fast growing end is known as the plus-end, and the slower growing end is referred to as the minus-end. There are many proteins that bind to the plus-end and these are termed +TIPs.

OK, so what are TACC3 and ch-TOG?

They are two proteins found on the mitotic spindle. TACC3 is an acronym for transforming acidic coiled-coil protein 3, and ch-TOG stands for colonic and hepatic tumour overexpressed gene. As you can tell from the names, they were discovered due to their altered expression in certain human cancers. TACC3 is a well-known substrate for Aurora A kinase, which is an enzyme that is often amplified in cancer. The ch-TOG protein is thought to be a microtubule polymerase, i.e. an enzyme that helps microtubules grow. In the paper, we describe how TACC3 and ch-TOG stick together at the microtubule end. TACC3 and ch-TOG are at the very end of the microtubule; they move ahead of other +TIPs like the “end-binding proteins”, e.g. EB3.

What is the function of TACC3 as a +TIP?

We think that TACC3 is piggybacking on ch-TOG while it is acting as a polymerase, but any biological function or consequence of this piggybacking was difficult to detect. We couldn’t see any clear effect on microtubule dynamics when we removed or overexpressed TACC3. We did find that loss of TACC3 affects how cells migrate, but this is not likely to be due to a change in microtubule dynamics.

I thought TACC3 and ch-TOG were centrosomal proteins…

In the paper we look again at this and find that there are different pools of TACC3, ch-TOG and clathrin (alone and in combination) and describe how they reside in different places in the cell. Although ch-TOG is clearly at centrosomes, we don’t find TACC3 at centrosomes, although it is on microtubules that cluster near the centrosomes at the spindle pole. TACC3 is often described as a centrosomal protein in lots of other papers, but this is quite misleading.

What else?

[Figure: NeonCell]

We were on the cover – whatever that means in the digital age! We imaged a cell expressing tagged EB3 proteins; EB3 is another +TIP. We coloured consecutive frames different colours and the result looked pretty striking. Biology Open picked it as their cover, which we were really pleased about. Our paper is AOP at the moment and so hopefully they won’t change their mind by the time it appears in the next issue.

Preprinting

This is the second paper that we have deposited as a preprint at bioRxiv (not counting a third paper that we preprinted after it was accepted). I was keen to preprint this particular paper because we became aware that two other groups had similar results following a meeting last summer. Strangely, a week or so after preprinting and submitting to a journal, a paper from a completely different group appeared with a very similar finding! We’d been “scooped”. They had found that the Xenopus homologue of TACC3 was a +TIP in retinal neuronal cultures. The other group had clearly beaten us to it, having submitted their paper some time before our preprint. The reviewers of our paper complained that our data was no longer novel and our paper was rejected. This was annoying because there were lots of novel findings in our paper that weren’t in theirs (and vice versa). The reviewers did make some other constructive suggestions that we incorporated into the manuscript. We updated our preprint and then submitted to Biology Open. One advantage of the preprinting process is that the changes we made can be seen by all. Biology Open were great and took a decision based on our comments from the other journal and the changes we had made in response to them. Their decision to provisionally accept the paper was made in four days. Like our last experience publishing in Biology Open, it was very positive.

References

Gutiérrez-Caballero, C., Burgess, S.G., Bayliss, R. & Royle, S.J. (2015) TACC3-ch-TOG track the growing tips of microtubules independently of clathrin and Aurora-A phosphorylation. Biol. Open doi:10.1242/bio.201410843.

Nwagbara, B. U., Faris, A. E., Bearce, E. A., Erdogan, B., Ebbert, P. T., Evans, M. F., Rutherford, E. L., Enzenbacher, T. B. and Lowery, L. A. (2014) TACC3 is a microtubule plus end-tracking protein that promotes axon elongation and also regulates microtubule plus end dynamics in multiple embryonic cell types. Mol. Biol. Cell 25, 3350-3362.

The post title is taken from the last track on The Orb’s U.F.Orb album.

Joining A Fanclub

When I started this blog, my plan was to write about interesting papers or at least blog about the ones from my lab. This post is a bit of both.

I was recently asked to write a “Journal Club” piece for Nature Reviews Molecular Cell Biology, which is now available online. It’s paywalled unfortunately. It’s also very short, due to the format. For these reasons, I thought I’d expand a bit on the papers I highlighted.

I picked two papers from Dick McIntosh’s group, published in J Cell Biol in the early 1990s as my subject. The two papers are McDonald et al. 1992 and Mastronarde et al. 1993.

Almost everything we know about the microanatomy of mitotic spindles comes from classical electron microscopy (EM) studies. How many microtubules are there in a kinetochore fibre? How do they contact the kinetochore? These questions have been addressed by EM. McIntosh’s group in Boulder, Colorado have published so many classic papers in this area, but there are many more coming from Conly Rieder, Alexey Khodjakov, Bruce McEwen and many others. Even with the advances in light microscopy which have improved spatial resolution (resulting in a Nobel Prize last year), EM is the only way to see individual microtubules within a complex subcellular structure like the mitotic spindle. The title of the piece, Super-duper resolution imaging of mitotic microtubules, is a bit of a dig at the fact that EM still exceeds the resolution available from super-resolution light microscopy. It’s not the first time that this gag has been used, but I thought it suited the piece quite well.

There are several reasons to highlight these papers over other electron microscopy studies of mitotic spindles.

It was the first time that 3D models of microtubules in mitotic spindles were built from electron micrographs of serial sections. This allowed spatial statistical methods to be applied to understand microtubule spacing and clustering. The software that was developed by David Mastronarde to do this was later packaged into IMOD. This is a great software suite that is actively maintained, free to download and is essential for doing electron microscopy. Taking on the same analysis today would be a lot faster, but still somewhat limited by cutting sections and imaging to get the resolution required to trace individual microtubules.

[Figure: kfibre]

The paper actually showed that some of the microtubules in kinetochore fibres travel all the way from the pole to the kinetochore, and that interpolar microtubules invade the bundle occasionally. This was an open question at the time and was really only definitively answered thanks to the ability to digitise and trace individual microtubules using computational methods.

The final thing I like about these papers is that it’s possible to reproduce the analysis. The methods sections are wonderfully detailed and of course the software is available to do similar work. This is in contrast to most papers nowadays, where it is difficult to understand how the work has been done in the first place, let alone to try and reproduce it in your own lab.

David Mastronarde and Dick McIntosh kindly commented on the piece that I wrote and also Faye Nixon in my lab made some helpful suggestions. There’s no acknowledgement section, so I’ll thank them all here.

References

McDonald, K. L., O’Toole, E. T., Mastronarde, D. N. & McIntosh, J. R. (1992) Kinetochore microtubules in PTK cells. J. Cell Biol. 118, 369—383

Mastronarde, D. N., McDonald, K. L., Ding, R. & McIntosh, J. R. (1993) Interpolar spindle microtubules in PTK cells. J. Cell Biol. 123, 1475—1489

Royle, S.J. (2015) Super-duper resolution imaging of mitotic microtubules. Nat. Rev. Mol. Cell. Biol. doi:10.1038/nrm3937 Published online 05 January 2015

The post title is taken from “Joining a Fanclub” by Jellyfish from their classic second and final LP “Spilt Milk”.

A Day In The Life II

I have been doing paper of the day (#potd) again in 2014. See my previous post about this.

My “rules” for paper of the day are:

  1. Read one paper each working day.
  2. If I am away, or reviewing a paper for a journal or colleague, then I get a pass.
  3. Read it sufficiently to be able to explain it to somebody else, i.e. don’t just scan the abstract and look at the figures. Really read it and understand it. Scan and skim as many other papers as you normally would!
  4. Only papers reporting primary research count. No reviews/opinion pieces etc.
  5. If it was really good or worth telling people about – tweet about it.
  6. Make a simple database in Excel – this helps you keep track, make notes about the paper (to see if you meet #3) and allows you to find the paper easily in the future (this last point turned out to be very useful).

This year has been difficult, especially sticking to #3. My stats for 2014 are:

  • 73% success rate. Down from 85% in 2013
  • Stats errors in 36% of papers I read!
  • 86% of papers were from 2014

Following last year, I wasn’t so surprised by the journals that the papers appeared in:

  1. eLife
  2. J Cell Biol
  3. Mol Biol Cell
  4. Dev Cell
  5. Nature Methods
  6. J Cell Sci
  7. J Neurosci
  8. Nature Cell Biol
  9. Traffic
  10. Curr Biol
  11. Nature
  12. Nature Comm
  13. Science

According to my database I only read one paper in Cell this year. I certainly have lots of them in “Saved for later” in Feedly (which is a black hole from which papers rarely emerge to be read). It’s possible that the reason Cell, Nature and Science are low on the list is that I might quickly glance at papers in those journals but not actually read them for #potd. Last year eLife was at number 9 and this year it is at number 1. This journal is definitely publishing a lot of exciting cell biology and also the lens format is very nice for reading.

I think I’ll try to continue this in 2015. The main thing it has made me realise is how few papers I read (I mean really read). I wonder if students and postdocs are actually the main consumers of the literature. If this is correct, do PIs rely on “subsistence reading”, i.e. when they write their own papers and check the immediate literature? Is their deep reading done only during peer reviewing other people’s work? Or do PIs rely on a constant infusion of the latest science at seminars and at meetings?

Half Right

I was talking to a speaker visiting our department recently. While discussing his postdoc work from years ago, he told me about the identification of the sperm factor that causes calcium oscillations in the egg at fertilisation. It was an interesting tale because the group who eventually identified the factor – now widely accepted as PLCzeta – had earlier misidentified the factor, naming it oscillin.

The oscillin paper was published in Nature in 1996 and the subsequent (correct) paper was published in Development in 2002. I wondered what the citation profiles of these papers look like now.

[Figure: plcz]

As you can see there was intense interest in the first paper that quickly petered out, presumably when people found out that oscillin was a contaminant and not the real factor. The second paper on the other hand has attracted a large number of citations and continues to do so 12 years later – a sign of a classic paper. However, the initial spike in citations was not as high as the Nature paper.

The impact factor of Nature is much higher than that of Development. I’ve often wondered if this is due to a sociological phenomenon: people like to cite Cell/Nature/Science papers rather than those in other journals, and this bumps up the impact factor. Before you comment, yes I know there are other reasons, but IFs do not change much over time and I wonder whether journal hierarchy explains this hardiness. Anyway, these papers struck me as a good test of the idea… Here we have essentially the same discovery, reported by the same authors. The only difference is the journal (and that one paper came six years after the other). Normally it is not possible to test whether the journal influences citations because a paper cannot be erased and republished somewhere else. The plot suggests that Nature papers inherently attract many more citations than those in Development, presumably because of the exposure of publishing there. From the graph, it’s not difficult to see that even if a paper turns out not to be right, it can still boost the IF of the journal during the window of assessment. Another reason not to trust journal impact factors.

I can’t think of any way to look at this more systematically to see if this phenomenon holds true. I just thought it was interesting, so I’ll leave it here.


The post title is taken from Half Right by Elliott Smith from the posthumous album New Moon. Bootlegs have the title as Not Half Right, which would also be appropriate.

What The World Is Waiting For

The transition for scientific journals from print to online has been slow and painful. And it is not yet complete. This week I got an RSS alert to a “new” paper in Oncogene. When I downloaded it, something was familiar… very familiar… I’d read it almost a year ago! Sure enough, the AOP (ahead of print or advance online publication) date for this paper was September 2013 and here it was in the August 2014 issue being “published”.

I wondered why a journal would do this. It is possible that delaying actual publication would artificially boost the Impact Factor of a journal because there is a delay before citations roll in and citations also peak after two years. So if a journal delays actual publication, then the Impact Factor assessment window captures a “hotter” period when papers are more likely to generate more citations*. Richard Sever (@cshperspectives) jumped in to point out a less nefarious explanation – the journal obviously has a backlog of papers but is not allowed to just print more papers to catch up, due to page budgets.

There followed a long discussion about this… which you’re welcome to read. I was away giving a talk and missed all the fun, but if I may summarise on behalf of everybody: isn’t it silly that we still have pages – actual pages, made of paper – and that this is restricting publication?

I wondered how Oncogene got to this position. I retrieved the data for AOP and actual publication dates for the last five years of papers at Oncogene, excluding reviews, from PubMed, using oncogene[ta] NOT review[pt] as a search term. The field DP has the date published (the “issue date” that the paper appears in print) and PHST has several interesting dates, including [aheadofprint]. These could be parsed and imported into IgorPro as 1D waves. The lag time from AOP to print could then be calculated. I got 2916 papers from the search and was able to get data for 2441 papers.
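The parsing step is simple if you want to repeat this for another journal. Below is a rough sketch in Ruby (the real analysis was done in IgorPro) of how the DP and PHST fields could be pulled out of a PubMed download in MEDLINE text format; the line layouts assumed by the regular expressions should be checked against your own download.

# A rough sketch: read a MEDLINE-format PubMed download and print PMID and the
# AOP-to-issue lag in days. The DP line is assumed to look like "DP  - 2014 Aug"
# and the ahead-of-print line like "PHST- 2013/09/16 00:00 [aheadofprint]".
require 'date'

# crude parse of the DP (issue date) field, e.g. "2014 Aug", "2014 Jul-Aug" or "2014 Aug 7"
def issue_date(dp)
  y, m, d = dp.split
  mon = (m && Date::ABBR_MONTHNAMES.index(m[0, 3].capitalize)) || 1
  Date.new(y.to_i, mon, d ? d.to_i : 1)
end

File.read(ARGV.first).split(/\n\n+/).each do |record|
  dp   = record[/^DP\s*-\s*(.+)$/, 1]
  phst = record[/^PHST-\s*([\d\/]+).*\[aheadofprint\]/, 1]
  next unless dp && phst
  begin
    lag = (issue_date(dp) - Date.strptime(phst, "%Y/%m/%d")).to_i
  rescue ArgumentError
    next    # malformed dates
  end
  pmid = record[/^PMID-\s*(\d+)/, 1]
  puts "#{pmid},#{lag}" if lag >= 0    # drop records where the dates look wrong
end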

[Figure: OncogeneLagTime]

You can see for this journal that the lag time has been stable at around 300 days (~10 months) for issues published since 2013. So a paper AOP in Feb 2012 had to wait over 10 months to make it into print. This followed a linear period of lag time growth from mid-2010.

I have no links to Oncogene and don’t particularly want to single them out. I’m sure similar lags are happening at other print journals. Actually, my only interaction with Oncogene was that they sent this paper of ours out to review in 2011 (it got two not-negative-but-admittedly-not-glowing reviews) and then they rejected it because they didn’t like the cell line we used. I always thought this was a bizarre decision: why couldn’t they just decide that before sending it to review and wasting our time? Now, I wonder whether they were simply not keen to add to the growing backlog of papers at their journal. Whatever the reason, it has put me off submitting other papers there.

I know that there are good arguments for continuing print versions of journals, but from a scientist’s perspective the first publication is publication. Any subsequent versions are simply redundant and confusing.

*Edit: Alexis Verger (@Alexis_Verger) pointed me to a paper which describes that, for neuroscience journals, the lag time has increased over time. Moreover, the authors suggest that this is for the purpose of maximising Journal Impact Factor.

The post title comes from the double A-side Fools Gold/What The World Is Waiting For by The Stone Roses.

Six Plus One

Last week, ALM (article-level metric) data for PLoS journals were uploaded to Figshare with the invitation to do something cool with it.

Well, it would be rude not to. Actually, I’m one of the few scientists on the planet that hasn’t published a paper with Public Library of Science (PLoS), so I have no personal agenda here. However, I love what PLoS is doing and what it has achieved to disrupt the scientific publishing system. Anyway, what follows is not in any way comprehensive, but I was interested to look at a few specific things:

  1. Is there a relationship between Twitter mentions and views of papers?
  2. What is the fraction of views that are PDF vs HTML?
  3. Can citations be predicted by more immediate article level metrics?

The tl;dr version is 1. Yes. 2. ~20%. 3. Can’t say but looks unlikely.

1. Twitter mentions versus paper views

All PLoS journals are covered. The field containing paper views is (I think) “Counter”, which combines views of HTML and PDF (see #2). A plot of Counter against Publication Date for all PLoS papers (upper plot) shows that the number of papers published has increased dramatically since the introduction of PLoS ONE in 2007. There is a large variance in the number of views, which you’d expect, and the views tail off for the most recent papers, since they have had less time to accumulate views. Below is the same plot where the size and colour of the markers reflects their Twitter score (see key). There’s a sharp line that must correspond to the date when Twitter data was first logged as an ALM. There’s a scattering of mentions after this date to older literature, but one 2005 paper stands out – Ioannidis’s paper Why Most Published Research Findings Are False. It has a huge number of views and a large Twitter score, especially considering that it was a seven-year-old paper when they started recording the data. A pattern emerges in the post-logging period: papers with more views are mentioned more on Twitter. The larger, darker markers are higher on the y-axis. Mentioning a paper on Twitter is sure to generate views of the paper, at some (unknown) conversion rate. However, as this is a single snapshot, we don’t know if Twitter mentions drive more downloads of papers, or whether more “interesting”/highly downloaded work is talked about more on Twitter.

[Figure: twitter_counter]

2. Fraction of PDF vs HTML views

I asked a few people what they thought the download ratio is for papers. Most thought 60-75% as PDF versus 40-25% as HTML. I thought it would be lower, but I was still surprised to see that it is, at most, 20% for PDF. The plot below shows the fraction of PDF downloads, counter_pdf/(counter_pdf+counter_html), for all PLoS journals and then broken down for PLoS Biol and PLoS ONE.
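In case anyone wants to check this themselves, a quick sketch of the calculation is below (I used IgorPro; this Ruby version is only for illustration). The file name and the column names journal, counter_pdf and counter_html are assumptions about the Figshare CSV and should be checked against the file header.

# A quick sketch: per-paper PDF fraction, counter_pdf/(counter_pdf+counter_html),
# grouped by journal, reporting a crude median for each group.
require 'csv'

fractions = Hash.new { |h, k| h[k] = [] }

CSV.foreach("alm_report.csv", headers: true) do |row|   # file/column names assumed
  pdf, html = row["counter_pdf"].to_i, row["counter_html"].to_i
  next if (pdf + html).zero?
  fractions[row["journal"] || "unknown"] << pdf.to_f / (pdf + html)
end

fractions.each do |journal, f|
  median = f.sort[f.size / 2]    # crude median, fine for a quick look
  puts format("%-20s n=%-6d median PDF fraction=%.2f", journal, f.size, median)
end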

[Figure: PDF-Fraction]

This was a surprise to me. I have colleagues who don’t like depositing post-print or pre-print papers because they say that they prefer their work to be seen typeset in PDF format. However, this shows that, at least for PLoS journals, the reader is choosing not to see a typeset PDF at all, but an HTML version.

Maybe the PLoS PDFs are terribly formatted and 80% of people don’t like them. There is an interesting comparison that can be done here, because all papers are deposited at PubMed Central (PMC) and so the same plot can be generated for the PDF fraction there. The PMC PDF format is different from the PLoS one, so we can test the idea that people prefer HTML over PDF at PLoS because they don’t like the PLoS format.

[Figure: PMCGraph]

The fraction of PDF downloads is higher, but only around 30%. So either the PMC format is just as bad, or this is the way that readers like to consume the scientific literature. A colleague mentioned that HTML views are preferable to PDF if you actually want to do something with the data, e.g. for meta-analysis. This could have an effect. HTML views could be skim reading, whereas PDF is for people who want to read in detail… I wonder whether these fractions are similar at other publishers, particularly closed access publishers?

3. Citation prediction?

ALMs are immediate, whereas citations are slow. If we assume for a moment that citations are a definitive means to determine the impact of a paper (which they may not be), then can ALMs predict citations? This would make them very useful in the evaluation of scientists and their endeavours. Unfortunately, this dataset is not sufficient to answer this properly, but with multiple timepoints the question could be investigated. I looked at the number of paper downloads and also the Mendeley score to see how these two things may foretell citations. What follows is a strategy to do this in an unbiased way with few confounders.

[Figure: scopus v cites]

The dataset has a Scopus column, but for some reason these data are incomplete. It is possible to download citation data (but not on this scale, AFAIK) from Web of Science and then use the DOI to cross-reference to the other dataset. This plot shows the Scopus data as a function of “Total Citations” from Web of Science, for 500 papers. I went with the Web of Science data as this appears more robust.

The question is whether there is a relationship between downloads of a paper (Counter, either PDF or HTML) and citations, or between Mendeley score and citations. I figured that downloading, adding to Mendeley and citing show three progressive levels of “commitment” to a paper and so they may correlate differently with citations. Now, to look at this for all PLoS journals for all time would be silly because we know that citations are field-specific, journal-specific, time-sensitive etc. So I took the following dataset from Web of Science: the top 500 most-cited papers in PLoS ONE for the period of 2007-2010, limited to “cell biology”. By cross-referencing I could check the corresponding values for Counter and for Mendeley.
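As a sketch of the cross-referencing step (assumed file layouts, not the actual workflow used for the figures), something like the following would match a Web of Science export to the ALM data by DOI and compute a simple Pearson correlation for each comparison. The file names and column names here are hypothetical.

# A sketch of the DOI cross-referencing and correlation step.
# "wos.csv" (DOI + citation counts from Web of Science) and the ALM column names
# used below are assumptions; check them against your own files.
require 'csv'

def pearson(x, y)
  n = x.size.to_f
  mx, my = x.sum / n, y.sum / n
  cov = x.zip(y).sum { |a, b| (a - mx) * (b - my) }
  sx = Math.sqrt(x.sum { |a| (a - mx)**2 })
  sy = Math.sqrt(y.sum { |b| (b - my)**2 })
  cov / (sx * sy)
end

cites = {}
CSV.foreach("wos.csv", headers: true) { |r| cites[r["doi"]] = r["total_citations"].to_i }

counter, mendeley, wos = [], [], []
CSV.foreach("alm_report.csv", headers: true) do |r|
  next unless cites.key?(r["doi"])
  counter  << r["counter"].to_i
  mendeley << r["mendeley"].to_i
  wos      << cites[r["doi"]]
end

puts "papers matched: #{wos.size}"
puts "r (citations vs Counter):  #{pearson(wos, counter).round(2)}"
puts "r (citations vs Mendeley): #{pearson(wos, mendeley).round(2)}"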

[Figure: CounterMendelyvsCites]

I was surprised that the correlation was very weak in both cases. I thought that the correlation would be stronger with Mendeley; however, signal-to-noise is a problem here, with few users of the service compared with counting downloads. Below each plot is a ranked view of the papers, with the Counter or Mendeley data presented as a rolling average. It’s a very weak correlation at best. Remember that this is post-hoc: papers that have been cited more would be expected to generate more views and higher Mendeley scores, but this is not necessarily so. Predicting future citations based on Counter or Mendeley will be tough. To really know if this is possible, this approach needs to be used with multiple ALM timepoints to see if there is a predictive value for ALMs, but based on this single timepoint, it doesn’t seem as though prediction will be possible.

Again, looking at this for a closed access journal would be very interesting. The most-downloaded paper in this set had far more views (143,952) than other papers cited a similar number of times (78). The paper was this one, which I guess is of interest to bodybuilders! Presumably it was heavily downloaded by people who are not in a position to cite the paper. Although these downloads didn’t result in extra citations, this paper has undeniable impact outside of academia. Because PLoS is open access, the bodybuilders were able to access the paper, rather than being met by a paywall. Think of the patients who are trying to find out more about their condition and can’t read any of the papers… The final point here is that ALMs have their own merit, irrespective of citations, which are the default metric for judging the impact of our work.

Methods: To crunch the numbers for yourself, head over to Figshare and download the csv. A Web of Science subscription is needed for the citation data. All the plots were generated in IgorPro, but no programming is required for these comparisons and everything I’ve done here can be easily done in Excel or another package.

Edit: Matt Hodgkinson (@mattjhodgkinson) Snr Ed at PLoS ONE told me via Twitter that all ALM data (periodically updated) are freely available here. This means that some of the analyses I wrote about are possible.

The post title comes from Six Plus One a track on Dad Man Cat by Corduroy. Plus is as close to PLoS as I could find in my iTunes library.

Pay You Back In Time

A colleague once told me that she only reviews three papers per year and then refuses any further requests for reviewing. Her reasoning was as follows:

  • I publish one paper a year (on average)
  • This paper incurs three peer reviews
  • Therefore, I owe “the system” three reviews.

It’s difficult to fault this logic. However, I think that, as she is a senior scientist with a wealth of experience, the system would benefit greatly from more of her input. Actually, I don’t think she sticks rigorously to this, and I know that she is an Academic Editor at a journal, so in fact she contributes much more to the system than she was letting on.

I thought of this recently when – in the space of one week – I got three peer review requests, which I accepted. I began to wonder about my own debit and credit in the peer review system. I only have reliable data from 2010.

Reviews incurred as an author are in gold (re-reviews are in pale gold), reviews completed as a peer are in purple (re-reviews are in pale purple). They are plotted cumulatively and the difference – or the balance – is shown by the markers. So, I have been in a constant state of owing the system reviews and I’m in no position to be turning down review requests.
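The balance calculation behind the plot is trivial. A minimal sketch with made-up dates (not my actual records):

# A minimal sketch with made-up dates: -1 for each review my papers incurred,
# +1 for each review I completed as a referee; the running total is the balance.
require 'date'

incurred  = %w[2010-03-01 2010-03-01 2010-03-01 2011-06-15 2011-06-15]
completed = %w[2010-09-10 2011-02-01 2012-05-20]

events = incurred.map  { |d| [Date.parse(d), -1] } +
         completed.map { |d| [Date.parse(d), +1] }

balance = 0
events.sort_by(&:first).each do |date, change|
  balance += change
  puts "#{date}  balance: #{balance}"    # negative means I owe the system reviews
end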

In my defence, I was Section Editor at BMC Cell Biology for two years, which means that I contributed more to the system than the plot shows. Another thing is reviews incurred/completed as a grant applicant/referee. I haven’t factored those in, but I think this would take the balance down further. I also comment on colleagues’ papers and grant applications.

Thinking back, I’ve only ever turned down a handful of peer review requests, either because the work was too far outside my area of expertise or because I had a conflict of interest. I’ve never cited a balance of zero as a reason for not reviewing, and this analysis shows that I’m not in this category anyway.

In case any Editors are reading this… I’m happy to review work in my area, but please remember I currently have three papers to review!

The post title comes from a demo recording by The Posies that can be found on the At Least, At Last compilation on Not Lame Recordings.

Strange Things

I noticed something strange about the 2013 Impact Factor data for eLife.

Before I get onto the problem, I feel I need to point out that I dislike Impact Factors and think that their influence on science is corrosive. I am a DORA signatory and I try to uphold those principles. I admit that, in the past, I used to check the new Impact Factors when they were released, but no longer. This year, when the 2013 Impact Factors came out, I didn’t bother to log on to take a look. A chance Twitter conversation with Manuel Théry (@ManuelTHERY) and Christophe Leterrier (@christlet) was my first encounter with the new numbers.

Huh? eLife has an Impact Factor?

For those that don’t know, the 2013 Impact Factor is worked out by counting the total number of 2013 cites to articles in a given journal that were published in 2011 and 2012. This number is divided by the number of “citable items” in that journal in 2011 and 2012.
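As a worked example with made-up numbers: a journal whose 2011 and 2012 papers picked up 500 citations during 2013, from 200 citable items published in 2011 and 2012, would get a 2013 Impact Factor of 500/200 = 2.5. The real counts come from the Thomson Reuters database.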

Now, eLife launched in October 2012. So it seems unfair that it gets an Impact Factor since it only published papers for 12.5% of the window under scrutiny. Is this normal?

I looked up the 2013 Impact Factor for Biology Open, a Company of Biologists journal that launched in January 2012* and… it doesn’t have one! So why does eLife get an Impact Factor but Biology Open doesn’t?**

[Figure: elife-JIF]

Looking at the numbers for eLife revealed that there were 230 citations in 2013 to eLife papers in 2011 and 2012, one of which was a mis-citation to an article in 2011. This article does not exist (the next column shows that there were no articles in 2011). My guess is that Thomson Reuters views this as the journal existing for 2011 and 2012, and therefore deserving of an Impact Factor. Presumably there are no mis-cites in the Biology Open record and it will only get an Impact Factor next year. Doesn’t this call into question the veracity of the database? I have found other errors in records previously (see here). I also find it difficult to believe that no-one checked this particular record, given the profile of eLife.

[Figure: elfie-cites]

Perhaps unsurprisingly, I couldn’t track down the rogue citation. I did look at the cites to eLife articles from all years in Web of Science, the Thomson Reuters database (which again showed that eLife only started publishing in Oct 2012). As described before there are spurious citations in the database. Josh Kaplan’s eLife paper on UNC13/Tomosyn managed to rack up 5 citations in 2004, some 9 years before it was published (in 2013)! This was along with nine other papers that somehow managed to be cited in 2004 before they were published. It’s concerning enough that these data are used for hiring, firing and funding decisions, but if the data are incomplete or incorrect this is even worse.

Summary: I’m sure the Impact Factor of eLife will rise as soon as it has a full window for measurement. This would actually be 2016 when the 2015 Impact Factors are released. The journal has made it clear in past editorials (and here) that it is not interested in an Impact Factor and won’t promote one if it is awarded. So, this issue makes no difference to the journal. I guess the moral of the story is: don’t take the Impact Factor at face value. But then we all knew that already. Didn’t we?

* For clarity, I should declare that we have published papers in eLife and Biology Open this year.

** The only other reason I can think of is that eLife was listed on PubMed right away, while Biology Open had to wait. This caused some controversy at the time. I can’t see why a PubMed listing should affect Impact Factor. Anyhow, I noticed that Biology Open got listed in PubMed by October 2012, so in the end it is comparable to eLife.

Edit: There is an update to this post here.

Edit 2: This post is the most popular on Quantixed. A screenshot of visitors’ search engine queries (Nov 2014)…

[Figure: searches]

The post title is taken from “Strange Things” from Big Black’s Atomizer LP released in 1986.