Some Things Last A Long Time

How long does it take to publish a paper?

The answer is – in our experience, at least – about 9 months.

That’s right, it takes about the same amount of time to have a baby as it does to publish a scientific paper. Discussing how we can make the publication process quicker is for another day. Right now, let’s get into the numbers.

The graphic shows the time taken from submission-to-publication for papers on which I am an author. I’m missing data for two papers (one from 1999 and one from 2002) and the Biol Open paper is published online but not yet “in print”, but mostly the information is complete. If you want to calculate this for your own papers; my advice would be to keep a spreadsheet of submission and decision dates as you go along… and archive your emails.

In the last analysis, a few people pointed out ways that the graphic could be improved, and I’ve now implemented these changes.

The graphic shows that the journey to publication is in four eras:

  1. Pre-time (before 0 on the x-axis): this is the time from first submission to the first journal. A dark time which involves rejection.
  2. Submission at the final journal (starting at time 0). Again, the orange periods are when the manuscript is with the journal and the green, when it is with us. Needless to say this green time is mainly spent doing experimental work (compare green periods for reviews and for papers)
  3. Acceptance! This is where the orange bar stops. The manuscript is then readied for publication (blank area).
  4. Published online. A purple period that ends with final publication in print.

Note that: i) the delays are more-or-less negated by preprinting provided deposition is before the first submission (grey line, for Biol Open paper), ii) these delay diagrams do not take into account the original drafting/rewriting cycle before the fist submission – nor the time taken to do the work!

So… how long does it take to publish a paper?

In the top right graph: the time from first submission to being published online is 250 days on average (median). This is shown by the blue bar. If we throw in the average time it takes to go from online to print (15 days) this gives 265 days. The average time for human gestation is 266 days. So it takes about the same amount of time to have a baby as it does to publish a paper! By contrast, reviews take only 121 days, equivalent to four lunar cycles (118 days).

My 2005 paper at Nature holds the record for the most protracted publication 399 days from submission to publication. The fastest publication is the most recent, our Biol Open paper was online 49 days after submission (it was also online 1 day before submission as a preprint).

In the bottom right graph: I added together the total time each paper was either with the journal, or with us, and plotted the average. The time from acceptance-to-publication online is shown stacked onto the “time with journal” column. You can see from this graphic that the lion’s share of the delay comes from revisions that we must do in order for a paper to be published. Multiple revisions and submissions also push these numbers up compared to the totals for reviews.

How representative are these numbers?

This is a small dataset at many different journals and so it is difficult to conclude much. With this analysis, I was hoping to identify ‘slow journals’ that we should avoid and also to think about our publication strategy (as much as a crap shoot can have a strategy). The whole process is stochastic and I don’t see any reason to change the way that we navigate the system. Having said this, I can’t see us doing any more methods/book chapters, as they are just so slow.

Just over half of our papers have some “pre-time”, i.e. they got rejected from at least one other journal before finding a home. A colleague of mine likes to say:

“if your paper is accepted at the first journal you send it to, you sent it to the wrong place”

One thing for sure is that publication takes a long time. And I don’t think our experience is uncommon. The pace of scientific publishing has been described as glacial by Leslie Vosshall and I don’t disagree with this. I think the 9 months figure is probably representative for most areas of biology. I know that other scientists in my field, who have more tenacity for rejections and for slugging it out at high impact journals, have much longer times from 1st submission to acceptance. In my opinion, wasting even more time chasing publication is crazy, counter-productive and demotivating for the people in the lab.

The irony in all this is that, even though we are working at the absolute bleeding edge of science with all of this technology at our disposal, our methods for reporting science are badly out of date. And with that I’ll push the “publish” button and this will be online…

The title of this post comes from ‘Some Things Last A Long Time’ by Daniel Johnston from his LP ‘1990’.

I’m Gonna Crawl

Fans of data visualisation will know the work of Edward Tufte well. His book “The Visual Display of Quantitative Information” is a classic which covers the history and the principals of conveying data in a concise way, that is easy to interpret. He is also credited with two different dataviz techniques: sparklines and image quilts. It was these two innovations that came to mind when I was discussing some cell migration results generated in our lab.

Sparklines are small displays of 1D information versus time to highlight the profile (think: stocks and shares).

Image quilts are arrays of images that together quickly provide you with an overview (think: Google Images results).

Analysing cell migration generates ‘tracks’ of many cells as they move around a 2D surface. Tracks are pairs of XY co-ordinates at different time points. We want to understand how these tracks change if we do something to the cells, e.g. knock-down a particular protein. There are many ways to analyse this. Such as: looking at the speed of migration, their directionality, etc. etc. When we were looking at lots of tracks, all jumbled up, I thought of sparklines and of image quilts and thought the easiest way to compare a control and test group would be to generate something similar.

We start out with many tracks within a field:

 

overviewIt’s difficult to see what is happening here, so it needs to be simplified.

I wrote a couple of procedures in IgorPro that calculated the cumulative distance that each cell had migrated at a given time point (say, the end of the movie). These cumulative distances were then ranked and then the corresponding cells were arrayed in the x-dimension according to how far they migrated. This was a little bit tricky to do, but that’s another story.

 

This plot shows the tracks with the shortest/slowest to the left and the furthest/fastest to the right. This can then be compared to a test set and differences become apparent. However, we need to look at many tracks and expanding these “sparklines” further is not practical – we want to provide an overview.

Accordingly, I wrote another procedure to array them in an XY array with a given spacing between the start points. This should give an “image quilt” feel.

I added gridlines to indicate the start position. The result is that a nice overview is seen and differences between groups can be easily seen at first glance (or not seen if there is no effect!).

This method works well to compare control and test groups that have a similar number of cells. If N is different (say, more than 10%), we need to take a random sample of tracks and array those to get a feel for what’s happening. Obviously the tracks could be arrayed according whatever parameter is required, e.g. highest speed, most directional etc. etc.

One thought is to do a further iteration where the tracks are oriented so that the start and end points are at the same point in X, or oriented so that the tracks have the same starting trajectory. As it is, the mix of trajectories spoils the ease of interpretation.

Obviously, this can be applied to tracks of anything: growing and shrinking microtubules, endosome/lysosome movement etc. etc.

Any suggestions for improvements are welcome, but I think this is a quick and easy way to just eyeball the data to see if there are any differences before calculating any other parameters. I thought I’d put the idea out there – maybe together with the code if there is any interest.

The post title is from I’m Gonna Crawl – Led Zeppelin from their In Through The Out Door LP