Pay You Back In Time

A colleague once told me that she reviews only three papers per year and refuses any further requests for reviewing. Her reasoning was as follows:

  • I publish one paper a year (on average)
  • This paper incurs three peer reviews
  • Therefore, I owe “the system” three reviews.

It’s difficult to fault this logic. However, as she is a senior scientist with a wealth of experience, I think the system would benefit greatly from more of her input. Actually, I don’t think she sticks rigorously to this, and I know that she is an Academic Editor at a journal, so in fact she contributes much more to the system than she was letting on.

I thought of this recently when – in the space of one week – I got three peer review requests, which I accepted. I began to wonder about my own debit and credit in the peer review system. I only have reliable data from 2010.

Reviews incurred as an author are in gold (re-reviews are in pale gold), reviews completed as a peer are in purple (re-reviews are in pale purple). They are plotted cumulatively and the difference – or the balance – is shown by the markers. So, I have been in a constant state of owing the system reviews and I’m in no position to be turning down review requests.

In my defence, I was Section Editor at BMC Cell Biology for two years, which means that I contributed more to the system than the plot shows. Another factor is reviews incurred/completed as a grant applicant/referee. I haven’t factored those in, but I think they would take the balance down further. I also comment on colleagues’ papers and grant applications.

Thinking back, I’ve only ever turned down a handful of peer review requests, either because the work was too far outside my area of expertise or because I had a conflict of interest. I’ve never cited a balance of zero as a reason for not reviewing, and this analysis shows that I’m not in that category anyway.

In case any Editors are reading this… I’m happy to review work in my area, but please remember I currently have three papers to review!

The post title comes from a demo recording by The Posies that can be found on the At Least, At Last compilation on Not Lame Recordings.

Strange Things – update

My post on the strange data underlying the new impact factor for eLife was read by many people. Thanks for the interest and for the comments and discussion that followed. I thought I should follow up on some of the issues raised in the post.

To recap:

  1. eLife received a 2013 Impact Factor despite only publishing 27 papers in the last three months of the census window. Other journals, such as Biology Open, did not.
  2. There were spurious miscites to papers before eLife published any papers. I wondered whether this resulted in an early impact factor.
  3. The Web of Knowledge database has citations from articles in the past referring to future articles!

1. Why did eLife get an early Impact Factor? It turns out that there is something called a partial Impact Factor, where an early Impact Factor is awarded to some journals in special cases. This is described here in a post at Scholarly Kitchen. Cell Reports also got an early Impact Factor and Nature Methods got one a few years ago (thanks to Daniel Evanko for tweeting about Nature Methods’ partial Impact Factor). The explanation is that if a journal is publishing papers that are attracting large numbers of citations, it gets fast-tracked for an Impact Factor.

2. In a comment, Rafael Santos pointed out that the miscites were “from a 2013 eLife paper to an inexistent 2010 eLife paper, and another miscite from a 2013 PLoS Computational Biology paper to an inexistent 2011 eLife paper”. The post at Scholarly Kitchen confirms that citations are not double-checked or cleaned up at all by Thomson Reuters. It occurred to me that journals looking to game the system could alter the year of citations to papers in their own journal in order to inflate their Impact Factor. But no serious journal would do that – or would they?

3. This is still unexplained. If anybody has any ideas (other than time travel) please leave a comment.

Strange Things

I noticed something strange about the 2013 Impact Factor data for eLife.

Before I get onto the problem, I feel I need to point out that I dislike Impact Factors and think that their influence on science is corrosive. I am a DORA signatory and I try to uphold those principles. I admit that, in the past, I used to check the new Impact Factors when they were released, but no longer. This year, when the 2013 Impact Factors came out, I didn’t bother to log on to take a look. A chance Twitter conversation with Manuel Théry (@ManuelTHERY) and Christophe Leterrier (@christlet) was my first encounter with the new numbers.

Huh? eLife has an Impact Factor?

For those that don’t know, the 2013 Impact Factor is worked out by counting the total number of 2013 cites to articles in a given journal that were published in 2011 and 2012. This number is divided by the number of “citable items” in that journal in 2011 and 2012.
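
To make the arithmetic concrete, here is a minimal sketch of that calculation in Python (the numbers are made-up placeholders, not data for any real journal):

```python
# Sketch of the 2013 Impact Factor arithmetic with made-up numbers.
cites_in_2013_to_2011_papers = 150   # citations made in 2013 to the journal's 2011 papers
cites_in_2013_to_2012_papers = 250   # citations made in 2013 to the journal's 2012 papers
citable_items_2011 = 80              # "citable items" published in 2011
citable_items_2012 = 120             # "citable items" published in 2012

impact_factor_2013 = (cites_in_2013_to_2011_papers + cites_in_2013_to_2012_papers) / (
    citable_items_2011 + citable_items_2012
)
print(f"2013 Impact Factor: {impact_factor_2013:.2f}")  # 400 / 200 = 2.00
```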

Now, eLife launched in October 2012. So it seems unfair that it gets an Impact Factor since it only published papers for 12.5% of the window under scrutiny. Is this normal?

I looked up the 2013 Impact Factor for Biology Open, a Company of Biologists journal that launched in January 2012* and… it doesn’t have one! So why does eLife get an Impact Factor but Biology Open doesn’t?**

Looking at the numbers for eLife revealed that there were 230 citations in 2013 to eLife papers in 2011 and 2012, one of which was a mis-citation to a 2011 article. This article does not exist (the next column shows that there were no articles in 2011). My guess is that Thomson Reuters views this as the journal existing for 2011 and 2012, and therefore deserving of an Impact Factor. Presumably there are no mis-cites in the Biology Open record and it will only get an Impact Factor next year. Doesn’t this call into question the veracity of the database? I have found other errors in records previously (see here). I also find it difficult to believe that no-one checked this particular record, given the profile of eLife.

Perhaps unsurprisingly, I couldn’t track down the rogue citation. I did look at the cites to eLife articles from all years in Web of Science, the Thomson Reuters database (which again showed that eLife only started publishing in Oct 2012). As described before, there are spurious citations in the database. Josh Kaplan’s eLife paper on UNC13/Tomosyn managed to rack up 5 citations in 2004, some 9 years before it was published (in 2013)! This was along with nine other papers that somehow managed to be cited in 2004 before they were published. It’s concerning enough that these data are used for hiring, firing and funding decisions; if the data are incomplete or incorrect, that is even worse.

Summary: I’m sure the Impact Factor of eLife will rise as soon as it has a full window for measurement. This would actually be 2016 when the 2015 Impact Factors are released. The journal has made it clear in past editorials (and here) that it is not interested in an Impact Factor and won’t promote one if it is awarded. So, this issue makes no difference to the journal. I guess the moral of the story is: don’t take the Impact Factor at face value. But then we all knew that already. Didn’t we?

* For clarity, I should declare that we have published papers in eLife and Biology Open this year.

** The only other reason I can think of is that eLife was listed on PubMed right away, while Biology Open had to wait. This caused some controversy at the time. I can’t see why a PubMed listing should affect Impact Factor. Anyhow, I noticed that Biology Open got listed in PubMed by October 2012, so in the end it is comparable to eLife.

Edit: There is an update to this post here.

Edit 2: This post is the most popular on Quantixed. A screenshot of visitors’ search engine queries (Nov 2014)…


The post title is taken from “Strange Things” from Big Black’s Atomizer LP released in 1986.

“Yeah” Is What We Had

When it comes to measuring the impact of our science, citations are pretty much all we have. And not only that but they only say one thing – yeah – with no context. How can we enrich citation data?

Much has been written about how and why and whether or not we should use metrics for research assessment. If we accept that metrics are here to stay in research assessment (of journals, Universities, departments and of individuals), I think we should be figuring out better ways to look at the available information.

Citations to published articles are the key metric under discussion. This is because they are linked to research outputs (papers), have some relation to “impact” and can be easily computed, and a number of metrics have been developed to draw out information from the data (H-index, IF etc.). However, there are many known problems with citations, such as being heavily influenced by the size of the field. What I want to highlight here is what a data-poor resource they are, and to think of ways we could enrich the dataset with minimal modification to our existing databases.

1. We need a way to distinguish a yeah from a no

The biggest weakness of using citations as a measure of research impact is that a citation is a citation. It just says +1. We have no idea if +1 means “the paper stinks” or “the work is amazing!”.  It’s incredible that we can rate shoelaces on Amazon or eBay but we haven’t figured out a way to do this for scientific papers. Here’s a suggestion:

  • A neutral citation is +1
  • A positive citation is +2
  • A negative citation is -1

A neutral citation would be stating a fact and adding a reference to support it, e.g. DNA is a double helix (Watson & Crick, 1953).

A positive citation would be something like: in agreement with Bloggs et al. (2010), we also find x.

A negative citation might be: we have tested the model proposed by Smith & Jones (1977) and find that it does not hold.
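
As a toy illustration of how such weighted citations could be tallied (a sketch of my own; the weights are just the ones suggested above):

```python
# Toy tally of weighted citations, using the weights suggested above.
CITATION_WEIGHTS = {"neutral": 1, "positive": 2, "negative": -1}

def weighted_citation_score(citations):
    """citations: a list of citation types, e.g. ["neutral", "positive", ...]."""
    return sum(CITATION_WEIGHTS[c] for c in citations)

# A paper cited neutrally 10 times, positively 3 times and negatively twice:
print(weighted_citation_score(["neutral"] * 10 + ["positive"] * 3 + ["negative"] * 2))  # 14
```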

One further idea (described here) is to add more context to citations using keywords, such as “replicating”, “using” or “consistent with”. This would also help with searching the scientific literature.

2. Multiple citations in one article

Because citations are currently just +1, there is no way to distinguish whether the citing paper mentioned the cited paper in passing or was entirely focussed on that one paper.

Another way to think about this is that there are multiple reasons to cite a paper: maybe the method or reagent is being used, maybe they are talking about Figure 2 showing X or Figure 5 showing Y. What if a paper is talking about all of these things? In other words, the paper was very useful. Shouldn’t we record that interest?

Suggestion: A simple way to do this is to count the number of mentions in the text of the paper rather than just if the paper appears in the reference list.

3. Division of a citation unit for fair credit to each author

Calculations such as the H-index make no allowance for the position of the author in the author list (used in biological sciences and some other fields to denote contribution to the paper). It doesn’t make sense that the 25th author on a 50-author paper receives the same citation credit as the first or last author. Similarly, the first author on a two-author paper is credited in the same way as a middle author on a multi-author paper, even though their contribution is probably worth 25 times as much. The difference in contribution is clear, but the citation credit does not reflect it. This needs to be equalised. The citation unit, c, could be divided to achieve fair credit for authors. At the moment c=1, but it could be multiples (or negative values) as described above. Here’s a suggestion:

  • First (and co-first) and last (and co-last) authors share 0.5c, divided equally between them.
  • The remaining 0.5c is divided equally between all authors.

For a two-author paper, the first author and the last author each get (0.5c/2 + 0.5c/2) = 0.5c.

For a ten-author paper with one first author and one last author, the first and last authors each get (0.5c/2 + 0.5c/10) = 0.3c and the 5th author gets (0c + 0.5c/10) = 0.05c.

Note that the sum for all authors will equal c. So this is equalised for all papers. These citation credits would then be the basis for H-index and other calculations for individuals.

Most simply, the denominator could just be the number of authors; or, if we can figure out a numerical credit system, each author could be weighted according to their contribution.
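
To make the split explicit, here is a minimal sketch of the first/last-author scheme described above (the function name and arguments are my own illustration):

```python
def split_citation_credit(n_authors, n_first=1, n_last=1, c=1.0):
    """Divide one citation unit c between authors, as suggested above:
    first (and co-first) and last (and co-last) authors share 0.5c equally;
    the other 0.5c is shared equally by all authors."""
    key_authors = n_first + n_last
    base_share = 0.5 * c / n_authors          # every author gets this
    key_share = 0.5 * c / key_authors         # extra share for first/last authors
    credits = []
    for i in range(n_authors):
        is_key = i < n_first or i >= n_authors - n_last
        credits.append(base_share + (key_share if is_key else 0.0))
    return credits

print(split_citation_credit(2))          # two-author paper: [0.5, 0.5]
print(split_citation_credit(10))         # ten-author paper: first/last ~0.3, middle authors ~0.05
print(sum(split_citation_credit(10)))    # ~1.0: the credits sum to c
```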

4. Citations to reviews should be downgraded

A citation to a review is not equal to a citation to a research paper, for several reasons. First, reviews are cited at a higher rate, because they are a handy catch-all citation, particularly for the Introduction section of papers. This isn’t fair and robs credit from the people who did the work that actually demonstrated what is being discussed. Second, the achievement of publishing a review is nothing in comparison to publishing a paper. Publishing a review involves 1) being asked, 2) writing it, 3) light peer review and some editing, and that’s it! Publishing a research paper involves much more effort: having the idea, getting the money, hiring the people, training the people, getting a result – and we are only at the first panel in Fig 1A. Not to mention the people-hours and the arduous peer review process. It’s not fair that citations to reviews are treated as equal to citations to papers when it comes to research assessment.

Suggestion: a citation to a review should be worth a fraction (maybe 1/10th) of a citation to a research paper.

In addition, there are too many reviews written at the moment. I think this is not because they are particularly useful. Very few actually contribute a new view or new synthesis of an area, most are just a summary of the area. Journals like them because they drive up their citation metrics. Authors like them because it is nice to be invited to write something – it means people are interested in what you have to say… If citations to reviews were downgraded, there would be less incentive to publish them and we would have more space for all those real papers that are getting rejected at journals that claim that space is a limitation for publication.

5. Self-citations should be eliminated

If we are going to do all of the above, then self-citation would pretty soon become a problem. Excessive self-citation would be difficult to police, and not many scientists would go for a -1 citation to their own work. So, the simplest thing to do is to eliminate self-citation. Author identification is crucial here. At the moment this doesn’t work well. In ISI and Scopus, whatever algorithm they use keeps missing some papers of mine (and my name is not very common at all). I know people who have been grouped with other people that they have published one or two papers with. For authors with ambiguous names, this is a real problem. ORCID is a good solution and maybe having an ORCID (or similar) should be a requirement for publication in the future.

Suggestion: the company or body that collates citation information needs to accurately assign authors and make sure that research papers are properly segregated from reviews and other publication types.

These were five ideas for enriching citation data to improve research assessment. Do you have any others?

The post title is taken from ‘”Yeah” Is What We Had’ by Grandaddy from their album Sumday.

Sure To Fall

What does the life cycle of a scientific paper look like?

It stands to reason that after a paper is published, people download and read the paper and then if it generates sufficient interest, it will begin to be cited. At some point these citations will peak and the interest will die away as the work gets superseded or the field moves on. So each paper has a useful lifespan. When does the average paper start to accumulate citations, when do they peak and when do they die away?

Citation behaviours are known to be very field-specific, so to narrow things down I focussed on cell biology, and on one area in particular: “clathrin-mediated endocytosis”. It’s an area that I’ve published in – of course this stuff is driven by self-interest. I downloaded data from Web of Science for the 1000 papers in this area that had accumulated the most citations. Reviews were excluded, as I assume their citation patterns are different from those of primary literature. The idea was just to take a large sample of papers on a topic. The data are pretty good, but there are some errors (see below).

Number-crunching (feel free to skip this bit): I imported the data into IgorPro, making a 1D wave for each record (paper). I deleted the last point, corresponding to cites in 2014 (the year is not complete). I aligned all records so that the year of publication was 0. Next, the citations were normalised to the maximum number achieved in the peak year; this allows us to look at the lifecycle in a sensible way. Next, I took out records for papers less than 6 years old, as I reasoned these would not have completed their lifecycle and could contaminate the analysis (it turned out to make little difference). The lifecycles were plotted and averaged. I also wrote a quick function to pull out the peak year for citations post hoc.
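
For anyone wanting to try something similar without IgorPro, here is a rough Python equivalent of those steps. It is a sketch under my own assumptions about the input format (one row per paper, with a publication year and one column of citations per calendar year); it is not the code used for the figures.

```python
import numpy as np
import pandas as pd

# Assumed input: one row per paper, a 'pub_year' column, plus one column per
# calendar year (e.g. '1990' ... '2014') holding citations received in that year.
df = pd.read_csv("cites_per_year.csv")

year_cols = [c for c in df.columns if c.isdigit() and int(c) < 2014]  # drop incomplete 2014

aligned = []
for _, row in df.iterrows():
    pub_year = int(row["pub_year"])
    if 2014 - pub_year < 6:              # papers <6 years old have not completed their lifecycle
        continue
    cols = [c for c in year_cols if int(c) >= pub_year]   # align so publication year = 0
    cites = row[cols].to_numpy(dtype=float)
    if cites.max() == 0:
        continue
    aligned.append(cites / cites.max())  # normalise to the peak year

# Pad records to equal length and average the lifecycles
max_len = max(len(a) for a in aligned)
mat = np.full((len(aligned), max_len), np.nan)
for i, a in enumerate(aligned):
    mat[i, :len(a)] = a
mean_lifecycle = np.nanmean(mat, axis=0)

peak_years = [int(np.argmax(a)) for a in aligned]  # peak year, in years after publication
print("median peak year:", np.median(peak_years))
print("mean lifecycle:", np.round(mean_lifecycle, 2))
```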

So what did it show?

Citations to a paper go up and then go down, as expected (top left). When cumulative citations are plotted, most of the articles show an initial burst and then level off. The exceptions are ~8 articles that continue to rise linearly (top right). On average, a paper generates its peak citations three years after publication (box plot). The fall after this peak period is pretty linear and it’s apparently all over somewhere >15 years after publication (bottom left). To look at the decline in more detail, I aligned the papers so that year 0 was the year of peak citations. The average now loses almost 40% of those peak citations in the following year and then declines steadily (bottom right).

Edit: The dreaded Impact Factor calculation takes the citations to articles published in the preceding 2 years and divides by the number of citable items in that period. This means that each paper only contributes to the Impact Factor in years 1 and 2. This is before the average paper reaches its peak citation period. Thanks to David Stephens (@david_s_bristol) for pointing this out. The alternative 5 year Impact Factor gets around this limitation.

Perhaps lifecycle is the wrong term: papers in this dataset don’t actually ‘die’, i.e. go to 0 citations. There is always a chance that a paper will pick up the odd citation. Papers published 15 years ago are still clocking 20% of their peak citations. Looking at papers cited at lower rates would be informative here.

Two other weaknesses that affect precision are that 1) a year is a long time and 2) publication is subject to long lag times. The analysis would be improved by categorising the records based on the month and year when the paper was published and the month and year when each citation came in. Papers published in January of one year probably have a different peak than those published in December of the same year, but this is lost when looking at year alone. Secondly, due to publication lag, it is impossible to know when the peak period of influence for a paper truly occurs.

Problems in the dataset. Some reviews remained despite being supposedly excluded, i.e. they are not properly tagged in the database. Also, some records have citations from years before the article was published! The numbers of such citations are small enough not to worry about for this analysis, but it makes you wonder about how accurate the whole dataset is. I’ve written before about how complete citation data may or may not be. These sorts of things are a concern for all of us who are judged by these metrics for hiring and promotion decisions.

The post title is taken from ‘Sure To Fall’ by The Beatles, recorded during The Decca Sessions.

Outer Limits

This post is about a paper that was recently published. It was the result of a nice collaboration between me and Francisco López-Murcia and Artur Llobet in Barcelona.

The paper in a nutshell
The availability of clathrin sets a limit for presynaptic function

Background
Clathrin is a three-legged protein that forms a cage around membranes during endocytosis. One site of intense clathrin-mediated endocytosis (CME) is the presynaptic terminal. Here, synaptic vesicles need to be recaptured after fusion, and CME is the main route of retrieval. Clathrin is highly abundant in all cells and it is generally thought of as limitless for the formation of multiple clathrin-coated structures. Is this really true? In a neuron, where there is a lot of endocytic activity, maybe the limits are tested?
It is known that strong stimulation of neurons causes synaptic depression – a form of reversible synaptic plasticity where the neuron can only evoke a weak postsynaptic response afterwards. Is depression a vesicle supply problem?

What did we find?
We showed that clathrin availability drops during stimulation that evokes depression. The drop in availability is due to clathrin forming vesicles and moving away from the synapse. We mimicked this by RNAi, dropping the clathrin levels and looking at synaptic responses. We found that when clathrin levels drop, synaptic responses become very small. Fewer vesicles are able to form and those that do form are smaller. Interestingly, the amount of neurotransmitter (acetylcholine) in the vesicles was much less than expected from the vesicle volumes measured by electron microscopy. This suggests there is an additional sorting problem in cells with lower clathrin levels.

Killer experiment
A third reviewer was called in (due to a split decision between Reviewers 1 and 2). He/she asked a killer question: all of our data could be due to an off-target effect of RNAi, so could we do a rescue experiment? We spent many weeks trying to get the rescue experiment to work, but a second viral infection was too much for the cells and engineering a virus to express clathrin was very difficult. The referee also said: if clathrin levels set a limit for synaptic function, why don’t you just express more clathrin? Well, we would if we could! But this gave us an idea… why don’t we just put clathrin in the pipette and let it diffuse out to the synapses and rescue the RNAi phenotype over time? We did it – and to our surprise – it worked! The neurons went from an inhibited state to wild-type function in about 20 min. We then realised we could use the same method on normal neurons to boost clathrin levels at the synapse and protect against synaptic depression. This also worked! These killer experiments were a great addition to the paper and are a good example of peer review improving the paper.

People
Fran and Artur did almost all the experimental work. I did a bit of molecular biology and clathrin purification. Artur and I wrote the paper and put the figures together – lots of skype and dropbox activity.
Artur is a physiologist and his lab likes to tackle problems that are experimentally very challenging – work that my lab wouldn’t dare to do – so he’s the perfect collaborator. I have known Artur for years. We were postdocs in the same lab at the LMB in the early 2000s. We tried a collaborative project to inhibit dynamin function in adrenal chromaffin cells at that time, but it didn’t work out. We have stayed in touch and this is our first paper together. The situation in Spain for scientific research is currently very bad and it deteriorated while the project was ongoing. This has been very sad to hear about, but fortunately we were able to finish this project and we hope to work together more in the future.

We were on the cover!
Now that the scientific literature is online, this doesn’t mean so much anymore, but they picked our picture for the cover. It is a single-cell microculture expressing GFP that was stained for synaptic markers and clathrin. I changed the channels around for artistic effect.

What else?
J Neurosci is slightly different from other journals that I’ve published in recently (my only other J Neurosci paper was published in 2002), for the following reasons:

  1. No supplementary information. The journal did away with this years ago to re-introduce some sanity in the peer review process. This didn’t affect our paper very much. We had a movie of clathrin movement that would have gone into the SI at another journal, but we simply removed it here.
  2. ORCIDs for authors are published with the paper. This gives the reader access to all your professional information and distinguishes authors with similar names. I think this is a good idea.
  3. Submission fee. All manuscripts are subject to a submission fee. I believe this is to defray the costs of editorial work. I think this makes sense, although I’m not sure how I would feel if our paper had been rejected.

Reference:

López-Murcia, F.J., Royle, S.J. & Llobet, A. (2014) Presynaptic clathrin levels are a limiting factor for synaptic transmission. J. Neurosci. 34: 8618-8629. doi: 10.1523/JNEUROSCI.5081-13.2014

Pubmed | Paper

The post title is taken from “Outer Limits” a 7″ Single by Sleep ∞ Over released in 2010.

All This And More

I was looking at the latest issue of Cell and marvelling at how many authors there are on each paper. It’s no secret that the raison d’être of Cell is to publish the “last word” on a topic (although whether it fulfils that objective is debatable). Definitive work needs to be comprehensive. So it follows that this means lots of techniques and ergo lots of authors. This means it is even more impressive when a dual author paper turns up in the table of contents for Cell. Anyway, I got to thinking: has it always been the case that Cell papers have lots of authors and if not, when did that change?

I downloaded the data for all articles published by Cell (and for comparison, J Cell Biol) from Scopus. The records required a bit of cleaning. For example, SnapShot papers needed to be removed and also the odd obituary etc. had been misclassified as an article. These could be quickly removed. I then went back through and filtered out ‘articles’ that were less than three pages as I think it is not possible for a paper to be two pages or fewer in length. The data could be loaded into IgorPro and boxplots generated per year to show how author number varied over time. Reviews that are misclassified as Articles will still be in the dataset, but I figured these would be minimal.
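
For anyone wanting to repeat this, here is a rough Python sketch of the cleaning and plotting steps. It is my own reconstruction, assuming a Scopus CSV export with 'Authors', 'Title', 'Year', 'Page start', 'Page end' and 'Document Type' columns; the real export may use different column names.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumed input: a Scopus CSV export with 'Authors', 'Title', 'Year',
# 'Page start', 'Page end' and 'Document Type' columns.
df = pd.read_csv("scopus_cell.csv")

df = df[df["Document Type"] == "Article"]                    # keep only items classed as Articles
df = df[~df["Title"].str.startswith("SnapShot", na=False)]   # remove SnapShot papers
# Misclassified short items (obituaries etc.) are caught by a page-length filter:
n_pages = (pd.to_numeric(df["Page end"], errors="coerce")
           - pd.to_numeric(df["Page start"], errors="coerce") + 1)
df = df[n_pages >= 3]                                        # a real paper is three or more pages

df["n_authors"] = df["Authors"].str.split(",").str.len()     # crude author count
ax = df.boxplot(column="n_authors", by="Year", rot=90)
ax.set_ylabel("Authors per paper")
plt.show()
```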

First off: yes, there are more authors on average for a Cell paper than for a J Cell Biol paper. What is interesting is that both journals had similar numbers of authors when Cell was born (1974) and they crept up together until the early 2000s, when the number of Cell authors kept increasing, or J Cell Biol flattened off, whichever way you look at it.

I think the overall trend to more authors is because understanding biology has increasingly required multiple approaches and the bar for evidence seems to be getting higher over time. The initial creep to more authors (1974-2000) might be due to a cultural change where people (technicians/students/women) began to get proper credit for their contributions. However, this doesn’t explain the divergence between J Cell Biol and Cell in recent years. One possibility is that Cell takes more non-cell-biology papers and that these papers necessarily have more authors. For example, the polar bear genome was published in Cell (29 authors), and this sort of paper would not appear in J Cell Biol. Another possibility is that J Cell Biol has a shorter and stricter revision procedure, which means that accumulating new techniques and new authors over multiple rounds of revision is more limited than it is at Cell. Any other ideas?

I also quickly checked whether more authors means more citations, but found no evidence for such a relationship. For papers published in the years 2000-2004, the median citation number for papers with 1-10 authors was pretty constant for J Cell Biol. For Cell, these data were noisier: three-author papers tended to be cited a bit more than those with two authors, but then four-author papers were lower again.

The number of authors on papers from our lab ranges from 2 to 9, with a median of 3.5. This would put an average paper from our lab in the bottom quartile for JCB and in the lower 10% for Cell in 2013. Ironically, our 9-author paper (an outlier) was published in J Cell Biol. Maybe we need to get more authors on our papers before we can start troubling Cell with our manuscripts…


The Post title is taken from ‘All This and More’ by The Wedding Present from their LP George Best.

Blast Off!

This post is about metrics and specifically the H-index. It will probably be the first of several on this topic.

I was re-reading a blog post by Alex Bateman on his affection for the H-index as a tool for evaluating up-and-coming scientists. He describes Jorge Hirsch’s H-index, its limitations and its utility quite nicely, so I won’t reiterate this (although I’ll probably do so in another post). What is under-appreciated is that Hirsch also introduced the m quotient, which is the H-index divided by years since the first publication. It’s the m quotient that I’ll concentrate on here. The TL;DR is: I think that the H-index does have some uses, but evaluating early career scientists is not one of them.

Anyone of an anti-metrics disposition should look away now.

Alex proposes that scientists can be judged (and hired) using m as follows:

  • <1.0 = average scientist
  • 1.0-2.0 = above average
  • 2.0-3.0 = excellent
  • >3.0 = stellar

He says “So post-docs with an m-value of greater than three are future science superstars and highly likely to have a stratospheric rise. If you can find one, hire them immediately!”.

From what I have seen, the H-index (and therefore m) is too noisy for early career scientists to be of any use for evaluation. Let’s leave that aside for the moment. What he is saying is that you should definitely hire a post-doc who has published ≥3 papers with ≥3 citations each in their first year, ≥6 papers with ≥6 citations each in their second year, ≥9 papers with ≥9 citations each in their third year…

Do these people even exist? For a candidate with a 3-year PhD and a 3-year postdoc, 6 years would mean ≥18 papers with ≥18 citations each! In my field (molecular cell biology), it is unusual for somebody to publish that many papers, let alone accrue citations at that rate*.

This got me thinking: using Alex’s criteria, how many stellar scientists would we miss out on, and would we be more likely to hire the next Jan Hendrik Schön? To check this out I needed to write a quick program to calculate the H-index by year (I’ll describe this in a future post). Off the top of my head I thought of a few scientists that I know of, who are successful by many other measures, and plotted their H-index by year. The dotted line shows a constant m of 1, “average” by Alex’s criteria. I’ve taken a guess at when they became a PI. I have anonymised the scholars; the information is public and anyone can calculate this, but it’s not fair to identify people without asking (hopefully they can’t recognise themselves – if they read this!).
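
For the curious, here is a minimal sketch of how an H-index-by-year (and m) calculation can work, given per-year citation counts for each paper. This is an illustration only, not the program I used.

```python
def h_index(citation_counts):
    """H-index: the largest h such that h papers have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for i, c in enumerate(counts, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

def h_and_m_by_year(papers, first_year, last_year):
    """papers: list of (pub_year, {year: cites_received_that_year}) tuples.
    Returns {year: (H-index, m quotient)}, using only citations up to that year.
    m is H divided by years since the first paper (counting the first year as 1)."""
    out = {}
    for year in range(first_year, last_year + 1):
        cumulative = [
            sum(c for y, c in cites.items() if y <= year)
            for pub_year, cites in papers
            if pub_year <= year
        ]
        h = h_index(cumulative)
        m = h / (year - first_year + 1)
        out[year] = (h, round(m, 2))
    return out

# Toy example: two papers, with citations accruing over a few years
papers = [
    (2010, {2010: 1, 2011: 4, 2012: 6}),
    (2011, {2011: 2, 2012: 5}),
]
print(h_and_m_by_year(papers, first_year=2010, last_year=2012))
# {2010: (1, 1.0), 2011: (2, 1.0), 2012: (2, 0.67)}
```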

This is a small sample taken from people in my field. You can see that it is rare for scientists to have a big m at an early stage in their careers. With the exception of Scholar C, who was just awesome from the get-go, panels appointing any of these scholars would have had trouble divining their future success on the basis of H-index and m alone. Scholar D and Scholar E really saw their careers take off by making big discoveries, and these happened at different stages of their careers. Both of these scholars were “below average” when they were appointed as PIs. The panels would certainly not have used metrics in their evaluation (the databases were not in wide use back then), probably just letters of recommendation and reading the work. Clearly, they could identify the potential in these scientists… or maybe they just got lucky. Who knows?!

There may be other fields where publication at higher rates can lead to a large m but I would still question the contribution of the scientist to the papers that led to the H-index. Are they first or last author? One problem with the H-index is that the 20th scientist in a list of 40 authors gets the same credit as the first author. Filtering what counts in the list of articles seems sensible, but this would make the values even more noisy for early stage scientists.

 

*In the comments section, somebody points out that if you publish a paper very early then this affects your m value. This is something I sympathise with. My first paper was in 1999 when I was an undergrad. This dents my m value as it was a full three years until my next paper.

The post title is taken from ‘Blast Off!’ by Rivers Cuomo from ‘Songs from the Black Hole’ the unreleased follow-up to Pinkerton.

Into The Great Wide Open

We have a new paper out! You can read it here.

I thought I would write a post on how this paper came to be and also about our first proper experience with preprinting.

Title of the paper: Non-specificity of Pitstop 2 in clathrin-mediated endocytosis.

In a nutshell: we show that Pitstop 2, a supposedly selective clathrin inhibitor acts in a non-specific way to inhibit endocytosis.

Authors: Anna Willox, who was a postdoc in the lab from 2008-2012, did the flow cytometry measurements, and Yasmina Sahraoui, who was a summer student in my lab, did the binding experiments. And me.

Background: The description of “pitstops” – small molecules that inhibit clathrin-mediated endocytosis – back in 2011 in Cell was heralded as a major step forward in cell biology. And it really would be a breakthrough if we had ways to selectively switch off clathrin-mediated endocytosis. Lots of nasty things gain entry into cells by hijacking this pathway, including viruses such as HIV, and so if we could stop viral entry this could prevent cellular infection. Plus, these reagents would be really handy in the lab for cell biologists.

The rationale for designing the pitstop inhibitors was that they should block the interaction between clathrin and adaptor proteins. Adaptors are the proteins that recognise the membrane and the cargo to be internalised – clathrin itself cannot do this. So if we can stop clathrin from binding adaptors there should be no internalisation – job done! Now, in 2000 or so, we thought that clathrin binds to adaptors via a single site on its N-terminal domain. This information was used in the drug screen that identified pitstops. The problem is that, since 2000, we have found that there are four sites on the N-terminal domain of clathrin that can each mediate endocytosis. So blocking one of these sites with a drug would do nothing. Despite this, pitstop compounds, which were shown to have a selectivity for one site on the N-terminal domain of clathrin, blocked endocytosis. People in the field scratched their heads at how this was possible.

A damning paper was published in 2012 from Julie Donaldson’s lab showing that pitstops inhibit clathrin-independent endocytosis as well as clathrin-mediated endocytosis. Apparently, the compounds affect the plasma membrane and so all internalisation is inhibited. Many people thought this was the last we would hear about these compounds. After all, these drugs need to be highly selective to be of any use in the lab, let alone in the clinic.

Our work: we had our own negative results using these compounds, sitting on our server, unpublished. Back in February 2011, while the Pitstop paper was under revision, the authors of that study sent some of these compounds to us in the hope that we could use these compounds to study clathrin on the mitotic spindle. The drugs did not affect clathrin binding to the spindle (although they probably should have done) and this prompted us to check whether the compounds were working – they had been shipped all the way from Australia so maybe something had gone wrong. We tested for inhibition of clathrin-mediated endocytosis and they worked really well.

At the time we were testing the function of each of the four interaction sites on clathrin in endocytosis, so we added Pitstop 2 to our experiments to test for specificity. We found that Pitstop 2 inhibits clathrin-mediated endocytosis even when the site where Pitstops are supposed to bind has been mutated! The picture shows that the compound (pink) binds where sequences from adaptors can bind. Mutation of this site doesn’t affect endocytosis, because clathrin can use any of the other three sites. Yet Pitstop blocks endocytosis mediated by this mutant, so it must act elsewhere, non-specifically.

So the compounds were not as specific as claimed, but what could we do with this information? There didn’t seem to be enough to publish, and I didn’t want people in the lab working on this as it would take time and energy away from other projects, especially when debunking other people’s work is such a thankless task (why this is the case is for another post). The Dutta & Donaldson paper then came out, which was far more extensive than our results, and so we moved on.

What changed?

A few things prompted me to write this work up. Not least, Yasmina had since shown that our mutations were sufficient to prevent AP-2 binding to clathrin. This result filled a hole in our work. These things were:

  1. People continuing to use pitstops in published work, without acknowledging that they may act non-specifically. The turning point was this paper, which was critical of the Dutta & Donaldson work.
  2. People outside of the field using these compounds without realising their drawbacks.
  3. Abcam selling this compound, and the thought of other scientists buying it and using it on the basis of the original paper, made me feel very guilty that we had not published our findings.
  4. It kept getting easier and easier to publish “negative results”. Journals such as Biology Open from Company of Biologists or PLoS ONE and preprint servers (see below) make this very easy.

Finally, it was a Twitter conversation with Jim Woodgett that convinced me that, when I had the time, I would write it up. I added an acknowledgement to him in our paper! So that, together with the launch of bioRxiv, convinced me to get the paper online.

The Preprinting Experience

This paper was our first proper preprint. We had put an accepted version of our eLife paper on bioRxiv before it came out in print at eLife, but that doesn’t really count. For full disclosure, I am an affiliate of bioRxiv.

The preprint went up on 13th February and we submitted it straight to Biology Open the next day. I had to check with the journal that it was OK to submit a deposited paper. At the time they didn’t have a preprint policy (although I knew that David Stephens had submitted his preprinted paper there and he told me their policy was about to change). Biology Open now accepts preprinted papers – you can check which journals do and which ones don’t here.

My idea was that I just wanted to get the information into the public domain as fast as possible. The upshot was, I wasn’t so bothered about getting feedback on the manuscript. For those that don’t know: the idea is that you deposit your paper, get feedback, improve your paper then submit it for publication. In the end I did get some feedback via email (not on the bioRxiv comments section), and I was able to incorporate those changes into the revised version. I think next time, I’ll deposit the paper and wait one week while soliciting comments and then submit to a journal.

It was viewed quite a few times while the paper was being considered by Biology Open. I spoke to a PI who told me that they had found the paper and stopped using pitstop as a result. I think this means getting the work out there was worth it after all.

Now it is out “properly” in Biology Open and anyone can read it.

Verdict: I was really impressed by Biology Open. The reviewing and editorial work were handled very fast. I guess it helps that the paper was very short, but it was very uncomplicated. I wanted to publish with Biology Open rather than PLoS ONE as the Company of Biologists support cell biology in the UK. Disclaimer: I am on the committee of the British Society of Cell Biology which receives funding from CoB.

Depositing the preprint at bioRxiv was easy and for this type of paper it is a no-brainer. I’m still not sure to what extent we will preprint our work in the future. This is uncharted territory that is evolving all the time; we’ll see. I can say that the experience for this paper was 100% positive.

References

Dutta, D., Williamson, C. D., Cole, N. B. and Donaldson, J. G. (2012) Pitstop 2 is a potent inhibitor of clathrin-independent endocytosis. PLoS One 7, e45799.

Lemmon, S. K. and Traub, L. M. (2012) Getting in Touch with the Clathrin Terminal Domain. Traffic, 13, 511-9.

Stahlschmidt, W., Robertson, M. J., Robinson, P. J., McCluskey, A. and Haucke, V. (2014) Clathrin terminal domain-ligand interactions regulate sorting of mannose 6-phosphate receptors mediated by AP-1 and GGA adaptors. J Biol Chem. 289, 4906-18.

von Kleist, L., Stahlschmidt, W., Bulut, H., Gromova, K., Puchkov, D., Robertson, M. J., MacGregor, K. A., Tomilin, N., Pechstein, A., Chau, N. et al. (2011) Role of the clathrin terminal domain in regulating coated pit dynamics revealed by small molecule inhibition. Cell 146, 471-84.

Willox, A.K., Sahraoui, Y.M.E. & Royle, S.J. (2014) Non-specificity of Pitstop 2 in clathrin-mediated endocytosis. Biol Open, doi: 10.1242/bio.20147955.

Willox, A.K., Sahraoui, Y.M.E. & Royle, S.J. (2014) Non-specificity of Pitstop 2 in clathrin-mediated endocytosis. bioRxiv, doi: 10.1101/002675.

The post title is taken from ‘Into The Great Wide Open’ by Tom Petty and The Heartbreakers from the LP of the same name.

Give, Give, Give Me More, More, More

A recent opinion piece published in eLife bemoaned the way that citations are used to judge academics because we are not even certain of the veracity of this information. The main complaint was that Google Scholar – a service that aggregates citations to articles using a computer program – may be less-than-reliable.

There are three main sources of citation statistics: Scopus, Web of Knowledge/Science and Google Scholar; although other sources are out there. These are commonly used and I checked out how comparable these databases are for articles from our lab.

The ratio of citations is approximately 1:1:1.2 for Scopus:WoK:GS. So Google Scholar is a bit like a footballer, it gives 120%.

I first did this comparison in 2012 and again in 2013. The ratio has remained constant, although these are the same articles, and it is a very limited dataset. In the eLife opinion piece, Eve Marder noted an extra ~30% citations for GS (although I calculated it as ~40%, 894/636=1.41). Talking to colleagues, they have also noticed this. It’s clear that there is some inflation with GS, although the degree of inflation may vary by field. So where do these extra citations come from?

  1. Future citations: GS is faster than Scopus and WoK. Articles appear there a few days after they are published, whereas it takes several weeks or months for the same articles to appear in Scopus and WoK.
  2. Other papers: some journals are not in Scopus and WoK. Again, these might be new journals that aren’t yet included at the others, but GS doesn’t discriminate and includes all papers it finds. One of our own papers (an invited review at a nascent OA journal) is not covered by Scopus and WoK*. GS picks up preprints whereas the others do not.
  3. Other stuff: GS picks up patents and PhD theses. While these are not traditional papers, published in traditional journals, they are clearly useful and should be aggregated.
  4. Garbage: GS does pick up some stuff that is not a real publication. One example is a product insert for an antibody, which has a reference section. Another is duplicate publications. It is quite good at spotting these and folding them into a single publication, but some slip through.

OK, Number 4 is worrying, but the other citations that GS detects versus Scopus and WoK are surely a good thing. I agree with the sentiment expressed in the eLife paper that we should be careful about what these numbers mean, but I don’t think we should just disregard citation statistics as suggested.

GS is free, while the others are subscription-based services. It did look for a while like Google was going to ditch Scholar, but a recent interview with the GS team (sorry, I can’t find the link) suggests that they are going to keep it active and possibly develop it further. Checking out your citations is not just an ego-trip; it’s a good way to find out about articles that are related to your own work. GS has a nice feature that sends you an email whenever it detects a citation for your profile. The downside of GS is that its terms of service do not permit scraping and reuse, whereas downloading of subsets of the other databases is allowed.

In summary, I am a fan of Google Scholar. My page is here.

 

* = I looked into this a bit more and the paper is actually in WoK: it has no title and it has 7 citations (versus 12 in GS), although it doesn’t come up in a search for Fiona or for me.


However, I know from GS that this paper was also cited in a paper by the Cancer Genome Atlas Network in Nature. WoK listed this paper as having 0 references and 0 citations(!). Does any of this matter? Well, yes. WoK is a Thomson Reuters product and is used as the basis for their dreaded Impact Factor – which (like it or not) is still widely used for decision making. Also many Universities use WoK information in their hiring and promotions processes.

The post title comes from ‘Give, Give, Give Me More, More, More’ by The Wonder Stuff from the LP ‘Eight Legged Groove Machine’. Finding a post title was difficult this time. I passed on: Pigs (Three Different Ones) and Juxtapozed with U. My iTunes library is lacking songs about citations…