When it comes to measuring the impact of our science, citations are pretty much all we have. What’s more, they only say one thing – yeah – with no context. How can we enrich citation data?
Much has been written about how and why and whether or not we should use metrics for research assessment. If we accept that metrics are here to stay in research assessment (of journals, Universities, departments and of individuals), I think we should be figuring out better ways to look at the available information.
Citations to published articles are the key metric under discussion. This is because they are linked to research outputs (papers), have some relation to “impact”, and can be easily computed; a number of metrics have been developed to draw out information from the data (H-index, IF etc.). However, there are many known problems with citations; for example, they are heavily influenced by the size of the field. What I want to highlight here is what a data-poor resource they are, and to think of ways we could enrich the dataset with minimal modification to our existing databases.
1. We need a way to distinguish a yeah from a no
The biggest weakness of using citations as a measure of research impact is that a citation is a citation. It just says +1. We have no idea if +1 means “the paper stinks” or “the work is amazing!”. It’s incredible that we can rate shoelaces on Amazon or eBay but we haven’t figured out a way to do this for scientific papers. Here’s a suggestion:
- A neutral citation is +1
- A positive citation is +2
- A negative citation is -1
A neutral citation would be stating a fact and adding reference to support it, e.g. DNA is a double helix (Watson & Crick, 1953).
A positive citation would be something like: in agreement with Bloggs et al. (2010), we also find x.
A negative citation might be: we have tested the model proposed by Smith & Jones (1977) and find that it does not hold.
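To make the scoring concrete, here is a minimal sketch of how the three weights could be tallied. It assumes each citation has already been labelled as neutral, positive or negative by some means (the labelling itself is the hard part and is not shown); the names `Sentiment` and `citation_score` are made up for illustration.

```python
from enum import Enum


class Sentiment(Enum):
    """Weights taken from the suggestion above; the class name is hypothetical."""
    NEUTRAL = 1    # states a fact and cites a reference to support it
    POSITIVE = 2   # agrees with or confirms the cited work
    NEGATIVE = -1  # reports that the cited work does not hold


def citation_score(sentiments):
    """Sum the weighted value of every citation a paper has received."""
    return sum(s.value for s in sentiments)


received = [
    Sentiment.NEUTRAL,
    Sentiment.POSITIVE,
    Sentiment.NEGATIVE,
    Sentiment.NEUTRAL,
]
print(citation_score(received))  # 1 + 2 - 1 + 1 = 3
```

Under the current system the same paper would simply score 4, with no hint that one of those citations was a failed test of the work.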
One further idea (described here) is to add more context to citations using keywords such as “replicating”, “using” and “consistent with”. This would also help with searching the scientific literature.
2. Multiple citations in one article
Because a citation is currently just a +1, there is no way to distinguish whether the citing paper was mentioning the cited paper in passing or was entirely focussed on that one paper.
Another way to think about this is that there are multiple reasons to cite a paper: maybe the method or reagent is being used, maybe they are talking about Figure 2 showing X or Figure 5 showing Y. What if a paper is talking about all of these things? In other words, the paper was very useful. Shouldn’t we record that interest?
Suggestion: A simple way to do this is to count the number of mentions in the text of the paper rather than just if the paper appears in the reference list.
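A rough sketch of what counting mentions might look like, assuming the full text is available and that in-text citations use an author-year style, either “Authors (Year)” or “(Authors, Year)”. Numbered citation styles would need a different pattern, and the function name `count_mentions` is just an illustration.

```python
import re


def count_mentions(text, authors, year):
    """Count in-text mentions of one reference.

    Matches 'Authors (Year)' and '(Authors, Year)'; other styles are ignored.
    """
    pattern = re.compile(
        re.escape(authors) + r"\s*(?:,\s*|\s*\()" + str(year)
    )
    return len(pattern.findall(text))


text = (
    "We used the reagent described by Smith & Jones (1977). "
    "Their Figure 2 shows X (Smith & Jones, 1977), and Figure 5 shows Y "
    "(Smith & Jones, 1977). DNA is a double helix (Watson & Crick, 1953)."
)

print(count_mentions(text, "Smith & Jones", 1977))   # 3 mentions
print(count_mentions(text, "Watson & Crick", 1953))  # 1 mention
```

Both references would appear once in the reference list, but the mention counts show which one the paper actually leaned on.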
3. Division of a citation unit for fair credit to each author
Calculations such as the H-index make no allowance for the position of the author in the author list (used in biological sciences and some other fields to denote contribution to the paper). It doesn’t make sense that the 25th author on a 50 author paper receives the same 100% of the citation credit as the first or last author. Similarly, the first author on a two author paper is only credited in the same way as the middle author on a multi-author paper. The difference in contribution is clear, but the citation credit is not. This is because every author receives a full citation unit, so the total credit handed out for the 50 author paper is worth 25 times that of the two author paper! This needs to be equalised. The citation unit, c, could be divided to achieve fair credit for authors. At the moment, c=1, but it could take multiples (or negative values) as described above. Here’s a suggestion:
- First (and co-first) and last (and co-last) authors share 0.5c, divided equally by the number of such authors.
- The remainder, 0.5c, is divided equally between all authors, including the first and last authors.
For a two author paper, the first author and the last author each get (0.5c/2+0.5c/2)=0.5c.
For a ten author paper with one first author and one last author, first and last author each get (0.5c/2+0.5c/10)=0.3c and the 5th author gets (0c+0.5c/10)=0.05c.
Note that the sum for all authors will equal c. So this is equalised for all papers. These citation credits would then be the basis for H-index and other calculations for individuals.
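The split can be written down in a few lines. This is only a sketch of the scheme as described above: the function name `author_credits` and the idea of passing the first/last author positions as a list of indices are my own assumptions, not an existing tool.

```python
def author_credits(n_authors, lead_positions, c=1.0):
    """Split one citation unit c among the authors of a paper.

    lead_positions: 0-based positions of first/co-first and last/co-last
    authors. Half of c is shared among those authors; the other half is
    shared equally among everyone on the author list.
    """
    leads = set(lead_positions)
    if not leads or not all(0 <= i < n_authors for i in leads):
        raise ValueError("lead_positions must be valid author positions")
    shared = 0.5 * c / n_authors    # everyone's share of the second half
    lead_bonus = 0.5 * c / len(leads)  # extra share for first/last authors
    return [shared + (lead_bonus if i in leads else 0.0)
            for i in range(n_authors)]


two = author_credits(2, [0, 1])
ten = author_credits(10, [0, 9])

print([round(x, 2) for x in two])  # first and last author: 0.5c each
print([round(x, 2) for x in ten])  # first and last: 0.3c, the rest: 0.05c
print(round(sum(ten), 2))          # credits add up to c
```

Co-first or co-last authors are handled by listing more positions, for example `author_credits(10, [0, 1, 9])`, which splits the first half of c three ways.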
Most simply, the denominator would be the number of authors, or – if we can figure out a numerical credit system – each author could be weighted according to their contribution.
4. Citations to reviews should be downgraded
A citation to a review is not equal to a citation to a research paper, for several reasons. First, reviews are cited at a higher rate, because they are a handy catchall citation, particularly for the Introduction section in papers. This isn’t fair and robs credit from the people who did the work that actually demonstrated what is being discussed. Second, the achievement of publishing a review is nothing in comparison to publishing a paper. Publishing a review involves 1) being asked, 2) writing it, 3) light peer review and some editing, and that’s it! Publishing a research paper involves much more effort: having the idea, getting the money, hiring the people, training the people, getting a result – and we are only at the first panel in Fig 1A. Not to mention the people-hours and arduous peer review process. It’s not fair that citations to reviews are treated as equal to papers when it comes to research assessment.
Suggestion: a citation to a review should be worth a fraction (maybe 1/10th) of a citation to a research paper.
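As a small illustration of the effect, here is a sketch that totals an author’s citations with reviews down-weighted. The 0.1 weight is the “maybe 1/10th” from the suggestion; the function name, the `"review"`/`"research"` labels and the citation counts are invented for the example.

```python
def weighted_total(papers, review_weight=0.1):
    """Total citations, counting each citation to a review at review_weight.

    papers: iterable of (kind, citations) pairs, where kind is
    'review' or 'research' (hypothetical labels).
    """
    total = 0.0
    for kind, citations in papers:
        weight = review_weight if kind == "review" else 1.0
        total += weight * citations
    return total


papers = [("research", 40), ("review", 200), ("research", 15)]

print(sum(citations for _, citations in papers))  # raw count: 255
print(round(weighted_total(papers), 1))           # 40 + 0.1*200 + 15 = 75.0
```

In the raw count the single review dominates; after weighting, the two research papers account for most of the total.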
In addition, there are too many reviews written at the moment. I think this is not because they are particularly useful. Very few actually contribute a new view or new synthesis of an area; most are just a summary of the area. Journals like them because they drive up their citation metrics. Authors like them because it is nice to be invited to write something – it means people are interested in what you have to say… If citations to reviews were downgraded, there would be less incentive to publish them and we would have more space for all those real papers that are getting rejected at journals that claim that space is a limitation for publication.
5. Self-citations should be eliminated
If we are going to do all of the above, then self-citation would pretty soon become a problem. Excessive self-citation would be difficult to police, and not many scientists would go for a -1 citation to their own work. So, the simplest thing to do is to eliminate self-citation. Author identification is crucial here. At the moment this doesn’t work well. In ISI and Scopus, whatever algorithm they use keeps missing some papers of mine (and my name is not very common at all). I know people who have been grouped with other people that they have published one or two papers with. For authors with ambiguous names, this is a real problem. ORCID is a good solution and maybe having an ORCID (or similar) should be a requirement for publication in the future.
Suggestion: the company or body that collates citation information needs to accurately assign authors and make sure that research papers are properly segregated from reviews and other publication types.
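If every author had a reliable identifier, dropping self-citations would be straightforward. The sketch below assumes each paper comes with a set of author identifiers (ORCID-like, but the values here are placeholders, not real IDs) and treats any overlap between citing and cited author lists as a self-citation; the function names are hypothetical.

```python
def is_self_citation(citing_authors, cited_authors):
    """True if the citing and cited papers share at least one author ID."""
    return not set(citing_authors).isdisjoint(cited_authors)


def external_citations(cited_authors, citing_papers):
    """Keep only the citing papers that share no author with the cited paper."""
    return [authors for authors in citing_papers
            if not is_self_citation(authors, cited_authors)]


# Placeholder identifiers standing in for ORCIDs.
cited = {"id-A", "id-B"}
citing = [
    {"id-C", "id-D"},  # independent group
    {"id-A", "id-E"},  # shares author id-A, so a self-citation
    {"id-F"},          # independent author
]

print(len(citing))                             # 3 citations in total
print(len(external_citations(cited, citing)))  # 2 once self-citations are dropped
```

This only works as well as the author identification behind it, which is exactly why the name-matching problems described above matter.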
These were five things I thought of to enrich citation data to improve research assessment. Do you have any other ideas?
The post title is taken from ‘”Yeah” Is What We Had’ by Grandaddy from their album Sumday.
9 thoughts on ““Yeah” Is What We Had”
Great post! Metrics are not going away and ignoring them is dangerous. They need to be improved and given context and cross validation. I particularly like the idea of deprecating reviews. Self-citations can be important to a paper but should have a value of zero. You’ve gained the advantage of publication already. Assigning credit/contribution based on author position is tricky but something needs to be done. Author identification is a huge problem for common names and this is surely causing dilution of contributions at all levels, including being asked to review for journals. ResearcherID and ORCID help, but are also a pain to update.
Some other ideas:
1. Set maxima. Very few papers accrue 1000 citations and many of these are methods papers. Recognize these in some way but cap their accrual so that they don’t distort (h-index gets around this issue but is age-discriminatory).
2. Comparisons within fields help normalize size effects and behaviours but current field discriminators massively disadvantage scientists who work in multiple fields – and aren’t we supposed to be encouraging multidisciplinarity? The ISI Highly Cited list is, IMO, severely compromised by its blindness to such scientists. I don’t have a suggestion as to how to recognize such cross-disciplinary people, but they need to be lauded.
3. Article-level metrics are way better than associative metrics (the worst being JIF). The bad influence of JIF is easy to underestimate. It’s the mark of a lazy mind.
4. Why aren’t references in papers all hot-linked? Why can’t authors use pop-ups to allow deeper explanations? Why are papers still designed to be static? There are movements towards post peer review and author/reader interactivity but these are still baby steps.
5. Don’t just focus on citations (cite-centric) just because they are measurable and involve large numbers. Add other factors that are surrogates for quality and importance. That sounds like alt-metrics but these also suffer from “celebrity” influences (and lack of good/bad distinction – e.g. the alt-metrics for the STAP papers must be over the moon). Perhaps recognition of contributions to scientific progress such as reviewing, grant reviews? These are also experience-discriminatory but I do wish postdocs were used more heavily in assessing studentships.
6. Design a metric – wait for it to be gamed. It’s human nature. To avoid this, measures need to be in the public domain with transparent guidelines as to how they are calculated. If these measures are going to be used to calculate distributions of government funding, they have to be as fair as possible.
There is no perfect measure of science due to the simple fact that it is inherently unpredictable. This instability is the lifeline of research. Black swan discoveries and insights have massive consequences and are entirely invisible to predictive measures. Metrics are, at best, lagging indicators that focus on the fact that track record has some predictive value. A number of agencies are realizing these limitations and ask for 3-5 of your papers (your selection) and ask you to explain why you think they are significant. This is progress but we know that qualitative measures are more difficult for ranking people/institutions and this is one of the primary uses of such measures (of course, when we score grant applications, we are converting a qualitative metric into a quantitative metric, for good or bad). Hence, we are stuck with quantitative metrics, warts and all. Improving on these is important, as well as education about their limits.
Jim, thanks for the comment and for the great suggestions. Picking up on the “why are papers still designed to be static?” question: it seems to me that we’re not the only ones with ideas to change papers (and citations etc.) for the better, so what is keeping things so… static? I guess some publishers are pushing the envelope – eLife (and EMBO) spring to mind – but the rest? The “article of the future” from Cell Press was not exactly a huge improvement on what went before. There seems to be no interest in innovation from the major publishers who, as we were reminded recently, are making huge profits for their stakeholders.
One other thing about research assessment (of individuals) is the assumption that the playing field is level and that we can take the numbers at face value. Imagine two scientists. Scholar A has an H-index of 20 and Scholar B has an H-index of 15. They’ve both been active for 20 years and working in the same area. So it seems that Scholar A is superior. However, Scholar B works at a mediocre institute with poor facilities and has struggled for funding whereas Scholar A is at a core-funded research institute. Who is the better researcher?
This is a great discussion with many good suggestions pointing to feasible approaches for improving the system.
I think, however, we all agree that no matter how hard we try, our metrics will remain only approximate, far from deserving the 4 significant figures with which Nature magazine reports its JIF. So I think we should all be aware of the approximate/qualitative nature of the “quantitative” metrics that we use to evaluate scientific research and recognize that improved metrics alone cannot be the only solution; we should improve the metrics but that direction is fundamentally limited. Many expectational papers are rejected even before they are published and can be evaluated by the frequency of their citations. Rather, I think that we can help scientific research more by changing what we emphasize in our scientific culture, by flattening the administrative hierarchy and making individual research units/groups smaller, and by embracing new approaches to sharing and evaluating our results.
Part of being a good quantitative researcher is recognizing the limits of quantisation, and not reporting numbers with precision exceeding our measurement precision.
I meant “exceptional papers” not “expectational papers”
Thank you for the comment. I’d encourage all readers to check those posts out. Actually, I have a post brewing about the ridiculousness of JIFs reported to 3 d.p. This is based on analysis of 2012 Impact Factors that I did a while back. In truth, I can’t face redoing the analysis on the 2013 IFs, so the post has stagnated.
Late to the party, but I suspect that as a PI you may possibly be underestimating the importance of reviews for trainees and for people who are brand new to a particular field; if you already know a field pretty well, I think it’s difficult to be impressed by a review. I agree it’s kind of insane to put them on the same numerical scale as research articles, though. They’re different beasts.