Give, Give, Give Me More, More, More

A recent opinion piece published in eLife bemoaned the way that citations are used to judge academics because we are not even certain of the veracity of this information. The main complaint was that Google Scholar – a service that aggregates citations to articles using a computer program – may be less-than-reliable.

There are three main sources of citation statistics: Scopus, Web of Knowledge/Science and Google Scholar; although other sources are out there. These are commonly used and I checked out how comparable these databases are for articles from our lab.

The ratio of citations is approximately 1:1:1.2 for Scopus:WoK:GS. So Google Scholar is a bit like a footballer: it gives 120%.
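To make the arithmetic behind a ratio like this explicit, here is a minimal Python sketch. The function name `citation_ratio` and the totals in the example are hypothetical, chosen only to illustrate the calculation; they are not the counts from our lab.

```python
def citation_ratio(totals, reference):
    """Scale each database's citation total to the reference database."""
    base = totals[reference]
    if base <= 0:
        raise ValueError("reference total must be positive")
    return {name: round(count / base, 2) for name, count in totals.items()}


# Hypothetical totals for illustration only (not real data).
example_totals = {"Scopus": 500, "WoK": 500, "GS": 600}
ratios = citation_ratio(example_totals, reference="Scopus")
print(ratios)
```

Normalising to one database makes it easy to repeat the comparison year on year and see whether the ratio drifts.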

I first did this comparison in 2012 and again in 2013. The ratio has remained constant, although these are the same articles, and it is a very limited dataset. In the eLife opinion piece, Eve Marder noted an extra ~30% citations for GS (although I calculated it as ~40%, 894/636=1.41). Talking to colleagues, they have also noticed this. It’s clear that there is some inflation with GS, although the degree of inflation may vary by field. So where do these extra citations come from?

  1. Future citations: GS is faster than Scopus and WoK. Articles appear there a few days after they are published, whereas it takes several weeks or months for the same articles to appear in Scopus and WoK.
  2. Other papers: some journals are not in Scopus and WoK. Again, these might be new journals that aren’t yet included at the others, but GS doesn’t discriminate and includes all papers it finds. One of our own papers (an invited review at a nascent OA journal) is not covered by Scopus and WoK*. GS picks up preprints whereas the others do not.
  3. Other stuff: GS picks up patents and PhD theses. While these are not traditional papers, published in traditional journals, they are clearly useful and should be aggregated.
  4. Garbage: GS does pick up some stuff that is not a real publication. One example is a product insert for an antibody, which has a reference section. Another is duplicate publications. It is quite good at spotting these and folding them into a single publication, but some slip through.

OK, Number 4 is worrying, but the other citations that GS detects versus Scopus and WoK are surely a good thing. I agree with the sentiment expressed in the eLife paper that we should be careful about what these numbers mean, but I don’t think we should just disregard citation statistics as suggested.

GS is free, while the others are subscription-based services. It did look for a while like Google was going to ditch Scholar, but a recent interview with the GS team (sorry, I can’t find the link) suggests that they are going to keep it active and possibly develop it further. Checking out your citations is not just an ego-trip; it’s a good way to find out about articles that are related to your own work. GS has a nice feature that sends you an email whenever it detects a citation for your profile. The downside of GS is that its terms of service do not permit scraping and reuse, whereas downloading of subsets of the other databases is allowed.

In summary, I am a fan of Google Scholar. My page is here.


* = I looked into this a bit more and the paper is actually in WoK, but it has no Title and it has 7 citations (versus 12 in GS). It also doesn’t come up in a search for Fiona or for me.



However, I know from GS that this paper was also cited in a paper by the Cancer Genome Atlas Network in Nature. WoK listed this paper as having 0 references and 0 citations(!). Does any of this matter? Well, yes. WoK is a Thomson Reuters product and is used as the basis for their dreaded Impact Factor – which (like it or not) is still widely used for decision making. Also many Universities use WoK information in their hiring and promotions processes.

The post title comes from ‘Give, Give, Give Me More, More, More’ by The Wonder Stuff from the LP ‘Eight Legged Groove Machine’. Finding a post title was difficult this time. I passed on: Pigs (Three Different Ones) and Juxtapozed with U. My iTunes library is lacking songs about citations…

I’m Gonna Crawl

Fans of data visualisation will know the work of Edward Tufte well. His book “The Visual Display of Quantitative Information” is a classic which covers the history and the principles of conveying data in a concise way that is easy to interpret. He is also credited with two different dataviz techniques: sparklines and image quilts. It was these two innovations that came to mind when I was discussing some cell migration results generated in our lab.

Sparklines are small displays of 1D information versus time to highlight the profile (think: stocks and shares).

Image quilts are arrays of images that together quickly provide you with an overview (think: Google Images results).

Analysing cell migration generates ‘tracks’ of many cells as they move around a 2D surface. Tracks are pairs of XY co-ordinates at different time points. We want to understand how these tracks change if we do something to the cells, e.g. knock-down a particular protein. There are many ways to analyse this, such as looking at the speed of migration, the directionality, etc. When we were looking at lots of tracks, all jumbled up, I thought of sparklines and of image quilts and thought the easiest way to compare a control and test group would be to generate something similar.

We start out with many tracks within a field:


It’s difficult to see what is happening here, so it needs to be simplified.

I wrote a couple of procedures in IgorPro that calculated the cumulative distance that each cell had migrated at a given time point (say, the end of the movie). These cumulative distances were then ranked and then the corresponding cells were arrayed in the x-dimension according to how far they migrated. This was a little bit tricky to do, but that’s another story.
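The ranking step can be sketched roughly as follows. This is Python rather than IgorPro and is not the procedure described above; the function names and the representation of a track as a list of (x, y) tuples are assumptions made for illustration.

```python
import math


def cumulative_distance(track):
    """Total path length of one track, given as a list of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(track, track[1:]))


def rank_by_distance(tracks):
    """Return the tracks sorted from shortest to longest path length."""
    return sorted(tracks, key=cumulative_distance)


def array_in_x(tracks, spacing):
    """Shift each ranked track so that it starts at (rank * spacing, 0)."""
    arrayed = []
    for rank, track in enumerate(rank_by_distance(tracks)):
        x0, y0 = track[0]
        arrayed.append([(x - x0 + rank * spacing, y - y0) for x, y in track])
    return arrayed


# Two made-up tracks: one long, one short.
tracks = [[(0, 0), (3, 4), (6, 8)], [(5, 5), (5, 6)]]
arrayed = array_in_x(tracks, spacing=20)
```

Each track is moved so that its start point sits on a common baseline, with the shortest track leftmost.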


This plot shows the tracks with the shortest/slowest to the left and the furthest/fastest to the right. This can then be compared to a test set and differences become apparent. However, we need to look at many tracks and expanding these “sparklines” further is not practical – we want to provide an overview.

Accordingly, I wrote another procedure to array them in an XY array with a given spacing between the start points. This should give an “image quilt” feel.
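A minimal Python sketch of this kind of layout is below. It is not the IgorPro procedure itself; `quilt_layout`, `n_cols` and `spacing` are hypothetical names, and the choice to fill the grid row by row, with rows running downwards, is an assumption.

```python
def quilt_layout(tracks, n_cols, spacing):
    """Move each track so that its start point sits on a regular grid.

    Tracks are lists of (x, y) points. Track number i is placed in
    row i // n_cols and column i % n_cols, with rows running downwards.
    """
    placed = []
    for index, track in enumerate(tracks):
        row, col = divmod(index, n_cols)
        x0, y0 = track[0]
        dx = col * spacing - x0
        dy = -row * spacing - y0
        placed.append([(x + dx, y + dy) for x, y in track])
    return placed


# Three made-up tracks arranged two per row.
tracks = [[(1, 1), (2, 3)], [(0, 0), (1, 0)], [(4, 4), (4, 5)]]
placed = quilt_layout(tracks, n_cols=2, spacing=10)
```

The spacing needs to be larger than the typical extent of a track, otherwise neighbouring tracks overlap.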

I added gridlines to indicate the start position. The result is a nice overview in which differences between groups can be seen at first glance (or not seen if there is no effect!).

This method works well to compare control and test groups that have a similar number of cells. If N is different (say, by more than 10%), we need to take a random sample of tracks and array those to get a feel for what’s happening. Obviously the tracks could be arrayed according to whatever parameter is required, e.g. highest speed, most directional etc.
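The random sampling could be done along these lines. This is a Python sketch under stated assumptions: the function name is hypothetical, and the 10% difference is measured relative to the larger group, which is just one way to read that threshold.

```python
import random


def match_group_sizes(control, test, tolerance=0.10, seed=None):
    """Subsample the larger group if the group sizes differ too much.

    The difference is taken relative to the larger group (an assumption).
    Returns new lists; the inputs are left untouched.
    """
    n_small, n_large = sorted((len(control), len(test)))
    if n_large == 0 or (n_large - n_small) / n_large <= tolerance:
        return list(control), list(test)
    rng = random.Random(seed)

    def subsample(tracks):
        if len(tracks) > n_small:
            return rng.sample(list(tracks), n_small)
        return list(tracks)

    return subsample(control), subsample(test)


# Stand-ins for tracks: 10 control cells versus 4 test cells.
control_ids = list(range(10))
test_ids = list(range(100, 104))
sampled_control, sampled_test = match_group_sizes(control_ids, test_ids, seed=1)
```

Fixing the seed makes the sample reproducible, which helps if the same figure has to be regenerated later.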

One thought is to do a further iteration where the tracks are oriented so that the start and end points are at the same point in X, or oriented so that the tracks have the same starting trajectory. As it is, the mix of trajectories spoils the ease of interpretation.
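One way the first option might work is to move each track to the origin and rotate it so that its end point lies directly above its start. The Python sketch below is only an illustration of that idea, not something we have implemented; the function name and the choice of the +Y direction are assumptions.

```python
import math


def align_net_displacement(track):
    """Rotate a track so that its start and end points share the same X.

    The track (a list of (x, y) points) is first shifted so that it starts
    at the origin, then rotated so that the end point lies on the +Y axis.
    """
    x0, y0 = track[0]
    shifted = [(x - x0, y - y0) for x, y in track]
    end_x, end_y = shifted[-1]
    if end_x == 0 and end_y == 0:
        return shifted  # no net displacement, so no direction to align
    angle = math.pi / 2 - math.atan2(end_y, end_x)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [(x * cos_a - y * sin_a, x * sin_a + y * cos_a) for x, y in shifted]


# A made-up track that moves along +X ends up pointing along +Y.
aligned = align_net_displacement([(1, 1), (2, 1), (4, 1)])
```

Aligning to the starting trajectory instead would use the angle of the first step rather than the angle of the net displacement.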

Obviously, this can be applied to tracks of anything: growing and shrinking microtubules, endosome/lysosome movement etc. etc.

Any suggestions for improvements are welcome, but I think this is a quick and easy way to just eyeball the data to see if there are any differences before calculating any other parameters. I thought I’d put the idea out there – maybe together with the code if there is any interest.

The post title is from I’m Gonna Crawl – Led Zeppelin from their In Through The Out Door LP

All Together Now

In the lab we use IgorPro from Wavemetrics for analysis. Here is a useful procedure to plot all XY pairs in an experiment. I was plotting out some cell tracking data with a colleague and I knew that I had this useful function buried in an experiment somewhere. I eventually found it and thought I’d post it here. I’ll add it to the code section of the website soon. Looking at it, it doesn’t look like it was written by me. A search of IgorExchange didn’t reveal its author, so maybe it was me. Apologies if it wasn’t.

The point is: you have a bunch of XY pairs and you just want to plot all of them in one window to look at them. If they are 2D waves or a small number of 1D waves, this is straightforward. If you have hundreds, you need a function!

An example would be fluorescence recordings versus time (where each time wave is unique to the fluorescence trace) or XY co-ordinates of a particle in space.

To use this procedure, you need an experiment with a logical naming system for 1D waves. Something like X_ctrl1, X_ctrl2, X_ctrl3 etc. and Y_ctrl1, Y_ctrl2, Y_ctrl3 etc. Paste the following into the Procedure Window (command+m).

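Function PlotAllWaves(theYList,theXList)
	String theYList	// semicolon-separated list of Y wave names
	String theXList	// matching list of X wave names, in the same order
	Variable i=0
	String aWaveName = ""
	String bWaveName = ""
	Display	// make a new, empty graph window
	do
		aWaveName = StringFromList(i, theYList)
		bWaveName = StringFromList(i, theXList)
		WAVE/Z aWave = $aWaveName
		WAVE/Z bWave = $bWaveName
		// stop at the end of the list, or if either wave is missing
		if (!WaveExists(aWave) || !WaveExists(bWave))
			break
		endif
		AppendToGraph aWave vs bWave
		i += 1
	while(1)
End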

After compiling you can call the function by typing in the Command Window:

PlotAllWaves(wavelist("y_*", ";", ""),wavelist("x_*", ";", ""))

You’ll need to change this for whatever convention you are using for your wave naming system. You will know how to do this if you have got this far!

This function is very useful for just eyeballing the data after you have imported it. The databrowser shows only one wave at a time, but it is preferable to look at all the waves to find errors, spot outliers or trends etc.

Edit 28/4/15: the logical naming system and the order in which the waves were added to the experiment are crucial for this to work. We’re now using two different versions of this code that either a) check that the waves are compatible or b) concatenate the waves into a 2D wave before plotting. This reduces errors in plotting.
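As an illustration of what a compatibility check might involve, here is a short sketch in Python (not the Igor code we use). It pairs waves by the part of the name after the prefix, rather than by list order, and keeps only pairs with the same number of points. The function name, the dictionary representation of an experiment and that definition of "compatible" are all assumptions.

```python
def pair_waves(waves, y_prefix="Y_", x_prefix="X_"):
    """Return (y_name, x_name) pairs that share a suffix and a length.

    `waves` maps wave names to sequences of values.
    """
    pairs = []
    for y_name in sorted(waves):
        if not y_name.startswith(y_prefix):
            continue
        x_name = x_prefix + y_name[len(y_prefix):]
        if x_name in waves and len(waves[x_name]) == len(waves[y_name]):
            pairs.append((y_name, x_name))
    return pairs


# Made-up experiment: only the ctrl1 waves form a usable pair.
waves = {
    "X_ctrl1": [0, 1, 2],
    "Y_ctrl1": [5, 6, 7],
    "X_ctrl2": [0, 1],
    "Y_ctrl2": [1, 2, 3],
    "Y_ctrl3": [1],
}
pairs = pair_waves(waves)
```

Matching on the name means the result no longer depends on the order in which the waves were added to the experiment.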

The post title is taken from All Together Now – The Beatles from the Yellow Submarine soundtrack.