All This And More

I was looking at the latest issue of Cell and marvelling at how many authors there are on each paper. It’s no secret that the raison d’être of Cell is to publish the “last word” on a topic (although whether it fulfils that objective is debatable). Definitive work needs to be comprehensive, which means lots of techniques and, ergo, lots of authors. This makes it all the more impressive when a dual-author paper turns up in the table of contents for Cell. Anyway, I got to thinking: has it always been the case that Cell papers have lots of authors and, if not, when did that change?

I downloaded the data for all articles published by Cell (and, for comparison, J Cell Biol) from Scopus. The records required a bit of cleaning. For example, SnapShot papers and the odd obituary etc. that had been misclassified as articles needed to be removed; this was quick to do. I then went back through and filtered out ‘articles’ that were fewer than three pages, since I don’t think a genuine paper can be two pages or fewer in length. The data could then be loaded into IgorPro and boxplots generated per year to show how author number varied over time. Reviews that are misclassified as Articles will still be in the dataset, but I figured these would be minimal.
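
For anyone wanting to do something similar, here is a minimal IgorPro sketch of the filtering and per-year summary step. It is not the code used for the figures, and the wave names (authorsW, yearW, pagesW, one point per Scopus record) are purely illustrative; it drops records under three pages and reports the median author count per year.

Function MedianAuthorsPerYear(authorsW, yearW, pagesW)
	Wave authorsW, yearW, pagesW	// one row per article from the Scopus export
	// keep only records that are at least three pages long
	Extract/FREE authorsW, keptAuthors, pagesW[p] >= 3
	Extract/FREE yearW, keptYears, pagesW[p] >= 3
	Variable firstYear = WaveMin(keptYears), lastYear = WaveMax(keptYears)
	Make/O/N=(lastYear - firstYear + 1) medianAuthors = NaN
	SetScale/P x firstYear, 1, medianAuthors	// x-scaling by publication year, ready to plot
	Variable yr
	for(yr = firstYear; yr <= lastYear; yr += 1)
		Extract/FREE keptAuthors, thisYear, keptYears[p] == yr
		if(numpnts(thisYear) > 0)
			medianAuthors[yr - firstYear] = StatsMedian(thisYear)
		endif
	endfor
End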

First off: Yes, there are more authors on average for a Cell paper versus a J Cell Biol paper. What is interesting is that both journals had similar numbers of authors when Cell was born (1974) and they crept up together until the early 2000s, when the number of Cell authors kept increasing, or J Cell Biol flattened off, whichever way you look at it.

I think the overall trend to more authors is because understanding biology has increasingly required multiple approaches and the bar for evidence seems to be getting higher over time. The initial creep to more authors (1974-2000) might be due to a cultural change where people (technicians/students/women) began to get proper credit for their contributions. However, this doesn’t explain the divergence between J Cell Biol and Cell in recent years. One possibility is that Cell takes more non-cell biology papers and that these papers necessarily have more authors. For example, the polar bear genome was published in Cell (29 authors), and this sort of paper would not appear in J Cell Biol. Another possibility is that J Cell Biol has a shorter and stricter revision procedure, which means that multiple rounds of revision (collecting new techniques and new authors along the way) are more limited than they are at Cell. Any other ideas?

I also quickly checked whether more authors means more citations, but found no evidence for such a relationship. For papers published in the years 2000-2004, the median citation number for papers with 1-10 authors was pretty constant for J Cell Biol. For Cell, these data were more noisy. Three-author papers tended to be cited a bit more than two-author papers, but four-author papers were cited less again.

The number of authors on papers from our lab ranges from 2 to 9, with a median of 3.5. This would put an average paper from our lab in the bottom quartile for JCB and in the lower 10% for Cell in 2013. Ironically, our 9-author paper (an outlier) was published in J Cell Biol. Maybe we need to get more authors on our papers before we can start troubling Cell with our manuscripts…


The Post title is taken from ‘All This and More’ by The Wedding Present from their LP George Best.

Blast Off!

This post is about metrics and specifically the H-index. It will probably be the first of several on this topic.

I was re-reading a blog post by Alex Bateman on his affection for the H-index as a tool for evaluating up-and-coming scientists. He describes Jorge Hirsch’s H-index, its limitations and its utility quite nicely, so I won’t reiterate this (although I’ll probably do so in another post). What is under-appreciated is that Hirsch also introduced the m quotient, which is the H-index divided by years since the first publication. It’s the m quotient that I’ll concentrate on here. The TL;DR is: I think that the H-index does have some uses, but evaluating early career scientists is not one of them.

Anyone of an anti-metrics disposition should look away now.
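
For concreteness, here is a minimal IgorPro sketch (my own illustration, not anyone’s published code) of how h, and from it m, can be computed from a wave of per-paper citation counts plus the year of first publication. The function and wave names are made up.

Function HIndexAndM(citesW, firstPubYear, currentYear)
	Wave citesW	// citation counts, one point per paper
	Variable firstPubYear, currentYear
	Duplicate/FREE citesW, sorted
	Sort/R sorted, sorted	// sort citation counts in descending order
	Variable i, h = 0
	for(i = 0; i < numpnts(sorted); i += 1)
		if(sorted[i] >= i + 1)
			h = i + 1	// h papers each have at least h citations
		else
			break
		endif
	endfor
	Variable m = h / max(1, currentYear - firstPubYear)	// m quotient: h per year since first paper
	Print "h =", h, "m =", m
	return h
End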

Alex proposes that scientists can be judged (and hired) by using m as follows:

  • <1.0 = average scientist
  • 1.0-2.0 = above average
  • 2.0-3.0 = excellent
  • >3.0 = stellar

He says “So post-docs with an m-value of greater than three are future science superstars and highly likely to have a stratospheric rise. If you can find one, hire them immediately!”.

From what I have seen, the H-index (and therefore m) is too noisy for early career scientists to be of any use for evaluation. Let’s leave that aside for the moment. What he is saying is that you should definitely hire a post-doc who has published ≥3 papers with ≥3 citations each in their first year, ≥6 papers with ≥6 citations each in their second year, ≥9 papers with ≥9 citations each in their third year…

Do these people even exist? For a candidate with a 3-year PhD and a 3-year postdoc, 6 years since their first publication would mean ≥18 papers with ≥18 citations each! In my field (molecular cell biology), it is unusual for somebody to publish that many papers, let alone accrue citations at that rate*.

This got me thinking: using Alex’s criteria, how many stellar scientists would we miss out on, and would we be more likely to hire the next Jan Hendrik Schön? To check this out I needed to write a quick program to calculate H-index by year (I’ll describe this in a future post). Off the top of my head I thought of a few scientists that I know of, who are successful by many other measures, and plotted their H-index by year. The dotted line shows a constant m of 1, “average” by Alex’s criteria. I’ve taken a guess at when they became a PI. I have anonymised the scholars; the information is public and anyone can calculate this, but it’s not fair to identify people without asking (hopefully they can’t recognise themselves – if they read this!).

This is a small sample taken from people in my field. You can see that it is rare for scientists to have a big m at an early stage in their careers. With the exception of Scholar C, who was just awesome from the get-go, panels appointing any of these scholars would have had trouble divining their future success on the basis of H-index and m alone. Scholar D and Scholar E really saw their careers take off by making big discoveries, and these happened at different stages of their careers. Both of these scholars were “below average” when they were appointed as PIs. The panel would certainly not have used metrics in their evaluation (the databases were not in wide use back then), probably just letters of recommendation and reading the work. Clearly, they could identify the potential in these scientists… or maybe they just got lucky. Who knows?!

There may be other fields where publication at higher rates can lead to a large m, but I would still question the contribution of the scientist to the papers that led to the H-index. Are they first or last author? One problem with the H-index is that the 20th scientist in a list of 40 authors gets the same credit as the first author. Filtering what counts in the list of articles seems sensible, but this would make the values even more noisy for early stage scientists.

 

*In the comments section, somebody points out that if you publish a paper very early then this affects your m value. This is something I sympathise with. My first paper was in 1999 when I was an undergrad. This dents my m value as it was a full three years until my next paper.

The post title is taken from ‘Blast Off!’ by Rivers Cuomo from ‘Songs from the Black Hole’ the unreleased follow-up to Pinkerton.

Falling and Landing

A great quote from a classic paper by J.B.S. Haldane “On Being The Right Size” (1926).

You can drop a mouse down a thousand-yard mine shaft; and, on arriving at the bottom, it gets a slight shock and walks away, provided that the ground is fairly soft. A rat is killed, a man is broken, a horse splashes.

The paper is available here.


The post title is taken from ‘Falling and Landing’ by The Delgados from their LP ‘Domestiques’.

Very Best Years

What was the best year in music?

OK, I have to be upfront and say that I thought the answer to this would be 1991. Why? Just a hunch. Nevermind, Loveless, Spiderland, Laughing Stock… it was a pretty good year. I thought it would be fun to find out if there really was a golden year in music. It turns out that it wasn’t 1991.

There are many ways to look at this question, but I figured that a good place to start was to find which year had the highest density of great LPs. But how do we define a great LP? Music critics are notorious for getting it wrong, so I’m a big fan of rateyourmusic.com (RYM), which democratises the grading process for music by crowdsourcing opinion. It allows people to rate LPs in their collection; these ratings are aggregated via a slightly opaque system and the albums are ranked into charts. I scraped the data for the Top 1000 LPs of All-Time*. Crunching the numbers was straightforward. So what did it show?

Looking at the Top 1000, 1971 and 1972 are the two years with the highest representation. Looking at the Top 500 LPs, 1971 is the year with the most records. Looking at the Top 100, the late 60s feature highly.

To look at this in detail, I plotted rank versus year. This showed that there was a gap in the early 80s where not many Top 1000 LPs were released. This could be seen in the other plots, but it’s clearer on the bubble plot. The cluster of high-ranking LPs released in the 1960s is also obvious.

The plot is colour-coded to show the rank, while the size of the bubbles indicates the rating. Note that rating doesn’t correlate with rank (RYM also factors in the number of ratings and user loyalty to determine this). To take the ranking into account, I calculated the “integrated score” for all albums released in a given year: each album scores 1001 minus its rank, and the sum of these scores for a given year gives that year’s integrated score.
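
As a sketch of that calculation in IgorPro (not the exact code used for the plot; the wave names are illustrative), given a wave of ranks and a wave of release years for the Top 1000:

Function IntegratedScoreByYear(rankW, yearW)
	Wave rankW, yearW	// rank (1-1000) and release year for each LP
	Variable firstYear = WaveMin(yearW), lastYear = WaveMax(yearW)
	Make/O/N=(lastYear - firstYear + 1) integratedScore = 0
	SetScale/P x firstYear, 1, integratedScore	// x-scaling by release year
	Variable i
	for(i = 0; i < numpnts(rankW); i += 1)
		// each album contributes 1001 minus its rank to its release year
		integratedScore[yearW[i] - firstYear] += 1001 - rankW[i]
	endfor
End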

This is shown on a background of scores for each decade. Again, the 1970s rule and 1971 is the peak. The shape of this profile will not surprise music fans. The first bump in the late 50s coincides with rock n roll, influential jazz records and the birth of the LP as a serious format. The 60s see a rapid increase in the density of great albums per year, hitting a peak in 1971. The decline that follows is halted by a spike in 1977: punk. There’s a relative dearth of highly rated LPs in the early 80s and things really tail off in the early 2000s. The lack of highly rated LPs in these later years is probably best explained by few ratings, due to the young age of these LPs. The diversification of music styles, tastes and the way that music is consumed is also likely to play a role. The highest ranked LP on the list is Radiohead’s OK Computer (1997), which was released in a non-peak year. Note that 1991 does not stand out particularly. In fact, in the 1990s, 1994 stands out as the best year for music.

Finally, RYM has a nice classification system for music so I calculated the integrated score for these genres and sub-genres (cowpunk, anyone?). Rock (my definition) is by far the highest scoring and Singer-Songwriter is the highest scoring genre/sub-genre.

So there you have it. 1971 was the best year in music according to this analysis. Now… where’s my copy of Tago Mago.

 

* I did this mid-April. I doubt it’s changed much. This was an exercise to learn how to scrape and I also don’t think I broke the terms of service of RYM. If I did, I’ll take this post down.

The title of this post comes from ‘Very Best Years’ by The Grays from their LP ‘Ro Sham Bo’. It was released in 1994…

Into The Great Wide Open

We have a new paper out! You can read it here.

I thought I would write a post on how this paper came to be and also about our first proper experience with preprinting.

Title of the paper: Non-specificity of Pitstop 2 in clathrin-mediated endocytosis.

In a nutshell: we show that Pitstop 2, a supposedly selective clathrin inhibitor, acts in a non-specific way to inhibit endocytosis.

Authors: Anna Willox, who was a postdoc in the lab from 2008-2012, did the flow cytometry measurements, and Yasmina Sahraoui, who was a summer student in my lab, did the binding experiments. And me.

Background: The description of “pitstops” – small molecules that inhibit clathrin-mediated endocytosis – back in 2011 in Cell was heralded as a major step forward in cell biology. And it really would be a breakthrough if we had ways to selectively switch off clathrin-mediated endocytosis. Lots of nasty things gain entry into cells by hijacking this pathway, including viruses such as HIV, so if we could stop viral entry we could prevent cellular infection. Plus, these reagents would be really handy in the lab for cell biologists.

The rationale for designing the pitstop inhibitors was that they should block the interaction between clathrin and adaptor proteins. Adaptors are the proteins that recognise the membrane and the cargo to be internalised – clathrin itself cannot do this. So if we can stop clathrin from binding adaptors there should be no internalisation – job done! Now, in 2000 or so, we thought that clathrin binds to adaptors via a single site on its N-terminal domain. This information was used in the drug screen that identified the pitstops. The problem is that, since 2000, we have found that there are four sites on the N-terminal domain of clathrin that can each mediate endocytosis. So blocking one of these sites with a drug would do nothing. Despite this, pitstop compounds, which were shown to have a selectivity for one site on the N-terminal domain of clathrin, blocked endocytosis. People in the field scratched their heads at how this is possible.

A damning paper was published in 2012 from Julie Donaldson’s lab showing that pitstops inhibit clathrin-independent endocytosis as well as clathrin-mediated endocytosis. Apparently, the compounds affect the plasma membrane and so all internalisation is inhibited. Many people thought this was the last that we would hear about these compounds. After all, these drugs need to be highly selective to be of any use in the lab, let alone in the clinic.

Our work: we had our own negative results using these compounds sitting on our server, unpublished. Back in February 2011, while the Pitstop paper was under revision, the authors of that study sent some of these compounds to us in the hope that we could use them to study clathrin on the mitotic spindle. The drugs did not affect clathrin binding to the spindle (although they probably should have done) and this prompted us to check whether the compounds were working – they had been shipped all the way from Australia, so maybe something had gone wrong. We tested for inhibition of clathrin-mediated endocytosis and they worked really well.

At the time we were testing the function of each of the four interaction sites on clathrin in endocytosis, so we added Pitstop 2 to our experiments to test for specificity. We found that Pitstop 2 inhibits clathrin-mediated endocytosis even when the site where pitstops are supposed to bind has been mutated! The picture shows that the compound (pink) binds where sequences from adaptors can bind. Mutation of this site doesn’t affect endocytosis, because clathrin can use any of the other three sites. Yet Pitstop 2 blocks endocytosis mediated by this mutant, so it must act elsewhere, non-specifically.

So the compounds were not as specific as claimed, but what could we do with this information? There didn’t seem to be enough to publish and I didn’t want people in the lab working on this, as it would take time and energy away from other projects. Especially when debunking other people’s work is such a thankless task (why this is the case is for another post). The Dutta & Donaldson paper then came out, which was far more extensive than our results, and so we moved on.

What changed?

A few things prompted me to write this work up. Not least, Yasmina had since shown that our mutations were sufficient to prevent AP-2 binding to clathrin. This result filled a hole in our work. These things were:

  1. People continuing to use pitstops in published work, without acknowledging that they may act non-specifically. The turning point was this paper, which was critical of the Dutta & Donaldson work.
  2. People outside of the field using these compounds without realising their drawbacks.
  3. Abcam selling this compound, and the thought of other scientists buying it and using it on the basis of the original paper, made me feel very guilty that we had not published our findings.
  4. It kept getting easier and easier to publish “negative results”. Journals such as Biology Open from Company of Biologists or PLoS ONE and preprint servers (see below) make this very easy.

Finally, it was a twitter conversation with Jim Woodgett that convinced me that, when I had the time, I would write it up – and I added an acknowledgement to him in our paper! So that, together with the launch of bioRxiv, convinced me to get the paper online.

The Preprinting Experience

This paper was our first proper preprint. We had put an accepted version of our eLife paper on bioRxiv before it came out in print at eLife, but that doesn’t really count. For full disclosure, I am an affiliate of bioRxiv.

The preprint went up on 13th February and we submitted it straight to Biology Open the next day. I had to check with the Journal that it was OK to submit a deposited paper. At the time they didn’t have a preprint policy (although I knew that David Stephens had submitted his preprinted paper there and he told me their policy was about to change). Biology Open now accept preprinted papers – you can check which journals do and which ones don’t here.

My idea was that I just wanted to get the information into the public domain as fast as possible. The upshot was that I wasn’t so bothered about getting feedback on the manuscript. For those that don’t know: the idea is that you deposit your paper, get feedback, improve your paper, then submit it for publication. In the end I did get some feedback via email (not on the bioRxiv comments section), and I was able to incorporate those changes into the revised version. I think next time I’ll deposit the paper, wait one week while soliciting comments, and then submit to a journal.

It was viewed quite a few times while the paper was being considered by Biology Open. I spoke to a PI who told me that they had found the paper and stopped using pitstop as a result. I think this means getting the work out there was worth it after all.

Now it is out “properly” in Biology Open and anyone can read it.

Verdict: I was really impressed by Biology Open. The reviewing and editorial work were handled very fast. I guess it helps that the paper was very short, but it was very uncomplicated. I wanted to publish with Biology Open rather than PLoS ONE as the Company of Biologists support cell biology in the UK. Disclaimer: I am on the committee of the British Society of Cell Biology which receives funding from CoB.

Depositing the preprint at bioRxiv was easy and, for this type of paper, it is a no-brainer. I’m still not sure to what extent we will preprint our work in the future. This is uncharted territory that is evolving all the time; we’ll see. I can say that the experience for this paper was 100% positive.

References

Dutta, D., Williamson, C. D., Cole, N. B. and Donaldson, J. G. (2012) Pitstop 2 is a potent inhibitor of clathrin-independent endocytosis. PLoS One 7, e45799.

Lemmon, S. K. and Traub, L. M. (2012) Getting in Touch with the Clathrin Terminal Domain. Traffic 13, 511-9.

Stahlschmidt, W., Robertson, M. J., Robinson, P. J., McCluskey, A. and Haucke, V. (2014) Clathrin terminal domain-ligand interactions regulate sorting of mannose 6-phosphate receptors mediated by AP-1 and GGA adaptors. J Biol Chem 289, 4906-18.

von Kleist, L., Stahlschmidt, W., Bulut, H., Gromova, K., Puchkov, D., Robertson, M. J., MacGregor, K. A., Tomilin, N., Pechstein, A., Chau, N. et al. (2011) Role of the clathrin terminal domain in regulating coated pit dynamics revealed by small molecule inhibition. Cell 146, 471-84.

Willox, A. K., Sahraoui, Y. M. E. and Royle, S. J. (2014) Non-specificity of Pitstop 2 in clathrin-mediated endocytosis. Biol Open, doi: 10.1242/bio.20147955.

Willox, A. K., Sahraoui, Y. M. E. and Royle, S. J. (2014) Non-specificity of Pitstop 2 in clathrin-mediated endocytosis. bioRxiv, doi: 10.1101/002675.

The post title is taken from ‘Into The Great Wide Open’ by Tom Petty and The Heartbreakers from the LP of the same name.

Give, Give, Give Me More, More, More

A recent opinion piece published in eLife bemoaned the way that citations are used to judge academics because we are not even certain of the veracity of this information. The main complaint was that Google Scholar – a service that aggregates citations to articles using a computer program – may be less-than-reliable.

There are three main sources of citation statistics: Scopus, Web of Knowledge/Science and Google Scholar; although other sources are out there. These are commonly used and I checked out how comparable these databases are for articles from our lab.

The ratio of citations is approximately 1:1:1.2 for Scopus:WoK:GS. So Google Scholar is a bit like a footballer: it gives 120%.

I first did this comparison in 2012 and again in 2013. The ratio has remained constant, although these are the same articles, and it is a very limited dataset. In the eLife opinion piece, Eve Marder noted an extra ~30% citations for GS (although I calculated it as ~40%, 894/636=1.41). Talking to colleagues, they have also noticed this. It’s clear that there is some inflation with GS, although the degree of inflation may vary by field. So where do these extra citations come from?

  1. Future citations: GS is faster than Scopus and WoK. Articles appear there a few days after they are published, whereas it takes several weeks or months for the same articles to appear in Scopus and WoK.
  2. Other papers: some journals are not in Scopus and WoK. Again, these might be new journals that aren’t yet included in the others, but GS doesn’t discriminate and includes all papers it finds. One of our own papers (an invited review at a nascent OA journal) is not covered by Scopus and WoK*. GS also picks up preprints, whereas the others do not.
  3. Other stuff: GS picks up patents and PhD theses. While these are not traditional papers published in traditional journals, they are clearly useful and should be aggregated.
  4. Garbage: GS does pick up some stuff that is not a real publication. One example is a product insert for an antibody, which has a reference section. Another is duplicate publications. It is quite good at spotting these and folding them into a single publication, but some slip through.

OK, Number 4 is worrying, but the other citations that GS detects versus Scopus and WoK are surely a good thing. I agree with the sentiment expressed in the eLife paper that we should be careful about what these numbers mean, but I don’t think we should just disregard citation statistics as suggested.

GS is free, while the others are subscription-based services. It did look for a while like Google was going to ditch Scholar, but a recent interview with the GS team (sorry, I can’t find the link) suggests that they are going to keep it active and possibly develop it further. Checking out your citations is not just an ego-trip; it’s a good way to find out about articles that are related to your own work. GS has a nice feature that sends you an email whenever it detects a citation for your profile. The downside of GS is that its terms of service do not permit scraping and reuse, whereas downloading of subsets of the other databases is allowed.

In summary, I am a fan of Google Scholar. My page is here.

 

* = I looked into this a bit more and the paper is actually in WoK: it has no title and it has 7 citations (versus 12 in GS), although it doesn’t come up in a search for Fiona or for me.


 

However, I know from GS that this paper was also cited in a paper by the Cancer Genome Atlas Network in Nature. WoK listed this paper as having 0 references and 0 citations(!). Does any of this matter? Well, yes. WoK is a Thomson Reuters product and is used as the basis for their dreaded Impact Factor – which (like it or not) is still widely used for decision making. Also, many universities use WoK information in their hiring and promotion processes.

The post title comes from ‘Give, Give, Give Me More, More, More’ by The Wonder Stuff from the LP ‘Eight Legged Groove Machine’. Finding a post title was difficult this time. I passed on: Pigs (Three Different Ones) and Juxtapozed with U. My iTunes library is lacking songs about citations…

Some Things Last A Long Time

How long does it take to publish a paper?

The answer is – in our experience, at least – about 9 months.

That’s right, it takes about the same amount of time to have a baby as it does to publish a scientific paper. Discussing how we can make the publication process quicker is for another day. Right now, let’s get into the numbers.

The graphic shows the time taken from submission to publication for papers on which I am an author. I’m missing data for two papers (one from 1999 and one from 2002) and the Biol Open paper is published online but not yet “in print”, but mostly the information is complete. If you want to calculate this for your own papers, my advice would be to keep a spreadsheet of submission and decision dates as you go along… and archive your emails.
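
If you do keep such a spreadsheet, the elapsed times are simple to compute. As a minimal IgorPro sketch (the function name is made up and the dates are hypothetical), date2secs converts a calendar date to seconds, so the difference in days is:

Function DaysBetween(y1, m1, d1, y2, m2, d2)
	Variable y1, m1, d1	// submission date (year, month, day)
	Variable y2, m2, d2	// publication date (year, month, day)
	// date2secs gives seconds since 1904; divide the difference by seconds per day
	return round((date2secs(y2, m2, d2) - date2secs(y1, m1, d1)) / (60 * 60 * 24))
End

For example, DaysBetween(2013, 6, 1, 2014, 3, 1) returns 273 (hypothetical dates).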

In the last analysis, a few people pointed out ways that the graphic could be improved, and I’ve now implemented these changes.

The graphic shows that the journey to publication is in four eras:

  1. Pre-time (before 0 on the x-axis): this is the time from first submission to the first journal. A dark time which involves rejection.
  2. Submission at the final journal (starting at time 0). Again, the orange periods are when the manuscript is with the journal and the green periods are when it is with us. Needless to say, this green time is mainly spent doing experimental work (compare the green periods for reviews and for papers).
  3. Acceptance! This is where the orange bar stops. The manuscript is then readied for publication (blank area).
  4. Published online. A purple period that ends with final publication in print.

Note that: i) the delays are more-or-less negated by preprinting, provided deposition is before the first submission (grey line, for the Biol Open paper), ii) these delay diagrams do not take into account the original drafting/rewriting cycle before the first submission – nor the time taken to do the work!

So… how long does it take to publish a paper?

In the top right graph: the median time from first submission to being published online is 250 days. This is shown by the blue bar. If we throw in the average time it takes to go from online to print (15 days) this gives 265 days. The average time for human gestation is 266 days. So it takes about the same amount of time to have a baby as it does to publish a paper! By contrast, reviews take only 121 days, equivalent to four lunar cycles (118 days).

My 2005 paper at Nature holds the record for the most protracted publication: 399 days from submission to publication. The fastest publication is the most recent: our Biol Open paper was online 49 days after submission (it was also online 1 day before submission, as a preprint).

In the bottom right graph: I added together the total time each paper was either with the journal, or with us, and plotted the average. The time from acceptance-to-publication online is shown stacked onto the “time with journal” column. You can see from this graphic that the lion’s share of the delay comes from revisions that we must do in order for a paper to be published. Multiple revisions and submissions also push these numbers up compared to the totals for reviews.

How representative are these numbers?

This is a small dataset at many different journals and so it is difficult to conclude much. With this analysis, I was hoping to identify ‘slow journals’ that we should avoid and also to think about our publication strategy (as much as a crap shoot can have a strategy). The whole process is stochastic and I don’t see any reason to change the way that we navigate the system. Having said this, I can’t see us doing any more methods/book chapters, as they are just so slow.

Just over half of our papers have some “pre-time”, i.e. they got rejected from at least one other journal before finding a home. A colleague of mine likes to say:

“if your paper is accepted at the first journal you send it to, you sent it to the wrong place”

One thing for sure is that publication takes a long time. And I don’t think our experience is uncommon. The pace of scientific publishing has been described as glacial by Leslie Vosshall and I don’t disagree with this. I think the 9 months figure is probably representative for most areas of biology. I know that other scientists in my field, who have more tenacity for rejections and for slugging it out at high impact journals, have much longer times from 1st submission to acceptance. In my opinion, wasting even more time chasing publication is crazy, counter-productive and demotivating for the people in the lab.

The irony in all this is that, even though we are working at the absolute bleeding edge of science with all of this technology at our disposal, our methods for reporting science are badly out of date. And with that I’ll push the “publish” button and this will be online…

The title of this post comes from ‘Some Things Last A Long Time’ by Daniel Johnston from his LP ‘1990’.

I’m Gonna Crawl

Fans of data visualisation will know the work of Edward Tufte well. His book “The Visual Display of Quantitative Information” is a classic which covers the history and the principles of conveying data in a concise way that is easy to interpret. He is also credited with two different dataviz techniques: sparklines and image quilts. It was these two innovations that came to mind when I was discussing some cell migration results generated in our lab.

Sparklines are small displays of 1D information versus time to highlight the profile (think: stocks and shares).

Image quilts are arrays of images that together quickly provide you with an overview (think: Google Images results).

Analysing cell migration generates ‘tracks’ of many cells as they move around a 2D surface. Tracks are pairs of XY co-ordinates at different time points. We want to understand how these tracks change if we do something to the cells, e.g. knock down a particular protein. There are many ways to analyse this, such as looking at the speed of migration, directionality, etc. When we were looking at lots of tracks, all jumbled up, I thought of sparklines and of image quilts, and figured the easiest way to compare a control and a test group would be to generate something similar.

We start out with many tracks within a field:

 

It’s difficult to see what is happening here, so it needs to be simplified.

I wrote a couple of procedures in IgorPro that calculated the cumulative distance that each cell had migrated at a given time point (say, the end of the movie). These cumulative distances were then ranked and the corresponding cells arrayed in the x-dimension according to how far they had migrated. This was a little bit tricky to do, but that’s another story.
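
The cumulative distance part at least is straightforward. Here is a minimal sketch of that step (not the actual lab code; xW and yW stand for the co-ordinate waves of a single track):

Function CumulativeDistance(xW, yW)
	Wave xW, yW	// XY co-ordinates of one track over time
	Variable i, total = 0
	for(i = 1; i < numpnts(xW); i += 1)
		// add the straight-line distance between successive time points
		total += sqrt((xW[i] - xW[i-1])^2 + (yW[i] - yW[i-1])^2)
	endfor
	return total
End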

 

This plot shows the tracks with the shortest/slowest to the left and the furthest/fastest to the right. This can then be compared to a test set and differences become apparent. However, we need to look at many tracks and expanding these “sparklines” further is not practical – we want to provide an overview.

Accordingly, I wrote another procedure to array them in an XY array with a given spacing between the start points. This should give an “image quilt” feel.
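
As a sketch of the arraying step (illustrative only; the function name, wave-list inputs, grid spacing and column number are assumptions, not the actual procedure): each track is duplicated and offset so that its start point sits on a regular grid.

Function QuiltTracks(theXList, theYList, spacing, nCols)
	String theXList, theYList	// semicolon-separated lists of X and Y track waves
	Variable spacing, nCols	// grid spacing and number of columns in the quilt
	Variable i, n = ItemsInList(theYList)
	Display
	for(i = 0; i < n; i += 1)
		Wave xW = $StringFromList(i, theXList)
		Wave yW = $StringFromList(i, theYList)
		Duplicate/O xW, $(NameOfWave(xW) + "_q")
		Wave xQ = $(NameOfWave(xW) + "_q")
		Duplicate/O yW, $(NameOfWave(yW) + "_q")
		Wave yQ = $(NameOfWave(yW) + "_q")
		// shift the track so that its first point lands on a grid position
		xQ -= xW[0] - mod(i, nCols) * spacing
		yQ -= yW[0] - floor(i / nCols) * spacing
		AppendToGraph yQ vs xQ
	endfor
End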

I added gridlines to indicate the start position. The result is a nice overview in which differences between groups can be seen at first glance (or not seen, if there is no effect!).

This method works well to compare control and test groups that have a similar number of cells. If N is different (by, say, more than 10%), we need to take a random sample of tracks and array those to get a feel for what’s happening. Obviously the tracks could be arrayed according to whatever parameter is required, e.g. highest speed, most directional, etc.

One thought is to do a further iteration where the tracks are oriented so that the start and end points are at the same point in X, or oriented so that the tracks have the same starting trajectory. As it is, the mix of trajectories spoils the ease of interpretation.

Obviously, this can be applied to tracks of anything: growing and shrinking microtubules, endosome/lysosome movement etc. etc.

Any suggestions for improvements are welcome, but I think this is a quick and easy way to just eyeball the data to see if there are any differences before calculating any other parameters. I thought I’d put the idea out there – maybe together with the code if there is any interest.

The post title is from ‘I’m Gonna Crawl’ by Led Zeppelin from their LP ‘In Through The Out Door’.

My Blank Pages

Books about the MRC Laboratory of Molecular Biology are plentiful. If you haven’t read any, the best place to start is with the books written by some of the Nobelists themselves: “I Wish I’d Made You Angry Earlier” by Perutz and “My Life in Science” by Brenner. Also, “Sequences, Sequences, Sequences” by Sanger, “What Mad Pursuit” by Crick and even Watson’s “The Double Helix” cover ‘how it was done’ and ‘what the place is like’. After that are the biographies of the Nobelists and their associates. Then comes the next layer: the comprehensive but rather dry “Designs for Life: Molecular Biology after World War II” by de Chadarevian and, hell, even “The Eighth Day of Creation” by Judson, which is substantially about the LMB, since so many major discoveries in Molecular Biology happened there.

If your appetite is not sated after wading through all of those, then there are the books for the insiders.

John Finch wrote a book, “A Nobel Fellow on Every Floor”, which was enjoyable if rather selective about who and what was included. The latest book from the LMB Press is a collection of essays entitled “Memories and Consequences: Visiting Scientists at the MRC Laboratory of Molecular Biology, Cambridge”. It was edited by Hugh Huxley and was made available last summer (around the time of his death).
You can get it here

 


The premise of Memories and Consequences is that there were a large number of postdoctoral fellows, mainly from the USA, who spent time at the LMB (in the 60s, mainly) and then went away and had hugely successful scientific careers. At one point in the book, Tom Steitz writes that, of his friends during this period, 40% are now NAS members! The essays cover the time of these visitors in England and how it shaped their subsequent careers.

This is definitely a book to dip in and out of. The experiences are actually pretty repetitive: yes, we drive on the other side of the road; Cambridge is a very stuffy place; and Max Perutz liked to be called Max. This repetition is amplified if the chapters are read one after the other. Overall, however, the essays are nice reminiscences of a booming time in Molecular Biology and many capture the magic of working at the LMB during this period. Brenner and Crick come to life and even Sir Lawrence Bragg looms large in many chapters, filling the authors with awe.

When I first downloaded the book, I read the chapters by those with whose work I am most familiar. I didn’t even know that Dick McIntosh had spent not one but two sabbaticals at the LMB. Tom Pollard, Harvey Lodish etc. followed. I then read the other chapters when I had more time.

The best chapters were those by Harry Noller and by Peter Moore, who gave the right amount (for my taste) of personal insight into their stays at the LMB. I would recommend that the reader skips the chapter by William Dove and Alexandra Shevlovsky, who tried to be a bit clever and didn’t quite pull it off. Sid Altman’s chapter has previously been published and I actually witnessed him read this out (more-or-less) verbatim at the DNA50+1 celebrations – which was far more enjoyable than it sounds.

In short, I enjoyed the book and it’s worth reading some of the chapters if you have a leaning towards the history of science, but there are plenty of other books (listed above) where you should start if you want to find out what life is like inside the Nobel Prize Factory.

I’ll leave you with three quotes that I enjoyed immensely:

“I remember seeing copies of the journal Cell, where we all yearned to publish (though, I noticed, not the really great scientists, like John Sulston or Sydney Brenner). I would shudder and turn away; Cell was for other scientists, not for me.”
Cynthia Kenyon

“Like many others who worked at the LMB in that era, I still think of its modus operandi as exemplifying the blueprint that all scientific research establishments should aspire to emulate. Pack the very best scientists you can find into a building, so densely that they cannot avoid talking to each other, and encourage them to interact in every other way you can. A canteen or dining room might be a good idea. (The facility itself need not be luxurious, and indeed, it is probably better if it is not.) Give those scientists ample staff support, and all the money they need to get on with the job. Stir well, and then be patient because good science takes time. My subsequent career has taught me that this recipe is much harder to execute than it is to describe. I still wonder how the MRC managed to do it so well for so long.”
Peter Moore

“I learned that protein chemistry didn’t need me, that King’s College High Table was for tougher folk than I, and that Sydney talked but Francis conversed.”
Frank Stahl

A comprehensive guide to LMB books is available here.

Don’t worry, book reviews will be a very infrequent feature as I hardly have any time to read books these days!

The post title is from My Blank Pages – Velvet Crush from their LP Teenage Symphonies to God. Presumably a play on the Dylan/Byrds song My Back Pages.

All Together Now

In the lab we use IgorPro from Wavemetrics for analysis. Here is a useful procedure to plot all XY pairs in an experiment. I was plotting out some cell tracking data with a colleague and I knew that I had this useful function buried in an experiment somewhere. I eventually found it and thought I’d post it here (I’ll add it to the code section of the website soon). Looking at it, it doesn’t look like it was written by me. A search of IgorExchange didn’t reveal its author, so maybe it was me after all. Apologies if it wasn’t.

The point is: you have a bunch of XY pairs and you just want to plot all of them in one window to look at them. If they are 2D waves or a small number of 1D waves, this is straightforward. If you have hundreds, you need a function!

An example would be fluorescence recordings versus time (where each time wave is unique to the fluorescence trace) or XY co-ordinates of a particle in space.

To use this procedure, you need an experiment with a logical naming system for the 1D waves: something like X_ctrl1, X_ctrl2, X_ctrl3 etc. and Y_ctrl1, Y_ctrl2, Y_ctrl3 etc. Paste the following into the Procedure Window (command+m).


Function PlotAllWaves(theYList, theXList)
	String theYList	// semicolon-separated list of Y wave names
	String theXList	// semicolon-separated list of X wave names, in matching order
	Display
	Variable i = 0
	String aWaveName = ""
	String bWaveName = ""
	do
		aWaveName = StringFromList(i, theYList)
		bWaveName = StringFromList(i, theXList)
		WAVE/Z aWave = $aWaveName
		WAVE/Z bWave = $bWaveName
		if (!WaveExists(aWave) || !WaveExists(bWave))
			break	// stop when either list runs out
		endif
		AppendToGraph aWave vs bWave	// add this XY pair to the same graph
		i += 1
	while(1)
End

After compiling you can call the function by typing in the Command Window:


PlotAllWaves(wavelist("x_*", ";", ""),wavelist("y_*", ";", ""))

You’ll need to change this for whatever convention you are using for your wave naming system. You will know how to do this if you have got this far!

This function is very useful for just eyeballing the data after you have imported it. The databrowser shows only one wave at a time, but it is preferable to look at all the waves to find errors, spot outliers or trends etc.

Edit 28/4/15: the logical naming system and the order in which the waves were added to the experiment are crucial for this to work. We’re now using two different versions of this code that either a) check that the waves are compatible or b) concatenate the waves into a 2D wave before plotting. This reduces errors in plotting.
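
For reference, here is a minimal sketch of the concatenation approach (version b above). It is illustrative rather than our exact code; Concatenate requires all the 1D waves to be the same length, which doubles as the compatibility check.

Function PlotAllWavesConcat(theYList, theXList)
	String theYList, theXList	// semicolon-separated lists of Y and X wave names
	// join the 1D waves column-wise into 2D waves; this fails if the lengths differ
	Concatenate/O theYList, allY
	Concatenate/O theXList, allX
	Variable i
	Display
	for(i = 0; i < DimSize(allY, 1); i += 1)
		AppendToGraph allY[][i] vs allX[][i]	// plot each column as one XY pair
	endfor
End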

The post title is taken from All Together Now – The Beatles from the Yellow Submarine soundtrack.