## Sticky End

We have a new paper out! You can access it here.

The work was mainly done by Cristina Gutiérrez Caballero, a post-doc in the lab. We had some help from Selena Burgess and Richard Bayliss at the University of Leicester, with whom we have an ongoing collaboration.

The paper in a nutshell

We found that TACC3 binds the plus-ends of microtubules via an interaction with ch-TOG. So TACC3 is a +TIP.

What is a +TIP?

This is a term used to describe proteins that bind to the plus-ends of microtubules. Microtubules are a major component of the cell’s cytoskeleton. They are polymers of alpha/beta-tubulin that grow and shrink, a feature known as dynamic instability. A microtubule has polarity, the fast growing end is known as the plus-end, and the slower growing end is referred to as the minus-end. There are many proteins that bind to the plus-end and these are termed +TIPs.

OK, so what are TACC3 and ch-TOG?

They are two proteins found on the mitotic spindle. TACC3 is an acronym for transforming acidic coiled-coil protein 3, and ch-TOG stands for colonic hepatic tumour overexpressed gene. As you can tell from the names they were discovered due to their altered expression in certain human cancers. TACC3 is a well-known substrate for Aurora A kinase, which is an enzyme that is often amplified in cancer. The ch-TOG protein is thought to be a microtubule polymerase, i.e. an enzyme that helps microtubules grow. In the paper, we describe how TACC3 and ch-TOG stick together at the microtubule end. TACC3 and ch-TOG are at the very end of the microtubule, they move ahead of other +TIPs like “end-binding proteins”, e.g. EB3.

What is the function of TACC3 as a +TIP?

We think that TACC3 is piggybacking on ch-TOG while it is acting as a polymerase, but any biological function or consequence of this piggybacking was difficult to detect. We couldn’t see any clear effect on microtubule dynamics when we removed or overexpressed TACC3. We did find that loss of TACC3 affects how cells migrate, but this is not likely to be due to a change in microtubule dynamics.

I thought TACC3 and ch-TOG were centrosomal proteins…

In the paper we look again at this and find that there are different pools of TACC3, ch-TOG and clathrin (alone and in combination) and describe how they reside in different places in the cell. Although ch-TOG is clearly at centrosomes, we don’t find TACC3 at centrosomes, although it is on microtubules that cluster near the centrosomes at the spindle pole. TACC3 is often described as a centrosomal protein in lots of other papers, but this is quite misleading.

What else?

We were on the cover – whatever that means in the digital age! We imaged a cell expressing tagged EB3 proteins, EB3 is another +TIP. We coloured consecutive frames different colours and the result looked pretty striking. Biology Open picked it as their cover, which we were really pleased about. Our paper is AOP at the moment and so hopefully they won’t change their mind by the time it appears in the next issue.

Preprinting

This is the second paper that we have deposited as a preprint at bioRxiv (not counting a third paper that we preprinted after it was accepted). I was keen to preprint this particular paper because we became aware that two other groups had similar results following a meeting last summer. Strangely, a week or so after preprinting and submitting to a journal, a paper from a completely different group appeared with a very similar finding! We’d been “scooped”. They had found that the Xenopus homologue of TACC3 was a +TIP in retinal neuronal cultures. The other group had clearly beaten us to it, having submitted their paper some time before our preprint. The reviewers of our paper complained that our data was no longer novel and our paper was rejected. This was annoying because there were lots of novel findings in our paper that weren’t in theirs (and vice versa). The reviewers did make some other constructive suggestions that we incorporated into the manuscript. We updated our preprint and then submitted to Biology Open. One advantage of the preprinting process is that the changes we made can be seen by all. Biology Open were great and took a decision based on our comments from the other journal and the changes we had made in response to them. Their decision to provisionally accept the paper was made in four days. Like our last experience publishing in Biology Open, it was very positive.

References

Gutiérrez-Caballero, C., Burgess, S.G., Bayliss, R. & Royle, S.J. (2015) TACC3-ch-TOG track the growing tips of microtubules independently of clathrin and Aurora-A phosphorylation. Biol. Open doi:10.1242/​bio.201410843.

Nwagbara, B. U., Faris, A. E., Bearce, E. A., Erdogan, B., Ebbert, P. T., Evans, M. F., Rutherford, E. L., Enzenbacher, T. B. and Lowery, L. A. (2014) TACC3 is a microtubule plus end-tracking protein that promotes axon elongation and also regulates microtubule plus end dynamics in multiple embryonic cell types. Mol. Biol. Cell 25, 3350-3362.

The post title is taken from the last track on The Orb’s U.F.Orb album.

## Tips from the blog III – violin plots

Having recently got my head around violin plots, I thought I would explain what they are and why you might want to use them.

There are several options when it comes to plotting summary data. I list them here in order of granularity, before describing violin plots and how to plot them in some detail.

Bar chart

This is the mainstay of most papers in my field. Typically, a bar representing the mean value that’s been measured is shown with an error bar which shows either the standard error of the mean, the standard deviation, or more rarely a confidence interval. The two data series plotted in all cases is the waiting time for Old Faithful eruptions (waiting), a classic dataset from R. I have added some noise to waiting (waiting_mod) for comparison. I think it’s fair to say that most people feel that the bar chart has probably had its day and that we should be presenting our data in more informative ways*.

Pros: compact, easy to tell differences between groups

Cons: hides the underlying distribution, obscures the n number

Box plot

The box plot – like many things in statistics – was introduced by Tukey. It’s sometimes known as a Tukey plot, or a box-and-whiskers plot. The idea was to give an impression of the underlying distribution without showing a histogram (see below). Histograms are great, but when you need to compare many distributions they do not overlay well and take up a lot of space to show them side-by-side. In the simplest form, the “box” is the interquartile range (IQR, 25th and 75th percentiles) with a line to show the median. The whiskers show the 10th and 90th percentiles. There are many variations on this theme: outliers can be shown or not, the whiskers may show the limits of the dataset (or something else), the boxes can be notched or their width may represent the sample size…

Pros: compact, easy to tell differences between groups, shows normality/skewness

Cons: hides multimodal data, sometimes obscures the n number, many variations

Histogram

A histogram is a method of showing the distribution of a dataset and was introduced by Pearson. The number of observations within a bin are counted and plotted. The ‘bars’ sit next to each other, because the variable being measured is continuous. The variable being measured is on the x-axis, rather than the category (as in the other plots).

Often the area of all the bars is normalised to 1 in order to assess the distributions without being confused by differences in sample size. As you can see here, “waiting” is bimodal. This was hidden in the bar chart and in the bot plot.

Related to histograms are other display types such as stemplots or stem-and-leaf plots.

Pros: shows multimodal data, shows normality/skewness clearly

Cons: not compact, difficult to overlay, bin size and position can be misleading

Scatter dot plot

It’s often said that if there are less than 10 data points, then best practice is to simply show the points. Typically the plot is shown together with a bar to show the mean (or median) and maybe with error bars showing s.e.m., s.d., IQR. There are a couple of methods of plotting the points, because they need to be scattered in x value in order to be visualised. Adding random noise is one approach, but this looks a bit messy (top). A symmetrical scatter can be introduced by binning (middle) and a further iteration is to bin the y values rather than showing their true location (bottom). There’s a further iteration which constrains the category width and overlays multiple points, but again the density becomes difficult to see.

These plots still look quite fussy, the binned version is the clearest but then we are losing the exact locations of the points, which seems counterintuitive. Another alternative to scattering the dots is to show a rug plot (see below) where there is no scatter.

Pros: shows all the observations

Cons: can be difficult to assess the distribution

Violin plot

This type of plot was introduced in the software package NCSS in 1997 and described in this paper: Hintze & Nelson (1998) The American Statistician 52(2):181-4 [PDF]. As the title says, violin plots are a synergism between box plot and density trace. A thin box plot is shown together with a symmetrical kernel density estimate (KDE, see explanation below). The point is to be able to quickly assess the distribution. You can see that the bimodality of waiting in the plot, but there’s no complication of lots of points just a smooth curve to see the data.

Pros: shows multimodal data, shows normality/skewness unambiguously

Cons: hides n, not familiar to many readers.

* Why is the bar chart so bad and why should I show my data another way?

The best demonstration of why the bar chart is bad is Anscombe’s Quartet (the figure to the right is taken from the Wikipedia page). These four datasets are completely different, yet they all have the same summary statistics. The point is, you would never know unless you plotted the data. A bar chart would look identical for all four datasets.

Making Violin Plots in IgorPro

I wanted to make Violin Plots in IgorPro, since we use Igor for absolutely everything in the lab. I wrote some code to do this and I might make some improvements to it in the future – if I find the time! This was an interesting exercise, because it meant forcing myself to understand how smoothing is done. What follows below is an aide memoire, but you may find it useful.

What is a kernel density estimate?

A KDE is a non-parametric method to estimate a probability density function of a variable. A histogram can be thought of as a simplistic non-parametric density estimate. Here, a rectangle is used to represent each observation and it gets bigger the more observations are made.

What’s wrong with using a histogram as a KDE?

The following examples are taken from here (which in turn are taken from the book by Bowman and Azzalini described below). A histogram is simplistic. We lose the location of each datapoint because of binning. Histograms are not smooth and the estimate is very sensitive to the size of the bins and also the starting location of the first bin. The histograms to the right show the same data points (in the rug plot).

Using the same bin size, they result in very different distributions depending on where the first bin starts. My first instinct to generate a KDE was to simply smooth a histogram, but this is actually quite inaccurate as it comes from a lossy source. Instead we need to generate a real KDE.

How do I make a KDE?

To do this we place a kernel (a Gaussian is commonly used) at each data point. The rationale behind this is that each observation can be thought of as being representative of a greater number of observations. It sounds a bit bizarre to assume normality to estimate a density non-parametrically, but it works. We can sum all of the kernels to give a smoothed distribution: the KDE. Easy? Well, yes as long as you know how wide to make the kernels. To do this we need to find the bandwidth, h (also called the smoothing parameter).

It turns out that this is not completely straightforward. The answer is summarised in this book: Bowman & Azzalini (1997) Applied Smoothing Techniques for Data Analysis. In the original paper on violin plots, they actually do not have a good solution for selecting h for drawing the violins, and they suggest trying several different values for h. They recommend starting at ~15% of the data range as a good starting point. Obviously if you are writing some code, the process of selecting h needs to be automatic.

Optimising h is necessary because if h is too large, the estimate with be oversmoothed and features will be lost. If is too small, then it will be undersmoothed and bumpy. The examples to the right (again, taken from Bowman & Azzalini, via this page) show examples of undersmoothed, oversmoothed and optimal smoothing.

An optimal solution to find h is

$$h = \left(\frac{4}{3n}\right)^{\frac{1}{5}}\sigma$$

This is termed Silverman’s rule-of-thumb. If smoothing is needed in more than one dimension, the multidimensional version is

$$h = \left\{\frac{4}{\left(p+2\right)n}\right\}^{\frac{1}{\left(p+4\right)}}\sigma$$

You might need multidimensional smoothing to contextualise more than one parameter being measured. The waiting data used above describes the time to wait until the next eruption from Old Faithful. The duration of the eruption is measured, and also the wait to the next eruption can be extracted, giving three parameters. These can give a 3D density estimate as shown here in the example.

The Bowman & Azzalini recommend that, if the distribution is long-tailed, using the median absolute deviation estimator is robust for $$\sigma$$.

$$\tilde\sigma=median\left\{|y_i-\tilde\mu|\right\}/0.6745$$

where $$\tilde\mu$$ is the median of the sample. All of this is something you don’t need to worry about if you use R to plot violins, the implementation in there is rock solid having been written in S plus and then ported to R years ago. You can even pick how the h selection is done from sm.density, or even modify the optimal h directly using hmult.

To get this working in IgorPro, I used some code for 1D KDE that was already on IgorExchange. It needed a bit of modification because it used FastGaussTransform to sum the kernels as a shortcut. It’s a very fast method, but initially gave an estimate that seemed to be undersmoothed. I spent a while altering the formula for h, hence the detail above. To cut a long story short, FastGaussTransform uses Taylor expansion of the Gauss transform and it just needed more terms to do this accurately. This is set with the /TET flag. Note also, that in Igor the width of a Gauss is sigma*2^1/2.

OK, so how do I make a Violin for plotting?

I used the draw tools to do this and placed the violins behind an existing box plot. This is necessary to be able to colour the violins (apparently transparency is coming to Igor in IP7). The other half of the violin needs to be calculated and then joined by the DrawPoly command. If the violins are trimmed, i.e. cut at the limits of the dataset, then this required an extra point to be added. Without trimming, this step is not required. The only other issue is how wide the violins are plotted. In R, the violins are all normalised so that information about n is lost. In the current implementation, box width is 0.1 and the violins are normalised to the area under the curve*(0.1/2). So, again information on n is lost.

Future improvements

Ideas for developments of the Violin Plot method in IgorPro

• incorporate it into the ipf for making boxplots so that it is integrated as an option to ‘calculate percentiles’
• find a better solution for setting the width of the violin
• add other bandwidth options, as in R
• add more options for colouring the violins

What do you think? Did I miss something? Let me know in the comments.

References

Bowman, A.W. & Azzalini, A. (1997) Applied Smoothing Techniques for Data Analysis : The Kernel Approach with S-Plus Illustrations: The Kernel Approach with S-Plus Illustrations. Oxford University Press.

Hintze, J.L. & Nelson, R.D. (1998) Violin plots: A Box Plot-Density Trace Synergism. The American Statistician, 52:181-4.

## Joining A Fanclub

When I started this blog, my plan was to write about interesting papers or at least blog about the ones from my lab. This post is a bit of both.

I was recently asked to write a “Journal Club” piece for Nature Reviews Molecular Cell Biology, which is now available online. It’s paywalled unfortunately. It’s also very short, due to the format. For these reasons, I thought I’d expand a bit on the papers I highlighted.

I picked two papers from Dick McIntosh’s group, published in J Cell Biol in the early 1990s as my subject. The two papers are McDonald et al. 1992 and Mastronarde et al. 1993.

Almost everything we know about the microanatomy of mitotic spindles comes from classical electron microscopy (EM) studies. How many microtubules are there in a kinetochore fibre? How do they contact the kinetochore? These questions have been addressed by EM. McIntosh’s group in Boulder, Colorado have published so many classic papers in this area, but there are many more coming from Conly Rieder, Alexey Khodjakov, Bruce McEwen and many others. Even with the advances in light microscopy which have improved spatial resolution (resulting in a Nobel Prize last year), EM is the only way to see individual microtubules within a complex subcellular structure like the mitotic spindle. The title of the piece, Super-duper resolution imaging of mitotic microtubules, is a bit of a dig at the fact that EM still exceeds the resolution available from super-resolution light microscopy. It’s not the first time that this gag has been used, but I thought it suited the piece quite well.

There are several reasons to highlight these papers over other electron microscopy studies of mitotic spindles.

It was the first time that 3D models of microtubules in mitotic spindles were built from electron micrographs of serial sections. This allowed spatial statistical methods to be applied to understand microtubule spacing and clustering. The software that was developed by David Mastronarde to do this was later packaged into IMOD. This is a great software suite that is actively maintained, free to download and is essential for doing electron microscopy. Taking on the same analysis today would be a lot faster, but still somewhat limited by cutting sections and imaging to get the resolution required to trace individual microtubules.

The paper actually showed that some of the microtubules in kinetochore fibres travel all the way from the pole to the kinetochore, and that interpolar microtubules invade the bundle occasionally. This was an open question at the time and was really only definitively answered thanks to the ability to digitise and trace individual microtubules using computational methods.

The final thing I like about these papers is that it’s possible to reproduce the analysis. The methods sections are wonderfully detailed and of course the software is available to do similar work. This is in contrast to most papers nowadays, where it is difficult to understand how the work has been done in the first place, let alone to try and reproduce it in your own lab.

David Mastronarde and Dick McIntosh kindly commented on the piece that I wrote and also Faye Nixon in my lab made some helpful suggestions. There’s no acknowledgement section, so I’ll thank them all here.

References

McDonald, K. L., O’Toole, E. T., Mastronarde, D. N. & McIntosh, J. R. (1992) Kinetochore microtubules in PTK cells. J. Cell Biol. 118, 369—383

Mastronarde, D. N., McDonald, K. L., Ding, R. & McIntosh, J. R. (1993) Interpolar spindle microtubules in PTK cells. J. Cell Biol. 123, 1475—1489

Royle, S.J. (2015) Super-duper resolution imaging of mitotic microtubules. Nat. Rev. Mol. Cell. Biol. doi:10.1038/nrm3937 Published online 05 January 2015

The post title is taken from “Joining a Fanclub” by Jellyfish from their classic second and final LP “Spilt Milk”.