Start Me Up: Endocytosis on demand

We have a new paper out. The title is New tools for ‘hot-wiring’ clathrin-mediated endocytosis with temporal and spatial precision. You can read it here.

Cells have a plasma membrane which is the barrier between the cell’s interior and the outside world. In order to import material from outside, cells have a special process called endocytosis. During endocytosis, cells form a tiny bubble of plasma membrane and pull it inside – taking with it a little pocket of the outside world. This process is very important to the cell. For example, it is one way that cells import nutrients to live. It also controls cell movement, growth, and how cells talk to one another. Because it is so important, cell biologists have studied how endocytosis works for decades.

Studying endocytosis is tricky. Like naughty children, cells simply do not do what they are told. There is no way to make a cell in the lab “do endocytosis”. It does it all the time, but we don’t know when or where on the cell surface a vesicle will be made. Not only that, but when a vesicle is made, we don’t really know what cargo it contains. It would be helpful to cell biologists if we could bring cells under control. This paper shows a way to do this. We demonstrate that clathrin-mediated endocytosis can be triggered, so that we can make it happen on-demand.

Endocytosis on-demand

Using a chemical which diffuses into the cell, we can trigger endocytosis all over the cell. The movie on the right shows vesicles (bright white spots) forming after we add the chemical (at 0:00). The way that we designed the system means that the vesicles that form contain only one type of cargo. This is exciting because it means that we can now use this cargo to deliver things into cells. So, we can trigger endocytosis on-demand and we can control the cargo, but we still cannot control where on the plasma membrane this happens.

We solved this problem by engineering a light-sensitive version of our system. With this new version we can use blue light to trigger endocytosis. Whereas the chemical diffused everywhere, the light can be focussed on a narrow region of the cell and endocytosis can be triggered only in that region. This means we control where, as well as when, a vesicle will form.

What does hot-wiring mean?

It is possible to start a car without a key by “hot-wiring” it. This happens in the movies, when the bad guy breaks into a car and just twists some wires together to start the car and make a getaway. To trigger endocytosis we used the cell’s own proteins, but we modified them. We chopped out all the unnecessary parts and just left the bare essentials. We call the process of triggering endocytosis “hot-wiring” because it is similar to just twisting the wires together rather than having a key.

It turns out that movies are not like real life, and hot-wiring a car is actually quite difficult and takes a while. So our systems are more like the Hollywood version than real life!

What is this useful for?

As mentioned above, the systems we have made are useful for cell biologists because they allow cells to be “tamed”. This means that we can accurately study the timing of endocytosis and which proteins are required in a very controlled way. It also potentially means that molecules can be delivered to cells that cannot normally enter. So we have a way to “force feed” cells with whatever we want. This would be most useful for drugs or nanoparticles that are not actively taken up by cells.

Who did the work?

Almost all of the work in the paper was by Laura Wood, a PhD student in the lab. She had help from fellow lab members Nick Clarke, who did the correlative light-electron microscopy, and Sourav Sarkar, who did the binding experiments. Gabrielle Larocque, another PhD student, did some fantastic work to revise the paper after Laura had departed for a post-doc position at another university. We put the paper up on bioRxiv in Summer 2016 and it has slowly made its way through peer review to be published in J Cell Biol today.

Wait? I’m a cell biologist! I want to know how this thing really works!

OK. The design is shown to the right. We made a plasma membrane “anchor” and a clathrin “hook”, a protein fragment that binds clathrin. The anchor and the hook carry an FRB domain and an FKBP domain, and these can be brought together by rapamycin. When the clathrin hook arrives at the membrane, it is recognised by clathrin and vesicle formation can begin. The main hook we use is the appendage and hinge from the beta2 subunit of the AP2 complex.

Normally AP2, which has four subunits, needs to bind to PIP2 in the plasma membrane and undergo a conformational change to recognise a cargo molecule with a specific motif; only then can clathrin bind the beta2 appendage and hinge. By hot-wiring, we effectively bypass all of those other proteins and all of those steps, and simply bring the clathrin-binding part to the membrane when we want. Being able to recreate endocytosis using such a minimalist system was a surprise. In vitro work from Dannhauser and Ungewickell had suggested this might be possible, but it really seems that the steps before clathrin engagement are not a prerequisite for endocytosis.

To make the light inducible version we used TULIPs (tunable light-controlled interacting proteins). So instead of FRB and FKBP we had a LOVpep and PDZ domain on the hook and anchor.

The post title comes from “Start Me Up” by The Rolling Stones. Originally on Tattoo You, but perhaps better known for its use by Microsoft in their Windows 95 advertising campaign. I’ve finally broken a rule that I wouldn’t use mainstream song titles for posts on this blog.

Fusion confusion: new paper on FGFR3-TACC3 fusions in cancer

We have a new paper out! This post is to explain what it’s about.

Cancer cells often have gene fusions. This happens because the DNA in cancer cells is really messed up. Sometimes, chromosomes can break and get reattached to a different one in a strange way. This means you get a fusion between one gene and another, which makes a new gene, called a gene fusion. There are famous fusions that are known to cause cancer, such as the Philadelphia chromosome in chronic myelogenous leukaemia. This rearrangement of chromosomes 9 and 22 results in a fusion called BCR-ABL. There are lots of different gene fusions, and a few years ago a new fusion was discovered in bladder and brain cancers, called FGFR3-TACC3.

Genes encode proteins and proteins do jobs in cells. So the question is: how are the proteins from gene fusions different to their normal versions, and how do they cause cancer? Many of the gene fusions that scientists have found result in a protein that continues to send a signal to the cell when it shouldn’t. It’s thought that this transforms the cell to divide uncontrollably. FGFR3-TACC3 is no different. FGFR3 can send signals, and fusion to TACC3 probably makes it do this uncontrollably. But what about the TACC3 part itself? Does it do anything, or is this all about FGFR3 going wrong?

What is TACC3?

Chromosomes getting shared to the two daughter cells

TACC3, or transforming acidic coiled-coil protein 3 to give it its full name, is a protein important for cell division. It helps to share the chromosomes to the two daughter cells when a cell divides. Chromosomes are shared out by a machine built inside the cell called the mitotic spindle. This is made up of tiny threads called microtubules. TACC3 stabilises these microtubules and adds strength to this machine.

We wondered if cancer cells with FGFR3-TACC3 had problems in cell division. If they did, this might be because the TACC3 part of FGFR3-TACC3 is changed.

We weren’t the first people to have this idea. The scientists that found the gene fusion suggested that FGFR3-TACC3 might bind to the mitotic spindle but not be able to work properly. We decided to take a closer look…

What did you find?

First of all, FGFR3-TACC3 is not actually bound to the mitotic spindle. It is at the cell’s membrane and in small vesicles inside the cell. So if it is not part of the mitotic spindle, how can it affect cell division? One unusual thing about TACC3 is that it is a dimer, meaning two TACC3s are stuck together. Stranger than that, these dimers can stick to more dimers and multimerise into a much bigger assembly. When we looked at the normal TACC3 in the cell, we noticed that the amount bound to the spindle had decreased. We wondered whether the FGFR3-TACC3 was hoovering the normal TACC3 off the spindle, preventing normal cell division.

We made the cancer cells express a bit more normal TACC3 and this rescued the faulty division. We also got rid of the FGFR3-TACC3 fusion, and that also put things back to normal. Finally, we made a fake FGFR3-TACC3 which had a dummy part in place of FGFR3 and this was just as good at hoovering up normal TACC3 and causing cell division problems. So our idea seemed to be right!

What does this mean for cancer?

This project was to look at what is going on inside cancer cells and it is a long way from any cancer treatments. Drug companies can develop chemicals which stop cell signalling from fusions, and these could work as anti-cancer agents. In the case of FGFR3-TACC3, what we are saying is: even if you stop the signalling, there will still be cell division problems in the cancer cells. So an ideal treatment might be to block TACC3 interactions as well as stopping signalling. This is very difficult to do and is far in the future. Doing work like this is important to understand all the possible ways to tackle a specific cancer and to find any problems with potential treatments.

The people

Sourav Sarkar did virtually all the work for this paper and he is first author. Sourav left the lab before we managed to submit this paper and so the revision experiments requested by the peer reviewers were done by Ellis Ryan.

Why didn’t we post this paper as a preprint?

My group have generally been posting our new manuscripts as preprints while they undergo peer review, but we didn’t post this one. I was reluctant because many cancer journals at the time of submission did not allow preprints. This has changed a bit in the last few months, but back in February several key cancer journals did not accept papers that had appeared first as preprints.

The title of the post comes from “Fusion Confusion”, the 4th track on the Hazy EP by Dr Phibes & The House of Wax Equations.

Parallel lines: new paper on modelling mitotic microtubules in 3D

We have a new paper out! You can access it here.

The people

This paper really was a team effort. Faye Nixon and Tom Honnor are joint-first authors. Faye did most of the experimental work in the final months of her PhD and Tom came up with the idea for the mathematical modelling and helped to rewrite our analysis method in R. Other people helped in lots of ways. George did extra segmentation, rendering and movie making. Nick helped during the revisions of the paper. Ali helped to image samples… the list is quite long.

The paper in a nutshell

We used a 3D imaging technique called SBF-SEM to see microtubules in dividing cells, then used computers to describe their organisation.

What’s SBF-SEM?

Serial block face scanning electron microscopy. This method allows us to take an image of a cell and then remove a tiny slice, take another image and so on. We then have a pile of images which covers the entire cell. Next we need to put them back together and make some sense of them.

How do you do that?

We use a computer to track where all the microtubules are in the cell. In dividing cells – in mitosis – the microtubules are in the form of a mitotic spindle. This is a machine that the cell builds to share the chromosomes to the two new cells. It’s very important that this process goes right. If it fails, mistakes can lead to diseases such as cancer. Before we started, it wasn’t known whether SBF-SEM had the power to see microtubules, but we show in this paper that it is possible.

We can see lots of other cool things inside the cell too, like chromosomes, kinetochores, mitochondria and membranes. We made many interesting observations in the paper, although the focus was on the microtubules.

So you can see all the microtubules, what’s interesting about that?

The interesting thing is that our resolution is really good, and is at a large scale. This means we can determine the direction of all the microtubules in the spindle and use this for understanding how well the microtubules are organised. Previous work had suggested that proteins whose expression is altered in cancer cause changes in the organisation of spindle microtubules. Our computational methods allowed us to test these ideas for the first time.

Resolution at a large scale, what does that mean?

The spindle is made of thousands of microtubules. With a normal light microscope, we can see the spindle but we can’t tell individual microtubules apart. There are improvements in light microscopy (called super-resolution) but even with those improvements, right in the body of the spindle it is still not possible to resolve individual microtubules. SBF-SEM can do this. It doesn’t have the best resolution available though. A method called electron tomography has much higher resolution. However, to image microtubules at this large scale (meaning for one whole spindle), it would take months or years of effort! SBF-SEM takes a few hours. Our resolution is better than light microscopy, worse than electron tomography, but because we can see the whole spindle and image more samples, it has huge benefits.

What mathematical modelling did you do?

Cells are beautiful things but they are far from perfect. The microtubules in a mitotic spindle follow a pattern, but don’t do so exactly. So what we did was to create a “virtual spindle” where each microtubule had been made perfect. It was a bit like “photoshopping” the cell. Instead of straightening the noses of actresses, we corrected the path of every microtubule. How much photoshopping was needed told us how imperfect the microtubule’s direction was. This measure – which was a readout of microtubule “wonkiness” – could be done on thousands of microtubules and tell us whether cancer-associated proteins really cause the microtubules to lose organisation.
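To give a feel for what this measure looks like in practice, here is a toy version of the idea (a sketch in Igor, although the real analysis was written in R, and the wave names here are invented): represent each microtubule by a unit direction vector, then measure the angle each one makes with the spindle’s pole-to-pole axis.

Function/WAVE MTWonkiness(mtVec, axisVec)
	Wave mtVec		// (n,3) wave: a unit direction vector for each of n microtubules
	Wave axisVec	// 3-point wave: unit vector along the pole-to-pole axis
	Make/O/N=(DimSize(mtVec, 0)) angleDev
	// dot product gives cos(angle); abs() ignores polarity; convert radians to degrees
	angleDev = acos(abs(mtVec[p][0]*axisVec[0] + mtVec[p][1]*axisVec[1] + mtVec[p][2]*axisVec[2])) * 180/pi
	return angleDev
End

A perfectly organised spindle would give angles close to zero, so the spread of this distribution is one simple readout of microtubule wonkiness.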

The publication process

The paper is published in Journal of Cell Science and it was a great experience. Last November, we put up a preprint on this work and left it up for a few weeks. We got some great feedback and modified the paper a bit before submitting it to a journal. One reviewer gave us a long list of useful comments that we needed to address. However, the other two reviewers didn’t think our paper was a big enough breakthrough for that journal. Our paper was rejected*. This can happen sometimes and it is frustrating as an author because it is difficult for anybody to judge which papers will go on to make an impact and which ones won’t. One of the two reviewers thought that because the resolution of SBF-SEM is lower than electron tomography, our paper was not good enough. The other one thought that because SBF-SEM will not surpass light microscopy as an imaging method (really!**) and because EM cannot be done live (the cells have to be fixed), it was not enough of a breakthrough. As I explained above, the power is that SBF-SEM is between these two methods. Somehow, the referees weren’t convinced. We did some more work, revised the paper, and sent it to J Cell Sci.

J Cell Sci is a great journal which is published by Company of Biologists, a not-for-profit organisation who put a lot of money back into cell biology in the UK. They are preprint friendly, they allow the submission of papers in any format, and most importantly, they have a fast-track*** option. This allowed me to send on the reviews we had, together with our response to them. They sent the paper back to the reviewer who had given the list of useful comments and they were happy with the changes we made. It was accepted just 18 days after we sent it in and was online 8 days later. I’m really pleased with the whole publishing experience with J Cell Sci.

 

* I’m writing about this because we all have papers rejected. There’s no shame in that at all. Moreover, it’s obvious from the dates on the preprint and on the JCS paper that our manuscript was rejected from another journal first.

** Anyone who knows something about microscopy will find this amusing and/or ridiculous.

*** Fast-track is offered by lots of journals nowadays. It allows authors to send in a paper that has been reviewed elsewhere with the peer review file. How the paper has been revised in light of those comments is assessed by the Editor and one peer reviewer.

Parallel lines is of course the title of the seminal Blondie LP. I have used this title before for a blog post, but it matches the topic so well.

Come To California

I’ve returned from the American Society for Cell Biology 2016 meeting in San Francisco. Despite being a cell biologist, and despite people from my lab having attended numerous times, this was my first ASCB meeting.

cell-biology-2016

The conference was amazing: so much excellent science and so many opportunities to meet up with people. For the areas that I work in – mitosis, cytoskeleton and membrane traffic – the meeting was pretty much made for me. Often there were two or more sessions I wanted to attend running at the same time. I’ll try to summarise some of my highlights.

One of the best talks I saw was from Dick McIntosh, who is a legend of cell biology and is still making outstanding contributions. He showed some new tomography data of growing microtubules in a number of systems which suggest that microtubules have curved protofilaments as they grow. This is in agreement with structural data and some models of MT growth, but not with many other schematic diagrams.

The “bottom-up cell biology” subgroup was one of the first I attended. Organised by Dan Fletcher and Matt Good, the theme was reconstitution of biological systems in vitro. The mix of speakers was great, with Thomas Surrey and Marileen Dogterom giving great talks on microtubule systems, and Jim Hurley and Patricia Bassereau representing membrane curvature reconstitution. Physical principles and quantitative approaches were a strong theme here and throughout the meeting, which reflects where cell biology is at right now.

img_3382

I took part in a subgroup on preprints organised by Prachee Avasthi and Jessica Polka. I will try to write a separate post about this soon. This was a fun session that was also a chance to meet up with many people I had only met virtually. There was a lot of excitement about preprints at the meeting and it seemed like many attendees were aware of preprinting. I guess this is not too surprising since the ASCB have been involved with the Accelerating Science and Publishing in Biology (ASAPbio) group since the start.

Of the super huge talks I saw in the big room, the Cellular Communities session really stood out. Bonnie Bassler and Jurgen Knoblich gave fantastic talks on bacterial quorum sensing and “minibrains” respectively. The Porter Lecture, given by Eva Nogales on microtubule structure was another highlight.

The poster sessions (which I heard were sprawling and indigestible) were actually my favourite part of the meeting. I saw mostly new work here and had the chance to talk to quite a few presenters. My lab took three posters on different projects at various stages of publication (Laura’s preprinted/in-revision work, presented by me; Nick’s work, soon to submit; and Gabrielle’s work, soon to write up) and so we were all happy to get some useful feedback on our work. We’ve had follow-up emails and requests for collaboration, which made the long trip worthwhile. We also had a mini lab reunion with Dan Booth, one of my former students, who was presenting his work on using 3D correlative light-electron microscopy to examine chromosome structure.

For those that follow me on Twitter, you may know that I like to make playlists from my iTunes library when I visit another city. This was my first time back on the west coast since 2001. Here are ten tracks selected from my San Francisco, CA playlist:

10. California Über Alles – Dead Kennedys from Fresh Fruit For Rotting Vegetables

9. San Franciscan Nights – The Animals from Winds of Change

8. Who Needs the Peace Corps? – The Mothers of Invention from We’re Only In It For The Money

7. San Francisco – Brian Wilson and Van Dyke Parks from Orange Crate Art

6. Going to California – Led Zeppelin from IV

5. Fake Tales of San Francisco – Arctic Monkeys from Whatever People Say I Am, That’s What I’m Not

4. California Hills – Ty Segall from Emotional Mugger

3. The Portland Cement Factory at Monolith California – Cul de Sac from ECIM (OK Monolith is nearer to LA than SF but it’s a great instrumental track).

2. Come to California – Matthew Sweet from Blue Sky on Mars

1. Russian Hill – Jellyfish from Spilt Milk

Before the meeting, I went on a long walk around SF with the guys from the lab and we accidentally found ourselves on Russian Hill.

img_3334

For some reason I have a higher than average number of bootlegs recorded in SF. Television (Old Waldorf 1978), Elliott Smith (Bottom of the Hill, 1998), Jellyfish (Warfield Theater 1993), My Bloody Valentine, Jimi Hendrix etc. etc.

The post title comes from #2 in my playlist

 

The Digital Cell: Getting started with IgorPRO

This post follows on from “Getting Started”.

In the lab we use IgorPRO for pretty much everything. We have many analysis routines that run in Igor, we have scripts for processing microscope metadata etc, and we use it for generating all figures for our papers. Even so, people in the lab engage with it to varying extents. The main battle is that the use of Excel is pretty ubiquitous.

I am currently working on getting more people in the lab started with using Igor. I’ve found that everyone is keen to learn. The approach so far has been workshops to go through the basics. This post accompanies the first workshop, which is coupled to the first few pages of the Manual. If you’re interested in using Igor read on… otherwise you can skip to the part where I explain why I don’t want people in the lab to use Excel.

IgorPro is very powerful and the learning curve is steep, but the investment is worth it.

WaveMetrics_IGOR_Pro_Logo

These are some of the things that Igor can do:

  • Publication-quality graphics
  • High-speed data display
  • Ability to handle large data sets
  • Curve-fitting, Fourier transforms, smoothing, statistics, and other data analysis
  • Waveform arithmetic
  • Matrix math
  • Image display and processing
  • Combination graphical and command-line user interface
  • Automation and data processing via a built-in programming environment
  • Extensibility through modules written in the C and C++ languages

You can even play games in it!

The basics

The first thing to learn is about the objects in the Igor environment and how they work. There are four basic objects that all Igor users will encounter straight away.

  • Waves
  • Graphs
  • Tables
  • Layouts

All data is stored as waveforms (or waves for short). Waves can be displayed in graphs or tables. Graphs and tables can be placed in a Layout. This is basically how you make a figure.

The next things to check out are the command window (which displays the history), the data browser and the procedure window.
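To make this concrete, here are a few commands you could type into the command window to meet all four objects (a minimal sketch; the wave name is arbitrary):

Make/N=10 myWave = p*p              // a 10-point wave; p is the point index, so it holds 0, 1, 4, 9…
Display myWave                      // show the wave in a graph (Igor will call it Graph0)
Edit myWave                         // …and in a table
NewLayout                           // make a layout…
AppendLayoutObject graph Graph0     // …and drop the graph into it, as you would for a figure

Each command is echoed in the history, which is part of what makes working in Igor auditable.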

Essential IgorPro

  • Tables are not spreadsheets! This is the most important thing to understand. Tables are just a way of displaying a wave. They may look like a spreadsheet, but they are not.
  • Igor is case insensitive.
  • Spaces. Igor can handle spaces in the names of objects, but IMO they are best avoided.
  • Igor is 0-based, not 1-based.
  • Logical naming and logical thought – beginners struggle with this and it’s difficult to get this right when you are working on a project, but consistent naming of objects makes life easier.
  • Programming versus not programming – you can get a long way without programming but at some point it will be necessary and it will save you a lot of time.

Pretty soon, you will go beyond the four basic objects and encounter other things. These include: Numeric and string variables, Data folders, Notebooks, Control panels, 3D plots – a.k.a. gizmo, Procedures.
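As a taster, here is how two of these look from the command window (again just a sketch; the folder, variable names and values are invented):

NewDataFolder/S root:Experiment1    // make a data folder and make it the current folder
Variable/G gNumCells = 42           // a global numeric variable, stored in that folder
String/G gCellType = "RPE1"         // a global string variable
SetDataFolder root:                 // return to the root folder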

Getting started guide

Why don’t we use Excel?

  • Excel can’t make high quality graphics for publication.
    • We do that in Igor.
    • So any effort in Excel is a waste of time.
  • Excel is error-prone.
    • Too easy for mistakes to be introduced.
    • Not auditable. Tough/impossible to find mistakes.
    • Igor has a history window that allows us to see what has happened.
  • Most people don’t know how to use it properly.
  • Not good for biological data – Transcription factor Oct4 gets converted to a date.
  • Limited to 1,048,576 rows and 16,384 columns.
  • Related: useful link describing some spreadsheet crimes of data entry.

But we do use Excel a lot

  • Excel is useful for quick calculations and for preparing simple charts to show at lab meeting.
  • Same way that Powerpoint is OK to do rough figures for lab meeting.
  • But neither are publication-quality.
  • We do use Excel for Tracking Tables, Databases(!) etc.

The transition is tough, but worth it

Writing formulae in Excel is straightforward, and the first thing you will find is that to achieve the same thing in Igor is more complicated. For example, working out the mean for each row in an array (A1:Y20) in Excel would mean typing =AVERAGE(A1:Y1) in cell Z1 and copying this cell down to Z20. Done. In Igor there are several ways to do this, which itself can be unnerving. One way is to use the Waves Average panel. You need to know how this works to get it to do what you want.
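For the curious, one command-line route uses MatrixOp. This is just a sketch with a toy array (the wave names are arbitrary):

Make/O/N=(20,25) data = gnoise(1)         // a toy 20-row, 25-column array of random numbers
MatrixOp/O rowMeans = sumRows(data)/25    // sum each row, divide by the number of columns

The point is not that this is shorter than Excel – it isn’t by much – but that it works just the same for 25 columns or 25,000, and it leaves a record in the history window.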

But before you turn back, thinking I’ll just do this in Excel and then import it… imagine you now want to subtract a baseline value from the data, scale it and then average. Imagine that your data are sampled at different intervals. How would you do that? Dealing with those simple cases in Excel is difficult-to-impossible. In Igor, it’s straightforward.

Resources for learning more Igor:

  • Igor Help – fantastic resource containing the manual and more. Access via Help or by typing ShowHelpTopic “thing I want to search for”.
  • Igor Manual – This PDF is available online or in Applications/Igor Pro/Manual. This used to be distributed as a hard copy… it is now ~3000 pages.
  • Guided Tour of IgorPro – this is a great way to start and will form the basis of the workshops.
  • Demos – Igor comes packed with Demos for most things from simple to advanced applications.
  • IgorExchange – Lots of code snippets and a forum to ask for advice or search for past answers.
  • Igor Tips – I’ve honestly never used these, but you can turn on tips in Igor which reveal help on mouseover.
  • Igor mailing list – topics discussed here are pretty advanced.
  • Introduction to IgorPRO from Payam Minoofar is good. A faster start to learning to program than reading the manual.
  • Hands-on experience!

Part of a series on the future of cell biology in quantitative terms.

The Digital Cell: Getting Started

More on the theme of “The Digital Cell”: using quantitative, computational approaches in cell biology.

So you want to get started? Well, the short version of this post is:

Find something that you need to automate and get going!

Programming

http://www.instruction-manuals.co.uk/imageIM/four/seven/bbc.gif

I make no claim to be a computer wizard. My first taste of programming was the same as anyone who went to school in the UK in the 1980s: BBC Basic. Although my programming only went as far as copying a few examples from the book (right), this experience definitely reduced the “fear of the command line”. My next encounter with coding was to learn HTML when I was an undergraduate. It was not until I was a postdoc that I realised that I needed to write scripts in order to get computers to do what I wanted them to do for my research.

Image analysis

I work in cell biology. My work involves a lot of microscopy. From the start, I used computer-based methods to quantify images. My first paper mentions quantifying images, but it wasn’t until I was a PhD student that I first used NIH Image (as it was called then) to extract quantitative information from confocal micrographs. I was also introduced to IgorPRO (version 3!) as a PhD student, but did no programming. That came later. As a postdoc, I used Scanalytics’ IPLab and Igor (as well as a bit of ImageJ, as NIH Image had by then become). IPLab had an easy scripting language and it was in this program that I learned to write macros for analysis. At this time there were people in the lab who were writing software in IgorPro and MATLAB. While I didn’t pick up programming in IgorPRO or MATLAB then, it made me realise what was possible.

When I started my own group I discovered that IPLab had been acquired by BD Biosciences and then stripped out. I had hundreds of useless scripts and needed a new solution. ImageJ had improved enormously by this time and so this became our default image analysis program. The first data analysis package I bought was IgorPro (version 6) and I have stuck with it since then. In a future post, I will probably return to whether or not this was a good path.

Getting started with programming

Around 2009, I was still unable to program properly. I needed a macro for baseline subtraction – something really simple – and realised I didn’t know how to do it. We didn’t have just one or two traces to modify, we had hundreds. This was simply not possible by hand. It was this situation that made me realise I needed to learn to program.
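To show how little was actually needed, here is a hedged reconstruction of that kind of macro – not my original code, and the function and wave names are invented for illustration:

Function SubtractAllBaselines(tStart, tEnd)
	Variable tStart, tEnd                       // x-range over which to measure the baseline
	String list = WaveList("trace*", ";", "")   // every wave whose name begins with "trace"
	Variable i
	for(i = 0; i < ItemsInList(list); i += 1)
		Wave w = $StringFromList(i, list)
		WaveStats/Q/R=(tStart, tEnd) w          // mean of the baseline region ends up in V_avg
		w -= V_avg                              // subtract it from the whole trace
	endfor
End

A dozen lines, and hundreds of traces are corrected in a second.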

…having a concrete problem that is impossible to crack any other way is the best motivator for learning to program.

This might seem obvious, but having a concrete problem that is impossible to crack any other way is the best motivator for learning to program. I know many people who have decided they “want to learn to code” or they are “going to learn to use R”. This approach rarely works. Sitting down and learning this stuff without sufficient motivation is really tough. So I would advise someone wanting to learn programming to find something that needs automation and just get going. Just get something to work!

Don’t worry (initially) about any of the following:

  • What program/language to use – as long as it is possible, just pick something and do it
  • If your code is ugly or embarrassing to show to an expert – as long as it runs, it doesn’t matter
  • About copy-and-pasting from examples – it’s OK as long as you take time to understand what you are doing, this is a quick way to make progress. Resources such as stackoverflow are excellent for this
  • Bugs – you can squish them, they will frustrate you, but you might need some…
  • Help – ask for help. Online forums are great, experts love showing off their knowledge. If you have local expertise, even better!

Once you have written something (and it works)… congratulations, you are a computer programmer!

IMG_2206

Seriously, that is all there is to it. OK, it’s a long way to being a good programmer or even a competent one, but you have made a start. Like Obi Wan Kenobi says: you’ve taken your first step into a larger world.

So how do you get started with an environment like IgorPro? This will be the topic for next time.

Part of a series on the future of cell biology in quantitative terms.

The Digital Cell

If you are a cell biologist, you will have noticed the change in emphasis in our field.

At one time, cell biology papers were – in the main – qualitative. Micrographs of “representative cells”, western blots of a “typical experiment”… This descriptive style gave way to more quantitative approaches, converting observations into numbers that could be objectively assessed. More recently, as technology advanced, computing power increased and data sets became more complex, we have seen larger scale analysis, modelling, and automation begin to take centre stage.

This change in emphasis encompasses several areas including (in no particular order):

  • Statistical analysis
  • Image analysis
  • Programming
  • Automation allowing analysis at scale
  • Reproducibility
  • Version control
  • Data storage, archiving and accessing large datasets
  • Electronic lab notebooks
  • Computer vision and machine learning
  • Prospective and retrospective modelling
  • Mathematics and physics

The application of these areas is not new to biology and they have been worked on extensively for years, perhaps most obviously by groups that identified themselves as “systems biologists”, “computational biologists”, and people working on large-scale cell biology projects. My feeling is that these methods have now permeated mainstream (read: small-scale) cell biology to such an extent that any group that wants to do cell biology in the future will have to adapt in order to survive. It will change the skills that we look for when recruiting and it will shape the cell biologists of the future. Other fields such as biophysics and neuroscience are further through this change, while others have yet to begin. It is an exciting time to be a biologist.

I’m planning to post occasionally about the way that our cell biology research group is working on these issues: our solutions and our problems.

Part of a series on the future of cell biology in quantitative terms.

What Can You See?

Yesterday I tried a gedankenexperiment via Twitter, and asked:

If you could visualise a protein relative to an intracellular structure/organelle at ~5 nm resolution, which one would you pick and why?

https://twitter.com/clathrin/status/707949738323218432

I got some interesting replies:
  • Myosin Va and cargo on actin filaments in melanocytes – Cleidson Alves @cleidson_alves
  • COPII components relative to ER and Golgi for export of big proteins – David Stephens @David_S_Bristol
  • Actin inside an axon, AIS, shaft presynaptic bouton relative to membrane and vesicles – Christophe Leterrier @christlet
  • Cargo/vesicle and motor, ideally with a co-reporter of motor activity – Ali Twelvetrees @dozenoaks
  • Dynein on K-fibres. If it was a fixed view dynein on kinetochores, localisation relative to Ndc80 or Mad1 – Eric Griffis @DrGriff34
  • See definitively if TACC3/ch-TOG is at the centrosome or not – Hadrien Mary @HadiM_
  • Pericentriolar proteins relative to centrioles. And Arp2/3 and centrioles – Manuel Théry @ManuelTHERY
  • Arp2/3 and centrioles was seconded by Alexandre Carisey @alexcarisey
  • RhoGTPases near cell-cell contacts in endothelial cells. No good antibodies for this – Joachim Goedhart @joachimgoedhart
  • Integrin and filopodia tips, what structures are formed there – Guillaume Jacquemet @guijacquemet

It’s a tough question because the simplest answer to “which protein” is the “the one I am most interested in” – I mean who wouldn’t want to see that at unprecedented resolution – but I was more interested in the “why” part. I’m conscious of the fact that breaking the resolution limit in light microscopy has not yielded many answers to outstanding questions so far.

OK, it was less a thought experiment and more like trying to crowd-source suggestions. We have some new technology that we’d like to put through its paces and apply to interesting cell biological questions. Thanks to everybody for their input.

If you want to make an additional suggestion, please leave a comment.

Edit 2016-03-13: Stéphane Vassilopoulos (@Biosdfp) chipped in on Twitter: “dynamin 2 oligomers right on the actin cytoskeleton”.

The post title is taken from “What Can You See?” by The Seahorses off their unreleased follow up album to Do It Yourself, which may have been called Minus Blue.

The Great Curve II: Citation distributions and reverse engineering the JIF

There have been calls for journals to publish the distribution of citations to the papers they publish (1 2 3). The idea is to turn the focus away from just one number – the Journal Impact Factor (JIF) – and to look at all the data. Some journals have responded by publishing the data that underlie the JIF (EMBO J, Peer J, Royal Soc, Nature Chem). It would be great if more journals did this. Recently, Stuart Cantrill from Nature Chemistry actually went one step further and compared the distribution of cites at his journal with other chemistry journals. I really liked this post and it made me think that I should just go ahead and harvest the data for cell biology journals and post it.

This post is in two parts. First, I’ll show the data for 22 journals. They’re broadly cell biology, but there’s something for everyone with Cell, Nature and Science all included. Second, I’ll describe how I “reverse engineered” the JIF to get to these numbers. The second part is a bit technical but it describes how difficult it is to reproduce the JIF and highlights some major inconsistencies for some journals. Hopefully it will also be of interest to anyone wanting to do a similar analysis.

Citation distributions for 22 cell biology journals

The JIF for 2014 (published in the summer of 2015) is worked out by counting the total number of 2014 cites to articles in that journal that were published in 2012 and 2013. This number is divided by the number of “citable items” in that journal in 2012 and 2013. There are other ways to look at citation data, and different windows to analyse, but this method is used here because it underlies the impact factor. I plotted out histograms to show the citation distributions at these journals from 0-50 citations; the insets show the frequency of papers with 50-1000 cites.
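Written out as a formula (my notation, not Thomson Reuters’), that is:

\mathrm{JIF}_{2014} = \frac{\text{citations received in 2014 by items published in 2012 and 2013}}{\text{number of citable items published in 2012 and 2013}}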

Dist1

Dist2

As you can see, the distributions are highly skewed and so reporting the mean is very misleading. Typically ~70% of papers pick up fewer than the mean number of citations. Reporting the median is safer and is shown below. It shows how similar most of the journals in this field are in terms of citations to the average paper in that journal. Another metric, which I like, is the H-index for journals. Google Scholar uses this as a journal metric (using citation data from a 5-year window). For a journal, this is the number h such that the journal published h papers each with at least h citations. A plot of h-indices for these journals is shown below.

medianplusH
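Computing h from a list of citation counts is straightforward: sort the counts in descending order and find the last rank at which the count still matches or beats the rank. A sketch in Igor (the wave name is arbitrary):

Function JournalH(citeWave)
	Wave citeWave               // citation counts, one point per paper
	Duplicate/O citeWave, sorted
	Sort/R sorted, sorted       // sort into descending order
	Variable i, h = 0
	for(i = 0; i < numpnts(sorted); i += 1)
		if(sorted[i] >= i + 1)  // the paper ranked i+1 has at least i+1 cites
			h = i + 1
		else
			break
		endif
	endfor
	return h
End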

Here’s a summary table of all of this information together with the “official JIF” data, which is discussed below.

Journal | Median | H | Citations | Items | Mean | JIF Cites | JIF Items | JIF
Autophagy | 3 | 18 | 2996 | 539 | 5.6 | 2903 | 247 | 11.753
Cancer Cell | 14 | 37 | 5241 | 274 | 19.1 | 5222 | 222 | 23.523
Cell | 19 | 72 | 28147 | 1012 | 27.8 | 27309 | 847 | 32.242
Cell Rep | 6 | 26 | 6141 | 743 | 8.3 | 5993 | 717 | 8.358
Cell Res | 3 | 19 | 1854 | 287 | 6.5 | 2222 | 179 | 12.413
Cell Stem Cell | 14 | 37 | 5192 | 302 | 17.2 | 5233 | 235 | 22.268
Cell Mol Life Sci | 4 | 19 | 3364 | 596 | 5.6 | 3427 | 590 | 5.808
Curr Biol | 4 | 24 | 6751 | 1106 | 6.1 | 7293 | 762 | 9.571
Development | 5 | 25 | 6069 | 930 | 6.5 | 5861 | 907 | 6.462
Dev Cell | 7 | 23 | 3986 | 438 | 9.1 | 3922 | 404 | 9.708
eLife | 5 | 20 | 2271 | 306 | 7.4 | 2378 | 255 | 9.322
EMBO J | 8 | 27 | 5828 | 557 | 10.5 | 5822 | 558 | 10.434
J Cell Biol | 6 | 25 | 5586 | 720 | 7.8 | 5438 | 553 | 9.834
J Cell Sci | 3 | 23 | 5995 | 1157 | 5.2 | 5894 | 1085 | 5.432
Mol Biol Cell | 3 | 16 | 3415 | 796 | 4.3 | 3354 | 751 | 4.466
Mol Cell | 11 | 37 | 8669 | 629 | 13.8 | 8481 | 605 | 14.018
Nature | 12 | 105 | 69885 | 2758 | 25.3 | 71677 | 1729 | 41.296
Nat Cell Biol | 13 | 35 | 5381 | 340 | 15.8 | 5333 | 271 | 19.679
Nat Rev Mol Cell Biol | 8.5 | 43 | 5037 | 218 | 23.1 | 4877 | 129 | 37.806
Oncogene | 5 | 26 | 6973 | 1038 | 6.7 | 8654 | 1023 | 8.459
Science | 14 | 83 | 54603 | 2430 | 22.5 | 56231 | 1673 | 33.611
Traffic | 3 | 11 | 1020 | 252 | 4.0 | 1018 | 234 | 4.350

 

Reverse engineering the JIF

The analysis shown above was straightforward. However, getting the data to match Thomson Reuters’ calculations for the JIF was far from easy.

I downloaded the citation data from Web of Science for the 22 journals. I limited the search to “articles” and “reviews” published in 2012 and 2013. I took the citations these papers received in 2014, with the aim of plotting out the distributions. As a first step I calculated the mean citation for each journal (a.k.a. impact factor) to see how it compared with the official Journal Impact Factor (JIF). As you can see below, some were correct and others were off by some margin.
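The calculation itself is trivial once the citation counts for a journal are in a wave – e.g. in Igor (the wave name is invented):

WaveStats/Q cites2014               // 2014 citation counts for one journal's 2012-13 papers
Print "Calculated IF:", V_avg       // the mean of the distribution is the impact factor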

Journal | Calculated IF | JIF
Autophagy | 5.4 | 11.753
Cancer Cell | 14.8 | 23.523
Cell | 23.9 | 32.242
Cell Rep | 8.2 | 8.358
Cell Res | 5.7 | 12.413
Cell Stem Cell | 13.4 | 22.268
Cell Mol Life Sci | 5.6 | 5.808
Curr Biol | 5.0 | 9.571
Development | 6.5 | 6.462
Dev Cell | 7.5 | 9.708
eLife | 6.0 | 9.322
EMBO J | 10.5 | 10.434
J Cell Biol | 7.6 | 9.834
J Cell Sci | 5.2 | 5.432
Mol Biol Cell | 4.1 | 4.466
Mol Cell | 11.8 | 14.018
Nature | 25.1 | 41.296
Nat Cell Biol | 15.1 | 19.679
Nat Rev Mol Cell Biol | 15.3 | 37.806
Oncogene | 6.7 | 8.459
Science | 18.6 | 33.611
Traffic | 4.0 | 4.35

For most journals there was a large difference between this number and the official JIF (see below, left). This was not a huge surprise; I’d found previously that the JIF was very hard to reproduce (see also here). To try and understand the difference, I looked at the total citations in my dataset vs those from the official JIF. As you can see from the plot (right), my numbers are pretty much in agreement with those used for the JIF calculation, which means that the difference comes from the denominator – the number of citable items.

JifCalc

What the plots show is that, for most journals in my dataset, there are fewer papers considered as citable items by Thomson Reuters. This is strange. I had filtered the data to leave only journal articles and reviews (which are citable items), so non-citable items should have been removed.

It’s no secret that the papers cited in the sum on the top of the impact factor calculation are not necessarily the same as the papers counted on the bottom.

Now, it’s no secret that the papers cited in the sum on the top of the impact factor calculation are not necessarily the same as the papers counted on the bottom (see here, here and here). This inconsistency actually makes plotting a distribution impossible. However, I thought that using the same dataset, filtering and getting to the correct total citation number meant that I had the correct list of citable items. So, what could explain this difference?

missingPapers

I looked first at how big the difference in the number of citable items is. Journals like Nature and Science are missing >1000 items(!), others are missing fewer, and some such as Traffic, EMBO J and Development have the correct number. Remember that journals carry different numbers of papers. So, as a proportion of total papers, the biggest fraction of missing papers was actually from Autophagy and Cell Research, which were missing ~50% of papers classified in WoS as “articles” or “reviews”!

My best guess at this stage was that items were incorrectly tagged in Web of Science. Journals like Nature, Science and Current Biology carry a lot of obituaries, letters and other stuff that can fairly be removed from the citable items count. But these should be classified as such in Web of Science and therefore filtered out in my original search. Also, these types of paper don’t explain the big disparity in journals like Autophagy that only carry papers, reviews with a tiny bit of front matter.

PubmedComp

I figured a good way forward would be to verify the numbers with another database – PubMed. Details of how I did this are at the foot of this post. This brought me much closer to the JIF “citable items” number for most journals. However, Autophagy, Current Biology and Science are still missing large numbers of papers. As a proportion of the size of the journal, Autophagy, Cell Research and Current Biology are missing the most. Meanwhile, Nature Cell Biology and Nature Reviews Molecular Cell Biology now have more citable items in the JIF calculation than are found in PubMed!

This collection of data was used for the citation distributions shown above, but it highlights some major discrepancies at least for some journals.

How does Thomson Reuters decide what is a citable item?

Some of the reasons for deciding what is a citable item are outlined in this paper. Of the six reasons that are revealed, all seem reasonable, but they suggest that Thomson Reuters does not simply look at the classification of papers in the Web of Science database. Without wanting to pick on Autophagy – it’s simply the first one alphabetically – I looked at which was right: the PubMed number of 539 or the JIF number of 247 citable items published in 2012 and 2013. For the JIF number to be correct, this journal must only publish ~10 papers per issue, which doesn’t seem right, at least from a quick glance at the first few issues in 2012.

Why Thomson Reuters removes some of these papers as non-citable items is a mystery… you can see from the histogram above that for Autophagy only 90 or so papers were uncited in 2014, so clearly the removed items are capable of picking up citations. If anyone has any ideas why the items were removed, please leave a comment.

Summary

Trying to understand what data go into the Journal Impact Factor calculation (for some, but not all journals) is very difficult. This makes JIFs very hard to reproduce. As a general rule in science, we don’t trust things that can’t be reproduced, so why has the JIF persisted? I think most people realise by now that using this single number to draw conclusions about the excellence (or not) of a paper because it was published in a certain journal is madness. Looking at the citation distributions, it’s clear that the majority of papers could be reshuffled between any of these journals and nobody would notice (see here for further analysis). We would all do better to read the paper and not worry about where it was published.

The post title is taken from “The Great Curve” by Talking Heads from their classic LP Remain in Light.

In PubMed, a research paper will have the publication type “journal article”; however, other items can still have this publication type. These items also have additional types which can therefore be filtered. I retrieved all PubMed records from the journals published in 2012 and 2013 with publication type = “journal article”. This worked for 21 journals; eLife is online only, so the ppdat field code had to be changed to pdat.


("Autophagy"[ta] OR "Cancer Cell"[ta] OR "Cell"[ta] OR "Cell Mol Life Sci"[ta] OR "Cell Rep"[ta] OR "Cell Res"[ta] OR "Cell Stem Cell"[ta] OR "Curr Biol"[ta] OR "Dev Cell"[ta] OR "Development"[ta] OR "Elife"[ta] OR "Embo J"[ta] OR "J Cell Biol"[ta] OR "J Cell Sci"[ta] OR "Mol Biol Cell"[ta] OR "Mol Cell"[ta] OR "Nat Cell Biol"[ta] OR "Nat Rev Mol Cell Biol"[ta] OR "Nature"[ta] OR "Oncogene"[ta] OR "Science"[ta] OR "Traffic"[ta]) AND (("2012/01/01"[PPDat] : "2013/12/31"[PPDat])) AND journal article[pt:noexp]

I saved this as an XML file and then pulled the values from the “publication type” key using Nokogiri/ruby (script). I then had a list of all the publication type combinations for each record. As a first step I simply counted the number of journal articles for each journal and then subtracted anything that was tagged as “biography”, “comment”, “portraits” etc. This could be done in IgorPro by making a wave indicating whether an item should be excluded (0 or 1), using the DOI as a lookup. This wave could then be used to exclude papers from the distribution.
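In case it helps anyone doing something similar, the DOI lookup can be done with FindValue on text waves – a sketch along these lines (the function and wave names are invented):

Function MarkExclusions(doiAll, doiBad)
	Wave/T doiAll, doiBad                   // text waves: DOIs of all WoS items, and of items to exclude
	Make/O/N=(numpnts(doiAll)) excludeWave = 0
	Variable i
	String s
	for(i = 0; i < numpnts(doiBad); i += 1)
		s = doiBad[i]
		FindValue/TEXT=s/TXOP=4 doiAll      // /TXOP=4 requires the whole element to match
		if(V_value >= 0)
			excludeWave[V_value] = 1        // 1 = drop this paper from the distribution
		endif
	endfor
End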

For calculation of the number of missing papers as a proportion of size of journal, I used the number of items from WoS for the WoS calculation, and the JIF number for the PubMed comparison.

On a related note, this IgorPro procedure will read in csv files from WoS/WoK. As mentioned in the main text, data were downloaded 500 records at a time as csv from WoS, using journal titles as a search term and limiting to “article” or “review” and to 2012 and 2013. Note that limiting the search at the outset by year limits the citation data you get back. You need to search first to get citations from all years and then refine afterwards. The files can be stitched together with the cat command.


cat *.txt > merge.txt

Edit 8/1/16 @ 07:41 Jon Lane told me via Twitter that Autophagy publishes short commentaries of papers in other journals called “Autophagic puncta” (you need to be a cell biologist to get this gag). He suggests these could be removed by Thomson Reuters for their calculation. This might explain the discrepancy for this journal. However, these items 1) cite other papers (so they contribute to JIF calculations), 2) they get cited (Jon says his own piece has been cited 18 times) so they are not non-citable items, 3) they’re tagged as though they are a paper or a review in WoS and PubMed.

White label: the growth of bioRxiv

bioRxiv, the preprint server for biology, recently turned 2 years old. This seems a good point to take a look at how bioRxiv has developed over this time and to discuss any concerns sceptical people may have about using the service.

Firstly, thanks to Richard Sever (@cshperspectives) for posting the data below. The first plot shows the number of new preprints deposited and the number that were revised, per month since bioRxiv opened in Nov 2013. There are now about 200 preprints being deposited per month and this number will continue to increase. The cumulative article count (of new preprints) shows that, as of the end of last month, there are >2500 preprints deposited at bioRxiv.

overall2

subject2

What is take-up like across biology? To look at this, the number of articles in different subject categories can be totted up. Evolutionary Biology, Bioinformatics and Genomics/Genetics are the front-running disciplines. Obviously counting articles should be corrected for the size of these fields, but it’s clear that some large disciplines have not adopted preprinting in the same way. Cell biology, my own field, has some catching up to do. It’s likely that this reflects cultures within different fields. For example, genomics has a rich history of data deposition, sharing and openness. Other fields, less so…

So what are we waiting for?

I’d recommend that people wondering about preprinting go and read Stephen Curry’s post “just do it”. Anyone who remains sceptical should keep reading…

Do I really want to deposit my best work on bioRxiv?

I’ve picked six preprints that were deposited in 2015. This selection demonstrates how important work is appearing first at bioRxiv and is being downloaded thousands of times before the papers appear in the pages of scientific journals.

  1. Accelerating scientific publishing in biology. A preprint about preprinting from Ron Vale, subsequently published in PNAS.
  2. Analysis of protein-coding genetic variation in 60,706 humans. A preprint summarising a huge effort from the Exome Aggregation Consortium (ExAC). 12,366 views, 4,534 downloads.
  3. TP53 copy number expansion correlates with the evolution of increased body size and an enhanced DNA damage response in elephants. This preprint was all over the news, e.g. Science.
  4. Sampling the conformational space of the catalytic subunit of human γ-secretase. CryoEM is the hottest technique in biology right now. Sjors Scheres’ group have been at the forefront of this revolution. This paper is now out in eLife.
  5. The genome of the tardigrade Hypsibius dujardini. The recent controversy over horizontal gene transfer in tardigrades played out at rapid-fire pace thanks to preprinting.
  6. CRISPR with independent transgenes is a safe and robust alternative to autonomous gene drives in basic research. This preprint concerning biosafety of CRISPR/Cas technology could be accessed immediately thanks to preprinting.

But many journals consider preprints to be previous publications!

Wrong. It is true that some journals have yet to change their policy, but the majority – including Nature, Cell and Science – are happy to consider manuscripts that have been preprinted. There are many examples of biology preprints that went on to be published in Nature (ancient genomes) and Science (hotspots in birds). If you are worried about whether the journal you want to submit your work to will allow preprinting, check this page first or the SHERPA/RoMEO resource. The journal “information to authors” page should have a statement about this, but you can always ask the Editor.

I’m going to get scooped

Preprints establish priority. It isn’t possible to be scooped if you deposit a preprint that is time-stamped, showing that you were first. The alternative is to send it to a journal, where no record will exist that you submitted it if the paper is rejected, or sometimes even if they end up publishing it (see discussion here). Personally, I feel that the fear of scooping in science is overblown. In fields that are so hot that papers are coming out really fast, the fear of scooping is high, but everyone sees the work if it’s on bioRxiv or elsewhere – who was first is clear to all. Think of it this way: depositing a preprint at bioRxiv is just the same as giving a talk at a meeting. Preprints mean that there is a verifiable record available to everyone.

Preprints look ugly, I don’t want people to see my paper like that.

The depositor can format their preprint however they like! Check out Christophe Leterrier’s beautifully formatted preprint, or this one from Dennis Eckmeier. Both authors made their templates available so you can follow their example (1 and 2).

Yes but does -insert name of famous scientist- deposit preprints?

Lots of high profile scientists have already used bioRxiv. David Bartel, Ewan Birney, George Church, Ray Deshaies, Jennifer Doudna, Steve Henikoff, Rudy Jaenisch, Sophien Kamoun, Eric Karsenti, Maria Leptin, Rong Li, Andrew Murray, Pam Silver, Bruce Stillman, Leslie Vosshall and many more. Some sceptical people may find this argument compelling.

I know how publishing works now and I don’t want to disrupt the status quo

It’s paradoxical how science is all about pushing the frontiers, yet when it comes to publishing, scientists are incredibly conservative. Physics and Mathematics have been using preprinting as part of the standard route to publication for decades, so adoption by biology is nothing unusual; actually, we will simply be catching up. One vision for the future of scientific publishing is that we will deposit preprints and then journals will search out the best work from the server to highlight in their pages. The journals that will do this are called “overlay journals”. Sounds crazy? It’s already happening in Mathematics. Terry Tao, a Fields medal-winning mathematician, recently deposited a solution to the Erdős discrepancy problem on arXiv (he actually put it on his blog first). This was then “published” in Discrete Analysis, an overlay journal. Read about this here.

Disclaimer: other preprint services are available, such as F1000Research and PeerJ Preprints, and of course arXiv itself has a quantitative biology section. My lab have deposited work at bioRxiv (1, 2 and 3) and I am an affiliate for the service, which means I check preprints before they go online.

Edit 14/12/15 07:13 put the scientists in alphabetical order. Added a part about scooping.

The post title comes from the term “white label” which is used for promotional vinyl copies of records ahead of their official release.