So quantixed occasionally gets correspondence from other researchers asking for advice. A recent email came from someone who had been “scooped”. What should they do?

Before we get into this topic we have to define what we mean by being scooped.

In the most straightforward sense, being scooped means that someone else’s article appeared online before you managed to get yours online.

You were working on something that someone else was also working on (you may or may not have known about each other) but they got their work out before you did. They are the scooper and you are the scoopee.

There is another use of the term, mainly in highly competitive fields, where scooping means that the scooper has gained some unfair advantage to make the scoop. In the worst case, this can be done by receiving your article to review confidentially and then delaying your work while using your information to accelerate their own (Ginsparg, 2016).

However it happens, the scoop can be classified as an overscoop or an underscoop. An overscoop is where the scooper has much more data and a far more complete story. Maybe the scooper’s paper appears in a high-profile journal while the scoopee was planning on submitting to a less selective journal. Perhaps the scooper has the cell data, an animal model, the biochemical data and a crystal structure, while the scoopee had some nice data in cells and a bit of biochemistry. An underscoop is where a key observation that the scoopee was building into a full paper is partially revealed. The scoopee could have more data or better quality results and maybe the full mechanism, but the scooper’s paper gives away a key detail (Mole, 2004).

All of these definitions are different from the journalistic definition, where “the scoop” simply means the big story. What the scientific and journalistic terms share is the belief that being second with a story is worthless. In science, being second and getting the details right is valuable, and it should be given more weight than it currently is. I think follow-up work is valued by the community, but it is fair to say that it is unlikely to receive the same billing and attention as the scooper’s paper.

How often does scooping actually happen?

To qualify as being scooped, you need to have a paper that you are preparing for publication when the other paper appears. If you are not at that point, someone else was just working on something similar and they’ve published a paper. They haven’t scooped you. This is easiest to take when you have just had an idea or have maybe done a few experiments and then you see a paper on the same thing. It must’ve been a good idea! The other paper has saved you some time! Great. Move on. The problem comes when you have invested a lot of time doing a whole bunch of work and then the other paper appears. This is very annoying, but to reiterate, you haven’t really been scooped if you weren’t actually at the point of preparing your work for publication.

As you might have gathered, I am not even sure scooping is a real thing. For sure the fear of being scooped is real. And there are instances of scooping happening. But most of the time the scoopee has not actually been scooped. And even then, the scoopee does not just abandon their work.

So what is the advice to someone who has discovered that they have been scooped?

Firstly, don’t panic! The scooper’s paper is not going to go away and you have to deal with the fact that you now have the follow-up paper. It can be hard to change your mindset, but you must rewrite your paper to take their work into account. Going into denial mode and trying to publish your work as though the other paper doesn’t exist is a huge mistake.

Second, read their work carefully. I doubt that the scooper has left you with no room for manoeuvre. Even in the case of an overscoop, you probably have something that the other paper doesn’t, which you can salvage. There are bound to be some details on which your work does not agree, and these can feature in your paper. If it’s an underscoop, you have even less to worry about. There will be a way forward – you just need to identify it and move on.

The main message is that “being scooped” is not the end. You just need to figure out your way forward.

How do I stop it from happening to me?

Be original! It’s a truism that if you are working on something interesting, it’s likely that someone else is too. And if you work in a highly competitive area, there might be many groups working on the same thing and it is more likely that you will be scooped. Some questions are obvious next steps and it might be worth thinking twice about pursuing them. This is especially true if you come up with an idea based on a paper you’ve read. Work takes so long to appear that the lab who published that paper is likely far ahead of you.

Having your own niche gives the best protection. If you have carved out your own question you probably have the lead and will be associated with work in this area anyway. Other labs will back off. If you have a highly specialised method, again you can contribute in ways that others can’t and so your chances of being scooped decrease.

Have a backup plan. Do you have a side project which you can switch to if too much novelty is taken away from your main project? You can insulate yourself from scoop damage by not working on projects that are all-or-nothing. Horror stories about scooping in structural biology (which is all about “the big reveal”) are commonplace. Investing energy in alternative approaches or new assays as well as getting a structure might help here.

If you find out about competition, maybe from a poster or a talk at a meeting, you need to evaluate whether it is worth carrying on. If you can, talk to the other lab. Most labs do not want to compete and would prefer to collaborate or at least co-ordinate submission of manuscripts.

Use preprints! If you deposit your work on a preprint server, you get a DOI and a date stamp. You can prove that your work existed on that date and in what form. This is the ultimate protection against being scooped. If someone else’s work appears online before you do this, then as I said above, you haven’t really been scooped. If work appears and you already have a DOI, well, then you haven’t been scooped either. Some journals see things this way. For example, EMBO J have a scoop protection policy that states that the preprint deposition timestamp is the date at which priority is assessed.

The post title is taken from “Scoop” by The Auctioneers. I have this track on an extended C86 3-Disc set.

## Measured Steps: Garmin step adjustment algorithm

I recently got a new GPS running watch, a Garmin Fēnix 5. As well as tracking runs, cycling and swimming, it does “activity tracking” – number of steps taken in a day, sleep, and so on. The step goals are set to move automatically and I wondered how it worked. With a quick number crunch, the algorithm revealed itself. Read on if you are interested in how it works.

The watch started out with a step target of 7500 steps in one day. I missed this by 2801 and the target got reduced by 560 to 6940 for the next day. That day I managed 12480, i.e. 5540 over the target. So the target went up by 560 to 7500. With me so far? Good. So next I went over the target and it went up again (but this time by 590 steps). I missed that target by a lot and the target was reduced by 530 steps. This told me that I’d need to collect a bit more data to figure out how the goal is set. Here are the first few days to help you see the problem.

| Actual steps | Goal | Deficit/Surplus | Adjustment for tomorrow |
|---|---|---|---|
| 4699 | 7500 | -2801 | -560 |
| 12480 | 6940 | 5540 | 560 |
| 10417 | 7500 | 2917 | 590 |
| 2726 | 8090 | -5364 | -530 |
| 6451 | 7560 | -1109 | -220 |
| 8843 | 7340 | 1503 | 150 |
| 8984 | 7490 | 1494 | 300 |
| 9216 | 7790 | 1426 | 290 |

The data is available for download as a csv via the Garmin Connect website. After waiting to accumulate some more data, I plotted out the adjustment vs step deficit/surplus. The pattern was pretty clear.

There are two slopes here that pass through the origin. It doesn’t matter what the target was, the adjustment applied is scaled according to how close to the target I was, i.e. the step deficit or surplus. There was either a small (0.1) or large (0.2) scaling used to adjust the step target for the next day, but how did the watch decide which scale to use?
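The same relationship can be written as a formula. The notation is ours, not Garmin’s: $$G_{t}$$ is the step goal on day t, $$S_{t}$$ is the number of steps actually taken that day and k is the scaling factor.

$$G_{t+1} = G_{t} + k\,(S_{t} - G_{t}), \qquad k \in \{0.1, 0.2\}$$

For the first day in the table, $$7500 + 0.2 \times (4699 - 7500) = 7500 - 560.2 \approx 6940$$, which is the goal that was set for the next day.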

The answer was to look back at the previous day’s activity as well as the current day.

So if today you exceeded the target and you also exceeded the target yesterday then you get a small scale increase. Likewise if you fell short today and yesterday, you get a small scale decrease. However, if you’ve exceeded today but fell short yesterday, your target goes up by the big scaling. Falling short after exceeding yesterday is rewarded with a big scale decrease. The actual size of the adjustment depends on the deficit or surplus on that day. The above plot is coloured according to the four possibilities described here.
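The four-way grouping can be reproduced from the numbers themselves: divide each day’s adjustment by that day’s deficit or surplus to get the scale, and label the day by whether the target was exceeded yesterday and today. Below is a minimal Python sketch that uses only the eight days from the table above, typed in by hand. The name `DAYS` and the skipping of days that hit the target exactly are our own choices. A real analysis would read the Garmin Connect csv, and we make no assumptions here about its column names.

```python
# The eight days from the table above: (actual steps, goal, adjustment for tomorrow)
DAYS = [
    (4699, 7500, -560),
    (12480, 6940, 560),
    (10417, 7500, 590),
    (2726, 8090, -530),
    (6451, 7560, -220),
    (8843, 7340, 150),
    (8984, 7490, 300),
    (9216, 7790, 290),
]


def scale_by_case(days):
    """Return (yesterday, today, scale) for every day that has a previous day.

    yesterday and today are "over" or "under" the target; scale is the
    adjustment divided by the deficit or surplus, rounded to one decimal
    place. A day that hits the target exactly is skipped (to avoid dividing
    by zero) and the next day is then not labelled.
    """
    results = []
    previous = None
    for actual, goal, adjustment in days:
        surplus = actual - goal
        if surplus == 0:
            previous = None
            continue
        today = "over" if surplus > 0 else "under"
        if previous is not None:
            results.append((previous, today, round(adjustment / surplus, 1)))
        previous = today
    return results


for yesterday, today, scale in scale_by_case(DAYS):
    print(f"yesterday {yesterday:5}  today {today:5}  scale {scale}")
```

Each printed line corresponds to one point on the coloured plot, so the same loop can be pointed at a longer export to check which scale goes with which case.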

I guess there is a logic to this. The goal could quickly get unreachable if it increased by 20% on a run of two days exceeding the target, and conversely, too easy if the decreases went down rapidly with consecutive inactivity. It’s only when there’s been a swing in activity that the goal should get moved by the large scaling. Otherwise, 10% in the direction of attainment is fine.

I have no idea if this is the algorithm used across all of Garmin’s watches or if other watch manufacturers use different target-setting algorithms.

The post title comes from “Measured Steps” by Edsel from their Techniques of Speed Hypnosis album.

## Esoteric Circle

Many projects in the lab involve quantifying circular objects. Microtubules, vesicles and so on are approximately circular in cross section. This quick post is about how to find the diameter of these objects using a computer.

So how do you measure the diameter of an object that is approximately circular? Well, if it was circular you would measure the distance from one edge to the other, crossing the centre of the object. It doesn’t matter along which axis you do this. However, since these objects are only approximately circular, it matters along which axis you measure. There are a couple of approaches that can be used to solve this problem.

Principal component analysis

The object is a collection of points* and we can find the eigenvectors and eigenvalues of these points using principal component analysis. This was discussed previously here. The 1st eigenvector points along the direction of greatest variance and the 2nd eigenvector is normal to the first. The order of eigenvectors is determined by their eigenvalues. We use these to rotate the coordinate set and offset to the origin.

Now the major axis of the object is aligned to the x-axis at y=0 and the minor axis is aligned with the y-axis at x=0 (compare the plot on the right with the one on the left, where the profiles are in their original orientation – offset to zero). We can then find the absolute values of the axis crossing points and when added together these represent the major axis and minor axis of the object. In Igor, this is done using a one-liner to retrieve a rotated set of coords as the wave M_R.

PCA/ALL/SEVC/SRMT/SCMT xCoord,yCoord

To find the crossing points, I use Igor’s interpolation-based level crossing functions. For example, to store the length along one axis in a variable called len:

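// /EDGE=1: crossing of 0 where the x-values (m1c0) are increasing; /P reports it in points
FindLevel/Q/EDGE=1/P m1c0, 0
// y-value at that crossing; round brackets index by x, so this assumes default x scaling
len = abs(m1c1(V_LevelX))
// /EDGE=2: crossing where the x-values are decreasing, on the other side of the object
FindLevel/Q/EDGE=2/P m1c0, 0
len += abs(m1c1(V_LevelX))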

This is just to find one axis (where m1c0 and m1c1 are the 1st and 2nd columns of a 2-column wave m1) and so you can see it is a bit cumbersome.
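For readers without Igor, here is a rough Python equivalent of the PCA approach. It is a sketch rather than a port of the Igor code, and it rests on a few assumptions. The points must form an ordered, closed outline. The first eigenvector comes from the closed-form angle of the 2×2 covariance matrix rather than from Igor’s PCA operation. Axis crossings are found by linear interpolation along each edge of the outline, standing in for FindLevel. The helper names and the tilted test ellipse are hypothetical.

```python
import math


def axis_span(outline, axis):
    """Distance between the outermost points where a closed outline crosses an axis.

    axis=0 measures along x, using crossings of the line y = 0;
    axis=1 measures along y, using crossings of the line x = 0.
    Raises ValueError if the outline never crosses that line.
    """
    other = 1 - axis
    hits = []
    for i, p in enumerate(outline):
        q = outline[(i + 1) % len(outline)]  # wrap round: the outline is closed
        a, b = p[other], q[other]
        if (a <= 0 < b) or (b <= 0 < a):      # this edge crosses the line
            t = a / (a - b)                   # linear interpolation along the edge
            hits.append(p[axis] + t * (q[axis] - p[axis]))
    return max(hits) - min(hits)


def pca_axes(points):
    """Major and minor axis lengths of an ordered outline of (x, y) points."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centred = [(x - cx, y - cy) for x, y in points]
    # entries of the 2x2 covariance matrix
    sxx = sum(x * x for x, _ in centred) / n
    syy = sum(y * y for _, y in centred) / n
    sxy = sum(x * y for x, y in centred) / n
    # angle of the eigenvector with the larger eigenvalue (greatest variance)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # rotate by -theta so the major axis lies along x and the minor along y
    rotated = [(x * c + y * s, -x * s + y * c) for x, y in centred]
    return axis_span(rotated, 0), axis_span(rotated, 1)


def ellipse(a, b, tilt_deg, cx=0.0, cy=0.0, n=360):
    """Hypothetical test outline: n ordered points on a tilted ellipse."""
    tilt = math.radians(tilt_deg)
    pts = []
    for k in range(n):
        t = 2 * math.pi * k / n
        x, y = a * math.cos(t), b * math.sin(t)
        pts.append((cx + x * math.cos(tilt) - y * math.sin(tilt),
                    cy + x * math.sin(tilt) + y * math.cos(tilt)))
    return pts


major, minor = pca_axes(ellipse(20, 10, 30, cx=100, cy=50))
print(round(major, 2), round(minor, 2), round(major / minor, 2))  # values close to 40, 20 and 2
```

The last value printed is the aspect ratio, major / minor.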

Anyway, I was quite happy with this solution. It is unbiased and also tells us how approximately circular the object is (because the major and minor axes tell us the aspect ratio or ellipticity of the object). I used it in Figure 2 of this paper to show the sizes of the coated vesicles. However, in another project we wanted to state what the diameter of a vesicle was. Not two numbers, just one. How do we do that? We could take the average of the major and minor axes, but maybe there’s an easier way.

Polar coordinates

The distance from the centre to every point on the edge of the object can be found easily by converting the xy coordinates to polar coordinates. To do this, we first find the centre of the object. This is the centroid $$(\bar{x},\bar{y})$$ represented by

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_{i}$$ and $$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_{i}$$

for n points, and subtract this centroid from all points to arrange the object around the origin. Now, since the xy coords are represented in the polar system by

$$x_{i} = r_{i}\cos(\phi_{i})$$ and $$y_{i} = r_{i}\sin(\phi_{i})$$

we can find r, the radial distance, using

$$r_{i} = \sqrt{x_{i}^{2} + y_{i}^{2}}$$

With those values we can then find the average radial distance and report that.

There’s something slightly circular (pardon the pun) about this method because all we are doing is minimising the distance to a central point initially and then measuring the average distance to this minimised point in the latter step. It is much faster than the PCA approach and would be insensitive to changes in point density around the object. The two methods would probably diverge for noisy images. Again in Igor this is simple:

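// assumes m1 has already been centred on its centroid
// rW has one row fewer than m1, so the last point of m1 is not used (e.g. if it repeats the first)
Make/O/N=(dimsize(m1,0)-1)/FREE rW

// radial distance of each point from the origin
rW[] = sqrt(m1[p][0]^2 + m1[p][1]^2)

// diameter is twice the mean radial distance
len = 2 * mean(rW)

Here again, m1 is the 2-column wave of coords and the diameter of the object is stored in len.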
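The same polar calculation in Python, as a minimal sketch. Unlike the Igor snippet, it subtracts the centroid itself, following the equations above. It then returns twice the mean radial distance. The function name and the test outlines are hypothetical.

```python
import math


def polar_diameter(points):
    """Twice the mean distance from the centroid to each (x, y) point."""
    n = len(points)
    cx = sum(x for x, _ in points) / n   # centroid x
    cy = sum(y for _, y in points) / n   # centroid y
    radii = [math.hypot(x - cx, y - cy) for x, y in points]  # r_i for each point
    return 2 * sum(radii) / n


# hypothetical test outlines: a circle of radius 15 and an ellipse with semi-axes 20 and 10
angles = [2 * math.pi * k / 360 for k in range(360)]
circle = [(15 * math.cos(t), 15 * math.sin(t)) for t in angles]
ellipse = [(20 * math.cos(t), 10 * math.sin(t)) for t in angles]
print(polar_diameter(circle), polar_diameter(ellipse))
```

For the circle this gives the diameter, 30. For the ellipse the estimate comes out a little above 30, the average of its two axes (40 and 20), in line with the small offset from y=x described in the next paragraph.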

How does this compare with the method above? The answer is kind of obvious: it lies roughly midway between the major and minor axes. The major axis is shown in red and the minor axis in blue, each compared with the mean radial distance method (plotted on the y-axis). In places there is nearly a 10 nm difference, which is considerable for objects that are between 20 and 35 nm in diameter. How close is it to the average of the major and minor axis? Those points are in black and they are very close but not exactly on y=x.

So for simple, approximately circular objects with low noise, the ridiculously simple polar method gives us a single estimate of the diameter of the object and this is much faster than the PCA method above. For more complicated shapes and noisy images, my feeling is that the PCA approach would be more robust. The two methods actually tell us two subtly different things about the shapes.

Why don’t you just measure them by hand?

In case there is anyone out there wondering why a computer is used for this rather than a human wielding the line tool in ImageJ… there are two good reasons.

1. There are just too many! Each image has tens of profiles and we have hundreds of images from several experiments.
2. How would you measure the profile manually? This approach shows two unbiased methods that don’t rely on a human to draw any line across the object.

* = I am assuming that the point set is already created.

The post title is taken from “Esoteric Circle” by Jan Garbarek from the LP of the same name released in 1969. The title fits well since this post is definitely esoteric. But maybe someone out there is interested!

## Inspiration Information: some book recommendations for kids

As with children’s toys and clothes, books aimed at children tend to be targeted in a gender-stereotyped way. This is a bit depressing. While books about princesses can be inspirational to young girls – if the protagonist decides to give it all up and have a career as a medic instead (the plot of Zog by Julia Donaldson) – mostly they are not. How about injecting some real inspiration into reading matter for kids?

Here are a few recommendations. This is not a survey of the entire market, just a few books that I’ve come across that have been road-tested and received a mini-thumbs up from little people I know.

Little People Big Dreams: Marie Curie by Isabel Sanchez Vegara & Frau Isa

This is a wonderfully illustrated book that tells the story of Marie Curie, from a young girl growing up in Poland, through overcoming gender restrictions to go and study in France, to winning two Nobel Prizes and being a war hero! The front part of the book is written in simple language that kids can read, while the last few pages are (I guess) for an adult to read aloud to the child, or for older children to read for themselves.

This book is part of a series which features inspirational women: Ada Lovelace, Rosa Parks, Emmeline Pankhurst, Amelia Earhart. What is nice is that the series also has books on women from creative fields: Coco Chanel, Audrey Hepburn, Frida Kahlo, Ella Fitzgerald. Often non-fiction books for kids are centred on science/tech/human rights, which is great but, let’s face it, not all kids will engage with these topics. The bigger message here is to show young people that little people with big dreams can change the world.

Ada Twist, Scientist by Andrea Beaty & David Roberts

A story about a young scientist who keeps on asking questions. The moral of the story is that there is nothing wrong with asking “why?”. The artwork is gorgeous and there are plenty of things to spot and look at on each page. The mystery of the book is not exactly solved either so there’s fun to be had discussing this as well as reading the book straight. Ada Marie Twist is named after Ada Lovelace and Marie Curie, two female giants of science.

This book is highly recommended. It’s fun and crammed full of positivity.

Rosie Revere, Engineer by Andrea Beaty & David Roberts

By the same author and illustrator, ‘Rosie Revere…’ tells the story of a young inventor. She overcomes ridicule when she is taken under the wing of her great aunt who is an inspirational engineer. Her great aunt Rose is, I think, supposed to be Rosie the Riveter, the be-headscarfed feminist icon from WWII. A wonderful touch.

Rosie is a classmate of Ada Twist (see above) and there is another book featuring a young (male) architect which we have not yet road-tested. Rather than recruitment propaganda for Engineering degrees, the broader message of ‘Rosie Revere…’ is that persevering with your ideas and interests is a good thing, i.e. never give up.

Good Night Stories for Rebel Girls by Elena Favilli & Francesca Cavallo

A wonderful book that gives brief biographies of inspiring women. Each two-page spread has some text and an illustration of the rebel girl to inspire young readers. The book has a “This book belongs to…” page at the beginning but, in a move of pure genius, it also has two final pages for the owner of the book to write their own story. Just like the women featured in the book, the owner of the book can have their own one-page story and draw their own self-portrait.

This book is highly recommended.

EDIT: this book was added to the list on 2018-02-26

Who was Charles Darwin? by Deborah Hopkinson & Nancy Harrison

This is a non-fiction book covering Darwin’s life from school days through the Beagle adventures and on to old age. It’s a book for children although compared to the books above, this is quite a dry biography with a few black-and-white illustrations. This says more about how well the books above are illustrated rather than anything particularly bad about “Who Was Charles Darwin?”. Making historical or biographical texts appealing to kids is a tough gig.

The text is somewhat inspirational – Darwin’s great achievements were made despite personal problems – but there is a disconnect between the life of a historical figure like Darwin and the children of today.

For older people

Quantum Mechanics by Jim Al-Khalili

Aimed at older children and adults, this book explains the basics behind the big concept of “Quantum Mechanics”. These Ladybird Expert books have a retro appeal, being similar to the original Ladybird books published over forty years ago. Jim Al-Khalili is a great science communicator and any young people (or adults) who have engaged with his TV work will enjoy this short format book.

Evolution by Steve Jones

This is another book in the Ladybird Expert series (there is one further book, on “Climate Change”). The brief here is the same: a short format explainer of a big concept, this time “Evolution”. The target audience is the same. It is too dry for young children but perfect for teens and for adults. Steve Jones is an engaging writer and this book doesn’t disappoint, although the format is limited to one-page large text vignettes on evolution with an illustration on the facing page.

It’s a gateway to further reading on the topic and there’s a nice list of resources at the end.

Computing for Kids

After posting this, I realised that we have lots of other children’s science and tech books that I could have included. The best of the rest is this “lift-the-flap” book on Computers and Coding published by Usborne. It’s a great book that introduces computing concepts in a fun, gender-free way. It can inspire kids to get into programming, perhaps as a step up from Scratch Jr or some other platform that they use at school.

I haven’t included any links to buy these books. Of course, they’re only a google search away. If you like the sound of any, why not drop in to your local independent bookshop and support them by buying a copy there.

The post title comes from the title track of the “Inspiration Information” LP by Shuggie Otis. The version I have is the re-release with ‘Strawberry Letter 23’ on it from ‘Freedom Flight’ – probably his best known track – as well as a host of other great tunes. Highly underrated, check it out. There’s another recommendation for you.

## In a Word: LaTeX to Word and vice versa

Here’s a quick tech tip. We’ve been writing papers in TeX recently, using Overleaf as a way to write collaboratively. This works great but sometimes, a Word file is required by the publisher. So how do you convert from one to the other quickly and with the least hassle?

If you Google this question (as I did), you will find a number of suggestions which vary in the amount of effort required. Methods include latex2rtf or pandoc. Here’s what worked for me:

• Exporting the TeX file as PDF from Overleaf
• Opening it in Microsoft Word
• That was it!

OK, that wasn’t quite it. It did not work at all on a Mac. I had to use a Windows machine running Word. The formatting was maintained and the pictures imported OK. Note that this was a short article with three figures and hardly any special notation (it’s possible this doesn’t work as well on more complex documents). A couple of corrections were needed: line-end hyphens were deleted during the import, which borked genuinely hyphenated words that happened to span two lines; and the units generated by siunitx were missing a space between the number and the unit. Otherwise it was pretty straightforward. So straightforward that I thought I’d write a quick post in case it helps other people.

What about going the other way?

Again, on Windows I used Apache OpenOffice to open my Word document and save it as an .odt file. I then used the writer2latex filter to make a .tex file with all the embedded images saved in a folder. These could then be uploaded to Overleaf. With a bit of formatting work, I was up-and-running.

I had heard that many publishers, even those that say that they accept manuscripts as TeX files, actually require a Word document for typesetting. This is because, I guess, they have workflows set up to make the publisher version which must start with a Word document and nothing else. What’s more worrying is that in these cases, if you don’t supply one, they will convert it for you before putting it into the workflow. It’s probably better to do this yourself and check the conversion to reduce errors at the proof stage.

The post title is taken from “In A Word” the compilation album by Nottingham noise-rockers Fudge Tunnel.

## Some Things Last A Long Time II

Back in 2014, I posted an analysis of the time my lab takes to publish our work. This post is very popular, probably because it looks at the total time from first submission to final publication. It was time for an update. Here is the latest version.

The colours have changed a bit but again the graphic shows the journey to publication in four “eras”:

1. Pre-time (before 0 on the x-axis): this is the time from first submission to the first journal. A dark time which involves rejection.
2. Submission at the final journal (starting at time 0). Again, the lime-coloured periods are when the manuscript is with the journal and the green ones, when it is with us (being revised).
3. Acceptance! This is where the lime bar stops. The manuscript is then readied for publication (blank area).
4. Published online. A red period that ends with final publication in print.

Since 2013 we have been preprinting our work, which means that the manuscript is available while it is under review. This procedure means that the journey to publication only delays the work appearing in the journal and not its use by other scientists. If you want to find out more about preprints in biology check out ASAPbio.org or my posts here and here.

The mean time from first submission to the paper appearing online in the journal is 226 days (median 210), which is shorter than the last time I did this analysis (250 days). Sadly though, we managed to set a new record for longest time to publication with 450 days! This is sad for the first author concerned who worked hard (259 days in total) revising the paper when she could have been doing other stuff. It is not all bad though. That paper was put up on bioRxiv the day we first submitted it so the pain is offset somewhat.
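A minimal sketch of how summary figures like these could be computed. It assumes that, for each paper, you have the date of first submission and the date it appeared online in the journal. The three records below are hypothetical and are not our papers. Only Python’s standard library is used.

```python
from datetime import date
from statistics import mean, median

# Hypothetical records: (first submission, appeared online in the journal)
papers = [
    (date(2016, 1, 1), date(2016, 4, 10)),
    (date(2016, 1, 1), date(2016, 7, 19)),
    (date(2017, 1, 1), date(2018, 1, 1)),
]


def lag_days(records):
    """Days from first submission to online publication for each paper."""
    return [(online - submitted).days for submitted, online in records]


lags = lag_days(papers)
print(f"mean {mean(lags):.0f} days, median {median(lags):.0f} days, longest {max(lags)} days")
# prints: mean 222 days, median 200 days, longest 365 days
```

Reporting the median alongside the mean is worthwhile here, because one record-breaking paper can pull the mean up on its own.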

What is not shown in the graphic are the other papers that are still making their way through the process. These manuscripts will change the stats again, likely pushing up the times. As I said in the last post, I think the delays we experience are pretty typical for our field and if anything, my group are quite quick to publish.

If you’d like to read more about publication lag times see here.

Thanks to Jessica Polka for nudging me to update this post.

The post title comes again from Daniel Johnston’s track “Some Things Last A Long Time” from his “1990” LP.

## Methods papers for MD997

I am now running a new module for masters students, MD997. The aim is to introduce the class to a range of advanced research methods and to get them to think about how to formulate their own research question(s).

The module is built around a paper which is allocated in the first session. I had to come up with a list of methods-type papers, which I am posting below. There are 16 students and I picked 23 papers. I aimed to cover their interests, which are biological but with some chemistry, physics and programming thrown in. The papers are a bit imaging-biased but I tried to get some ‘omics and neuro in there. There were some preprints on the list to make sure I covered the latest stuff.

The students picked their top 3 papers and we managed to assign them without too much trouble. Every paper got at least one vote. Some papers were in high demand. Fitzpatrick et al. on cryoEM of Alzheimer’s samples and the organoid paper from Lancaster et al. had by far the most votes.

The students present this paper to the class and also use it to formulate their own research proposal. Which one would you pick?

1. Booth, D.G. et al. (2016) 3D-CLEM Reveals that a Major Portion of Mitotic Chromosomes Is Not Chromatin Mol Cell 64, 790-802. http://dx.doi.org/10.1016/j.molcel.2016.10.009
2. Chai, H. et al. (2017) Neural Circuit-Specialized Astrocytes: Transcriptomic, Proteomic, Morphological, and Functional Evidence Neuron 95, 531-549 e9. http://dx.doi.org/10.1016/j.neuron.2017.06.029
3. Chang, J.B. et al. (2017) Iterative expansion microscopy Nat Methods 14, 593-599. http://dx.doi.org/10.1038/nmeth.4261
4. Chen, B.C. et al. (2014) Lattice light-sheet microscopy: imaging molecules to embryos at high spatiotemporal resolution Science 346, 1257998. http://dx.doi.org/10.1126/science.1257998
5. Chung, K. & Deisseroth, K. (2013) CLARITY for mapping the nervous system Nat Methods 10, 508-13. http://dx.doi.org/10.1038/nmeth.2481
6. Eichler, K. et al. (2017) The Complete Connectome Of A Learning And Memory Center In An Insect Brain bioRxiv. http://dx.doi.org/10.1101/141762
7. Fitzpatrick, A.W.P. et al. (2017) Cryo-EM structures of tau filaments from Alzheimer’s disease Nature 547, 185-190. http://dx.doi.org/10.1038/nature23002
8. Habib, N. et al. (2017) Massively parallel single-nucleus RNA-seq with DroNc-seq Nat Methods 14, 955-958. http://dx.doi.org/10.1038/nmeth.4407
9. Hardman, G. et al. (2017) Extensive non-canonical phosphorylation in human cells revealed using strong-anion exchange-mediated phosphoproteomics bioRxiv. http://dx.doi.org/10.1101/202820
10. Herzik, M.A., Jr. et al. (2017) Achieving better-than-3-Å resolution by single-particle cryo-EM at 200 keV Nat Methods. http://dx.doi.org/10.1038/nmeth.4461
11. Jacquemet, G. et al. (2017) FiloQuant reveals increased filopodia density during breast cancer progression J Cell Biol 216, 3387-3403. http://dx.doi.org/10.1083/jcb.201704045
12. Jungmann, R. et al. (2014) Multiplexed 3D cellular super-resolution imaging with DNA-PAINT and Exchange-PAINT Nat Methods 11, 313-8. http://dx.doi.org/10.1038/nmeth.2835
13. Kim, D.I. et al. (2016) An improved smaller biotin ligase for BioID proximity labeling Mol Biol Cell 27, 1188-96. http://dx.doi.org/10.1091/mbc.E15-12-0844
14. Lancaster, M.A. et al. (2013) Cerebral organoids model human brain development and microcephaly Nature 501, 373-9. http://dx.doi.org/10.1038/nature12517
15. Madisen, L. et al. (2012) A toolbox of Cre-dependent optogenetic transgenic mice for light-induced activation and silencing Nat Neurosci 15, 793-802. http://dx.doi.org/10.1038/nn.3078
16. Penn, A.C. et al. (2017) Hippocampal LTP and contextual learning require surface diffusion of AMPA receptors Nature 549, 384-388. http://dx.doi.org/10.1038/nature23658
17. Qin, P. et al. (2017) Live cell imaging of low- and non-repetitive chromosome loci using CRISPR-Cas9 Nat Commun 8, 14725. http://dx.doi.org/10.1038/ncomms14725
18. Quick, J. et al. (2016) Real-time, portable genome sequencing for Ebola surveillance Nature 530, 228-232. http://dx.doi.org/10.1038/nature16996
19. Ries, J. et al. (2012) A simple, versatile method for GFP-based super-resolution microscopy via nanobodies Nat Methods 9, 582-4. http://dx.doi.org/10.1038/nmeth.1991
20. Rogerson, D.T. et al. (2015) Efficient genetic encoding of phosphoserine and its nonhydrolyzable analog Nat Chem Biol 11, 496-503. http://dx.doi.org/10.1038/nchembio.1823
21. Russell, M.R. et al. (2017) 3D correlative light and electron microscopy of cultured cells using serial blockface scanning electron microscopy J Cell Sci 130, 278-291. http://dx.doi.org/10.1242/jcs.188433
22. Strickland, D. et al. (2012) TULIPs: tunable, light-controlled interacting protein tags for cell biology Nat Methods 9, 379-84. http://dx.doi.org/10.1038/nmeth.1904
23. Yang, J. et al. (2015) The I-TASSER Suite: protein structure and function prediction Nat Methods 12, 7-8. http://dx.doi.org/10.1038/nmeth.3213

If you are going to do a similar exercise, Twitter is invaluable for suggestions for papers. None of the students complained that they couldn’t find three papers which matched their interests. I set up a slide carousel in PowerPoint with the front page of each paper together with some key words to tell the class quickly what the paper was about. I gave them some discussion time and then collated their choices on the board. Assigning the papers was quite straightforward; I tried to honour the first choices as far as possible. Having an excess of papers prevented too much horse trading for the papers that multiple people had picked.
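Doing the assignment by hand worked fine, but the procedure can also be sketched in code. This is a minimal, hypothetical Python version, not what I actually used. It works through the choices rank by rank, so every first choice is considered before any second choice, and it assumes each paper goes to at most one student. Ties go to whichever student is listed first, which is an arbitrary choice. The student names and paper numbers are made up.

```python
def assign_papers(choices):
    """Assign each student at most one paper from their ranked choices.

    choices maps student -> list of paper numbers, most preferred first.
    All first choices are handled before any second choices, and so on.
    When two students want the same paper at the same rank, the one listed
    first in `choices` gets it (an arbitrary tie-break).
    """
    assigned = {}   # student -> paper
    taken = set()   # papers already allocated
    max_rank = max((len(prefs) for prefs in choices.values()), default=0)
    for rank in range(max_rank):
        for student, prefs in choices.items():
            if student in assigned or rank >= len(prefs):
                continue
            paper = prefs[rank]
            if paper not in taken:
                assigned[student] = paper
                taken.add(paper)
    return assigned


# hypothetical top-3 choices, using numbers from the list above
choices = {
    "Student A": [7, 14, 3],
    "Student B": [7, 14, 19],
    "Student C": [14, 7, 1],
}
print(assign_papers(choices))
# prints: {'Student A': 7, 'Student C': 14, 'Student B': 19}
```

A student whose three choices are all taken is simply left out of the result and would need sorting out by discussion. This is where having spare papers helps.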

Hopefully you find this list useful. I was inspired by Raphaël posting his own list here.

## The Sound of Clouds: wordcloud of tweets using R

Another post using R and looking at Twitter data.

As I was typing out a tweet, I had the feeling that my vocabulary is a bit limited. Papers I tweet about are either “great”, “awesome” or “interesting”. I wondered what my most frequently tweeted words are.

As with the last post, you can (probably) do what I’ll describe online somewhere, but why would you want to do that when you can DIY in R?

First, I requested my tweets from Twitter. I wasn’t sure of rtweet’s limits for retrieving tweets, and the request only takes a few minutes. This gives you a download of everything, including a csv of all your tweets. The text of those tweets is in a column called ‘text’.
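A wordcloud is essentially a picture of a word-frequency table. As a rough base-R sketch of that counting step (a minimal stand-in, not what tm does internally): lowercase the text, strip links and punctuation, split on whitespace, drop stopwords and count. The function name `count_words` and the tiny stopword list are placeholders; tm’s `stopwords('english')` is far longer, and there is no stemming here.

```r
## Sketch: word frequencies from a character vector of tweets, base R only
count_words <- function(text, stop = c("i", "me", "a", "the", "and", "of", "to")) {
  x <- tolower(text)
  x <- gsub("http[^[:space:]]*", " ", x)  # drop links before removing punctuation
  x <- gsub("[[:punct:]]", "", x)         # similar in spirit to tm's removePunctuation
  words <- unlist(strsplit(x, "[[:space:]]+"))
  words <- words[nzchar(words) & !(words %in% stop)]
  tab <- table(words)
  sort(setNames(as.integer(tab), names(tab)), decreasing = TRUE)
}

count_words(c("Great paper from the lab http://t.co/xyz",
              "Another great paper!",
              "Paper of the week"))
```

In this toy example “paper” comes out on top with 3, followed by “great” with 2; applied to `tweets$text` it gives the raw counts that a wordcloud draws.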


```r
## for text mining
library(tm)
## for stemming (stemDocument() relies on SnowballC)
library(SnowballC)
## for making wordclouds
library(wordcloud)
tweets <- read.csv('tweets.csv', stringsAsFactors = FALSE)
## make a corpus of the text of the tweets
tCorpus <- Corpus(VectorSource(tweets$text))
## remove all the punctuation from tweets
tCorpus <- tm_map(tCorpus, removePunctuation)
## lowercase everything: stopword removal is case-sensitive, so "I" would otherwise survive
tCorpus <- tm_map(tCorpus, content_transformer(tolower))
## good idea to remove stopwords: high frequency words such as I, me and so on
tCorpus <- tm_map(tCorpus, removeWords, stopwords('english'))
## next step is to stem the words. Means that talking and talked become talk
tCorpus <- tm_map(tCorpus, stemDocument)
## now display your wordcloud
wordcloud(tCorpus, max.words = 100, random.order = FALSE)
```

For my @clathrin account this gave:

So my most tweeted word is paper, followed by cell and lab. I’m quite happy about that. I noticed that great is also high frequency, which I had suspected would be the case. It looks like @christlet, @davidsbristol, @jwoodgett and @cshperspectives are among the accounts I mention most often; this is probably a function of how long we’ve all been using Twitter. The cloud was generated from 10.9K tweets over seven years; it might be interesting to look at how it has changed over that time…

The cloud is a bit rough and ready. Further filtering would be a good idea, but this quick exercise just took a few minutes.

The post title comes from “The Sound of Clouds” by The Posies from their Solid States LP.

## I’m not following you: Twitter data and R

I wondered how many of the people that I follow on Twitter do not follow me back. A quick way to look at this is with R. OK, a really quick way is to give a 3rd party application access rights to your account to do this for you, but a) that isn’t safe, b) you can’t look at anyone else’s data, and c) this is quantixed – doing nerdy stuff like this is what I do.

Now, the great thing about R is the availability of well-written packages to do useful stuff. I quickly found two packages, twitteR and rtweet, that are designed to harvest Twitter data. I went with rtweet and there were some great guides to setting up OAuth and getting going. The code below sets up my environment and pulls down lists of my followers and my “friends”.
I’m looking at my main account and not the quantixed Twitter account.

```r
library(rtweet)
library(httpuv)
## set up your app name, API key and API secret
appname <- "whatever_name"
key <- "blah614h"
secret <- "blah614h"
## create token named "twitter_token"
twitter_token <- create_token(
  app = appname,
  consumer_key = key,
  consumer_secret = secret)
clathrin_followers <- get_followers("clathrin", n = "all")
## look up full user data from the user_id column
clathrin_followers_names <- lookup_users(clathrin_followers$user_id)
clathrin_friends <- get_friends("clathrin")
clathrin_friends_names <- lookup_users(clathrin_friends$user_id)
```

The terminology is that people who follow me are called Followers and people I follow are called Friends; these are the terms used by Twitter’s API. I have almost 3000 followers and around 1200 friends. Something was a bit strange, though… lookup_users() returned data for fewer followers than I actually have. The same was true for friends: a few hundred accounts were missing in total. I extracted a list of the Twitter IDs that had no data and tried a few other ways to look them up. All failed. I assume that these are users who have deleted their account (and the Twitter ID stays reserved) or who have been suspended for some reason. Very strange.

```r
## euler() comes from the eulerr package
library(eulerr)
## noticed something weird
## look at the twitter ids of followers and friends with no data
missing_followers <- setdiff(clathrin_followers$user_id, clathrin_followers_names$user_id)
missing_friends <- setdiff(clathrin_friends$user_id, clathrin_friends_names$user_id)
## find how many real followers/friends are in each set
aub <- union(clathrin_followers_names$user_id, clathrin_friends_names$user_id)
anb <- intersect(clathrin_followers_names$user_id, clathrin_friends_names$user_id)
## make an Euler plot to look at overlap
fit <- euler(c(
  "Followers" = nrow(clathrin_followers_names) - length(anb),
  "Friends" = nrow(clathrin_friends_names) - length(anb),
  "Followers&Friends" = length(anb)))
plot(fit)
```

In the code above, I arranged the “real” Twitter users (those with data) into two sets: people who follow me and people I follow.
There was an overlap of 882 users, leaving 288 Friends who don’t follow me back – boo hoo! I next wanted to see who these people are, which is pretty straightforward.

```r
## who are the people I follow who don't follow me back
bonly <- setdiff(clathrin_friends_names$user_id, anb)
no_follow_back <- lookup_users(bonly)
```
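The set operations above are easier to follow on toy data. This sketch uses made-up user IDs and plain vectors in place of the real `user_id` columns of the rtweet data frames; it reproduces the missing-ID check, the overlap, the region sizes handed to `euler()`, and the list of friends who don’t follow back.

```r
## Toy stand-ins for the user_id columns (made-up IDs)
follower_ids <- c("101", "102", "103", "104", "105")     # as from get_followers()
follower_ids_with_data <- c("101", "102", "103", "105")  # as from lookup_users(); 104 has no data
friend_ids_with_data <- c("102", "103", "106", "107")

## IDs with no user data (perhaps deleted or suspended accounts)
missing_followers <- setdiff(follower_ids, follower_ids_with_data)
## mutual follows: the overlap in the Euler plot
anb <- intersect(follower_ids_with_data, friend_ids_with_data)
## region sizes, in the form passed to euler()
sizes <- c(
  "Followers" = length(follower_ids_with_data) - length(anb),
  "Friends" = length(friend_ids_with_data) - length(anb),
  "Followers&Friends" = length(anb))
## people I follow who don't follow me back
bonly <- setdiff(friend_ids_with_data, anb)

print(missing_followers)
print(sizes)
print(bonly)
```

The same subtraction on my real data: 882 mutual follows plus 288 friends-only accounts gives 1,170 friends with data, against the roughly 1200 accounts I follow.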



Looking at no_follow_back was interesting. There are a bunch of announcement accounts and people with huge follower counts, and I wasn’t surprised that they do not follow me back. There are a few people on the list with whom I have interacted, yet they don’t follow me, which is a bit odd. They could have unfollowed me at some point in the past, but my guess is they were never following me in the first place. It used to be the case that you could only see tweets from people you followed, but the boundaries have blurred a lot in recent years. An intermediary only has to retweet something you have written for someone else to see it, and you can then interact without actually following each other. In fact, my own Twitter experience is mainly through lists, rather than my actual timeline, and to look at tweets in a list you don’t need to follow anyone on there. All of this got me thinking: maybe other people (who follow me) are wondering why I don’t follow them back… I should look at what I am missing out on.

```r
## who are the people who follow me but I don't follow back
aonly <- setdiff(clathrin_followers_names$user_id, anb)
no_friend_back <- lookup_users(aonly)
## save csvs with all user data for unreciprocated follows
write.csv(no_follow_back, file = "nfb.csv")
write.csv(no_friend_back, file = "nfb2.csv")
```



With this last bit of code, I was able to save a file for each subset of unreciprocated follows/friends. Again, there were some interesting people on this list. I must have missed them following me, so I never followed them back.

I used these lists to prune my friends and to follow some interesting new people. The csv files contain the Twitter bio of all the accounts so it’s quick to go through and check who is who and who is worth following. Obviously you can search all of this content for keywords and things you are interested in.
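Searching the saved files for keywords takes a couple of lines of base R. This sketch assumes the bios are in a column called `description` (the field name Twitter’s API uses for a user’s bio; check `names()` on your own file first), and the function name `find_by_keyword` is made up. Keywords are joined into a single case-insensitive regular expression.

```r
## Sketch: keep only accounts whose bio mentions any of the keywords
find_by_keyword <- function(users, keywords, column = "description") {
  pattern <- paste(keywords, collapse = "|")  # e.g. "endocytosis|microscopy"
  ## grepl() returns FALSE for NA, so empty bios count as no match
  hits <- grepl(pattern, users[[column]], ignore.case = TRUE)
  users[hits, , drop = FALSE]
}

## toy data standing in for read.csv("nfb.csv", stringsAsFactors = FALSE)
users <- data.frame(
  screen_name = c("acct_a", "acct_b", "acct_c"),
  description = c("Cell biologist working on endocytosis",
                  "Football fan",
                  "Microscopy core facility"),
  stringsAsFactors = FALSE)
find_by_keyword(users, c("endocytosis", "microscopy"))
```

Because the keywords are treated as a regular expression, characters such as `.` or `+` in a keyword should be escaped.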

So there you have it. This is my first “all R” post on quantixed – hope you liked it!

The post title is from “I’m Not Following You”, the final track on Edwyn Collins’ 1997 LP of the same name.

## Start Me Up: Endocytosis on demand

We have a new paper out. The title is New tools for ‘hot-wiring’ clathrin-mediated endocytosis with temporal and spatial precision. You can read it here.

Cells have a plasma membrane which is the barrier between the cell’s interior and the outside world. In order to import material from outside, cells have a special process called endocytosis. During endocytosis, cells form a tiny bubble of plasma membrane and pull it inside – taking with it a little pocket of the outside world. This process is very important to the cell. For example, it is one way that cells take up the nutrients they need to live. It also controls cell movement, growth, and how cells talk to one another. Because it is so important, cell biologists have studied how endocytosis works for decades.

Studying endocytosis is tricky. Like naughty children, cells simply do not do what they are told. There is no way to make a cell in the lab “do endocytosis”. It does it all the time, but we don’t know when or where on the cell surface a vesicle will be made. Not only that, but when a vesicle is made, we don’t really know what cargo it contains. It would be helpful to cell biologists if we could bring cells under control. This paper shows a way to do this. We demonstrate that clathrin-mediated endocytosis can be triggered, so that we can make it happen on-demand.

Endocytosis on-demand

Using a chemical which diffuses into the cell, we can trigger endocytosis to happen all over the cell. The movie on the right shows vesicles (bright white spots) forming after we add the chemical (at 0:00). The way that we designed the system means that the vesicles that form contain one type of cargo. This is exciting because it means that we can now deliver things into cells using this cargo. So, we can trigger endocytosis on-demand and we can control the cargo, but we still cannot control where on the plasma membrane this happens.

We solved this problem by engineering a light-sensitive version of our system. With this new version we can use blue light to trigger endocytosis. Whereas the chemical diffused everywhere, the light can be focused on a narrow region of the cell and endocytosis can be triggered only in that region. This means we control where, as well as when, a vesicle will form.

What does hot-wiring mean?

It is possible to start a car without a key by “hot-wiring” it. This happens in the movies, when the bad guy breaks into a car and just twists some wires together to start the car and make a getaway. To trigger endocytosis we used the cell’s own proteins, but we modified them. We chopped out all the unnecessary parts and just left the bare essentials. We call the process of triggering endocytosis “hot-wiring” because it is similar to just twisting the wires together rather than having a key.

It turns out that movies are not like real life, and hot-wiring a car is actually quite difficult and takes a while. So our systems are more like the Hollywood version than real life!

What is this useful for?

As mentioned above, the systems we have made are useful for cell biologists because they allow cells to be “tamed”. This means that we can accurately study the timing of endocytosis and which proteins are required in a very controlled way. It also potentially means that molecules that cannot normally enter cells can be delivered to them. So we have a way to “force feed” cells with whatever we want. This would be most useful for drugs or nanoparticles that are not actively taken up by cells.

Who did the work?

Almost all of the work in the paper was by Laura Wood, a PhD student in the lab. She had help from fellow lab members Nick Clarke, who did the correlative light-electron microscopy, and Sourav Sarkar, who did the binding experiments. Gabrielle Larocque, another PhD student, did some fantastic work to revise the paper after Laura had departed for a post-doc position at another university. We put the paper up on bioRxiv in Summer 2016 and the paper has slowly made its way through peer review to be published in J Cell Biol today.

Wait? I’m a cell biologist! I want to know how this thing really works!

OK. The design is shown to the right. We made a plasma membrane “anchor” and a clathrin “hook”, which is a fragment of protein that binds clathrin. The anchor and the hook carry an FRB domain and an FKBP domain, and these can be brought together by rapamycin. When the hook is at the membrane, it is recognised by clathrin and vesicle formation can begin. The main hook we use is the appendage and hinge from the beta2 subunit of the AP2 complex.

Normally AP2, which has four subunits, needs to bind to PIP2 in the plasma membrane and undergo a conformational change to recognise a cargo molecule with a specific motif; only then can clathrin bind the beta2 appendage and hinge. By hot-wiring, we effectively remove all of those other proteins and all of those steps, and simply bring the clathrin-binding part to the membrane when we want. Being able to recreate endocytosis using such a minimalist system was a surprise. In vitro work from Dannhauser and Ungewickell had suggested this might be possible, but it really seems that the steps before clathrin engagement are not a prerequisite for endocytosis.

To make the light inducible version we used TULIPs (tunable light-controlled interacting proteins). So instead of FRB and FKBP we had a LOVpep and PDZ domain on the hook and anchor.

The post title comes from “Start Me Up” by The Rolling Stones. Originally on Tattoo You, but perhaps better known for its use by Microsoft in their Windows 95 advertising campaign. I’ve finally broken a rule that I wouldn’t use mainstream song titles for posts on this blog.