All That Noise: The vesicle packing problem

This week Erick Martins Ratamero and I put up a preprint on vesicle packing. This post is a bit of backstory but please take a look at the paper, it’s very short and simple.

The paper started when I wanted to know how many receptors could fit in a clathrin-coated vesicle. Sounds like a simple problem – but it’s actually more complicated.

Of course, the problem is not as simple as dividing the surface area of the vesicle by the cross-sectional area of the receptor. The images above show why: the receptors would be the dimples on the golf ball… they can’t overlap… how many can you fit on the ball?

It turns out that a PhD student working in Groningen in 1930 posed a similar problem (known as the Tammes Problem) in his thesis. His concern was the even pattern of pores on a pollen grain, but the root of the problem is the Thomson Problem: minimising the energy of charged particles confined to a spherical surface, which forces the particles to distribute themselves as far from one another as possible.

There are very few analytical solutions to the Tammes Problem (presently only N = 3-14 and 24 are solved). Anyhow, our vesicle packing problem is the other way around: we want to know, for a vesicle of a certain size and cargo of a certain size, how many cargo molecules we can fit.

Fortunately, stochastic Tammes Problem solvers are available, like this one, which we could adapt. It turns out that the number of receptors that can be packed is enormous: for a typical clathrin-coated vesicle, almost 800 G protein-coupled receptors (GPCRs) could fit on the surface. Note that this doesn’t take into account steric hindrance and assumes that the vesicle carries nothing else. Full details are in the paper.
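For a rough sanity check on numbers like these, a naive area-ratio estimate gets you into the right ballpark. This is not the stochastic solver used in the paper, and the sizes below are illustrative assumptions rather than measured values:

# Naive packing estimate: usable membrane area / cargo footprint,
# scaled by a packing density. All sizes are illustrative assumptions.
r_vesicle <- 35    # vesicle membrane radius in nm (assumed)
r_cargo   <- 2.5   # radius of one receptor's membrane footprint in nm (assumed)
eta       <- 0.9   # packing density; hexagonal packing of discs is ~0.91
n_max <- eta * (4 * pi * r_vesicle^2) / (pi * r_cargo^2)
round(n_max)   # ~700, the same order of magnitude as the paper's estimate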

Why does this matter? Many labs are developing ways to count molecules in cellular structures by light or electron microscopy. We wanted to have a way to check that our results were physically possible. For example, if we measure 1000 GPCRs in a clathrin-coated vesicle, we know something has gone wrong.

What else? This paper ticked a few things off my publishing bucket list: a solely theoretical paper, a coffee-break idea paper, and one on a “fun” subject. Erick has form with theoretical/fun papers, having previously published on modelling peloton dynamics in procycling.

We figured the paper was more substantial than a blog post yet too minimal to send to a journal. So unless a journal wants to publish it (and gets in touch with us), this will be my first preprint where bioRxiv is the final destination.

We got a sense that people might be interested in an answer to the vesicle packing problem because whenever we asked people for an estimate, we got hugely different answers! The paper has been well-received so far. We’ve had quite a few comments on Twitter and we’re glad that we wrote up the work.

The post title comes from the “All That Noise” LP by The Darkside. I picked this not because of the title, but because of the cover.

All That Noise cover shows a packing problem on a sphere

A Certain Ratio: Gender ratio in our papers

I saw today on Twitter that a few labs were examining the gender balance of their papers and posting the ratios of male:female authors. It started with this tweet.

This analysis is simple to perform, but interpreting it can be hard. For example, is the research group gender balanced to start with? How many of the authors are collaborators? Nonetheless, I have the data for all of my papers, so I thought I’d take a quick look too.

To note: the papers are organised chronologically from top to bottom. They include my papers from before I was a PI (the first eight). Only research papers are listed: no reviews or methods papers, and collaborative papers where I am not the corresponding author are excluded.

Female authors are blue-green, male authors are orange; I am the dark orange colour. The blocks are organised according to the author list. Joint first authors are boxed.

To the right is a graphic showing the gender ratio. The size of the circle indicates the number of authors on the paper: a skewed author list is excusable with 2 or 3 authors, but not with 8. Most of our papers have only a few authors, so it’s not a great metric.
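For anyone who wants to sketch a similar graphic, here is a minimal version in R with made-up numbers (the real data are in the figure):

library(ggplot2)
# toy data: per-paper author counts by gender (made up for illustration)
papers <- data.frame(male = c(2, 4, 1, 5, 3), female = c(2, 1, 3, 3, 2))
papers$n <- papers$male + papers$female
papers$fraction_male <- papers$male / papers$n
# circle size scales with author count, so skew on big papers stands out
ggplot(papers, aes(x = fraction_male, y = rev(seq_len(nrow(papers))), size = n)) +
  geom_point() +
  labs(x = "Fraction of male authors", y = "Paper (oldest at top)", size = "Authors")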

On the whole the balance is pretty good. Men and women are equally likely to be first author, and both are well represented in the author lists. On the other hand, the lab has always had a healthy gender balance, so I would’ve been surprised to find otherwise.

Edit: I replaced the graphic above after a few errors were pointed out to me. Specifically, some authors who were added to three papers during revisions were missing from the list I used. It was also suggested that removing myself from the analysis would be a good idea! This was a good suggestion and the corresponding graphic is below.

The post title is taken from the band A Certain Ratio, a Factory Records staple from the 1980s.

Installing open source PyMol on a Mac

This is a quick “how to” post. There is a licensed version of PyMol (MacPyMol) available, but the open source version can be installed on a Mac free of charge. The official page has a guide, which is not terribly detailed, and I found this excellent guide, which is unfortunately out of date.

Here is an updated guide to installing PyMol using Homebrew on macOS Mojave (10.14.3).

Step 1 is to install Homebrew

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

The next step is to install the PyMol dependencies using Homebrew.

brew install Caskroom/cask/xquartz
brew install tcl-tk
brew install python

Now install PyMol.

brew install brewsci/bio/pymol

You can now run PyMol by typing

pymol

The MPIBR-Bioinformatics page has the following guide to make a little executable app to launch PyMol straight from the Desktop.

  • Open Automator, which is in Applications in macOS.
  • Create new document, select “Application”.
  • Select “Actions” and “Library” in the left pane. Select “Utilities” and “Run Shell Script”. Drag this into the main pane.
  • Choose “/bin/bash” as a shell.
  • Paste the following: /usr/local/bin/pymol -M. If this doesn’t work, check the path to pymol using which pymol in the terminal, and use this instead.
  • Save the application (“File > Save”) to the Desktop and name it “PyMol”.

Now you can double-click this app to run PyMol.

Plotlines: the story behind the graph

On a whim, I posted a plot on Twitter. It shows a marathon training schedule. This post explains the story behind the graph.

I downloaded a few different 17-week marathon training schedules. Most were in imperial units and/or were written as time at a certain pace, e.g. 30 min Easy Run. I wanted to convert one of these schedules into a proper plan where I input my pace and get an idea of the distance I need to run (in metric). This means I can pick routes to run each day without having to think too much about it.
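The conversion itself is simple once the paces are known. A sketch in R, where the paces are placeholders rather than my actual ones:

# distance (km) = session time (min) / pace (min per km)
paces <- c(easy = 5.5, threshold = 4.5, interval = 4.0)   # min/km, placeholder values
# e.g. a "30 min Easy Run" becomes:
dist_km <- 30 / paces[["easy"]]
round(dist_km, 1)   # 5.5 km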

Calculating the running paces was simple using Jack Daniels’ VDOT calculator and verifying the predicted paces against my running database. I constructed a spreadsheet from the plan and then did the calculations to get the distances out. Once this was done, I wondered what the rationale behind the schedule was, and the best way to see that is to plot it out.

No colour-coding

From this plot, the way that the long run on each Sunday is extended and then tapered is reasonably clear. However, I was wondering how intense each of the runs would be. Running 5K at threshold is more intense than a 10K easy run. To look at this, I just took the average pace for each session. This doesn’t quite capture intensity, because a 10 min easy run + 2 intervals + 10 min easy run is not as intense as doing 4 intervals, yet the average pace would be similar. But it is close enough. I used a colourscale called VioletOrangeYellow and the result was quite intuitive.

A menu of pain – marathon training schedule
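To make the limitation of average pace concrete, here is the calculation for a toy session (the paces are assumed); the two short intervals barely shift the average away from easy pace, even though they change the intensity a lot:

# average pace of a session = total time / total distance
seg_min  <- c(10, 3, 3, 10)        # 10 min easy + 2 x 3 min intervals + 10 min easy
seg_pace <- c(5.5, 4.0, 4.0, 5.5)  # assumed paces in min/km
avg_pace <- sum(seg_min) / sum(seg_min / seg_pace)
round(avg_pace, 2)   # ~5.06 min/km, much closer to easy pace than interval pace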

The shorter runs are organised in blocks of intensity while the long runs are about building endurance. From what I understand, the blocks are to do with adapting to the stresses of increasing running load/intensity.

Feedback on the plot was good: runners liked it, non-runners thought it was psychotic.

Small Talk: How big is your lab?

I really dislike being asked “how big is your lab?”. The question usually arises at scientific meetings when you are chatting to someone during a break. Small talk can lead to some banal questions being asked, and that’s fine, but when this question is asked seriously, the person asking really just wants to compare themselves to you in some way. This is one reason why I dislike being asked “how big is your lab?”.

The other reason I don’t like the question is that it can be difficult to answer. I don’t mean that I have so many people in my group that I can’t possibly count them. No, I mean that it can be difficult to give an accurate answer. There’s perhaps a student in the group who is currently writing up, or possibly they’ve handed in their thesis and are awaiting a viva – do they count towards the tally? They are in your lab but they’re not in your lab. Perhaps you jointly supervise someone, or maybe there is someone who is away working in another lab somewhere. I’m guilty of overthinking this, or at least of fretting about giving an incorrect answer. Whatever the circumstance, the size of most research groups is not very stable over time, so I dislike the question because it’s difficult to answer accurately.

I looked at group size recently because the lab had surpassed the milestone of having 50 all-time members and I wanted to see how the group size had varied over time.

Timeline of people in the lab

The first timeline shows the arrival and departure of lab members over time. The role of each person is colour coded as indicated. Note that some people start in one role and get upgraded: PG to PhD, PhD to post-doc (PDRA). So what is the group size over time?

Group size over time

It turns out that we peaked this year with a team size of 12. The smallest size (besides the period when I started out and was on my own!) was at the end of 2012, when I prepared to move the lab to a different university. What has the make-up of the lab been during this time?

Constitution of the lab over time

In this last plot, the fraction of the team that are PhD students, post-docs etc. is shown over time. This plot is interesting because I can see that it was two years before a PhD student joined the group, and how the lab has become post-doc-heavy in the last 18 months.

So what is the answer to “how big is your lab?”. Well, take your pick. Right now it is 11, with someone having just joined this week. Over the last year it has averaged just over 10. Over the last five years it has been 8 to 9. It’s still not an easy question to answer, even when you can see all the data.

Methods: I have been trying to use R for this type of post, so that sharing the code is more useful, but I drew a blank with this one. I found several tools to plot the first timeline (timevis and vistime). For the integration and breakdown plots, I struggled… I knew exactly how to make those plots in Igor, so that’s what I did. All that was required was a list of the people, their roles, their start and end dates, and a few lines of code. I keep a record of this as previously mentioned.
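For anyone who wants to stay in R, here is a minimal sketch of the group-size plot. The roster below is made up; the real input is the list of people with their start and end dates:

library(ggplot2)
# toy roster: one row per lab member, with start and end dates (made up)
roster <- data.frame(
  start = as.Date(c("2011-03-01", "2011-10-01", "2013-01-15", "2013-09-01")),
  end   = as.Date(c("2014-02-28", "2016-08-31", "2013-12-20", "2017-07-31"))
)
# headcount on each day = number of people whose tenure spans that day
days <- seq(min(roster$start), max(roster$end), by = "day")
headcount <- sapply(seq_along(days), function(i) sum(roster$start <= days[i] & roster$end >= days[i]))
ggplot(data.frame(days, headcount), aes(x = days, y = headcount)) +
  geom_step() +
  labs(x = NULL, y = "Group size")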

The post title comes from “Small Talk” a track by American Culture on a Split 7″ with Boyracer on Emotional Response Records.

Super Automatic: computer-based tools for research

Since I have now written several posts on this topic, I thought I would summarise the computer-based tools that we are using in the lab to automate our work and organise ourselves.

Electronic lab notebook – there are previous posts from me on picking an ELN platform and how to set up a WordPress ELN as well as Dave Mason’s guide and CAMDU’s guide to help you to get this set up.

Slack – it took us ages to set up our lab Slack. I wanted to try it out a few years ago but some people in the lab were reluctant (see below for some notes on getting people on board with new tech). I was also sceptical, since we are a wet lab, and I wondered whether Slack would work as well for us as it does for bioinformaticians, for example. We just went for it one day and have not regretted it, as it has improved lab communication in so many ways. There is a good guide available on the site for setting up something that works for a lab.

Trello – organising projects and assigning to-do items (see this post). Since we switched our communication to Slack, we use Trello a lot less. It is still very good for checking progress on defined tasks, for brainstorming during a meeting, and to make sure stuff doesn’t get forgotten about. Besides our lab boards, I own other boards for work outside of my lab and these also work well, in some cases better than our lab boards.

Databases – infrastructure in the lab is centred on databases for key reagents, and these can be cross-referenced in electronic lab notebooks. Keeping them updated is essential. We use FileMaker Pro to maintain them. Setup took some time but was very much worth it.

OMERO – The lab is lucky to have excellent computing support from CAMDU who set up our OMERO database and server. Images from microscopes are piped direct to the database on a per user basis. I’m a big fan.

Dropbox – an account is essential in academia. I have a shared folder with each person in the lab and a folder that all members share. File exchange is best done via our lab server, but Dropbox is still incredibly useful.

Overleaf – writing a paper collaboratively in LaTeX with Overleaf is a joy (see this post). There are Google-based tools for doing this but I am not keen on using them. Dropbox now has a collaborative writing function that my colleagues are using and enjoying.

Zotero – when writing collaboratively in Overleaf, a shared reference management system is essential. Zotero is a healthy alternative to Mendeley and it is now supported in Overleaf v2. I don’t store my PDFs in Zotero, and I have found no iTunes-for-PDF-files solution.

Calendars – while I’m not a fan of Google-based tools, we use the calendar functions for our lab. We inherited this from the Centre where we work, where these calendars are used for booking equipment. We have lab calendars for microscopes, equipment, workstations and for general lab stuff, so we know when people are away. I set up a dummy account that belongs to all the lab calendars and linked it to our lab Slack, so that we get a digest of the day’s bookings every morning and notifications if new events are added. The University has an Outlook-based calendar system, the use of which is patchy amongst academics. However, the admin people use it, so I have blocked out times in there when I’m busy to reduce diary conflicts.

Filter, filter, filter – I set up many filters on my email, twitter… wherever I can… to keep out spam and irrelevant stuff.

Automating the little stuff – a previous post on being organised as a PI mentioned that I advocate writing scripts and macros to automate little things that you do often. Of course, there is an xkcd cartoon for this. I have scripts set up to do things like assembling PDFs or converting and compressing file formats. We’ve been automating the big stuff too. Figures are a good example: we write scripts to produce (and reproduce) figure panels.
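As a flavour of these little scripts (mine are not shown here), a sketch in R using the qpdf package; the folder and file names are assumptions:

library(qpdf)
# combine every PDF in a folder into a single file (names assumed)
pdfs <- list.files("reports", pattern = "\\.pdf$", full.names = TRUE)
pdf_combine(input = pdfs, output = "combined.pdf")
# and compress the result
pdf_compress("combined.pdf", output = "combined_small.pdf")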

Sharing code – a while back I set up an update site to distribute our ImageJ macros within the lab; people from outside can also subscribe and get the latest updates easily. Our Igor code is shared within the lab via a cloud-based updater which allows code to be compiled on demand. Lab code is maintained via git at GitHub, and our centre forks repos from published projects to its own account.

Onboarding – none of these tools work unless everyone is on board, so it is worth having a strategy to make this happen: introduce one system at a time, provide initial training, and give some support to people who are slow on the uptake. There’s a group effect to onboarding, but getting to the tipping point can be hard.

The title for this post “Super Automatic” comes from the album of that name by Myracle Brah (the name of this band is not endorsed by quantixed).

For What It’s Worth: Influence of our papers on our papers

This post is about a citation analysis that didn’t quite work out.

I liked this blackboard project by Manuel Théry looking at the influence of each paper authored by David Pellman’s lab on the future directions of the Pellman lab.

It reminded me that some papers can have impact in the field while others are mainly influential to the group itself. I wondered which of the papers I have authored have been most influential on my other papers, and whether this correlates with a measure of their impact on the field.

There’s no code in this post. I retrieved the relevant records from Scopus and used the difference between the citation counts “with” and “without” self-citations to pull together the numbers.

Influence: I used the number of citations to a paper from any of our papers as the self-citation count. This was divided by the total number of future papers. So if I have 50 papers, and the 23rd paper published has collected 27 self-citations, it scores 1 (neither the 23rd paper nor any of the preceding 22 papers can cite it, but the 27 that follow can). This is our metric for influence.
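In code, the worked example from above looks like this (toy numbers, not the real dataset):

# influence score = self-citations to a paper / number of papers published after it
n_papers   <- 50   # total papers
i          <- 23   # chronological position of the paper
self_cites <- 27   # self-citations collected by paper 23
influence  <- self_cites / (n_papers - i)   # 27 / 27 = 1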

Impact: As a measure of general impact I took the total number of citations for each paper and divided this by the number of years since publication to get average cites per year for each paper.

Plot of influence against impact

Reviews and methods papers are shown in blue, while research papers are in red. I was surprised that some papers have been cited by as many as half of the papers that followed them.

Generally, the articles that were most influential to us were also the papers with the biggest impact, although the correlation is not very strong. There is an obvious outlier paper that gets 30 cites per year (over a 12-year period, I should say) but has not influenced our work as much as other papers have. This is partly because the paper is a citation magnet and partly because we’ve stopped working on this topic in the last few years.

Obviously, the most recent papers were the least informative. There are no future papers to test whether they were influential, and there are few citations so far from which to gauge their impact.

It’s difficult to say what the correlation between impact and influence on our own work really means, if anything. Does it mean that we have tended to pursue projects because of their impact (I would hope not)? Perhaps these papers are generally useful to the field and to us.

In summary, I don’t think this analysis was successful. I had wanted to construct some citation networks – similar to the Pellman tree idea above – to look at influence in more detail, but I lost confidence in the method. Many of our self-citations are for methodological reasons and so I’m not sure if we’re measuring influence or utility here. Either way, the dataset is not big enough (yet) to do more meaningful number crunching. Having said this, the approach I’ve described here will work for any scholar and could be done at scale.

There are several song titles in the database called ‘For What It’s Worth’. This one is Chapterhouse on Rownderbout.


Experiment Zero: Using a Raspberry Pi Zero camera

This is the first post at quantixed about Raspberry Pi computing.

Pi Zero is a minimalist Raspberry Pi that can be coupled to a camera module. With this little rig, you can make time-lapse footage, amongst other things. I’ve set up a couple of these now. One was used to make a time-lapse movie of some plants growing through a plastic maze. The results were pretty good, so I thought I’d upload the video and write a brief how-to guide.

After a delay, you can see four beans sprouting, and eventually one makes it to the top of the maze. This footage was shot over 27 days. The Pi took pictures every 5 min, but I sampled every 10 min to make the movie (after discarding the pictures taken after the sun went down). Everything was automated.

The camera shoots at 3280 × 2464 pixels. I downsampled the images to make the video. The camera didn’t focus well on the maze, which was a bit too close. Other units are shooting scenery and the focus on those is great.

How I did it

Pi Zero

A Pi Zero with a camera module (without IR filter) and a case is available for around £40. I bought mine from the Pi Hut. Power supplies and SD cards are readily available. I put together the PiCam with a fresh full Raspbian image on a 16GB SD card. Another option is to use a smaller card and have the Pi save the images to a server.

I used PiBaker to format the SD card, load Raspbian, and add a startup script that would connect the Pi Zero to WiFi and enable VNC. That meant I could plug it in and start using it headless. Well, in theory! It turns out that VNC via Mac does not work with the UNIX-style password which is the default on the Pi. I had to connect the Pi to a monitor to rectify this, switching to a VNC password in the VNC GUI. After this I could log in and use the Pi Zero remotely.

A few more minor steps were needed for full functionality:

  1. I enabled ssh and the camera port in Raspberry Pi Configuration, disabled Bluetooth and set the correct timezone (this can probably be done in PiBaker but I forgot).
  2. Since I have several Raspberry Pis on the LAN, I needed to give this one its own identity to prevent network conflicts.
  3. I needed to set up SMB sharing on the new Pi.

Instructions for how to do these things are just one google search away.

Now the Pi was ready to start taking images. I built a little stand for it out of Lego and set up the plant maze.

Taking pictures with the Pi

I wrote a shell script to take pictures using raspistill.

I made a directory called camera in /home/pi:

mkdir camera

Then I made a camera.sh file in /home/pi that looked like this:

#!/bin/bash
# build a timestamped filename, e.g. 2019-03-04_0905.jpg
DATE=$(date +"%Y-%m-%d_%H%M")
raspistill -o /home/pi/camera/$DATE.jpg

Then I made it executable

chmod +x camera.sh

Using cron, I execute the shell script on a schedule. I wanted to take pictures every 5 minutes, so I added this line to my crontab (edit it with crontab -e). You can consult crontab.guru for your needs.

*/5 * * * * /home/pi/camera.sh 2>&1

That’s it! The Pi Zero will happily take pictures until you tell it to stop. Or until there’s a… crash.

Dealing with crashes

If you are going to do long-term time-lapse imaging, you need to defend against a crash that will prevent images from being acquired. In the worst case, the Pi could go offline and you wouldn’t know until you checked up on it. The first one I set up crashed quite often and I couldn’t immediately determine the cause. So I did the next best thing.

I set up a watchdog to monitor for crashes and then reboot the Pi if/when it happens. Many guides online suggest bcm2708_wdog but this doesn’t work for a Pi Zero. Instead this worked for me:

sudo modprobe bcm2835_wdt
sudo nano /etc/modules

Then add the line “bcm2835_wdt” and save the file.

Next, I installed the watchdog daemon:

sudo apt-get install watchdog chkconfig
chkconfig watchdog on
sudo /etc/init.d/watchdog start
sudo nano /etc/watchdog.conf

I uncommented two lines from this file:

  1. watchdog-device = /dev/watchdog
  2. the line that had max-load-1 in it

Save the watchdog.conf file.

There are guides online that describe how to set up the Pi so that it sends you an email or SMS when there’s a crash/reboot. I figured I didn’t need this – as long as it reboots OK.

What now?

Well, you wait for it to take photos! You can log in via VNC and check that the images are being acquired, or go in via ssh and watch the camera directory fill up. The images are 3280 × 2464 pixels and around 4.5 MB each; at one picture every 5 min, that is 288 pictures (roughly 1.3 GB) per day, so the disk can quickly fill.

After a while you’ll want to assemble a movie. I wrote a shell script on my Mac to pull down the images, take a copy of the ones I want, make a movie file, and upload it to Dropbox so I could look at it on the go.

#!/usr/bin/env bash
# move to the location of the images
cd /local/disk/folder2/
# pull down all images to a local folder - only new images are copied
rsync -trv /Volumes/HOMEPI/camera/ /local/disk/folder/
# overnight images are dark and less than 1.5 MB
# copy the ones we want to keep
rsync -trv --min-size=1000K /local/disk/folder/ /local/disk/folder2/
# or you could filter on size like this - delete <2MB
find . -name "*.jpg" -size -2000k -delete
# scale the images down to 480 px wide and make movie
ffmpeg -framerate 30 -pattern_type glob -i '*.jpg' -c:v libx264 -pix_fmt yuv420p -vf scale=480:-2 out.mp4
# move to dropbox
mv out.mp4 /My/Dropbox/Folder/out.mp4

This script means that I have to delete the pictures from the Pi manually once they’ve been copied, but that is OK. My plan is to write a script to automate this for the longer-running projects.

While it is possible to make the movies on the Pi itself, I did it on the Mac because that computer is beefier and not busy taking pictures every 5 min! ffmpeg is a great tool for this and the documentation is impressive. For example, if you have set up the camera in the wrong orientation, you can do the transposition in ffmpeg. If you don’t have ffmpeg, it is a simple install from the command line:

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" < /dev/null 2> /dev/null
brew install ffmpeg

Hopefully this guide is useful to you. The Pi Zero Camera can be used for streaming video as well as taking a series of still images. I’m planning to test this out soon.

The post title “Experiment Zero” comes from the title of the album by Man or Astro-Man?

Til I Die: Seeking new music

I’ve been following the tweets from an account called Albums You Must Hear (@Albums2Hear). Each tweet is an album recommended by the account owner. I’m a sucker for lists of Albums That I Must Hear Before I Die, since I’m always interested in new (or not-so-new) music recommendations.

I wanted to assemble a list of the albums that I don’t have from this account and I was able to do so using R.

Using rtweet, it was possible to pull down all the albums and reorganise them into a csv containing each album with its artist and year. I could then compare this with a list of albums from my iTunes library. A snippet of the retrieved records is shown here (full list is here).

The code for retrieval is below. The output csv can be used for comparison with a list of your own records.


library(rtweet)
library(httpuv)
library(stringr)
# pull the account's timeline and keep the tweet text
all_tweets <- get_timeline("Albums2Hear", n = 1500)
albums <- all_tweets$text
albums <- gsub("#albumsyoumusthear ", "", albums)
# each tweet is "Artist - Album - Year URL"
tempdf <- as.data.frame(str_split_fixed(albums, " - ", 3))
colnames(tempdf) <- c("Artist", "Album", "YearURL")
# separate the year from the trailing URL
tempdf2 <- as.data.frame(str_split_fixed(tempdf$YearURL, " ", 2))
colnames(tempdf2) <- c("Year", "URL")
df <- data.frame(tempdf$Artist, tempdf$Album, tempdf2$Year)
colnames(df) <- c("Artist", "Album", "Year")
write.csv(df, file = "albums2hear.csv")
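To do the comparison with your own records, something like this works, assuming your library can be exported to a csv with the same Artist and Album columns (the file name here is hypothetical):

library(dplyr)
recommended <- read.csv("albums2hear.csv", stringsAsFactors = FALSE)
mine <- read.csv("my_itunes_albums.csv", stringsAsFactors = FALSE)   # hypothetical export
# albums recommended by the account that are missing from my library
missing <- anti_join(recommended, mine, by = c("Artist", "Album"))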

Thanks to whoever runs the account – they ask for support here.

The post title comes from ‘Til I Die by The Beach Boys from Sunflower/Surf’s Up.

Tips from the blog XI: docx to pdf

A long time ago I posted a little Automator routine to convert Word doc/docx files to PDF. Not long after that, the routine ceased to work due to changes in Microsoft Word (I think). It is still very useful to convert a whole folder of docx files to PDF in order to avoid Word and just use Preview on the Mac. For committee work or for marking students’ work, I often have a whole folder of docx files that I would prefer to have in PDF format. I found this very nifty trick on the web and thought I’d post a link here to make up for the fact that my old post no longer works.

The full post is here. What is so nice about this Automator solution is that it uses a bash script to do the conversion, which means you don’t need Microsoft Word for it to work! From what I can see, it uses the xml in the docx file for the conversion (and so presumably won’t work on older *.doc files). The post describes how to run it as a Service in macOS. Note that it destroys the docx files, so it should only be used on a copy. It can also be run from the command line rather than by right-clicking; the engine is this little script.

Thanks to Jacob Salmela for posting it.

This post is part of a series of short tips.