On a whim I posted a plot on Twitter. It shows a marathon training schedule. This post explains the story behind the graph.
I downloaded a few different 17-week marathon training schedules. Most were in imperial units and/or were written as time at a certain pace, e.g. 30 min Easy Run. I wanted to convert one of these schedules into a proper plan where I input my pace and get an idea of the distance I need to run (in metric). This means I can pick routes to run each day without having to think too much about it.
Calculating the running paces was simple using Jack Daniels’ VDOT calculator and verifying the predicted paces against my running database. I constructed a spreadsheet from the plan, and then did the calculations to get the distances out. Once this was done I wondered what the rationale behind the schedule was, and the best way to see that is to plot it out.
From this plot, the way that the long run each Sunday is extended or tapered is reasonably clear. However, I wondered how intense each of the runs would be. Running 5K at threshold is more intense than a 10K easy run. To look at this, I simply took the average pace for the session. This doesn’t quite capture intensity: a 10 min easy run + 2 intervals + 10 min easy run is not as intense as 4 intervals, yet the average pace would be similar. Still, it is close enough. I used a colour scale called VioletOrangeYellow and the result was quite intuitive.
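Converting a time-based session into a distance, and computing the session-average pace used for the colour scale, takes only a few lines of shell and awk. This is a minimal sketch, not the spreadsheet used for the plot: the `session_stats` name and the `MINUTES@PACE` segment format (pace in min:sec per km) are assumptions for illustration. Distance is time divided by pace, and the average pace is total time divided by total distance, which weights each segment's pace by the distance covered at that pace.

```shell
# session_stats: distance and average pace for a time-based session.
# Each argument is one segment, MINUTES@PACE, with PACE in min:sec per km,
# e.g. 30@5:40 means 30 min at 5:40 /km (hypothetical input format).
session_stats() {
  printf '%s\n' "$@" | awk -F'[@:]' '
    {
      minutes = $1
      pace = $2 + $3 / 60      # pace in decimal minutes per km
      km += minutes / pace     # distance = time / pace
      total += minutes         # total running time in minutes
    }
    END {
      # average pace = total time / total distance
      # (a distance-weighted mean of the segment paces)
      avg = total / km
      m = int(avg)
      s = int((avg - m) * 60 + 0.5)
      if (s == 60) { m++; s = 0 }   # carry if seconds round up to 60
      printf "%.2f km, avg %d:%02d /km\n", km, m, s
    }'
}
```

For example, `session_stats 10@6:00 10@4:00` (10 min at 6:00/km, then 10 min at 4:00/km) reports 4.17 km at an average of 4:48 /km, whereas a naive mean of the two paces would give 5:00 /km.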
The shorter runs are organised in blocks of intensity while the long runs are about building endurance. From what I understand, the blocks are to do with adapting to the stresses of increasing running load/intensity.
Feedback on the plot was good: runners liked it, non-runners thought it was psychotic.
I really dislike being asked “how big is your lab?”. The question usually arises at scientific meetings when you are chatting to someone during a break. Small talk can lead to some banal questions being asked, and that’s fine, but when this question is asked seriously, the person asking really just wants to compare themselves to you in some way. This is one reason why I dislike being asked “how big is your lab?”.
The other reason I don’t like the question is that it can be difficult to answer. I don’t mean that I have so many people in my group that I can’t possibly count them. No, I mean that it can be difficult to give an accurate answer. There’s perhaps a student in the group who is currently writing up, or possibly they’ve handed in their thesis and are awaiting a viva – do they count towards the tally? They are in your lab but they’re not in your lab. Perhaps you jointly supervise someone, or maybe there is someone who is away working in another lab somewhere. I’m guilty of overthinking this, or at least of fretting about giving an incorrect answer. Whatever the circumstance, I think that the size of most research groups is not very stable over time, so I dislike the question because it’s difficult to answer accurately.
I looked at group size recently because the lab had surpassed the milestone of having 50 all-time members and I wanted to see how the group size had varied over time.
The first timeline shows the arrival and departure of lab members over time. The role of each person is colour coded as indicated. Note that some people start in one role and get upgraded: PG to PhD, or PhD to post-doc (PDRA). So what is the group size over time?
It turns out that we peaked this year with a team size of 12. The smallest size (besides the period when I started out, when I was on my own!) was at the end of 2012, when I was preparing to move the lab to a different university. What has the make-up of the lab been during this time?
In this last plot the fraction of the team that are PhD, post-doc etc. is shown over time. This plot is interesting because I can see that it was two years before a PhD student joined the group and also how the lab has become post-doc-heavy in the last 18 months.
So what is the answer to “how big is your lab?” Well, take your pick. Right now it is 11, with someone who joined just this week. Over the last year it has averaged just over 10. Over the last five years it has been 8 to 9. It’s still not an easy question to answer, even when you can see all the data.
Methods: I have been trying to use R for these types of posts, so that sharing the code is more useful, but I drew a blank with this one. I found several tools to plot the first timeline (timevis and vistime), but I struggled to make the integration and breakdown plots. I knew exactly how to make those plots in Igor, so that’s what I did. All that was required was a list of the people, their roles, their start and end dates, and a few lines of code. I keep a record of this, as previously mentioned.
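The plots here were made in Igor. As a hedged sketch of the counting step only, here is one way to get group size and role breakdown on a given date in shell, assuming a hypothetical members.csv with a header row and columns name,role,start,end. Dates are assumed to be ISO 8601 so that plain string comparison orders them correctly, and someone who is upgraded gets one row per role.

```shell
# Group size from a hypothetical members.csv with a header row:
#   name,role,start,end
# Dates are ISO 8601 (YYYY-MM-DD) so they compare correctly as strings.
# An empty end date means the person is still in the lab. Someone who is
# upgraded (e.g. PhD to PDRA) has one row per role.

# lab_size DATE FILE: number of people in the lab on DATE
lab_size() {
  awk -F, -v d="$1" '
    NR > 1 && $3 <= d && ($4 == "" || d <= $4) { n++ }
    END { print n + 0 }' "$2"
}

# lab_roles DATE FILE: head count per role on DATE, sorted by role name
lab_roles() {
  awk -F, -v d="$1" '
    NR > 1 && $3 <= d && ($4 == "" || d <= $4) { count[$2]++ }
    END { for (r in count) print r, count[r] }' "$2" | LC_ALL=C sort
}
```

Calling `lab_size` for, say, the first day of each month and averaging the results gives a time-averaged group size; `lab_roles` gives the per-role breakdown for a single date, from which fractions follow by dividing by `lab_size`.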
Slack – it took us ages to set up our lab Slack. I wanted to try it out a few years ago but some people in the lab were reluctant (see below for some notes on getting people on board with new tech). I was also sceptical since we are a wet lab and I wondered whether Slack would work as well for us as it does for, say, bioinformaticians. We just went for it one day and have not regretted it: it has improved lab communication in so many ways. There is a good guide available on the site for setting up something that works for a lab.
Trello – organising projects and assigning to-do items (see this post). Since we switched our communication to Slack, we use Trello a lot less. It is still very good for checking progress on defined tasks, for brainstorming during a meeting, and to make sure stuff doesn’t get forgotten about. Besides our lab boards, I own other boards for work outside of my lab and these also work well, in some cases better than our lab boards.
Databases – infrastructure in the lab is centred on databases for key reagents and these can be cross-referenced in electronic lab notebooks. Keeping them updated is essential. We use FileMaker Pro to maintain them. Setup took some time, but it was very much worth it.
OMERO – The lab is lucky to have excellent computing support from CAMDU who set up our OMERO database and server. Images from microscopes are piped directly to the database on a per-user basis. I’m a big fan.
Dropbox – an account is essential in academia. I have a shared folder with each person in the lab and a folder that all members share. File exchange is best done via our lab server, but Dropbox is still incredibly useful.
Overleaf – writing a paper collaboratively in LaTeX with Overleaf is a joy (see this post). There are Google-based tools for doing this but I am not keen on using them. Dropbox now has a collaborative writing function that my colleagues are using and enjoying.
Zotero – when writing collaboratively in Overleaf, a shared reference management system is essential. Zotero is a healthy alternative to Mendeley and it is now supported in Overleaf v2. I don’t store my PDFs in Zotero, and I have not found a solution for an “iTunes for PDF files”.
Calendars – while I’m not a fan of Google-based tools, we use the calendar functions for our lab. We inherited this from the Centre where we work, where these calendars are used for booking equipment. We have lab calendars for microscopes, equipment, workstations and for general lab stuff so we know when people are away. I set up a dummy account that belongs to all the lab calendars and this is linked to our lab Slack so that we get a digest of the day’s bookings every morning, and notifications if new events are added. The University has an Outlook-based calendar system, the use of which is patchy amongst academics. However, the admin staff use it, so I have blocked out times in it when I’m busy to reduce diary conflicts.
Filter, filter, filter – I set up many filters on my email, Twitter… wherever I can… to keep out spam and irrelevant stuff.
Automating the little stuff – a previous post on being organised as a PI mentioned that I advocate writing scripts and macros to automate little things that you do often. Of course there is an xkcd cartoon for this. I have scripts set up to do things like assembling PDFs or converting or compressing file formats. We’ve also been automating the big stuff. Figures are a good example. We write scripts to produce (and reproduce) figure panels.
Sharing code – A while back I set up an update site to distribute our ImageJ macros among the lab, but also people from outside can subscribe and get the latest updates easily. Our Igor code is shared within the lab via a cloud-based updater which allows code to be compiled on demand. Lab code is maintained via git at GitHub and our centre forks repos from published projects to its own account.
Onboarding – none of these tools work unless everyone is on board. It is worth having a strategy to make this happen. Simple steps help, such as introducing one system at a time, providing initial training and offering some support to people who are slow to take things up. There’s a group effect to onboarding, but getting to the tipping point can be hard.
The title for this post “Super Automatic” comes from the album of that name by Myracle Brah (the name of this band is not endorsed by quantixed).
This post is about a citation analysis that didn’t quite work out.
I liked this blackboard project by Manuel Théry looking at the influence of each paper authored by David Pellman’s lab on the future directions of the Pellman lab.
I'm working on a new project for @criparis students: "The Tree of a Lab" = how thematics have evolved in a lab history. Trying to understand what makes a new branch, or end one. Here is Pellman lab tree. Red = > 200 citations (NCS papers). But branches started from JCB papers ! pic.twitter.com/CPaq3w3owG
It reminds me that papers can have impact in the field while others might be influential to the group itself. I wondered which of the papers on which I’m an author have been most influential to my other papers and whether this correlates with a measure of their impact on the field.
There’s no code in this post. I retrieved the relevant records from Scopus and used the difference between citation counts with and without self-citations to pull together the numbers.
Influence: I used the number of citations to a paper from any of our papers as its self-citation count. This was divided by the total number of papers published afterwards. This means that if I have 50 papers, and the 23rd paper to be published has collected 27 self-citations, it has a score of 1 (neither the 23rd paper nor any of the preceding 22 papers can cite the 23rd paper, but the 27 that follow could). This is our metric for influence.
Impact: As a measure of general impact I took the total number of citations for each paper and divided this by the number of years since publication to get average cites per year for each paper.
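Both metrics are simple ratios, so they can be sketched in a few lines of shell and awk. This is an illustration of the definitions above, not code used for the post (which had none): the `citation_scores` name and the input layout (one paper per line, in publication order, with publication year, total citations and self-citations) are assumptions. Treating a paper published in the current year as having age 0, and so no defined impact, is also an assumption.

```shell
# citation_scores YEAR FILE
# FILE (hypothetical layout): one paper per line, in publication order,
#   pub_year total_citations self_citations
# Influence = self-citations / number of our papers published afterwards
# Impact    = total citations / years since publication (relative to YEAR)
# NA is printed where a score is undefined (no later papers, or age 0).
citation_scores() {
  awk -v now="$1" '
    { year[NR] = $1; total[NR] = $2; self[NR] = $3 }
    END {
      for (i = 1; i <= NR; i++) {
        later = NR - i                  # papers that could cite paper i
        infl = (later > 0) ? sprintf("%.2f", self[i] / later) : "NA"
        age = now - year[i]
        imp = (age > 0) ? sprintf("%.1f", total[i] / age) : "NA"
        print i, infl, imp              # rank, influence, impact
      }
    }' "$2"
}
```

With 50 papers where the 23rd has 27 self-citations, its influence comes out as 27 / 27 = 1.00, matching the worked example in the text; the most recent paper always gets NA for influence, which is the problem noted below.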
Reviews and methods papers are shown in blue, while research papers are in red. I was surprised that some papers have been cited by as many as half of the papers that followed.
Generally, the articles that were most influential to us were also the papers with the biggest impact, although the correlation is not very strong. There is an obvious outlier that gets 30 cites per year (over a 12-year period, I should say), but this paper has not influenced our work as much as other papers have. This is partly because the paper is a citation magnet and partly because we’ve stopped working on this topic in the last few years.
Obviously, the most recent papers were the least informative. There are no future papers to test if they were influential and there are few citations so far to understand their impact.
It’s difficult to say what the correlation between impact and influence on our own work really means, if anything. Does it mean that we have tended to pursue projects because of their impact (I would hope not)? Perhaps these papers are generally useful to the field and to us.
In summary, I don’t think this analysis was successful. I had wanted to construct some citation networks – similar to the Pellman tree idea above – to look at influence in more detail, but I lost confidence in the method. Many of our self-citations are for methodological reasons and so I’m not sure if we’re measuring influence or utility here. Either way, the dataset is not big enough (yet) to do more meaningful number crunching. Having said this, the approach I’ve described here will work for any scholar and could be done at scale.
There are several song titles in the database called ‘For What It’s Worth’. This one is Chapterhouse on Rownderbout.
This is the first post at quantixed about Raspberry Pi computing.
Pi Zero is a minimalist Raspberry Pi that can be coupled to a camera. With this little rig, you can make time-lapse footage amongst other things. I’ve set up a couple of these now. One was to make a time-lapse movie of some plants growing through a plastic maze. The results were pretty good and I thought I’d upload the video and a brief how-to guide.
After a delay, you can see four beans sprouting and then one eventually makes it to the top of the maze. This footage was shot over 27 days. The Pi took pictures every 5 min, but I sampled at 10 min intervals to make the movie (after discarding the pictures taken after the sun went down). Everything was automated.
The camera shoots at 3280 × 2464. I downsampled the images to make the video. The camera didn’t focus well on the maze, which was a bit too close. Other units are shooting scenery, and the camera’s fixed-focus lens handles that well.
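Sampling at 10 min from pictures taken every 5 min simply means keeping every second frame. A minimal sketch in shell, assuming frames are JPEGs whose names sort chronologically (for example timestamped names); `subsample_frames` is a hypothetical helper, not part of the original workflow.

```shell
# subsample_frames SRC DEST [STEP]: copy every STEP-th frame (default 2).
# With captures every 5 min, STEP=2 gives 10-min sampling.
# Assumes names sort chronologically (e.g. 2018-05-01_0905.jpg).
subsample_frames() {
  local src=$1 dest=$2 step=${3:-2}
  local i=0 f
  mkdir -p "$dest"
  for f in "$src"/*.jpg; do        # the glob expands in sorted order
    [ -e "$f" ] || continue        # no matches: the pattern stays literal
    if (( i % step == 0 )); then
      cp -p "$f" "$dest"/          # -p keeps the original timestamps
    fi
    i=$((i + 1))
  done
}
```

Night-time frames still need removing separately, for example by file size as in the movie script further down.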
How I did it
Pi Zero with camera module (without IR filter) and a case are available for around £40. I bought mine from the Pi Hut. Power supplies and SD cards are readily available. I put together the PiCam with a fresh Raspbian full image on a 16GB SD card. Another option is to use a smaller card and get the Pi to save the images to a server.
I used PiBakery to format the SD card, load Raspbian and add a startup script that would connect the Pi Zero to WiFi and enable VNC. That meant I could plug it in and start using it headless. Well, in theory! It turns out that connecting from a Mac via VNC does not work with UNIX password authentication, which is the default on the Pi. I needed to connect a monitor to rectify this by switching to VNC password authentication in the VNC GUI. After this I could log in and use the Pi Zero remotely.
A few more minor steps were needed for full functionality:
I enabled SSH and the camera interface in Raspberry Pi Configuration, disabled Bluetooth and set the correct timezone (this can probably be done in PiBakery but I forgot).
Since I have several Raspberry Pis on the LAN, I needed to give this one its own hostname to prevent network conflicts.
I needed to set up SMB sharing on the new Pi.
Instructions for how to do these things are just one google search away.
Now the Pi was ready to start taking images. I built a little stand for it out of Lego and set up the plant maze.
Taking pictures with the Pi
I wrote a shell script to take pictures using raspistill.
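I made a directory called camera in /home/pi.
Then I made a camera.sh file in /home/pi that looked like this:
Using cron, I execute the shell script on a schedule (the script needs to be executable, e.g. chmod +x /home/pi/camera.sh, and the schedule is added with crontab -e). I wanted to take pictures every 5 minutes. You can consult crontab guru to work out the schedule you need.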
*/5 * * * * /home/pi/camera.sh 2>&1
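The crontab entry above calls camera.sh. As a hedged sketch of what a minimal version of such a script could contain (an assumption, not a reproduction of the script used here): it takes one still per run with raspistill and names it with a timestamp so the frames sort chronologically. The `OUTDIR` override and the helper names are hypothetical.

```shell
#!/bin/bash
# camera.sh: a minimal sketch that takes one still per run;
# cron provides the schedule.

# Output directory made earlier in /home/pi (OUTDIR override is hypothetical)
OUTDIR="${OUTDIR:-/home/pi/camera}"

# Timestamped name, e.g. 2018-05-01_0905.jpg, so that a '*.jpg' glob
# later lists the frames in chronological order
frame_name() {
  date +"%Y-%m-%d_%H%M.jpg"
}

capture() {
  # -n: no preview window; -o: output file. Without -w/-h, raspistill
  # captures at the sensor's full resolution (3280 x 2464 here).
  raspistill -n -o "$OUTDIR/$(frame_name)"
}

# Capture only when executed directly, not when the file is sourced
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  capture || echo "camera.sh: capture failed" >&2
fi
```

Because the cron line sends stderr to the same place as stdout (2>&1), the failure message ends up in whatever cron does with the job's output.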
That’s it! The Pi Zero will happily take pictures until you tell it to stop. Or until there’s a… crash.
Dealing with crashes
If you are going to do long-term time-lapse imaging, you need to defend against a crash that will prevent images from being acquired. In the worst case, the Pi could go offline and you wouldn’t know until you checked up on it. The first one I set up crashed quite often. I couldn’t determine the cause immediately. So I did the next best thing.
I set up a watchdog to monitor for crashes and then reboot the Pi if/when it happens. Many guides online suggest bcm2708_wdog but this doesn’t work for a Pi Zero. Instead this worked for me:
There are guides online that describe how to set up the Pi so that it sends you an email or SMS when there’s a crash/reboot. I figured I didn’t need this – as long as it reboots OK.
Now you wait for it to take photos! You can log in via VNC and check that the images are being acquired, or go in via ssh and watch the camera directory fill up. The images are 3280 × 2464 and around 4.5 MB each, so the SD card can fill quickly.
After a while you’ll want to assemble a movie. I wrote a shell script on my Mac to pull down the images, take a copy of the ones I want, make a movie file and move it to Dropbox so I could look at it on the go.
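How quickly is “quickly”? A back-of-envelope sketch, assuming the free space left on the card after Raspbian is known (it is an input here, not something the post states) and taking 1 GB as 1000 MB; `days_until_full` is a hypothetical helper.

```shell
# days_until_full FREE_GB IMAGE_MB INTERVAL_MIN
# Rough storage budget for the time-lapse (1 GB taken as 1000 MB).
days_until_full() {
  awk -v free_gb="$1" -v img_mb="$2" -v every="$3" 'BEGIN {
    per_day = 24 * 60 / every          # images per day
    mb_per_day = per_day * img_mb      # storage used per day
    printf "%d images/day, %.0f MB/day, %.1f days\n", \
      per_day, mb_per_day, free_gb * 1000 / mb_per_day
  }'
}
```

With, say, 10 GB free, `days_until_full 10 4.5 5` reports 288 images/day, 1296 MB/day and 7.7 days. Dark overnight frames are smaller than 4.5 MB, so in practice the card lasts somewhat longer.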
# pull down all images from the Pi to a local folder - only new images are copied
rsync -trv /Volumes/HOMEPI/camera/ /local/disk/folder/
# overnight images are dark and less than 1.5 MB
# copy the ones we want to keep (larger than 1000K)
rsync -trv --min-size=1000K /local/disk/folder/ /local/disk/folder2/
# move to the location of the images we kept (find and ffmpeg work on the current directory)
cd /local/disk/folder2/ || exit 1
# or you could filter on size like this - delete <2MB
find . -name "*.jpg" -size -2000k -delete
# scale the images down to 480 px wide (-2 keeps the height even for libx264) and make movie
ffmpeg -framerate 30 -pattern_type glob -i '*.jpg' -c:v libx264 -pix_fmt yuv420p -vf scale=480:-2 out.mp4
# move to dropbox
mv out.mp4 /My/Dropbox/Folder/out.mp4
This script means that I had to manually delete the pictures from the Pi once they’d been copied but that was OK. My plan is to write a script to do this for the longer running projects so that it is automated.
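A hedged sketch of that clean-up step: delete a picture from the Pi’s share only if a byte-identical copy already exists locally. `prune_copied` is a hypothetical helper; rsync’s own `--remove-source-files` option is a simpler alternative if you trust the transfer. Neither guards against a frame that is being written at that very moment, so run it between captures.

```shell
# prune_copied SRC DEST: remove SRC/*.jpg files that already have an
# identical copy in DEST (e.g. SRC=/Volumes/HOMEPI/camera,
# DEST=/local/disk/folder from the script above).
prune_copied() {
  local src=$1 dest=$2 f name
  for f in "$src"/*.jpg; do
    [ -e "$f" ] || continue                   # no matches: nothing to do
    name=$(basename "$f")
    # cmp -s is silent and succeeds only for byte-identical files
    if [ -f "$dest/$name" ] && cmp -s "$f" "$dest/$name"; then
      rm -- "$f"
    fi
  done
}
```

Files with no local copy, or whose local copy differs, are left on the Pi.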
While it is possible to make the movies on the Pi itself, I did it on the Mac as that computer is beefier and is not busy taking pictures every 5 min! ffmpeg is a great tool for this and the documentation is impressive. For example if you have set up the camera in the wrong orientation you can do transposition in ffmpeg. If you don’t have ffmpeg, it is a simple install on the command line.
I’ve been following the tweets from an account called Albums You Must Hear @Albums2Hear. Each tweet is an album recommended by the account owner. I’m a sucker for lists of Albums That I Must Hear Before I Die since I’m always interested in new (or not so new) music recommendations.
I wanted to assemble a list of the albums that I don’t have from this account and I was able to do so using R.
Using rtweet, it was possible to pull a list of all the albums and reorganise them so that I had a csv containing the albums with the artist and year. I could then use this to compare with a list of albums from my iTunes library. A snippet of the retrieved records is shown here (full list is here).
Live at Leeds
Manfred Mann’s Earth Band
Your New Favourite Band
This is the Sea
Kind of Blue
The code for retrieval is here. The output is a csv that can be used to compare with a list of your own records.
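The comparison itself can be done outside R. A minimal sketch in shell, assuming two hypothetical files: albums.csv (artist,album,year with a header row, as produced above) and library.csv (artist,album, no header, exported from your music library). It assumes no field contains a comma, and it ignores case; differences such as curly versus straight apostrophes would still need normalising first.

```shell
# missing_albums ALBUMS_CSV LIBRARY_CSV
# Print rows of ALBUMS_CSV whose artist+album is not in LIBRARY_CSV.
missing_albums() {
  awk -F, '
    FILENAME == ARGV[1] { have[tolower($1 FS $2)] = 1; next }  # library
    FNR > 1 && !(tolower($1 FS $2) in have)                    # skip header
  ' "$2" "$1"
}
```

Usage: `missing_albums albums.csv library.csv > to_hear.csv`.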
A long time ago I posted a little Automator routine to convert Word doc/docx files to PDF. Not long after that, this routine ceased to work due to changes in Microsoft Word (I think). It’s still very useful to convert a whole folder of docx files to PDF in order to avoid Word and just use Preview on the Mac. For committee work or for marking students’ work, I often have a whole folder of docx files and would prefer it if they were in PDF format. I found this very nifty trick on the web and thought I’d post a link here to make up for the fact that my old post no longer works.
The full post is here. What is so nice about this Automator solution is that it uses a bash script to do the conversion. This means you don’t need Microsoft Word for it to work! From what I can see it uses the xml in the docx file (and presumably won’t work on older *.doc files) for the conversion. The post describes how to run it as a Service in macOS. Note that it destroys the docx files, so it should only be used on a copy. It could be run from the command line rather than with a right-click; the engine is this little script.
This quick post comes courtesy of LianTze Lim (an Overleaf TeXpert) and Kota Miura (a bioimage analyst).
I asked on the ImageJ forum some time ago how to add an ImageJ Macro lexer for a LaTeX document I was writing. Kota responded with this lexer for pygments. I then asked Overleaf if it was possible to add a custom lexer to an Overleaf document using the minted package. At the time this was not possible. However, I got a message from them today with a solution.
\begin{minted}{imagejmacro.py:ImageJMacroLexer -x}
// your code
\end{minted}
Here, imagejmacro.py is the name of the custom lexer saved in your project and ImageJMacroLexer is the name of the class in that file. If you want to use another custom lexer just replace as required. I have put up a read-only Overleaf example to show it working.
Thanks to LianTze for following up with me about this and special thanks to Kota who wrote the custom lexer.
The Green Leek 10.5 km run is a mixed terrain race now in its third year. Today’s was a wet and muddy edition. The chip times were posted this afternoon and using my previous code, I took a look at the results.
I was a bit disappointed with my time, which was about 24 s slower than last year. Considering that I’m running faster this year than last, I wondered whether the conditions affected my time. To look at this, I quickly retrieved times for people who’d run all three editions and looked to see if this edition was generally slower than previous ones.
Excuse the formatting of the plot. It looked pretty flat but then we’re probably only considering very small differences over 10.5 km. So I looked at the difference in time from the 2016 edition. Again the formatting is bad (23:55 is 5 minutes faster than 2016, 00:05 is 5 minutes slower).
Three people recorded much slower times this year, but the majority are within the difference from 2016 to 2017. Obviously, this is just the few people who could easily be picked out using a script; more runners might reveal more of a pattern. Anyway, here’s hoping for better weather next year!
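The odd labels in the difference plot (23:55 meaning 5 minutes faster) come from formatting a negative difference as a clock time. One way around it is to work in seconds and print a signed minutes:seconds value. A hedged sketch in shell; `to_seconds` and `time_diff` are hypothetical helpers, not the code used for the plots.

```shell
# to_seconds: convert h:mm:ss or mm:ss to seconds
to_seconds() {
  echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# time_diff OLD NEW: NEW - OLD as +m:ss (slower) or -m:ss (faster)
time_diff() {
  local d=$(( $(to_seconds "$2") - $(to_seconds "$1") ))
  local sign="+"
  if (( d < 0 )); then
    sign="-"
    d=$(( -d ))      # format the magnitude, keep the sign separately
  fi
  printf '%s%d:%02d\n' "$sign" $(( d / 60 )) $(( d % 60 ))
}
```

For example, `time_diff 50:00 45:00` prints -5:00 rather than 23:55, and a 24 s slowdown prints as +0:24.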
Well done to Andy Crabtree and Rachel Miller who were fastest male and female, respectively. Thanks to the organisers and volunteers.
The post title is taken from “Pledging My Time”, a track from Blonde on Blonde by Bob Dylan.
I’d seen the small multiple artwork of running and cycling routes from Marcus Volz’s R package Strava all over the web. Ads for “posters of your GPS tracks” pop up on Reddit, and I’d noticed a few #Rstats people putting up their posters on Twitter. I’ve had the package bookmarked for a while and this week I finally got round to generating a small multiple poster of some of my cycling routes.
I was pleased with the result and wanted to post it here. But running the code was not straightforward, as I’ll explain below. If you want to generate your own plot, read on.
The idea behind the poster is really nice. You get a kind of generative art-style poster. It looks nice and you can identify individual routes which is fun to do.
The instructions on the GitHub page are absolutely correct and the code should run out-of-the-box. The idea is that you download your Strava data and then make your plots. Unfortunately, it seems that a change in Strava’s data export policy (possibly related to GDPR changes) has broken the package. I found that there are two problems. First, Strava’s “download your data” link gives you a mix of formats (in my case GPX and FIT files), but the package only works with GPX. Second, if there is any elevation data missing from a track, the data frame that is needed to make the poster is not built properly.
Going GPX only: In my case, I don’t keep all my data in Strava and instead use a local repository managed with RubiTrack. This software allows me to filter for the tracks I want and export them in GPX format. The only problem is that it generates one huge file with all the tracks enclosed. This gets read by the package as a single track. To fix this, I split the file using awk.
I could then discard track1.gpx which just had the xml header and then use the directory of gpx files.
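A variation on that split, sketched here as an alternative rather than the awk command used for the poster, copies the XML header into every output file so that each one is a complete GPX document and no header-only file is left over. It assumes that `<trk>` and `</trk>` each appear on their own line, which is true of many exporters but worth checking in your file; `split_gpx` and the trackN.gpx names are hypothetical.

```shell
# split_gpx INPUT [OUTDIR]: write each <trk> element of INPUT to its own
# file, OUTDIR/track1.gpx, track2.gpx, ..., each with the original header
# and a closing </gpx>. Assumes <trk> and </trk> start their own lines.
split_gpx() {
  local in=$1 outdir=${2:-.}
  awk -v dir="$outdir" '
    /<trk>/ {                                    # start of a new track
      intrk = 1; n++
      out = dir "/track" n ".gpx"
      printf "%s", header > out                  # copy the XML header
    }
    !intrk && n == 0 { header = header $0 "\n" } # lines before 1st track
    intrk { print > out }                        # lines inside a track
    /<\/trk>/ {                                  # end of the track
      print "</gpx>" > out
      close(out)
      intrk = 0
    }
  ' "$in"
}
```

Usage: `split_gpx all_tracks.gpx gpx_dir/`, then point the package at gpx_dir.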
The elevation problem: this affected only some of the tracks, so in the end the R code needed to be modified. The elevation data is not needed to make the posters, so the file process_data.R can be edited: line 28 can be commented out and line 32 should read:
result <- data.frame(lat = lat, lon = lon, time = time, type = type) %>%
This issue was raised on GitHub and has been closed, but the code still doesn’t work when elevation values are missing. If you run into this problem, this is the way I found to fix it. The other plots in the package, which do use elevation, will not run, but at least the poster can be made.
I exported the poster as PDF and then made some changes in Illustrator to give the result above.
The post title comes from Multiplex from Oliver’s Standing Stone LP from 1974.