My Blank Pages VI: Programming in Igor Pro

It has been a long time since I wrote a book review.

A few months ago I read on IgorExchange that Martin Schmid had written a book about programming Igor. I snapped up a copy. I’m a competent Igor programmer but I was hoping that this book would be useful for lab members that want to learn.

Learning Igor – like most IDEs or programming languages – is tough going. There’s a booklet from WaveMetrics (the company that sells Igor Pro) called Getting Started – which is really good. There are a few other guides on the web (Payam’s guide, Thomas Braun’s coding conventions, quantixed’s own translator), but other resources are pretty scarce. The Igor Manual itself is excellent but it’s many, many pages long and is only meant to be consulted. So I was intrigued whether Martin Schmid’s book would fill the gap between Getting Started and more advanced guides.

What makes Igor Pro so fantastic is the way that you can use it for so many different things: image processing, statistics, graphing, curve fitting, instrument control and so on. Part of the challenge of writing a book on Igor Programming is deciding what to cover. Schmid deals with this by covering basic programming and core-intermediate topics such as dialogs, loops, string magic etc. The book stops short of any specialised applications. So it’s a really useful intermediate programming guide. It’s a great little book and is recommended for those who want to dig further after doing the Getting Started exercises.

I knew I would learn something from the book because there are always alternative ways to do things in Igor: features you didn’t know about or little tricks to do stuff faster. What surprised me was that the very first thing mentioned in the book was new to me. The author favours module-static programming. All of my Igor programming so far has been done with the global pragma, and I have avoided this more C-like way of encapsulating programs. Module-static works well because it eliminates naming conflicts. I have dealt with name conflicts by using static functions which are called from the top of the stack, where the top has a unique name (arguably this is similar to module-static, but not identical). As the Igor Manual says, “this gets tedious after a while”, and that’s true. Although in my defence, name conflicts are generally not a problem for the way I work because I favour a reproducible approach: a new experiment is started, one user-written ipf is opened, and the code is run. This means naming conflicts are minimised. The book has actually convinced me that module-static is a good thing, especially since my Igor code is now deployed around the lab and naming conflicts could easily become a problem. It’s an advanced programming technique but it is dealt with early by the author and it kind of works. After this, more basic programming topics are covered in depth.

There’s always room for improvement: there are several example programs at the back which need to be rekeyed to run, since this is a paper book and no electronic version is available AFAIK. The author has put one up here to save rekeying and another here, but otherwise you need to type in the examples to see what will happen. This is too long-winded. I’ve been spoiled reading texts about R where the examples can all be run from a markdown file inside RStudio. It would’ve been nice if the code was made available for this book. I don’t think it would compromise the value of the book since it is the text that is most valuable.

The book is available at Amazon for £7.99 at the time of writing.

My Blank Pages is a track by Velvet Crush. This is an occasional series of book reviews.

Pledging My Time III

I’ve previously crunched times for local Half and Full Marathons here on quantixed. Last weekend was the Kenilworth Half Marathon (2018) over a new course. I thought I’d have a look at the distributions of times and paces of the runners. The times are available here. If the Time and Category for finishers are saved as a csv, the script below works to generate the following plots.

Aggregated stats for the race are here. The beeswarm plot nicely shows the distribution of runners’ times and paces per category. There’s a bimodality to some of the age groups, which is interesting. You can see from the average times that people get slower as they get older, as expected.

There was a roughly 2:1 split of M:F runners with a similar proportion in all categories. The ratio is similar for DNSers. The winning times were Andrew Savery of Leamington C A & C in MV35 with 01:12:51 and Polly Keen of Nuneaton Harriers in F sen with 01:23:46.

Congrats to everyone who ran and thanks to the organisers and all the supporters out on the course.


require(ggplot2)
require(ggbeeswarm)
file_name <- file.choose()
df1 <- read.csv(file_name, header = TRUE, stringsAsFactors = FALSE)
# aggregate M and F to a new category called Gender
df1$Gender <- ifelse(startsWith(df1$Category,"F"),"F","M")
# convert the Time column to POSIXct
df1$Time <- as.POSIXct(strptime(df1$Time, format = "%H:%M:%S"))
orig_var <- as.POSIXct("00:00:00", format = "%H:%M:%S")
p1 <- ggplot( data = df1, aes(x = Category,y = Time, color = Category)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun.y = mean, geom = "point", size=2, aes(group = 1)) +
  scale_y_datetime(date_labels = "%H:%M:%S", limits = c(orig_var,NA))
p1
# instead of finishing time, let's look at pace (min/km)
df1$Pace <- as.numeric(difftime(df1$Time, orig_var, units = "hours") / 21.1) * 3600
df1$Pace <- as.POSIXct(df1$Pace, origin = orig_var, format = "%H:%M:%S")
p2 <- ggplot( data = df1, aes(x = Category,y = Pace, color = Category)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun.y = mean, geom = "point", size=2, aes(group = 1)) +
  scale_y_datetime(date_labels = "%M:%S", limits = c(orig_var,NA))
p2
# calculate speeds rather than pace
df1$Speed <- 21.1 / as.numeric(difftime(df1$Time, orig_var, units = "hours"))
p3 <- ggplot( data = df1, aes(x = Category, y = Speed, color = Category)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun.y = mean, geom = "point", size=2, aes(group = 1)) +
  ylim(0,NA) + ylab("Speed (km/h)")
p3
# now make the same plots but by Gender rather than Category
p4 <- ggplot( data = df1, aes(x = Gender,y = Time, color = Gender)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun.y = mean, geom = "point", size=2, aes(group = 1)) +
  scale_y_datetime(date_labels = "%H:%M:%S", limits = c(orig_var,NA))
p4
p5 <- ggplot( data = df1, aes(x = Gender,y = Pace, color = Gender)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun.y = mean, geom = "point", size=2, aes(group = 1)) +
  scale_y_datetime(date_labels = "%M:%S", limits = c(orig_var,NA))
p5
p6 <- ggplot( data = df1, aes(x = Gender, y = Speed, color = Gender)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun.y = mean, geom = "point", size=2, aes(group = 1)) +
  ylim(0,NA) + ylab("Speed (km/h)")
p6
ggsave("raceTimeByCat.png", plot = p1)
ggsave("racePaceByCat.png", plot = p2)
ggsave("raceSpeedByCat.png", plot = p3)
ggsave("raceTimeByGen.png", plot = p4)
ggsave("racePaceByGen.png", plot = p5)
ggsave("raceSpeedByGen.png", plot = p6)


Edit 2018-09-12T18:52:43Z I wasn’t happy with the plots and added a few more lines to look at gender as well as category. And show speed as well as pace and finishing time.

The post title is taken from “Pledging My Time” a track from Blonde on Blonde by Bob Dylan

Rip It Up: Grabbing movies from Twitter for use in ImageJ

Some great scientific data gets posted on Twitter. Sometimes I want to take a closer look and this post describes a strategy to do so.

Edit: I received a request to take down the 3D volume images derived from the example dataset I used in this post. I’ve edited the post below so that is now a general guide.

Grab the video

It can be a bit difficult to grab the video from Twitter. The best way I’ve found is using youtube-dl. This works for downloading video and audio from YouTube to view offline, but it also works for embedded video content on other websites.

To download the video use:

youtube-dl -o '%(title)s.%(ext)s' https://twitter.com/username/status/tweetID

this downloads an mp4 file which is automatically named.

Convert to avi

Now, mp4 is a compressed file format which cannot be read directly by FIJI/ImageJ. Conversion to avi means that the file can be loaded. I like to use another command line tool, ffmpeg, for video conversions.

ffmpeg -i originalFile.mp4 -pix_fmt nv12 -f avi -vcodec rawvideo convertedFile.avi

Now we have an avi file called convertedFile.avi that we can use.

Load into FIJI

The avi can be loaded into FIJI. At this point you can analyse the video. However, in the case of the video I was interested in, the data had been pseudocolored and was now in RGB format. I wanted to look at the original data. Converting to grayscale does not give the correct representation, but converting back to the original values is possible if you know which LUT was applied. Even if you don’t, it’s possible to take a guess at the LUT and do the conversion.

Converting RGB to original values

I found a nice gist that does the conversion for a single image. I just modified this code to work for a stack. It requires the LUT to be displayed vertically in a window called LUT. Caution: this code runs very slowly because every pixel in every slice needs to be recalculated and ImageJ is slow… I took a guess that mpl-inferno was used (I don’t think it is exactly right but it worked well enough). You can display the built-in LUTs in FIJI using Color > Display LUTs… and from there you can make the LUT window which the macro uses for the calculation. The macro to convert stacks to grayscale using the LUT is here.
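
For anyone curious about what the conversion actually involves, here is a minimal sketch of the idea in R (not the ImageJ macro itself, which does this per pixel in FIJI): each pseudocolored pixel is matched to the closest colour in the LUT, and the row index of that colour is the recovered grayscale value. The myLUT matrix and the example pixel values are hypothetical.

# a rough sketch of the lookup, assuming myLUT is a 256 x 3 matrix of R, G, B values (0-255)
nearestLUTIndex <- function(rgb, lut) {
  d2 <- rowSums(sweep(lut, 2, rgb)^2) # squared distance from the pixel to every LUT colour
  which.min(d2) - 1                   # 0-based index, i.e. the recovered gray value
}
# hypothetical example: a pseudocolored pixel with RGB values 200, 30, 140
# gray <- nearestLUTIndex(c(200, 30, 140), myLUT)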

I had a nice grayscale version of the data (inverted because I wanted to look at the volume). This let me see how the layers in the original video add together to make the full structure. I used ClearVolume which can be installed via Update Site in FIJI. I just made a quick video to show it in action (see below). You’ll have to take my word for it (video removed).

So extracting scientific data from Twitter or another online source is pretty straightforward. The extra complication was getting rid of the pseudocoloring, but once this was done, something very close to the original data was available.  Nonetheless this workflow is a fun way to take a closer look at some of the cool movies that people post on Twitter. I hope you find it useful.

The post title comes from “Rip It Up” by Orange Juice. A popular title in my library with versions from several different artists. I was thinking what is described is similar to ripping video content.

Ferrous: new paper on FerriTagging proteins in cells

We have a new paper out. It’s not exactly news, because the paper has been up on bioRxiv since December 2016 and hasn’t changed too much. All of the work was done by Nick Clarke when he was a PhD student in the lab. This post is to explain our new paper to a general audience.

The paper in a nutshell

We have invented a new way to tag proteins in living cells so that you can see them by light microscopy and by electron microscopy.

Why would you want to do that?

Proteins do almost all of the jobs in cells that scientists want to study. We can learn a lot about how proteins work by simply watching them down the microscope. We want to know their precise location. Light microscopy means that the cells are alive and we can watch the proteins move around. It’s a great method but it has low resolution, so seeing a protein’s precise location is not possible. We can overcome this limitation by using electron microscopy. This gives us higher resolution, but the proteins are stuck in one location. When we correlate images from one microscope to the other, we can watch proteins move and then look at them with high resolution. All we need is a way to see the proteins so that they can be seen in both types of microscope. We do this with tagging.

Tagging proteins so that we can see them by light microscopy is easy. A widely used method is to use a fluorescent protein such as GFP. We can’t see GFP in the electron microscope (EM) so we need another method. Again, there are several tags available but they all have drawbacks. They are not precise enough, or they don’t work on single proteins. So we came up with a new one and fused it with a fluorescent protein.

What is your EM tag?

We call it FerriTag. It is based on Ferritin which is a large protein shell that cells use to store iron. Because iron scatters electrons, this protein shell can be seen by EM as a particle. There was a problem though. If Ferritin is fused to a protein, we end up with a mush. So, we changed Ferritin so that it could be attached to the protein of interest by using a drug. This meant that we could put the FerriTag onto the protein we want to image in a few seconds. In the picture on the right you can see how this works to FerriTag clathrin, a component of vesicles in cells.

We can watch the tagging process happening in cells before looking by EM. The movie on the right shows green spots (clathrin-coated pits in a living cell) turning orange/yellow when we do FerriTagging. The cool thing about FerriTag is that it is genetically encoded. That means that we get the cell to make the tag itself and we don’t have to put it in from outside which would damage the cell.

What can you use FerriTag for?

Well, it can be used to tag many proteins in cells. We wanted to precisely localise a protein called HIP1R which links clathrin-coated pits to the cytoskeleton. We FerriTagged HIP1R and carried out what we call “contextual nanoscale mapping”. This is just a fancy way of saying that we could find the FerriTagged HIP1R and map where it is relative to the clathrin-coated pit. This allowed us to see that HIP1R is found at the pit and surrounding membrane. We could even see small changes in the shape of HIP1R in the different locations.

We’re using FerriTag for lots of projects. Our motivation to make FerriTag was so that we could look at proteins that are important for cell division and this is what we are doing now.

Is the work freely available?

Yes! The paper is available here under CC-BY licence. All of the code we wrote to analyse the data and run computer simulations is available here. All of the plasmids needed to do FerriTagging are available from Addgene (a non-profit company, there is a small fee) so that anyone can use them in the lab to FerriTag their favourite protein.

How long did it take to do this project?

Nick worked for four years on this project. Our first attempt at using ribosomes to tag proteins failed, but Nick then managed to get Ferritin working as a tag. This paper has broken our lab record for longest publication delay from first submission to final publication. The diagram below tells the whole saga.

 

The publication process was frustratingly slow. It took a few months to write the paper and then we submitted to the first journal after Christmas 2016. We got a rapid desk rejection and sent the paper to another journal and it went out for review. We had two positive referees and one negative one, but we felt we could address the comments and checked with the journal who said that they would consider a revised paper as an appeal. We did some work and resubmitted the paper. Almost six months after first submission the paper was rejected, but with the offer of a rapid (ha!) publication at Nature Communications using the peer review file from the other journal.

Hindsight is a wonderful thing but I now regret agreeing to transfer the paper to Nature Communications. It was far from rapid. They drafted in a new reviewer who came with a list of new questions, as well as being slow to respond. Sure, a huge chunk of the delay was caused by us doing revision experiments (the revisions took longer than they should because Nick defended his PhD, was working on other projects and also became a parent). However, the journal was really slow. The Editor assigned to our paper left the journal which didn’t help and the reviewer they drafted in was slow to respond each time (6 and 7 weeks, respectively). Particularly at the end, after the paper was ‘accepted in principle’ it took them three weeks to actually accept the paper (seemingly a week to figure out what a bib file is and another to ask us something about chi-squared tests). Then a further three weeks to send us the proofs, and then another three weeks until publication. You can see from the graphic that we sent back the paper in the third week of February and only incurred a 9-day delay ourselves, yet the paper was not published until July.

Did the paper improve as a result of this process? Yes and no. We actually added some things in the first revision cycle (for Journal #2) that got removed in subsequent peer review cycles! And the message in the final paper is exactly the same as the version on bioRxiv, posted 18 months previously. So in that sense, no it didn’t. It wasn’t all a total waste of time though, the extra reviewer convinced us to add some new analysis which made the paper more convincing in the end. Was this worth an 18-month delay? You can download our paper and the preprint and judge for yourself.

Were we unlucky with this slow experience? Maybe, but I know other authors who’ve had similar (and worse) experiences at this journal. As described in a previous post, the publication lag times are getting longer at Nature Communications. This suggests that our lengthy wait is not unique.

There’s lots to like about this journal:

  • It is open access.
  • It has the Nature branding (which, like it or not, impresses many people).
  • The peer review file is available.
  • The papers look great (in print and online).

But there are downsides too.

  • The APC for each paper is £3300 ($5200). Obviously open access must cost something, but there are cheaper OA journals available (albeit without the Nature branding).
  • Ironically, paying a premium for this reputation is complicated since the journal covers a wide range of science and its kudos varies depending on subfield.
  • It’s also slow, and especially so when you consider that papers have often transferred here from somewhere else.
  • It’s essentially a mega journal, so your paper doesn’t get the same exposure as it would in a community-focused journal.
  • There’s the whole ReadCube/SpringerNature thing…

Overall it was a negative publication experience with this paper. Transferring a paper along with the peer review file to another journal has worked out well for us recently and has been rapid, but not this time. Please leave a comment, particularly if you’ve had a positive experience, to redress the balance.

The post title comes from “Ferrous” by Circle from their album Meronia.

Pentagrammarspin: why twelve pentagons?

This post has been in my drafts folder for a while. With the World Cup here, it’s time to post it!

It’s a rule that a 3D assembly of hexagons must have at least twelve pentagons in order to be a closed polyhedral shape. This post takes a look at why this is true.

First, some examples from nature. The stinkhorn fungus Clathrus ruber has a largely hexagonal layout, with pentagons inserted. The core of HIV has to contain twelve pentagons (shown in red, in this image from the Briggs group) amongst many hexagonal units. My personal favourite, the clathrin cage, can assemble into many buckminsterfullerene-like shapes, but all must contain at least twelve pentagons with a variable number of hexagons.

The case of clathrin is particularly interesting because clathrin triskelia can assemble into a flat hexagonal lattice on membranes. If clathrin is going to coat a vesicle, that means 12 pentagons need to be introduced. So there needs to be quite a bit of rearrangement in order to do this.

You can see the same rule in everyday objects. The best example is a football, or soccer ball, if you are reading in the USA.

The classic design of football has precisely twelve pentagons and twenty hexagonal panels. The roadsign for football stadia here in the UK shows a weirdly distorted hexagonal array that has no pentagons. 22,543 people signed a petition to pressurise the authorities to change it, but the Government responded that it was too costly to correct this geometrical error.

So why do all of these assemblies have 12 pentagons?

In the classic text “On Growth and Form” by D’Arcy Wentworth Thompson, polyhedral forms in nature are explored in some detail. In the wonderfully titled On Concretions, Spicules etc. section, the author notes polyhedral forms in natural objects.

One example is Dorataspis, shown left. The layout is identical to the D6 hexagonal barrel assembly of a clathrin cage shown above. There is a belt of six hexagons, with one at the top and one at the bottom (eight hexagons in total), and twelve pentagons between the hexagons. In the book, there is an explanation of the maths behind why there must be twelve pentagons in such assemblies, but it’s obfuscated in bizarre footnotes in Latin. I’ll attempt to explain it below.

To shed some light on this we need the help of Euler’s formulae. The surface of a polyhedron in 3D is composed of faces, edges and vertices. If we think back to the football, the faces are the pentagonal and hexagonal panels, the edges are the stitching where two panels meet and the vertices are where three edges come together. We can denote faces, edges and vertices as f, e and v, respectively. These are 2D, 1D and zero-dimensional objects, respectively. Euler’s formula, which is true for all polyhedra, is:

\(f - e + v = 2\)

If you think about a cube, it has six faces. It has 12 edges and 8 vertices. So, 6 - 12 + 8 = 2. We can also check the football above. This has 32 faces (twelve pentagons, twenty hexagons), 90 edges and 60 vertices. 32 - 90 + 60 = 2. Feel free to check it with other polyhedra!
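
If you’d rather let R do the arithmetic, here is a quick check of Euler’s formula using the numbers for the cube and the football given above.

# check f - e + v = 2 for the cube and the football
shapes <- data.frame(shape = c("cube", "football"),
                     f = c(6, 32),
                     e = c(12, 90),
                     v = c(8, 60))
shapes$euler <- shapes$f - shapes$e + shapes$v
shapes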

Euler found a second formula which is true for polyhedra where three edges come together at a vertex.

\(\sum (6-n)f_{n} = 12\)

in this formula, \(f_{n}\) means number of n-gons.

So let’s say we have a dodecahedron, which is a polyhedron made of 12 pentagons. So \(n\) = 5 and \(f_{n}\) = 12, and you can see that \((6-5)12 = 12\).

Let’s take a more complicated object, like the football. Now we have:

\(((6-6)20) + ((6-5)12) = 12\)

You can now see why the twelve pentagons are needed. Because 6 - 6 = 0, we can add as many hexagons as we like; this adds nothing to the left-hand side. As long as the twelve pentagons are there, we will have a polyhedron. Without them we don’t. This is the answer to why there must be twelve pentagons in a closed polyhedral assembly.
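
The same check can be done in R: the twenty hexagons contribute nothing to the sum and the twelve pentagons supply the 12.

# sum of (6 - n) * f_n for the football: 12 pentagons and 20 hexagons
f_n <- c(12, 20)   # number of n-gons
n <- c(5, 6)       # n for each face type
sum((6 - n) * f_n) # = 12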

So how did Euler get to the second equation? You might have spotted this yourself from the f, e, v values for the football. Did you notice that the ratio of edges to vertices is 3:2? This is because each edge has a vertex at either end (it is a 1D object) and, remember, we are dealing with polyhedra with three edges at each vertex. So \(v = \frac{2}{3}e\). Also, each edge is at the boundary of two polygons, so \( e = \frac{1}{2}\sum n f_{n}\). You can check that with the values for the cube or football above. We know that \(f = \sum f_{n}\), which just means that the number of faces is the sum of the faces of all n-gons. This means that:

\(f - e + v = 2\)

Can be turned into

\(f - \frac{1}{3}e = \sum f_{n} - \frac{1}{6}\sum n f_{n} = 2\)

Let’s multiply by 6 to get, oh yes

\(\sum (6-n)f_{n} = 12\)

There are some topics for further exploration here:

  • You can add 0, 2 or 10000 hexagons to 12 pentagons to make a polyhedron, but can you add just one?
  • What happens when you add a few heptagons into the array?

Image credits (free-to-use/wiki or):

Clathrus ruber – tineye search didn’t find source.

HIV cores – Briggs Group

Exploded football – Quora

The post title comes from “Pentagrammarspin” by Steve Hillage from the 2006 remaster of his LP Fish Rising

Rollercoaster III: yet more on Google Scholar

In a previous post I made a little R script to crunch Google Scholar data for a given scientist. The graphics were done in base R and looked a bit ropey. I thought I’d give the code a spring clean – it’s available here. The script is called ggScholar.R (rather than gScholar.R). Feel free to run it and raise an issue or leave a comment if you have some ideas.

I’m still learning how to get things looking how I want them using ggplot2, but this is an improvement on the base R version.

As described earlier I have many Rollercoaster songs in my library. This time it’s the song and album by slowcore/dream pop outfit Red House Painters.

Ten Years vs The Spread: Calculating publication lag times in R

There have been several posts on this site about publication lag times. You can read them here. Lag times are the delays in the dissemination of scientific data introduced by the process of publishing the paper in a journal. Nowadays, your paper can be online in a few hours using a preprint server. However, this work is not peer reviewed. Journals organise a formal peer review and provide some sort of certification of the work. They typeset the work, and all of this adds delays to the dissemination of work in a journal.

To look at publication delays, you can use PubMed data, which is incomplete but can give insight into how long these delays can be. Previous posts have involved using a Ruby script to make a csv file from PubMed XML output and then calculating the publication lag times in Igor. There is another method detailed in this excellent post by Daniel Himmelstein.

I recently posted a figure for Nature Communications lag times on Twitter and was asked to generate others. I figured that I should write an R script and people can make their own!

The PubMedLagR code is available here with instructions for use.

A query for Nature Communications data at PubMed, such as:

nat commun[ta] AND 2000 : 2018[pdat] AND journal article[pt]

retrieves all papers for this journal. The date range is broader than needed and is just for illustration; this journal has only been in operation since 2010. Filtering for journal articles in an attempt to get rid of reviews and front matter is wise, but doesn’t always work. Again, this journal doesn’t carry this material, so this is for illustration. Getting your query right is very important.

Save the results in XML format and then run the R script as directed. This should give a csv of the data and a png of the lag times.
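
If you want to see roughly what happens under the hood, here is a minimal sketch (not the PubMedLagR code itself) that pulls the “received” and “accepted” dates out of a saved PubMed XML file using the xml2 package and computes the lag in days. The filename is a placeholder.

# minimal sketch: received-to-accepted lag times from a PubMed XML export
library(xml2)
doc <- read_xml("pubmed_result.xml") # placeholder filename for the saved XML
articles <- xml_find_all(doc, ".//PubmedArticle")
# pull one date (Year-Month-Day) out of the History section for a given PubStatus
getDate <- function(node, status) {
  xp <- paste0(".//PubMedPubDate[@PubStatus='", status, "']/")
  as.Date(paste(xml_text(xml_find_first(node, paste0(xp, "Year"))),
                xml_text(xml_find_first(node, paste0(xp, "Month"))),
                xml_text(xml_find_first(node, paste0(xp, "Day"))), sep = "-"),
          format = "%Y-%m-%d")
}
received <- as.Date(sapply(articles, getDate, status = "received"), origin = "1970-01-01")
accepted <- as.Date(sapply(articles, getDate, status = "accepted"), origin = "1970-01-01")
recAcc <- as.numeric(accepted - received) # lag time in days
summary(recAcc)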

This is data from Nature Communications. Colleagues had two separate papers accepted at this journal and experienced long delays. I was interested to see if papers were generally taking longer to publish here. Of course we do not know why. Delays are partly the fault of the authors, the reviewers and the journal, and it is not possible to say why publication lag times are increasing for this journal year-on-year. The journal has grown in terms of the number of papers published; has this introduced inefficiencies? Are reviewers being slow to review? Are they being more demanding? Are Editors not marshalling the referee reports and providing clear guidance to authors? Allowing too much time and too many rounds of revision? Are authors being too slow to do further experimental work? The answer will be yes to some of these questions for some of the papers.

This is not to focus on Nature Communications; it’s one of a few journals that many colleagues complain are too slow to publish their work. With this code you can have a look at the journal you are interested in submitting to and consider whether there is a more rapid venue for your work.

Update:

I changed the code slightly and prettified the plots just a little. Below are some plots for Nature Cell Biology and Nature Neuroscience. I also did a search for clathrin or CRISPR papers over the same time period. These keyword searches are fairly flat, whereas the journal-specific increase in publication lag time can be seen.

The lag times at Nature Neuroscience look artificially low and then seem to have jumped up in 2016 to be something similar to Nature Cell Biology or Nature Communications.

Edit

I neglected to point out that the code truncates the y-axis in the bottom right plot to 1000 days or the maximum lag time, whichever is smaller. This is because it gets difficult to see the data points if there is an outlier, which might be due to an error in PubMed data.

A reader commented on Twitter that some poor paper had a lag time of almost 1000 days. Well, due to the y-axis truncation we don’t see that 9 papers in Nature Communications since 2010 have lag times (RecAcc) of > 1000 days. The record holder has a lag time of 1561 days! I checked that this was not a PubMed error by looking at the dates on the paper.

Notes

Date information is not available in PubMed for every paper unfortunately. This is especially true of older papers.

The date information is supplied to PubMed from the journal. These dates are not necessarily accurate: 1) you can see occasional errors in the data, 2) journals sometimes “reset the clock” on papers and treat resubmissions as new submissions.

The post title is taken from “10 Years vs The Spread” by Wing-Tipped Sloat from the LP Chewyfoot. Obviously the song has nothing to do with smoothed kernel density estimates of journal publication lag times, but the title was incredibly apt.

Cloud Eleven: A cloud-based code sharing solution for IgorPro

This post is something of a “how to” guide. The problem is how can you share code with a small team and keep it up-to-date?

For ImageJ, the solution is simple. You can make an ImageJ update site and then push any updated code to the user when they start up ImageJ. For IgorPro, there is no equivalent. Typically I send ipf files to someone and they run the code, but I have to resend them whenever there’s an update. This can cause confusion over which is the latest version.

I’ve tried a bunch of things such as versioning the code (this at least tells you if the person is running an out-of-date version). I also put the code up on GitHub and tell people to pull down the latest version, but again this doesn’t work well. The topic of how to share code comes up perennially on the Igor mailing list and on IgorExchange, so it’s clearly something that people struggle with. I’ve found the solutions offered to be a bit daunting.

My solution is detailed here with some code to make it run.

The details

It is possible to use aliases (shortcuts on Windows) in the user’s WaveMetrics folders to point to external files. Items in the Igor Procedures directory get loaded when you start Igor. Items in User Procedures can be loaded optionally. These aliases sit in the user’s WaveMetrics folder (the exact location depends on the Igor version) and this means that the program itself can get updated without overwriting these files.

So, if aliases are created here that point to a cloud-based repo of Igor Code it can be used to:

  1. Optionally load code as the user needs it. Because the code sits in the cloud it can be updated and instantly used by everyone.
  2. Force Igor to load a bit of code to make a little menu item so that the user can pick the code they want to load.

To get this working, I made use of Ryotako’s excellent menu loader which was written for loading the “hidden” WaveMetrics procedures. I created a version like that one which makes a menu of our shared code in alphabetical order. I then made a version that allows the code to be grouped by purpose. People in my group told me they prefer this version. Code just needs to be organised in folders for it to work.

The IgorDistro.ipf just needs to be placed in a folder called IP somewhere in a share that users have access to (you can make the share read-only unless people are contributing to code development). In the same directory that IP sits in, place a folder called UP with all of your code organised into folders. Aliases need to be created that point the Igor Procedures folder to IP and the User Procedures folder to UP. That’s it! I’ve tested it on IP7 and IP8, on Windows and Mac, using a shared Dropbox as the cloud repo.

Obviously, the correct IgorPro licences need to be in place to share the code with multiple users.

The post title comes from the pseudonymous LP by Cloud Eleven. A great debut album. The track “Wish I” is worth the price of the record alone (assuming people still buy records).

Rollercoaster II: more on Google Scholar citations

I’ve previously written about Google Scholar. Its usefulness and its instability. I just read a post by Jon Tennant on how to harvest Google Scholar data in R and I thought I would use his code as the basis to generate some nice plots based on Google Scholar data.

A script for R is below and can be found here. Graphics are base R but do the job.

First of all I took it for a spin on my own data. The outputs are shown here:

These were the most interesting plots that sprang to mind. First is a ranked citation plot, which also shows the line y=x to find the Hirsch number. Second was to look at total citations per year to all papers over time; Google Scholar shows the last few years of this plot on the profile page. Third, older papers accrue more citations, but how does this look for all papers? Finally, a prediction of what my H-index will do over time (no prizes for guessing that it will go up!). As Jon noted, the calculation comes from this paper.

While that’s interesting, we need to get the data of a scholar with a huge number of papers and citations. Here is George Church.

At the time of writing he has 763 papers with over 90,000 citations in total and an H-index of 147. Interestingly, ~10% of his total citations come from a monster paper in PNAS with Wally Gilbert in the mid-80s on genome sequencing.

Feel free to grab/fork this code and have a play yourself. If you have other ideas for plots or calculations, add a comment here or an issue at GitHub.

if(!require(scholar)){
     install.packages("scholar")
}
library(scholar)
# Add Google Scholar ID of interest here
ID <- ""
# If you didn't add one to the script prompt user to add one
if(ID == ""){
     ID <- readline(prompt="Enter Scholar ID: ")
}
# Get the citation history
citeByYear<-get_citation_history(ID)
# Get profile information
profile <- get_profile(ID)
# Get publications and save as a csv
pubs <- get_publications(ID)
write.csv(pubs, file = "citations.csv")
# Predict h-index
hIndex <- predict_h_index(ID)
# Now make some plots
# Plot of total citations by year
png(file = "citationsByYear.png")
plot(citeByYear$year,citeByYear$cites,
     type="h", xlab="Year", ylab = "Total Cites")
dev.off()
# Plot of ranked paper by citation with h
png(file = "citationsAndH.png")
plot(pubs$cites, type="l",
     xlab="Paper rank", ylab = "Citations per paper")
abline(0,1)
text(nrow(pubs),max(pubs$cites, na.rm = TRUE),
     profile$h_index)
dev.off()
# Plot of cites to paper by year
png(file = "citesByYear.png")
plot(pubs$year, pubs$cites,
     xlab="Year", ylab = "Citations per paper")
dev.off()
# Plot of h-index prediction
thisYear <- as.integer(format(Sys.Date(), "%Y"))
png(file = "hPred.png")
     plot(hIndex$years_ahead+thisYear,hIndex$h_index,
     ylim = c(0, max(hIndex$h_index, na.rm = TRUE)),
     type = "h",
     xlab="Year", ylab = "H-index prediction") 
dev.off()

Note that my previous code used a python script to grab Google Scholar data. While that script worked well, the scholar package for R seems a lot more reliable.

I have a surprising number of tracks in my library with Rollercoaster in the title. This time I will go with the Jesus & Mary Chain track from Honey’s Dead.

Turn That Heartbeat Over Again: comparing wrist and chest-strap HRM

As a geek, the added bonus of exercise is the fun that you can have with the data you’ve generated. A recent conversation on Twitter about the accuracy of wrist-based HRMs got me thinking… how does a wrist-based HRM compare with a traditional chest-strap HRM? Conventional wisdom says that the chest-strap is more accurate, but my own experience of chest-strap HRMs is that they are a bit unreliable. Time to put it to the test.

I have a Garmin Fēnix 5 which records wrist-based HR and I have a Garmin chest-strap which uses ANT+ to transmit. I could pick up the ANT+ signal with a Garmin Edge 800 so that I could record both datasets simultaneously, on the same bike ride. Both the Fēnix and Edge can record GPS and time (obviously), allowing accurate registration of the data. I also set both devices to receive cadence data via ANT+ from the same cadence/speed sensor so that I could control for (or at least look at) variability in recordings. I rode for ~1 hr (~32 km) to capture enough data. Obviously this is just one trial but the data give a feel for the accuracy of the two HRMs. Biking, I figured, was a fair activity since upper body and wrist movement is minimal, meaning that the contacts for both HRMs are more likely to stay in place than if I was running.

I’ll get to heart rate last. First, you can see that the GPS recording is virtually identical between the two units – so that’s a good sign. Second, elevation is pretty similar too. There’s a bit of divergence at the beginning and end of the ride; since those parts are over the same stretch of road, neither device looks totally accurate. I’d be inclined to trust the Fēnix here since it has a newer altimeter in it. Third, cadence looks accurate. Remember this data is coming off the same sensor and so any divergence is in how it’s being logged. Finally, heart rate. You can see that at the beginning of the ride, both wrist-based and chest-strap HRMs struggle. Obviously I have no reference here to know which trace is correct, but the chest-strap recording looks like it has an erroneous low reading for several minutes but is otherwise normal. The wrist-based HRM looks like it is reading 120ish bpm values from the start and then snaps into reading the right thing after a while. The chest-strap makes best contact when some perspiration has built up, which might explain its readings. The comparisons are shown below in grey. The correlation is OK but not great. Compared to cadence, the data diverge a lot more, which rules out simple logging differences as a cause.

I found a different story in Smart recording mode on the Fēnix. This is a lower frequency recording mode, which is recommended to preserve battery life for long activities.

So what can we see here? Well, the data from the Fēnix are more patchy but even so, the data are pretty similar except for heart rate. The Fēnix performs badly here. Again, you can see the drop out of the chest strap HRM for a few minutes at the start, but otherwise it seems pretty reliable.  The comparison graph for heart rate shows how poorly the wrist-based HRM measures heart rate, in this mode.

OK, this is just two rides, for one person – not exactly conclusive but it gives some idea about the data that are captured with each HRM.

Conclusions

Wrist-based HRM is pretty good (at the higher sampling rate only), especially considering how the technology works, and chest-strap HRMs can be uncomfortable to wear, so wrist-based HRM may be all you need. If you are training to heart rate zones, or want the best data, a chest-strap HRM is generally more reliable than a wrist-based one. Neither is very good for very short activities (<15 min).

For nerds only

Comparisons like these are quite easy to do in desktop packages like Rubitrack (which I love) or Ascent or others. However, they tend to mask things like missing data points, they smooth out the data, and getting the plots the way you want is not straightforward. So, I took the original FIT files from each unit and used these for processing. There’s a great package for R called cycleRtools. This worked great except for the smart recording data, which it turns out was sampled irregularly, and the package requires monotonic sampling. I generated a gpx file and parsed the data for this activity in R using XML. I found this snippet on the web (modified slightly).

library(XML)
library(plyr)
# parse the gpx file exported for the smart recording activity
filename <- "myfile.gpx"
gpx.raw <- xmlTreeParse(filename, useInternalNodes = TRUE)
rootNode <- xmlRoot(gpx.raw)
# pull out the track and flatten all track segments into a single list of track points
gpx.rawlist <- xmlToList(rootNode)$trk
gpx.list <- unlist(gpx.rawlist[names(gpx.rawlist) == "trkseg"], recursive = FALSE)
# one row per track point; rbind.fill pads any points with missing fields
gpx <- do.call(rbind.fill, lapply(gpx.list, function(x) as.data.frame(t(unlist(x)), stringsAsFactors=F)))
names(gpx) <- c("ele", "time", "temp", "hr", "cad", "lat", "lon")

Otherwise:

library(cycleRtools)
# read the FIT file straight from the head unit and format it as a data frame
edge <- as.data.frame(read_fit(file = file.choose(), format = TRUE, CP = NULL, sRPE = NULL))
write.csv(edge, file = "edge.csv", row.names = F)

The resulting data frames could be saved out as csv and read into Igor to make the plots. I wrote a quick function in Igor to resample all datasets at 1 Hz. The plots and layout were generated by hand.
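
I did the resampling in Igor, but a rough R equivalent of that step is linear interpolation of every column onto a regular 1 s grid. This is just a sketch: it assumes a data frame of numeric columns with the elapsed time in seconds in a column called t (the column name is made up for the example).

# rough equivalent of the 1 Hz resampling step, done in R rather than Igor
resample1Hz <- function(df, timeCol = "t") {
  grid <- seq(min(df[[timeCol]], na.rm = TRUE), max(df[[timeCol]], na.rm = TRUE), by = 1)
  out <- data.frame(grid)
  names(out) <- timeCol
  for (col in setdiff(names(df), timeCol)) {
    out[[col]] <- approx(df[[timeCol]], df[[col]], xout = grid)$y # linear interpolation
  }
  out
}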

The post title comes from “Turn That Heartbeat Over Again” by Steely Dan from Can’t Buy A Thrill