## My Blank Pages VI: Programming in Igor Pro

It has been a long time since I wrote a book review.

A few months ago I read on IgorExchange that Martin Schmid had written a book about programming Igor. I snapped up a copy. I’m a competent Igor programmer but I was hoping that this book would be useful for lab members that want to learn.

Learning Igor – like most IDEs or programming languages – is tough going. There’s a booklet from WaveMetrics (the company that sells Igor Pro) called Getting Started – which is really good. There are a few other guides on the web (Payam’s guide, Thomas Braun’s coding conventions, quantixed’s own translator), but other resources are pretty scarce. The Igor Manual itself is excellent but it’s many, many pages long and is only meant to be consulted. So I was intrigued whether Martin Schmid’s book would fill the gap between Getting Started and more advanced guides.

What makes Igor Pro so fantastic is the way that you can use it for so many different things: image processing, statistics, graphing, curve fitting, instrument control and so on. Part of the challenge of writing a book on Igor Programming is deciding what to cover. Schmid deals with this by covering basic programming and core-intermediate topics such as dialogs, loops, string magic etc. The book stops short of any specialised applications. So it’s a really useful intermediate programming guide. It’s a great little book and is recommended for those who want to dig further after doing the Getting Started exercises.

I knew I would learn something from the book because there’s always alternative ways to do stuff in Igor: things that you didn’t know about or little tricks to do stuff faster. What I was surprised about was the first thing mentioned in the book was new to me. The author favours module-static programming. All of my Igor programming has been done in the global pragma and I have avoided this more C-like way of encapsulating programs that I’ve written so far. Module-static works well because it eliminates naming conflicts. I have dealt with name conflicts by using static functions which are called from the top of the stack, and the top has a unique name (arguably this is the same as module-static, but not identical). As the Igor Manual says “this gets tedious after a while” and that’s true. Although in my defence, name conflicts are generally not a problem for the way I work because I favour a reproducible approach. A new experiment is started – one user-written ipf is opened – and the code is run. This means naming conflicts are minimised. The book has actually convinced me that module-static is a good thing, especially since my Igor code is now deployed around the lab and naming conflicts could easily become a problem. It’s an advanced programming technique but is dealt with early by the author and it kind of works. After this, more basic programming topics are covered in depth.

There’s always room for improvement: there are several example programs at the back which need to be rekeyed to run, since this is a paper book and no electronic version is available AFAIK. The author has put one up here to save rekeying and another here, but otherwise you need to type in the examples to see what will happen. This is too long-winded. I’ve been spoiled reading texts about R where the examples can all be run from a markdown file inside RStudio. It would’ve been nice if the code was made available for this book. I don’t think it would compromise the value of the book since it is the text that is most valuable.

The book is available at Amazon for £7.99 at the time of writing.

My Blank Pages is a track by Velvet Crush. This is an occasional series of book reviews.

## Rip It Up: Grabbing movies from Twitter for use in ImageJ

Some great scientific data gets posted on Twitter. Sometimes I want to take a closer look and this post describes a strategy to do so.

Edit: I received a request to take down the 3D volume images derived from the example dataset I used in this post. I’ve edited the post below so that is now a general guide.

Grab the video

It can be a bit difficult to the grab video from Twitter. The best way I’ve found is using youtube-dl. This works for downloading video and audio from YouTube to view offline, but it also works for other embedded video content on other websites.

youtube-dl -o '%(title)s.%(ext)s' https://twitter.com/username/status/tweetID

Convert to avi

Now, mp4 is a compressed file format which cannot be read directly by FIJI/ImageJ. Conversion to avi means that the file can be loaded. I like to use another command line tool, ffmpeg for video conversions.

ffmpeg -i originalFile.mp4 -pix_fmt nv12 -f avi -vcodec rawvideo convertedFile.avi

Now we have an avi file called convertedFile.avi that we can use.

The avi can be loaded into FIJI. At this point you can analyse the video. However, in the case of the video I was interested in, the data had been pseudocolored and was now in RGB format. I wanted to look at the original data. Converting to grayscale does not give the correct representation but conversion back to grayscale is possible if you know the LUT was applied. Even if you don’t, it’s possible to take a guess at the LUT and do the conversion.

Converting RGB to original values

I found a nice gist that does the conversion for a single image. I just modified this code to work for a stack. It requires the LUT to be displayed vertically in a window called LUT. Caution: this code runs very slowly because every pixel in every slice needs to be recalculated and ImageJ is slow… I took a guess that mpl-inferno was used (I don’t think is exactly right but it worked well enough). You can display the built-in LUTs in FIJI using Color > Display LUTs… and from there you can make the LUT window which the macro uses for the calculation. The macro to convert stacks to grayscale using the LUT is here.

I had a nice grayscale version of the data (inverted because I wanted to look at the volume). This let me see how the layers in the original video add together to make the full structure. I used ClearVolume which can be installed via Update Site in FIJI. I just made a quick video to show it in action (see below). You’ll have to take my word for it (video removed).

So extracting scientific data from Twitter or another online source is pretty straightforward. The extra complication was getting rid of the pseudocoloring, but once this was done, something very close to the original data was available.  Nonetheless this workflow is a fun way to take a closer look at some of the cool movies that people post on Twitter. I hope you find it useful.

The post title comes from “Rip It Up” by Orange Juice. A popular title in my library with versions from several different artists. I was thinking what is described is similar to ripping video content.

## Rollercoaster III: yet more on Google Scholar

In a previous post I made a little R script to crunch Google Scholar data for a given scientist. The graphics were done in base R and looked a bit ropey. I thought I’d give the code a spring clean – it’s available here. The script is called ggScholar.R (rather than gScholar.R). Feel free to run it and raise an issue or leave a comment if you have some ideas.

I’m still learning how to get things looking how I want them using ggplot2, but this is an improvement on the base R version.

As described earlier I have many Rollercoaster songs in my library. This time it’s the song and album by slowcore/dream pop outfit Red House Painters.

## Ten Years vs The Spread: Calculating publication lag times in R

There have been several posts on this site about publication lag times. You can read them here. Lag times are the delays in the dissemination of scientific data introduced by the process of publishing the paper in a journal. Nowadays, your paper can be online in a few hours using a preprint server. However, this work is not peer reviewed. Journals organise a formal peer review and provide some sort of certification of the work. They typeset the work and all of this adds delays the dissemination of work in a journal.

To look at publication delays, you can use PubMed data, which is incomplete but can give insight into how long these delays can be. Previous posts have involved the use of a ruby script to make a csv file from PubMed XML output and then use this in Igor to calculate the publication lag times. There is another method detailed in this excellent post by Daniel Himmelstein.

I recently posted a figure for Nature Communications lag times on Twitter and was asked to generate others. I figured that I should write an R script and people can make their own!

The PubMedLagR code is available here with instructions for use.

A query for Nature Communications data at PubMed, such as:

nat commun[ta] AND 2000 : 2018[pdat] AND journal article[pt]

Retrieves all paper for this journal. The range from 2010 to 2018 is for illustration, this journal has only been in operation for these years. Filtering for journal articles rather and attempting to get rid of reviews and front matter is wise, but doesn’t always work. Again this journal doesn’t carry this material so this is for illustration. Getting your query right is very important.

Save the results in XML format and then run the R script as directed. This should give a csv of the data and a png of the lag times.

This is data from Nature Communications. Colleagues had two separate papers accepted at this journal and experienced long delays. I was interested to see if papers were generally taking longer to publish here. Of course we do not know why. Delays are partly the fault of the authors, the reviewers and the journal and it is not possible to say why publication lag times are increasing for this journal year-on-year. The journal has grown in terms of number of papers published, has this introduced inefficiencies? Are reviewers being slow to review? Are they being more demanding? Are Editors not marshalling the referee reports and providing clear guidance to authors? Allowing too much time and too many rounds of revision? Are authors being too slow to do further experimental work? The answer will be yes to some of these questions for some of the papers.

This is not to focus on Nature Communications, it’s one of a few journals that many colleagues complain is too slow to publish their work. With this code you can have a look at the journal you are interested in submitting to and consider whether there is a more rapid venue for your work.

Update:

I changed the code slightly and prettified the plots just a little. Below are some plots for Nature Cell Biology, Nature Neuroscience. I also did a search for clathrin or CRISPR papers over the same time period. These keyword searches are fairly flat, whereas the journal-specific increase in publication lag time can be seen.

The lag times at Nature Neuroscience look artificially low and then seem to have jumped up in 2016 to be something similar to Nature Cell Biology or Nature Communications.

Edit

I neglected to point out that the code truncates the y-axis in the bottom right plot to 1000 days or the maximum lag time, whichever is smaller. This is because it gets difficult to see the data points if there is an outlier, which might be due to an error in PubMed data.

A reader commented on Twitter that some poor paper had a lag time almost 1000 days. Well, due to the y-axis truncation we don’t see that 9 papers in Nature Communications since 2010 have lag times (RecAcc) of > 1000 days. The record holder has a lag time of 1561 days! I checked that this was not a PubMed error by looking at the dates on the paper.

Notes

Date information is not available in PubMed for every paper unfortunately. This is especially true of older papers.

The date information is supplied to PubMed from the journal. These dates are not necessarily accurate: 1) you can see occasional errors in the data, 2) journals sometimes “reset the clock” on papers and treat resubmissions as new submissions.

The post title is taken from “10 Years vs The Spread” by Wing-Tipped Sloat from the LP Chewyfoot. Obviously the song has nothing to do with smoothed kernel density estimates of journal publication lag times, but the title was incredibly apt.

## Cloud Eleven: A cloud-based code sharing solution for IgorPro

This post is something of a “how to” guide. The problem is how can you share code with a small team and keep it up-to-date?

For ImageJ, the solution is simple. You can make an ImageJ update site and then push any updated code to the user when they startup ImageJ. For IgorPro, there is no equivalent. Typically I send ipf files to someone and they run the code, but I have to resend them whenever there’s an update. This can cause confusion over which is the latest version.

I’ve tried a bunch of things such as versioning the code (this at least tells you if the person is running an out-of-date version). I also put the code up on GitHub and tell people to pull down the latest version, but again this doesn’t work well. The topic of how to share code comes up perennially on the Igor mailing list and on IgorExchange, so it’s clearly something that people struggle with. I’ve found the solutions offered to be a bit daunting.

My solution is detailed here with some code to make it run.

The details

It is possible to use aliases (shortcuts on Windows) in the User’s WaveMetrics folders to point to external files. Items in the Igor Procedures directory get loaded when you start Igor. Items in User Procedures can get loaded optionally. These aliases sit in the Users Wavemetrics folder (the exact location depends on the Igor Version) and this means that the program itself can get updated without overwriting these files.

So, if aliases are created here that point to a cloud-based repo of Igor Code it can be used to:

1. Optionally load code as the user needs it. Because the code sits in the cloud it can be updated and instantly used by everyone.
2. Force Igor to load a bit of code to make a little menu item so that the user can pick the code they want to load.

To get this working, I made use of Ryotako’s excellent menu loader which was written for loading the “hidden” WaveMetrics procedures. I created a version like that one which makes a menu of our shared code in alphabetical order. I then made a version that allows the code to be grouped by purpose. People in my group told me they prefer this version. Code just needs to be organised in folders for it to work.

The IgorDistro.ipf just needs to be placed in a folder called IP somewhere in a share that users have access to (unless people are contributing to code development, you can make the share read-only). In the same directory that IP sits in, place a folder called UP with all of your code organised into folders. Aliases need to be created that point Igor Procedures folder to IP and User Procedures to UP. That’s it! I’ve tested it on IP7 and IP8, on Windows and Mac, using a shared Dropbox as the cloud repo.

Obviously, the correct IgorPro licences need to be in place to share the code with multiple users.

The post title comes from the pseudonymous LP by Cloud Eleven. A great debut album. The track “Wish I” is worth the price of the record alone (assuming people still buy records).

## Rollercoaster II: more on Google Scholar citations

I’ve previously written about Google Scholar. Its usefulness and its instability. I just read a post by Jon Tennant on how to harvest Google Scholar data in R and I thought I would use his code as the basis to generate some nice plots based on Google Scholar data.

A script for R is below and can be found here. Graphics are base R but do the job.

First of all I took it for a spin on my own data. The outputs are shown here:

These were the most interesting plots that sprang to mind. First is a ranked citation plot which also shows y=x to find the Hirsch number. Second, was to look at total citations per year to all papers over time. Google Scholar shows the last few years of this plot in the profile page. Third, older papers accrue more citations, but how does this look for all papers? Finally, a prediction of what my H-index will do over time (no prizes for guessing that it will go up!). As Jon noted, the calculation comes from this paper.

While that’s interesting, we need to get  the data of a scholar with a huge number of papers and citations. Here is George Church.

At the time of writing he has 763 papers with over 90,000 citations in total and a H-index of 147. Interestingly ~10% of his total citations come from a monster paper in PNAS with Wally Gilbert in the mid 80s on genome sequencing.

Feel free to grab/fork this code and have a play yourself. If you have other ideas for plots or calculations, add a comment here or an issue at GitHub.

if(!require(scholar)){
install.packages("scholar")
}
library(scholar)
ID <- ""
# If you didn't add one to the script prompt user to add one
if(ID == ""){
ID <- readline(prompt="Enter Scholar ID: ")
}
# Get the citation history
citeByYear<-get_citation_history(ID)
# Get profile information
profile <- get_profile(ID)
# Get publications and save as a csv
pubs <- get_publications(ID)
write.csv(pubs, file = "citations.csv")
# Predict h-index
hIndex <- predict_h_index(ID)
# Now make some plots
# Plot of total citations by year
png(file = "citationsByYear.png")
plot(citeByYear$year,citeByYear$cites,
type="h", xlab="Year", ylab = "Total Cites")
dev.off()
# Plot of ranked paper by citation with h
png(file = "citationsAndH.png")
plot(pubs$cites, type="l", xlab="Paper rank", ylab = "Citations per paper") abline(0,1) text(nrow(pubs),max(pubs$cites, na.rm = TRUE),
profile$h_index) dev.off() # Plot of cites to paper by year png(file = "citesByYear.png") plot(pubs$year, pubs$cites, xlab="Year", ylab = "Citations per paper") dev.off() # Plot of h-index prediction thisYear <- as.integer(format(Sys.Date(), "%Y")) png(file = "hPred.png") plot(hIndex$years_ahead+thisYear,hIndex$h_index, ylim = c(0, max(hIndex$h_index, na.rm = TRUE)),
type = "h",
xlab="Year", ylab = "H-index prediction")
dev.off()


Note that my previous code used a python script to grab Google Scholar data. While that script worked well, the scholar package for R seems a lot more reliable.

I have a surprising number of tracks in my library with Rollercoaster in the title. This time I will go with the Jesus & Mary Chain track from Honey’s Dead.

## I’m not following you II: Twitter data and R

My activity on twitter revolves around four accounts.

I try to segregate what happens on each account, and there’s inevitably some overlap. But what about overlap in followers?

What lucky people are following all four? How many only see the individual accounts?

It’s quite easy to look at this in R.

So there are 36 lucky people (or bots!) following all four accounts. I was interested in the followers of the quantixed account since it seemed to me that it attracts people from a slightly different sphere. It looks like about one-third of quantixed followers only follow quantixed, about one-third follow clathrin also and more or less the remainder are “all in” following three accounts or all four. CMCB followers are split about the same. The lab account is a bit different, with close to one-half of the followers also following clathrin.

Extra nerd points:

This is a Venn diagram and not an Euler plot. Venn just shows schematically the intersections and does not attempt to encode information in the area of each part. Euler plots for greater than three groups are hard to generate and to make any sense of what is shown. It is a dataviz problem to look at the proportions or lots of groups. A solution here would be to generate a further four Venn diagrams. On each, display the proportion for one category as a fraction or percentage

How to do it:

Last time, I described how to set up rtweet and make a Twitter app for use in R. You can use this to pull down lists of followers and extract their data. Using the intersect function you can work out the numbers of followers at each intersection. For four accounts, there will be 1 group of four, 4 groups of three, 6 groups of two. The VennDiagram package just needs the total numbers for all four groups and then details of the intersections, i.e. you don’t need to work out the groups minus their intersections – it does this for you.

library(rtweet)
library(httpuv)
library(VennDiagram)
## whatever name you assigned to your created app
appname <- "whatever_name"
## api key (example below is not a real key)
key <- "blah614h"
## api secret (example below is not a real key)
secret <- "blah614h"
app = appname,
consumer_key = key,
consumer_secret = secret)
clathrin_followers <- get_followers("clathrin", n = "all")
clathrin_followers_names <- lookup_users(clathrin_followers)
quantixed_followers <- get_followers("quantixed", n = "all")
quantixed_followers_names <- lookup_users(quantixed_followers)
cmcb_followers <- get_followers("Warwick_CMCB", n = "all")
cmcb_followers_names <- lookup_users(cmcb_followers)
roylelab_followers <- get_followers("roylelab", n = "all")
roylelab_followers_names <- lookup_users(roylelab_followers)
# a = clathrin
# b = quantixed
# c = cmcb
# d = roylelab
## now work out intersections
anb <- intersect(clathrin_followers_names$user_id,quantixed_followers_names$user_id)
anc <- intersect(clathrin_followers_names$user_id,cmcb_followers_names$user_id)
and <- intersect(clathrin_followers_names$user_id,roylelab_followers_names$user_id)
bnc <- intersect(quantixed_followers_names$user_id,cmcb_followers_names$user_id)
bnd <- intersect(quantixed_followers_names$user_id,roylelab_followers_names$user_id)
cnd <- intersect(cmcb_followers_names$user_id,roylelab_followers_names$user_id)
anbnc <- intersect(anb,cmcb_followers_names$user_id) anbnd <- intersect(anb,roylelab_followers_names$user_id)
ancnd <- intersect(anc,roylelab_followers_names$user_id) bncnd <- intersect(bnc,roylelab_followers_names$user_id)
anbncnd <- intersect(anbnc,roylelab_followers_names\$user_id)
## four-set Venn diagram
area1 = nrow(clathrin_followers_names),
area2 = nrow(quantixed_followers_names),
area3 = nrow(cmcb_followers_names),
area4 = nrow(roylelab_followers_names),
n12 = length(anb),
n13 = length(anc),
n14 = length(and),
n23 = length(bnc),
n24 = length(bnd),
n34 = length(cnd),
n123 = length(anbnc),
n124 = length(anbnd),
n134 = length(ancnd),
n234 = length(bncnd),
n1234 = length(anbncnd),
category = c("Clathrin", "quantixed", "CMCB", "RoyleLab"),
fill = c("dodgerblue1", "red", "goldenrod1", "green"),
lty = "dashed",
cex = 2,
cat.cex = 1.5,
cat.col = c("dodgerblue1", "red", "goldenrod1", "green"),
fontfamily = "Helvetica",
cat.fontfamily = "Helvetica"
);
# write to file
grid.draw(venn.plot);
dev.off()


I’ll probably return to rtweet in future and will recycle the title if I do.

Like last time, the post title is from “I’m Not Following You” the final track from the 1997 LP of the same name from Edwyn Collins

## Frankly, Mr. Shankly

I read about Antonio Sánchez Chinchón’s clever approach to use the Travelling Salesperson algorithm to generate some math-art in R. The follow up was even nicer in my opinion, Pencil Scribbles. The subject was Boris Karloff as the monster in Frankenstein. I was interested in running the code (available here and here), so I thought I’d run it on a famous scientist.

By happy chance one of the most famous scientists of the 20th Century, Rosalind Franklin, shares a nominative prefix with the original subject. There is also a famous portrait of her that I thought would work well.

I first needed needed to clear up the background because it was too dark.

Now to run the TSP code.

The pencil scribbles version is nicer I think.

The R scripts basically ran out-of-the-box. I was using a new computer that didn’t have X11quartz on it nor the packages required, but once that they were installed I just needed to edit the line to use a local file in my working directory. The code just ran. The outputs FrankyTSP and Franky_scribbles didn’t even need to be renamed, given my subject’s name.

Thanks to Antonio for making the code available and so easy to use.

The post title comes from “Frankly, Mr. Shankly” by The Smiths which appears on The Queen is Dead. If the choice of post title needs an explanation, it wasn’t a good choice…

## Paintball’s Coming Home: generating Damien Hirst spot paintings

A few days ago, I read an article about Damien Hirst’s new spot paintings. I’d forgotten how regular the spots were in the original spot paintings from the 1990s (examples are on his page here). It made me think that these paintings could be randomly generated and so I wrote a quick piece of code to do this (HirstGenerator).

I used Hirst’s painting ‘Abalone Acetone Powder’ (1991), which is shown on this page as photographed by Alex Hartley. A wrote some code to sample the colours of this image and then a script to replicate it. The original is shown below  © Damien Hirst and Science Ltd. Click them for full size.

and then this is the replica:

Now that I had a palette of the colours used in the original. It was simple to write a generator to make spot paintings where the spots are randomly assigned.

The generator can make canvasses at whatever size is required.

The code can be repurposed to make spot paintings with different palettes from his other spot paintings or from something else. So there you have it. Generative Hirst Spot Paintings.

For nerds only

My original idea was to generate a palette of unique colours from the original painting. Because of the way I sampled them, each spot is represented once in the palette. This means the same colour as used by the artist is represented as several very similar but nonidentical colours in the palette. My original plan was to find the euclidean distances between all spots in RGB colour space and to establish a distance cutoff to decide what is a unique colour.

That part was easy to write but what value to give for the cutoff was tricky. After some reading, it seems that other colour spaces are better suited for this task, e.g. converting RGB to a CIE colour space. For two reasons, I didn’t pursue this. First, quantixed coding is time-limited. Second. assuming that there is something to the composition of these spot paintings (and they are not a con trick) the frequency of spots must have artistic merit and so they should be left in the palette for sampling in the generated pictures. The representation of the palette in RGB colour space had an interesting pattern (shown in the GIF above).

The post title comes from “Paintball’s Coming Home” by Half Man Half Biscuit from Voyage To The Bottom Of The Road. Spot paintings are kind of paintballs, but mostly because I love the title of this song.

## Esoteric Circle

Many projects in the lab involve quantifying circular objects. Microtubules, vesicles and so on are approximately circular in cross section. This quick post is about how to find the diameter of these objects using a computer.

So how do you measure the diameter of an object that is approximately circular? Well, if it was circular you would measure the distance from one edge to the other, crossing the centre of the object. It doesn’t matter along which axis you do this. However, since these objects are only approximately circular, it matters along which axis you measure. There are a couple of approaches that can be used to solve this problem.

Principal component analysis

The object is a collection of points* and we can find the eigenvectors and eigenvalues of these points using principal component analysis. This was discussed previously here. The 1st eigenvector points along the direction of greatest variance and the 2nd eigenvector is normal to the first. The order of eigenvectors is determined by their eigenvalues. We use these to rotate the coordinate set and offset to the origin.

Now the major axis of the object is aligned to the x-axis at y=0 and the minor axis is aligned with the y-axis at x=0 (compare the plot on the right with the one on the left, where the profiles are in their original orientation – offset to zero). We can then find the absolute values of the axis crossing points and when added together these represent the major axis and minor axis of the object. In Igor, this is done using a oneliner to retrieve a rotated set of coords as the wave M_R.

PCA/ALL/SEVC/SRMT/SCMT xCoord,yCoord

To find the crossing points, I use Igor’s interpolation-based level crossing functions. For example, storing the aggregated diameter in a variable called len.

FindLevel/Q/EDGE=1/P m1c0, 0
len = abs(m1c1(V_LevelX))
FindLevel/Q/EDGE=2/P m1c0, 0
len += abs(m1c1(V_LevelX))

This is just to find one axis (where m1c0 and m1c1 are the 1st and 2nd columns of a 2-column wave m1) and so you can see it is a bit cumbersome.

Anyway, I was quite happy with this solution. It is unbiased and also tells us how approximately circular the object is (because the major and minor axes tell us the aspect ratio or ellipticity of the object). I used it in Figure 2 of this paper to show the sizes of the coated vesicles. However, in another project we wanted to state what the diameter of a vesicle was. Not two numbers, just one. How do we do that? We could take the average of the major and minor axes, but maybe there’s an easier way.

Polar coordinates

The distance from the centre to every point on the edge of the object can be found easily by converting the xy coordinates to polar coordinates. To do this, we first find the centre of the object. This is the centroid $$(\bar{x},\bar{y})$$ represented by

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_{i}$$ and $$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_{i}$$

for n points and subtract this centroid from all points to arrange the object around the origin. Now, since the xy coords are represented in polar system by

$$x_{i} = r_{i}\cos(\phi)$$ and $$y_{i} = r_{i}\sin(\phi)$$

we can find r, the radial distance, using

$$r_{i} = \sqrt{x_{i}^{2} + y_{i}^{2}}$$

With those values we can then find the average radial distance and report that.

There’s something slightly circular (pardon the pun) about this method because all we are doing is minimising the distance to a central point initially and then measuring the average distance to this minimised point in the latter step. It is much faster than the PCA approach and would be insensitive to changes in point density around the object. The two methods would probably diverge for noisy images. Again in Igor this is simple:

Make/O/N=(dimsize(m1,0)-1)/FREE rW

rW[] = sqrt(m1[p][0]^2 + m1[p][1]^2)

len = 2 * mean(rW)
Here again, m1 is the 2-column wave of coords and the diameter of the object is stored in len.

How does this compare with the method above? The answer is kind of obvious, but it is equidistant between the major and minor axes. Major axis is shown in red and minor axis shown in blue compared with the mean radial distance method (plotted on the y-axis). In places there is nearly a 10 nm difference which is considerable for objects which are between 20 and 35 nm in diameter. How close is it to the average of the major and minor axis? Those points are in black and they are very close but not exactly on y=x.

So for simple, approximately circular objects with low noise, the ridiculously simple polar method gives us a single estimate of the diameter of the object and this is much faster than the more complex methods above. For more complicated shapes and noisy images, my feeling is that the PCA approach would be more robust. The two methods actually tell us two subtly different things about the shapes.

Why don’t you just measure them by hand?

In case there is anyone out there wondering why a computer is used for this rather than a human wielding the line tool in ImageJ… there are two good reasons.

1. There are just too many! Each image has tens of profiles and we have hundreds of images from several experiments.
2. How would you measure the profile manually? This approach shows two unbiased methods that don’t rely on a human to draw any line across the object.

* = I am assuming that the point set is already created.

The post title is taken from “Esoteric Circle” by Jan Garbarek from the LP of the same name released in 1969. The title fits well since this post is definitely esoteric. But maybe someone out there is interested!