Tips from the blog X: multi-line commenting in Igor

This is part-tip, part-adventures in code. I found out recently that it is possible to comment out multiple lines of code in Igor and thought I’d put this tip up here.

Multi-line commenting in programming is useful for two reasons:

  1. writing comments (instructions, guidance) that last more than one line
  2. the ability to temporarily remove a block of code while testing

In each computer language there is the ability to comment out at least one line of code.

In Igor this is “//”, which comments out the whole line, but no more.

ipcomment1

This is the same as in ImageJ macro language.

ijcomment1

Now, commenting out whole sections in FIJI/ImageJ is easy: insert “/*” where you want the comment to start and “*/” where it ends, multiple lines later.

ijcomment2

I didn’t think this syntax was available in Igor, and it isn’t really. I was manually adding “//” for each line I wanted to remove, which was annoying. It turns out that you can use Edit > Commentize to add “//” to the start of all selected lines. The keyboard shortcut in IP7 is Cmd-/. You can reverse the process with Edit > Decommentize or Cmd-\.

ipcomment2

There is actually another way. Igor can conditionally compile code. This is useful if for example you write for Igor 7 and Igor 6. You can get compilation of IP7 commands only if the user is running IP7 for example. This same logic can be used to comment out code as follows.

ipcomment3

The condition “#if 0” is never satisfied, so the code between it and the matching “#endif” is not compiled. The equivalent statement for IP7-specific compilation is “#if igorversion()>=7”.

So there you have it, two ways to comment out code in Igor. These tips were from IgorExchange.

If you want to read more about commenting in different languages and the origins of comments, read here.

This post is part of a series of tips.

Tips from the blog IX: running route

University of Warwick is a popular conference destination, with thousands of visitors per year. Next time you visit and stay on campus, why not bring your running shoes and try out these routes?

Track 1

track1

This is just over 10K and it takes you from main campus out towards Cryfield Pavilion. A path goes to the Greenway (a former railway), which is a nice flat gravel track. It goes up to Burton Green and back to campus via Westwood Heath Road. To exit the Greenway at Burton Green you need to take the “offramp” at the bridge, otherwise you will end up heading to Berkswell. If you want to run totally off-road*, just turn back at this point (probably ~12K). The path out to the Greenway and the Greenway itself are unlit, so be careful early in the morning or late at night.

GPX of a trace put together on gpsies.

Track 2

track2

This is a variation on Track 1. Instead of heading up the Greenway to Burton Green, take a left and head towards Kenilworth Common. With a bit of navigation you can run alongside a brook and pop out in Abbey Fields and see the ruins of Kenilworth Abbey. This is out-and-back, 12K. Obviously you can turn back sooner if you prefer. It’s all off-road apart from a few hundred metres on quiet residential streets as you navigate from the Common to Abbey Fields. GPX from Uni to around the lake at Abbey Fields.

Track 3

track3

 

This is a variation on Track 1 where you exit the Greenway and take a loop around Crackley Wood. The Wood is nice and has wild deer and other interesting wildlife. This route is totally off-road and is shorter at ~8K. GPX from Uni to around the Wood.

 

Other Routes

There is a footpath next to a bike lane down the A429 which is popular for runners heading to do a lap or two of Memorial Park in Coventry. This is OK, but means that you run alongside cars a lot.

If you don’t have time for these routes, the official Warwick page has three very short running routes of around 3 to 5 km (1, 2 and 3). I think that these routes are the ones that are on the signpost near the Sports Centre.

* Here, off-road means on paths but not alongside a road on a pavement. It doesn’t mean across fields.


Tips from the blog VIII: Time Machine on QNAP NAS

This is just a quick tip as it took me a little while to sort this out. In the lab we have two QNAP TS-869 Pro NAS devices. Each was set up with a single RAID6 storage pool and I ran them as a primary and replicant via rsync. We recently bought a bigger server and so the plan was to repurpose one of the NAS boxes to be a Time Machine for all the computers in the lab.

We have around 10 computers that use a Time Machine for small documents that reside on each computer (primary data is always on the server). So far, I’d relied on external hard drives to do this. However: 1) they fail, 2) they can get unplugged accidentally and fail to back up and 3) they take up space.

As always the solution is simple and I’ll outline this with the drawbacks and then describe the other things I did to save you wasting time.

  1. Wipe the NAS. I went for RAID5, rather than RAID6. I figured this is safe enough. The NAS emails me if there is a problem with any of the disks and they can be replaced.
  2. Enable Time Machine. In the QNAP interface in Backup Server>Time Machine, click Enable Time Machine support. Set a limit if you like or 0 for no limit. Add a Username and Password.
  3. Pick the NAS as the Time Machine disk. On each Mac, wait for the current backup to complete, then turn Time Machine off. In Time Machine Preferences, pick a new disk. It will see your QNAP NAS Time Machine share. Click on it, enter the username and password, and click OK. Don’t select the option to use both disks (available in Yosemite onwards).
  4. That’s it. Wait for backup to complete, check it. Unplug external HD and repurpose.

You don’t need each user to have a share or an account on the NAS. You don’t need to mount the volume on the Mac at startup. The steps above are all you need to do.

The major drawback is that all users share the same Time Machine space. In theory, one person could swamp the backup with their files and this will limit how far back all users can go in the Time Machine. The NAS is pretty big, so I think this will be OK. A rule for putting data/big files in a directory on a user’s Mac and then excluding this directory from the Backup seems the obvious workaround.

What not to try

There is a guide on the web (on the QNAP website!) which shows you how to add Time Machine support for a Mac. It requires some command line action and is pretty complicated. Don’t do it. My guess is that this was the original hack and this has now been superseded by the “official” support on the QNAP interface. I’m not sure why they haven’t taken this guide down. There is another page on the site, outlining the steps above. Just use that.

I read on the QNAP forums about the drawback of using the official Time Machine backup (i.e. all users having to share). My brainwave was to create a user for each Mac, enable a home folder, limit the account with a quota and then set the Mac to mount the volume on startup. This could then be the Time Machine and allow me to limit the size of the Time Machine backup on a per user basis. It didn’t work! Time Machine needs a specially formatted volume to work and so it cannot use a home folder in this way. Maybe there is a way to get this to work – it would be a better solution – but I couldn’t manage it.

So there you go. If you have one of these devices, this is what worked for us.

Tips from the blog VII: Recolour Z-stack and Save Projection

I’m putting this up here in case it is useful for somebody.

We capture Z-stacks on a Perkin Elmer Spinning Disk microscope system. I wanted to turn each stack into a single image so that we could quickly compare them. This simple macro does the job.

  1. We import the images straight from the *.mvd2 library using the wonderful BioFormats import tool. We open all files as composite hyperstacks.
  2. This Macro is then run (works on all of the open images).
  3. It finds the mid-Z point of the stack and sets the brightness/contrast for each channel to Auto.
  4. It then recolours the channels to the right colours (we capture DAPI then GFP then mCherry, but the import colours them red, green and blue, respectively).
  5. Then it makes a z-projection and saves the file in a directory that you specify at the start.
  6. It closes the projection and the z-stack.

You can then open up all of the projections, tile them and take a look at what was going on in the experiment.


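// Process every open image without displaying intermediate windows
setBatchMode(true);
imgArray = newArray(nImages);
dir1 = getDirectory("Choose Destination Directory ");
// Store the ID of each open image so that closing windows does not upset the loop
for (i=0; i<nImages; i++) {
	selectImage(i+1);
	imgArray[i] = getImageID();
	}
for (i=0; i<imgArray.length; i++) {
	selectImage(imgArray[i]);
	id = getImageID();
	win = getTitle();
	// getDimensions returns width, height, channels, slices, frames
	getDimensions(w, h, c, slices, frames);
	// Go to the middle of the z-stack
	Stack.setSlice(round(slices/2));
	// Auto-contrast each of the three channels in turn
	// ("Next Slice [>]" steps to the next channel in a composite hyperstack)
	run("Enhance Contrast", "saturated=0.35");
	run("Next Slice [>]");
	run("Enhance Contrast", "saturated=0.35");
	run("Next Slice [>]");
	run("Enhance Contrast", "saturated=0.35");
	// Recolour: channel 1 = DAPI (blue), 2 = GFP (green), 3 = mCherry (red)
	Stack.setChannel(1);
	run("Blue");
	Stack.setChannel(2);
	run("Green");
	Stack.setChannel(3);
	run("Red");
	// Make a maximum intensity projection and save it under the stack's title
	run("Z Project...", "projection=[Max Intensity]");
	saveAs("TIFF", dir1+win);
	// Close the projection, then the original z-stack
	close();
	selectImage(id);
	close();
	}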
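If you think more easily in Python than in the ImageJ macro language, the two numerical steps of the macro (finding the mid-Z slice and making a maximum intensity projection) can be sketched as below. This is only an illustration: nested lists stand in for a z-stack, and the function names mid_slice_index and max_projection are made up for this example and are not part of ImageJ.

```python
def mid_slice_index(n_slices):
    """1-based index of the middle slice, mirroring round(slices/2) in the macro."""
    return max(1, int(n_slices / 2 + 0.5))


def max_projection(stack):
    """Maximum intensity projection of a z-stack stored as stack[z][y][x]."""
    height = len(stack[0])
    width = len(stack[0][0])
    return [[max(plane[y][x] for plane in stack) for x in range(width)]
            for y in range(height)]


# A toy 3-slice, 2x2 pixel stack (made-up values)
stack = [
    [[1, 5], [0, 2]],
    [[4, 3], [7, 1]],
    [[2, 9], [6, 8]],
]
projection = max_projection(stack)
middle = mid_slice_index(len(stack))
```

Each output pixel is simply the brightest value found at that position anywhere in the stack, which is why a projection is a quick way to compare whole stacks at a glance.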

Tips from the blog is a series and gets its name from a track from Black Sunday by Cypress Hill.

Tips from the blog VI: doc to pdf

A while back I made this little Automator script to convert Microsoft Word doc and docx files to PDF. It’s useful for when you are sent a bunch of Word files for committee work. Opening PDFs in Preview is nice and hassle-free. Struggling with Word is not.

It’s not my own work, I just put it together after googling around a bit. I’ll put it here for anyone to use.

To get it working:

  1. Open Automator. Choose the Service template and set: Service receives selected files and folders in Finder.app
  2. Add the actions: Get Folder Contents and Run AppleScript (this is where the script goes)
  3. Now Save the workflow. Suggested name Doc to PDF.

Now to run it:

  1. Select the doc/docx file(s) in the Finder window.
  2. Right-click and look for the service in the contextual menu. This should be down the bottom near “Reveal in Finder”.
  3. Run it.

If you want to put it onto another computer, go to ~/Library/Services and you will find the saved workflow there. Just copy it to the same location on your other computer.

Known bug: Word has to be open for the script to run. It also doesn’t shut down Word when it’s finished.

Here is the code.

property theList : {"doc", "docx"}

on run {input, parameters}
    set output to {}
    -- remember Word's default documents folder so it can be restored at the end
    tell application "Microsoft Word" to set theOldDefaultPath to get default file path file path type documents path
    repeat with x in input
        try
            set theDoc to contents of x
            tell application "Finder"
                set theFilePath to container of theDoc as text
                set ext to name extension of theDoc
                if ext is in theList then
                    -- strip the extension (and its dot) from the name, then add .pdf
                    set theName to name of theDoc
                    copy length of theName to l
                    copy length of ext to exl
                    set n to l - exl - 1
                    copy characters 1 through n of theName as string to theFilename
                    set theFilename to theFilename & ".pdf"

                    tell application "Microsoft Word"
                        -- point Word at the file's folder so the PDF is saved next to the original
                        set default file path file path type documents path path theFilePath
                        open theDoc
                        set theActiveDoc to the active document
                        save as theActiveDoc file format format PDF file name theFilename
                        copy (POSIX path of (theFilePath & theFilename as string)) to end of output
                        close theActiveDoc
                    end tell
                end if
            end tell
        end try
    end repeat
    -- restore Word's original default documents folder
    tell application "Microsoft Word" to set default file path file path type documents path path theOldDefaultPath

    return output
end run
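The filename handling in the script (check the extension, strip it, add .pdf) is the only real logic in there. As a rough cross-check, here is the same step sketched in Python. The function name pdf_path_for is made up for this example, and the lower-casing of the extension is my assumption about how mixed-case names such as .DOCX should be treated.

```python
from pathlib import Path

WORD_EXTENSIONS = {"doc", "docx"}


def pdf_path_for(doc_path):
    """Return the sibling .pdf path for a Word file, or None for other file types."""
    path = Path(doc_path)
    ext = path.suffix.lstrip(".").lower()
    if ext not in WORD_EXTENSIONS:
        return None
    # Only the final extension is replaced, so "a.b.doc" becomes "a.b.pdf"
    return path.with_suffix(".pdf")
```

Files that are not doc or docx are skipped, just as the AppleScript ignores anything whose extension is not in theList.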

 


Tips from the Blog V: Advice for New PIs

I recently gave a talk at a retreat for new PIs working at QMUL. My talk was focussed on tips for getting started, i.e. the nitty gritty of running an efficient lab. It was a mix of things I’ve been told, worked out for myself or that I’d learned the hard way.

PIs are expected to be able to do lots of things that can be full-time jobs in themselves. In my talk, I focussed on ways to make yourself more efficient, to give yourself as much time as possible to tackle all the different roles that you need to take on. You don’t need to work 80 hours a week to succeed, but you do need to get organised.

1. Timelines

Get a plan together. A long-term (5-year) plan and a shorter (1-2 year) plan. What do you want to achieve in the lab? What papers do you want to publish? How many people do you need in the lab? What grants do you need? When are your next three grant applications due? When is the first one due? Work back from there. It’s January, the first one is due in September, better get that paper submitted! You need a draft application available for circulation to colleagues in good time to do something about the comments. Plan well. Don’t leave anything to the last minute. But don’t apportion too much time as the task will expand to fill it.

Always try to work towards the big goals. It’s too easy to spend all of your time on “urgent” things and busywork (fire-fighting). Prioritise Important over Urgent.

2. Time audit

Doing a time audit is a good way to identify where you are wasting time and how to reorganise your day to be more effective. Do you find it difficult to write first thing in the morning? If so, why not deal with your email or paperwork first thing since it requires less brain activity. Can you work during your commute? Save busywork for then. Can you switch between lab work and desk work well? Where are you fitting in teaching and admin? Try and find out answers to these questions with a time audit. It’s a horrible corporate thing to do, but I found it worked for me.

3. Lab manual

This was a popular idea. Paul Nurse’s lab had one – so should yours! The Royle lab manual has the following sections:

  • Lab organisation
  • Molecular Biology
  • Cell Biology
  • Biochemistry
  • Imaging

The lab organisation section has subsections on 1) how to keep a lab book; 2) lab organisation (databases, plasmid/antibody organisation); 3) computers/data storage; 4) lab calculations; 5) making figures. The other sections are a collection of our tried-and-tested protocols. New protocols are submitted to a share on the server and honed until ready for preservation in the Lab Manual. The idea is that you give the manual to all new starters and get them to stick to it and to consult the manual first when troubleshooting. People in the lab like it, because they are not left guessing exactly what you expect of them.

As part of this, you need to sort out lab databases and a lab server for all of the data. One suggestion was to give one person in the lab the job of looking after (a.k.a. policing) the databases and enforcing compliance. We don’t do this and instead do spot checks every few weeks/months to ensure that things haven’t drifted too far.

Occasionally, and at random, I’ll ask all lab members to bring their lab books to our lab meeting. I ask everyone to swap books with someone else. I then pick a random date and ask person X to describe (using the lab book) what person Y did on that day. It’s a bit of fun, but makes people realise how important keeping a good lab book is.

4. Tame your email

There are lots of tips on how to do this – find something that works for you. For example, I set up several filters/rules that move messages that are low importance away from my inbox. I flag messages and deal with them later if they will take more than 5 sec to deal with. I’ve tried checking at specified times of the day – doesn’t work well for me – but it might for you. Out-of-hours email is a problem. Just remember that no email is so urgent that it cannot wait until the morning – otherwise they would phone you.

5. Automation

Again there are lots of tips out there, e.g. in this post from Sylvain Deville. I have set up macros for routine things like exporting figures in a variety of formats/resolutions and assembling them with a manuscript file to one PDF for circulating around the lab for comment. We have workflows for building figures and writing papers. Anything that you do frequently is worth automating.

6. Deposit your plasmids with Addgene

They’ll distribute them for you! This saves a lot of time. You still get to check who is requesting them if you are curious.

7. Organising frequently-used files

Spend some time making some really good schematic figures. They can be used, reused and rejigged time and again for a variety of purposes – talks, manuscripts etc. It’s worth doing this well and with a diagram that is definitely yours and not plundered from the web. Also, never retype text instructions – save them somewhere and just cut-and-paste. Examples include: answers for common questions from students, details of how to do something in the lab, details of how to get to the lab, brief biography, abstracts for talks…

Have a long-format CV always ready and keep updating it (I’ve not found a good way to automate this, yet). I get asked for my CV all the time for lots of different things. Have the long (master) CV set up so that you can delete sections as appropriate, depending on the purpose. Use the publication list from this for pasting into various boxes that you are required to fill out. An Endnote smart list of all of your papers is also handy for rapidly formatting a list of your papers according to different styles. Try to keep your web profiles up-to-date. If you publish a new paper, add it to your CV and all of your profiles so they don’t look out of date. ORCiD, Researchfish, whatever, try and keep them all current.

Get a slidedeck together of all your slides on a topic. Pull from here to put your talks together. Get a slidepack together to show to visitors to the lab at a moment’s notice. Also, when you publish a new paper, make slides of the final figure versions and add them to the master slidedeck.

8. Alerts

Set up literature alerts. My advice would be don’t have these coming to your inbox. Use RSS. This way you can efficiently mark interesting papers to look at later and keep your email free of clutter. Grab feeds for your favourite journals and for custom pubmed searches. Not just for subject keywords but also for colleagues and scientists who you think do interesting work. Set up Google Scholar to send you papers that have cited your work. Together with paper recommendations from Twitter (or maybe some other services – PubChase etc.) you’ll be as sure as you can be that you’re not missing anything. Also grab feeds from funding agencies, so that you don’t miss calls for grant applications. If all of these things are in place, you don’t need to browse the web which can be a huge time drain.

9. Synchronise

I have several computers synced via Unison (thanks to Daniel and Christophe who suggested this to me years ago). You can do this via Dropbox, but the space is limited. Unison syncs all my documents so that I am always able to work on the latest versions of things wherever I am. This is useful, if for some reason you cannot make it in to work unexpectedly.

10. Paper of the day

This has worked at some level to make sure that I am reading papers regularly. Posts about this here and here.

11. Make use of admin staff

If you have access to administrative staff get them to do as much of your paperwork as is feasible so you can concentrate on other things. And be nice to them! They can help a lot if you are really stuck with something, like an imminent deadline; or they can… be less helpful.

12. Be a good colleague

There’s a temptation to perform badly in tasks so that you don’t get asked again in order to reduce your workload. Don’t do this. It is true that if you are efficient, you will get asked to do more things. This is good (because not all tasks are bad). If you have too much to do, you just need to manage it. Say “No” if your workload is too high. But don’t just do a bad job. This pushes the problem onto your colleagues. If nothing else, you need their help. Also, help your colleagues if they need it. Always make yourself available to comment on their grants and papers. Interacting with colleagues is one of the most fun parts of being a PI.

13. Don’t write a book chapter

It’s a waste of time. Nobody will read it. Nobody will cite it. It will take time away from publishing real papers. Also, think carefully about writing review articles. If you have something unique to say, then go for it. Don’t do it just because you’ve been asked…

In need of some more advice?

This post was focussed on technicalities of running a lab to make things more efficient. There’s obviously lots more to it than this: people management, networking etc.

A great recommendation that I got after I had been a PI for a few years was this excellent book by Kathy Barker, At The Helm: Leading Your Laboratory. I read this and wished I’d found it earlier. The sections on early stage negotiations and planning for the moment you become a PI are great (although it is very US-centric).

I’ve also been told that the EMBO Course for New Investigators is great, although I have not attended it.

Update 12:15 13/7/15: A reader sent this link to me via email. It’s a document from HHMI on scientific management for Postdocs and New PIs. Well worth a read!!

Update 07:41 4/2/15: We now use Trello for organising activities in the lab. You can read about how we do that here. I added the lab book audit anecdote and fixed some typos.

Thanks to attendees of the QMUL ECR Retreat for your feedback on my talk. I also incorporated a few points from Kebs Hodivala-Dilke’s presentation, which was focussed more on the big picture points. If you have any other time-saving tips for PIs, leave a comment.

Tips from the blog IV – averaging

I recently put a code snippet up on IgorExchange. It’s a simple procedure for averaging a set of 1D waves and putting the results in a new wave. The difference between this code and Average Waves.ipf (which ships with Igor) is that this function takes the average of all points in each wave and places this single value in a new wave. You can specify whether the mean or median is used for the average.

avgwaves
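For anyone who doesn’t use Igor, the idea can be sketched in a few lines of Python. This is not the Igor procedure itself, just an illustration of what it does: lists stand in for waves, and the function name average_each_wave and its use_median argument are made up for this example.

```python
import statistics


def average_each_wave(waves, use_median=False):
    """Collapse each 1D wave to a single value and collect the results in a new list."""
    summarise = statistics.median if use_median else statistics.mean
    return [summarise(wave) for wave in waves]


# Three made-up "waves" of different lengths
waves = [[1.0, 2.0, 3.0], [10.0, 20.0, 90.0], [5.0, 5.0]]
means = average_each_wave(waves)
medians = average_each_wave(waves, use_median=True)
```

The second wave shows why the choice matters: one large value pulls the mean a long way from the median.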

I still don’t have a way to mark up Igor code in WordPress.

Tips from the blog III – violin plots

Having recently got my head around violin plots, I thought I would explain what they are and why you might want to use them.

There are several options when it comes to plotting summary data. I list them here in order of granularity, before describing violin plots and how to plot them in some detail.

Bar chart

This is the mainstay of most papers in my field. Typically, a bar representing the mean of the measured values is shown with an error bar which shows either the standard error of the mean, the standard deviation, or more rarely a confidence interval. The two data series plotted in all cases are the waiting time for Old Faithful eruptions (waiting), a classic dataset from R, and a copy of waiting to which I have added some noise (waiting_mod) for comparison. I think it’s fair to say that most people feel that the bar chart has probably had its day and that we should be presenting our data in more informative ways*.

Pros: compact, easy to tell differences between groups

Cons: hides the underlying distribution, obscures the n number

Box plot

The box plot – like many things in statistics – was introduced by Tukey. It’s sometimes known as a Tukey plot, or a box-and-whiskers plot. The idea was to give an impression of the underlying distribution without showing a histogram (see below). Histograms are great, but when you need to compare many distributions they do not overlay well and take up a lot of space to show them side-by-side. In the simplest form, the “box” is the interquartile range (IQR, from the 25th to the 75th percentile) with a line to show the median. The whiskers show the 10th and 90th percentiles. There are many variations on this theme: outliers can be shown or not, the whiskers may show the limits of the dataset (or something else), the boxes can be notched or their width may represent the sample size…

Pros: compact, easy to tell differences between groups, shows normality/skewness

Cons: hides multimodal data, sometimes obscures the n number, many variations
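To make the “simplest form” concrete, here is a small Python sketch that computes the five numbers needed to draw that kind of box plot (box at the 25th and 75th percentiles, line at the median, whiskers at the 10th and 90th). The function name box_plot_summary is made up for this example, and the choice of the “inclusive” interpolation method is my assumption; plotting packages differ in how they interpolate percentiles.

```python
import statistics


def box_plot_summary(values):
    """Percentiles for a simple box plot: 10th/90th whiskers, IQR box, median line."""
    # quantiles(..., n=100) returns the 99 cut points between percentiles 1 and 99
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "whisker_low": cuts[9],    # 10th percentile
        "q1": cuts[24],            # 25th percentile
        "median": cuts[49],        # 50th percentile
        "q3": cuts[74],            # 75th percentile
        "whisker_high": cuts[89],  # 90th percentile
    }


# Made-up data: the integers 1 to 101
summary = box_plot_summary(list(range(1, 102)))
```

Because there are so many variations, it is worth stating in the figure legend exactly which percentiles your whiskers represent.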

Histogram

A histogram is a method of showing the distribution of a dataset and was introduced by Pearson. The number of observations within each bin is counted and plotted. The ‘bars’ sit next to each other, because the variable being measured is continuous. The variable being measured is on the x-axis, rather than the category (as in the other plots).

Often the area of all the bars is normalised to 1 in order to assess the distributions without being confused by differences in sample size. As you can see here, “waiting” is bimodal. This was hidden in the bar chart and in the box plot.

Related to histograms are other display types such as stemplots or stem-and-leaf plots.

Pros: shows multimodal data, shows normality/skewness clearly

Cons: not compact, difficult to overlay, bin size and position can be misleading

Scatter dot plot

It’s often said that if there are fewer than 10 data points, then best practice is to simply show the points. Typically the plot is shown together with a bar to show the mean (or median) and maybe with error bars showing s.e.m., s.d. or IQR. There are a couple of methods of plotting the points, because they need to be scattered in x value in order to be visualised. Adding random noise is one approach, but this looks a bit messy (top). A symmetrical scatter can be introduced by binning (middle) and a further iteration is to bin the y values rather than showing their true location (bottom). There’s a further iteration which constrains the category width and overlays multiple points, but again the density becomes difficult to see.

These plots still look quite fussy. The binned version is the clearest, but then we are losing the exact locations of the points, which seems counterintuitive. Another alternative to scattering the dots is to show a rug plot (see below) where there is no scatter.

Pros: shows all the observations

Cons: can be difficult to assess the distribution

Violin plot

This type of plot was introduced in the software package NCSS in 1997 and described in this paper: Hintze & Nelson (1998) The American Statistician 52(2):181-4 [PDF]. As the title says, violin plots are a synergism between box plot and density trace. A thin box plot is shown together with a symmetrical kernel density estimate (KDE, see explanation below). The point is to be able to quickly assess the distribution. You can see the bimodality of waiting in the plot, but without the complication of lots of points: there is just a smooth curve to show the data.

Pros: shows multimodal data, shows normality/skewness unambiguously

Cons: hides n, not familiar to many readers.

* Why is the bar chart so bad and why should I show my data another way?

The best demonstration of why the bar chart is bad is Anscombe’s Quartet (the figure to the right is taken from the Wikipedia page). These four datasets are completely different, yet they all have the same summary statistics. The point is, you would never know unless you plotted the data. A bar chart would look identical for all four datasets.
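You can check this for yourself. The short Python sketch below computes the mean and standard deviation of the y values in each of the four datasets, which is all that a bar chart with error bars would show. The numbers are the standard published values of the quartet, typed in by hand; the variable names are just for this example.

```python
import statistics

# y values of Anscombe's quartet (datasets I to IV)
quartet_y = {
    "I":   [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II":  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV":  [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}

# Mean and sample standard deviation: what a bar and its error bar would display
summaries = {
    name: (statistics.mean(y), statistics.stdev(y))
    for name, y in quartet_y.items()
}

for name, (mean, sd) in summaries.items():
    print(f"{name}: mean = {mean:.2f}, s.d. = {sd:.2f}")
```

All four bars come out at the same height with the same error bar, even though a scatter plot of each dataset looks nothing like the others.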

Making Violin Plots in IgorPro

I wanted to make Violin Plots in IgorPro, since we use Igor for absolutely everything in the lab. I wrote some code to do this and I might make some improvements to it in the future – if I find the time! This was an interesting exercise, because it meant forcing myself to understand how smoothing is done. What follows below is an aide memoire, but you may find it useful.

What is a kernel density estimate?

A KDE is a non-parametric method to estimate a probability density function of a variable. A histogram can be thought of as a simplistic non-parametric density estimate. Here, a rectangle is used to represent each observation and it gets bigger the more observations are made.

What’s wrong with using a histogram as a KDE?

The following examples are taken from here (which in turn are taken from the book by Bowman and Azzalini described below). A histogram is simplistic. We lose the location of each datapoint because of binning. Histograms are not smooth and the estimate is very sensitive to the size of the bins and also the starting location of the first bin. The histograms to the right show the same data points (in the rug plot).

Using the same bin size, they result in very different distributions depending on where the first bin starts. My first instinct to generate a KDE was to simply smooth a histogram, but this is actually quite inaccurate as it comes from a lossy source. Instead we need to generate a real KDE.
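The sensitivity to where the first bin starts is easy to reproduce. In this Python sketch the data points are made up (they are not the Bowman & Azzalini example) and the function name histogram_counts is just for illustration; the bin width is the same in both cases and only the origin moves.

```python
def histogram_counts(values, origin, bin_width, n_bins):
    """Count observations in n_bins equal-width bins starting at origin."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - origin) // bin_width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# Made-up observations sitting close to the bin edges
values = [1.9, 2.1, 3.9, 4.1, 5.9, 6.1]

counts_a = histogram_counts(values, origin=0.0, bin_width=2.0, n_bins=4)
counts_b = histogram_counts(values, origin=1.0, bin_width=2.0, n_bins=4)
```

With the first bin starting at 0 the histogram looks peaked in the middle; shift the origin to 1 and the same data look flat with an empty bin at the end.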

How do I make a KDE?

To do this we place a kernel (a Gaussian is commonly used) at each data point. The rationale behind this is that each observation can be thought of as being representative of a greater number of observations. It sounds a bit bizarre to assume normality to estimate a density non-parametrically, but it works. We can sum all of the kernels to give a smoothed distribution: the KDE. Easy? Well, yes as long as you know how wide to make the kernels. To do this we need to find the bandwidth, h (also called the smoothing parameter).
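Summing the kernels is simple enough to write out directly. The Python sketch below is the brute-force version (one Gaussian per observation, summed at every grid point), not the FastGaussTransform shortcut that the Igor code uses. The bandwidth h is just passed in as a number here, and the data values and function names are made up for the example.

```python
import math


def gaussian_kernel(u):
    """Standard normal density evaluated at u."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(data, grid, h):
    """Sum one Gaussian kernel of bandwidth h per observation, evaluated on grid."""
    n = len(data)
    return [sum(gaussian_kernel((x - xi) / h) for xi in data) / (n * h)
            for x in grid]


data = [0.0, 1.0, 4.0]                         # made-up observations
step = 0.01
grid = [-5.0 + i * step for i in range(1401)]  # evaluation points from -5 to 9
density = kde(data, grid, h=0.5)
area = sum(density) * step                     # a density should integrate to ~1
```

Dividing by n*h is what makes the result a proper density: each kernel contributes an area of 1/n, so the total area is 1 regardless of how many observations there are.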

It turns out that this is not completely straightforward. The answer is summarised in this book: Bowman & Azzalini (1997) Applied Smoothing Techniques for Data Analysis. In the original paper on violin plots, the authors do not actually have a good solution for selecting h for drawing the violins, and they suggest trying several different values. They recommend ~15% of the data range as a good starting point. Obviously if you are writing some code, the process of selecting h needs to be automatic.

Optimising h is necessary because if h is too large, the estimate will be oversmoothed and features will be lost. If h is too small, then it will be undersmoothed and bumpy. The examples to the right (again, taken from Bowman & Azzalini, via this page) show undersmoothed, oversmoothed and optimal smoothing.

An optimal solution to find h is

\(h = \left(\frac{4}{3n}\right)^{\frac{1}{5}}\sigma\)

This is termed Silverman’s rule-of-thumb. If smoothing is needed in more than one dimension, the multidimensional version is

\(h = \left\{\frac{4}{\left(p+2\right)n}\right\}^{\frac{1}{\left(p+4\right)}}\sigma\)
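Both formulae fit in one small Python function, since setting p = 1 in the multidimensional version gives the 1D rule. This is a sketch rather than the Igor code: the function name normal_optimal_bandwidth is made up, and using the sample standard deviation (n-1 denominator) for \(\sigma\) is my assumption. In the multidimensional case you would call it once per dimension with that dimension’s data.

```python
import statistics


def normal_optimal_bandwidth(data, p=1):
    """Bandwidth h = {4 / ((p + 2) n)}^(1 / (p + 4)) * sigma for p dimensions."""
    n = len(data)
    sigma = statistics.stdev(data)  # sample standard deviation (assumption)
    return (4.0 / ((p + 2) * n)) ** (1.0 / (p + 4)) * sigma


data = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up observations
h_1d = normal_optimal_bandwidth(data)        # (4 / 3n)^(1/5) * sigma
h_2d = normal_optimal_bandwidth(data, p=2)   # (1 / n)^(1/6) * sigma
```

Note that h shrinks only slowly as n grows (as n to the power -1/5 in 1D), so even large datasets get a fair amount of smoothing.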

bowman3dYou might need multidimensional smoothing to contextualise more than one parameter being measured. The waiting data used above describes the time to wait until the next eruption from Old Faithful. The duration of the eruption is measured, and also the wait to the next eruption can be extracted, giving three parameters. These can give a 3D density estimate as shown here in the example.

Bowman & Azzalini recommend that, if the distribution is long-tailed, the median absolute deviation is used as a robust estimator for \(\sigma\).

\(\tilde\sigma=median\left\{|y_i-\tilde\mu|\right\}/0.6745\)

where \(\tilde\mu\) is the median of the sample. You don’t need to worry about any of this if you use R to plot violins: the implementation there is rock solid, having been written in S-Plus and then ported to R years ago. You can even pick how the h selection is done from sm.density, or modify the optimal h directly using hmult.
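If you are not using R, the robust estimate of \(\sigma\) given above is only a couple of lines. Here is a sketch in Python; the function name and the example numbers are made up for illustration.

```python
import statistics

def robust_sigma(sample):
    """Median absolute deviation from the median, divided by 0.6745."""
    mu = statistics.median(sample)
    deviations = [abs(y - mu) for y in sample]
    return statistics.median(deviations) / 0.6745

# A single extreme value inflates the ordinary standard deviation
# far more than it affects the robust estimate.
sample = [1, 2, 3, 4, 100]
sigma_robust = robust_sigma(sample)
sigma_plain = statistics.stdev(sample)
```

Either value can then be used as \(\sigma\) in the formula for h.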

To get this working in IgorPro, I used some code for 1D KDE that was already on IgorExchange. It needed a bit of modification because it used FastGaussTransform to sum the kernels as a shortcut. It’s a very fast method, but initially gave an estimate that seemed to be undersmoothed. I spent a while altering the formula for h, hence the detail above. To cut a long story short, FastGaussTransform uses Taylor expansion of the Gauss transform and it just needed more terms to do this accurately. This is set with the /TET flag. Note also that in Igor the width of a Gauss is sigma*2^(1/2), i.e. \(\sqrt{2}\sigma\).
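To spell out that last point: assuming the Gaussian is written in the form \(\exp\left\{-\left(\frac{x-x_0}{w}\right)^2\right\}\) (this is my reading of the width convention, so check it against the documentation for the operation you are using), matching it to a normal kernel with standard deviation \(\sigma\) gives

\(\exp\left\{-\frac{\left(x-x_0\right)^2}{2\sigma^2}\right\}=\exp\left\{-\left(\frac{x-x_0}{w}\right)^2\right\}\quad\Rightarrow\quad w=\sqrt{2}\,\sigma\)

So a bandwidth h calculated as a standard deviation needs to be multiplied by \(\sqrt{2}\) before it is used as a width.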

OK, so how do I make a Violin for plotting?

I used the draw tools to do this and placed the violins behind an existing box plot. This is necessary to be able to colour the violins (apparently transparency is coming to Igor in IP7). The other half of the violin needs to be calculated and then joined by the DrawPoly command. If the violins are trimmed, i.e. cut at the limits of the dataset, then this requires an extra point to be added. Without trimming, this step is not required. The only other issue is how wide the violins are plotted. In R, the violins are all normalised so that information about n is lost. In the current implementation, box width is 0.1 and the violins are normalised to the area under the curve*(0.1/2). So, again information on n is lost.
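The geometry is easier to see in code. This is a Python sketch of the mirroring step only, not the Igor procedure itself: the function name is made up, and the normalisation (divide by the area under the curve, then multiply by a scale factor) is just one reading of the description above.

```python
def violin_polygon(y, density, centre, scale):
    """Return a closed list of (x, y) vertices outlining one violin.

    y and density are the evaluation points and the KDE at those points.
    The density is divided by the area under the curve and multiplied by
    scale to give the half-width of the violin at each y.
    """
    # Area under the density trace by the trapezium rule
    area = sum(0.5 * (density[i] + density[i + 1]) * (y[i + 1] - y[i])
               for i in range(len(y) - 1))
    half = [scale * d / area for d in density]
    right = [(centre + w, yi) for w, yi in zip(half, y)]
    left = [(centre - w, yi) for w, yi in zip(half, y)]
    # Go up the right-hand side, then back down the mirrored left-hand side
    outline = right + left[::-1]
    # Repeat the first vertex so the outline is explicitly closed
    outline.append(outline[0])
    return outline

# Toy triangular "density" for a violin centred on x = 1
outline = violin_polygon([0, 1, 2], [0.0, 1.0, 0.0], centre=1.0, scale=0.05)
```

When the density does not fall to zero at the ends (a trimmed violin), the first and last half-widths are non-zero, so the closing vertex is what draws the flat edge along the bottom.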

Future improvements

Ideas for developments of the Violin Plot method in IgorPro

  • incorporate it into the ipf for making boxplots so that it is integrated as an option to ‘calculate percentiles’
  • find a better solution for setting the width of the violin
  • add other bandwidth options, as in R
  • add more options for colouring the violins

What do you think? Did I miss something? Let me know in the comments.

References

Bowman, A.W. & Azzalini, A. (1997) Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations. Oxford University Press.

Hintze, J.L. & Nelson, R.D. (1998) Violin plots: A Box Plot-Density Trace Synergism. The American Statistician, 52:181-4.

Tips from the Blog II

An IgorPro tip this week. The default font for plots is Geneva. Most of our figures are assembled using Helvetica for labelling. The default font can be changed in Igor Graph Preferences, but Preferences need to be switched on in order to be implemented. Anyway, I always seem to end up with a mix of Geneva plots and Helvetica plots. This can be annoying as the fonts are pretty similar yet the spacing is different and this can affect the plot size. Here is a quick procedure Helvetica4All() to rectify this for all graph windows.

Helvetica4All

Tips from the Blog I

What is the best music to listen to while writing a manuscript or grant proposal? OK, I know that some people prefer silence and certainly most people hate radio chatter while trying to concentrate. However, if you like listening to music, setting an iPod on shuffle is no good since a track by Napalm Death can jump from the speakers and affect your concentration. Here is a strategy for a randomised music stream of the right mood and with no repetition, using iTunes.

For this you need:
A reasonably large and varied iTunes library that is properly tagged*.

1. Set up the first smart playlist to select all songs in your library that you like to listen to while writing. I do this by selecting genres that I find conducive to writing.
Conditions are:
-Match any of the following rules
-Genre contains jazz
-add as many genres as you like, e.g. shoegaze, space rock, dream pop etc.
-Don’t limit and do check live updating
I call this list Writing

2. Set up a second smart playlist that makes a randomised novel list from the first playlist
Conditions are:
-Match all of the following rules
-Playlist is Writing   //or whatever you called the 1st playlist
-Last played is not in the last 14 days    //this means once the track is played it disappears, i.e. refreshes constantly
-Limit to 50 items selected by random
-Check Live updating
I call this list Writing List
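If it helps to see the logic of the two playlists written down, here is a rough sketch in Python. It has nothing to do with how iTunes works internally: the track dictionaries, the function name and the example library are all invented, and I have assumed that a track that has never been played counts as "not in the last 14 days".

```python
import random
from datetime import datetime, timedelta

def writing_list(library, genres, now, limit=50, days=14, rng=random):
    """Mimic the two smart playlists: genre filter, then a fresh random pick."""
    cutoff = now - timedelta(days=days)
    # Playlist 1 (Writing): match ANY of the genre rules
    writing = [t for t in library
               if any(g in t["genre"].lower() for g in genres)]
    # Playlist 2 (Writing List): drop anything played in the last `days` days
    fresh = [t for t in writing
             if t["last_played"] is None or t["last_played"] < cutoff]
    # Limit to `limit` items selected by random
    return rng.sample(fresh, min(limit, len(fresh)))

now = datetime(2015, 6, 1)
library = [
    {"title": "A", "genre": "Jazz", "last_played": None},
    {"title": "B", "genre": "Grindcore", "last_played": None},
    {"title": "C", "genre": "Dream Pop", "last_played": now - timedelta(days=2)},
    {"title": "D", "genre": "Space Rock", "last_played": now - timedelta(days=30)},
]
picked = writing_list(library, ["jazz", "dream pop", "space rock"], now)
```

Track B never gets in because of its genre, and track C drops out until 14 days have passed since it was last played.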

That’s it! Now play from Writing List while you write. The same strategy works for other moods, e.g. for making figures I like to listen to different music and so I have another pair for that.

After a while, the tracks that you’ve skipped (for whatever reason) clog up the playlist. Just select all and delete from the smart playlist; this refreshes the list and you can go again with a fresh set.

* If your library has only a few tracks, or has plenty of tracks but they are all of a similar genre, this tip is not for you.