A Day In The Life III

This year #paperOTD (or paper of the day for any readers not on Twitter) did not go well for me. I’ve been busy with lots of things and I’m now reviewing more grants than last year because I am doing more committee work. This means I am finding less time to read one paper per day. Nonetheless I will round up the stats for this year. I only managed to read a paper on 59.2% of available days…

The top ten journals that published the papers that I read:

• 1 Nat Commun
• 2 J Cell Biol
• 3 Nature
• 4= Cell
• 4= eLife
• 4= Traffic
• 7 Science
• 8= Dev Cell
• 8= Mol Biol Cell
• 8= Nat Cell Biol

Nature Communications has published some really nice cell biology this year and I’m not surprised it’s number one. Also, I read more papers in Cell this year compared to last. The papers I read are mainly recent. Around 83% of the papers were published in 2015. Again, a significant fraction (42%) of the papers have statistical errors. Funnily enough there were no preprints in my top ten. I realised that I tend to read these when approving them as an affiliate (thoroughly enough for #paperOTD if they interest me) but I don’t mark them in the database.

I think my favourite paper was this one on methods to move organelles around cells using light, see also this paper for a related method. I think I’ll try again next year to read one paper per day. I’m a bit worried that if I don’t attempt this, I simply won’t read any papers in detail.

I also resolved to read one book per month in 2015. I managed this in 2014, but fell short in 2015 just like with #paperOTD. The best book from a limited selection was Matthew Cobb’s Life’s Greatest Secret. A tale of the early days of molecular biology, as it happened. I was a bit sceptical that Matthew could bring anything new to this area of scientific history. Having read Eighth Day of Creation, and then some pale imitations, I thought that this had pretty much been covered completely. This book however takes a fresh perspective and it’s worth reading. Matthew has a nice writing style, animating the dusty old main characters with insightful detail as he goes along. Check it out.

This blog is going well, with readership growing all the time. I have written about this progress previously (here and here). The most popular posts are those on publishing: preprints, impact factors and publication lag times, rather than my science, but that’s OK. There is more to come on lag times in the New Year, stay tuned.

I am a fan of year-end lists as you may be able to tell. My album of the year is Battles – La Di Da Di which came out on Warp in September. An honourable mention goes to Air Formation – Were We Ever Here EP which I bought on iTunes since the 250 copies had long gone by the time I discovered it on AC30.

Since I don’t watch TV or go to the cinema, I don’t have a pick of the year for that. When it comes to pro-cycling, of course I have an opinion. My favourite stage race was Critérium du Dauphiné Libere which was won by Chris Froome in a close contest with Tejay van Garderen. The best one-day race was a tough pick between E3 Harelbeke won by Geraint Thomas and Omloop Het Nieuwsblad won by Ian Stannard. Although E3 was a hard man’s race in tough conditions, I have to go for Stannard outfoxing three(!) Etixx Quick Step riders to take the win in Nieuwsblad. I’m a bit annoyed that those three picks all involve Team Sky and British riders…. I won’t bore everyone with my own cycling (and running) exploits in 2015. Just to say that I’ve been more active this year than in any year since 2009.

I shouldn’t need to tell you where the post title comes from. If you haven’t heard Sgt. Pepper’s Lonely Hearts Club Band by The Beatles, you need to rectify this urgently. The greatest album recorded on 4-track equipment, no question. 🙂

I read this article on the BBC recently about alcohol consumption in the UK. In passing it mentions how many people in the UK are teetotal. I found the number reported – 21% – unbelievable so I checked out the source for the numbers.

Sure enough, ~20% of the UK population are indeed teetotal (see plots). The breakdown by gender and age is perhaps to be expected. There are fewer teetotal men than women. Older women (65+) in particular are more likely to be teetotal. There has been a slight tendency in recent years for more abstinence across the board, although last year is an exception. The BBC article noted that young people are pushing up the numbers with high rates of sobriety.

There are more interesting stats in the survey which you can check out and download. For example, London has the highest rate of teetotallers in the UK (32%).

I thought this post would make a fun antidote in the run up to the holidays, which in the UK at least is strongly linked with alcohol consumption.

The post title is taken from “Lemonade Secret Drinker” by Mansun, which featured on their first EP (One). It’s a play on “Secret Lemonade Drinker” the theme from R Whites Lemonade TV commercial in the 70s/80s (which I believe was written and sung by Elvis Costello’s father).

Where You Come From: blog visitor stats

It’s been a while since I did some navel-gazing about who reads this blog and where they come from. This week, quantixed is close to 25K views and there was a burst of people viewing an old post, which made me look again at the visitor statistics.

Where do the readers of quantixed come from?

Well, geographically they come from all around the world. The number of visitors from each country is probably related to: population of scientists and geographical spread of science people on Twitter (see below). USA is in the lead, followed by UK, Germany, Canada, France, Spain, Australia, etc.

Where do they click from? This is pretty interesting. Most people come here from Twitter (45%), around 20% come via a search on Google (mainly looking for eLife’s Impact Factor) and another ~20% come from the blog Scholarly Kitchen(!). Around 3% come from Facebook, which is pretty neat since I don’t have a profile and presumably people are linking to quantixed on there. 1% come from people clicking links that have been emailed to them – I also value these hits a lot. I guess these links are sent to people who don’t do any social media, but somebody thought the recipient should read something on quantixed. I get a few hits from blogs and sites where we’ve linked to each other. The remainder are a long list of single clicks from a wide variety of pages.

The traffic is telling me that quantixed doesn’t have “readers”. I think most people are one-time visitors, or at least occasional visitors. I do know which posts are popular:

1. Strange Things
2. Wrong Number
4. Publication lag times I and II
5. Violin plots
6. Principal Component Analysis

Just like my papers, I’ve found it difficult to predict what will be interesting to lots of people. Posts that took a long time to prepare and were the most fun to think about, have received hardly any views. The PCA post is most surprising, because I thought no-one would be interested in that!

I thoroughly enjoy writing quantixed and I really value the feedback that I get from people I talk to offline about it. I’m constantly amazed who has read something on here. The question that they always ask is “how do you find the time?”. And I always answer, “I don’t”. What I mean is I don’t really have the free time to write this blog. Between the lab, home life, sleep and cycling, there is no time for this at all. The analyses you see on here take only three hours or less. If anything looks tougher than this, I drop it. If draft posts aren’t interesting enough to get finished, they get canned. Writing the blog is a nice change from writing papers, grants and admin. So I don’t feel it detracts from work. One aim was to improve my programming through fun analyses; and I’ve definitely learnt a lot about that. The early posts on coding are pretty cringe-worthy. I also wanted to improve my writing which is still a bit dry and scientific…

My favourite type of remark is when people tell me about something that they’ve read on here, not realising that I actually write this blog! Anyway, whoever you are, wherever you come from; I hope you enjoy quantixed. If you read something here and like it, please leave a comment, tweet a link or email it to a friend. The encouragement is appreciated.

The post title is taken from “Where You Come From” by Pantera. This was a difficult one to pick, but this song had the most apt title, at least.

Your Favorite Thing: Algorithmically Perfect Playlist

I’ve previously written about analysing my iTunes library and about generating Smart Playlists in iTunes. This post takes things a bit further by generating a “perfect playlist” outside of iTunes… it is exclusively for nerds.

How can you put together a perfect playlist? What are your favourite songs? How can you tell what they are? Well, we could look at how many times you’ve played each song in your iTunes library (assuming this is mainly how you consume your music)… but this can be misleading. Songs that have been in there since the start (in my case, a decade ago) have had longer to accumulate plays than songs that were added last year. This problem was covered nicely in a post by Mr Science Show.

He suggests that your all-time greatest playlist can be modelled using

$$\frac{dp}{dt}=\frac{A}{Bt+N_0} + Ce^{-Dt}$$

Where $$N_0$$ is the number of tracks in the library at $$t_0$$, time zero. A and B are constants, and the collection is assumed to grow linearly over time. The second component, with constants C and D, is an additional correction for the fact that songs added more recently are likely to have garnered more plays, and as they age, they relax back into the general soup of the library. I used something similar to make my perfect playlist.
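As a rough sketch of what this model implies, assume that t is the time since a track was added, that it starts with zero plays, and that B and D are positive (these are my assumptions, not something stated in the original post). Integrating the play rate then gives the number of plays expected after a time T:

$$p(T)=\int_0^T\left(\frac{A}{Bt+N_0}+Ce^{-Dt}\right)dt=\frac{A}{B}\ln\left(\frac{BT+N_0}{N_0}\right)+\frac{C}{D}\left(1-e^{-DT}\right)$$

Under those assumptions the first term grows only logarithmically as the library fills up, while the second term saturates at C/D once the novelty of a new track has worn off.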

Calculating something like this is well beyond the scope of iTunes and so we need to do something more heavy duty. The steps below show how this can be achieved. Of course, I used IgorPro to do almost all of this. I tried to read in the iTunes Music Library.xml directly in Igor using the udStFiLrXML package, but couldn’t get it to work. So there’s a bit of ruby followed by an all-Igor workflow. You can scroll to the bottom to find out a) whether this was worth it and b) what else I discovered along the way.

All the code to do this is available here. I’ll try to put quantixed code on github from now on.

Once the data is in Igor, the strategy is to calculate the expected number of plays a track should have received if iTunes was simply set to random. We can then compare this number to the actual number of plays. The ratio of these numbers helps us to discriminate our favourite tracks. To work out the expected plays, we calculate the number of tracks in the library over time and the inverse of this gives us the probability that a given track, at that moment in the lifetime of the library, will be played. We know the total number of plays and the lifetime of the library, so if we assume that play rate is constant over time (fair assumption), this means we can calculate the expected number of plays for each track. As noted above, there is a slight snag with this methodology, because tracks added in the last few months will have a very low number of expected plays, yet are likely to have been played quite a lot. To compensate for this I used the modelling method suggested by Mr Science Show, but only for recent songs. Hopefully that all makes sense, so now for a step-by-step guide.
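The expected-plays idea can be sketched in a few lines of ruby. This is not the Igor code that I actually ran: the Track struct, the toy library and the one-day time step are all assumptions made for illustration, and it leaves out the correction for recently added songs.

```ruby
# Hypothetical toy library: the day each track was added and its actual play count.
Track = Struct.new(:title, :added_day, :plays)

# Expected plays per track if the player picked tracks at random,
# assuming a constant number of plays per day over the library lifetime.
def expected_plays(tracks, lifetime_days)
  plays_per_day = tracks.sum(&:plays).to_f / lifetime_days

  # Number of tracks present in the library on each day.
  library_size = (0...lifetime_days).map do |day|
    tracks.count { |t| t.added_day <= day }
  end

  tracks.map do |t|
    # On each day the track exists it is picked with probability 1/N.
    expected = (t.added_day...lifetime_days).sum do |day|
      plays_per_day / library_size[day]
    end
    [t.title, expected]
  end.to_h
end

tracks = [
  Track.new("Old favourite", 0, 30),
  Track.new("Old filler", 0, 6),
  Track.new("Newer track", 50, 24)
]
expected = expected_plays(tracks, 100)

tracks.each do |t|
  ratio = t.plays / expected[t.title]
  puts format("%-14s expected %5.1f  actual %3d  ratio %.2f",
              t.title, expected[t.title], t.plays, ratio)
end
```

Tracks with a ratio well above 1 have been played more than chance would predict, which is the signal used to pick favourites.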

Step 1: Extract data from iTunes xml file to tsv

After trying and failing to write my own script to parse the xml file, I stumbled across this on the web.

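#!/usr/bin/ruby

require 'rubygems'
require 'nokogiri'

list = []
doc = Nokogiri::XML(File.open(ARGV[0], 'r'))

# Each track is a <dict> nested inside the Tracks <dict> of the plist
doc.xpath('/plist/dict/dict/dict').each do |node|
  hash = {}
  last_key = nil

  node.children.each do |child|
    next if child.blank?

    if child.name == 'key'
      # Remember the key; the next non-blank sibling holds its value
      last_key = child.text
    else
      hash[last_key] = child.text
    end
  end

  list << hash
end

# Print the array of hashes in ruby's inspect format
p list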


This script was saved as parsenoko.rb and could be executed from the command line

find . -name "*.xml" -exec ruby parsenoko.rb {} > playlist.csv \;


after cd to appropriate directory containing the script and a copy of the xml file.

Step 2: A little bit of cleaning

The file starts with [ and ends with ]. Each dictionary item (dict) has been printed enclosed by {}. It’s easiest to remove these before importing to IgorPro. For my library the maximum number of keys is 38. I added a line with (ColumnA<tab>ColumnB<tab>…<tab>ColumnAL), to make sure all keys were imported correctly.

Step 3: Import into IgorPro

Import the tsv. This is preferable to csv because many tracks have commas in the track title, album title or artist name. Everything comes in as text and we will sort everything out in the next step.

LoadWave /N=Column/O/K=2/J/V={"\t"," $",0,0}


Step 4: Get Igor to sort the key values into waves corresponding to each key

This is a major type of cleaning. What we’ll do is read the key and its value. The two are separated by => and so this is used to parse and resort the values. This will convert the numeric values to numeric waves.

This is done by executing

iTunes()
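For anyone who would rather do this sorting outside Igor, here is a minimal ruby sketch of the same idea. It is not a translation of iTunes(): it assumes each row is tab-separated, that every cell looks like "Key"=>"Value", and that only whole numbers need converting. The function name is hypothetical.

```ruby
# Sort cells such as "Play Count"=>"12" into one column per key,
# with one entry per track (nil where a track lacks that key).
def sort_into_columns(rows)
  columns = Hash.new { |hash, key| hash[key] = Array.new(rows.length) }
  rows.each_with_index do |row, i|
    row.split("\t").each do |cell|
      key, value = cell.split("=>", 2).map do |s|
        s.strip.delete_prefix('"').delete_suffix('"')
      end
      next if value.nil? # skip cells that are not key/value pairs
      # Whole numbers become integers; everything else stays as text.
      columns[key][i] = value.match?(/\A-?\d+\z/) ? value.to_i : value
    end
  end
  columns
end

rows = [
  %("Name"=>"Flagpole Sitta"\t"Play Count"=>"42"),
  %("Name"=>"You Suffer"\t"Total Time"=>"4000")
]
columns = sort_into_columns(rows)
p columns
```

Keys that are missing for a track are left as nil, so every column keeps the same length as the number of tracks.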

Step 5: Convert timestamps to date values

iTunes stores dates in a UTC timestamp with this format 2014-10-02T20:24:10Z. It does this for Date Added, Date Modified, Last Played etc. To do anything meaningful with these, we need to convert them to date values. IgorPro uses the time in seconds from Midnight on 1st Jan 1904 as a date system. This requires double precision FP64 waves. We can parse the string containing this time stamp and convert it using

DateRead()
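The conversion itself is simple enough to sketch in ruby. This is an illustration rather than the DateRead() code, and igor_seconds is a hypothetical name. It also shows why double precision is needed: the values run to billions of seconds, which is more than a single precision float can resolve to the nearest second.

```ruby
require "time"

# Igor Pro counts seconds from midnight on 1 January 1904.
IGOR_EPOCH = Time.utc(1904, 1, 1)

# Convert an iTunes UTC timestamp such as 2014-10-02T20:24:10Z
# into seconds since the Igor epoch.
def igor_seconds(timestamp)
  (Time.iso8601(timestamp) - IGOR_EPOCH).to_f
end

puts igor_seconds("2014-10-02T20:24:10Z")
```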

Step 6: Discover your favourite tracks!

We do all of this by running

Predictor()

The way this works is described above. Note that you can run whatever algorithm you like at this point to generate a list of tracks.

Step 7: Make a playlist to feed back to iTunes

The format for playlists is the M3U file. This has a simple layout which can easily be printed to a Notebook in Igor and then saved as a file for importing back into iTunes.

To do this we run

WritePlaylist(listlen)

Where the Variable listlen is the length of the playlist. In this example, listlen=50 would give the Top 50 favourite tracks.
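To give an idea of what ends up in the file, here is a small ruby sketch that builds the text of an extended M3U playlist. It is not the WritePlaylist code: the field names and file paths are made up, and a plain M3U file containing only the file paths would work too.

```ruby
# Build the text of an extended M3U playlist from the top `listlen` tracks.
# Each track is a hash with :artist, :title, :seconds and :path
# (hypothetical field names chosen for this sketch).
def m3u_text(ranked_tracks, listlen)
  lines = ["#EXTM3U"]
  ranked_tracks.first(listlen).each do |t|
    lines << "#EXTINF:#{t[:seconds]},#{t[:artist]} - #{t[:title]}"
    lines << t[:path]
  end
  lines.join("\n") + "\n"
end

# Made-up example tracks, already ranked from most to least favourite.
ranked = [
  { artist: "Sugar", title: "Your Favorite Thing", seconds: 229,
    path: "/Music/Sugar/Your Favorite Thing.m4a" },
  { artist: "Mansun", title: "Serotonin", seconds: 155,
    path: "/Music/Mansun/Serotonin.m4a" },
  { artist: "Pantera", title: "Where You Come From", seconds: 311,
    path: "/Music/Pantera/Where You Come From.m4a" }
]

playlist = m3u_text(ranked, 2)
puts playlist
```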

So what did I find out?

My top 50 songs determined by this method were quite different to the Smart Playlist in iTunes of the Most Played tracks. The tracks at the top of the Most Played list in iTunes have disappeared from the new list: these are the ones that have been in the library for a long time and that I suppose I don’t listen to much any more. The new algorithmically designed playlist has a bunch of fresher tracks that were added in the last few years and that I have listened to quite a lot. Looking through I can see music that I should explore in more detail. In short, it’s a superior playlist and one that will always change and should not go stale.

Other useful stuff

There are quite a few parsing tools on the web that vary in their usefulness. Some that deserve a mention are:

• The xml file should be readable as a plist by cocoa which is native to OSX
• Visualisation of what proportion of an iTunes library is by a given artist – bdunagan’s blog
• itunes-parser on github by phiggins
• Really nice XSLT to move the xml file to html – moveable-type
• Comprehensive but difficult to follow method in ruby.

The post title comes from “Your Favorite Thing” by Sugar from their LP “File Under: Easy Listening”

I saw this great tweet (fairly) recently:

I thought this was such a great explanation of when to submit your paper.

It reminded me of a diagram that I sketched out when talking to a student in my lab about a paper we were writing. I was trying to explain why we don’t exaggerate our findings. And conversely why we don’t undersell our results either. I replotted it below:

Getting out to review is a major hurdle to publishing a paper. Therefore, convincing the Editor that you have found out something amazing is the first task. This is counterbalanced by peer review, which scrutinises the claims made in a paper for their experimental support. So, exaggerated claims might get you over the first hurdle, but it will give you problems during peer review (and afterwards if the paper makes it to print). Conversely, underselling or not interpreting all your data fully is a different problem. It’s unlikely to impress the Editor as it can make your paper seem “too specialised”, although if it made it to the hands of your peers they would probably like it! Obviously at either end of the spectrum no-one likes a dull/boring/incremental paper and everyone can smell a rat if the claims are completely overblown, e.g. genome sequence of Sasquatch.

So this is why we try to interpret our results fully but are careful not to exaggerate our claims. It might not get us out to review every time, but at least we can sleep at night.

I don’t know if this is a fair representation. Certainly depending on the journal the scale of the y-axis needs to change!

The post title is taken from “Middle of the Road” by Teenage Fanclub a B-side from their single “I Don’t Want Control of You”.

Science songs

I thought I’d compile a list of songs related to biomedical science. These were all found in my iTunes library. I’ve missed off multiple entries for the same kind of thing, as indicated.

Neuroscience

• Grand Mal – Elliott Smith from XO Sessions
• She’s Lost Control – Joy Division from Unknown Pleasures (Epilepsy)
• Aneurysm – Nirvana from Hormoaning EP
• Serotonin – Mansun from Six
• Serotonin Smile – Ooberman from Shorley Wall EP
• Brain Damage – Pink Floyd from Dark Side of The Moon
• Paranoid Schizophrenic – The Bats from How Pop Can You Get?
• Headacher – Bear Quartet from Penny Century
• Headache – Frank Black from Teenager of the Year
• Manic Depression – Jimi Hendrix Experience and lots of other songs about depression
• Paranoid – Black Sabbath from Paranoid (thanks to Joaquin for the suggestion!)

Medical

• Cancer (interlude) – Mansun from Six
• Hepatic Tissue Fermentation – Carcass or pretty much any song in this genre of Death Metal
• Whiplash – Metallica from Kill ‘Em All
• Another Invented Disease – Manic Street Preachers from Generation Terrorists
• Broken Nose – Family from Bandstand
• Ana’s Song – Silverchair from Neon Ballroom (Anorexia Nervosa)
• 4st 7lb – Manic Street Preachers from The Holy Bible (Anorexia Nervosa)
• November Spawned A Monster – Morrissey from Bona Drag (disability)
• Castles Made of Sand – Jimi Hendrix Experience from Axis: Bold As Love (disability)
• Cardiac Arrest – Madness from 7
• Blue Veins – The Raconteurs from Broken Boy Soldiers
• Vein Melter – Herbie Hancock from Headhunters
• Scoliosis – Pond from Rock Collection (curvature of the spine)
• Taste the Blood – Mazzy Star… lots of songs with blood in the title.

Pharmaceutical

• Biotech is Godzilla – Sepultura from Chaos A.D.
• Luminol – Ryan Adams from Rock N Roll
• Feel Good Hit Of The Summer – Queens of The Stone Age from Rated R (prescription drugs of abuse)
• Stars That Play with Laughing Sam’s Dice – Jimi Hendrix Experience (and hundreds of other songs about recreational drugs)
• Tramazi Parti – Black Grape from It’s Great When You’re Straight…
• Z is for Zofirax – Wingtip Sloat from If Only For The Hatchery
• Goldfish and Paracetamol – Catatonia from International Velvet
• L Dopa – Big Black from Songs About Fucking

Genetics and molecular biology

• Genetic Reconstruction – Death from Spiritual Healing
• Genetic – Sonic Youth from 100%
• Hair and DNA – Hot Snakes from Audit in Progress
• DNA – Circle from Meronia
• Biological – Air from Talkie Walkie
• Gene by Gene – Blur from Think Tank
• My Selfish Gene – Catatonia from International Velvet
• Sheer Heart Attack – Queen (“it was the DNA that made me this way”)
• Mutantes – Os Mutantes
• The Missing Link – Napalm Death from Mentally Murdered E.P.
• Son of Mr. Green Genes – Frank Zappa from Hot Rats

Cell Biology

• Sweet Oddysee Of A Cancer Cell T’ Th’ Center Of Yer Heart – Mercury Rev from Yerself Is Steam
• Dead Embryonic Cells – Sepultura from Arise
• Cells – They Might Be Giants from Here Comes Science (songs for kids about science)
• White Blood Cells LP by The White Stripes
• Anything by The Membranes
• Soma – Smashing Pumpkins from Siamese Dream
• Golgi Apparatus – Phish from Junta
• Cell-scape LP by Melt Banana

Album covers with science images

Godflesh – Selfless. Scanning EM image of some cells growing on a microchip?

Circle – Meronia. Photograph of an ampoule?

Insane In The Brain

Back of the envelope calculations for this post.

An old press release for a paper on endocytosis by Tom Kirchhausen contained this fascinating factoid:

The equivalent of the entire brain, or a football field of membrane, is turned over every hour

If this is true it is absolutely staggering. Let’s check it out.

A synaptic vesicle is ~40 nm in diameter. So the surface area of 1 vesicle is

$$4 \pi r^2$$

which is 5026 nm², or 5.026 × 10⁻¹⁵ m².

Now, an American football field is 5350 m² (including both endzones). This is the equivalent of 1.065 × 10¹⁸ synaptic vesicles.

It is estimated that the human cortex has 60 trillion synapses. This means that each synapse would need to internalise 17742 vesicles to retrieve the area of membrane equivalent to one football field.

The factoid says this takes one hour. This membrane load equates to each synapse turning over 296 vesicles in one minute, which is 4.93 vesicles per second.

Tonic activity of neurons differs throughout the brain and actually 5 Hz doesn’t sound too high (feel free to correct me on this). We’ve only considered cortical neurons, so the factoid seems pretty plausible!

For an actual football field, i.e. Association Football, the calculation is slightly more complicated. This is because there is no set size for football pitches. In England, the largest is apparently Manchester City (7598 m²) while the smallest actually belongs to the greatest football team in the world, Crewe Alexandra (5518 m²).

A brain would hoover up Man City’s ground in an hour if each synapse turned over 7 vesicles per second, while Gresty Road would only take 5 vesicles per second.
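The back of the envelope arithmetic can be checked with a few lines of ruby. The only inputs are the numbers quoted in this post (40 nm vesicles, 60 trillion synapses, one hour and the three pitch areas); the function and constant names are mine.

```ruby
VESICLE_DIAMETER_NM = 40.0
SYNAPSES = 60e12 # 60 trillion cortical synapses

# Vesicles each synapse must turn over per second to retrieve
# `field_area_m2` of membrane in one hour.
def vesicles_per_synapse_per_second(field_area_m2)
  radius_m = (VESICLE_DIAMETER_NM / 2) * 1e-9
  vesicle_area_m2 = 4 * Math::PI * radius_m**2 # surface area of a sphere
  vesicles_per_hour = field_area_m2 / vesicle_area_m2
  vesicles_per_hour / SYNAPSES / 3600
end

fields = {
  "American football field" => 5350,
  "Manchester City" => 7598,
  "Crewe Alexandra" => 5518
}

fields.each do |name, area|
  puts format("%-24s %.2f vesicles per synapse per second",
              name, vesicles_per_synapse_per_second(area))
end
```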

What is less clear from the factoid is whether a football field really equates to an “entire brain”. Bionumbers has no information on this. I think this part of the factoid may come from a different bit of data which is that clathrin-mediated endocytosis in non-neuronal cells can internalise the equivalent of the entire surface area of the cell in about an hour. I wonder whether this has been translated to neurons for the purposes of the quote. Either way, it is an amazing factoid that the brain can turn over this huge amount of membrane in such a short space of time.

So there you have it: quanta quantified on quantixed.

The post title is from “Insane In The Brain” by Cypress Hill from the album Black Sunday.

My Favorite Things

I realised recently that I’ve maintained a consistent iTunes library for ~10 years. For most of that time I’ve been listening exclusively to iTunes, rather than to music in other formats. So the library is a useful source of information about my tastes in music. It should be possible to look at who are my favourite artists, what bands need more investigation, or just to generate some interesting statistics based on my favourite music.

Play count is the central statistic here as it tells me how often I’ve listened to a certain track. It’s the equivalent of a +1/upvote/fave/like or maybe even a citation. Play count increases by one if you listen to a track all the way to the end. So if a track starts and you don’t want to hear it and you skip on to the next song, there’s no +1. There’s a caveat here in that the time a track has been in the library influences the play count to a certain extent – but that’s for another post*. The second indicator for liking a track or artist is the fact that it’s in the library. This may sound obvious, but what I mean is that artists with lots of tracks in the library are more likely to be favourite artists compared to a band with just one or two tracks in there. A caveat here is that some artists do not have long careers for a variety of reasons, which can limit the number of tracks actually available to load into the library. Check the methods at the foot of the post if you want to do the same.

What’s the most popular year? Firstly, I looked at the most popular year in the library. This question was the focus of an earlier post that found that 1971 was the best year in music. The play distribution per year can be plotted together with a summary of how many tracks and how many plays in total from each year are in the library. There’s a bias towards 90s music, which probably reflects my age, but could also be caused by my habit of collecting CD singles which peaked as a format in this decade. The average number of plays is actually pretty constant for all years (median of ~4), the mean is perhaps slightly higher for late-2000s music.

Favourite styles of music: I also looked at Genre. Which styles of music are my favourite? I plotted the total number of tracks versus the total number of plays for each Genre in the library. Size of the marker reflects the median number of plays per track for that genre. Most Genres obey a rule where total plays is a function of total tracks, but there are exceptions. Crossover, Hip-hop/Rap and Power-pop are highlighted as those with an above average number of plays. I’m not lacking in Power-pop with a few thousand tracks, but I should probably get my hands on more Crossover or Hip-Hop/Rap.

Using citation statistics to find my favourite artists: Next, I looked at who my favourite artists are. It could be argued that I should know who my favourite artists are! But tastes can change over a 10 year period and I was interested in an unbiased view of my favourite artists rather than who I think they are. A plot of Total Tracks vs Mean plays per track is reasonably informative. The artists with the highest plays per track are those with only one track in the library, e.g. Harvey Danger with Flagpole Sitta. So this statistic is pretty unreliable. Equally, I’ve got lots of tracks by Manic Street Preachers but evidently I don’t play them that often. I realised that the problem of identifying favourite artists based on these two pieces of information (plays and number of tracks) is pretty similar to assessing scientists using citation metrics (citations and number of papers). Hirsch proposed the h-index to meld these two bits of information into a single metric. It’s easily computed and I already had an Igor procedure to calculate it en masse, so I ran it on the library information.

Before doing this, I consolidated multiple versions of the same track into one. I knew that I had several versions of the same track, especially as I have multiple versions of some albums (e.g. Pet Sounds = 3 copies = mono + stereo + a cappella). The top offending track was “Baby’s Coming Back” by Jellyfish, with 11 copies! Anyway, these were consolidated before running the h-index calculation.

The top artist was Elliott Smith with an h-index of 32. This means he has 32 tracks that have been listened to at least 32 times each. I was amazed that Muse had the second highest h-index (I don’t consider myself a huge fan of their music) until I remembered a period where their albums were on an iPod Nano used during exercise. Amusingly (and narcissistically) my own music – the artist names are redacted – scored quite highly with two out of three bands in the top 100, which are shown here. These artists with high h-indices are the most consistently played in the library and probably constitute my favourite artists, but is the ranking correct?

The procedure also calculates the g-index for every artist. The g-index is similar to the h-index but takes into account very highly played tracks (very highly cited papers) over the h threshold. For example, The Smiths h=26. This could be 26 tracks that have been listened to exactly 26 times or they could have been listened to 90 times each. The h-index cannot reveal this, but the g-index gets to this by assessing average plays for the ranked tracks. The Smiths g=35. To find the artists that are most-played-of-the-consistently-most-played, I subtracted h from g and plotted the Top 50. This ranked list I think most closely represents my favourite artists, according to my listening habits over the last ten years.
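For anyone who wants to try this without Igor, the two indices can be sketched in ruby as below. This is not my Igor procedure. It uses the usual definitions (h is the largest number of tracks with at least h plays each; g is the largest number of top tracks whose plays total at least g squared), and it caps g at the number of tracks, which is an assumption on my part. The play counts are invented.

```ruby
# h-index: the largest h such that h tracks have at least h plays each.
def h_index(plays)
  sorted = plays.sort.reverse
  sorted.each_with_index.count { |n, i| n >= i + 1 }
end

# g-index: the largest g such that the top g tracks total at least g^2 plays
# (capped here at the number of tracks).
def g_index(plays)
  sorted = plays.sort.reverse
  running_total = 0
  g = 0
  sorted.each_with_index do |n, i|
    running_total += n
    g = i + 1 if running_total >= (i + 1)**2
  end
  g
end

# Two invented artists with the same h-index but very different listening.
steady = [26] * 26 + [5] * 30 # top tracks played exactly 26 times
heavy  = [90] * 26 + [5] * 30 # top tracks played 90 times each

[["steady", steady], ["heavy", heavy]].each do |name, plays|
  h = h_index(plays)
  g = g_index(plays)
  puts "#{name}: h=#{h} g=#{g} g-h=#{g - h}"
end
```

Both invented artists have h=26, but only the heavily played one has a g-index well above h, which is why g minus h picks out the most-played of the consistently played.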

Track length: Finally, I looked at the track length. I have a range of track lengths in the library, from “You Suffer” by Napalm Death (iTunes has this at 4 s, but Wikipedia says it is 1.316 s), through to epic tracks like “Blue Room” by The Orb. Most tracks are in the 3-4 min range. Plays per track indicates that this track length is optimal with most of the highly played tracks being within this window. The super-long tracks are rarely listened to, probably because of their length. Short tracks also have higher than average plays, probably because they are less likely to be skipped, due to their length.

These were the first things that sprang to mind for iTunes analysis. As I said at the top, there’s lots of information in the library to dig through, but I think this is enough for one post. And not a pie-chart in sight!

Methods: the library is in xml format and can be read/parsed this way. More easily, you can just select the whole library and copy-paste it into TextEdit and then load this into a data analysis package. In this case, IgorPro (as always). Make sure that the interesting fields are shown in the full library view (Music>Songs). To do everything in this post you need artist, track, album, genre, length, year and play count. At the time of writing, I had 21326 tracks in the library. For the “H-index” analysis, I consolidated multiple versions of the same track, giving 18684 tracks. This is possible by concatenating artist and the first ten characters of the track title (separated by a unique character) and adding the play counts for these concatenated versions. The artist could then be deconvolved (using the unique character) and used for the H-calculation. It’s not very elegant, but seemed to work well. The H-index and G-index calculations were automated (previously sort-of-described here), as was most of the plot generation. The inspiration for the colour coding is from the 2013 Feltron Report.
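The consolidation step could look something like this in ruby. It is a sketch of the approach described above rather than the Igor code: the field names are hypothetical and the separator is simply a character that I assume never appears in an artist or track name.

```ruby
SEPARATOR = "\u001F" # unit separator: assumed absent from artist and track names

# Merge versions of the same track by keying on artist plus the first ten
# characters of the title, summing play counts, then regroup by artist.
def consolidate(tracks)
  totals = Hash.new(0)
  tracks.each do |t|
    key = [t[:artist], t[:title][0, 10]].join(SEPARATOR)
    totals[key] += t[:plays]
  end

  # Recover the artist from each key and collect the merged play counts.
  by_artist = Hash.new { |hash, artist| hash[artist] = [] }
  totals.each do |key, plays|
    artist = key.split(SEPARATOR, 2).first
    by_artist[artist] << plays
  end
  by_artist
end

tracks = [
  { artist: "Jellyfish", title: "Baby's Coming Back", plays: 5 },
  { artist: "Jellyfish", title: "Baby's Coming Back (demo)", plays: 3 },
  { artist: "Jellyfish", title: "The King Is Half-Undressed", plays: 2 }
]
merged = consolidate(tracks)
p merged
```

The obvious weakness is that two different tracks by the same artist that share their first ten characters will also be merged, which is part of why it is not very elegant.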

* there’s an interesting post here about modelling the ideal playlist. I worked through the ideas in that post but found that it doesn’t scale well to large libraries, especially if they’ve been going for a long time, i.e. mine.

The post title is taken from John Coltrane’s cover version of My Favorite Things from the album of the same name. Excuse the US English spelling.

Belly Button Window

A bit of navel gazing for this post. Since moving the blog to wordpress.com in the summer, it recently accrued 5000 views. Time to analyse what people are reading…

The most popular post on the blog (by a long way) is “Strange Things”, a post about the eLife impact factor (2824 views). The next most popular is a post about a Twitter H-index, with 498 views. The Strange Things post has accounted for ~50% of views since it went live (bottom plot) and this fraction seems to be creeping up. More new content is needed to change this situation.

I enjoy putting blog posts together and love the discussion that follows from my posts. It’s also been nice when people have told me that they read my blog and enjoy my posts. One thing I didn’t expect was the way that people can take away very different messages from the same post. I don’t know why I found this surprising, since this often happens with our scientific papers! Actually, in the same way as our papers, the most popular posts are not the ones that I would say are the best.

Wet Wet Wet: I have thought about deleting the Strange Things post, since it isn’t really what I want this blog to be about. An analogy here is the Scottish pop-soul outfit Wet Wet Wet who released a dreadful cover of The Troggs’ “Love is All Around” in 1994. In the end, the band deleted the single in the hope of redemption, or so they said. Given that the song had been at number one for 15 weeks, the damage was already done. I think the same applies here, so the post will stay.

Directing Traffic: Most people coming to the blog are clicking on links on Twitter. A smaller number come via other blogs which feature links to my posts. A very small number come to the blog via a Google search. Google has changed the way it formats the clicks and so most of the time it is not possible to know what people were searching for. For those that I can see, the only search term is… yes, you’ve guessed it: “elife impact factor”.

Methods: WordPress stats are available for blog owners via URL formatting. All you need is your API key and (obviously) your blog address.

Instructions are found at http://stats.wordpress.com/csv.php

A basic URL format would be: http://stats.wordpress.com/csv.php?api_key=yourapikey&blog_uri=yourblogaddress replacing yourapikey with your API key (this can be retrieved at https://apikey.wordpress.com) and yourblogaddress with your blog address e.g. quantixed.wordpress.com

Various options are available from the first page to get the stats in which you are interested. For example, the following can be appended to the second URL to get a breakdown of views by post title for the past year:

&table=postviews&days=365&limit=-1

The format can be csv, json or xml, depending on what you want to do next with the information.
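If you would rather build the URL in code than by hand, a minimal ruby sketch is below. It only assembles the address from the parameters mentioned in this post and does not fetch anything; stats_url is a hypothetical helper name and yourapikey is a placeholder.

```ruby
require "uri"

# Assemble a WordPress stats URL from an API key, a blog address
# and any extra options such as table, days or limit.
def stats_url(api_key, blog_uri, options = {})
  params = { api_key: api_key, blog_uri: blog_uri }.merge(options)
  "http://stats.wordpress.com/csv.php?" + URI.encode_www_form(params)
end

url = stats_url("yourapikey", "quantixed.wordpress.com",
                table: "postviews", days: 365, limit: -1)
puts url
```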

The title is from “Belly Button Window” by Jimi Hendrix, a posthumous release on the Cry of Love LP.

Tips from the Blog II

An IgorPro tip this week. The default font for plots is Geneva. Most of our figures are assembled using Helvetica for labelling. The default font can be changed in Igor Graph Preferences, but Preferences need to be switched on in order to be implemented. Anyway, I always seem to end up with a mix of Geneva plots and Helvetica plots. This can be annoying as the fonts are pretty similar yet the spacing is different and this can affect the plot size. Here is a quick procedure Helvetica4All() to rectify this for all graph windows.