Rollercoaster IV: ups and downs of Google Scholar citations

Time for an update to a previous post. For the past few years, I have been using an automated process to track citations to my lab’s work on Google Scholar (details of how to set this up are at the end of this post).

Due to the nature of how Google Scholar tracks citations, citations get added (hooray!) but can also be removed (booo!). Using a daily scrape of the data, it is possible to watch this happening. The plots below show the total citations to my papers and then a version where we only consider the net daily change.

Four years of tracking citations on Google Scholar

The general pattern is for papers to accrue citations, and some do so faster than others. You can also see that the number of citations occasionally drops. Remember that we are looking at net change here, so a decrease of one citation can be masked by the addition of another, and vice versa. Even so, you can see net daily increases and even decreases.

It’s difficult to see what is happening down at the bottom of the graph so let’s separate them out. The two plots below show the net change in citations, either on the same scale (left) or scaled to the min/max for that paper (right).

Citation tracking for individual papers

The papers are shown here ranked from the ones that accrued the most citations down to the ones that gained no citations while they were tracked. Five “new” papers began to be tracked very recently. This is because I changed the way that the data are scraped (more on this below).

The version on the right reveals a few interesting things. Firstly, there seem to be “bump days” where all of the papers get a jolt in one direction or another. This could be something internal to Google or the addition of several items which all happen to cite a bunch of my papers. The latter explanation is unlikely, given the frequency of changes seen in the whole dataset. Secondly, some papers are highly volatile, with daily toggling of citation numbers. I have no idea why this might be. The two plots below demonstrate these two points. The arrow shows a “bump day”. The plot on the right shows two review papers that have volatile citation numbers.

I’m going to keep the automated tracking going. I am a big fan of Google Scholar, as I have written previously, but quoting some of the numbers makes me uneasy, knowing how unstable they are.

Note that you can use R to get aggregate Google Scholar data as I have written about previously.

How did I do it?

The analysis would not be possible without automation. I use a daemon to run a shell script every day. This script calls a python routine which outputs the data to a file. I wrote something in Igor to load each day’s data, crunch the numbers, and make the graphs. The details of this part are in the previous post.
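
The scheduling itself is covered in the previous post, where I use a daemon; a plain cron entry would do the same job. A minimal sketch, in which the script name is a placeholder and the path matches the one used below:

# run the citation scrape once a day at 07:00 (hypothetical script name)
0 7 * * * /directory/for/data/scrape_citations.sh >> /directory/for/data/scrape.log 2>&1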

I realised that I wasn’t getting all of my papers using the previous shell script. Well, this is a bit of a hack, but I changed the calls that I make to scholar.py so that I request data from several years.

#!/bin/bash
cd /directory/for/data/
# one query per year cut-off; sleep ~5 minutes between queries to be polite
python scholar.py -c 500 --author "Sam Smith" --after=1999 --csv > g1999.csv
sleep $[ ( $RANDOM % 15 )  + 295 ]
# and so on
python scholar.py -c 500 --author "Sam Smith" --after=2019 --csv > g2019.csv
# concatenate the per-year files into one dated csv and clean up
OF=all_$(date +%Y%m%d).csv
cat g*.csv > $OF
rm g*.csv

I found that I got different results for each year I made the query. My first change was to request all years, using a loop to generate the calls. This resulted in an IP ban for 24 hours! Through a bit of trial and error, I found that reducing the queries to ten and waiting a polite amount of time between queries avoided the ban.
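
For reference, the looped version looks something like this (a sketch: the year cut-offs shown are placeholders, and the scholar.py flags are the same as above). Keeping the list of years short and keeping the polite sleep between queries is what avoids the ban:

#!/bin/bash
cd /directory/for/data/
# a handful of year cut-offs, with a ~5 minute pause between queries
for YEAR in 1999 2004 2009 2014 2019; do
    python scholar.py -c 500 --author "Sam Smith" --after=$YEAR --csv > g$YEAR.csv
    sleep $(( (RANDOM % 15) + 295 ))
done
cat g*.csv > all_$(date +%Y%m%d).csv
rm g*.csv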

The hacky part was to figure out which year requests I needed to make to make sure I got most of my papers. There is probably a better way to do this!

I still don’t get every single paper and I retrieve data for a number of papers on which I am not an author – I have no idea why! I exclude the erroneous papers using the Igor program that reads all the data and plots out everything. The updated version of this code is here.

As described earlier I have many Rollercoaster songs in my library. This time it’s the song by Sleater-Kinney from their “The Woods” album.

Installing open source PyMol on a Mac

This is a quick “how to” post. There is a licensed version of PyMol (MacPyMol) available, but the open source version can be installed on a Mac free of charge. The official page has a guide, which is not terribly detailed, and I found this excellent guide, which is unfortunately out of date.

Here is an updated guide to installing PyMol using Homebrew on macOS Mojave 10.14.3.

Step 1 is to install Homebrew

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

The next step is to install the PyMol dependencies using Homebrew

brew install Caskroom/cask/xquartz
brew install tcl-tk
brew install python

Now install PyMol.

brew install brewsci/bio/pymol

You can now run PyMol by typing

pymol

The MPIBR-Bioinformatics page has the following guide to make a little executable app to launch PyMol straight from the Desktop.

  • Open Automator, which is in Applications in macOS.
  • Create new document, select “Application”.
  • Select “Actions” and “Library” in the left pane. Select “Utilities” and “Run Shell Script”. Drag this into the main pane.
  • Choose “/bin/bash” as a shell.
  • Paste the following: /usr/local/bin/pymol -M. If this doesn’t work, check the path to pymol using which pymol in the terminal, and use this instead.
  • Save the application (“File > Save”) to the Desktop and name it “PyMol”.

Now you can double-click this app to run PyMol.

Experiment Zero: Using a Raspberry Pi Zero camera

This is the first post at quantixed about Raspberry Pi computing.

Pi Zero is a minimalist Raspberry Pi that can be coupled to a camera. With this little rig, you can make time-lapse footage amongst other things. I’ve set up a couple of these now. One was to make a time-lapse movie of some plants growing through a plastic maze. The results were pretty good, and I thought I’d upload the video and write a brief how-to guide.

After a delay, you can see four beans sprouting, and then one eventually makes it to the top of the maze. This footage was shot over 27 days. The Pi took pictures every 5 min, but I sampled at 10 min in order to make the movie (after discarding the pictures taken after the sun went down). Everything was automated.

The camera shoots at 3280 × 2464. I downsampled the images to make the video. The camera didn’t focus well on the maze which was a bit too close. Other units are shooting scenery and the autofocus on the unit is great.

How I did it

Pi Zero

Pi Zero with camera module (without IR filter) and a case are available for around £40. I bought mine from the Pi Hut. Power supplies and SD cards are readily available. I put together the PiCam with a fresh Raspbian full image on a 16GB SD card. Another option is to use a smaller card and get the Pi to save the images to a server.

I used PiBaker to format the SD card, load Raspbian onto it, and add a startup script that would connect the Pi Zero to WiFi and enable VNC. That meant I could plug it in and start using it headless. Well, in theory! It turns out that VNC from a Mac does not work with the UNIX-style password which is the default on the Pi. I needed to connect the Pi to a monitor to rectify this, changing the authentication to VNC password in the VNC GUI. After this I could log in and use the Pi Zero remotely.

A few more minor steps were needed for full functionality:

  1. I enabled ssh and the camera port in Raspberry Pi Configuration, disabled Bluetooth and set the correct timezone (this can probably be done in PiBaker but I forgot).
  2. Since I have several Raspberry Pis on the LAN, I needed to give this one its own identity to prevent network conflicts.
  3. I needed to set up SMB sharing on the new Pi.

Instructions for how to do these things are just one Google search away.
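
They are indeed only a search away, but for reference, a rough sketch of steps 2 and 3 might look like this (the hostname and share details are examples only; the share shown matches the /Volumes/HOMEPI path used in the script further down):

# give the Pi its own hostname to avoid clashes on the LAN
sudo raspi-config nonint do_hostname picam01
# install Samba and share the pi home folder so the Mac can mount it
sudo apt-get install samba
cat <<EOF | sudo tee -a /etc/samba/smb.conf
[HOMEPI]
   path = /home/pi
   read only = no
   guest ok = no
EOF
sudo smbpasswd -a pi
sudo systemctl restart smbd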

Now the Pi was ready to start taking images. I built a little stand for it out of Lego and set up the plant maze.

Taking pictures with the Pi

I wrote a shell script to take pictures using raspistill.

I made a directory called camera in /home/pi:

mkdir camera

Then I made a camera.sh file in /home/pi that looked like this:

#!/bin/bash
DATE=$(date +"%Y-%m-%d_%H%M")
raspistill -o /home/pi/camera/$DATE.jpg

Then I made it executable

chmod +x camera.sh

Using CRON, I execute the shell script on a schedule. I wanted to take pictures every 5 minutes. You can consult cronguru for your needs.

*/5 * * * * /home/pi/camera.sh 2>&1

Using a watchdog

Since the Pi was going to run unattended for weeks, I set up the hardware watchdog so that the Pi reboots itself if it crashes. First, load the watchdog kernel module and open /etc/modules:

sudo modprobe bcm2835_wdt
sudo nano /etc/modules

I added the line “bcm2835_wdt” to this file and saved it.

Next I installed the watchdog daemon

sudo apt-get install watchdog chkconfig
chkconfig watchdog on
sudo /etc/init.d/watchdog start
sudo nano /etc/watchdog.conf

I uncommented two lines from this file:

  • watchdog-device = /dev/watchdog
  • the line that had max-load-1 in it

Save the watchdog.conf file.
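
For reference, the uncommented lines look something like this (the max-load-1 value shown is the one suggested in the stock file; adjust if you like):

watchdog-device = /dev/watchdog
max-load-1 = 24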

There are guides online that describe how to set up the Pi so that it sends you an email or SMS when there’s a crash/reboot. I figured I didn’t need this – as long as it reboots OK.

What now?

Well, you wait for it to take photos! You can log in via VNC and check that the images are being acquired, or go in via ssh and watch the camera directory fill up. The size of the images is 3280 × 2464 and they are around 4.5 MB each, so the disk can quickly fill.
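
As a rough calculation: a picture every 5 min is 288 pictures per day, and at ~4.5 MB each that is about 1.3 GB per day, so a 16GB card (minus the space taken by Raspbian) only lasts a week or so before you need to pull images off.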

After a while you’ll want to assemble a movie. I wrote a shell script on my Mac to pull down the images, take a copy of the ones I wanted, make a movie file, and upload it to Dropbox so that I could look at it on the go.

#!/usr/bin/env bash
# move to the local working folder (this is where the keeper images and the movie end up)
cd /local/disk/folder2/
# pull down all images from the Pi to a local staging folder - only new images are copied
rsync -trv /Volumes/HOMEPI/camera/ /local/disk/folder/
# overnight images are dark and less than 1.5 MB
# copy the ones we want to keep into the working folder
rsync -trv --min-size=1000K /local/disk/folder/ /local/disk/folder2/
# or you could filter on size here instead - delete anything under 2 MB
find . -name "*.jpg" -size -2000k -delete
# scale the images down to 480 px wide and make the movie
ffmpeg -framerate 30 -pattern_type glob -i '*.jpg' -c:v libx264 -pix_fmt yuv420p -vf scale=480:-2 out.mp4
# move the movie to Dropbox
mv out.mp4 /My/Dropbox/Folder/out.mp4

This script meant that I had to manually delete the pictures from the Pi once they’d been copied, but that was OK. My plan is to write a script to do this for the longer-running projects so that it is automated.
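
One way to automate the clean-up would be to let rsync delete the originals from the Pi once they have been transferred, something like this (an untested sketch, using the same paths as above):

# pull down the images and remove them from the Pi once they have been copied
rsync -trv --remove-source-files /Volumes/HOMEPI/camera/ /local/disk/folder/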

While it is possible to make the movies on the Pi itself, I did it on the Mac as that computer is beefier and is not busy taking pictures every 5 min! ffmpeg is a great tool for this and the documentation is impressive. For example, if you have set up the camera in the wrong orientation, you can do the transposition in ffmpeg (an example is shown below). If you don’t have ffmpeg, it is a simple install on the command line.

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" < /dev/null 2> /dev/null
brew install ffmpeg
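
For the transposition mentioned above, the transpose filter can simply be added to the filter chain, e.g. (a sketch; transpose=1 rotates the frames 90° clockwise):

# rotate 90 degrees clockwise before scaling and encoding
ffmpeg -framerate 30 -pattern_type glob -i '*.jpg' -c:v libx264 -pix_fmt yuv420p -vf "transpose=1,scale=480:-2" out.mp4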

Hopefully this guide is useful to you. The Pi Zero Camera can be used for streaming video as well as taking a series of still images. I’m planning to test this out soon.

The post title “Experiment Zero” comes from the title of the album by Man or Astro-Man?

Adventures in Code V: making a map of Igor functions

I’ve generated a lot of code for IgorPro. Keeping track of it all has got easier since I started using GitHub, but even so, I have found myself writing something only to discover that I had previously written the same thing. I thought it would be good to make a list of all the functions I’ve written, to help locate long-lost functions.

This question was brought up on the Igor mailing list a while back and there are several solutions – especially if you want to look at dependencies. However, this two-liner works to generate a file called funcfile.txt which contains a list of functions and the ipf file that they appear in.

grep "^[ \t]*Function" *.ipf | grep -oE '[ \t]+[A-Za-z_0-9]+\(' | tr -d " " | tr -d "(" > output
for i in `cat output`; do grep -ie "$i" *.ipf | grep -w "Function" >> funcfile.txt ; done

Thanks to Thomas Braun on the mailing list for the idea. I have converted it to work with grep (BSD grep) 2.5.1-FreeBSD, which runs on macOS. Use the terminal, cd to the directory containing your ipf files and run it. Enjoy!

EDIT: I did a bit more work on this idea and it has now expanded to its own repo. Briefly, funcfile.txt is converted to tsv and then parsed – using Igor – to json. This can be displayed using some d3.js magic.
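
If you want to try the first conversion step in the shell rather than in Igor, here is a sketch (assuming the file.ipf:Function name(…) format produced by the grep above, and using bash’s $'…' quoting to pass a literal tab to BSD sed):

# replace the first colon on each line with a tab to give file/function pairs
sed $'s/:/\t/' funcfile.txt > funcfile.tsv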


Part of a series with code snippets and tips.