Working out what fraction of a journal’s papers were previously available as preprints is difficult. The tricky part is matching preprints (from a number of different servers) with the journal’s published output. The easy matches are those that are directly linked together; the remainder though can be […]
Category: computing
Tips from the Blog XVI: getting FASTA sequences
I am having some fun running AlphaPulldown on a computing cluster. A requirement is to have input sequences in FASTA format. I found that I needed to get ~600 sequences. I had a list of the relevant UniProt IDs. Surely getting the sequences for these proteins should be straightforward? Solution: The UniProt IDs can be […]
Airy Area: approximating surface area of a cell from a 3D point set
In the spirit of “if it took you a while to find out how to do something, write about it”, I will detail a method to approximate the surface area of a 3D shape. Our application here was finding the surface area of a cell but it can be used on any shape. We start […]
All The Right Friends II: clustering papers using Google Scholar data
In a previous post, I looked at how Google Scholar ranks co-authors. While I had the data available I wondered whether paper authorship could be used in other ways. A few months back, John Cook posted about using Jaccard index and jazz albums. The idea is to look at the players on two jazz albums […]
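For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union. Below is a minimal Python sketch of that calculation applied to author lists. The function name and the author names are hypothetical placeholders, and this is not the code used in the post.

```python
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections of hashable items."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets: define the similarity as 0.0 to avoid dividing by zero
        # (an assumption; other conventions exist).
        return 0.0
    return len(a & b) / len(union)


# Hypothetical author lists for two papers.
paper_1 = {"Author A", "Author B", "Author C"}
paper_2 = {"Author B", "Author C", "Author D"}

# Two shared authors out of four distinct authors in total.
similarity = jaccard_index(paper_1, paper_2)
```

A value of 1 means identical author lists and 0 means no authors in common, so 1 minus the index can serve as a distance for clustering.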
Probot 2: upgrading a Mastodon bot
Earlier this year I set up a bot on Mastodon. The bot, AlbumsX3, posts an album suggestion twice a day. Performance has been good. It has only missed a few posts due – I think – to server glitches. However, I have made a couple of tweaks to upgrade the bot since my last post, so I […]
Mr. Mastodon Farm: analysing a Mastodon ActivityPub outbox.json file
I migrated my personal Mastodon account from mastodon.social to biologists.social recently. If you’d like to do the same, I found this guide very useful. Note that, once you move, all your previous posts are left behind on the old instance. Before I migrated, I downloaded all of my data from the old instance. I thought […]
Free Bird II: Mastodon macOS clients
This is a brief review of macOS Mastodon clients that I’ve tried. It is unashamedly incomplete/non-exhaustive, but since the ones I found online from computing magazines literally look at one app, I am ahead of the pack here! tl;dr: I prefer Ivory on macOS and prior to that, Mastonaut was OK. For clarity: I have […]
Step By Step: recreating a volcano plot in R
We have an analysis routine for proteomics data written for IgorPro. One output is a volcano plot. These plots show the fold change in one sample compared to another and plot that against a p-value to estimate how reproducible any changes observed are. This post is not about that software, but on the topic of […]
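As a rough illustration of the two axes described above, here is a minimal Python sketch that turns a pair of sample means and a p-value into volcano plot coordinates. It assumes the common convention of log2 fold change on the x-axis and -log10 of the p-value on the y-axis. The function name and input values are hypothetical, and this is not the IgorPro routine or the R code from the post.

```python
import math


def volcano_point(mean_control, mean_test, p_value):
    """Return (log2 fold change, -log10 p-value) for one protein.

    Assumes both means and the p-value are strictly positive.
    """
    log2_fc = math.log2(mean_test / mean_control)  # x-axis: size of the change
    neg_log10_p = -math.log10(p_value)             # y-axis: larger means smaller p
    return log2_fc, neg_log10_p


# Hypothetical values: a four-fold increase with p = 0.01.
x, y = volcano_point(1.0, 4.0, 0.01)
```

Points far from zero on the x-axis and high on the y-axis are the large, reproducible changes that the plot is designed to pick out.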
Pledging My Time VI: scraping and analysis of race results in R
I’ve posted in the past about analysing race results in R (most recently here). I ran the 2023 MK Marathon and wanted to have a look at the finishing times. The days of race results being made available as a csv or xls for easy analysis seem to be behind us. Instead they tend to […]
Yet Another Movie: IMDB Top 250 movies
I’m not a big movie person. Nonetheless I have a media library with quite a few films in it, and I wondered how many “films to see before you die”-type movies I had in the collection, and how many were missing. I used R to find the answers. I’ve described previously how to get a plain […]