Tips from the blog VI: doc to pdf

A while back I made this little Automator script to convert Microsoft Word doc and docx files to PDF. It’s useful when you are sent a bunch of Word files, e.g. for committee work. Opening PDFs in Preview is nice and hassle-free. Struggling with Word is not.

It’s not my own work; I just put it together after googling around a bit. I’ll put it here for anyone to use.

To get it working:

  1. Open Automator and choose the Service template. Check: Service receives selected files and folders in Finder.app.
  2. Add the actions: Get Folder Contents and Run AppleScript (this is where the script goes).
  3. Save the workflow. Suggested name: Doc to PDF.

Now to run it:

  1. Select the doc/docx file(s) in the Finder window.
  2. Right-click and look for the service in the contextual menu. This should be down the bottom near “Reveal in Finder”.
  3. Run it.

If you want to put it onto another computer, go to ~/Library/Services and you will find the saved workflow there. Just copy it to the same location on your other computer.

Known bug: Word has to be open for the script to run, and the script doesn’t quit Word when it’s finished.

Here is the code.

property theList : {"doc", "docx"}

on run {input, parameters}
    set output to {}
    -- remember Word's current default documents path so it can be restored at the end
    tell application "Microsoft Word" to set theOldDefaultPath to get default file path file path type documents path
    repeat with x in input
        try
            set theDoc to contents of x
            tell application "Finder"
                set theFilePath to container of theDoc as text
                set ext to name extension of theDoc
                if ext is in theList then
                    -- build the output file name by swapping the extension for .pdf
                    set theName to name of theDoc
                    copy length of theName to l
                    copy length of ext to exl
                    set n to l - exl - 1
                    copy characters 1 through n of theName as string to theFilename
                    set theFilename to theFilename & ".pdf"
                    tell application "Microsoft Word"
                        -- save the PDF alongside the original document
                        set default file path file path type documents path path theFilePath
                        open theDoc
                        set theActiveDoc to the active document
                        save as theActiveDoc file format format PDF file name theFilename
                        copy (POSIX path of (theFilePath & theFilename as string)) to end of output
                        close theActiveDoc
                    end tell
                end if
            end tell
        end try
    end repeat
    -- restore Word's original default documents path
    tell application "Microsoft Word" to set default file path file path type documents path path theOldDefaultPath
    return output
end run

 

Tips from the blog is a series and gets its name from a track from Black Sunday by Cypress Hill.

Pull Together: our new paper on “The Mesh”

We have a new paper out! You can access it here.

Title of the paper: The mesh is a network of microtubule connectors that stabilizes individual kinetochore fibers of the mitotic spindle

What’s it about? When a cell divides, the two new cells need to get the right number of chromosomes. If this process goes wrong, it is a disaster which may lead to disease, e.g. cancer. The cell shares out the chromosomes using a “mitotic spindle”. This is a tiny machine made of microtubules and other proteins. We have found that the microtubules are held together by something called “the mesh”. This is a web-like structure which connects the microtubules and gives them structural support.

Does this have anything to do with cancer? Some human cancer cells have high levels of proteins called TACC3 and Aurora A kinase. We know that TACC3 is changed by Aurora A kinase. This changed form of TACC3 is part of the mesh. In our paper we mimic the cancer condition by increasing TACC3 levels. The mesh changes and the microtubules become wonky. This causes problems for dividing cells. It might be possible to target TACC3 using drugs to treat certain types of cancer, but this is a long way in the future.

Who did the work? Faye Nixon, a PhD student in the lab, did most of the work. She used a method to look at mitotic spindles in 3D to study the mesh. My lab actually discovered the mesh by accident. A previous student, Dan Booth – back in 2011 – was looking at mitotic spindles to try to get 3D electron microscopy (tomography) working in the lab. Tomography works just like a CAT scan in a hospital, but on a much smaller scale. The mesh is found in the 25 nanometre-wide gaps between microtubules (1 nanometre is one billionth of a metre). This is about 3,000 times smaller than the width of a human hair, so it is very small! It was Dan who found the mesh and gave it its name. Other people in the lab did some really nice work which helped us to understand how the mesh works in dividing cells. Cristina Gutiérrez-Caballero did some experiments using a different type of microscope and Fiona Hood contributed some test tube experiments. Ian Prior, at the University of Liverpool, co-supervises Faye and helped with electron microscopy.

Have you discovered a new structure in cells? Yes and No. All cell biologists dream of finding a new structure in cells. It’s so unlikely though. Scientists have been looking at cells since the 17th Century and so the chances of seeing something that no-one has seen before are very small. In the 1970s, “inter-microtubule bridges” in the mitotic spindle were described using 2D electron microscopy. What we have done is to look at these structures in 3D for the first time and find that they are a network rather than individual connectors.

The work was funded by Cancer Research UK and North West Cancer Research Fund.

References

Nixon, F.M., Gutiérrez-Caballero, C., Hood, F.E., Booth, D.G., Prior, I.A. & Royle, S.J. (2015) The mesh is a network of microtubule connectors that stabilizes individual kinetochore fibers of the mitotic spindle. eLife, doi: 10.7554/eLife.07635

This post is written in plain English to try to describe what is in the paper. I’m planning on writing a more technical post on some of the spatial statistics we developed as part of this paper.

The post title is from “Pull Together” a track from Shack’s H.M.S. Fable album.

Green is the Colour: mNeonGreen spectra

I was searching for the excitation and emission spectra for mNeonGreen. I was able to find an image, but no values for the spectra. Here is an approximation of the spectra (xlsx format – I still haven’t figured out how to host csv files on wordpress).

I got these values using IgorThief.ipf, a very handy tool that allows the extraction of XY coordinates from a bitmapped plot (below).
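
The principle behind any of these “data thief” tools is a simple linear mapping: mark two points of known value on each axis, and every pixel on the curve can then be converted into data coordinates. A minimal sketch of that mapping in Python (not the Igor code itself; the function name, calibration values and pixel coordinates are made up for illustration):

def pixel_to_data(px, py, x_cal, y_cal):
    # x_cal and y_cal are ((pixel, value), (pixel, value)) pairs taken from
    # two known points on each axis; assumes the axes are linear
    (px0, x0), (px1, x1) = x_cal
    (py0, y0), (py1, y1) = y_cal
    x = x0 + (px - px0) * (x1 - x0) / (px1 - px0)
    y = y0 + (py - py0) * (y1 - y0) / (py1 - py0)
    return x, y

# example calibration picked off the spectrum image by eye
x_cal = ((50, 400.0), (650, 700.0))  # pixel 50 = 400 nm, pixel 650 = 700 nm
y_cal = ((480, 0.0), (40, 1.0))      # pixel 480 = 0, pixel 40 = 1 (image y runs downwards)
print(pixel_to_data(350, 120, x_cal, y_cal))  # -> (550.0, 0.818...)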

mNeonGreen is available from Allele Biotechnologies.

Here is a great site for comparing fluorescent protein properties.

Edit @ 06:54 4/7/14: A web-based data thief was suggested by @dozenoaks

The post title is taken from “Green is the Colour” from the Pink Floyd LP “More”

Tips from the Blog V: Advice for New PIs

I recently gave a talk at a retreat for new PIs working at QMUL. My talk was focussed on tips for getting started, i.e. the nitty gritty of running an efficient lab. It was a mix of things I’ve been told, things I’ve worked out for myself and things I’ve learned the hard way.

PIs are expected to be able to do lots of things that can be full-time jobs in themselves. In my talk, I focussed on ways to make yourself more efficient to give yourself as much time to tackle all these different roles that you need to take on. You don’t need to work 80 hours a week to succeed, but you do need to get organised.

1. Timelines

Get a plan together. A long-term (5-year) plan and a shorter (1-2 year) plan. What do you want to achieve in the lab? What papers do you want to publish? How many people do you need in the lab? What grants do you need? When are your next three grant applications due? When is the first one due? Work back from there. It’s January, the first one is due in September, better get that paper submitted! You need a draft application available for circulation to colleagues in good time to do something about the comments. Plan well. Don’t leave anything to the last minute. But don’t apportion too much time, as the task will expand to fill it.

Always try to work towards the big goals. It’s too easy to spend all of your time on “urgent” things and busywork (fire-fighting). Prioritise Important over Urgent.

2. Time audit

Doing a time audit is a good way to identify where you are wasting time and how to reorganise your day to be more effective. Do you find it difficult to write first thing in the morning? If so, why not deal with your email or paperwork first thing, since that requires less brain activity? Can you work during your commute? Save busywork for then. Can you switch between lab work and desk work well? Where are you fitting in teaching and admin? Try to answer these questions with a time audit. It’s a horrible corporate thing to do, but I found it worked for me.

3. Lab manual

This was a popular idea. Paul Nurse’s lab had one – so should yours! The Royle lab manual has the following sections:

  • Lab organisation
  • Molecular Biology
  • Cell Biology
  • Biochemistry
  • Imaging

The lab organisation section has subsections on 1) how to keep a lab book; 2) lab organisation (databases, plasmid/antibody organisation); 3) computers/data storage; 4) lab calculations; 5) making figures. The other sections are a collection of our tried-and-tested protocols. New protocols are submitted to a share on the server and honed until ready for preservation in the Lab Manual. The idea is that you give the manual to all new starters and get them to stick to it and to consult the manual first when troubleshooting. People in the lab like it, because they are not left guessing exactly what you expect of them.

As part of this, you need to sort out lab databases and a lab server for all of the data. One suggestion was to give one person in the lab the job of looking after (a.k.a. policing) the databases and enforcing compliance. We don’t do this and instead do spot checks every few weeks/months to ensure that things haven’t drifted too far.

Occasionally, and at random, I’ll ask all lab members to bring their lab books to our lab meeting. I ask everyone to swap books with someone else. I then pick a random date and ask person X to describe (using the lab book) what person Y did on that day. It’s a bit of fun, but makes people realise how important keeping a good lab book is.

4. Tame your email

There are lots of tips on how to do this – find something that works for you. For example, I set up several filters/rules that move low-importance messages away from my inbox. If a message will take more than 5 seconds to deal with, I flag it and come back to it later. I’ve tried checking email only at specified times of the day – it doesn’t work well for me, but it might for you. Out-of-hours email is a problem. Just remember that no email is so urgent that it cannot wait until the morning – otherwise they would phone you.

5. Automation

Again there are lots of tips out there, e.g. in this post from Sylvain Deville. I have set up macros for routine things like exporting figures in a variety of formats/resolutions and assembling them with a manuscript file to one PDF for circulating around the lab for comment. We have workflows for building figures and writing papers. Anything that you do frequently is worth automating.

6. Deposit your plasmids with Addgene

They’ll distribute them for you! This saves a lot of time. You still get to check who is requesting them if you are curious.

7. Organising frequently-used files

Spend some time making some really good schematic figures. They can be reused and rejigged time and again for a variety of purposes – talks, manuscripts etc. It’s worth doing this well and with a diagram that is definitely yours and not plundered from the web. Also, never retype text instructions – save them somewhere and just cut-and-paste. Examples include: answers to common questions from students, details of how to do something in the lab, details of how to get to the lab, a brief biography, abstracts for talks…

Have a long-format CV always ready and keep updating it (I’ve not found a good way to automate this, yet). I get asked for my CV all the time for lots of different things. Have the long (master) CV set up so that you can delete sections as appropriate, depending on the purpose. Use the publication list from this for pasting into the various boxes that you are required to fill out. An EndNote smart list of all of your papers is also handy for rapidly formatting a list of your papers according to different styles. Try to keep your web profiles up-to-date. If you publish a new paper, add it to your CV and all of your profiles so they don’t look out of date. ORCiD, Researchfish, whatever – try and keep them all current.

Get a slidedeck together of all your slides on a topic. Pull from here to put your talks together. Get a slidepack together to show to visitors to the lab at a moment’s notice. Also, when you publish a new paper, make slides of the final figure versions and add them to the master slidedeck.

8. Alerts

Set up literature alerts. My advice would be don’t have these coming to your inbox. Use RSS. This way you can efficiently mark interesting papers to look at later and keep your email free of clutter. Grab feeds for your favourite journals and for custom pubmed searches. Not just for subject keywords but also for colleagues and scientists who you think do interesting work. Set up Google Scholar to send you papers that have cited your work. Together with paper recommendations from Twitter (or maybe some other services – PubChase etc.) you’ll be as sure as you can be that you’re not missing anything. Also grab feeds from funding agencies, so that you don’t miss calls for grant applications. If all of these things are in place, you don’t need to browse the web which can be a huge time drain.

9. Synchronise

I have several computers synced via Unison (thanks to Daniel and Christophe who suggested this to me years ago). You can do this via Dropbox, but the space is limited. Unison syncs all my documents so that I am always able to work on the latest versions of things wherever I am. This is useful, if for some reason you cannot make it in to work unexpectedly.

10. Paper of the day

This has worked at some level to make sure that I am reading papers regularly. Posts about this here and here.

11. Make use of admin staff

If you have access to administrative staff get them to do as much of your paperwork as is feasible so you can concentrate on other things. And be nice to them! They can help a lot if you are really stuck with something, like an imminent deadline; or they can… be less helpful.

12. Be a good colleague

There’s a temptation to perform badly in tasks so that you don’t get asked again in order to reduce your workload. Don’t do this. It is true that if you are efficient, you will get asked to do more things. This is good (because not all tasks are bad). If you have too much to do, you just need to manage it. Say “No” if your workload is too high. But don’t just do a bad job. This pushes the problem onto your colleagues. If nothing else, you need their help. Also, help your colleagues if they need it. Always make yourself available to comment on their grants and papers. Interacting with colleagues is one of the most fun parts of being a PI.

13. Don’t write a book chapter

It’s a waste of time. Nobody will read it. Nobody will cite it. It will take time away from publishing real papers. Also, think carefully about writing review articles. If you have something unique to say, then go for it. Don’t do it just because you’ve been asked…

In need of some more advice?

This post was focussed on the technicalities of running a lab to make things more efficient. There’s obviously lots more to it than this: people management, networking etc.

A great recommendation that I got after I had been a PI for a few years is this excellent book by Kathy Barker: At The Helm: Leading Your Laboratory. I read it and wished I’d found it earlier. The sections on early stage negotiations and planning for the moment you become a PI are great (although it is very US-centric).

I’ve also been told that the EMBO Course for New Investigators is great, although I have not attended it.

Update 12:15 13/7/15: A reader sent this link to me via email. It’s a document from HHMI on scientific management for Postdocs and New PIs. Well worth a read!!

Update 07:41 4/2/15: We now use Trello for organising activities in the lab. You can read about how we do that here. I added the lab book audit anecdote and fixed some typos.

Thanks to attendees of the QMUL ECR Retreat for your feedback on my talk. I also incorporated a few points from Kebs Hodivala-Dilke’s presentation, which was focussed more on the big picture points. If you have any other time-saving tips for PIs, leave a comment.

My Blank Pages II: Statistics Done Wrong

I have just finished reading this excellent book, Statistics Done Wrong: The Woefully Complete Guide by Alex Reinhart. I’d recommend it to anyone interested in quantitative biology and particularly to PhD students starting out in biomedical science.

Statistics is a topic that many people find difficult to grasp. I think there are a couple of reasons for this, which I’ll go into below. The aim of this book is to comprehensively cover the common mistakes and errors that continually crop up in data analysis. The author writes in an easy-to-understand style and – this is the important bit – he dispenses with nearly all of the equations. The result is an accessible guide on “what not to do” in significance testing.

I think there are two main reasons why people find statistics tough: uncertainty and mathematical anxiety.

First, uncertainty. What I mean is uncertainty over which statistical approach to take, rather than the uncertainty that can be studied using statistics! It is very easy to find fault with the statistical approaches used in a biologist’s study. Why did they show the confidence interval and not the standard deviation? Why haven’t they corrected for multiple testing…? Statistics has a “gotcha” reputation. The reason for the uncertainty is that it is difficult to come up with a hard-and-fast set of guidelines, because the right approach depends a lot on the type of data that has been collected, what is being tested and so on. And there are often several ways to do the same thing. This uncertainty doesn’t go away even with a firm grounding in statistics – the methods are nearly always up for debate as far as I can see. And I think it is this uncertainty that prevents people from really engaging with statistics. In the absence of clear direction, having a set of “what not to do” in mind seems a useful approach to stats.

Second, mathematical anxiety, i.e. fear of maths. Biology has a reputation for being populated by people who ended up here through an affinity with science but a discomfort with physics and maths. This is unfair, as there are many areas of biology where this is not true and statistical/quantitative approaches are right at the forefront. Nonetheless, there is a reason why there are umpteen “Statistics for Biologists” books in the bookshop. Now, the way that statistics is usually taught is to crunch through the equations that describe statistical concepts. Again, this means that people who really need to know about statistics for their research are held back if they don’t have a mathematical background or just find maths a bit daunting. The situation is well described by a recent post at Will Kurt’s excellent Count Bayesie blog on the teaching of statistics. His point is that insisting students know these equations gets in the way of them understanding statistics. Nowadays, calculating something like the standard deviation is trivial using a computer and we are unlikely to need to know the derivation of an equation in order to do our work. We should just skip the equations and explain the concepts.

The nice thing about this book is that the author has collected together all the faux pas that you’re likely to encounter and explains how to avoid them. This goes some way to addressing the uncertainty over which methods to use. Secondly, the author has dispensed with the equations, so the mathematically anxious can pick it up without fear. These features make this book different from other stats books that I’ve read.

You can find copies at many online retailers. It’s published by No Starch. I picked up a copy after reading about it on Nathan Yau’s Flowing Data blog.

The post title comes from “My Blank Pages” by Velvet Crush from their Teenage Symphonies to God LP.

Tips from the blog IV – averaging

I recently put a code snippet up on IgorExchange. It’s a simple procedure for averaging a set of 1D waves and putting the results in a new wave. The difference between this code and Average Waves.ipf (which ships with Igor) is that this function takes the average of all points in each wave and places that single value in a new wave. You can specify whether the mean or median is used for the average.
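
The Igor procedure itself lives on IgorExchange; purely for illustration, here is the same idea sketched in Python/numpy (the function name and the example data are mine, not part of the Igor snippet):

import numpy as np

def average_waves(waves, use_median=False):
    # reduce each 1D wave to a single value (mean or median)
    # and collect the results in a new 1D array
    reducer = np.median if use_median else np.mean
    return np.array([reducer(w) for w in waves])

# example with three short "waves" of different lengths
waves = [np.array([1.0, 2.0, 3.0]),
         np.array([10.0, 12.0]),
         np.array([5.0, 5.0, 6.0, 40.0])]
print(average_waves(waves))                   # means: 2, 11, 14
print(average_waves(waves, use_median=True))  # medians: 2, 11, 5.5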


I still don’t have a way to mark up Igor code in wordpress.

Wrong Number: A closer look at Impact Factors

This is a long post about Journal Impact Factors. Thanks to Stephen Curry for encouraging me to post this.

tl;dr

  • the JIF is based on highly skewed data
  • it is difficult to reproduce the JIFs from Thomson-Reuters
  • the JIF is a very poor indicator of the number of citations a random paper in the journal received
  • reporting a JIF to 3 d.p. is ridiculous; it would be better to round to the nearest 5 or 10

I really liked this recent tweet from Stat Fact.

It’s a great illustration of why reporting means for skewed distributions is a bad idea. And this brings us quickly to Thomson-Reuters’ Journal Impact Factor (JIF).

I can actually remember the first time I realised that the JIF was a spurious metric. This was in 2003, after reading a letter to Nature from David Colquhoun, who plotted out the distribution of citations to a sample of papers in Nature. Up until that point, I hadn’t appreciated how skewed these data are. We put it up on the lab wall.

[Figure: David Colquhoun’s plot of the distribution of citations to a sample of Nature papers]

Now, the JIF for a given year is calculated as follows:

A JIF for 2013 is worked out by counting the total number of 2013 cites to articles in that journal that were published in 2011 and 2012. This number is divided by the number of “citable items” in that journal in 2011 and 2012.
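
In code, the whole calculation amounts to one line. A toy example with invented numbers, just to make the arithmetic explicit:

# 2013 JIF for a hypothetical journal (all numbers are invented)
cites_in_2013_to_2011_papers = 3200
cites_in_2013_to_2012_papers = 2800
citable_items_2011 = 210
citable_items_2012 = 230

jif_2013 = (cites_in_2013_to_2011_papers + cites_in_2013_to_2012_papers) / (
    citable_items_2011 + citable_items_2012)
print(round(jif_2013, 3))  # 13.636 – three decimal places of spurious precision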

There are numerous problems with this calculation that I don’t have time to go into here. Even setting these aside for the moment, the JIF is still used widely today and not for the purpose for which it was originally intended. Eugene Garfield created the metric to provide librarians with a simple way to prioritise subscriptions to the journals that carried the most-cited scientific papers. The JIF is used (wrongly) in some institutions as part of the criteria for hiring, promotion and firing. This is because of the common misconception that the JIF is a proxy for the quality of a paper in that journal. Use of metrics in this manner is opposed by SF-DORA and I would encourage anyone who hasn’t already done so to pledge their support for this excellent initiative.

Why not report the median rather than the mean?

With the citation distribution in mind, why do Thomson-Reuters calculate the mean rather than the median for the JIF? It makes no sense at all. If you didn’t quite understand why from the @statfact tweet above, then look at this:

The Acta Crystallographica Section A effect. The plot shows that this journal had a JIF of 2.051 in 2008, which jumped to 49.926 in 2009 due to a single highly-cited paper. Did every other paper in this journal suddenly become amazingly awesome and highly cited for this period? Of course not. The median is insensitive to outliers like this.
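
The effect is easy to reproduce with a quick simulation (the citation numbers below are invented, purely to illustrate the point): drop one massively cited paper into an otherwise ordinary, skewed distribution and the mean leaps while the median barely moves.

import numpy as np

rng = np.random.default_rng(0)
# a skewed "citation distribution" for 200 ordinary papers
cites = rng.negative_binomial(1, 0.3, size=200)
print(np.mean(cites), np.median(cites))

# now add a single paper with 6000 citations
cites_with_outlier = np.append(cites, 6000)
print(np.mean(cites_with_outlier), np.median(cites_with_outlier))
# the mean jumps by roughly 30 citations per paper; the median is essentially unchanged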

Why don’t Thomson-Reuters do this? Probably for ease of computation. The JIF (mean) requires only three numbers for each journal, whereas calculating the median would require the citation count for every paper under consideration at every journal. But it’s not that difficult (see below). There’s also a mismatch between the items that bring citations into the numerator and those that count as “citeable items” in the denominator. This opacity is one of the major criticisms of the Impact Factor, and it also makes calculating a median awkward.

Let’s crunch some citation numbers

I had a closer look at citation data for a small number of journals in my field. DC’s citation distribution plot was great (in fact, superior to JIF data) but it didn’t capture the distribution that underlies the JIF. I crunched the IF2012 numbers (released in June 2013) sometime in December 2013. This is shown below. My intention was to redo this analysis more fully in June 2014 when the IF2013 was released, but I was busy, had lost interest and the company said that they would be more open with the data (although I’ve not seen any evidence for this). I wrote about partial impact factors instead, which took over my blog. Anyway, the analysis shown here is likely to be similar for any year and the points made below are likely to hold.

I mainly looked at Nature, Nature Cell Biology, Journal of Cell Biology, EMBO Journal and J Cell Science, using citations in 2012 articles to papers published in 2010 and 2011, i.e. the same criteria as for IF2012.

The first thing that happens when you attempt this analysis is that you realise how unreproducible the Thomson-Reuters JIFs are. This has been commented on in the past (e.g. here), yet I had the same data as the company uses to calculate JIFs and it was difficult to see how they had arrived at their numbers. After some wrangling I managed to get a set of papers for each journal that gave close to the same JIF.

[Figure: 2012 JIF together with the mean and median citations for each journal in the reconstructed dataset]

From this we can look at the citation distribution within the dataset for each journal. Below is a gallery of these distributions. You can see that the data are highly skewed. For example, JCB has kurtosis of 13.5 and a skewness of 3. For all of these journals ~2/3 of papers had fewer than the mean number of citations. With this kind of skew, it makes more sense to report the median (as described above). Note that Cell is included here but was not used in the main analysis.
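
If you have the per-paper citation counts for a journal, these summary numbers take only a few lines to reproduce. A sketch using scipy’s conventions (skewness and excess kurtosis); the example citation counts are made up:

import numpy as np
from scipy.stats import skew, kurtosis

def describe_citations(cites):
    # summary statistics for a vector of per-paper citation counts
    cites = np.asarray(cites, dtype=float)
    return {
        "mean": cites.mean(),
        "median": np.median(cites),
        "skewness": skew(cites),
        "excess kurtosis": kurtosis(cites),  # 0 for a normal distribution
        "fraction of papers below the mean": np.mean(cites < cites.mean()),
    }

# made-up example: a typically skewed set of citation counts
print(describe_citations([0, 1, 1, 2, 2, 3, 3, 4, 5, 8, 12, 40]))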

So how do these distributions look when compared? I plotted each journal compared to JCB. They are normalised to account for the differing number of papers in each dataset. As you can see they are largely overlapping.

[Figure: normalised citation distributions for each journal overlaid on the JCB distribution]

If the distributions overlap so much, how certain can we be that a paper in a journal with a high JIF will have more citations than a paper in a journal with a lower JIF? In other words, how good is the JIF (mean or median) at predicting how many citations a paper published in a certain journal is likely to have?

To look at this, I ran a Monte Carlo analysis comparing a random paper from one journal with a random one from JCB and looked at the difference in the number of citations. Papers in EMBO J are indistinguishable from JCB. Papers in JCS have very slightly fewer citations than JCB. Most NCB papers have a similar number of cites to papers in JCB, but there is a tail of papers with higher cites; the picture for Nature is similar but more amplified.

[Figure: distributions of the difference in citations between a random paper from each journal and a random JCB paper]
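
The resampling itself is only a few lines if you have per-paper citation counts for the two journals. A sketch (the function name and the example citation vectors are mine, not the original analysis):

import numpy as np

def citation_difference_mc(cites_a, cites_b, n_draws=10000, seed=0):
    # Monte Carlo: difference in citations between a random paper from
    # journal A and a random paper from journal B
    rng = np.random.default_rng(seed)
    a = rng.choice(cites_a, size=n_draws, replace=True)
    b = rng.choice(cites_b, size=n_draws, replace=True)
    return a - b

# example with made-up citation vectors
jcb = np.array([0, 2, 3, 5, 5, 8, 10, 12, 20, 35])
ncb = np.array([1, 3, 4, 6, 9, 11, 15, 25, 60, 120])
diff = citation_difference_mc(ncb, jcb)
print(np.mean(diff > 0))  # fraction of draws where the NCB paper received more citations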

Thomson-Reuters quotes the JIF to 3 d.p. and most journals use this to promote their impact factor (see below). The precision of 3 d.p. is ridiculous when two journals with IFs of 10.822 and 9.822 are indistinguishable when it comes to the number of citations to randomly sampled papers in that journal.

So how big do differences in JIF have to be in order to be able to tell a “Journal X paper” from a “Journal Y paper” (in terms of citations)?

To look at this, I ran some comparisons between the journals to get some idea of what would count as a “significant difference”. I made virtual issues of each journal containing differing numbers of papers (5, 10, 20, 30), compared the citations in each via a Wilcoxon rank test, and then plotted the frequency of p-values for 100 of these tests. Please leave a comment if you have a better idea of how to look at this. I preferred this method over a head-to-head comparison of two papers as it allows each paper the benefit of the (potential) reflected glory of the other papers in the journal. In other words, it is closer to what the JIF is about.
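
A sketch of that scheme in Python, using scipy’s Wilcoxon rank-sum test to stand in for the rank test (the function name and parameters are mine; it assumes you have per-paper citation arrays for the two journals):

import numpy as np
from scipy.stats import ranksums

def virtual_issue_pvalues(cites_a, cites_b, issue_size=20, n_tests=100, seed=0):
    # draw "virtual issues" of issue_size papers from each journal, compare
    # their citations with a Wilcoxon rank-sum test, and return the p-values
    rng = np.random.default_rng(seed)
    pvals = []
    for _ in range(n_tests):
        issue_a = rng.choice(cites_a, size=issue_size, replace=False)
        issue_b = rng.choice(cites_b, size=issue_size, replace=False)
        pvals.append(ranksums(issue_a, issue_b).pvalue)
    return np.array(pvals)

# example usage, given citation arrays with at least issue_size papers each:
# pvals = virtual_issue_pvalues(nature_cites, jcb_cites, issue_size=20)
# print(np.mean(pvals < 0.05))  # how often the virtual Nature issue "beats" JCB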

OK, so this shows that a sufficient sample size is required to detect differences – no surprise there. But at N=20 and N=30 the result seems pretty clear. A virtual issue of Nature trumps a virtual issue of JCB, and JCB beats JCS. But again, there is no difference between JCB and EMBO J. Finally, only ~30% of the time would a virtual issue of NCB trump JCB for citations! NCB and JCB had a difference in JIF of almost 10 (20.761 vs 10.822). So not only is quoting the JIF to 3 d.p. ridiculous, it looks like rounding the JIF to the nearest 5 (or 10) might be better!

This analysis supports the idea that there are different tiers of journal (in Cell Biology at least). But the JIF is the bluntest of tools for separating these journals. A more rigorous analysis is needed to demonstrate this more clearly, but that is not feasible while using a dataset which agrees with that of Thomson-Reuters (without purchasing the data from the company).

If you are still not convinced about the shortcomings of the JIF, here is a final example. The IF2013 for Nature increased from 38.597 to 42.351. Let’s have a look at the citation distributions that underlie this rise of 3.8! As you can see below, they are virtually identical. Remember that there’s a big promotion that the journal uses to pull in new subscribers; it seems a bit hollow somehow, doesn’t it? Disclaimer: I think this promotion is a bit tacky, but it’s actually a really good deal… the News stuff at the front and the Jobs section at the back alone are worth ~£40.

Show us the data!

More skewed distributions: The distribution of JIFs in the Cell Biology Category for IF2012 is itself skewed. Median JIF is 3.2 and Mean JIF is 4.8.

Recently, Stephen Curry has called for Journals to report the citation distribution data rather than parroting their Impact Factor (to 3 d.p.). I agree with this. The question is though – what to report?

  • The IF window is far too narrow (2 years + 1 year of citations) so a broader window would be more useful.
  • A comparison dataset from another journal is needed in order to calibrate ourselves.
  • Citations are problematic – not least because they are laggy. A journal could change dramatically and any citation metric would not catch up for ~2 years.
  • Related to this, some topics are hot and others are not. I guess we’re most interested in how a paper in Journal X compares to others of its kind.
  • Any information reported needs to be freely available for re-analysis and not in the hands of a company. Google Scholar is a potential solution but it needs to be more open with its data. They already have a journal ranking which provides a valuable and interesting alternative view to the JIF.

One solution would be to show per-article citation profiles, comparing these for similar papers. How do papers on a certain topic in Journal X compare not only to those in Journal Y but to the whole field? In my opinion, this kind of metric would be most useful when assessing scholarly output.

Summary

Thanks for reading to the end (or at least scrolling all the way down). The take home points are:

  • the JIF is based on highly skewed data.
  • the median rather than the mean is better for summarising such distributions.
  • the JIF is a very poor indicator of the number of citations a random paper in the journal received!
  • reporting a JIF to 3 d.p. is ridiculous; it would be better to round to the nearest 5 or 10.
  • an open resource for comparing citation data per journal would be highly valuable.

The post title is taken from “Wrong Number” by The Cure. I’m not sure which album it’s from, I only own a Greatest Hits compilation.

Middle of the road: pitching your paper

I saw this great tweet (fairly) recently:

https://twitter.com/jkpfeiff/status/589148184284254208/

I thought this was such a great explanation of when to submit your paper.

It reminded me of a diagram that I sketched out when talking to a student in my lab about a paper we were writing. I was trying to explain why we don’t exaggerate our findings. And conversely why we don’t undersell our results either. I replotted it below:

[Figure: sketch of how strongly a paper is pitched versus its chances with the Editor and with peer review]

Getting out to review is a major hurdle to publishing a paper. Therefore, convincing the Editor that you have found out something amazing is the first task. This is counterbalanced by peer review, which scrutinises the claims made in a paper for their experimental support. So, exaggerated claims might get you over the first hurdle, but they will give you problems during peer review (and afterwards, if the paper makes it to print). Conversely, underselling or not interpreting all your data fully is a different problem. It’s unlikely to impress the Editor, as it can make your paper seem “too specialised”, although if it made it into the hands of your peers they would probably like it! Obviously, at either end of the spectrum, no-one likes a dull/boring/incremental paper and everyone can smell a rat if the claims are completely overblown, e.g. the genome sequence of Sasquatch.

So this is why we try to interpret our results fully but are careful not to exaggerate our claims. It might not get us out to review every time, but at least we can sleep at night.

I don’t know if this is a fair representation. Certainly depending on the journal the scale of the y-axis needs to change!

The post title is taken from “Middle of the Road” by Teenage Fanclub a B-side from their single “I Don’t Want Control of You”.

Zero Tolerance

We were asked to write a Preview piece for Developmental Cell. Two interesting papers, which deal with the insertion of amphipathic helices into membranes to influence membrane curvature during endocytosis, were scheduled for publication and the journal wanted some “front matter” to promote them.

Our Preview is paywalled – sorry about that – but I can briefly tell you why these two papers are worth a read.

The first paper – a collaboration between EMBL scientists led by Marko Kaksonen – deals with the yeast proteins Ent1 and Sla2. Ent1 has an ENTH domain and Sla2 has an ANTH domain. ENTH stands for Epsin N-terminal homology, whereas ANTH means AP180 N-terminal homology. These two domains are known to bind membrane and, in the case of ENTH, to tubulate and vesiculate giant unilamellar vesicles (GUVs). Ent1 does this via an amphipathic helix, “Helix 0”, that inserts into the outer leaflet to bend the membrane. The new paper shows that Ent1 and Sla2 can bind together (regulated by PIP2) and that ANTH regulates ENTH so that it doesn’t make lots of vesicles; instead the two team up to make regular membrane tubules. The tubules are decorated with a regular “coat” of these adaptor proteins. This coat could prepattern the clathrin lattice. Also, because Sla2 links to actin, actin can presumably pull on this lattice to help drive the formation of a new vesicle. The regular spacing might distribute the forces evenly over large expanses of membrane.

The second paper – from David Owen’s lab at CIMR in Cambridge – shows that CALM (a protein with an ANTH domain) actually has a secret Helix 0! They show that this forms on contact with lipid. CALM influences the size of clathrin-coated pits and vesicles, by influencing curvature. They propose a model where cargo size needs to be matched to vesicle size, simply due to the energetics of pit formation. The idea is that cells do this by regulating the ratio of AP2 to CALM.

You can read our preview and the papers by Skruzny et al and Miller et al in the latest issue of Dev Cell.

The post title and the title of our Preview is taken from “Zero Tolerance” by Death from their Symbolic LP. I didn’t want to be outdone by these Swedish scientists who have been using Bob Dylan song titles and lyrics in their papers for years.

To Open Closed Doors: How open is your reference list?

Our most recent manuscript was almost ready for submission. We were planning to send it to an open access journal. It was then that I had the thought: how many papers in the reference list are freely available?

It somehow didn’t make much sense to point readers towards papers that they might not be able to access. So, I wondered if there was a quick way to determine how many of the papers in my reference list were open access. I asked on Twitter and got a number of suggestions:

  1. Search crossref to find out if the journal is in DOAJ (@epentz)
  2. How Open Is It? from Cottage Labs will check a list of DOIs (up to 20) for openness (@emanuil_tolev)
  3. Open access DOI Resolver will perform a similar task (@neurocraig)

I actually used a fourth method (from @biochemistries and @invisiblecomma), which was to use HubMed, although in the end a similar solution can be reached by searching PubMed itself. Whereas the other strategies will work for a range of academic texts, everything in my reference list was from PubMed, so this solution worked well for me. I pulled out the list of Accessions (PMIDs) for my reference list, because some papers were old and I did not have their DOIs. The quickest way to do this was to make a new EndNote style that contained only the Accession field and get it to generate a new bibliography from my manuscript. I then appended [uid] OR after each one and searched with that term.
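
Building that search term is trivial to script if you have the PMIDs as a list. A small sketch (the PMIDs are placeholders, and adding PubMed’s free full text filter at the end is my suggestion rather than part of the original workflow):

# PMIDs exported from EndNote (placeholders, not the real reference list)
pmids = ["1234567", "2345678", "3456789"]

# tag each accession with [uid] and join with OR, as described above
query = " OR ".join(f"{pmid}[uid]" for pmid in pmids)
print(query)  # paste this into the PubMed search box

# to count only the freely readable subset, combine it with PubMed's filter
print(f"({query}) AND free full text[sb]")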

What happened?

My paper had 44 references. Of these, 35 were freely available to read. I was actually surprised by how many were available. So, 9 papers were not free to read. As advised, I checked each one to really make sure that the HubMed result was accurate, and it was.

Please note that I’d written the paper without giving this a thought and citing papers as I normally do: the best demonstration of something, the first paper to show something, using primary papers as far as possible.

Seven of the nine I couldn’t compromise on. They’re classic papers from the 80s and 90s that are still paywalled but are unique in what they describe. However, two papers were reviews in closed-access journals. Now these I could do something about! Especially as I prefer to cite the primary literature anyway. Plus, most reviews are pretty unoriginal in what they cover, and an alternative open access version that is fairly recent can easily be found. I’ll probably run this check for future manuscripts and see what it throws up.

Summary

It’s often said that papers are our currency in science. The valuation of this currency comes from citations. Funnily enough, we the authors are in a position to actually do something about this. I don’t think any of us should compromise the science in our manuscripts. However, I think we could all probably pay a bit more attention to the citations that we dish out when writing a paper. Whether this is simply to make sure that what we cite is widely accessible, or just making sure that credit goes to the right people.

The post title is taken from “To Open Closed Doors” by D.R.I. from the Dirty Rotten LP