My Blank Pages V: Raw Data

Raw Data: A novel on Life in Science by Pernille Rørth (Springer, 2016)


I was keen to read this “lab lit” novel written by renowned cell biologist Pernille Rørth. I’d seen lots of enthusiastic comments about the book, and it didn’t disappoint.

I was frustrated to read two pieces about Raw Data on Retraction Watch and The Node, both of which gave the plot away with no warning, so if you haven’t read it and want to enjoy the suspense while you read, look away now.

The story is set in a high-flying cancer cell biology lab in Boston. Postdocs are working night and day, each trying to land a paper in Nature and become an independent PI. Chloe manages it, while Karen is struggling, despite her best efforts. Karen accidentally uncovers that Chloe may have cut some corners to get her paper into Nature and this sets off a cascade of events, leading to the retraction of the paper. It’s a fascinating tale, well-written and completely absorbing. I recommend it for anyone working in science. You will smile at the references to conference coffee, failed scientists and more.

The plotline is highly reminiscent of Intuition by Allegra Goodman, even down to the tumours growing in the mice. Both stories echo the real-life case of Thereza Imanishi-Kari, which is detailed in the overlong but comprehensive The Baltimore Case by Daniel J. Kevles. Rørth’s retelling of the science world is more convincing than Goodman’s, due in part to her 25 years as a scientist. Nonetheless, Intuition is a great book that I’d also recommend in this genre.

Raw Data is thought-provoking. You can ponder the role of Tom, the PI who has cultivated a certain atmosphere in the lab. What about the pressure to publish? How about the peer reviewers who dangle the carrot of “get this result and you can have a Nature paper” in front of Chloe? It’s a toxic mix and it’s happening in labs all over the world. A terrifying thought.

On the role of Tom: one thing that is slightly underexplored is the fact that Tom tells Chloe that there is a competing group that could ‘scoop’ her while she is rushing to finish her paper. It isn’t clear whether this group actually exists or whether this was a tactic to gee her along. Either way, it is another bit of pressure that contributes to the misconduct.

Not so long ago a high-profile research institute in the UK announced that it was recruiting PIs by looking for the “best scientific athlete”. I read last week that 37 track-and-field athletes from the London 2012 Olympics have so far had their results disqualified by the IAAF for doping. The parallels are interesting. Science, like sport, is run with winner-takes-all rules and the high stakes and pressure that go along with them. The incentives are dangerous and I wonder what we are creating with this atmosphere. Certainly, as PIs we have a real responsibility, just as coaches do in sport, to ensure our trainees make the right choices in their careers.

I’ve seen nothing but recommendations for this book so far and mine is another one.

Matthew Freeman has said that it would be required reading for everyone in his lab.

My Blank Pages is a track by Velvet Crush. This is an occasional series of book reviews.

Wrote for Luck

Fans of probability love random processes. And lotteries are a great example of random number generation.

The UK National Lottery ran in one format from 19/11/1994 until 7/10/2015. I was talking to somebody who had played the same set of numbers in all of these lottery draws and I wondered what the net gain or loss has been for them over this period.

The basic format is that a player buys a line of numbers (six numbers, from 1-49) and tries to match the six numbers (from 49 balls numbered 1-49) drawn from a machine. The aim is to match all six balls and win the jackpot. The odds of this are fantastically small (1 in ~14 million), but a player who is the only person matching these numbers can take away £3-5 million. There are prizes for matching three numbers (1 in ~56 chance), four numbers (1 in ~1,032), five numbers (1 in ~55,491) or five numbers plus a seventh “bonus ball” (1 in ~2,330,636). Typical prizes are £10, £100, £1,500, or £50,000, respectively.
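The odds quoted above can be checked with a few lines of Python. This is a minimal sketch, assuming the standard counting argument for a 6-from-49 draw with one bonus ball; the names `ways_to_match`, `ways` and `odds` are just for illustration.

```python
from math import comb

BALLS = 49   # balls in the machine
PICKS = 6    # numbers on a line, and main balls drawn

total_lines = comb(BALLS, PICKS)  # every possible line of six numbers

def ways_to_match(k):
    """Number of lines that match exactly k of the six main balls."""
    return comb(PICKS, k) * comb(BALLS - PICKS, PICKS - k)

# Matching five main balls splits into two tiers: the sixth number on the
# line is either the bonus ball (1 choice) or one of the 42 other undrawn balls.
ways = {
    "three": ways_to_match(3),
    "four": ways_to_match(4),
    "five": comb(PICKS, 5) * (BALLS - PICKS - 1),
    "five + bonus": comb(PICKS, 5) * 1,
    "six": ways_to_match(6),
}

odds = {tier: total_lines / n for tier, n in ways.items()}

for tier, one_in in odds.items():
    print(f"match {tier}: 1 in {one_in:,.1f}")
```

The figures in the text are these values rounded down to whole numbers.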

The data for all draws are available here. I pulled all draws regardless of which machine or set of balls was used. This is what the data look like.


The rows are the seven balls (colour coded 1-49) that came out of the machine over 2065 draws.

I wrote a quick bit of code which generated all possible combinations of lottery numbers and compared all of these combinations to the real-life draws. The 1 in 14 million that I referred to earlier is actually

\({}^{n}C_{r} = {}^{49}C_{6}\)

\(\frac{49!}{6! \left(49-6\right)!} = 13{,}983{,}816\)
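Cancelling the 43! that appears in both the numerator and the denominator makes the arithmetic easy to check by hand:

\(\frac{49 \times 48 \times 47 \times 46 \times 45 \times 44}{6 \times 5 \times 4 \times 3 \times 2 \times 1} = \frac{10{,}068{,}347{,}520}{720} = 13{,}983{,}816\)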


This gives us the following.


Crunching these combinations against the real-life draw outcomes tells us what would have happened if every possible ticket had been bought for all draws. If we assume a £1 stake for each draw and ~14 million people each buying a unique combination line, then each person has staked £2065 over the draws we are considering.

  • The unluckiest line is 6, 7, 10, 21, 26, 36. This would’ve only won 12 lots of three balls, i.e. £120 – a net loss of £1945
  • The luckiest line is 3, 6, 13, 23, 27, 49. These numbers won 41 x three ball, 2 x four ball, 1 x jackpot, 1 x 5 balls + bonus.
  • Out of all possible combinations, 13,728,621 of them are in the red by anything from £5 to £1945. This is 98.2% of combinations.

Pretty terrible odds all-in-all. Note that I used the typical payout values for this calculation. If all possible tickets had been purchased the payouts would be much higher. So this calculation tells us what an individual could expect if they played the same numbers for every draw.
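For anyone who wants to try this, here is a minimal Python sketch of how a single line could be scored against a list of draws using the typical payouts. It is an illustration rather than the original analysis code: the function names, the toy draws and the £4 million jackpot value are all assumptions (the text only gives a £3-5 million range for the jackpot).

```python
# Typical payouts quoted in the text, in pounds. The jackpot figure is an
# assumption: the text only gives a range of £3-5 million.
PAYOUT = {"three": 10, "four": 100, "five": 1_500,
          "five + bonus": 50_000, "jackpot": 4_000_000}
STAKE = 1  # pounds per draw

def prize_tier(line, main_balls, bonus_ball):
    """Return the prize tier for one line in one draw, or None for no prize."""
    matched = len(set(line) & set(main_balls))
    if matched == 6:
        return "jackpot"
    if matched == 5:
        return "five + bonus" if bonus_ball in line else "five"
    if matched == 4:
        return "four"
    if matched == 3:
        return "three"
    return None

def net_result(line, draws):
    """Net gain (+) or loss (-) from playing the same line in every draw."""
    winnings = 0
    for main_balls, bonus_ball in draws:
        tier = prize_tier(line, main_balls, bonus_ball)
        if tier is not None:
            winnings += PAYOUT[tier]
    return winnings - STAKE * len(draws)

# Three made-up draws, purely to show the call; these are not real results.
toy_draws = [
    ((1, 2, 3, 4, 5, 6), 7),
    ((1, 2, 3, 10, 11, 12), 13),
    ((20, 21, 22, 23, 24, 25), 26),
]
print(net_result((1, 2, 3, 40, 41, 42), toy_draws))
```

Scoring all ~14 million lines against 2065 draws this way would be slow in pure Python, so a real run would want something vectorised, but the logic per line is the same.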

Note that the unluckiest line and the luckiest line have an equal probability of success in the 2066th draw. There is nothing intrinsically unlucky or lucky about these numbers!

I played the lottery a few times when it started with a specified set of numbers. I matched 3 balls twice and 4 balls once. I’ve not played since 1998 or so. Using another function in my code, I could check what would’ve happened if I’d kept playing all those intervening years. Fortunately, I would’ve looked forward to a net loss with 43 x three balls and 2 x four balls. Since I actually had a ticket for some of those wins and hardly any for the 2020 losing draws, I feel OK about that. Discovering that my line had actually matched the jackpot would’ve been weird, so I’m glad that wasn’t the case.
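The size of that net loss follows from the counts above and the typical payouts. A quick tally, assuming a £1 stake per draw and the £10 and £100 typical prizes (the variable names are only for illustration):

```python
STAKE = 1      # pounds per draw (assumed, as elsewhere in the post)
DRAWS = 2065   # draws in the dataset

three_ball_wins, four_ball_wins = 43, 2   # counts from the text
PAYOUT_THREE, PAYOUT_FOUR = 10, 100       # typical payouts from the text

winning_draws = three_ball_wins + four_ball_wins
losing_draws = DRAWS - winning_draws
winnings = three_ball_wins * PAYOUT_THREE + four_ball_wins * PAYOUT_FOUR
net = winnings - STAKE * DRAWS

print(losing_draws, winnings, net)
```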

There’s lots of fun to be had with this dataset and a quick google suggests that there are plenty of sites on the web doing just that.

Here’s a quick plot for fun. The frequency of balls drawn in the dataset:



  • The ball drawn the least is 13
  • The one drawn the most is 38
  • Expected number of appearances is 295 (14455/49).
  • 14455 is 7 balls from 2065 draws
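Counting ball frequencies like this takes very little code. Here is a minimal Python sketch, assuming each draw is stored as a tuple of six main balls plus a bonus ball; the function names and the toy draws are made up for illustration and are not real results.

```python
from collections import Counter

def ball_frequencies(draws):
    """Count how often each ball appears, bonus ball included."""
    counts = Counter()
    for main_balls, bonus_ball in draws:
        counts.update(main_balls)
        counts[bonus_ball] += 1
    return counts

def expected_appearances(n_draws, balls_per_draw=7, n_balls=49):
    """Expected count per ball if every ball is equally likely."""
    return n_draws * balls_per_draw / n_balls

# Expected appearances per ball over the 2065 draws in the dataset.
print(expected_appearances(2065))

# Three made-up draws, purely to show the call.
toy_draws = [
    ((1, 2, 3, 4, 5, 6), 7),
    ((1, 2, 3, 10, 11, 12), 13),
    ((20, 21, 22, 23, 24, 25), 26),
]
counts = ball_frequencies(toy_draws)
print(counts.most_common(3))
```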



In October 2015, the Lottery changed to balls numbered 1-59, so the dataset used here is effectively complete unless they revert to the old format.

The title of this post comes from “Wrote for Luck” by The Happy Mondays from their 1988 LP Bummed. The Manic Street Preachers recorded a great cover version which was on the B-Side of Roses in The Hospital single.

What Can You See?

Yesterday I tried a gedankenexperiment via Twitter, and asked:

If you could visualise a protein relative to an intracellular structure/organelle at ~5 nm resolution, which one would you pick and why?

I got some interesting replies:
  • Myosin Va and cargo on actin filaments in melanocytes – Cleidson Alves @cleidson_alves
  • COPII components relative to ER and Golgi for export of big proteins – David Stephens @David_S_Bristol
  • Actin inside an axon, AIS, shaft presynaptic bouton relative to membrane and vesicles – Christophe Leterrier @christlet
  • Cargo/vesicle and motor, ideally with a co-reporter of motor activity – Ali Twelvetrees @dozenoaks
  • Dynein on K-fibres. If it was a fixed view dynein on kinetochores, localisation relative to Ndc80 or Mad1 – Eric Griffis @DrGriff34
  • See definitively if TACC3/ch-TOG is at the centrosome or not – Hadrien Mary @HadiM_
  • Pericentriolar proteins relative to centrioles. And Arp2/3 and centrioles – Manuel Théry @ManuelTHERY
  • Arp2/3 and centrioles was seconded by Alexandre Carisey @alexcarisey
  • RhoGTPases near cell-cell contacts in endothelial cells. No good antibodies for this – Joachim Goedhart @joachimgoedhart
  • Integrin and filopodia tips, what structures are formed there – Guillaume Jacquemet @guijacquemet

It’s a tough question because the simplest answer to “which protein” is “the one I am most interested in” – I mean who wouldn’t want to see that at unprecedented resolution – but I was more interested in the “why” part. I’m conscious of the fact that breaking the resolution limit in light microscopy has not yielded many answers to outstanding questions so far.

OK, it was less a thought experiment and more like trying to crowd-source suggestions. We have some new technology that we’d like to put through its paces and apply to interesting cell biological questions. Thanks to everybody for their input.

If you want to make an additional suggestion, please leave a comment.

Edit 2016-03-13: Stéphane Vassilopoulos (@Biosdfp) chipped in on Twitter: “dynamin 2 oligomers right on the actin cytoskeleton”.

The post title is taken from “What Can You See?” by The Seahorses off their unreleased follow up album to Do It Yourself, which may have been called Minus Blue.

My Blank Pages IV: Every Song Ever

Every Song Ever: Twenty Ways to Listen in an Age of Musical Plenty

Ben Ratliff (Farrar, Straus and Giroux)


A non-science book review for today’s post. This is a great read on “how to listen to music”. There have been hundreds of books published along these lines; the innovation here, however, is that we now live in an age of musical plenty. Every song ever recorded is available at our fingertips to listen to when, where and how we want. This means that the author can draw on Thelonious Monk, Sunn O))), Shostakovich and Mariah Carey. And you can seek it out and find out whatever it is that they have in common.

I got hooked in Chapter 2 (discussing slowness in music). I was reading and thinking: he should mention Sleep’s Dopesmoker, but what are the chances? I turned the page and there it was. Then I knew that we were literally on the same page and that I would enjoy whatever it was he had to say. Isn’t confirmation bias a wonderful thing (outside of science)?

A lot of writing about music is terrible, but I love it when it is done well. As it is here. I especially like reading “under the bonnet” analysis of songs. Ian MacDonald’s Revolution In The Head (or Twilight of the Gods by Wilfred Mellers as an extreme example) springs to mind. This close analysis means you can go back and find new treasures in old songs. And this is the essence of the book.

I must admit that I have thought about trying to write similar analyses of songs on quantixed. Aside from the fact that I don’t have time, I was worried it might make me seem like Patrick Bateman discussing the merits of Huey Lewis & The News in American Psycho. It’s something that’s difficult to do well and Ratliff’s analyses here are light touch and spot-on.

The short section on blast beats which mentioned D.R.I. made me smile too, although there’s a factual error here. Ratliff talks about how singer-drummer-brother combo Kurt and Eric Brecht lock in on Draft Me when they played CBGB’s in 1984. Drummer Eric had left the band at that point, to be replaced by Felix Griffin, and it is him, not Eric, duelling with vocalist Kurt, both on the LP Dealing With It and at the CBGB’s gig, which was later released as an LP and video. Again, it’s a band that I have a soft spot for and it was great to see them picked out.

There were a couple of quotes that I found amusing, being a CD collector and something of a completist. Here’s one:

A friend described to me the experience of acquiring a complete CD collection of Mozart, after having had a piece-by-piece relationship with his music for most of his life. It was 175 CDs, or something like that. “I realized,” he said, “that now that I had it all, I never needed to listen to it again.”

Along the same lines, I thought this quote was pretty chilling.

We can pretty much wave bye-bye to the completist-music-collector impulse: it had a limited run in the human brain, probably 1930 to 2010. (It still exists in a fitful way, but it doesn’t have a consensual frame: there is no style for it.) It is not only a way of buying, owning, and arranging music-related objects and experiences in one’s life, but also a distinct way of listening.


As somebody who is not a fan of streaming and still values physically owning music, I know I am out of step with the rest of the world. However, I think this quote is at odds with what the whole book is trying to achieve. The guy listening to music on his phone speaker on the bus, described in the intro, can’t hear and appreciate much of what is described in the book. To hear that squeak of John Bonham’s kick drum pedal on Since I’ve Been Loving You from Led Zeppelin III, you need to be listening in the old-fashioned way, rather than in the noisy and busy way most music is consumed nowadays.

It’s a great read. You can get it here.

My Blank Pages is a track by Velvet Crush. This is an occasional series of book reviews.