Adventures in code II

I needed to generate a uniform random distribution of points inside a circle and, later, a sphere. This is part of a bigger project, but the code to do this is kind of interesting. There were no solutions available for IgorPro, but Stack Exchange had plenty of examples in Python and Mathematica. There are many ways to do this. The most popular seems to be to generate a uniform random set of points in a square or cube and then discard those that lie further than the radius from the origin. I didn't like this idea, because I needed to extend it to spheroids eventually, and as I saw it the computation time saved was minimal.

Here is the version for points in a circle (radius = 1, centred on the origin).

[Image: circleCode]
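
The code was posted as an image. As a sketch of the idea (the function name is mine; xw and yw follow the naming used for the sphere version below), uniform points in the unit circle can be generated directly by drawing the angle uniformly and taking the square root of a uniform deviate for the radius:

// Sketch of the direct method, not the original listing.
// radius = sqrt(U) for U uniform on (0,1), because the fraction of
// points with radius < r must scale with area, i.e. with r^2.
Function MakeCirclePoints(nPoints)
	Variable nPoints
	Make/O/D/N=(nPoints) xw, yw
	Variable i, rr, theta
	for(i = 0; i < nPoints; i += 1)
		rr = sqrt(0.5 + enoise(0.5))	// 0.5 + enoise(0.5) is uniform on (0,1)
		theta = pi + enoise(pi)	// uniform angle on (0,2*pi)
		xw[i] = rr * cos(theta)
		yw[i] = rr * sin(theta)
	endfor
End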

This gives a nice set of points, 1000 shown here.

[Image: pointsCircle]

And here is the version for points inside a sphere. This code takes a variable radius for the sphere.

[Image: sphereCode]
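
Again, the listing was an image. A minimal sketch of the direct method (function name assumed; xw, yw, zw as mentioned below): the radius is drawn as the cube root of a uniform deviate so the points are uniform in volume, and cos(theta) is drawn uniformly so there is no bunching at the poles.

// Sketch of the direct method, not the original listing.
Function MakeSpherePoints(nPoints, radius)
	Variable nPoints, radius
	Make/O/D/N=(nPoints) xw, yw, zw
	Variable i, rr, theta, phi
	for(i = 0; i < nPoints; i += 1)
		rr = radius * (0.5 + enoise(0.5))^(1/3)	// uniform in volume
		theta = acos(enoise(1))	// enoise(1) is uniform on (-1,1)
		phi = pi + enoise(pi)	// uniform on (0,2*pi)
		xw[i] = rr * sin(theta) * cos(phi)
		yw[i] = rr * sin(theta) * sin(phi)
		zw[i] = rr * cos(theta)
	endfor
End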

The three waves (xw,yw,zw) can be concatenated and displayed in a Gizmo. The code just plots out the three views.
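
The concatenation is a one-liner (the destination wave name here is mine):

Concatenate/O {xw, yw, zw}, triplet	// nPoints x 3 matrix, ready for a Gizmo scatter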

[Image: pointsSphere]

My code uses var + enoise(var) to get a random variable spanning (0, 2*var), for example 0.5 + enoise(0.5) for a uniform deviate on (0,1). This shift is needed because enoise goes from -var to +var. There is an interesting discussion about whether this is a truly flat PDF here.
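
A quick way to eyeball the flatness for yourself (my own check, not from that discussion) is to histogram a large sample:

Make/O/D/N=100000 testW = 0.5 + enoise(0.5)	// should be uniform on (0,1)
Make/O/N=50 histW
SetScale/I x, 0, 1, histW	// 50 bins spanning (0,1)
Histogram/B=1 testW, histW	// /B=1 uses the bins set on histW; expect roughly 2000 counts per bin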

This is part of a bigger project where I’ve had invaluable help from Tom Honnor from Statistics.

This post is part of a series on esoterica in computer programming.

Voice Your Opinion: Editors shopping for preprints is the future

Today I saw a tweet from Manuel Théry (an Associate Ed at Mol Biol Cell), which said that he had heard that the Editor-in-Chief of MBoC, David Drubin, shops for interesting preprints on bioRxiv to encourage the authors to submit to MBoC. This is not a surprise to me. I've read that authors of preprints on bioRxiv have been approached by journal Editors previously (here and here, there are many more). I'm pleased that David is forward-thinking and that MBoC are doing this actively.

I think this is the future.

Why? If we ignore for a moment the “far future” which may involve the destruction of most journals, leaving a preprint server and a handful of subject-specific websites which hunt down and feature content from the server and co-ordinate discussions and overviews of current trends… I actually think this is a good idea for the “immediate future” of science and science publishing. Two reasons spring to mind.

  1. Journals would be crazy to miss out: The manuscripts that I am seeing on bioRxiv are not stuff that's been dumped there with no chance of "real publication". This stuff is high profile. I mean that in the following sense: the work in my field that has been posted is generally interesting, it is from labs that do great science, and it is as good as work in any journal (obviously). For some reason I have found myself following what is being deposited there more closely than at any real journal. Journals would be crazy to miss out on this stuff.
  2. Levelling the playing field: For better or worse, papers are judged on where they are published. The thing that bothers me most about this is that manuscripts are submitted to only a handful of journals before "finding their home". This process is highly noisy, and it means that, if we accept that there is a journal hierarchy, your paper may or may not be deserving of the kudos it receives in its resting place. If all journals actively scour the preprint server(s), the authors can then pick the "highest bidder". This would make things fairer in the sense that all journals in the hierarchy would have had a chance to consider the paper, and its resting place may actually reflect its true quality.

I don’t often post opinions here, but I thought this would take more than 140 characters to explain. If you agree or disagree, feel free to leave a comment!

Edit @ 11:46 16-05-26 Pedro Beltrao pointed out that this idea is not new: he wrote a post about it back in 2007.

Edit 16-05-26 Misattributed the track to Extreme Noise Terror (corrected). Also added some links thanks to Alexis Verger.

The post title comes from “Voice Your Opinion” by Unseen Terror. The version I have is from a Peel sessions compilation “Hardcore Holocaust”.

If I Can’t Change Your Mind

I have written previously about Journal Impact Factors (here and here). The response to these articles has been great and earlier this year I was asked to write something about JIFs and citation distributions for one of my favourite journals. I agreed and set to work.

Things started off so well. A title came straight to mind. In the style of quantixed, I thought The Number of The Beast would be amusing. I asked for opinions on Twitter and got an even better one (from Scott Silverman @sksilverman): Too Many Significant Figures, Not Enough Significance. Next, I found an absolute gem of a quote to kick off the piece. It was from the eminently quotable Sydney Brenner.

Before we develop a pseudoscience of citation analysis, we should remind ourselves that what matters absolutely is the scientific content of a paper and that nothing will substitute for either knowing it or reading it.

[Image: looseEnds]

That quote was from a Loose Ends piece that Uncle Syd penned for Current Biology in 1995. Wow, 1995… that is quite a few years ago I thought to myself. Never mind. I pressed on.

There’s a lot of literature on JIFs, research assessment and in fact there are whole fields of scholarly activity (bibliometrics) devoted to this kind of analysis. I thought I’d better look back at what has been written previously. The “go to” paper for criticism of JIFs is Per Seglen’s analysis in the BMJ, published in 1997. I re-read this and I can recommend it if you haven’t already seen it. However, I started to feel uneasy. There was not much that I could add that hadn’t already been said, and what’s more it had been said 20 years ago.

Around about this time I was asked to review some fellowship applications for another EU country. The applicants had to list their publications, along with the JIF. I found this annoying. It was as if SF-DORA never happened.

There have been so many articles, blog posts and more written on JIFs. Why has nothing changed? It was then that I realised that it doesn’t matter how many things are written – however coherently argued – people like JIFs and they like to use them for research assessment. I was wasting my time writing something else. Sorry if this sounds pessimistic. I’m sure new trainees can be reached by new articles on this topic, but acceptance of JIF as a research assessment tool runs deep. It is like religious thought. No amount of atheist writing, no matter how forceful, cogent, whatever, will change people’s minds. That way of thinking is too deeply ingrained.

As the song says, “If I can’t change your mind, then no-one will”.

So I declared defeat and told the journal that I felt I had already said all that I could say on my blog and that I was unable to write something for them. Apologies to all like-minded individuals for not continuing to fight the good fight.

But allow me one parting shot. I had a discussion on Twitter with a few people, one of whom said they disliked the "JIF witch hunt". This caused me to think about why the JIF has hung around for so long and why it continues to have support. It can't be that so many people are statistically illiterate or that they are unscientific in choosing to ignore the evidence. What I think is going on is a misunderstanding. Criticism of a journal metric as being unsuitable to judge individual papers is perceived as an attack on journals with a high JIF. Now, for good or bad, science is elitist and we are all striving to do the best science we can. Striving for the best, for many scientists, means aiming to publish in journals which happen to have a high JIF. So an attack on JIFs as a research assessment tool feels like an attack on what scientists are trying to do every day.

[Image: JIFDist]

Because of this intense focus on high-JIF journals… what people don't appreciate is that the reality is much different. The distribution of JIFs across journals is as skewed as the citation distributions that underlie the metric itself. What this means is that focussing on a minuscule fraction of papers appearing in high-JIF journals is missing the point. Most papers are in journals with low JIFs. As I've written previously, papers in journals with a JIF of 4 get similar citations to those in a journal with a JIF of 6. So the JIF tells us nothing about citations to the majority of papers and it certainly can't predict the impact of these papers, which are the majority of our scientific output.

So what about those fellowship applicants? All of them had papers in journals with low JIFs (<8). The applicants' papers were indistinguishable in that respect. What advice would I give to people applying to such a scheme? Well, I wouldn't advise withholding the information asked for. To be fair to the funding body, they also asked for the number of citations for each paper, but for papers that are only a few months old this number is nearly always zero. My advice would be to try to make sure that your paper is available freely for anyone to read. Many of the applicants' papers were outside my expertise, and so the title and abstract didn't tell me much about the significance of the paper. So I looked at some of these papers to assess the quality of the data in there… if I had access. Applicants who had published in closed-access journals were at a disadvantage here, because if I couldn't download the paper it was difficult to assess what they had been doing.

I was thinking that this post would be a meta-meta-blogpost. Writing about an article which was written about something I wrote on my blog. I suppose it still is, except the article was never finished. I might post again about JIFs, but for now I doubt I will have anything new to say that hasn’t already been said.

The post title is taken from “If I Can’t Change Your Mind” by Sugar from their LP Copper Blue. Bob Mould was once asked about song-writing and he said that the perfect song was like a maths puzzle (I can’t find a link to support this, so this is from memory). If you are familiar with this song, songwriting and/or mathematics, then you will understand what he means.

Edit @ 08:22 16-05-20 I found an interview with Bob Mould where he says song-writing is like city-planning. Maybe he just compares song-writing to lots of different things in interviews. Nonetheless I like the maths analogy.

Adventures in code

An occasional series in esoteric programming issues.

As part of a larger analysis project I needed to implement a short program to determine the closest distance between two line segments in 3D space. This will be used to sort out which segments to compare… like I say, part of a bigger project. The best method is to find the point of closest approach to one segment when the other is represented as an infinite line. You can then check whether that point lies beyond the ends of the segment; if it does, you use the limits of the segment to calculate the distance. There's a discussion on stackoverflow here. The solutions point to one in C++ and one in MATLAB. The C++ version is easiest to port to Igor due to the similarity of the languages, but the explanation of the MATLAB code was more approachable. So I ported that to Igor to figure out how it works.

The MATLAB version is:

>> P = [-0.43256      -1.6656      0.12533]
P =
   -0.4326   -1.6656    0.1253
>> Q = [0.28768      -1.1465       1.1909]
Q =
    0.2877   -1.1465    1.1909
>> R = [1.1892    -0.037633      0.32729]
R =
    1.1892   -0.0376    0.3273
>> S = [0.17464     -0.18671      0.72579]
S =
    0.1746   -0.1867    0.7258
>> N = null(P-Q)
N =
   -0.3743   -0.7683
    0.9078   -0.1893
   -0.1893    0.6115
>> r = (R-P)*N
r =
    0.8327   -1.4306
>> s = (S-P)*N
s =
    1.0016   -0.3792
>> n = (s - r)*[0 -1;1 0];
>> n = n/norm(n);
>> n
n =
    0.9873   -0.1587
>> d = dot(n,r)
d =
    1.0491
>> d = dot(n,s)
d =
    1.0491
>> v = dot(s-r,d*n-r)/dot(s-r,s-r)
v =
    1.2024
>> u = (Q-P)'\((S - (S*N)*N') - P)'
u =
    0.9590
>> P + u*(Q-P)
ans =
    0.2582   -1.1678    1.1473
>> norm(P + u*(Q-P) - S)
ans =
    1.0710

and in IgorPro:

Function MakeVectors()
	Make/O/D/N=(1,3) matP={{-0.43256},{-1.6656},{0.12533}}
	Make/O/D/N=(1,3) matQ={{0.28768},{-1.1465},{1.1909}}
	Make/O/D/N=(1,3) matR={{1.1892},{-0.037633},{0.32729}}
	Make/O/D/N=(1,3) matS={{0.17464},{-0.18671},{0.72579}}
End

Function DoCalcs()
	WAVE matP,matQ,matR,matS
	MatrixOp/O tempMat = matP - matQ
	MatrixSVD tempMat
	Make/O/D/N=(3,2) matN
	Wave M_VT
	matN = M_VT[p][q+1]
	MatrixOp/O tempMat2 = (matR - matP)
	MatrixMultiply tempMat2, matN
	Wave M_product
	Duplicate/O M_product, mat_r
	MatrixOp/O tempMat2 = (matS - matP)
	MatrixMultiply tempMat2, matN
	Duplicate/O M_product, mat_s
	Make/O/D/N=(2,2) MatUnit
	matUnit = {{0,1},{-1,0}}
	MatrixOp/O tempMat2 = (mat_s - mat_r)
	MatrixMultiply tempMat2,MatUnit
	Duplicate/O M_Product, Mat_n
	Variable nn
	nn = norm(mat_n)
	MatrixOP/O new_n = mat_n / nn
	//new_n is now a vector with unit length
	Variable dd
	dd = MatrixDot(new_n,mat_r)
	//print dd
	//dd = MatrixDot(new_n,mat_s)
	//print dd
	dd = abs(dd)
	// now find v
	// v = dot(s-r,d*n-r)/dot(s-r,s-r)
	variable vv
	MatrixOp/O mat_s_r = mat_s - mat_r
	MatrixOp/O tempmat2 = dd * mat_n - mat_r
	vv = MatrixDot(mat_s_r,tempmat2) / MatrixDot(mat_s_r,mat_s_r)
	//print vv
	//because vv > 1, the closest point is s (because rs(1) = s) and therefore the closest point on RS to the infinite line PQ is S
	//what about the point on PQ is this also outside the segment?
	// u = (Q-P)'\((S - (S*N)*N') - P)'
	variable uu
	MatrixOp/O matQ_P = matQ - matP
	MatrixTranspose matQ_P
	//MatrixOP/O tempMat2 = ((matS - (matS * matN) * MatrixTranspose(MatN)) - MatrixTranspose(matP))
	Duplicate/O MatN, matNprime
	MatrixTranspose matNprime
	MatrixMultiply matS, matN
	Duplicate/O M_Product, matSN
	MatrixMultiply M_Product, matNprime
	MatrixOP/O tempMat2 = ((matS - M_product) - matP)
	MatrixTranspose tempMat2
	MatrixLLS matQ_P tempMat2
	Wave M_B
	uu = M_B[0]
	// find point on PQ that is closest to RS
	// P + u*(Q-P)
	MatrixOp/O matQ_P = matQ - matP
	MatrixOp/O matPoint = MatP + (uu * MatQ_P)
	MatrixOP/O distpoint = matPoint - matS
	Variable dist
	dist = norm(distpoint)
	Print dist
End

The sticking points were finding the Igor equivalents of

  • null()
  • norm()
  • dot()
  • \ otherwise known as mldivide

Which are:

  • MatrixSVD (answer is in the final two columns of wave M_VT)
  • norm()
  • MatrixDot()
  • MatrixLLS

MatrixLLS wouldn’t accept a mix of single-precision and double-precision waves, so this needed to be factored into the code.
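
One way to factor that in (a sketch reusing the wave names from DoCalcs above) is to promote both operands to double precision just before the call:

Redimension/D matQ_P, tempMat2	// no-op if already double precision
MatrixLLS matQ_P tempMat2
WAVE M_B	// MatrixLLS leaves the least-squares solution in M_B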

As you can see, the Igor code is much longer. Overall, I think MATLAB handles matrix math better than Igor. It is certainly easier to write. I suspect that there is a series of Igor operations that can do what I am trying to do here, but this was an exercise in direct porting.

More work is needed to condense this down and also deal with every possible case. Then it needs to be incorporated into the bigger program. SIMPLE! Anyway, hope this helps somebody.

The post title is taken from the band Adventures In Stereo.