The accessibility of scientific writing #1: readability metrics

In science, communication is crucial. The work done by researchers only becomes useful after it is transformed into a publication or presentation of some kind. Otherwise it may as well have never been done. We should therefore be concerned if our publications are getting less readable. A recent article in eLife makes this claim, finding that papers have been steadily decreasing in readability over the last 135 years (Plavén-Sigray et al., 2017). They measured this with two metrics: Flesch Reading Ease (FRE), and the New Dale-Chall (NDC) readability formula. The FRE considers sentence length and the number of syllables in each word. The NDC also considers sentence length, along with the number of “difficult” words used.
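To make the FRE concrete, here is a minimal Python sketch of the standard Flesch formula. The syllable counter is a crude vowel-group heuristic, included only as an assumption for illustration (it is not the method used by Plavén-Sigray et al.), so scores for real text will only be approximate. The function names are hypothetical.

```python
import re


def flesch_reading_ease(n_sentences, n_words, n_syllables):
    """Standard Flesch Reading Ease formula; higher scores are easier to read."""
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def count_syllables(word):
    """Crude heuristic (an assumption): count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fre_from_text(text):
    """Approximate FRE for a string, using naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_reading_ease(len(sentences), len(words), syllables)
```

Longer sentences and longer words both pull the score down, which is why dense abstracts tend to score so poorly.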

For abstracts, the mean FRE in 1960 was around 20. In 2015, it was 10. The NDC score in 1960 was around 12.2. By 2015 it had increased to 13 (note that for NDC higher scores are worse). The authors establish a correlation between the scores found for abstracts and those in the full text of the articles. The FRE scores are higher (simpler sentences) for full articles than for abstracts. The change over time is also less pronounced. From the data in the article I estimate that the FRE scores for the full text decreased from around 26 to 22 between 1960 and 2015. The NDC scores are lower (fewer difficult words) for full articles than for abstracts. Using the data from the article, they increased from around 11.9 in 1960 to 12.3 in 2015 (note that these are estimates I extracted from the graphs).

To put the FRE score in perspective I found this article from Shane Snow. He gives FRE scores for a variety of authors, including J. R. R. Tolkien (around 80), David Foster Wallace (around 68) and Malcolm Gladwell (around 66). Appropriately, he includes an “academic paper about reading level” which has a score of around 48. For the NDC scores it is more difficult to find examples. Generally, it appears that scores under 10 are understandable by college students. This seems an appropriate upper limit for scientific writing.

These metrics are relatively crude. Even a more sophisticated metric (the "Lexile") came under heavy criticism when the ratings it gave defied expectations. I do think however that these metrics are worth considering when we write papers. I am not arguing that simpler readability scores indicate “better” writing, but that being mindful of them makes papers more accessible. Just because the ideas may be hard to understand, that does not mean they need to be presented in an obscure way. Some of this is a matter of style. Where there is the option of making something easier to read without losing meaning though, it should be taken. Native English speakers should consider that they are writing for an international audience. Your readers may have had to learn a second language to read your papers. It is unfair to present them with writing that is unnecessarily difficult to follow.

[Figure: FRE, NDC and Gunning Fog scores for my recent first-author papers (my_readability.png)]

For my own part, I have been trying to make my writing more accessible. I have plotted metrics for my recent first-author papers in the figure above. As well as the FRE and NDC scores I also calculated the Gunning Fog metric (calculated in a similar way to the FRE). Note that the axes for the NDC and Gunning Fog metrics are flipped upside-down. This way, the upward slope in all three plots shows increasing readability. I analysed text only from the Introduction and Discussion sections of the papers. This means a straight comparison should not be made against the values from the Plavén-Sigray et al. (2017) study. Comparing the Introduction sections against the Discussions revealed negligible differences (I had expected that my Introductions might be more readable than my Discussions).
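For reference, the Gunning Fog index combines the average sentence length with the percentage of "complex" words (usually taken to be those with three or more syllables). Below is a small sketch of the standard formula working from pre-computed counts. The function and argument names are just illustrative, and deciding which words count as complex is left to the caller.

```python
def gunning_fog(n_sentences, n_words, n_complex_words):
    """Gunning Fog index; lower scores indicate easier text."""
    average_sentence_length = n_words / n_sentences
    percent_complex = 100.0 * n_complex_words / n_words
    return 0.4 * (average_sentence_length + percent_complex)
```

Like the NDC, a lower Gunning Fog value means more readable text, which is why both axes are flipped in the figure.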

In 2016, I started specifically rewriting my paper drafts to improve their FRE score. I will cover how I do this in another post in the future. This extra step affected one of the papers in 2016 (obvious on the graph). For the 2017 paper the Introduction FRE was much better than the Discussion (45 vs 31), so I must have been a bit slack. In the future, I will be aiming to keep the FRE scores for my writing above 45. I will also try to keep the NDC scores below 10 (which I am already managing to do). I have seen suggestions (e.g. Armstrong, 1982) for journals to set readability standards for submitted articles. I am not aware of any journals that do this, but I think it is a great idea to at least have some “floor” value. I hope to come back to this topic soon with discussions of other metrics, and a demonstration of how to improve the readability of a piece of writing. I will also be posting updated versions of the graphs above as more papers come out.

P.S. For this blog post: Flesch Reading Ease = 67, New Dale-Chall = 7.2, Gunning-Fog = 9.6

  • Armstrong, J. S. (1982). Research on scientific journals: implications for editors and authors. Journal of Forecasting, 1(1), 83–104.
  • Plavén-Sigray, P., Matheson, G. J., Schiffler, B. C., & Thompson, W. H. (2017). The readability of scientific texts is decreasing over time. eLife, 6(e27725), 1–14. 

Update on my Matlab staircase object

A couple of years on, and I wanted to post the current working version of the staircase code from a previous blog post. These changes have added tracking of the current threshold estimate from reversals. I have also fixed a bug that would prevent the staircase recording data if it was given an invalid starting value (now it simply finds the nearest valid value).

Conferences

I'm just coming to the end of a busy week where I had two conferences to attend. The first was the Vision Sciences Society conference in St. Pete Beach, Florida. I was presenting some research I have done recently with other members of my lab, where we applied the types of summation ideas I worked on in my PhD to a new area (more details here). Although the publication of papers is the real engine of how the field progresses, conferences seem to be a very important accelerant. Without them there would be a much greater dependence on waiting for journal articles to spread new ideas, and any progress would be glacially slow.

Immediately after VSS we had the 25th anniversary reunion for McGill Vision Research. I joined MVR two years ago, though I had visited a couple of times before that. We had a very interesting day of talks (I much prefer talks in this sort of smaller conference setting, as was the case at the AVA meetings I used to attend in the UK). Hopefully there is some way that the reunion can be made into a recurring event.

GNU Terry Pratchett

I was greatly saddened today when I heard that Terry Pratchett had died. My Dad introduced me to the Discworld books when I was young and since then I have read almost all of them multiple times. In particular I remember having The Science of Discworld bought for me when we visited Birmingham Airport. Being 12 years old and obsessed with both science and the Discworld books at that point, I was desperate to read it. Rather than being one of those "how could Star Trek technology really work?" kind of things, it was instead concerned with how Discworld wizards would study our world (Roundworld), having accidentally created it. The book interleaved chapters of scientific explanations (including the idea of "lies-to-children") with the wizards' reactions to those discoveries. It was the start of a series of those types of books that Pratchett and his scientist co-authors wrote, but that was merely the most extreme example of how he used his fiction to help readers see things from a different perspective. His writing was insidiously educational, and he was obviously a person who spent a lot of time just thinking about everything. His books will continue to be an inspiration for me (as I have been re-reading them recently) and for anyone else who is lucky enough to read them. For that reason I choose to remember him not with the traditional RIP, but instead with GNU. If you read Going Postal you will understand.

WHAT CAN THE HARVEST HOPE FOR, IF NOT FOR THE CARE OF THE REAPER MAN?
— Death, Reaper Man

Colour constancy, retinex theory, and blue/black dresses

A lot of people are currently arguing about the colour of this dress, and trying to figure out why they see it differently.

Some people see it as black and blue, others as white and gold. A few are making their case using RGB colour pickers to try to prove that it is one colour or another. Counter-intuitively though (and in contrast to what you might have been taught) there is not a simple correspondence between the physical emitted light from an object and its colour. Instead the emitted light features the combined effects of both the properties of the object and of the illumination in the scene. Your brain attempts to subtract out the effects of the illumination (by using the context of the whole scene) in order to give you "colour constancy" (meaning that colours are perceived the same in different lighting). This video explains it:

Note that I am not an expert on colour vision, but I did spend a few years teaching students from this video while I was working as a TA. Edwin Land's experiments in this video demonstrate colour constancy in action (and why the colour-picker approach is invalid), and are explained by his retinex theory. Here is the Wikipedia article (which you should take with a heap of salt, but it's better than nothing). I believe what we see with the dress photo is an example of different people's brains coming up with different assumptions about what the illuminant is in the photo.
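A toy sketch of why the colour-picker argument fails. This is not Land's retinex algorithm, just a deliberately simplified model in which the light reaching the eye is the surface reflectance multiplied by the illuminant, channel by channel. All of the numbers are hypothetical and chosen only for illustration; none were measured from the dress photo.

```python
def discount_illuminant(observed_rgb, assumed_illuminant_rgb):
    """Toy model: observed light = surface reflectance * illuminant, per channel.

    Dividing by an assumed illuminant gives the surface colour that assumption implies.
    """
    return tuple(o / i for o, i in zip(observed_rgb, assumed_illuminant_rgb))


# Hypothetical values (assumptions for illustration only).
pixel = (0.4, 0.4, 0.6)             # what a colour picker would report
bluish_light = (0.5, 0.5, 1.0)      # assumption 1: the scene is lit by bluish light
yellowish_light = (1.0, 1.0, 0.75)  # assumption 2: the scene is lit by yellowish light

surface_if_bluish = discount_illuminant(pixel, bluish_light)
surface_if_yellowish = discount_illuminant(pixel, yellowish_light)
```

The same pixel implies a surface with more red and green than blue under the first assumption, and a clearly blue surface under the second. The picked RGB value alone cannot tell you which is right.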

Update 12th March 2015: There has been a lot of semi-private discussion on this topic between vision scientists since the story first came out and many people have written great summaries which go more into why we might see these individual differences. Christoph Witzel does a very good job here. There will be a special issue of Journal of Vision on this topic next year, by which time it may have been all figured out.

Loading Matlab .mat data in Python

A friend of mine just asked me for some tips with this. I thought I would reply using a blog post so that it can be useful to other people too. If you collect data with Matlab but want to work on it using Python (e.g. making nice graphs with matplotlib) you can export a .mat file and then import that into Python using SciPy.

First let's save some example data in Matlab:

function savematlabdata
% save some data in a .mat

a = [1, 2, 3; 4, 5, 6];
S.b = [7, 8, 9; 10, 11, 12];
M(1).c = [2, 4, 6; 8, 10, 12];
M(2).c = [1, 3, 5; 7, 9, 11];

save('data.mat','a','S','M')

return

Now we have a file "data.mat" which stores the array a, the structure S containing an array b, and an array of structures M where each of those contains an array c. Now we can load that data in Python with the scipy.io module and use the "print" function to prove it's there:

# filename: loadmatlabdata.py
# description : load in data from a .mat file
# author: Alex Baldwin
#==============================================

import scipy.io as spio

mat = spio.loadmat('data.mat', squeeze_me=True)

a = mat['a'] # array
S = mat['S'] # structure containing an array
M = mat['M'] # array of structures

# print() is a function in Python 3 (the bare "print x" form only works in Python 2)
print(a[:,:])
print(S['b'][()][:,:]) # structures need [()]
print(M[0]['c'][()][:,:])
print(M[1]['c'][()][:,:])

Remember that in Python indexing starts at 0, rather than 1 (which is how Matlab does it). I hope that helps!
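If you don't have Matlab to hand, a similar file can be made from Python to try the loading step out. The sketch below is a round trip under a few assumptions: it writes a small file with scipy.io.savemat (a nested dict is written as a Matlab structure) and reads it back with struct_as_record=False, which returns structures as objects with attribute access rather than needing the [()] indexing. The file name and variable names are only for illustration.

```python
import os
import tempfile

import numpy as np
import scipy.io as spio

# Example data; the nested dict S is written as a Matlab structure.
data = {
    'a': np.array([[1, 2, 3], [4, 5, 6]]),
    'S': {'b': np.array([[7, 8, 9], [10, 11, 12]])},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.mat')
    spio.savemat(path, data)
    # struct_as_record=False gives attribute-style access to structure fields
    mat = spio.loadmat(path, squeeze_me=True, struct_as_record=False)

a = mat['a']    # plain 2x3 array
b = mat['S'].b  # field b of structure S

print(a)
print(b)
```

Whether attribute access or the record-style indexing above is more convenient is mostly a matter of taste; the data loaded are the same.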

Also, a new blog from my research group

My colleague Gunnar Schmidtmann has started his own blog; find it here. So far it is home to a bunch of interesting illusions (not tricks).

Also, it occurred to me that there was a sharp change in "level of required understanding" between my second and third posts. The second (Why do psychophysics?) was intended to be the start of a series of introductory-level explanations, which I do intend to continue sometime soon. However I'll also be wanting to post things at a higher level as well. For that reason I'm making use of the tagging system on here. The following posts in that series will all have the "introductory" tag (as the original now does).

The object-oriented approach to staircases in Matlab

N.B. the code in this blog post is outdated, please use the updated code found here

When I was first learning to program it took me a long time to appreciate why object-oriented approaches would be useful. I found a great example recently though, when I decided to write a generic "staircase" method to use in my psychophysical experiments. I find staircases to be the most efficient data collection method to use for most experiments (they are even as good as the considerably more complex entropy-minimising Bayesian methods such as qCRF, a topic which I will address in the near future). The code is at the bottom of this post, so skip the next two paragraphs if you already know what a staircase is etc.

The staircase method was first used by Dixon and Mood (1948) to study the volatility of explosives. Since then it has found a wide use within many fields, including psychophysics. Staircase methods involve determining the next stimulus level to be tested by reacting to the results of previous trials. In the simplest case of a one-up one-down staircase, a correct response on the previous trial results in the next trial being tested with a lower stimulus magnitude (i.e. making the task more difficult) while an incorrect response results in the next trial being tested with a higher stimulus magnitude. As the staircase then overshoots the point at which the behaviour changes it will reverse direction again, eventually oscillating around a value of interest. For example, a one-up one-down staircase will tend to sample at stimulus magnitudes where the probability of a positive response is approximately 0.5 (staircases of this type are typically used for matching experiments where the point of subjective equality is the value to be determined). It is therefore possible to use the average of the staircase reversals to calculate an estimate of the stimulus magnitude that gives that response probability. Personally though I prefer to fit the data with a psychometric function.
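The one-up one-down rule and the reversal average can be sketched in a few lines of Python. This is a simplified illustration, not the Matlab code from this post: the observer here is an idealised, noise-free one (an assumption) who responds correctly whenever the level is at or above a fixed threshold, whereas a real observer responds probabilistically. All names and numbers are hypothetical.

```python
def run_one_up_one_down(start_level, step, true_threshold, n_trials):
    """Simulate a one-up one-down staircase with a noise-free observer."""
    level = start_level
    levels_tested = []
    reversals = []
    last_direction = 0  # -1 = moving down, +1 = moving up, 0 = no move yet
    for _ in range(n_trials):
        levels_tested.append(level)
        correct = level >= true_threshold  # idealised observer (assumption)
        direction = -1 if correct else +1  # correct -> harder, incorrect -> easier
        if last_direction != 0 and direction != last_direction:
            reversals.append(level)        # direction changed: record a reversal
        last_direction = direction
        level += direction * step
    return levels_tested, reversals


levels, reversals = run_one_up_one_down(
    start_level=10, step=1, true_threshold=5.5, n_trials=19)
estimate = sum(reversals) / len(reversals)  # mean of the reversal levels
```

With these settings the staircase walks down from 10, then oscillates between 5 and 6, so the reversal average lands on the threshold of 5.5.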

For a two-alternative forced choice study (where you have a 50% guess rate, and so want to sample around 75%) a good choice is a three-down one-up staircase, which aims to converge at P(correct) = 0.794 (Wetherill, 1963; Wetherill & Levitt, 1965). Simulations have shown that the staircase may fail to reach this value (García-Pérez, 1998). However, as I always end up fitting psychometric functions anyway, the actual value that the staircases converge at is not critical.
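The 0.794 figure comes from a simple balance argument. An n-down one-up staircase steps down only after n correct responses in a row, which happens with probability p^n, so it should settle where stepping down and stepping up are equally likely, i.e. where p^n = 0.5. A quick check in Python (the function name is just illustrative):

```python
def convergence_probability(n_down):
    """P(correct) targeted by an n-down one-up staircase: solves p ** n_down == 0.5."""
    return 0.5 ** (1.0 / n_down)


three_down_target = convergence_probability(3)  # approximately 0.794
```

The same function gives 0.5 for the one-up one-down case and roughly 0.707 for two-down one-up.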

You can push the button below to show the staircase class code I have (so far). The advantage of using a class to manage your staircases is that they look after themselves. You set them up with the rules you want them to follow (number of right or wrong answers for a reversal, etc.) and then they will tell you what stimulus to use on each trial, and you can tell them how the observer responded. At the end you query the staircase object to find out how many times each level was tested and how many of those the observer got right. Particularly useful is that you can define an array of staircase objects in Matlab, so you would be able to have SC(1), SC(2), SC(3)* etc. all with their own private rules. At the moment I am experimenting with interleaving staircases that have different up/down rules to sample different points on the psychometric function (and so better constrain the measured psychometric slope).
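To give a flavour of the idea in a few lines, here is a stripped-down Python sketch of a self-managing staircase object. It is not the Matlab class from this post: every name, the fixed list of levels, and the stop-after-N-reversals rule are assumptions made for illustration.

```python
class Staircase:
    """Minimal n-down one-up staircase over a fixed list of stimulus levels."""

    def __init__(self, levels, start_index, n_down=3, max_reversals=8):
        self.levels = list(levels)  # ascending order: higher index = easier
        self.index = start_index
        self.n_down = n_down
        self.max_reversals = max_reversals
        self.correct_run = 0
        self.last_direction = 0
        self.n_reversals = 0
        self.n_tested = [0] * len(self.levels)   # trials run at each level
        self.n_correct = [0] * len(self.levels)  # correct responses at each level

    @property
    def is_finished(self):
        return self.n_reversals >= self.max_reversals

    def current_level(self):
        """Stimulus level to use on the next trial."""
        return self.levels[self.index]

    def respond(self, correct):
        """Record the observer's response and update the staircase state."""
        self.n_tested[self.index] += 1
        self.n_correct[self.index] += int(correct)
        if correct:
            self.correct_run += 1
            if self.correct_run < self.n_down:
                return  # not enough correct answers in a row to move yet
            direction = -1
        else:
            direction = +1
        self.correct_run = 0
        if self.last_direction != 0 and direction != self.last_direction:
            self.n_reversals += 1
        self.last_direction = direction
        # Clamp to the valid range of levels.
        self.index = min(max(self.index + direction, 0), len(self.levels) - 1)


# Interleaved staircases with different rules, each keeping its own state.
staircases = [Staircase([1, 2, 4, 8, 16], start_index=4, n_down=n) for n in (1, 3)]
all_done = all(sc.is_finished for sc in staircases)
```

Each object keeps its own rules and history, so the experiment loop only has to ask for current_level(), call respond(), and check is_finished.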

The code is perfectly functional at the moment, though I do not doubt it could do with improvements. For example, I don't think it's actually necessary for me to use a Matlab "handle" class here. Have fun with it and let me know what you think in the comments.

References:

Dixon, W. J., Mood, A. M., 1948. A Method for Obtaining and Analyzing Sensitivity Data. Journal of the American Statistical Association 43 (241), 109–126.
García-Pérez, M. A., 1998. Forced-choice staircases with fixed step sizes: asymptotic and small-sample properties. Vision Research 38, 1861–1881.
Wetherill, G. B., 1963. Sequential Estimation of Quantal Response Curves. Journal of the Royal Statistical Society. Series B (Methodological) 25 (1), 1–48.
Wetherill, G. B., Levitt, H., 1965. Sequential estimation of points on a psychometric function. The British Journal of Mathematical and Statistical Psychology 18 (1), 1–10.

* You can collapse across multiple objects in Matlab by using the square bracket notation, for example the statement below will return 1 only if all the SC(1) to SC(n) staircases are finished.

all([SC.isFinished])

Why do psychophysics?

Much of science is founded on taking measurements of one sort or another. The task of the scientist can be thought to consist of two parts: firstly ensuring that the correct measurements are made (i.e. "Am I measuring what I am trying to measure?"), and secondly coming up with explanations for what those measurements show. Frequently the validity of a scientific theory is assessed according to how accurately it predicts the outcome of measurements made in the future.

Measure what is measurable, and make measurable what is not so.
— Galileo Galilei

Although other fields suffer their own complications when it comes to measuring the world (e.g. How does one measure the size of the Earth? Or how much of the amyloid protein in a solution has clumped to form a deposit?), psychologists have the particular problem of wanting to measure things that many people believe are unmeasurable. You cannot directly take a ruler to somebody's thoughts, so a more subtle solution is required.

Psychophysics is a set of theoretical tools which make sensations measurable. The internal states of the subject are modelled by a set of variables and processes, based on assumptions about how perceptual decisions are made. Experiments can be designed to probe these internal mechanisms by relating them to the behaviours they would predict under a carefully chosen set of conditions. The models typically involve some sort of variability (e.g. noisy neuronal responses) so rather than predicting a specific response to a stimulus they instead make a probabilistic prediction of how often each of a set of responses should be expected.
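As one very simple illustration of such a model (an assumed example, not a specific model from this blog): suppose the internal response is the signal strength plus Gaussian noise, and the observer responds "yes" whenever it exceeds a fixed criterion. The model then predicts a probability of "yes" rather than a single response. The names and parameters below are hypothetical.

```python
import math


def probability_yes(signal, criterion, noise_sd):
    """P("yes") when internal response = signal + Gaussian noise and the
    observer says "yes" whenever that response exceeds the criterion."""
    z = (signal - criterion) / noise_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z
```

When the signal sits exactly at the criterion the predicted probability is 0.5, and it rises smoothly towards 1 as the signal gets stronger.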

The basic measurements which psychophysicists make are therefore the probabilities with which subjects respond in a particular way to a stimulus under some experimental paradigm. A simple example would be showing a subject a faint (low contrast) striped pattern and asking them whether they see it or not. The observer will make both "yes" and "no" responses on different trials (even with an identical stimulus). The probability with which the observer says "yes" can then be used to infer the magnitude of some internal response variable. The way in which that can be done will be the subject of a future blog post.

Hello

Placeholder first blog post here. I am hoping to use this space to put up some helpful explanations of psychophysical techniques, and approaches to modelling vision.

We are so familiar with seeing, that it takes a leap of imagination to realize that there are problems to be solved. But consider it. We are given tiny distorted upside-down images in the eyes, and we see separate solid objects in surrounding space. From the patterns of stimulation on the retina we perceive the world of objects and this is nothing short of a miracle.
— Richard Gregory