A grammar of multi-panel scientific plots: initial thoughts

In common with many scientists, I have no formal training in computer science, and my coding skills are entirely self-taught. I’ve been coding for over a decade and a half, and I thought I was a relatively good programmer, but I had mistaken familiarity for expertise. And so recently I have been on a humbling journey of improving my coding skills.

One of the common things I have to produce for a paper is a nice, fun multi-panel plot, visualising my data and/or predictions from computational models. If we just want to plot data, then there are already very nice tools to do this, such as ggplot2 for R (see also the in-progress Matlab port, gramm). While ggplot2 is superb and extensive, I don’t believe it can serve my need to plot data and model predictions in a highly customised way. And in the last few days, I’ve discovered that the way I write code to produce multi-panel figures combining behavioural data and model predictions has been rather atrocious. The code clarity was appalling, it was brittle and hard to change, and computation and visualisation were complected in a nasty tangled way.

So I have started to think about a more systematic approach, so that I can streamline the time-consuming iterative process of fitting models to data and presenting both the data and the fits visually. Perhaps it will result in a code template for doing this, perhaps it will result in a comprehensive toolbox. Who knows. But here are my initial thoughts.

The right abstractions

Having looked back at a bunch of figures in my previously published papers, it seems like we have a few abstractions.

  1. Physical axes / subplots: what is the physical location and size of each of the subplots?
  2. Plot types: What are the basic types of subplots? These could be simple, or composed of multiple plotting sub-functions, but we need to define functions to plot each fundamental plot type we have when visualising our particular data.
  3. Instances: Will each plot type appear just once, or multiple times? If it appears multiple times, then it’s unlikely to show the same data, so how do we map figure instances to data?
  4. Data sources: Does each plot have its own data source? Is data from each source represented in multiple plot types, or multiple instances of one plot type?
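
To make these four abstractions a little more concrete, here is a minimal sketch in Python. The language is chosen purely for illustration, and every name in it (`PanelSpec`, `plot_histogram`, `plot_mean_line`, `data_sources`) is hypothetical rather than part of any existing toolbox. The plotting functions are stand-ins that return a text description instead of drawing anything. Each panel records its physical location, which plot type it uses, and which data source it shows, so one plot type can appear in several instances, each bound to different data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical names throughout: an illustration, not an existing toolbox.

# A plot type is a function that takes data and "draws" it. Here each one
# just returns a description so the sketch runs without a graphics backend.
PlotType = Callable[[List[float]], str]


def plot_histogram(data: List[float]) -> str:
    """Stand-in for one fundamental plot type."""
    return f"histogram of {len(data)} values"


def plot_mean_line(data: List[float]) -> str:
    """Stand-in for a second plot type."""
    return f"mean line at {sum(data) / len(data):.2f}"


@dataclass
class PanelSpec:
    position: Tuple[float, float, float, float]  # physical axes: left, bottom, width, height
    plot_type: PlotType                          # which plot type this instance uses
    data_key: str                                # which data source this instance shows


# Data sources, keyed by name (made-up numbers for illustration only).
data_sources: Dict[str, List[float]] = {
    "participant_1": [1.0, 2.0, 3.0],
    "participant_2": [2.0, 4.0],
}

# Two instances of one plot type with different data, plus a second plot
# type that reuses the first data source.
panels = [
    PanelSpec((0.10, 0.55, 0.35, 0.35), plot_histogram, "participant_1"),
    PanelSpec((0.55, 0.55, 0.35, 0.35), plot_histogram, "participant_2"),
    PanelSpec((0.10, 0.10, 0.80, 0.35), plot_mean_line, "participant_1"),
]

descriptions = [p.plot_type(data_sources[p.data_key]) for p in panels]
```

Keeping the mapping from instance to data as a plain key means the same data source can feed several panels without any panel needing to know how the data were produced.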

These abstractions are quite abstract, without examples, so here are a few from papers I’ve published over the years.

[Slideshow: example multi-panel figures from my published papers]

A tentative approach

Here is my tentative idea. Hopefully it will allow for rapid iteration on, and less painful production of, complex figures. We want to end up with a function called myMultiPanelFigure() or some such.

  1. Define options
  2. Obtain data + analysis or model fitting results
    1. load data, run computationally expensive models/analyses and save the results
    2. OR load them from a previous run
  3. Do all the plotting.
    1. Manually define a set of physical axes (subfigure) and their locations. For each physical plot (n) we should define:
      1. subfigure(n).getData() function that will extract the appropriate data for this subplot
      2. subfigure(n).plotData() function to display to the screen, including axis formatting (but not labelling)
      3. subfigure(n).AxesLabelling() function to label axes
    2. Iterate over all axes, calling their associated getData() and plotData() functions, but not the AxesLabelling() function
    3. Iterate over a subset of the subfigure().AxesLabelling() functions, to avoid repetition and clutter.
  4. Export pdf figure to desired path and filename.
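
The steps above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the implementation described in this post: the names `load_or_compute`, `Subfigure`, `my_multi_panel_figure` and `expensive_fit` are hypothetical, the per-panel functions are snake_case stand-ins for getData(), plotData() and AxesLabelling(), the plotting functions return text instead of drawing, and the PDF export step is left out.

```python
import pickle
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Dict, List, Tuple


def load_or_compute(cache_path: Path, compute: Callable[[], Any]) -> Any:
    """Step 2: load saved results if they exist, otherwise compute and save them."""
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)
    results = compute()
    with cache_path.open("wb") as f:
        pickle.dump(results, f)
    return results


@dataclass
class Subfigure:
    position: Tuple[float, float, float, float]  # step 3.1: physical axes location
    get_data: Callable[[Dict[str, Any]], Any]    # extracts this panel's data
    plot_data: Callable[[Any], str]              # draws the data (stand-in: returns text)
    label_axes: Callable[[], str]                # labels the axes (stand-in: returns text)


def my_multi_panel_figure(results: Dict[str, Any],
                          subfigures: List[Subfigure],
                          panels_to_label: List[int]) -> List[str]:
    log = []
    # Step 3.2: every panel fetches and plots its data, with no labelling yet.
    for sf in subfigures:
        log.append(sf.plot_data(sf.get_data(results)))
    # Step 3.3: label only a chosen subset of panels to avoid clutter.
    for n in panels_to_label:
        log.append(subfigures[n].label_axes())
    # Step 4 (export to PDF) is omitted from this sketch.
    return log


calls = []  # records how often the expensive step actually runs


def expensive_fit() -> Dict[str, Any]:
    """Stand-in for slow model fitting; the numbers are made up."""
    calls.append("fit")
    return {"data": [1, 2, 3], "model": [1.1, 1.9, 3.2]}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "results.pkl"
    results = load_or_compute(cache, expensive_fit)        # computes and saves
    results_again = load_or_compute(cache, expensive_fit)  # loads the saved copy

subfigures = [
    Subfigure((0.10, 0.1, 0.35, 0.8), lambda r: r["data"],
              lambda d: f"plot data {d}", lambda: "label panel 0"),
    Subfigure((0.55, 0.1, 0.35, 0.8), lambda r: r["model"],
              lambda d: f"plot model {d}", lambda: "label panel 1"),
]
log = my_multi_panel_figure(results, subfigures, panels_to_label=[0])
```

Separating the cached computation from the plotting loop means a change to a panel's appearance never triggers the slow model fitting again, which is where most of the iteration time would otherwise go.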

We will of course have to do the job of defining all the relevant functions to do the hard work. But this should be much easier now we are dealing with small functions with very well-defined tasks.

I’ve half-implemented these ideas into something that works. The code (which I’m not sharing until it’s ready) is much, much cleaner. It’s easier to make changes and explore iteratively, as you need to when doing science. I’ll share it once I’m happy that it is actually useful to people.

Thoughts and comments welcome.

 
