My name is Ben and I’m a self-taught programmer with no formal computer science training. A few years ago I gained the painful self-awareness that my scientific programming was shitty. I’m not saying it was wrong (I hope not) but it was just bad. I confused familiarity with my language of choice with proficiency. I had bumbled along, knowing that my code could do with a bit of ‘polish’, but I didn’t know what I didn’t know.
Let’s be clear about our motivations. This is not about wasting time in pursuit of a vague sense of craftsmanship; it is a calculated strategy. Do you want to: A) move swiftly, making lots of progress while accumulating technical debt, or B) move more deliberately and precisely, and end up with readable code which is easy to extend and change? You could probably get away with strategy A in science for a while, but it’s risky. If people (including you) can’t read your code, then what level of trust should we have in it? If the code has no tests, how sure are we that it is doing what it should?
I think it’s no coincidence that the self-awareness of my shitty code happened at the time when I embraced making it available on GitHub. And so we are at Step 1 on the path to coding reformation…
Step 1. Make your code available
If you have a little voice in the back of your mind which occasionally says that your code is shitty, then I really recommend making it available. This is a terrifying thing to do, but nobody dies. However you do it, whether it’s making code available on your website, or hosting it on GitHub and/or the Open Science Framework, it means you view your code in a very different light. Here is a non-technical, overly dramatic, carrot with no technical details. Buy into it, then go learn how to do it 🙂
There are many reasons to go down the route of openness. For some, it’s a principled decision about science being free; for some, it’s about reproducibility; and for others, it’s a reluctant compliance with the way things are going in science. One of the important points though, independent of principles, is the simple pragmatic fact that open source is where the innovation is.
Step 2. Version control
Putting my code on GitHub and being introduced to version control was a revelation. Even though most of my scientific programming is solo, or involves just 1-2 other people, I fully buy into it.
Step 3. Gain faith in your code with tests
I would always go to great lengths to ensure my code was doing what it was meant to. But the way I did that was shitty. I did what I think many people do, which is to make changes, run some code manually, and see if an error happens or if a result changes. Boom, you are already doing testing. But doing this in a more formal and planned manner, using a testing framework for your particular language, will make life much easier. Imagine having a nice suite of tests that you can set going with a single command, with all the tests carried out and a nice report of which tests passed and which failed. This really speeds up the debugging process.
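As a rough illustration of what such a test can look like, here is a minimal sketch in Python. The function `standard_error` and both test names are made up for this example, and the layout assumes a framework like pytest, which collects functions whose names start with `test_`.

```python
import math
import statistics


def standard_error(values):
    """Return the standard error of the mean (hypothetical example function)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    return statistics.stdev(values) / math.sqrt(len(values))


def test_standard_error_known_value():
    # Sample stdev of [1, 2, 3, 4] is sqrt(5/3); dividing by sqrt(4) gives sqrt(5/12).
    assert math.isclose(standard_error([1, 2, 3, 4]), math.sqrt(5 / 12))


def test_standard_error_rejects_single_value():
    # A single value has no spread, so the function should refuse it.
    try:
        standard_error([1.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for a single value")
```

If this were saved as something like `test_stats.py`, running `pytest` in that folder should pick up both tests and report which passed and which failed.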
Once you have tests then you can be way more assured (never 100% however) that changes you make to your code don’t produce either immediate bugs, or subtle errors that will come back to bite you in the future. This opens up your world to the awesomeness of refactoring.
Step 4. Refactoring
Once you have tests in place, you can play around with different ways of getting the job done. Refactoring is just the name for changing how your code works, but not what it does. For me, this is where a lot of the learning happened. I spent a long time going over existing code, learning about code smells and Clean Code.
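To make "changing how, but not what" concrete, here is a small before-and-after sketch in Python. The reaction-time example, the cut-off values, and all the function names are invented for illustration; the point is only that both versions return the same answer for the same input.

```python
def summarise_before(trials):
    # Before: filtering, accumulating and averaging tangled in one loop,
    # with magic numbers buried in the condition.
    total = 0
    count = 0
    for t in trials:
        if t["rt"] > 0.2 and t["rt"] < 2.0:
            total = total + t["rt"]
            count = count + 1
    if count == 0:
        return None
    return total / count


# After: same behaviour, but each idea has a name of its own.
MIN_RT, MAX_RT = 0.2, 2.0  # assumed plausibility bounds, in seconds


def is_valid(trial):
    return MIN_RT < trial["rt"] < MAX_RT


def mean(values):
    return sum(values) / len(values) if values else None


def summarise_after(trials):
    valid_rts = [t["rt"] for t in trials if is_valid(t)]
    return mean(valid_rts)
```

A test that feeds the same trials to both versions and compares the results is what gives you the confidence to delete the old one.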
Refactoring can be done in many different ways, and the quality of your refactoring will improve over time as you pick up more best practices and design patterns. I explore some of these in the last section.
Step 5. Make testing an integral part of your coding with test driven development
The approach of TDD (test driven development) highlights that writing code and testing don’t need to be separate activities, but can be done in conjunction. The video above from Martin Fowler has already mentioned this red/green/refactor workflow, where we:
- Write a test for functionality that we want. This test will fail (i.e. red) because we have not yet written the code to implement the functionality.
- Write the code. Just get it working, it won’t be pretty, but get it to a point where the tests pass (i.e. green).
- Refactor the code. Now you can update what you just did to improve the readability, speed, or modularity of the code (for example), with some sense of security from your test(s) that you haven’t changed functionality.
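The three steps above can be sketched in Python roughly as follows. The `normalise` function is a made-up example, and the whole cycle is squeezed into one file here only so it can be read top to bottom; in practice each step would be a separate edit followed by a test run.

```python
import math


# Step 1 (red): write the test first. Run on its own at this point, it would
# fail with a NameError, because normalise() does not exist yet.
def test_normalise_sums_to_one():
    result = normalise([2.0, 6.0])
    assert math.isclose(sum(result), 1.0)
    assert math.isclose(result[0], 0.25)


# Step 2 (green): the quickest thing that makes the test pass. Not pretty.
def normalise(values):
    total = 0
    for v in values:
        total = total + v
    out = []
    for v in values:
        out.append(v / total)
    return out


# Step 3 (refactor): tidy up (this definition replaces the one above),
# then rerun the test to check the behaviour has not changed.
def normalise(values):
    total = sum(values)
    return [v / total for v in values]
```

The test itself never changes between steps 2 and 3, which is what lets it act as the safety net while the implementation is cleaned up.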
Once you’ve got this down, then you can get into lots of esoteric debates about the relative merits of unit tests (do all the small units of code do what I want?) versus integration tests (does the code all together do what I want it to do?). Personally I mostly favour integration tests, and a few select unit tests for really important functions/objects. I don’t like to go overboard with unit tests as it locks in implementation details, and for me this is too constraining.
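For anyone unsure of the distinction, here is a hedged sketch in Python of one unit test and one integration test for the same toy pipeline. The functions `parse_line`, `mean_score`, and `analyse`, and the "subject,score" text format, are all hypothetical.

```python
def parse_line(line):
    """Split a 'subject,score' line into a (str, float) pair."""
    subject, score = line.strip().split(",")
    return subject, float(score)


def mean_score(records):
    scores = [score for _, score in records]
    return sum(scores) / len(scores)


def analyse(text):
    """The whole pipeline: raw text in, mean score out."""
    records = [parse_line(line) for line in text.splitlines() if line.strip()]
    return mean_score(records)


# Unit test: checks one small piece in isolation.
def test_parse_line():
    assert parse_line("s01,4.5\n") == ("s01", 4.5)


# Integration test: checks the pieces working together, without caring
# how analyse() is put together internally.
def test_analyse_end_to_end():
    assert analyse("s01,4.0\ns02,6.0\n") == 5.0
```

Notice that the integration test would survive a complete rewrite of `parse_line` and `mean_score`, whereas the unit test pins down one particular helper.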
Next: learn all the things
I reckon once you’ve got these skills and workflows sorted, then you can just go off and explore all of programming space. Much of this comes down to style, personal preferences, and what works well for your particular language. Many good programming practices have emerged and been found to be “generally a good idea”. One of the key things to figure out is the difference between declarative and imperative styles, and between the procedural, object oriented, and functional programming paradigms. Maybe I’ll write about these in the future, but one concept that I think is really useful is de-complecting: untangling things that have been mixed together. Most shitty code is just a big jumble of different things going on. For example, I used to have big functions full of various state, with lots of computation, plotting, and file IO all jumbled up. Any wrong step and the whole thing would break, either visibly or more subtly. So I really recommend this video (which I cannot embed): Simple Made Easy, by Rich Hickey.
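As one possible reading of what de-complecting looks like in practice, here is a small Python sketch that keeps file IO, computation, and presentation in separate functions. The function names and the single-column CSV layout are assumptions made up for this example, and a text report stands in for plotting to keep it dependency-free.

```python
import csv
import io


def load_scores(file_obj):
    """IO only: read the 'score' column of a CSV into a list of floats."""
    return [float(row["score"]) for row in csv.DictReader(file_obj)]


def summarise(scores):
    """Computation only: no files, no plotting, no hidden state."""
    return {"n": len(scores), "mean": sum(scores) / len(scores), "max": max(scores)}


def format_report(summary):
    """Presentation only: turn a summary into a line of text."""
    return f"n={summary['n']} mean={summary['mean']:.2f} max={summary['max']:.2f}"


def main(file_obj):
    """Thin glue that wires the three pieces together."""
    return format_report(summarise(load_scores(file_obj)))


# An in-memory file stands in for a real one here.
report = main(io.StringIO("score\n1.0\n2.0\n6.0\n"))
print(report)  # n=3 mean=3.00 max=6.00
```

Because `summarise` knows nothing about files or output, it can be tested with a plain list, and the loading or reporting can be swapped out without touching the maths.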