Chapter 3 Fundamentals: How to write good code
There are no hard and fast rules for how to design good code. Much will depend on the task in hand, the coding platform chosen, and the preferences of the individual. But experienced coders have learnt what works well, and some of the pitfalls to avoid, when preparing to write code. The guidelines below set out to capture that experience as a series of individual tips and advice.
Use comments judiciously throughout your code
They can and should be used to frame distinct sections within your work, to signpost the flow of the work, to clarify choices (e.g. of methodology), and to explain why you have done what you have done (the “what” is usually pretty obvious from the coding itself). For example, you may be excluding all teachers under the age of 25: this will be obvious in the code itself but your comments should explain why this group of teachers is not of interest (e.g. they are likely to lack experience). There’s no good excuse for not commenting your code, and both you and others will reap the benefits when you do. A simple header is a useful way of capturing essential contextual information about the purpose of the code. Without, it may be difficult (or impossible) to identify the purpose of the code and why it was written. Commenting your code is a key component within the documentation pillar of the QA framework and is covered in more detail in the documentation section.
Give your variables descriptive names
There is very little to be gained by using obscure variable names (e.g. x1, a). Choose variable names that reflect the role of the variable in your work (and the same for functions and subqueries). Most coding languages support numerous naming conventions (some of them are discussed in chapter 4) and none is inherently better than any other. Choose one you are comfortable with – and use it consistently. This will go a long way to making your code easier to read and assure.
Avoid embedding constants and literals (“hard-coded” values) within the body of your code
This is most relevant when building models with parameters that could potentially be changed on subsequent runs. Ideally there should be a dedicated section at the top of the code which declares variables (data-typing where necessary) and initialises them – this allows others to see at glance the numerical assumptions and definitions your code employs.
Shorter is not necessarily better
Writing a few lines of very compact, dense, convoluted code might seem efficient and demonstrate your prowess in the language but it is generally better to write more lines of clear, simple code that are easy to understand. Avoid using multiple levels of nested code whenever possible too.
Resource and prepare the input data
It is unlikely that the data you need as input is in a format that is immediately useful. Decide the scope of the data you need and make an extract accordingly: don’t just pull a whole data table into your code when you only need a small part of it.
Be clear about your aims
Ensure that you understand fully what your code has to achieve. Talk with your customer or other stakeholders to get a good sense of what they are expecting. Ask them about the level of detail they are seeking, the available data resources, the assumptions they are prepared to make, the output they require, and the timeframe for the work.
Think about the programming environment you want to work in
Familiarity with language is just one factor, but it should not be the only one. Other issues to think about are the current location of input data, the size of any data sets you need to use, and the scale/dimension of the task. Consider what you will need to produce as output and how you (and your customers) will visualise this.
Choose the best methodology for the task
You are likely to have a number of alternative algorithms or approaches to choose from. A good rule here is to “keep it simple” and choose the least complex of these: this may be enough for your purposes. If your code is well written, it is often relatively easy to upgrade to a more sophisticated approach should the need arise.
Think about how you will structure the work your code will do
Break down the task into smaller, manageable tasks that fulfil a well-defined logical role. You should aim to have clear view of the tasks involved, and their ordering, well before you even think about writing your first line of code. Capture your design in whatever way you find most useful: a simple flowchart or some pseudo-code is generally all you need.
Reflect your chosen structure in the coding
For simple programs there will be a natural “flow” to the code which can be read in linear fashion from “start to finish” like the chapters in a book. More complex programs will include conditional branching or the use of supporting subqueries and/or functions, and in this case the program flow is less obvious. Use of devices such as partitioning and commenting help set out the logical flow of your code. You can read about this in more detail here.
Do not worry too much about optimising your code at the start
Your goal should be to write code that is easy to read, maintain and update as this will generally be more cost-effective in the long run (in terms of analyst time) than making marginal gains from a slightly faster running time.
Don’t get too hung up refining the initial design of your program
You will not be able to foresee every requirement or decide every detail upfront and trying to do this will simply hold up progress. As you start to code, you will almost inevitably encounter new requirements, or find ways of reducing complexity, that require you to revise your initial plans. Just remember that iteration is a natural part of the development process.
Avoid repetition where possible
It can be tempting to simply “cut and paste” sections of code when there’s a need to do similar work: this is not good practice. Look instead for the commonalities and decide if they could be exploited through writing a function or subroutine, or perhaps by some form of looping structure. Not only will this make your code neater and shorter, but it will reduce the risk of errors proliferating.
Don’t reinvent existing solutions
Reuse existing solutions whenever possible, especially where they have already been quality assured. Ask colleagues who might have done something similar already and who can share existing code for a function or procedure. Don’t be afraid to explore online for new features of your chosen language - this can often save considerable effort. Some languages (e.g. R) make extensive use of online libraries of open source code: this is ideal but don’t just treat their contents as black boxes – make sure you understand what they do and how (not least as this will generally improve your own coding capabilities).
Delete unnecessary code
It is always quite tempting to leave an extra section of code in place which has useful “stuff” in it, even though you have since devised a better way of coding the task. Try to avoid doing this as it will only distract or confuse other readers.
Refactor your code if you need
If you know (a part of) your code is not well structured or presented - or you can see a better way to tackle the issue - then work to improve it straight away. Don’t leave it until later as it will be harder once you’ve forgotten the nature of the task. It’s not wasted work as it improves the quality of the product and facilitates the QA process.
Practice version control
Over time you will tweak, change or refactor your code. It’s important to clearly document these changes, and ensure that older versions of the code are still available, especially if they were used to produce outputs. For every output from a coding project, it should be clear which version of the data and which version of your code was used to produce it, and you should be able to quickly reproduce that output if required. Tools such as Git are available to help you manage this process, but even without these tools you should strive to follow the same principles. We cover this in more detail here.
Testing
Testing that your code works is something that should be considered at all stages of the writing process. Each function or small chunk of code that you write will have some expected outputs given certain inputs – check that those outputs are what you expect them to be. You should try to test as you write, which flags up problems early and so avoids issues later down the line. Don’t just test at the end. We cover this in much more detail here.