Chapter 6 Advanced: Testing code

From a coding point of view, testing covers the requirements spelled out in the Verification pillar of the QA framework. Note that testing does not overlap with the Validation QA pillar: it is concerned with code running properly rather than with the adequacy of analytical assumptions. Testing is a vast topic, so this guide covers console tests and in-script testing only briefly and focuses on formalised testing procedures, such as unit testing, which tests the different components or “units” of code using tools such as testthat in R or pytest in Python.

Testing code within the console is not recommended as it leaves no trace or record of what has been checked.

6.1 In-script testing

In-script testing is sufficient to facilitate QA in most cases, although tests should always be accompanied by brief comments signalling their presence and indicating what they are evaluating and what the expected output should be. For a quality assurer, running these tests confirms that the analytical workflow runs as expected and is reproducible. In R there are many ways to perform in-script checks, from print() statements, to the base functions stop() and stopifnot(), to functions from dedicated packages such as assertthat or testit. Examples are shown below.

# This code snippet creates a function that
# returns the maximum value of a vector
# of numbers. The function contains 
# an in-script test checking that
# its input is numerical. If the condition
# is not met, execution is halted
get_max <- function(x) {
  stopifnot(is.numeric(x))
  max(x)
}

get_max(c(1:10))
# returns 10
get_max(c("a", "b", "c"))
# returns error message
# Error in get_max(c("a", "b", "c")) : is.numeric(x) is not TRUE
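
The same check could also be written with one of the dedicated packages mentioned above. Below is a minimal sketch using assertthat (assuming the package is installed), which allows a more informative error message to be attached to the check.

# Equivalent check using assertthat, which allows
# a custom error message to be supplied
library(assertthat)

get_max <- function(x) {
  assert_that(is.numeric(x), msg = "x must be a numeric vector")
  max(x)
}

get_max(c("a", "b", "c"))
# returns error message
# Error: x must be a numeric vector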

Similar approaches can be used in Python. There is also a print() function, and exceptions provide a way of raising an error when a specified condition is met, much like the functionality offered by R’s stop() and stopifnot() functions. There is information about how to raise exceptions here, or you can find more technical, detailed information about the different types of exceptions here.

6.2 Formalised testing

Beyond in-script tests, an analyst may decide to take a more formal approach to testing if the project warrants it. Software development has dedicated testing professionals (called “testers”) and there is plenty of literature about software testing on the internet. Much of this is transferable to code-based analysis, but not all of it. There are times when it makes sense to distinguish between a piece of analysis (which is generally operated by a technical specialist) and an IT product (e.g. a website, which is used by a broader user base). Shiny apps sit somewhere between analysis and an IT product, so these are covered separately later on.

The following types of software testing are commonly used:

  • Unit testing - tests the different components or “units” of code using tools such as testthat in R.
  • Integration testing - tests the correct flow of data through the components.
  • System testing - tests the overall outputs make sense.
  • Acceptance testing - tests whether or not the work meets its objectives.

6.2.1 Software testing

Before we focus on unit testing, it is important to note that the other types of testing cover wider areas of a software project than just the code, and other parts of the DfE QA framework cover how to handle these:

  • Integration testing tests correct data flow, transformations and interactions between different parts of the analysis.
  • System testing relates to verification tests on the overall outputs.
  • Acceptance testing relates to the validation pillar of the QA framework.

Software testing also makes a distinction between functional and non-functional testing. In the context of analysis, a functional test is one which checks that the analysis produces what it is intended to, whereas a non-functional test covers things such as its run-time and interface (i.e. things which affect the experience of using it, but not the numbers that come out).
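
As a rough illustration, a functional check might confirm a figure produced by the analysis, while a non-functional check might confirm it completes within an acceptable time. A sketch using the get_max() function defined earlier (the one-minute threshold is an arbitrary assumption):

# Functional check: the analysis produces the number it is intended to
stopifnot(get_max(1:10) == 10)

# Non-functional check: the analysis runs within an acceptable time
run_time <- system.time(get_max(runif(1e6)))["elapsed"]
stopifnot(run_time < 60)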

For more information, see here or explore further with help from a search engine.

6.2.2 Unit testing

Unit testing seeks to test every distinct code component to enable isolation of specific areas that may be introducing errors. For code made up of functions, it is sometimes easier to think of this as testing each specific function.

One of the key principles of unit testing is that each unit (or function) should be tested independently of all the others, without dependency on any data sources that may change. The objective is to isolate the cause of any errors, which means that unit tests should not rely on output from previous functions or on data from a live source (e.g. SQL); in either case it would not be possible to tell whether the function being tested caused the error.
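
For example, rather than pulling data from a live SQL table inside a test, a small fixed input can be defined within the test itself. Below is a sketch with hypothetical column names, reusing the get_max() function defined earlier.

# A small, fixed input defined for the test, standing in for
# data that would otherwise be read from a live SQL table
test_scores <- data.frame(
  pupil_id = c(1, 2, 3),
  score    = c(45, 67, 82)
)

# The result now depends only on the function being tested,
# not on a data source that may change between runs
stopifnot(get_max(test_scores$score) == 82)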

To write a good set of unit tests, think about the different types of input (e.g. positive values, zero values) the function may receive, define some fixed inputs to cover these and then write tests to ensure the different cases are handled correctly.

To implement unit tests, it is generally easiest to use a unit testing framework. These allow tests to be written in separate files and then run all at once, returning success or failure feedback.

  • The testthat package is a popular tool to write and run tests in R (more details here)
  • The pytest framework offers a similar resource for Python (more details here)
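
Returning to the get_max() function defined earlier, a set of tests covering the different types of input described above might look like the following sketch (written with testthat, which is covered in the next section):

library(testthat)

test_that("get_max handles different types of input", {
  expect_equal(get_max(c(1, 5, 10)), 10)    # positive values
  expect_equal(get_max(c(0, 0, 0)), 0)      # zero values
  expect_equal(get_max(c(-5, -1, -10)), -1) # negative values
  expect_error(get_max(c("a", "b")))        # non-numeric input should error
})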

6.2.3 testthat

The testthat package, maintained by the RStudio team, is commonly used for unit testing in R. Although most of its documentation and features focus on testing in the context of package development, testthat can also be used with regular R projects. There is a brief overview of unit testing below; for a more detailed introduction see here, or see here for a more in-depth overview of automated testing, including test driven development.

In testthat, the individual checks are called expectations: these spell out the expected behaviour of the function or unit of code, and testthat provides many functions to define them - see details here. Related expectations are grouped into tests with test_that(), tests falling within the same context are stored in the same file, and all testing files are saved within a dedicated testing directory. Below is an example of testing a simple function.

Firstly we need to have a defined function to test:

# Create a simple function to demonstrate testing approach
library(tibble)

# create a data frame of random numbers
create_random_numbers_dataframe <- function(n){
  tibble(some_numbers = runif(n))
}

Then we can define some tests for it:

# This file is stored in a directory called "testing_directory" 
# Need the testthat package
library(testthat)

# Need to source our function
source("file_where_function_saved.R")

# Need to run our function to be able to test its impact
# Could also do this within tests if we wanted to test different cases
mydata <- create_random_numbers_dataframe(100)

# The first test checks that the data set
# "mydata" has a single column and 100 rows
test_that('data dimensions correct', {
  expect_equal(ncol(mydata), 1)
  expect_equal(nrow(mydata), 100)
})

# The second test checks that the maximum value
# of the variable "some_numbers" does not exceed 1
test_that('no value greater than 1', {
  expect_lte(max(mydata$some_numbers), 1)
})

The tests are run using the test_dir() function, which takes the path to the testing directory as an argument:

# test expectations and examine outputs
# In this case, all tests are OK
test_dir("testing_directory")

test_dir() provides detailed output, including a breakdown of successes and failures, timings, and any warnings that occurred.
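
To run the tests in a single file rather than a whole directory, testthat also provides test_file() (the file name below is illustrative):

# Run only the tests contained in one file
test_file("testing_directory/test_data_dimensions.R")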

6.2.4 pytest

pytest provides similar functionality for Python as testthat does for R. When getting started with pytest, note that it implements a test discovery method to find relevant tests. Unless specified otherwise, test files are assumed to be in the current working directory or a sub-folder of it. It is possible to set up testpaths to specify different file paths for test files if this better fits the project structure. Within these folders, pytest will discover tests in Python files named test_*.py or *_test.py. The tests themselves then need to be functions named test_* to be detected as tests.

Whereas testthat has functions to define its expectations, pytest makes use of Python’s assert statement to specify the condition the test expects to be met.

Some Python IDEs come with integration for pytest, making it possible to run tests at the click of a button. If you are using Spyder see here, or for PyCharm see here.

6.2.5 Testing Shiny apps

R Shiny apps include not only analysis, but also a user interface and server logic. This makes them closer to a software development project than the average piece of analytical work and additional functionality is available within R to support testing of Shiny apps. Useful demonstrations of some methods can be found here and here.

Unit, integration and system testing can all be automated with a little extra consideration given to the structure of your shiny code.

Pulling functions out of reactive objects allows unit testing to be implemented for those functions. Take the following simple reactive object that doubles the value of the input labelled x:

reactive({
  input$x * 2
})

The function can be pulled out of the reactive environment.

doubler <- function(x) x * 2

This function can be tested in the usual way. The reactive object in the server code is then just a call to the function.

reactive({
  doubler(input$x)
})
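
Testing doubler() in the usual way might then look like the following sketch:

library(testthat)

test_that("doubler doubles its input", {
  expect_equal(doubler(2), 4)
  expect_equal(doubler(0), 0)
  expect_equal(doubler(-3), -6)
})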

The structure of a shiny app used to make integration testing more difficult, as the flow of data only occurs when the app is running. However, shiny version 1.5.0 includes a couple of functions that allow easy access to these environments for testing. The functions testServer and testModule allow manipulation and inspection of the server parameters (e.g. input, output and session), as well as allowing objects created in the session to be inspected.

For example, let’s place an example function within some server code, where a number is supplied by the user (labelled x) and the doubled amount is output as text.

server <- function(input, output, session){
  
  output$value <- renderText({
    paste("Doubled:", doubler(input$x))
  })
  
}

The integration of the doubler function with the input and output can be tested by amending inputs and setting expectations as in the following example:

testServer(
  expr = {
    session$setInputs(x = 2) # set input$x to 2
    testthat::expect_equal(output$value, "Doubled: 4")
  }
)

Checking that the connections between the server and the user interface are implemented correctly can be automated using the shinytest package. This runs the app, records user interactions, and takes snapshots of the application state. In future runs, the saved user interactions are simulated and the resulting state compared to the appropriate snapshot.

Snapshot tests are recorded using shinytest::recordTest("."). In the image below of the test recorder app that is launched, the numeric input value has been changed, the output has updated, and the application is saved in the resulting state. When the app is exited using the Save script and exit test recorder button, a test will be generated and saved in the tests/shinytest subfolder. When the tests are run with shinytest::testApp("."), the app runs invisibly, the test scripts are executed, and the snapshots are compared. Further details can be found here.

Recording a test and snapshot using `shinytest::recordTest(".")`
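
In code, recording the snapshot tests and then running them back looks like this:

# Launch the test recorder for the app in the current directory
shinytest::recordTest(".")

# Replay the recorded interactions and compare against the saved snapshots
shinytest::testApp(".")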

To create and test snapshots, shinytest runs the app in a headless browser. To avoid proxy issues when running these tests, remove the HTTP proxy settings for the session using the following command.

Sys.unsetenv("http_proxy")

It is important not to overlook usability testing (also known as UX testing or user experience testing) when creating an app. While this cannot be automated, there is immense value in watching a third party use an app. It does not matter how immaculate the code is if the end-user finds the app hard to use or difficult to parse. As such, usability testing should form part of any QA process involving shiny apps.

6.2.6 Test driven development

“Test driven development” describes a way of working where tests are written before the code itself. Working out what the code needs to do before writing it means that new code can be checked immediately, and avoids retrofitting tests to match whatever output happens to be produced. With a strong QA plan, this could also be implemented for analysis. Because each new piece of code is tested as soon as it is created, this approach also reduces the risk of something faulty becoming heavily baked in. As with testing, there are plenty of resources on the internet, such as this one.
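
As a minimal sketch of the idea (add_vat() is a hypothetical function used purely for illustration), the test is written first and fails, then the simplest code is written to make it pass:

library(testthat)

# 1. Write the test first - it fails because add_vat() does not exist yet
test_that("add_vat adds 20% VAT to a net price", {
  expect_equal(add_vat(100), 120)
})

# 2. Write the simplest code that makes the test pass, then re-run the test
add_vat <- function(net_price) {
  net_price * 1.2
}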