Explore education statistics data screening • eesyscreener

Quality assurance checks that screen data and metadata files against the standards required for official statistics on the explore education statistics (EES) platform.

Installation

eesyscreener is not currently available on CRAN. For the time being you can install the development version from GitHub.

# install.packages("devtools")
devtools::install_github("dfe-analytical-services/eesyscreener")

Minimal example

A quick reproducible example you can run in the console to try the package out. It also shows the structure of the output from the core screen_files() function.

library(eesyscreener)

screen_files(
  "data.csv",
  "data.meta.csv",
  example_data, # replace with your data file
  example_meta # replace with your meta file
)
#> $results_table
#>                   check result
#> 1 check_filename_spaces   PASS
#> 2 check_filename_spaces   PASS
#> 3      check_empty_cols   PASS
#>                                                 message stage
#> 1      'data.csv' does not have spaces in the filename.     1
#> 2 'data.meta.csv' does not have spaces in the filename.     1
#> 3           'data.csv' does not have any blank columns.     1
#> 
#> $overall_stage
#> [1] "Passed"
#> 
#> $overall_message
#> [1] "Passed all checks"
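The output above prints as a list with three elements, so the usual list and data frame tools apply to it. The sketch below rebuilds that structure by hand from the printed output, so that it runs without the package installed, and then pulls out any checks that did not pass. The name `output` is a placeholder, and the column types are assumptions based on how the values print.

```r
# Stand-in for the list printed above, rebuilt by hand so this runs without
# the package. With eesyscreener installed, use the return value of
# screen_files() instead.
output <- list(
  results_table = data.frame(
    check = c("check_filename_spaces", "check_filename_spaces", "check_empty_cols"),
    result = c("PASS", "PASS", "PASS"),
    message = c(
      "'data.csv' does not have spaces in the filename.",
      "'data.meta.csv' does not have spaces in the filename.",
      "'data.csv' does not have any blank columns."
    ),
    stage = c(1, 1, 1)
  ),
  overall_stage = "Passed",
  overall_message = "Passed all checks"
)

# Rows for any check whose result is not "PASS" (none in this example)
not_passed <- output$results_table[output$results_table$result != "PASS", ]

# Number of results recorded against each check
checks_run <- table(output$results_table$check)
```

Here `not_passed` has zero rows because every check in the example passed, and `output$overall_stage` gives the one-word summary.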

Example CSVs

Quick examples of how to make use of the data within the package to generate CSVs for testing:

library(eesyscreener)

write.csv(example_data, "example_data.csv", row.names = FALSE)
write.csv(example_meta, "example_data.meta.csv", row.names = FALSE)

# Generate a file pairing that will fail the tests (spaces in filename)
write.csv(example_data, "example data.csv", row.names = FALSE)
write.csv(example_meta, "example data.meta.csv", row.names = FALSE)

Other available example files can be found on the documentation site under examples. Use write.csv() as in the examples above to generate CSVs from them.
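To see which of the file names used above contain a space before screening them, a base R check along the following lines may help. It is only a rough illustration of the condition described in the comment above, not the package's own implementation; `toy` and `out_dir` are placeholder names, and the files are written to a temporary folder.

```r
# Write a small stand-in data frame under both names into a temporary folder.
# `toy` is a placeholder; swap in example_data when eesyscreener is loaded.
toy <- data.frame(x = 1:3)
out_dir <- file.path(tempdir(), "eesyscreener_examples")
dir.create(out_dir, showWarnings = FALSE)

paths <- file.path(out_dir, c("example_data.csv", "example data.csv"))
for (path in paths) {
  write.csv(toy, path, row.names = FALSE)
}

# Flag file names that contain a space. This mirrors the condition only and
# is not how eesyscreener itself performs the check.
has_space <- grepl(" ", basename(paths), fixed = TRUE)
names(has_space) <- basename(paths)
has_space
```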

Generate big test files

To generate larger data sets for testing, use the generate_test_dfs() function, which creates a data and metadata pair with any number of time periods, locations, filters and indicators.

files <- eesyscreener::generate_test_dfs(
  years = 2013:2015,
  pcon_names = "Sheffield Central",
  pcon_codes = "E14000919",
  num_filters = 2,
  num_indicators = 3
)

# Data and metadata are returned in a list, to extract:
df <- files$data
df_meta <- files$meta
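The file names used elsewhere in this README pair a data file `<name>.csv` with a metadata file `<name>.meta.csv`. A minimal sketch of a helper that writes a list shaped like `files` to disk using that pairing is shown below. `write_pair()` is a hypothetical name and not a function from eesyscreener, and `files_stub` is a stand-in with placeholder columns so the code runs without the package.

```r
# Hypothetical helper (not part of eesyscreener): write a list with `data` and
# `meta` elements to "<stem>.csv" and "<stem>.meta.csv" inside `dir`.
write_pair <- function(files, stem, dir = tempdir()) {
  data_path <- file.path(dir, paste0(stem, ".csv"))
  meta_path <- file.path(dir, paste0(stem, ".meta.csv"))
  write.csv(files$data, data_path, row.names = FALSE)
  write.csv(files$meta, meta_path, row.names = FALSE)
  c(data = data_path, meta = meta_path)
}

# Stand-in for the list returned by generate_test_dfs(); columns are placeholders
files_stub <- list(
  data = data.frame(a = 1:2),
  meta = data.frame(b = "a")
)

written <- write_pair(files_stub, "test_data")
basename(written)
```

With the package loaded, passing the real `files` list in place of `files_stub` should work the same way, assuming it keeps the `data` and `meta` element names shown above.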

To go really big, combine it with the dfeR package to pass in vectors of Parliamentary Constituencies, and use data.table for much faster CSV creation.

The following example creates a data and metadata pair whose data set has just over 6 million rows. The number of rows is given by:

length(years) * length(pcon_codes) * (5 ^ num_filters)

# Load the development version of eesyscreener (run from the package's source directory)
devtools::load_all()

# Additional dependencies
library("dfeR")
library("data.table")

# Get a data frame of all Parliamentary Constituencies in England
pcons <- dfeR::fetch_pcons(countries = "England")

# As this is generating a big file, and isn't overly optimised, it may take a minute or two
beefy <- eesyscreener::generate_test_dfs(
  years = c(1980:2025),
  pcon_codes = pcons$pcon_code,
  pcon_names = pcons$pcon_name,
  num_filters = 3,
  num_indicators = 45,
  verbose = TRUE
)

# Then to create CSVs, use data.table as it's much faster
data.table::fwrite(beefy$data, "beefy_data.csv")
data.table::fwrite(beefy$meta, "beefy_data.meta.csv")
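The row-count formula given above can be evaluated before generating anything, which is a cheap way to judge how large a file will be. The sketch below wraps it in a small function; `expected_rows()` is a hypothetical name used here for illustration and is not part of eesyscreener.

```r
# Row count according to the formula above:
# length(years) * length(pcon_codes) * (5 ^ num_filters)
expected_rows <- function(years, pcon_codes, num_filters) {
  length(years) * length(pcon_codes) * (5^num_filters)
}

# The smaller example: 3 years, 1 constituency, 2 filters
small <- expected_rows(2013:2015, "E14000919", 2)
small
#> [1] 75

# Rows contributed by each constituency in the larger example
# (46 years from 1980 to 2025, 3 filters)
per_pcon <- expected_rows(1980:2025, "E14000919", 3)
per_pcon
#> [1] 5750
```

Multiplying `per_pcon` by the number of constituency codes passed in gives the total for the larger example.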