
Screen files

screen_csv()
Run all checks against files
screen_dfs()
Run all checks against data and metadata
screen_filenames()
Run all checks against filenames

Data dictionary

data_dictionary
Data dictionary

Example datasets

example_api_long
Example data that has a column name too long for the API
example_api_long_meta
Example metadata that has a column name too long for the API
example_comma_data
Example data containing strings with commas beyond the default sniffing length. The original can be obtained from the EES test suite
example_comma_meta
Metadata to be used with example_comma_data
example_data
Example data
example_filter_group
Example filter group data
example_filter_group_meta
Example filter group metadata
example_filter_group_wrow
Example filter group with extra row data
example_filter_group_wrow_meta
Example filter group metadata with extra row
example_meta
Example metadata
example_output
Example screening output pre-generated from example_data and example_meta
ees_robot_test_data
A set of the robot test files as used in EES UI tests. These are available from: https://github.com/dfe-analytical-services/explore-education-statistics/tree/dev/tests/robot-tests/tests/files

Generate test files

generate_test_dfs()
EES-ily generate some beefy test files

Reference values

Objects containing required, acceptable, or reference values, e.g. mandatory column names

req_data_cols
Required data columns
req_meta_cols
Required metadata columns
optional_meta_cols
Optional metadata columns
acceptable_indicator_units
Acceptable values for indicator units
acceptable_time_ids
Acceptable time identifiers
geography_df
Acceptable geographic levels and their associated columns
api_char_limits
API character limits
four_digit_identifiers
Time identifiers that should have 4 digit numbers
six_digit_identifiers
Time identifiers that should have 6 digit numbers

Filename checks

check_filename_spaces()
Check for spaces in filename
check_filename_special()
Check for special characters in filename
check_filenames_match()
Check filenames line up between data and metadata files
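The filename checks above can be illustrated with a minimal base-R sketch. It is not the package's implementation, and the helper names (`has_spaces()`, `has_special_chars()`, `filenames_match()`) are hypothetical. It assumes that "special characters" means anything other than letters, digits, underscores, hyphens and dots, and that a metadata file is expected to share its data file's name with a `.meta.csv` suffix.

```r
# Hypothetical helpers illustrating filename checks; not eesyscreener's implementation.

# TRUE if the filename contains any whitespace
has_spaces <- function(filename) {
  grepl("\\s", filename)
}

# TRUE if the filename contains characters outside an assumed safe set
# (letters, digits, underscore, dot, hyphen)
has_special_chars <- function(filename) {
  grepl("[^A-Za-z0-9_.-]", filename)
}

# Assumption: the metadata file is named "<data name>.meta.csv"
filenames_match <- function(data_file, meta_file) {
  expected_meta <- sub("\\.csv$", ".meta.csv", data_file)
  identical(meta_file, expected_meta)
}

has_spaces("my data.csv")                     # TRUE
has_special_chars("data&co.csv")              # TRUE
filenames_match("data.csv", "data.meta.csv")  # TRUE
```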

Pre-check necessary columns exist

Ensure the mandatory columns are present before running other checks on the files.

precheck_col_invalid_meta()
Check there are no invalid columns in the metadata
precheck_col_req_data()
Check all required columns are present in data
precheck_col_req_meta()
Check all required columns are present in metadata
precheck_col_to_rows()
Quick check of data columns vs metadata rows
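A required-column pre-check boils down to a set difference between the expected names and the names actually present. The sketch below is hypothetical: `missing_required_cols()` is not a package function, and the `required_cols` vector is an assumption that may not match the contents of `req_data_cols`.

```r
# Hypothetical illustration; required_cols is an assumption and may differ
# from the package's req_data_cols object.
required_cols <- c("time_period", "time_identifier", "geographic_level",
                   "country_code", "country_name")

# Returns the required columns that are missing from a data frame
missing_required_cols <- function(df, required = required_cols) {
  setdiff(required, names(df))
}

df <- data.frame(time_period = 2023, time_identifier = "Calendar year",
                 geographic_level = "National", stringsAsFactors = FALSE)
missing_required_cols(df)
# [1] "country_code" "country_name"
```

Running this kind of check first means later checks can safely assume the mandatory columns exist.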

Pre-checks on metadata

These checks should be run before any other metadata checks, as they validate core assumptions about the metadata file itself.

precheck_meta_col_name()
Check col_name is completed for all rows
precheck_meta_col_type()
Check col_type entries are valid
precheck_meta_ob_unit()
Check there are no observational units with rows in the metadata
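The first two metadata pre-checks can be sketched in base R as follows. The helper names are hypothetical. The sketch assumes that `col_type` must be exactly `"Filter"` or `"Indicator"`, and it treats case as significant, which is also an assumption.

```r
# Hypothetical sketch; assumes col_type must be exactly "Filter" or "Indicator"
meta <- data.frame(
  col_name = c("school_type", "", "pupil_count"),
  col_type = c("Filter", "Filter", "indicator"),
  stringsAsFactors = FALSE
)

# Row numbers where col_name is blank or NA
blank_col_name_rows <- function(meta) {
  which(is.na(meta$col_name) | trimws(meta$col_name) == "")
}

# col_type values that are not in the accepted set (case-sensitive)
invalid_col_types <- function(meta, valid = c("Filter", "Indicator")) {
  unique(meta$col_type[!meta$col_type %in% valid])
}

blank_col_name_rows(meta)  # 2
invalid_col_types(meta)    # "indicator"
```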

Metadata checks

check_meta_col_name_duplicate()
Check there are no duplicated column names
check_meta_col_name_spaces()
Check that no col_name values have spaces
check_meta_duplicate_label()
Check there are no duplicate labels
check_meta_filter_group()
Check filter_group is not set for indicator rows
check_meta_filter_group_duplicate()
Check all of the filter_group values are unique
check_meta_filter_group_is_filter()
Check that filter groups are filters
check_meta_filter_group_match()
Check filter groups match in meta and data
check_meta_filter_group_stripped()
Check filter groups are unique when stripping non-alphanumeric characters
check_meta_filter_hint()
Check filter_hint is not set for indicator rows
check_meta_geog_catch()
Check for geography columns mislabelled as filters
check_meta_ind_dp_set()
Check indicator_dp is set for all indicator rows
check_meta_ind_dp_values()
Check indicator_dp only contains blanks or positive integer values.
check_meta_ind_unit()
Check indicator_unit is blank for all filters
check_meta_ind_unit_validation()
Check indicator_unit values are valid
check_meta_indicator_dp()
Check indicator_dp is blank for all filters
check_meta_indicator_grouping()
Check indicator_grouping is blank for all filters
check_meta_label()
Check every row has a label
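Two of the metadata rules above lend themselves to a short base-R sketch. It is hypothetical and not the package's code. The `indicator_dp` pattern accepts any whole number, including `0`, and whether zero is valid is an assumption. The filter group check only compares `filter_group` values with each other after removing non-alphanumeric characters.

```r
# Hypothetical sketches; not eesyscreener's implementation.

# indicator_dp entries that are neither blank nor a whole number.
# Note: "0" passes here; whether zero counts as valid is an assumption.
invalid_dp_values <- function(dp) {
  dp <- as.character(dp)
  blank <- is.na(dp) | trimws(dp) == ""
  dp[!blank & !grepl("^[0-9]+$", trimws(dp))]
}

# TRUE if filter_group values collide once non-alphanumeric characters are removed
filter_groups_clash_when_stripped <- function(groups) {
  groups <- groups[!is.na(groups) & groups != ""]
  stripped <- gsub("[^A-Za-z0-9]", "", groups)
  anyDuplicated(stripped) > 0
}

invalid_dp_values(c("1", "", NA, "2.5", "-1"))                      # "2.5" "-1"
filter_groups_clash_when_stripped(c("school_type", "schooltype"))  # TRUE
```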

Core data file structure

Checks on column names

These checks validate the names of columns in the data files.

check_col_names_spaces()
Check for spaces in variable names. This function checks for spaces in the column names of a given data frame.
check_col_snake_case()
Check that column names follow snake_case convention
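Both column-name checks can be approximated with regular expressions over `names(df)`. In this hypothetical sketch, the snake_case pattern is an assumption: a lowercase letter first, then lowercase letters and digits, with words separated by single underscores.

```r
# Hypothetical sketch; the snake_case pattern is an assumption:
# lowercase letter first, then lowercase letters/digits, single underscores.
is_snake_case <- function(x) {
  grepl("^[a-z][a-z0-9]*(_[a-z0-9]+)*$", x)
}

df <- data.frame(time_period = 1, `School Type` = "a", pupilCount = 2,
                 check.names = FALSE)

names(df)[grepl(" ", names(df))]       # "School Type"
names(df)[!is_snake_case(names(df))]   # "School Type" "pupilCount"
```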

General checks

These can be run on any file, regardless of type or structure.

Pre-checks on filters

Pre-checks on time columns

These checks should be run before any other time checks, as they validate core assumptions about the time columns.

precheck_time_id_mix()
Check the mix of time identifiers in the data file
precheck_time_id_valid()
Check all time_identifier values are valid
precheck_time_period_num()
Check for any non-numeric time_period values
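The time pre-checks can be sketched in base R. The `valid_ids` vector below is an illustrative subset chosen for this example and does not reproduce the full `acceptable_time_ids` object.

```r
# Hypothetical sketch; valid_ids is an illustrative subset,
# not the full acceptable_time_ids object.
valid_ids <- c("Calendar year", "Financial year", "Academic year")

df <- data.frame(
  time_period = c("2023", "2022/23", "202223"),
  time_identifier = c("Calendar year", "Year", "Academic year"),
  stringsAsFactors = FALSE
)

# time_period values that are not made up of digits only
non_numeric_periods <- unique(df$time_period[!grepl("^[0-9]+$", df$time_period)])

# time_identifier values outside the accepted set
invalid_ids <- setdiff(unique(df$time_identifier), valid_ids)

non_numeric_periods  # "2022/23"
invalid_ids          # "Year"
```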

Pre-checks on geography columns

These checks should be run before any other geography checks, as they validate core assumptions about the geography columns.

precheck_geog_level()
Check that the geographic_level values are all valid.
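The geographic level pre-check amounts to comparing the distinct values of `geographic_level` against an accepted list. In this sketch, `accepted_levels` is a hypothetical, illustrative subset and not the contents of `geography_df`.

```r
# Hypothetical sketch; accepted_levels is an illustrative subset,
# not the full set of levels in geography_df.
accepted_levels <- c("National", "Regional", "Local authority")

geographic_level <- c("National", "Regional", "LA", "Local authority")
invalid_levels <- setdiff(unique(geographic_level), accepted_levels)
invalid_levels  # "LA"
```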

General data file checks

Filter checks

check_filter_defaults()
Check the default values set for filters
check_filter_item_limit()
Check the number of filter items is below the limit. Ensures that no filter contains more than 25,000 unique options. This protects the EES service against accidental data issues that can cause performance problems within the admin system.
check_filter_whitespace()
Check no filter values have leading or trailing whitespace
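A minimal base-R sketch of the filter item limit and whitespace checks follows. The 25,000 figure comes from the `check_filter_item_limit()` description. The helper names are hypothetical, and the sketch assumes you already know which columns are filters.

```r
# Hypothetical sketch; the 25,000 limit is taken from the check description.
filter_item_limit <- 25000

# Names of filter columns with more unique values than the limit
filters_over_limit <- function(df, filter_cols, limit = filter_item_limit) {
  counts <- vapply(df[filter_cols], function(col) length(unique(col)), integer(1))
  filter_cols[counts > limit]
}

# Filter values with leading or trailing whitespace
untrimmed_values <- function(values) {
  values <- unique(as.character(values))
  values[!is.na(values) & values != trimws(values)]
}

df <- data.frame(school_type = c("Primary", " Secondary", "Special "),
                 stringsAsFactors = FALSE)
untrimmed_values(df$school_type)        # " Secondary" "Special "
filters_over_limit(df, "school_type")   # character(0)
```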

Indicator checks

Geography checks

Time checks

check_time_period()
Check that time periods match the time identifier
check_time_period_six()
Check that 6 digit time periods give consecutive years
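The six-digit rule can be shown concretely. A value such as `201920` is consistent because `20` is the last two digits of 2019 + 1. The function below is a hypothetical sketch of that arithmetic and is not the package's implementation.

```r
# Hypothetical sketch of the six-digit consecutive-year rule.
is_consecutive_six <- function(period) {
  period <- as.character(period)
  ok_format <- grepl("^[0-9]{6}$", period)
  start <- suppressWarnings(as.integer(substr(period, 1, 4)))
  end <- suppressWarnings(as.integer(substr(period, 5, 6)))
  # Modulo 100 handles century rollover, e.g. 199900
  ok_format & !is.na(start) & !is.na(end) & (start + 1L) %% 100L == end
}

is_consecutive_six(c("201920", "201921", "199900", "2019"))
# [1]  TRUE FALSE  TRUE FALSE
```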

API specific checks

check_api_char_limit()
Check if values exceed a character limit
check_api_dict_col_names()
Check if col_names are present in the data dictionary
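A character-limit check reduces to comparing `nchar()` against a threshold. In this hypothetical sketch the limit is passed in explicitly. The value `50` used in the example is a placeholder and is not taken from `api_char_limits`.

```r
# Hypothetical sketch; the limit is supplied by the caller and 50 below is a
# placeholder, not a value from api_char_limits.
exceeds_char_limit <- function(values, limit) {
  values <- unique(as.character(values))
  values[!is.na(values) & nchar(values) > limit]
}

exceeds_char_limit(c("short_name", strrep("x", 60)), limit = 50)
```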