Screen files

screen_csv()
Run all checks against files
screen_dfs()
Run all checks against data and metadata
screen_filenames()
Run all checks against filenames

Example datasets

example_api_long
Example data that has a column name too long for the API
example_api_long_meta
Example metadata that has a column name too long for the API
example_comma_data
Example data containing strings with commas beyond the default sniffing length. The original can be obtained from the EES test suite.
example_comma_meta
Metadata to be used with example_comma_data
example_data
Example data
example_filter_group
Example filter group data
example_filter_group_meta
Example filter group metadata
example_filter_group_wrow
Example filter group with extra row data
example_filter_group_wrow_meta
Example filter group metadata with extra row
example_meta
Example metadata
example_output
Example screening output pre-generated from example_data and example_meta
ees_robot_test_data
A set of the robot test files as used in EES UI tests. These are available from: https://github.com/dfe-analytical-services/explore-education-statistics/tree/dev/tests/robot-tests/tests/files

Generate test files

generate_test_dfs()
EES-ily generate some beefy test files

Reference values

Objects containing required, acceptable, or reference values, e.g. mandatory column names

req_data_cols
Required data columns
req_meta_cols
Required metadata columns
optional_meta_cols
Optional metadata columns
acceptable_countries
Countries lookup
acceptable_edas
English devolved areas lookup
acceptable_ethnicity_values
Acceptable ethnicity values
acceptable_extra_geog_options
Extra geography options
acceptable_indicator_units
Acceptable values for indicator units
acceptable_lads
Local authority districts lookup
acceptable_las
Local authorities lookup
acceptable_leps
Local enterprise partnerships lookup
acceptable_lsips
Local skills improvement plan areas lookup
acceptable_pcons
Parliamentary constituencies lookup
acceptable_regions
Regions lookup
acceptable_time_ids
Acceptable time identifiers
acceptable_wards
Wards lookup
geography_df
Acceptable geographic levels and their associated columns
api_char_limits
API character limits
data_dictionary
Data dictionary
gss_symbols
Acceptable values for GSS symbols
legacy_gss_symbols
Legacy values for GSS symbols
four_digit_identifiers
Time identifiers that should have 4 digit numbers
six_digit_identifiers
Time identifiers that should have 6 digit numbers
potential_ob_units_regex
Regex pattern for potential observational unit columns
harmonised_col_names
Harmonised column names lookup

Filename checks

check_filename_spaces()
Check for spaces in filename
check_filename_special()
Check for special characters in filename
check_filenames_match()
Check filenames line up between data and metadata files

Pre-check necessary columns exist

Ensure the mandatory columns are present before running other checks on the files.

precheck_col_invalid_meta()
Check there are no invalid columns in the metadata
precheck_col_req_data()
Check all required columns are present in data
precheck_col_req_meta()
Check all required columns are present in metadata
precheck_col_to_rows()
Quick check of data columns vs metadata rows

Pre-checks across data and metadata

These checks validate consistency between the data file and metadata file.

precheck_cross_data_to_meta()
Check all data variables exist in the metadata file
precheck_cross_meta_to_data()
Check all metadata variables exist in the data file

Checks on column names

These checks validate the names of columns in the data files.

check_col_ind_smushed()
Check that no indicator column names contain typical filter entries
check_col_names_spaces()
Check for spaces in the variable names of a given data frame
check_col_snake_case()
Check that column names follow snake_case convention
check_col_var_characteristic()
Check for characteristic or characteristic_group variable names
check_col_var_start()
Check that variable names start with a lowercase letter
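
As a rough illustration of what a snake_case check on column names involves, the sketch below uses base R only. The helper name `find_non_snake_case()` and the regular expression are assumptions made for this example; the rules applied by check_col_snake_case() and check_col_var_start() may differ.

```r
# Hypothetical helper, not part of the package: returns the column names
# that do not match an assumed lower snake_case pattern.
find_non_snake_case <- function(col_names) {
  # Assumed pattern: starts with a lowercase letter, followed by lowercase
  # letters or digits, with single underscores only between those runs
  pattern <- "^[a-z][a-z0-9]*(_[a-z0-9]+)*$"
  col_names[!grepl(pattern, col_names)]
}

example_cols <- c("time_period", "Geographic_level", "pupil count", "_rate", "la_code")

# Names flagged by the assumed pattern
flagged <- find_non_snake_case(example_cols)
print(flagged)
```

Here the capitalised name, the name containing a space, and the name starting with an underscore are flagged, while `time_period` and `la_code` pass.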

Pre-checks on metadata

These checks should be run before any other metadata checks, as they validate core assumptions about the metadata file itself.

precheck_meta_col_name()
Check col_name is completed for all rows
precheck_meta_col_type()
Check col_type entries are valid
precheck_meta_ob_unit()
Check there are no observational units with rows in the metadata

Metadata checks

check_meta_col_name_dupe()
Check there are no duplicated column names
check_meta_col_name_spaces()
Check that no col_name values have spaces
check_meta_dupe_label()
Check there are no duplicate labels
check_meta_fil_grp()
Check filter_group is not set for indicator rows
check_meta_fil_grp_dupe()
Check all of the filter_group values are unique
check_meta_fil_grp_is_fil()
Check that filter groups are filters
check_meta_fil_grp_match()
Check filter groups match in meta and data
check_meta_fil_grp_stripped()
Check filter groups are unique when stripping non-alphanumeric characters
check_meta_filter_hint()
Check filter_hint is not set for indicator rows
check_meta_geog_catch()
Check for geography columns mislabelled as filters
check_meta_ind_dp_set()
Check indicator_dp is set for all indicator rows
check_meta_ind_dp_values()
Check indicator_dp only contains blanks or positive integer values
check_meta_ind_unit()
Check the indicator_unit column in the metadata
check_meta_ind_unit_validation()
Check indicator_unit values are valid
check_meta_indicator_dp()
Check indicator_dp is blank for all filters
check_meta_indicator_grouping()
Check indicator_grouping is blank for all filters
check_meta_label()
Check every row has a label

Pre-checks on time columns

These checks should be run before any other time checks, as they validate core assumptions about the time columns.

precheck_time_id_mix()
Check the mix of time identifiers in the data file
precheck_time_id_valid()
Check all time_identifier values are valid
precheck_time_period_num()
Check for any non-numeric time_period values

Pre-checks on filters

These checks should be run before any other filter checks, as they validate core assumptions about the filter columns.

precheck_filter_not_singular()
Check all filters have more than one level

Pre-checks on geography columns

These checks should be run before any other geography checks, as they validate core assumptions about the geography columns.

precheck_geog_level()
Check that the geographic_level values are all valid
precheck_geog_level_present()
Check we have the right columns for the geographic level

Time checks

check_time_period()
Check that time periods match the time identifier
check_time_period_six()
Check that 6 digit time periods give consecutive years
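
The idea of a 6 digit time period giving consecutive years (for example `202021` for 2020/21) can be sketched in base R as follows. The helper `is_consecutive_six_digit()` is hypothetical and only shows the arithmetic; it is not the implementation behind check_time_period_six().

```r
# Hypothetical helper, not part of the package: TRUE where a 6 digit time
# period is a start year followed by the last two digits of the next year.
is_consecutive_six_digit <- function(time_period) {
  # Character input is expected so that values are compared digit by digit
  tp <- as.character(time_period)
  well_formed <- grepl("^[0-9]{6}$", tp)

  start_year <- suppressWarnings(as.integer(substr(tp, 1, 4)))
  end_suffix <- suppressWarnings(as.integer(substr(tp, 5, 6)))

  # The modulo handles the century boundary, e.g. 199900 for 1999/2000
  well_formed & !is.na(start_year) & ((start_year + 1L) %% 100L == end_suffix)
}

periods <- c("202021", "202022", "199900", "2020")
result <- is_consecutive_six_digit(periods)
print(result)
```

In this sketch `202021` and `199900` pass, `202022` fails because 2022 does not follow 2020, and `2020` fails because it is not 6 digits long.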

Filter checks

check_filter_blanks()
Check for blank values in filter and filter group columns
check_filter_defaults()
Check default filter values are present in data
check_filter_group_level()
Check filter groups have an equal or lower number of levels than their filters
check_filter_item_limit()
Check the number of filter items is below the limit. Ensures that no filter contains more than 25,000 unique options, which protects the EES service against accidental data issues that can cause performance problems within the admin system.
check_filter_ob_total()
Check for Total or All values in observational unit columns
check_filter_whitespace()
Check no filter values have leading or trailing whitespace
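
Two of the filter checks above, the whitespace check and the 25,000 item limit, come down to simple column scans. The base R sketch below shows one way to express them. The helper names, their arguments, and the example data frame are all assumptions for illustration and do not reflect the package's own function signatures.

```r
# Hypothetical helper: filter columns holding values with leading or
# trailing whitespace.
cols_with_whitespace <- function(data, filter_cols) {
  has_ws <- vapply(
    filter_cols,
    function(col) {
      any(grepl("^[[:space:]]|[[:space:]]$", as.character(data[[col]])))
    },
    logical(1)
  )
  filter_cols[has_ws]
}

# Hypothetical helper: filter columns with more unique items than the limit
# (25,000 is the limit quoted in the description above).
cols_over_limit <- function(data, filter_cols, limit = 25000) {
  n_items <- vapply(
    filter_cols,
    function(col) length(unique(data[[col]])),
    integer(1)
  )
  filter_cols[n_items > limit]
}

# Made-up example data; "Female " carries a trailing space
example <- data.frame(
  sex = c("Male", "Female ", "Total"),
  school_type = c("Primary", "Secondary", "Total"),
  stringsAsFactors = FALSE
)
filters <- c("sex", "school_type")

ws_cols <- cols_with_whitespace(example, filters)
over_default <- cols_over_limit(example, filters)
over_small <- cols_over_limit(example, filters, limit = 2)
print(ws_cols)
```

With this data only `sex` is flagged for whitespace, no column exceeds the default limit, and both columns exceed an artificially small limit of 2.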

Geography checks

check_geog_country_combos()
Check country code and name combinations
check_geog_eda_combos()
Check English devolved area code and name combinations
check_geog_ignored_rows()
Check for rows ignored by the EES table tool
check_geog_la_col_present()
Check that all local authority columns are present together
check_geog_la_combos()
Check local authority code and name combinations
check_geog_lad_combos()
Check local authority district code and name combinations
check_geog_lep_combos()
Check local enterprise partnership code and name combinations
check_geog_level_completed()
Check geographic level columns are completed
check_geog_lsip_combos()
Check local skills improvement plan area code and name combinations
check_geog_na()
Check NA geography codes have the correct name
check_geog_na_code()
Check 'Not available' location codes
check_geog_other_code_dupes()
Check other geography code duplicates
check_geog_other_dupes()
Check for geography name to code duplicates in lower-level geographies
check_geog_overcompleted_cols()
Check for geographic columns completed for unexpected rows
check_geog_pcon_combos()
Check parliamentary constituency code and name combinations
check_geog_region_col_present()
Check region code and name columns are both present
check_geog_region_combos()
Check region code and name combinations
check_geog_region_for_la()
Check that region columns are complete for Local authority rows
check_geog_region_for_lad()
Check that region columns are complete for Local authority district rows
check_geog_ward_combos()
Check ward code and name combinations

General checks

These can be run on any file, regardless of type or structure.

check_general_dupes()
Check for duplicate rows in data
check_general_null()
Check for null and legacy no-data symbols

Indicator checks

check_ind_invalid_entry()
Check for invalid values in indicators

Harmonised variable checks

These checks validate that column names and filter values conform to the DfE harmonised data standards.

check_harmonised_eth_char_grp()
Check ethnicity characteristic_group values against standards
check_harmonised_eth_char_vals()
Check ethnicity characteristic values against GSS standards
check_harmonised_eth_vals()
Check ethnicity column values against GSS standards
check_harmonised_variables()
Check col names against harmonised data standards

API specific checks

check_api_char_col_label()
Check if column labels exceed a character limit
check_api_char_col_name()
Check if column names exceed a character limit
check_api_char_filter_items()
Check if filter items or location names exceed character limit
check_api_char_loc_code()
Check if location codes exceed a character limit
check_data_dict_col_name()
Check col_names against the data dictionary
check_data_dict_fil_item()
Check filter items against the data dictionary