
Package index
-
screen_csv() - Run all checks against files
-
screen_dfs() - Run all checks against data and metadata
-
screen_filenames() - Run all checks against filenames
-
example_api_long - Example data that has a column name too long for the API
-
example_api_long_meta - Example metadata that has a column name too long for the API
-
example_comma_data - Example data with strings that contain commas beyond the default sniffing length. The original can be obtained from the EES test suite.
-
example_comma_meta - Metadata to be used with example_comma_data
-
example_data - Example data
-
example_filter_group - Example filter group data
-
example_filter_group_meta - Example filter group metadata
-
example_filter_group_wrow - Example filter group with extra row data
-
example_filter_group_wrow_meta - Example filter group metadata with extra row
-
example_meta - Example metadata
-
example_output - Example screening output pre-generated from example_data and example_meta
-
ees_robot_test_data - A set of the robot test files as used in EES UI tests. These are available from: https://github.com/dfe-analytical-services/explore-education-statistics/tree/dev/tests/robot-tests/tests/files
-
generate_test_dfs() - EES-ily generate some beefy test files
Reference values
Objects containing required, acceptable, or reference values, e.g. mandatory column names
-
req_data_cols - Required data columns
-
req_meta_cols - Required metadata columns
-
optional_meta_cols - Optional metadata columns
-
acceptable_countries - Countries lookup
-
acceptable_edas - English devolved areas lookup
-
acceptable_ethnicity_values - Acceptable ethnicity values
-
acceptable_extra_geog_options - Extra geography options
-
acceptable_indicator_units - Acceptable values for indicator units
-
acceptable_lads - Local authority districts lookup
-
acceptable_las - Local authorities lookup
-
acceptable_leps - Local enterprise partnerships lookup
-
acceptable_lsips - Local skills improvement plan areas lookup
-
acceptable_pcons - Parliamentary constituencies lookup
-
acceptable_regions - Regions lookup
-
acceptable_time_ids - Acceptable time identifiers
-
acceptable_wards - Wards lookup
-
geography_df - Acceptable geographic levels and their associated columns
-
api_char_limits - API character limits
-
data_dictionary - Data dictionary
-
gss_symbols - Acceptable values for GSS symbols
-
legacy_gss_symbols - Legacy values for GSS symbols
-
four_digit_identifiers - Time identifiers that should have 4 digit numbers
-
six_digit_identifiers - Time identifiers that should have 6 digit numbers
-
potential_ob_units_regex - Regex pattern for potential observational unit columns
-
harmonised_col_names - Harmonised column names lookup
-
check_filename_spaces() - Check for spaces in filename
-
check_filename_special() - Check for special characters in filename
-
check_filenames_match() - Check filenames line up between data and metadata files
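The filename checks listed above can be pictured with a few lines of base R. This is a minimal sketch under stated assumptions, not the package's implementation: the helper names are hypothetical, "special characters" is assumed to mean anything other than letters, digits, underscores and hyphens, and the metadata file is assumed to share the data file's name with a `.meta.csv` suffix.

```r
# Hypothetical helpers for illustration only; not the package's own code.

# TRUE when the file name (ignoring its directory) contains a space.
filename_has_spaces <- function(filename) {
  grepl(" ", basename(filename), fixed = TRUE)
}

# Assumption: "special" means any character other than letters, digits,
# underscores and hyphens in the name once the extension is removed.
# Note that a space also counts as special under this definition.
filename_has_special <- function(filename) {
  stem <- tools::file_path_sans_ext(basename(filename))
  grepl("[^A-Za-z0-9_-]", stem)
}

# Assumption: the metadata file is named "<data file stem>.meta.csv".
filenames_match <- function(datafile, metafile) {
  expected <- paste0(tools::file_path_sans_ext(basename(datafile)), ".meta.csv")
  identical(basename(metafile), expected)
}

filename_has_spaces("data/my file.csv")
filename_has_special("rates(2024).csv")
filenames_match("absence_rates.csv", "absence_rates.meta.csv")
```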
Pre-check necessary columns exist
Ensure the mandatory columns are present before running other checks on the files.
-
precheck_col_invalid_meta() - Check there are no invalid columns in the metadata
-
precheck_col_req_data() - Check all required columns are present in data
-
precheck_col_req_meta() - Check all required columns are present in metadata
-
precheck_col_to_rows() - Quick check of data columns vs metadata rows
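As a rough illustration of what a required-column pre-check involves, the sketch below compares a data frame's column names against a vector of mandatory names. The `required_cols` vector and the `missing_required_cols()` helper are hypothetical stand-ins written for this example; they are not the package's `req_data_cols` object or the code behind `precheck_col_req_data()`.

```r
# Illustrative stand-in for a vector of mandatory data columns (assumed values).
required_cols <- c(
  "time_period", "time_identifier", "geographic_level",
  "country_code", "country_name"
)

# Hypothetical helper: returns the required column names absent from `data`,
# in the order they appear in `required`.
missing_required_cols <- function(data, required = required_cols) {
  setdiff(required, names(data))
}

incomplete <- data.frame(time_period = 2024, geographic_level = "National")
missing_required_cols(incomplete)
```

An empty result would mean every mandatory column is present, so later checks can rely on those columns existing.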
Pre-checks across data and metadata
These checks validate consistency between the data file and metadata file.
-
precheck_cross_data_to_meta() - Check all data variables exist in the metadata file
-
precheck_cross_meta_to_data() - Check all metadata variables exist in the data file
-
check_col_ind_smushed() - Check that no indicator column names contain typical filter entries
-
check_col_names_spaces() - Check for spaces in variable names. This function checks for spaces in the variable names of a given data frame.
-
check_col_snake_case() - Check that column names follow snake_case convention
-
check_col_var_characteristic() - Check for characteristic or characteristic_group variable names
-
check_col_var_start() - Check that variable names start with a lowercase letter
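The column-name rules above (no spaces, snake_case, lowercase first letter) can be sketched with regular expressions in base R. The helper below is hypothetical and the snake_case pattern is an assumption chosen for illustration; the package's own checks may define these rules differently.

```r
# Hypothetical helper: lists the column names that break each rule.
col_name_issues <- function(df) {
  nms <- names(df)
  list(
    # Names containing at least one space.
    spaces = nms[grepl(" ", nms, fixed = TRUE)],
    # Assumption: snake_case means lowercase letters and digits in groups
    # separated by single underscores.
    not_snake_case = nms[!grepl("^[a-z0-9]+(_[a-z0-9]+)*$", nms)],
    # Names that do not begin with a lowercase letter.
    bad_start = nms[!grepl("^[a-z]", nms)]
  )
}

example <- data.frame(
  time_period = 2024,
  `Pupil Count` = 10,
  `2024_total` = 5,
  check.names = FALSE # keep the awkward names exactly as written
)
col_name_issues(example)
```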
Pre-checks on metadata
These checks should be run before any other metadata checks, as they validate core assumptions about the metadata file itself.
-
precheck_meta_col_name() - Check col_name is completed for all rows
-
precheck_meta_col_type() - Check col_type entries are valid
-
precheck_meta_ob_unit() - Check there are no observational units with rows in the metadata
-
check_meta_col_name_dupe() - Check there are no duplicated column names
-
check_meta_col_name_spaces() - Check that no col_name values have spaces
-
check_meta_dupe_label() - Check there are no duplicate labels
-
check_meta_fil_grp() - Check filter_group is not set for indicator rows
-
check_meta_fil_grp_dupe() - Check all of the filter_group values are unique
-
check_meta_fil_grp_is_fil() - Check that filter groups are filters
-
check_meta_fil_grp_match() - Check filter groups match in meta and data
-
check_meta_fil_grp_stripped() - Check filter groups are unique when stripping non-alphanumeric characters
-
check_meta_filter_hint() - Check filter_hint is not set for indicator rows
-
check_meta_geog_catch() - Check for geography columns mislabelled as filters
-
check_meta_ind_dp_set() - Check indicator_dp is set for all indicator rows
-
check_meta_ind_dp_values() - Check indicator_dp only contains blanks or positive integer values
-
check_meta_ind_unit() - Check indicator_dp is set for all indicator rows
-
check_meta_ind_unit_validation() - Check indicator values are valid
-
check_meta_indicator_dp() - Check indicator_dp is blank for all filters
-
check_meta_indicator_grouping() - Check indicator_grouping is blank for all filters
-
check_meta_label() - Check every row has a label
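Several of the metadata checks above reduce to simple row filters. The sketch below shows three of them on a toy metadata frame: duplicated `col_name` values, indicator rows with no `indicator_dp`, and filter rows where `indicator_dp` is filled in. The `meta_issues()` helper and the `"Filter"`/`"Indicator"` values for `col_type` are assumptions made for this example, not the package's implementation.

```r
# Hypothetical helper: returns the col_name values that fail each rule.
meta_issues <- function(meta) {
  # Assumption: col_type uses the values "Filter" and "Indicator".
  is_indicator <- meta$col_type == "Indicator"
  # Treat both NA and empty strings as blank.
  dp_blank <- is.na(meta$indicator_dp) | meta$indicator_dp == ""
  list(
    duplicated_col_names = unique(meta$col_name[duplicated(meta$col_name)]),
    indicators_missing_dp = meta$col_name[is_indicator & dp_blank],
    filters_with_dp = meta$col_name[!is_indicator & !dp_blank]
  )
}

meta <- data.frame(
  col_name = c("sex", "sex", "pupil_count", "absence_rate", "school_type"),
  col_type = c("Filter", "Filter", "Indicator", "Indicator", "Filter"),
  indicator_dp = c("", "", "0", "", "1"),
  stringsAsFactors = FALSE
)
meta_issues(meta)
```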
Pre-checks on time columns
These checks should be run before any other time checks, as they validate core assumptions about the time columns.
-
precheck_time_id_mix() - Check the mix of time identifiers in the data file
-
precheck_time_id_valid() - Check all time_identifier values are valid
-
precheck_time_period_num() - Check for any non-numeric time_period values
Pre-checks on filters
These checks should be run before any other filter checks, as they validate core assumptions about the filter columns.
-
precheck_filter_not_singular() - Check all filters have more than one level
Pre-checks on geography columns
These checks should be run before any other geography checks, as they validate core assumptions about the geography columns.
-
precheck_geog_level() - Check that the geographic_level values are all valid
-
precheck_geog_level_present() - Check we have the right columns for the geographic level
-
check_time_period() - Check that time periods match the time identifier
-
check_time_period_six() - Check that 6 digit time periods give consecutive years
-
check_filter_blanks() - Check for blank values in filter and filter group columns
-
check_filter_defaults() - Check default filter values are present in data
-
check_filter_group_level() - Check filter groups have an equal or lower number of levels
-
check_filter_item_limit() - Check number of filter items is below limit. Ensures that no filter contains more than 25,000 unique options, to protect the EES service against accidental data issues that can cause performance problems within the admin system.
-
check_filter_ob_total() - Check for Total or All values in observational unit columns
-
check_filter_whitespace() - Check no filter values have leading or trailing whitespace
-
check_geog_country_combos() - Check country code and name combinations
-
check_geog_eda_combos() - Check English devolved area code and name combinations
-
check_geog_ignored_rows() - Check for rows ignored by the EES table tool
-
check_geog_la_col_present() - Check that all local authority columns are present together
-
check_geog_la_combos() - Check local authority code and name combinations
-
check_geog_lad_combos() - Check local authority district code and name combinations
-
check_geog_lep_combos() - Check local enterprise partnership code and name combinations
-
check_geog_level_completed() - Check geographic level columns are completed
-
check_geog_lsip_combos() - Check local skills improvement plan area code and name combinations
-
check_geog_na() - Check NA geography codes have the correct name
-
check_geog_na_code() - Check 'Not available' location codes
-
check_geog_other_code_dupes() - Check other geography code duplicates
-
check_geog_other_dupes() - Check for geography name to code duplicates in lower-level geographies
-
check_geog_overcompleted_cols() - Check for geographic columns completed for unexpected rows
-
check_geog_pcon_combos() - Check parliamentary constituency code and name combinations
-
check_geog_region_col_present() - Check region code and name columns are both present
-
check_geog_region_combos() - Check region code and name combinations
-
check_geog_region_for_la() - Check that region columns are complete for Local authority rows
-
check_geog_region_for_lad() - Check that region columns are complete for Local authority district rows
-
check_geog_ward_combos() - Check ward code and name combinations
-
check_general_dupes() - Check for duplicate rows in data
-
check_general_null() - Check for null and legacy no-data symbols
-
check_ind_invalid_entry() - Check for invalid values in indicators
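Of the data checks listed above, the six-digit time period rule is the one that benefits most from a worked example. A value such as 202324 is read as the year 2023 followed by 24, and it passes when the last two digits equal the following year's last two digits. The helper below is a hypothetical sketch of that arithmetic, not the code behind `check_time_period_six()`.

```r
# Hypothetical helper: TRUE where a six-digit time period spans consecutive
# years, FALSE otherwise (including values that are not six digits or are NA).
six_digit_is_consecutive <- function(time_period) {
  tp <- as.character(time_period)
  ok <- grepl("^[0-9]{6}$", tp)
  start_year <- as.integer(substr(tp[ok], 1, 4))
  end_suffix <- as.integer(substr(tp[ok], 5, 6))
  # Modulo 100 handles the century boundary, e.g. 199900.
  ok[ok] <- (start_year + 1L) %% 100L == end_suffix
  ok
}

six_digit_is_consecutive(c("202324", "202325", "199900", "2023", NA))
```

Values that are not six digits are simply reported as `FALSE` here; a fuller check would likely report malformed values separately from non-consecutive ones.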
Harmonised variable checks
These checks validate that column names and filter values conform to the DfE harmonised data standards.
-
check_harmonised_eth_char_grp() - Check ethnicity characteristic_group values against standards
-
check_harmonised_eth_char_vals() - Check ethnicity characteristic values against GSS standards
-
check_harmonised_eth_vals() - Check ethnicity column values against GSS standards
-
check_harmonised_variables() - Check col names against harmonised data standards
-
check_api_char_col_label() - Check if column labels exceed a character limit
-
check_api_char_col_name() - Check if column names exceed a character limit
-
check_api_char_filter_items() - Check if filter items or location names exceed character limit
-
check_api_char_loc_code() - Check if location codes exceed a character limit
-
check_data_dict_col_name() - Check col_names against the data dictionary
-
check_data_dict_fil_item() - Check filter items against the data dictionary
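The API character-limit checks above all follow the same pattern: measure each value's length and report those over a threshold. The sketch below shows that pattern with a hypothetical helper and an arbitrary limit of 10 characters chosen purely for illustration; the real limits live in the package's `api_char_limits` object and are not reproduced here.

```r
# Hypothetical helper: returns the unique values longer than `limit` characters.
over_char_limit <- function(values, limit) {
  values <- unique(as.character(values))
  values[!is.na(values) & nchar(values) > limit]
}

# The limit of 10 is an arbitrary example value, not a real API limit.
over_char_limit(c("short", "a_very_long_column_name", "short", NA), limit = 10)
```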