Package index
-
screen_csv() - Run all checks against files
-
screen_dfs() - Run all checks against data and metadata
-
screen_filenames() - Run all checks against filenames
-
data_dictionary - Data dictionary
-
example_api_long - Example data that has a column name too long for the API
-
example_api_long_meta - Example metadata that has a column name too long for the API
-
example_comma_data - Example data with strings containing commas beyond the default sniffing length. The original can be obtained from the EES test suite.
-
example_comma_meta - Metadata to be used with example_comma_data
-
example_data - Example data
-
example_filter_group - Example filter group data
-
example_filter_group_meta - Example filter group metadata
-
example_filter_group_wrow - Example filter group with extra row data
-
example_filter_group_wrow_meta - Example filter group metadata with extra row
-
example_meta - Example metadata
-
example_output - Example screening output pre-generated from example_data and example_meta
-
ees_robot_test_data - A set of the robot test files as used in EES UI tests. These are available from: https://github.com/dfe-analytical-services/explore-education-statistics/tree/dev/tests/robot-tests/tests/files
-
generate_test_dfs() - EES-ily generate some beefy test files
Reference values
Objects containing required, acceptable, or reference values, e.g. mandatory column names
-
req_data_cols - Required data columns
-
req_meta_cols - Required metadata columns
-
optional_meta_cols - Optional metadata columns
-
acceptable_indicator_units - Acceptable values for indicator units
-
acceptable_time_ids - Acceptable time identifiers
-
geography_df - Acceptable geographic levels and their associated columns
-
api_char_limits - API character limits
-
four_digit_identifiers - Time identifiers that should have 4 digit numbers
-
six_digit_identifiers - Time identifiers that should have 6 digit numbers
-
check_filename_spaces() - Check for spaces in filename
-
check_filename_special() - Check for special characters in filename
-
check_filenames_match() - Check filenames line up between data and metadata files
Pre-check necessary columns exist
Ensure the mandatory columns are present before running other checks on the files.
-
precheck_col_invalid_meta() - Check there are no invalid columns in the metadata
-
precheck_col_req_data() - Check all required columns are present in data
-
precheck_col_req_meta() - Check all required columns are present in metadata
-
precheck_col_to_rows() - Quick check of data columns vs metadata rows
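The idea behind these column pre-checks can be sketched in base R. This is a minimal illustration, not the package's implementation: `find_missing_cols()` is a hypothetical helper, and the `required` vector is an assumed stand-in for the contents of `req_data_cols`, which may differ.

```r
# Hypothetical helper: report which required columns are absent from a data frame.
find_missing_cols <- function(df, required) {
  setdiff(required, names(df))
}

# Assumed stand-in for req_data_cols; the real object may contain other names.
required <- c("time_period", "time_identifier", "geographic_level")

# Toy data frame that is deliberately missing time_identifier.
df <- data.frame(
  time_period = 2023,
  geographic_level = "National"
)

missing_cols <- find_missing_cols(df, required)
if (length(missing_cols) > 0) {
  message("Missing required columns: ", paste(missing_cols, collapse = ", "))
}
```

Running a check like this first means later checks can assume the columns they rely on exist, rather than failing with an unrelated error.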
Pre-checks on metadata
These checks should be run before any other metadata checks, as they validate core assumptions about the metadata file itself.
-
precheck_meta_col_name() - Check col_name is completed for all rows
-
precheck_meta_col_type() - Check col_type entries are valid
-
precheck_meta_ob_unit() - Check there are no observational units with rows in the metadata
-
check_meta_col_name_duplicate() - Check there are no duplicated column names
-
check_meta_col_name_spaces() - Check that no col_name values have spaces
-
check_meta_duplicate_label() - Check there are no duplicate labels
-
check_meta_filter_group() - Check filter_group is not set for indicator rows
-
check_meta_filter_group_duplicate() - Check all of the filter_group values are unique
-
check_meta_filter_group_is_filter() - Check that filter groups are filters
-
check_meta_filter_group_match() - Check filter groups match in meta and data
-
check_meta_filter_group_stripped() - Check filter groups are unique when stripping non-alphanumeric characters
-
check_meta_filter_hint() - Check filter_hint is not set for indicator rows
-
check_meta_geog_catch() - Check for geography columns mislabelled as filters
-
check_meta_ind_dp_set() - Check indicator_dp is set for all indicator rows
-
check_meta_ind_dp_values() - Check indicator_dp only contains blanks or positive integer values
-
check_meta_ind_unit() - Check indicator_dp is set for all indicator rows
-
check_meta_ind_unit_validation() - Check indicator values are valid
-
check_meta_indicator_dp() - Check indicator_dp is blank for all filters
-
check_meta_indicator_grouping() - Check indicator_grouping is blank for all filters
-
check_meta_label() - Check every row has a label
-
check_col_names_spaces() - Check for spaces in the variable names of a given data frame
-
check_col_snake_case() - Check that column names follow snake_case convention
Pre-checks on time columns
These checks should be run before any other time checks, as they validate core assumptions about the time columns.
-
precheck_time_id_mix() - Check the mix of time identifiers in the data file
-
precheck_time_id_valid() - Check all time_identifier values are valid
-
precheck_time_period_num() - Check for any non-numeric time_period values
Pre-checks on geography columns
These checks should be run before any other geography checks, as they validate core assumptions about the geography columns.
-
precheck_geog_level() - Check that the geographic_level values are all valid
-
check_filter_defaults() - Check filenames line up between data and metadata files
-
check_filter_item_limit() - Check the number of filter items is below the limit. Ensures that no filter contains more than 25,000 unique options, which protects the EES service against accidental data issues that can cause performance problems within the admin system.
-
check_filter_whitespace() - Check no filter values have leading or trailing whitespaces
-
check_time_period() - Check that time periods match the time identifier
-
check_time_period_six() - Check that 6 digit time periods give consecutive years
-
check_api_char_limit() - Check if values exceed a character limit
-
check_api_dict_col_names() - Check if col_names are present in the data dictionary
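As an illustration of what "6 digit time periods give consecutive years" means for check_time_period_six() above, here is a minimal base R sketch. It is not the package's code: `is_consecutive_six_digit()` is a hypothetical helper, and treating a century rollover such as 199900 as valid is an assumption.

```r
# Hypothetical helper: TRUE when a six-digit period such as 202324
# (read as 2023/24) ends in the year immediately after its start year.
is_consecutive_six_digit <- function(time_period) {
  x <- as.character(time_period)
  valid_format <- grepl("^[0-9]{6}$", x)
  start_yy <- suppressWarnings(as.integer(substr(x, 3, 4)))
  end_yy <- suppressWarnings(as.integer(substr(x, 5, 6)))
  # Modulo 100 handles the century rollover (assumed valid), e.g. 199900.
  valid_format & !is.na(start_yy) & !is.na(end_yy) &
    ((start_yy + 1) %% 100 == end_yy)
}

result <- is_consecutive_six_digit(c("202324", "202325", "199900", "2023"))
print(result)
# TRUE FALSE TRUE FALSE
```

Values that are not exactly six digits are reported as `FALSE` here; a real check might instead report them separately from non-consecutive periods.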