Run all of the checks from the package against the data and metadata objects.
Usage
screen_dfs(
data,
meta,
log_key = NULL,
log_dir = "./",
dd_checks = TRUE,
verbose = FALSE,
stop_on_error = FALSE,
prudence = "lavish"
)
Arguments
- data
data.frame containing the data table; screening is more efficient if it is supplied as a lazy duckplyr data frame.
- meta
data.frame containing the metadata table.
- log_key
key string used to name the log file. If given, the screening writes a log file to disk called eesyscreening_log_<log_key>.json. Default: NULL.
- log_dir
Directory in which to place the log file. Default: "./".
- dd_checks
logical, if TRUE runs the data dictionary tests. Default: TRUE. This option exists so that developers can update the robot test data to be consistent with the data dictionary tests.
- verbose
logical, if TRUE prints feedback messages to the console for every test; if FALSE, runs silently. Default: FALSE.
- stop_on_error
logical, if TRUE stops with an error when a check result is "FAIL" and throws a genuine R warning when a check result is "WARNING". Default: FALSE.
- prudence
prudence setting as used by duckplyr. Default: "lavish"; can also be "stingy" or "thrifty".
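When screening runs inside an automated pipeline, `stop_on_error = TRUE` lets a "FAIL" surface as an R error and a "WARNING" as an R warning. The sketch below is an illustrative wrapper, not part of eesyscreener: `run_screening_strict()` is a hypothetical name, and it assumes the FAIL and WARNING outcomes are signalled as ordinary R error and warning conditions that `tryCatch()` and `withCallingHandlers()` can catch. The `screen_fun` argument defaults to `eesyscreener::screen_dfs` (evaluated lazily, only if no other function is supplied) so that a stub can be swapped in for testing.

```r
# Hypothetical wrapper (not part of eesyscreener): run the screening with
# stop_on_error = TRUE and convert the signalled conditions into a status.
# Assumes FAIL results raise an R error and WARNING results raise R warnings.
run_screening_strict <- function(data, meta,
                                 screen_fun = eesyscreener::screen_dfs) {
  warnings_seen <- character(0)

  result <- tryCatch(
    withCallingHandlers(
      screen_fun(data, meta, stop_on_error = TRUE),
      warning = function(w) {
        # Record each warning message, then let the screening carry on
        warnings_seen <<- c(warnings_seen, conditionMessage(w))
        invokeRestart("muffleWarning")
      }
    ),
    # A FAIL stops the screening; keep the condition object instead of erroring
    error = function(e) e
  )

  if (inherits(result, "error")) {
    return(list(status = "FAIL", detail = conditionMessage(result), results = NULL))
  }
  if (length(warnings_seen) > 0) {
    return(list(status = "WARNING", detail = warnings_seen, results = result))
  }
  list(status = "PASS", detail = character(0), results = result)
}
```

For the bundled example data, where every check passes in the Examples section, `run_screening_strict(example_data, example_meta)$status` would be expected to return "PASS". Keeping the error as a value rather than letting it propagate means a pipeline can log the failing message and decide for itself whether to abort.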
Examples
screen_dfs(example_data, example_meta)
#> check result
#> 1 col_req_meta PASS
#> 2 col_invalid_meta PASS
#> 3 col_req_data PASS
#> 4 col_to_rows PASS
#> 5 col_name_spaces PASS
#> 6 col_name_duplicate PASS
#> 7 col_names_spaces PASS
#> 8 col_snake_case PASS
#> 9 meta_col_type PASS
#> 10 meta_ob_unit PASS
#> 11 meta_col_name PASS
#> 12 meta_duplicate_label PASS
#> 13 meta_col_name PASS
#> 14 filter_group_is_filter PASS
#> 15 filter_groups_match PASS
#> 16 filter_group_stripped PASS
#> 17 check_meta_label PASS
#> 18 meta_filter_hint PASS
#> 19 indicator_dp PASS
#> 20 filter_group_duplicate PASS
#> 21 meta_ind_dp_set PASS
#> 22 meta_ind_unit PASS
#> 23 indicator_unit_validation PASS
#> 24 indicator_grouping PASS
#> 25 ind_dp_values PASS
#> 26 meta_geog_catch PASS
#> 27 time_id_valid PASS
#> 28 time_id_mix PASS
#> 29 time_period_num PASS
#> 30 time_period PASS
#> 31 time_period_six PASS
#> 32 geographic_level PASS
#> 33 check_filter_defaults PASS
#> 34 filter_whitespace PASS
#> 35 check_filter_item_limit PASS
#> 36 check_api_char_limit_column-name PASS
#> 37 check_api_dict_col_names PASS
#> message
#> 1 All of the required columns are present in the metadata file.
#> 2 There are no invalid columns in the metadata file.
#> 3 All of the required columns are present in the data file.
#> 4 There are an equal number of rows in the metadata file (3) and non-mandatory columns in the data file (3).
#> 5 There are no spaces in the col_name values.
#> 6 All col_name values are unique.
#> 7 There are no spaces in the variable names in the datafile.
#> 8 The variable names in the data file follow the snake_case convention.
#> 9 col_type is always 'Filter' or 'Indicator'.
#> 10 No observational units have been included in the metadata file.
#> 11 The col_name column is completed for every row in the metadata.
#> 12 All labels are unique.
#> 13 No indicators have a filter_grouping_column value.
#> 14 There are no filter groups present.
#> 15 There are no filter groups present.
#> 16 There are no filter groups present.
#> 17 The label column is completed for every row in the metadata.
#> 18 No indicators have a filter_hint value.
#> 19 No filters have an indicator_dp value.
#> 20 There are no filter groups present.
#> 21 The indicator_dp column is completed for all indicators.
#> 22 No filters have an indicator_unit value.
#> 23 The indicator_unit values are valid
#> 24 No filters have an indicator_grouping value.
#> 25 The indicator_dp column only contains blanks, zero, or positive integer values.
#> 26 No filters appear to be mislabelled geography columns.
#> 27 The time_identifier values are all valid.
#> 28 There is only one time_identifier value in the data.
#> 29 The time_period column only contains numeric values.
#> 30 The time_period length matches the time_identifier values in the data file.
#> 31 The six digit time_period values refer to consecutive years.
#> 32 The geographic_level values are all valid.
#> 33 All filters and groups have a default filter item present.
#> 34 No filter labels contain leading or trailing whitespace.
#> 35 All filters and groups have less than 25000 unique entries.
#> 36 All filter / indicator names are less than or equal to the character limit of 50.
#> 37 All col_names are consistent with the data dictionary.
#> guidance_url stage
#> 1 NA Precheck columns
#> 2 NA Precheck columns
#> 3 NA Precheck columns
#> 4 NA Precheck columns
#> 5 NA Precheck columns
#> 6 NA Precheck columns
#> 7 NA Check columns
#> 8 NA Check columns
#> 9 NA Precheck meta
#> 10 NA Precheck meta
#> 11 NA Precheck meta
#> 12 NA Check meta
#> 13 NA Check meta
#> 14 NA Check meta
#> 15 NA Check meta
#> 16 NA Check meta
#> 17 NA Check meta
#> 18 NA Check meta
#> 19 NA Check meta
#> 20 NA Check meta
#> 21 NA Check meta
#> 22 NA Check meta
#> 23 NA Check meta
#> 24 NA Check meta
#> 25 NA Check meta
#> 26 NA Check meta
#> 27 NA Precheck time
#> 28 NA Precheck time
#> 29 NA Precheck time
#> 30 NA Check time
#> 31 NA Check time
#> 32 NA Precheck geography
#> 33 NA Check filters
#> 34 NA Check filters
#> 35 NA Check filters
#> 36 NA Check API
#> 37 NA Check API
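Judging by the printed output above, `screen_dfs()` returns a data frame with one row per check and the columns `check`, `result`, `message`, `guidance_url` and `stage`. On real files it is usually only the non-passing rows that need attention. The two helpers below are hypothetical conveniences, not eesyscreener functions, and they assume the column names seen in this example output.

```r
# Hypothetical helper: keep only the checks whose result is not "PASS",
# assuming the column names shown in the screen_dfs() example output.
non_passing_checks <- function(results) {
  stopifnot(all(c("check", "result", "message", "stage") %in% names(results)))
  # which() drops rows where result is NA instead of returning all-NA rows
  out <- results[which(results$result != "PASS"),
                 c("stage", "check", "result", "message")]
  rownames(out) <- NULL
  out
}

# Hypothetical helper: cross-tabulate the number of checks per stage and result.
result_counts_by_stage <- function(results) {
  table(stage = results$stage, result = results$result)
}
```

For example, `non_passing_checks(screen_dfs(example_data, example_meta))` should return zero rows, because all 37 checks pass for the example data.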