
Run all of the checks from the package against the data and metadata objects.

Usage

screen_dfs(
  data,
  meta,
  log_key = NULL,
  log_dir = "./",
  dd_checks = TRUE,
  verbose = FALSE,
  stop_on_error = FALSE,
  prudence = "lavish"
)

Arguments

data

data.frame containing the data table. Screening is more efficient if this is supplied as a lazy duckplyr data.frame.

meta

data.frame containing the metadata table

log_key

Key string used to name the log file. If given, the screening writes a log file to disk called eesyscreening_log_<log_key>.json. default=NULL

log_dir

Directory within which to place the log file. default="./"

dd_checks

logical, if TRUE runs the data dictionary tests. default=TRUE. This option exists so that developers can update the robot test data to be consistent with the data dictionary tests.

verbose

logical, if TRUE prints feedback messages to the console for every test, if FALSE runs silently. default=FALSE

stop_on_error

logical, if TRUE stops with an error if the result is "FAIL" and throws a genuine warning if the result is "WARNING". default=FALSE

prudence

prudence setting as used by duckplyr, one of "lavish", "stingy" or "thrifty". default="lavish"
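
To show how log_key and log_dir relate, the sketch below builds the path at which the log file would be expected. It assumes the documented file name is joined directly onto log_dir. The helper expected_log_path() is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of the package): builds the path at which
# the log would be expected, assuming the documented file name pattern
# eesyscreening_log_<log_key>.json sits directly inside log_dir.
expected_log_path <- function(log_key, log_dir = "./") {
  if (is.null(log_key)) {
    return(NULL) # no key supplied, so no log file is expected
  }
  file.path(log_dir, paste0("eesyscreening_log_", log_key, ".json"))
}

log_path <- expected_log_path("demo", log_dir = tempdir())
basename(log_path)
```

After a call such as screen_dfs(data, meta, log_key = "demo", log_dir = tempdir()), a path built this way could be passed to file.exists() to check that the log was written.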

Value

data.frame containing the results of the screening, with one row per check and the columns check, result, message, guidance_url and stage
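
Because the return value is an ordinary data.frame, the checks that need attention can be picked out with base R subsetting. The sketch below uses a hand-built stand-in with the same column names. Its values are illustrative only and are not real output from the package.

```r
# Hand-built stand-in for a screening result (illustrative values only,
# not real output from the package).
results <- data.frame(
  check = c("col_req_meta", "time_id_valid", "filter_whitespace"),
  result = c("PASS", "FAIL", "WARNING"),
  message = c("placeholder 1", "placeholder 2", "placeholder 3"),
  guidance_url = NA_character_,
  stage = c("Precheck columns", "Precheck time", "Check filters")
)

# Keep only the checks that did not pass
problems <- results[results$result != "PASS", c("check", "result", "stage")]
problems
```

The same subsetting should apply unchanged to the data.frame returned by screen_dfs().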

Details

Provide a pair of data.frames (data and metadata) and the function runs through the checks in order, returning one row of results per check.

Examples

screen_dfs(example_data, example_meta)
#>                               check result
#> 1                      col_req_meta   PASS
#> 2                  col_invalid_meta   PASS
#> 3                      col_req_data   PASS
#> 4                       col_to_rows   PASS
#> 5                   col_name_spaces   PASS
#> 6                col_name_duplicate   PASS
#> 7                  col_names_spaces   PASS
#> 8                    col_snake_case   PASS
#> 9                     meta_col_type   PASS
#> 10                     meta_ob_unit   PASS
#> 11                    meta_col_name   PASS
#> 12             meta_duplicate_label   PASS
#> 13                    meta_col_name   PASS
#> 14           filter_group_is_filter   PASS
#> 15              filter_groups_match   PASS
#> 16            filter_group_stripped   PASS
#> 17                 check_meta_label   PASS
#> 18                 meta_filter_hint   PASS
#> 19                     indicator_dp   PASS
#> 20           filter_group_duplicate   PASS
#> 21                  meta_ind_dp_set   PASS
#> 22                    meta_ind_unit   PASS
#> 23        indicator_unit_validation   PASS
#> 24               indicator_grouping   PASS
#> 25                    ind_dp_values   PASS
#> 26                  meta_geog_catch   PASS
#> 27                    time_id_valid   PASS
#> 28                      time_id_mix   PASS
#> 29                  time_period_num   PASS
#> 30                      time_period   PASS
#> 31                  time_period_six   PASS
#> 32                 geographic_level   PASS
#> 33            check_filter_defaults   PASS
#> 34                filter_whitespace   PASS
#> 35          check_filter_item_limit   PASS
#> 36 check_api_char_limit_column-name   PASS
#> 37         check_api_dict_col_names   PASS
#>                                                                                                       message
#> 1                                               All of the required columns are present in the metadata file.
#> 2                                                          There are no invalid columns in the metadata file.
#> 3                                                   All of the required columns are present in the data file.
#> 4  There are an equal number of rows in the metadata file (3) and non-mandatory columns in the data file (3).
#> 5                                                                 There are no spaces in the col_name values.
#> 6                                                                             All col_name values are unique.
#> 7                                                  There are no spaces in the variable names in the datafile.
#> 8                                       The variable names in the data file follow the snake_case convention.
#> 9                                                                 col_type is always 'Filter' or 'Indicator'.
#> 10                                            No observational units have been included in the metadata file.
#> 11                                            The col_name column is completed for every row in the metadata.
#> 12                                                                                     All labels are unique.
#> 13                                                         No indicators have a filter_grouping_column value.
#> 14                                                                        There are no filter groups present.
#> 15                                                                        There are no filter groups present.
#> 16                                                                        There are no filter groups present.
#> 17                                               The label column is completed for every row in the metadata.
#> 18                                                                    No indicators have a filter_hint value.
#> 19                                                                     No filters have an indicator_dp value.
#> 20                                                                        There are no filter groups present.
#> 21                                                   The indicator_dp column is completed for all indicators.
#> 22                                                                   No filters have an indicator_unit value.
#> 23                                                                        The indicator_unit values are valid
#> 24                                                               No filters have an indicator_grouping value.
#> 25                            The indicator_dp column only contains blanks, zero, or positive integer values.
#> 26                                                     No filters appear to be mislabelled geography columns.
#> 27                                                                  The time_identifier values are all valid.
#> 28                                                       There is only one time_identifier value in the data.
#> 29                                                       The time_period column only contains numeric values.
#> 30                                The time_period length matches the time_identifier values in the data file.
#> 31                                               The six digit time_period values refer to consecutive years.
#> 32                                                                 The geographic_level values are all valid.
#> 33                                                 All filters and groups have a default filter item present.
#> 34                                                   No filter labels contain leading or trailing whitespace.
#> 35                                                All filters and groups have less than 25000 unique entries.
#> 36                          All filter / indicator names are less than or equal to the character limit of 50.
#> 37                                                     All col_names are consistent with the data dictionary.
#>    guidance_url              stage
#> 1            NA   Precheck columns
#> 2            NA   Precheck columns
#> 3            NA   Precheck columns
#> 4            NA   Precheck columns
#> 5            NA   Precheck columns
#> 6            NA   Precheck columns
#> 7            NA      Check columns
#> 8            NA      Check columns
#> 9            NA      Precheck meta
#> 10           NA      Precheck meta
#> 11           NA      Precheck meta
#> 12           NA         Check meta
#> 13           NA         Check meta
#> 14           NA         Check meta
#> 15           NA         Check meta
#> 16           NA         Check meta
#> 17           NA         Check meta
#> 18           NA         Check meta
#> 19           NA         Check meta
#> 20           NA         Check meta
#> 21           NA         Check meta
#> 22           NA         Check meta
#> 23           NA         Check meta
#> 24           NA         Check meta
#> 25           NA         Check meta
#> 26           NA         Check meta
#> 27           NA      Precheck time
#> 28           NA      Precheck time
#> 29           NA      Precheck time
#> 30           NA         Check time
#> 31           NA         Check time
#> 32           NA Precheck geography
#> 33           NA      Check filters
#> 34           NA      Check filters
#> 35           NA      Check filters
#> 36           NA          Check API
#> 37           NA          Check API
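
For a table this long, a per-stage summary can be easier to scan. The sketch below rebuilds only the result and stage columns from the output above, rather than calling the package, and counts results per stage. With the package loaded, results could instead be assigned from screen_dfs(example_data, example_meta).

```r
# Stage labels rebuilt from the printed output above, one entry per check.
stage <- rep(
  c("Precheck columns", "Check columns", "Precheck meta", "Check meta",
    "Precheck time", "Check time", "Precheck geography", "Check filters",
    "Check API"),
  times = c(6, 2, 3, 15, 3, 2, 1, 3, 2)
)
results <- data.frame(result = "PASS", stage = stage)

# Count results per stage, keeping the stages in the order they were run
results$stage <- factor(results$stage, levels = unique(results$stage))
stage_counts <- table(results$stage, results$result)
stage_counts
```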