Structure of hub_validations class objects
The high-level validate_*() family of functions all return a <hub_validations> S3 class object.
Structure of <hub_validations> object
A hub_validations object is effectively a list representing the collected output of the series of checks performed by a higher-level validate_*() function.
Each named element of the list contains the result of an individual check and inherits from class <hub_check>. The name of each element is the name of the check.
Let’s examine the output of validating a model output file with validate_submission():
hub_path <- system.file("testhubs/simple", package = "hubValidations")
v <- validate_submission(hub_path,
file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
)
str(v, max.level = 1)
#> List of 20
#> $ valid_config :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_exists :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_name :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_location :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ round_id_valid :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_format :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ metadata_exists :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_read :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ valid_round_id_col:List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ unique_round_id :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ match_round_id :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ colnames :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ col_types :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ valid_vals :List of 5
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ rows_unique :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ req_vals :List of 5
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ value_col_valid :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ value_col_non_desc:List of 5
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ value_col_sum1 :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_info" "hub_check" "rlang_message" "message" ...
#> $ submission_time :List of 4
#> ..- attr(*, "class")= chr [1:5] "check_failure" "hub_check" "rlang_warning" "warning" ...
#> - attr(*, "class")= chr [1:2] "hub_validations" "list"
The most specific class of each element (the first entry in its class vector) depends on the status of the check:
- If a check succeeds, a <message/check_success> condition class object is returned.
- If a check is skipped, a <message/check_info> condition class object is returned.
- If a check fails, it returns either an <error/check_error> or a <warning/check_failure> condition class object, depending on the check. Both cause overall validation to fail; the two classes are used primarily to communicate the severity of the failing check.
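Because a check's status is encoded in its class, base R's class() and inherits() are enough to sort checks by outcome. The sketch below builds a small hand-made stand-in for a <hub_validations> object; the mock_check() helper, the check messages and the "example.csv"/"check_example" values are hypothetical placeholders, not hubValidations output. The class vectors follow the str() output above, and the trailing "condition" class is an assumption based on how rlang conditions are classed. With a real validation result you would pass v instead of v_mock.

```r
# Hypothetical helper: builds a list shaped like a <hub_check> object.
# The trailing "condition" class is an assumption (rlang condition classes).
mock_check <- function(status, message) {
  base_class <- switch(status,
    check_success = c("rlang_message", "message"),
    check_info    = c("rlang_message", "message"),
    check_failure = c("rlang_warning", "warning"),
    check_error   = c("rlang_error", "error")
  )
  structure(
    list(message = message, where = "example.csv",
         call = "check_example", use_cli_format = TRUE),
    class = c(status, "hub_check", base_class, "condition")
  )
}

# Hand-built stand-in for a <hub_validations> object (not real output)
v_mock <- structure(
  list(
    file_exists     = mock_check("check_success", "File exists."),
    value_col_sum1  = mock_check("check_info", "Check skipped."),
    submission_time = mock_check("check_failure", "Outside window.")
  ),
  class = c("hub_validations", "list")
)

# The first class of each element records the check's status
statuses <- vapply(v_mock, function(x) class(x)[1], character(1))
print(statuses)

# Checks that would cause overall validation to fail
failed <- vapply(
  v_mock,
  function(x) inherits(x, c("check_failure", "check_error")),
  logical(1)
)
print(names(v_mock)[failed])
```

Checking inherits() against both failure classes treats warnings and errors alike, matching the point that either one fails overall validation.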
hub_validations print method
hub_validations objects have their own print method, which displays the result, the file name and the message of each check:
- ✔ indicates a check was successful (a <message/check_success> condition class object was returned)
- ✖ indicates a high-severity check failed (an <error/check_error> condition class object was returned)
- ! indicates a lower-severity check failed (a <warning/check_failure> condition class object was returned)
- ℹ indicates a check was skipped (a <message/check_info> condition class object was returned)
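The legend amounts to a lookup on the first class of each check. The sketch below expresses that lookup as a hypothetical helper, check_symbol(); it is an illustration of the mapping, not the package's own print method. The input is a hand-built check object whose where and message values are copied from the printed output, while its call value is a placeholder and its trailing "condition" class is an assumption.

```r
# Hypothetical helper mirroring the print-method legend: it returns the
# symbol for a check based on the first entry of its class vector.
check_symbol <- function(check) {
  symbols <- c(
    check_success = "\u2714",  # heavy check mark
    check_error   = "\u2716",  # heavy multiplication x
    check_failure = "!",
    check_info    = "\u2139"   # information source
  )
  unname(symbols[class(check)[1]])
}

# Hand-built stand-in for a skipped check (not produced by hubValidations)
skipped <- structure(
  list(message = "No pmf output types to check for sum of 1.",
       where = "2022-10-08-team1-goodmodel.csv",
       call = "check_example", use_cli_format = TRUE),
  class = c("check_info", "hub_check", "rlang_message", "message", "condition")
)

cat(paste0(check_symbol(skipped), " ", skipped$where, ": ", skipped$message), "\n")
```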
v
#> ✔ simple: All hub config files are valid.
#> ✔ 2022-10-08-team1-goodmodel.csv: File exists at path
#>   model-output/team1-goodmodel/2022-10-08-team1-goodmodel.csv.
#> ✔ 2022-10-08-team1-goodmodel.csv: File name "2022-10-08-team1-goodmodel.csv" is
#>   valid.
#> ✔ 2022-10-08-team1-goodmodel.csv: File directory name matches `model_id`
#>   metadata in file name.
#> ✔ 2022-10-08-team1-goodmodel.csv: `round_id` is valid.
#> ✔ 2022-10-08-team1-goodmodel.csv: File is accepted hub format.
#> ✔ 2022-10-08-team1-goodmodel.csv: Metadata file exists at path
#>   model-metadata/team1-goodmodel.yaml.
#> ✔ 2022-10-08-team1-goodmodel.csv: File could be read successfully.
#> ✔ 2022-10-08-team1-goodmodel.csv: `round_id_col` name is valid.
#> ✔ 2022-10-08-team1-goodmodel.csv: `round_id` column "origin_date" contains a
#>   single, unique round ID value.
#> ✔ 2022-10-08-team1-goodmodel.csv: All `round_id_col` "origin_date" values match
#>   submission `round_id` from file name.
#> ✔ 2022-10-08-team1-goodmodel.csv: Column names are consistent with expected
#>   round task IDs and std column names.
#> ✔ 2022-10-08-team1-goodmodel.csv: Column data types match hub schema.
#> ✔ 2022-10-08-team1-goodmodel.csv: `tbl` contains valid values/value
#>   combinations.
#> ✔ 2022-10-08-team1-goodmodel.csv: All combinations of task ID
#>   column/`output_type`/`output_type_id` values are unique.
#> ✔ 2022-10-08-team1-goodmodel.csv: Required task ID/output type/output type ID
#>   combinations all present.
#> ✔ 2022-10-08-team1-goodmodel.csv: Values in column `value` all valid with
#>   respect to modeling task config.
#> ✔ 2022-10-08-team1-goodmodel.csv: Values in `value` column are non-decreasing
#>   as output_type_ids increase for all unique task ID value/output type
#>   combinations of quantile or cdf output types.
#> ℹ 2022-10-08-team1-goodmodel.csv: No pmf output types to check for sum of 1.
#>   Check skipped.
#> ! 2022-10-08-team1-goodmodel.csv: Submission time must be within accepted
#>   submission window for round. Current time 2024-02-29 14:00:20.195044 is
#>   outside window 2022-10-02 EDT--2022-10-09 23:59:59 EDT.
Structure of a <hub_check> object
Let’s look more closely at the structure of the first few elements of the hub_validations object returned by validate_submission():
v <- validate_submission(hub_path,
file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
)
str(head(v))
#> List of 6
#> $ valid_config :List of 4
#> ..$ message : chr "All hub config files are valid. \n "
#> ..$ where : chr "simple"
#> ..$ call : chr "check_config_hub_valid"
#> ..$ use_cli_format: logi TRUE
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_exists :List of 4
#> ..$ message : chr "File exists at path \033[34mmodel-output/team1-goodmodel/2022-10-08-team1-goodmodel.csv\033[39m. \n "
#> ..$ where : chr "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
#> ..$ call : chr "check_file_exists"
#> ..$ use_cli_format: logi TRUE
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_name :List of 4
#> ..$ message : chr "File name \033[34m\"2022-10-08-team1-goodmodel.csv\"\033[39m is valid. \n "
#> ..$ where : chr "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
#> ..$ call : chr "check_file_name"
#> ..$ use_cli_format: logi TRUE
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_location :List of 4
#> ..$ message : chr "File directory name matches `model_id`\n metadata in file name. \n "
#> ..$ where : chr "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
#> ..$ call : chr "check_file_location"
#> ..$ use_cli_format: logi TRUE
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ round_id_valid:List of 4
#> ..$ message : chr "`round_id` is valid. \n "
#> ..$ where : chr "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
#> ..$ call : chr "check_valid_round_id"
#> ..$ use_cli_format: logi TRUE
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
#> $ file_format :List of 4
#> ..$ message : chr "File is accepted hub format. \n "
#> ..$ where : chr "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
#> ..$ call : chr "check_file_format"
#> ..$ use_cli_format: logi TRUE
#> ..- attr(*, "class")= chr [1:5] "check_success" "hub_check" "rlang_message" "message" ...
Each <hub_check> object contains the following elements:
- message: the result message containing details about the check.
- where: where the check was performed, usually the model output file path.
- call: the name of the function used to perform the check.
- use_cli_format: whether the message is formatted using cli format (almost always TRUE).
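Because every <hub_check> carries the same core elements, they can be collected into a data frame for a compact overview. The sketch below assumes only that layout; checks_to_df() is a hypothetical helper, and v_mock is a hand-built stand-in whose element values are copied from the str() output above (the trailing "condition" class is an assumption). With a real result you would call checks_to_df(v).

```r
# Hypothetical helper: one row per check. Status comes from the first
# class; where, call and message are core <hub_check> elements.
checks_to_df <- function(validations) {
  data.frame(
    check   = names(validations),
    status  = vapply(validations, function(x) class(x)[1], character(1),
                     USE.NAMES = FALSE),
    where   = vapply(validations, function(x) x$where, character(1),
                     USE.NAMES = FALSE),
    call    = vapply(validations, function(x) x$call, character(1),
                     USE.NAMES = FALSE),
    # Messages end in " \n " in the str() output, so trim the whitespace
    message = vapply(validations, function(x) trimws(x$message), character(1),
                     USE.NAMES = FALSE)
  )
}

# Hand-built stand-in for a <hub_validations> object (not real output)
cls <- c("hub_check", "rlang_message", "message", "condition")
v_mock <- structure(
  list(
    valid_config = structure(
      list(message = "All hub config files are valid. \n ", where = "simple",
           call = "check_config_hub_valid", use_cli_format = TRUE),
      class = c("check_success", cls)
    ),
    round_id_valid = structure(
      list(message = "`round_id` is valid. \n ",
           where = "team1-goodmodel/2022-10-08-team1-goodmodel.csv",
           call = "check_valid_round_id", use_cli_format = TRUE),
      class = c("check_success", cls)
    )
  ),
  class = c("hub_validations", "list")
)

print(checks_to_df(v_mock))
```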
Extra information
Some <hub_check> objects contain extra information about a failing check to help identify affected rows in submissions.
For example, the <hub_check> object returned by the valid_vals check, which checks that all columns in a model output file (excluding the value column) contain valid combinations of task ID / output type / output type ID values, contains an additional element called error_tbl with details of the invalid value combinations in the affected rows.
To access error_tbl from the output of validate_submission() stored in an object v, you would use:
v$valid_vals$error_tbl
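Not every check carries extra elements, so it can help to see which ones do before indexing into them. The sketch below compares each check's element names against the four core elements listed earlier. The v_mock object, its call values and the contents of its error_tbl are hand-built placeholders rather than real hubValidations output, and classing the failing valid_vals check as check_error is an assumption made for illustration.

```r
core_elements <- c("message", "where", "call", "use_cli_format")

# Hand-built stand-in: one check with an extra error_tbl element
v_mock <- structure(
  list(
    file_name = structure(
      list(message = "File name is valid.", where = "example.csv",
           call = "check_example", use_cli_format = TRUE),
      class = c("check_success", "hub_check", "rlang_message", "message", "condition")
    ),
    valid_vals = structure(
      list(message = "Invalid value combinations found.", where = "example.csv",
           call = "check_example", use_cli_format = TRUE,
           # Placeholder contents; the real error_tbl columns may differ
           error_tbl = data.frame(horizon = 5L, output_type = "mean")),
      class = c("check_error", "hub_check", "rlang_error", "error", "condition")
    )
  ),
  class = c("hub_validations", "list")
)

# Extra (non-core) element names carried by each check
extras <- lapply(v_mock, function(x) setdiff(names(x), core_elements))
print(extras)

# Access error_tbl only when it is present
if (!is.null(v_mock$valid_vals$error_tbl)) {
  print(v_mock$valid_vals$error_tbl)
}
```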