
Validating model output files with validate_submission()

While most hubs will have automated validation systems set up to check contributions during submission, hubValidations also provides functionality for validating files locally before submitting them.

For this, submitting teams can use validate_submission() to validate their model output files prior to submitting.

The function takes the path to the file, relative to the hub's model output directory, as argument file_path. It performs a series of standard validation checks and returns their results as a hub_validations S3 class object.

library(hubValidations)

# Path to the example hub bundled with the package
hub_path <- system.file("testhubs/simple", package = "hubValidations")

validate_submission(hub_path,
  file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
)
#> ✔ simple: All hub config files are valid.
#> ✔ 2022-10-08-team1-goodmodel.csv: File exists at path
#>   model-output/team1-goodmodel/2022-10-08-team1-goodmodel.csv.
#> ✔ 2022-10-08-team1-goodmodel.csv: File name "2022-10-08-team1-goodmodel.csv" is
#>   valid.
#> ✔ 2022-10-08-team1-goodmodel.csv: File directory name matches `model_id`
#>   metadata in file name.
#> ✔ 2022-10-08-team1-goodmodel.csv: `round_id` is valid.
#> ✔ 2022-10-08-team1-goodmodel.csv: File is accepted hub format.
#> ✔ 2022-10-08-team1-goodmodel.csv: Metadata file exists at path
#>   model-metadata/team1-goodmodel.yaml.
#> ✔ 2022-10-08-team1-goodmodel.csv: File could be read successfully.
#> ✔ 2022-10-08-team1-goodmodel.csv: `round_id_col` name is valid.
#> ✔ 2022-10-08-team1-goodmodel.csv: `round_id` column "origin_date" contains a
#>   single, unique round ID value.
#> ✔ 2022-10-08-team1-goodmodel.csv: All `round_id_col` "origin_date" values match
#>   submission `round_id` from file name.
#> ✔ 2022-10-08-team1-goodmodel.csv: Column names are consistent with expected
#>   round task IDs and std column names.
#> ✔ 2022-10-08-team1-goodmodel.csv: Column data types match hub schema.
#> ✔ 2022-10-08-team1-goodmodel.csv: `tbl` contains valid values/value
#>   combinations.
#> ✔ 2022-10-08-team1-goodmodel.csv: All combinations of task ID
#>   column/`output_type`/`output_type_id` values are unique.
#> ✔ 2022-10-08-team1-goodmodel.csv: Required task ID/output type/output type ID
#>   combinations all present.
#> ✔ 2022-10-08-team1-goodmodel.csv: Values in column `value` all valid with
#>   respect to modeling task config.
#> ✔ 2022-10-08-team1-goodmodel.csv: Values in `value` column are non-decreasing
#>   as output_type_ids increase for all unique task ID value/output type
#>   combinations of quantile or cdf output types.
#> ℹ 2022-10-08-team1-goodmodel.csv: No pmf output types to check for sum of 1.
#>   Check skipped.
#> ! 2022-10-08-team1-goodmodel.csv: Submission time must be within accepted
#>   submission window for round.  Current time 2024-02-29 14:00:33.053712 is
#>   outside window 2022-10-02 EDT--2022-10-09 23:59:59 EDT.

For more details on the structure of <hub_validations> objects, including how to access more information on individual checks, see vignette("hub-validations-class").
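As a quick illustration of what the returned object holds, the sketch below stores the result and inspects it with base R tools. It assumes that the hub_validations object can be treated as a named list with one element per check (named as in the check details table further down) and that each element is a condition object. Both are assumptions here; the vignette referenced above is the authoritative description.

```r
library(hubValidations)

hub_path <- system.file("testhubs/simple", package = "hubValidations")

# Store the validation results instead of only printing them
v <- validate_submission(hub_path,
  file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
)

# Assumption: one named element per check that was run
names(v)

# Assumption: each element is a condition object, so its class and
# message can be inspected with base R
class(v[["submission_time"]])
conditionMessage(v[["submission_time"]])
```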

Validation early return

Some checks are critical to downstream checks. If one of them fails, validation stops early and returns the results of all checks up to and including the critical check that failed.

When they fail, such checks generally return an <error/check_error> condition class object. Any problems identified need to be resolved and the function re-run for validation to proceed further.

hub_path <- system.file("testhubs/simple", package = "hubValidations")

validate_submission(hub_path,
  file_path = "team1-goodmodel/2022-10-15-hub-baseline.csv"
)
#> ✔ simple: All hub config files are valid.
#> ✔ 2022-10-15-hub-baseline.csv: File exists at path
#>   model-output/team1-goodmodel/2022-10-15-hub-baseline.csv.
#> ✔ 2022-10-15-hub-baseline.csv: File name "2022-10-15-hub-baseline.csv" is
#>   valid.
#> ! 2022-10-15-hub-baseline.csv: File directory name must match `model_id`
#>   metadata in file name.  File should be submitted to directory "hub-baseline"
#>   not "team1-goodmodel"
#> ✔ 2022-10-15-hub-baseline.csv: `round_id` is valid.
#> ✔ 2022-10-15-hub-baseline.csv: File is accepted hub format.
#> ✔ 2022-10-15-hub-baseline.csv: Metadata file exists at path
#>   model-metadata/hub-baseline.yml.
#> ✔ 2022-10-15-hub-baseline.csv: File could be read successfully.
#> ✔ 2022-10-15-hub-baseline.csv: `round_id_col` name is valid.
#> ✔ 2022-10-15-hub-baseline.csv: `round_id` column "origin_date" contains a
#>   single, unique round ID value.
#> ✖ 2022-10-15-hub-baseline.csv: All `round_id_col` "origin_date" values must
#>   match submission `round_id` from file name.  `round_id` value 2022-10-08 does
#>   not match submission `round_id` "2022-10-15"
#> ! 2022-10-15-hub-baseline.csv: Submission time must be within accepted
#>   submission window for round.  Current time 2024-02-29 14:00:34.027268 is
#>   outside window 2022-10-02 EDT--2022-10-09 23:59:59 EDT.

Execution Errors

If an execution error occurs in any of the checks, an <error/check_exec_error> is returned instead. For validation purposes, this results in the same downstream effects as an <error/check_error> object.
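The early return behaviour can be illustrated with a small base R sketch. This is not the package's implementation: new_check(), run_checks() and the toy checks are hypothetical stand-ins, and only the class names check_failure, check_error and check_exec_error are taken from the text above. The idea is that a check_failure is recorded and validation continues, whereas a check_error or a captured execution error stops the run.

```r
# Hypothetical helper: build a condition-like object with a given class
new_check <- function(class, msg) {
  structure(list(message = msg, call = NULL), class = c(class, "condition"))
}

# Hypothetical runner: execute checks in order and stop after the first
# check_error or check_exec_error
run_checks <- function(checks) {
  results <- list()
  for (name in names(checks)) {
    # An unexpected R error inside a check is captured and recorded as an
    # execution error instead of aborting the whole run
    res <- tryCatch(
      checks[[name]](),
      error = function(e) new_check("check_exec_error", conditionMessage(e))
    )
    results[[name]] <- res
    if (inherits(res, c("check_error", "check_exec_error"))) {
      break # early return: downstream checks are not run
    }
  }
  results
}

# Toy checks standing in for real ones
checks <- list(
  file_exists = function() new_check("check_success", "File exists."),
  file_location = function() new_check("check_failure", "Wrong directory."),
  file_read = function() stop("could not parse file"),
  colnames = function() new_check("check_success", "Column names valid.")
)

results <- run_checks(checks)

# file_exists, file_location and file_read are recorded; colnames never runs
names(results)
```

In this sketch the non-critical file_location failure does not interrupt the run, while the execution error in file_read has the same stopping effect as a check_error would.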

Checking for errors with check_for_errors()

You can check whether your file passes validation overall by passing the hub_validations object to check_for_errors().

If validation fails, an error will be thrown and the failing checks will be summarised.

validate_submission(hub_path,
  file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
) %>%
  check_for_errors()
#> ! 2022-10-08-team1-goodmodel.csv: Submission time must be within accepted
#>   submission window for round.  Current time 2024-02-29 14:00:34.736806 is
#>   outside window 2022-10-02 EDT--2022-10-09 23:59:59 EDT.
#> Error in `check_for_errors()`:
#> ! 
#> The validation checks produced some failures/errors reported above.
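Because check_for_errors() throws an error on failure, a script that should keep running needs to catch it. The following is a minimal sketch using base R's tryCatch(); the variable name passed is only illustrative.

```r
library(hubValidations)

hub_path <- system.file("testhubs/simple", package = "hubValidations")

v <- validate_submission(hub_path,
  file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
)

# Convert the outcome of check_for_errors() into a single TRUE/FALSE value
passed <- tryCatch(
  {
    check_for_errors(v)
    TRUE
  },
  error = function(e) FALSE
)

if (!passed) {
  message("Validation failed; resolve the reported checks before submitting.")
}
```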

Skipping the submission window check

If you are preparing your submission before the submission window opens, you might want to skip the submission window check. You can do so by setting argument skip_submit_window_check to TRUE.

With the check skipped, the previous file, which was valid apart from failing the submission window check, now passes validation overall.

validate_submission(hub_path,
  file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv",
  skip_submit_window_check = TRUE
) %>%
  check_for_errors()
#>  All validation checks have been successful.

validate_submission check details

Details of checks performed by validate_submission()
| Name | Check | Early return | Fail output | Extra info |
|---|---|---|---|---|
| valid_config | Hub config valid | TRUE | check_error | |
| submission_time | Current time within file submission window | FALSE | check_failure | |
| file_exists | File exists at file_path provided | TRUE | check_error | |
| file_name | File name valid | TRUE | check_error | |
| file_location | File located in correct team directory | FALSE | check_failure | |
| round_id_valid | File round ID is a valid hub round ID | TRUE | check_error | |
| file_format | File format is accepted hub/round format | TRUE | check_error | |
| metadata_exists | Model metadata file exists in expected location | TRUE | check_error | |
| file_read | File can be read without errors | TRUE | check_error | |
| valid_round_id_col | Round ID var from config exists in data column names. Skipped if round_id_from_var is FALSE in config. | FALSE | check_failure | |
| unique_round_id | Round ID column contains a single unique round ID. Skipped if round_id_from_var is FALSE in config. | TRUE | check_error | |
| match_round_id | Round ID from file contents matches round ID from file name. Skipped if round_id_from_var is FALSE in config. | TRUE | check_error | |
| colnames | File column names match expected column names for round (i.e. task ID names + hub standard column names) | TRUE | check_error | |
| col_types | File column types match expected column types from config. Mainly applicable to parquet & arrow files. | FALSE | check_failure | |
| valid_vals | Columns (excluding value column) contain valid combinations of task ID / output type / output type ID values | TRUE | check_error | error_tbl: table of invalid task ID/output type/output type ID value combinations |
| rows_unique | Columns (excluding value column) contain unique combinations of task ID / output type / output type ID values | FALSE | check_failure | |
| req_vals | Columns (excluding value column) contain all required combinations of task ID / output type / output type ID values | FALSE | check_failure | missing_df: table of missing task ID/output type/output type ID value combinations |
| value_col_valid | Values in value column are coercible to data type configured for each output type | FALSE | check_failure | |
| value_col_non_desc | Values in value column are non-decreasing as output_type_ids increase for all unique task ID / output type value combinations. Applies to quantile or cdf output types only. | FALSE | check_failure | error_tbl: table of rows affected |
| value_col_sum1 | Values in the value column of pmf output type data for each unique task ID combination sum to 1. | FALSE | check_failure | error_tbl: table of rows affected |

Validating model metadata files with validate_model_metadata()

If you want to check a model metadata file before submitting, you can similarly use function validate_model_metadata().

The function takes the model metadata file name as argument file_path, performs a series of validation checks, and returns their results as a hub_validations S3 class object.

validate_model_metadata(hub_path,
  file_path = "hub-baseline.yml"
)
#> ✔ model-metadata-schema.json: File exists at path
#>   hub-config/model-metadata-schema.json.
#> ✔ hub-baseline.yml: File exists at path model-metadata/hub-baseline.yml.
#> ✔ hub-baseline.yml: Metadata file extension is "yml" or "yaml".
#> ✔ hub-baseline.yml: Metadata file directory name matches "model-metadata".
#> ✔ hub-baseline.yml: Metadata file contents are consistent with schema
#>   specifications.
#> ✔ hub-baseline.yml: Metadata file name matches the `model_id` specified within
#>   the metadata file.

validate_model_metadata(hub_path,
  file_path = "team1-goodmodel.yaml"
)
#> ✔ model-metadata-schema.json: File exists at path
#>   hub-config/model-metadata-schema.json.
#> ✔ team1-goodmodel.yaml: File exists at path
#>   model-metadata/team1-goodmodel.yaml.
#> ✔ team1-goodmodel.yaml: Metadata file extension is "yml" or "yaml".
#> ✔ team1-goodmodel.yaml: Metadata file directory name matches "model-metadata".
#> ✖ team1-goodmodel.yaml: Metadata file contents must be consistent with schema
#>   specifications.  - must have required property 'model_details' . - must NOT
#>   have additional properties; saw unexpected property 'models_details'. - must
#>   NOT have additional properties; saw unexpected property
#>   'ensemble_of_hub_models"'. - /include_ensemble must be boolean .

For more details on the structure of <hub_validations> objects, including how to access more information on individual checks, see vignette("hub-validations-class").
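If a hub contains several metadata files, it can be convenient to validate them in one pass. The sketch below is one possible approach rather than a documented workflow: it assumes the metadata files live in the model-metadata directory of the hub (as the check messages above indicate) and uses base R's list.files() to find them. The names metadata_files and results are illustrative.

```r
library(hubValidations)

hub_path <- system.file("testhubs/simple", package = "hubValidations")

# Find the YAML files in the hub's model-metadata directory
metadata_files <- list.files(
  file.path(hub_path, "model-metadata"),
  pattern = "\\.ya?ml$"
)

# Validate each file and keep the hub_validations objects in a named list
results <- lapply(metadata_files, function(f) {
  validate_model_metadata(hub_path, file_path = f)
})
names(results) <- metadata_files
```

Each element of results can then be printed or passed to check_for_errors() individually.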

validate_model_metadata check details

Details of checks performed by validate_model_metadata()
| Name | Check | Early return | Fail output | Extra info |
|---|---|---|---|---|
| metadata_schema_exists | A model metadata schema file exists in hub-config directory. | TRUE | check_error | |
| metadata_file_exists | A file with name provided to argument file_path exists at the expected location (the model-metadata directory). | TRUE | check_error | |
| metadata_file_ext | The metadata file has correct extension (yaml or yml). | TRUE | check_error | |
| metadata_file_location | The metadata file has been saved to correct location. | TRUE | check_failure | |
| metadata_matches_schema | The contents of the metadata file match the hub’s model metadata schema | TRUE | check_error | |
| metadata_file_name | The metadata filename matches the model ID specified in the contents of the file. | TRUE | check_error | |