See the check details below.

ys_check(
  data,
  spec,
  verbose = FALSE,
  output = tempfile(),
  error_on_fail = TRUE
)

ys_check_file(data, file)

check_data(...)

check_data_file(...)

Arguments

data

a data frame

spec

a yspec object

verbose

logical; if TRUE, extra messages are printed during the check

output

the name of a file or a connection for writing check results

error_on_fail

if FALSE, return logical check status rather than generating an error

file

the full path to a yaml specification file

...

arguments passed from the alias function to the preferred function

Details

To pass the data check, all of the following must be true:

  1. The (column) names in the data set must be identical to the names in the spec object.

  2. For discrete data types (where values is set), the unique values in the data set column after removing missing values must be identical to or a subset of the values given in the spec object.

  3. For continuous data types where a range is given, all of the values in the data set column, after removing missing values, must be greater than or equal to the lower bound of the range and less than or equal to the upper bound of the range (that is, the bounds are inclusive).
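The three conditions above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: a plain named list stands in for the yspec object, and the column names (ID, SEX, WT) and the names_ok, values_ok, and range_ok variables are hypothetical.

```r
# Hypothetical stand-in for a spec object: one entry per column, with
# optional `values` (discrete) or `range` (continuous) fields.
spec <- list(
  ID  = list(),
  SEX = list(values = c(0, 1)),
  WT  = list(range = c(40, 150))
)

data <- data.frame(
  ID  = 1:4,
  SEX = c(0, 1, NA, 1),
  WT  = c(55, 150, 72.5, NA)
)

# Check 1: the column names must be identical to the names in the spec.
names_ok <- identical(names(data), names(spec))

# Check 2: for columns with `values`, the unique non-missing data values
# must be identical to or a subset of the allowed values.
values_ok <- all(vapply(names(spec), function(col) {
  allowed <- spec[[col]]$values
  if (is.null(allowed)) return(TRUE)
  observed <- unique(data[[col]][!is.na(data[[col]])])
  all(observed %in% allowed)
}, logical(1)))

# Check 3: for columns with `range`, every non-missing value must lie
# within the bounds, with both bounds included.
range_ok <- all(vapply(names(spec), function(col) {
  rng <- spec[[col]]$range
  if (is.null(rng)) return(TRUE)
  x <- data[[col]][!is.na(data[[col]])]
  all(x >= rng[1] & x <= rng[2])
}, logical(1)))

passed <- names_ok && values_ok && range_ok
```

Note that the WT value of 150 sits exactly on the upper bound and still satisfies the range condition, because the bounds are inclusive.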

Other checks are implicit in the data specification object and are checked on load:

  1. All column names must be 8 characters or fewer by default. This maximum can be overridden by setting the option ys.col.len to the desired maximum number of characters.
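As a minimal sketch of overriding that limit, the option can be set with base R's options() before the specification is loaded. The value 12, the col_names vector, and the length comparison below are hypothetical and only illustrate the idea; the package performs its own check when the spec is loaded.

```r
# Raise the maximum column-name length for this session and keep the
# previous setting so it can be restored afterwards.
old <- options(ys.col.len = 12)

# Read the limit back, falling back to the documented default of 8.
max_len <- getOption("ys.col.len", default = 8)

# Hypothetical column names, used only to illustrate the length rule.
col_names <- c("ID", "STUDYDAY", "CREATININE")
too_long <- col_names[nchar(col_names) > max_len]

# Restore the previous option value.
options(old)
```

With the limit at 12, the 10-character name CREATININE is accepted; under the default of 8 it would not be.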

Check results can be written to a file or connection (see the output argument). Set the verbose argument to TRUE to print extra messages as the check proceeds.

Examples


data <- ys_help$data()
spec <- ys_help$spec()

# Run this at the end of data assembly: it puts the data columns in the
# order given by the spec, which resolves an error stating that the data
# columns are not sorted according to the spec
data <- dplyr::select(data, names(spec))

ys_check(data, spec)
#> The data set passed all checks.
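The error_on_fail argument described above can be used when a script should branch on the check result instead of stopping. The following is a hedged sketch, not output from the package: it assumes yspec is installed (the guard skips the call otherwise), the variable name passed_check is hypothetical, and the result is wrapped in isTRUE() so that anything other than TRUE is treated as a failure.

```r
# Default to NA so the script still runs when yspec is unavailable.
passed_check <- NA

if (requireNamespace("yspec", quietly = TRUE)) {
  library(yspec)

  data <- ys_help$data()
  spec <- ys_help$spec()

  # Put the data columns in the order given by the spec.
  data <- data[, names(spec), drop = FALSE]

  # Ask for a logical status rather than an error on failure, and
  # print extra messages while the check runs.
  passed_check <- isTRUE(
    ys_check(data, spec, verbose = TRUE, error_on_fail = FALSE)
  )
}

if (isTRUE(passed_check)) {
  message("Data set is consistent with the spec.")
}
```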