Syntax

yspec uses standard yaml syntax to state the data set column definitions.

NOTES

  • Use double-quotes around any value you want to be a string
  • true and yes by themselves will be read as TRUE; use "yes" if you need that word by itself as a value for a field
  • false and no by themselves will be read as FALSE; use "no" if you need that word by itself as a value for a field
  • If a string starts with a piece of punctuation, make sure to put the entire string in double quotes
    • Use short: "> QL" not short: > QL
    • Use values: [".", C] not values: [.,C]
    • Use address: "123 Main St." not address: 123 Main St.
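
The quoting rules above can be sketched in a short fragment. The column names FLAG and RESULT are hypothetical and exist only to show the pattern:

```yaml
# Hypothetical columns, shown only to illustrate quoting
FLAG:
  values: ["yes", "no"]   # quoted, so they stay strings instead of TRUE / FALSE
RESULT:
  short: "> QL"           # starts with punctuation: quote the entire string
  values: [".", C]        # the lone period is quoted; C is fine unquoted
```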

Instructions for including TeX in the yaml specification code are provided in a section below.

Organization

Save your data specification code in a file, typically with a .yaml file extension.

At the top of the file, include a block called SETUP__:; this is where the data set metadata is stored. For example:
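
A minimal sketch of such a block, using field names from the SETUP__ listing further down. Every value is a placeholder, and lookup.yaml is a hypothetical file name:

```yaml
SETUP__:
  description: Example PK analysis data set   # placeholder text
  sponsor: Example Sponsor                    # placeholder text
  projectnumber: ABC101                       # placeholder text
  lookup_file: [lookup.yaml]                  # hypothetical lookup file
```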

See the details below for other files that can be included here.

Next, list each data set column in order, with the data column name starting in the first column and ending with a colon. For example:
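
As an illustration, a column entry might look like the following. The column name AGE and its values are assumptions chosen for the sketch:

```yaml
# Column name starts in the first column and ends with a colon
AGE:
  short: age
  unit: years
  range: [18, 80]
```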

This specifies a “short” name for this column as well as a unit and a range. A complete listing is provided below.

You can see a fully worked example by running

ys_help$yaml()

See the ?ys_help help topic for more information.

Or, you can export a collection of package assets with this command

ys_help$export(output="assets")

See the ?ys_help help topic for more information.

SETUP__ specification fields

  • description: <character>; a short description of the data set
  • projectnumber: <character>; the project reference number; may be incorporated into rendered define documents; when the project number is given in the first yspec object in a project object, that project number will be rendered in the project-wide define document
  • sponsor: <character>; the project sponsor; when the project sponsor is given in the first yspec object in a project object, that project sponsor name will be rendered in the project-wide define document
  • lookup_file: <character>; a yaml array of other yaml files where yspec will look for column lookup information
  • use_internal_db: <logical> (true/false); if true, then yspec will load the internal column lookup database
  • glue: <map>; specify name/value pairs; in the yaml data specification, use <<name>> in the text and the value will be glued into the text after it has been sanitized; the intended use is to allow LaTeX code to evade the sanitizer

Data column specification fields

  • short: short-name
    • a short name for the column; don’t include unit here
  • unit: unit-name
    • the unit of measure
  • range: [min-value, max-value]
    • indicates continuous data
  • values: [val1, val2, valn]
    • specify each valid value
    • indicates discrete data
  • values: {decode1: val1, decode2: val2}
    • put the decode into the specification of values using a default yaml map structure (the decode is to the left of the :)
    • Note the curly brackets, not square brackets
  • decode: [decode1, decode2, decode3]
    • Separate the decode from the values specification
    • See example below for clearer way to input very long decodes
  • longvalues: true
    • print the values in a (long) yaml-formatted list
  • comment: just whatever you want to say
  • comment: > say something on multiple lines of text
  • source: ADSL.xpt
    • Where the data came from
    • include both the sdtm domain and variable name
  • about: [short-name, unit]
    • This is a convenience structure.
  • long: a longer name to describe the column
  • dots:
    • A named list of whatever you want to carry along in the object; the dots list isn’t used by any rendering function in the yspec package, but might be used by a custom rendering function
  • axis:
    • A short-ish name that can be used for axis titles for plots
    • Generally, don’t include unit; yspec helpers will add that automatically by default
    • If no axis field is given, yspec will use short instead, so you can omit axis whenever short works as the axis title with no modification
  • type:
    • Can be numeric, character, or integer
    • This is optional; the default is numeric
  • lookup:
    • Logical; if true, then the definition for the column is looked up in the files listed under lookup_file (specified in SETUP__:)
    • Use the !look handler to indicate lookup
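
To show how several of these fields can sit together, here is a sketch of a single column entry. All values are placeholders, the source entry ADSL.WT is hypothetical, and the covariate entry under dots is an invented name that yspec itself does not interpret:

```yaml
# Hypothetical column; every value is a placeholder
WT:
  short: weight
  long: baseline body weight
  unit: kg
  range: [40, 150]
  type: numeric
  axis: Body weight
  source: ADSL.WT          # hypothetical domain and variable name
  comment: >
    Measured at the screening visit;
    recorded to one decimal place.
  dots:
    covariate: true        # carried along in the object for custom use
```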

Defaults

  • If type is not given, then it will default to numeric

Examples

Continuous values

  • The about array provides a short name and unit
  • Whenever range is given, the data are assumed to be continuous
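
A sketch of the about form, with an illustrative column name and limits:

```yaml
WT:
  about: [weight, kg]   # short name first, then unit
  range: [5, 300]
```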

This is equivalent to
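
Spelled out with separate fields, the same illustrative column would read:

```yaml
WT:
  short: weight
  unit: kg
  range: [5, 300]
```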

Character data

  • Using values indicates discrete data
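
For instance, a character column might be declared as below. The column name and its levels are assumptions made for this sketch:

```yaml
RACE:
  type: character
  values: [White, Black, Asian, Other]
```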

Any other array input structure can be used. For example
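
One such alternative is the yaml block sequence. This sketch uses the same hypothetical column:

```yaml
RACE:
  type: character
  values:
    - White
    - Black
    - Asian
    - Other
```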

By default, values are printed as a comma-separated list. To print them in long format, set longvalues to true:
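
A sketch of that setting on the same hypothetical column:

```yaml
RACE:
  type: character
  values: [White, Black, Asian, Other]
  longvalues: true   # print one value per line in the rendered document
```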

Discrete data with decode

Method 1

  • Notice that the yaml key can only be a simple character value
  • Also, we use curly braces to specify a list like this
  • Finally, a : separates the decode (on the left) from the value (on the right).
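
A sketch of Method 1, with an illustrative column and decodes:

```yaml
SEX:
  values: {male: 0, female: 1}   # decode on the left, value on the right
```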

Special handlers are available that add some flexibility to this value / decode specification.

The !value:decode handler allows you to put the value on the left and decode on the right
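
With that handler, the illustrative SEX column could be written as follows. The indented block-map layout is an assumption about how the handler is typically applied:

```yaml
SEX:
  values: !value:decode
    0: male
    1: female
```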

The default behavior can be achieved with
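
This sketch assumes the mirror-image handler is named !decode:value. That name does not appear in the text above, so check it against the package before relying on it:

```yaml
SEX:
  values: !decode:value   # assumed handler name
    male: 0
    female: 1
```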

The handlers also allow associating multiple values with a single decode:
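
A sketch of this, assuming an array on the value side of a !decode:value map is the accepted form. The handler name, study numbers, and phase labels are all illustrative:

```yaml
STUDY:
  values: !decode:value   # assumed handler name
    phase 1: [101, 102, 103]
    phase 2: [201, 202]
```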

Method 2

  • These are more complicated decodes
  • Put the values and decode in brackets (array)
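
A sketch of Method 2 with an illustrative column. The decodes are matched to the values by position:

```yaml
BQL:
  values: [0, 1]
  decode: [not below quantitation limit, below quantitation limit]
```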

Method 3

Really, it’s the same as method 2, but easier to type and read when the decode gets really long
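
The same illustrative column, sketched with the decode written as a block sequence:

```yaml
BQL:
  values: [0, 1]
  decode:
    - not below quantitation limit
    - below quantitation limit
```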

Look up column definition

Either fill in the lookup field or use the !look handler
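
Both options are sketched here on two hypothetical columns. Each column is assumed to have a matching entry in a lookup file:

```yaml
# Option 1: set the lookup field
CLCR:
  lookup: true

# Option 2: tag the column with the !look handler
WT: !look
```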

You can also give the column name to import
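
A sketch of that form, assuming the name to import follows the handler on the same line:

```yaml
HT: !look HT_INCHES
```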

In this example, there would be a column called HT_INCHES in the lookup file that would be imported under the name HT.

Include TeX in data specification document

Most define documents get rendered via xtable and the text gets processed by a sanitize function. yspec implements a custom sanitize function called ys_sanitize, which is similar to xtable::sanitize, but whitelists some symbols so they do not get sanitized.

To protect TeX code from the sanitizer, first create a field in SETUP__ called glue with a map between a name and some corresponding TeX code. In the following example, we wish to write μg/L, so we create a name called mugL and map it to $\\mu$g/L:
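
A sketch of such a glue map. Only the glue field is shown, and the other SETUP__ fields are left out for brevity:

```yaml
SETUP__:
  glue:
    mugL: "$\\mu$g/L"   # doubled backslash inside a double-quoted yaml string
```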

Once the map is in place, we can write the data set column definition like this:
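
A sketch of the column entry. The short value is a placeholder; the unit refers to the mugL name from the glue map:

```yaml
DV:
  short: "concentration"
  unit: "<<mugL>>"   # replaced with the glue value after sanitizing
```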

When the table for the define document is rendered, first the sanitizer will run, but it won’t find anything in the unit field for the DV column. Then yspec will call glue() and replace <<mugL>> with $\\mu$g/L.

Notice that we put all of the values in quotes; this is good practice to ensure that yaml will parse the value as a character data item when reading in the spec.