Extract data from a yspec object

Introduction

This vignette shows you how to extract information from a yspec object.

Set up

knitr::opts_chunk$set(comment = '.')

library(purrr)
library(dplyr)
library(yspec)

Basic

First, recall that a yspec object is just a list, so we can use map to get information out of it. For example, to find the types of all the columns in the data set, use purrr::map:

spec <- ys_help$spec()

map(spec, "type")[1:5]

. $C
. [1] "character"
. 
. $NUM
. [1] "numeric"
. 
. $ID
. [1] "numeric"
. 
. $SUBJ
. [1] "character"
. 
. $TIME
. [1] "numeric"
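To see why passing a string to map works, here is a minimal sketch on a toy list. The object `toy` and its contents are hypothetical stand-ins for a spec, not something created by yspec; the point is only that a string is treated as the name of the element to extract, and that map_chr returns a named character vector instead of a list.

```r
library(purrr)

# Hypothetical stand-in for a spec: a named list of per-column lists
toy <- list(
  C    = list(col = "C",    short = "comment character", type = "character"),
  TIME = list(col = "TIME", short = "time", type = "numeric", unit = "hour")
)

# A string passed to map() extracts the element with that name
types <- map(toy, "type")

# map_chr() does the same extraction but returns a named character vector
types_chr <- map_chr(toy, "type")
types_chr
```

The vector form is often more convenient when every column is known to carry the field being extracted.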

If we wanted some column information in CSV format, we could also do

map(spec, ~ paste(.x[["col"]], .x[["short"]], .x[["type"]], sep = ","))[1:3]

. $C
. [1] "C,comment character,character"
. 
. $NUM
. [1] "NUM,record number,numeric"
. 
. $ID
. [1] "ID,subject identifier,numeric"

Other helper functions do more than pull a single field; for example, they can extract a label or format a unit.

Get units

Call ys_get_unit to get a named list of units. When the parens argument is TRUE, each unit is wrapped in parentheses. Notice that the output here differs from what we would get by mapping across the list and asking for the unit field: many columns don’t have units, so a plain map would return NULL for them. The ys_get_unit function substitutes an empty string when unit is NULL.

ys_get_unit(spec, parens = TRUE) %>% unlist()

.                C              NUM               ID             SUBJ 
.               ""               ""               ""               "" 
.             TIME              SEQ              CMT             EVID 
.         "(hour)"               ""               ""               "" 
.              AMT               DV              AGE               WT 
.           "(mg)" "(micrograms/L)"        "(years)"           "(kg)" 
.             CRCL              ALB              BMI              AAG 
.       "(ml/min)"         "(g/dL)"        "(m2/kg)"        "(mg/dL)" 
.              SCR              AST              ALT               HT 
.        "(mg/dL)"               ""               ""           "(cm)" 
.               CP             TAFD              TAD             LDOS 
.               ""        "(hours)"        "(hours)"           "(mg)" 
.              MDV              BLQ            PHASE            STUDY 
.               ""               ""               ""               "" 
.               RF 
.               ""
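The difference between a plain map and a NULL-safe extraction can be sketched on a toy list. This is an illustration of the behavior described above, not the implementation of ys_get_unit; the object `toy` is hypothetical, and the `.default` argument is purrr's way of supplying a value when the requested element is missing.

```r
library(purrr)

# Hypothetical stand-in for a spec; ID has no unit field
toy <- list(
  ID = list(col = "ID", short = "subject identifier"),
  WT = list(col = "WT", short = "weight", unit = "kg")
)

# Plain extraction: columns without a unit come back as NULL
raw <- map(toy, "unit")

# NULL-safe extraction: missing units become empty strings
filled <- map_chr(toy, "unit", .default = "")

# Wrap only the non-empty units in parentheses
has_unit <- filled != ""
filled[has_unit] <- paste0("(", filled[has_unit], ")")
filled
```

Because every element is now a length-one string, the result can be used directly for labeling without first checking for NULL.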

Get labels

ys_get_label returns the label for each column.

ys_get_label(spec)[1:3]

. $C
. [1] "comment character"
. 
. $NUM
. [1] "record number"
. 
. $ID
. [1] "subject identifier"

Get short

ys_get_short returns the short name for each column.

ys_get_short(spec)[1:3]

. $C
. [1] "comment character"
. 
. $NUM
. [1] "record number"
. 
. $ID
. [1] "subject identifier"

Get short with unit

ys_get_short_unit returns, for each column, a string that combines the short name and the unit.

ys_get_short_unit(spec, parens = TRUE)[1:5] %>% as.list

. $C
. [1] "comment character"
. 
. $NUM
. [1] "record number"
. 
. $ID
. [1] "subject identifier"
. 
. $SUBJ
. [1] "subject identifier"
. 
. $TIME
. [1] "TIME (hour)"
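The idea of combining a short name with an optional unit can be sketched in base R. The vectors below are hypothetical values chosen for illustration, and the logic is an assumption about the general approach rather than the code inside ys_get_short_unit: the unit is appended in parentheses only when one is present.

```r
# Hypothetical short names and units, keyed by column name
short <- c(TIME = "time", WT = "weight", ID = "subject identifier")
unit  <- c(TIME = "hour", WT = "kg",     ID = "")

# Append the unit in parentheses only where a unit exists
short_unit <- ifelse(unit == "", short, paste0(short, " (", unit, ")"))
short_unit
```

Columns without a unit keep the bare short name, so no trailing empty parentheses appear.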

Meta information

The information you entered in SETUP__, as well as other meta information, is stored as a list in the meta attribute of the yspec list. It can be accessed with get_meta, which returns a list with the various pieces of meta data.

get_meta(spec)[1:3]

. $description
. [1] "Example PopPK analysis data set"
. 
. $sponsor
. [1] "example-project"
. 
. $projectnumber
. [1] "EXAMPK1011F"

You can pull a single piece of meta data with pull_meta:

pull_meta(spec, "projectnumber")

. [1] "EXAMPK1011F"
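Storing meta data in an attribute keeps it separate from the column entries. The sketch below shows the mechanism on a toy object built with base R; `toy` is hypothetical, and get_meta and pull_meta may do more (such as validation) than this direct attribute access.

```r
# Hypothetical stand-in: a list of columns with a "meta" attribute attached
toy <- structure(
  list(ID = list(col = "ID", type = "numeric")),
  meta = list(sponsor = "example-project", projectnumber = "EXAMPK1011F")
)

# The attribute holds the meta data as an ordinary list
meta <- attr(toy, "meta")
meta[["projectnumber"]]

# The column entries are unaffected by the attribute
names(toy)
```

With a real spec, prefer get_meta and pull_meta over reaching into the attribute directly.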

What is in the meta data?

m <- get_meta(spec)

names(m)

.  [1] "description"     "sponsor"         "projectnumber"   "use_internal_db"
.  [5] "glue"            "flags"           "lookup_file"     "extend_file"    
.  [9] "spec_file"       "spec_path"       "name"            "data_stem"      
. [13] "data_path"       "primary_key"     "control"         "namespace"

Some descriptions:

spec_file: the yaml source file for the specification information
spec_path: the directory where the yaml source file is located
name: the stem of the yaml source file
data_stem: the stem assumed for data set file names; data sets are expected to be written using this stem (for example, <data_stem>.csv).
data_path: the assumed derived data directory
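As a small illustration of how data_path and data_stem fit together, the sketch below builds an output file name from the two fields. The values in `m_toy` are hypothetical; with a real spec they would come from get_meta(spec).

```r
# Hypothetical meta values standing in for get_meta(spec)
m_toy <- list(data_path = "data/derived", data_stem = "analysis1")

# Combine the derived data directory and the stem into a csv file name
csv_file <- file.path(m_toy$data_path, paste0(m_toy$data_stem, ".csv"))
csv_file
```

The same pattern applies to other extensions that share the stem.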

See also reference.html for descriptions of meta fields.

2022-05-19