Introduction

This document shows how to enter data set column definitions into a lookup file that multiple data specification files within a project can share. It also discusses an internal lookup data base that is always available, from which individual data sets can look up standardized column information for data items commonly used in our workflow.

Create two files

The lookup file

The lookup file that is to be accessed by other data specification files is just another data specification file. For example, create a file called lookup.yml and enter this information.

# in file lookup.yml
AMT: 
  short: dose amount
  unit: nmol
  type: numeric
AMTMG: 
  short: dose amount
  unit: mg
  type: numeric
WT: 
  short: patient weight
  unit: lbs

The contents of this file must be in valid yspec data specification format and (more generally) valid YAML.
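As a quick sanity check on the YAML side of that requirement, the lookup entries can be parsed with the yaml package. This is only a sketch: it assumes the yaml package is installed, it keeps the file contents in a character vector rather than on disk, and it checks YAML syntax only, not the yspec format.

```r
# Lookup entries from lookup.yml, held as text for illustration
lookup_txt <- c(
  "AMT:",
  "  short: dose amount",
  "  unit: nmol",
  "  type: numeric",
  "AMTMG:",
  "  short: dose amount",
  "  unit: mg",
  "  type: numeric",
  "WT:",
  "  short: patient weight",
  "  unit: lbs"
)

if (requireNamespace("yaml", quietly = TRUE)) {
  # yaml.load() errors if the text is not valid YAML
  lookup <- yaml::yaml.load(paste(lookup_txt, collapse = "\n"))
  print(names(lookup))  # column names defined in the lookup file
  str(lookup$WT)        # fields entered for the WT column
}
```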

The specification file

This is just a standard data specification file:

SETUP__:
  description: PKPD analysis data set
  lookup_file: lookup.yml
C: 
  short: comment character
AMT: !look

Notice two things about this file. First, we included a lookup_file field in the SETUP__ section that references our lookup.yml file; by default, yspec expects the lookup file to be in the same directory as the spec file. Second, in the AMT column, we used the !look handler to indicate that we want the data for that column to be looked up.
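To try this setup end to end, the two files can be written side by side and the spec loaded with ys_load(), the loader used later in this document. The temporary directory and the writeLines() scaffolding are illustrative assumptions, not part of yspec, and the requireNamespace() guard only keeps the sketch from failing where yspec is not installed.

```r
# Illustrative scaffolding: a scratch directory holding both files
dir <- tempfile("yspec-lookup-")
dir.create(dir)

# The lookup file, as shown above
writeLines(c(
  "AMT:",
  "  short: dose amount",
  "  unit: nmol",
  "  type: numeric",
  "AMTMG:",
  "  short: dose amount",
  "  unit: mg",
  "  type: numeric",
  "WT:",
  "  short: patient weight",
  "  unit: lbs"
), file.path(dir, "lookup.yml"))

# The specification file; lookup_file is given relative to the spec file
writeLines(c(
  "SETUP__:",
  "  description: PKPD analysis data set",
  "  lookup_file: lookup.yml",
  "C:",
  "  short: comment character",
  "AMT: !look"
), file.path(dir, "spec.yml"))

if (requireNamespace("yspec", quietly = TRUE)) {
  spec <- yspec::ys_load(file.path(dir, "spec.yml"))
  print(spec)  # AMT should carry the definition entered in lookup.yml
}
```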

Alternatively, we can leave the column empty, and yspec will assume that we want to look up that column:

SETUP__:
  description: PKPD analysis data set
  lookup_file: lookup.yml
C: 
  short: comment character
AMT:

Finally, we can import a column from the lookup file under a new name in the working spec:

SETUP__:
  description: PKPD analysis data set
  lookup_file: lookup.yml
C: 
  short: comment character
AMT:
  lookup: AMTMG

In this snippet, we are asking for the AMTMG column from the lookup and bringing it in as AMT in the working spec.
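The renaming case can be checked the same way. The sketch below uses a trimmed lookup file that holds only AMTMG. The temporary directory is again an illustrative assumption, and the guard skips the yspec call where the package is not installed.

```r
# Illustrative scaffolding: a scratch directory holding both files
dir <- tempfile("yspec-rename-")
dir.create(dir)

# Trimmed lookup file: only the column we want to import
writeLines(c(
  "AMTMG:",
  "  short: dose amount",
  "  unit: mg",
  "  type: numeric"
), file.path(dir, "lookup.yml"))

# Working spec: AMT is filled from the AMTMG entry in the lookup file
writeLines(c(
  "SETUP__:",
  "  description: PKPD analysis data set",
  "  lookup_file: lookup.yml",
  "AMT:",
  "  lookup: AMTMG"
), file.path(dir, "spec.yml"))

if (requireNamespace("yspec", quietly = TRUE)) {
  spec <- yspec::ys_load(file.path(dir, "spec.yml"))
  print(names(spec))  # the column keeps the working name AMT
  print(spec)         # its unit is expected to be the one entered for AMTMG
}
```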

Internal lookup data base

There is an internal data base of common data set columns that yspec can attach; the example below turns it on with use_internal_db: true in the SETUP__ section. So, with no lookup file defined, we could write the following in our specification file:

SETUP__:
  description: PKPD analysis data set
  use_internal_db: true
C: 
AMT:
MDV:
EVID: 
WT:
EGFR:
ALB: !look
ZIP_CODE: 
  values: 55378

We can read this spec in and have the columns defined:

library(yspec)
library(dplyr)

# path to the specification file shown above
file <- "spec.yml"

spec <- ys_load(file)

spec
##  name     info unit          short             source       
##  C        cd-  .             comment character ysdb_internal
##  AMT      ---  .             dose amount       ysdb_internal
##  MDV      -d-  .             MDV               ysdb_internal
##  EVID     -d-  .             event ID          ysdb_internal
##  WT       ---  kg            weight            ysdb_internal
##  EGFR     ---  ml/min/1.73m2 eGFR              ysdb_internal
##  ALB      ---  g/dL          albumin           ysdb_internal
##  ZIP_CODE ---  .             ZIP_CODE          .

Tracking the lookup status of each column

It can get confusing to keep track of where each column is coming from. You can audit the spec object and find out where a lookup event happened.

## # A tibble: 8 × 2
##   col      lookup_source    
##   <chr>    <chr>            
## 1 C        ysdb_internal.yml
## 2 AMT      ysdb_internal.yml
## 3 MDV      ysdb_internal.yml
## 4 EVID     ysdb_internal.yml
## 5 WT       ysdb_internal.yml
## 6 EGFR     ysdb_internal.yml
## 7 ALB      ysdb_internal.yml
## 8 ZIP_CODE spec.yml

Here, we can see that most of the columns came from the internal data base and that one column (ZIP_CODE) came from our own specification file.

You can also re-create the lookup object (just a named list) for a specification object.
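The calls that produced the audit table above and the list below are not shown in this document. A plausible sketch follows, assuming the yspec helpers are named ys_lookup_source() and ys_get_lookup(); both names are assumptions here and should be checked against the installed version of yspec. The small spec and the temporary directory are illustrative only.

```r
# Illustrative scaffolding: a scratch directory for a small spec
dir <- tempfile("yspec-audit-")
dir.create(dir)

# WT is left empty (to be looked up); ZIP_CODE is defined locally
writeLines(c(
  "SETUP__:",
  "  description: PKPD analysis data set",
  "  use_internal_db: true",
  "WT:",
  "ZIP_CODE:",
  "  values: 55378"
), file.path(dir, "spec.yml"))

if (requireNamespace("yspec", quietly = TRUE)) {
  spec <- yspec::ys_load(file.path(dir, "spec.yml"))

  # Assumed helper: one row per column with the file that supplied it
  print(yspec::ys_lookup_source(spec))

  # Assumed helper: the lookup object as a named list; max.level keeps str() short
  str(yspec::ys_get_lookup(spec), max.level = 1)
}
```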

## List of 23
##  $ C      :List of 6
##   ..$ short        : chr "comment character"
##   ..$ values       : chr [1:2] "." "C"
##   ..$ decode       : chr [1:2] "analysis row" "commented row"
##   ..$ type         : chr "character"
##   ..$ col          : chr "C"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ ID     :List of 4
##   ..$ short        : chr "subject identifier"
##   ..$ type         : chr "numeric"
##   ..$ col          : chr "ID"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ USUBJID:List of 4
##   ..$ short        : chr "unique subject identifier"
##   ..$ type         : chr "character"
##   ..$ col          : chr "USUBJID"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ SUBJ   :List of 4
##   ..$ short        : chr "subject identifier"
##   ..$ type         : chr "character"
##   ..$ col          : chr "SUBJ"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ STUDYID:List of 4
##   ..$ short        : chr "study identifier"
##   ..$ type         : chr "character"
##   ..$ col          : chr "STUDYID"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ CMT    :List of 4
##   ..$ short        : chr "compartment number"
##   ..$ type         : chr "numeric"
##   ..$ col          : chr "CMT"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ EVID   :List of 4
##   ..$ short        : chr "event ID"
##   ..$ values       :List of 2
##   .. ..$ observation: int 0
##   .. ..$ dose       : int 1
##   ..$ col          : chr "EVID"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ AMT    :List of 4
##   ..$ short        : chr "dose amount"
##   ..$ type         : chr "numeric"
##   ..$ col          : chr "AMT"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ RATE   :List of 4
##   ..$ short        : chr "infusion rate"
##   ..$ type         : chr "numeric"
##   ..$ col          : chr "RATE"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ II     :List of 4
##   ..$ short        : chr "inter-dose interval"
##   ..$ type         : chr "numeric"
##   ..$ col          : chr "II"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ SS     :List of 5
##   ..$ short        : chr "steady state indicator"
##   ..$ values       : int [1:2] 0 1
##   ..$ decode       : chr [1:2] "non-steady state indicator" "steady state indicator"
##   ..$ col          : chr "SS"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ MDV    :List of 6
##   ..$ values       :List of 2
##   .. ..$ non-missing: int 0
##   .. ..$ missing    : int 1
##   ..$ type         : chr "numeric"
##   ..$ long         : chr "missing DV indicator"
##   ..$ comment      : chr "per NONMEM specifications"
##   ..$ col          : chr "MDV"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ DV     :List of 4
##   ..$ short        : chr "dependent variable"
##   ..$ type         : chr "numeric"
##   ..$ col          : chr "DV"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ WT     :List of 5
##   ..$ short        : chr "weight"
##   ..$ unit         : chr "kg"
##   ..$ type         : chr "numeric"
##   ..$ col          : chr "WT"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ EGFR   :List of 5
##   ..$ short        : chr "eGFR"
##   ..$ long         : chr "estimated glomerular filtration rate"
##   ..$ unit         : chr "ml/min/1.73m2"
##   ..$ col          : chr "EGFR"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ BMI    :List of 5
##   ..$ long         : chr "body mass index"
##   ..$ unit         : chr "m2/kg"
##   ..$ type         : chr "numeric"
##   ..$ col          : chr "BMI"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ HT     :List of 5
##   ..$ about        : chr [1:2] "height" "cm"
##   ..$ long         : chr "Height"
##   ..$ type         : chr "numeric"
##   ..$ col          : chr "HT"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ ALB    :List of 6
##   ..$ long         : chr "serum albumin"
##   ..$ unit         : chr "g/dL"
##   ..$ short        : chr "albumin"
##   ..$ type         : chr "numeric"
##   ..$ col          : chr "ALB"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ AGE    :List of 4
##   ..$ about        : chr [1:2] "age" "years"
##   ..$ type         : chr "numeric"
##   ..$ col          : chr "AGE"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ SEX    :List of 3
##   ..$ values       :List of 2
##   .. ..$ male  : int 0
##   .. ..$ female: int 1
##   ..$ col          : chr "SEX"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ NUM    :List of 4
##   ..$ short        : chr "record number"
##   ..$ type         : chr "numeric"
##   ..$ col          : chr "NUM"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ BQL    :List of 5
##   ..$ short        : chr "data point below the LOQ"
##   ..$ type         : chr "numeric"
##   ..$ values       :List of 2
##   .. ..$ 0: chr "not below quantitation limit"
##   .. ..$ 1: chr "below quantitation limit"
##   ..$ col          : chr "BQL"
##   ..$ lookup_source: chr "ysdb_internal.yml"
##  $ LOQ    :List of 4
##   ..$ short        : chr "assay limit of quantification"
##   ..$ type         : chr "numeric"
##   ..$ col          : chr "LOQ"
##   ..$ lookup_source: chr "ysdb_internal.yml"