Introduction

This vignette shows you how to add labels to the columns of a data set.

Set up

knitr::opts_chunk$set(comment = '.')
library(yspec)
library(tibble) # as_tibble()
library(purrr)  # map()

Load specification object and data set

We’ll use the example data set and spec that ship with the package:

data <- ys_help$data()
spec <- ys_help$spec()

The data

as_tibble(data)
. # A tibble: 4,360 × 29
.    C       NUM    ID  SUBJ  TIME   SEQ   CMT  EVID   AMT    DV   AGE    WT  CRCL
.    <lgl> <int> <int> <int> <dbl> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
.  1 NA        1     1     1  0        0     1     1     5   0    28.0  55.2  114.
.  2 NA        2     1     1  0.61     1     2     0    NA  61.0  28.0  55.2  114.
.  3 NA        3     1     1  1.15     1     2     0    NA  91.0  28.0  55.2  114.
.  4 NA        4     1     1  1.73     1     2     0    NA 122.   28.0  55.2  114.
.  5 NA        5     1     1  2.15     1     2     0    NA 126.   28.0  55.2  114.
.  6 NA        6     1     1  3.19     1     2     0    NA  84.7  28.0  55.2  114.
.  7 NA        7     1     1  4.21     1     2     0    NA  62.1  28.0  55.2  114.
.  8 NA        8     1     1  5.09     1     2     0    NA  49.1  28.0  55.2  114.
.  9 NA        9     1     1  6.22     1     2     0    NA  64.2  28.0  55.2  114.
. 10 NA       10     1     1  8.09     1     2     0    NA  59.6  28.0  55.2  114.
. # … with 4,350 more rows, and 16 more variables: ALB <dbl>, BMI <dbl>,
. #   AAG <dbl>, SCR <dbl>, AST <dbl>, ALT <dbl>, HT <dbl>, CP <int>, TAFD <dbl>,
. #   TAD <dbl>, LDOS <int>, MDV <int>, BLQ <int>, PHASE <int>, STUDY <int>,
. #   RF <chr>

The spec

spec
.  name  info unit         short                         source       
.  C     cd-  .            comment character             ysdb_internal
.  NUM   ---  .            record number                 ysdb_internal
.  ID    ---  .            subject identifier            ysdb_internal
.  SUBJ  c--  .            subject identifier            ysdb_internal
.  TIME  ---  hour         TIME                          look         
.  SEQ   -d-  .            SEQ                           .            
.  CMT   ---  .            compartment number            ysdb_internal
.  EVID  -d-  .            event ID                      ysdb_internal
.  AMT   ---  mg           dose amount                   ysdb_internal
.  DV    ---  micrograms/L dependent variable            ysdb_internal
.  AGE   ---  years        age                           ysdb_internal
.  WT    ---  kg           weight                        ysdb_internal
.  CRCL  ---  ml/min       CRCL                          .            
.  ALB   ---  g/dL         albumin                       ysdb_internal
.  BMI   ---  m2/kg        BMI                           ysdb_internal
.  AAG   ---  mg/dL        alpha-1-acid glycoprotein     .            
.  SCR   ---  mg/dL        serum creatinine              .            
.  AST   ---  .            aspartate aminotransferase    .            
.  ALT   ---  .            alanine aminotransferase      .            
.  HT    ---  cm           height                        ysdb_internal
.  CP    -d-  .            Child-Pugh score              look         
.  TAFD  ---  hours        time after first dose         .            
.  TAD   ---  hours        time after dose               .            
.  LDOS  ---  mg           last dose amount              .            
.  MDV   -d-  .            MDV                           ysdb_internal
.  BLQ   -d-  .            below limit of quantification .            
.  PHASE ---  .            study phase indicator         .            
.  STUDY -d-  .            study number                  .            
.  RF    cd-  .            renal function stage          .

Use ys_add_labels

data <- ys_add_labels(data, spec)
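The check further down with map(data, attr, "label") suggests that ys_add_labels() stores each label as a "label" attribute on the matching column. The base-R sketch below illustrates that idea under the assumption that labels arrive as a named character vector. add_labels_sketch() is a hypothetical helper, not part of yspec and not its implementation.

```r
# Hypothetical helper (not yspec's code): attach a "label" attribute to
# every column of `df` that has a matching name in `labels`.
add_labels_sketch <- function(df, labels) {
  for (nm in intersect(names(df), names(labels))) {
    attr(df[[nm]], "label") <- labels[[nm]]
  }
  df
}

df <- data.frame(ID = c(1L, 1L), TIME = c(0, 0.61))
df <- add_labels_sketch(
  df,
  c(ID = "subject identifier", TIME = "time after first dose")
)

attr(df$ID, "label")  # "subject identifier"
df                    # prints like any data frame; the labels are not shown
```

Because the label is an attribute rather than part of the values, printing the data looks exactly as it did before.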

It isn’t obvious that anything was done here; the printed data look exactly the same:

as_tibble(data)
. # A tibble: 4,360 × 29
.    C       NUM    ID  SUBJ  TIME   SEQ   CMT  EVID   AMT    DV   AGE    WT  CRCL
.    <lgl> <int> <int> <int> <dbl> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
.  1 NA        1     1     1  0        0     1     1     5   0    28.0  55.2  114.
.  2 NA        2     1     1  0.61     1     2     0    NA  61.0  28.0  55.2  114.
.  3 NA        3     1     1  1.15     1     2     0    NA  91.0  28.0  55.2  114.
.  4 NA        4     1     1  1.73     1     2     0    NA 122.   28.0  55.2  114.
.  5 NA        5     1     1  2.15     1     2     0    NA 126.   28.0  55.2  114.
.  6 NA        6     1     1  3.19     1     2     0    NA  84.7  28.0  55.2  114.
.  7 NA        7     1     1  4.21     1     2     0    NA  62.1  28.0  55.2  114.
.  8 NA        8     1     1  5.09     1     2     0    NA  49.1  28.0  55.2  114.
.  9 NA        9     1     1  6.22     1     2     0    NA  64.2  28.0  55.2  114.
. 10 NA       10     1     1  8.09     1     2     0    NA  59.6  28.0  55.2  114.
. # … with 4,350 more rows, and 16 more variables: ALB <dbl>, BMI <dbl>,
. #   AAG <dbl>, SCR <dbl>, AST <dbl>, ALT <dbl>, HT <dbl>, CP <int>, TAFD <dbl>,
. #   TAD <dbl>, LDOS <int>, MDV <int>, BLQ <int>, PHASE <int>, STUDY <int>,
. #   RF <chr>

How can you tell that the labels were added?

labs <- map(data, attr, "label")

labs[1:5]
. $C
. [1] "comment character"
. 
. $NUM
. [1] "record number"
. 
. $ID
. [1] "subject identifier"
. 
. $SUBJ
. [1] "subject identifier"
. 
. $TIME
. [1] "time after first dose"
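map() comes from purrr. If you prefer base R, vapply() collects the same information into a named character vector. This is a minimal sketch; the helper name get_labels() is hypothetical, and it returns NA for columns that carry no label.

```r
# Hypothetical helper: the "label" attribute of each column, NA if absent.
get_labels <- function(df) {
  vapply(df, function(col) {
    lab <- attr(col, "label")
    if (is.null(lab)) NA_character_ else as.character(lab)
  }, character(1))
}

df <- data.frame(C = NA, NUM = 1:2)
attr(df$NUM, "label") <- "record number"
get_labels(df)  # C is NA, NUM is "record number"
```

A character vector is handy when you want to spot unlabeled columns quickly, for example with is.na().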

Or look at the structure of the data with str():

str(data)
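str() prints each column's attributes underneath it, so a label shows up on a line like `..- attr(*, "label")= chr ...`. A small base-R illustration on a one-column data frame (not the vignette data):

```r
# A one-column data frame whose column carries a "label" attribute.
df <- data.frame(ID = 1:2)
attr(df$ID, "label") <- "subject identifier"

# str() lists the label attribute beneath the ID column.
str(df)
```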

Where does the label come from?

Ideally, every column in the spec would have its own label entry. Setting the ys.require.label option to TRUE enforces this when the spec is loaded: a column without a label generates an error.
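Setting the option is an ordinary options() call. A sketch that turns it on only while loading a spec and then restores the previous value; where exactly the spec is loaded is left as a placeholder comment.

```r
# Require an explicit label for every column while the spec is loaded.
old <- options(ys.require.label = TRUE)
getOption("ys.require.label")  # TRUE

# ... load the spec here ...

options(old)  # restore the previous setting
```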

When a column has no label entry, yspec's ys_get_label() forms one for you according to these rules:

  1. If a label exists for the column, it is used.
  2. Otherwise, if long is present and is at most 40 characters, it is used.
  3. Otherwise, short is used; recall that short itself defaults to the column name (col).
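To make the precedence concrete, here is a base-R sketch of the three rules. It assumes each column's spec entry behaves like a list with optional label, long, and short fields plus a col field. label_from_rules() is hypothetical; it is not how ys_get_label() is implemented.

```r
# Hypothetical re-implementation of the three labeling rules above.
# `x` is one column's spec entry, modeled here as a plain list.
label_from_rules <- function(x, max_long = 40) {
  if (!is.null(x[["label"]])) return(x[["label"]])             # rule 1
  long <- x[["long"]]
  if (!is.null(long) && nchar(long) <= max_long) return(long)   # rule 2
  if (!is.null(x[["short"]])) return(x[["short"]])             # rule 3
  x[["col"]]  # short defaults to the column name
}

# An explicit label wins over short.
label_from_rules(list(col = "TIME", short = "TIME", label = "time after first dose"))
# No label and no long: fall back to short.
label_from_rules(list(col = "NUM", short = "record number"))
# long is longer than 40 characters, so short is used instead.
label_from_rules(list(col = "X", long = strrep("a", 41), short = "X short"))
```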

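Let’s look at some examples. Note that NUM and C have no label field of their own, so ys_get_label() builds their labels from the fallback rules: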

ys_get_label(spec)[1:3]
. $C
. [1] "comment character"
. 
. $NUM
. [1] "record number"
. 
. $ID
. [1] "subject identifier"
ys_get_label(spec$NUM)
. [1] "record number"
spec$NUM$label
. NULL
spec$C$label
. NULL

Custom label formation

You can also supply your own labeling function. As a simple example, suppose we want each label to be the column name.

Write a function that takes a column's spec entry as its first argument:

label_fun <- function(x, ...) x[["col"]] # use the column name as the label

Now pass that function to ys_add_labels() through the fun argument:

data <- ys_add_labels(data, spec, fun = label_fun)
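Any function with the same shape can be used. As a slightly richer sketch, the hypothetical label_with_unit() below appends the unit when one is set. It assumes each spec entry carries a col field and, optionally, a unit field (as the unit column in the spec printout suggests); it is tested here on plain lists rather than a real yspec object.

```r
# Hypothetical labeling function: column name plus unit, when a unit is set.
label_with_unit <- function(x, ...) {
  unit <- x[["unit"]]
  if (is.null(unit) || is.na(unit)) return(x[["col"]])
  paste0(x[["col"]], " (", unit, ")")
}

label_with_unit(list(col = "WT", unit = "kg"))  # "WT (kg)"
label_with_unit(list(col = "NUM"))              # "NUM"
```

It would be passed the same way as label_fun, via the fun argument of ys_add_labels().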

And check the output

map(data, attr, "label")[1:5]
. $C
. [1] "C"
. 
. $NUM
. [1] "NUM"
. 
. $ID
. [1] "ID"
. 
. $SUBJ
. [1] "SUBJ"
. 
. $TIME
. [1] "TIME"

Extract the label field

Recall that the yspec object is just a list, so we can always map across it and grab the label field. Unlike ys_get_label(), this returns only explicit label entries; columns without one give NULL:

map(spec, "label")[1:5]
. $C
. NULL
. 
. $NUM
. NULL
. 
. $ID
. NULL
. 
. $SUBJ
. NULL
. 
. $TIME
. [1] "time after first dose"
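Since extracting the label field gives NULL wherever a column has no explicit label, the same idea can list the columns that rely on the fallback rules, which is useful before turning on ys.require.label. A base-R sketch over spec entries modeled as plain lists (an assumed structure, not a real yspec object):

```r
# Spec entries modeled as plain lists; only TIME has an explicit label.
spec_list <- list(
  C    = list(col = "C", short = "comment character"),
  NUM  = list(col = "NUM", short = "record number"),
  TIME = list(col = "TIME", short = "TIME", label = "time after first dose")
)

# Pull the label field from each entry (NULL where it is absent) ...
labs <- lapply(spec_list, `[[`, "label")
# ... and keep the names of the entries without one.
missing_label <- names(Filter(is.null, labs))
missing_label  # "C" "NUM"
```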