Use yspec to document analysis data sets and utilize data attributes in a modeling and simulation workflow.
You can install the development version of yspec from GitHub with:
# install.packages("devtools")
devtools::install_github("metrumresearchgroup/yspec")
Before or while programming an analysis data set, write out data definitions in yaml format:
library(yspec)
library(dplyr)
library(tidyr)
readLines("inst/internal/analysis1.yml")[1:20] %>% writeLines()
. SETUP__:
.   description: Example PopPK analysis data set
.   sponsor: example-project
.   projectnumber: EXAMPK1011F
.   use_internal_db: true
.   glue:
.     bmiunits: "kg/m$^2$"
.   flags:
.     covariate: [AGE:SCR, HT, AST:ALT]
.   lookup_file: "look.yml"
.   extend_file: "analysis1-ext.yml"
. C:
. NUM:
. ID: !look
. SUBJ: !look
. TIME:
.   label: time after first dose
.   unit: hour
. SEQ:
.   label: data type
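The covariate flag above lists columns with range notation such as AGE:SCR. Assuming this means every column from AGE through SCR in the order the spec lists them (our reading of the notation, not a description of yspec's internals), the expansion can be sketched in base R. The column order in `cols` and the helper `expand_flags()` are hypothetical.

```r
# Hypothetical column order for a spec (illustration only).
cols <- c("ID", "TIME", "AGE", "WT", "CRCL", "ALB", "BMI", "AAG", "SCR",
          "AST", "ALT", "HT", "CP")

# Expand flag entries: "A:B" becomes every column from A through B
# (in spec order); a plain name is kept after checking that it exists.
expand_flags <- function(entries, cols) {
  out <- lapply(entries, function(e) {
    ends <- strsplit(e, ":", fixed = TRUE)[[1]]
    idx <- match(ends, cols)
    if (anyNA(idx)) stop("unknown column in flag entry: ", e)
    if (length(idx) == 1) return(cols[idx])
    cols[idx[1]:idx[2]]
  })
  # Drop duplicates in case ranges overlap
  unique(unlist(out))
}

expand_flags(c("AGE:SCR", "HT", "AST:ALT"), cols)
# "AGE" "WT" "CRCL" "ALB" "BMI" "AAG" "SCR" "HT" "AST" "ALT"
```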
Now use yspec to read that into an object in R:
spec <- ys_load("inst/internal/analysis1.yml")
Query this object to get an overview of what is in the data:
head(spec)
.    name info unit         short              source
. 1  C    cd-  .            comment character  ysdb_internal
. 2  NUM  ---  .            record number      ysdb_internal
. 3  ID   ---  .            subject identifier ysdb_internal
. 4  SUBJ c--  .            subject identifier ysdb_internal
. 5  TIME ---  hour         TIME               look
. 6  SEQ  -d-  .            SEQ                .
. 7  CMT  ---  .            compartment number ysdb_internal
. 8  EVID -d-  .            event ID           ysdb_internal
. 9  AMT  ---  mg           dose amount        ysdb_internal
. 10 DV   ---  micrograms/L dependent variable ysdb_internal
Or query it column by column, for continuous data:
spec$WT
.  name  value
.  col   WT
.  type  numeric
.  short weight
.  unit  kg
.  range 40 to 100
as well as for categorical data:
spec$BLQ
.  name  value
.  col   BLQ
.  type  numeric
.  short below limit of quantification
.  value 0 : above QL
.        1 : below QL
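The range recorded for WT (40 to 100 kg) can be used to screen the data. Below is a minimal base-R sketch of such a check; it is not yspec's own checking code, and the weights and the `out_of_range()` helper are made up for illustration.

```r
# Range recorded in the spec for WT (40 to 100 kg, as printed above).
wt_range <- c(40, 100)

# Hypothetical weights; 38.5 and 104 fall outside the documented range.
wt <- c(55.2, 70.1, 38.5, 104, 88)

# Return the positions of non-missing values outside [lower, upper].
out_of_range <- function(x, range) {
  which(!is.na(x) & (x < range[1] | x > range[2]))
}

out_of_range(wt, wt_range)  # 3 4
```

Missing values are skipped here on the assumption that they are checked separately.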
We can also render the spec as a define.pdf file.
This section illustrates a few ways yspec can be used beyond creating define.pdf.
To make it easier to get started with yspec, we’ve included an example data set and a corresponding yspec object in the package:
data <- ys_help$data()
spec <- ys_help$spec()
When you have discrete data, you can provide “decodes” and use them to create factors in the data. The RF column has decodes:
spec$RF
.  name  value
.  col   RF
.  type  character
.  short renal function stage
.  value norm : Normal
.        mild : Mild
.        mod  : Moderate
.        sev  : Severe
ys_add_factors() adds a column called RF_f, which is a factor version of RF:
ys_add_factors(data, spec, RF) %>% count(RF, RF_f)
.     RF     RF_f    n
. 1 mild     Mild  360
. 2  mod Moderate  360
. 3 norm   Normal 3280
. 4  sev   Severe  360
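Conceptually, decodes map each stored value to a label, and the order of values in the spec can set the factor levels. Here is a base-R sketch of that idea (not how ys_add_factors() is implemented), using the RF values and decodes printed above and a few hypothetical data values.

```r
# Values and decodes for RF, copied from the spec printout above.
rf_values  <- c("norm", "mild", "mod", "sev")
rf_decodes <- c("Normal", "Mild", "Moderate", "Severe")

# A few hypothetical RF values.
RF <- c("mild", "norm", "sev", "norm", "mod")

# In this sketch, levels follow the order listed in the spec,
# not alphabetical order, and each value is replaced by its decode.
RF_f <- factor(RF, levels = rf_values, labels = rf_decodes)

levels(RF_f)  # "Normal" "Mild" "Moderate" "Severe"
table(RF_f)
```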
Every column can have a “short” name; for WT, it is:
spec$WT$short
. [1] "weight"
Every continuous column can also have a unit; again, for WT:
spec$WT$unit
. [1] "kg"
We can use this information from the spec to “recode” values. First, create a data summary in long format:
summ <-
  data %>%
  select(WT, ALB, AGE) %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarise(Mean = mean(value), Sd = sd(value))
summ
. # A tibble: 3 × 3
.   name   Mean    Sd
.   <chr> <dbl> <dbl>
. 1 AGE   33.8  8.60
. 2 ALB    4.30 0.707
. 3 WT    70.9  12.8
To recode, pull the short names and units from spec:
summ %>%
  mutate(name = ys_recode(name, spec, unit = TRUE, title_case = TRUE))
. # A tibble: 3 × 3
.   name            Mean    Sd
.   <chr>          <dbl> <dbl>
. 1 Age (years)     33.8 8.60
. 2 Albumin (g/dL)  4.30 0.707
. 3 Weight (kg)     70.9 12.8
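The recoded labels combine each column's short name, in title case, with its unit. A rough base-R equivalent, assuming a simple lookup of short names and units (the `short` and `unit` vectors and `recode_label()` are hypothetical, not part of yspec):

```r
# Hypothetical lookup mirroring the spec's short names and units.
short <- c(AGE = "age", ALB = "albumin", WT = "weight")
unit  <- c(AGE = "years", ALB = "g/dL", WT = "kg")

recode_label <- function(x, short, unit) {
  # Fall back to the column name when no short name is available
  lab <- ifelse(x %in% names(short), short[x], x)
  lab <- tools::toTitleCase(lab)
  # Append "(unit)" only for columns that have a unit
  ifelse(x %in% names(unit), paste0(lab, " (", unit[x], ")"), lab)
}

recode_label(c("AGE", "ALB", "WT"), short, unit)
# "Age (years)" "Albumin (g/dL)" "Weight (kg)"
```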
There are several functions for working with your yspec object.
Select some columns:
body_size <- ys_select(spec, WT, BMI, HT)
body_size
.  name info unit  short  source
.  WT   ---  kg    weight ysdb_internal
.  BMI  ---  m2/kg BMI    ysdb_internal
.  HT   ---  cm    height ysdb_internal
Filter based on flags that were set in the spec (here, the covariate flag):
ys_filter(spec, covariate)
.  name info unit   short                      source
.  AGE  ---  years  age                        ysdb_internal
.  WT   ---  kg     weight                     ysdb_internal
.  CRCL ---  ml/min CRCL                       .
.  ALB  ---  g/dL   albumin                    ysdb_internal
.  BMI  ---  m2/kg  BMI                        ysdb_internal
.  AAG  ---  mg/dL  alpha-1-acid glycoprotein  .
.  SCR  ---  mg/dL  serum creatinine           .
.  AST  ---  .      aspartate aminotransferase .
.  ALT  ---  .      alanine aminotransferase   .
.  HT   ---  cm     height                     ysdb_internal
.  CP   -d-  .      Child-Pugh score           look
Rename columns while selecting them:
spec %>% ys_select(BWT = WT, AGE, SCR)
.  name info unit  short            source
.  BWT  ---  kg    weight           ysdb_internal
.  AGE  ---  years age              ysdb_internal
.  SCR  ---  mg/dL serum creatinine .
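Selection with renaming can be pictured with a spec stored as a named list. This is a hypothetical structure and helper (`mini_spec`, `select_spec()`), not yspec's internal representation; it only illustrates the `new = old` renaming pattern used above.

```r
# Hypothetical mini-spec: a named list of column definitions
# (illustrative only; not yspec's internal structure).
mini_spec <- list(
  WT  = list(short = "weight", unit = "kg"),
  AGE = list(short = "age", unit = "years"),
  SCR = list(short = "serum creatinine", unit = "mg/dL"),
  HT  = list(short = "height", unit = "cm")
)

# Select columns by name; an argument written as new = "old" renames it.
select_spec <- function(spec, ...) {
  cols <- c(...)
  if (!all(cols %in% names(spec))) {
    stop("unknown column(s): ", paste(setdiff(cols, names(spec)), collapse = ", "))
  }
  new_names <- names(cols)
  if (is.null(new_names)) new_names <- rep("", length(cols))
  out <- spec[unname(cols)]
  # Keep the original name unless a new one was supplied
  names(out) <- ifelse(new_names == "", unname(cols), new_names)
  out
}

sel <- select_spec(mini_spec, BWT = "WT", "AGE", "SCR")
names(sel)    # "BWT" "AGE" "SCR"
sel$BWT$unit  # "kg"
```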
An analysis project typically has several data sets that can be documented together. Load each spec and combine them into a project object:
pk <- ys_load("inst/spec/DEM104101F_PK.yml")
pkpd <- ys_load("inst/spec/DEM104101F_PKPD.yml")
ae <- ys_load("inst/spec/DEM104101F_AE.yml")
proj <- ys_project(pk, pkpd, ae)
proj
. projectnumber: ABC101104F
. sponsor: ABC-Pharma
. --------------------------------------------
. datafiles:
.  name            description                       data_stem
.  DEM104101F_PK   Population PK analysis data set   DEM104101F_PK
.  DEM104101F_PKPD Population PKPD analysis data set DEM104101F_PKPD
.  DEM104101F_AE   AE analysis data set              DEM0104101F_AE_2
This object can be rendered into a single define.pdf document.