getting-started.Rmd
This “Getting Started with rbabylon” vignette takes you through some basic scenarios for modeling with NONMEM using rbabylon, introducing its standard workflow and functionality.
rbabylon is an R interface for running babylon, which is (will be) a complete solution for managing projects involving modeling and simulation with a number of software solutions used in pharmaceutical sciences. Currently, only NONMEM is supported, so this vignette will only address that.
Before you use rbabylon on a system or disk for the first time, you must install babylon, often aliased as bbi. The babylon executable is a single binary file that can be easily installed with:
rbabylon::use_bbi("/data/apps/")
The use_bbi() function takes the directory into which you want to install bbi. Note that this can point anywhere on your system; installing to /data/apps/bbi is merely a convention used at Metrum Research Group, where rbabylon was developed.
Once bbi is installed, you will need to make sure rbabylon knows where to find it. This can be done by setting the rbabylon.bbi_exe_path option like so:
options("rbabylon.bbi_exe_path" = "/data/apps/bbi")
Please note that this must be set for every new R session, so it’s recommended to include the above snippet in your .Rprofile, replacing "/data/apps/bbi" with the absolute path from your system/project if you have installed it somewhere different.
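To confirm that the option is set and points at a real file, a small check along these lines may help. This is only a sketch in base R: check_bbi_path() is a hypothetical helper written for illustration and is not part of rbabylon.

```r
# Sketch: confirm that the option is set and points at an existing file.
# check_bbi_path() is a hypothetical helper, not an rbabylon function.
check_bbi_path <- function(path = getOption("rbabylon.bbi_exe_path")) {
  if (is.null(path)) {
    message("rbabylon.bbi_exe_path is not set; add the options() call to your .Rprofile")
    return(FALSE)
  }
  if (!file.exists(path)) {
    message("No file found at ", path)
    return(FALSE)
  }
  TRUE
}

# Returns TRUE only when the option is set and the file exists
check_bbi_path()
```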
If you’re not sure if you have babylon installed, you can use bbi_version() to check. This will return the version number of your installation, if one is found, or an empty string if you do not have babylon installed. You can also use bbi_current_release() to see the most current version available and run use_bbi() as specified above if you want to update.
You can also check the babylon documentation for manual installation instructions from the command line.
To actually submit models with bbi, you will also need a babylon.yaml configuration file. Think of this as containing “global default settings” for your bbi runs. Those settings will not be discussed here, but know that they can be modified globally by editing that file, or model-by-model as described in the “Passing arguments to bbi” section below.
The bbi_init() function will create a babylon.yaml, with the default settings, in the specified directory. Optionally, you can pass the path to a babylon.yaml to the .config_path argument of any rbabylon function that needs one. However, by default these functions will look for one in the same folder as the model being submitted or manipulated. Therefore, if you will have a number of models in the same folder, it is easiest to put a babylon.yaml in that folder.
MODEL_DIR <- "../model/nonmem/basic" # this should be relative to your working directory
bbi_init(.dir = MODEL_DIR,            # the directory to create the babylon.yaml in
         .nonmem_dir = "/opt/NONMEM", # location of NONMEM installation
         .nonmem_version = "nm74gf")  # default NONMEM version to use
Here we define a variable MODEL_DIR with the path to the directory containing our models. Notice that you must also pass the path to a working installation of NONMEM, and a default NONMEM version to use.
Note this only needs to be done once for each folder you are modeling in. Once the babylon.yaml exists, you will not need to run bbi_init() again unless you want to create another one; for example, if you move to modeling in a different directory, or if you want a different set of default settings for some specific subset of models.
To begin modeling, first create a model object using new_model(). This is an S3 object which you will pass to all of the other rbabylon functions that submit, summarize, or manipulate models.
The first argument (.path) must be the path to your model control stream without file extension. For instance, the call below assumes you have a control stream named 1.ctl or 1.mod in the directory you just assigned to MODEL_DIR above.
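A minimal sketch of such a call, assuming MODEL_DIR is defined as above and that no arguments beyond .path are required in your version of rbabylon:

```r
library(rbabylon)

# Assumes MODEL_DIR is defined as above and contains 1.ctl or 1.mod.
# The path is passed without a file extension.
mod1 <- new_model(file.path(MODEL_DIR, 1))
```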
This new_model() call will also create a 1.yaml file in that same directory, which stores model metadata like description and tags (discussed below). If you ever need to recreate this model object in memory, just run mod1 <- read_model(file.path(MODEL_DIR, 1)) to rebuild it from the YAML file on disk.
The model object you have just created can be passed to various functions which you will see in a minute. Now that we’ve created a model, the first thing we will do is submit the model to be run.
mod1 %>% submit_model()
This will return a process object. We won’t discuss this object in this vignette, but it contains some information about the submission call. Please note that checking on a model run in progress is not fully implemented. For now, users should check on their runs manually (by looking at the output directory) and only proceed to the next steps once the run has successfully completed.
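One way to look at the output directory from R is sketched below. It assumes bbi writes output into a directory named after the model (for example, a folder 1 next to 1.ctl), which you should confirm on your own system; run_output_files() is a hypothetical helper, not an rbabylon function.

```r
# Sketch: list what a run has written so far.
# Assumes the output directory is named after the model (e.g. <model dir>/1);
# run_output_files() is a hypothetical helper, not part of rbabylon.
run_output_files <- function(model_dir, run) {
  out_dir <- file.path(model_dir, as.character(run))
  if (!dir.exists(out_dir)) {
    return(character(0)) # nothing written yet, or a different layout
  }
  list.files(out_dir)
}

# Same directory as MODEL_DIR in this vignette
run_output_files("../model/nonmem/basic", 1)
```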
There are a number of arguments that bbi can take to modify how models are run. You can print a list of available arguments using the print_bbi_args() helper function.
print_bbi_args()
#> additional_post_work_envs (character) -- Any additional values (as ENV KEY=VALUE) to provide for the post execution environment (sets CLI flag `--additional_post_work_envs`)
#> background (logical) -- RAW NMFE OPTION - Tells nonmem not to scan StdIn for control characters (sets CLI flag `--background`)
#> clean_lvl (numeric) -- clean level used for file output from a given (set of) runs (default 1) (sets CLI flag `--clean_lvl`)
#> config (character) -- Path (relative or absolute) to another babylon.yaml to load (sets CLI flag `--config`)
#> copy_lvl (numeric) -- copy level used for file output from a given (set of) runs (sets CLI flag `--copy_lvl`)
#> debug (logical) -- debug mode (sets CLI flag `--debug`)
#> delay (numeric) -- Selects a random number of seconds between 1 and this value to stagger / jitter job execution. Assists in dealing with large volumes of work dealing with the same data set. May avoid NMTRAN issues about not being able read / close files (sets CLI flag `--delay`)
#> ext_file (character) -- name of custom ext-file (sets CLI flag `--ext-file`)
#> git (logical) -- whether git is used (sets CLI flag `--git`)
#> json (logical) -- json tree of output, if possible (sets CLI flag `--json`)
#> licfile (character) -- RAW NMFE OPTION - Specify a license file to use with NMFE (Nonmem) (sets CLI flag `--licfile`)
#> log_file (character) -- If populated, specifies the file into which to store the output / logging details from Babylon (sets CLI flag `--log_file`)
#> maxlim (numeric) -- RAW NMFE OPTION - Set the maximum values set for the buffers used by Nonmem (default 100) (sets CLI flag `--maxlim`)
#> mpi_exec_path (character) -- The fully qualified path to mpiexec. Used for nonmem parallel operations (default '/usr/local/mpich3/bin/mpiexec') (sets CLI flag `--mpi_exec_path`)
#> nm_version (character) -- Version of nonmem from the configuration list to use (sets CLI flag `--nm_version`)
#> nm_qual (logical) -- Whether or not to execute with nmqual (autolog.pl (sets CLI flag `--nmqual`)
#> nobuild (logical) -- RAW NMFE OPTION - Skips recompiling and rebuilding on nonmem executable (sets CLI flag `--nobuild`)
#> no_ext_file (logical) -- do not use ext file (sets CLI flag `--no-ext-file`)
#> no_grd_file (logical) -- do not use grd file (sets CLI flag `--no-grd-file`)
#> no_shk_file (logical) -- do not use shk file (sets CLI flag `--no-shk-file`)
#> overwrite (logical) -- Whether or not to remove existing output directories if they are present (sets CLI flag `--overwrite`)
#> parafile (character) -- Location of a user-provided parafile to use for parallel execution (sets CLI flag `--parafile`)
#> parallel (logical) -- Whether or not to run nonmem in parallel mode (sets CLI flag `--parallel`)
#> parallel_timeout (numeric) -- The amount of time to wait for parallel operations in nonmem before timing out (default 2147483647) (sets CLI flag `--parallel_timeout`)
#> post_work_executable (character) -- A script or binary to run when job execution completes or fails (sets CLI flag `--post_work_executable`)
#> prcompile (logical) -- RAW NMFE OPTION - Forces PREDPP compilation (sets CLI flag `--prcompile`)
#> prsame (logical) -- RAW NMFE OPTION - Indicates to nonmem that the PREDPP compilation step should be skipped (sets CLI flag `--prsame`)
#> preview (logical) -- preview action, but don't actually run command (sets CLI flag `--preview`)
#> save_config (logical) -- Whether or not to save the existing configuration to a file with the model (default true) (sets CLI flag `--save_config`)
#> threads (numeric) -- number of threads to execute with (default 4) (sets CLI flag `--threads`)
#> verbose (logical) -- verbose output (sets CLI flag `--verbose`)
As discussed in “Setup” above, these can be set globally in the babylon.yaml file, and you can see their default values in that file. However, specific arguments can also be set or changed for each model. This can be done in two ways: by attaching them to a model object with add_bbi_args() or replace_bbi_args(), or by passing them to the .bbi_args argument of one of the submit_ or _summary functions.
Note that any bbi_args attached to a model object will override the relevant settings in babylon.yaml, and that .bbi_args passed into a submit_ or _summary call will override the relevant settings in both babylon.yaml and bbi_args attached to the model object.
See the docs for any of the functions just mentioned for more details on usage and syntax.
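The override order described above can be illustrated with plain R lists. This is not how rbabylon resolves arguments internally; it is only a sketch of the precedence, and the values are hypothetical.

```r
# Illustration only: plain lists stand in for the three sources of settings.
# All values here are hypothetical.
yaml_defaults <- list(threads = 4, overwrite = FALSE, clean_lvl = 1) # babylon.yaml
model_args    <- list(threads = 8)                                   # attached to the model object
call_args     <- list(overwrite = TRUE)                              # passed via .bbi_args

# Later sources win: modifyList() replaces matching names in its first argument.
effective <- modifyList(modifyList(yaml_defaults, model_args), call_args)
str(effective)
```

Here threads comes from the model object, overwrite from the call, and clean_lvl falls through from the file-level defaults.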
It is common to run a model, make some tweaks to it, and then run it again. However, to avoid accidentally deleting model outputs, rbabylon will error by default if it sees existing output when trying to submit a model. To automatically overwrite any previous model output, just pass overwrite = TRUE to the .bbi_args argument described in the previous section. For example:
mod1 %>% submit_model(.bbi_args = list(overwrite = TRUE))
You can also change this setting globally by setting overwrite: true in the babylon.yaml file for your project.
Once the model run has completed, users can get a summary object containing much of the commonly used diagnostic information in a named list.
sum1 <- mod1 %>% model_summary()
print(names(sum1))
#> [1] "absolute_model_path" "run_details" "run_heuristics"
#> [4] "parameters_data" "parameter_names" "ofv"
#> [7] "condition_number" "shrinkage_details"
These elements can be accessed manually or extracted with built-in helper functions like so:
param_df1 <- sum1 %>% param_estimates()
param_df1
#> # A tibble: 9 x 8
#> parameter_names estimate stderr random_effect_sd random_effect_s… fixed
#> <chr> <dbl> <dbl> <dbl> <dbl> <lgl>
#> 1 THETA1 2.31 8.61e- 2 NA NA FALSE
#> 2 THETA2 55.0 3.33e+ 0 NA NA FALSE
#> 3 THETA3 465. 2.96e+ 1 NA NA FALSE
#> 4 THETA4 -0.0806 5.55e- 2 NA NA FALSE
#> 5 THETA5 4.13 1.36e+ 0 NA NA FALSE
#> 6 OMEGA(1,1) 0.0964 2.00e- 2 0.311 0.0322 FALSE
#> 7 OMEGA(2,1) 0 1.00e+10 0 10000000000 TRUE
#> 8 OMEGA(2,2) 0.154 2.67e- 2 0.392 0.0341 FALSE
#> 9 SIGMA(1,1) 1 1.00e+10 1 10000000000 TRUE
#> # … with 2 more variables: diag <lgl>, shrinkage <dbl>
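Manual access uses ordinary list indexing. The sketch below assumes sum1 was created as above and relies only on the element names printed by names(sum1); the internal structure of each element is not shown here and may differ between versions.

```r
# Assumes sum1 was created with model_summary() as shown above.
sum1$ofv                # objective function value element
sum1[["condition_number"]]  # same idea with [[ ]] indexing

# Peek at the top level of a nested element without printing everything
str(sum1$run_details, max.level = 1)
```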
To see how to load summaries of multiple models to an easy-to-read tibble, see the Creating a Model Summary Log vignette.
Much of the benefit of rbabylon is leveraged in the model iteration workflow, and the run log summaries that can be created afterwards. For example, imagine you look at these model results and want to begin iterating on them with a new model.
If you are now in a new R session and no longer have your mod1 object in memory, you can easily rebuild it from the YAML file on disk with read_model():
mod1 <- read_model(file.path(MODEL_DIR, 1))
Now you can create a new model object, based on the original, copying and renaming the control stream in the process. The copy_model_from() call below will create both 2.ctl and 2.yaml files in the same directory as the parent model, and return the model object corresponding to them. (copy_model_from() also stores the model’s “ancestry” which can be useful later in the project, as shown in the Using the based_on field vignette.)
mod2 <- copy_model_from(mod1, 2)
Note that, while the .path argument in new_model() and read_model() is relative to your working directory, the .new_model argument of copy_model_from() is relative to the directory containing the parent model. This means that, assuming you would like to create the new model in the same directory as its parent, you only have to pass a filename (without extension) for the new model. Since, by convention, scientists often name their models numerically, you can also pass a number, which will be coerced to the relevant file names internally.
The new control stream file 2.ctl can now be edited with the desired changes and then submitted and summarized exactly as above.
# manually edit control stream, then...
mod2 %>% submit_model()
mod2 %>% model_summary()
Now the iteration process continues with a third model. Note that you can tell copy_model_from() to inherit the tags from the parent model and automatically add them to the new model.
mod3 <- copy_model_from(mod2, 3, .inherit_tags = TRUE)
Submit and summarize as before.
# manually edit control stream, then...
mod3 %>% submit_model()
mod3 %>% model_summary()
Add tags and notes for filtering in run log, described next.
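A sketch of what this might look like, assuming add_tags() and add_notes() are the relevant helpers in your version of rbabylon and that each returns the updated model object. The tag and note text are made-up examples.

```r
library(rbabylon)
library(magrittr) # for %>%

# Assumes mod3 exists as above; the tag and note text are hypothetical.
mod3 <- mod3 %>%
  add_tags("iteration 3") %>%
  add_notes("Third model in the sequence")
```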
At any point, the user can easily construct a “run log” tibble to summarize all models run up to this point.
Before we move on, note that you can get even more information about your models from the config_log() and summary_log() functions, as well as add_config() and add_summary() which automatically join the columns output from those functions against the tibble output from run_log(). See the “Further Reading” section below for links to vignettes demonstrating those functions.
log_df <- run_log(MODEL_DIR)
log_df
#> # A tibble: 3 x 10
#> absolute_model_… run yaml_md5 model_type description bbi_args based_on tags
#> <chr> <chr> <chr> <chr> <chr> <list> <list> <lis>
#> 1 /data/GHE/mpn/d… 1 c7486e7… nonmem <NA> <list [… <NULL> <chr…
#> 2 /data/GHE/mpn/d… 2 bc60c6b… nonmem <NA> <list [… <chr [1… <chr…
#> 3 /data/GHE/mpn/d… 3 cfceb67… nonmem <NA> <list [… <chr [1… <chr…
#> # … with 2 more variables: notes <list>, decisions <list>
The run_log() function returns a tibble which can be manipulated like any other tibble. However, several of the columns (tags, notes, and based_on, for example) are list columns, which complicates how you can interact with them. We provide some helper functions to more seamlessly interact with these log tibbles, as well as some sample tidyverse code below.
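As one possible approach to the list columns, the sketch below filters and flattens them with dplyr and purrr. It runs against a small mock tibble rather than real run_log() output; mock_log and its tag values are hypothetical, and only a few columns are reproduced.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical stand-in for the tibble returned by run_log();
# only a few columns are mocked, and the tag values are made up.
mock_log <- tibble(
  run      = c("1", "2", "3"),
  based_on = list(NULL, "1", "2"),
  tags     = list("base", c("base", "covariates"), c("base", "covariates"))
)

# Keep only the runs whose tags list column contains a given tag.
tagged <- mock_log %>%
  filter(map_lgl(tags, ~ "covariates" %in% .x))

# Collapse the list columns into plain strings for easier reading.
readable <- mock_log %>%
  mutate(
    tags     = map_chr(tags, paste, collapse = ", "),
    based_on = map_chr(based_on, ~ if (is.null(.x)) NA_character_ else paste(.x, collapse = ", "))
  )

readable
```

The same pattern should apply to a real log tibble by swapping mock_log for the output of run_log().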
Hopefully this has given you a good start on understanding the capabilities and basic workflow of rbabylon. Please see the other vignettes for demonstrations of more advanced or specific functionality.
Using the based_on field: how to use the based_on field to track a model’s ancestry through the model development process, as well as how to leverage config_log() to check whether older models are still up-to-date.
Creating a Model Summary Log: how to use summary_log() to extract model diagnostics like the objective function value, condition number, and parameter counts.