Introduction

This “Getting Started with rbabylon” vignette takes you through some basic scenarios for modeling with NONMEM using rbabylon, introducing its standard workflow and functionality.

rbabylon is an R interface for running babylon, which is intended to become a complete solution for managing modeling and simulation projects across a number of software tools used in pharmaceutical sciences. Currently, only NONMEM is supported, so this vignette addresses only NONMEM.

Setup

Installing babylon

Before using rbabylon on a new system or disk, you must install babylon, often aliased as bbi. The babylon executable is a single binary file that can be installed with:

rbabylon::use_bbi("/data/apps/")

The use_bbi() function takes the directory into which you want to install bbi. Note that this can be pointed to anywhere on your system; installing to /data/apps/bbi is merely a convention used at Metrum Research Group, where rbabylon was developed.

Setting bbi_exe_path

Once bbi is installed, you will need to make sure rbabylon knows where to find it. This can be done by setting the rbabylon.bbi_exe_path option like so:

options("rbabylon.bbi_exe_path" = "/data/apps/bbi")

Please note that this must be set for every new R session, so it’s recommended to include the above snippet in your .Rprofile. If you installed bbi somewhere different, replace "/data/apps/bbi" with the absolute path on your system or project.
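
A slightly more defensive version of that .Rprofile entry might look like the sketch below. The file.exists() check and the message are additions for illustration, not something rbabylon requires, and bbi_path is just a placeholder name.

```r
# Sketch of an .Rprofile entry; adjust the path to your own installation.
bbi_path <- "/data/apps/bbi"  # assumed install location (the convention described above)

if (file.exists(bbi_path)) {
  # Tell rbabylon where the bbi executable lives for this session
  options("rbabylon.bbi_exe_path" = bbi_path)
} else {
  # Nothing is set in this branch, so rbabylon will not find bbi until the path is fixed
  message("bbi not found at ", bbi_path, "; run rbabylon::use_bbi() or update the path")
}
```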

Updating bbi or checking if it’s installed

If you’re not sure if you have babylon installed, you can use bbi_version() to check. This will return the version number of your installation, if one is found, or an empty string if you do not have babylon installed. You can also use bbi_current_release() to see the most current version available and run use_bbi() as specified above if you want to update.
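
The check described above could be scripted roughly as follows. This is a sketch, not an official recipe: bbi_missing() is a hypothetical helper, and the requireNamespace() guard is only there so the snippet does nothing on a machine without rbabylon.

```r
# Hypothetical helper: TRUE when bbi_version() reported no installation,
# which the text above says is signalled by an empty string
bbi_missing <- function(version_string) {
  !nzchar(version_string)
}

if (requireNamespace("rbabylon", quietly = TRUE)) {
  installed <- rbabylon::bbi_version()
  if (bbi_missing(installed)) {
    # No installation found: install into the directory used earlier
    rbabylon::use_bbi("/data/apps/")
  } else {
    message("Installed bbi version: ", installed)
    # Assumption: looking up the latest release needs network access
    message("Latest bbi release:    ", rbabylon::bbi_current_release())
  }
}
```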

You can also check the babylon documentation for manual installation instructions from the command line.

babylon.yaml configuration file

To actually submit models with bbi, you will also need a babylon.yaml configuration file. Think of this as containing “global default settings” for your bbi runs. Those settings will not be discussed here, but know that they can be modified globally by editing that file, or model-by-model as described in the “Passing arguments to bbi” section below.

bbi_init()

The bbi_init() function will create a babylon.yaml, with the default settings, in the specified directory. Optionally, you can pass the path to a babylon.yaml to the .config_path argument of any rbabylon function that needs one. However, by default these functions will look for one in the same folder as the model being submitted or manipulated. Therefore, if you will have a number of models in the same folder, it is easiest to put a babylon.yaml in that folder.

MODEL_DIR <- "../model/nonmem/basic"  # this should be relative to your working directory

bbi_init(.dir = MODEL_DIR,            # the directory to create the babylon.yaml in
         .nonmem_dir = "/opt/NONMEM", # location of NONMEM installation
         .nonmem_version = "nm74gf")  # default NONMEM version to use

Here we define a variable MODEL_DIR with the path to the directory containing our models. Notice that you must also pass the path to a working installation of NONMEM, and a default NONMEM version to use.

Note this only needs to be done once for each folder you are modeling in. Once the babylon.yaml exists, you will not need to run bbi_init() again unless you want to create another one; for example if you move to modeling in a different directory, or if you want a different set of default settings for some specific subset of models.
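
Because bbi_init() only needs to run once per folder, a setup script can guard the call. The sketch below assumes the file is always named babylon.yaml, as described above; has_bbi_config() is a hypothetical helper, not part of rbabylon.

```r
# Hypothetical helper: TRUE if a babylon.yaml already exists in the directory
has_bbi_config <- function(.dir) {
  file.exists(file.path(.dir, "babylon.yaml"))
}

MODEL_DIR <- "../model/nonmem/basic"

# Only initialize when rbabylon is available, the folder exists,
# and no babylon.yaml has been created there yet
if (requireNamespace("rbabylon", quietly = TRUE) &&
    dir.exists(MODEL_DIR) &&
    !has_bbi_config(MODEL_DIR)) {
  rbabylon::bbi_init(.dir = MODEL_DIR,
                     .nonmem_dir = "/opt/NONMEM",
                     .nonmem_version = "nm74gf")
}
```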

Initial modeling run

Create model object

To begin modeling, first create a model object using new_model(). This is an S3 object which you will pass to all of the other rbabylon functions that submit, summarize, or manipulate models.

The first argument (.path) must be the path to your model control stream without file extension. For instance, the call below assumes you have a control stream named 1.ctl or 1.mod in the directory you just assigned to MODEL_DIR above.

mod1 <- new_model(file.path(MODEL_DIR, 1))

This new_model() call will also create a 1.yaml file in that same directory, which stores model metadata like description and tags (discussed below). If you ever need to recreate this model object in memory, just run mod1 <- read_model(file.path(MODEL_DIR, 1)) to rebuild it from the YAML file on disk.
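
If a script may be run both before and after the YAML file exists, one option is to choose between new_model() and read_model() based on whether that file is present. Both helpers below are hypothetical and written only for illustration; they rely on the naming described above (the model path plus a .yaml extension).

```r
# Path of the YAML file that new_model() writes next to the control stream
model_yaml_path <- function(.path) {
  paste0(.path, ".yaml")
}

# Hypothetical helper: read the model if its YAML exists, otherwise create it
load_or_create_model <- function(.path) {
  if (file.exists(model_yaml_path(.path))) {
    rbabylon::read_model(.path)
  } else {
    rbabylon::new_model(.path)
  }
}
```

With this in place, mod1 <- load_or_create_model(file.path(MODEL_DIR, 1)) would work in either situation.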

The model object you have just created can be passed to various functions which you will see in a minute. Now that we’ve created a model, the first thing we will do is submit the model to be run.

Submit model

mod1 %>% submit_model()

This will return a process object. We won’t discuss this object in this vignette, but it contains some information about the submission call. Please note that checking on a model run in progress is not fully implemented. For now, you should check on your runs manually (by looking at the output directory) and only proceed to the next steps once the run has successfully completed.

Passing arguments to bbi

There are a number of arguments that bbi can take to modify how models are run. You can print a list of available arguments using the print_bbi_args() helper function.

print_bbi_args()
#> additional_post_work_envs (character) -- Any additional values (as ENV KEY=VALUE) to provide for the post execution environment (sets CLI flag `--additional_post_work_envs`)
#> background (logical) -- RAW NMFE OPTION - Tells nonmem not to scan StdIn for control characters (sets CLI flag `--background`)
#> clean_lvl (numeric) -- clean level used for file output from a given (set of) runs (default 1) (sets CLI flag `--clean_lvl`)
#> config (character) -- Path (relative or absolute) to another babylon.yaml to load (sets CLI flag `--config`)
#> copy_lvl (numeric) -- copy level used for file output from a given (set of) runs (sets CLI flag `--copy_lvl`)
#> debug (logical) -- debug mode (sets CLI flag `--debug`)
#> delay (numeric) -- Selects a random number of seconds between 1 and this value to stagger / jitter job execution. Assists in dealing with large volumes of work dealing with the same data set. May avoid NMTRAN issues about not being able read / close files (sets CLI flag `--delay`)
#> ext_file (character) -- name of custom ext-file (sets CLI flag `--ext-file`)
#> git (logical) -- whether git is used (sets CLI flag `--git`)
#> json (logical) -- json tree of output, if possible (sets CLI flag `--json`)
#> licfile (character) -- RAW NMFE OPTION - Specify a license file to use with NMFE (Nonmem) (sets CLI flag `--licfile`)
#> log_file (character) -- If populated, specifies the file into which to store the output / logging details from Babylon (sets CLI flag `--log_file`)
#> maxlim (numeric) -- RAW NMFE OPTION - Set the maximum values set for the buffers used by Nonmem (default 100) (sets CLI flag `--maxlim`)
#> mpi_exec_path (character) -- The fully qualified path to mpiexec. Used for nonmem parallel operations (default '/usr/local/mpich3/bin/mpiexec') (sets CLI flag `--mpi_exec_path`)
#> nm_version (character) -- Version of nonmem from the configuration list to use (sets CLI flag `--nm_version`)
#> nm_qual (logical) -- Whether or not to execute with nmqual (autolog.pl (sets CLI flag `--nmqual`)
#> nobuild (logical) -- RAW NMFE OPTION - Skips recompiling and rebuilding on nonmem executable (sets CLI flag `--nobuild`)
#> no_ext_file (logical) -- do not use ext file (sets CLI flag `--no-ext-file`)
#> no_grd_file (logical) -- do not use grd file (sets CLI flag `--no-grd-file`)
#> no_shk_file (logical) -- do not use shk file (sets CLI flag `--no-shk-file`)
#> overwrite (logical) -- Whether or not to remove existing output directories if they are present (sets CLI flag `--overwrite`)
#> parafile (character) -- Location of a user-provided parafile to use for parallel execution (sets CLI flag `--parafile`)
#> parallel (logical) -- Whether or not to run nonmem in parallel mode (sets CLI flag `--parallel`)
#> parallel_timeout (numeric) -- The amount of time to wait for parallel operations in nonmem before timing out (default 2147483647) (sets CLI flag `--parallel_timeout`)
#> post_work_executable (character) -- A script or binary to run when job execution completes or fails (sets CLI flag `--post_work_executable`)
#> prcompile (logical) -- RAW NMFE OPTION - Forces PREDPP compilation (sets CLI flag `--prcompile`)
#> prsame (logical) -- RAW NMFE OPTION - Indicates to nonmem that the PREDPP compilation step should be skipped (sets CLI flag `--prsame`)
#> preview (logical) -- preview action, but don't actually run command (sets CLI flag `--preview`)
#> save_config (logical) -- Whether or not to save the existing configuration to a file with the model (default true) (sets CLI flag `--save_config`)
#> threads (numeric) -- number of threads to execute with (default 4) (sets CLI flag `--threads`)
#> verbose (logical) -- verbose output (sets CLI flag `--verbose`)

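As discussed in “Setup” above, these can be set globally in the babylon.yaml file, and you can see their default values in that file. However, specific arguments can also be set or changed for each model. This can be done in two ways: by attaching bbi_args to the model object, or by passing .bbi_args directly to a submit_ or _summary call such as submit_model() or model_summary().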

Note that any bbi_args attached to a model object will override the relevant settings in babylon.yaml, and that .bbi_args passed into a submit_ or _summary call will override the relevant settings in both babylon.yaml and bbi_args attached to the model object.
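
The precedence order can be pictured with plain R lists. This is only an illustration of the rule stated above, not rbabylon's internal implementation; the three lists are stand-ins with made-up contents.

```r
# Stand-in for the global defaults in babylon.yaml
yaml_defaults <- list(threads = 4, overwrite = FALSE, clean_lvl = 1)
# Stand-in for bbi_args attached to the model object
model_args <- list(threads = 8)
# Stand-in for .bbi_args passed to a single submit_model() call
call_args <- list(overwrite = TRUE)

# Later lists win: babylon.yaml < model object < individual call
effective <- utils::modifyList(
  utils::modifyList(yaml_defaults, model_args),
  call_args
)
str(effective)
```

Here threads comes from the model object, overwrite from the individual call, and clean_lvl falls through from the global defaults.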

See the docs for any of the functions just mentioned for more details on usage and syntax.

Overwriting output from a previously run model

It is common to run a model, make some tweaks to it, and then run it again. However, to avoid accidentally deleting model outputs, rbabylon will error by default if it sees existing output when trying to submit a model. To automatically overwrite any previous model output, just pass overwrite = TRUE to the .bbi_args argument described in the previous section. For example:

mod1 %>% submit_model(.bbi_args = list(overwrite = TRUE))

You can also change this setting globally by setting overwrite: true in the babylon.yaml file for your project.

Summarize model

Once the model run has completed, users can get a summary object containing much of the commonly used diagnostic information in a named list.

sum1 <- mod1 %>% model_summary()
print(names(sum1))
#> [1] "absolute_model_path" "run_details"         "run_heuristics"     
#> [4] "parameters_data"     "parameter_names"     "ofv"                
#> [7] "condition_number"    "shrinkage_details"

These elements can be accessed manually or extracted with built-in helper functions like so:

param_df1 <- sum1 %>% param_estimates()
param_df1
#> # A tibble: 9 x 8
#>   parameter_names estimate   stderr random_effect_sd random_effect_s… fixed
#>   <chr>              <dbl>    <dbl>            <dbl>            <dbl> <lgl>
#> 1 THETA1            2.31   8.61e- 2           NA              NA      FALSE
#> 2 THETA2           55.0    3.33e+ 0           NA              NA      FALSE
#> 3 THETA3          465.     2.96e+ 1           NA              NA      FALSE
#> 4 THETA4           -0.0806 5.55e- 2           NA              NA      FALSE
#> 5 THETA5            4.13   1.36e+ 0           NA              NA      FALSE
#> 6 OMEGA(1,1)        0.0964 2.00e- 2            0.311           0.0322 FALSE
#> 7 OMEGA(2,1)        0      1.00e+10            0     10000000000      TRUE 
#> 8 OMEGA(2,2)        0.154  2.67e- 2            0.392           0.0341 FALSE
#> 9 SIGMA(1,1)        1      1.00e+10            1     10000000000      TRUE 
#> # … with 2 more variables: diag <lgl>, shrinkage <dbl>
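
Because the result is an ordinary tibble, it can be post-processed with standard R. The sketch below rebuilds a few rows of the output above as a plain data frame (so it runs without a model) and computes a relative standard error for the estimated parameters. The rse_pct column is an addition for illustration, not something param_estimates() returns.

```r
# A few rows copied from the printed output above
param_df <- data.frame(
  parameter_names = c("THETA1", "THETA2", "OMEGA(1,1)", "OMEGA(2,1)"),
  estimate        = c(2.31, 55.0, 0.0964, 0),
  stderr          = c(8.61e-2, 3.33, 2.00e-2, 1.00e+10),
  fixed           = c(FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Drop fixed parameters: their stderr of 1.00e+10 is not a real estimate
estimated <- param_df[!param_df$fixed, ]

# Relative standard error as a percentage of the absolute estimate
estimated$rse_pct <- 100 * estimated$stderr / abs(estimated$estimate)
estimated
```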

To see how to load summaries of multiple models to an easy-to-read tibble, see the Creating a Model Summary Log vignette.

Iteration

Much of the benefit of rbabylon comes from the model iteration workflow and the run log summaries that can be created afterwards. For example, imagine you look at these model results and want to begin iterating on them with a new model.

If you are now in a new R session and no longer have your mod1 object in memory, you can easily rebuild it from the YAML file on disk with read_model():

mod1 <- read_model(file.path(MODEL_DIR, 1))

copy_model_from()

Now you can create a new model object, based on the original, copying and renaming the control stream in the process. The copy_model_from() call below will create both 2.ctl and 2.yaml files in the same directory as the parent model, and return the model object corresponding to them. (copy_model_from() also stores the model’s “ancestry” which can be useful later in the project, as shown in the Using the based_on field vignette.)

mod2 <- copy_model_from(mod1, 2)

Note that, while the .path argument in new_model() and read_model() is relative to your working directory, the .new_model argument of copy_model_from() is relative to the directory containing the parent model. This means that, assuming you would like to create the new model in the same directory as its parent, you only have to pass a filename (without extension) for the new model. Since, by convention, scientists often name their models numerically, you can also pass a number, which will be coerced to the relevant file names internally.
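
The path rule can be pictured in base R as follows. This is an illustration of the behavior described above, not the code rbabylon actually uses; resolve_new_model() is a hypothetical name.

```r
# Hypothetical helper: resolve a new model name against the parent's directory
resolve_new_model <- function(parent_path, .new_model) {
  # as.character() mirrors the coercion of numeric model names
  file.path(dirname(parent_path), as.character(.new_model))
}

parent_path <- file.path("../model/nonmem/basic", 1)  # path of model 1, no extension
new_path <- resolve_new_model(parent_path, 2)
new_path
```

The new model lands next to its parent, regardless of the current working directory.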

The new control stream file 2.ctl can now be edited with the desired changes and then submitted and summarized exactly as above.

# manually edit control stream, then...
mod2 %>% submit_model()
mod2 %>% model_summary()

Adding tags and notes

After looking at these results, you can add tags, which can later be used to organize your modeling runs.

mod1 <- mod1 %>% add_tags("orig acop model")
mod2 <- mod2 %>% add_tags("2 compartment")

Note that using free text for your tags is discouraged, for reasons mentioned in the ?modify_tags help page. For simplicity’s sake, we ignore that advice here, but please read it before using tags extensively in the wild.
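
One way to avoid typos in free-text tags is to keep the allowed tags in a single named list and refer to them by name. This is offered as a suggestion rather than a summary of the help page, and the TAGS object and its element names are made up for this example.

```r
# Hypothetical tag glossary: one place to define every tag used in the project
TAGS <- list(
  orig_acop = "orig acop model",
  two_cmt   = "2 compartment",
  comb_ruv  = "combined RUV",
  iiv_cl    = "iiv CL"
)

# Usage would then look like:
#   mod2 <- mod2 %>% add_tags(TAGS[["two_cmt"]])

# A misspelled name returns NULL instead of silently creating a new tag
is.null(TAGS[["two_cpt"]])
```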

In addition to tags, the user can add notes that can be referenced later.

mod2 <- mod2 %>% 
  add_notes("2 compartment model more appropriate than 1 compartment")

Continue to iterate…

Now the iteration process continues with a third model. Note that you can tell copy_model_from() to inherit the tags from the parent model and automatically add them to the new model.

mod3 <- copy_model_from(mod2, 3, .inherit_tags = TRUE)

Submit and summarize as before.

# manually edit control stream, then...
mod3 %>% submit_model()
mod3 %>% model_summary()

Add tags and notes for filtering in the run log, described next.

mod3 <- mod3 %>% 
        add_tags(c("combined RUV", "iiv CL")) %>%
        add_notes("Added combined error structure because it seemed like a good idea")

Run log

At any point, the user can easily construct a “run log” tibble to summarize all models run up to this point.

Before we move on, note that you can get even more information about your models from the config_log() and summary_log() functions, as well as add_config() and add_summary() which automatically join the columns output from those functions against the tibble output from run_log(). See the “Further Reading” section below for links to vignettes demonstrating those functions.

log_df <- run_log(MODEL_DIR)
log_df
#> # A tibble: 3 x 10
#>   absolute_model_… run   yaml_md5 model_type description bbi_args based_on tags 
#>   <chr>            <chr> <chr>    <chr>      <chr>       <list>   <list>   <lis>
#> 1 /data/GHE/mpn/d… 1     c7486e7… nonmem     <NA>        <list [… <NULL>   <chr…
#> 2 /data/GHE/mpn/d… 2     bc60c6b… nonmem     <NA>        <list [… <chr [1… <chr…
#> 3 /data/GHE/mpn/d… 3     cfceb67… nonmem     <NA>        <list [… <chr [1… <chr…
#> # … with 2 more variables: notes <list>, decisions <list>

The run_log() function returns a tibble which can be manipulated like any other tibble. However, several of the columns (tags, notes, and based_on for example) are list columns, which complicates how you can interact with them. We provide some helper functions to more seamlessly interact with these log tibbles, as well as some sample tidyverse code below.

Viewing tags example

The rbabylon::collapse_to_string() function can collapse any list column into a string representation of its contents. It is specifically designed for collapsing columns like tags, notes, and based_on into a more human-readable format.

log_df %>% 
  collapse_to_string(tags, notes) %>%
  select(run, tags, notes)
#> # A tibble: 3 x 3
#>   run   tags                       notes                                        
#>   <chr> <chr>                      <chr>                                        
#> 1 1     orig acop model            <NA>                                         
#> 2 2     2 compartment              2 compartment model more appropriate than 1 …
#> 3 3     2 compartment, combined R… Added combined error structure because it se…
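
Conceptually, collapsing a list column means pasting each row's vector into one string. The base R sketch below shows the idea on a toy data frame that mirrors the tags above; it is not the implementation of collapse_to_string(), and the comma separator is inferred from the printed output.

```r
# Toy stand-in for the run log, with tags as a list column
log_toy <- data.frame(run = c("1", "2", "3"), stringsAsFactors = FALSE)
log_toy$tags <- list(
  "orig acop model",
  "2 compartment",
  c("2 compartment", "combined RUV", "iiv CL")
)

# Paste each row's tags into a single comma-separated string
log_toy$tags_chr <- vapply(log_toy$tags, paste, character(1), collapse = ", ")
log_toy[, c("run", "tags_chr")]
```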

Filtering tags example

This code uses purrr::map_lgl to filter the run log to only rows containing a specific tag ("2 compartment").

log_df %>% 
  filter(map_lgl(tags, ~ "2 compartment" %in% .x)) %>%
  collapse_to_string(tags, notes) %>%
  select(run, tags, notes)
#> # A tibble: 2 x 3
#>   run   tags                       notes                                        
#>   <chr> <chr>                      <chr>                                        
#> 1 2     2 compartment              2 compartment model more appropriate than 1 …
#> 2 3     2 compartment, combined R… Added combined error structure because it se…
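
The same pattern extends to filtering on several tags at once. The helper below is a hypothetical base R sketch (has_all_tags() is not part of rbabylon) that returns TRUE for rows carrying every requested tag; it is shown on a toy list that mirrors the tags column above.

```r
# Hypothetical helper: TRUE for each element of a list column that
# contains all of the wanted tags
has_all_tags <- function(tag_list, wanted) {
  vapply(tag_list, function(x) all(wanted %in% x), logical(1))
}

# Toy stand-in for log_df$tags
toy_tags <- list(
  "orig acop model",
  "2 compartment",
  c("2 compartment", "combined RUV", "iiv CL")
)

keep <- has_all_tags(toy_tags, c("2 compartment", "iiv CL"))
keep
```

Since the result is a logical vector, it could be used inside filter() in place of the map_lgl() call, for example filter(has_all_tags(tags, c("2 compartment", "iiv CL"))).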

Further reading

Hopefully this has given you a good start on understanding the capabilities and basic workflow of rbabylon. Please see the other vignettes for demonstrations of more advanced or specific functionality.