This “Getting Started with rbabylon” vignette takes you through some basic scenarios for modeling with NONMEM using rbabylon, introducing its standard workflow and functionality.
library(rbabylon)
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(purrr))
rbabylon is an R interface for running babylon. babylon is (and aims to become) a complete solution for managing projects involving modeling and simulation with a number of software packages used in pharmaceutical sciences. Currently, only NONMEM is supported, so this vignette addresses only the NONMEM workflow.
The first time you use rbabylon on a system or disk, you must install babylon itself, often aliased as bbi. The first time you use it for a given project, you will also need to run bbi_init() in your modeling directory.
To reiterate, there are two relevant paths for this section.
BBI_EXE_PATH – the path to the babylon executable file on your system.
MODEL_DIR – the root modeling directory where you will be running models with rbabylon.
BBI_EXE_PATH <- "/data/apps/bbi" # this should be an absolute path
MODEL_DIR <- "../inst/nonmem"    # this should be relative to your working directory
You can install the babylon binary with:
rbabylon::use_bbi()
This will install bbi to the /data/apps/ folder by default, which matches the BBI_EXE_PATH you just defined (and will use soon). If you have changed that, you can pass your custom path to the directory where you want to put bbi, like so: use_bbi("/some/other/dir"). In that case, you would want to set BBI_EXE_PATH <- "/some/other/dir/bbi".
You can also check the babylon documentation for manual installation instructions.
If you’re not sure if you have babylon installed, you can use bbi_version() to check. This will return the version number of your installation, if one is found, or an empty string if you do not have babylon installed. You can also use bbi_current_release() to see the most current version available and run use_bbi() as specified above if you want to update.
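As a sketch, these helpers can be combined into a simple install-or-update guard (this assumes, as described above, that bbi_version() returns an empty string when no installation is found):

```r
library(rbabylon)

# install babylon if it is missing, or update it if a newer release exists;
# bbi_version() returns "" when nothing is installed, so this covers both cases
if (bbi_version() != bbi_current_release()) {
  use_bbi()  # installs to /data/apps/ by default
}
```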
Next, initialize babylon by pointing it to your modeling directory, a working installation of NONMEM, and a default NONMEM version to use.
bbi_init(.dir = MODEL_DIR,            # the root modeling directory
         .nonmem_dir = "/opt/NONMEM", # location of NONMEM installation
         .nonmem_version = "nm74gf")  # default NONMEM version to use
This will create a babylon.yaml file in your MODEL_DIR directory which contains a lot of default settings for running models, etc. Those settings will not be discussed here, but know that they can be modified globally by editing that file, or model-by-model as described in the “Passing arguments to bbi” section below.
Once babylon is installed and initialized, you will need to make sure rbabylon knows where to find it. This can be done by setting the following options:
options(
  'rbabylon.bbi_exe_path' = BBI_EXE_PATH,
  'rbabylon.model_directory' = normalizePath(MODEL_DIR, mustWork = TRUE)
)
This must be set for every new R session, so it may be wise to include the above snippet in your .Rprofile, replacing the BBI_EXE_PATH and MODEL_DIR constants with the relevant paths from your system/project. Note that, if setting 'rbabylon.model_directory' explicitly like this, you must set it to an absolute path. The recommended way to do this is to use normalizePath("path/to/dir/") where "path/to/dir/" is a relative path from the script location (in this case the .Rprofile location). That way the project will still be portable between different servers/disks.
NOTE: By setting the rbabylon.model_directory option, you will not need to pass the full path to model files when calling functions like read_model() or submit_model(). Instead you will pass a path relative to the specified modeling directory. To be clear, it is not necessary to set this option, but it is definitely a recommended convenience.
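For example, with the option set as above, the two calls below are equivalent (a sketch; the model file name matches the one created in the next section):

```r
# path relative to the rbabylon.model_directory option
mod1 <- read_model("1.yaml")

# equivalent call with the full path spelled out
mod1 <- read_model(file.path(getOption("rbabylon.model_directory"), "1.yaml"))
```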
To begin modeling, first create a model object. This is an S3 object with an accompanying YAML file, kept in your modeling directory. The YAML serves as the “record of truth” for the model, and will persist changes you make to the model as you go.
The .yaml_path argument should have the same name (but with a .yaml extension) as your base model control stream. For instance, the command below assumes you have a control stream named 1.ctl or 1.mod in the model directory you just defined. The .description argument is required and corresponds to what would be in the $PROB definition in a control stream.
mod1 <- new_model(.yaml_path = "1.yaml", .description = "our first model")
The model object you have just created can be passed to various functions which you will see in a minute. If you ever need to recreate it, just run mod1 <- read_model("1.yaml") to rebuild it from the YAML. The first thing we will do is submit the model to be run.
mod1 %>% submit_model()
This will return a process object. We won’t discuss this object in this vignette, but other documentation will show how it can be used to check on the model run in progress. Please note that, for this first release, checking on a model run in progress is not fully implemented. For now, users should check on their runs manually and only proceed to the next steps once it has successfully completed.
There are a number of arguments that babylon can take to modify how models are run. You can print a list of available arguments using the print_nonmem_args() helper function. (Similar helper functions will be added as new modeling software are supported.)
print_nonmem_args()
#> cache_dir (character) -- directory path for cache of nonmem executables for NM7.4+ (sets CLI flag `--cache_dir`)
#> cache_exe (character) -- name of executable stored in cache (sets CLI flag `--cache_exe`)
#> clean_lvl (numeric) -- clean level used for file output from a given (set of) runs (default 1) (sets CLI flag `--clean_lvl`)
#> config (character) -- config file (default is $HOME/babylon.yaml) (sets CLI flag `--config`)
#> copy_lvl (numeric) -- copy level used for file output from a given (set of) runs (sets CLI flag `--copy_lvl`)
#> debug (logical) -- debug mode (sets CLI flag `--debug`)
#> delay (numeric) -- Selects a random number of seconds between 1 and this value to stagger / jitter job execution. Assists in dealing with large volumes of work dealing with the same data set. May avoid NMTRAN issues about not being able read / close files (sets CLI flag `--delay`)
#> ext_file (character) -- name of custom ext-file (sets CLI flag `--ext-file`)
#> git (logical) -- whether git is used (sets CLI flag `--git`)
#> gitignore_lvl (numeric) -- gitignore lvl for a given (set of) runs (sets CLI flag `--gitignoreLvl`)
#> json (logical) -- json tree of output, if possible (sets CLI flag `--json`)
#> mpi_exec_path (character) -- The fully qualified path to mpiexec. Used for nonmem parallel operations (default '/usr/local/mpich3/bin/mpiexec') (sets CLI flag `--mpi_exec_path`)
#> nm_version (character) -- Version of nonmem from the configuration list to use (sets CLI flag `--nm_version`)
#> nm_qual (logical) -- Whether or not to execute with nmqual (autolog.pl) (sets CLI flag `--nmqual`)
#> no_cor_file (logical) -- do not use cor file (sets CLI flag `--no-cor-file`)
#> no_cov_file (logical) -- do not use cov file (sets CLI flag `--no-cov-file`)
#> no_ext_file (logical) -- do not use ext file (sets CLI flag `--no-ext-file`)
#> no_grd_file (logical) -- do not use grd file (sets CLI flag `--no-grd-file`)
#> no_shk_file (logical) -- do not use shk file (sets CLI flag `--no-shk-file`)
#> nodes (numeric) -- The number of nodes on which to perform parallel operations (default 8) (sets CLI flag `--nodes`)
#> output_dir (character) -- Go template for the output directory to use for storing details of each executed model (default '{{ .Name}}') (sets CLI flag `--output_dir`)
#> overwrite (logical) -- Whether or not to remove existing output directories if they are present (sets CLI flag `--overwrite`)
#> parafile (character) -- Location of a user-provided parafile to use for parallel execution (sets CLI flag `--parafile`)
#> parallel (logical) -- Whether or not to run nonmem in parallel mode (sets CLI flag `--parallel`)
#> preview (logical) -- preview action, but don't actually run command (sets CLI flag `--preview`)
#> saveConfig (logical) -- Whether or not to save the existing configuration to a file with the model (default true) (sets CLI flag `--save_config`)
#> save_exe (character) -- what to name the executable when stored in cache (sets CLI flag `--save_exe`)
#> threads (numeric) -- number of threads to execute with (default 4) (sets CLI flag `--threads`)
#> timeout (numeric) -- The amount of time to wait for parallel operations in nonmem before timing out (default 2147483647) (sets CLI flag `--timeout`)
#> verbose (logical) -- verbose output (sets CLI flag `--verbose`)
These can be specified globally in the babylon.yaml file, where you can also see their default values. However, they can also be specified or changed for each model. This can be done in several ways:

- editing the bbi_args: field in the model's YAML file
- calling add_bbi_args() or replace_bbi_args()
- passing the .bbi_args argument to functions such as submit_model() or model_summary()
See the docs for any of those functions for more details on usage and syntax.
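For instance, a sketch using add_bbi_args() to attach model-specific arguments (the particular argument values are illustrative; check the function docs for the exact signature):

```r
# attach bbi arguments to this model; they persist in the model's YAML file
# and apply to subsequent submissions of this model
mod1 <- mod1 %>%
  add_bbi_args(list(threads = 4, clean_lvl = 2))
```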
It is common to run a model, make some tweaks to it, and then run it again. However, to avoid accidentally deleting model outputs, rbabylon will error by default if it sees existing output when trying to submit a model. To automatically overwrite any previous model output, just pass overwrite = TRUE to the .bbi_args argument described in the previous section. For example:
mod1 %>% submit_model(.bbi_args = list(overwrite = TRUE))
You can also change this setting globally by adding overwrite: true to the babylon.yaml file for your project.
Once the model run has completed, users can get a summary object containing much of the commonly used diagnostic information in a named list.
sum1 <- mod1 %>% model_summary()
print(names(sum1))
#> [1] "run_details"       "run_heuristics"    "parameters_data"
#> [4] "parameter_names"   "ofv"               "shrinkage_details"
#> [7] "covariance_theta"  "correlation_theta"
These elements can be accessed manually or extracted with built-in helper functions like so:
param_df1 <- sum1 %>% param_estimates()
param_df1
#> # A tibble: 9 x 7
#>   names      estimate   stderr random_effect_sd random_effect_sdse fixed diag
#>   <chr>         <dbl>    <dbl>            <dbl>              <dbl> <int> <lgl>
#> 1 THETA1       2.31   8.61e- 2           NA              NA            0 NA
#> 2 THETA2      55.0    3.33e+ 0           NA              NA            0 NA
#> 3 THETA3     465.     2.96e+ 1           NA              NA            0 NA
#> 4 THETA4      -0.0806 5.55e- 2           NA              NA            0 NA
#> 5 THETA5       4.13   1.36e+ 0           NA              NA            0 NA
#> 6 OMEGA(1,1)   0.0964 2.00e- 2            0.311           0.0322       0 TRUE
#> 7 OMEGA(2,1)   0      1.00e+10            0      10000000000           1 FALSE
#> 8 OMEGA(2,2)   0.154  2.67e- 2            0.392           0.0341       0 TRUE
#> 9 SIGMA(1,1)   1      1.00e+10            1      10000000000           1 TRUE
Much of the benefit of rbabylon is leveraged in the model iteration workflow, and the run log summaries that can be created afterwards. For example, imagine you look at these model results and want to begin iterating on them with a new model.
Imagine you are now in a new R session and no longer have your mod1 object. You can easily rebuild it from the YAML file on disk. NOTE: You can pass either a character or integer to read_model(), submit_model(), etc. to identify your model, assuming that you have set your model directory correctly.
mod1 <- read_model(1)
Now the user can create a new model object, based on the original, copying and renaming the control stream in the process.
mod2 <- mod1 %>% copy_model_from(.new_model = 2, .description = "two compartment base model")
The new control stream file 2.ctl can now be edited with the desired changes and then submitted and summarized exactly as above.
mod2 %>% submit_model()
mod2 %>% model_summary()
After looking at these results, the user can add tags, which can later be used to organize your modeling runs.
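For example, a sketch of tagging this model (the tag string itself is arbitrary):

```r
# tags persist in the model's YAML file and can be filtered on in the run log
mod2 <- mod2 %>% add_tags("new base model")
```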
In addition to tags, the user can add decisions that can be referenced later.
mod2 <- mod2 %>% add_decisions("2 compartment model more appropriate than 1 compartment")
Now the iteration process continues.
mod3 <- mod2 %>% copy_model_from(.new_model = 3, .description = "two compartment with residual errors")
Submit and summarize as before.
mod3 %>% submit_model()
mod3 %>% model_summary()
Add tags and decisions for filtering in the run log, described next.
mod3 <- mod3 %>%
  add_tags("2 compartment") %>%
  add_decisions("Added residual errors because it seemed like a good idea.")
At any point, the user can easily construct a run log tibble to summarize all models run up to this point.
log_df <- run_log()
log_df
#> # A tibble: 3 x 8
#>   absolute_model_… yaml_md5 model_type description bbi_args based_on tags
#>   <chr>            <chr>    <chr>      <chr>       <list>   <list>   <lis>
#> 1 /tmp/RtmpDNwElm… 8521b38… nonmem     our first … <list [… <NULL>   <chr…
#> 2 /tmp/RtmpDNwElm… 63ee952… nonmem     two compar… <list [… <chr [1… <chr…
#> 3 /tmp/RtmpDNwElm… bfa56db… nonmem     two compar… <list [… <chr [1… <chr…
#> # … with 1 more variable: decisions <list>
The run_log() returns a tibble (called log_df in this example) which can be manipulated like any other tibble. However, several of the columns (tags and based_on for example) are list columns, which complicates how the user interacts with them. Future releases will have helper functions to more seamlessly interact with log_df, but until then, we have provided some sample dplyr code below.
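For example, a sketch using dplyr and purrr (loaded at the top of this vignette) to work with those list columns; the tag string matches one added earlier:

```r
# keep only models carrying a particular tag (tags is a list column,
# so map a predicate over it instead of comparing values directly)
two_cmt_runs <- log_df %>%
  filter(map_lgl(tags, ~ "2 compartment" %in% .x))

# collapse the decisions list column into a single character vector
# (models with no recorded decisions simply contribute nothing)
all_decisions <- unlist(log_df$decisions)
```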
Hopefully this has given you a good start on understanding the capabilities and basic workflow of rbabylon. Please see the other vignettes for demonstrations of more advanced or specific functionality.
These include, for example, using the based_on field to track a model’s ancestry through the model development process, as well as leveraging md5 digests to check whether older models are still up-to-date.