Introduction

This vignette demonstrates how to use the based_on field to track a model’s ancestry through the model development process. You will also see one common use for this: using the tibble output from run_log() to check that your models are up-to-date. By “up-to-date” we mean that none of the model files or data files have changed since the model was run.

If you are new to rbabylon, the “Getting Started with rbabylon” vignette will take you through some basic scenarios for modeling with NONMEM using rbabylon, introducing you to its standard workflow and functionality.

Setup

Some initial setup is necessary before using rbabylon. If you have not done this yet, please refer to the “Getting Started” vignette mentioned above. Once setup is complete, load the library and set your modeling directory.

library(rbabylon)
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(purrr))
suppressPackageStartupMessages(library(fs))

Modeling process

The modeling process always starts with an initial model, which we create with new_model().

mod1 <- new_model(.yaml_path = "1.yaml", .description = "one compartment base model")

From there, the iterative model development process proceeds. The copy_model_from() function does several things, including creating a new model file and filling in relevant metadata. Notably, it also records the model you copied from in the new model’s based_on field.

mod2 <- copy_model_from(.parent_mod = mod1, .new_model = 2, .description = "two compartment base model")

mod2$based_on
#> [1] "1"

NOTE: In a real model development process, these models would obviously be run and the diagnostics examined before moving on. For the sake of brevity, imagine that all happens “behind the curtain” in this example. In other words, between successive calls to copy_model_from() you would be doing all of the normal iterative modeling work.

# ...submit mod2...look at diagnostics...decide on changes for next iteration...

mod3 <- copy_model_from(.parent_mod = mod2, .new_model = 3, .description = "two compartment with residual errors")

# ...submit mod3...look at diagnostics...decide on changes for next iteration...

mod4 <- copy_model_from(.parent_mod = mod3, .new_model = 4, .description = "two compartment with residual errors and added covariates")

# ...submit mod4...look at diagnostics...decide to go back to mod2 as basis for next iteration...

mod5 <- copy_model_from(.parent_mod = mod2, .new_model = 5, .description = "two compartment base model with added covariates")

# ...submit mod5...look at diagnostics...decide on changes for next iteration...

mod6 <- copy_model_from(.parent_mod = mod5, .new_model = 6, .description = "two compartment base model with added covariates and something else")

# ...submit mod6...look at diagnostics...decide you're done!
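The copy_model_from() calls above produce a small tree rather than a straight line: models 3 and 5 were both copied from model 2. As a rough illustration only (plain base R, not rbabylon code), you can picture the based_on fields as a named list of parent IDs. The names based_on and children_of() are hypothetical.

```r
# Hypothetical stand-in for the based_on fields created above:
# each name is a model ID, each value is the model it was copied from.
# NULL marks the initial model, which has no parent.
based_on <- list(
  "1" = NULL,
  "2" = "1",
  "3" = "2",
  "4" = "3",
  "5" = "2",
  "6" = "5"
)

# Invert the parent links to find the models that were copied from `id`.
# Using %in% means a model with several parents is also handled.
children_of <- function(parents, id) {
  has_parent <- vapply(parents, function(p) id %in% p, logical(1))
  names(parents)[has_parent]
}

children_of(based_on, "2")
#> [1] "3" "5"
```

Model 2 is the branch point: model 3 starts the branch that was abandoned after model 4, while model 5 leads to the final model.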

Now that you have arrived at your final model, you can add a tag to it, which will be used shortly for filtering the run_log() tibble.

mod6 <- mod6 %>% add_tags("final model")

Operating on a model object

As seen above, you can simply use mod$based_on to see what is stored in the based_on field of a given model. However, there are two additional helper functions that are useful to know about.

get_based_on

First, get_based_on() retrieves the absolute paths of all models in the based_on field.

mod6 %>% get_based_on()
#> [1] "/tmp/RtmpDNwElm/rbabylon-0.7.0/inst/nonmem/5"

This is useful because the returned path(s) unambiguously identify the parent model(s) and can therefore be passed to functions such as read_model() or model_summary(), like so:

parent_mod <- mod6 %>% get_based_on() %>% read_model()
str(parent_mod)
#> List of 9
#>  $ description      : chr "two compartment base model with added covariates"
#>  $ based_on         : chr "2"
#>  $ model_type       : chr "nonmem"
#>  $ model_path       : chr "5.ctl"
#>  $ model_working_dir: chr "/tmp/RtmpDNwElm/rbabylon-0.7.0/inst/nonmem"
#>  $ orig_yaml_file   : chr "5.yaml"
#>  $ yaml_md5         : chr "06d4b25a18ad1a729fe0498ff9cd7f60"
#>  $ bbi_args         : list()
#>  $ output_dir       : chr "5"
#>  - attr(*, "class")= chr [1:2] "bbi_nonmem_model" "list"
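In this example, based_on entries are stored relative to the model’s working directory: parent_mod$based_on is "2" while its model_working_dir is the nonmem directory, and get_based_on() for model 6 returned that directory joined with "5". A minimal sketch of that resolution step, assuming entries are always relative to model_working_dir, follows. The helper resolve_based_on() and the /project/nonmem path are hypothetical, and this is not rbabylon’s actual implementation.

```r
# Hypothetical helper: resolve based_on entries, assumed to be relative to
# the model's working directory, into absolute paths.
resolve_based_on <- function(model_working_dir, based_on) {
  if (length(based_on) == 0) {
    return(character(0))  # e.g. the initial model, whose based_on is NULL
  }
  file.path(model_working_dir, based_on)
}

resolve_based_on("/project/nonmem", "2")
#> [1] "/project/nonmem/2"
```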

get_model_ancestry

The second helper function walks up the tree of inheritance by iteratively calling get_based_on() on each parent model, collecting the full set of models that led up to the current model.

mod6 %>% get_model_ancestry()
#> [1] "/tmp/RtmpDNwElm/rbabylon-0.7.0/inst/nonmem/1"
#> [2] "/tmp/RtmpDNwElm/rbabylon-0.7.0/inst/nonmem/2"
#> [3] "/tmp/RtmpDNwElm/rbabylon-0.7.0/inst/nonmem/5"

In this case, model 6 was based on 5, which was based on 2, which in turn was based on 1. You will see one example of how this can be useful in the “Final model family” section below.
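The iterative walk described above can be sketched in base R, using a hypothetical named list of parent IDs in place of real model objects. This illustrates the idea only and is not rbabylon’s code; based_on and ancestry() are made-up names.

```r
# Hypothetical parent links for this example (NULL = no parent).
based_on <- list("1" = NULL, "2" = "1", "3" = "2", "4" = "3", "5" = "2", "6" = "5")

# Start from the model's own parents, then repeatedly add the parents of
# every newly found ancestor until nothing new turns up. setdiff() keeps
# the loop from revisiting models, so it also terminates on a cycle.
ancestry <- function(parents, id) {
  found <- character(0)
  frontier <- parents[[id]]
  while (length(frontier) > 0) {
    new <- setdiff(frontier, found)
    found <- c(found, new)
    frontier <- unlist(lapply(new, function(x) parents[[x]]), use.names = FALSE)
  }
  sort(found)
}

ancestry(based_on, "6")
#> [1] "1" "2" "5"

# Models outside model 6's lineage (neither ancestors nor model 6 itself)
setdiff(names(based_on), c(ancestry(based_on, "6"), "6"))
#> [1] "3" "4"
```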

Using the run log

Looking at the ancestry of a single model object is useful, but the based_on field is often even more valuable later in the project, when you are looking back and trying to summarize the modeling activity as a whole. The run_log() function helps with this: it returns a tibble with one row of metadata per model.

log_df <- run_log()
log_df
#> # A tibble: 6 x 8
#>   absolute_model_… yaml_md5 model_type description bbi_args based_on tags 
#>   <chr>            <chr>    <chr>      <chr>       <list>   <list>   <lis>
#> 1 /tmp/RtmpDNwElm… 8e2bfb1… nonmem     one compar… <list [… <NULL>   <NUL…
#> 2 /tmp/RtmpDNwElm… 7bf7577… nonmem     two compar… <list [… <chr [1… <NUL…
#> 3 /tmp/RtmpDNwElm… 0a39c21… nonmem     two compar… <list [… <chr [1… <NUL…
#> 4 /tmp/RtmpDNwElm… 685b34c… nonmem     two compar… <list [… <chr [1… <NUL…
#> 5 /tmp/RtmpDNwElm… 06d4b25… nonmem     two compar… <list [… <chr [1… <NUL…
#> 6 /tmp/RtmpDNwElm… d5aa7c4… nonmem     two compar… <list [… <chr [1… <chr…
#> # … with 1 more variable: decisions <list>
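In the run log, based_on is a list column: NULL for model 1 and a character vector for each of the others. If you want a flat child-parent table (for example, to draw the model tree), one base-R sketch is to repeat each model ID once per parent. The vectors model and based_on below are toy stand-ins mirroring the six models in this example, not columns pulled from rbabylon.

```r
# Toy stand-ins mirroring the run log: model IDs and their based_on entries.
model    <- as.character(1:6)
based_on <- list(NULL, "1", "2", "3", "2", "5")

# Repeat each model ID once per parent; NULL entries contribute no rows,
# and a model with several parents would contribute several rows.
n_parents <- lengths(based_on)
edges <- data.frame(
  child  = rep(model, times = n_parents),
  parent = unlist(based_on, use.names = FALSE),
  stringsAsFactors = FALSE
)
edges
```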

Filtering tags example

Among other things, the run log contains the tags that have been assigned to each model. Because tags is a list column, we use purrr::map_lgl() to test each model’s tags and filter the run log down to the final model.

final_model_path <- log_df %>%
  filter(map_lgl(tags, ~ "final model" %in% .x)) %>%
  get_model_path()

final_model_path
#> [1] "/tmp/RtmpDNwElm/rbabylon-0.7.0/inst/nonmem/6.ctl"
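The map_lgl() filter also works for models with no tags, because %in% against NULL returns FALSE. A base-R sketch of the same check with vapply(), using a toy tags list that mirrors the run log above (only model 6 is tagged):

```r
# Toy stand-in for the run log's tags list column: only model 6 is tagged.
tags <- list(NULL, NULL, NULL, NULL, NULL, "final model")

# "final model" %in% NULL is FALSE, so untagged models simply drop out.
is_final <- vapply(tags, function(x) "final model" %in% x, logical(1))
which(is_final)
#> [1] 6
```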

Next we can use the get_model_ancestry() function to filter the tibble to only the models that led up to the final model.

log_df %>%
  filter(absolute_model_path %in% get_model_ancestry(final_model_path))
#> # A tibble: 3 x 8
#>   absolute_model_… yaml_md5 model_type description bbi_args based_on tags 
#>   <chr>            <chr>    <chr>      <chr>       <list>   <list>   <lis>
#> 1 /tmp/RtmpDNwElm… 8e2bfb1… nonmem     one compar… <list [… <NULL>   <NUL…
#> 2 /tmp/RtmpDNwElm… 7bf7577… nonmem     two compar… <list [… <chr [1… <NUL…
#> 3 /tmp/RtmpDNwElm… 06d4b25… nonmem     two compar… <list [… <chr [1… <NUL…
#> # … with 1 more variable: decisions <list>

As you can see, models 3 and 4 are dropped because they did not lead to the final model. Review the “Modeling process” section above if you are not sure why this is the case. We will use the two techniques together in the “Final model family” section below.

Checking if models are up-to-date with config_log()

Now imagine you are coming back to this project some time later and want to make sure that all of the outputs you have are still up-to-date with the model files and data currently in the project.

When babylon runs a model, it creates a file named bbi_config.json in the output directory. This file records detailed information about the state and configuration at run time, including an md5 digest of both the model file and the data file. Comparing these stored digests to the current md5 digests of the same two files shows whether either file has changed since the model was last run. If neither has changed, the outputs are up-to-date with the model and data.
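The idea behind this check can be shown with base R alone. The sketch below writes a throwaway file, records its md5 digest as if at run time, edits the file, and compares digests again. The file and its contents are hypothetical; tools::md5sum() is the same function used later in this vignette.

```r
# Write a throwaway "model file" (hypothetical content) to a temp location.
model_file <- tempfile(fileext = ".ctl")
writeLines("$PROBLEM example model", model_file)

# Digest recorded "at run time"
stored_md5 <- unname(tools::md5sum(model_file))

# Later, the file is edited...
writeLines("$PROBLEM example model, edited", model_file)

# ...so its current digest no longer matches the stored one.
current_md5 <- unname(tools::md5sum(model_file))
identical(stored_md5, current_md5)
#> [1] FALSE
```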

You can call config_log() directly, but it is usually more convenient to join its columns onto the run log with add_config(), as in run_log() %>% add_config().

log_df <- log_df %>% add_config()
log_df
#> # A tibble: 6 x 11
#>   absolute_model_… yaml_md5 model_type description bbi_args based_on tags 
#>   <chr>            <chr>    <chr>      <chr>       <list>   <list>   <lis>
#> 1 /tmp/RtmpDNwElm… 8e2bfb1… nonmem     one compar… <list [… <NULL>   <NUL…
#> 2 /tmp/RtmpDNwElm… 7bf7577… nonmem     two compar… <list [… <chr [1… <NUL…
#> 3 /tmp/RtmpDNwElm… 0a39c21… nonmem     two compar… <list [… <chr [1… <NUL…
#> 4 /tmp/RtmpDNwElm… 685b34c… nonmem     two compar… <list [… <chr [1… <NUL…
#> 5 /tmp/RtmpDNwElm… 06d4b25… nonmem     two compar… <list [… <chr [1… <NUL…
#> 6 /tmp/RtmpDNwElm… d5aa7c4… nonmem     two compar… <list [… <chr [1… <chr…
#> # … with 4 more variables: decisions <list>, model_md5 <chr>, data_path <chr>,
#> #   data_md5 <chr>
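Going from 8 to 11 columns, add_config() appears to attach model_md5, data_path, and data_md5 to each row of the run log, matched on absolute_model_path. Assuming it behaves like a left join on that column (an inference from the output above, not a documented guarantee), a base-R equivalent looks roughly like this. Both tables, their paths, and the md5 value are placeholders.

```r
# Placeholder run log: two models (paths and values are hypothetical).
runs <- data.frame(
  absolute_model_path = c("/project/nonmem/1", "/project/nonmem/2"),
  description = c("one compartment base model", "two compartment base model"),
  stringsAsFactors = FALSE
)

# Placeholder config log: only model 1 has a bbi_config.json in this sketch.
configs <- data.frame(
  absolute_model_path = "/project/nonmem/1",
  model_md5 = "placeholder-md5-for-model-1",
  stringsAsFactors = FALSE
)

# Left join: keep every run-log row, add config columns where they exist.
joined <- merge(runs, configs, by = "absolute_model_path", all.x = TRUE)
joined
```

In this sketch, model 2 has no config row, so its model_md5 is NA rather than the row being dropped.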

Now you can easily check the md5 digests from the config log against those of the current files.

# extract path to data files and model files
norm_data_paths <- fs::path_norm(file.path(log_df$absolute_model_path, log_df$data_path))
norm_model_paths <- get_model_path(log_df)

# get md5 digests and compare to those from config_log() columns
log_df <- log_df %>%
  mutate(
    current_data_md5  = tools::md5sum(norm_data_paths),
    data_md5_match    = .data$data_md5 == .data$current_data_md5,
    current_model_md5 = tools::md5sum(norm_model_paths),
    model_md5_match   = .data$model_md5 == .data$current_model_md5
  )

log_df %>% select(absolute_model_path, data_md5_match, model_md5_match)
#> # A tibble: 6 x 3
#>   absolute_model_path                          data_md5_match model_md5_match
#>   <chr>                                        <lgl>          <lgl>          
#> 1 /tmp/RtmpDNwElm/rbabylon-0.7.0/inst/nonmem/1 TRUE           TRUE           
#> 2 /tmp/RtmpDNwElm/rbabylon-0.7.0/inst/nonmem/2 TRUE           TRUE           
#> 3 /tmp/RtmpDNwElm/rbabylon-0.7.0/inst/nonmem/3 TRUE           FALSE          
#> 4 /tmp/RtmpDNwElm/rbabylon-0.7.0/inst/nonmem/4 TRUE           FALSE          
#> 5 /tmp/RtmpDNwElm/rbabylon-0.7.0/inst/nonmem/5 TRUE           TRUE           
#> 6 /tmp/RtmpDNwElm/rbabylon-0.7.0/inst/nonmem/6 TRUE           TRUE
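To list only the models whose outputs may be out of date, keep the rows where either check failed. The base-R sketch below uses a toy data frame with the match values copied from the table above; with the real log_df, the same condition could go inside dplyr’s filter().

```r
# Match values copied from the table above (model IDs shortened).
check <- data.frame(
  model           = as.character(1:6),
  data_md5_match  = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  model_md5_match = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# A model is stale if its data file or its model file changed since the run.
stale <- check$model[!(check$data_md5_match & check$model_md5_match)]
stale
#> [1] "3" "4"
```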

Final model family

From the model_md5_match column in the previous example, you can see that the model files for models 3 and 4 have changed since they were run. However, you may only care about your final model and the models that led to it.

final_model_family <- bind_rows(
  log_df %>%
    filter(absolute_model_path %in% get_model_ancestry(final_model_path)), # the ancestors of the final model
  log_df %>%
    filter(map_lgl(tags, ~ "final model" %in% .x)) # the final model itself
)
final_model_family %>% select(absolute_model_path, data_md5_match, model_md5_match)
#> # A tibble: 4 x 3
#>   absolute_model_path                          data_md5_match model_md5_match
#>   <chr>                                        <lgl>          <lgl>          
#> 1 /tmp/RtmpDNwElm/rbabylon-0.7.0/inst/nonmem/1 TRUE           TRUE           
#> 2 /tmp/RtmpDNwElm/rbabylon-0.7.0/inst/nonmem/2 TRUE           TRUE           
#> 3 /tmp/RtmpDNwElm/rbabylon-0.7.0/inst/nonmem/5 TRUE           TRUE           
#> 4 /tmp/RtmpDNwElm/rbabylon-0.7.0/inst/nonmem/6 TRUE           TRUE

After filtering to only those models, you can see that they are all still up-to-date. Great news.
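get_model_ancestry() returns only the ancestors, which is why the final model is added separately with bind_rows(). If you prefer a single filter, note that get_model_path() returned .../6.ctl while absolute_model_path holds .../6 with no extension. Assuming that relationship holds, you can strip the extension and combine it with the ancestry before filtering once. The base-R sketch below uses hypothetical /project/nonmem paths in place of the real ones.

```r
# Hypothetical paths standing in for the real run-log values.
all_paths        <- paste0("/project/nonmem/", 1:6)
final_model_path <- "/project/nonmem/6.ctl"
ancestor_paths   <- paste0("/project/nonmem/", c(1, 2, 5))

# Strip the ".ctl" extension so the final model matches its run-log path,
# then filter once on the combined set of family paths.
family_paths <- c(ancestor_paths, tools::file_path_sans_ext(final_model_path))
family <- all_paths[all_paths %in% family_paths]
family
```

With the real objects, family_paths would be built from get_model_ancestry(final_model_path) plus the extension-stripped final_model_path, and the %in% test would go inside filter().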

Using the based_on field

Seth Green