
This vignette demonstrates how to use the based_on field to track a model’s ancestry through the model development process. You will also see one common use for this: using the tibble output from config_log() to check that your models are up-to-date. By “up-to-date” we mean that none of the model files or data files have changed since the model was run.

If you are new to rbabylon, the “Getting Started with rbabylon” vignette will take you through some basic scenarios for modeling with NONMEM using rbabylon, introducing you to its standard workflow and functionality.


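Some initial setup is necessary before using rbabylon. Please refer to the “Getting Started” vignette, mentioned above, if you have not done this yet. Once this is done, load the library.

library(rbabylon)
library(dplyr) # provides %>%, filter(), pull(), select(), and bind_rows() used below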

Modeling process

The modeling process will always start with an initial model, which we create with the new_model() call.

MODEL_DIR <- "../nonmem"
mod1 <- new_model(file.path(MODEL_DIR, 1))

From there, the iterative model development process proceeds. The copy_model_from() function will do several things, including creating a new model file and filling in some relevant metadata. Notably, it will also add the model that you copied from into the based_on field for the new model.

mod2 <- copy_model_from(.parent_mod = mod1, .new_model = 2)
mod2$based_on
#> [1] "1"

NOTE: In a real model development process, each model would be run and its diagnostics examined before moving on. For the sake of brevity, imagine that all of this happens “behind the curtain” in this example. In other words, between each of the calls to copy_model_from() you would be doing all of the normal iterative modeling work.

# ...submit mod2...look at diagnostics...decide on changes for next iteration...

mod3 <- copy_model_from(mod2, 3)

# ...submit mod3...look at diagnostics...decide on changes for next iteration...

mod4 <- copy_model_from(mod3, 4)

# ...submit mod4...look at diagnostics...decide to go back to mod2 as basis for next iteration...

mod5 <- copy_model_from(mod2, 5)

# ...submit mod5...look at diagnostics...decide on changes for next iteration...

mod6 <- copy_model_from(mod5, 6)

# ...submit mod6...look at diagnostics...decide you're done!

Now that you have arrived at your final model, you can add a description to identify it, which will be used shortly for filtering the run_log() tibble.

mod6 <- mod6 %>% add_description("Final model")

Operating on a model object

As seen above, you can simply use mod$based_on to see what is stored in the based_on field of a given model. However, there are two additional helper functions that are useful to know.


First, by using get_based_on() you can retrieve the absolute path to all models in the based_on field.

mod6 %>% get_based_on()
#> [1] "/data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/rbabylon/model/nonmem/basic/5"

This is useful because the path(s) retrieved will unambiguously identify the parent model(s) and can therefore be passed to things like read_model() or model_summary() like so:

parent_mod <- mod6 %>% get_based_on() %>% read_model()
str(parent_mod)
#> List of 5
#>  $ model_type         : chr "nonmem"
#>  $ based_on           : chr "2"
#>  $ absolute_model_path: chr "/data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/rbabylon/model/nonmem/basic/5"
#>  $ yaml_md5           : chr "ac0ba292b017a72bb316359f0df09bb2"
#>  $ bbi_args           : list()
#>  - attr(*, "class")= chr [1:2] "bbi_nonmem_model" "list"


The second helper function walks up the tree of inheritance by iteratively calling get_based_on() on each parent model to determine the full set of models that led up to the current model.

mod6 %>% get_model_ancestry()
#> [1] "/data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/rbabylon/model/nonmem/basic/1"
#> [2] "/data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/rbabylon/model/nonmem/basic/2"
#> [3] "/data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/rbabylon/model/nonmem/basic/5"

In this case, model 6 was based on 5, which was based on 2, which in turn was based on 1. You will see one example of how this can be useful in the “Final model family” section below.
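To make the walk concrete, here is a minimal sketch of the same idea in base R, using a plain named list in place of real model objects. The based_on list and the ancestry() helper are hypothetical stand-ins written for illustration; they are not rbabylon's implementation.

```r
# Hypothetical stand-in for the based_on fields of the six models above:
# each name is a model, each value is the model(s) it was copied from.
based_on <- list(
  "1" = character(0),
  "2" = "1",
  "3" = "2",
  "4" = "3",
  "5" = "2",
  "6" = "5"
)

# Recursively collect every model reachable by following based_on upward.
ancestry <- function(model, based_on) {
  parents <- based_on[[model]]
  found <- parents
  for (p in parents) {
    found <- c(found, ancestry(p, based_on))
  }
  sort(unique(found))
}

ancestry("6", based_on)
#> [1] "1" "2" "5"
```

Models 3 and 4 never appear because nothing on the path from model 6 back to model 1 was copied from them.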

Using the run log

While it may be useful to look at the ancestry of a single model object, it may be even more useful to use the based_on field later in the modeling process when you are looking back and trying to summarize the model activities as a whole. The run_log() function is helpful for this. It returns a tibble with metadata about each model.

log_df <- run_log(MODEL_DIR)
log_df
#> # A tibble: 6 x 10
#>   absolute_model_… run   yaml_md5 model_type description bbi_args based_on tags 
#>   <chr>            <chr> <chr>    <chr>      <chr>       <list>   <list>   <lis>
#> 1 /data/GHE/mpn/d… 1     3e3e686… nonmem     <NA>        <list [… <NULL>   <NUL…
#> 2 /data/GHE/mpn/d… 2     b992cc4… nonmem     <NA>        <list [… <chr [1… <NUL…
#> 3 /data/GHE/mpn/d… 3     ac0ba29… nonmem     <NA>        <list [… <chr [1… <NUL…
#> 4 /data/GHE/mpn/d… 4     6818db1… nonmem     <NA>        <list [… <chr [1… <NUL…
#> 5 /data/GHE/mpn/d… 5     ac0ba29… nonmem     <NA>        <list [… <chr [1… <NUL…
#> 6 /data/GHE/mpn/d… 6     feb9bf3… nonmem     Final model <list [… <chr [1… <NUL…
#> # … with 2 more variables: notes <list>, decisions <list>

Among other things, the run log contains any descriptions that have been assigned to each model. Here we use dplyr::filter() and dplyr::pull() to get the path to the final model.

final_model_path <- 
  log_df %>% 
  filter(description == "Final model") %>%
  pull(absolute_model_path)

final_model_path
#> [1] "/data/GHE/mpn/deployment/deployments/2021-01-04/renv/library/R-3.6/x86_64-pc-linux-gnu/rbabylon/model/nonmem/basic/6"

Next we can use the get_model_ancestry() function to filter the tibble to only the models that led up to the final model.

log_df %>% 
  filter(absolute_model_path %in% get_model_ancestry(final_model_path)) %>%
  collapse_to_string(based_on) %>% # collapses list column for easier printing
  select(run, based_on)
#> # A tibble: 3 x 2
#>   run   based_on
#>   <chr> <chr>   
#> 1 1     <NA>    
#> 2 2     1       
#> 3 5     2

As you can see, models 3 and 4 are excluded because they did not lead to the final model. Review the “Modeling process” section above if you are not sure why this is the case. We will use the two techniques together in the “Final model family” section below.

Checking if models are up-to-date with config_log()

Now imagine you are coming back to this project some time later and want to make sure that all of the outputs you have are still up-to-date with the model files and data currently in the project.

When babylon runs a model, it creates a file named bbi_config.json in the output directory. This file contains a lot of information about the state and configuration at the time when the model was run. Notably, it contains an md5 digest of both the model file and the data file at execution time.

The config_log() function parses these bbi_config.json files and extracts some relevant information to a bbi_config_log_df tibble. The model_has_changed and data_has_changed columns compare the md5 digests (stored during model execution) against the model and data files as they currently exist on disk at the time config_log() is called. This serves as a check that the outputs are up-to-date with the current model and data.
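The digest comparison itself can be sketched in a few lines of base R. This is only an illustration of the idea, assuming a temporary stand-in file and tools::md5sum(); the file contents and variable names are hypothetical, and this is not how rbabylon or babylon computes or stores the digests internally.

```r
# Write a small stand-in "model file" to a temporary location.
model_file <- tempfile(fileext = ".ctl")
writeLines("$PROBLEM example model", model_file)

# Record the digest, as would happen at execution time.
md5_at_run <- unname(tools::md5sum(model_file))

# Unchanged file: the stored and current digests match, so this is FALSE.
changed_before <- !identical(md5_at_run, unname(tools::md5sum(model_file)))

# Edit the file, then compare again: the digests now differ, so this is TRUE.
cat("; a later edit\n", file = model_file, append = TRUE)
changed_after <- !identical(md5_at_run, unname(tools::md5sum(model_file)))
```

Any edit to the file, however small, produces a different digest, which is why a TRUE in model_has_changed or data_has_changed means the outputs no longer correspond to what is on disk.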

You can call config_log() directly, but it is often more convenient to join its output to a run log with run_log() %>% add_config().

log_df <- log_df %>% add_config()
log_df %>% select(run, model_has_changed, data_has_changed)
#> # A tibble: 6 x 3
#>   run   model_has_changed data_has_changed
#>   <chr> <lgl>             <lgl>           
#> 1 1     FALSE             FALSE           
#> 2 2     FALSE             FALSE           
#> 3 3     TRUE              FALSE           
#> 4 4     TRUE              FALSE           
#> 5 5     FALSE             FALSE           
#> 6 6     FALSE             FALSE
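If only the out-of-date models are of interest, the same two columns can serve as a filter. The sketch below rebuilds the table above by hand as a plain data frame (the status object is a hypothetical stand-in, not read from disk) so that it runs without any model files. With a real run log, the equivalent dplyr step would be filter(model_has_changed | data_has_changed) applied to log_df.

```r
# Hand-built copy of the columns printed above (not read from disk).
status <- data.frame(
  run = as.character(1:6),
  model_has_changed = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE),
  data_has_changed = rep(FALSE, 6),
  stringsAsFactors = FALSE
)

# Keep only the runs whose model file or data file has changed.
out_of_date <- status[status$model_has_changed | status$data_has_changed, ]
out_of_date$run
#> [1] "3" "4"
```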

Final model family

From the model_has_changed column in the previous example, you can see that some of the model files have changed since they were run. However, you may only care about your final model and the models that led to it. You can use the description and based_on columns from the run_log() to filter to only those models.

final_model_family <- bind_rows(
  log_df %>% 
    filter(absolute_model_path %in% get_model_ancestry(final_model_path)), # the ancestors of the final model
  log_df %>% 
    filter(description == "Final model") # the final model itself
)

final_model_family %>% 
  collapse_to_string(based_on) %>%
  select(run, based_on, description, model_has_changed, data_has_changed)
#> # A tibble: 4 x 5
#>   run   based_on description model_has_changed data_has_changed
#>   <chr> <chr>    <chr>       <lgl>             <lgl>           
#> 1 1     <NA>     <NA>        FALSE             FALSE           
#> 2 2     1        <NA>        FALSE             FALSE           
#> 3 5     2        <NA>        FALSE             FALSE           
#> 4 6     5        Final model FALSE             FALSE

When we filter to only those models, you can see that they are all still up-to-date. Great news.