Introduction

This vignette demonstrates how to use summary_log() to extract model diagnostics like the objective function value, condition number, and parameter counts.

If you are new to rbabylon, the “Getting Started with rbabylon” vignette will take you through some basic scenarios for modeling with NONMEM using rbabylon, introducing you to its standard workflow and functionality.

There is a lot of information in the bbi_summary_log_df tibble that is output from summary_log(). However, it is important to note that all of this, and quite a bit more, is contained in the bbi_nonmem_summary object output from model_summary(). If you are trying to dig deep into the outputs of a small number of models, see the Summarize section of the “Getting Started” vignette for an introduction to that functionality.

summary_log() is more useful for getting a slightly higher-level view of a larger batch of models: potentially all the models in a given project, or a large group of bootstrapped runs.

Setup

Some initial setup is necessary before using rbabylon. Please refer to the “Getting Started” vignette, mentioned above, if you have not done this yet. Once that is done, load the library.
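A minimal sketch of that loading step, assuming rbabylon is already installed and configured as described in the “Getting Started” vignette (dplyr is loaded as well, since the examples below use `%>%`, select(), and filter()):

```r
library(rbabylon)
library(dplyr)  # provides %>%, select(), and filter() used throughout
```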

What’s in a Model Summary?

As mentioned above, the bbi_summary_log_df tibble contains a lot of information, which is a subset of what is contained in the bbi_nonmem_summary object returned from model_summary().

MODEL_DIR <- system.file("model", "nonmem", "complex", package = "rbabylon")

sum1 <- 
  read_model(file.path(MODEL_DIR, "iovmm")) %>%
  model_summary()

names(sum1)
#> [1] "absolute_model_path" "run_details"         "run_heuristics"     
#> [4] "parameters_data"     "parameter_names"     "ofv"                
#> [7] "condition_number"    "shrinkage_details"

For example, the run_details section alone contains a wealth of information about this model run:

str(sum1$run_details)
#> List of 16
#>  $ version               : chr "7.4.4"
#>  $ run_start             : chr "-999999999"
#>  $ run_end               : chr "Tue Aug 25 16:50:27 EDT 2020"
#>  $ estimation_time       : num 132
#>  $ covariance_time       : num 1.04
#>  $ cpu_time              : num 134
#>  $ function_evaluations  : int 380
#>  $ significant_digits    : int 3
#>  $ problem_text          : chr "LEM 10 mixture model and IOV on CL"
#>  $ mod_file              : chr "-999999999"
#>  $ estimation_method     : chr "First Order Conditional Estimation"
#>  $ data_set              : chr "../MixSim.csv"
#>  $ number_of_patients    : int 300
#>  $ number_of_obs         : int 12600
#>  $ number_of_data_records: int 13500
#>  $ output_files_used     : chr [1:5] "iovmm.lst" "iovmm.cpu" "iovmm.ext" "iovmm.grd" ...

Much of this is very useful, but it’s also a bit intimidating, and it can take some work to unpack it all and find the bits and pieces you’re looking for.
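Since the summary object is a nested list, individual diagnostics can be pulled out with standard list indexing. A small sketch, using the run_details fields shown above:

```r
# extract a single diagnostic from the nested run_details list
sum1$run_details$estimation_time
#> [1] 132
```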

What’s in the Summary Log?

The summary_log() function is designed to extract some of the most relevant diagnostics and model outputs from a batch of model summaries and organize them into a more easily digestible tibble. Like run_log() and config_log(), it takes two arguments:

  • .base_dir – Directory to look for models in.
  • .recurse – Logical indicating whether to search recursively in subdirectories. This is TRUE by default.

sum_df <- summary_log(MODEL_DIR)
names(sum_df)
#>  [1] "absolute_model_path"     "run"                    
#>  [3] "bbi_summary"             "error_msg"              
#>  [5] "needed_fail_flags"       "estimation_method"      
#>  [7] "problem_text"            "number_of_patients"     
#>  [9] "number_of_obs"           "ofv"                    
#> [11] "param_count"             "condition_number"       
#> [13] "any_heuristics"          "covariance_step_aborted"
#> [15] "large_condition_number"  "correlations_not_ok"    
#> [17] "parameter_near_boundary" "hessian_reset"          
#> [19] "has_final_zero_gradient" "minimization_terminated"
#> [21] "eta_pval_significant"    "prderr"

The specific columns returned are described below. There is also a list of them, with brief definitions, in the summary_log() docs, which can be accessed at any time with ?summary_log in the console.

Housekeeping Columns

The first column is absolute_model_path, which contains an absolute path that unambiguously identifies each model. This serves as the primary key for the tibble. The second column, run, is simply the basename of this path, which is convenient for printing and viewing.

The third column contains the bbi_nonmem_summary object, discussed above, for each model. This can be extracted and manipulated if you would like more detailed data from it.
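Because bbi_summary is a list column, a single model's full summary object can be pulled out with list indexing. A sketch, assuming the first row holds the model of interest:

```r
# pull the full bbi_nonmem_summary object for the first model in the log
full_sum <- sum_df$bbi_summary[[1]]

# then drill into it just as with the output of model_summary()
str(full_sum$run_details)
```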

The error_msg and needed_fail_flags columns describe whether bbi had any trouble parsing the model outputs. These won’t be discussed in detail here. Refer to the summary_log() docs for more information.

Run Details Columns

The next batch of columns contain the core diagnostics and model outputs. As mentioned above, descriptions of what each column contains can be found in the summary_log() docs.

sum_df %>% 
  collapse_to_string(estimation_method) %>%
  select(
    run,
    ofv, 
    param_count, 
    estimation_method, 
    problem_text, 
    number_of_patients, 
    number_of_obs, 
    condition_number
  )
#> # A tibble: 4 x 8
#>   run       ofv param_count estimation_meth… problem_text number_of_patie…
#>   <chr>   <dbl>       <int> <chr>            <chr>                   <int>
#> 1 1001    3843.          15 MCMC Bayesian A… Run# 1001.1               240
#> 2 acop…  44159.           9 First Order Con… LEM PK mode…               39
#> 3 exam… -10839.          21 Stochastic Appr… RUN# exampl…              400
#> 4 iovmm  14722.          11 First Order Con… LEM 10 mixt…              300
#> # … with 2 more variables: number_of_obs <int>, condition_number <dbl>

Run Heuristics Columns

The run_heuristics element of the bbi_nonmem_summary object contains a number of logical values indicating whether particular heuristic issues were found in the model. Note that these are not necessarily errors with the model run, but are closer to warning flags that should possibly be investigated. Each heuristic is described in more detail in the summary_log() docs.

Note that all heuristics will be FALSE by default (and never NA) and will only be TRUE if they are explicitly triggered. For example, large_condition_number will be FALSE even in the case when a condition number was not calculated at all.

All of the heuristic flags are pivoted out to their own columns in the bbi_summary_log_df tibble. It’s useful to note that, except for needed_fail_flags (discussed above), these are the only logical columns in the tibble and can therefore be easily selected with tidyselect::where(is.logical). (Note: where() only became available in tidyselect (>= 1.1.0), released May 2020.)

sum_df %>% select(run, where(is.logical), -needed_fail_flags)
#> # A tibble: 4 x 12
#>   run   needed_fail_fla… any_heuristics covariance_step… large_condition…
#>   <chr> <lgl>            <lgl>          <lgl>            <lgl>           
#> 1 1001  TRUE             TRUE           FALSE            TRUE            
#> 2 acop… FALSE            TRUE           FALSE            FALSE           
#> 3 exam… FALSE            FALSE          FALSE            FALSE           
#> 4 iovmm FALSE            TRUE           FALSE            FALSE           
#> # … with 7 more variables: correlations_not_ok <lgl>,
#> #   parameter_near_boundary <lgl>, hessian_reset <lgl>,
#> #   has_final_zero_gradient <lgl>, minimization_terminated <lgl>,
#> #   eta_pval_significant <lgl>, prderr <lgl>

Notice that there is also an any_heuristics column, which can easily be used to filter to only runs that had at least one heuristic flag triggered.
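That filtering can be sketched with dplyr like so:

```r
# keep only the runs where at least one heuristic flag was triggered
sum_df %>%
  filter(any_heuristics) %>%
  select(run, where(is.logical), -needed_fail_flags)
```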

Add Summary

Just like config_log() has add_config(), you can also use add_summary() to join all of these columns onto an existing bbi_run_log_df (the tibble output from run_log()). This can be useful if you have a run log that you have previously filtered on something like the tags or based_on columns, and you would like to append some simple diagnostics.

# contrived example: in real life this would be filtering on tags, based_on, etc.
log_df <- run_log(MODEL_DIR)
log_df <- log_df[2:3, ]

# add summary columns
log_df <- log_df %>% add_summary()

log_df %>% select(run, tags, ofv, param_count, any_heuristics)
#> # A tibble: 2 x 5
#>   run              tags       ofv param_count any_heuristics
#>   <chr>            <list>   <dbl>       <int> <lgl>         
#> 1 acop-iov         <NULL>  44159.           9 TRUE          
#> 2 example2_saemimp <NULL> -10839.          21 FALSE