nonmem-parallel.RmdThis file shows some helpful patterns for using bbr with NONMEM, mostly relating to running models with parallel execution, either multiple cores on a local machine (head node, laptop, etc.) or by executing on a grid.
library(bbr)
MODEL_DIR <- system.file("model", "nonmem" , "pk-parallel", package = "bbr")The bbi.yaml file is created when bbi_init() is originally called. It usually lives in the same folder as your control streams. This file contains “global” settings for running bbi.
You can set a default number of threads (the number of cores bbi will use for each model) in this file. This will be used whenever you call submit_model(), but can be overriden by passing submit_model(..., .bbi_args = list(threads = SOME_INTEGER)).
It defaults to parallel: false so you need to change to parallel: true or you will need to pass .bbi_args = list(parallel=TRUE) to each of your submit_model() calls to make them respect the threads argument.
You can also pass through .bbi_args to bbi_init() to set the global defaults when you first create the bbi.yaml
For small runs, use a 4-core or 8-core head node and just do submit_model(..., .mode = "local").
This will execute the model on your head node, means you will not have to wait for worker nodes to spin up.
This is especially nice for new models that are fairly simple and you just want a quick result back.
mod <- read_model(file.path(MODEL_DIR, 100))
submit_model(
mod,
.mode = "local",
.bbi_args = list(parallel = TRUE, threads = 4) # not needed if set in bbi.yaml
)Running:
bbi nonmem run local /data/home/sethg/bbr/inst/model/nonmem/pk-parallel/100.ctl --overwrite --threads=4 --parallel
In /data/home/sethg/bbr/inst/model/nonmem/pk-parallel
Process finished.
You can also set options("bbr.bbi_exe_mode" = "local") to default to local execution for all models, though this will reset with each new R session.
You can pass use submit_model(..., .wait = FALSE) to get you console back immediately.
This is only relevant for .mode = "local" because grid jobs give you your console as soon as the job is submitted to the grid (which typically takes less than a second).
mod <- read_model(file.path(MODEL_DIR, 200))
proc <- submit_model(
mod,
.mode = "local",
.wait = FALSE, # this is what gets you your console back
.bbi_args = list(parallel = TRUE, threads = 8) # not needed if set in bbi.yaml
)There are a few ways to check on your model while it runs:
mod
── Status ──────────────────────────────────────────────────────────────────────
• Incomplete Run
── Absolute Model Path ─────────────────────────────────────────────────────────
• /data/home/sethg/bbr/inst/model/nonmem/pk-parallel/200
── YAML & Model Files ──────────────────────────────────────────────────────────
• /data/home/sethg/bbr/inst/model/nonmem/pk-parallel/200.yaml
• /data/home/sethg/bbr/inst/model/nonmem/pk-parallel/200.ctl
── Description ─────────────────────────────────────────────────────────────────
• original acop model
── Tags ────────────────────────────────────────────────────────────────────────
• acop tag
• other tag
── BBI Args ────────────────────────────────────────────────────────────────────
• overwrite: TRUE
• threads: 4
bbr::tail_*() helpers to check on the .lst or OUTPUT files.
tail_lst(mod)Mon Sep 27 12:37:49 EDT 2021
$PROBLEM From bbr: see 107.yaml for details
...
WARNINGS AND ERRORS (IF ANY) FOR PROBLEM 1
(WARNING 2) NM-TRAN INFERS THAT THE DATA ARE POPULATION.
tail_output(mod, .head = 0, .tail = 10)...
1.1744E-01 -2.8718E+02 -1.4260E+02 -1.2769E+02 3.1651E+02
0ITERATION NO.: 1 OBJECTIVE VALUE: 31860.4697608811 NO. OF FUNC. EVALS.: 15
CUMULATIVE NO. OF FUNC. EVALS.: 26
NPARAMETR: 4.9843E-01 3.6746E+00 1.0057E+00 4.4079E+00 1.9381E+00 9.9679E-01 9.9702E-01 4.9976E-01 2.0001E-01 9.9998E-03
1.0000E-02 2.0002E-01 1.0003E-02 2.0001E-01 4.9994E-02
PARAMETER: 9.9686E-02 1.0499E-01 1.0057E-01 1.1020E-01 9.6903E-02 9.9679E-02 9.9702E-02 9.9952E-02 1.0001E-01 9.9996E-02
1.0000E-01 1.0005E-01 1.0003E-01 1.0002E-01 9.9941E-02
GRADIENT: 2.9997E+03 -1.2157E+04 -3.9326E+03 5.4698E+04 2.2187E+04 1.5492E+03 1.5490E+03 2.5233E+02 -2.5887E+02 2.7454E+00
3.2905E+01 3.4014E+01 -9.9727E+01 -1.5446E+02 2.2149E+02
tail -f in the terminal to “follow” the files as they change.
lst_path <- build_path_from_model(mod, ".lst")
cat(paste("tail -f", lst_path))tail -f /data/home/sethg/bbr/inst/model/nonmem/pk-parallel/200/200.lst
processx object including in the bbi_process object.
submit_model() to a variable (proc, above)processx docs for details on available methods.
proc$process$is_alive()TRUE
You can submit a dummy job to the grid to get it to bring up a worker, while you finish preparing your run. For SGE, this is done with qsub.
system("echo 'sleep 10' | qsub")Unable to run job: warning: sethg's job is not allowed to run in any queue
Your job 1 ("STDIN") has been submitted
Exiting.
This message is just telling you that there are no worker nodes currently up, but one will begin to spin up now.
If you have a long-running model, it is often nice to launch it locally first, just to see if it starts.
DATASET ERROR or some silly syntax error.
mod <- read_model(file.path(MODEL_DIR, 300))
proc <- submit_model(
mod,
.mode = "local",
.wait = FALSE # this is what gets you your console back
)
tail_output(mod, .tail = 15)License Registered to: Metrum Research Group
Expiration Date: 14 JUL 2022
Current Date: 27 SEP 2021
...
IWRS=IWRESI
IPRD=IPREDI
IRS=IRESI
MONITORING OF SEARCH:
0ITERATION NO.: 0 OBJECTIVE VALUE: 32098.8565339901 NO. OF FUNC. EVALS.: 11
CUMULATIVE NO. OF FUNC. EVALS.: 11
NPARAMETR: 5.0000E-01 3.5000E+00 1.0000E+00 4.0000E+00 2.0000E+00 1.0000E+00 1.0000E+00 5.0000E-01 2.0000E-01 1.0000E-02
1.0000E-02 2.0000E-01 1.0000E-02 2.0000E-01 5.0000E-02
PARAMETER: 1.0000E-01 1.0000E-01 1.0000E-01 1.0000E-01 1.0000E-01 1.0000E-01 1.0000E-01 1.0000E-01 1.0000E-01 1.0000E-01
1.0000E-01 1.0000E-01 1.0000E-01 1.0000E-01 1.0000E-01
GRADIENT: 1.6936E+03 -2.6923E+04 -3.0654E+03 -5.5039E+04 1.6715E+04 1.7314E+03 1.6062E+03 2.6154E+02 -7.7264E+01 2.0742E+01
1.1744E-01 -2.8718E+02 -1.4260E+02 -1.2769E+02 3.1651E+02
There are several ways to kill the process:
killall nonmem in the terminal (or with system) to kill all NONMEM jobs running locally.
system("killall nonmem")kill() method of the processx object (see above about how to get to this object).
proc$process$kill()Then relaunch the model on the grid, using .bbi_args = list(overwrite = TRUE) to overwrite the incomplete results.
submit_model(
mod,
.bbi_args = list(
overwrite = TRUE,
parallel = TRUE,
threads = 8
)
)Running:
bbi nonmem run sge /data/home/sethg/bbr/inst/model/nonmem/pk-parallel/300.ctl --overwrite --threads=8 --parallel
In /data/home/sethg/bbr/inst/model/nonmem/pk-parallel
Process finished.
Note: .mode = "sge" (submit to the grid) is the default for submit_model() so there is no need to pass it as an argument.
You can monitor your grid jobs with tail_lst() or tail_output() as above, or use qstat -f in the terminal, or with the Grafana dashboard (if you are on Metworx).
system("qstat -f")############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
7 0.55500 Run_300 sethg qw 09/27/2021 12:53:10 8
You can use submit_models() to submit a batch of models to the grid.
Each model will be a separate grid job
The grid will “auto-scale”; bringing up new workers to handle all of the jobs
This is also a very easy way to use nested parallelism:
.bbi_args = list(parallel = TRUE, threads = SOME_INTEGER) then each model will use SOME_INTEGER cores on its worker node.
mods <- map(c(200, 300, 400), ~ read_model(file.path(MODEL_DIR, .x)))
submit_models(
mods,
.bbi_args = list(
overwrite = TRUE,
parallel = TRUE,
threads = 8
)
)
system("qstat -f")############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
8 0.55500 Run_200 sethg qw 09/27/2021 12:56:18 8
9 0.55500 Run_300 sethg qw 09/27/2021 12:56:18 8
10 0.55500 Run_400 sethg qw 09/27/2021 12:56:18 8
qdel in the terminal (or with system() from R).
qdel SOME_NUMBER kills the job specified by it’s “job id” (the first column in the qstat -f output)qdel -u YOUR_USERNAME kills all jobs launched by you
this_user <- Sys.getenv("USER")
system(paste("qdel -u", this_user))For a big model, try running a few iterations with different numbers of threads. This way you can see the gains from adding more threads and decide how many to use as you continue to work on this model.
MAX_EVAL or NITER in your control stream, so that it finishes fairly quicklytest_threads() function below to:
copy_model_from()
threads
#' Takes a model object and runs it with various threads values
#'
#' @param .mod bbi_model object to copy/test
#' @param thread_opts Integer vector of threads values to test
#' @param .mode Passed through to bbr::submit_models(.mode)
#'
#' @return A list of the model objects for the submitted models.
test_threads <- function(.mod, threads_opts, .mode = "sge") {
mods <- map(threads_opts, ~ {
.m <- copy_model_from(
.mod,
paste0(get_model_id(.mod), "_", .x, "threads")
) %>%
add_bbi_args(list(parallel = TRUE, threads = .x))
})
submit_models(mods, .mode = .mode, .wait = FALSE)
mods
}
# running it
mod <- read_model(file.path(MODEL_DIR, 200)) # this should be the copy with capped iterations
mods <- test_threads(mod, c(8, 16, 24))Submitting 3 models with 3 unique configurations.
After the models all finish, use this check_threads() function to see how long each took to run.
#' Check estimation time for models run with various threads values
#'
#' @param mods list of bbi model objects created by `test_threads()`
#'
#' @return A tibble with columns `threads` (number of threads) and `time`
#' (elapsed estimation time in seconds for test models).
check_threads <- function(mods) {
purrr::map_dfr(mods, ~ {
s <- model_summary(.x)
threads <- as.numeric(stringr::str_extract(s$absolute_model_path, "\\d+(?=threads$)"))
tibble::tibble(threads = threads, time = s$run_details$estimation_time)
})
}
check_threads(mods)# A tibble: 3 x 2
threads time
<int> <dbl>
1 8 149.5
2 16 88.2
3 24 65.9