Memoise a function

mf <- memoise(f) creates mf, a memoised copy of f. A memoised copy is basically a lazier version of the same function: it saves the answers of new invocations, and re-uses the answers of old ones. Under the right circumstances, this can provide a very nice speedup indeed.

memoise(
  f,
  ...,
  envir = environment(f),
  cache = cachem::cache_mem(max_size = 1024 * 1024^2),
  omit_args = c(),
  hash = function(x) rlang::hash(x)
)

Arguments

f: Function of which to create a memoised copy.
...: Optional variables to use as additional restrictions on caching, specified as one-sided formulas (no LHS). See Examples for usage.
envir: Environment of the returned function.
cache: Cache object. The default is a cachem::cache_mem() with a max size of 1024 MB.
omit_args: Names of arguments to ignore when calculating the hash.
hash: A function which takes an R object as input and returns a string which is used as a cache key.
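
As a hedged illustration of the ... and omit_args arguments, the sketch below uses two hypothetical functions, scaled() and slow_square(), together with a calls counter that records how often the original function really runs. It assumes that the right-hand side of each one-sided formula is re-evaluated on every call and folded into the cache key.

```r
library(memoise)

calls <- 0      # counts real (non-cached) evaluations
scale_by <- 2   # external state that the result depends on

# Hypothetical function whose result depends on a variable outside its arguments.
scaled <- function(x) {
  calls <<- calls + 1
  x * scale_by
}

# The one-sided formula adds the current value of `scale_by` to the cache key.
mem_scaled <- memoise(scaled, ~scale_by)

mem_scaled(10)   # computed
mem_scaled(10)   # served from the cache
scale_by <- 3
mem_scaled(10)   # computed again, because `scale_by` changed
calls_scaled <- calls

# Hypothetical function with an argument that does not affect the result.
calls <- 0
slow_square <- function(x, verbose = FALSE) {
  calls <<- calls + 1
  if (verbose) message("squaring ", x)
  x^2
}

# `verbose` is left out of the cache key, so both calls below share one entry.
mem_square <- memoise(slow_square, omit_args = "verbose")
mem_square(4, verbose = TRUE)
mem_square(4, verbose = FALSE)
calls_square <- calls
```

Here the omitted argument is passed explicitly in both calls; how an omitted argument interacts with its default value is not covered by this sketch.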

Details

There are two main ways to use the memoise function. Say that you wish to memoise glm, which is in the stats package; then you could use
mem_glm <- memoise(glm), or you could use
glm <- memoise(stats::glm).
The first form has the advantage that you still have easy access to both the memoised and the original function. The latter is especially useful to bring the benefits of memoisation to an existing block of R code.

Two example situations where memoise could be of use:

You're evaluating a function repeatedly over the rows (or larger chunks) of a dataset, and expect to regularly get the same input.
You're debugging or developing something, which involves a lot of re-running the code. If there are a few expensive calls in there, memoising them can make life a lot more pleasant. If the code is in a script file that you're source()ing, take care that you don't just put
glm <- memoise(stats::glm)
at the top of your file: that would reinitialise the memoised function every time the file was sourced. Wrap it in
if (!is.memoised(glm)), or do the memoisation call once at the R prompt, or put it somewhere else where it won't get repeated.
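
A minimal sketch of the guarded form described above, assuming the guard is run twice in the same session (as happens when a script is sourced repeatedly). The name first_copy is only there for illustration.

```r
library(memoise)

# First run: `glm` resolves to stats::glm, which is not memoised, so wrap it.
if (!is.memoised(glm)) {
  glm <- memoise(stats::glm)
}
first_copy <- glm

# Second run: `glm` is already the memoised copy, so the guard skips the
# assignment and the existing cache is kept.
if (!is.memoised(glm)) {
  glm <- memoise(stats::glm)
}

identical(first_copy, glm)
```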

It is recommended that functions in a package be memoised when the package is loaded, not at build time. The simplest way to do this is within .onLoad(), for example:

# file.R
fun <- function() {
 some_expensive_process()
}

# zzz.R
.onLoad <- function(libname, pkgname) {
 fun <<- memoise::memoise(fun)
}

Examples

# a() is evaluated anew each time. memA() is only re-evaluated
# when you call it with a new set of parameters.
a <- function(n) { runif(n) }
memA <- memoise(a)
replicate(5, a(2))
#>          [,1]      [,2]      [,3]      [,4]       [,5]
#> [1,] 0.415038 0.4219275 0.9413055 0.1898295 0.07739351
#> [2,] 0.284280 0.6813861 0.4445195 0.9525727 0.19533414
replicate(5, memA(2))
#>            [,1]       [,2]       [,3]       [,4]       [,5]
#> [1,] 0.14631742 0.14631742 0.14631742 0.14631742 0.14631742
#> [2,] 0.05877252 0.05877252 0.05877252 0.05877252 0.05877252

# Caching is done based on parameters' value, so same-name-but-
# changed-value correctly produces two different outcomes...
N <- 4; memA(N)
#> [1] 0.1943754 0.7390476 0.2050565 0.5618004
N <- 5; memA(N)
#> [1] 0.8893391 0.7287381 0.6007475 0.5075282 0.2752053
# ... and same-value-but-different-name correctly produces
#     the same cached outcome.
N <- 4; memA(N)
#> [1] 0.1943754 0.7390476 0.2050565 0.5618004
N2 <- 4; memA(N2)
#> [1] 0.1943754 0.7390476 0.2050565 0.5618004

# memoise() knows about default parameters.
b <- function(n, dummy="a") { runif(n) }
memB <- memoise(b)
memB(2)
#> [1] 0.1122580 0.7915759
memB(2, dummy="a")
#> [1] 0.1122580 0.7915759
# This works, because the interface of the memoised function is the same as
# that of the original function.
formals(b)
#> $n
#> 
#> 
#> $dummy
#> [1] "a"
#> 
formals(memB)
#> $n
#> 
#> 
#> $dummy
#> [1] "a"
#> 
# However, it doesn't know about parameter relevance.
# Different call means different caching, no matter
# that the outcome is the same.
memB(2, dummy="b")
#> [1] 0.6397423 0.2591034

# You can create multiple memoisations of the same function,
# and they'll be independent.
memA(2)
#> [1] 0.14631742 0.05877252
memA2 <- memoise(a)
memA(2)  # Still the same outcome
#> [1] 0.14631742 0.05877252
memA2(2) # Different cache, different outcome
#> [1] 0.3469360 0.5518991

# Multiple memoised functions can share a cache.
cm <- cachem::cache_mem(max_size = 50 * 1024^2)
memA <- memoise(a, cache = cm)
memB <- memoise(b, cache = cm)
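
A hedged sketch of what sharing a cache means in practice: both memoised functions store their results in the same cachem object, so that object can be inspected or reset directly. The variables n_entries and n_after_reset are illustrative names.

```r
library(memoise)

a <- function(n) runif(n)
b <- function(n, dummy = "a") runif(n)

cm <- cachem::cache_mem(max_size = 50 * 1024^2)
memA <- memoise(a, cache = cm)
memB <- memoise(b, cache = cm)

memA(2)
memB(2)
n_entries <- length(cm$keys())       # one entry per memoised call

# Resetting the shared cache clears the entries of both functions.
cm$reset()
n_after_reset <- length(cm$keys())
```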

# Don't do the same memoisation assignment twice: a brand-new
# memoised function also means a brand-new cache, and *that*
# you could as easily and more legibly achieve using forget().
# (If you're not sure whether you already memoised something,
#  use is.memoised() to check.)
memA(2)
#> [1] 0.9815699 0.1360091
memA <- memoise(a)
memA(2)
#> [1] 0.06457701 0.28272079
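
The forget() alternative mentioned above can be sketched as follows. The calls counter is an illustrative addition that shows when the original function is actually evaluated.

```r
library(memoise)

calls <- 0
a <- function(n) {
  calls <<- calls + 1
  runif(n)
}
memA <- memoise(a)

memA(2)        # computed
memA(2)        # served from the cache
forget(memA)   # empties memA's cache but keeps the same memoised function
memA(2)        # computed again
calls
```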

# Make a memoised result automatically time out after 10 seconds.
memA3 <- memoise(a, cache = cachem::cache_mem(max_age = 10))
memA3(2)
#> [1] 0.1915341 0.8167848
