mf <- memoise(f) creates mf, a memoised copy of f. A memoised copy is basically a
lazier version of the same function: it saves the answers of
new invocations, and re-uses the answers of old ones. Under the right
circumstances, this can provide a very nice speedup indeed.
memoise(
  f,
  ...,
  envir = environment(f),
  cache = cachem::cache_mem(max_size = 1024 * 1024^2),
  omit_args = c(),
  hash = function(x) rlang::hash(x)
)
f: Function of which to create a memoised copy.
...: Optional variables to use as additional restrictions on caching, specified as one-sided formulas (no LHS). See Examples for usage.
envir: Environment of the returned function.
cache: Cache object. The default is a [cachem::cache_mem()] with a max size of 1024 MB.
omit_args: Names of arguments to ignore when calculating hash.
hash: A function which takes an R object as input and returns a string which is used as a cache key.
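As a minimal sketch of how the `...` and `omit_args` arguments described above might be combined: the function `scale_value`, the counter `n_calls`, and the variable `data_version` are hypothetical names introduced only for illustration. The one-sided formula adds the current value of `data_version` to the cache key, while `omit_args` leaves `label` out of it.

```r
library(memoise)

# Hypothetical counter so we can see when the underlying function really runs.
n_calls <- 0
scale_value <- function(x, label) {
  n_calls <<- n_calls + 1
  x * 10
}

# `data_version` is an illustrative variable; the one-sided formula makes its
# current value part of the cache key. `label` is excluded from the key.
data_version <- 1
mem_scale <- memoise(scale_value, ~data_version, omit_args = "label")

r1 <- mem_scale(2, label = "first")   # computed
r2 <- mem_scale(2, label = "second")  # cache hit: `label` is not part of the key

data_version <- 2
r3 <- mem_scale(2, label = "first")   # recomputed: the formula's value changed
```

If both arguments behave as described, the underlying function should have run only twice across the three calls.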
There are two main ways to use the memoise function. Say that
you wish to memoise glm, which is in the stats package; then you
could use mem_glm <- memoise(glm), or you could use glm <- memoise(stats::glm).
The first form has the advantage that you still have easy access to
both the memoised and the original function. The latter is especially
useful to bring the benefits of memoisation to an existing block
of R code.
Two example situations where memoise could be of use:
- You're evaluating a function repeatedly over the rows (or larger chunks) of a dataset, and expect to regularly get the same input.
- You're debugging or developing something, which involves
  a lot of re-running the code. If there are a few expensive calls
  in there, memoising them can make life a lot more pleasant.
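The first situation can be sketched roughly as follows. The function `slow_square`, the counter `calls`, and the `inputs` vector are hypothetical and stand in for an expensive computation applied to data with repeated values.

```r
library(memoise)

# Hypothetical expensive function; `calls` counts how often it really runs.
calls <- 0
slow_square <- function(x) {
  calls <<- calls + 1
  Sys.sleep(0.01)  # stand-in for real work
  x^2
}
mem_square <- memoise(slow_square)

# Six inputs, but only three distinct values.
inputs <- c(3, 1, 3, 3, 1, 2)
results <- vapply(inputs, function(x) mem_square(x), numeric(1))
```

Here the underlying function should run once per distinct input rather than once per element, so the benefit grows with the share of repeated values.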
If the code is in a script file that you're source()ing,
take care that you don't just put glm <- memoise(stats::glm)
at the top of your file: that would reinitialise the memoised
function every time the file was sourced. Wrap it in if (!is.memoised(glm)),
or do the memoisation call once at the R prompt, or put it somewhere
else where it won't get repeated.
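A minimal sketch of the guard mentioned above, as it might appear at the top of a sourced script (assuming the memoise package is attached):

```r
library(memoise)

# Only memoise when glm is not already a memoised function, so re-sourcing
# the script keeps the existing memoised copy and its cache.
if (!is.memoised(glm)) {
  glm <- memoise(stats::glm)
}
```

On the first run `glm` resolves to `stats::glm`, which is not memoised, so the assignment happens; on later runs the guard skips it.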
It is recommended that functions in a package are not memoised at build-time,
but when the package is loaded. The simplest way to do this is within
.onLoad() with, for example:
# file.R
fun <- function() {
  some_expensive_process()
}

# zzz.R
.onLoad <- function(libname, pkgname) {
  fun <<- memoise::memoise(fun)
}
# a() is evaluated anew each time. memA() is only re-evaluated
# when you call it with a new set of parameters.
a <- function(n) { runif(n) }
memA <- memoise(a)
replicate(5, a(2))
#>          [,1]      [,2]      [,3]      [,4]       [,5]
#> [1,] 0.415038 0.4219275 0.9413055 0.1898295 0.07739351
#> [2,] 0.284280 0.6813861 0.4445195 0.9525727 0.19533414
replicate(5, memA(2))
#>            [,1]       [,2]       [,3]       [,4]       [,5]
#> [1,] 0.14631742 0.14631742 0.14631742 0.14631742 0.14631742
#> [2,] 0.05877252 0.05877252 0.05877252 0.05877252 0.05877252
# Caching is done based on parameters' value, so same-name-but-
# changed-value correctly produces two different outcomes...
N <- 4; memA(N)
#> [1] 0.1943754 0.7390476 0.2050565 0.5618004
N <- 5; memA(N)
#> [1] 0.8893391 0.7287381 0.6007475 0.5075282 0.2752053
# ... and same-value-but-different-name correctly produces
# the same cached outcome.
N <- 4; memA(N)
#> [1] 0.1943754 0.7390476 0.2050565 0.5618004
N2 <- 4; memA(N2)
#> [1] 0.1943754 0.7390476 0.2050565 0.5618004
# memoise() knows about default parameters.
b <- function(n, dummy="a") { runif(n) }
memB <- memoise(b)
memB(2)
#> [1] 0.1122580 0.7915759
memB(2, dummy="a")
#> [1] 0.1122580 0.7915759
# This works, because the interface of the memoised function is the same as
# that of the original function.
formals(b)
#> $n
#>
#>
#> $dummy
#> [1] "a"
#>
formals(memB)
#> $n
#>
#>
#> $dummy
#> [1] "a"
#>
# However, it doesn't know about parameter relevance.
# Different call means different caching, no matter
# that the outcome is the same.
memB(2, dummy="b")
#> [1] 0.6397423 0.2591034
# You can create multiple memoisations of the same function,
# and they'll be independent.
memA(2)
#> [1] 0.14631742 0.05877252
memA2 <- memoise(a)
memA(2) # Still the same outcome
#> [1] 0.14631742 0.05877252
memA2(2) # Different cache, different outcome
#> [1] 0.3469360 0.5518991
# Multiple memoised functions can share a cache.
cm <- cachem::cache_mem(max_size = 50 * 1024^2)
memA <- memoise(a, cache = cm)
memB <- memoise(b, cache = cm)
# Don't do the same memoisation assignment twice: a brand-new
# memoised function also means a brand-new cache, and *that*
# you could as easily and more legibly achieve using forget().
# (If you're not sure whether you already memoised something,
# use is.memoised() to check.)
memA(2)
#> [1] 0.9815699 0.1360091
memA <- memoise(a)
memA(2)
#> [1] 0.06457701 0.28272079
# Make a memoised result automatically time out after 10 seconds.
memA3 <- memoise(a, cache = cachem::cache_mem(max_age = 10))
memA3(2)
#> [1] 0.1915341 0.8167848
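The examples above mention forget() and is.memoised() without showing them. A brief sketch of how they might be used, reusing the same illustrative function a():

```r
library(memoise)

a <- function(n) { runif(n) }
memA <- memoise(a)

is.memoised(memA)  # TRUE: memA is a memoised function
is.memoised(a)     # FALSE: a is the plain original

first <- memA(2)   # computed and cached
forget(memA)       # clear memA's cache without creating a new function
second <- memA(2)  # computed again after the reset, then cached
third <- memA(2)   # served from the cache, same as `second`
```

Because a() draws random numbers, `first` and `second` will almost certainly differ, while `second` and `third` are identical.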