Apply a function (or functions) across multiple columns

across() makes it easy to apply the same transformation to multiple columns, allowing you to use select() semantics inside "data-masking" functions like summarise() and mutate(). See vignette("colwise") for more details.

if_any() and if_all() apply the same predicate function to a selection of columns and combine the results into a single logical vector: if_any() is TRUE when the predicate is TRUE for any of the selected columns, if_all() is TRUE when the predicate is TRUE for all selected columns.
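As a rough illustration of how the two helpers combine per-column results, the sketch below uses a small made-up data frame (`df`, with hypothetical columns `a` and `b`) and computes both flags side by side. The comments show the hand-written logical expression each call roughly corresponds to.

```r
library(dplyr)

# Hypothetical data, used only for illustration
df <- tibble(a = c(1, 5, 9), b = c(2, 8, 3))

res <- df %>%
  mutate(
    any_big = if_any(c(a, b), ~ .x > 4),  # roughly a > 4 | b > 4
    all_big = if_all(c(a, b), ~ .x > 4)   # roughly a > 4 & b > 4
  )

res$any_big  # FALSE TRUE TRUE
res$all_big  # FALSE TRUE FALSE
```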

across() supersedes the family of "scoped variants" like summarise_at(), summarise_if(), and summarise_all().
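To make the relationship concrete, here is a minimal sketch that writes the same grouped summary twice, once with a scoped variant and once with across(). The object names `old` and `new` are only illustrative.

```r
library(dplyr)

# Superseded scoped variant
old <- iris %>%
  group_by(Species) %>%
  summarise_at(vars(starts_with("Sepal")), mean)

# The same summary expressed with across()
new <- iris %>%
  group_by(Species) %>%
  summarise(across(starts_with("Sepal"), mean))

# Compare the two results
all.equal(as.data.frame(old), as.data.frame(new))
```

In the same spirit, a summarise_if(is.numeric, ...) call can usually be written as summarise(across(where(is.numeric), ...)).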

across(.cols = everything(), .fns = NULL, ..., .names = NULL)

if_any(.cols = everything(), .fns = NULL, ..., .names = NULL)

if_all(.cols = everything(), .fns = NULL, ..., .names = NULL)

Arguments

.cols, cols

<tidy-select> Columns to transform. Because across() is used within functions like summarise() and mutate(), you can't select or compute upon grouping variables.
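The following sketch shows one consequence of this: on a grouped data frame, everything() inside across() does not pick up the grouping column. The data frame `df` and its columns are hypothetical.

```r
library(dplyr)

# Hypothetical data: g is a character grouping column
df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3), y = c(4, 5, 6))

out <- df %>%
  group_by(g) %>%
  summarise(across(everything(), mean))

# g appears once, as the grouping key; mean() was applied only to x and y
names(out)  # "g" "x" "y"
out$x       # 1.5 3.0
```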

.fns

Functions to apply to each of the selected columns. Possible values are:

A function, e.g. mean.
A purrr-style lambda, e.g. ~ mean(.x, na.rm = TRUE)
A list of functions/lambdas, e.g. list(mean = mean, n_miss = ~ sum(is.na(.x)))
NULL: the default value, returns the selected columns in a data frame without applying a transformation. This is useful when you want to use a function that takes a data frame.

Within these functions you can use cur_column() and cur_group() to access the current column and grouping keys respectively.
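A minimal sketch of cur_column() in use: each selected column is divided by a value looked up by the column's own name. The data frame `df` and the lookup vector `scale_by` are hypothetical and exist only for this illustration.

```r
library(dplyr)

df <- tibble(height = c(1.5, 1.8), weight = c(60, 80))

# Hypothetical per-column divisors, keyed by column name
scale_by <- c(height = 1, weight = 10)

out <- df %>%
  mutate(across(c(height, weight), ~ .x / scale_by[[cur_column()]]))

out$height  # 1.5 1.8
out$weight  # 6 8
```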

...

Additional arguments for the function calls in .fns. Using these ... is strongly discouraged because of issues of timing of evaluation.
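One way to avoid ... is to put the extra arguments inside a lambda, so that they are evaluated together with the function call. A small sketch with hypothetical data:

```r
library(dplyr)

df <- tibble(x = c(1, NA, 3), y = c(NA, 4, 6))

# Instead of across(everything(), mean, na.rm = TRUE),
# pass the extra argument inside the lambda:
out <- df %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))

out$x  # 2
out$y  # 5
```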

.names

A glue specification that describes how to name the output columns. This can use {.col} to stand for the selected column name, and {.fn} to stand for the name of the function being applied. The default (NULL) is equivalent to "{.col}" for the single function case and "{.col}_{.fn}" for the case where a list is used for .fns.
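The naming rules can be checked quickly on a toy data frame. In this sketch, `df`, `single`, `multi`, and `custom` are illustrative names only.

```r
library(dplyr)

df <- tibble(x = 1:3, y = 4:6)

# Single function: default is "{.col}"
single <- df %>% summarise(across(everything(), mean))

# Named list of functions: default is "{.col}_{.fn}"
multi <- df %>% summarise(across(everything(), list(mean = mean, max = max)))

# Explicit glue specification
custom <- df %>% summarise(across(everything(), mean, .names = "avg_{.col}"))

names(single)  # "x" "y"
names(multi)   # "x_mean" "x_max" "y_mean" "y_max"
names(custom)  # "avg_x" "avg_y"
```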

Value

across() returns a tibble with one column for each column in .cols and each function in .fns. if_any() and if_all() return a logical vector.
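Because the return value of across() is a data frame, it can be handed directly to functions that operate on data frames. The sketch below, on hypothetical data, computes a row-wise total with base R's rowSums().

```r
library(dplyr)

df <- tibble(id = c("a", "b"), x = c(1, 2), y = c(10, 20))

# across() with no .fns returns the selected columns as a data frame,
# which rowSums() then reduces to one value per row
out <- df %>%
  mutate(total = rowSums(across(where(is.numeric))))

unname(out$total)  # 11 22
```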

Timing of evaluation

R code in dplyr verbs is generally evaluated once per group. Inside across(), however, code is evaluated once for each combination of columns and groups. If the evaluation timing is important, for example if you're generating random variables, think about when it should happen and place your code accordingly.

gdf <-
  tibble(g = c(1, 1, 2, 3), v1 = 10:13, v2 = 20:23) %>%
  group_by(g)

set.seed(1)

# Outside: 1 normal variate
n <- rnorm(1)
gdf %>% mutate(across(v1:v2, ~ .x + n))

## # A tibble: 4 × 3
## # Groups:   g [3]
##       g    v1    v2
##   <dbl> <dbl> <dbl>
## 1     1  9.37  19.4
## 2     1 10.4   20.4
## 3     2 11.4   21.4
## 4     3 12.4   22.4

# Inside a verb: 3 normal variates (ngroup)
gdf %>% mutate(n = rnorm(1), across(v1:v2, ~ .x + n))

## # A tibble: 4 × 4
## # Groups:   g [3]
##       g    v1    v2      n
##   <dbl> <dbl> <dbl>  <dbl>
## 1     1  10.2  20.2  0.184
## 2     1  11.2  21.2  0.184
## 3     2  11.2  21.2 -0.836
## 4     3  14.6  24.6  1.60

# Inside `across()`: 6 normal variates (ncol * ngroup)
gdf %>% mutate(across(v1:v2, ~ .x + rnorm(1)))

## # A tibble: 4 × 3
## # Groups:   g [3]
##       g    v1    v2
##   <dbl> <dbl> <dbl>
## 1     1  10.3  20.7
## 2     1  11.3  21.7
## 3     2  11.2  22.6
## 4     3  13.5  22.7

Examples

# across() -----------------------------------------------------------------
# Different ways to select the same set of columns
# See <https://tidyselect.r-lib.org/articles/syntax.html> for details
iris %>%
  as_tibble() %>%
  mutate(across(c(Sepal.Length, Sepal.Width), round))
#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1            5           4          1.4         0.2 setosa 
#>  2            5           3          1.4         0.2 setosa 
#>  3            5           3          1.3         0.2 setosa 
#>  4            5           3          1.5         0.2 setosa 
#>  5            5           4          1.4         0.2 setosa 
#>  6            5           4          1.7         0.4 setosa 
#>  7            5           3          1.4         0.3 setosa 
#>  8            5           3          1.5         0.2 setosa 
#>  9            4           3          1.4         0.2 setosa 
#> 10            5           3          1.5         0.1 setosa 
#> # … with 140 more rows
iris %>%
  as_tibble() %>%
  mutate(across(c(1, 2), round))
#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1            5           4          1.4         0.2 setosa 
#>  2            5           3          1.4         0.2 setosa 
#>  3            5           3          1.3         0.2 setosa 
#>  4            5           3          1.5         0.2 setosa 
#>  5            5           4          1.4         0.2 setosa 
#>  6            5           4          1.7         0.4 setosa 
#>  7            5           3          1.4         0.3 setosa 
#>  8            5           3          1.5         0.2 setosa 
#>  9            4           3          1.4         0.2 setosa 
#> 10            5           3          1.5         0.1 setosa 
#> # … with 140 more rows
iris %>%
  as_tibble() %>%
  mutate(across(1:Sepal.Width, round))
#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1            5           4          1.4         0.2 setosa 
#>  2            5           3          1.4         0.2 setosa 
#>  3            5           3          1.3         0.2 setosa 
#>  4            5           3          1.5         0.2 setosa 
#>  5            5           4          1.4         0.2 setosa 
#>  6            5           4          1.7         0.4 setosa 
#>  7            5           3          1.4         0.3 setosa 
#>  8            5           3          1.5         0.2 setosa 
#>  9            4           3          1.4         0.2 setosa 
#> 10            5           3          1.5         0.1 setosa 
#> # … with 140 more rows
iris %>%
  as_tibble() %>%
  mutate(across(where(is.double) & !c(Petal.Length, Petal.Width), round))
#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1            5           4          1.4         0.2 setosa 
#>  2            5           3          1.4         0.2 setosa 
#>  3            5           3          1.3         0.2 setosa 
#>  4            5           3          1.5         0.2 setosa 
#>  5            5           4          1.4         0.2 setosa 
#>  6            5           4          1.7         0.4 setosa 
#>  7            5           3          1.4         0.3 setosa 
#>  8            5           3          1.5         0.2 setosa 
#>  9            4           3          1.4         0.2 setosa 
#> 10            5           3          1.5         0.1 setosa 
#> # … with 140 more rows

# A purrr-style formula
iris %>%
  group_by(Species) %>%
  summarise(across(starts_with("Sepal"), ~ mean(.x, na.rm = TRUE)))
#> # A tibble: 3 × 3
#>   Species    Sepal.Length Sepal.Width
#>   <fct>             <dbl>       <dbl>
#> 1 setosa             5.01        3.43
#> 2 versicolor         5.94        2.77
#> 3 virginica          6.59        2.97

# A named list of functions
iris %>%
  group_by(Species) %>%
  summarise(across(starts_with("Sepal"), list(mean = mean, sd = sd)))
#> # A tibble: 3 × 5
#>   Species    Sepal.Length_mean Sepal.Length_sd Sepal.Width_mean Sepal.Width_sd
#>   <fct>                  <dbl>           <dbl>            <dbl>          <dbl>
#> 1 setosa                  5.01           0.352             3.43          0.379
#> 2 versicolor              5.94           0.516             2.77          0.314
#> 3 virginica               6.59           0.636             2.97          0.322

# Use the .names argument to control the output names
iris %>%
  group_by(Species) %>%
  summarise(across(starts_with("Sepal"), mean, .names = "mean_{.col}"))
#> # A tibble: 3 × 3
#>   Species    mean_Sepal.Length mean_Sepal.Width
#>   <fct>                  <dbl>            <dbl>
#> 1 setosa                  5.01             3.43
#> 2 versicolor              5.94             2.77
#> 3 virginica               6.59             2.97
iris %>%
  group_by(Species) %>%
  summarise(across(starts_with("Sepal"), list(mean = mean, sd = sd), .names = "{.col}.{.fn}"))
#> # A tibble: 3 × 5
#>   Species    Sepal.Length.mean Sepal.Length.sd Sepal.Width.mean Sepal.Width.sd
#>   <fct>                  <dbl>           <dbl>            <dbl>          <dbl>
#> 1 setosa                  5.01           0.352             3.43          0.379
#> 2 versicolor              5.94           0.516             2.77          0.314
#> 3 virginica               6.59           0.636             2.97          0.322

# When the list is not named, .fn is replaced by the function's position
iris %>%
  group_by(Species) %>%
  summarise(across(starts_with("Sepal"), list(mean, sd), .names = "{.col}.fn{.fn}"))
#> # A tibble: 3 × 5
#>   Species    Sepal.Length.fn1 Sepal.Length.fn2 Sepal.Width.fn1 Sepal.Width.fn2
#>   <fct>                 <dbl>            <dbl>           <dbl>           <dbl>
#> 1 setosa                 5.01            0.352            3.43           0.379
#> 2 versicolor             5.94            0.516            2.77           0.314
#> 3 virginica              6.59            0.636            2.97           0.322

# across() returns a data frame, which can be used as input of another function
df <- data.frame(
  x1  = c(1, 2, NA),
  x2  = c(4, NA, 6),
  y   = c("a", "b", "c")
)
df %>%
  mutate(x_complete = complete.cases(across(starts_with("x"))))
#>   x1 x2 y x_complete
#> 1  1  4 a       TRUE
#> 2  2 NA b      FALSE
#> 3 NA  6 c      FALSE
df %>%
  filter(complete.cases(across(starts_with("x"))))
#>   x1 x2 y
#> 1  1  4 a

# if_any() and if_all() ----------------------------------------------------
iris %>%
  filter(if_any(ends_with("Width"), ~ . > 4))
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.7         4.4          1.5         0.4  setosa
#> 2          5.2         4.1          1.5         0.1  setosa
#> 3          5.5         4.2          1.4         0.2  setosa
iris %>%
  filter(if_all(ends_with("Width"), ~ . > 2))
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#> 1           6.3         3.3          6.0         2.5 virginica
#> 2           7.1         3.0          5.9         2.1 virginica
#> 3           6.5         3.0          5.8         2.2 virginica
#> 4           7.6         3.0          6.6         2.1 virginica
#> 5           7.2         3.6          6.1         2.5 virginica
#> 6           6.8         3.0          5.5         2.1 virginica
#> 7           5.8         2.8          5.1         2.4 virginica
#> 8           6.4         3.2          5.3         2.3 virginica
#> 9           7.7         3.8          6.7         2.2 virginica
#> 10          7.7         2.6          6.9         2.3 virginica
#> 11          6.9         3.2          5.7         2.3 virginica
#> 12          6.7         3.3          5.7         2.1 virginica
#> 13          6.4         2.8          5.6         2.1 virginica
#> 14          6.4         2.8          5.6         2.2 virginica
#> 15          7.7         3.0          6.1         2.3 virginica
#> 16          6.3         3.4          5.6         2.4 virginica
#> 17          6.9         3.1          5.4         2.1 virginica
#> 18          6.7         3.1          5.6         2.4 virginica
#> 19          6.9         3.1          5.1         2.3 virginica
#> 20          6.8         3.2          5.9         2.3 virginica
#> 21          6.7         3.3          5.7         2.5 virginica
#> 22          6.7         3.0          5.2         2.3 virginica
#> 23          6.2         3.4          5.4         2.3 virginica
