Nesting creates a list-column of data frames; unnesting flattens it back out into regular columns. Nesting is implicitly a summarising operation: you get one row for each group defined by the non-nested columns. This is useful in conjunction with other summaries that work with whole datasets, most notably models.
Learn more in vignette("nest").
nest(.data, ..., .names_sep = NULL, .key = deprecated())
unnest(
  data,
  cols,
  ...,
  keep_empty = FALSE,
  ptype = NULL,
  names_sep = NULL,
  names_repair = "check_unique",
  .drop = deprecated(),
  .id = deprecated(),
  .sep = deprecated(),
  .preserve = deprecated()
)
A data frame.
<tidy-select> Columns to nest, specified using name-variable pairs of the form new_col = c(col1, col2, col3). The right hand side can be any valid tidy select expression.
Deprecated: previously you could write df %>% nest(x, y, z) and df %>% unnest(x, y, z). Convert to df %>% nest(data = c(x, y, z)) and df %>% unnest(c(x, y, z)).
If you previously created a new variable in unnest(), you'll now need to do it explicitly with mutate(). Convert df %>% unnest(y = fun(x, y, z)) to df %>% mutate(y = fun(x, y, z)) %>% unnest(y).
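The conversion described above can be sketched as follows. The data frame and its column names (g, x, y, total) are made up for illustration; only nest(), unnest() and dplyr::mutate() come from the text.

```r
library(tidyr)
library(dplyr)

# Hypothetical data: one grouping column and two value columns
df <- tibble(g = c("a", "a", "b"), x = 1:3, y = c(10, 20, 30))

# New spelling of the old df %>% nest(x, y)
nested <- df %>% nest(data = c(x, y))

# A derived variable is now created with mutate() first and then unnested,
# instead of being computed inside unnest()
with_total <- nested %>%
  mutate(total = lapply(data, function(d) d$x + d$y)) %>%
  unnest(total)

with_total$total
```

The list-column data is still present in with_total, repeated once per unnested row.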
Deprecated: no longer needed because of the new new_col = c(col1, col2, col3) syntax.
A data frame.
<tidy-select> Columns to unnest.
If you unnest() multiple columns, parallel entries must be of compatible sizes, i.e. they're either equal or length 1 (following the standard tidyverse recycling rules).
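As a small illustration of the size rule, the sketch below unnests two list-columns at once. The data and column names are invented for the example; the length-1 entry in b is expected to be recycled to match the three values in a, while mismatched sizes such as 2 and 3 should be rejected.

```r
library(tidyr)

# Row 1: a has 3 values and b has 1 (recycled); row 2: both have 2 values
df <- tibble(
  id = 1:2,
  a = list(1:3, 4:5),
  b = list("x", c("y", "z"))
)
out <- df %>% unnest(c(a, b))
out

# Sizes 2 and 3 are neither equal nor length 1, so this should fail
bad <- tibble(a = list(1:2), b = list(1:3))
failed <- tryCatch(
  {
    unnest(bad, c(a, b))
    FALSE
  },
  error = function(e) TRUE
)
failed
```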
By default, you get one row of output for each element of the list you're unchopping/unnesting. This means that if there's a size-0 element (like NULL or an empty data frame), that entire row will be dropped from the output. If you want to preserve all rows, use keep_empty = TRUE to replace size-0 elements with a single row of missing values.
Optionally, a named list of column name-prototype pairs to coerce cols to, overriding the default that will be guessed from combining the individual values. Alternatively, a single empty ptype can be supplied, which will be applied to all cols.
If NULL, the default, the names will be left as is. In nest(), inner names will come from the former outer names; in unnest(), the new outer names will come from the inner names.
If a string, the inner and outer names will be used together. In unnest(), the names of the new outer columns will be formed by pasting together the outer and the inner column names, separated by names_sep. In nest(), the new inner names will have the outer names + names_sep automatically stripped. This makes names_sep roughly symmetric between nesting and unnesting.
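The symmetry can be seen in a short round trip. This is a sketch with made-up data: unnesting with names_sep = "_" prefixes the inner names with the outer name, and nesting the prefixed columns back with .names_sep = "_" strips that prefix again.

```r
library(tidyr)

df <- tibble(
  x = 1:2,
  y = list(tibble(a = 1, b = 2), tibble(a = 3, b = 4))
)

# Outer name "y" is pasted onto the inner names "a" and "b"
wide <- df %>% unnest(y, names_sep = "_")
names(wide)

# Nesting back removes the "y_" prefix from the inner names
back <- wide %>% nest(y = c(y_a, y_b), .names_sep = "_")
names(back$y[[1]])
```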
Used to check that the output data frame has valid names. Must be one of the following options:
"minimal": no name repair or checks, beyond basic existence,
"unique": make sure names are unique and not empty,
"check_unique": (the default), no name repair, but check they are unique,
"universal": make the names unique and syntactic,
a function: apply custom name repair,
tidyr_legacy: use the name repair from tidyr 0.8,
a formula: a purrr-style anonymous function (see rlang::as_function()).
See vctrs::vec_as_names() for more details on these terms and the strategies used to enforce them.
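Name repair matters when an inner column shares a name with an outer one. The sketch below uses invented data in which both levels have a column called a; with the default "check_unique" the call is expected to fail, while "unique" renames the clashing columns.

```r
library(tidyr)

# The outer column `a` clashes with the inner column `a` inside `y`
df <- tibble(a = 1:2, y = list(tibble(a = 10), tibble(a = 20)))

# Default names_repair = "check_unique": duplicates are reported as an error
default_failed <- tryCatch(
  {
    unnest(df, y)
    FALSE
  },
  error = function(e) TRUE
)

# "unique" repairs the duplicated names instead of failing
repaired <- suppressMessages(unnest(df, y, names_repair = "unique"))
names(repaired)
```

Using names_sep is an alternative way to avoid the clash, since it prefixes the inner names with the outer column name.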
Deprecated: all list-columns are now preserved; if there are any that you don't want in the output, use select() to remove them prior to unnesting.
Deprecated: convert df %>% unnest(x, .id = "id") to df %>% mutate(id = names(x)) %>% unnest(x).
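A minimal sketch of that replacement for .id, assuming the list-column is named (the element names first and second and the inner column v are invented here):

```r
library(tidyr)
library(dplyr)

# A list-column whose elements carry names
df <- tibble(x = list(first = tibble(v = 1:2), second = tibble(v = 3L)))

# Copy the element names into a regular column before unnesting
out <- df %>%
  mutate(id = names(x)) %>%
  unnest(x)
out
```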
tidyr 1.0.0 introduced a new syntax for nest() and unnest() that's designed to be more similar to other functions. Converting to the new syntax should be straightforward (guided by the message you'll receive) but if you just need to run an old analysis, you can easily revert to the previous behaviour using nest_legacy() and unnest_legacy() as follows:
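One way to do this is to mask the new functions with their legacy counterparts for the duration of a script. The snippet is a sketch: the data frame is made up, and the final rm() call is an addition here to restore the current behaviour afterwards.

```r
library(tidyr)

# Mask the new functions with the pre-1.0.0 versions
nest <- nest_legacy
unnest <- unnest_legacy

df <- tibble(x = c(1, 1, 2), y = 1:3)

# Old-style call: bare column names, nested into a column called `data`
old_style <- df %>% nest(y)
old_style

# Remove the masks to return to the current nest() and unnest()
rm(nest, unnest)
```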
df %>% nest(data = c(x, y)) specifies the columns to be nested; i.e. the columns that will appear in the inner data frame. Alternatively, you can nest() a grouped data frame created by dplyr::group_by(). The grouping variables remain in the outer data frame and the others are nested. The result preserves the grouping of the input.
Variables supplied to nest() will override grouping variables so that df %>% group_by(x, y) %>% nest(data = !z) will be equivalent to df %>% nest(data = !z).
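The two ways of choosing what gets nested can be compared side by side. This is a sketch on invented data: grouping by x and calling nest() with no columns should nest the same columns as naming them explicitly, with the grouped version staying grouped by x.

```r
library(tidyr)
library(dplyr)

df <- tibble(x = c(1, 1, 2), y = 1:3, z = 3:1)

# Grouping variables stay outside; everything else is nested
by_group <- df %>%
  group_by(x) %>%
  nest()

# The same nesting, with the inner columns named explicitly
explicit <- df %>% nest(data = c(y, z))

group_vars(by_group)
vapply(by_group$data, nrow, integer(1))
```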
df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)
# Note that we get one row of output for each unique combination of
# non-nested variables
df %>% nest(data = c(y, z))
#> # A tibble: 3 × 2
#> x data
#> <dbl> <list>
#> 1 1 <tibble [3 × 2]>
#> 2 2 <tibble [2 × 2]>
#> 3 3 <tibble [1 × 2]>
# chop does something similar, but retains individual columns
df %>% chop(c(y, z))
#> # A tibble: 3 × 3
#> x y z
#> <dbl> <list<int>> <list<int>>
#> 1 1 [3] [3]
#> 2 2 [2] [2]
#> 3 3 [1] [1]
# use tidyselect syntax and helpers, just like in dplyr::select()
df %>% nest(data = any_of(c("y", "z")))
#> # A tibble: 3 × 2
#> x data
#> <dbl> <list>
#> 1 1 <tibble [3 × 2]>
#> 2 2 <tibble [2 × 2]>
#> 3 3 <tibble [1 × 2]>
iris %>% nest(data = !Species)
#> # A tibble: 3 × 2
#> Species data
#> <fct> <list>
#> 1 setosa <tibble [50 × 4]>
#> 2 versicolor <tibble [50 × 4]>
#> 3 virginica <tibble [50 × 4]>
nest_vars <- names(iris)[1:4]
iris %>% nest(data = any_of(nest_vars))
#> # A tibble: 3 × 2
#> Species data
#> <fct> <list>
#> 1 setosa <tibble [50 × 4]>
#> 2 versicolor <tibble [50 × 4]>
#> 3 virginica <tibble [50 × 4]>
iris %>%
  nest(petal = starts_with("Petal"), sepal = starts_with("Sepal"))
#> # A tibble: 3 × 3
#> Species petal sepal
#> <fct> <list> <list>
#> 1 setosa <tibble [50 × 2]> <tibble [50 × 2]>
#> 2 versicolor <tibble [50 × 2]> <tibble [50 × 2]>
#> 3 virginica <tibble [50 × 2]> <tibble [50 × 2]>
iris %>%
  nest(width = contains("Width"), length = contains("Length"))
#> # A tibble: 3 × 3
#> Species width length
#> <fct> <list> <list>
#> 1 setosa <tibble [50 × 2]> <tibble [50 × 2]>
#> 2 versicolor <tibble [50 × 2]> <tibble [50 × 2]>
#> 3 virginica <tibble [50 × 2]> <tibble [50 × 2]>
# Nesting a grouped data frame nests all variables apart from the group vars
library(dplyr)
fish_encounters %>%
  group_by(fish) %>%
  nest()
#> # A tibble: 19 × 2
#> # Groups: fish [19]
#> fish data
#> <fct> <list>
#> 1 4842 <tibble [11 × 2]>
#> 2 4843 <tibble [11 × 2]>
#> 3 4844 <tibble [11 × 2]>
#> 4 4845 <tibble [5 × 2]>
#> 5 4847 <tibble [3 × 2]>
#> 6 4848 <tibble [4 × 2]>
#> 7 4849 <tibble [2 × 2]>
#> 8 4850 <tibble [6 × 2]>
#> 9 4851 <tibble [2 × 2]>
#> 10 4854 <tibble [2 × 2]>
#> 11 4855 <tibble [5 × 2]>
#> 12 4857 <tibble [9 × 2]>
#> 13 4858 <tibble [11 × 2]>
#> 14 4859 <tibble [5 × 2]>
#> 15 4861 <tibble [11 × 2]>
#> 16 4862 <tibble [9 × 2]>
#> 17 4863 <tibble [2 × 2]>
#> 18 4864 <tibble [2 × 2]>
#> 19 4865 <tibble [3 × 2]>
# Nesting is often useful for creating per group models
mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(models = lapply(data, function(df) lm(mpg ~ wt, data = df)))
#> # A tibble: 3 × 3
#> # Groups: cyl [3]
#> cyl data models
#> <dbl> <list> <list>
#> 1 6 <tibble [7 × 10]> <lm>
#> 2 4 <tibble [11 × 10]> <lm>
#> 3 8 <tibble [14 × 10]> <lm>
# unnest() is primarily designed to work with lists of data frames
df <- tibble(
  x = 1:3,
  y = list(
    NULL,
    tibble(a = 1, b = 2),
    tibble(a = 1:3, b = 3:1)
  )
)
df %>% unnest(y)
#> # A tibble: 4 × 3
#> x a b
#> <int> <dbl> <dbl>
#> 1 2 1 2
#> 2 3 1 3
#> 3 3 2 2
#> 4 3 3 1
df %>% unnest(y, keep_empty = TRUE)
#> # A tibble: 5 × 3
#> x a b
#> <int> <dbl> <dbl>
#> 1 1 NA NA
#> 2 2 1 2
#> 3 3 1 3
#> 4 3 2 2
#> 5 3 3 1
# If you have lists of lists, or lists of atomic vectors, instead
# see hoist(), unnest_wider(), and unnest_longer()
# You can unnest multiple columns simultaneously
df <- tibble(
  a = list(c("a", "b"), "c"),
  b = list(1:2, 3),
  c = c(11, 22)
)
df %>% unnest(c(a, b))
#> # A tibble: 3 × 3
#> a b c
#> <chr> <dbl> <dbl>
#> 1 a 1 11
#> 2 b 2 11
#> 3 c 3 22
# Compare with unnesting one column at a time, which generates
# the Cartesian product
df %>% unnest(a) %>% unnest(b)
#> # A tibble: 5 × 3
#> a b c
#> <chr> <dbl> <dbl>
#> 1 a 1 11
#> 2 a 2 11
#> 3 b 1 11
#> 4 b 2 11
#> 5 c 3 22
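As a closing illustration of nesting and unnesting being inverse operations, the sketch below nests the first example data frame and flattens it again. It relies on the rows for each value of x already being contiguous, so the row order is expected to survive the round trip.

```r
library(tidyr)

df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)

# Nest y and z, then flatten the list-column back into regular columns
round_trip <- df %>%
  nest(data = c(y, z)) %>%
  unnest(data)

all.equal(as.data.frame(df), as.data.frame(round_trip))
```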