Rectangle a nested list into a tidy tibble

hoist(), unnest_longer(), and unnest_wider() provide tools for rectangling: collapsing deeply nested lists into regular columns. hoist() lets you selectively pull components of a list-column out into their own top-level columns, using the same syntax as purrr::pluck(). unnest_wider() turns each element of a list-column into a column, and unnest_longer() turns each element of a list-column into a row. unnest_auto() picks between unnest_wider() and unnest_longer() based on heuristics described below.

Learn more in vignette("rectangle").

hoist(
  .data,
  .col,
  ...,
  .remove = TRUE,
  .simplify = TRUE,
  .ptype = list(),
  .transform = list()
)

unnest_longer(
  data,
  col,
  values_to = NULL,
  indices_to = NULL,
  indices_include = NULL,
  names_repair = "check_unique",
  simplify = TRUE,
  ptype = list(),
  transform = list()
)

unnest_wider(
  data,
  col,
  names_sep = NULL,
  simplify = TRUE,
  names_repair = "check_unique",
  ptype = list(),
  transform = list()
)

unnest_auto(data, col)

Arguments

.data, data	A data frame.
.col, col	List-column to extract components from.
...	Components of `.col` to turn into columns in the form `col_name = "pluck_specification"`. You can pluck by name with a character vector, by position with an integer vector, or with a combination of the two with a list. See `purrr::pluck()` for details. The column names must be unique in a call to `hoist()`, although existing columns with the same name will be overwritten. When plucking with a single string you can choose to omit the name, i.e. `hoist(df, col, "x")` is short-hand for `hoist(df, col, x = "x")`.
.remove	If `TRUE`, the default, will remove extracted components from `.col`. This ensures that each value lives only in one place.
.simplify, simplify	If `TRUE`, will attempt to simplify lists of length-1 vectors to an atomic vector.
.ptype, ptype	Optionally, a named list of prototypes declaring the desired output type of each component. Use this argument if you want to check each element has the types you expect when simplifying.
.transform, transform	Optionally, a named list of transformation functions applied to each component. Use this argument if you want to transform or parse individual elements as they are hoisted.
values_to	Name of column to store vector values. Defaults to `col`.
indices_to	A string giving the name of the column which will contain the inner names or positions (if not named) of the values. Defaults to `col` with an `_id` suffix.
indices_include	Add an index column? Defaults to `TRUE` when `col` has inner names.
names_repair	Used to check that output data frame has valid names. Must be one of the following options: "minimal": no name repair or checks, beyond basic existence; "unique": make sure names are unique and not empty; "check_unique": (the default) no name repair, but check they are unique; "universal": make the names unique and syntactic; a function: apply custom name repair; tidyr_legacy: use the name repair from tidyr 0.8; a formula: a purrr-style anonymous function (see `rlang::as_function()`). See `vctrs::vec_as_names()` for more details on these terms and the strategies used to enforce them.
names_sep	If `NULL`, the default, the names will be left as is. If a string, the inner and outer names will be pasted together using `names_sep` as a separator.
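
As a rough illustration of a few of the arguments above, the sketch below uses `.remove` and `.transform` with hoist(), and `values_to` and `indices_to` with unnest_longer(). The data frames and the `n_films`, `value`, and `name` column names are made up for this example; it is one plausible usage, not the only one.

```r
library(tidyr)

# Illustrative data, modelled on the Examples section of this page.
df <- tibble(
  character = c("Toothless", "Dory"),
  metadata = list(
    list(
      species = "dragon",
      films = c(
        "How to Train Your Dragon",
        "How to Train Your Dragon 2",
        "How to Train Your Dragon: The Hidden World"
      )
    ),
    list(
      species = "blue tang",
      films = c("Finding Nemo", "Finding Dory")
    )
  )
)

# .remove = FALSE leaves the hoisted component in place inside metadata.
kept <- df %>% hoist(metadata, "species", .remove = FALSE)

# .transform applies a function to each plucked component; here each
# character vector of films is reduced to its length.
counted <- df %>%
  hoist(metadata, n_films = "films", .transform = list(n_films = length))

# values_to and indices_to rename the value and index columns.
named <- tibble(
  x = 1:2,
  y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12))
)
renamed <- named %>%
  unnest_longer(y, values_to = "value", indices_to = "name")
```

Supplying `indices_to` also requests an index column, so `renamed` should carry both `value` and `name` alongside `x`.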

Unnest variants

The three unnest() functions differ in how they change the shape of the output data frame:

unnest_wider() preserves the rows, but changes the columns.
unnest_longer() preserves the columns, but changes the rows.
unnest() can change both rows and columns.

These principles guide their behaviour when they are called with a non-primary data type. For example, if you unnest_wider() a list of data frames, the number of rows must be preserved, so each column is turned into a list column of length one. Or if you unnest_longer() a list of data frames, the number of columns must be preserved, so it creates a packed column. I'm not sure if these behaviours are useful in practice, but they are theoretically pleasing.
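
The shape rules above can be checked on a small list-column of data frames. This is a minimal sketch with made-up data; the exact column classes may vary between tidyr versions, so it only inspects dimensions.

```r
library(tidyr)

# Hypothetical data: each element of y is itself a data frame.
df <- tibble(
  x = 1:2,
  y = list(
    tibble(a = 1:2, b = c("u", "v")),
    tibble(a = 3L, b = "w")
  )
)

# unnest_wider() keeps one row per input row, so a and b become list-columns.
wide <- df %>% unnest_wider(y)

# unnest_longer() keeps the number of columns, so y becomes a packed
# data frame column with one row per inner row.
long <- df %>% unnest_longer(y)

# unnest() may change both rows and columns.
both <- df %>% unnest(y)

dim(wide)
dim(long)
dim(both)
```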

`unnest_auto()` heuristics

unnest_auto() inspects the inner names of the list-col:

If all elements are unnamed, it uses unnest_longer().
If all elements are named, and there's at least one name in common across all components, it uses unnest_wider().
Otherwise, it falls back to unnest_longer(indices_include = TRUE).
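
The three cases above can be tried on small inputs. This is a sketch with invented data frames (`unnamed`, `shared`, `disjoint`); unnest_auto() also prints a message saying which function it chose, which is not reproduced here.

```r
library(tidyr)

# Case 1: no element has names, so unnest_longer() is expected.
unnamed <- tibble(x = 1:2, y = list(1:3, 4:5))
auto_unnamed <- unnest_auto(unnamed, y)

# Case 2: all elements are named and share the name "a",
# so unnest_wider() is expected.
shared <- tibble(x = 1:2, y = list(c(a = 1, b = 2), c(a = 10, c = 12)))
auto_shared <- unnest_auto(shared, y)

# Case 3: all elements are named but no name is common to every element,
# so the fallback unnest_longer(indices_include = TRUE) is expected.
disjoint <- tibble(x = 1:2, y = list(c(a = 1), c(b = 2)))
auto_disjoint <- unnest_auto(disjoint, y)

names(auto_unnamed)
names(auto_shared)
names(auto_disjoint)
```

Because the choice depends on the data, calling unnest_wider() or unnest_longer() directly is usually the safer option in non-interactive code.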

Examples

df <- tibble(
  character = c("Toothless", "Dory"),
  metadata = list(
    list(
      species = "dragon",
      color = "black",
      films = c(
        "How to Train Your Dragon",
        "How to Train Your Dragon 2",
        "How to Train Your Dragon: The Hidden World"
      )
    ),
    list(
      species = "blue tang",
      color = "blue",
      films = c("Finding Nemo", "Finding Dory")
    )
  )
)
df
#> # A tibble: 2 × 2
#>   character metadata        
#>   <chr>     <list>          
#> 1 Toothless <named list [3]>
#> 2 Dory      <named list [3]>

# Turn all components of metadata into columns
df %>% unnest_wider(metadata)
#> # A tibble: 2 × 4
#>   character species   color films    
#>   <chr>     <chr>     <chr> <list>   
#> 1 Toothless dragon    black <chr [3]>
#> 2 Dory      blue tang blue  <chr [2]>

# Extract only specified components
df %>% hoist(metadata,
  "species",
  first_film = list("films", 1L),
  third_film = list("films", 3L)
)
#> # A tibble: 2 × 5
#>   character species   first_film               third_film             metadata  
#>   <chr>     <chr>     <chr>                    <chr>                  <list>    
#> 1 Toothless dragon    How to Train Your Dragon How to Train Your Dra… <named li…
#> 2 Dory      blue tang Finding Nemo             NA                     <named li…

df %>%
  unnest_wider(metadata) %>%
  unnest_longer(films)
#> # A tibble: 5 × 4
#>   character species   color films                                     
#>   <chr>     <chr>     <chr> <chr>                                     
#> 1 Toothless dragon    black How to Train Your Dragon                  
#> 2 Toothless dragon    black How to Train Your Dragon 2                
#> 3 Toothless dragon    black How to Train Your Dragon: The Hidden World
#> 4 Dory      blue tang blue  Finding Nemo                              
#> 5 Dory      blue tang blue  Finding Dory                              

# unnest_longer() is useful when each component of the list should
# form a row
df <- tibble(
  x = 1:3,
  y = list(NULL, 1:3, 4:5)
)
df %>% unnest_longer(y)
#> # A tibble: 6 × 2
#>       x     y
#>   <int> <int>
#> 1     1    NA
#> 2     2     1
#> 3     2     2
#> 4     2     3
#> 5     3     4
#> 6     3     5
# Automatically creates names if widening
df %>% unnest_wider(y)
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> * `` -> ...3
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> # A tibble: 3 × 4
#>       x  ...1  ...2  ...3
#>   <int> <int> <int> <int>
#> 1     1    NA    NA    NA
#> 2     2     1     2     3
#> 3     3     4     5    NA
# But you'll usually want to provide names_sep:
df %>% unnest_wider(y, names_sep = "_")
#> # A tibble: 3 × 4
#>       x   y_1   y_2   y_3
#>   <int> <int> <int> <int>
#> 1     1    NA    NA    NA
#> 2     2     1     2     3
#> 3     3     4     5    NA

# And similarly if the vectors are named
df <- tibble(
  x = 1:2,
  y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12))
)
df %>% unnest_wider(y)
#> # A tibble: 2 × 4
#>       x     a     b     c
#>   <int> <dbl> <dbl> <dbl>
#> 1     1     1     2    NA
#> 2     2    10    11    12
df %>% unnest_longer(y)
#> # A tibble: 5 × 3
#>       x     y y_id 
#>   <int> <dbl> <chr>
#> 1     1     1 a    
#> 2     1     2 b    
#> 3     2    10 a    
#> 4     2    11 b    
#> 5     2    12 c
