Rectangle a nested list into a tidy tibble

hoist(), unnest_longer(), and unnest_wider() provide tools for rectangling, collapsing deeply nested lists into regular columns. hoist() allows you to selectively pull components of a list-column out into their own top-level columns, using the same syntax as purrr::pluck(). unnest_wider() turns each element of a list-column into a column, and unnest_longer() turns each element of a list-column into a row. unnest_auto() picks between unnest_wider() and unnest_longer() based on heuristics described below.

Learn more in vignette("rectangle").

hoist(
  .data,
  .col,
  ...,
  .remove = TRUE,
  .simplify = TRUE,
  .ptype = NULL,
  .transform = NULL
)

unnest_longer(
  data,
  col,
  values_to = NULL,
  indices_to = NULL,
  indices_include = NULL,
  names_repair = "check_unique",
  simplify = TRUE,
  ptype = NULL,
  transform = NULL
)

unnest_wider(
  data,
  col,
  names_sep = NULL,
  simplify = TRUE,
  strict = FALSE,
  names_repair = "check_unique",
  ptype = NULL,
  transform = NULL
)

unnest_auto(data, col)

Arguments

.data, data

A data frame.

.col, col

List-column to extract components from.

For hoist() and unnest_auto(), this must identify a single column.

For unnest_wider() and unnest_longer(), you can use tidyselect to select multiple columns to unnest simultaneously. When using unnest_longer() with multiple columns, values across columns that originated from the same row are recycled to a common size.

...

Components of .col to turn into columns in the form col_name = "pluck_specification". You can pluck by name with a character vector, by position with an integer vector, or with a combination of the two with a list. See purrr::pluck() for details.

The column names must be unique in a call to hoist(), although existing columns with the same name will be overwritten. When plucking with a single string you can choose to omit the name, i.e. hoist(df, col, "x") is short-hand for hoist(df, col, x = "x").
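As a minimal sketch of the pluck specifications described above, the following uses a small hypothetical tibble (the `id`, `info`, `name`, and `scores` names are illustrative only) to pluck one component by name and another by a name-then-position path:

```r
library(tidyr)

# Hypothetical data: each row holds a named list with a nested vector.
df <- tibble(
  id = 1:2,
  info = list(
    list(name = "a", scores = c(10, 20)),
    list(name = "b", scores = c(30, 40))
  )
)

out <- hoist(
  df, info,
  "name",                           # short-hand for name = "name"
  first_score = list("scores", 1L)  # pluck by name, then by position
)
out
```

Because only some components are extracted, the `info` column is kept and holds whatever was not hoisted.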

.remove

If TRUE, the default, will remove extracted components from .col. This ensures that each value lives only in one place. If all components are removed from .col, then .col will be removed from the result entirely.

.simplify, simplify

If TRUE, will attempt to simplify lists of length-1 vectors to an atomic vector. Can also be a named list containing TRUE or FALSE declaring whether or not to attempt to simplify a particular column. If a named list is provided, the default for any unspecified columns is TRUE.

.ptype, ptype

Optionally, a named list of prototypes declaring the desired output type of each component. Alternatively, a single empty prototype can be supplied, which will be applied to all components. Use this argument if you want to check that each element has the type you expect when simplifying.

If a ptype has been specified, but simplify = FALSE or simplification isn't possible, then a list-of column will be returned and each element will have type ptype.

.transform, transform

Optionally, a named list of transformation functions applied to each component. Alternatively, a single function can be supplied, which will be applied to all components. Use this argument if you want to transform or parse individual elements as they are extracted.

When both ptype and transform are supplied, the transform is applied before the ptype.
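The interaction of transform and ptype can be sketched as follows. The data and column names are hypothetical; the example assumes string-encoded values that need parsing before their type is checked:

```r
library(tidyr)

# Hypothetical data: numbers and dates arrive as strings.
df <- tibble(
  id = 1:2,
  info = list(
    list(n = "1", when = "2020-01-01"),
    list(n = "2", when = "2021-06-15")
  )
)

out <- hoist(
  df, info,
  n = "n",
  when = "when",
  # Each extracted element is parsed first ...
  .transform = list(n = as.integer, when = as.Date),
  # ... and the parsed result is then checked against the prototype.
  .ptype = list(n = integer(), when = Sys.Date()[0])
)
out
```

Since every component of `info` is hoisted here, `info` itself should be dropped from the result, as described under .remove.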

values_to

A string giving the column name (or names) to store the unnested values in. If multiple columns are specified in col, this can also be a glue string containing "{col}" to provide a template for the column names. The default, NULL, gives the output columns the same names as the input columns.

indices_to

A string giving the column name (or names) to store the inner names or positions (if not named) of the values. If multiple columns are specified in col, this can also be a glue string containing "{col}" to provide a template for the column names. The default, NULL, gives the output columns the same names as values_to, but suffixed with "_id".

indices_include

A single logical value specifying whether or not to add an index column. If any value has inner names, the index column will be a character vector of those names, otherwise it will be an integer vector of positions. If NULL, defaults to TRUE if any value has inner names or if indices_to is provided.

If indices_to is provided, then indices_include must not be FALSE.
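A short sketch of values_to and indices_to, using hypothetical data. The first call renames the value and index columns for a single list-column; the second assumes the "{col}" glue template described above when unnesting two columns at once:

```r
library(tidyr)

# Named inner vectors, so an index column of names is available.
df <- tibble(
  x = 1:2,
  y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12))
)
named <- unnest_longer(df, y, values_to = "value", indices_to = "name")
named

# Two unnamed list-columns, renamed with a glue template.
df2 <- tibble(
  x = 1:2,
  y = list(1:2, 3:4),
  z = list(5:6, 7:8)
)
templated <- unnest_longer(df2, c(y, z), values_to = "{col}_val")
templated
```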

names_repair

Used to check that the output data frame has valid names. Must be one of the following options:

"minimal": no name repair or checks, beyond basic existence.
"unique": make sure names are unique and not empty.
"check_unique": (the default) no name repair, but check they are unique.
"universal": make the names unique and syntactic.
a function: apply custom name repair.
tidyr_legacy: use the name repair from tidyr 0.8.
a formula: a purrr-style anonymous function (see rlang::as_function()).

See vctrs::vec_as_names() for more details on these terms and the strategies used to enforce them.

names_sep

If NULL, the default, the names will be left as is. If a string, the outer and inner names will be pasted together using names_sep as a separator.

If the values being unnested are unnamed and names_sep is supplied, the inner names will be automatically generated as an increasing sequence of integers.
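To illustrate how names_sep combines outer and inner names, here is a small sketch with hypothetical data in which the inner vectors are already named:

```r
library(tidyr)

df <- tibble(
  x = 1:2,
  y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12))
)

# The outer name ("y") and the inner names ("a", "b", "c") are pasted together.
out <- unnest_wider(df, y, names_sep = "_")
names(out)
```

Prefixing with the outer name is one way to avoid clashes when several list-columns share inner names.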

strict

A single logical specifying whether or not to apply strict vctrs typing rules. If FALSE, typed empty values (like list() or integer()) nested within list-columns will be treated like NULL and will not contribute to the type of the unnested column. This is useful when working with JSON, where empty values tend to lose their type information and show up as list().

Unnest variants

The three unnest() functions differ in how they change the shape of the output data frame:

unnest_wider() preserves the rows, but changes the columns.
unnest_longer() preserves the columns, but changes the rows.
unnest() can change both rows and columns.

These principles guide their behaviour when they are called with a non-primary data type. For example, if you unnest_wider() a list of data frames, the number of rows must be preserved, so each column is turned into a list column of length one. Or if you unnest_longer() a list of data frames, the number of columns must be preserved, so it creates a packed column. I'm not sure if these behaviours are useful in practice, but they are theoretically pleasing.
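The data frame case can be sketched with a hypothetical list-column of tibbles. This is an illustration under the assumption of a recent tidyr release; the comments restate the behaviour described above rather than guaranteed output:

```r
library(tidyr)

# Hypothetical list-column whose elements are data frames.
df <- tibble(
  g = 1:2,
  d = list(
    tibble(a = 1:2, b = 3:4),
    tibble(a = 5L, b = 6L)
  )
)

# Rows are preserved, so `a` and `b` come back as list-columns.
wide <- unnest_wider(df, d)
wide

# Columns are preserved, so `d` comes back as a packed data frame column.
long <- unnest_longer(df, d)
long
```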

unnest_auto() heuristics

unnest_auto() inspects the inner names of the list-col:

If all elements are unnamed, it uses unnest_longer(indices_include = FALSE).
If all elements are named, and there's at least one name in common across all components, it uses unnest_wider().
Otherwise, it falls back to unnest_longer(indices_include = TRUE).
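The three branches of the heuristic can be exercised with small hypothetical inputs. unnest_auto() reports its choice in a message, which is suppressed here for brevity:

```r
library(tidyr)

# All elements unnamed: expected to go longer, without an index column.
df_unnamed <- tibble(x = 1:2, y = list(1:2, 3:4))
# All elements named, with names in common: expected to go wider.
df_named <- tibble(x = 1:2, y = list(list(a = 1, b = 2), list(a = 3, b = 4)))
# All elements named, but no name in common: expected to go longer with an index.
df_disjoint <- tibble(x = 1:2, y = list(list(a = 1), list(b = 2)))

res_unnamed <- suppressMessages(unnest_auto(df_unnamed, y))
res_named <- suppressMessages(unnest_auto(df_named, y))
res_disjoint <- suppressMessages(unnest_auto(df_disjoint, y))

names(res_unnamed)
names(res_named)
names(res_disjoint)
```

In non-interactive code it is usually clearer to call unnest_wider() or unnest_longer() directly once you know which one applies.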

Examples

df <- tibble(
  character = c("Toothless", "Dory"),
  metadata = list(
    list(
      species = "dragon",
      color = "black",
      films = c(
        "How to Train Your Dragon",
        "How to Train Your Dragon 2",
        "How to Train Your Dragon: The Hidden World"
      )
    ),
    list(
      species = "blue tang",
      color = "blue",
      films = c("Finding Nemo", "Finding Dory")
    )
  )
)
df
#> # A tibble: 2 × 2
#>   character metadata        
#>   <chr>     <list>          
#> 1 Toothless <named list [3]>
#> 2 Dory      <named list [3]>

# Turn all components of metadata into columns
df %>% unnest_wider(metadata)
#> # A tibble: 2 × 4
#>   character species   color films    
#>   <chr>     <chr>     <chr> <list>   
#> 1 Toothless dragon    black <chr [3]>
#> 2 Dory      blue tang blue  <chr [2]>

# Choose not to simplify list-cols of length-1 elements
df %>% unnest_wider(metadata, simplify = FALSE)
#> # A tibble: 2 × 4
#>   character species   color     films    
#>   <chr>     <list>    <list>    <list>   
#> 1 Toothless <chr [1]> <chr [1]> <chr [3]>
#> 2 Dory      <chr [1]> <chr [1]> <chr [2]>
df %>% unnest_wider(metadata, simplify = list(color = FALSE))
#> # A tibble: 2 × 4
#>   character species   color     films    
#>   <chr>     <chr>     <list>    <list>   
#> 1 Toothless dragon    <chr [1]> <chr [3]>
#> 2 Dory      blue tang <chr [1]> <chr [2]>

# Extract only specified components
df %>% hoist(metadata,
  "species",
  first_film = list("films", 1L),
  third_film = list("films", 3L)
)
#> # A tibble: 2 × 5
#>   character species   first_film               third_film           metadata    
#>   <chr>     <chr>     <chr>                    <chr>                <list>      
#> 1 Toothless dragon    How to Train Your Dragon How to Train Your D… <named list>
#> 2 Dory      blue tang Finding Nemo             NA                   <named list>

df %>%
  unnest_wider(metadata) %>%
  unnest_longer(films)
#> # A tibble: 5 × 4
#>   character species   color films                                     
#>   <chr>     <chr>     <chr> <chr>                                     
#> 1 Toothless dragon    black How to Train Your Dragon                  
#> 2 Toothless dragon    black How to Train Your Dragon 2                
#> 3 Toothless dragon    black How to Train Your Dragon: The Hidden World
#> 4 Dory      blue tang blue  Finding Nemo                              
#> 5 Dory      blue tang blue  Finding Dory                              

# unnest_longer() is useful when each component of the list should
# form a row
df <- tibble(
  x = 1:3,
  y = list(NULL, 1:3, 4:5)
)
df %>% unnest_longer(y)
#> # A tibble: 6 × 2
#>       x     y
#>   <int> <int>
#> 1     1    NA
#> 2     2     1
#> 3     2     2
#> 4     2     3
#> 5     3     4
#> 6     3     5
# Automatically creates names if widening
df %>% unnest_wider(y)
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> * `` -> ...3
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> # A tibble: 3 × 4
#>       x  ...1  ...2  ...3
#>   <int> <int> <int> <int>
#> 1     1    NA    NA    NA
#> 2     2     1     2     3
#> 3     3     4     5    NA
# But you'll usually want to provide names_sep:
df %>% unnest_wider(y, names_sep = "_")
#> # A tibble: 3 × 4
#>       x   y_1   y_2   y_3
#>   <int> <int> <int> <int>
#> 1     1    NA    NA    NA
#> 2     2     1     2     3
#> 3     3     4     5    NA

# And similarly if the vectors are named
df <- tibble(
  x = 1:2,
  y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12))
)
df %>% unnest_wider(y)
#> # A tibble: 2 × 4
#>       x     a     b     c
#>   <int> <dbl> <dbl> <dbl>
#> 1     1     1     2    NA
#> 2     2    10    11    12
df %>% unnest_longer(y)
#> # A tibble: 5 × 3
#>       x     y y_id 
#>   <int> <dbl> <chr>
#> 1     1     1 a    
#> 2     1     2 b    
#> 3     2    10 a    
#> 4     2    11 b    
#> 5     2    12 c    

# Both unnest_wider() and unnest_longer() allow you to unnest multiple
# columns at once. This is particularly useful with unnest_longer(), where
# unnesting sequentially would generate a cartesian product of the rows.
df <- tibble(
  x = 1:2,
  y = list(1:2, 3:4),
  z = list(5:6, 7:8)
)
unnest_longer(df, c(y, z))
#> # A tibble: 4 × 3
#>       x     y     z
#>   <int> <int> <int>
#> 1     1     1     5
#> 2     1     2     6
#> 3     2     3     7
#> 4     2     4     8
unnest_longer(unnest_longer(df, y), z)
#> # A tibble: 8 × 3
#>       x     y     z
#>   <int> <int> <int>
#> 1     1     1     5
#> 2     1     1     6
#> 3     1     2     5
#> 4     1     2     6
#> 5     2     3     7
#> 6     2     3     8
#> 7     2     4     7
#> 8     2     4     8

# With JSON, it is common for empty elements to be represented by `list()`
# rather than their typed equivalent, like `integer()`
json <- list(
  list(x = 1:2, y = 1:2),
  list(x = list(), y = 3:4),
  list(x = 3L, y = list())
)
df <- tibble(json = json)

# The defaults of `unnest_wider()` treat empty types (like `list()`) as `NULL`.
# This chains nicely into `unnest_longer()`.
wide <- unnest_wider(df, json)
wide
#> # A tibble: 3 × 2
#>   x         y        
#>   <list>    <list>   
#> 1 <int [2]> <int [2]>
#> 2 <NULL>    <int [2]>
#> 3 <int [1]> <NULL>   

unnest_longer(wide, c(x, y))
#> # A tibble: 5 × 2
#>       x     y
#>   <int> <int>
#> 1     1     1
#> 2     2     2
#> 3    NA     3
#> 4    NA     4
#> 5     3    NA

# To instead enforce strict vctrs typing rules, use `strict`
wide_strict <- unnest_wider(df, json, strict = TRUE)
wide_strict
#> # A tibble: 3 × 2
#>   x          y         
#>   <list>     <list>    
#> 1 <int [2]>  <int [2]> 
#> 2 <list [0]> <int [2]> 
#> 3 <int [1]>  <list [0]>

try(unnest_longer(wide_strict, c(x, y)))
#> Error in stop_vctrs(message, class = c(class, "vctrs_error_incompatible"),  : 
#>   Can't combine `..1$x` <integer> and `..3$x` <list>.
