[Maturing]

Chopping and unchopping preserve the width of a data frame, changing its length. chop() makes df shorter by converting rows within each group into list-columns. unchop() makes df longer by expanding list-columns so that each element of the list-column gets its own row in the output. chop() and unchop() are building blocks for more complicated functions (like unnest(), unnest_longer(), and unnest_wider()) and are generally more suitable for programming than interactive data analysis.
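A minimal roundtrip sketch, assuming a recent tidyr release. Chopping `y` gives one row per unique `x`, and unchopping `y` brings the rows back. The number of columns stays the same throughout.

```r
library(tidyr)  # tidyr re-exports tibble()

df <- tibble(x = c(1, 1, 2), y = 1:3)

# chop(): one row per unique value of the non-chopped column x;
# y becomes a list-column holding each group's values
chopped <- chop(df, y)

# unchop(): every element of y gets its own row again
restored <- unchop(chopped, y)
```

Here the rows come back in their original order only because the groups of `x` are already contiguous. In general, the roundtrip regroups rows by the non-chopped columns.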

chop(data, cols)

unchop(data, cols, keep_empty = FALSE, ptype = NULL)

Arguments

data

A data frame.

cols

<tidy-select> Columns to chop or unchop (automatically quoted).

For unchop(), each column should be a list-column containing generalised vectors (e.g. any mix of NULLs, atomic vectors, S3 vectors, lists, or data frames).

keep_empty

By default, you get one row of output for each element of the list you're unchopping/unnesting. This means that if there's a size-0 element (like NULL or an empty data frame), that entire row will be dropped from the output. If you want to preserve all rows, use keep_empty = TRUE to replace size-0 elements with a single row of missing values.

ptype

Optionally, a named list of column name-prototype pairs to coerce cols to, overriding the default that will be guessed from combining the individual values.
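A short sketch of `ptype`, assuming the named-list form described above. Without it, `y` is unchopped to the common type of its elements (integer). With `ptype = list(y = double())`, the result is cast to double instead.

```r
library(tidyr)

df <- tibble(x = 1:2, y = list(1L, 2:3))

# Default: the type is guessed by combining the elements (integer here)
default <- unchop(df, y)

# ptype overrides the guess; y is cast to the double prototype
as_double <- unchop(df, y, ptype = list(y = double()))
```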

Details

Generally, unchopping is more useful than chopping because it simplifies a complex data structure, and nest()ing is usually more appropriate than chop()ing since it better preserves the connections between observations.
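One way to see what "connections between observations" means (an illustration, not taken from the tidyr documentation). chop() stores `y` and `z` as two independent list-columns, so nothing stops you from unchopping only one of them. nest() keeps each `y` paired with its `z` inside a single data frame.

```r
library(tidyr)

df <- tibble(x = c(1, 1, 2), y = 1:3, z = 3:1)

# chop() stores y and z as two independent list-columns
chopped <- chop(df, c(y, z))

# Unchopping only y repeats the whole z vector on every row,
# so each y value is no longer paired with its own z value
partial <- unchop(chopped, y)

# nest() keeps y and z together in one data frame per group,
# so each inner row still pairs a y with its z
nested <- nest(df, data = c(y, z))
```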

chop() creates list-columns of class vctrs::list_of() to ensure consistent behaviour when the chopped data frame is empty. For instance, this helps recover the original column types after a chop/unchop roundtrip. Because <list_of> keeps track of the type of its elements, unchop() is able to reconstitute the correct vector type even for empty list-columns.
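A sketch of the empty-data-frame case described above. A zero-row tibble with a character column is chopped and then unchopped. The list_of column remembers the element type, so `y` comes back as character rather than an untyped column.

```r
library(tidyr)

# A zero-row data frame with a character column
df <- tibble(x = integer(), y = character())

# chop() still creates a list_of column, which records
# that its (absent) elements are character vectors
chopped <- chop(df, y)

# unchop() can therefore restore y as character,
# even though there are no elements to inspect
restored <- unchop(chopped, y)
```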

Examples

```r
library(tidyr)

# Chop ==============================================================
df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)
# Note that we get one row of output for each unique combination of
# non-chopped variables
df %>% chop(c(y, z))
#> # A tibble: 3 × 3
#>       x           y           z
#>   <dbl> <list<int>> <list<int>>
#> 1     1         [3]         [3]
#> 2     2         [2]         [2]
#> 3     3         [1]         [1]

# cf nest
df %>% nest(data = c(y, z))
#> # A tibble: 3 × 2
#>       x data
#>   <dbl> <list>
#> 1     1 <tibble [3 × 2]>
#> 2     2 <tibble [2 × 2]>
#> 3     3 <tibble [1 × 2]>
```
```r
# Unchop ============================================================
df <- tibble(x = 1:4, y = list(integer(), 1L, 1:2, 1:3))
df %>% unchop(y)
#> # A tibble: 6 × 2
#>       x     y
#>   <int> <int>
#> 1     2     1
#> 2     3     1
#> 3     3     2
#> 4     4     1
#> 5     4     2
#> 6     4     3

df %>% unchop(y, keep_empty = TRUE)
#> # A tibble: 7 × 2
#>       x     y
#>   <int> <int>
#> 1     1    NA
#> 2     2     1
#> 3     3     1
#> 4     3     2
#> 5     4     1
#> 6     4     2
#> 7     4     3
```
```r
# Incompatible types -------------------------------------------------
# If the list-col contains types that can not be natively combined,
# unchop() throws an error
df <- tibble(x = 1:2, y = list("1", 1:3))
try(df %>% unchop(y))
#> Error : Can't combine `..1` <character> and `..2` <integer>.
```
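One possible workaround, which is an assumption rather than a documented tidyr recipe, is to convert every element of the list-column to a shared type before unchopping. Here every element is converted to character with base R's `lapply()`.

```r
library(tidyr)

df <- tibble(x = 1:2, y = list("1", 1:3))

# Workaround (not part of tidyr): cast every element to character
# first, so that all elements of y share a common type
df$y <- lapply(df$y, as.character)

# Now the elements can be combined: 1 + 3 = 4 rows
out <- unchop(df, y)
```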
```r
# Unchopping data frames -----------------------------------------------------
# Unchopping a list-col of data frames must generate a df-col because
# unchop leaves the column names unchanged
df <- tibble(x = 1:3, y = list(NULL, tibble(x = 1), tibble(y = 1:2)))
df %>% unchop(y)
#> # A tibble: 3 × 2
#>       x   y$x    $y
#>   <int> <dbl> <int>
#> 1     2     1    NA
#> 2     3    NA     1
#> 3     3    NA     2

df %>% unchop(y, keep_empty = TRUE)
#> # A tibble: 4 × 2
#>       x   y$x    $y
#>   <int> <dbl> <int>
#> 1     1    NA    NA
#> 2     2     1    NA
#> 3     3    NA     1
#> 4     3    NA     2
```