Pack and unpack (tidyr)

Packing and unpacking preserve the length of a data frame, changing only its width. pack() makes data narrower by collapsing a set of columns into a single df-column. unpack() makes data wider by expanding df-columns back out into individual columns.
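A minimal sketch of this length/width trade-off, using a small made-up tibble (the column names `a`, `b`, `c` and `ab` are illustrative only). Packing two columns keeps the row count and reduces the number of top-level columns by one; unpacking brings the original columns back, although in this case their order can differ because pack() places the new df-column after the columns it did not touch.

```r
library(tidyr)  # tidyr re-exports tibble() and %>%

# Hypothetical example data
df <- tibble(a = 1:3, b = 4:6, c = 7:9)

# Collapse a and b into a single df-column named ab
packed <- df %>% pack(ab = c(a, b))
nrow(packed)  # same number of rows as df
ncol(packed)  # one fewer top-level column than df

# Expand ab back out into ordinary columns
unpacked <- packed %>% unpack(ab)
names(unpacked)  # the same columns as df, possibly reordered
```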

pack(.data, ..., .names_sep = NULL)

unpack(data, cols, names_sep = NULL, names_repair = "check_unique")

Arguments

...	<`tidy-select`> Columns to pack, specified using name-variable pairs of the form `new_col = c(col1, col2, col3)`. The right-hand side can be any valid tidy select expression.
data, .data	A data frame.
cols	<`tidy-select`> Columns to unpack.
names_sep, .names_sep	If `NULL`, the default, the names will be left as is. In `pack()`, inner names will come from the former outer names; in `unpack()`, the new outer names will come from the inner names. If a string, the inner and outer names will be used together. In `unpack()`, the names of the new outer columns will be formed by pasting together the outer and the inner column names, separated by `names_sep`. In `pack()`, the new inner names will have the outer names (+ `names_sep`) automatically stripped. This makes `names_sep` roughly symmetric between packing and unpacking.
names_repair	Used to check that the output data frame has valid names. Must be one of the following options: "minimal": no name repair or checks, beyond basic existence; "unique": make sure names are unique and not empty; "check_unique" (the default): no name repair, but check they are unique; "universal": make the names unique and syntactic; a function: apply custom name repair; "tidyr_legacy": use the name repair from tidyr 0.8; a formula: a purrr-style anonymous function (see `rlang::as_function()`). See `vctrs::vec_as_names()` for more details on these terms and the strategies used to enforce them.
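The naming arguments are easiest to see side by side. The sketch below uses made-up column names (`id`, `x_a`, `x_b`, and the `clash` tibble are hypothetical). It shows `.names_sep` stripping a shared prefix in pack(), `names_sep` pasting it back in unpack(), and what happens when unpacking without `names_sep` would create duplicate names: the default `names_repair = "check_unique"` signals an error, while `"unique"` renames the clashing columns (vctrs may print a message listing the new names).

```r
library(tidyr)  # also re-exports tibble(), %>% and starts_with()

# Hypothetical data with a shared "x_" prefix
df <- tibble(id = 1:3, x_a = 4:6, x_b = 7:9)

# pack(): the outer name "x" plus "_" is stripped from the inner names,
# so the df-column x holds columns a and b
packed <- df %>% pack(x = starts_with("x_"), .names_sep = "_")
names(packed$x)

# unpack(): outer and inner names are pasted together with "_",
# recovering x_a and x_b
roundtrip <- packed %>% unpack(x, names_sep = "_")
names(roundtrip)

# Unpacking y here would produce two columns named "a"
clash <- tibble(a = 1:2, y = tibble(a = 3:4))

# Default names_repair = "check_unique": an error is signalled
tryCatch(unpack(clash, y), error = function(e) conditionMessage(e))

# names_repair = "unique": duplicate names are repaired instead
repaired <- clash %>% unpack(y, names_repair = "unique")
names(repaired)
```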

Details

Generally, unpacking is more useful than packing because it simplifies a complex data structure. Currently, few functions work with df-cols; they are mostly a curiosity, but seem worth exploring further because they mimic the nested column headers that are so popular in Excel.
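Because unpacking is usually the direction you want, one common task is flattening every df-column in a data frame at once. A minimal sketch, assuming a hypothetical tibble with two df-columns whose inner names overlap; it uses `tidyselect::where()` (from tidyselect, a tidyr dependency) to select df-columns, and `names_sep` to keep the resulting names unique.

```r
library(tidyr)

# Hypothetical data: two df-columns with the same inner names
df <- tibble(
  id = 1:2,
  sepal = tibble(length = c(5.1, 4.9), width = c(3.5, 3.0)),
  petal = tibble(length = c(1.4, 1.4), width = c(0.2, 0.2))
)

# Which top-level columns are df-columns?
is_df_col <- vapply(df, is.data.frame, logical(1))
is_df_col

# Unpack all df-columns; names_sep = "_" prefixes each inner name with its
# outer name, so the two "length" and two "width" columns do not clash
flat <- df %>% unpack(tidyselect::where(is.data.frame), names_sep = "_")
names(flat)
```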

Examples

# Packing =============================================================
# It's not currently clear why you would ever want to pack columns
# since few functions work with this sort of data.
df <- tibble(x1 = 1:3, x2 = 4:6, x3 = 7:9, y = 1:3)
df
#> # A tibble: 3 × 4
#>      x1    x2    x3     y
#>   <int> <int> <int> <int>
#> 1     1     4     7     1
#> 2     2     5     8     2
#> 3     3     6     9     3
df %>% pack(x = starts_with("x"))
#> # A tibble: 3 × 2
#>       y  x$x1   $x2   $x3
#>   <int> <int> <int> <int>
#> 1     1     1     4     7
#> 2     2     2     5     8
#> 3     3     3     6     9
df %>% pack(x = c(x1, x2, x3), y = y)
#> # A tibble: 3 × 2
#>    x$x1   $x2   $x3   y$y
#>   <int> <int> <int> <int>
#> 1     1     4     7     1
#> 2     2     5     8     2
#> 3     3     6     9     3

# .names_sep allows you to strip off common prefixes; this
# acts as a natural inverse to names_sep in unpack()
iris %>%
  as_tibble() %>%
  pack(
    Sepal = starts_with("Sepal"),
    Petal = starts_with("Petal"),
    .names_sep = "."
  )
#> # A tibble: 150 × 3
#>    Species Sepal$Length $Width Petal$Length $Width
#>    <fct>          <dbl>  <dbl>        <dbl>  <dbl>
#>  1 setosa           5.1    3.5          1.4    0.2
#>  2 setosa           4.9    3            1.4    0.2
#>  3 setosa           4.7    3.2          1.3    0.2
#>  4 setosa           4.6    3.1          1.5    0.2
#>  5 setosa           5      3.6          1.4    0.2
#>  6 setosa           5.4    3.9          1.7    0.4
#>  7 setosa           4.6    3.4          1.4    0.3
#>  8 setosa           5      3.4          1.5    0.2
#>  9 setosa           4.4    2.9          1.4    0.2
#> 10 setosa           4.9    3.1          1.5    0.1
#> # … with 140 more rows

# Unpacking ===========================================================
df <- tibble(
  x = 1:3,
  y = tibble(a = 1:3, b = 3:1),
  z = tibble(X = c("a", "b", "c"), Y = runif(3), Z = c(TRUE, FALSE, NA))
)
df
#> # A tibble: 3 × 3
#>       x   y$a    $b z$X       $Y $Z   
#>   <int> <int> <int> <chr>  <dbl> <lgl>
#> 1     1     1     3 a     0.0281 TRUE 
#> 2     2     2     2 b     0.466  FALSE
#> 3     3     3     1 c     0.390  NA   
df %>% unpack(y)
#> # A tibble: 3 × 4
#>       x     a     b z$X       $Y $Z   
#>   <int> <int> <int> <chr>  <dbl> <lgl>
#> 1     1     1     3 a     0.0281 TRUE 
#> 2     2     2     2 b     0.466  FALSE
#> 3     3     3     1 c     0.390  NA   
df %>% unpack(c(y, z))
#> # A tibble: 3 × 6
#>       x     a     b X          Y Z    
#>   <int> <int> <int> <chr>  <dbl> <lgl>
#> 1     1     1     3 a     0.0281 TRUE 
#> 2     2     2     2 b     0.466  FALSE
#> 3     3     3     1 c     0.390  NA   
df %>% unpack(c(y, z), names_sep = "_")
#> # A tibble: 3 × 6
#>       x   y_a   y_b z_X      z_Y z_Z  
#>   <int> <int> <int> <chr>  <dbl> <lgl>
#> 1     1     1     3 a     0.0281 TRUE 
#> 2     2     2     2 b     0.466  FALSE
#> 3     3     3     1 c     0.390  NA