Packing and unpacking preserve the length of a data frame, changing only its width.
pack() makes a data frame narrower by collapsing a set of columns into a single df-column.
unpack() makes it wider by expanding df-columns back out into individual columns.
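The length/width invariant can be checked directly. This is a minimal sketch, assuming tidyr is installed (it re-exports tibble()); the data frame df and the names packed and unpacked are illustrative only.

```r
library(tidyr)

df <- tibble(x1 = 1:3, x2 = 4:6, y = 7:9)

# pack() keeps all 3 rows but collapses x1 and x2 into one df-column, x
packed <- pack(df, x = c(x1, x2))
dim(df)      # 3 rows, 3 columns
dim(packed)  # 3 rows, 2 columns (y, then the df-column x)

# unpack() expands x again; with names_sep = NULL the inner names are reused as-is
unpacked <- unpack(packed, x)
names(unpacked)  # "y" "x1" "x2"
```

pack() places the new df-column after the remaining columns, while unpack() expands a df-column in place, so a round trip restores the columns but not necessarily their original order.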
pack(.data, ..., .names_sep = NULL)
unpack(data, cols, names_sep = NULL, names_repair = "check_unique")
...: <tidy-select> Columns to pack, specified using name-variable pairs of the form new_col = c(col1, col2, col3). The right-hand side can be any valid tidy-select expression.
data, .data: A data frame.
cols: <tidy-select> Columns to unpack.
names_sep, .names_sep: If NULL, the default, the names will be left as is. In pack(), inner names will come from the former outer names; in unpack(), the new outer names will come from the inner names.
If a string, the inner and outer names will be used together. In unpack(), the names of the new outer columns will be formed by pasting together the outer and the inner column names, separated by names_sep. In pack(), the new inner names will have the outer names + names_sep automatically stripped. This makes names_sep roughly symmetric between packing and unpacking.
names_repair: Used to check that the output data frame has valid names. Must be one of the following options:
"minimal": no name repair or checks, beyond basic existence,
"unique": make sure names are unique and not empty,
"check_unique": (the default) no name repair, but check that names are unique,
"universal": make the names unique and syntactic,
a function: apply custom name repair,
tidyr_legacy: use the name repair from tidyr 0.8,
a formula: a purrr-style anonymous function (see rlang::as_function()).
See vctrs::vec_as_names() for more details on these terms and the strategies used to enforce them.
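The names_sep and names_repair arguments can be exercised together in a short round trip. This is a minimal sketch, assuming tidyr is installed (it re-exports tibble() and tidyselect helpers such as starts_with()); the data frames and variable names are illustrative only.

```r
library(tidyr)

df <- tibble(id = 1:2, Sepal.Length = c(5.1, 4.9), Sepal.Width = c(3.5, 3.0))

# pack(): the outer name plus .names_sep ("Sepal.") is stripped from the inner names
packed <- pack(df, Sepal = starts_with("Sepal"), .names_sep = ".")
names(packed$Sepal)  # "Length" "Width"

# unpack(): outer and inner names are pasted back together with names_sep
round_trip <- unpack(packed, Sepal, names_sep = ".")
names(round_trip)  # "id" "Sepal.Length" "Sepal.Width"

# With the default names_repair = "check_unique", duplicate names are an error
nested <- tibble(y = tibble(a = 1:2), z = tibble(a = 3:4))
res <- tryCatch(unpack(nested, c(y, z)), error = function(e) NULL)
is.null(res)

# names_repair = "unique" renames the duplicated "a" columns instead of failing
repaired <- unpack(nested, c(y, z), names_repair = "unique")
anyDuplicated(names(repaired)) == 0
```

Passing names_sep to unpack() is often a simpler way to avoid duplicates than repairing them, because the outer name keeps each inner column distinguishable.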
Generally, unpacking is more useful than packing because it simplifies a complex data structure. Currently, few functions work with df-cols, and they are mostly a curiosity, but seem worth exploring further because they mimic the nested column headers that are so popular in Excel.
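To see why df-columns resemble nested column headers, it helps to inspect one directly. This is a minimal sketch, assuming tidyr is installed; the data frame df and the column name x are illustrative only.

```r
library(tidyr)

df <- tibble(id = 1:3, x1 = 4:6, x2 = 7:9)
packed <- pack(df, x = c(x1, x2))

# The df-column x is an ordinary data frame stored inside the outer data frame
is.data.frame(packed$x)  # TRUE
nrow(packed$x)           # 3, the same number of rows as packed
names(packed$x)          # "x1" "x2"

# Two levels of $ act like a two-row header: first x, then x1
packed$x$x1              # 4 5 6
```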
library(tidyr)
# Packing =============================================================
# It's not currently clear why you would ever want to pack columns
# since few functions work with this sort of data.
df <- tibble(x1 = 1:3, x2 = 4:6, x3 = 7:9, y = 1:3)
df
#> # A tibble: 3 × 4
#>      x1    x2    x3     y
#>   <int> <int> <int> <int>
#> 1     1     4     7     1
#> 2     2     5     8     2
#> 3     3     6     9     3
df %>% pack(x = starts_with("x"))
#> # A tibble: 3 × 2
#>       y  x$x1   $x2   $x3
#>   <int> <int> <int> <int>
#> 1     1     1     4     7
#> 2     2     2     5     8
#> 3     3     3     6     9
df %>% pack(x = c(x1, x2, x3), y = y)
#> # A tibble: 3 × 2
#>    x$x1   $x2   $x3   y$y
#>   <int> <int> <int> <int>
#> 1     1     4     7     1
#> 2     2     5     8     2
#> 3     3     6     9     3
# .names_sep allows you to strip off common prefixes; this
# acts as a natural inverse to names_sep in unpack()
iris %>%
  as_tibble() %>%
  pack(
    Sepal = starts_with("Sepal"),
    Petal = starts_with("Petal"),
    .names_sep = "."
  )
#> # A tibble: 150 × 3
#>    Species Sepal$Length $Width Petal$Length $Width
#>    <fct>          <dbl>  <dbl>        <dbl>  <dbl>
#>  1 setosa           5.1    3.5          1.4    0.2
#>  2 setosa           4.9    3            1.4    0.2
#>  3 setosa           4.7    3.2          1.3    0.2
#>  4 setosa           4.6    3.1          1.5    0.2
#>  5 setosa           5      3.6          1.4    0.2
#>  6 setosa           5.4    3.9          1.7    0.4
#>  7 setosa           4.6    3.4          1.4    0.3
#>  8 setosa           5      3.4          1.5    0.2
#>  9 setosa           4.4    2.9          1.4    0.2
#> 10 setosa           4.9    3.1          1.5    0.1
#> # … with 140 more rows
# Unpacking ===========================================================
df <- tibble(
  x = 1:3,
  y = tibble(a = 1:3, b = 3:1),
  z = tibble(X = c("a", "b", "c"), Y = runif(3), Z = c(TRUE, FALSE, NA))
)
df
#> # A tibble: 3 × 3
#>       x   y$a    $b z$X        $Y $Z
#>   <int> <int> <int> <chr>   <dbl> <lgl>
#> 1     1     1     3 a     0.920   TRUE
#> 2     2     2     2 b     0.00419 FALSE
#> 3     3     3     1 c     0.912   NA
df %>% unpack(y)
#> # A tibble: 3 × 4
#>       x     a     b z$X        $Y $Z
#>   <int> <int> <int> <chr>   <dbl> <lgl>
#> 1     1     1     3 a     0.920   TRUE
#> 2     2     2     2 b     0.00419 FALSE
#> 3     3     3     1 c     0.912   NA
df %>% unpack(c(y, z))
#> # A tibble: 3 × 6
#>       x     a     b X           Y Z
#>   <int> <int> <int> <chr>   <dbl> <lgl>
#> 1     1     1     3 a     0.920   TRUE
#> 2     2     2     2 b     0.00419 FALSE
#> 3     3     3     1 c     0.912   NA
df %>% unpack(c(y, z), names_sep = "_")
#> # A tibble: 3 × 6
#>       x   y_a   y_b z_X       z_Y z_Z
#>   <int> <int> <int> <chr>   <dbl> <lgl>
#> 1     1     1     3 a     0.920   TRUE
#> 2     2     2     2 b     0.00419 FALSE
#> 3     3     3     1 c     0.912   NA