pivot_wider()
"widens" data, increasing the number of columns and
decreasing the number of rows. The inverse transformation is
pivot_longer()
.
Learn more in vignette("pivot")
.
pivot_wider(
data,
id_cols = NULL,
id_expand = FALSE,
names_from = name,
names_prefix = "",
names_sep = "_",
names_glue = NULL,
names_sort = FALSE,
names_vary = "fastest",
names_expand = FALSE,
names_repair = "check_unique",
values_from = value,
values_fill = NULL,
values_fn = NULL,
unused_fn = NULL,
...
)
A data frame to pivot.
<tidy-select
> A set of columns that
uniquely identifies each observation. Defaults to all columns in data
except for the columns specified in names_from
and values_from
.
Typically used when you have redundant variables, i.e. variables whose
values are perfectly correlated with existing variables.
Should the values in the id_cols
columns be expanded by
expand()
before pivoting? This results in more rows, the output will
contain a complete expansion of all possible values in id_cols
. Implicit
factor levels that aren't represented in the data will become explicit.
Additionally, the row values corresponding to the expanded id_cols
will
be sorted.
<tidy-select
> A pair of
arguments describing which column (or columns) to get the name of the
output column (names_from
), and which column (or columns) to get the
cell values from (values_from
).
If values_from
contains multiple values, the value will be added to the
front of the output column.
String added to the start of every variable name. This is
particularly useful if names_from
is a numeric vector and you want to
create syntactic variable names.
If names_from
or values_from
contains multiple
variables, this will be used to join their values together into a single
string to use as a column name.
Instead of names_sep
and names_prefix
, you can supply
a glue specification that uses the names_from
columns (and special
.value
) to create custom column names.
Should the column names be sorted? If FALSE
, the default,
column names are ordered by first appearance.
When names_from
identifies a column (or columns) with
multiple unique values, and multiple values_from
columns are provided,
in what order should the resulting column names be combined?
"fastest"
varies names_from
values fastest, resulting in a column
naming scheme of the form: value1_name1, value1_name2, value2_name1, value2_name2
. This is the default.
"slowest"
varies names_from
values slowest, resulting in a column
naming scheme of the form: value1_name1, value2_name1, value1_name2, value2_name2
.
Should the values in the names_from
columns be expanded
by expand()
before pivoting? This results in more columns, the output
will contain column names corresponding to a complete expansion of all
possible values in names_from
. Implicit factor levels that aren't
represented in the data will become explicit. Additionally, the column
names will be sorted, identical to what names_sort
would produce.
What happens if the output has invalid column names?
The default, "check_unique"
is to error if the columns are duplicated.
Use "minimal"
to allow duplicates in the output, or "unique"
to
de-duplicated by adding numeric suffixes. See vctrs::vec_as_names()
for more options.
Optionally, a (scalar) value that specifies what each
value
should be filled in with when missing.
This can be a named list if you want to apply different fill values to different value columns.
Optionally, a function applied to the value in each cell
in the output. You will typically use this when the combination of
id_cols
and names_from
columns does not uniquely identify an
observation.
This can be a named list if you want to apply different aggregations
to different values_from
columns.
Optionally, a function applied to summarize the values from
the unused columns (i.e. columns not identified by id_cols
,
names_from
, or values_from
).
The default drops all unused columns from the result.
This can be a named list if you want to apply different aggregations to different unused columns.
id_cols
must be supplied for unused_fn
to be useful, since otherwise
all unspecified columns will be considered id_cols
.
This is similar to grouping by the id_cols
then summarizing the
unused columns using unused_fn
.
Additional arguments passed on to methods.
pivot_wider()
is an updated approach to spread()
, designed to be both
simpler to use and to handle more use cases. We recommend you use
pivot_wider()
for new code; spread()
isn't going away but is no longer
under active development.
pivot_wider_spec()
to pivot "by hand" with a data frame that
defines a pivotting specification.
# See vignette("pivot") for examples and explanation
fish_encounters
#> # A tibble: 114 × 3
#> fish station seen
#> <fct> <fct> <int>
#> 1 4842 Release 1
#> 2 4842 I80_1 1
#> 3 4842 Lisbon 1
#> 4 4842 Rstr 1
#> 5 4842 Base_TD 1
#> 6 4842 BCE 1
#> 7 4842 BCW 1
#> 8 4842 BCE2 1
#> 9 4842 BCW2 1
#> 10 4842 MAE 1
#> # … with 104 more rows
fish_encounters %>%
pivot_wider(names_from = station, values_from = seen)
#> # A tibble: 19 × 12
#> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW
#> <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 4842 1 1 1 1 1 1 1 1 1 1 1
#> 2 4843 1 1 1 1 1 1 1 1 1 1 1
#> 3 4844 1 1 1 1 1 1 1 1 1 1 1
#> 4 4845 1 1 1 1 1 NA NA NA NA NA NA
#> 5 4847 1 1 1 NA NA NA NA NA NA NA NA
#> 6 4848 1 1 1 1 NA NA NA NA NA NA NA
#> 7 4849 1 1 NA NA NA NA NA NA NA NA NA
#> 8 4850 1 1 NA 1 1 1 1 NA NA NA NA
#> 9 4851 1 1 NA NA NA NA NA NA NA NA NA
#> 10 4854 1 1 NA NA NA NA NA NA NA NA NA
#> 11 4855 1 1 1 1 1 NA NA NA NA NA NA
#> 12 4857 1 1 1 1 1 1 1 1 1 NA NA
#> 13 4858 1 1 1 1 1 1 1 1 1 1 1
#> 14 4859 1 1 1 1 1 NA NA NA NA NA NA
#> 15 4861 1 1 1 1 1 1 1 1 1 1 1
#> 16 4862 1 1 1 1 1 1 1 1 1 NA NA
#> 17 4863 1 1 NA NA NA NA NA NA NA NA NA
#> 18 4864 1 1 NA NA NA NA NA NA NA NA NA
#> 19 4865 1 1 1 NA NA NA NA NA NA NA NA
# Fill in missing values
fish_encounters %>%
pivot_wider(names_from = station, values_from = seen, values_fill = 0)
#> # A tibble: 19 × 12
#> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW
#> <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 4842 1 1 1 1 1 1 1 1 1 1 1
#> 2 4843 1 1 1 1 1 1 1 1 1 1 1
#> 3 4844 1 1 1 1 1 1 1 1 1 1 1
#> 4 4845 1 1 1 1 1 0 0 0 0 0 0
#> 5 4847 1 1 1 0 0 0 0 0 0 0 0
#> 6 4848 1 1 1 1 0 0 0 0 0 0 0
#> 7 4849 1 1 0 0 0 0 0 0 0 0 0
#> 8 4850 1 1 0 1 1 1 1 0 0 0 0
#> 9 4851 1 1 0 0 0 0 0 0 0 0 0
#> 10 4854 1 1 0 0 0 0 0 0 0 0 0
#> 11 4855 1 1 1 1 1 0 0 0 0 0 0
#> 12 4857 1 1 1 1 1 1 1 1 1 0 0
#> 13 4858 1 1 1 1 1 1 1 1 1 1 1
#> 14 4859 1 1 1 1 1 0 0 0 0 0 0
#> 15 4861 1 1 1 1 1 1 1 1 1 1 1
#> 16 4862 1 1 1 1 1 1 1 1 1 0 0
#> 17 4863 1 1 0 0 0 0 0 0 0 0 0
#> 18 4864 1 1 0 0 0 0 0 0 0 0 0
#> 19 4865 1 1 1 0 0 0 0 0 0 0 0
# Generate column names from multiple variables
us_rent_income
#> # A tibble: 104 × 5
#> GEOID NAME variable estimate moe
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 01 Alabama income 24476 136
#> 2 01 Alabama rent 747 3
#> 3 02 Alaska income 32940 508
#> 4 02 Alaska rent 1200 13
#> 5 04 Arizona income 27517 148
#> 6 04 Arizona rent 972 4
#> 7 05 Arkansas income 23789 165
#> 8 05 Arkansas rent 709 5
#> 9 06 California income 29454 109
#> 10 06 California rent 1358 3
#> # … with 94 more rows
us_rent_income %>%
pivot_wider(
names_from = variable,
values_from = c(estimate, moe)
)
#> # A tibble: 52 × 6
#> GEOID NAME estimate_income estimate_rent moe_income moe_rent
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 01 Alabama 24476 747 136 3
#> 2 02 Alaska 32940 1200 508 13
#> 3 04 Arizona 27517 972 148 4
#> 4 05 Arkansas 23789 709 165 5
#> 5 06 California 29454 1358 109 3
#> 6 08 Colorado 32401 1125 109 5
#> 7 09 Connecticut 35326 1123 195 5
#> 8 10 Delaware 31560 1076 247 10
#> 9 11 District of Columbia 43198 1424 681 17
#> 10 12 Florida 25952 1077 70 3
#> # … with 42 more rows
# You can control whether `names_from` values vary fastest or slowest
# relative to the `values_from` column names using `names_vary`.
us_rent_income %>%
pivot_wider(
names_from = variable,
values_from = c(estimate, moe),
names_vary = "slowest"
)
#> # A tibble: 52 × 6
#> GEOID NAME estimate_income moe_income estimate_rent moe_rent
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 01 Alabama 24476 136 747 3
#> 2 02 Alaska 32940 508 1200 13
#> 3 04 Arizona 27517 148 972 4
#> 4 05 Arkansas 23789 165 709 5
#> 5 06 California 29454 109 1358 3
#> 6 08 Colorado 32401 109 1125 5
#> 7 09 Connecticut 35326 195 1123 5
#> 8 10 Delaware 31560 247 1076 10
#> 9 11 District of Columbia 43198 681 1424 17
#> 10 12 Florida 25952 70 1077 3
#> # … with 42 more rows
# When there are multiple `names_from` or `values_from`, you can use
# use `names_sep` or `names_glue` to control the output variable names
us_rent_income %>%
pivot_wider(
names_from = variable,
names_sep = ".",
values_from = c(estimate, moe)
)
#> # A tibble: 52 × 6
#> GEOID NAME estimate.income estimate.rent moe.income moe.rent
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 01 Alabama 24476 747 136 3
#> 2 02 Alaska 32940 1200 508 13
#> 3 04 Arizona 27517 972 148 4
#> 4 05 Arkansas 23789 709 165 5
#> 5 06 California 29454 1358 109 3
#> 6 08 Colorado 32401 1125 109 5
#> 7 09 Connecticut 35326 1123 195 5
#> 8 10 Delaware 31560 1076 247 10
#> 9 11 District of Columbia 43198 1424 681 17
#> 10 12 Florida 25952 1077 70 3
#> # … with 42 more rows
us_rent_income %>%
pivot_wider(
names_from = variable,
names_glue = "{variable}_{.value}",
values_from = c(estimate, moe)
)
#> # A tibble: 52 × 6
#> GEOID NAME income_estimate rent_estimate income_moe rent_moe
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 01 Alabama 24476 747 136 3
#> 2 02 Alaska 32940 1200 508 13
#> 3 04 Arizona 27517 972 148 4
#> 4 05 Arkansas 23789 709 165 5
#> 5 06 California 29454 1358 109 3
#> 6 08 Colorado 32401 1125 109 5
#> 7 09 Connecticut 35326 1123 195 5
#> 8 10 Delaware 31560 1076 247 10
#> 9 11 District of Columbia 43198 1424 681 17
#> 10 12 Florida 25952 1077 70 3
#> # … with 42 more rows
# Can perform aggregation with `values_fn`
warpbreaks <- as_tibble(warpbreaks[c("wool", "tension", "breaks")])
warpbreaks
#> # A tibble: 54 × 3
#> wool tension breaks
#> <fct> <fct> <dbl>
#> 1 A L 26
#> 2 A L 30
#> 3 A L 54
#> 4 A L 25
#> 5 A L 70
#> 6 A L 52
#> 7 A L 51
#> 8 A L 26
#> 9 A L 67
#> 10 A M 18
#> # … with 44 more rows
warpbreaks %>%
pivot_wider(
names_from = wool,
values_from = breaks,
values_fn = mean
)
#> # A tibble: 3 × 3
#> tension A B
#> <fct> <dbl> <dbl>
#> 1 L 44.6 28.2
#> 2 M 24 28.8
#> 3 H 24.6 18.8
# Can pass an anonymous function to `values_fn` when you
# need to supply additional arguments
warpbreaks$breaks[1] <- NA
warpbreaks %>%
pivot_wider(
names_from = wool,
values_from = breaks,
values_fn = ~mean(.x, na.rm = TRUE)
)
#> # A tibble: 3 × 3
#> tension A B
#> <fct> <dbl> <dbl>
#> 1 L 46.9 28.2
#> 2 M 24 28.8
#> 3 H 24.6 18.8