Pivot data from long to wide using a spec — pivot_wider

This is a low level interface to pivotting, inspired by the cdata package, that allows you to describe pivotting with a data frame.

pivot_wider_spec(
  data,
  spec,
  names_repair = "check_unique",
  id_cols = NULL,
  values_fill = NULL,
  values_fn = NULL
)

build_wider_spec(
  data,
  names_from = name,
  values_from = value,
  names_prefix = "",
  names_sep = "_",
  names_glue = NULL,
  names_sort = FALSE
)

Arguments

data	A data frame to pivot.
spec	A specification data frame. This is useful for more complex pivots because it gives you greater control on how metadata stored in the columns become column names in the result. Must be a data frame containing character `.name` and `.value` columns. Additional columns in `spec` should be named to match columns in the long format of the dataset and contain values corresponding to columns pivoted from the wide format. The special `.seq` variable is used to disambiguate rows internally; it is automatically removed after pivotting.
names_repair	What happens if the output has invalid column names? The default, `"check_unique"` is to error if the columns are duplicated. Use `"minimal"` to allow duplicates in the output, or `"unique"` to de-duplicated by adding numeric suffixes. See `vctrs::vec_as_names()` for more options.
id_cols	<`tidy-select`> A set of columns that uniquely identifies each observation. Defaults to all columns in `data` except for the columns specified in `names_from` and `values_from`. Typically used when you have redundant variables, i.e. variables whose values are perfectly correlated with existing variables.
values_fill	Optionally, a (scalar) value that specifies what each `value` should be filled in with when missing. This can be a named list if you want to apply different aggregations to different value columns.
values_fn	Optionally, a function applied to the `value` in each cell in the output. You will typically use this when the combination of `id_cols` and `value` column does not uniquely identify an observation. This can be a named list if you want to apply different aggregations to different value columns.
names_from	<`tidy-select`> A pair of arguments describing which column (or columns) to get the name of the output column (`names_from`), and which column (or columns) to get the cell values from (`values_from`). If `values_from` contains multiple values, the value will be added to the front of the output column.
values_from	<`tidy-select`> A pair of arguments describing which column (or columns) to get the name of the output column (`names_from`), and which column (or columns) to get the cell values from (`values_from`). If `values_from` contains multiple values, the value will be added to the front of the output column.
names_prefix	String added to the start of every variable name. This is particularly useful if `names_from` is a numeric vector and you want to create syntactic variable names.
names_sep	If `names_from` or `values_from` contains multiple variables, this will be used to join their values together into a single string to use as a column name.
names_glue	Instead of `names_sep` and `names_prefix`, you can supply a glue specification that uses the `names_from` columns (and special `.value`) to create custom column names.
names_sort	Should the column names be sorted? If `FALSE`, the default, column names are ordered by first appearance.

Examples

# See vignette("pivot") for examples and explanation

us_rent_income
#> # A tibble: 104 × 5
#>    GEOID NAME       variable estimate   moe
#>    <chr> <chr>      <chr>       <dbl> <dbl>
#>  1 01    Alabama    income      24476   136
#>  2 01    Alabama    rent          747     3
#>  3 02    Alaska     income      32940   508
#>  4 02    Alaska     rent         1200    13
#>  5 04    Arizona    income      27517   148
#>  6 04    Arizona    rent          972     4
#>  7 05    Arkansas   income      23789   165
#>  8 05    Arkansas   rent          709     5
#>  9 06    California income      29454   109
#> 10 06    California rent         1358     3
#> # … with 94 more rows
spec1 <- us_rent_income %>%
  build_wider_spec(names_from = variable, values_from = c(estimate, moe))
spec1
#> # A tibble: 4 × 3
#>   .name           .value   variable
#>   <chr>           <chr>    <chr>   
#> 1 estimate_income estimate income  
#> 2 estimate_rent   estimate rent    
#> 3 moe_income      moe      income  
#> 4 moe_rent        moe      rent    

us_rent_income %>%
  pivot_wider_spec(spec1)
#> # A tibble: 52 × 6
#>    GEOID NAME                 estimate_income estimate_rent moe_income moe_rent
#>    <chr> <chr>                          <dbl>         <dbl>      <dbl>    <dbl>
#>  1 01    Alabama                        24476           747        136        3
#>  2 02    Alaska                         32940          1200        508       13
#>  3 04    Arizona                        27517           972        148        4
#>  4 05    Arkansas                       23789           709        165        5
#>  5 06    California                     29454          1358        109        3
#>  6 08    Colorado                       32401          1125        109        5
#>  7 09    Connecticut                    35326          1123        195        5
#>  8 10    Delaware                       31560          1076        247       10
#>  9 11    District of Columbia           43198          1424        681       17
#> 10 12    Florida                        25952          1077         70        3
#> # … with 42 more rows

# Is equivalent to
us_rent_income %>%
  pivot_wider(names_from = variable, values_from = c(estimate, moe))
#> # A tibble: 52 × 6
#>    GEOID NAME                 estimate_income estimate_rent moe_income moe_rent
#>    <chr> <chr>                          <dbl>         <dbl>      <dbl>    <dbl>
#>  1 01    Alabama                        24476           747        136        3
#>  2 02    Alaska                         32940          1200        508       13
#>  3 04    Arizona                        27517           972        148        4
#>  4 05    Arkansas                       23789           709        165        5
#>  5 06    California                     29454          1358        109        3
#>  6 08    Colorado                       32401          1125        109        5
#>  7 09    Connecticut                    35326          1123        195        5
#>  8 10    Delaware                       31560          1076        247       10
#>  9 11    District of Columbia           43198          1424        681       17
#> 10 12    Florida                        25952          1077         70        3
#> # … with 42 more rows

# `pivot_wider_spec()` provides more control over column names and output format
# instead of creating columns with estimate_ and moe_ prefixes,
# keep original variable name for estimates and attach _moe as suffix
spec2 <- tibble(
  .name = c("income", "rent", "income_moe", "rent_moe"),
  .value = c("estimate", "estimate", "moe", "moe"),
  variable = c("income", "rent", "income", "rent")
)

us_rent_income %>%
  pivot_wider_spec(spec2)
#> # A tibble: 52 × 6
#>    GEOID NAME                 income  rent income_moe rent_moe
#>    <chr> <chr>                 <dbl> <dbl>      <dbl>    <dbl>
#>  1 01    Alabama               24476   747        136        3
#>  2 02    Alaska                32940  1200        508       13
#>  3 04    Arizona               27517   972        148        4
#>  4 05    Arkansas              23789   709        165        5
#>  5 06    California            29454  1358        109        3
#>  6 08    Colorado              32401  1125        109        5
#>  7 09    Connecticut           35326  1123        195        5
#>  8 10    Delaware              31560  1076        247       10
#>  9 11    District of Columbia  43198  1424        681       17
#> 10 12    Florida               25952  1077         70        3
#> # … with 42 more rows