Pivot data from wide to long — pivot

pivot_longer() "lengthens" data, increasing the number of rows and decreasing the number of columns. The inverse transformation is pivot_wider()

Learn more in vignette("pivot").

pivot_longer(
  data,
  cols,
  names_to = "name",
  names_prefix = NULL,
  names_sep = NULL,
  names_pattern = NULL,
  names_ptypes = list(),
  names_transform = list(),
  names_repair = "check_unique",
  values_to = "value",
  values_drop_na = FALSE,
  values_ptypes = list(),
  values_transform = list(),
  ...
)

Arguments

data	A data frame to pivot.
cols	<`tidy-select`> Columns to pivot into longer format.
names_to	A string specifying the name of the column to create from the data stored in the column names of `data`. Can be a character vector, creating multiple columns, if `names_sep` or `names_pattern` is provided. In this case, there are two special values you can take advantage of: `NA` will discard that component of the name. `.value` indicates that component of the name defines the name of the column containing the cell values, overriding `values_to`.
names_prefix	A regular expression used to remove matching text from the start of each variable name.
names_sep, names_pattern	If `names_to` contains multiple values, these arguments control how the column name is broken up. `names_sep` takes the same specification as `separate()`, and can either be a numeric vector (specifying positions to break on), or a single string (specifying a regular expression to split on). `names_pattern` takes the same specification as `extract()`, a regular expression containing matching groups (`()`). If these arguments do not give you enough control, use `pivot_longer_spec()` to create a spec object and process manually as needed.
names_ptypes, values_ptypes	A list of column name-prototype pairs. A prototype (or ptype for short) is a zero-length vector (like `integer()` or `numeric()`) that defines the type, class, and attributes of a vector. Use these arguments if you want to confirm that the created columns are the types that you expect. Note that if you want to change (instead of confirm) the types of specific columns, you should use `names_transform` or `values_transform` instead.
names_transform, values_transform	A list of column name-function pairs. Use these arguments if you need to change the types of specific columns. For example, `names_transform = list(week = as.integer)` would convert a character variable called `week` to an integer. If not specified, the type of the columns generated from `names_to` will be character, and the type of the variables generated from `values_to` will be the common type of the input columns used to generate them.
names_repair	What happens if the output has invalid column names? The default, `"check_unique"` is to error if the columns are duplicated. Use `"minimal"` to allow duplicates in the output, or `"unique"` to de-duplicated by adding numeric suffixes. See `vctrs::vec_as_names()` for more options.
values_to	A string specifying the name of the column to create from the data stored in cell values. If `names_to` is a character containing the special `.value` sentinel, this value will be ignored, and the name of the value column will be derived from part of the existing column names.
values_drop_na	If `TRUE`, will drop rows that contain only `NA`s in the `value_to` column. This effectively converts explicit missing values to implicit missing values, and should generally be used only when missing values in `data` were created by its structure.
...	Additional arguments passed on to methods.

Details

pivot_longer() is an updated approach to gather(), designed to be both simpler to use and to handle more use cases. We recommend you use pivot_longer() for new code; gather() isn't going away but is no longer under active development.

Examples

# See vignette("pivot") for examples and explanation

# Simplest case where column names are character data
relig_income
#> # A tibble: 18 × 11
#>    religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k` `$75-100k`
#>    <chr>      <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>      <dbl>
#>  1 Agnostic      27        34        60        81        76       137        122
#>  2 Atheist       12        27        37        52        35        70         73
#>  3 Buddhist      27        21        30        34        33        58         62
#>  4 Catholic     418       617       732       670       638      1116        949
#>  5 Don’t k…      15        14        15        11        10        35         21
#>  6 Evangel…     575       869      1064       982       881      1486        949
#>  7 Hindu          1         9         7         9        11        34         47
#>  8 Histori…     228       244       236       238       197       223        131
#>  9 Jehovah…      20        27        24        24        21        30         15
#> 10 Jewish        19        19        25        25        30        95         69
#> 11 Mainlin…     289       495       619       655       651      1107        939
#> 12 Mormon        29        40        48        51        56       112         85
#> 13 Muslim         6         7         9        10         9        23         16
#> 14 Orthodox      13        17        23        32        32        47         38
#> 15 Other C…       9         7        11        13        13        14         18
#> 16 Other F…      20        33        40        46        49        63         46
#> 17 Other W…       5         2         3         4         2         7          3
#> 18 Unaffil…     217       299       374       365       341       528        407
#> # … with 3 more variables: $100-150k <dbl>, >150k <dbl>,
#> #   Don't know/refused <dbl>
relig_income %>%
  pivot_longer(!religion, names_to = "income", values_to = "count")
#> # A tibble: 180 × 3
#>    religion income             count
#>    <chr>    <chr>              <dbl>
#>  1 Agnostic <$10k                 27
#>  2 Agnostic $10-20k               34
#>  3 Agnostic $20-30k               60
#>  4 Agnostic $30-40k               81
#>  5 Agnostic $40-50k               76
#>  6 Agnostic $50-75k              137
#>  7 Agnostic $75-100k             122
#>  8 Agnostic $100-150k            109
#>  9 Agnostic >150k                 84
#> 10 Agnostic Don't know/refused    96
#> # … with 170 more rows

# Slightly more complex case where columns have common prefix,
# and missing missings are structural so should be dropped.
billboard
#> # A tibble: 317 × 79
#>    artist   track   date.entered   wk1   wk2   wk3   wk4   wk5   wk6   wk7   wk8
#>    <chr>    <chr>   <date>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 2 Pac    Baby D… 2000-02-26      87    82    72    77    87    94    99    NA
#>  2 2Ge+her  The Ha… 2000-09-02      91    87    92    NA    NA    NA    NA    NA
#>  3 3 Doors… Krypto… 2000-04-08      81    70    68    67    66    57    54    53
#>  4 3 Doors… Loser   2000-10-21      76    76    72    69    67    65    55    59
#>  5 504 Boyz Wobble… 2000-04-15      57    34    25    17    17    31    36    49
#>  6 98^0     Give M… 2000-08-19      51    39    34    26    26    19     2     2
#>  7 A*Teens  Dancin… 2000-07-08      97    97    96    95   100    NA    NA    NA
#>  8 Aaliyah  I Don'… 2000-01-29      84    62    51    41    38    35    35    38
#>  9 Aaliyah  Try Ag… 2000-03-18      59    53    38    28    21    18    16    14
#> 10 Adams, … Open M… 2000-08-26      76    76    74    69    68    67    61    58
#> # … with 307 more rows, and 68 more variables: wk9 <dbl>, wk10 <dbl>,
#> #   wk11 <dbl>, wk12 <dbl>, wk13 <dbl>, wk14 <dbl>, wk15 <dbl>, wk16 <dbl>,
#> #   wk17 <dbl>, wk18 <dbl>, wk19 <dbl>, wk20 <dbl>, wk21 <dbl>, wk22 <dbl>,
#> #   wk23 <dbl>, wk24 <dbl>, wk25 <dbl>, wk26 <dbl>, wk27 <dbl>, wk28 <dbl>,
#> #   wk29 <dbl>, wk30 <dbl>, wk31 <dbl>, wk32 <dbl>, wk33 <dbl>, wk34 <dbl>,
#> #   wk35 <dbl>, wk36 <dbl>, wk37 <dbl>, wk38 <dbl>, wk39 <dbl>, wk40 <dbl>,
#> #   wk41 <dbl>, wk42 <dbl>, wk43 <dbl>, wk44 <dbl>, wk45 <dbl>, wk46 <dbl>, …
billboard %>%
 pivot_longer(
   cols = starts_with("wk"),
   names_to = "week",
   names_prefix = "wk",
   values_to = "rank",
   values_drop_na = TRUE
 )
#> # A tibble: 5,307 × 5
#>    artist  track                   date.entered week   rank
#>    <chr>   <chr>                   <date>       <chr> <dbl>
#>  1 2 Pac   Baby Don't Cry (Keep... 2000-02-26   1        87
#>  2 2 Pac   Baby Don't Cry (Keep... 2000-02-26   2        82
#>  3 2 Pac   Baby Don't Cry (Keep... 2000-02-26   3        72
#>  4 2 Pac   Baby Don't Cry (Keep... 2000-02-26   4        77
#>  5 2 Pac   Baby Don't Cry (Keep... 2000-02-26   5        87
#>  6 2 Pac   Baby Don't Cry (Keep... 2000-02-26   6        94
#>  7 2 Pac   Baby Don't Cry (Keep... 2000-02-26   7        99
#>  8 2Ge+her The Hardest Part Of ... 2000-09-02   1        91
#>  9 2Ge+her The Hardest Part Of ... 2000-09-02   2        87
#> 10 2Ge+her The Hardest Part Of ... 2000-09-02   3        92
#> # … with 5,297 more rows

# Multiple variables stored in column names
who %>% pivot_longer(
  cols = new_sp_m014:newrel_f65,
  names_to = c("diagnosis", "gender", "age"),
  names_pattern = "new_?(.*)_(.)(.*)",
  values_to = "count"
)
#> # A tibble: 405,440 × 8
#>    country     iso2  iso3   year diagnosis gender age   count
#>    <chr>       <chr> <chr> <int> <chr>     <chr>  <chr> <int>
#>  1 Afghanistan AF    AFG    1980 sp        m      014      NA
#>  2 Afghanistan AF    AFG    1980 sp        m      1524     NA
#>  3 Afghanistan AF    AFG    1980 sp        m      2534     NA
#>  4 Afghanistan AF    AFG    1980 sp        m      3544     NA
#>  5 Afghanistan AF    AFG    1980 sp        m      4554     NA
#>  6 Afghanistan AF    AFG    1980 sp        m      5564     NA
#>  7 Afghanistan AF    AFG    1980 sp        m      65       NA
#>  8 Afghanistan AF    AFG    1980 sp        f      014      NA
#>  9 Afghanistan AF    AFG    1980 sp        f      1524     NA
#> 10 Afghanistan AF    AFG    1980 sp        f      2534     NA
#> # … with 405,430 more rows

# Multiple observations per row
anscombe
#>    x1 x2 x3 x4    y1   y2    y3    y4
#> 1  10 10 10  8  8.04 9.14  7.46  6.58
#> 2   8  8  8  8  6.95 8.14  6.77  5.76
#> 3  13 13 13  8  7.58 8.74 12.74  7.71
#> 4   9  9  9  8  8.81 8.77  7.11  8.84
#> 5  11 11 11  8  8.33 9.26  7.81  8.47
#> 6  14 14 14  8  9.96 8.10  8.84  7.04
#> 7   6  6  6  8  7.24 6.13  6.08  5.25
#> 8   4  4  4 19  4.26 3.10  5.39 12.50
#> 9  12 12 12  8 10.84 9.13  8.15  5.56
#> 10  7  7  7  8  4.82 7.26  6.42  7.91
#> 11  5  5  5  8  5.68 4.74  5.73  6.89
anscombe %>%
 pivot_longer(everything(),
   names_to = c(".value", "set"),
   names_pattern = "(.)(.)"
 )
#> # A tibble: 44 × 3
#>    set       x     y
#>    <chr> <dbl> <dbl>
#>  1 1        10  8.04
#>  2 2        10  9.14
#>  3 3        10  7.46
#>  4 4         8  6.58
#>  5 1         8  6.95
#>  6 2         8  8.14
#>  7 3         8  6.77
#>  8 4         8  5.76
#>  9 1        13  7.58
#> 10 2        13  8.74
#> # … with 34 more rows