expand() generates all combinations of variables found in a dataset. It is paired with the nesting() and crossing() helpers. crossing() is a wrapper around expand_grid() that de-duplicates and sorts its inputs; nesting() is a helper that only finds combinations already present in the data.
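As a minimal sketch of how the helpers differ, the snippet below runs expand_grid(), crossing(), and nesting() on small made-up vectors. The vector names and values are illustrative assumptions, not part of the original examples.

```r
library(tidyr)

# Hypothetical inputs: `x` contains a duplicate and is unsorted
x <- c("b", "a", "b")
y <- c(2, 1)

# expand_grid() uses the inputs as given, so the duplicate "b" is kept
grid <- expand_grid(x = x, y = y)

# crossing() de-duplicates and sorts each input before combining
crossed <- crossing(x = x, y = y)

# nesting() keeps only combinations that occur together, row by row
# (inputs must have the same length)
nested <- nesting(x = c("b", "a", "b"), y = c(2, 1, 2))

nrow(grid)    # 3 values of x times 2 values of y
nrow(crossed) # 2 unique values of x times 2 unique values of y
nrow(nested)  # distinct (x, y) pairs actually present
```

Inside expand(), bare column names behave like crossing() over the unique values of each column, while wrapping columns in nesting() restricts the result to pairs observed in the data.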
expand() is often useful in conjunction with joins:
- Use it with right_join() to convert implicit missing values to explicit missing values (e.g., fill in gaps in your data frame).
- Use it with anti_join() to figure out which combinations are missing (e.g., identify gaps in your data frame).
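The two join patterns above can be sketched on a small table. The `sales` data frame and its column names are hypothetical, chosen only to keep the output easy to check by eye; the `by` argument is spelled out to avoid the join message.

```r
library(tidyr)
library(dplyr)

# Hypothetical data: store "B" has no row for month 2
sales <- tibble(
  store = c("A", "A", "B"),
  month = c(1, 2, 1),
  units = c(10, 12, 7)
)

# Every store/month combination, whether or not it was observed
all_combos <- sales %>% expand(store, month)

# anti_join(): combinations with no matching row in `sales`
missing <- all_combos %>% anti_join(sales, by = c("store", "month"))

# right_join(): keep every combination, filling unobserved ones with NA
filled <- sales %>% right_join(all_combos, by = c("store", "month"))

missing
filled
```

Here `missing` should hold the single gap (store "B", month 2), and `filled` should hold all four combinations with `units` set to NA for that gap.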
expand(data, ..., .name_repair = "check_unique")
crossing(..., .name_repair = "check_unique")
nesting(..., .name_repair = "check_unique")
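Argument | Description |
---|---|
data | A data frame. |
... | Specification of columns to expand. Columns can be atomic vectors or lists. When used with factors, expand() uses the full set of levels, not just those that appear in the data. When used with continuous variables, you may need to fill in values that do not appear in the data: to do so use expressions like year = 2010:2012 or year = full_seq(year, 1). |
.name_repair | Treatment of problematic column names; the default is "check_unique". This argument is passed on as repair to vctrs::vec_as_names(). |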
complete() to expand list objects.
expand_grid() to input vectors rather than a data frame.
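A brief sketch of the two related functions, under the assumption that complete() can be treated as expand() followed by a join back onto the data, and that expand_grid() takes plain vectors instead of a data frame. The `sales` table is hypothetical.

```r
library(tidyr)

# Hypothetical data with one unobserved store/month combination
sales <- tibble(
  store = c("A", "A", "B"),
  month = c(1, 2, 1),
  units = c(10, 12, 7)
)

# complete(): add the missing combination directly, with NA in `units`
completed <- sales %>% complete(store, month)

# expand_grid(): build the same grid from vectors, no data frame needed
grid <- expand_grid(store = c("A", "B"), month = c(1, 2))

completed
grid
```

complete() is convenient when the filled-in data frame is the goal; expand() plus an explicit join gives more control, for example when only the list of gaps is needed.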
library(tidyr)

fruits <- tibble(
  type = c("apple", "orange", "apple", "orange", "orange", "orange"),
  year = c(2010, 2010, 2012, 2010, 2010, 2012),
  size = factor(
    c("XS", "S", "M", "S", "S", "M"),
    levels = c("XS", "S", "M", "L")
  ),
  weights = rnorm(6, as.numeric(size) + 2)
)

# All possible combinations ---------------------------------------
# Note that all defined, but not necessarily present, levels of the
# factor variable `size` are retained.
fruits %>% expand(type)
#> # A tibble: 2 × 1
#>   type
#>   <chr>
#> 1 apple
#> 2 orange

fruits %>% expand(type, size)
#> # A tibble: 8 × 2
#>   type   size
#>   <chr>  <fct>
#> 1 apple  XS
#> 2 apple  S
#> 3 apple  M
#> 4 apple  L
#> 5 orange XS
#> 6 orange S
#> 7 orange M
#> 8 orange L

fruits %>% expand(type, size, year)
#> # A tibble: 16 × 3
#>    type   size   year
#>    <chr>  <fct> <dbl>
#>  1 apple  XS     2010
#>  2 apple  XS     2012
#>  3 apple  S      2010
#>  4 apple  S      2012
#>  5 apple  M      2010
#>  6 apple  M      2012
#>  7 apple  L      2010
#>  8 apple  L      2012
#>  9 orange XS     2010
#> 10 orange XS     2012
#> 11 orange S      2010
#> 12 orange S      2012
#> 13 orange M      2010
#> 14 orange M      2012
#> 15 orange L      2010
#> 16 orange L      2012

# Only combinations that already appear in the data ---------------
fruits %>% expand(nesting(type))
#> # A tibble: 2 × 1
#>   type
#>   <chr>
#> 1 apple
#> 2 orange

fruits %>% expand(nesting(type, size))
#> # A tibble: 4 × 2
#>   type   size
#>   <chr>  <fct>
#> 1 apple  XS
#> 2 apple  M
#> 3 orange S
#> 4 orange M

fruits %>% expand(nesting(type, size, year))
#> # A tibble: 4 × 3
#>   type   size   year
#>   <chr>  <fct> <dbl>
#> 1 apple  XS     2010
#> 2 apple  M      2012
#> 3 orange S      2010
#> 4 orange M      2012

# Other uses -------------------------------------------------------
# Use with `full_seq()` to fill in values of continuous variables
fruits %>% expand(type, size, full_seq(year, 1))
#> # A tibble: 24 × 3
#>    type  size  `full_seq(year, 1)`
#>    <chr> <fct>               <dbl>
#>  1 apple XS                   2010
#>  2 apple XS                   2011
#>  3 apple XS                   2012
#>  4 apple S                    2010
#>  5 apple S                    2011
#>  6 apple S                    2012
#>  7 apple M                    2010
#>  8 apple M                    2011
#>  9 apple M                    2012
#> 10 apple L                    2010
#> # … with 14 more rows

fruits %>% expand(type, size, 2010:2012)
#> # A tibble: 24 × 3
#>    type  size  `2010:2012`
#>    <chr> <fct>       <int>
#>  1 apple XS           2010
#>  2 apple XS           2011
#>  3 apple XS           2012
#>  4 apple S            2010
#>  5 apple S            2011
#>  6 apple S            2012
#>  7 apple M            2010
#>  8 apple M            2011
#>  9 apple M            2012
#> 10 apple L            2010
#> # … with 14 more rows

# Use `anti_join()` to determine which observations are missing
all <- fruits %>% expand(type, size, year)
all
#> # A tibble: 16 × 3
#>    type   size   year
#>    <chr>  <fct> <dbl>
#>  1 apple  XS     2010
#>  2 apple  XS     2012
#>  3 apple  S      2010
#>  4 apple  S      2012
#>  5 apple  M      2010
#>  6 apple  M      2012
#>  7 apple  L      2010
#>  8 apple  L      2012
#>  9 orange XS     2010
#> 10 orange XS     2012
#> 11 orange S      2010
#> 12 orange S      2012
#> 13 orange M      2010
#> 14 orange M      2012
#> 15 orange L      2010
#> 16 orange L      2012

all %>% dplyr::anti_join(fruits)
#> # A tibble: 12 × 3
#>    type   size   year
#>    <chr>  <fct> <dbl>
#>  1 apple  XS     2012
#>  2 apple  S      2010
#>  3 apple  S      2012
#>  4 apple  M      2010
#>  5 apple  L      2010
#>  6 apple  L      2012
#>  7 orange XS     2010
#>  8 orange XS     2012
#>  9 orange S      2012
#> 10 orange M      2010
#> 11 orange L      2010
#> 12 orange L      2012

# Use with `right_join()` to fill in missing rows
fruits %>% dplyr::right_join(all)
#> # A tibble: 18 × 4
#>    type    year size  weights
#>    <chr>  <dbl> <fct>   <dbl>
#>  1 apple   2010 XS       1.60
#>  2 orange  2010 S        4.26
#>  3 apple   2012 M        2.56
#>  4 orange  2010 S        3.99
#>  5 orange  2010 S        4.62
#>  6 orange  2012 M        6.15
#>  7 apple   2012 XS      NA
#>  8 apple   2010 S       NA
#>  9 apple   2012 S       NA
#> 10 apple   2010 M       NA
#> 11 apple   2010 L       NA
#> 12 apple   2012 L       NA
#> 13 orange  2010 XS      NA
#> 14 orange  2012 XS      NA
#> 15 orange  2012 S       NA
#> 16 orange  2010 M       NA
#> 17 orange  2010 L       NA
#> 18 orange  2012 L       NA