Split data frame by groups — group_split • dplyr

group_split() works like base::split() but

it uses the grouping structure from group_by() and therefore is subject to the data mask
it does not name the elements of the list based on the grouping as this typically loses information and is confusing.

group_keys() explains the grouping structure, by returning a data frame that has one row per group and one column per grouping variable.

group_split(.tbl, ..., .keep = TRUE)

Arguments

.tbl	A tbl
...	Grouping specification, forwarded to `group_by()`
.keep	Should the grouping columns be kept

Value

group_split() returns a list of tibbles. Each tibble contains the rows of .tbl for the associated group and all the columns, including the grouping variables.
group_keys() returns a tibble with one row per group, and one column per grouping variable

Grouped data frames

The primary use case for group_split() is with already grouped data frames, typically a result of group_by(). In this case group_split() only uses the first argument, the grouped tibble, and warns when ... is used.

Because some of these groups may be empty, it is best paired with group_keys() which identifies the representatives of each grouping variable for the group.

Ungrouped data frames

When used on ungrouped data frames, group_split() and group_keys() forwards the ... to group_by() before the split, therefore the ... are subject to the data mask.

Using these functions on an ungrouped data frame only makes sense if you need only one or the other, because otherwise the grouping algorithm is performed each time.

Rowwise data frames

group_split() returns a list of one-row tibbles is returned, and the ... are ignored and warned against

See also

Other grouping functions: group_by(), group_map(), group_nest(), group_trim()

Examples

# ----- use case 1 : on an already grouped tibble
ir <- iris %>%
  group_by(Species)

group_split(ir)
#> <list_of<
#>   tbl_df<
#>     Sepal.Length: double
#>     Sepal.Width : double
#>     Petal.Length: double
#>     Petal.Width : double
#>     Species     : factor<fb977>
#>   >
#> >[3]>
#> [[1]]
#> # A tibble: 50 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 40 more rows
#> 
#> [[2]]
#> # A tibble: 50 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>     
#>  1          7           3.2          4.7         1.4 versicolor
#>  2          6.4         3.2          4.5         1.5 versicolor
#>  3          6.9         3.1          4.9         1.5 versicolor
#>  4          5.5         2.3          4           1.3 versicolor
#>  5          6.5         2.8          4.6         1.5 versicolor
#>  6          5.7         2.8          4.5         1.3 versicolor
#>  7          6.3         3.3          4.7         1.6 versicolor
#>  8          4.9         2.4          3.3         1   versicolor
#>  9          6.6         2.9          4.6         1.3 versicolor
#> 10          5.2         2.7          3.9         1.4 versicolor
#> # … with 40 more rows
#> 
#> [[3]]
#> # A tibble: 50 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species  
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>    
#>  1          6.3         3.3          6           2.5 virginica
#>  2          5.8         2.7          5.1         1.9 virginica
#>  3          7.1         3            5.9         2.1 virginica
#>  4          6.3         2.9          5.6         1.8 virginica
#>  5          6.5         3            5.8         2.2 virginica
#>  6          7.6         3            6.6         2.1 virginica
#>  7          4.9         2.5          4.5         1.7 virginica
#>  8          7.3         2.9          6.3         1.8 virginica
#>  9          6.7         2.5          5.8         1.8 virginica
#> 10          7.2         3.6          6.1         2.5 virginica
#> # … with 40 more rows
#> 
group_keys(ir)
#> # A tibble: 3 x 1
#>   Species   
#>   <fct>     
#> 1 setosa    
#> 2 versicolor
#> 3 virginica 

# this can be useful if the grouped data has been altered before the split
ir <- iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > mean(Sepal.Length))

group_split(ir)
#> <list_of<
#>   tbl_df<
#>     Sepal.Length: double
#>     Sepal.Width : double
#>     Petal.Length: double
#>     Petal.Width : double
#>     Species     : factor<fb977>
#>   >
#> >[3]>
#> [[1]]
#> # A tibble: 22 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          5.4         3.9          1.7         0.4 setosa 
#>  3          5.4         3.7          1.5         0.2 setosa 
#>  4          5.8         4            1.2         0.2 setosa 
#>  5          5.7         4.4          1.5         0.4 setosa 
#>  6          5.4         3.9          1.3         0.4 setosa 
#>  7          5.1         3.5          1.4         0.3 setosa 
#>  8          5.7         3.8          1.7         0.3 setosa 
#>  9          5.1         3.8          1.5         0.3 setosa 
#> 10          5.4         3.4          1.7         0.2 setosa 
#> # … with 12 more rows
#> 
#> [[2]]
#> # A tibble: 24 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>     
#>  1          7           3.2          4.7         1.4 versicolor
#>  2          6.4         3.2          4.5         1.5 versicolor
#>  3          6.9         3.1          4.9         1.5 versicolor
#>  4          6.5         2.8          4.6         1.5 versicolor
#>  5          6.3         3.3          4.7         1.6 versicolor
#>  6          6.6         2.9          4.6         1.3 versicolor
#>  7          6           2.2          4           1   versicolor
#>  8          6.1         2.9          4.7         1.4 versicolor
#>  9          6.7         3.1          4.4         1.4 versicolor
#> 10          6.2         2.2          4.5         1.5 versicolor
#> # … with 14 more rows
#> 
#> [[3]]
#> # A tibble: 22 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species  
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>    
#>  1          7.1         3            5.9         2.1 virginica
#>  2          7.6         3            6.6         2.1 virginica
#>  3          7.3         2.9          6.3         1.8 virginica
#>  4          6.7         2.5          5.8         1.8 virginica
#>  5          7.2         3.6          6.1         2.5 virginica
#>  6          6.8         3            5.5         2.1 virginica
#>  7          7.7         3.8          6.7         2.2 virginica
#>  8          7.7         2.6          6.9         2.3 virginica
#>  9          6.9         3.2          5.7         2.3 virginica
#> 10          7.7         2.8          6.7         2   virginica
#> # … with 12 more rows
#> 
group_keys(ir)
#> # A tibble: 3 x 1
#>   Species   
#>   <fct>     
#> 1 setosa    
#> 2 versicolor
#> 3 virginica 

# ----- use case 2: using a group_by() grouping specification

# both group_split() and group_keys() have to perform the grouping
# so it only makes sense to do this if you only need one or the other
iris %>%
  group_split(Species)
#> <list_of<
#>   tbl_df<
#>     Sepal.Length: double
#>     Sepal.Width : double
#>     Petal.Length: double
#>     Petal.Width : double
#>     Species     : factor<fb977>
#>   >
#> >[3]>
#> [[1]]
#> # A tibble: 50 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 40 more rows
#> 
#> [[2]]
#> # A tibble: 50 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>     
#>  1          7           3.2          4.7         1.4 versicolor
#>  2          6.4         3.2          4.5         1.5 versicolor
#>  3          6.9         3.1          4.9         1.5 versicolor
#>  4          5.5         2.3          4           1.3 versicolor
#>  5          6.5         2.8          4.6         1.5 versicolor
#>  6          5.7         2.8          4.5         1.3 versicolor
#>  7          6.3         3.3          4.7         1.6 versicolor
#>  8          4.9         2.4          3.3         1   versicolor
#>  9          6.6         2.9          4.6         1.3 versicolor
#> 10          5.2         2.7          3.9         1.4 versicolor
#> # … with 40 more rows
#> 
#> [[3]]
#> # A tibble: 50 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species  
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>    
#>  1          6.3         3.3          6           2.5 virginica
#>  2          5.8         2.7          5.1         1.9 virginica
#>  3          7.1         3            5.9         2.1 virginica
#>  4          6.3         2.9          5.6         1.8 virginica
#>  5          6.5         3            5.8         2.2 virginica
#>  6          7.6         3            6.6         2.1 virginica
#>  7          4.9         2.5          4.5         1.7 virginica
#>  8          7.3         2.9          6.3         1.8 virginica
#>  9          6.7         2.5          5.8         1.8 virginica
#> 10          7.2         3.6          6.1         2.5 virginica
#> # … with 40 more rows
#> 

iris %>%
  group_keys(Species)
#> Warning: The `...` argument of `group_keys()` is deprecated as of dplyr 1.0.0.
#> Please `group_by()` first
#> # A tibble: 3 x 1
#>   Species   
#>   <fct>     
#> 1 setosa    
#> 2 versicolor
#> 3 virginica