Count observations by group — count • dplyr

count() lets you quickly count the unique values of one or more variables: df %>% count(a, b) is roughly equivalent to df %>% group_by(a, b) %>% summarise(n = n()). count() is paired with tally(), a lower-level helper that is equivalent to df %>% summarise(n = n()). Supply wt to perform weighted counts, switching the summary from n = n() to n = sum(wt).

add_count() and add_tally() are equivalents to count() and tally() but use mutate() instead of summarise() so that they add a new column with group-wise counts.

count(x, ..., wt = NULL, sort = FALSE, name = NULL)

tally(x, wt = NULL, sort = FALSE, name = NULL)

add_count(x, ..., wt = NULL, sort = FALSE, name = NULL, .drop = deprecated())

add_tally(x, wt = NULL, sort = FALSE, name = NULL)

Arguments

x	A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).
...	<`data-masking`> Variables to group by.
wt	<`data-masking`> Frequency weights. Can be `NULL` or a variable: If `NULL` (the default), counts the number of rows in each group. If a variable, computes `sum(wt)` for each group.
sort	If `TRUE`, will show the largest groups at the top.
name	The name of the new column in the output. If omitted, it will default to `n`. If there's already a column called `n`, it will error, and require you to specify the name.
.drop	For `count()`: if `FALSE` will include counts for empty groups (i.e. for levels of factors that don't exist in the data). Deprecated in `add_count()` since it didn't actually affect the output.

Value

An object of the same type as .data. count() and add_count() group transiently, so the output has the same groups as the input.

Examples

# count() is a convenient way to get a sense of the distribution of
# values in a dataset
starwars %>% count(species)
#> # A tibble: 38 x 2
#>    species       n
#>    <chr>     <int>
#>  1 Aleena        1
#>  2 Besalisk      1
#>  3 Cerean        1
#>  4 Chagrian      1
#>  5 Clawdite      1
#>  6 Droid         6
#>  7 Dug           1
#>  8 Ewok          1
#>  9 Geonosian     1
#> 10 Gungan        3
#> # … with 28 more rows
starwars %>% count(species, sort = TRUE)
#> # A tibble: 38 x 2
#>    species      n
#>    <chr>    <int>
#>  1 Human       35
#>  2 Droid        6
#>  3 NA           4
#>  4 Gungan       3
#>  5 Kaminoan     2
#>  6 Mirialan     2
#>  7 Twi'lek      2
#>  8 Wookiee      2
#>  9 Zabrak       2
#> 10 Aleena       1
#> # … with 28 more rows
starwars %>% count(sex, gender, sort = TRUE)
#> # A tibble: 6 x 3
#>   sex            gender        n
#>   <chr>          <chr>     <int>
#> 1 male           masculine    60
#> 2 female         feminine     16
#> 3 none           masculine     5
#> 4 NA             NA            4
#> 5 hermaphroditic masculine     1
#> 6 none           feminine      1
starwars %>% count(birth_decade = round(birth_year, -1))
#> # A tibble: 15 x 2
#>    birth_decade     n
#>           <dbl> <int>
#>  1           10     1
#>  2           20     6
#>  3           30     4
#>  4           40     6
#>  5           50     8
#>  6           60     4
#>  7           70     4
#>  8           80     2
#>  9           90     3
#> 10          100     1
#> 11          110     1
#> 12          200     1
#> 13          600     1
#> 14          900     1
#> 15           NA    44

# use the `wt` argument to perform a weighted count. This is useful
# when the data has already been aggregated once
df <- tribble(
  ~name,    ~gender,   ~runs,
  "Max",    "male",       10,
  "Sandra", "female",      1,
  "Susan",  "female",      4
)
# counts rows:
df %>% count(gender)
#> # A tibble: 2 x 2
#>   gender     n
#>   <chr>  <int>
#> 1 female     2
#> 2 male       1
# counts runs:
df %>% count(gender, wt = runs)
#> # A tibble: 2 x 2
#>   gender     n
#>   <chr>  <dbl>
#> 1 female     5
#> 2 male      10

# tally() is a lower-level function that assumes you've done the grouping
starwars %>% tally()
#> # A tibble: 1 x 1
#>       n
#>   <int>
#> 1    87
starwars %>% group_by(species) %>% tally()
#> # A tibble: 38 x 2
#>    species       n
#>    <chr>     <int>
#>  1 Aleena        1
#>  2 Besalisk      1
#>  3 Cerean        1
#>  4 Chagrian      1
#>  5 Clawdite      1
#>  6 Droid         6
#>  7 Dug           1
#>  8 Ewok          1
#>  9 Geonosian     1
#> 10 Gungan        3
#> # … with 28 more rows

# both count() and tally() have add_ variants that work like
# mutate() instead of summarise
df %>% add_count(gender, wt = runs)
#> # A tibble: 3 x 4
#>   name   gender  runs     n
#>   <chr>  <chr>  <dbl> <dbl>
#> 1 Max    male      10    10
#> 2 Sandra female     1     5
#> 3 Susan  female     4     5
df %>% add_tally(wt = runs)
#> # A tibble: 3 x 4
#>   name   gender  runs     n
#>   <chr>  <chr>  <dbl> <dbl>
#> 1 Max    male      10    15
#> 2 Sandra female     1    15
#> 3 Susan  female     4    15