We can use the gt summary_rows() function to insert summary rows into a table. There are two main types of summary rows: (1) groupwise summaries, and (2) the grand summary. The groupwise summaries operate on one or more row groups, which can themselves be generated in a variety of ways:

We generate the summary data through specification of aggregation functions. You choose how to format the values in the resulting summary cells by use of a formatter function (e.g, fmt_number()) and any relevant options. These summary rows are automatically inserted within the relevant row groups (for groupwise summaries) or at the bottom of the table (as a grand summary), where each summary row is the result of a different aggregation function.

Preparing the Input Data Table

Let’s use the exibble dataset (included in the gt package) to demonstrate how summary rows can be added. That table contains categorical columns named row and group, along several columns with varying data types. Here is a preview of the exibble dataset using solely the gt() function with no other options:

exibble %>% gt()
num char fctr date time datetime currency row group
1.111e-01 apricot one 2015-01-15 13:35 2018-01-01 02:22 49.950 row_1 grp_a
2.222e+00 banana two 2015-02-15 14:40 2018-02-02 14:33 17.950 row_2 grp_a
3.333e+01 coconut three 2015-03-15 15:45 2018-03-03 03:44 1.390 row_3 grp_a
4.444e+02 durian four 2015-04-15 16:50 2018-04-04 15:55 65100.000 row_4 grp_a
5.550e+03 NA five 2015-05-15 17:55 2018-05-05 04:00 1325.810 row_5 grp_b
NA fig six 2015-06-15 NA 2018-06-06 16:11 13.255 row_6 grp_b
7.770e+05 grapefruit seven NA 19:10 2018-07-07 05:22 NA row_7 grp_b
8.880e+06 honeydew eight 2015-08-15 20:20 NA 0.440 row_8 grp_b

We’ll create a table stub with both row labels (using the row column) and row groups (using the group column). The end result will be a table organized with labeled rows that are grouped together (the row group labels identify each of the row groups). To make the examples a bit easier to follow, some of the columns in exibble will first be dropped through a dplyr::select() statement.

# Create a gt table using the `exibble` dataset
exibble_a <-
  exibble %>%
  select(-c(fctr, date, time, datetime)) %>%
  gt(rowname_col = "row", groupname_col = "group") %>%
  fmt_missing(columns = everything())

exibble_a
num char currency
grp_a
row_1 1.111e-01 apricot 49.950
row_2 2.222e+00 banana 17.950
row_3 3.333e+01 coconut 1.390
row_4 4.444e+02 durian 65100.000
grp_b
row_5 5.550e+03 1325.810
row_6 fig 13.255
row_7 7.770e+05 grapefruit
row_8 8.880e+06 honeydew 0.440

There are two groups in this data table: grp_a and grp_b. This gives us flexibility to create both a grand summary and groupwise summary rows.

Generating Groupwise Summary Rows

Groupwise summary rows can be generated by using the summary_rows() function and, importantly, through specifying the groups that will receive summary rows. We can provide a vector group names, as in c("grp_a", "grp_b"), or, use TRUE to signify that all groups should receive summary rows. Aside from the selection of groups, there is control over which columns are to used for the summary. Since each call to summary_rows() only performs one set of aggregation functions, we may want specific aggregations for different subsets of columns.

To make any sort of summary, we need to use functions that will perform the aggregation. We can provide base functions such as mean(), sum(), min(), max(), and more, within a list(). Each function provided will result in a summary row for each group.

Because each function will yield a row, we need to be able to label that row. So, each summary row will receive a summary row label. We can provide our preferred names by naming the functions within the list (e.g, list(average = "mean", total = "sum", SD = "sd")).

# Create groupwise summary rows for both
# groups (using `groups = TRUE`); use the
# `mean()`, `sum()`, and `sd()` functions
# (only for the `num` column)
exibble_b <- 
  exibble_a %>%
  summary_rows(
    groups = TRUE,
    columns = num,
    fns = list(
      average = "mean",
      total = "sum",
      SD = "sd")
  )

exibble_b
num char currency
grp_a
row_1 1.111e-01 apricot 49.950
row_2 2.222e+00 banana 17.950
row_3 3.333e+01 coconut 1.390
row_4 4.444e+02 durian 65100.000
average 120.02
total 480.06
SD 216.79
grp_b
row_5 5.550e+03 1325.810
row_6 fig 13.255
row_7 7.770e+05 grapefruit
row_8 8.880e+06 honeydew 0.440
average
total
SD

We can specify the aggregation functions by use of function names in quotes (e.g., "sum"), as bare functions (e.g., sum), or as one-sided R formulas using a leading ~. In the formula representation, a . serves as the data to be summarized, so we can use sum(., na.rm = TRUE). The use of named arguments is recommended as those names will serve as summary row labels (the labels can derived from the function names but only when not providing bare function names).

Now that the summary_rows() function has been somewhat explained, let’s look at how we can create groupwise summary rows for the exibble_a table. We’ll create summaries for both available groups (groups = TRUE) and use the mean(), sum(), and sd() functions with the function-name-in-quotes method (and this will only pertain to the num column).

In the previous example we have an NA value in the num/row_6 cell, and so we get NA outputs from mean(), sum(), and sd() in grp_b’s summary rows (which are replaced with em dashes, itself controllable through the missing_text option). To avoid this, let’s rewrite the above using the names-and-formulae method.

# Create groupwise summary rows for both
# groups (using `groups = TRUE`); we will
# use names and formulas this time in `fns`
exibble_c <- 
  exibble_a %>%
  summary_rows(
    groups = TRUE,
    columns = num,
    fns = list(
      avg = ~mean(., na.rm = TRUE),
      total = ~sum(., na.rm = TRUE),
      s.d. = ~sd(., na.rm = TRUE)
    )
  )

exibble_c
num char currency
grp_a
row_1 1.111e-01 apricot 49.950
row_2 2.222e+00 banana 17.950
row_3 3.333e+01 coconut 1.390
row_4 4.444e+02 durian 65100.000
avg 120.02
total 480.06
s.d. 216.79
grp_b
row_5 5.550e+03 1325.810
row_6 fig 13.255
row_7 7.770e+05 grapefruit
row_8 8.880e+06 honeydew 0.440
avg 3,220,850.00
total 9,662,550.00
s.d. 4,916,123.25

Here we see that summary rows were created for both groups. However, the output of the summary row data is quite different than that of the cell data. The formatter argument (and extra ... arguments) allows for use of any of the fmt_*() functions that we’d normally use to format cell data. In this example (another rewrite of the above), the cell data in the num column is formatted in scientific notation and the resulting summary cell data is also formatted in the same way (including the options of decimals = 3).

# Define a named list of aggregation
# functions and summary row labels
fns_labels <- 
  list(
    avg = ~mean(., na.rm = TRUE),
    total = ~sum(., na.rm = TRUE),
    s.d. = ~sd(., na.rm = TRUE)
  )

# Create groupwise summary rows as
# before, supply `fns_labels` to `fns`,
# and format the cell summary data
exibble_d <- 
  exibble_a %>%
  fmt_scientific(
    columns = num,
    decimals = 3
  ) %>%
  summary_rows(
    groups = TRUE,
    columns = num,
    fns = fns_labels,
    formatter = fmt_scientific,
    decimals = 3
  )

exibble_d
num char currency
grp_a
row_1 1.111 × 10−1 apricot 49.950
row_2 2.222 banana 17.950
row_3 3.333 × 101 coconut 1.390
row_4 4.444 × 102 durian 65100.000
avg 1.200 × 102
total 4.801 × 102
s.d. 2.168 × 102
grp_b
row_5 5.550 × 103 1325.810
row_6 fig 13.255
row_7 7.770 × 105 grapefruit
row_8 8.880 × 106 honeydew 0.440
avg 3.221 × 106
total 9.663 × 106
s.d. 4.916 × 106

The input to fns is very permissive in regard to how the functions are defined. It is entirely valid to provide functions in the various forms shown earlier such that list("sum", avg = ~mean(., na.rm = TRUE), SD = "sd") will be correctly interpreted. It is recommended to use formula notation.

The default for formatter is set to fmt_number which is a sensible default for many scenarios. The setting of argument values for a particular formatter can be done in the ... area of the function call (as was done above for the decimals argument).

Using Multiple Calls of summary_rows()

We can re-use summary row labels and fill the otherwise empty summary cells with similar aggregations but perhaps with different formatting options. Here’s an example where the currency column contains aggregate values that share the same summary rows as for the num column, adds two more rows, and uses currency formatting:

# Create groupwise summary rows as
# before, supply `fns_labels` to `fns`,
# and format the cell summary data
exibble_e <- 
  exibble_a %>%
  fmt_scientific(
    columns = num,
    decimals = 3
  ) %>%
  fmt_currency(
    columns = currency,
    currency = "EUR"
    ) %>%
  summary_rows(
    groups = TRUE,
    columns = num,
    fns = fns_labels,
    formatter = fmt_scientific,
    decimals = 3
  ) %>%
  summary_rows(
    groups = "grp_a",
    columns = currency,
    fns = c(
      fns_labels,
      min = ~min(.),
      max = ~max(.)),
    formatter = fmt_currency,
    currency = "EUR"
  )

exibble_e
num char currency
grp_a
row_1 1.111 × 10−1 apricot €49.95
row_2 2.222 banana €17.95
row_3 3.333 × 101 coconut €1.39
row_4 4.444 × 102 durian €65,100.00
avg 1.200 × 102 €16,292.32
total 4.801 × 102 €65,169.29
s.d. 2.168 × 102 €32,538.46
min €1.39
max €65,100.00
grp_b
row_5 5.550 × 103 €1,325.81
row_6 fig €13.26
row_7 7.770 × 105 grapefruit
row_8 8.880 × 106 honeydew €0.44
avg 3.221 × 106
total 9.663 × 106
s.d. 4.916 × 106

A thing to again note in the above example is that even though two independent calls of summary_rows() were made, summary data within common summary row names were ‘squashed’ together, thus avoiding the fragmentation of summary rows. Put another way, we don’t create additional summary rows across separate calls if we are referencing the same summary row labels. If the summary row labels provided in fns were to be different across columns however, additional summary rows would be produced even if the types of data aggregations were to be functionally equivalent.

We can also store these argument values as local variables and pass them in both separate fmt_*number*() calls also to arguments within summary_rows() calls. This is useful for standardizing formatting parameters across different table cell types. Here’s an example of that, which additional passes the fr_BE locale to all functions that take a locale value.

# Provide common formatting parameters to a list
# object named `formats`; the number of decimal
# places will be `2` and the locale is "fr_BE"
formats <- 
  list(
    decimals = 3,
    locale = "fr_BE",
    currency = "EUR"
  )

# Format the `num` and `currency` columns
# (using the values stored in `formats`);
# when generating summary rows we can also
# supply formatter options from this list
exibble_f <- 
  exibble_a %>%
  fmt_scientific(
    columns = num,
    decimals = formats$decimals,
    locale = formats$locale
  ) %>%
  fmt_currency(
    columns = currency,
    currency = formats$currency,
    locale = formats$locale
  ) %>%
  summary_rows(
    groups = TRUE,
    columns = num,
    fns = fns_labels,
    formatter = fmt_scientific,
    decimals = formats$decimals,
    locale = formats$locale
  ) %>%
  summary_rows(
    groups = "grp_a",
    columns = currency,
    fns = c(
      fns_labels,
      min = ~min(.),
      max = ~max(.)
    ),
    formatter = fmt_currency,
    currency = formats$currency,
    locale = formats$locale
  )

exibble_f
num char currency
grp_a
row_1 1,111 × 10−1 apricot €49,95
row_2 2,222 banana €17,95
row_3 3,333 × 101 coconut €1,39
row_4 4,444 × 102 durian €65 100,00
avg 1,200 × 102 €16 292,32
total 4,801 × 102 €65 169,29
s.d. 2,168 × 102 €32 538,46
min €1,39
max €65 100,00
grp_b
row_5 5,550 × 103 €1 325,81
row_6 fig €13,26
row_7 7,770 × 105 grapefruit
row_8 8,880 × 106 honeydew €0,44
avg 3,221 × 106
total 9,663 × 106
s.d. 4,916 × 106

Passing in parameters like this is useful, especially if there are larger numbers of columns. When we store formatting parameters outside of the gt() pipeline, we separate our concerns between data structuring and data formatting. Putting styles and options into objects becomes more important if we intend to centralize formatting options for reuse.

Creating a Grand Summary

A grand summary aggregates column data regardless of the groups within the data. Grand summaries can also be created for gt tables that don’t have row groups, or, don’t have a stub. Finally, we can create a table that has both groupwise summaries and a grand summary.

Let’s keep it simple and create grand summary rows on a table without a stub. We’ll use exibble dataset for this once more. A few exibble columns are select()ed, passed to gt(), and then grand_summary_rows(). Notice that, in the resulting table, a stub is created just for the summary row labels (they have to go somewhere).

# Create a simple grand summary on a gt
# table that contains no stub
exibble_g <-
  exibble %>%
  select(num, char, currency) %>%
  gt() %>%
  grand_summary_rows(
    columns = c(num, currency),
    fns = fns_labels
  )

exibble_g
num char currency
1.111e-01 apricot 49.950
2.222e+00 banana 17.950
3.333e+01 coconut 1.390
4.444e+02 durian 65100.000
5.550e+03 NA 1325.810
NA fig 13.255
7.770e+05 grapefruit NA
8.880e+06 honeydew 0.440
avg 1,380,432.87 9,501.26
total 9,663,030.06 66,508.79
s.d. 3,319,613.32 24,521.60

A grand summary can be used in conjunction with groupwise summaries. Here’s an table where both types of summaries are present:

# Using the table in `exibble_f`, create
# grand summary rows (using two separate
# calls of `grand_summary_rows()` since
# the formatting will be different)
exibble_h <- 
  exibble_f %>%
  grand_summary_rows(
    columns = num,
    fns = fns_labels,
    formatter = fmt_number,
    suffixing = TRUE,
    locale = formats$locale
  ) %>%
  grand_summary_rows(
    columns = currency,
    fns = fns_labels,
    formatter = fmt_currency,
    suffixing = TRUE,
    locale = formats$locale
  )

exibble_h
num char currency
grp_a
row_1 1,111 × 10−1 apricot €49,95
row_2 2,222 banana €17,95
row_3 3,333 × 101 coconut €1,39
row_4 4,444 × 102 durian €65 100,00
avg 1,200 × 102 €16 292,32
total 4,801 × 102 €65 169,29
s.d. 2,168 × 102 €32 538,46
min €1,39
max €65 100,00
grp_b
row_5 5,550 × 103 €1 325,81
row_6 fig €13,26
row_7 7,770 × 105 grapefruit
row_8 8,880 × 106 honeydew €0,44
avg 3,221 × 106
total 9,663 × 106
s.d. 4,916 × 106
avg 1,38M $9,50K
total 9,66M $66,51K
s.d. 3,32M $24,52K

Note that the grand summary has a double line separating it from groupwise summary that’s part of grp_b. If this default styling appears to be too subtle, we can elect to add further styling to both groupwise summaries and the grand summary by using tab_options().

Adding Some Style to the Summary Cells

While the summary cells (both groupwise and grand) have a distinct appearance that sets them apart from the data cells, there’s always the option to modify their appearance. We can use the tab_options() function to perform these customizations. Here are the options specific to the summary cells (for groupwise summaries) and the grand summary cells:

  • summary_row.background.color & grand_summary_row.background.color
  • summary_row.padding & grand_summary_row.padding
  • summary_row.text_transform & grand_summary_row.text_transform

We can also target the summary cells and grand summary cells with the location helper functions cells_summary() and cells_grand_summary(). This is important for adding footnotes with tab_footnote() and for setting styles with tab_style() (both have the locations argument).

Here is an example that uses multiple calls of tab_options() and tab_footnote(). The cell background color for both types of summary cells is modified and two footnotes are added.

# Using the gt table of `exibble_h` as a
# starting point, style summary cells with
# `tab_options()` and add two footnotes
exibble_i <- 
  exibble_h %>%
  tab_options(
    summary_row.background.color = "lightblue",
    grand_summary_row.background.color = "lightgreen"
  ) %>%
  tab_footnote(
    footnote = md("Mean of all *num* values."),
    locations = cells_grand_summary(
      columns = "num", rows = "avg"
    )
  ) %>%
  tab_footnote(
    footnote = md("Highest `currency` value in **grp_a**"),
    locations = cells_summary(
      groups = "grp_a",
      columns = "currency",
      rows = "max"
    )
  )

exibble_i
num char currency
grp_a
row_1 1,111 × 10−1 apricot €49,95
row_2 2,222 banana €17,95
row_3 3,333 × 101 coconut €1,39
row_4 4,444 × 102 durian €65 100,00
avg 1,200 × 102 €16 292,32
total 4,801 × 102 €65 169,29
s.d. 2,168 × 102 €32 538,46
min €1,39
max €65 100,001
grp_b
row_5 5,550 × 103 €1 325,81
row_6 fig €13,26
row_7 7,770 × 105 grapefruit
row_8 8,880 × 106 honeydew €0,44
avg 3,221 × 106
total 9,663 × 106
s.d. 4,916 × 106
avg 1,38M2 $9,50K
total 9,66M $66,51K
s.d. 3,32M $24,52K

1 Highest currency value in grp_a

2 Mean of all num values.

Extracting the Summary Data from the gt Table Object

For a reproducible workflow, we do not want to have situations where any data created or modified cannot be accessed. While having summarized values be created in a gt pipeline presents advantages to readability and intent of analysis, it is recognized that the output table itself is essentially ‘read only’, as the input data undergoes processing and movement to an entirely different format.

However, the object created still contains data and we can obtain the summary data from a gt table object using the extract_summary() function. Taking the gt_summary object, we get a list of tibbles containing the summary data while preserving the correct data types:

# Extract the summary data from `exibble_d`
# to a list  object
summary_list <- exibble_d %>% extract_summary()
# Print out the summary for the `grp_a` group
summary_list %>% .$summary_df_data_list %>% .$grp_a
#> # A tibble: 3 x 7
#>   group_id rowname   num  char currency   row group
#>   <chr>    <chr>   <dbl> <dbl>    <dbl> <dbl> <dbl>
#> 1 grp_a    avg      120.    NA       NA    NA    NA
#> 2 grp_a    total    480.    NA       NA    NA    NA
#> 3 grp_a    s.d.     217.    NA       NA    NA    NA
# Print out the summary for the `grp_b` group
summary_list %>% .$summary_df_data_list %>% .$grp_b
#> # A tibble: 3 x 7
#>   group_id rowname      num  char currency   row group
#>   <chr>    <chr>      <dbl> <dbl>    <dbl> <dbl> <dbl>
#> 1 grp_b    avg     3220850     NA       NA    NA    NA
#> 2 grp_b    total   9662550     NA       NA    NA    NA
#> 3 grp_b    s.d.    4916123.    NA       NA    NA    NA

The output tibbles within the list always contain the groupname and rowname columns. The groupname column is filled with the name of the row group that was given to summary_rows(). The rowname column contains the descriptive stub labels for the summary rows (recall that values are either supplied explicitly in summary_rows(), or, are generated from the function names). The remaining columns are those from the original dataset.

The output data from extract_summary() can be reintroduced to a reproducible workflow and serve as downstream inputs or undergo validation. Perhaps interestingly, the output tibbles are structured in a way that facilitates direct input back to gt() (i.e., it has the magic groupname and rowname columns). This can produce a new, standalone summary table where the summary rows are now data rows:

# Take `exibble_d`, which internally has a list
# of summary data frames, extract the summaries,
# and then combine them; input that into `gt()`,
# and format the `num` column with `fmt_number()`
exibble_d %>%
  extract_summary() %>%
  unlist(recursive = FALSE) %>%
  bind_rows() %>%
  gt() %>%
  fmt_number(
    columns = num,
    decimals = 1
  ) %>%
  fmt_missing(
    columns = c(char, currency, row, group)
  )
group_id num char currency row group
avg grp_a 120.0
total grp_a 480.1
s.d. grp_a 216.8
avg grp_b 3,220,850.0
total grp_b 9,662,550.0
s.d. grp_b 4,916,123.3

Providing Our Own Aggregation Functions to Generate Summary Rows

While many of the functions available in base R and within packages are useful as aggregate functions, we may occasionally have the need to create our own custom functions. When taking this approach the main things to keep in mind are that a vector of values is the main input, and, a single value should be returned. The return value can be pretty much any class (e.g., numeric, character, logical) and it’s the formatter function that will handle any custom formatting while also converting to character.

Here, we’ll define a function that takes a vector of numeric values and outputs the two highest values (sorted low to high) above a threshold value. The output from this function is always a formatted character string.

# Define a function that gives the
# highest two values above a threshold
agg_highest_two_above_value <- function(x, threshold) {
  
  # Get sorted values above threshold value
  values <- sort(round(x[x >= threshold], 2))
  
  # Return character string with 2 highest values above threshold
  if (length(values) == 0) {
    return(paste0("No values above ", threshold))
  } else {
    return(
      paste(
        formatC(
          tail(
            sort(round(x[x > threshold], 2)), 2),
          format = "f", digits = 2), collapse = ", "))
  }
}

# Let's test this function with some values
agg_highest_two_above_value(
  x = c(0.73, 0.93, 0.75, 0.86, 0.23, 0.81),
  threshold = 0.8
)
#> [1] "0.86, 0.93"

Because this is character value that’s returned, we don’t need formatting functions like fmt_number(), fmt_percent(), etc. However, a useful formatter (and we do need some formatter) is the fmt_passthrough() function. Like the name suggests, it to great extent passes values through but formats as character (like all the fmt_*() function do) and it provides the option to decorate the output with a pattern. Let’s have a look at how the agg_highest_two_above_value() function can be used with the fmt_passthrough() formatter function.

# Create a gt table with `exibble_a` and use
# the custom function with a threshold of `20`;
# the `fmt_passthrough` funtion allows for
# minimal formatting of the aggregate values
exibble_j <- 
  exibble_a %>%
  grand_summary_rows(
    columns = c(num, currency),
    fns = list(
      high = ~ agg_highest_two_above_value(., 20)),
    formatter = fmt_passthrough,
    pattern = "({x})"
  )

exibble_j
num char currency
grp_a
row_1 1.111e-01 apricot 49.950
row_2 2.222e+00 banana 17.950
row_3 3.333e+01 coconut 1.390
row_4 4.444e+02 durian 65100.000
grp_b
row_5 5.550e+03 1325.810
row_6 fig 13.255
row_7 7.770e+05 grapefruit
row_8 8.880e+06 honeydew 0.440
high (777000.00, 8880000.00) (1325.81, 65100.00)

We can extract the grand summary data from the exibble_j object. Note that columns num and currency are classed as character since it was character outputs that were generated by the agg_highest_two_above_value() function.

# Extract the summary list from `exibble_j`
# and inspect using `str()`
exibble_j %>%
  extract_summary() %>%
  str()
#> List of 1
#>  $ summary_df_data_list:List of 1
#>   ..$ ::GRAND_SUMMARY: tibble [1 × 7] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ group_id: chr "::GRAND_SUMMARY"
#>   .. ..$ rowname : chr "high"
#>   .. ..$ num     : chr "777000.00, 8880000.00"
#>   .. ..$ char    : num NA
#>   .. ..$ currency: chr "1325.81, 65100.00"
#>   .. ..$ row     : num NA
#>   .. ..$ group   : num NA