To extend the tibble package for new types of columnar data, you need to understand how printing works. The presentation of a column in a tibble is powered by four S3 generics:
- type_sum() determines what goes into the column header.
- pillar_shaft() determines what goes into the body of the column.
- is_vector_s3() and obj_sum() are used when rendering list columns.
If you have written an S3 or S4 class that can be used as a column, you can override these generics to make sure your data prints well in a tibble. To start, you must import the pillar package that powers the printing of tibbles. Either add pillar to the Imports: section of your DESCRIPTION, or simply call:
This short vignette assumes a package that implements an S3 class "latlon" and uses roxygen2 to create documentation and the NAMESPACE file. For this vignette to work we need to attach pillar:
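A minimal sketch of that step, assuming pillar is already installed; inside a package you would rely on the Imports: entry instead of attaching it:

```r
# Attach pillar so that its generics and helpers can be called by name.
library(pillar)
```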
We define a class "latlon" that encodes geographic coordinates in a complex number. For simplicity, the values are printed as degrees and minutes only.
#' @export
latlon <- function(lat, lon) {
  as_latlon(complex(real = lon, imaginary = lat))
}

#' @export
as_latlon <- function(x) {
  structure(x, class = "latlon")
}

#' @export
c.latlon <- function(x, ...) {
  as_latlon(NextMethod())
}

#' @export
`[.latlon` <- function(x, i) {
  as_latlon(NextMethod())
}

#' @export
format.latlon <- function(x, ..., formatter = deg_min) {
  x_valid <- which(!is.na(x))
  lat <- unclass(Im(x[x_valid]))
  lon <- unclass(Re(x[x_valid]))
  ret <- rep("<NA>", length(x))
  ret[x_valid] <- paste(
    formatter(lat, c("N", "S")),
    formatter(lon, c("E", "W"))
  )
  format(ret, justify = "right")
}

deg_min <- function(x, pm) {
  sign <- sign(x)
  x <- abs(x)
  deg <- trunc(x)
  x <- x - deg
  min <- round(x * 60)
  ret <- sprintf("%d°%.2d'%s", deg, min, pm[ifelse(sign >= 0, 1, 2)])
  format(ret, justify = "right")
}

#' @export
print.latlon <- function(x, ...) {
  cat(format(x), sep = "\n")
  invisible(x)
}

latlon(32.7102978, -117.1704058)
## 32°43'N 117°10'W
More methods are needed to make this class fully compatible with data frames; see, for example, the hms package for a more complete implementation.
Columns of this class can be used in a tibble right away, but the output will be less than ideal:
library(tibble)
data <- tibble(
  venue = "rstudio::conf",
  year = 2017:2019,
  loc = latlon(
    c(28.3411783, 32.7102978, NA),
    c(-81.5480348, -117.1704058, NA)
  ),
  paths = list(
    loc[1],
    c(loc[1], loc[2]),
    loc[2]
  )
)
data
## # A tibble: 3 x 4
## venue year loc paths
## <chr> <int> <latlon> <list>
## 1 rstudio::conf 2017 28°20'N 81°33'W <latlon>
## 2 rstudio::conf 2018 32°43'N 117°10'W <latlon>
## 3 rstudio::conf 2019 <NA> <latlon>
(The paths column is a list that contains arbitrary data, in our case latlon vectors. A list column is a powerful way to attach hierarchical or unstructured data to an observation in a data frame.)
The output has three main problems:
1. The data type of the loc column is displayed as <S3: latlon>. This default formatting works reasonably well for any kind of object, but the generated output may be too wide and waste precious space when displaying the tibble.
2. The values in the loc column are formatted as complex numbers (the underlying storage), without using the format() method we have defined. This is by design.
3. The cells in the paths column are also displayed as <S3: latlon>.
In the remainder I’ll show how to fix these problems, and also how to implement rendering that adapts to the available width.
To display <geo> as the data type, we need to override the type_sum() method. This method should return a string that can be used in a column header. For your own classes, strive for an evocative abbreviation that’s under 6 characters.
Because the value shown there doesn’t depend on the data, we just return a constant. (For date-times, the column info will eventually contain information about the timezone, see #53.)
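A minimal sketch of such a method, written to match the <geo> header in the output that follows. The stand-in object x is only there to make the snippet self-contained; in the package, the method would sit next to the class definition:

```r
library(pillar)

# Stand-in "latlon" value: longitude in the real part, latitude in the
# imaginary part, as in the class defined earlier.
x <- structure(
  complex(real = -117.1704058, imaginary = 32.7102978),
  class = "latlon"
)

#' @importFrom pillar type_sum
#' @export
type_sum.latlon <- function(x) {
  # The header does not depend on the data, so a constant is enough.
  "geo"
}

type_sum(x)
```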
## # A tibble: 3 x 4
## venue year loc paths
## <chr> <int> <geo> <list>
## 1 rstudio::conf 2017 28°20'N 81°33'W <geo>
## 2 rstudio::conf 2018 32°43'N 117°10'W <geo>
## 3 rstudio::conf 2019 <NA> <geo>
To use our format method for rendering, we implement the pillar_shaft() method for our class. (A pillar is mainly a shaft (decorated with an ornament), with a capital above and a base below. Multiple pillars form a colonnade, which can be stacked in multiple tiers. This is the motivation behind the names in our API.)
#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x)
  out[is.na(x)] <- NA
  pillar::new_pillar_shaft_simple(out, align = "right")
}
The simplest variant calls our format() method; everything else is handled by pillar, in particular by the new_pillar_shaft_simple() helper. Note how the align argument affects the alignment of NA values and of the column name and type.
## # A tibble: 3 x 4
## venue year loc paths
## <chr> <int> <geo> <list>
## 1 rstudio::conf 2017 28°20'N 81°33'W <geo>
## 2 rstudio::conf 2018 32°43'N 117°10'W <geo>
## 3 rstudio::conf 2019 NA <geo>
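The helper can also be tried outside of a tibble. The sketch below feeds it plain strings (made-up stand-ins for formatted coordinates, with an NA for the missing value) and only checks the class of the result, since the rendered appearance depends on the terminal:

```r
library(pillar)

# Pre-formatted values; NA marks a missing coordinate.
out <- c("28N 81W", "32N 117W", NA)

# Build a right-aligned shaft; pillar takes care of the NA display.
shaft <- new_pillar_shaft_simple(out, align = "right")

# The result is a pillar shaft object that pillar knows how to format.
inherits(shaft, "pillar_shaft")
```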
We could also use left alignment and indent only the NA values:
#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x)
  out[is.na(x)] <- NA
  pillar::new_pillar_shaft_simple(out, align = "left", na_indent = 5)
}

data
## # A tibble: 3 x 4
## venue year loc paths
## <chr> <int> <geo> <list>
## 1 rstudio::conf 2017 28°20'N 81°33'W <geo>
## 2 rstudio::conf 2018 32°43'N 117°10'W <geo>
## 3 rstudio::conf 2019 NA <geo>
If there is not enough space to render the values, the formatted values are truncated with an ellipsis. This doesn’t currently apply to our class, because we haven’t specified a minimum width for our values:
## # A tibble: 3 x 4
## venue year loc
## <chr> <int> <geo>
## 1 rstu… 2017 28°20'N 81°33'W
## 2 rstu… 2018 32°43'N 117°10'W
## 3 rstu… 2019 NA
## # … with 1 more variable:
## # paths <list>
If we specify a minimum width when constructing the shaft, the loc column will be truncated:
#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x)
  out[is.na(x)] <- NA
  pillar::new_pillar_shaft_simple(out, align = "right", min_width = 10)
}

print(data, width = 35)
## # A tibble: 3 x 4
## venue year loc paths
## <chr> <int> <geo> <lis>
## 1 rstudio… 2017 28°20'N 81… <geo>
## 2 rstudio… 2018 32°43'N 117… <geo>
## 3 rstudio… 2019 NA <geo>
This may be useful for character data, but for lat-lon data we may prefer to show full degrees and remove the minutes if the available space is not enough to show accurate values. A more sophisticated implementation of the pillar_shaft() method is required to achieve this:
#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
  # Pre-render both formattings: degrees only, and degrees with minutes.
  deg <- format(x, formatter = deg)
  deg[is.na(x)] <- pillar::style_na("NA")
  deg_min <- format(x)
  deg_min[is.na(x)] <- pillar::style_na("NA")
  pillar::new_pillar_shaft(
    list(deg = deg, deg_min = deg_min),
    width = pillar::get_max_extent(deg_min),
    min_width = pillar::get_max_extent(deg),
    # Newer pillar versions deprecate `subclass` in favor of `class`.
    subclass = "pillar_shaft_latlon"
  )
}
Here, pillar_shaft() returns an object of the "pillar_shaft_latlon" class created by the generic new_pillar_shaft() constructor. This object contains the necessary information to render the values, and also minimum and maximum width values. For simplicity, both formattings are pre-rendered, and the minimum and maximum widths are computed from there. Note that we also need to take care of NA values explicitly. (get_max_extent() is a helper that computes the maximum display width occupied by the values in a character vector.)
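As a small illustration of that helper, the sketch below measures two made-up ASCII strings (stand-ins for formatted coordinates, without the degree sign so that the result does not depend on the locale):

```r
library(pillar)

values <- c("28N 82W", "33N 117W")

# Display width of each element ...
get_extent(values)

# ... and the maximum over all elements, which is what the shaft
# constructor above uses for `width` and `min_width`.
get_max_extent(values)
```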
For completeness, the code that implements the degree-only formatting looks like this:
deg <- function(x, pm) {
  sign <- sign(x)
  x <- abs(x)
  deg <- round(x)
  ret <- sprintf("%d°%s", deg, pm[ifelse(sign >= 0, 1, 2)])
  format(ret, justify = "right")
}
All that’s left to do is to implement a format() method for our new "pillar_shaft_latlon" class. This method will be called with a width argument, which then determines which of the formattings to choose:
#' @export
format.pillar_shaft_latlon <- function(x, width, ...) {
  if (all(crayon::col_nchar(x$deg_min) <= width)) {
    ornament <- x$deg_min
  } else {
    ornament <- x$deg
  }
  pillar::new_ornament(ornament)
}

data
## Warning: The `subclass` argument to `new_pillar_shaft()` is deprecated, please use the `class` argument.
## This warning is displayed once per session.
## # A tibble: 3 x 4
## venue year loc paths
## <chr> <int> <geo> <list>
## 1 rstudio::conf 2017 28°20'N 81°33'W <geo>
## 2 rstudio::conf 2018 32°43'N 117°10'W <geo>
## 3 rstudio::conf 2019 NA <geo>
## # A tibble: 3 x 4
## venue year loc paths
## <chr> <int> <geo> <lis>
## 1 rstudio… 2017 28°N 82°W <geo>
## 2 rstudio… 2018 33°N 117°W <geo>
## 3 rstudio… 2019 NA <geo>
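The width-based choice can be tried in isolation. The following is a sketch only: choose_by_width() is a hypothetical helper, not part of pillar, and the strings use "d" as an ASCII stand-in for the degree sign. It uses base nchar() instead of crayon::col_nchar(), so it assumes unstyled input:

```r
# Hypothetical helper: return the detailed formatting if every value fits
# into `width` display columns, otherwise fall back to the short one.
choose_by_width <- function(full, short, width) {
  if (all(nchar(full, type = "width") <= width)) {
    full
  } else {
    short
  }
}

full <- c("28d20'N 81d33'W", "32d43'N 117d10'W")
short <- c("28dN 82dW", "33dN 117dW")

wide <- choose_by_width(full, short, width = 20)    # enough room for minutes
narrow <- choose_by_width(full, short, width = 12)  # falls back to degrees
```

Pre-rendering both variants, as the shaft constructor does, keeps the format() method itself trivial: it only has to compare widths.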
Both new_pillar_shaft_simple() and new_ornament() accept ANSI escape codes for coloring, emphasis, or other ways of highlighting text on terminals that support it. Some formattings are predefined, e.g. style_subtle() displays text in a light gray. For default data types, this style is used for insignificant digits. We’ll be formatting the degree and minute signs in a subtle style, because they serve only as separators. You can also use the crayon package to add custom formattings to your output.
#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x, formatter = deg_min_color)
  out[is.na(x)] <- NA
  pillar::new_pillar_shaft_simple(out, align = "left", na_indent = 5)
}

deg_min_color <- function(x, pm) {
  sign <- sign(x)
  x <- abs(x)
  deg <- trunc(x)
  x <- x - deg
  rad <- round(x * 60)
  ret <- sprintf(
    "%d%s%.2d%s%s",
    deg,
    pillar::style_subtle("°"),
    rad,
    pillar::style_subtle("'"),
    pm[ifelse(sign >= 0, 1, 2)]
  )
  ret[is.na(x)] <- ""
  format(ret, justify = "right")
}

data
## # A tibble: 3 x 4
## venue year loc paths
## <chr> <int> <geo> <list>
## 1 rstudio::conf 2017 28°20'N 81°33'W <geo>
## 2 rstudio::conf 2018 32°43'N 117°10'W <geo>
## 3 rstudio::conf 2019 NA <geo>
Currently, ANSI escapes are not rendered in vignettes, so the display here isn’t much different from earlier examples. This may change in the future.
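Even without a rendering terminal, the effect of a style helper can be inspected as text. A minimal sketch, assuming the crayon package is installed: whether escape codes are actually added depends on the session's color support, so the snippet only checks that stripping the styling recovers the original string.

```r
library(pillar)

# Style the minute sign; on a color-capable terminal this wraps the
# character in ANSI escape codes, otherwise it may be returned unchanged.
styled <- style_subtle("'")

# Removing any styling gives back the plain text.
plain <- crayon::strip_style(styled)
```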
To tweak the output in the paths column, we simply need to indicate that our class is an S3 vector:
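A sketch of what this could look like, assuming a pillar version in which is_vector_s3() is an S3 generic, as described at the start of this vignette (newer pillar versions may handle this differently). The method is called directly here so that the snippet does not depend on the generic:

```r
# Stand-in "latlon" value, as in the class defined earlier.
x <- structure(
  complex(real = -117.1704058, imaginary = 32.7102978),
  class = "latlon"
)

#' @importFrom pillar is_vector_s3
#' @export
is_vector_s3.latlon <- function(x) {
  # Objects of this class behave like vectors.
  TRUE
}

is_vector_s3.latlon(x)
```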
## # A tibble: 3 x 4
## venue year loc paths
## <chr> <int> <geo> <list>
## 1 rstudio::conf 2017 28°20'N 81°33'W <geo [1]>
## 2 rstudio::conf 2018 32°43'N 117°10'W <geo [2]>
## 3 rstudio::conf 2019 NA <geo [1]>
This is picked up by the default implementation of obj_sum(), which then shows the type and the length in brackets. If your object is built on top of an atomic vector, the default will be adequate. You will, however, need to provide an obj_sum() method for your class if your object is vectorised and built on top of a list.
An example of an object of this type in base R is POSIXlt: it is a list with several components (11 in the output below).
time <- as.POSIXlt("2018-12-20 23:29:51", tz = "CET")
x <- as.POSIXlt(time + c(0, 60, 3600))
str(unclass(x))
## List of 11
## $ sec : num [1:3] 51 51 51
## $ min : int [1:3] 29 30 29
## $ hour : int [1:3] 23 23 0
## $ mday : int [1:3] 20 20 21
## $ mon : int [1:3] 11 11 11
## $ year : int [1:3] 118 118 118
## $ wday : int [1:3] 4 4 5
## $ yday : int [1:3] 353 353 354
## $ isdst : int [1:3] 0 0 0
## $ zone : chr [1:3] "CET" "CET" "CET"
## $ gmtoff: int [1:3] 3600 3600 3600
## - attr(*, "tzone")= chr [1:3] "CET" "CET" "CEST"
But it pretends to be a vector with 3 elements:
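Calls along the following lines would show this behavior; this is a sketch using base R only, with x rebuilt so that the snippet is self-contained:

```r
time <- as.POSIXlt("2018-12-20 23:29:51", tz = "CET")
x <- as.POSIXlt(time + c(0, 60, 3600))

x          # prints three date-times
length(x)  # the length of the vector, not the number of list components
str(x)     # compact summary of the vector
```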
## [1] "2018-12-20 23:29:51 CET" "2018-12-20 23:30:51 CET"
## [3] "2018-12-21 00:29:51 CET"
## [1] 3
## POSIXlt[1:3], format: "2018-12-20 23:29:51" "2018-12-20 23:30:51" "2018-12-21 00:29:51"
So we need to define a method that returns a character vector the same length as x:
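A minimal sketch of such a method. The label "POSIXlt" is an illustrative choice; the method is called directly so that the snippet runs without pillar:

```r
#' @importFrom pillar obj_sum
#' @export
obj_sum.POSIXlt <- function(x) {
  # One summary string per element of the vector.
  rep("POSIXlt", length(x))
}

time <- as.POSIXlt("2018-12-20 23:29:51", tz = "CET")
x <- as.POSIXlt(time + c(0, 60, 3600))

obj_sum.POSIXlt(x)
```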
If you want to test the output of your code, you can compare it with a known state recorded in a text file. For this, pillar offers the expect_known_display() expectation, which requires the testthat package. Make sure that the output is generated only by your package to avoid inconsistencies when external code is updated. Here, this means that you test only the shaft portion of the pillar, and not the entire pillar or even a tibble that contains a column with your data type!
The tests work best with the testthat package attached.
The code below will compare the output of pillar_shaft(data$loc) with known output stored in the latlon.txt file. The first run warns because the file doesn’t exist yet.
test_that("latlon pillar matches known output", {
  pillar::expect_known_display(
    pillar_shaft(data$loc),
    file = "latlon.txt"
  )
})
From the second run on, the printing will be compared with the file:
test_that("latlon pillar matches known output", {
  pillar::expect_known_display(
    pillar_shaft(data$loc),
    file = "latlon.txt"
  )
})
However, if we look at the file we’ll notice strange things: The output contains ANSI escapes!
## [1] "28\033[38;5;246m°\033[39m20\033[38;5;246m'\033[39mN 81\033[38;5;246m°\033[39m33\033[38;5;246m'\033[39mW"
## [2] "32\033[38;5;246m°\033[39m43\033[38;5;246m'\033[39mN 117\033[38;5;246m°\033[39m10\033[38;5;246m'\033[39mW"
## [3] " \033[31mNA\033[39m "
We can turn them off by passing crayon = FALSE to the expectation, but we then need to run the test twice again:
library(testthat)
test_that("latlon pillar matches known output", {
  pillar::expect_known_display(
    pillar_shaft(data$loc),
    file = "latlon.txt",
    crayon = FALSE
  )
})
## Error: Test failed: 'latlon pillar matches known output'
## * <text>:3: Results have changed from known value recorded in 'latlon.txt'.
## 3/3 mismatches
## x[1]: "28°20'N 81°33'W"
## y[1]: "28\033[38;5;246m°\033[39m20\033[38;5;246m'\033[39mN 81\033[38;5;246m°\033[39m33\033[38;5;246m'\033[39mW"
##
## x[2]: "32°43'N 117°10'W"
## y[2]: "32\033[38;5;246m°\033[39m43\033[38;5;246m'\033[39mN 117\033[38;5;246m°\033[39m10\033[38;5;246m'\033[39mW"
##
## x[3]: " NA "
## y[3]: " \033[31mNA\033[39m "
test_that("latlon pillar matches known output", {
  pillar::expect_known_display(
    pillar_shaft(data$loc),
    file = "latlon.txt",
    crayon = FALSE
  )
})

readLines("latlon.txt")
## [1] "28°20'N 81°33'W" "32°43'N 117°10'W" " NA "
You may want to create a series of output files for different scenarios, for example with and without color.
For this it is helpful to create your own expectation function. Use the tidy evaluation framework to make sure that construction and printing happen at the right time:
expect_known_latlon_display <- function(x, file_base) {
  quo <- rlang::quo(pillar::pillar_shaft(x))
  pillar::expect_known_display(
    !! quo,
    file = paste0(file_base, ".txt")
  )
  pillar::expect_known_display(
    !! quo,
    file = paste0(file_base, "-bw.txt"),
    crayon = FALSE
  )
}

test_that("latlon pillar matches known output", {
  expect_known_latlon_display(data$loc, file_base = "latlon")
})
## Error: Test failed: 'latlon pillar matches known output'
## * <text>:2: Results have changed from known value recorded in 'latlon.txt'.
## 3/3 mismatches
## x[1]: "28\033[38;5;246m°\033[39m20\033[38;5;246m'\033[39mN 81\033[38;5;246m°\033[39m33\033[38;5;246m'\033[39mW"
## y[1]: "28°20'N 81°33'W"
##
## x[2]: "32\033[38;5;246m°\033[39m43\033[38;5;246m'\033[39mN 117\033[38;5;246m°\033[39m10\033[38;5;246m'\033[39mW"
## y[2]: "32°43'N 117°10'W"
##
## x[3]: " \033[31mNA\033[39m "
## y[3]: " NA "
## [1] "28\033[38;5;246m°\033[39m20\033[38;5;246m'\033[39mN 81\033[38;5;246m°\033[39m33\033[38;5;246m'\033[39mW"
## [2] "32\033[38;5;246m°\033[39m43\033[38;5;246m'\033[39mN 117\033[38;5;246m°\033[39m10\033[38;5;246m'\033[39mW"
## [3] " \033[31mNA\033[39m "
## [1] "28°20'N 81°33'W" "32°43'N 117°10'W" " NA "
Learn more about the tidyeval framework in the dplyr vignette.
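As a small illustration of the delayed evaluation that the custom expectation relies on, the sketch below captures an expression as a quosure and evaluates it later. make_quo() is a hypothetical helper introduced only for this example; it assumes the rlang package is installed.

```r
library(rlang)

# Hypothetical helper: capture an expression together with its environment,
# without evaluating it yet.
make_quo <- function(x) {
  quo(toupper(x))
}

q <- make_quo("latlon")

is_quosure(q)         # the expression has been captured, not run
result <- eval_tidy(q)  # evaluation happens only here
```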