Tidy summarizes information about the components of a model. A model component might be a single term in a regression, a single hypothesis, a cluster, or a class. Exactly what tidy considers to be a model component varies across models but is usually self-evident. If a model has several distinct types of components, you will need to specify which components to return.
# S3 method for prcomp
tidy(x, matrix = "u", ...)
x: A prcomp object returned by stats::prcomp().
matrix: Character specifying which component of the PCA should be tidied.
"u", "samples", "scores", or "x": returns information about
the map from the original space into principal components space.
"v", "rotation", "loadings", or "variables": returns information
about the map from principal components space back into the original
space.
"d", "eigenvalues", or "pcs": returns information about the
eigenvalues.
...: Additional arguments. Not used. Needed to match generic
signature only. Cautionary note: Misspelled arguments will be
absorbed in ..., where they will be ignored. If the misspelled
argument has a default value, the default value will be used.
For example, if you pass conf.lvel = 0.9, all computation will
proceed using conf.level = 0.95. Additionally, if you pass
newdata = my_tibble to an augment() method that does not
accept a newdata argument, it will use the default value for
the data argument.
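The behavior described in the cautionary note can be reproduced in plain R. The function `summarize_ci()` below is hypothetical and not part of broom; it only mimics a method that has a `conf.level` default and a `...` in its signature.

```r
# Hypothetical function (not part of broom) with a default and a `...` catch-all.
summarize_ci <- function(x, conf.level = 0.95, ...) {
  # Anything captured by `...` is silently ignored here.
  conf.level
}

x <- c(1, 2, 3)

# Correct spelling: the supplied value is used.
used_ok <- summarize_ci(x, conf.level = 0.9)

# Misspelled name: `conf.lvel` lands in `...`, so the default is used.
used_typo <- summarize_ci(x, conf.lvel = 0.9)

print(c(correct = used_ok, misspelled = used_typo))
```

No warning or error is raised for the misspelled call, which is why it is worth double-checking argument names when a result looks unchanged.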
A tibble::tibble with columns depending on the component of
PCA being tidied.
If matrix is "u", "samples", "scores", or "x", each row in the
tidied output corresponds to the original data in PCA space. The columns
are:
row: ID of the original observation (i.e. rowname from original data).
PC: Integer indicating a principal component.
value: The score of the observation for that particular principal component. That is, the location of the observation in PCA space.
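As a rough illustration of this layout, the sketch below builds a table with the same three columns directly from the `x` element of a prcomp object, using base R only. It is not the code tidy() uses internally, and the row order of the real output may differ; the names `scores` and `scores_long` are chosen here for illustration.

```r
pc <- prcomp(USArrests, scale. = TRUE)

scores <- pc$x  # one row per state, one column per principal component

# Stack the score matrix column by column into row / PC / value triples.
scores_long <- data.frame(
  row   = rep(rownames(scores), times = ncol(scores)),
  PC    = rep(seq_len(ncol(scores)), each = nrow(scores)),
  value = as.vector(scores)
)

head(scores_long)
```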
If matrix is "v", "rotation", "loadings", or "variables", each
row in the tidied output corresponds to information about the principal
components in the original space. The columns are:
row: The variable labels (colnames) of the data set on which PCA was performed.
PC: An integer vector indicating the principal component.
value: The value of the eigenvector (axis score) on the indicated principal component.
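To make the role of these eigenvector values concrete, the following base R sketch checks two standard properties of the `rotation` element of a prcomp object: each column has unit length, and multiplying the centered and scaled data by the loadings reproduces the scores. The variable names are illustrative, and the sketch assumes the PCA was fitted with `scale. = TRUE`.

```r
pc <- prcomp(USArrests, scale. = TRUE)

loadings <- pc$rotation  # variables in rows, principal components in columns

# Each column is an eigenvector of unit length.
col_norms <- sqrt(colSums(loadings^2))

# Scores are the centered and scaled data multiplied by the loadings.
scores_manual <- scale(USArrests) %*% loadings

print(col_norms)
print(max(abs(scores_manual - pc$x)))
```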
If matrix is "d", "eigenvalues", or "pcs", the columns are:
PC: An integer vector indicating the principal component.
std.dev: Standard deviation explained by this PC.
percent: Fraction of variation explained by this component (a numeric value between 0 and 1).
cumulative: Cumulative fraction of variation explained by principal components up to this component (a numeric value between 0 and 1).
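These quantities can be derived from the `sdev` element of a prcomp object. The sketch below assumes the usual definitions (variance is the squared standard deviation, and fractions are taken relative to the total variance); it is a base R illustration rather than the code tidy() runs, so the real output may differ in rounding.

```r
pc <- prcomp(USArrests, scale. = TRUE)

variance <- pc$sdev^2  # variance along each principal component

pcs <- data.frame(
  PC         = seq_along(pc$sdev),
  std.dev    = pc$sdev,
  percent    = variance / sum(variance),
  cumulative = cumsum(variance) / sum(variance)
)

print(pcs)
```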
See https://stats.stackexchange.com/questions/134282/relationship-between-svd-and-pca-how-to-use-svd-to-perform-pca for information on how to interpret the various tidied matrices. Note that SVD is only equivalent to PCA on centered data.
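The centering caveat can be checked numerically. The base R sketch below compares prcomp() (which centers by default) with an SVD of the column-centered data and with an SVD of the raw data. The conversion `d / sqrt(n - 1)` from singular values to standard deviations is the standard relation; the variable names are illustrative.

```r
X <- as.matrix(USArrests)
n <- nrow(X)

pc <- prcomp(X)  # centers by default, no scaling

# SVD of the column-centered data.
Xc <- scale(X, center = TRUE, scale = FALSE)
sv <- svd(Xc)

# Singular values map to standard deviations: sdev = d / sqrt(n - 1).
sdev_centered <- sv$d / sqrt(n - 1)

# Right singular vectors match the loadings up to the sign of each column.
loading_gap <- max(abs(abs(sv$v) - abs(pc$rotation)))

# Without centering, the same formula no longer reproduces prcomp().
sdev_uncentered <- svd(X)$d / sqrt(n - 1)

print(rbind(prcomp = pc$sdev,
            centered = sdev_centered,
            uncentered = sdev_uncentered))
print(loading_gap)
```

The centered row agrees with prcomp() while the uncentered row does not, because the leading singular vector of uncentered data is dominated by the column means.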
Other svd tidiers:
augment.prcomp(),
tidy_irlba(),
tidy_svd()
# feel free to ignore the following line: it allows {broom} to supply
# examples without requiring the data-supplying package to be installed.
if (requireNamespace("maps", quietly = TRUE)) {
  library(broom)

  pc <- prcomp(USArrests, scale = TRUE)

  # information about rotation
  # (the default, matrix = "u", would return the samples instead)
  tidy(pc, "rotation")

  # information about samples (states)
  tidy(pc, "samples")

  # information about PCs
  tidy(pc, "pcs")

  # state map
  library(dplyr)
  library(ggplot2)
  library(maps)

  pc %>%
    tidy(matrix = "samples") %>%
    mutate(region = tolower(row)) %>%
    inner_join(map_data("state"), by = "region") %>%
    ggplot(aes(long, lat, group = group, fill = value)) +
    geom_polygon() +
    facet_wrap(~PC) +
    theme_void() +
    ggtitle("Principal components of arrest data")

  # add the fitted scores back onto the original data
  au <- augment(pc, data = USArrests)
  au

  ggplot(au, aes(.fittedPC1, .fittedPC2)) +
    geom_point() +
    geom_text(aes(label = .rownames), vjust = 1, hjust = 1)
}