Filtering joins filter rows from `x` based on the presence or absence of matches in `y`:

- `semi_join()` returns all rows from `x` with a match in `y`.
- `anti_join()` returns all rows from `x` without a match in `y`.
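As a rough mental model (not dplyr's implementation), a filtering join on a single shared key column behaves like a base-R `%in%` filter on `x`. The sketch below assumes toy data frames `members` and `instruments` and hypothetical helpers `semi_join_sketch()` and `anti_join_sketch()`; it ignores differently named keys, `copy`, and lazy backends.

```r
# Toy data (hypothetical), shaped like the band example at the end of this page
members <- data.frame(
  name = c("Mick", "John", "Paul"),
  band = c("Stones", "Beatles", "Beatles"),
  stringsAsFactors = FALSE
)
instruments <- data.frame(
  name = c("John", "Paul", "Keith"),
  plays = c("guitar", "bass", "guitar"),
  stringsAsFactors = FALSE
)

# Hypothetical helpers: keep rows of x whose key appears in y (semi)
# or does not appear in y (anti). Only x's columns are returned.
semi_join_sketch <- function(x, y, key) {
  x[x[[key]] %in% y[[key]], , drop = FALSE]
}
anti_join_sketch <- function(x, y, key) {
  x[!(x[[key]] %in% y[[key]]), , drop = FALSE]
}

semi_join_sketch(members, instruments, "name")  # keeps the John and Paul rows
anti_join_sketch(members, instruments, "name")  # keeps the Mick row
```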
```r
semi_join(x, y, by = NULL, copy = FALSE, ...)

# S3 method for data.frame
semi_join(x, y, by = NULL, copy = FALSE, ..., na_matches = c("na", "never"))

anti_join(x, y, by = NULL, copy = FALSE, ...)

# S3 method for data.frame
anti_join(x, y, by = NULL, copy = FALSE, ..., na_matches = c("na", "never"))
```
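| Argument | Description |
|---|---|
| `x`, `y` | A pair of data frames, data frame extensions (e.g. a tibble), or lazy data frames (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
| `by` | A character vector of variables to join by. If `NULL`, the default, the join is a natural join, using all variables in common across `x` and `y`; a message lists the variables so that you can check they are correct, and supplying `by` explicitly suppresses it. To join by different variables on `x` and `y`, use a named vector: for example, `by = c("a" = "b")` matches `x$a` to `y$b`. To join by multiple variables, use a vector with length > 1: for example, `by = c("a", "b")` matches `x$a` to `y$a` and `x$b` to `y$b`. To perform a cross-join, generating all combinations of `x` and `y`, use `by = character()`. |
| `copy` | If `x` and `y` are not from the same data source and `copy` is `TRUE`, then `y` is copied into the same source as `x`. This lets you join tables across sources, but it is a potentially expensive operation, so you must opt into it. |
| `...` | Other parameters passed onto methods. |
| `na_matches` | Should `NA` and `NaN` values match one another? The default, `"na"`, treats two `NA` or `NaN` values as equal, like `%in%`, `match()`, and `merge()`. Use `"never"` to always treat two `NA` or `NaN` values as different, like joins for database sources. |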
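The `by` and `na_matches` arguments can be illustrated with a hypothetical base-R helper, `filter_join_sketch()`, which is an assumption for illustration and not part of dplyr. It supports only a single key, optionally named differently in `x` and `y`, and relies on `%in%` treating `NA` as matching `NA`, which corresponds to `na_matches = "na"`.

```r
# Hypothetical helper (not dplyr code): semi/anti join on ONE key column.
# `by` is either "key" or a named vector such as c("x_col" = "y_col").
filter_join_sketch <- function(x, y, by, anti = FALSE,
                               na_matches = c("na", "never")) {
  na_matches <- match.arg(na_matches)
  stopifnot(length(by) == 1)
  x_col <- if (!is.null(names(by)) && nzchar(names(by))) names(by) else by
  y_col <- unname(by)
  x_key <- x[[x_col]]
  # %in% matches NA to NA, mirroring na_matches = "na"
  hit <- x_key %in% y[[y_col]]
  if (na_matches == "never") {
    # Under "never", a missing key in x never counts as a match
    hit <- hit & !is.na(x_key)
  }
  x[if (anti) !hit else hit, , drop = FALSE]
}

x <- data.frame(id = c(1, 2, NA), v = c("a", "b", "c"), stringsAsFactors = FALSE)
y <- data.frame(key = c(2, NA))
r1 <- filter_join_sketch(x, y, by = c("id" = "key"))                        # ids 2 and NA
r2 <- filter_join_sketch(x, y, by = c("id" = "key"), na_matches = "never")  # id 2 only
r3 <- filter_join_sketch(x, y, by = c("id" = "key"), anti = TRUE,
                         na_matches = "never")                              # ids 1 and NA
```

Note that under `"never"` the anti join keeps the `NA` row of `x`, because that row no longer matches anything in `y`.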
An object of the same type as `x`. The output has the following properties:

- Rows are a subset of the input, but appear in the same order.
- Columns are not modified.
- Data frame attributes are preserved.
- Groups are taken from `x`. The number of groups may be reduced.
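The first two properties are what separate filtering joins from mutating joins: a row of `x` is kept at most once, even when its key appears several times in `y`, and no columns from `y` are added. The sketch below uses toy data and base R's `merge()` as a stand-in for a mutating inner join; it is an illustration, not dplyr code.

```r
x <- data.frame(name = c("Mick", "John", "Paul"),
                band = c("Stones", "Beatles", "Beatles"),
                stringsAsFactors = FALSE)
# y has a duplicated key: "John" appears twice
y <- data.frame(name = c("John", "John", "Paul"),
                plays = c("guitar", "vocals", "bass"),
                stringsAsFactors = FALSE)

# Filtering-join semantics: each x row kept at most once, in x's order,
# with only x's columns
semi <- x[x$name %in% y$name, , drop = FALSE]
nrow(semi)   # 2
names(semi)  # "name" "band"

# Mutating inner join via merge(): John is duplicated and `plays` is added
inner <- merge(x, y, by = "name")
nrow(inner)  # 3
```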
These functions are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:

- `semi_join()`: dbplyr (`tbl_lazy`), dplyr (`data.frame`).
- `anti_join()`: dbplyr (`tbl_lazy`), dplyr (`data.frame`).
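The dispatch mechanism behind these generics is R's S3 system. A minimal sketch, using a hypothetical generic `filter_rows_by_key()` and a hypothetical class `my_tbl` (neither exists in dplyr), shows how a method for a new class can defer to the data frame method with `NextMethod()`:

```r
# Hypothetical generic, named to avoid clashing with dplyr::semi_join()
filter_rows_by_key <- function(x, y, key, ...) {
  UseMethod("filter_rows_by_key")
}

# Method for plain data frames: a base-R semi join on one key column
filter_rows_by_key.data.frame <- function(x, y, key, ...) {
  x[x[[key]] %in% y[[key]], , drop = FALSE]
}

# Method a package might register for its own class: reuse the data frame
# method, then make sure the result keeps the caller's class
filter_rows_by_key.my_tbl <- function(x, y, key, ...) {
  out <- NextMethod()
  class(out) <- class(x)
  out
}

a <- data.frame(k = c(1, 2, 3))
b <- data.frame(k = c(3, 1))
tb <- structure(a, class = c("my_tbl", "data.frame"))
filter_rows_by_key(a, b, "k")   # dispatches to the data.frame method
filter_rows_by_key(tb, b, "k")  # dispatches to the my_tbl method
```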
Other joins: mutate-joins, `nest_join()`
```r
library(dplyr)

# "Filtering" joins keep cases from the LHS
band_members %>% semi_join(band_instruments)
#> # A tibble: 2 x 2
#>   name  band
#>   <chr> <chr>
#> 1 John  Beatles
#> 2 Paul  Beatles
band_members %>% anti_join(band_instruments)
#> # A tibble: 1 x 2
#>   name  band
#>   <chr> <chr>
#> 1 Mick  Stones

# To suppress the message about joining variables, supply `by`
band_members %>% semi_join(band_instruments, by = "name")
#> # A tibble: 2 x 2
#>   name  band
#>   <chr> <chr>
#> 1 John  Beatles
#> 2 Paul  Beatles
# This is good practice in production code
```
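The examples that omit `by` rely on the natural join performed when `by = NULL`: the key is every column the two tables share. The hypothetical helper `natural_filter_join()` below sketches that behaviour in base R for one or more key columns. It is an illustration under stated assumptions (no key column is named `.row`, and the message wording is invented), not dplyr's implementation.

```r
# Hypothetical helper: natural (by = NULL) semi/anti join on all shared columns
natural_filter_join <- function(x, y, anti = FALSE) {
  by <- intersect(names(x), names(y))
  if (length(by) == 0) stop("no common columns to join by")
  # Illustrative message; dplyr's own wording differs
  message("Joining by: ", paste(by, collapse = ", "))
  # Tag each x row with its position, then inner-join against the distinct
  # key combinations of y (merge() matches NA to NA by default)
  keys_x <- cbind(x[by], .row = seq_len(nrow(x)))
  hits <- merge(keys_x, unique(y[by]), by = by)$.row
  rows <- if (anti) setdiff(seq_len(nrow(x)), hits) else sort(unique(hits))
  x[rows, , drop = FALSE]
}

x <- data.frame(a = c(1, 1, 2), b = c("p", "q", "p"), v = 1:3,
                stringsAsFactors = FALSE)
y <- data.frame(a = c(1, 2), b = c("q", "q"), w = c(TRUE, FALSE),
                stringsAsFactors = FALSE)
natural_filter_join(x, y)               # keeps row 2 (a = 1, b = "q")
natural_filter_join(x, y, anti = TRUE)  # keeps rows 1 and 3
```

One reason to supply `by` explicitly, as the last example recommends, is that a natural join silently changes its key if a new shared column later appears in either table.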