fixed

Compare literal bytes in the string. This is very fast, but not usually what you want for non-ASCII character sets.

coll

Compare strings respecting standard collation rules.

regex

The default. Uses ICU regular expressions.

boundary

Match boundaries between characters, lines, sentences, or words.
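
The difference between byte-wise and collation-aware comparison shows up when two strings look identical but are encoded differently. The sketch below is illustrative only; the variable names a1 and a2 are not part of the package.

```r
library(stringr)

# Two canonically equivalent spellings of "á" (illustrative names)
a1 <- "\u00e1"   # single precomposed code point
a2 <- "a\u0301"  # "a" followed by a combining acute accent

# fixed() compares bytes, so the two spellings do not match
byte_match <- str_detect(a1, fixed(a2))
byte_match
#> [1] FALSE

# coll() applies collation rules, which treat them as equal
coll_match <- str_detect(a1, coll(a2))
coll_match
#> [1] TRUE
```

This is why fixed() is best reserved for plain ASCII text or for cases where an exact byte-for-byte match is really intended.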

fixed(pattern, ignore_case = FALSE)

coll(pattern, ignore_case = FALSE, locale = "en", ...)

regex(pattern, ignore_case = FALSE, multiline = FALSE,
  comments = FALSE, dotall = FALSE, ...)

boundary(type = c("character", "line_break", "sentence", "word"),
  skip_word_none = NA, ...)

Arguments

pattern

Pattern to modify behaviour.

ignore_case

Should case differences be ignored in the match?

locale

Locale to use for comparisons. See stringi::stri_locale_list() for all possible options. Defaults to "en" (English) to ensure that the default collation is consistent across platforms.

...

Other less frequently used arguments passed on to stringi::stri_opts_collator(), stringi::stri_opts_regex(), or stringi::stri_opts_brkiter().

multiline

If TRUE, ^ and $ match the beginning and end of each line. If FALSE, the default, they match only the start and end of the input.

comments

If TRUE, white space and comments beginning with # are ignored. Escape a literal space with a backslash ("\\ " in an R string).

dotall

If TRUE, . will also match line terminators.

type

Boundary type to detect.

character

Every character is a boundary.

line_break

Boundaries are places where it is acceptable to have a line break in the current locale.

sentence

The beginnings and ends of sentences are boundaries, using intelligent rules to avoid counting abbreviations.

word

The beginnings and ends of words are boundaries.

skip_word_none

Ignore "words" that don't contain any letters or numbers, i.e. punctuation. The default, NA, skips such "words" only when splitting on word boundaries.
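
Two of the arguments above, comments and skip_word_none, may be easier to follow with a small sketch. The patterns and input strings here are made up for illustration, and outputs are shown only where the behaviour follows directly from the descriptions above.

```r
library(stringr)

# comments = TRUE: white space and # comments inside the pattern are ignored
digits <- regex("
  (\\d{3})  # three digits
  -         # a literal hyphen
  (\\d{4})  # four digits
", comments = TRUE)
commented <- str_detect(c("555-1234", "5551234"), digits)
commented
#> [1]  TRUE FALSE

# An unescaped space is dropped from the pattern; an escaped one is kept
unescaped <- str_detect(c("ab", "a b"), regex("a b", comments = TRUE))
escaped   <- str_detect(c("ab", "a b"), regex("a\\ b", comments = TRUE))

# skip_word_none: keep or drop pieces without letters or numbers
x <- "Hello, world!"
words_only <- str_split(x, boundary("word"))[[1]]
all_pieces <- str_split(x, boundary("word", skip_word_none = FALSE))[[1]]
```

With skip_word_none = FALSE the punctuation and white space between words are returned as pieces of their own, so all_pieces is longer than words_only.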

See also

str_wrap() for breaking text to form paragraphs

stringi::stringi-search-boundaries for more detail on the various boundaries

Examples

pattern <- "a.b"
strings <- c("abb", "a.b")
str_detect(strings, pattern)
#> [1] TRUE TRUE
str_detect(strings, fixed(pattern))
#> [1] FALSE TRUE
str_detect(strings, coll(pattern))
#> [1] FALSE TRUE
# coll() is useful for locale-aware case-insensitive matching
i <- c("I", "\u0130", "i")
i
#> [1] "I" "İ" "i"
str_detect(i, fixed("i", TRUE))
#> [1] TRUE FALSE TRUE
str_detect(i, coll("i", TRUE))
#> [1] TRUE FALSE TRUE
str_detect(i, coll("i", TRUE, locale = "tr"))
#> [1] FALSE TRUE TRUE
# Word boundaries
words <- c("These are   some words.")
str_count(words, boundary("word"))
#> [1] 4
str_split(words, " ")[[1]]
#> [1] "These" "are" "" "" "some" "words."
str_split(words, boundary("word"))[[1]]
#> [1] "These" "are" "some" "words"
# Regular expression variations
str_extract_all("The Cat in the Hat", "[a-z]+")
#> [[1]]
#> [1] "he"  "at"  "in"  "the" "at"
#>
str_extract_all("The Cat in the Hat", regex("[a-z]+", TRUE))
#> [[1]]
#> [1] "The" "Cat" "in"  "the" "Hat"
#>
str_extract_all("a\nb\nc", "^.")
#> [[1]]
#> [1] "a"
#>
str_extract_all("a\nb\nc", regex("^.", multiline = TRUE))
#> [[1]]
#> [1] "a" "b" "c"
#>
str_extract_all("a\nb\nc", "a.")
#> [[1]]
#> character(0)
#>
str_extract_all("a\nb\nc", regex("a.", dotall = TRUE))
#> [[1]]
#> [1] "a\n"
#>
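
The examples above exercise only the "word" boundary type. A rough sketch of the "character" and "sentence" types follows; the input strings are invented for illustration.

```r
library(stringr)

# "character" boundaries split a string into individual characters
chars <- str_split("abc", boundary("character"))[[1]]
chars
#> [1] "a" "b" "c"

# A base letter plus a combining accent counts as one character
n_chars <- str_count("a\u0301", boundary("character"))
n_chars
#> [1] 1

# "sentence" boundaries count or split sentences
text <- "Hello world. How are you?"
n_sentences <- str_count(text, boundary("sentence"))
n_sentences
#> [1] 2
```

Counting with boundary("character") can therefore differ from counting code points when a string contains combining marks.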