about_search: String Searching

Description

This man page explains how to perform string search-based operations in stringi.

Details

The following independent string searching engines are available in stringi:

  • stri_*_regex – ICU’s regular expressions (regexes), see about_search_regex,

  • stri_*_fixed – locale-independent byte-wise pattern matching, see about_search_fixed,

  • stri_*_coll – ICU’s StringSearch, locale-sensitive, Collator-based pattern search, useful for natural language processing tasks, see about_search_coll,

  • stri_*_charclass – character classes search, e.g., Unicode General Categories or Binary Properties, see about_search_charclass,

  • stri_*_boundaries – text boundary analysis, see about_search_boundaries.
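
As a rough illustration of how the engines listed above differ, the following sketch runs one function from each family on the same input. The sample string and variable names are made up for this example, and passing strength = 1 to the Collator-based search is assumed here to make the comparison ignore case.

```r
library(stringi)

# Hypothetical sample text used only for this illustration
x <- "The quick brown fox, 123 times."

# regex engine: does x contain a run of ASCII digits?
has_digits <- stri_detect_regex(x, "[0-9]+")

# fixed engine: exact, byte-wise comparison, so letter case matters
has_quick_fixed <- stri_detect_fixed(x, "QUICK")

# coll engine: strength = 1 (primary level) is assumed to ignore case
has_quick_coll <- stri_detect_coll(x, "QUICK", strength = 1)

# charclass engine: count code points in the Unicode decimal digit category
n_digits <- stri_count_charclass(x, "\\p{Nd}")

# boundaries engine: split at word boundaries, skipping non-word chunks
words <- stri_split_boundaries(x, type = "word", skip_word_none = TRUE)[[1]]
```

Here the fixed search for "QUICK" fails while the Collator-based one succeeds, which is the kind of difference that usually decides which engine to pick.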

Each search engine is able to perform many search-based operations. These may include:

  • stri_detect_* - detect if a pattern occurs in a string, see, e.g., stri_detect,

  • stri_count_* - count the number of pattern occurrences, see, e.g., stri_count,

  • stri_locate_* - locate all, first, or last occurrences of a pattern, see, e.g., stri_locate,

  • stri_extract_* - extract all, first, or last occurrences of a pattern, see, e.g., stri_extract and, in case of regexes, stri_match,

  • stri_replace_* - replace all, first, or last occurrences of a pattern, see, e.g., stri_replace and also stri_trim,

  • stri_split_* - split a string into chunks indicated by occurrences of a pattern, see, e.g., stri_split,

  • stri_startswith_* and stri_endswith_* - detect if a string starts or ends with a pattern match, see, e.g., stri_startswith,

  • stri_subset_* - return a subset of a character vector with strings that match a given pattern, see, e.g., stri_subset.
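
The operations listed above can be sketched on a small character vector. This is only a minimal example with made-up data; it mostly uses the fixed engine, but the same operation names are available for the other engines where applicable.

```r
library(stringi)

# Hypothetical input vector used only for this illustration
x <- c("apple pie", "banana split", "cherry tart")

detected <- stri_detect_fixed(x, "an")             # FALSE TRUE FALSE
counts   <- stri_count_fixed(x, "a")               # 1 3 1
loc      <- stri_locate_first_fixed(x, "a")        # matrix with start and end columns
first    <- stri_extract_first_regex(x, "[a-z]+")  # "apple" "banana" "cherry"
replaced <- stri_replace_all_fixed(x, " ", "_")    # "apple_pie" "banana_split" "cherry_tart"
parts    <- stri_split_fixed(x, " ")               # list of character vectors
starts   <- stri_startswith_fixed(x, "ban")        # FALSE TRUE FALSE
ends     <- stri_endswith_fixed(x, "tart")         # FALSE FALSE TRUE
subset   <- stri_subset_fixed(x, "rr")             # "cherry tart"
```

All of these are vectorized over x, so each call returns one result (or one list element, or one matrix row) per input string.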

Author(s)

Marek Gagolewski and other contributors

See Also

The official online manual of stringi at https://stringi.gagolewski.com/

Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02

Other text_boundaries: about_search_boundaries, stri_count_boundaries(), stri_extract_all_boundaries(), stri_locate_all_boundaries(), stri_opts_brkiter(), stri_split_boundaries(), stri_split_lines(), stri_trans_tolower(), stri_wrap()

Other search_regex: about_search_regex, stri_opts_regex()

Other search_fixed: about_search_fixed, stri_opts_fixed()

Other search_coll: about_search_coll, stri_opts_collator()

Other search_charclass: about_search_charclass, stri_trim_both()

Other search_detect: stri_detect(), stri_startswith()

Other search_count: stri_count_boundaries(), stri_count()

Other search_locate: stri_locate_all_boundaries(), stri_locate_all()

Other search_replace: stri_replace_all(), stri_replace_rstr(), stri_trim_both()

Other search_split: stri_split_boundaries(), stri_split_lines(), stri_split()

Other search_subset: stri_subset()

Other search_extract: stri_extract_all_boundaries(), stri_extract_all(), stri_match_all()

Other stringi_general_topics: about_arguments, about_encoding, about_locale, about_search_boundaries, about_search_charclass, about_search_coll, about_search_fixed, about_search_regex, about_stringi