The string searching facilities described here provide a way to locate a specific sequence of bytes in a string. The search engine’s settings may be tuned (for example, to perform a case-insensitive search) via a call to the
The fast Knuth-Morris-Pratt search algorithm, with worst-case time complexity of O(n+p) (where n == length(str) and p == length(pattern)), is implemented (with some tweaks for very short search patterns).
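To make the O(n+p) bound concrete, here is a minimal sketch of the textbook Knuth-Morris-Pratt algorithm operating on bytes. It is written in Python purely for illustration and is not stringi's actual implementation: the function name kmp_find_all is hypothetical, the tweaks for very short patterns are omitted, and returning only non-overlapping matches is an assumption of this sketch.

```python
def kmp_find_all(text: bytes, pattern: bytes) -> list:
    """Return 0-based start offsets of non-overlapping matches of pattern in text."""
    if not pattern:
        return []

    # Failure table, built in O(p): fail[i] is the length of the longest
    # proper prefix of pattern[:i + 1] that is also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    # Scan, in O(n): the text index never moves backwards; on a mismatch
    # only the pattern position k falls back via the failure table.
    matches = []
    k = 0
    for i, byte in enumerate(text):
        while k > 0 and byte != pattern[k]:
            k = fail[k - 1]
        if byte == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = 0  # restart from scratch, so matches do not overlap
    return matches


print(kmp_find_all(b"abababa", b"aba"))  # [0, 4]
```

Because each byte of the text is examined once and the fallback steps are bounded by the number of successful advances, the total work is proportional to n + p.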
Be aware that, for natural language processing, fixed pattern searching might not be what you actually require. This is because a bitwise match will not give correct results in cases of:
see also about_search_coll.
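As a hedged illustration of why a bitwise comparison can mislead, the sketch below uses Python's standard unicodedata module (not stringi) and assumes that canonically equivalent accented letters are one such case. The two strings render identically as the Polish letter "ą" but are encoded as different byte sequences, so a byte-level search for one does not find the other.

```python
import unicodedata

composed = "\u0105"     # "ą" as a single precomposed code point
decomposed = "a\u0328"  # "a" followed by a combining ogonek

# The two forms are canonically equivalent: NFC maps one onto the other.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed

# Their UTF-8 encodings differ, so a bitwise comparison treats them as distinct.
bytes_equal = composed.encode("utf-8") == decomposed.encode("utf-8")

# A byte-level substring search for the decomposed form misses the composed one.
haystack = ("m" + composed + "ka").encode("utf-8")
found_bytewise = decomposed.encode("utf-8") in haystack

print(same_after_nfc, bytes_equal, found_bytewise)  # True False False
```

A locale-aware, collation-based search is designed to treat such equivalent forms as matching, which is the motivation for the cross-reference above.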
Note that the conversion of input data to Unicode is done as usual.
The official online manual of stringi at https://stringi.gagolewski.com/
Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02