about_stringi: Fast and Portable Character String Processing in R
Description
stringi is THE R package for fast, correct, consistent, and convenient string/text manipulation. It gives predictable results on every platform, in each locale, and under any native character encoding.
Keywords: R, text processing, character strings, internationalization, localization, ICU, ICU4C, i18n, l10n, Unicode.
Homepage: https://stringi.gagolewski.com/
License: The BSD-3-clause license for the package code, the ICU license for the accompanying ICU4C distribution, and the UCD license for the Unicode Character Database. See the COPYRIGHTS and LICENSE files for more details.
Details
Manual pages on general topics:
about_encoding – character encoding issues, including information on encoding management in stringi, as well as on encoding detection and conversion.
about_locale – locale issues, including locale management and specification in stringi, and the list of locale-sensitive operations. In particular, see stri_opts_collator for a description of the string collation algorithm, which is used for string comparing, ordering, ranking, sorting, case-folding, and searching.
about_arguments – information on how stringi handles the arguments passed to its functions.
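The effect of the locale and of the collator options mentioned above can be illustrated with a minimal sketch. It assumes that stringi is installed and that the underlying ICU build ships collation data for the Slovak (sk_SK) and Polish (pl_PL) locales; the input words are arbitrary sample data.

```r
library(stringi)

x <- c("hladny", "chladny")

# In Slovak, "ch" is a separate letter that sorts after "h";
# in Polish, "c" and "h" are compared as individual letters.
sorted_sk <- stri_sort(x, locale = "sk_SK")  # "hladny"  "chladny"
sorted_pl <- stri_sort(x, locale = "pl_PL")  # "chladny" "hladny"

# Collator options are passed on to stri_opts_collator();
# strength = 2 makes the comparison ignore case differences.
same_ignoring_case <- stri_cmp_equiv("hladny", "HLADNY", strength = 2)  # TRUE

# numeric = TRUE compares runs of digits by their numeric value.
sorted_numeric <- stri_sort(c("a10", "a2", "a1"), numeric = TRUE)  # "a1" "a2" "a10"
```

The same options can be bundled once with stri_opts_collator() and supplied via the opts_collator argument when several calls should share a configuration.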
Facilities available
Refer to the following:
about_search for string searching facilities; these include pattern searching, matching, string splitting, and so on. The following independent search engines are provided:
about_search_regex – with ICU (Java-like) regular expressions,
about_search_fixed – fast, locale-independent, byte-wise pattern matching,
about_search_coll – locale-aware pattern matching for natural language processing tasks,
about_search_charclass – seeking elements of particular character classes, like “all whitespaces” or “all digits”,
about_search_boundaries – text boundary analysis.
stri_datetime_format for date/time formatting and parsing. Also refer to the links therein for other date/time/time zone-related operations.
stri_stats_general and stri_stats_latex for gathering some fancy statistics on a character vector’s contents.
stri_join, stri_dup, %s+%, and stri_flatten for concatenation-based operations.
stri_sub for extracting and replacing substrings, and stri_reverse for a joyful function to reverse all code points in a string.
stri_length (among others) for determining the number of code points in a string. See also stri_count_boundaries for counting the number of Unicode characters and stri_width for approximating the width of a string.
stri_trim (among others) for trimming characters from the beginning or/and end of a string, see also about_search_charclass, and stri_pad for padding strings so that they are of the same width. Additionally, stri_wrap wraps text into lines.
stri_trans_tolower (among others) for case mapping, i.e., conversion to lower, UPPER, or Title Case, stri_trans_nfc (among others) for Unicode normalization, stri_trans_char for translating individual code points, and stri_trans_general for other universal text transforms, including transliteration.
stri_cmp, %s<%, stri_order, stri_sort, stri_rank, stri_unique, and stri_duplicated for collation-based, locale-aware operations, see also about_locale.
stri_split_lines (among others) to split a string into text lines.
stri_escape_unicode (among others) for escaping some code points.
stri_rand_strings, stri_rand_shuffle, and stri_rand_lipsum for generating (pseudo)random strings.
stri_read_raw, stri_read_lines, and stri_write_lines for reading and writing text files.
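As a rough illustration of how the independent search engines listed above differ, the sketch below applies one function from each engine to the same string. The sample sentence is made up for this example, and the function choices (extracting, counting, detecting) are just one possible selection; each engine offers a similar family of stri_*_regex, stri_*_fixed, stri_*_coll, and stri_*_charclass functions.

```r
library(stringi)

s <- "The year 2024: 3 cats, 12 dogs."

# Regex engine (ICU regular expressions): extract all runs of digits.
numbers <- stri_extract_all_regex(s, "[0-9]+")[[1]]  # "2024" "3" "12"

# Fixed engine: fast, locale-independent, byte-wise matching.
n_s <- stri_count_fixed(s, "s")  # 2

# Collator-based engine: locale-aware matching;
# strength = 1 ignores case (and accent) differences.
has_phrase <- stri_detect_coll(s, "THE YEAR", strength = 1)  # TRUE

# Character class engine: count all whitespace code points.
n_spaces <- stri_count_charclass(s, "\\p{WHITE_SPACE}")  # 6

# Text boundary analysis: split the string into words.
words <- stri_extract_all_words(s)[[1]]
```

For plain ASCII patterns the fixed engine is typically the cheapest choice, whereas the collator-based engine is the one to reach for when linguistic equivalence matters.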
Note that each man page provides many further links to other interesting facilities and topics.
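A few of the facilities listed in this section can be tried out with the short sketch below. The inputs are arbitrary sample values chosen for this example, and only a small subset of each function's arguments is shown.

```r
library(stringi)

# Concatenation-based operations
joined  <- stri_join("a", "b", sep = "-")                    # "a-b"
pasted  <- "abc" %s+% "def"                                  # "abcdef"
tripled <- stri_dup("ab", 3)                                 # "ababab"
flat    <- stri_flatten(c("a", "b", "c"), collapse = ", ")   # "a, b, c"

# Substrings and reversal
first6   <- stri_sub("stringi", 1, 6)   # "string"
reversed <- stri_reverse("abc")         # "cba"

# Code points vs. Unicode characters vs. display width
s    <- "a\u0301"                                     # "a" + combining acute accent
n_cp <- stri_length(s)                                # 2 code points
n_ch <- stri_count_boundaries(s, type = "character")  # 1 character
nfc  <- stri_trans_nfc(s)                             # composed into U+00E1
w    <- stri_width("\u4f60\u597d")                    # 2 wide CJK characters: 4

# Trimming, padding, and case mapping
trimmed <- stri_trim_both("  hi  ")           # "hi"
padded  <- stri_pad_left("7", 3, pad = "0")   # "007"
upper   <- stri_trans_toupper("stringi")      # "STRINGI"
title   <- stri_trans_totitle("hello world")  # "Hello World"
```

Most of these functions are vectorized over their arguments, so the same calls work unchanged on whole character vectors.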
References
stringi Package Homepage, https://stringi.gagolewski.com/
Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02
ICU – International Components for Unicode, https://icu.unicode.org/
ICU4C API Documentation, https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/
The Unicode Consortium, https://home.unicode.org/
UTF-8, A Transformation Format of ISO 10646 – RFC 3629, https://www.rfc-editor.org/rfc/rfc3629
See Also
The official online manual of stringi at https://stringi.gagolewski.com/
Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02
Other stringi_general_topics: about_arguments, about_encoding, about_locale, about_search, about_search_boundaries, about_search_charclass, about_search_coll, about_search_fixed, about_search_regex