stringi: THE String Processing Package for R
stringi (pronounced “stringy”, IPA [strinɡi]) is THE R package for very fast, portable, correct, consistent, and convenient string/text processing in any locale or character encoding.
—by Marek Gagolewski
Thanks to ICU, stringi fully supports a wide range of Unicode standards.
It gives you a multitude of functions for:
string concatenation, padding, wrapping,
substring extraction,
pattern searching (e.g., with ICU Java-like regular expressions),
collation and sorting,
random string generation,
case mapping,
string transliteration,
Unicode normalisation,
date-time formatting and parsing,
and many more.
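As a rough illustration of a few of the categories listed above, the following minimal sketch calls some functions from the package's API reference. The input strings and variable names are made up for this example, and the locale identifier passed to stri_sort is an assumption about which ICU locale data is available on your system.

```r
library(stringi)

# Concatenation and padding
joined <- stri_join("string", "i")                    # "stringi"
padded <- stri_pad_left("42", width = 5, pad = "0")   # "00042"

# Substring extraction (indices are 1-based and inclusive)
prefix <- stri_sub("stringi", from = 1, to = 6)       # "string"

# Pattern searching with ICU regular expressions
all_digits <- stri_detect_regex(c("abc", "123"), "^[0-9]+$")   # FALSE TRUE
numbers <- stri_extract_all_regex("a1b22c333", "[0-9]+")[[1]]  # "1" "22" "333"

# Case mapping
upper <- stri_trans_toupper("stringi")                # "STRINGI"

# Unicode normalisation: "a" followed by a combining acute accent
# is composed into a single code point under NFC
composed_length <- stri_length(stri_trans_nfc("a\u0301"))      # 1

# Locale-sensitive sorting; the resulting order depends on the chosen locale
sorted <- stri_sort(c("hladny", "chladny"), locale = "sk_SK")
```

Most search functions also come in _fixed and _coll variants, for fixed-pattern and collation-based matching respectively.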
stringi is among the most often downloaded R packages.
You can obtain it from CRAN by calling:
install.packages("stringi")
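One possible way to confirm that the installation succeeded is to load the package and query its settings. This is only a sketch: the installed_version and settings_summary names are illustrative, and the reported version and ICU details will differ between systems.

```r
# Install only if the package is not yet available, then load it
if (!requireNamespace("stringi", quietly = TRUE)) {
  install.packages("stringi")
}
library(stringi)

# Version of the installed package
installed_version <- packageVersion("stringi")

# One-line summary of the default locale, encoding, and ICU version in use
settings_summary <- stri_info(short = TRUE)
print(installed_version)
print(settings_summary)
```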
stringi’s source code is hosted on GitHub. It has been released under the open source BSD-3-clause license.
The package’s API was inspired by Hadley Wickham’s stringr package (and since 2015 stringr is powered by stringi). Moreover, Hadley suggested many new package features. Thanks! The contributions from Bartlomiej Tartanus and many others are greatly appreciated.
Tutorial
API Documentation
- R Package stringi Reference
- about_arguments: Passing Arguments to Functions in stringi
- about_encoding: Character Encodings and stringi
- about_locale: Locales and stringi
- about_search_boundaries: Text Boundary Analysis in stringi
- about_search_charclass: Character Classes in stringi
- about_search_coll: Locale-Sensitive Text Searching in stringi
- about_search_fixed: Locale-Insensitive Fixed Pattern Matching in stringi
- about_search_regex: Regular Expressions in stringi
- about_search: String Searching
- about_stringi: THE String Processing Package
- operator_add: Concatenate Two Character Vectors
- operator_compare: Compare Strings with or without Collation
- operator_dollar: C-Style Formatting with sprintf as a Binary Operator
- stri_compare: Compare Strings with or without Collation
- stri_count_boundaries: Count the Number of Text Boundaries
- stri_count: Count the Number of Pattern Matches
- stri_datetime_add: Date and Time Arithmetic
- stri_datetime_create: Create a Date-Time Object
- stri_datetime_fields: Get Values for Date and Time Fields
- stri_datetime_format: Date and Time Formatting and Parsing
- stri_datetime_fstr: Convert strptime-Style Format Strings
- stri_datetime_now: Get Current Date and Time
- stri_datetime_symbols: List Localizable Date-Time Formatting Data
- stri_detect: Detect a Pattern Match
- stri_dup: Duplicate Strings
- stri_duplicated: Determine Duplicated Elements
- stri_enc_detect: Detect Character Set and Language
- stri_enc_detect2: [DEPRECATED] Detect Locale-Sensitive Character Encoding
- stri_enc_fromutf32: Convert From UTF-32
- stri_enc_info: Query a Character Encoding
- stri_enc_isascii: Check If a Data Stream Is Possibly in ASCII
- stri_enc_isutf16: Check If a Data Stream Is Possibly in UTF-16 or UTF-32
- stri_enc_isutf8: Check If a Data Stream Is Possibly in UTF-8
- stri_enc_list: List Known Character Encodings
- stri_enc_mark: Get Declared Encodings of Each String
- stri_enc_set: Set or Get Default Character Encoding in stringi
- stri_enc_toascii: Convert To ASCII
- stri_enc_tonative: Convert Strings To Native Encoding
- stri_enc_toutf32: Convert Strings To UTF-32
- stri_enc_toutf8: Convert Strings To UTF-8
- stri_encode: Convert Strings Between Given Encodings
- stri_escape_unicode: Escape Unicode Code Points
- stri_extract_boundaries: Extract Data Between Text Boundaries
- stri_extract: Extract Occurrences of a Pattern
- stri_flatten: Flatten a String
- stri_info: Query Default Settings for stringi
- stri_isempty: Determine if a String is of Length Zero
- stri_join_list: Concatenate Strings in a List
- stri_join: Concatenate Character Vectors
- stri_length: Count the Number of Code Points
- stri_list2matrix: Convert a List to a Character Matrix
- stri_locale_info: Query Given Locale
- stri_locale_list: List Available Locales
- stri_locale_set: Set or Get Default Locale in stringi
- stri_locate_boundaries: Locate Text Boundaries
- stri_locate: Locate Occurrences of a Pattern
- stri_match: Extract Regex Pattern Matches, Together with Capture Groups
- stri_na2empty: Replace NAs with Empty Strings
- stri_numbytes: Count the Number of Bytes
- stri_opts_brkiter: Generate a List with BreakIterator Settings
- stri_opts_collator: Generate a List with Collator Settings
- stri_opts_fixed: Generate a List with Fixed Pattern Search Engine’s Settings
- stri_opts_regex: Generate a List with Regex Matcher Settings
- stri_order: Ordering Permutation
- stri_pad: Pad (Center/Left/Right Align) a String
- stri_rand_lipsum: A Lorem Ipsum Generator
- stri_rand_shuffle: Randomly Shuffle Code Points in Each String
- stri_rand_strings: Generate Random Strings
- stri_read_lines: Read Text Lines from a Text File
- stri_read_raw: Read Text File as Raw
- stri_remove_empty: Remove All Empty Strings from a Character Vector
- stri_replace_na: Replace Missing Values in a Character Vector
- stri_replace: Replace Occurrences of a Pattern
- stri_reverse: Reverse Each String
- stri_sort_key: Sort Keys
- stri_sort: Sorting
- stri_split_boundaries: Split a String at Text Boundaries
- stri_split_lines: Split a String Into Text Lines
- stri_split: Split a String By Pattern Matches
- stri_startsendswith: Determine if the Start or End of a String Matches a Pattern
- stri_stats_general: General Statistics for a Character Vector
- stri_stats_latex: Statistics for a Character Vector Containing LaTeX Commands
- stri_sub_all: Extract or Replace Multiple Substrings
- stri_sub: Extract a Substring From or Replace a Substring In a Character Vector
- stri_subset: Select Elements that Match a Given Pattern
- stri_timezone_info: Query a Given Time Zone
- stri_timezone_list: List Available Time Zone Identifiers
- stri_timezone_set: Set or Get Default Time Zone in stringi
- stri_trans_casemap: Transform Strings with Case Mapping
- stri_trans_char: Translate Characters
- stri_trans_general: General Text Transforms, Including Transliteration
- stri_trans_list: List Available Text Transforms and Transliterators
- stri_trans_nf: Perform or Check For Unicode Normalization
- stri_trim: Trim Characters from the Left and/or Right Side of a String
- stri_unescape_unicode: Un-escape All Escape Sequences
- stri_unique: Extract Unique Elements
- stri_width: Determine the Width of Code Points
- stri_wrap: Word Wrap Text to Format Paragraphs
- stri_write_lines: Write Text Lines to a Text File
Other
- Source code (GitHub)
- Bug Tracker and Feature Proposals
- CRAN
- What’s New in stringi
- 1.5.4 (2020-XX-YY) devel
- 1.5.3 (2020-09-04) CRAN
- 1.4.6 (2020-02-17) CRAN
- 1.4.5 (2020-01-11) CRAN
- 1.4.4 (2020-01-06) CRAN
- 1.4.3 (2019-03-12) CRAN
- 1.3.1 (2019-02-10) CRAN
- 1.2.4 (2018-07-20) CRAN
- 1.2.3 (2018-05-16) CRAN
- 1.2.2 (2018-05-01) CRAN
- 1.1.7 (2018-03-06) CRAN
- 1.1.6 (2017-11-10) CRAN
- 1.1.5 (2017-04-07) CRAN
- 1.1.3 (2017-03-21) CRAN
- 1.1.2 (2016-09-30) CRAN
- 1.1.1 (2016-05-25) CRAN
- 1.0-1 (2015-10-22) CRAN
- 0.5-5 (2015-06-28) CRAN
- 0.5-2 (2015-06-21) CRAN
- 0.4-1 (2014-12-11) CRAN
- 0.3-1 (2014-11-06) CRAN
- 0.2-5 (2014-05-16) CRAN
- 0.2-4 (2014-05-15) CRAN
- 0.2-3 (2014-05-14) CRAN
- 0.1-25 (2014-03-12) CRAN
- 0.1-24 (2014-03-11) devel
- 0.1-23 (2014-03-11) devel
- 0.1-22 (2014-02-20) devel
- 0.1-21 (2014-02-19) devel
- 0.1-20 (2014-02-17) devel
- 0.1-11 (2013-11-16) devel
- 0.1-10 (2013-11-13) devel
- 0.1-6 (2013-07-05) devel
- 0.1-1 (2013-01-05) devel
- Installing stringi