stri_unique: Extract Unique Elements

Description

This function returns a character vector like str, but with duplicate elements removed.

Usage

stri_unique(str, ..., opts_collator = NULL)

Arguments

str

a character vector

...

additional settings for opts_collator

opts_collator

a named list of ICU Collator options, see stri_opts_collator; NULL for the default collation options
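The two ways of supplying collator options described above can be compared in a minimal sketch. It assumes stringi is installed; the input vector x is made up for illustration. A strength of 2 is used because, at that level, the ICU Collator ignores case differences.

```r
library(stringi)

# Illustrative input (not from the manual): three case variants of one word.
x <- c("hladny", "HLADNY", "Hladny")

# Option 1: pass collator settings directly through `...`.
via_dots <- stri_unique(x, strength = 2)

# Option 2: build the options list explicitly with stri_opts_collator().
via_list <- stri_unique(x, opts_collator = stri_opts_collator(strength = 2))

# Both calls should yield the same result: a single element.
print(via_dots)
print(identical(via_dots, via_list))
```

Passing settings through `...` is shorter for one-off calls, while a prebuilt options list is convenient when the same settings are reused across several locale-sensitive functions.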

Details

As usual in stringi, no attributes are copied. Unlike base R's unique, this function tests strings for canonical equivalence rather than bytewise equality. Because the comparison is performed by the ICU Collator, the operation is locale-dependent. As a result, stri_unique is significantly slower than its base R counterpart, but much better suited for natural language processing.

See also stri_duplicated for indicating non-unique elements.
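The relationship between the two functions can be illustrated with a short sketch. It assumes stringi is installed and uses default collation options; the input vector x is made up for illustration. Dropping the elements that stri_duplicated flags is expected to give the same result as stri_unique.

```r
library(stringi)

# Illustrative input: a plain repeat ("a") plus the precomposed and
# NFKD-decomposed forms of the same letter.
x <- c("a", "b", "a", "\u0105", stri_trans_nfkd("\u0105"))

# Logical vector marking elements equivalent to an earlier one.
dup <- stri_duplicated(x)

# Keeping only the non-duplicated elements mirrors stri_unique().
kept <- x[!dup]
uniq <- stri_unique(x)

print(dup)
print(identical(kept, uniq))
```

Use stri_duplicated when the positions of the repeats matter (for example, to filter rows of a data frame), and stri_unique when only the distinct values are needed.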

Value

Returns a character vector.

Author(s)

Marek Gagolewski and other contributors

References

Collation - ICU User Guide, https://unicode-org.github.io/icu/userguide/collation/

See Also

The official online manual of stringi at https://stringi.gagolewski.com/

Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02

Other locale_sensitive: %s<%(), about_locale, about_search_boundaries, about_search_coll, stri_compare(), stri_count_boundaries(), stri_duplicated(), stri_enc_detect2(), stri_extract_all_boundaries(), stri_locate_all_boundaries(), stri_opts_collator(), stri_order(), stri_rank(), stri_sort_key(), stri_sort(), stri_split_boundaries(), stri_trans_tolower(), stri_wrap()

Examples

# precomposed and NFKD-decomposed forms of the same character:
stri_unique(c('\u0105', stri_trans_nfkd('\u0105')))
## [1] "ą"
unique(c('\u0105', stri_trans_nfkd('\u0105')))
## [1] "ą" "ą"
stri_unique(c('gro\u00df', 'GROSS', 'Gro\u00df', 'Gross'), strength=1)
## [1] "groß"