stri_unique: Extract Unique Elements¶
Description¶
This function returns a character vector like str, but with duplicate elements removed.
Usage¶
stri_unique(str, ..., opts_collator = NULL)
Arguments¶
| Argument | Description |
|---|---|
| str | a character vector |
| ... | additional settings for opts_collator |
| opts_collator | a named list with ICU Collator’s options, see stri_opts_collator |
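
As a rough sketch of how the two collator-related arguments relate, the snippet below supplies strength = 1 once through ... and once through an explicit list built with stri_opts_collator(). It assumes the stringi package is installed and attached; the input vector is the one used in the Examples section, and the variable names are illustrative only.

```r
library(stringi)

x <- c('gro\u00df', 'GROSS', 'Gro\u00df', 'Gross')

# collator settings supplied directly via `...`
via_dots <- stri_unique(x, strength = 1)

# the same settings supplied as a named list built by stri_opts_collator()
via_opts <- stri_unique(x, opts_collator = stri_opts_collator(strength = 1))

# both calls are expected to give the same result
identical(via_dots, via_opts)
```

Passing settings through ... is the shorter form; an explicit opts_collator list may be more convenient when the same settings are reused across several calls.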
Details¶
As usual in stringi, no attributes are copied. Unlike unique, this function tests for canonical equivalence of strings (and not whether the strings are just bytewise equal). Such an operation is locale-dependent. Hence, stri_unique is significantly slower (but much better suited for natural language processing) than its base R counterpart.
See also stri_duplicated for indicating non-unique elements.
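
A minimal sketch of how stri_unique and stri_duplicated fit together, assuming the stringi package is attached and default collator options are used: dropping the elements that stri_duplicated flags should leave the same vector that stri_unique returns. The input vector here is made up for illustration.

```r
library(stringi)

# the same character in precomposed and decomposed form, plus a plain repeat
x <- c('\u0105', stri_trans_nfkd('\u0105'), 'b', 'b')

# logical vector marking elements that repeat an earlier one
dup <- stri_duplicated(x)

# keeping only the non-duplicated elements...
kept <- x[!dup]

# ...is expected to match what stri_unique() returns
identical(kept, stri_unique(x))
```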
Value¶
Returns a character vector.
References¶
Collation - ICU User Guide, https://unicode-org.github.io/icu/userguide/collation/
See Also¶
The official online manual of stringi at https://stringi.gagolewski.com/
Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02
Other locale_sensitive: %s<%(), about_locale, about_search_boundaries, about_search_coll, stri_compare(), stri_count_boundaries(), stri_duplicated(), stri_enc_detect2(), stri_extract_all_boundaries(), stri_locate_all_boundaries(), stri_opts_collator(), stri_order(), stri_rank(), stri_sort(), stri_sort_key(), stri_split_boundaries(), stri_trans_tolower(), stri_wrap()
Examples¶
# precomposed and decomposed (NFKD) forms of the same character:
stri_unique(c('\u0105', stri_trans_nfkd('\u0105')))
## [1] "ą"
unique(c('\u0105', stri_trans_nfkd('\u0105')))
## [1] "ą" "ą"
stri_unique(c('gro\u00df', 'GROSS', 'Gro\u00df', 'Gross'), strength=1)
## [1] "groß"