stri_unique: Extract Unique Elements¶

Description¶

This function returns a character vector like `str`, but with duplicate elements removed.

Usage¶

stri_unique(str, ..., opts_collator = NULL)

Arguments¶


`str`	a character vector
`...`	additional settings for `opts_collator`
`opts_collator`	a named list of ICU Collator options, see `stri_opts_collator`; `NULL` selects the default collation options
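
The two ways of supplying collator settings can be sketched as follows. This is a minimal illustration, assuming the default locale treats case as a tertiary-level difference (as is the case for most locales); the input vector `x` is made up for the example.

```r
library(stringi)

# Hypothetical input: the same word in three different casings.
x <- c("hladny", "HLADNY", "Hladny")

# Collator settings passed directly through `...`
a <- stri_unique(x, strength = 2)

# The same settings passed as a list built by stri_opts_collator()
b <- stri_unique(x, opts_collator = stri_opts_collator(strength = 2))

# Both calls should select the same elements.
identical(a, b)
```

With `strength = 2` the comparison ignores case differences, so only the first of the three spellings is expected to be kept.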

Details¶

As usual in stringi, no attributes are copied. Unlike `unique`, this function tests for canonical equivalence of strings rather than checking whether they are bytewise equal. Such a comparison is locale-dependent. Hence, `stri_unique` is significantly slower than its base R counterpart, but much better suited to natural language processing.

See also `stri_duplicated` for indicating non-unique elements.
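
As a rough sketch of how the two functions relate, assuming default collation options and a made-up input vector `x`: the elements that `stri_duplicated` does not flag are the ones `stri_unique` is expected to return.

```r
library(stringi)

# Hypothetical input with repeated elements.
x <- c("a", "b", "a", "c", "b")

# Logical vector marking the second and later occurrences of each string.
dup <- stri_duplicated(x)

# Unique elements, in order of first appearance.
u <- stri_unique(x)

# Dropping the flagged elements should yield the same strings.
identical(u, x[!dup])
```

The logical vector is handy when the positions of the duplicates matter, for example when the same rows have to be dropped from a data frame.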

Value¶

Returns a character vector.

Author(s)¶

Marek Gagolewski and other contributors

References¶

Collation - ICU User Guide, https://unicode-org.github.io/icu/userguide/collation/

Examples¶

# precomposed and NFKD-decomposed forms of the same character:
stri_unique(c('\u0105', stri_trans_nfkd('\u0105')))
## [1] "ą"
unique(c('\u0105', stri_trans_nfkd('\u0105')))
## [1] "ą" "ą"
stri_unique(c('gro\u00df', 'GROSS', 'Gro\u00df', 'Gross'), strength=1)
## [1] "groß"