stri_duplicated: Determine Duplicated Elements¶
Description¶
stri_duplicated() determines which strings in a character vector are duplicates of other elements.
stri_duplicated_any() determines if there are any duplicated strings in a character vector.
Usage¶
stri_duplicated(
str,
from_last = FALSE,
fromLast = from_last,
...,
opts_collator = NULL
)
stri_duplicated_any(
str,
from_last = FALSE,
fromLast = from_last,
...,
opts_collator = NULL
)
Arguments¶
| str | a character vector |
| from_last | a single logical value; indicates whether the search should be performed from the last to the first string |
| fromLast | [DEPRECATED] alias of from_last |
| ... | additional settings for opts_collator |
| opts_collator | a named list with ICU Collator’s options, see stri_opts_collator() |
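As a rough illustration of how collator settings can be supplied, the sketch below assumes that arguments passed via ... are forwarded to stri_opts_collator(), so the two calls should be interchangeable. The input vector x is made up for this example, and strength=2 is assumed to ignore case differences while still distinguishing accents.

```r
library(stringi)

# hypothetical input: the same word in three different casings
x <- c('hello', 'HELLO', 'Hello')

# collator option given directly through `...`
via_dots <- stri_duplicated(x, strength=2)

# the same option wrapped explicitly in a collator options list
via_opts <- stri_duplicated(x, opts_collator=stri_opts_collator(strength=2))

# default collation settings, where the casings are expected to differ
default <- stri_duplicated(x)

print(via_dots)
print(via_opts)
print(default)
```

Passing an explicit list is mostly useful when the same collator settings are shared across several calls, for example with stri_unique() or stri_sort().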
Details¶
Missing values are regarded as equal.
Unlike duplicated and anyDuplicated, these functions test for canonical equivalence of strings (and not whether the strings are just bytewise equal). Such operations are locale-dependent. Hence, stri_duplicated and stri_duplicated_any are significantly slower (but much better suited for natural language processing) than their base R counterparts.
See also stri_unique for extracting unique elements.
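The relationship between these functions and stri_unique can be sketched as follows. This is a minimal example on a made-up vector x; it assumes that stri_unique() keeps one representative per group of equivalent strings, so it should agree with dropping the elements flagged by stri_duplicated(). Because missing values are regarded as equal, only the first NA survives.

```r
library(stringi)

# hypothetical input with repeated strings and repeated missing values
x <- c('a', 'b', 'a', NA, 'a', NA)

# logical mask: TRUE where an equivalent string occurred earlier
dup <- stri_duplicated(x)

# keep only first occurrences by negating the mask
kept <- x[!dup]

# unique elements as reported by stri_unique()
uniq <- stri_unique(x)

print(kept)
print(uniq)
```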
Value¶
stri_duplicated() returns a logical vector of the same length as str. Each of its elements indicates whether a canonically equivalent string was already found in str.
stri_duplicated_any() returns a single non-negative integer. A value of 0 indicates that all the elements in str are unique. Otherwise, it gives the index of the first non-unique element.
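The two return conventions can be compared side by side. The sketch below uses made-up vectors and the default from_last=FALSE; it checks that the index reported by stri_duplicated_any() lines up with the first TRUE in the mask from stri_duplicated(), and that 0 is returned when nothing repeats.

```r
library(stringi)

# hypothetical input with repeated strings and repeated missing values
x <- c('a', 'b', 'a', NA, 'a', NA)

# index of the first element that repeats an earlier one
first_dup <- stri_duplicated_any(x)

# the same index recovered from the logical mask
from_mask <- which(stri_duplicated(x))[1]

# a vector with no repeats should yield 0
none <- stri_duplicated_any(c('a', 'b', 'c'))

# a zero result can serve as a simple "all unique" check
if (none == 0) message('no duplicates found')
```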
References¶
Collation - ICU User Guide, https://unicode-org.github.io/icu/userguide/collation/
See Also¶
The official online manual of stringi at https://stringi.gagolewski.com/
Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02
Other locale_sensitive: %s<%(), about_locale, about_search_boundaries, about_search_coll, stri_compare(), stri_count_boundaries(), stri_enc_detect2(), stri_extract_all_boundaries(), stri_locate_all_boundaries(), stri_opts_collator(), stri_order(), stri_rank(), stri_sort(), stri_sort_key(), stri_split_boundaries(), stri_trans_tolower(), stri_unique(), stri_wrap()
Examples¶
# In the following examples, we have 3 duplicated values,
# 'a' - 2 times, NA - 1 time
stri_duplicated(c('a', 'b', 'a', NA, 'a', NA))
## [1] FALSE FALSE TRUE FALSE TRUE TRUE
stri_duplicated(c('a', 'b', 'a', NA, 'a', NA), from_last=TRUE)
## [1] TRUE FALSE TRUE TRUE FALSE FALSE
stri_duplicated_any(c('a', 'b', 'a', NA, 'a', NA))
## [1] 3
# compare the results:
stri_duplicated(c('\u0105', stri_trans_nfkd('\u0105')))
## [1] FALSE TRUE
duplicated(c('\u0105', stri_trans_nfkd('\u0105')))
## [1] FALSE FALSE
stri_duplicated(c('gro\u00df', 'GROSS', 'Gro\u00df', 'Gross'), strength=1)
## [1] FALSE TRUE TRUE TRUE
duplicated(c('gro\u00df', 'GROSS', 'Gro\u00df', 'Gross'))
## [1] FALSE FALSE FALSE FALSE