stri_sort_key: Sort Keys

Description

This function computes a locale-dependent sort key for each string: an alternative byte representation that, when compared in the C locale (which orders strings by their underlying bytes), yields the same ordering as locale-aware collation of the original strings. It is useful for making algorithms that sort only in the C locale (e.g., ones based on the strcmp function in libc) locale-aware.
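The property described above can be checked with a small sketch. The helper bytes_less is hypothetical (it is not part of stringi) and only emulates a strcmp-style comparison on the raw bytes of two strings; the actual key bytes may differ between ICU versions, so no key values are shown here.

```r
library(stringi)

# Hypothetical helper (not part of stringi): strcmp-like "less than"
# computed on the raw bytes of two single strings
bytes_less <- function(a, b) {
  ra <- as.integer(charToRaw(a))
  rb <- as.integer(charToRaw(b))
  n <- min(length(ra), length(rb))
  d <- which(ra[seq_len(n)] != rb[seq_len(n)])  # positions where bytes differ
  if (length(d) > 0) ra[d[1]] < rb[d[1]] else length(ra) < length(rb)
}

x <- c("hladny", "chladny")
k <- stri_sort_key(x, locale = "sk_SK")

# Byte-wise comparison of the sort keys ...
bytes_less(k[1], k[2])
# ... is expected to agree with the locale-aware comparison of the strings
stri_cmp_lt(x[1], x[2], locale = "sk_SK")
```

Both calls should return the same logical value, since comparing the keys byte by byte is meant to stand in for the locale-aware comparison.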

Usage

stri_sort_key(str, ..., opts_collator = NULL)

Arguments

str

a character vector

...

additional settings for opts_collator

opts_collator

a named list of ICU Collator options (see stri_opts_collator); NULL for the default collation options
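As a rough illustration of the two ways of supplying collator settings, the sketch below passes the same option once through ... and once through opts_collator. The choice of strength = 1 (primary strength, which is expected to ignore case differences) is only an example.

```r
library(stringi)

x <- c("a", "A", "b")

# Option given directly through `...`
k1 <- stri_sort_key(x, strength = 1)

# The same option wrapped in a list built by stri_opts_collator()
k2 <- stri_sort_key(x, opts_collator = stri_opts_collator(strength = 1))

# Both calls should produce the same keys
identical(k1, k2)

# At primary strength, "a" and "A" are expected to share a key,
# while "b" gets a different one
identical(k1[1], k1[2])
identical(k1[1], k1[3])
```

Passing settings through ... is convenient for one or two options; building the list with stri_opts_collator may be preferable when the same configuration is reused across several calls.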

Details

For more information on ICU’s Collator and how to tune it in stringi, refer to stri_opts_collator.

See also stri_rank for ranking the strings within a single character vector, i.e., generating relative sort keys.

Value

The result is a character vector with the same length as str that contains the sort keys. The output is marked as bytes-encoded.

Author(s)

Marek Gagolewski and other contributors

References

Collation - ICU User Guide, https://unicode-org.github.io/icu/userguide/collation/

See Also

The official online manual of stringi at https://stringi.gagolewski.com/

Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02

Other locale_sensitive: %s<%(), about_locale, about_search_boundaries, about_search_coll, stri_compare(), stri_count_boundaries(), stri_duplicated(), stri_enc_detect2(), stri_extract_all_boundaries(), stri_locate_all_boundaries(), stri_opts_collator(), stri_order(), stri_rank(), stri_sort(), stri_split_boundaries(), stri_trans_tolower(), stri_unique(), stri_wrap()

Examples

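# Polish has no special rule for 'ch', so 'chladny' sorts before 'hladny':
stri_sort_key(c('hladny', 'chladny'), locale='pl_PL')
## [1] "8@*0DZ\001\n\001\n"  ".8@*0DZ\001\v\001\v"
# In Slovak, 'ch' is a separate letter ordered after 'h', so 'chladny' sorts after 'hladny':
stri_sort_key(c('hladny', 'chladny'), locale='sk_SK')
## [1] "8@*0DZ\001\n\001\n"     "9\002@*0DZ\001\n\001\n"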