stri_sort: String Sorting

Description

This function sorts a character vector according to a locale-dependent lexicographic order.

Usage

stri_sort(str, decreasing = FALSE, na_last = NA, ..., opts_collator = NULL)

Arguments


`str`	a character vector
`decreasing`	a single logical value; should the sort order be nondecreasing (`FALSE`, default, i.e., weakly increasing) or nonincreasing (`TRUE`)?
`na_last`	a single logical value; controls the treatment of `NA`s in `str`. If `TRUE`, then missing values in `str` are put at the end; if `FALSE`, they are put at the beginning; if `NA`, then they are removed from the output
`...`	additional settings for `opts_collator`
`opts_collator`	a named list of ICU Collator options (see `stri_opts_collator`); `NULL` for the default collation options
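
A minimal sketch of how `decreasing` and `na_last` behave, based on the argument descriptions above; the input vector `x` is purely illustrative and the stringi package is assumed to be installed:

```r
library(stringi)

x <- c("b", NA, "a", "c")  # illustrative input with one missing value

dropped   <- stri_sort(x)                     # default na_last = NA: missing values removed
na_at_end <- stri_sort(x, na_last = TRUE)     # missing values put at the end
na_first  <- stri_sort(x, na_last = FALSE)    # missing values put at the beginning
reversed  <- stri_sort(x, decreasing = TRUE)  # nonincreasing order, NA removed

print(dropped)
print(na_at_end)
print(na_first)
print(reversed)
```

Note that with the default `na_last = NA` the result can be shorter than the input.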

Details

For more information on ICU’s Collator and how to tune it in stringi, refer to `stri_opts_collator`.
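
As a rough illustration, collator settings can be supplied either directly through `...` or bundled in a list created by `stri_opts_collator()`. The input `words` and the chosen settings (`locale = "en_US"`, `strength = 2`) are assumptions made only for this sketch:

```r
library(stringi)

words <- c("b", "A", "a", "B")  # illustrative input

# Settings passed directly through `...`
via_dots <- stri_sort(words, locale = "en_US", strength = 2)

# The same settings bundled in a list built by stri_opts_collator()
opts <- stri_opts_collator(locale = "en_US", strength = 2)
via_list <- stri_sort(words, opts_collator = opts)

# Both calls are expected to give the same ordering
print(identical(via_dots, via_list))
```

Building the list once is convenient when the same collation settings are reused across several calls.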

As usual in stringi, non-character inputs are coerced to strings; see the example below for the somewhat non-intuitive behavior of lexicographic sorting on numeric inputs.

This function uses a stable sort algorithm (STL’s `stable_sort`), which performs up to \(N \log^2(N)\) element comparisons, where \(N\) is the length of `str`.
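
One way to observe stability is to make some strings compare as equal and check that they keep their input order. The sketch below assumes that `strength = 2` makes the collator ignore case differences, so that `"A"` and `"a"` are ties; the inputs are illustrative only:

```r
library(stringi)

# With strength = 2, "A"/"a" and "b"/"B" are assumed to compare as equal,
# so a stable sort should keep each tied pair in its input order.
first  <- stri_sort(c("b", "A", "a", "B"), strength = 2)
second <- stri_sort(c("B", "a", "A", "b"), strength = 2)

print(first)   # "A" is expected before "a", and "b" before "B"
print(second)  # "a" is expected before "A", and "B" before "b"
```

With the default collation strength these strings are not ties, so both calls would return the same ordering regardless of input order.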

Value

The result is a sorted version of str, i.e., a character vector.

Author(s)

Marek Gagolewski and other contributors

References

Collation - ICU User Guide, https://unicode-org.github.io/icu/userguide/collation/

Examples

stri_sort(c('hladny', 'chladny'), locale='pl_PL')
## [1] "chladny" "hladny"
stri_sort(c('hladny', 'chladny'), locale='sk_SK')
## [1] "hladny"  "chladny"
stri_sort(sample(LETTERS))
##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
## [20] "T" "U" "V" "W" "X" "Y" "Z"
stri_sort(c(1, 100, 2, 101, 11, 10))  # lexicographic order
## [1] "1"   "10"  "100" "101" "11"  "2"
stri_sort(c(1, 100, 2, 101, 11, 10), numeric=TRUE)  # OK for integers
## [1] "1"   "2"   "10"  "11"  "100" "101"
stri_sort(c(0.25, 0.5, 1, -1, -2, -3), numeric=TRUE)  # incorrect: signs and decimal fractions are not handled
## [1] "-1"   "-2"   "-3"   "0.5"  "0.25" "1"
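
When the input really is numeric, one possible workaround (not a stringi feature, just base R) is to sort the numbers first and convert them to strings afterwards. A minimal sketch, with `x` being the same illustrative vector as above:

```r
x <- c(0.25, 0.5, 1, -1, -2, -3)

# Sort numerically with base R's sort(), then coerce to character.
# This avoids collation entirely, so signs and fractions are ordered by value.
sorted_as_strings <- as.character(sort(x))
print(sorted_as_strings)
## [1] "-3"   "-2"   "-1"   "0.25" "0.5"  "1"
```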