stri_enc_mark: Get Declared Encodings of Each String
Reads declared encodings for each string in a character vector as seen by stringi.
str: character vector or an object coercible to a character vector
According to Encoding, R has a simple encoding marking mechanism: strings can be declared to be in latin1, UTF-8, or bytes.
Moreover, we may check (via the R/C API) whether a string is in ASCII or in the system’s default (a.k.a. unknown in Encoding) encoding. R assumes that a string is in ASCII if and only if none of its bytes is greater than 127, so there is an implicit assumption that your platform uses an encoding that extends ASCII.
Intuitively, the default encoding should be equivalent to the one you use on
stdin (e.g., your ‘keyboard’). In stringi we assume that such an encoding is equivalent to the one returned by
stri_enc_get. It is automatically detected by ICU to match, by default, the encoding part of the
LC_CTYPE category as given by Sys.getlocale.
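To see which encoding is treated as the default on a given machine, the two pieces of information mentioned above can be printed side by side. This is only a minimal sketch: it assumes stringi is installed, and the printed values depend entirely on the platform and locale settings, so no particular output is shown.

```r
library(stringi)

# Encoding that stringi currently treats as the native (default) one
native_enc <- stri_enc_get()
print(native_enc)

# The LC_CTYPE locale category as reported by R; its encoding part
# (typically the text after the dot, if present) is what the default
# detection is meant to match
print(Sys.getlocale("LC_CTYPE"))
```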
Returns a character vector of the same length as
str. Unlike in the
Encoding function, here the possible encodings are: ASCII, latin1, bytes, native, and
UTF-8. Additionally, missing values are handled properly.
This gives exactly the same data that is used by all the functions in stringi to re-encode their inputs.
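The difference from base R's Encoding function is easiest to see on a small vector. The following is a hedged sketch rather than an official example: the input vector x and the variable names are illustrative, and the latin1 string is produced with iconv, which is assumed to support that conversion on the platform in use.

```r
library(stringi)

# A pure-ASCII string, a string written with \u escapes
# (which R marks as UTF-8), and a missing value
x <- c("abc", "za\u017c\u00f3\u0142\u0107", NA)

# Encoding marks as seen by stringi
marks <- stri_enc_mark(x)
print(marks)

# Base R's view of the same vector, for comparison
print(Encoding(x))

# A string converted to latin1; iconv marks its result accordingly
y <- iconv("\u00f1", from = "UTF-8", to = "latin1")
print(stri_enc_mark(y))
```

Comparing the two printed vectors shows where stringi distinguishes cases that Encoding reports under a single label.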
The official online manual of stringi at https://stringi.gagolewski.com/
Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02