stri_locate_boundaries: Locate Text Boundaries¶
Description¶
These functions locate text boundaries (like character, word, line, or sentence boundaries). Use stri_locate_all_* to locate all the matches. stri_locate_first_* and stri_locate_last_* give the first or the last matches, respectively.
Usage¶
stri_locate_all_boundaries(
  str,
  omit_no_match = FALSE,
  get_length = FALSE,
  ...,
  opts_brkiter = NULL
)
stri_locate_last_boundaries(str, get_length = FALSE, ..., opts_brkiter = NULL)
stri_locate_first_boundaries(str, get_length = FALSE, ..., opts_brkiter = NULL)
stri_locate_all_words(
  str,
  omit_no_match = FALSE,
  locale = NULL,
  get_length = FALSE
)
stri_locate_last_words(str, locale = NULL, get_length = FALSE)
stri_locate_first_words(str, locale = NULL, get_length = FALSE)
Arguments¶
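| str | character vector or an object coercible to one |
| omit_no_match | single logical value; if … |
| get_length | single logical value; if … |
| ... | additional settings for opts_brkiter |
| opts_brkiter | named list with ICU BreakIterator’s settings, see stri_opts_brkiter |
| locale | … |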
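As an illustration of the two logical arguments, here is a minimal sketch, assuming stringi is installed and attached. The input strings are made up for this example, and the behaviour noted in the comments follows the conventions documented for stri_locate; it is worth checking against your installed version.

```r
library(stringi)

# Illustrative input; the second string contains punctuation only, hence no words
x <- c("ab cd", "!!!")

# Default settings: from-to matrices with columns "start" and "end";
# a string without any match still gets a one-row matrix of missing values
stri_locate_all_words(x)

# omit_no_match = TRUE: a no-match is reported as a matrix with 0 rows
stri_locate_all_words(x, omit_no_match = TRUE)

# get_length = TRUE: the second column holds match lengths instead of end positions
stri_locate_all_words(x, get_length = TRUE)
```

The zero-row form can be convenient when the matrices are later bound together or iterated over, since no placeholder rows need to be filtered out.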
Details¶
Vectorized over str.
For more information on text boundary analysis performed by ICU’s BreakIterator, see stringi-search-boundaries.
For stri_locate_*_words, just like in stri_extract_all_words and stri_count_words, ICU’s word BreakIterator is used to locate the word boundaries, and all non-word characters (UBRK_WORD_NONE rule status) are ignored. These functions are equivalent to a call to stri_locate_*_boundaries(str, type='word', skip_word_none=TRUE, locale=locale).
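The stated equivalence can be checked with a short sketch like the one below, assuming stringi is attached. The sample strings are hypothetical, and the comparison is expected (not guaranteed across versions) to return TRUE.

```r
library(stringi)

# Hypothetical sample input
x <- c("Hello, world!", "stringi is fast")

# Word locations via the convenience wrapper
words <- stri_locate_all_words(x)

# The same request spelled out through the generic boundary locator;
# type and skip_word_none are passed on as break iterator settings
bounds <- stri_locate_all_boundaries(x, type = "word", skip_word_none = TRUE)

# Compare the two results
identical(words, bounds)
```

Using the *_words variants keeps calls shorter; the *_boundaries form is useful when other break iterator settings need to be combined with the word type.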
Value¶
stri_locate_all_* yields a list of length(str) integer matrices. stri_locate_first_* and stri_locate_last_* return an integer matrix. See stri_locate for more details.
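To show how these return values might be consumed, the following sketch feeds them to stri_sub and stri_sub_all, both of which accept such location matrices. It assumes stringi is attached; the variable names are illustrative only.

```r
library(stringi)

x <- "Spam, spam, eggs, bacon, and spam."

# One matrix per input string, wrapped in a list
loc_all <- stri_locate_all_words(x)

# A single matrix with one row per input string
loc_first <- stri_locate_first_words(x)
loc_last <- stri_locate_last_words(x)

# A two-column from-to matrix can be passed to stri_sub directly
first_word <- stri_sub(x, loc_first)
last_word <- stri_sub(x, loc_last)

# A list of matrices can be passed to stri_sub_all
all_words <- stri_sub_all(x, loc_all)
```

If only the extracted text is needed, stri_extract_all_words is the more direct route; the locate variants are useful when the positions themselves matter, for instance for highlighting or replacement.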
See Also¶
The official online manual of stringi at https://stringi.gagolewski.com/
Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02
Other search_locate: about_search, stri_locate_all()
Other indexing: stri_locate_all(), stri_sub(), stri_sub_all()
Other locale_sensitive: %s<%(), about_locale, about_search_boundaries, about_search_coll, stri_compare(), stri_count_boundaries(), stri_duplicated(), stri_enc_detect2(), stri_extract_all_boundaries(), stri_opts_collator(), stri_order(), stri_rank(), stri_sort(), stri_sort_key(), stri_split_boundaries(), stri_trans_tolower(), stri_unique(), stri_wrap()
Other text_boundaries: about_search, about_search_boundaries, stri_count_boundaries(), stri_extract_all_boundaries(), stri_opts_brkiter(), stri_split_boundaries(), stri_split_lines(), stri_trans_tolower(), stri_wrap()
Examples¶
test <- 'The\u00a0above-mentioned    features are very useful. Spam, spam, eggs, bacon, and spam.'
stri_locate_all_words(test)
## [[1]]
##       start end
##  [1,]     1   3
##  [2,]     5   9
##  [3,]    11  19
##  [4,]    24  31
##  [5,]    33  35
##  [6,]    37  40
##  [7,]    42  47
##  [8,]    50  53
##  [9,]    56  59
## [10,]    62  65
## [11,]    68  72
## [12,]    75  77
## [13,]    79  82
stri_locate_all_boundaries(
  'Mr. Jones and Mrs. Brown are very happy. So am I, Prof. Smith.',
  type='sentence',
  locale='en_US@ss=standard' # ICU >= 56 only
)
## [[1]]
##      start end
## [1,]     1  41
## [2,]    42  62