# about_search_boundaries: Text Boundary Analysis in stringi

## Description

Text boundary analysis is the process of locating linguistic boundaries while formatting and handling text.

## Details

Examples of the boundary analysis process include:

Generally, text boundary analysis is a locale-dependent operation. For example, in Japanese and Chinese, words are not separated by spaces, so a line break can occur even in the middle of a word. These languages also have punctuation and diacritical marks that cannot start or end a line, and this must be taken into account as well.
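As a rough illustration of the point above, the sketch below compares line-break opportunities in a Japanese phrase written without spaces and in a single English word. The sample strings and variable names are chosen here only for illustration; `type` and `locale` are passed on to `stri_opts_brkiter`. The exact break positions depend on the ICU version and its rules, so only coarse properties are checked.

```r
library(stringi)

# Illustrative inputs (not taken from the stringi manual)
x_ja <- "東京都に住んでいます"  # Japanese, no spaces between words
x_en <- "Tokyo"                 # a single English word

# Split at positions where a line could be wrapped
pieces_ja <- stri_split_boundaries(x_ja, type = "line_break", locale = "ja")[[1]]
pieces_en <- stri_split_boundaries(x_en, type = "line_break", locale = "en")[[1]]

# The Japanese text offers break opportunities even though it has no spaces,
# whereas the lone English word comes back as one piece
length(pieces_ja) > 1
length(pieces_en) == 1
```

Joining the pieces back together with `stri_join(pieces_ja, collapse = "")` should reproduce the input, since splitting at boundaries does not drop any text here.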

stringi uses ICU’s BreakIterator to locate specific text boundaries. Note that the BreakIterator’s behavior may be controlled in some cases; see stri_opts_brkiter.

• The character boundary iterator tries to match what a user would think of as a “character” – a basic unit of a writing system for a language – which may be more than just a single Unicode code point.

• The word boundary iterator locates the boundaries of words, for purposes such as “Find whole words” operations.

• The line_break iterator locates positions that would be appropriate to wrap lines when displaying the text.

• The break iterator of type sentence locates sentence boundaries.
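The four iterator types listed above can be tried out with a minimal sketch like the following. The sample text and variable names are assumptions made for illustration; the calls rely on `stri_count_boundaries`, `stri_split_boundaries`, and `stri_opts_brkiter` with its `type` and `skip_word_none` settings.

```r
library(stringi)

x <- "The quick brown fox. It jumps!"  # illustrative sample text

# character: "a" followed by a combining acute accent is one user-perceived
# character, although it consists of two code points
n_chars <- stri_count_boundaries("a\u0301bc", type = "character")  # 3
n_codepoints <- stri_length("a\u0301bc")                           # 4

# word: skip_word_none = TRUE drops pieces such as spaces and punctuation
words <- stri_split_boundaries(x, type = "word", skip_word_none = TRUE)[[1]]

# line_break: pieces end where a line could be wrapped
lines <- stri_split_boundaries(x, type = "line_break")[[1]]

# sentence: each piece keeps its trailing whitespace
sentences <- stri_split_boundaries(x, type = "sentence")[[1]]

# The same settings can be bundled once and reused via opts_brkiter
opts <- stri_opts_brkiter(type = "word", skip_word_none = TRUE)
words2 <- stri_split_boundaries(x, opts_brkiter = opts)[[1]]
```

Under these settings, `words` should contain the six words of the sample text, `lines` six wrap-ready pieces, and `sentences` two pieces; results for other inputs may vary with the locale and the ICU version.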

For technical details on the different classes of text boundaries, refer to the ICU User Guide (see References).

## Author(s)

Marek Gagolewski and other contributors

## References

Boundary Analysis – ICU User Guide, https://unicode-org.github.io/icu/userguide/boundaryanalysis/

## See Also

The official online manual of stringi at https://stringi.gagolewski.com/

Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi: 10.18637/jss.v103.i02