Class WCWidth
The WCWidth class provides methods for calculating the display width of Unicode characters in terminal environments. This is important for proper text alignment and cursor positioning, especially when dealing with wide characters (such as East Asian characters) and zero-width characters (such as combining marks).
This implementation is based on Markus Kuhn's wcwidth implementation, which follows the Unicode Standard guidelines for character width. It categorizes characters as:
- Zero width (0) - Control characters, combining marks, format characters
- Single width (1) - Most Latin, Greek, Cyrillic, and other scripts
- Double width (2) - East Asian scripts (Chinese, Japanese, Korean)
- Ambiguous width (-1) - Characters with context-dependent width
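The categories above can be illustrated with a small, self-contained width function. This is only a sketch, not JLine's implementation. The class name `WidthCategorySketch` is hypothetical, and it hard-codes a handful of ranges instead of the generated tables. Control characters and the -1 case are deliberately left out.

```java
class WidthCategorySketch {
    // Hypothetical, illustrative subset only; the real tables cover far more ranges.
    static int width(int cp) {
        if (cp == 0x200B                              // ZERO WIDTH SPACE
                || (cp >= 0x0300 && cp <= 0x036F)     // combining diacritical marks (Mn)
                || (cp >= 0x1160 && cp <= 0x11FF)) {  // Hangul Jamo medial vowels / final consonants
            return 0;
        }
        if ((cp >= 0x1100 && cp <= 0x115F)            // Hangul Jamo initial consonants
                || (cp >= 0x4E00 && cp <= 0x9FFF)     // CJK Unified Ideographs
                || (cp >= 0xAC00 && cp <= 0xD7A3)     // precomposed Hangul syllables
                || (cp >= 0xFF01 && cp <= 0xFF60)) {  // fullwidth forms
            return 2;
        }
        return 1;                                     // everything else treated as single width
    }
}
```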
Tables are generated from Unicode 16.0 data files:
- `combining` table from UnicodeData.txt (categories Mn, Me, Cf, plus Hangul Jamo U+1160-11FF and U+200B)
- East Asian Width from EastAsianWidth.txt
- Emoji presentation from emoji-data.txt
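Markus Kuhn's reference wcwidth looks code points up in sorted tables of `{first, last}` intervals using a binary search. Below is a minimal sketch of that lookup. The class name and the four-range excerpt in `COMBINING` are hypothetical and do not reproduce JLine's generated table.

```java
class IntervalTableSketch {
    // Sorted, non-overlapping {first, last} ranges: a tiny excerpt of zero-width ranges.
    static final int[][] COMBINING = {
        {0x0300, 0x036F},   // combining diacritical marks
        {0x0483, 0x0489},   // combining Cyrillic marks
        {0x0591, 0x05BD},   // Hebrew accents and points
        {0x200B, 0x200B},   // ZERO WIDTH SPACE
    };

    // Binary search: returns true if cp falls inside any interval of the table.
    static boolean inTable(int cp, int[][] table) {
        if (table.length == 0 || cp < table[0][0] || cp > table[table.length - 1][1]) {
            return false;                 // quick reject outside the table's overall span
        }
        int lo = 0, hi = table.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (cp > table[mid][1]) {
                lo = mid + 1;             // cp lies after this interval
            } else if (cp < table[mid][0]) {
                hi = mid - 1;             // cp lies before this interval
            } else {
                return true;              // first <= cp <= last
            }
        }
        return false;
    }
}
```

Interval tables keep the generated data compact: long runs of same-category code points collapse into a single entry, and each lookup costs O(log n) comparisons.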
This class is used throughout JLine for calculating string display widths, which is essential for proper terminal display formatting, cursor positioning, and text alignment.
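One simple way to use per-character widths for alignment is to sum them over a string's code points and pad to a target column. The sketch below does that. `AlignmentSketch`, its simplified `width` function, and `padRight` are hypothetical helpers, not JLine APIs. Summing per code point also ignores grapheme clusters, which the cluster-aware methods below address.

```java
class AlignmentSketch {
    // Hypothetical simplified width: combining diacritics 0, CJK ideographs / Hangul syllables 2, else 1.
    static int width(int cp) {
        if (cp >= 0x0300 && cp <= 0x036F) return 0;
        if ((cp >= 0x4E00 && cp <= 0x9FFF) || (cp >= 0xAC00 && cp <= 0xD7A3)) return 2;
        return 1;
    }

    // Display width of a string: sum of the widths of its code points.
    static int displayWidth(String s) {
        return s.codePoints().map(AlignmentSketch::width).sum();
    }

    // Pad with spaces until the string occupies `columns` terminal columns (never truncates).
    static String padRight(String s, int columns) {
        StringBuilder sb = new StringBuilder(s);
        for (int w = displayWidth(s); w < columns; w++) {
            sb.append(' ');
        }
        return sb.toString();
    }
}
```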
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| `static int` | `charCountForDisplay(CharSequence cs, int index, Terminal terminal)` | Computes the number of Java `char`s that form the display unit (code point or grapheme cluster) starting at `index` in `cs`. |
| `static int` | `charCountForGraphemeCluster(CharSequence cs, int index)` | Returns the number of `char`s consumed by the grapheme cluster starting at `index` in the given `CharSequence`. |
| `static boolean` | `isRegionalIndicator(int cp)` | Determines whether a Unicode code point is a Regional Indicator Symbol (U+1F1E6..U+1F1FF). |
| `static int` | `wcwidth(int ucs)` | |
| `static int` | `wcwidthForDisplay(CharSequence cs, int index, Terminal terminal)` | Computes the display width in terminal columns of the character or grapheme cluster that begins at `index` in `cs`. |
| `static int` | `wcwidthForGraphemeCluster(CharSequence cs, int index)` | Returns the display width of the grapheme cluster starting at `index`. |
Method Details
wcwidth
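public static int wcwidth(int ucs)
Returns the display width of the code point `ucs`, using the width categories described in the class description.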
charCountForGraphemeCluster
public static int charCountForGraphemeCluster(CharSequence cs, int index)
Returns the number of `char`s consumed by the grapheme cluster starting at `index` in the given `CharSequence`. A grapheme cluster is a user-perceived character that may be composed of multiple Unicode code points. This method recognizes:
- ZWJ sequences (e.g., family emoji 👨‍👩‍👧‍👦)
- Regional indicator pairs (flags, e.g., 🇫🇷)
- Emoji modifier sequences (skin tones, e.g., an emoji followed by the medium skin-tone modifier U+1F3FD)
- Variation selector sequences (U+FE0E, U+FE0F)
- Combining mark sequences
- Parameters:
  - `cs` - the character sequence
  - `index` - the starting `char` index
- Returns:
  - the number of `char`s consumed by the grapheme cluster
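The cluster rules listed above can be approximated with a short forward scan. This is a simplified sketch, not the full UAX #29 algorithm and not JLine's code. `ClusterSketch` and its helpers are hypothetical, and "combining mark" is narrowed to U+0300..U+036F. The scan pairs two regional indicators, then keeps absorbing combining marks, variation selectors, skin-tone modifiers, and any code point joined by a ZWJ.

```java
class ClusterSketch {
    static final int ZWJ = 0x200D;

    static boolean isRegionalIndicator(int cp) {
        return cp >= 0x1F1E6 && cp <= 0x1F1FF;
    }

    // Code points that attach to the preceding one (simplified subset).
    static boolean isExtender(int cp) {
        return (cp >= 0x0300 && cp <= 0x036F)   // combining diacritical marks only
            || cp == 0xFE0E || cp == 0xFE0F      // VS15 / VS16
            || (cp >= 0x1F3FB && cp <= 0x1F3FF); // emoji skin-tone modifiers
    }

    // Returns how many chars the (simplified) cluster starting at index occupies.
    static int charCountForCluster(CharSequence cs, int index) {
        int len = cs.length();
        int cp = Character.codePointAt(cs, index);
        int i = index + Character.charCount(cp);
        // Two consecutive regional indicators form one flag.
        if (isRegionalIndicator(cp) && i < len) {
            int next = Character.codePointAt(cs, i);
            if (isRegionalIndicator(next)) {
                i += Character.charCount(next);
            }
        }
        while (i < len) {
            int next = Character.codePointAt(cs, i);
            if (isExtender(next)) {
                i += Character.charCount(next);
            } else if (next == ZWJ) {
                i += 1;                           // ZWJ is a single BMP char
                if (i < len) {                    // the code point after a ZWJ joins the cluster
                    i += Character.charCount(Character.codePointAt(cs, i));
                }
            } else {
                break;                            // next code point starts a new cluster
            }
        }
        return i - index;
    }
}
```

For example, the four-person family emoji is four surrogate pairs plus three ZWJs, so the scan consumes 11 `char`s, while a flag consumes 4.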
wcwidthForGraphemeCluster
public static int wcwidthForGraphemeCluster(CharSequence cs, int index)
Returns the display width of the grapheme cluster starting at `index`. Variation selectors override the base code point's width: VS16 (U+FE0F) upgrades the cluster to emoji presentation (width 2), while VS15 (U+FE0E) downgrades it to text presentation (width 1). When neither is present, the width of the base code point (via `wcwidth(int)`) is used.
- Parameters:
  - `cs` - the character sequence
  - `index` - the starting `char` index
- Returns:
  - the display width of the grapheme cluster, with the same range as `wcwidth(int)`
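The variation-selector override can be sketched as follows. The sketch assumes the cluster's end index is already known (the real method computes the cluster itself). `VariationSelectorWidth` and its coarse `baseWidth` stand-in for `wcwidth(int)` are hypothetical. The emoji blocks U+1F300..U+1F64F are treated as wide, which is an approximation.

```java
class VariationSelectorWidth {
    static final int VS15 = 0xFE0E; // text presentation selector
    static final int VS16 = 0xFE0F; // emoji presentation selector

    // Hypothetical, coarse stand-in for wcwidth(int).
    static int baseWidth(int cp) {
        if (cp >= 0x1F300 && cp <= 0x1F64F) return 2; // pictographs + emoticons (approximation)
        return 1;
    }

    // Width of the cluster occupying chars [start, end) of cs.
    static int clusterWidth(CharSequence cs, int start, int end) {
        int base = Character.codePointAt(cs, start);
        int i = start + Character.charCount(base);
        while (i < end) {
            int cp = Character.codePointAt(cs, i);
            if (cp == VS16) return 2;   // emoji presentation overrides the base width
            if (cp == VS15) return 1;   // text presentation overrides the base width
            i += Character.charCount(cp);
        }
        return baseWidth(base);         // no selector: fall back to the base code point
    }
}
```

With this rule, U+2764 (HEAVY BLACK HEART) measures 1 column on its own but 2 columns when followed by VS16.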
wcwidthForDisplay
public static int wcwidthForDisplay(CharSequence cs, int index, Terminal terminal)
Computes the display width in terminal columns of the character or grapheme cluster that begins at the given index in the character sequence. If the terminal has grapheme-cluster mode enabled, or if `terminal` is `null` and the runtime provides JDK-level grapheme-cluster support (JDK 21+), the measurement is grapheme-cluster-aware, so emoji variation selectors and ZWJ sequences are handled as a single display unit. Otherwise, the width of the single code point at `index` is returned.
- Parameters:
  - `cs` - the character sequence containing the character or cluster
  - `index` - the starting `char` index of the character or cluster
  - `terminal` - the terminal to query for grapheme-cluster mode, or `null`
- Returns:
  - the display width in terminal columns of the character or cluster
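The mode decision described above reduces to a small predicate. Below is a sketch of that rule only. `ModeSource` is a hypothetical stand-in for JLine's `Terminal` (its real grapheme-cluster query is not shown here). JDK support is approximated by a feature-version check via `Runtime.version().feature()`, which assumes JDK 10 or later.

```java
class DisplayModeSketch {
    // Hypothetical stand-in for a terminal that reports grapheme-cluster mode.
    interface ModeSource {
        boolean graphemeClusterMode();
    }

    // true -> measure whole grapheme clusters; false -> measure the single code point.
    static boolean useGraphemeClusters(ModeSource terminal) {
        if (terminal != null) {
            return terminal.graphemeClusterMode();   // a non-null terminal decides on its own
        }
        return Runtime.version().feature() >= 21;    // no terminal: rely on the JDK version
    }
}
```

Note that a non-null terminal with the mode disabled yields code-point measurement even on JDK 21+.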
charCountForDisplay
public static int charCountForDisplay(CharSequence cs, int index, Terminal terminal)
Computes the number of Java `char`s that form the display unit (code point or grapheme cluster) starting at `index` in `cs`. If the terminal indicates grapheme-cluster mode, or when `terminal` is `null` and JDK grapheme-cluster support is available (JDK 21+), this uses grapheme-cluster segmentation, so ZWJ sequences, flag pairs, skin-tone modifiers, and similar multi-code-point units are treated as a single unit and may span multiple `char`s. Otherwise, it returns the `char` count for the code point at `index`.
- Parameters:
  - `cs` - the character sequence
  - `index` - the starting `char` index
  - `terminal` - the terminal to consult for grapheme-cluster mode, or `null`
- Returns:
  - the number of `char` units to advance past the display unit beginning at `index`
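Because the return value is the number of `char`s to advance, a caller can walk a string one display unit at a time. The sketch below shows that loop with the code-point fallback, which is `Character.charCount(Character.codePointAt(cs, index))`. `DisplayUnitWalk` and `CharCounter` are hypothetical. A cluster-aware counter could be passed in place of the fallback.

```java
import java.util.ArrayList;
import java.util.List;

class DisplayUnitWalk {
    // Hypothetical: anything that reports how many chars the unit at `index` spans (must be >= 1).
    interface CharCounter {
        int count(CharSequence cs, int index);
    }

    // Code-point fallback: 1 for BMP chars, 2 for a surrogate pair.
    static int charCountForCodePoint(CharSequence cs, int index) {
        return Character.charCount(Character.codePointAt(cs, index));
    }

    // Split cs into consecutive display units using the given counter.
    static List<String> units(CharSequence cs, CharCounter counter) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < cs.length()) {
            int n = counter.count(cs, i);
            out.add(cs.subSequence(i, i + n).toString());
            i += n;
        }
        return out;
    }
}
```

With the code-point fallback, `"e\u0301"` splits into two units (the base letter and the combining accent). Grapheme-cluster segmentation is what keeps such sequences together.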
isRegionalIndicator
public static boolean isRegionalIndicator(int cp)
Determines whether a Unicode code point is a Regional Indicator Symbol (U+1F1E6..U+1F1FF).
- Parameters:
  - `cp` - the Unicode code point to test
- Returns:
  - `true` if the code point is within U+1F1E6..U+1F1FF, `false` otherwise
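The 26 regional indicators map one-to-one onto the letters A..Z, and a pair of them renders as a flag for the corresponding two-letter region code. The sketch below performs the range test and decodes a pair. `RegionalIndicatorSketch` and `regionCode` are hypothetical helpers, not part of `WCWidth`.

```java
class RegionalIndicatorSketch {
    static boolean isRegionalIndicator(int cp) {
        return cp >= 0x1F1E6 && cp <= 0x1F1FF;     // REGIONAL INDICATOR SYMBOL LETTER A..Z
    }

    // Decode the regional-indicator pair at `index` into letters, e.g. U+1F1EB U+1F1F7 -> "FR".
    static String regionCode(CharSequence cs, int index) {
        int first = Character.codePointAt(cs, index);
        int next = index + Character.charCount(first);
        if (!isRegionalIndicator(first) || next >= cs.length()) {
            throw new IllegalArgumentException("no regional indicator pair at " + index);
        }
        int second = Character.codePointAt(cs, next);
        if (!isRegionalIndicator(second)) {
            throw new IllegalArgumentException("no regional indicator pair at " + index);
        }
        return new String(new char[] {
            (char) ('A' + (first - 0x1F1E6)),      // offset from LETTER A
            (char) ('A' + (second - 0x1F1E6))
        });
    }
}
```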