dd622f83c1
Address review feedback on #471 from @coderabbitai.

The BMP-only codepoint ranges missed two classes of characters:

- Non-BMP Han extensions (CJK Unified Ideographs Extension B, C, D, E, F), such as 𠀀. A long string of Extension B characters would still be tokenized as a single unbreakable unit and overflow the box.
- Halfwidth Katakana (U+FF65-U+FF9F), such as ｶ. Same failure mode.

Switch to Unicode script property escapes (\p{Script=Han}, \p{Script=Hiragana}, \p{Script=Katakana}, \p{Script=Hangul}), which cover these cases without enumerating ranges. The tsconfig target is ES2020 and property escapes require ES2018+, so this is safe.

Verified coverage: 漢 あ ア 가 𠀀 ｶ all match; A and digits do not.
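
For reviewers, a minimal sketch of the kind of check described above. The names CJK_CHAR and isCjkChar are hypothetical and not taken from the PR, and combining the four scripts into one character class is an assumption about how the pattern is written. Note that \p{...} escapes only work when the regex carries the `u` (or `v`) flag, which also makes the regex match non-BMP characters as whole code points.

```typescript
// Hypothetical names for illustration; the actual identifiers in the PR may differ.
// The `u` flag is required for \p{...} and switches matching to code points,
// so a non-BMP character such as U+20000 is treated as one unit.
const CJK_CHAR =
  /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Hangul}]/u;

function isCjkChar(ch: string): boolean {
  return CJK_CHAR.test(ch);
}

// Han, Hiragana, Katakana, Hangul, Extension B (U+20000),
// halfwidth Katakana KA (U+FF76), then two non-CJK controls.
const samples = ["漢", "あ", "ア", "가", "\u{20000}", "\uFF76", "A", "7"];
for (const ch of samples) {
  console.log(JSON.stringify(ch), isCjkChar(ch));
}
```

Escape sequences are used for the last two CJK samples so the exact code points under test survive copy and paste.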