Fixes#5264
Chinese, Japanese, Korean, and other Unicode characters are now
properly recognized in hashtags, following the standard hashtag
parsing conventions used by Twitter, Instagram, and GitHub.
Changes:
- Updated tag parser to allow Unicode letters and digits
- Tags stop at whitespace and punctuation (both ASCII and CJK)
- Allow dash, underscore, forward slash in tags
- Added comprehensive tests for CJK characters and emoji
Examples:
- #测试 → recognized as tag '测试'
- #日本語 → recognized as tag '日本語'
- #한국어 → recognized as tag '한국어'
- #测试。→ recognized as tag '测试' (stops at punctuation)
- #work/测试/项目 → hierarchical tag with Unicode