Internationalization

13 languages with full Unicode support, CJK/emoji rendering, and automatic locale detection — compiled directly into the C binary.

13 Languages
100% Coverage
176 Translation Keys
27 Source Files

Overview

Scorpiox Code ships with 13 languages compiled into a single static binary. No external locale files, no resource bundles, no runtime dependencies. The CLI auto-detects your system locale via LANG, LC_ALL, and LC_CTYPE environment variables. The web interface parses Accept-Language headers. Fallback is always English.

Supported Languages

Code | Language | Native Name | Direction
en | English | English | LTR
ar | Arabic | العربية | RTL
de | German | Deutsch | LTR
es | Spanish | Español | LTR
fr | French | Français | LTR
it | Italian | Italiano | LTR
ja | Japanese | 日本語 | LTR
ko | Korean | 한국어 | LTR
pt | Portuguese | Português | LTR
ru | Russian | Русский | LTR
tr | Turkish | Türkçe | LTR
zh-CN | Chinese (Simplified) | 简体中文 | LTR
zh-TW | Chinese (Traditional) | 繁體中文 | LTR

Unicode Support

The TUI handles full UTF-8 including CJK wide characters and emoji. Custom C functions compute display widths correctly — essential for terminal column alignment when mixing Latin, CJK, and emoji glyphs.

Core Functions

sx_term_utf8_len()

Returns byte length of a single UTF-8 codepoint from its lead byte

sx_term_utf8_width()

Returns display width (1 or 2) of a UTF-8 codepoint — 2 for CJK/emoji

sx_term_str_width()

Returns total display width of a UTF-8 string, summing per-character widths

sx_sanitize_utf8()

Strips invalid UTF-8 sequences, replacing with safe fallback characters

utf8_to_wide()

Converts UTF-8 byte string to wide-character (wchar_t) string

wide_to_utf8()

Converts wide-character string back to UTF-8 byte string
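As a rough illustration of the first and fourth functions, the lead-byte rule and the sanitizing step can be sketched in Python. This is not the C implementation: the names `utf8_len` and `sanitize_utf8` are hypothetical stand-ins, returning 0 for an invalid lead byte is an assumption, and U+FFFD is assumed as the "safe fallback character" because the source does not say which one is used.

```python
def utf8_len(lead: int) -> int:
    """Byte length of a UTF-8 sequence given its lead byte (0 if invalid)."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2  # 110xxxxx (0xC0 and 0xC1 would only encode overlong forms)
    if 0xE0 <= lead <= 0xEF:
        return 3  # 1110xxxx
    if 0xF0 <= lead <= 0xF4:
        return 4  # 11110xxx, capped so the result stays within U+10FFFF
    return 0      # continuation byte or invalid lead (assumed convention)


def sanitize_utf8(data: bytes) -> str:
    """Decode bytes, replacing each invalid sequence with U+FFFD (assumed)."""
    return data.decode("utf-8", errors="replace")
```

For example, `utf8_len("日".encode()[0])` is 3, because U+65E5 encodes to three bytes.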

CJK & Emoji Ranges

CJK Hangul Jamo
U+1100 – U+115F
CJK Radicals → Yi
U+2E80 – U+A4CF
Hangul Syllables
U+AC00 – U+D7A3
CJK Compat Ideographs
U+F900 – U+FAFF
CJK Forms / Compat
U+FE10 – U+FE6F
Fullwidth Forms
U+FF00 – U+FF60
Fullwidth Signs
U+FFE0 – U+FFE6
Emoji
U+1F300 – U+1F9FF
Supplementary CJK
U+20000 – U+3FFFD
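The ranges above are enough to sketch the width calculation. The following Python version is an illustration only, not the C code: `char_width` and `str_width` are hypothetical names, and the sketch assumes every code point outside the listed ranges has width 1, which matches the "1 or 2" description but ignores zero-width combining marks.

```python
# Wide ranges copied from the list above (inclusive code point bounds).
WIDE_RANGES = (
    (0x1100, 0x115F),    # Hangul Jamo
    (0x2E80, 0xA4CF),    # CJK Radicals through Yi
    (0xAC00, 0xD7A3),    # Hangul Syllables
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0xFE10, 0xFE6F),    # CJK forms / compatibility
    (0xFF00, 0xFF60),    # Fullwidth Forms
    (0xFFE0, 0xFFE6),    # Fullwidth Signs
    (0x1F300, 0x1F9FF),  # Emoji
    (0x20000, 0x3FFFD),  # Supplementary CJK
)


def char_width(ch: str) -> int:
    """Return 2 if the code point falls in a wide range, otherwise 1."""
    cp = ord(ch)
    return 2 if any(lo <= cp <= hi for lo, hi in WIDE_RANGES) else 1


def str_width(text: str) -> int:
    """Sum the per-character widths to get the number of terminal columns."""
    return sum(char_width(ch) for ch in text)
```

With these definitions, `str_width("日本語")` is 6 and `str_width("abc")` is 3, which is the difference a terminal layout has to account for when padding columns.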

Locale Detection

The CLI initializes with setlocale(LC_ALL, "") to respect system settings. It then checks the LC_ALL, LC_CTYPE, and LANG environment variables, in that priority order, to select the language. The web server reads the Accept-Language HTTP header and negotiates the best match among the 13 supported languages.

/* CLI locale initialization (every entry point) */
setlocale(LC_ALL, "");

/* Environment variables checked (in priority order) */
LC_ALL   → overrides everything
LC_CTYPE → character classification
LANG     → default fallback locale

/* Web server: HTTP header negotiation */
Accept-Language: ja,en;q=0.9,zh-CN;q=0.8
→ resolved: ja (Japanese)
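The environment-variable lookup can be sketched as follows. This is a Python illustration of the priority order, not the CLI's C code: the function names are hypothetical, and two behaviors are assumptions, namely that the first non-empty variable decides the result and that an unrecognized locale such as "C" falls back to English.

```python
import os
from typing import Mapping, Optional

SUPPORTED = ("en", "ar", "de", "es", "fr", "it", "ja", "ko",
             "pt", "ru", "tr", "zh-CN", "zh-TW")


def lang_from_locale(value: str) -> Optional[str]:
    """Map a POSIX locale name such as 'ja_JP.UTF-8' to a supported code."""
    # Drop the encoding and modifier: 'zh_TW.UTF-8@x' becomes 'zh_TW'.
    name = value.split(".")[0].split("@")[0]
    tag = name.replace("_", "-").lower()
    for code in SUPPORTED:  # exact match first, e.g. zh-tw matches zh-CN? no, zh-TW
        if code.lower() == tag:
            return code
    base = tag.split("-")[0]
    for code in SUPPORTED:  # then language-only match, e.g. ja-jp matches ja
        if code.lower() == base:
            return code
    return None


def detect_cli_lang(env: Optional[Mapping[str, str]] = None) -> str:
    """Check LC_ALL, LC_CTYPE, LANG in priority order; fall back to 'en'."""
    env = os.environ if env is None else env
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            # Assumed: the first non-empty variable wins, even if unsupported.
            return lang_from_locale(value) or "en"
    return "en"
```

Passing a plain dictionary instead of the real environment, as in `detect_cli_lang({"LANG": "ja_JP.UTF-8"})`, makes the lookup easy to exercise without changing the process environment.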

14 source files call setlocale() for locale detection.

Website Translation System

The website uses a Python T dictionary pattern: T[key][lang]. Each page defines translation keys for all 13 languages. The detect_lang() function checks ?lang= query params first, then parses Accept-Language headers. All 176 translation keys across 16 page files maintain 100% coverage for every language.

# Python T dictionary pattern (every .py page)
T = {
    'page_title': {
        'en': 'Internationalization — SCORPIOX CODE',
        'ja': '国際化 — SCORPIOX CODE',
        'ar': 'التدويل — SCORPIOX CODE',
        # ... 13 languages total
    },
}

# Language detection
def detect_lang(environ):
    # 1. Check ?lang= query parameter
    # 2. Parse Accept-Language header
    # 3. Fallback to 'en'
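One way the three steps of detect_lang() could be filled in is sketched below. It assumes that `environ` is a WSGI-style dictionary with `QUERY_STRING` and `HTTP_ACCEPT_LANGUAGE` keys, which the source does not confirm, and the helpers `match_lang` and `parse_accept_language` are hypothetical names introduced for illustration.

```python
from urllib.parse import parse_qs

SUPPORTED = ("en", "ar", "de", "es", "fr", "it", "ja", "ko",
             "pt", "ru", "tr", "zh-CN", "zh-TW")


def match_lang(tag):
    """Return the supported code for a language tag, or None."""
    tag = tag.strip().lower()
    for code in SUPPORTED:  # exact match, e.g. zh-cn
        if code.lower() == tag:
            return code
    base = tag.split("-")[0]
    for code in SUPPORTED:  # language-only match, e.g. fr-ca matches fr
        if code.lower() == base:
            return code
    return None


def parse_accept_language(header):
    """Return language tags ordered by descending q-value, then by position."""
    entries = []
    for index, part in enumerate(header.split(",")):
        pieces = part.strip().split(";")
        tag = pieces[0].strip()
        if not tag or tag == "*":
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if q > 0:
            entries.append((-q, index, tag))
    return [tag for _, _, tag in sorted(entries)]


def detect_lang(environ):
    # 1. Check ?lang= query parameter
    query = parse_qs(environ.get("QUERY_STRING", ""))
    for candidate in query.get("lang", []):
        lang = match_lang(candidate)
        if lang:
            return lang
    # 2. Parse Accept-Language header
    for tag in parse_accept_language(environ.get("HTTP_ACCEPT_LANGUAGE", "")):
        lang = match_lang(tag)
        if lang:
            return lang
    # 3. Fallback to 'en'
    return "en"
```

A page could then look up a string with `T['page_title'][detect_lang(environ)]`, since every key is stated to cover all 13 languages.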

Configuration

# scorpiox-env.txt: force a specific language
LANG=ja_JP.UTF-8

# Or set via environment before launching
$ export LANG=de_DE.UTF-8
$ scorpiox-code

# Web: use ?lang= query parameter
https://code.scorpiox.net/?lang=ko
https://code.scorpiox.net/?lang=zh-CN
https://code.scorpiox.net/?lang=ar

Source Files

Unicode handling is implemented across 27 source files. Key implementation files include sx_term.c/h for terminal width calculations, sxui_input.c for wide-character input, and sx_html_strip.c for UTF-8 safe HTML parsing.