Adversarial-Text Defense

Unicode gives attackers a large surface for manipulating text that looks unchanged to a human: homoglyph substitution (Latin a → Cyrillic а), invisible character injection (zero-width spaces), zalgo (stacked combining marks), and bidirectional control abuse. These perturbations evade NLP classifiers, bypass content moderation, and corrupt downstream text processing — with no visible cue.

The standard advice is "sanitize your input." But which sanitization? Most pipelines reach for the text-cleaning libraries they already have — ftfy, unidecode, anyascii — which were built for encoding repair and ASCII conversion. translit provides the visual mapping they miss — as a defense-in-depth layer, not a complete control.

Scope. translit canonicalizes the confusables it bundles (TR39) and strips the format characters it enumerates. It does not promise to stop any attack class, and the confusable space is far larger than any table. See the Threat Model and Coverage and limits below.

The core distinction: visual vs. phonetic mapping

The single architectural choice that determines whether a tool can reverse a homoglyph attack is how it maps a confusable character:

Approach Example Reverses a TR39 homoglyph?
Phonetic transliteration Cyrillic р (U+0440) → Latin r (by sound) ❌ No — produces r, not the original p
Visual confusable mapping (TR39) Cyrillic р (U+0440) → Latin p (by appearance) ✅ For confusables in the TR39 table — restores the prototype the attacker replaced

An attacker who swaps Latin p for the identical-looking Cyrillic р is exploiting appearance. Only a tool that maps by appearance — per Unicode Technical Report #39 — undoes the substitution. unidecode, anyascii, cyrtranslit, and uroman all map phonetically, so they cannot.

translit implements TR39 visual confusable mapping. Use normalize_confusables and strip_obfuscation for defense; use transliterate only when you want phonetic romanization (e.g. building a readable slug), never as a security control.

Evidence

This distinction was evaluated empirically in "Fire Extinguishers Full of Gasoline: Evaluating Unicode Text Normalization as a Defence Against Adversarial Attacks" — a benchmark of eight preprocessing tools, two independent TR39 implementations, and seven Unicode normalization baselines across six attack types, three downstream tasks (SST-2, toxicity, AG News), and two model architectures (DistilBERT, RoBERTa-base): 435,864 experimental observations. Headline results:

  • Phonetic tools plateau; visual mapping recovers the tested pairs. On homoglyph attacks, phonetic transliterators recover roughly half of inputs (XMR ≈ 0.49), while TR39 visual mapping (translit-rs) reached XMR = 1.000 on the tested TR39 pairs (17 Latin–Cyrillic, 19 Greek). That is coverage of those pairs — not a guarantee against arbitrary homoglyphs (see Coverage and limits).
  • ftfy is equivalent to doing nothing (TOST equivalence, δ = 0.05, across all six attack types).
  • unidecode actively harms. It maps invisible characters to visible ASCII sequences, introducing spurious tokens and significantly degrading classifier accuracy on invisible-character attacks (McNemar's test, p = 6.9 × 10⁻⁹).
  • Plain Unicode normalization is not a defense. NFC, NFKC, NFKD, and casefold provide zero defense against homoglyphs and negligible defense against the rest.
  • Preserve case. A case-preserving pipeline fully restores downstream accuracy; a case-folding variant costs 3.4 pp on cased models. translit's defense pipelines preserve case by design (only ml_normalize folds case, deliberately).
  • Direction matters. Normalize confusables toward the text's dominant script. For Cyrillic-native text, normalizing toward Latin reduces a Cyrillic-native model to near-chance — normalize_confusables(text, target_script="cyrillic") exists for this.

The XMR metric is published as a versioned specification on Zenodo: 10.5281/zenodo.19323513.

Exact Match Recovery (XMR)

XMR measures whether a preprocessing function P exactly reverses an adversarial corruption C on a corpus T:

XMR(P, C, T) = (1/|T|) · Σ  1[ P(C(t)) == P(t) ]   for t in T

It compares the preprocessed-corrupted text against the preprocessed-clean text (not the raw original), so it is fair to tools that alter clean text as a side effect. It is inference-free (O(n) string comparison), decomposable per attack type, and a conservative upper bound on failure rate.

Coverage and limits

The XMR results above measure the tested TR39 pairs. Real coverage is bounded by the bundled data and by what normalization can do at all:

  • Single-letter Latin confusables: complete. translit folds 100% of UTS#39 single-codepoint confusables whose prototype is a basic Latin letter (gated by tests/test_confusable_coverage.py). This is the dominant real-world case — registered homograph domains are overwhelmingly single-character Latin substitutions (Holgers et al., USENIX 2006).
  • The confusable space is unbounded. Deng et al. (2020) found 8,000+ homoglyphs with deep learning; measured against translit's bundled data, ~89% of their letter homoglyphs are not in TR39 at all. A TR39-derived tool cannot canonicalize what TR39 does not list.
  • Normalization alone is a partial defense on real text. On real phishing, table-driven confusable lookup restores only ~35% of perturbed words, vs ~96% for a context-aware model (Lee et al., BitAbuse, 2025). Use translit as the fast, deterministic first layer — not the whole pipeline.

Out of scope by design (not bugs): confusables outside the bundled table, whole-script spoofs, multi-character confusables (rnm), and Unicode-version skew. See the full Threat Model.

What to use

Goal Use Pipeline
Fold confusables in a string (TR39) normalize_confusables(text) NFKC-free, single pass
Maximum deobfuscation (homoglyph + zalgo + invisible + bidi + emoji) strip_obfuscation(text) NFKC → strip zalgo → strip bidi → strip zero-width → demojize → confusables → strip accents → collapse
Clean untrusted user input sanitize_user_input(text) NFKC → strip bidi → strip zero-width → strip zalgo → confusables → collapse
General security cleanup security_clean(text) NFKC → confusables → strip bidi → collapse
Detect (don't transform) is_confusable(text), is_mixed_script(text) predicate
Check a domain for IDN spoofing is_safe_hostname(host) per-label script + confusable analysis
from translit import strip_obfuscation, normalize_confusables, is_safe_hostname

assert strip_obfuscation("рroduсt") == 'product'
assert normalize_confusables("раypal") == 'paypal'

safe, details = is_safe_hostname("аpple.com")   # leading Cyrillic а
# safe is False; details.mixed_script and details.has_confusables explain why

strip_obfuscation deliberately does not transliterate (it preserves case and non-confusable characters). If you also need ASCII romanization, chain transliterate() afterwards.

See also