Enums & Types

Script

from translit import Script

Enum of Unicode script identifiers returned by detect_scripts().

Major world scripts

Member Value
Script.LATIN "Latin"
Script.CYRILLIC "Cyrillic"
Script.GREEK "Greek"
Script.ARABIC "Arabic"
Script.HEBREW "Hebrew"

Indic scripts

Member Value
Script.DEVANAGARI "Devanagari"
Script.BENGALI "Bengali"
Script.GURMUKHI "Gurmukhi"
Script.GUJARATI "Gujarati"
Script.ORIYA "Oriya"
Script.TAMIL "Tamil"
Script.TELUGU "Telugu"
Script.KANNADA "Kannada"
Script.MALAYALAM "Malayalam"
Script.SINHALA "Sinhala"

East Asian scripts

Member Value
Script.HAN "Han"
Script.HIRAGANA "Hiragana"
Script.KATAKANA "Katakana"
Script.HANGUL "Hangul"

Southeast Asian scripts

Member Value
Script.THAI "Thai"
Script.LAO "Lao"
Script.MYANMAR "Myanmar"
Script.KHMER "Khmer"
Script.BALINESE "Balinese"
Script.JAVANESE "Javanese"
Script.TAI_LE "TaiLe"
Script.NEW_TAI_LUE "NewTaiLue"

Central/North Asian scripts

Member Value
Script.TIBETAN "Tibetan"
Script.MONGOLIAN "Mongolian"

Caucasian scripts

Member Value
Script.GEORGIAN "Georgian"
Script.ARMENIAN "Armenian"

African scripts

Member Value
Script.ETHIOPIC "Ethiopic"
Script.NKO "NKo"
Script.VAI "Vai"

Middle Eastern scripts

Member Value
Script.SYRIAC "Syriac"
Script.THAANA "Thaana"
Script.COPTIC "Coptic"

Americas

Member Value
Script.CHEROKEE "Cherokee"
Script.CANADIAN_ABORIGINAL "CanadianAboriginal"

Historical European scripts

Member Value
Script.RUNIC "Runic"
Script.OGHAM "Ogham"
Script.GOTHIC "Gothic"

Ancient Near Eastern scripts

Member Value
Script.OLD_PERSIAN "OldPersian"
Script.CUNEIFORM "Cuneiform"
Script.LINEAR_B "LinearB"

Meta-scripts

Member Value
Script.COMMON "Common"
Script.INHERITED "Inherited"

NF

from translit import NF

Enum of Unicode normalization forms.

Member Value Description
NF.C "NFC" Canonical Decomposition + Composition
NF.D "NFD" Canonical Decomposition
NF.KC "NFKC" Compatibility Decomposition + Composition
NF.KD "NFKD" Compatibility Decomposition

EmojiProvider

from translit import EmojiProvider

Protocol for custom emoji name providers. Implement this to supply your own emoji-to-text mappings for demojize() and set_emoji_provider().

class FrenchEmoji:
    def lookup(self, sequence: list[int]) -> str | None:
        table = {(0x1F600,): "visage souriant"}
        return table.get(tuple(sequence))

from translit import demojize
demojize("hello 😀", provider=FrenchEmoji())

Required method

Method Signature Description
lookup (self, sequence: list[int]) -> str \| None Return text name for an emoji codepoint sequence, or None if not recognized

The sequence argument is a list of Unicode codepoints forming the emoji (e.g., [0x1F468, 0x200D, 0x1F469] for a ZWJ sequence).

Type aliases

Defined in translit._types:

ErrorMode

ErrorMode = Literal["replace", "ignore", "preserve"]

Controls behavior when a character has no transliteration mapping.

Value Behavior
"replace" Substitute with replace_with string
"ignore" Silently drop the character
"preserve" Keep the original character unchanged

Platform

Platform = Literal["universal", "posix", "windows"]

Target platform for filename sanitization rules.

NormalizationForm

NormalizationForm = Literal["NFC", "NFD", "NFKC", "NFKD"]

Unicode normalization form identifier.

Language constants

Pre-defined string constants for language codes:

from translit import LANG_DE, LANG_FR, LANG_ES  # etc.

European

LANG_BG, LANG_CA, LANG_CS, LANG_CY, LANG_DA, LANG_DE, LANG_EL, LANG_ES, LANG_ET, LANG_FI, LANG_FR, LANG_GA, LANG_HR, LANG_HU, LANG_IS, LANG_IT, LANG_LT, LANG_LV, LANG_MT, LANG_NL, LANG_NO, LANG_PL, LANG_PT, LANG_RO, LANG_RU, LANG_SK, LANG_SL, LANG_SQ, LANG_SR, LANG_SV, LANG_TR, LANG_UK

Semitic

LANG_HE

Caucasian

LANG_HY, LANG_KA

South Asian (Indic)

LANG_AS, LANG_BN, LANG_GU, LANG_HI, LANG_KN, LANG_ML, LANG_MR, LANG_NE, LANG_OR, LANG_PA, LANG_SA, LANG_SI, LANG_TA, LANG_TE

Southeast Asian

LANG_KM, LANG_LO, LANG_MY, LANG_TH, LANG_VI

Middle Eastern

LANG_AR, LANG_DV, LANG_FA

East Asian

LANG_JA, LANG_KO, LANG_ZH

Central/North Asian

LANG_BO, LANG_MN

African

LANG_AM, LANG_JV

Auto-detection

LANG_AUTO — pass as lang="auto" to auto-detect the language from the dominant non-Latin script in the input text. See Language Support for the full script-to-language mapping.

Introspection

from translit import list_langs, list_scripts

list_langs()    # → ["am", "ar", "as", "bg", "bn", ...]
list_scripts()  # → ["Arabic", "Armenian", "Balinese", ...]
Function Returns
list_langs() Sorted list of available language codes (str)
list_scripts() Sorted list of recognized Script enum values (str)