Benchmarks

How to reproduce and extend the translit benchmark suite.

For published results and analysis, see Performance.

Benchmark suite overview

The benchmarks/ directory contains three tiers of benchmarks:

| Script | Framework | Purpose | Typical runtime |
| --- | --- | --- | --- |
| bench_core.rs | Criterion.rs | Pure-Rust microbenchmarks: measures core transforms without PyO3 overhead | ~2 min |
| bench_pyperf.py | pyperf | Rigorous Python-level benchmarks with statistical analysis: translit vs competitors | ~15 min |
| bench_quick.py | stdlib timeit | Quick sanity-check timing: no external dependencies beyond translit | ~30 sec |

Additional focused scripts:

| Script | Measures |
| --- | --- |
| bench_transliterate.py | Transliteration across scripts and input sizes |
| bench_slugify.py | Slugification with various option combinations |
| bench_vs_unidecode.py | Head-to-head comparison against Unidecode |
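
To give a feel for the quick tier, here is a minimal sketch of a timeit-based sanity check in the style the table attributes to bench_quick.py. It is not the actual script: the helper name quick_time and the sample string are hypothetical, and the workload is the stdlib str.casefold() baseline rather than a translit call, so the snippet runs without translit installed.

```python
import timeit


def quick_time(func, arg, number=10_000, repeat=5):
    """Return the best observed per-call time in microseconds.

    Taking the minimum over several repeats is the usual timeit idiom:
    slower repeats mostly reflect interference from other processes.
    """
    timer = timeit.Timer(lambda: func(arg))
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total / number * 1e6


# Hypothetical sample input, not taken from the real corpora.
SAMPLE = "The Quick Brown Fox Jumps Over The Lazy Dog"

if __name__ == "__main__":
    print(f"str.casefold: {quick_time(str.casefold, SAMPLE):.3f} us/call")
```

A check like this is only good for spotting order-of-magnitude problems; it gives no confidence intervals, which is what the pyperf tier is for.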

Quick start

# Build in release mode (critical for accurate results)
maturin develop --release

# Quick sanity check — no extra deps needed
python benchmarks/bench_quick.py

# Full rigorous suite
pip install pyperf Unidecode text-unidecode anyascii python-slugify pathvalidate
python benchmarks/bench_pyperf.py -o results.json

# View results
python -m pyperf stats results.json

Rust benchmarks (Criterion)

bench_core.rs measures the Rust implementation functions directly, bypassing PyO3. This isolates the algorithmic performance from boundary-crossing overhead.

Benchmark groups:

  • transliterate: ASCII passthrough, Latin diacritics, Cyrillic, CJK, mixed-script, language-specific (lang="ru")
  • table_lookup: per-character lookup latency for Latin extended, Cyrillic, CJK, Hangul, ASCII
  • slugify: default config, bounded with word boundary
  • fold_case: ASCII, Latin diacritics, German eszett, Greek, mixed-script
  • whitespace: messy input with control chars and zero-width, clean passthrough
  • scripts: script detection and mixed-script classification
  • grapheme: grapheme cluster length and splitting for ASCII and emoji

# Run all Criterion benchmarks
cargo bench --no-default-features

# Run a specific group
cargo bench --no-default-features -- transliterate

# View the HTML report (every run writes it to target/criterion/)
cargo bench --no-default-features
open target/criterion/report/index.html  # macOS; use xdg-open on Linux

Criterion automatically generates HTML reports with statistical analysis, violin plots, and regression detection. Reports are written to target/criterion/ and can be served locally.

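To compare before and after an optimization, run the benchmarks once to establish a baseline, make your changes, then run them again. Criterion automatically compares each run against the previous one and reports regressions. To keep a fixed reference point across several runs, save a named baseline with cargo bench --no-default-features -- --save-baseline <name> and compare against it later by passing -- --baseline <name>.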

Python benchmarks (pyperf)

bench_pyperf.py is the primary Python benchmark script. It uses pyperf for rigorous statistical methodology: separate process invocations per benchmark, automatic warmup calibration, and mean ± std dev across multiple runs.

Benchmark groups:

  • transliterate: translit vs Unidecode, text-unidecode, anyascii across Latin (short/long), Cyrillic (short/long), CJK (short/long), and mixed scripts
  • slugify: translit vs python-slugify with default, long-text, and options-heavy configurations
  • normalize: translit vs unicodedata.normalize() (CPython C extension)
  • filename: translit vs pathvalidate for simple, Unicode, and adversarial inputs
  • strip_accents: translit vs pure-Python NFD + category filter
  • fold_case: translit vs str.casefold() (CPython C builtin)
  • batch: transliterate(list) and slugify(list) vs equivalent Python loops

# Full suite (~15 min, high confidence)
python benchmarks/bench_pyperf.py -o results.json

# Quick mode (~5 min, lower confidence)
python benchmarks/bench_pyperf.py --fast -o results.json

# Compare two runs (before/after optimization)
python -m pyperf compare_to baseline.json improved.json

# Detailed stats
python -m pyperf stats results.json

Input corpora

Both Python and Rust benchmarks use consistent input corpora designed to exercise different code paths:

| Input | Script | Length | Exercises |
| --- | --- | --- | --- |
| Latin short | French diacritics | 42 chars | Flat BMP array lookup, accent handling |
| Latin long | French repeated ×10 | ~1.7 KB | Throughput at document scale |
| Cyrillic short | Russian text | 45 chars | Cyrillic transliteration table |
| Cyrillic long | Russian repeated ×10 | ~1.5 KB | Cyrillic throughput |
| CJK short | Chinese addresses | 12 chars | Hanzi→Pinyin PHF, script-transition spacing |
| CJK long | Chinese repeated ×10 | ~0.8 KB | CJK throughput with space insertion |
| Mixed | Latin + Cyrillic + CJK + diacritics | 50 chars | Script detection, table switching |
| ASCII | Pure English text | 11–120 chars | Fast-path bypass (no Rust entry) |

Short inputs (12–52 chars) represent per-record processing in databases and APIs. Long inputs (0.8–1.7 KB) represent document and batch processing. Both are essential — short inputs expose per-call overhead, long inputs expose algorithmic throughput.
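
The short/long pairing can be sketched as follows. This is an illustration only: the sample strings are hypothetical placeholders rather than the corpora used by the suite, and the helper names make_long and describe are invented for the example. It also reports both character count and UTF-8 byte size, since the two diverge sharply for non-Latin scripts.

```python
# Hypothetical placeholder inputs; the real corpora live in the benchmark scripts.
SHORT_INPUTS = {
    "latin": "Crème brûlée à la carte, s'il vous plaît",
    "cyrillic": "Быстрая коричневая лиса",
    "cjk": "北京市朝阳区建国路",
}


def make_long(text, repeats=10, sep=" "):
    """Build a document-scale input by repeating a short one."""
    return sep.join([text] * repeats)


def describe(text):
    """Return (character count, UTF-8 byte count) for an input."""
    return len(text), len(text.encode("utf-8"))


if __name__ == "__main__":
    for name, short in SHORT_INPUTS.items():
        for label, text in (("short", short), ("long", make_long(short))):
            chars, size = describe(text)
            print(f"{name:9s} {label:5s} {chars:5d} chars {size:6d} bytes")
```

Keeping the long input a pure repetition of the short one means any difference in per-character cost between the two can be attributed to call overhead rather than to different content.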

Methodology

  • pyperf: Each benchmark runs in a separate process to eliminate cross-benchmark interference. pyperf automatically calibrates loop count, warmup iterations, and run count. Results report mean ± standard deviation across process invocations (not just in-process loops), reducing the impact of GC, JIT warmup, and OS scheduling.
  • Criterion: Statistical analysis with configurable confidence intervals (default 95%). Automatic comparison against previous runs with noise threshold filtering. Violin plots show distribution shape, not just mean.
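  • System tuning: Run python -m pyperf system tune before Python benchmarks for lowest-variance results (on Linux this typically requires root privileges). For Rust, cargo bench compiles with the optimized bench profile by default, so make sure timings never come from a debug build.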
  • Build mode: Always benchmark release builds. maturin develop (without --release) produces debug builds that are 10–50× slower and not representative of production performance.
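
To make the "separate process per run" idea concrete, here is a rough sketch of what measuring across process invocations means. It is not how pyperf is implemented, only an illustration of the principle using the standard library: the worker source, loop count, and the function name run_in_fresh_processes are all assumptions made for the example, and the workload is str.casefold() rather than a translit call.

```python
import statistics
import subprocess
import sys

# Source run inside each fresh interpreter; it prints one per-call timing.
WORKER = """
import timeit
text = "The Quick Brown Fox Jumps Over The Lazy Dog " * 10
loops = 2000
print(timeit.timeit(lambda: text.casefold(), number=loops) / loops)
"""


def run_in_fresh_processes(worker_source, processes=5):
    """Run the worker in separate interpreters and collect one sample from each."""
    samples = []
    for _ in range(processes):
        result = subprocess.run(
            [sys.executable, "-c", worker_source],
            capture_output=True,
            text=True,
            check=True,
        )
        samples.append(float(result.stdout.strip()))
    return samples


if __name__ == "__main__":
    samples = run_in_fresh_processes(WORKER)
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    print(f"{mean * 1e9:.1f} ns +- {stdev * 1e9:.1f} ns over {len(samples)} processes")
```

Because each sample comes from a new interpreter, effects such as hash randomization and memory layout vary between samples and show up in the standard deviation instead of silently biasing a single in-process loop.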

Adding new benchmarks

Python (pyperf): Add a function add_<name>_benchmarks(runner) in bench_pyperf.py following the existing pattern, then call it from main(). Use the runner.timeit() API with pre-imported globals to avoid measuring import overhead.
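
A minimal sketch of that pattern is shown below. The function name add_strip_accents_benchmarks, the benchmark label, and the sample input are hypothetical, and the timed function is the pure-Python NFD + category filter baseline rather than a translit call, so nothing here should be read as the contents of bench_pyperf.py. The only pyperf API assumed is Runner.timeit() with its name, stmt, and globals parameters.

```python
import unicodedata


def strip_accents_py(text):
    """Pure-Python baseline: NFD-decompose, then drop combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Hypothetical sample input, not one of the suite's corpora.
LATIN_SHORT = "Crème brûlée à la carte, s'il vous plaît"


def add_strip_accents_benchmarks(runner):
    """Register benchmarks on a pyperf.Runner (or any object with a compatible timeit())."""
    # Objects are passed in via globals so that imports and setup are not timed.
    bench_globals = {"strip_accents_py": strip_accents_py, "text": LATIN_SHORT}
    runner.timeit(
        name="strip_accents/python_nfd/latin_short",
        stmt="strip_accents_py(text)",
        globals=bench_globals,
    )


# Wiring it up inside main(), assuming pyperf is installed:
#     runner = pyperf.Runner()
#     add_strip_accents_benchmarks(runner)
```

Taking the runner as a parameter keeps each group independent of how the runner was configured, and makes the registration function easy to exercise with a stand-in object in tests.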

Rust (Criterion): Add a benchmark function bench_<name>(c: &mut Criterion) in bench_core.rs, create a benchmark group, and add it to the criterion_group! macro. Use black_box() on inputs to prevent compiler optimization from eliminating the computation.