Benchmarks¶
How to reproduce and extend the translit benchmark suite.
For published results and analysis, see Performance.
Benchmark suite overview¶
The benchmarks/ directory contains three tiers of benchmarks:
| Script | Framework | Purpose | Typical runtime |
|---|---|---|---|
bench_core.rs |
Criterion.rs | Pure-Rust microbenchmarks — measures core transforms without PyO3 overhead | ~2 min |
bench_pyperf.py |
pyperf | Rigorous Python-level benchmarks with statistical analysis — translit vs competitors | ~15 min |
bench_quick.py |
stdlib timeit |
Quick sanity-check timing — no external dependencies beyond translit | ~30 sec |
Additional focused scripts:
| Script | Measures |
|---|---|
bench_transliterate.py |
Transliteration across scripts and input sizes |
bench_slugify.py |
Slugification with various option combinations |
bench_vs_unidecode.py |
Head-to-head comparison against Unidecode |
Quick start¶
# Build in release mode (critical for accurate results)
maturin develop --release
# Quick sanity check — no extra deps needed
python benchmarks/bench_quick.py
# Full rigorous suite
pip install pyperf Unidecode text-unidecode anyascii python-slugify pathvalidate
python benchmarks/bench_pyperf.py -o results.json
# View results
python -m pyperf stats results.json
Rust benchmarks (Criterion)¶
bench_core.rs measures the Rust implementation functions directly, bypassing PyO3. This isolates the algorithmic performance from boundary-crossing overhead.
Benchmark groups:
- transliterate — ASCII passthrough, Latin diacritics, Cyrillic, CJK, mixed-script, language-specific (
lang="ru") - table_lookup — per-character lookup latency for Latin extended, Cyrillic, CJK, Hangul, ASCII
- slugify — default config, bounded with word boundary
- fold_case — ASCII, Latin diacritics, German eszett, Greek, mixed-script
- whitespace — messy input with control chars and zero-width, clean passthrough
- scripts — script detection and mixed-script classification
- grapheme — grapheme cluster length and splitting for ASCII and emoji
# Run all Criterion benchmarks
cargo bench --no-default-features
# Run a specific group
cargo bench --no-default-features -- transliterate
# Generate HTML reports (written to target/criterion/)
cargo bench --no-default-features
open target/criterion/report/index.html
Criterion automatically generates HTML reports with statistical analysis, violin plots, and regression detection. Reports are written to target/criterion/ and can be served locally.
To compare before/after an optimization, run the baseline first, make changes, then run again — Criterion automatically compares against the previous run and reports regressions.
Python benchmarks (pyperf)¶
bench_pyperf.py is the primary Python benchmark script. It uses pyperf for rigorous statistical methodology: separate process invocations per benchmark, automatic warmup calibration, and mean ± std dev across multiple runs.
Benchmark groups:
- transliterate — translit vs Unidecode, text-unidecode, anyascii across Latin (short/long), Cyrillic (short/long), CJK (short/long), and mixed scripts
- slugify — translit vs python-slugify with default, long-text, and options-heavy configurations
- normalize — translit vs
unicodedata.normalize()(CPython C extension) - filename — translit vs pathvalidate for simple, Unicode, and adversarial inputs
- strip_accents — translit vs pure-Python NFD + category filter
- fold_case — translit vs
str.casefold()(CPython C builtin) - batch —
transliterate(list)andslugify(list)vs equivalent Python loops
# Full suite (~15 min, high confidence)
python benchmarks/bench_pyperf.py -o results.json
# Quick mode (~5 min, lower confidence)
python benchmarks/bench_pyperf.py --fast -o results.json
# Compare two runs (before/after optimization)
python -m pyperf compare_to baseline.json improved.json
# Detailed stats
python -m pyperf stats results.json
Input corpora¶
Both Python and Rust benchmarks use consistent input corpora designed to exercise different code paths:
| Input | Script | Length | Exercises |
|---|---|---|---|
| Latin short | French diacritics | 42 chars | Flat BMP array lookup, accent handling |
| Latin long | French repeated ×10 | ~1.7 KB | Throughput at document scale |
| Cyrillic short | Russian text | 45 chars | Cyrillic transliteration table |
| Cyrillic long | Russian repeated ×10 | ~1.5 KB | Cyrillic throughput |
| CJK short | Chinese addresses | 12 chars | Hanzi→Pinyin PHF, script-transition spacing |
| CJK long | Chinese repeated ×10 | ~0.8 KB | CJK throughput with space insertion |
| Mixed | Latin + Cyrillic + CJK + diacritics | 50 chars | Script detection, table switching |
| ASCII | Pure English text | 11–120 chars | Fast-path bypass (no Rust entry) |
Short inputs (12–52 chars) represent per-record processing in databases and APIs. Long inputs (0.8–1.7 KB) represent document and batch processing. Both are essential — short inputs expose per-call overhead, long inputs expose algorithmic throughput.
Methodology¶
- pyperf: Each benchmark runs in a separate process to eliminate cross-benchmark interference. pyperf automatically calibrates loop count, warmup iterations, and run count. Results report mean ± standard deviation across process invocations (not just in-process loops), reducing the impact of GC, JIT warmup, and OS scheduling.
- Criterion: Statistical analysis with configurable confidence intervals (default 95%). Automatic comparison against previous runs with noise threshold filtering. Violin plots show distribution shape, not just mean.
- System tuning: Run
python -m pyperf system tunebefore Python benchmarks for lowest-variance results. For Rust, ensure--releaseprofile is used. - Build mode: Always benchmark release builds.
maturin develop(without--release) produces debug builds that are 10–50× slower and not representative of production performance.
Adding new benchmarks¶
Python (pyperf): Add a function add_<name>_benchmarks(runner) in bench_pyperf.py following the existing pattern, then call it from main(). Use the runner.timeit() API with pre-imported globals to avoid measuring import overhead.
Rust (Criterion): Add a benchmark function bench_<name>(c: &mut Criterion) in bench_core.rs, create a benchmark group, and add it to the criterion_group! macro. Use black_box() on inputs to prevent compiler optimization from eliminating the computation.