Performance Overview
Measured benchmarks comparing aerospike-py (Rust/PyO3, native async) against the official aerospike Python client (C extension wrapped with loop.run_in_executor). Reproducible from benchmark/ in this repo.
↓ lower is better, ↑ higher is better, 🔥 ≥50% improvement. Default test setup: FastAPI + DLRM + Aerospike CE, k6 with 10 VUs × 60 s.
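The two integration styles being compared can be sketched roughly as follows. This is only an illustration of the pattern, not either client's real API: `blocking_get` and `get_native` are hypothetical stand-ins, and the sleeps merely simulate I/O.

```python
import asyncio
import time


def blocking_get(key):
    """Hypothetical stand-in for a synchronous C-extension client call."""
    time.sleep(0.01)  # simulated blocking network I/O
    return {"key": key}


async def get_via_executor(key):
    """Wrap the blocking call so it does not stall the event loop."""
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, blocking_get, key)


async def get_native(key):
    """Hypothetical stand-in for a natively async client call (no thread hop)."""
    await asyncio.sleep(0.01)  # simulated non-blocking network I/O
    return {"key": key}


async def main():
    # Same four lookups issued through each style.
    wrapped = await asyncio.gather(*(get_via_executor(k) for k in range(4)))
    native = await asyncio.gather(*(get_native(k) for k in range(4)))
    return wrapped, native


wrapped, native = asyncio.run(main())
```

Both paths return the same records; the difference is that the executor path hands every call to a worker thread, while the native path stays on the event loop.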
How fast — cumulative effect on DLRM-serving p95
| Step | p95 | vs original |
|---|---|---|
| Original (official client + Python 3.11 + gather) | 324 ms | baseline |
| + Replace with aerospike-py | 189 ms | −42% |
| + gather(N) → single batch_read(mixed keys) | 126 ms | −61% |
| + Python 3.14t free-threaded | 97 ms | −70% 🔥 (3.3× faster) |
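The "vs original" column can be rechecked from the p95 values alone. A minimal sketch, assuming each percentage is the relative change against the 324 ms baseline rounded to a whole percent (the helper name `pct_change` is illustrative):

```python
# p95 latencies in milliseconds, copied from the table above.
steps = [
    ("Original (official client + Python 3.11 + gather)", 324),
    ("+ Replace with aerospike-py", 189),
    ("+ gather(N) -> single batch_read(mixed keys)", 126),
    ("+ Python 3.14t free-threaded", 97),
]


def pct_change(baseline: float, value: float) -> int:
    """Relative change versus the baseline, rounded to a whole percent."""
    return round((value - baseline) / baseline * 100)


baseline = steps[0][1]
cumulative = [pct_change(baseline, p95) for _, p95 in steps]
speedup = baseline / steps[-1][1]  # how many times faster the final step is

for (name, p95), change in zip(steps, cumulative):
    print(f"{name}: {p95} ms ({change:+d}% vs original)")
print(f"Overall speedup: {speedup:.1f}x")
```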
Recommended actions
| # | Action | Effect |
|---|---|---|
| 1 | Replace official client → aerospike-py | p95 −42% (Python 3.11) |
| 2 | Move runtime to Python 3.14t free-threaded | p95 −49% more, TPS +47% (no Rust changes) |
| 3 | gather(N) → single batch_read(mixed keys) | p95 −33% (under GIL) |
| 4 | Keep AEROSPIKE_PY_INTERNAL_METRICS=1 always on | E2E overhead ≈ 0, instant per-stage attribution |
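Action 3 replaces N concurrent single-key reads with one batched call. The sketch below shows only the shape of that change, using a hypothetical in-memory `FakeAsyncClient`; its `get` and `batch_read` methods and the `(set, id)` key tuples are assumptions for illustration and may not match the real aerospike-py signatures.

```python
import asyncio


class FakeAsyncClient:
    """Hypothetical in-memory stand-in; NOT the aerospike-py API."""

    def __init__(self, records):
        self._records = records
        self.round_trips = 0  # counts simulated client calls

    async def get(self, key):
        self.round_trips += 1
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return self._records.get(key)

    async def batch_read(self, keys):
        self.round_trips += 1
        await asyncio.sleep(0)
        return [self._records.get(k) for k in keys]


async def fetch_with_gather(client, keys):
    # Before: one coroutine (and one simulated call) per key.
    return await asyncio.gather(*(client.get(k) for k in keys))


async def fetch_with_batch(client, keys):
    # After: a single call carrying keys from several sets.
    return await client.batch_read(keys)


async def main():
    records = {("users", 1): {"age": 30}, ("items", 7): {"price": 5}}
    keys = list(records) + [("items", 404)]  # last key does not exist
    before, after = FakeAsyncClient(records), FakeAsyncClient(records)
    gathered = await fetch_with_gather(before, keys)
    batched = await fetch_with_batch(after, keys)
    return gathered, before.round_trips, batched, after.round_trips


gathered, gather_calls, batched, batch_calls = asyncio.run(main())
```

In this toy model the results are identical while the call count drops from N to 1, so there are fewer coroutines to schedule per request.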
Environment summary (Python 3.11 + GIL)
The thinner the surrounding stack, the larger the gap. The tail-latency advantage persists even in production-shaped workloads (p95 −42% in environment C).
| Environment | aerospike-py vs official | Detail |
|---|---|---|
| A) Pure DB client (no HTTP/ML) | avg −80% (108→22 ms), TPS +171% (138→374) 🔥 | Benchmarks → A |
| B) uvicorn ASGI (FastAPI + DB, no ML) | mean −21% (290→228 ms), TPS +17% | Benchmarks → B |
| C) uvicorn + DLRM (real serving) | p95 −42% (324→189 ms), avg −19% | Benchmarks → C |
Read next
- Benchmarks — per-set tables, k6 raw output, server-side Prometheus metrics
- Free-Threaded Python — what changes when the GIL is removed (3.14t)
- Bottleneck Analysis — internal stage profiling, why action #3 works
To reproduce locally, see benchmark/README.md.