Performance Overview
Measured benchmarks comparing aerospike-py (Rust/PyO3, native async) against the official aerospike Python client (C extension wrapped with loop.run_in_executor).
324 ms → 97 ms (−70%) by stacking 3 actions: switch client → batch_read consolidation → Python 3.14t.
How fast — cumulative gain on p95
Original (official client, 3.11 GIL, gather) ████████████████████████ 324 ms baseline
+ Replace with aerospike-py ██████████████ 189 ms −42%
+ gather(N) → single batch_read(mixed keys) █████████ 126 ms −61%
+ Python 3.14t free-threaded ███████ 97 ms −70% 🔥
↑ 3.3× faster
Setup: FastAPI + DLRM + Aerospike CE, k6 10 VUs × 60s, 2 replicas (4 CPU / 4 GiB request).
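The percentages in the chart follow directly from the four p95 values. As a quick sanity check, the sketch below recomputes them. The variable and function names are illustrative only, and the latencies are copied from the chart rather than measured by the snippet.

```python
# p95 latencies in milliseconds, copied from the chart above.
P95_MS = [
    ("official client, 3.11 GIL, gather", 324),
    ("aerospike-py", 189),
    ("single batch_read", 126),
    ("Python 3.14t", 97),
]


def pct_change(new: float, old: float) -> int:
    """Relative change from old to new, rounded to a whole percent."""
    return round((new - old) / old * 100)


baseline = P95_MS[0][1]

# Change of each step relative to the original baseline.
cumulative = [pct_change(ms, baseline) for _, ms in P95_MS[1:]]

# Overall speedup factor between the first and last configuration.
speedup = round(baseline / P95_MS[-1][1], 1)

for (label, ms), change in zip(P95_MS[1:], cumulative):
    print(f"{label:<20} {ms:>4} ms  {change:+d}% vs baseline")
print(f"overall speedup: {speedup}x")
```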
Recommended actions (in order)
| # | Action | Effect on p95 |
|---|---|---|
| 1 | Replace official client → aerospike-py | −42% (Python 3.11) |
| 2 | gather(N) → single batch_read(mixed keys) | −33% more (under GIL) |
| 3 | Move runtime to Python 3.14t free-threaded | −23% more (126 → 97 ms; −49% from the 189 ms of action 1), TPS +47% |
| 4 | Keep AEROSPIKE_PY_INTERNAL_METRICS=1 on | E2E overhead ≈ 0 |
Pick your environment (Python 3.11 + GIL)
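The thinner the surrounding stack, the larger the gap: mean latency drops 80% with the bare client (A) but 19% with full serving (C). The tail-latency advantage still holds in the production-shaped workload, where p95 drops 42%.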
- C — uvicorn + DLRM (real serving)
- B — uvicorn ASGI only
- A — Pure DB client (no HTTP/ML)
C (uvicorn + DLRM, real serving): Full HTTP → batch_read → DLRM CPU inference → response. Closest to a real recsys serving pod.
| Metric | aerospike-py | official | aerospike-py advantage |
|---|---|---|---|
| avg | 118 ms | 146 ms | −19% |
| p90 | 173 ms | 293 ms | −41% |
| p95 | 189 ms | 324 ms | −42% 🔥 |
B (uvicorn ASGI only): FastAPI + uvicorn around batch_read, no model inference. The "REST API in front of a key-value lookup" shape.
| Metric | aerospike-py | official | advantage |
|---|---|---|---|
| total mean | 228 ms | 290 ms | −21% |
| TPS | 19.4 | 16.6 | +17% |
A (pure DB client, no HTTP/ML): A Python loop drives batch_read directly. This shows the largest gap because the surrounding stack is as thin as possible.
| Metric | aerospike-py | official | advantage |
|---|---|---|---|
| avg mean | 22 ms | 108 ms | −80% 🔥 |
| avg p99 | 121 ms | 195 ms | −38% |
| TPS | 374 | 138 | +171% 🔥 |
Read next
- Benchmarks — per-environment numbers, k6 raw output, server-side Prometheus metrics
- Free-Threaded Python — what changes when the GIL is removed (3.14t)
- Bottleneck Analysis — internal stage profiling, why action #2 works
To reproduce locally, see benchmark/README.md.
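Before comparing your own numbers against the 3.14t results, it can help to confirm that the interpreter is a free-threaded build and that the GIL is actually off at runtime. The helper below is a minimal sketch using only the standard library. The name `gil_status` is made up for this example. `sys._is_gil_enabled()` is available from Python 3.13 onward, so the sketch assumes the GIL is on when that function is missing.

```python
import sys
import sysconfig


def gil_status() -> dict:
    """Report whether this interpreter is free-threaded and whether the GIL is on."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, otherwise 0 or None.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

    # sys._is_gil_enabled() exists on 3.13+; on older versions the GIL is always on.
    probe = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(probe()) if probe is not None else True

    return {
        "python": sys.version.split()[0],
        "free_threaded_build": free_threaded_build,
        "gil_enabled": gil_enabled,
    }


status = gil_status()
print(status)
```

A free-threaded build can still run with the GIL enabled, for example when `PYTHON_GIL=1` is set or when an imported extension module does not declare free-threading support, so both fields are worth checking.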