Version: 0.10.8

Performance Overview

Measured benchmarks comparing aerospike-py (Rust/PyO3, native async) against the official aerospike Python client (C extension wrapped with loop.run_in_executor).
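For readers unfamiliar with the baseline pattern, the following is a minimal sketch of what "wrapped with loop.run_in_executor" means in practice. `blocking_get` is a hypothetical stand-in for a synchronous client call, not the official client's API; only the asyncio and concurrent.futures calls are real.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_get(key: str) -> dict:
    """Hypothetical stand-in for a synchronous (blocking) client call."""
    time.sleep(0.01)  # simulate a network round trip
    return {"key": key, "bins": {"score": len(key)}}


async def get_async(executor: ThreadPoolExecutor, key: str) -> dict:
    # Hand the blocking call to a worker thread so the event loop stays free.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, blocking_get, key)


async def main() -> list:
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Each key costs one thread hop plus one blocking call.
        return await asyncio.gather(
            *(get_async(executor, k) for k in ["u1", "item42"])
        )


records = asyncio.run(main())
```

Every request in this pattern passes through a thread pool, which is the overhead a native async client avoids.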

TL;DR — cumulative effect on DLRM-serving p95

324 ms → 97 ms (−70%) by stacking 3 actions: switch client → batch_read consolidation → Python 3.14t.

How fast — cumulative gain on p95

Original (official client, 3.11 GIL, gather)  ████████████████████████  324 ms  baseline
+ Replace with aerospike-py                   ██████████████            189 ms  −42%
+ gather(N) → single batch_read(mixed keys)   █████████                 126 ms  −61%
+ Python 3.14t free-threaded                  ███████                    97 ms  −70% 🔥
                                                                                ↑ 3.3× faster
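The cumulative percentages in the chart follow directly from the four p95 values. A short sketch of the arithmetic, using only the numbers shown above (the dictionary labels are shorthand chosen here):

```python
# p95 latencies in milliseconds, taken from the chart above.
p95_ms = {
    "official client, 3.11 GIL, gather": 324,
    "aerospike-py": 189,
    "single batch_read": 126,
    "Python 3.14t": 97,
}

baseline = p95_ms["official client, 3.11 GIL, gather"]


def pct_change(new: float, old: float) -> int:
    """Relative change from old to new, rounded to a whole percent."""
    return round((new - old) / old * 100)


# Change of each stage relative to the original baseline.
cumulative = {name: pct_change(ms, baseline) for name, ms in p95_ms.items()}

# Overall speedup factor of the final stage over the baseline.
speedup = round(baseline / p95_ms["Python 3.14t"], 1)
```

Here `cumulative` maps the three later stages to −42, −61, and −70, and `speedup` is 3.3.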

Setup: FastAPI + DLRM + Aerospike CE, k6 10 VUs × 60s, 2 replicas (4 CPU / 4 GiB request).

| # | Action | Effect on p95 |
|---|--------|---------------|
| 1 | Replace official client → aerospike-py | −42% (Python 3.11) |
| 2 | gather(N) → single batch_read(mixed keys) | −33% more (under GIL) |
| 3 | Move runtime to Python 3.14t free-threaded | −49% more, TPS +47% |
| 4 | Keep AEROSPIKE_PY_INTERNAL_METRICS=1 on | E2E overhead ≈ 0 |
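Action 2 can be illustrated in isolation. The sketch below uses `FakeClient`, a hypothetical in-memory stand-in whose `get` and `batch_read` methods are assumptions made for illustration and not the aerospike-py API. It also assumes "mixed keys" means keys from different sets. The point is only the shape of the change: N awaited calls collapse into one.

```python
import asyncio


class FakeClient:
    """Hypothetical in-memory stand-in; NOT the aerospike-py API."""

    def __init__(self, data: dict):
        self._data = data
        self.round_trips = 0  # counts simulated server round trips

    async def get(self, key):
        self.round_trips += 1
        await asyncio.sleep(0)  # yield to the event loop like real I/O would
        return self._data.get(key)

    async def batch_read(self, keys):
        self.round_trips += 1
        await asyncio.sleep(0)
        return [self._data.get(k) for k in keys]


async def fetch_with_gather(client, keys):
    # Before: one call (and one round trip) per key, run concurrently.
    return await asyncio.gather(*(client.get(k) for k in keys))


async def fetch_with_batch(client, keys):
    # After: all keys, even from different sets, go out in a single call.
    return await client.batch_read(keys)


async def compare():
    data = {("users", 1): {"age": 30}, ("items", 7): {"price": 9.5}}
    keys = list(data)
    a, b = FakeClient(data), FakeClient(data)
    via_gather = await fetch_with_gather(a, keys)
    via_batch = await fetch_with_batch(b, keys)
    return via_gather, via_batch, a.round_trips, b.round_trips


via_gather, via_batch, gather_trips, batch_trips = asyncio.run(compare())
```

Both paths return the same records; the gather path makes one simulated round trip per key while the batch path makes one in total.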

Pick your environment (Python 3.11 + GIL)

The thinner the surrounding stack, the larger the gap. Tail-latency advantage survives even in production-shaped workloads.

Full path: HTTP → batch_read → DLRM CPU inference → response. This is the closest to a real recsys serving pod.

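| Metric | aerospike-py | official | aerospike-py advantage |
|--------|--------------|----------|------------------------|
| avg | 118 ms | 146 ms | −19% |
| p90 | 173 ms | 293 ms | −41% |
| p95 | 189 ms | 324 ms | −42% 🔥 |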

→ Full numbers (Benchmarks C)

To reproduce locally, see benchmark/README.md.
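Because the results differ sharply between the GIL and free-threaded runtimes, it may help to record which interpreter build produced a given run. The following is a small sketch using only the standard library; `runtime_info` is a helper name introduced here and is not part of the benchmark suite.

```python
import sys
import sysconfig


def runtime_info() -> dict:
    """Describe the interpreter so benchmark results can be labeled."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, otherwise 0 or None.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on Python 3.13+; older versions always
    # run with the GIL, so default to True when the function is missing.
    gil_check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = gil_check() if gil_check is not None else True
    return {
        "python": sys.version.split()[0],
        "free_threaded_build": free_threaded_build,
        "gil_enabled": gil_enabled,
    }


info = runtime_info()
print(info)
```

A free-threaded build can still run with the GIL re-enabled at runtime, which is why the sketch reports the build flag and the runtime state separately.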