Version: 0.10.6

Performance Overview

Measured benchmarks comparing aerospike-py (Rust/PyO3, native async) against the official aerospike Python client (C extension wrapped with loop.run_in_executor). Reproducible from benchmark/ in this repo.
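The comparison hinges on how a blocking client call is made awaitable. As a rough illustration, the executor-wrapping pattern looks like the sketch below. `blocking_get` is a hypothetical stand-in for a synchronous client call and is not the official client's API; only `asyncio` and `loop.run_in_executor` are real.

```python
import asyncio
import time


def blocking_get(key: str) -> dict:
    """Hypothetical stand-in for a synchronous client call."""
    time.sleep(0.01)  # simulate a blocking network round trip
    return {"key": key, "value": len(key)}


async def get_via_executor(key: str) -> dict:
    # Offload the blocking call to the default thread pool so the
    # event loop stays responsive while the call is in flight.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_get, key)


async def main() -> list:
    keys = ["user:1", "user:22", "user:333"]
    # Each key becomes one thread-pool job; results keep input order.
    return await asyncio.gather(*(get_via_executor(k) for k in keys))


results = asyncio.run(main())
print(results)
```

Every call in this pattern passes through a worker thread and back to the loop, whereas a native async client can be awaited directly.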

↓ lower is better, ↑ higher is better, 🔥 ≥50% improvement. Default test setup: FastAPI + DLRM + Aerospike CE, k6 10 VUs × 60s.

How fast — cumulative effect on DLRM-serving p95

Step	p95	vs original
Original (official client + Python 3.11 + `gather`)	324 ms	baseline
+ Replace with aerospike-py	189 ms	−42%
+ `gather(N)` → single `batch_read(mixed keys)`	126 ms	−61%
+ Python 3.14t free-threaded	97 ms	−70% 🔥 (3.3× faster)
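The "vs original" column can be re-derived from the p95 values alone. A small sketch of that arithmetic, where the helper name `pct_change` is illustrative:

```python
def pct_change(baseline: float, value: float) -> float:
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# p95 values in ms, copied from the table above.
steps = [
    ("Original", 324),
    ("+ aerospike-py", 189),
    ("+ batch_read", 126),
    ("+ Python 3.14t", 97),
]

baseline = steps[0][1]
for name, p95 in steps[1:]:
    print(f"{name}: {pct_change(baseline, p95):+.0f}% vs original")

# Speedup factor of the final step relative to the baseline.
speedup = baseline / steps[-1][1]
print(f"{speedup:.1f}x faster")
```

The same helper gives the step-over-step figure for the `batch_read` change: `pct_change(189, 126)` is about −33%.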

Recommended actions

#	Action	Effect
1	Replace official client → aerospike-py	p95 −42% (Python 3.11)
2	Move runtime to Python 3.14t free-threaded	further p95 −49%, TPS +47% (no Rust changes)
3	`gather(N)` → single `batch_read(mixed keys)`	p95 −33% (under GIL)
4	Keep `AEROSPIKE_PY_INTERNAL_METRICS=1` always on	E2E overhead ≈ 0, instant per-stage attribution
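Action 3 replaces N concurrent single-key reads with one batched call. The sketch below shows only the shape of that change, using a hypothetical in-memory `FakeClient`. Its `get` and `batch_read` methods are assumptions made for illustration and may not match the real aerospike-py signatures.

```python
from __future__ import annotations

import asyncio


class FakeClient:
    """Hypothetical in-memory stand-in; not the aerospike-py API."""

    def __init__(self, data: dict[str, dict]) -> None:
        self._data = data
        self.round_trips = 0  # counts simulated requests

    async def get(self, key: str) -> dict | None:
        self.round_trips += 1
        await asyncio.sleep(0)  # yield to the loop, as real I/O would
        return self._data.get(key)

    async def batch_read(self, keys: list[str]) -> list[dict | None]:
        self.round_trips += 1  # one request regardless of key count
        await asyncio.sleep(0)
        return [self._data.get(k) for k in keys]


async def compare() -> tuple[int, int]:
    data = {f"user:{i}": {"id": i} for i in range(8)}
    keys = list(data)

    # Before: one coroutine (and one request) per key.
    fan_out = FakeClient(data)
    gathered = await asyncio.gather(*(fan_out.get(k) for k in keys))

    # After: a single call carrying all keys.
    batched = FakeClient(data)
    batch = await batched.batch_read(keys)

    assert list(gathered) == batch  # same records either way
    return fan_out.round_trips, batched.round_trips


gather_trips, batch_trips = asyncio.run(compare())
print(gather_trips, batch_trips)
```

The sketch counts simulated requests only; it does not reproduce the measured latencies.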

Environment summary (Python 3.11 + GIL)

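The thinner the surrounding stack, the larger the gap: the average-latency reduction shrinks from −80% for the pure DB client (A) to −19% under real DLRM serving (C). The tail-latency advantage persists even in production-shaped workloads, where p95 still drops 42%.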

Environment	aerospike-py vs official	Detail
A) Pure DB client (no HTTP/ML)	avg −80% (108→22 ms), TPS +171% (138→374) 🔥	Benchmarks → A
B) uvicorn ASGI (FastAPI + DB, no ML)	mean −21% (290→228 ms), TPS +17%	Benchmarks → B
C) uvicorn + DLRM (real serving)	p95 −42% (324→189 ms), avg −19%	Benchmarks → C
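Because these figures are specific to Python 3.11 with the GIL, it may help to record the interpreter configuration alongside any re-run. A minimal sketch using only the standard library, where the function name `runtime_summary` is illustrative:

```python
import sys
import sysconfig


def runtime_summary() -> dict:
    # Py_GIL_DISABLED is 1 on free-threaded builds (e.g. 3.14t),
    # and 0 or None on regular builds.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

    # sys._is_gil_enabled() exists on Python 3.13+; older
    # interpreters always run with the GIL.
    gil_check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = gil_check() if gil_check is not None else True

    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "free_threaded_build": free_threaded_build,
        "gil_enabled": gil_enabled,
    }


summary = runtime_summary()
print(summary)
```

On a free-threaded build the GIL can still be re-enabled at runtime, which is why the build flag and the live GIL state are reported separately.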
