
Benchmarks

Per-environment numbers backing the Overview. All measurements use Python 3.11 with the GIL (the production default). For the free-threaded Python 3.14t comparison, see Free-Threaded Python.

Where the gap lives

Mean latency advantage of aerospike-py vs official (Python 3.11 + GIL)

A) Pure DB client   ████████████████████  −80%      (108 → 22 ms)   🔥
B) uvicorn ASGI     █████                 −21%      (290 → 228 ms)
C) uvicorn + DLRM   ███████████           −42% p95  (324 → 189 ms)  🔥

p95 advantage holds even when mean compresses (C): the GIL serializes
official-client result conversion at the tail.

Environment A — Pure DB Client

What this isolates. No FastAPI, no model inference, no HTTP loadgen. A Python loop drives batch_read directly — the surrounding stack is as thin as possible, so aerospike-py's advantage is largest here.

Setup

| Item | Value |
| --- | --- |
| Source | benchmark/results/20260416_134243/report.md |
| Clients | official (sync C), official-async (sync wrapped via executor), py-async (aerospike-py) |
| Sets / batch sizes | 9 / 50, 200 |
| Concurrency / iterations | 10 / 30 |
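To make the shape of such a run concrete, here is a minimal sketch of a latency harness. It is not the benchmark script behind the report: `fake_batch_read` is a hypothetical stand-in for a client call, and reading "Concurrency / iterations 10 / 30" as 30 rounds of 10 concurrent calls is an assumption.

```python
import asyncio
import statistics
import time


async def fake_batch_read(batch_size: int) -> list[int]:
    """Hypothetical stand-in for a client's batch read; not a real client API."""
    await asyncio.sleep(0.001)  # simulated network round trip
    return list(range(batch_size))


async def run_benchmark(op, batch_size: int, concurrency: int, iterations: int) -> dict:
    """Run `iterations` rounds of `concurrency` concurrent calls and summarize latency."""
    latencies_ms: list[float] = []

    async def timed_call() -> None:
        start = time.perf_counter()
        await op(batch_size)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    wall_start = time.perf_counter()
    for _ in range(iterations):
        # Each round launches `concurrency` calls and waits for all of them.
        await asyncio.gather(*(timed_call() for _ in range(concurrency)))
    wall_s = time.perf_counter() - wall_start

    latencies_ms.sort()
    # Nearest-rank style p99 on the sorted samples (one of several common conventions).
    p99_index = min(len(latencies_ms) - 1, int(0.99 * len(latencies_ms)))
    return {
        "calls": len(latencies_ms),
        "mean_ms": statistics.fmean(latencies_ms),
        "p99_ms": latencies_ms[p99_index],
        "tps": len(latencies_ms) / wall_s,
    }


summary = asyncio.run(
    run_benchmark(fake_batch_read, batch_size=50, concurrency=10, iterations=30)
)
print(summary)
```

Swapping `fake_batch_read` for a real client call would give per-client mean, p99, and TPS figures of the kind reported below, though the exact percentile method used in the report is not stated here.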

Aggregate result (avg over 9 sets × 2 batch sizes)

| Client | avg mean (ms) | avg p99 (ms) | avg TPS |
| --- | --- | --- | --- |
| official | 107.56 | 195.34 | 138.2 |
| official-async | 110.64 | 211.33 | 125.7 |
| py-async (aerospike-py) | 22.45 | 120.67 | 373.7 |
| py-async vs official | 4.8× faster latency | 1.6× | 2.7× higher TPS |
| py-async vs official-async | 4.9× faster | 1.8× | 3.0× |
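The two comparison rows follow directly from the three measured rows. As a small worked check (the dictionary layout and the `compare` helper are illustrative names, not part of the benchmark code), latency ratios divide the baseline by the candidate and the TPS ratio divides the candidate by the baseline:

```python
# Aggregate figures copied from the table above.
aggregate = {
    "official": {"mean_ms": 107.56, "p99_ms": 195.34, "tps": 138.2},
    "official-async": {"mean_ms": 110.64, "p99_ms": 211.33, "tps": 125.7},
    "py-async": {"mean_ms": 22.45, "p99_ms": 120.67, "tps": 373.7},
}


def compare(candidate: str, baseline: str) -> dict:
    """Return candidate-vs-baseline ratios, rounded to one decimal place."""
    c, b = aggregate[candidate], aggregate[baseline]
    return {
        # Lower latency is better, so the baseline goes in the numerator.
        "mean_speedup": round(b["mean_ms"] / c["mean_ms"], 1),
        "p99_speedup": round(b["p99_ms"] / c["p99_ms"], 1),
        # Higher throughput is better, so the candidate goes in the numerator.
        "tps_gain": round(c["tps"] / b["tps"], 1),
    }


vs_official = compare("py-async", "official")
vs_official_async = compare("py-async", "official-async")
print(vs_official)
print(vs_official_async)
```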

Per-set speedup distribution (aerospike-py vs official)

Batch 50:   mean speedup, 8 sets (set_8 excluded)   4.4× ──────── 7.8×   (median ~6.0×)
Batch 200:  mean speedup, 8 sets (set_8 excluded)   3.1× ──────── 6.6×   (median ~4.5×)

Outlier: set_8 (0% found rate) → fast not-found path, not real read latency.
set_8 is the outlier

set_8 has a 0% found rate. The official client returns not-found errors faster than it completes successful reads, so these two rows reflect a fast not-found path rather than real read latency. All other sets show a 3.1–7.8× mean speedup.

official-async (the sync C client wrapped with loop.run_in_executor) is slightly slower than the bare sync client across every set — each request pays a thread-pool hop. aerospike-py's gap over official-async is therefore consistently larger than its gap over official.
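For readers unfamiliar with the pattern, the following is a minimal sketch of wrapping a blocking call with `loop.run_in_executor`. The function `blocking_batch_read` is a hypothetical placeholder that only sleeps; it does not reproduce the official client's API or the benchmark's actual wrapper.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_batch_read(keys: list[str]) -> dict[str, str]:
    """Hypothetical synchronous call standing in for a blocking client read."""
    time.sleep(0.005)  # simulated blocking I/O
    return {key: f"record-for-{key}" for key in keys}


async def batch_read_async(executor: ThreadPoolExecutor, keys: list[str]) -> dict[str, str]:
    """Hand the blocking call to a worker thread and await its result."""
    loop = asyncio.get_running_loop()
    # Every request is queued to the pool and its result is passed back to
    # the event loop: this is the thread-pool hop described above.
    return await loop.run_in_executor(executor, blocking_batch_read, keys)


async def main() -> list[dict[str, str]]:
    with ThreadPoolExecutor(max_workers=4) as executor:
        tasks = [batch_read_async(executor, [f"k{i}"]) for i in range(8)]
        return await asyncio.gather(*tasks)


results = asyncio.run(main())
print(len(results), results[0])
```

The event loop stays responsive, but each call still occupies a worker thread for its full duration, so concurrency is bounded by the pool size.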

Full per-set table (18 rows)
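| set | batch | official mean (ms) | official p99 (ms) | aerospike-py mean (ms) | aerospike-py p99 (ms) | mean speedup |
| --- | --- | --- | --- | --- | --- | --- |
| set_1 | 50 | 110.23 | 200.14 | 19.74 | 105.46 | 5.6× |
| set_1 | 200 | 127.06 | 206.19 | 30.53 | 194.05 | 4.2× |
| set_2 | 50 | 121.76 | 210.24 | 15.57 | 34.68 | 7.8× |
| set_2 | 200 | 110.86 | 194.82 | 24.98 | 124.86 | 4.4× |
| set_3 | 50 | 108.00 | 184.32 | 18.53 | 103.22 | 5.8× |
| set_3 | 200 | 128.85 | 220.15 | 23.35 | 110.90 | 5.5× |
| set_4 | 50 | 109.15 | 206.11 | 18.18 | 116.92 | 6.0× |
| set_4 | 200 | 118.69 | 195.77 | 23.12 | 109.32 | 5.1× |
| set_5 | 50 | 113.21 | 195.27 | 25.57 | 301.28 | 4.4× |
| set_5 | 200 | 122.26 | 197.89 | 26.85 | 148.11 | 4.6× |
| set_6 | 50 | 115.30 | 210.74 | 18.87 | 103.98 | 6.1× |
| set_6 | 200 | 123.93 | 261.41 | 30.47 | 122.36 | 4.1× |
| set_7 | 50 | 115.34 | 190.68 | 17.83 | 44.86 | 6.5× |
| set_7 | 200 | 126.81 | 215.87 | 41.17 | 133.61 | 3.1× |
| set_8 | 50 | 13.92 | 96.27 | 15.20 | 97.95 | 0.9× ⚠ |
| set_8 | 200 | 19.95 | 102.12 | 16.27 | 116.03 | 1.2× ⚠ |
| set_9 | 50 | 111.80 | 217.14 | 16.92 | 95.59 | 6.6× |
| set_9 | 200 | 139.05 | 210.95 | 20.95 | 108.81 | 6.6× |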
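The aggregate means and the per-batch speedup ranges can be re-derived from these 18 rows. The sketch below copies the rows as tuples and recomputes them; the variable and function names are illustrative, and excluding set_8 from the ranges is done here because the page treats it as an outlier.

```python
from statistics import fmean

# (set, batch, official mean, official p99, aerospike-py mean, aerospike-py p99), all in ms
rows = [
    ("set_1", 50, 110.23, 200.14, 19.74, 105.46),
    ("set_1", 200, 127.06, 206.19, 30.53, 194.05),
    ("set_2", 50, 121.76, 210.24, 15.57, 34.68),
    ("set_2", 200, 110.86, 194.82, 24.98, 124.86),
    ("set_3", 50, 108.00, 184.32, 18.53, 103.22),
    ("set_3", 200, 128.85, 220.15, 23.35, 110.90),
    ("set_4", 50, 109.15, 206.11, 18.18, 116.92),
    ("set_4", 200, 118.69, 195.77, 23.12, 109.32),
    ("set_5", 50, 113.21, 195.27, 25.57, 301.28),
    ("set_5", 200, 122.26, 197.89, 26.85, 148.11),
    ("set_6", 50, 115.30, 210.74, 18.87, 103.98),
    ("set_6", 200, 123.93, 261.41, 30.47, 122.36),
    ("set_7", 50, 115.34, 190.68, 17.83, 44.86),
    ("set_7", 200, 126.81, 215.87, 41.17, 133.61),
    ("set_8", 50, 13.92, 96.27, 15.20, 97.95),
    ("set_8", 200, 19.95, 102.12, 16.27, 116.03),
    ("set_9", 50, 111.80, 217.14, 16.92, 95.59),
    ("set_9", 200, 139.05, 210.95, 20.95, 108.81),
]

# Aggregate means over all 18 rows, set_8 included.
avg_official_mean = fmean(r[2] for r in rows)
avg_py_mean = fmean(r[4] for r in rows)
avg_official_p99 = fmean(r[3] for r in rows)
avg_py_p99 = fmean(r[5] for r in rows)


def speedup_range(batch: int, skip: str = "set_8") -> tuple[float, float]:
    """Min and max mean-latency speedup for one batch size, skipping the outlier set."""
    ratios = [r[2] / r[4] for r in rows if r[1] == batch and r[0] != skip]
    return round(min(ratios), 1), round(max(ratios), 1)


range_50 = speedup_range(50)
range_200 = speedup_range(200)
print(f"official mean {avg_official_mean:.2f} ms, aerospike-py mean {avg_py_mean:.2f} ms")
print(f"batch 50 range {range_50}, batch 200 range {range_200}")
```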

Pattern across environments

layers added →             ratio compresses,     tail still wins
─────────────────────────  ────────────────      ───────────────
A) Pure DB (no HTTP/ML)    mean 4.8×             p99 1.6×
B) uvicorn ASGI            mean 1.27×            ≈ noise at C=5
C) uvicorn + DLRM          mean 1.24×            p95 −42% 🔥

As layers are added around the DB call, the ratio compresses (4.8× → 1.24× mean) but upper-percentile advantage holds — aerospike-py keeps the GIL released during I/O, while the official client serializes GIL acquisition through run_in_executor and spikes at the tail.

What's next: