버전: 0.9.1

Performance Tuning

Connection Pool

config = {
    "hosts": [("node1", 3000), ("node2", 3000)],
    "max_conns_per_node": 300,   # default: 256
    "min_conns_per_node": 10,    # pre-warm
    "idle_timeout": 55,          # below server proto-fd-idle-ms (60s)
}

Read Optimization

Select Specific Bins

# Reads ALL bins from server
record = client.get(key)

# Reads only what you need (less network I/O)
record = client.select(key, ["name", "age"])

Use Batch Reads

# N sequential round-trips
results = [client.get(k) for k in keys]

# Single round-trip
batch = client.batch_read(keys, bins=["name", "age"])

NumPy Batch Reads

For numeric workloads, skip Python dict overhead entirely:

import numpy as np

dtype = np.dtype([("score", "i8"), ("rating", "f8")])
batch = client.batch_read(keys, bins=["score", "rating"], _dtype=dtype)
# batch.batch_records is a numpy structured array

See NumPy Batch Guide.

Write Optimization

Combine Operations

# Two round-trips
client.put(key, {"counter": 1})
client.put(key, {"updated_at": now})

# Single round-trip
ops = [
    {"op": aerospike.OPERATOR_WRITE, "bin": "counter", "val": 1},
    {"op": aerospike.OPERATOR_WRITE, "bin": "updated_at", "val": now},
]
client.operate(key, ops)

TTL Strategy

client.put(key, bins, meta={"ttl": aerospike.TTL_NEVER_EXPIRE})     # never expire
client.put(key, bins, meta={"ttl": aerospike.TTL_DONT_UPDATE})      # keep existing TTL
client.put(key, bins, meta={"ttl": aerospike.TTL_NAMESPACE_DEFAULT}) # use namespace default

Concurrency & Backpressure Tuning

High-concurrency Python services (FastAPI, Gunicorn workers, Celery fan-out) can saturate the two layers underneath aerospike-py:

The internal Tokio runtime that drives the Rust async client.
The per-node connection pool to the Aerospike server.

There are two independent knobs — pick the right one for the symptom.

`AEROSPIKE_RUNTIME_WORKERS` (env var)

Controls the number of Tokio worker threads used by the embedded async runtime. Default: 2, which keeps CPU overhead low when colocated with CPU-heavy workloads (PyTorch inference, sklearn, etc.).

# Bump worker count when 10+ concurrent FastAPI requests each call
# batch_read and you observe `spawn_blocking` queue stalls.
export AEROSPIKE_RUNTIME_WORKERS=4

Workers	Use case
`2` (default)	Most applications, ML serving, single-tenant web servers
`4`	Concurrent batch_read fan-out, FastAPI with many in-flight requests
`4–8`	High-throughput pipelines, Gunicorn with `--workers >= 4` per process
`8+`	Rarely needed — profile first with `py-spy`/`tokio-console`

Symptoms that mean "increase workers":

await client.batch_read(...) p99 latency rises sharply at >10 in-flight callers, while server-side metrics show the cluster is healthy.
tokio-console (or a Tokio runtime metric) shows a queue depth that grows unboundedly during load.

The env var is read once at runtime initialization (first AsyncClient.connect()). Changing it after the runtime is up has no effect — set it before importing aerospike_py.

`max_concurrent_operations` (client config)

Caps the number of in-flight operations dispatched into the Rust client at any moment. Disabled by default (0, zero overhead). When set, excess callers queue for a slot instead of failing or exhausting the connection pool.

config = {
    # "aerospike" = service name in your Podman/compose file; use 127.0.0.1 for local dev
    "hosts": [("aerospike", 3000)],
    "max_concurrent_operations": 64,    # at most 64 in-flight ops
    "operation_queue_timeout_ms": 5000, # raise BackpressureError after 5s
}

When enabled:

Operations beyond the limit wait for a free slot.
Waiting operations resume as soon as a previous one completes.
aerospike_py.BackpressureError is raised only if operation_queue_timeout_ms expires before a slot frees up.

Choosing the value: set this close to (but no higher than) max_conns_per_node (default 256). For a 3-node cluster, 64 is a conservative starting point that prevents pool exhaustion while keeping throughput high.

Enable when: high-fanout batch reads where the spawn_blocking queue would otherwise stall, or when an upstream caller (FastAPI under load test) can issue more concurrent ops than the connection pool can serve.

Quick before/after

# Before: 100 concurrent FastAPI requests calling batch_read each
# may stall on the Tokio queue with default 2 workers and no cap.

# After (env): export AEROSPIKE_RUNTIME_WORKERS=4
# AND (programmatic):
import aerospike_py

client = aerospike_py.AsyncClient({
    # "aerospike" = service name in your Podman/compose file; use 127.0.0.1 for local dev
    "hosts": [("aerospike", 3000)],
    "max_concurrent_operations": 64,    # caps in-flight ops
    "operation_queue_timeout_ms": 5000,
})
await client.connect()

FastAPI / Gunicorn recommendations

For a FastAPI service deployed under Gunicorn with uvicorn workers (see examples/sample-fastapi/):

Setting	Recommended starting value	Notes
`AEROSPIKE_RUNTIME_WORKERS`	`4`	Set in the deployment env, not in code.
`max_concurrent_operations`	`64`	Per `AsyncClient` instance, per worker process.
`operation_queue_timeout_ms`	`5000`	Pair with FastAPI request timeout.
Gunicorn `--workers`	`2 * CPU`	Each worker has its own client + Tokio runtime.
`max_conns_per_node`	`256`	Stay well above `max_concurrent_operations`.

A single Gunicorn worker with the values above can sustain ~64 concurrent in-flight Aerospike ops without exhausting the pool. Total cluster-side load = gunicorn_workers * max_concurrent_operations; size accordingly.

Async Client

For high-concurrency workloads (web servers, fan-out reads):

import asyncio

async def main() -> None:
    client = aerospike.AsyncClient({
        "hosts": [("127.0.0.1", 3000)],
        "max_concurrent_operations": 64,  # prevent pool exhaustion
    })
    await client.connect()

    keys = [("test", "demo", f"key{i}") for i in range(1000)]
    results = await asyncio.gather(*(client.get(k) for k in keys))

    await client.close()

Expression Filters

Push filtering to the server to reduce network transfer:

from aerospike_py import exp

# Without filter: transfers ALL records, filters in Python
results = client.query("test", "demo").results()
active = [r for r in results if r.bins.get("active")]

# With filter: server returns only matching records
expr = exp.eq(exp.bool_bin("active"), exp.bool_val(True))
results = client.query("test", "demo").results(policy={"filter_expression": expr})

Timeout Guidelines

Setting	Recommendation
`socket_timeout`	1-5s. Catches hung connections.
`total_timeout`	Set based on SLA. Includes retries.
`max_retries`	2-3 for reads, 0 for writes (idempotency).

Connection Pool​

Read Optimization​

Select Specific Bins​

Use Batch Reads​

NumPy Batch Reads​

Write Optimization​

Combine Operations​

TTL Strategy​

Concurrency & Backpressure Tuning​

AEROSPIKE_RUNTIME_WORKERS (env var)​

max_concurrent_operations (client config)​

Quick before/after​

FastAPI / Gunicorn recommendations​

Async Client​

Expression Filters​

Timeout Guidelines​