버전: In Development

Internal Stage Profiling

aerospike-py exposes a second, more granular histogram alongside db_client_operation_duration_seconds:

db_client_internal_stage_seconds — a histogram that breaks a single batch_read down into its internal stages (key parsing, Tokio scheduling, I/O, to_dict / to_numpy conversion, event-loop resume, etc.).

Unlike operational metrics which are always on, internal stage profiling is off by default. When disabled, every stage timer call site skips Instant::now() entirely — the disabled path costs a single atomic load (~1ns per batch).

Quick Start

import aerospike_py

# Off by default
assert aerospike_py.is_internal_stage_metrics_enabled() is False

# Turn it on for a debug session
aerospike_py.set_internal_stage_metrics_enabled(True)
lazy_records = await client.batch_read(keys)
print(aerospike_py.get_metrics())   # db_client_internal_stage_seconds now populated

# Or scope it to a single block
with aerospike_py.internal_stage_profiling():
    lazy_records = await client.batch_read(keys)
# Flag automatically restored on exit, even if the block raised

How to Enable

Method	When to use
`AEROSPIKE_PY_INTERNAL_METRICS=1` env var	Whole-process default at startup (K8s `ConfigMap`, `docker run -e`, CI). Truthy: `1`, `true`, `True`, `TRUE`, `yes`, `on`.
`aerospike_py.set_internal_stage_metrics_enabled(True)`	Runtime toggle from application code.
`with aerospike_py.internal_stage_profiling(): ...`	Temporary — restores the previous state on exit.

The env var is read once during native module init. Runtime calls win over the env var.

`db_client_internal_stage_seconds`

A histogram tracking per-stage durations inside a single database operation.

Labels:

Label	Examples
`stage`	`key_parse`, `tokio_schedule_delay`, `limiter_wait`, `io`, `spawn_blocking_delay`, `into_pyobject`, `event_loop_resume_delay`, `to_dict`, `to_numpy`, `merge_to_dict`, `future_into_py_setup`
`db_operation_name`	`batch_read` (more ops may be instrumented later)

Buckets (sub-microsecond precision): 1μs, 5μs, 10μs, 50μs, 100μs, 500μs, 1ms, 2ms, 5ms, 10ms, 20ms, 50ms, 100ms, 500ms, 1s

Stage Reference (`batch_read`)

Stage	What it measures
`key_parse`	Python keys → Rust tuples (GIL held)
`future_into_py_setup`	`future_into_py` sync setup (GIL held)
`tokio_schedule_delay`	Gap from before `future_into_py` setup to the start of the async body on a Tokio worker. Note: includes the synchronous `future_into_py_setup` window as well as pure scheduling delay — subtracting `future_into_py_setup` approximates the isolated scheduling component (a small `stage_timer!` macro overhead also falls inside this window).
`limiter_wait`	Time waiting for the backpressure semaphore
`io`	Aerospike network round-trip
`spawn_blocking_delay`	Gap between I/O completion and `IntoPyObject::into_pyobject` starting on a spawn-blocking thread (GIL-bound)
`into_pyobject`	`Arc` wrap + `LazyBatchRecords` construction
`event_loop_resume_delay`	Gap between `into_pyobject` returning and the Python coroutine actually resuming in the event loop
`to_dict`	`batch_to_dict_py` conversion (GIL held).
`to_numpy`	`batch_to_numpy_py` structured-array fill. The hot `Value → buffer` loop runs under `Python::detach`, so the timer wraps only the GIL-held framing (allocation, dtype parse, `key_map` build) — not the GIL-released fill itself.
`merge_to_dict`	Static `LazyBatchRecords.merge_to_dict` — single-GIL merge of multiple handles.

Why This Is Opt-In

Stage profiling adds measurable latency: each batch_read pays ~10× Instant::now() plus 10 histogram observations. On a benchmark with 20 concurrent workers we measured, for db_client_operation_duration_seconds of batch_read:

	OFF	ON	Δ
avg	2.16 ms	3.22 ms	+1.06 ms
p95	4.67 ms	7.61 ms	+2.94 ms
p99	4.93 ms	9.51 ms	+4.58 ms

Leave it off in production, and flip it on only when you are actively hunting a latency regression.

Scraping and Visualization

The histogram is exposed via the same get_metrics() / start_metrics_server() surface as the operational histogram — no separate endpoint. A Grafana panel can query, for example, the p95 of to_dict:

histogram_quantile(
  0.95,
  sum by (le) (rate(
    db_client_internal_stage_seconds_bucket{stage="to_dict"}[5m]
  ))
)

A stacked timeseries of all stages gives you a quick view of where each batch_read is spending its time:

sum by (stage, db_operation_name) (
  rate(db_client_internal_stage_seconds_sum[1m])
) / sum by (stage, db_operation_name) (
  rate(db_client_internal_stage_seconds_count[1m])
)

Confirming the Toggle State

At process start, aerospike-py logs its effective state:

[aerospike-py] internal_stage_metrics_enabled=False (AEROSPIKE_PY_INTERNAL_METRICS=None)

You can also publish it as a Prometheus gauge in your own app for dashboard visibility:

from prometheus_client import Gauge
import aerospike_py

g = Gauge("aerospike_py_internal_stage_metrics_enabled",
          "1 if stage profiling is ON, 0 otherwise")
g.set(1 if aerospike_py.is_internal_stage_metrics_enabled() else 0)

Troubleshooting

db_client_internal_stage_seconds shows up in /metrics but has no samples The # HELP / # TYPE headers are always registered; samples appear only after the toggle is ON and a batch_read has actually completed.

My context-manager block raised an exception — is the flag still ON? No. internal_stage_profiling() restores the previous state via finally, identical to contextlib.contextmanager semantics.

Does this affect db_client_operation_duration_seconds? No. Operational metrics (db_client_operation_duration_seconds) are controlled by set_metrics_enabled / is_metrics_enabled and are independent of the stage toggle.

Quick Start​

How to Enable​

db_client_internal_stage_seconds​

Stage Reference (batch_read)​

Why This Is Opt-In​

Scraping and Visualization​

Confirming the Toggle State​

Troubleshooting​