Oak Chain Primary Signals
This page defines the primary operational signals for Oak Chain and how to interpret them under real load.
Why This Matters
Without shared signal definitions, teams misread queue pressure and diagnose the wrong bottleneck during incidents.
What You'll Prove
- You can interpret queue, verifier, durability, and replication signals as one pipeline.
- You can label system state consistently across dashboards and runbooks.
- You can distinguish settled from constrained behavior using trend-based metrics.
Next Action
Use this page while viewing live queue/stats output, then tag the system state (IDLE, FILLING, DRAINING, CONSTRAINED, or SETTLED) from real data.
Why This Exists
During stress testing, raw counters can look contradictory unless they are interpreted as a pipeline:
- proposals are accepted
- proposals are verified
- proposals are finalized
- durability acknowledgements close the loop
Primary Signals gives operators a single language for diagnosing where pressure is building.
Related API contract pages:
- API Reference
- GET /v1/proposals/queue/stats (primary per-node queue/finality surface)
Signal Groups
Queue
- queue.pending / queue.queuePending: queued proposals waiting to be finalized.
- queue.backpressurePending: backlog held by backpressure controls.
- queue.oldestPendingAgeMs: age of oldest queued work.
Interpretation:
- High queue.pending with low backpressurePending usually indicates finalization throughput limits.
- High backpressurePending means ingress is outpacing the send/ack loop.
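The two queue rules above can be sketched as a small helper. This is a minimal illustration, not part of Oak Chain: the function name, the returned strings, and both thresholds (`high_pending`, `high_backpressure`) are hypothetical and should be tuned to your own load profile.

```python
def queue_pressure_hint(pending: int, backpressure_pending: int,
                        high_pending: int = 1000,
                        high_backpressure: int = 100) -> str:
    """Map queue.pending and queue.backpressurePending to a coarse hint.

    Thresholds are illustrative assumptions, not documented defaults.
    """
    # High backpressure backlog: ingress is outpacing the send/ack loop.
    if backpressure_pending >= high_backpressure:
        return "ingress outpacing send/ack loop"
    # High pending with low backpressure: finalization throughput is the
    # likely limit.
    if pending >= high_pending:
        return "finalization throughput limit likely"
    return "no queue pressure"
```

For example, `queue_pressure_hint(5000, 3)` points at finalization throughput, while `queue_pressure_hint(200, 800)` points at the send/ack loop.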
Verifier
- verifier.attemptCount, successCount, errorCount
- verifier.queueWaitAvgMs, queueWaitMaxMs
- verifier.avgTotalMs, lastTotalMs
Interpretation:
- If verifier wait and total times are low but the queue grows, the verifier is not the bottleneck.
- If verifier wait grows sharply, intake is outrunning verifier scheduling.
Durability
- durability.pendingAcks
- durability.ackTimeouts
Interpretation:
- pendingAcks is in-flight work already sent but not yet durably acknowledged.
- Rising then falling under burst load is normal.
- Rising and flat means commit/ack is bottlenecked or stalled.
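One way to tell "rising then falling" apart from "rising and flat" is to compare the latest pendingAcks sample with the peak of the window. The sketch below assumes you already collect pendingAcks samples at a fixed interval; the function name, the labels it returns, and the 10% `flat_tolerance` are hypothetical choices.

```python
from typing import Sequence


def pending_acks_trend(samples: Sequence[int],
                       flat_tolerance: float = 0.1) -> str:
    """Label a window of pendingAcks samples, ordered oldest to newest."""
    if len(samples) < 3:
        raise ValueError("need at least 3 samples to judge a trend")
    peak = max(samples)
    if peak == 0:
        return "quiet"
    last = samples[-1]
    # The latest sample has dropped well below the peak: the burst is draining.
    if last < peak * (1 - flat_tolerance):
        return "falling"
    # Still near the peak: check whether the second half of the window is flat.
    tail = samples[len(samples) // 2:]
    if max(tail) - min(tail) <= peak * flat_tolerance:
        return "flat-high"
    return "rising"
```

In this sketch "falling" matches the normal burst pattern, and "flat-high" is the case worth investigating as a commit/ack bottleneck or stall.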
Replication
- replication.maxLagMs
- replication.maxLagNodeId
Interpretation:
- Non-zero lag with growth indicates replica catch-up pressure.
- Persistent high lag with queue growth is a cluster-side throughput concern.
Operating States
Use these state labels on dashboards and in runbooks:
- IDLE: queue near zero, no backpressure debt, no pending acks.
- FILLING: net queue slope positive; ingress > finalize.
- DRAINING: net queue slope negative; finalize > ingress.
- CONSTRAINED: queue high and flat or rising while backpressure/acks stay elevated.
- SETTLED: queue drained and totalVerifiedCount == totalFinalizedCount.
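The state labels can be applied mechanically once a net queue slope (ingest rate minus finalize rate, in ops/s) is available. The following is a rough sketch under stated assumptions: every threshold (`near_zero`, `high_queue`, `flat_band`, `elevated_min`) is hypothetical, and the order of the checks (SETTLED before IDLE, CONSTRAINED before FILLING) is one possible reading of the definitions, not a documented rule.

```python
from typing import Optional


def classify_state(queue_size: int, net_slope: float, backpressure: int,
                   pending_acks: int, total_verified: int,
                   total_finalized: int, near_zero: int = 5,
                   high_queue: int = 1000, flat_band: float = 1.0,
                   elevated_min: int = 50) -> Optional[str]:
    """Return a state label, or None if no definition clearly applies."""
    # SETTLED is checked first so a fully drained run is not reported as IDLE.
    if queue_size == 0 and total_verified == total_finalized:
        return "SETTLED"
    if queue_size <= near_zero and backpressure == 0 and pending_acks == 0:
        return "IDLE"
    elevated = backpressure >= elevated_min or pending_acks >= elevated_min
    # High queue that is flat or rising while backpressure/acks stay elevated.
    if queue_size >= high_queue and net_slope >= -flat_band and elevated:
        return "CONSTRAINED"
    if net_slope > flat_band:
        return "FILLING"
    if net_slope < -flat_band:
        return "DRAINING"
    return None  # flat slope on a moderate queue: leave untagged
```

Returning None for ambiguous samples keeps the sketch from forcing a label on a single data point; a dashboard would typically keep the previous label until a trend emerges.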
Core Derived Metrics
Recommended derived metrics (computed in dashboard/worker):
- Finalize Rate (ops/s): delta of totalFinalizedCount over window
- Ingest Rate (ops/s): delta of accepted proposals over window
- Net Queue Slope (ops/s): ingest_rate - finalize_rate
- Queue Pressure: label from slope and absolute queue size
- Gap Ratio (%): (totalVerifiedCount - totalFinalizedCount) / max(totalVerifiedCount,1) * 100
Notes:
- Gap Ratio can be high at startup or short windows; always pair it with slope and queue size.
- Avoid single-point judgments; require at least a 2-5 minute trend.
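As a worked illustration of the derived metrics, the sketch below computes them from two snapshots taken one window apart. The `Snapshot` type and its field names are hypothetical, and treating totalProposals as the count of accepted proposals is an assumption; substitute whichever counter your deployment uses for accepted work.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    ts_s: float            # sample time in seconds (hypothetical field)
    total_proposals: int   # assumed to count accepted proposals
    total_verified: int    # totalVerifiedCount
    total_finalized: int   # totalFinalizedCount


def derived_metrics(prev: Snapshot, curr: Snapshot) -> dict:
    """Compute rate, slope, and gap metrics over the window prev -> curr."""
    window_s = curr.ts_s - prev.ts_s
    if window_s <= 0:
        raise ValueError("snapshots must be in increasing time order")
    ingest_rate = (curr.total_proposals - prev.total_proposals) / window_s
    finalize_rate = (curr.total_finalized - prev.total_finalized) / window_s
    # Gap Ratio uses the current totals only; max(..., 1) avoids dividing by 0.
    gap_ratio = ((curr.total_verified - curr.total_finalized)
                 / max(curr.total_verified, 1) * 100)
    return {
        "ingest_rate": ingest_rate,
        "finalize_rate": finalize_rate,
        "net_queue_slope": ingest_rate - finalize_rate,
        "gap_ratio_pct": gap_ratio,
    }
```

With a 60-second window in which proposals grow from 1000 to 1600 and finalized from 800 to 1200, the ingest rate is 10 ops/s, the finalize rate is about 6.7 ops/s, and the slope is about +3.3 ops/s, which reads as FILLING if it persists.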
Fast Triage Commands
# Cluster overview
curl -s http://127.0.0.1:8787/ops/v1/overview | jq '.data | {
leader, queue, durability, replication
}'
# Leader queue stats
curl -s http://127.0.0.1:8090/v1/proposals/queue/stats | jq '{
totalProposals,totalVerifiedCount,totalFinalizedCount,
verifiedCount,processedCount,batchQueueSize,
backpressurePendingCount,backpressurePendingRawCount,
persistencePendingChanges,persistenceFlushAvgMs,persistenceFlushLastMs,
verifierQueueWaitAvgMs,verifierQueueWaitMaxMs,pendingEpochStats
}'
Healthy End-of-Run Criteria
For load-test settlement, require all:
- batchQueueSize = 0
- backpressurePendingCount = 0
- persistencePendingChanges = 0
- verifiedCount = 0
- totalVerifiedCount == totalFinalizedCount
If all pass, the run is operationally settled even after heavy burst traffic.
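The criteria above can be checked automatically at the end of a load test. This sketch assumes the queue stats response has already been parsed into a flat dict keyed by the field names listed above; the function name and message format are hypothetical.

```python
from typing import List

# Fields that must be exactly zero for a settled run.
ZERO_FIELDS = ("batchQueueSize", "backpressurePendingCount",
               "persistencePendingChanges", "verifiedCount")


def settlement_failures(stats: dict) -> List[str]:
    """Return the criteria that fail; an empty list means the run settled."""
    failures = []
    for key in ZERO_FIELDS:
        value = stats.get(key)  # a missing field counts as a failure
        if value != 0:
            failures.append(f"{key} = {value!r}, expected 0")
    verified = stats.get("totalVerifiedCount")
    finalized = stats.get("totalFinalizedCount")
    if verified is None or verified != finalized:
        failures.append(
            f"totalVerifiedCount ({verified!r}) != "
            f"totalFinalizedCount ({finalized!r})"
        )
    return failures
```

Reporting the failing criteria, rather than a single boolean, shows which stage of the pipeline is still holding work when a run does not settle.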