Oak Chain Primary Signals

This page defines the primary operational signals for Oak Chain and how to interpret them under real load.

Why This Matters

Without shared signal definitions, teams misread queue pressure and diagnose the wrong bottleneck during incidents.

What You'll Prove

You can interpret queue, verifier, durability, and replication signals as one pipeline.
You can label system state consistently across dashboards and runbooks.
You can detect settle vs constrained behavior using trend-based metrics.

Next Action

Use this page while viewing live queue/stats output, then tag the system state (IDLE, FILLING, DRAINING, or CONSTRAINED) from real data.

Why This Exists

During stress testing, raw counters can look contradictory unless they are interpreted as a pipeline:

proposals are accepted
proposals are verified
proposals are finalized
durability acknowledgements close the loop

Primary Signals gives operators a single language for diagnosing where pressure is building.

Related API contract pages:

API Reference
GET /v1/proposals/queue/stats (primary per-node queue/finality surface)

Signal Groups

Queue

queue.pending / queue.queuePending: queued proposals waiting to be finalized.
queue.backpressurePending: backlog held by backpressure controls.
queue.oldestPendingAgeMs: age of oldest queued work.

Interpretation:

High queue.pending with low backpressurePending usually indicates finalization throughput limits.
High backpressurePending means ingress is outpacing the send/ack loop.

Verifier

verifier.attemptCount, successCount, errorCount
verifier.queueWaitAvgMs, queueWaitMaxMs
verifier.avgTotalMs, lastTotalMs

Interpretation:

If verifier wait/total are low but queue grows, verifier is not the bottleneck.
If verifier wait grows sharply, intake is outrunning verifier scheduling.

Durability

durability.pendingAcks
durability.ackTimeouts

Interpretation:

pendingAcks is in-flight work already sent but not yet durably acknowledged.
Rising then falling under burst load is normal.
Rising and flat means commit/ack is bottlenecked or stalled.

Replication

replication.maxLagMs
replication.maxLagNodeId

Interpretation:

Non-zero lag with growth indicates replica catch-up pressure.
Persistent high lag with queue growth is a cluster-side throughput concern.

Operating States

Use these state labels on dashboards and in runbooks:

IDLE: queue near zero, no backpressure debt, no pending acks.
FILLING: net queue slope positive; ingress > finalize.
DRAINING: net queue slope negative; finalize > ingress.
CONSTRAINED: queue high and flat or rising while backpressure/acks stay elevated.
SETTLED: queue drained and totalVerifiedCount == totalFinalizedCount.

Core Derived Metrics

Recommended derived metrics (computed in dashboard/worker):

Finalize Rate (ops/s): delta of totalFinalizedCount over window
Ingest Rate (ops/s): delta of accepted proposals over window
Net Queue Slope (ops/s): ingest_rate - finalize_rate
Queue Pressure: label from slope and absolute queue size
Gap Ratio (%): (totalVerifiedCount - totalFinalizedCount) / max(totalVerifiedCount,1) * 100

Notes:

Gap Ratio can be high at startup or short windows; always pair it with slope and queue size.
Avoid single-point judgments; require at least a 2-5 minute trend.

Fast Triage Commands

bash

# Cluster overview
curl -s http://127.0.0.1:8787/ops/v1/overview | jq '.data | {
  leader, queue, durability, replication
}'

# Leader queue stats
curl -s http://127.0.0.1:8090/v1/proposals/queue/stats | jq '{
  totalProposals,totalVerifiedCount,totalFinalizedCount,
  verifiedCount,processedCount,batchQueueSize,
  backpressurePendingCount,backpressurePendingRawCount,
  persistencePendingChanges,persistenceFlushAvgMs,persistenceFlushLastMs,
  verifierQueueWaitAvgMs,verifierQueueWaitMaxMs,pendingEpochStats
}'

Healthy End-of-Run Criteria

For load-test settlement, require all:

batchQueueSize = 0
backpressurePendingCount = 0
persistencePendingChanges = 0
verifiedCount = 0
totalVerifiedCount == totalFinalizedCount

If all pass, the run is operationally settled even after heavy burst traffic.

Oak Chain Primary Signals ​

Why This Matters ​

What You'll Prove ​

Next Action ​

Why This Exists ​

Signal Groups ​

Queue ​

Verifier ​

Durability ​

Replication ​

Operating States ​

Core Derived Metrics ​

Fast Triage Commands ​

Healthy End-of-Run Criteria ​

Oak Chain Primary Signals

Why This Matters

What You'll Prove

Next Action

Why This Exists

Signal Groups

Queue

Verifier

Durability

Replication

Operating States

Core Derived Metrics

Fast Triage Commands

Healthy End-of-Run Criteria