Segment Store Garbage Collection

Oak's segment store uses generational garbage collection to reclaim disk space. Content writes create immutable segments, but edits and deletes leave "garbage" (unreachable segments). GC compacts live data into a new generation and deletes old TAR files.

This is the "cost" of the append-only DAG architecture. Understanding it is important for operating Oak Chain at scale.

⚠️ Key Difference: Consensus-Based GC

Oak Chain is Different

In traditional Oak, GC is a local operation. In Oak Chain, GC must go through Raft consensus to maintain deterministic state across all validators. There is no "offline" mode in a distributed consensus system.

Consensus-Based GC Process

In Oak Chain, GC is proposal-based through Raft consensus:

Step 1: Epoch Trigger

GC is triggered by Ethereum epoch finalization:

  • Ethereum block finality triggers GC check
  • Only the Raft leader can propose GC
  • Leader checks if garbage threshold is exceeded
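The trigger logic above can be sketched as a simple predicate. This is a minimal illustration, not Oak Chain's actual code: the class name, the threshold value, and the method signature are all assumptions.

```java
// Sketch of the leader-side trigger check (hypothetical names and
// threshold; the real logic lives in Oak Chain's GC components).
public class GcTriggerCheck {

    /** Garbage ratio above which the leader proposes GC (assumed value). */
    static final double GARBAGE_THRESHOLD = 0.25;

    /**
     * Called when an Ethereum epoch finalizes. Only the current Raft
     * leader may propose; followers always return false.
     */
    public static boolean shouldProposeGc(boolean isLeader,
                                          long garbageBytes,
                                          long totalBytes) {
        if (!isLeader || totalBytes == 0) {
            return false;
        }
        double garbageRatio = (double) garbageBytes / totalBytes;
        return garbageRatio > GARBAGE_THRESHOLD;
    }

    public static void main(String[] args) {
        // Leader with 40% garbage: proposes GC.
        System.out.println(shouldProposeGc(true, 40, 100));  // true
        // A follower never proposes, even with high garbage.
        System.out.println(shouldProposeGc(false, 40, 100)); // false
    }
}
```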

Step 2: GC Proposal

The leader creates a signed GC proposal:

  • Proposal type: COMPACT
  • Signed with leader's wallet
  • Includes epoch reference for ordering
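A proposal carrying these three elements might look like the following. The field names, the canonical encoding, and the stubbed signing step are illustrative assumptions, not the actual Oak Chain wire format.

```java
// Illustrative shape of a signed GC proposal. Field names and the
// canonical payload encoding are assumptions; signing is stubbed out.
public class GcProposalSketch {

    public enum ProposalType { COMPACT }

    public record GcProposal(ProposalType type, long epoch,
                             String leaderWallet, String signature) { }

    /** Canonical bytes the leader signs (assumed encoding). */
    public static String canonicalPayload(ProposalType type, long epoch,
                                          String wallet) {
        return type + "|" + epoch + "|" + wallet;
    }

    /** Leader builds a COMPACT proposal anchored to a finalized epoch. */
    public static GcProposal propose(long epoch, String wallet) {
        String payload = canonicalPayload(ProposalType.COMPACT, epoch, wallet);
        String signature = "sig(" + payload + ")"; // stand-in for a wallet signature
        return new GcProposal(ProposalType.COMPACT, epoch, wallet, signature);
    }

    public static void main(String[] args) {
        GcProposal p = propose(1234, "0xabc");
        System.out.println(p.type() + " @ epoch " + p.epoch());
    }
}
```

The epoch reference is what gives proposals a total order across leader changes.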

Step 3: Raft Consensus

Proposal is replicated to all validators:

  • Broadcast via Aeron Raft
  • Requires majority acknowledgment
  • All validators receive identical proposal

Step 4: Deterministic GC Execution

Critical: All validators must apply the same GC decision:

  • Same input → Same output
  • Each node applies the agreed GC operation locally
  • Results must be identical (hash verification)
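The "same input → same output" invariant is typically checked by hashing the post-GC state on every node and comparing digests. A minimal sketch, with a toy state transformation standing in for the real GC operation:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Sketch of hash verification: each validator applies the agreed GC
// operation to its local state and publishes a hash of the result;
// mismatching hashes reveal divergence. Types here are illustrative.
public class DeterministicGcCheck {

    /** Toy stand-in for applying the agreed GC operation to local state:
     *  deterministically drops segments marked with '#'. */
    public static String applyGc(String state) {
        return state.replace("#", "");
    }

    /** SHA-256 digest of the post-GC state, hex-encoded. */
    public static String stateHash(String state) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest(state.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String input = "seg1#seg2#seg3";
        // Two validators applying the same operation to the same input
        // must produce the same post-GC hash.
        String a = stateHash(applyGc(input));
        String b = stateHash(applyGc(input));
        System.out.println(a.equals(b)); // true
    }
}
```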

Current Implementation Note (as of February 8, 2026)

The current consensus execution path in GCProposalManager invokes FileStore.cleanup() directly.
This is deterministic and consensus-safe, but it does not run the full fullGC() pipeline (estimation + compaction + cleanup).

Step 5: Consensus Commit

GC is committed to the Raft log:

  • Committed at specific Raft log index
  • Durable once majority confirms
  • All validators reclaim same space

Deep Dive: Compaction

What Gets Compacted? (Target Full GC Behavior)

  1. Journal Head → The current repository state
  2. Checkpoints → Async indexing save points (compacted first, with deduplication)
  3. Content Tree → All reachable nodes and properties

The Reachability Problem

Segment graphs are extremely dense. A single reachable record in a segment keeps the entire segment alive, along with all segments it references. This is why:

  • Typical stores have 70-90% garbage
  • But cleanup may only reclaim 50-80% of it
  • Multiple GC cycles are needed for full cleanup
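A quick worked example shows why multiple cycles are needed. Assume a store that is 80% garbage and a cleanup that reclaims 60% of the remaining garbage per cycle (illustrative numbers within the ranges above): the garbage fraction decays geometrically rather than dropping to zero in one pass.

```java
// Worked example: geometric convergence of garbage across GC cycles.
// The initial fraction and per-cycle reclaim rate are illustrative.
public class ReclaimConvergence {

    /** Garbage fraction remaining after n cycles, each reclaiming `rate`. */
    public static double remainingGarbage(double initial, double rate, int cycles) {
        double garbage = initial;
        for (int i = 0; i < cycles; i++) {
            garbage *= (1.0 - rate);
        }
        return garbage;
    }

    public static void main(String[] args) {
        // 80% garbage, 60% of it reclaimed per cycle:
        for (int n = 1; n <= 3; n++) {
            System.out.printf("after %d cycle(s): %.1f%% garbage left%n",
                    n, 100 * remainingGarbage(0.80, 0.60, n));
        }
    }
}
```

With these numbers, three cycles are needed to get below ~5% garbage.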

Compaction Behavior

| Behavior | Description | Speed | Thoroughness |
|---|---|---|---|
| Incremental cycles | Compact current reachable graph and reclaim unreachable generations | Fast | Progressive |
| Deep reclamation | Multiple cycles to converge after heavy churn/write-delete storms | Slower | Higher reclaim |

In Oak Chain, compaction is consensus-driven and applied deterministically across validators. Recovery after heavy churn is achieved through repeated consensus GC cycles, not standalone local "offline GC".

Cleanup-Only vs Full GC

Current behavior:

  • Consensus flow can execute cleanup() successfully and deterministically.
  • cleanup() reclaims segments/files eligible by generation/reclaimer rules.
  • If no eligible old generations/files exist, cleanup runs but may reclaim 0 B.

Target behavior:

  • Execute full compaction + cleanup (fullGC() or compactFull() then cleanup()) in the consensus path.
  • This should improve reclaim after write/delete churn when data is still in current generations.
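The difference can be shown with a toy model. This deliberately simplifies segment and generation handling (real cleanup eligibility is per-segment-graph, not a flat list), and none of these names are Oak's API; it only illustrates why cleanup alone reclaims 0 B when garbage sits in a retained generation, while repeated compaction cycles eventually free it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of cleanup-only vs full GC, with a 2-generation retention
// rule. Names and structure are illustrative, not Oak's API.
public class CleanupVsFullGc {

    public record Segment(int generation, boolean reachable) { }

    /** cleanup(): deletes segments at least 2 generations old. */
    public static long cleanup(List<Segment> store, int currentGen) {
        long before = store.size();
        store.removeIf(s -> s.generation() <= currentGen - 2);
        return before - store.size();
    }

    /** One full GC cycle: rewrite reachable segments into a new
     *  generation, then run cleanup against that generation. */
    public static long fullGcCycle(List<Segment> store, int newGeneration) {
        for (int i = 0; i < store.size(); i++) {
            if (store.get(i).reachable()) {
                store.set(i, new Segment(newGeneration, true));
            }
        }
        return cleanup(store, newGeneration);
    }

    public static void main(String[] args) {
        // All data sits in the current generation (3), so cleanup alone
        // finds nothing eligible: reclaim is 0.
        List<Segment> store = new ArrayList<>(List.of(
                new Segment(3, true), new Segment(3, false), new Segment(3, false)));
        System.out.println("cleanup-only reclaimed: " + cleanup(store, 3));

        // First compaction cycle: live data moves to gen 4, but gen-3
        // garbage is still N-1 (retained), so reclaim is again 0.
        System.out.println("cycle 1 reclaimed: " + fullGcCycle(store, 4));

        // Second cycle: gen-3 garbage is now N-2 and gets deleted.
        System.out.println("cycle 2 reclaimed: " + fullGcCycle(store, 5));
    }
}
```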

Generational Model

How Generations Work

  1. Generation N is the current (active) generation
  2. Generation N-1 is retained as a safety buffer
  3. Generation N-2 is deleted during cleanup

Why Retain 2 Generations?

Readers may hold references to segments in the previous generation. Keeping 2 generations ensures no reader sees a "segment not found" error during GC.

This is a trade-off:

  • More retained generations = more disk space
  • Fewer retained generations = risk of read failures

Oak fixes this at 2 generations (not configurable since Oak 1.8).
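The retention rule reduces to a one-line predicate. A minimal sketch of it (the class and method names are illustrative):

```java
// The 2-generation retention rule as a predicate: with current
// generation N, segments in N and N-1 survive; N-2 and older are
// eligible for deletion during cleanup.
public class GenerationRetention {

    static final int RETAINED_GENERATIONS = 2; // fixed since Oak 1.8

    public static boolean isReclaimable(int segmentGeneration, int currentGeneration) {
        return segmentGeneration <= currentGeneration - RETAINED_GENERATIONS;
    }

    public static void main(String[] args) {
        int current = 5;
        System.out.println(isReclaimable(5, current)); // false: active generation
        System.out.println(isReclaimable(4, current)); // false: safety buffer
        System.out.println(isReclaimable(3, current)); // true: deleted by cleanup
    }
}
```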


TAR File Cleanup

The Cleanup Process

  1. Scan each TAR file's segment index
  2. Check if segments are referenced by current generation
  3. Mark empty TAR files for deletion
  4. Rewrite partial TAR files (optional, to reclaim more space)
  5. Reaper thread deletes marked files

TAR File Naming

data00000a.tar  ← Generation 1
data00001a.tar  ← Generation 1
data00000b.tar  ← Generation 2
data00001b.tar  ← Generation 2

The letter suffix indicates the generation. During cleanup, entire generations of TAR files are removed.
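Following the naming scheme shown above ('a' = generation 1, 'b' = generation 2, and so on), the generation can be parsed out of a file name. This is a standalone sketch, not Oak's internal parser:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parses the generation letter out of a TAR data file name, per the
// naming scheme shown above. Illustrative, not Oak's internal code.
public class TarFileName {

    private static final Pattern NAME = Pattern.compile("data(\\d{5})([a-z])\\.tar");

    /** Returns the 1-based generation encoded in the letter suffix. */
    public static int generationOf(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a TAR data file: " + fileName);
        }
        return m.group(2).charAt(0) - 'a' + 1;
    }

    public static void main(String[] args) {
        System.out.println(generationOf("data00000a.tar")); // 1
        System.out.println(generationOf("data00001b.tar")); // 2
    }
}
```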


Consensus GC & GC Debt Economics

Consensus GC (Only Mode)

In Oak Chain, there is no standalone local/offline GC mode for validators. All GC is coordinated through consensus:

  • ✅ Leader creates signed GC proposal
  • ✅ Proposal replicated via Aeron Raft
  • ✅ All validators compact deterministically
  • ✅ Same input → Same output (critical!)
  • ✅ GC committed to Raft log

Why No Offline GC?

In a distributed consensus system, all validators must have identical state. If one validator ran "offline" GC independently, its segment store would diverge from others, breaking consensus. GC must be coordinated through Raft to maintain determinism.

GC Debt Model (ADR 017)

Delete operations have economic implications:

| Concept | Description |
|---|---|
| GC Debt | Delete operations incur debt (estimated cleanup cost) |
| Per-Wallet Tracking | Debt tracked per Ethereum wallet address |
| Write Blocking | Writes blocked if debt exceeds limit |
| Payment | Pay ETH to ValidatorPayment contract to clear debt |
| Incentive | Validators incentivized to run GC (reduces storage costs) |
Delete Operation → Debt Accrual → [If over limit] → Writes Blocked → Pay ETH → Writes Unblocked

This creates a sustainable economic model where:

  • Authors pay for the storage cost of their content
  • Delete operations aren't "free" (they incur GC debt)
  • Validators are compensated for storage and GC overhead
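The accrual-block-payment loop can be sketched as a small ledger. Everything here is illustrative: the class, the debt limit, and the amounts are assumptions, and the real system tracks debt through consensus, not in-memory maps.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the ADR 017 debt loop: deletes accrue estimated
// cleanup cost per wallet, writes are blocked once debt exceeds the
// limit, and a payment clears it. Names and amounts are illustrative.
public class GcDebtLedger {

    private final Map<String, Long> debt = new HashMap<>();
    private final long debtLimit;

    public GcDebtLedger(long debtLimit) {
        this.debtLimit = debtLimit;
    }

    /** A delete accrues estimated cleanup cost against the caller's wallet. */
    public void recordDelete(String wallet, long estimatedCost) {
        debt.merge(wallet, estimatedCost, Long::sum);
    }

    public boolean writesBlocked(String wallet) {
        return debt.getOrDefault(wallet, 0L) > debtLimit;
    }

    /** Payment (e.g. ETH to the ValidatorPayment contract) reduces debt. */
    public void recordPayment(String wallet, long amount) {
        debt.merge(wallet, -amount, Long::sum);
    }

    public static void main(String[] args) {
        GcDebtLedger ledger = new GcDebtLedger(100);
        ledger.recordDelete("0xabc", 150);
        System.out.println(ledger.writesBlocked("0xabc")); // true
        ledger.recordPayment("0xabc", 150);
        System.out.println(ledger.writesBlocked("0xabc")); // false
    }
}
```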

Periodic Debt Conversion Job

PeriodicGCJob is a scheduler for the GC account/debt model. It is not a GC proposal sweeper.

  • Runs once per epoch interval
  • Epoch cadence follows the active chain/finality profile.
  • Converts account pendingDebt → executedDebt
  • Enforces writesBlocked when executed debt exceeds limit
  • Logs blocked entities and debt conversion activity

Proposal lifecycle (/v1/propose-gc, /v1/gc/vote, /v1/gc/execute) is handled separately by GCProposalManager.
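The per-epoch conversion described above can be sketched as follows. The field names pendingDebt, executedDebt, and writesBlocked come from the description; the account class, the method shape, and the limit value are assumptions.

```java
// Sketch of the per-epoch conversion PeriodicGCJob performs on a GC
// account. Field names follow the description above; the rest is
// an illustrative assumption, not the actual implementation.
public class DebtConversionSketch {

    public static class GcAccount {
        public long pendingDebt;
        public long executedDebt;
        public boolean writesBlocked;
    }

    /** Once per epoch: move pending debt to executed, re-check the limit. */
    public static void convert(GcAccount account, long executedDebtLimit) {
        account.executedDebt += account.pendingDebt;
        account.pendingDebt = 0;
        account.writesBlocked = account.executedDebt > executedDebtLimit;
    }

    public static void main(String[] args) {
        GcAccount account = new GcAccount();
        account.pendingDebt = 120;
        convert(account, 100);
        System.out.println(account.executedDebt + " blocked=" + account.writesBlocked);
    }
}
```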


Monitoring GC (API-First)

Oak Chain validators are operated through HTTP APIs and logs. Do not rely on AEM/Felix/JMX consoles.

Primary GC/Fragmentation APIs

  • GET /v1/gc/status
  • GET /v1/gc/estimate?wallet=0x...
  • GET /v1/gc/account/{wallet}
  • POST /v1/propose-gc
  • POST /v1/gc/execute
  • POST /v1/gc/trigger
  • GET /v1/fragmentation/metrics
  • GET /v1/fragmentation/metrics/{wallet}
  • GET /v1/fragmentation/top
  • GET /v1/compaction/proposals
  • GET /v1/proposals/queue/stats (to confirm finalization/drain state before and after GC)

Representative Log Messages

TarMK GC #2: started
TarMK GC #2: estimation started
TarMK GC #2: estimation completed in 961.8 μs. Estimated garbage: 45%
TarMK GC #2: compaction started, gc options=...
TarMK GC #2: running full compaction
TarMK GC #2: compaction succeeded in 6.580 min, after 2 cycles
TarMK GC #2: cleanup started
TarMK GC #2: cleanup completed in 16.23 min. Post cleanup size is 10.4 GB and space reclaimed 84.5 GB.

Operational Nuance (Observed in E2E on February 8, 2026)

  • GC proposal/voting/execution can succeed while reclaim is 0 B.
  • Example evidence:
    • TarMK GC #0: cleanup started using reclaimer (full generation older than 0.0, with 2 retained generations)
    • cleanup marking files for deletion: none
    • cleanup completed ... space reclaimed 0 B
  • This means revision cleanup ran, but no TAR files were eligible for deletion in that cycle.

Best Practices

For Oak Chain Validators

  1. Run consensus GC only (no local one-off compaction path).
  2. Confirm queue settled before GC (verifiedCount=0, batchQueueSize=0).
  3. Monitor via HTTP endpoints + logs, not JMX/Felix.
  4. Size disks for churn headroom; delete-heavy workloads reclaim over cycles, not instantly.

Tuning Parameters

| Parameter | Default | Description |
|---|---|---|
| gcSizeDeltaEstimation | 1 GB | New content threshold to trigger GC |
| retryCount | 5 | Max compaction retry cycles |
| forceTimeout | 60 s | Max time to block writes for force compaction |
| memoryThreshold | 5% | Min free heap to continue compaction |

Common Issues

| Issue | Cause | Solution |
|---|---|---|
| GC never completes | High write load | Schedule during quiet periods |
| Disk keeps growing | GC not running | Check scheduler, run manually |
| "Segment not found" | Divergent state or transport/storage issue | Validate cluster consistency and segment transfer paths |
| Slow compaction | Large store | Use tail compaction, add RAM |

The Economics of GC

In Oak Chain, GC has economic implications:

  • Validators pay for storage (disk costs)
  • Authors pay for writes (ETH per write)
  • GC is a validator cost, not author cost

This creates an incentive for validators to:

  1. Run efficient GC to minimize storage costs
  2. Price writes to cover long-term storage + GC overhead
  3. Encourage authors to avoid unnecessary content churn

Apache 2.0 Licensed