Segment Store Garbage Collection
Oak's segment store uses generational garbage collection to reclaim disk space. Content writes create immutable segments, but edits and deletes leave "garbage" (unreachable segments). GC compacts live data into a new generation and deletes old TAR files.
This is the "cost" of the append-only DAG architecture. Understanding it is important for operating Oak Chain at scale.
⚠️ Key Difference: Consensus-Based GC
Oak Chain is Different
In traditional Oak, GC is a local operation. In Oak Chain, GC must go through Raft consensus to maintain deterministic state across all validators. There is no "offline" mode in a distributed consensus system.
Consensus-Based GC Process
In Oak Chain, GC is proposal-based through Raft consensus:
Step 1: Epoch Trigger
GC is triggered by Ethereum epoch finalization:
- Ethereum block finality triggers GC check
- Only the Raft leader can propose GC
- Leader checks if garbage threshold is exceeded
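The leader-side trigger check can be sketched as follows. `GcTrigger`, its method name, and the 30% threshold are illustrative, not Oak Chain APIs; the real threshold check lives in the consensus layer.

```java
// Hypothetical sketch of the leader-side GC trigger described above.
public class GcTrigger {
    /** Fraction of the store estimated to be garbage before GC is worth proposing (invented value). */
    static final double GARBAGE_THRESHOLD = 0.30;

    /**
     * Called when an Ethereum epoch is finalized. Only the current Raft
     * leader may propose GC; followers always return false.
     */
    public static boolean shouldProposeGc(boolean isRaftLeader, double estimatedGarbageRatio) {
        if (!isRaftLeader) {
            return false; // followers never propose; they only vote and apply
        }
        return estimatedGarbageRatio > GARBAGE_THRESHOLD;
    }
}
```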
Step 2: GC Proposal
The leader creates a signed GC proposal:
- Proposal type: `COMPACT`
- Signed with leader's wallet
- Includes epoch reference for ordering
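The shape of such a proposal might look like the record below. The field names and the payload encoding are assumptions for illustration, not the actual Oak Chain wire format.

```java
// Illustrative shape of a signed GC proposal.
public record GcProposal(
        String type,          // e.g. "COMPACT"
        long epoch,           // finalized Ethereum epoch, used for ordering
        String leaderWallet,  // Ethereum address of the proposing leader
        String signature) {   // leader's wallet signature over the payload

    /** Canonical payload the leader signs and followers verify. */
    public String signedPayload() {
        return type + "|" + epoch + "|" + leaderWallet;
    }
}
```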
Step 3: Raft Consensus
Proposal is replicated to all validators:
- Broadcast via Aeron Raft
- Requires majority acknowledgment
- All validators receive identical proposal
Step 4: Deterministic GC Execution
Critical: All validators must apply the same GC decision:
- Same input → Same output
- Each node applies the agreed GC operation locally
- Results must be identical (hash verification)
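The "same input → same output" requirement reduces to a digest comparison after each GC application. The canonical state string below is a stand-in for however Oak Chain actually encodes post-GC state; the SHA-256 comparison is the idea, not the implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

// Sketch of the post-GC hash check: every validator hashes its resulting
// segment-store state and the digests must match across the cluster.
public class GcHashCheck {
    public static String stateHash(String canonicalState) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(
                    sha.digest(canonicalState.getBytes(StandardCharsets.UTF_8)));
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    /** All validators must arrive at the same digest after applying the agreed GC. */
    public static boolean identical(String hashA, String hashB) {
        return hashA.equals(hashB);
    }
}
```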
Current Implementation Note (as of February 8, 2026)
The current consensus execution path in `GCProposalManager` invokes `FileStore.cleanup()` directly.
This is deterministic and consensus-safe, but it is not the full `fullGC()` pipeline (estimation + compaction + cleanup).
Step 5: Consensus Commit
GC is committed to the Raft log:
- Committed at specific Raft log index
- Durable once majority confirms
- All validators reclaim same space
Deep Dive: Compaction
What Gets Compacted? (Target Full GC Behavior)
- Journal Head → The current repository state
- Checkpoints → Async indexing save points (compacted first, with deduplication)
- Content Tree → All reachable nodes and properties
The Reachability Problem
Segment graphs are extremely dense. A single reachable record in a segment keeps the entire segment alive, along with all segments it references. This is why:
- Typical stores have 70-90% garbage
- But cleanup may only reclaim 50-80% of it
- Multiple GC cycles are needed for full cleanup
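A toy reachability walk shows why density hurts: liveness is tracked per segment, so one live record pins its whole segment, which in turn pins every segment that segment references. The types and names below are illustrative, not Oak's actual segment graph API.

```java
import java.util.*;

// Toy model of segment-level reachability: walk the reference graph from a
// root segment and collect everything transitively pinned by it.
public class Reachability {
    /** refs: segment id -> ids of segments it references */
    public static Set<String> liveSegments(Map<String, Set<String>> refs, String rootSegment) {
        Set<String> live = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(rootSegment));
        while (!todo.isEmpty()) {
            String s = todo.pop();
            if (live.add(s)) {
                // one live record in s keeps s alive, plus everything s references
                todo.addAll(refs.getOrDefault(s, Set.of()));
            }
        }
        return live;
    }
}
```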
Compaction Behavior
| Behavior | Description | Speed | Thoroughness |
|---|---|---|---|
| Incremental cycles | Compact current reachable graph and reclaim unreachable generations | Fast | Progressive |
| Deep reclamation | Multiple cycles to converge after heavy churn/write-delete storms | Slower | Higher reclaim |
In Oak Chain, compaction is consensus-driven and applied deterministically across validators. Recovery after heavy churn is achieved through repeated consensus GC cycles, not standalone local "offline GC".
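The progressive-reclaim behavior can be illustrated with a toy model: if each consensus GC cycle reclaims some fraction of the remaining garbage, recovery from heavy churn converges over several cycles rather than in one pass. The 50% per-cycle rate below is invented for illustration.

```java
// Toy illustration of the "multiple cycles" behavior: garbage remaining
// after n cycles, given a per-cycle reclaim rate.
public class IterativeReclaim {
    public static double garbageAfterCycles(double initialGarbageBytes,
                                            double reclaimRatePerCycle,
                                            int cycles) {
        double remaining = initialGarbageBytes;
        for (int i = 0; i < cycles; i++) {
            remaining *= (1.0 - reclaimRatePerCycle); // each cycle reclaims a fraction
        }
        return remaining;
    }
}
```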
Cleanup-Only vs Full GC
Current behavior:
- Consensus flow can execute `cleanup()` successfully and deterministically.
- `cleanup()` reclaims segments/files eligible by generation/reclaimer rules.
- If no eligible old generations/files exist, cleanup runs but may reclaim 0 B.
Target behavior:
- Execute full compaction + cleanup (`fullGC()`, or `compactFull()` then `cleanup()`) in the consensus path.
- This should improve reclaim after write/delete churn when data is still in current generations.
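The target path can be sketched against a stand-in store interface. Only the method names `compactFull()` and `cleanup()` come from this document; the `SegmentStore` interface and return types below are assumptions, not Oak's actual API.

```java
// Sketch of the target consensus path: full compaction first, then cleanup.
public class FullGcPath {
    public interface SegmentStore {
        void compactFull(); // rewrite the reachable graph into a new generation
        long cleanup();     // delete now-unreferenced files, return bytes reclaimed
    }

    /** Target behavior: compaction, then cleanup, applied deterministically on each node. */
    public static long fullGC(SegmentStore store) {
        store.compactFull();
        return store.cleanup();
    }
}
```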
Generational Model
How Generations Work
- Generation N is the current (active) generation
- Generation N-1 is retained as a safety buffer
- Generation N-2 is deleted during cleanup
Why Retain 2 Generations?
Readers may hold references to segments in the previous generation. Keeping 2 generations ensures no reader sees a "segment not found" error during GC.
This is a trade-off:
- More retained generations = more disk space
- Fewer retained generations = risk of read failures
Oak fixes this at 2 generations (not configurable since Oak 1.8).
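The retention rule reduces to a one-line eligibility check, sketched here with illustrative names:

```java
// With current generation N, generation N-1 is kept as a reader safety
// buffer; anything at N-2 or older is eligible for deletion.
public class GenerationPolicy {
    static final int RETAINED_GENERATIONS = 2; // fixed in Oak since 1.8

    public static boolean reclaimable(int segmentGeneration, int currentGeneration) {
        return segmentGeneration <= currentGeneration - RETAINED_GENERATIONS;
    }
}
```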
TAR File Cleanup
The Cleanup Process
- Scan each TAR file's segment index
- Check if segments are referenced by current generation
- Mark empty TAR files for deletion
- Rewrite partial TAR files (optional, to reclaim more space)
- Reaper thread deletes marked files
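The marking step above can be sketched as a pure function over the TAR segment indexes; the map types here are illustrative stand-ins for Oak's index structures.

```java
import java.util.*;

// Sketch of steps 1-3 and 5: scan each TAR's segment index, keep files that
// still contain a referenced segment, mark fully-unreferenced files for the
// reaper thread to delete.
public class TarCleanup {
    /** tarIndex: TAR file name -> segment ids stored in it */
    public static List<String> markForDeletion(Map<String, Set<String>> tarIndex,
                                               Set<String> referenced) {
        List<String> doomed = new ArrayList<>();
        for (var e : tarIndex.entrySet()) {
            boolean anyLive = e.getValue().stream().anyMatch(referenced::contains);
            if (!anyLive) {
                doomed.add(e.getKey()); // reaper deletes these later
            }
        }
        return doomed;
    }
}
```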
TAR File Naming
```
data00000a.tar ← Generation 1
data00001a.tar ← Generation 1
data00000b.tar ← Generation 2
data00001b.tar ← Generation 2
```

The letter suffix indicates the generation. During cleanup, entire generations of TAR files are removed.
Consensus GC & GC Debt Economics
Consensus GC (Only Mode)
In Oak Chain, there is no standalone local/offline GC mode for validators. All GC is coordinated through consensus:
- ✅ Leader creates signed GC proposal
- ✅ Proposal replicated via Aeron Raft
- ✅ All validators compact deterministically
- ✅ Same input → Same output (critical!)
- ✅ GC committed to Raft log
Why No Offline GC?
In a distributed consensus system, all validators must have identical state. If one validator ran "offline" GC independently, its segment store would diverge from others, breaking consensus. GC must be coordinated through Raft to maintain determinism.
GC Debt Model (ADR 017)
Delete operations have economic implications:
| Concept | Description |
|---|---|
| GC Debt | Delete operations incur debt (estimated cleanup cost) |
| Per-Wallet Tracking | Debt tracked per Ethereum wallet address |
| Write Blocking | Writes blocked if debt exceeds limit |
| Payment | Pay ETH to ValidatorPayment contract to clear debt |
| Incentive | Validators incentivized to run GC (reduces storage costs) |
```
Delete Operation → Debt Accrual → [If over limit] → Writes Blocked
                                                          ↓
                                                       Pay ETH
                                                          ↓
                                                   Writes Unblocked
```

This creates a sustainable economic model where:
- Authors pay for the storage cost of their content
- Delete operations aren't "free" (they incur GC debt)
- Validators are compensated for storage and GC overhead
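The accrual/payment flow above can be sketched as a per-wallet account. The field names and wei amounts are illustrative; they are not the ADR 017 data model.

```java
// Minimal sketch of the debt flow: deletes accrue estimated cleanup cost,
// writes are blocked once the limit is exceeded, ETH payment clears debt.
public class GcDebtAccount {
    private long debtWei;
    private final long limitWei;

    public GcDebtAccount(long limitWei) { this.limitWei = limitWei; }

    /** Each delete accrues its estimated cleanup cost as debt. */
    public void recordDelete(long estimatedCleanupCostWei) {
        debtWei += estimatedCleanupCostWei;
    }

    /** Payment to the ValidatorPayment contract reduces the wallet's debt. */
    public void payEth(long amountWei) {
        debtWei = Math.max(0, debtWei - amountWei);
    }

    public boolean writesBlocked() { return debtWei > limitWei; }
}
```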
Periodic Debt Conversion Job
`PeriodicGCJob` is a scheduler for the GC account/debt model. It is not a GC proposal sweeper.
- Runs once per epoch interval
- Epoch cadence follows the active chain/finality profile.
- Converts account `pendingDebt` → `executedDebt`
- Enforces `writesBlocked` when executed debt exceeds limit
- Logs blocked entities and debt conversion activity
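The conversion step can be sketched as below; the field names mirror the list above, but the class and its visibility are illustrative, not the actual `PeriodicGCJob` implementation.

```java
// Sketch of the per-epoch debt conversion: pendingDebt moves to executedDebt
// once per epoch interval, and writesBlocked is derived from executedDebt.
public class DebtConversion {
    public long pendingDebt;
    public long executedDebt;
    public final long limit;

    public DebtConversion(long limit) { this.limit = limit; }

    /** Runs once per epoch interval. */
    public void runEpochJob() {
        executedDebt += pendingDebt;
        pendingDebt = 0;
    }

    public boolean writesBlocked() { return executedDebt > limit; }
}
```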
Proposal lifecycle (`/v1/propose-gc`, `/v1/gc/vote`, `/v1/gc/execute`) is handled separately by `GCProposalManager`.
Monitoring GC (API-First)
Oak Chain validators are operated through HTTP APIs and logs. Do not rely on AEM/Felix/JMX consoles.
Primary GC/Fragmentation APIs
- `GET /v1/gc/status`
- `GET /v1/gc/estimate?wallet=0x...`
- `GET /v1/gc/account/{wallet}`
- `POST /v1/propose-gc`
- `POST /v1/gc/execute`
- `POST /v1/gc/trigger`
- `GET /v1/fragmentation/metrics`
- `GET /v1/fragmentation/metrics/{wallet}`
- `GET /v1/fragmentation/top`
- `GET /v1/compaction/proposals`
- `GET /v1/proposals/queue/stats` (to confirm finalization/drain state before and after GC)
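A pre-GC readiness helper built on these endpoints might look like this sketch. The URL construction uses the paths listed above; the settled condition (`verifiedCount=0`, `batchQueueSize=0`) comes from this document's best practices, while the class and method names are invented.

```java
// Hypothetical readiness check: poll queue stats and only propose GC once
// the proposal queue has drained. JSON fetching/parsing is left out.
public class GcReadiness {
    public static String queueStatsUrl(String validatorBase) {
        return validatorBase + "/v1/proposals/queue/stats";
    }

    /** Queue is settled when nothing is pending verification or batching. */
    public static boolean queueSettled(int verifiedCount, int batchQueueSize) {
        return verifiedCount == 0 && batchQueueSize == 0;
    }
}
```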
Representative Log Messages
```
TarMK GC #2: started
TarMK GC #2: estimation started
TarMK GC #2: estimation completed in 961.8 μs. Estimated garbage: 45%
TarMK GC #2: compaction started, gc options=...
TarMK GC #2: running full compaction
TarMK GC #2: compaction succeeded in 6.580 min, after 2 cycles
TarMK GC #2: cleanup started
TarMK GC #2: cleanup completed in 16.23 min. Post cleanup size is 10.4 GB and space reclaimed 84.5 GB.
```

Operational Nuance (Observed in E2E on February 8, 2026)
- GC proposal/voting/execution can succeed while reclaim is 0 B.
- Example evidence:
  - `TarMK GC #0: cleanup started using reclaimer (full generation older than 0.0, with 2 retained generations)`
  - `cleanup marking files for deletion: none`
  - `cleanup completed ... space reclaimed 0 B`
- This means revision cleanup ran, but no TAR files were eligible for deletion in that cycle.
Best Practices
For Oak Chain Validators
- Run consensus GC only (no local one-off compaction path).
- Confirm queue settled before GC (`verifiedCount=0`, `batchQueueSize=0`).
- Monitor via HTTP endpoints + logs, not JMX/Felix.
- Size disks for churn headroom; delete-heavy workloads reclaim over cycles, not instantly.
Tuning Parameters
| Parameter | Default | Description |
|---|---|---|
| `gcSizeDeltaEstimation` | 1 GB | New content threshold to trigger GC |
| `retryCount` | 5 | Max compaction retry cycles |
| `forceTimeout` | 60s | Max time to block writes for forced compaction |
| `memoryThreshold` | 5% | Min free heap to continue compaction |
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| GC never completes | High write load | Schedule during quiet periods |
| Disk keeps growing | GC not running | Check the epoch scheduler; trigger via `POST /v1/gc/trigger` |
| "Segment not found" | Divergent state or transport/storage issue | Validate cluster consistency and segment transfer paths |
| Slow compaction | Large store | Use tail compaction, add RAM |
The Economics of GC
In Oak Chain, GC has economic implications:
- Validators pay for storage (disk costs)
- Authors pay for writes (ETH per write)
- GC is a validator cost, not author cost
This creates an incentive for validators to:
- Run efficient GC to minimize storage costs
- Price writes to cover long-term storage + GC overhead
- Encourage authors to avoid unnecessary content churn