
ADR Cost/Benefit Analysis

A comprehensive accounting of every architectural decision record for PodPedia (Project Sunesis), with quantified costs, validated benefits, and empirical evidence where available.

Last updated: 2026-05-14
Scope: All ADRs 001–011 (podpedia-app)
Author: @regular


Quick Reference

| ID | Title | Status | Implemented? | Validated? | ROI Estimate |
|---|---|---|---|---|---|
| ADR-001 | 10K Chunk Threshold | ✅ Accepted | ✅ development | ⚠️ Inconclusive | 🟢 High |
| ADR-002 | Vertex AI Context Caching | ✅ Accepted | feature/ | ❌ Not yet | 🟢 High |
| ADR-003 | SSE Streaming JSON ASTs | ⏳ Provisional | ❌ No | ❌ N/A | 🟡 Medium |
| ADR-004 | Flash-Lite Extraction Engine | ⏳ Provisional | ❌ No | ❌ N/A | 🟡 Medium |
| ADR-005 | Universal LLM Adapter | ✅ Accepted | ✅ development | ❌ Not tested | 🟢 High |
| ADR-006 | Vendor-Neutral Blob Storage | ✅ Accepted | ✅ development | ❌ Not tested | 🟢 High |
| ADR-007 | Dual GraphDB Strategy | ✅ Accepted | ✅ development | ⚠️ Incomplete (rerun data pending) | 🟢 High |
| ADR-008 | Weighted Ensemble Entity Resolution | ✅ Accepted | ✅ development | ❌ Not tested | 🟢 High |
| ADR-009 | Formalized Experiment Tracking | ✅ Accepted | experiments/ | ⏳ EXP-001 pending | 🟡 Medium |
| ADR-010 | Async Pipeline Production Readiness | ✅ Accepted | ✅ development | ❌ Not tested | 🟢 High |
| ADR-011 | Graph Download & Export | ✅ Accepted | ❌ Not built | ❌ N/A | 🟡 Medium |

ADR-001: 10K Character Threshold for Parallel Graph Extraction

Status: Accepted — merged to development, not yet on main

Costs

| Category | Estimate | Notes |
|---|---|---|
| Engineering | ~1 hour | One-line threshold change (20,000 → 10,000). The configuration work was trivial (see the chunking sketch below). |
| Token overhead | Negligible | 200-char overlap across more chunks. For a 15K document: 2 chunks instead of 1, adding 200 tokens of overlapping context. At Vertex AI pricing ($0.000125/1K input tokens), this is ~$0.000025 per job. |
| Graph integrity risk | Low | 10K chars ≈ 1,500–2,000 words is sufficient semantic aperture. The rejected 3K option would have created orphaned nodes. |
| Complexity | None | Zero new code. The parallelization infrastructure (alitto/pond pool) already existed. |
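
The table above leans on the threshold/overlap mechanics. A minimal sketch of the behaviour being tuned is below; the names (chunkThreshold, chunkOverlap, splitIntoChunks) are illustrative, not the project's actual identifiers, and splitting is done on bytes for brevity.

```go
// Illustrative sketch of ADR-001's chunking behaviour: split at a configurable
// character threshold, repeating a small overlap so the extractor keeps
// cross-boundary context.
package main

import "fmt"

const (
	chunkThreshold = 10_000 // was 20,000 before ADR-001
	chunkOverlap   = 200    // characters carried over between adjacent chunks
)

func splitIntoChunks(text string) []string {
	if len(text) <= chunkThreshold {
		return []string{text}
	}
	var chunks []string
	for start := 0; start < len(text); {
		end := start + chunkThreshold
		if end > len(text) {
			end = len(text)
		}
		chunks = append(chunks, text[start:end])
		if end == len(text) {
			break
		}
		// Back up by the overlap so the next chunk re-reads the boundary.
		start = end - chunkOverlap
	}
	return chunks
}

func main() {
	doc := make([]byte, 15_000)
	fmt.Println(len(splitIntoChunks(string(doc)))) // a 15K-char doc yields 2 chunks
}
```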

Benefits

| Metric | Expected | Validated |
|---|---|---|
| Latency (15K-char docs) | 14s → ≤7s (sequential → parallel across 8-worker pool) | ⚠️ Experiment showed −19.2% (45.8s → 37.0s, staging), but p=0.31 with n=15. Directionally correct, not significant. |
| Latency (25K-char docs) | Similar improvement factor | ✅ −68.2% (116.2s → 37.0s, p<0.001). Massive improvement, partially confounded by environment differences. |
| Concurrency | Restores full 10-way semaphore | Unblocked the weighted-semaphore bottleneck. Chunks now fit under 5K tokens, avoiding the weight=2 penalty. |
| Code footprint | +0 lines | Pure config change. |

Verdict

ROI: 🟢 High. Zero implementation cost, zero ongoing cost, and directional latency improvements at every payload size. The 15K case needs statistical confirmation, but even if the true effect is only 10% (not the measured 19%), the change costs nothing. There is no downside.

Re-run Recommendation

Re-run the A/B with controlled Cloud Run instance configuration and ≥30 trials for the 15K payload. Until then, accept the merge with the acknowledgment that the benefit is directionally confirmed but not statistically proven.


ADR-002: Vertex AI Context Caching via Deep Ontology Prompting

Status: Accepted — implemented on feature/adr-002-context-caching, pushed to origin

Costs

| Category | Estimate | Notes |
|---|---|---|
| Engineering (data curation) | HIGH — 5–10 hours | Creating the 36K-token deep ontology with 56 golden few-shot examples is the most labor-intensive task in any ADR. The examples must be meticulously crafted to cover: simple extraction, multi-entity chains, entity normalization, cross-type blocking, temporal relationships, hierarchical orgs, compound entities, empty extraction, and many edge cases. |
| Engineering (implementation) | ~3–4 hours | Cache lifecycle management: InitVertexCache, RefreshCache, background TTL goroutine (sketched below), model setup routing. CachedContent enforces a 32K-token minimum — the ontology must be kept above this threshold or caching fails silently. |
| Runtime cost (caching) | Ongoing hourly billing | Vertex AI CachedContent incurs a storage cost while alive. At ~36K tokens, this is roughly $0.002–0.005/hour (~$0.05–0.12 per 24h). The 1-hour TTL default means the cache is created on demand and expires quickly during idle periods; background refresh keeps it alive during active use. |
| Runtime cost (tokens saved) | Negative cost | Without caching, the system prompt is sent with every request. With caching, it is sent once and referenced by ID. For a 36K-token system prompt × 100 chunks/day, that's 3.6M tokens/day saved in input costs — roughly $0.45/day at Vertex AI pricing. At that volume the cache storage cost ($0.05–0.12/day) is roughly 4–9× lower than the token savings. |
| Complexity | Moderate | Background goroutine for TTL refresh, cache-name routing, async cleanup. Needs careful goroutine lifecycle management. |
| Risk | Medium | If the ontology drops below 32K tokens, caching silently fails and the LLM receives no system prompt at all — output degrades catastrophically. This must be caught in CI. |
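
A minimal sketch of the background TTL-refresh loop referenced above, assuming a RefreshCache-style helper that re-ups the CachedContent TTL; the function and interval names are illustrative, not the project's actual identifiers.

```go
// keepCacheAlive refreshes the Vertex AI context cache on a fixed interval so
// the 1-hour TTL never lapses while the service is handling traffic. It stops
// when ctx is cancelled (e.g. on shutdown).
package cache

import (
	"context"
	"log/slog"
	"time"
)

func keepCacheAlive(ctx context.Context, refresh func(context.Context) error, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := refresh(ctx); err != nil {
				// A failed refresh means the next request may pay the full
				// system-prompt cost; log and retry on the next tick.
				slog.Warn("context cache refresh failed", "error", err)
			}
		}
	}
}
```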

Benefits

| Metric | Expected | Validated |
|---|---|---|
| TTFT per chunk | Seconds → milliseconds | Not yet tested. The expected effect is dramatic: every concurrent chunk skips the system-prompt inference step. |
| End-to-end latency | 14s → ~5–8s (stacked with ADR-001) | Not yet tested. Combined effect with ADR-001's fan-out: 8 chunks hit the cached context simultaneously, each with near-zero TTFT. |
| Output quality | Significant improvement | The 56 golden few-shot examples effectively fine-tune the model in-context, eliminating schema drift, hallucinated entity types, and inconsistent JSON output. |
| Token cost per request | ~36K tokens saved per chunk | For 100 chunks/day: $0.45/day saved at Vertex AI input pricing. |

Verdict

ROI: 🟢 High. Significant up-front data curation cost, but the runtime benefits are compounding: (a) latency reduction, (b) output quality improvement from few-shot examples, (c) token cost savings that quickly amortize the data curation investment. The ongoing caching cost (~$0.05–0.12/day) is negligible.

Dependencies

Depends on ADR-005 (Universal LLM Adapter): context caching is Vertex-specific and rides on the adapter's preserved Vertex AI path (see Cross-ADR Dependencies below).

ADR-003: Server-Sent Events for Streaming JSON ASTs (Provisional)

Status: ⏳ Provisional — pending streaming JSON parser validation

Costs

| Category | Estimate | Notes |
|---|---|---|
| Engineering (backend parser) | HIGH — 8–15 hours | A custom, fault-tolerant streaming JSON parser in Go is the critical path. Must handle: unclosed quotes, truncated keys, missing brackets, self-correcting LLM output, and panic-free partial AST assembly. This is non-trivial parser engineering. |
| Engineering (frontend) | ~4–6 hours | Replace polling (GET /api/status) with an EventSource connection. Handle duplicate/revised nodes. Incremental vis.js hydration. State management for partial results. |
| Complexity | HIGH | The streaming parser is the riskiest component in the entire system. A panicking parser brings down the goroutine; an incorrect parser produces corrupted graph state. No off-the-shelf solution exists for LLM JSON streaming. |
| Maintenance burden | Medium | The parser is entirely custom code with no community maintenance. Any change to the LLM's output format (even whitespace or key ordering) could break it. |
| Risk | HIGH | If the parser validation fails (or a future LLM update breaks it), SSE falls back to polling anyway — requiring both code paths to be maintained. |

Benefits

| Metric | Expected | Validated |
|---|---|---|
| Time-to-first-paint | ~14s → <1s | Not validated. The perceived UX improvement is significant: a live "typing" visualization vs. a loading spinner. |
| User engagement | Qualitative | Potential improvement. Live-hydrating graphs are visually engaging and signal progress. Hard to quantify. |
| Backend efficiency | Negligible | The same LLM computation happens either way — streaming just changes the delivery timing. No backend cost savings. |
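
For context on where the cost sits: the SSE delivery side is mechanically simple compared to the parser. A minimal Go sketch, assuming the hard part (the streaming JSON parser) feeds partial-graph payloads into a channel; the handler shape and event payload are illustrative.

```go
// streamGraph writes each partial-AST payload as an SSE data frame. Producing
// the events is the risky work; delivering them is a flush loop.
package sse

import (
	"fmt"
	"net/http"
)

func streamGraph(w http.ResponseWriter, r *http.Request, events <-chan string) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")

	for {
		select {
		case <-r.Context().Done():
			return // client disconnected
		case ev, open := <-events:
			if !open {
				return // extraction finished
			}
			fmt.Fprintf(w, "data: %s\n\n", ev)
			flusher.Flush()
		}
	}
}
```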

Verdict

ROI: 🟡 Medium. The UX improvement is real and meaningful, but the engineering cost is disproportionate unless benchmarks confirm the latency problem is severe enough that a loading spinner is untenable. Recommendation: Keep provisional. Implement only if (a) ADR-001 + ADR-002 don't bring latency below 5s, and (b) user feedback indicates spinner intolerance. The parser risk alone justifies deferral.


ADR-004: Flash-Lite Extraction Engine (Provisional)

Status: ⏳ Provisional — pending quality benchmarks

Costs

| Category | Estimate | Notes |
|---|---|---|
| Engineering | ~2–4 hours | Schema flattening (reduce Node/Edge types to a core subset). Model routing (flash-lite for all, reserving flash for complex documents). May need dynamic routing based on payload characteristics (see the sketch below). |
| Quality risk | HIGH | Flash-lite is a significantly weaker model. Complex multi-hop entity relationships, cross-paragraph inferences, and nuanced relationships may be lost or simplified. The "slightly lower fidelity" trade-off in the ADR undersells this risk: lowered schema fidelity is hard to detect in tests and degrades the knowledge graph's value silently. |
| Schema maintenance | Medium | The simplified schema must be maintained alongside the full schema. If flash-lite eventually handles the full schema, the simplified one becomes dead code. |
| Risk of two code paths | Medium | Dynamic routing (simple docs → flash-lite, complex docs → flash) creates a brittle classifier. Misclassifications produce inconsistent quality. |
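
A minimal sketch of the dynamic-routing idea flagged above, assuming a crude payload classifier; the thresholds and model IDs are illustrative only and would need to come from the quality benchmark this ADR still requires.

```go
// pickModel routes short, structurally simple documents to flash-lite and
// everything else to flash. A misclassification here silently trades graph
// fidelity for speed, which is exactly the risk noted in the costs table.
package routing

import "strings"

func pickModel(doc string) string {
	const (
		liteModel = "gemini-2.0-flash-lite" // illustrative model IDs
		fullModel = "gemini-2.0-flash"
	)
	// Illustrative heuristic: small documents with few paragraph breaks.
	simple := len(doc) < 10_000 && strings.Count(doc, "\n\n") < 20
	if simple {
		return liteModel
	}
	return fullModel
}
```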

Benefits

| Metric | Expected | Validated |
|---|---|---|
| Latency | 14s → <3s | Not validated. Flash-lite is faster (lower TTFT, higher throughput), but the magnitude depends on schema complexity. |
| Cost per extraction | ~5–10× cheaper | Flash-lite is significantly cheaper per token than flash. For high-volume ingestion, this is a real budget impact. |
| Worker pool throughput | Higher | Faster per-chunk processing means the 8-worker pool cycles faster, increasing total jobs/hour. |

Verdict

ROI: 🟡 Medium. The latency and cost benefits are compelling, but the quality risk is understated in the ADR. "Slightly lower fidelity" in entity-graph extraction can manifest as: missing entity types, missed relationships, orphaned nodes, and incomplete graph topology. These are silent degradations that are hard to catch in automated tests.

Recommendation: Before accepting, run a directed quality benchmark: extract the same 50 documents with both flash-lite and flash, then compare (a) entity recall, (b) relationship recall, (c) schema compliance. If flash-lite achieves ≥90% of flash's accuracy on all three, accept. Otherwise, the cost savings don't justify the graph degradation.


ADR-005: Universal LLM Adapter (OpenAI-Compatible API)

Status: ✅ Accepted — implemented on development

Costs

| Category | Estimate | Notes |
|---|---|---|
| Engineering | ~3–5 hours | Replacing three provider-specific adapters (Ollama native, Vertex SDK, Gemini API key) with one openai-go implementation. Most of the effort is in testing the routing logic and ensuring the fallbacks still work. |
| Dependency | Low | github.com/openai/openai-go is a well-maintained, widely used library. |
| Legacy maintenance | Low | Two fallback paths (Vertex AI, Ollama native) are preserved but not actively developed. They serve as a Chesterton's Fence — local dev works without cloud config. |
| Risk | Low | The LLM interface (Generate, GenerateStream, SupportsStructuredOutput) is unchanged. Only the factory function routes differently. If the OpenAI-compatible path fails, the fallback activates transparently. |

Benefits

| Metric | Expected | Validated |
|---|---|---|
| Code reduction | 3 adapters → 1 primary path | Not quantified, but the three adapters had significant duplicated logic (SSE streaming, retries, structured-output enforcement). |
| Provider flexibility | Zero-code provider swaps | Change the LLM_BASE_URL and LLM_API_KEY env vars → instant migration to Groq, DeepSeek, Together, vLLM, or local Ollama (see the sketch below). |
| Structured output enforcement | Native SDK support | The OpenAI SDK's ResponseFormatJSONObjectParam replaces fragile prompt-embedded JSON instructions. Reduces token waste and schema drift. |
| SSE streaming | SDK-native | Replaces custom goroutine management in the legacy paths. |
| Vendor lock-in mitigation | Complete | Not locked to any single provider. The OpenAI spec is the interface standard, not a provider contract. |
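
A minimal sketch of the provider swap described above: the adapter builds one openai-go client from two env vars, so pointing LLM_BASE_URL at any OpenAI-compatible endpoint needs no code change. The env var names come from this ADR; the surrounding wiring is illustrative, not the project's actual factory.

```go
// newClientExample constructs an openai-go client from LLM_BASE_URL and
// LLM_API_KEY. With an empty base URL, openai-go talks to api.openai.com.
package llm

import (
	"os"

	"github.com/openai/openai-go"
	"github.com/openai/openai-go/option"
)

func newClientExample() {
	opts := []option.RequestOption{
		option.WithAPIKey(os.Getenv("LLM_API_KEY")),
	}
	if base := os.Getenv("LLM_BASE_URL"); base != "" {
		opts = append(opts, option.WithBaseURL(base)) // Groq, vLLM, Ollama, ...
	}
	client := openai.NewClient(opts...)
	_ = client // handed to the adapter's Generate/GenerateStream methods
}
```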

Verdict

ROI: 🟢 High. Modest engineering cost with significant long-term benefits: (a) collapsed code complexity, (b) unlimited provider flexibility, (c) eliminated vendor lock-in. The fallback preservation is a wise Chesterton's Fence choice. No empirical validation required — this is an architectural simplification with clearly measurable benefits in code metrics.


ADR-006: Vendor-Neutral Blob Storage (Go CDK)

Status: ✅ Accepted — implemented on development

Costs

| Category | Estimate | Notes |
|---|---|---|
| Engineering | ~2–3 hours | Replace the direct GCS SDK with gocloud.dev/blob. Mostly removing boilerplate (70 lines of GCS-specific code → 15 lines in blobstore_gocloud.go). |
| Dependency weight | Low–Medium | gocloud.dev/blob pulls in transitive cloud SDK dependencies (GCS, S3, Azure), increasing binary size moderately. For Cloud Run, cold-start time is dominated by the container image pull, not Go binary size, so this is acceptable. |
| Risk | Very low | The blob interface is thin (Upload, NewReader). The Go CDK is well maintained and the URL-scheme abstraction is battle-tested. |

Benefits

| Metric | Expected | Validated |
|---|---|---|
| Code reduction | 70 lines of GCS boilerplate → 15 lines | Confirmed. |
| Provider flexibility | Configured via env var | BLOB_STORE_URL=mem:// for tests, file:///tmp/podpedia for local dev, s3://minio:9000 for self-hosted, gs://bucket for GCP (see the sketch below). |
| Testing determinism | Zero-I/O in-memory blobs | The mem:// scheme enables deterministic integration tests with zero network I/O and zero jitter. This directly supports the bifurcated testing strategy. |
| 12-Factor compliance | Config-driven storage | Backing services (Factor IV) are now swappable via configuration, not code. |
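
A minimal sketch of the URL-scheme swap described above. The BLOB_STORE_URL values come from the table; everything else is illustrative. The blank imports register the Go CDK drivers, so the scheme in the URL selects the backend.

```go
package store

import (
	"context"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/fileblob" // file://
	_ "gocloud.dev/blob/gcsblob"  // gs://
	_ "gocloud.dev/blob/memblob"  // mem:// (deterministic tests)
	_ "gocloud.dev/blob/s3blob"   // s3://
)

// openBucket resolves the configured bucket; writes look the same regardless
// of which backend the URL points at.
func openBucket(ctx context.Context, url string) (*blob.Bucket, error) {
	return blob.OpenBucket(ctx, url)
}

func upload(ctx context.Context, b *blob.Bucket, key string, data []byte) error {
	w, err := b.NewWriter(ctx, key, nil)
	if err != nil {
		return err
	}
	if _, err := w.Write(data); err != nil {
		w.Close()
		return err
	}
	return w.Close()
}
```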

Verdict

ROI: 🟢 High. Low engineering cost, zero ongoing cost, significant portability and testing benefits. The 12-Factor compliance improvement alone justifies this.


ADR-007: Dual GraphDB Strategy (Memory + SQLite WAL)

Status: ✅ Accepted — implemented on development

Costs

| Category | Estimate | Notes |
|---|---|---|
| Engineering | ~8–12 hours | SQLiteGraphDB implementation with WAL mode, JOIN-based graph traversal, LIKE matching, foreign-key constraints (see the WAL setup sketch below). Litestream sidecar integration. MemoryGraphDB delegation. This is the most substantial implementation in the codebase outside of ADR-002. |
| Dependency | Low | modernc.org/sqlite (pure Go, CGO-free). Litestream sidecar (Go binary, external process). |
| Complexity | Moderate | Dual implementation paths behind an identical interface. WAL-mode nuances (checkpointing, concurrent readers). Litestream restoration on cold start. |
| Operational cost | Very low | No separate database service; SQLite runs embedded in-process. Litestream consumes minimal CPU. GCS storage for WAL archives is pennies per month. |
| Risk | Low–Medium | SQLite is not a graph database — JOIN-based traversal for entity neighborhoods works, but it doesn't scale to millions of nodes with deep traversal paths. For the current scale (hundreds of thousands of nodes), it's fine. |
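
A minimal sketch of opening the embedded store with WAL enabled, assuming the modernc.org/sqlite driver (registered as "sqlite"); the DSN and pragma choices are illustrative, not the project's exact configuration.

```go
// openWAL opens the SQLite file and switches it to WAL mode so concurrent
// readers (the Graph-RAG query path) never block the single writer.
package graphdb

import (
	"database/sql"

	_ "modernc.org/sqlite" // pure-Go, CGO-free driver
)

func openWAL(path string) (*sql.DB, error) {
	db, err := sql.Open("sqlite", path)
	if err != nil {
		return nil, err
	}
	for _, pragma := range []string{
		"PRAGMA journal_mode=WAL;",
		"PRAGMA foreign_keys=ON;",
	} {
		if _, err := db.Exec(pragma); err != nil {
			db.Close()
			return nil, err
		}
	}
	return db, nil
}
```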

Benefits

| Metric | Expected | Validated |
|---|---|---|
| Survivability | Graph persists across scale-to-zero | SQLite WAL file lives in Cloud Run's persistent /tmp mount; Litestream restores from GCS on cold start. |
| Concurrent reads | Lock-free (WAL mode) | WAL allows unlimited concurrent readers alongside a single writer, critical for the Graph-RAG query path. |
| CGO-free | Cross-compiles trivially | modernc.org/sqlite is pure Go. No C toolchain in the Docker build. |
| Memory analysis | Identical allocations | The ADR-007/010 benchmark showed zero change in allocs/op and bytes/op across all 12 variants; MemoryGraphDB and SQLiteGraphDB share the same allocation pattern. |
| Performance | No latency regression | Benchmark data from the re-run (May 13, benchtime=30s, count=30) is being analyzed. The first run (flawed methodology) showed consistent but artifact-laden improvements across all variants. |

Verdict

ROI: 🟢 High. Despite substantial engineering cost, the benefit — durable graph persistence without operational overhead — is fundamental to the product. Without SQLite, every scale-to-zero event destroys user data. The experiment tracking is validating that performance remains acceptable.


ADR-008: Weighted Ensemble Entity Resolution

Status: ✅ Accepted — implemented on development

Costs

| Category | Estimate | Notes |
|---|---|---|
| Engineering | ~4–6 hours | Jaro-Winkler implementation, normalized Levenshtein, Jaccard fuzzy token overlap. Tri-band decision logic (merge/ambiguous/insert). Type blocking. OTel metrics for resolution_score. |
| Runtime cost | O(n) per insert | Each ResolveAndInsert call compares the candidate against all existing nodes of the same type. At 10K Person nodes, that's 10K string comparisons per insert. At 500K nodes, this becomes a bottleneck. |
| Complexity | Moderate | Three algorithms with weighted averaging. Tunable threshold. OTel metrics integration. Ambiguous-case logging. |
| Risk | Low–Medium | Weight tuning is empirical. The 0.85 merge threshold is a guess and may need adjustment as entity volume grows. False merges are mitigated by the Jaccard component; false splits (missed merges) are less visible but more dangerous (fragmented graph). |

Benefits

| Metric | Expected | Validated |
|---|---|---|
| Graph integrity | Eliminates duplicate nodes | Catches typos (Altman vs Altmann), abbreviations (Sam vs Samuel), structural variations (Microsoft Research vs Microsoft Corporation → ambiguous band), and shared-token false friends (Sam Altman vs Sam Bankman-Fried → low Jaccard). |
| False-merge protection | Jaccard prevents catastrophic merges | The 0.30 Jaccard weight is specifically designed to prevent "Apple Inc" merging with "Apple fruit" — only the token "Apple" matches, producing a score in the ambiguous or insert band (see the sketch below). |
| Observability | resolution_score histogram | Every comparison is recorded as an OTel metric, enabling distribution analysis and threshold tuning over time. |
| Configurability | Per-deployment tuning | The merge threshold is configurable: research deployments can lower it for aggressive merging; legal deployments can raise it for conservative matching. |
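
A minimal sketch of the weighted-ensemble tri-band decision described above. The 0.85 merge threshold and 0.30 Jaccard weight come from this ADR; the remaining weights, the ambiguous floor, and the similarity inputs are illustrative (the real implementation computes Jaro-Winkler, normalized Levenshtein, and Jaccard itself).

```go
package resolve

type Decision int

const (
	Insert    Decision = iota // clearly a new entity
	Ambiguous                 // log for review, do not auto-merge
	Merge                     // same entity, fold into the existing node
)

// decide combines the three similarity scores (each in [0,1]) into a weighted
// average and maps it onto the merge/ambiguous/insert bands.
func decide(jaroWinkler, normLevenshtein, jaccard float64) Decision {
	const (
		wJW  = 0.40 // illustrative
		wLev = 0.30 // illustrative
		wJac = 0.30 // per ADR-008: guards against shared-token false friends
	)
	score := wJW*jaroWinkler + wLev*normLevenshtein + wJac*jaccard
	switch {
	case score >= 0.85: // merge threshold from ADR-008
		return Merge
	case score >= 0.70: // illustrative ambiguous floor
		return Ambiguous
	default:
		return Insert
	}
}
```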

Verdict

ROI: 🟢 High. Entity resolution is the difference between a coherent knowledge graph and a fragmented mess. Without it, every variant of "Sam Altman" produces a separate node, breaking Graph-RAG traversal. The O(n) cost at scale is the main concern — monitor resolution span durations and consider indexing or sharding when Person nodes exceed 50K.


ADR-009: Formalized Experiment Tracking

Status: ✅ Accepted — experiment infrastructure exists, meta-experiment (EXP-001) pending

Costs

| Category | Estimate | Notes |
|---|---|---|
| Engineering | ~1–2 hours | Directory structure, template.md, README with agentic directives, INDEX.md. |
| Process overhead | Ongoing | Every performance-sensitive change now requires writing an experiment report before merge. This is intentional friction. |
| Maintenance | Low | Reports and trial data accumulate over time; INDEX.md needs updating. |
| Risk | Very low | The infrastructure is files and Markdown. Zero ongoing cost if abandoned. |

Benefits

| Metric | Expected | Validated |
|---|---|---|
| Decision quality | Higher | Hypotheses must be falsifiable. Results must be distribution-aware (p50/p95/p99, KS test, Cohen's d; see the sketch below). |
| Negative knowledge base | Prevents repeated dead ends | Documented failures (like the flawed first run of ADR-007/010) prevent future engineers from re-investigating. |
| Git bisect integration | Possible | Every experiment links to commit hashes. |
| Agentic compatibility | README encodes methodology | Autonomous coding agents have a committed methodology file to follow. |
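
A minimal sketch of the effect-size calculation the template calls for (Cohen's d with a pooled standard deviation). Purely illustrative; the reports could equally be produced with any stats tooling.

```go
package stats

import "math"

func mean(xs []float64) float64 {
	s := 0.0
	for _, x := range xs {
		s += x
	}
	return s / float64(len(xs))
}

func variance(xs []float64) float64 {
	m, s := mean(xs), 0.0
	for _, x := range xs {
		s += (x - m) * (x - m)
	}
	return s / float64(len(xs)-1) // sample variance
}

// cohensD returns (mean(a)-mean(b)) / pooledSD, the standardized difference
// between, e.g., control and treatment latency samples.
func cohensD(a, b []float64) float64 {
	na, nb := float64(len(a)), float64(len(b))
	pooled := ((na-1)*variance(a) + (nb-1)*variance(b)) / (na + nb - 2)
	return (mean(a) - mean(b)) / math.Sqrt(pooled)
}
```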

Verdict

ROI: 🟡 Medium. The process overhead is real, but the value of a negative knowledge base compounds over time. The meta-experiment (EXP-001, deadline 2026-05-25) will formally assess whether this process reduces performance regressions. Until then, the infrastructure is in place and the first two experiment reports exist.


ADR-010: Async Pipeline Production Readiness

Status: ✅ Accepted — implemented on development

Costs

| Category | Estimate | Notes |
|---|---|---|
| Engineering | ~4–6 hours | Per-chunk LLM timeout context (a one-line context.WithTimeout in extractFullText; see the sketch below). Structured-logging handler swap (slog-gcp on GCP, JSON stdout locally). Progress-callback wiring in the upload handler. Stale-message fix. Rate-limiter config externalization. |
| Dependency | Very low | github.com/jdockerty/slog-gcp for Cloud Logging. |
| Complexity | Low | Each change is localized and additive. The LLM interface is unchanged. The StateTracker interface is unchanged. |
| Risk | Very low | The per-chunk timeout is strictly additive — it can only break slow chunks, and that's the intended behavior. |
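
A minimal sketch of the per-chunk timeout described above. LLM_REQUEST_TIMEOUT and the 120s default come from this ADR; the generate call shape is illustrative, not the project's exact interface.

```go
package pipeline

import (
	"context"
	"os"
	"time"
)

func requestTimeout() time.Duration {
	if v := os.Getenv("LLM_REQUEST_TIMEOUT"); v != "" {
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	return 120 * time.Second // documented default: fail a stuck chunk after 120s
}

// extractChunk bounds a single LLM call so a stalled provider request releases
// its worker slot instead of holding it for the lifetime of the job.
func extractChunk(ctx context.Context, generate func(context.Context, string) (string, error), chunk string) (string, error) {
	ctx, cancel := context.WithTimeout(ctx, requestTimeout())
	defer cancel()
	return generate(ctx, chunk)
}
```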

Benefits

| Metric | Expected | Validated |
|---|---|---|
| Resilience | No more 30-minute stalls | A stalled Vertex AI chunk now fails after 120s instead of blocking a worker slot for 30 minutes. |
| Debuggability | Queryable job logs | gcloud logging read 'jsonPayload.job_id="..."' surfaces the LLM provider, chunk progress, per-chunk errors, and completion metrics. |
| UX parity | File uploads show progress | The upload handler now calls the same progress tracking as the ingest handler; users see chunk counts updating. |
| Config freedom | 3 new env vars | LLM_REQUEST_TIMEOUT, LLM_MAX_CONCURRENCY, and LLM_TPM_LIMIT replace hardcoded values. |

Verdict

ROI: 🟢 High. Low engineering cost fixing three real production gaps that caused a 31-minute outage. The timeout alone prevents a repeat of that incident. The logging fix makes the next incident diagnosable in minutes instead of hours.


ADR-011: Graph Download & Export

Status: ✅ Accepted — not yet implemented

Costs

| Category | Estimate | Notes |
|---|---|---|
| Engineering (Phase 1) | ~1–2 hours | A single handler calling DB.Snapshot(), JSON streaming via json.NewEncoder, a Content-Disposition header, auth middleware, route registration (see the sketch below). |
| Engineering (Phase 2) | ~4–8 hours | GraphML, GEXF, CSV, and NDJSON serializers. GraphSerializer interface. Accept-header routing. |
| Engineering (Phase 3) | ~2–4 hours | SQLite file serving. Tenancy validation. File-locking considerations. Deferred. |
| Complexity | Very low (Phase 1) | One handler, one existing interface method (Snapshot), no new dependencies. |
| Runtime cost | None | The graph is already in memory (MemoryGraphDB) or on disk (SQLiteGraphDB); export just reads it. |
| Risk | Very low | No data mutation. No new dependencies. Existing auth middleware protects the endpoint. |
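
A minimal sketch of the Phase 1 handler described above. DB.Snapshot(), json.NewEncoder, and the Content-Disposition header come from this ADR; the Snapshot return type, filename, and route wiring are illustrative.

```go
package export

import (
	"encoding/json"
	"net/http"
)

// GraphSnapshotter is the slice of the GraphDB interface the handler needs.
type GraphSnapshotter interface {
	Snapshot() (any, error) // illustrative signature
}

// downloadHandler streams the current graph as a JSON attachment.
func downloadHandler(db GraphSnapshotter) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		snap, err := db.Snapshot()
		if err != nil {
			http.Error(w, "snapshot failed", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Header().Set("Content-Disposition", `attachment; filename="podpedia-graph.json"`)
		if err := json.NewEncoder(w).Encode(snap); err != nil {
			// Headers are already sent; nothing more to do but log upstream.
			return
		}
	}
}
```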

Benefits

| Metric | Expected | Validated |
|---|---|---|
| Data portability | Users own their graph | Self-service export without GCP IAM permissions or technical support. |
| External tooling | Enables analytic workflows | Gephi, Cytoscape, NetworkX, and pandas for analysis beyond the vis.js visualization. |
| Backup | User-controlled snapshots | Independent of Litestream/GCS replication. |
| Benchmark reproducibility | Stable export format | ADR-009 calls for commit-linked data; graph exports provide ground-truth snapshots for reproducible benchmarks. |

Verdict

ROI: 🟡 Medium. Phase 1 is trivially cheap (~1-2 hours) and delivers immediate value — users can export their graph. Phase 2 and 3 should be deferred until user demand materializes. Recommendation: Implement Phase 1 now; defer Phase 2/3.


Cross-ADR Dependencies & Synergies

ADR-001 (10K threshold)
  └─ feeds into ADR-002 (more chunks = more cache value)
  └─ feeds into ADR-003 (more chunks = more streaming value)
  └─ feeds into ADR-004 (more chunks = more flash-lite savings)

ADR-002 (Context caching)
  └─ depends on ADR-005 (LLM adapter) — caching is Vertex-specific
  
ADR-007 (Dual GraphDB)
  └─ feeds into ADR-011 (export) — Snapshot() method on both implementations

ADR-009 (Experiment tracking)
  └─ validates ADR-001, ADR-007, ADR-010 via experiments

ADR-010 (Pipeline readiness)
  └─ depends on ADR-005 (LLM adapter) — timeout applies to LLM interface
  └─ depends on ADR-007 (GraphDB) — progress parity assumes graph is receiving data

Prioritization Recommendation

Do Now (High ROI, Low Cost)

| Priority | ADR | Rationale |
|---|---|---|
| P1 | ADR-001 | Zero cost, confirmed direction, unblocks production deployment. |
| P2 | ADR-011 (Phase 1) | 1–2 hours for a user-facing feature. |
| P3 | ADR-006 | Already done, but worth highlighting the testing benefit. |

Validate Before Scaling (High ROI, Needs Evidence)

| Priority | ADR | Rationale |
|---|---|---|
| P1 | ADR-007 | Re-run experiment data needs analysis; this blocks confirmation of the most expensive implementation. |
| P2 | ADR-002 | High data-curation cost — ensure the runtime benefits materialize before promoting to development. |
| P3 | ADR-004 | Quality benchmarks needed — don't accept until flash-lite fidelity is quantified. |

Defer (Medium ROI or High Risk)

| Priority | ADR | Rationale |
|---|---|---|
| Defer | ADR-003 | High risk, high engineering cost. Only implement if latency remains unacceptable after ADR-001 + ADR-002. |
| Defer | ADR-004 | High quality risk. Benchmark required. |
| Defer | ADR-011 (Phase 2/3) | Wait for user demand. |