ADR Cost/Benefit Analysis
A comprehensive accounting of every architectural decision record for PodPedia (Project Sunesis), with quantified costs, validated benefits, and empirical evidence where available.
Last updated: 2026-05-14
Scope: All ADRs 001–011 (podpedia-app)
Author: @regular
Quick Reference
| ID | Title | Status | Implemented? | Validated? | ROI Estimate |
|---|---|---|---|---|---|
| ADR-001 | 10K Chunk Threshold | ✅ Accepted | ✅ development | ⚠️ Inconclusive | 🟢 High |
| ADR-002 | Vertex AI Context Caching | ✅ Accepted | ✅ feature/adr-002-context-caching | ❌ Not yet | 🟢 High |
| ADR-003 | SSE Streaming JSON ASTs | ⏳ Provisional | ❌ No | ❌ N/A | 🟡 Medium |
| ADR-004 | Flash-Lite Extraction Engine | ⏳ Provisional | ❌ No | ❌ N/A | 🟡 Medium |
| ADR-005 | Universal LLM Adapter | ✅ Accepted | ✅ development | ❌ Not tested | 🟢 High |
| ADR-006 | Vendor-Neutral Blob Storage | ✅ Accepted | ✅ development | ❌ Not tested | 🟢 High |
| ADR-007 | Dual GraphDB Strategy | ✅ Accepted | ✅ development | ⚠️ Incomplete (rerun data pending) | 🟢 High |
| ADR-008 | Weighted Ensemble Entity Resolution | ✅ Accepted | ✅ development | ❌ Not tested | 🟢 High |
| ADR-009 | Formalized Experiment Tracking | ✅ Accepted | ✅ experiments/ | ⏳ EXP-001 pending | 🟡 Medium |
| ADR-010 | Async Pipeline Production Readiness | ✅ Accepted | ✅ development | ❌ Not tested | 🟢 High |
| ADR-011 | Graph Download & Export | ✅ Accepted | ❌ Not built | ❌ N/A | 🟡 Medium |
ADR-001: 10K Character Threshold for Parallel Graph Extraction
Status: Accepted — merged to development, not yet on main
Costs
| Category | Estimate | Notes |
|---|---|---|
| Engineering | ~1 hour | One-line threshold change (20,000 → 10,000). The configuration work was trivial. |
| Token Overhead | Negligible | 200-char overlap across more chunks. For a 15K document: 2 chunks instead of 1, adding only ~200 overlapping characters (roughly 50 tokens) of duplicated input per boundary (see the chunking sketch below). |
| Graph Integrity Risk | Low | 10K chars ≈ 1,500–2,000 words is sufficient semantic aperture. The rejected 3K option would have created orphaned nodes. |
| Complexity | None | Zero new code. The parallelization infrastructure (alitto/pond pool) already existed. |
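For concreteness, a minimal sketch of the threshold-plus-overlap split described above. The names (chunkText, maxChunkSize, overlap) are illustrative, not the actual podpedia-app identifiers, and the split is byte-based for brevity; the real pipeline may split on rune or sentence boundaries.

```go
package main

import "fmt"

const (
	maxChunkSize = 10_000 // ADR-001: lowered from 20,000
	overlap      = 200    // shared context across chunk boundaries
)

// chunkText splits a document into <=maxChunkSize chunks, each sharing
// `overlap` trailing characters with its successor so entities that
// straddle a boundary appear in both chunks.
func chunkText(doc string) []string {
	if len(doc) <= maxChunkSize {
		return []string{doc}
	}
	var chunks []string
	for start := 0; start < len(doc); start += maxChunkSize - overlap {
		end := start + maxChunkSize
		if end >= len(doc) {
			chunks = append(chunks, doc[start:])
			break
		}
		chunks = append(chunks, doc[start:end])
	}
	return chunks
}

func main() {
	doc := make([]byte, 15_000)
	fmt.Println(len(chunkText(string(doc)))) // 15K chars -> 2 chunks
}
```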
Benefits
| Metric | Expected | Validated |
|---|---|---|
| Latency (15K char docs) | 14s → ≤7s (sequential → parallel across 8-worker pool) | ⚠️ Experiment showed -19.2% (45.8s→37.0s staging), but p=0.31 with n=15. Directionally correct, not significant. |
| Latency (25K char docs) | Similar improvement factor | ✅ -68.2% (116.2s→37.0s, p<0.001). Massive improvement, partially confounded by environment differences. |
| Concurrency | Restores full 10-way semaphore | Unblocked weighted-semaphore bottleneck. Chunks now fit under 5K tokens, avoiding the weight=2 penalty. |
| Code footprint | +0 lines | Pure config change. |
Verdict
ROI: 🟢 High. Zero implementation cost, zero ongoing cost, and directional latency improvements at every payload size. The 15K case needs statistical confirmation, but even if the true effect is only 10% (not the measured 19%), the change costs nothing. There is no downside.
Re-run Recommendation
Re-run the A/B with controlled Cloud Run instance configuration and ≥30 trials for the 15K payload. Until then, accept the merge with the acknowledgment that the benefit is directionally confirmed but not statistically proven.
ADR-002: Vertex AI Context Caching via Deep Ontology Prompting
Status: Accepted — implemented on feature/adr-002-context-caching, pushed to origin
Costs
| Category | Estimate | Notes |
|---|---|---|
| Engineering (data curation) | ⭐ HIGH — 5–10 hours | Creating the 36K-token deep ontology with 56 golden few-shot examples is the most labor-intensive task in any ADR. The examples must be meticulously crafted to cover: simple extraction, multi-entity chains, entity normalization, cross-type blocking, temporal relationships, hierarchical orgs, compound entities, empty extraction, and many edge cases. |
| Engineering (implementation) | ~3–4 hours | Cache lifecycle management: InitVertexCache, RefreshCache, background TTL goroutine, model setup routing. CachedContent enforces a 32K token minimum — the ontology must be kept above this threshold or caching fails silently. |
| Runtime cost (caching) | Ongoing hourly billing | Vertex AI CachedContent incurs a storage cost while alive. At ~36K tokens, this is roughly $0.002–0.005/hour (24h: ~$0.05–0.12). The 1-hour TTL default means cache is created on-demand and expires quickly during idle periods. Background refresh keeps it alive during active use. |
| Runtime cost (tokens saved) | Negative cost | Without caching, the system prompt is sent with every request. With caching, it's sent once and referenced by ID. For a 36K-token system prompt × 100 chunks/day, that's 3.6M tokens/day saved in input costs — roughly $0.45/day at Vertex AI pricing. The cache storage cost is ~10× lower than the token savings at any meaningful volume. |
| Complexity | Moderate | Background goroutine for TTL refresh, cache name routing, async cleanup. Needs careful goroutine lifecycle management. |
| Risk | Medium | If the ontology drops below 32K tokens, caching silently fails and the LLM receives no system prompt at all — output degrades catastrophically. This must be caught in CI. |
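A minimal sketch of the background TTL refresh goroutine named in the costs table, with the Vertex SDK call abstracted behind a refresh callback. The 45-minute interval and function name are assumptions, not the actual podpedia-app code.

```go
package cache

import (
	"context"
	"log/slog"
	"time"
)

// keepCacheAlive refreshes the Vertex CachedContent well inside its
// 1-hour TTL so it never expires mid-job. The actual SDK call is hidden
// behind the refresh callback (cf. InitVertexCache / RefreshCache).
func keepCacheAlive(ctx context.Context, refresh func(context.Context) error) {
	ticker := time.NewTicker(45 * time.Minute) // interval is an assumption
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return // shutdown: let the cache expire on its own
		case <-ticker.C:
			if err := refresh(ctx); err != nil {
				slog.Error("cache refresh failed; next request re-initializes", "err", err)
			}
		}
	}
}
```

During active ingestion this would run as a single background goroutine per process, cancelled via the context on shutdown so the cache simply lapses during idle periods, matching the on-demand/expire behavior described above.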
Benefits
| Metric | Expected | Validated |
|---|---|---|
| TTFT per chunk | Seconds → milliseconds | Not yet tested. The expected effect is dramatic: every concurrent chunk skips the system prompt inference step. |
| End-to-end latency | 14s → ~5–8s (stacked with ADR-001) | Not yet tested. Combined effect with ADR-001's fan-out: 8 chunks hit cached context simultaneously, each with near-zero TTFT. |
| Output quality | Significant improvement | The 56 golden few-shot examples effectively fine-tune the model in-context, eliminating schema drift, hallucinated entity types, and inconsistent JSON output. |
| Token cost per request | ~36K tokens saved per chunk | For 100 chunks/day: $0.45/day saved at Vertex AI input pricing. |
Verdict
ROI: 🟢 High. Significant up-front data curation cost, but the runtime benefits are compounding: (a) latency reduction, (b) output quality improvement from few-shot examples, (c) token cost savings that quickly amortize the data curation investment. The ongoing caching cost (~$0.05/day) is negligible.
Dependencies
- ADR-001 (10K threshold) increases the number of chunks, which increases the value of caching each chunk's system prompt.
- ADR-005 (Universal LLM Adapter) — caching is Vertex AI-specific; the deep ontology is portable.
ADR-003: Server-Sent Events for Streaming JSON ASTs (Provisional)
Status: ⏳ Provisional — pending streaming JSON parser validation
Costs
| Category | Estimate | Notes |
|---|---|---|
| Engineering (backend parser) | HIGH — 8–15 hours | A custom, fault-tolerant streaming JSON parser in Go is the critical path. Must handle: unclosed quotes, truncated keys, missing brackets, self-correcting LLM output, and panic-free partial AST assembly. This is non-trivial parser engineering. |
| Engineering (frontend) | ~4–6 hours | Replace polling (GET /api/status) with EventSource connection. Handle duplicate/revised nodes. Incremental vis.js hydration. State management for partial results. |
| Complexity | HIGH | The streaming parser is the riskiest component in the entire system. A panicking parser brings down the goroutine. An incorrect parser produces corrupted graph state. No off-the-shelf solution exists for LLM JSON streaming. |
| Maintenance burden | Medium | The parser is entirely custom code with no community maintenance. Any change to the LLM's output format (even whitespace or key ordering) could break it. |
| Risk | HIGH | If the parser validation fails (or a future LLM update breaks it), SSE falls back to polling anyway — requiring both code paths to be maintained. |
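To make the parser risk concrete, here is a deliberately naive best-effort repair pass: close whatever strings, arrays, and objects are still open in the truncated stream, then retry a standard unmarshal. It is a toy, not the self-correcting AST assembler this ADR calls for — notice that it already fails on truncated keys and trailing commas, exactly the inputs the production parser must survive.

```go
package stream

import "encoding/json"

// tryRepair appends closers for any unterminated string, array, or
// object in a truncated JSON stream, then attempts a normal unmarshal.
// It cannot rescue dangling keys ({"id":"a","lab) or trailing commas,
// which is why the real streaming parser is the riskiest component.
func tryRepair(partial []byte, v any) error {
	var closers []byte
	inStr, esc := false, false
	for _, c := range partial {
		switch {
		case esc:
			esc = false
		case inStr:
			if c == '\\' {
				esc = true
			} else if c == '"' {
				inStr = false
			}
		case c == '"':
			inStr = true
		case c == '{':
			closers = append(closers, '}')
		case c == '[':
			closers = append(closers, ']')
		case c == '}' || c == ']':
			if len(closers) > 0 {
				closers = closers[:len(closers)-1]
			}
		}
	}
	repaired := append([]byte(nil), partial...)
	if inStr {
		repaired = append(repaired, '"')
	}
	for i := len(closers) - 1; i >= 0; i-- {
		repaired = append(repaired, closers[i])
	}
	return json.Unmarshal(repaired, v)
}
```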
Benefits
| Metric | Expected | Validated |
|---|---|---|
| Time-to-first-paint | ~14s → <1s | Not validated. The perceived UX improvement is significant: a live "typing" visualization vs a loading spinner. |
| User engagement | Qualitative | Potential improvement. Live-hydrating graphs are visually engaging and signal progress. Hard to quantify. |
| Backend efficiency | Negligible | The same LLM computation happens either way — streaming just changes the delivery timing. No backend cost savings. |
Verdict
ROI: 🟡 Medium. The UX improvement is real and meaningful, but the engineering cost is disproportionate unless benchmarks confirm the latency problem is severe enough that a loading spinner is untenable. Recommendation: Keep provisional. Implement only if (a) ADR-001 + ADR-002 don't bring latency below 5s, and (b) user feedback indicates spinner intolerance. The parser risk alone justifies deferral.
ADR-004: Flash-Lite Extraction Engine (Provisional)
Status: ⏳ Provisional — pending quality benchmarks
Costs
| Category | Estimate | Notes |
|---|---|---|
| Engineering | ~2–4 hours | Schema flattening (reduce Node/Edge types to core subset). Model routing (flash-lite for all, reserve flash for complex). May need dynamic routing based on payload characteristics. |
| Quality risk | HIGH | Flash-lite is a significantly weaker model. Complex multi-hop entity relationships, cross-paragraph inferences, and nuanced relationships may be lost or simplified. The "slightly lower fidelity" trade-off in the ADR undersells this risk. Lowered schema fidelity is hard to detect in tests (it degrades the knowledge graph's value silently). |
| Schema maintenance | Medium | The simplified schema must be maintained alongside the full schema. If flash-lite eventually handles the full schema, the simplified one becomes dead code. |
| Risk of two code paths | Medium | Dynamic routing (simple docs → flash-lite, complex docs → flash) creates a brittle classifier. Misclassifications produce inconsistent quality. |
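The brittle-classifier concern is easiest to see in code. A hypothetical router sketch; the thresholds, heuristics, and model IDs are all illustrative assumptions, and every misclassification silently changes extraction quality:

```go
package extract

import "strings"

// pickModel routes simple documents to flash-lite and complex ones to
// flash. Both heuristics below are guesses; documents they misjudge get
// inconsistent graph quality with no error signal.
func pickModel(doc string) string {
	const sizeThreshold = 8_000 // chars; illustrative, would need tuning
	if len(doc) > sizeThreshold || strings.Count(doc, "\n\n") > 20 {
		return "gemini-flash" // full schema, stronger model (placeholder ID)
	}
	return "gemini-flash-lite" // flattened schema, cheaper and faster (placeholder ID)
}
```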
Benefits
| Metric | Expected | Validated |
|---|---|---|
| Latency | 14s → <3s | Not validated. Flash-lite is faster (lower TTFT, higher throughput) but the magnitude depends on schema complexity. |
| Cost per extraction | ~5–10× cheaper | Flash-lite is significantly cheaper per token than flash. For high-volume ingestion, this is a real budget impact. |
| Worker pool throughput | Higher | Faster per-chunk processing means the 8-worker pool cycles faster, increasing total jobs/hour. |
Verdict
ROI: 🟡 Medium. The latency and cost benefits are compelling, but the quality risk is understated in the ADR. "Slightly lower fidelity" in entity-graph extraction can manifest as: missing entity types, missed relationships, orphaned nodes, and incomplete graph topology. These are silent degradations that are hard to catch in automated tests.
Recommendation: Before accepting, run a directed quality benchmark: extract the same 50 documents with both flash-lite and flash, then compare (a) entity recall, (b) relationship recall, (c) schema compliance. If flash-lite achieves ≥90% of flash's accuracy on all three, accept. Otherwise, the cost savings don't justify the graph degradation.
ADR-005: Universal LLM Adapter (OpenAI-Compatible API)
Status: ✅ Accepted — implemented on development
Costs
| Category | Estimate | Notes |
|---|---|---|
| Engineering | ~3–5 hours | Replacing three provider-specific adapters (Ollama native, Vertex SDK, Gemini key) with one openai-go implementation. Most of the effort is in testing the routing logic and ensuring the fallbacks still work. |
| Dependency | Low | github.com/openai/openai-go is a well-maintained, widely-used library. |
| Legacy maintenance | Low | Two fallback paths (Vertex AI, Ollama native) preserved but not actively developed. They serve as Chesterton's Fence — local dev works without cloud config. |
| Risk | Low | The LLM interface (Generate, GenerateStream, SupportsStructuredOutput) is unchanged. Only the factory function routes differently. If the OpenAI-compatible path fails, the fallback activates transparently. |
Benefits
| Metric | Expected | Validated |
|---|---|---|
| Code reduction | -3 adapters → 1 primary path | Not quantified, but three adapters had significant duplicated logic (SSE streaming, retries, structured output enforcement). |
| Provider flexibility | Zero-code provider swaps | Change LLM_BASE_URL and LLM_API_KEY env vars → instant migration to Groq, DeepSeek, Together, vLLM, or local Ollama. |
| Structured output enforcement | Native SDK support | OpenAI SDK's ResponseFormatJSONObjectParam replaces fragile prompt-embedded JSON instructions. Reduces token waste and schema drift. |
| SSE streaming | SDK-native | Replaces custom goroutine management in legacy paths. |
| Vendor lock-in mitigation | Complete | Not locked to any single provider. The OpenAI spec is the interface standard, not a provider contract. |
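The zero-code provider swap above reduces to a tiny factory. A sketch using github.com/openai/openai-go; note the client's return type has changed across openai-go versions, so treat the signature as indicative rather than exact:

```go
package llm

import (
	"os"

	"github.com/openai/openai-go"
	"github.com/openai/openai-go/option"
)

// newClient is the single primary path: any OpenAI-compatible endpoint
// (Groq, DeepSeek, Together, vLLM, local Ollama at /v1) is selected
// purely by environment, never by code.
func newClient() openai.Client { // older openai-go versions return *openai.Client
	return openai.NewClient(
		option.WithBaseURL(os.Getenv("LLM_BASE_URL")),
		option.WithAPIKey(os.Getenv("LLM_API_KEY")),
	)
}
```

Pointing LLM_BASE_URL at http://localhost:11434/v1 exercises the same path against local Ollama's OpenAI-compatible API, which is what makes the preserved native fallbacks a safety net rather than a requirement.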
Verdict
ROI: 🟢 High. Modest engineering cost with significant long-term benefits: (a) collapsed code complexity, (b) unlimited provider flexibility, (c) eliminated vendor lock-in. The fallback preservation is a wise Chesterton's Fence choice. No empirical validation required — this is an architectural simplification with clearly measurable benefits in code metrics.
ADR-006: Vendor-Neutral Blob Storage (Go CDK)
Status: ✅ Accepted — implemented on development
Costs
| Category | Estimate | Notes |
|---|---|---|
| Engineering | ~2–3 hours | Replace direct GCS SDK with gocloud.dev/blob. Mostly removing boilerplate (70 lines of GCS-specific code → 15 lines in blobstore_gocloud.go). |
| Dependency weight | Low-Medium | gocloud.dev/blob pulls in transitive cloud SDK dependencies (GCS, S3, Azure). Increases binary size moderately. For Cloud Run, cold-start time is dominated by container image pull, not Go binary size, so this is acceptable. |
| Risk | Very Low | The blob interface is thin (Upload, NewReader). The Go CDK is well-maintained and the URL-scheme abstraction is battle-tested. |
Benefits
| Metric | Expected | Validated |
|---|---|---|
| Code reduction | 70 lines GCS boilerplate → 15 lines | Confirmed. |
| Provider flexibility | Configure via env var | BLOB_STORE_URL=mem:// for tests, file:///tmp/podpedia for local dev, s3://minio:9000 for self-hosted, gs://bucket for GCP. |
| Testing determinism | Zero I/O in-memory blobs | The mem:// scheme enables deterministic integration tests with zero network I/O and zero jitter. This directly supports the bifurcated testing strategy. |
| 12-Factor compliance | Config-driven storage | Backing services (Factor IV) are now swappable via configuration, not code. |
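A minimal sketch of the Go CDK pattern behind these rows; the key name and payload are invented, and the real blobstore_gocloud.go wraps this behind the project's own Upload/NewReader interface.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/fileblob" // file:// — local dev
	_ "gocloud.dev/blob/gcsblob"  // gs://  — production
	_ "gocloud.dev/blob/memblob"  // mem:// — deterministic tests
)

func main() {
	ctx := context.Background()
	// BLOB_STORE_URL selects the backend; no code change per provider.
	bucket, err := blob.OpenBucket(ctx, os.Getenv("BLOB_STORE_URL")) // e.g. "mem://"
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	if err := bucket.WriteAll(ctx, "episodes/ep1.json", []byte(`{"ok":true}`), nil); err != nil {
		log.Fatal(err)
	}
	data, err := bucket.ReadAll(ctx, "episodes/ep1.json")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(data))
}
```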
Verdict
ROI: 🟢 High. Low engineering cost, zero ongoing cost, significant portability and testing benefits. The 12-Factor compliance improvement alone justifies this.
ADR-007: Dual GraphDB Strategy (Memory + SQLite WAL)
Status: ✅ Accepted — implemented on development
Costs
| Category | Estimate | Notes |
|---|---|---|
| Engineering | ~8–12 hours | SQLiteGraphDB implementation with WAL mode, JOIN-based graph traversal, LIKE matching, foreign key constraints. Litestream sidecar integration. MemoryGraphDB delegation. This is the most substantial implementation in the codebase outside of ADR-002. |
| Dependency | Low | modernc.org/sqlite (pure Go, CGO-free). litestream sidecar (Go binary, external process). |
| Complexity | Moderate | Dual implementation paths with identical interface. WAL mode nuances (checkpointing, concurrent readers). Litestream restoration on cold start. |
| Operational cost | Very low | No separate database service. SQLite embedded in-process. Litestream consumes minimal CPU. GCS storage for WAL archives is pennies/month. |
| Risk | Low-Medium | SQLite is not a graph database — JOIN-based traversal for entity neighborhoods works but doesn't scale to millions of nodes with deep traversal paths. For current scale (hundreds of thousands of nodes), it's fine. |
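Two fragments make the design concrete: the WAL-mode DSN for modernc.org/sqlite and the JOIN-based one-hop traversal. The path, pragma values, and table/column names are assumptions, not the actual schema.

```go
package graphdb

import (
	"database/sql"

	_ "modernc.org/sqlite" // pure-Go driver, no CGO
)

// open enables WAL mode via DSN pragmas so concurrent readers never
// block the single writer. Path and pragma values are illustrative.
func open(path string) (*sql.DB, error) {
	return sql.Open("sqlite",
		"file:"+path+"?_pragma=journal_mode(WAL)&_pragma=busy_timeout(5000)")
}

// One-hop neighborhood via JOIN — the traversal pattern the ADR accepts
// as adequate for hundreds of thousands of nodes (names assumed).
const neighborsSQL = `
SELECT n.id, n.label
FROM edges e
JOIN nodes n ON n.id = e.target_id
WHERE e.source_id = ?;`
```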
Benefits
| Metric | Expected | Validated |
|---|---|---|
| Survivability | Graph persists across scale-to-zero | SQLite WAL file lives in Cloud Run's instance-local /tmp; Litestream replicates it to GCS and restores it on cold start. |
| Concurrent reads | Lock-free (WAL mode) | WAL allows unlimited concurrent readers alongside a single writer, critical for Graph-RAG query path. |
| CGO-free | Cross-compiles trivially | modernc.org/sqlite is pure Go. No C toolchain in Docker build. |
| Memory analysis | Identical allocations | ADR-007/010 benchmark showed zero change in allocs/op and bytes/op across all 12 variants. MemoryGraphDB and SQLiteGraphDB share the same allocation pattern. |
| Performance | No latency regression | Benchmark data from re-run (May 13, benchtime=30s, count=30) is being analyzed. First run (flawed methodology) showed consistent but artifact-laden improvements across all variants. |
Verdict
ROI: 🟢 High. Despite substantial engineering cost, the benefit — durable graph persistence without operational overhead — is fundamental to the product. Without SQLite, every scale-to-zero event destroys user data. The experiment tracking is validating that performance remains acceptable.
ADR-008: Weighted Ensemble Entity Resolution
Status: ✅ Accepted — implemented on development
Costs
| Category | Estimate | Notes |
|---|---|---|
| Engineering | ~4–6 hours | Jaro-Winkler implementation, normalized Levenshtein, Jaccard fuzzy token overlap. Tri-band decision logic (merge/ambiguous/insert). Type blocking. OTel metrics for resolution_score. |
| Runtime cost | O(n) per insert | Each ResolveAndInsert call compares the candidate against all existing nodes of the same type. At 10K Person nodes, that's 10K string comparisons per insert. For 500K nodes, this becomes a bottleneck. |
| Complexity | Moderate | Three algorithms with weighted averaging. Tunable threshold. OTel metrics integration. Ambiguous case logging. |
| Risk | Low-Medium | Weight tuning is empirical. The 0.85 merge threshold is a guess — may need adjustment as entity volume grows. False merges are mitigated by the Jaccard component; false splits (missed merges) are less visible but more dangerous (fragmented graph). |
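A sketch of the scoring and tri-band decision described above. The Jaccard weight (0.30) and merge threshold (0.85) come from this ADR; the remaining weight split, the ambiguous-band floor, and the two string-metric helpers are assumptions (the helpers would come from a string-metrics library).

```go
package resolution

import "strings"

// ensembleScore combines three similarity signals with weighted
// averaging. jaroWinkler and levenshteinSim are assumed helpers
// returning values in [0,1]; only Jaccard is implemented inline.
func ensembleScore(a, b string, jaroWinkler, levenshteinSim func(a, b string) float64) float64 {
	return 0.40*jaroWinkler(a, b) + // weight split is an assumption
		0.30*levenshteinSim(a, b) +
		0.30*jaccardTokens(a, b) // guards against "Apple Inc" vs "Apple fruit"
}

// jaccardTokens is |A ∩ B| / |A ∪ B| over lowercased whitespace tokens.
func jaccardTokens(a, b string) float64 {
	setA := map[string]bool{}
	for _, t := range strings.Fields(strings.ToLower(a)) {
		setA[t] = true
	}
	inter, union := 0, len(setA)
	seen := map[string]bool{}
	for _, t := range strings.Fields(strings.ToLower(b)) {
		if seen[t] {
			continue
		}
		seen[t] = true
		if setA[t] {
			inter++
		} else {
			union++
		}
	}
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

// Tri-band decision: merge / log-as-ambiguous / insert.
func decide(score float64) string {
	switch {
	case score >= 0.85: // merge threshold from this ADR
		return "merge"
	case score >= 0.70: // ambiguous-band floor is an assumption
		return "ambiguous"
	default:
		return "insert"
	}
}
```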
Benefits
| Metric | Expected | Validated |
|---|---|---|
| Graph integrity | Eliminates duplicate nodes | Catches: typos (Altman vs Altmann), abbreviations (Sam vs Samuel), structural variations (Microsoft Research vs Microsoft Corporation → ambiguous band), shared-token false friends (Sam Altman vs Sam Bankman-Fried → low Jaccard). |
| False merge protection | Jaccard prevents catastrophic merges | The 0.30 Jaccard weight is specifically designed to prevent "Apple Inc" merging with "Apple fruit" — only the token "Apple" matches, producing a score in the ambiguous or insert band. |
| Observability | Resolution_score histogram | Every comparison is recorded as an OTel metric, enabling distribution analysis and threshold tuning over time. |
| Configurability | Per-deployment tuning | Merge threshold is configurable. Research deployments can lower it for aggressive merging; legal deployments can raise it for conservative matching. |
Verdict
ROI: 🟢 High. Entity resolution is the difference between a coherent knowledge graph and a fragmented mess. Without it, every variant of "Sam Altman" produces a separate node, breaking Graph-RAG traversal. The O(n) cost at scale is the main concern — monitor resolution span durations and consider indexing or sharding when Person nodes exceed 50K.
ADR-009: Formalized Experiment Tracking
Status: ✅ Accepted — experiment infrastructure exists, meta-experiment (EXP-001) pending
Costs
| Category | Estimate | Notes |
|---|---|---|
| Engineering | ~1–2 hours | Directory structure, template.md, README with agentic directives, INDEX.md. |
| Process overhead | Ongoing | Every performance-sensitive change now requires writing an experiment report before merge. This is intentional friction. |
| Maintenance | Low | Reports and trial data accumulate over time. INDEX.md needs updating. |
| Risk | Very low | The infrastructure is files and Markdown. Zero ongoing cost if abandoned. |
Benefits
| Metric | Expected | Validated |
|---|---|---|
| Decision quality | Higher | Hypotheses must be falsifiable. Results must be distribution-aware (p50/p95/p99, KS test, Cohen's d). |
| Negative knowledge base | Prevents repeated dead-ends | Documented failures (like the flawed first run of ADR-007/010) prevent future engineers from re-investigating. |
| Git bisect integration | Possible | Every experiment links to commit hashes. |
| Agentic compatibility | README encodes methodology | Autonomous coding agents have a committed methodology file to follow. |
Verdict
ROI: 🟡 Medium. The process overhead is real, but the value of a negative knowledge base compounds over time. The meta-experiment (EXP-001, deadline 2026-05-25) will formally assess whether this process reduces performance regressions. Until then, the infrastructure is in place and the first two experiment reports exist.
ADR-010: Async Pipeline Production Readiness
Status: ✅ Accepted — implemented on development
Costs
| Category | Estimate | Notes |
|---|---|---|
| Engineering | ~4–6 hours | Per-chunk LLM timeout context (one-line context.WithTimeout in extractFullText). Structured logging handler swap (slog-gcp for GCP, JSON stdout for local). Progress callback wiring in upload handler. Stale message fix. Rate limiter config externalization. |
| Dependency | Very low | github.com/jdockerty/slog-gcp for Cloud Logging. |
| Complexity | Low | Each change is localized and additive. The LLM interface is unchanged. The StateTracker interface is unchanged. |
| Risk | Very low | Per-chunk timeout is strictly additive — it can only break slow chunks, and that's the intended behavior. |
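The resilience fix is essentially one context.WithTimeout around the per-chunk call. A sketch with the provider call abstracted; names and the signature are assumptions about the real extractFullText:

```go
package pipeline

import (
	"context"
	"time"
)

// extractChunk bounds a single LLM call so a stalled provider fails in
// `timeout` (LLM_REQUEST_TIMEOUT, e.g. 120s) instead of pinning a
// worker slot for the job's lifetime.
func extractChunk(ctx context.Context,
	generate func(context.Context, string) (string, error),
	chunk string, timeout time.Duration) (string, error) {
	chunkCtx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	return generate(chunkCtx, chunk)
}
```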
Benefits
| Metric | Expected | Validated |
|---|---|---|
| Resilience | No more 30-minute stalls | A stalled Vertex AI chunk now fails after 120s instead of blocking a worker slot for 30 minutes. |
| Debuggability | Queryable job logs | gcloud logging read 'jsonPayload.job_id="..."' surfaces: LLM provider, chunk progress, per-chunk errors, completion metrics. |
| UX parity | File uploads show progress | Upload handler now calls the same progress tracking as ingest handler. Users see chunk counts updating. |
| Config freedom | 3 new env vars | LLM_REQUEST_TIMEOUT, LLM_MAX_CONCURRENCY, LLM_TPM_LIMIT replace hardcoded values. |
Verdict
ROI: 🟢 High. Low engineering cost fixing three real production gaps that caused a 31-minute outage. The timeout alone prevents a repeat of that incident. The logging fix makes the next incident diagnosable in minutes instead of hours.
ADR-011: Graph Download & Export
Status: ✅ Accepted — not yet implemented
Costs
| Category | Estimate | Notes |
|---|---|---|
| Engineering (Phase 1) | ~1–2 hours | Single handler calling DB.Snapshot(), JSON streaming via json.NewEncoder, Content-Disposition header, auth middleware, route registration. |
| Engineering (Phase 2) | ~4–8 hours | GraphML, GEXF, CSV, NDJSON serializers. GraphSerializer interface. Accept-header routing. |
| Engineering (Phase 3) | ~2–4 hours | SQLite file serving. Tenancy validation. File locking considerations. Deferred. |
| Complexity | Very low (Phase 1) | One handler, one existing interface method (Snapshot), no new dependencies. |
| Runtime cost | None | Graph is already in memory (MemoryGraphDB) or on disk (SQLiteGraphDB). Export just reads it. |
| Risk | Very low | No data mutation. No new dependencies. Existing auth middleware protects the endpoint. |
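Phase 1 really is a single handler. A sketch against an assumed Snapshot() shape; the return type, route, and filename are placeholders, and auth is assumed to sit in middleware as described above:

```go
package api

import (
	"encoding/json"
	"net/http"
)

// Snapshotter models the existing GraphDB method Phase 1 relies on;
// the payload shape here is an assumption.
type Snapshotter interface {
	Snapshot() (any, error)
}

// exportHandler streams the full graph as a JSON download: one handler,
// one existing interface method, no new dependencies.
func exportHandler(db Snapshotter) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		snap, err := db.Snapshot()
		if err != nil {
			http.Error(w, "snapshot failed", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Header().Set("Content-Disposition", `attachment; filename="graph-export.json"`)
		// Headers are already sent if encoding fails mid-stream; the
		// client sees a truncated body, so just drop the connection.
		_ = json.NewEncoder(w).Encode(snap)
	}
}
```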
Benefits
| Metric | Expected | Validated |
|---|---|---|
| Data portability | Users own their graph | Self-service export without GCP IAM permissions or technical support. |
| External tooling | Enable analytic workflows | Gephi, Cytoscape, NetworkX, pandas for advanced analysis beyond vis.js visualization. |
| Backup | User-controlled snapshot | Independent of Litestream/GCS replication. |
| Benchmark reproducibility | Stable export format | ADR-009 calls for commit-linked data. Graph exports provide ground-truth snapshots for reproducible benchmarks. |
Verdict
ROI: 🟡 Medium. Phase 1 is trivially cheap (~1–2 hours) and delivers immediate value — users can export their graph. Phase 2 and 3 should be deferred until user demand materializes. Recommendation: Implement Phase 1 now; defer Phase 2/3.
Cross-ADR Dependencies & Synergies
ADR-001 (10K threshold)
└─ feeds into ADR-002 (more chunks = more cache value)
└─ feeds into ADR-003 (more chunks = more streaming value)
└─ feeds into ADR-004 (more chunks = more flash-lite savings)
ADR-002 (Context caching)
└─ depends on ADR-005 (LLM adapter) — caching is Vertex-specific
ADR-007 (Dual GraphDB)
└─ feeds into ADR-011 (export) — Snapshot() method on both implementations
ADR-009 (Experiment tracking)
└─ validates ADR-001, ADR-007, ADR-010 via experiments
ADR-010 (Pipeline readiness)
└─ depends on ADR-005 (LLM adapter) — timeout applies to LLM interface
└─ depends on ADR-007 (GraphDB) — progress parity assumes graph is receiving data
Prioritization Recommendation
Do Now (High ROI, Low Cost)
| Priority | ADR | Rationale |
|---|---|---|
| P1 | ADR-001 | Zero cost, confirmed direction, unblock production deployment |
| P2 | ADR-011 (Phase 1) | 1–2 hours for a user-facing feature |
| P3 | ADR-006 | Already done but worth highlighting the testing benefit |
Validate Before Scaling (High ROI, Needs Evidence)
| Priority | ADR | Rationale |
|---|---|---|
| P1 | ADR-007 | Re-run experiment data needs analysis; blocking confirmation of the most expensive implementation |
| P2 | ADR-002 | High data curation cost — ensure the runtime benefits materialize before promoting to development |
| P3 | ADR-004 | Quality benchmarks needed — don't accept until flash-lite fidelity is quantified |
Defer (Medium ROI or High Risk)
| Priority | ADR | Rationale |
|---|---|---|
| Defer | ADR-003 | High risk, high engineering cost. Only implement if latency remains unacceptable after ADR-001 + ADR-002. |
| Defer | ADR-004 | High quality risk. Benchmark required. |
| Defer | ADR-011 (Phase 2/3) | Wait for user demand. |