
Podpedia App — main vs development Performance Comparison

Date: May 15, 2026
Goal: Determine whether the development branch changes (ADR-001, ADR-005, ADR-007, ADR-010) provide meaningful performance benefits over main to justify the migration and maintenance cost.


Branch Differences

| ADR | Change | Files Changed | Impact |
|---|---|---|---|
| ADR-001 | Chunk threshold: 20K → 10K | transform_entity.go | Smaller chunks = faster per LLM call, more parallelism |
| ADR-005 | Universal LLM adapter | llm.go, llm_openai_compatible.go | Single code path for all providers |
| ADR-007 | Dual GraphDB (SQLite WAL mode) | graph_sqlite.go, graph.go | Persistent graph storage |
| ADR-010 | Async pipeline readiness | process.go, ratelimit.go | Progress tracking, rate limiter, graph export |

Total: 19 files changed, +724 / -316 lines
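
For orientation, the ADR-001 change amounts to splitting input text into smaller pieces before extraction. Below is a minimal sketch of what such a splitter could look like; the package name, the splitChunks function, and the newline-boundary heuristic are illustrative assumptions, not the actual transform_entity.go code.

```go
package ingest

// chunkThreshold is the maximum chunk size in characters.
// ADR-001 lowered this limit from 20,000 to 10,000.
const chunkThreshold = 10_000

// splitChunks breaks text into chunks of at most chunkThreshold characters,
// preferring to cut at a newline so a chunk does not end mid-sentence.
// Illustrative sketch only; the real transform_entity.go may differ.
func splitChunks(text string) []string {
	var chunks []string
	runes := []rune(text)
	for len(runes) > 0 {
		end := chunkThreshold
		if end >= len(runes) {
			end = len(runes)
		} else {
			// Back up to the last newline in the second half of the window,
			// if any, so the cut lands on a natural boundary.
			for i := end - 1; i > end/2; i-- {
				if runes[i] == '\n' {
					end = i + 1
					break
				}
			}
		}
		chunks = append(chunks, string(runes[:end]))
		runes = runes[end:]
	}
	return chunks
}
```

With a ~35K-character input, a 10K limit yields four chunks and a 20K limit yields two, matching the chunk counts in the medium-workload results below.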


Experimental Setup

| Parameter | Value |
|---|---|
| Hardware | RTX 3090 (24GB), AMD Ryzen 3700X, 64GB RAM |
| Ollama extraction model | qwen3.5:latest (7B, ~42 tok/s) |
| Backend | Docker Compose, in-memory state + blob store, memory graph |
| Small workload | Apple Inc. text (~500 chars, 10 entities) |
| Medium workload | Aristotle, Nicomachean Ethics Book I (~35K chars, dense philosophical prose) |
| Trials | 3 (small), 1 (medium) per branch |

Results

Small Workload (Apple text, ~500 chars)

| Trial | main | development | Δ |
|---|---|---|---|
| 1 | 45,255 ms | 40,554 ms | -4,701 ms |
| 2 | 40,555 ms | 38,526 ms | -2,029 ms |
| 3 | 50,682 ms | 46,629 ms | -4,053 ms |
| p50 | ~45,255 ms | ~40,554 ms | ~-10% |

Both branches perform similarly on small texts. The development branch is 5-10% faster, likely due to the optimized LLM adapter path. Not a game-changer: the run-to-run variance is larger than the between-branch difference.

Medium Workload (Aristotle, 35K chars)

| Metric | main | development | Δ |
|---|---|---|---|
| Total time | >840 s (did not complete) | 90,274 ms | ~10x faster |
| Chunk size | 20K chars (2 chunks) | 10K chars (4 chunks) | Smaller chunks |
| Chunk throughput | ~1 chunk / 7+ min | ~1 chunk / 22 s | ~20x faster per chunk |
| Entities extracted | N/A (timed out) | 46 nodes, 25 edges | N/A |

Key Finding: Chunk Size Matters Non-Linearly

A 20K-character chunk of dense philosophical text on a 7B model takes far more than 2x the time of a 10K chunk. Attention cost grows super-linearly with input length, and on complex extraction tasks output quality degrades as well: at 20K characters, the model struggles to maintain coherent entity extraction across the full context.
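
To make the non-linearity concrete (the functional form and coefficients below are illustrative, not fitted to these trials): if per-chunk latency grows super-linearly in chunk length $n$, say

$$T(n) = a\,n + b\,n^{2}, \qquad b > 0,$$

then two 10K chunks cost $2\,T(10\text{K}) = 20\text{K}\,a + 2\times10^{8}\,b$ while one 20K chunk costs $T(20\text{K}) = 20\text{K}\,a + 4\times10^{8}\,b$, so splitting already wins before any parallelism. The observed ~20x per-chunk gap is steeper than a quadratic term alone would give, which fits the point above that the model also stops extracting coherently once a chunk exceeds its effective working context.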

Development's 10K threshold (ADR-001) effectively solves this: smaller chunks are processed faster individually, and the concurrent worker pool (8 workers) processes them in parallel.
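
As a rough sketch of that fan-out (the Result type, function names, and error handling are illustrative assumptions, not the process.go implementation):

```go
package ingest

import (
	"context"
	"sync"
)

// Result is a placeholder for the per-chunk extraction output
// (nodes and edges); the real type lives elsewhere in the pipeline.
type Result struct {
	Nodes int
	Edges int
}

// processChunks fans chunk extraction out to a bounded worker pool,
// mirroring the 8-worker setup described above. extract stands in
// for the actual per-chunk LLM call.
func processChunks(ctx context.Context, chunks []string, workers int,
	extract func(context.Context, string) (Result, error)) ([]Result, error) {

	results := make([]Result, len(chunks))
	sem := make(chan struct{}, workers)   // caps concurrency at `workers`
	errs := make(chan error, len(chunks)) // collects per-chunk failures
	var wg sync.WaitGroup

	for i, chunk := range chunks {
		wg.Add(1)
		go func(i int, chunk string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a worker slot
			defer func() { <-sem }() // release it when done
			r, err := extract(ctx, chunk)
			if err != nil {
				errs <- err
				return
			}
			results[i] = r
		}(i, chunk)
	}
	wg.Wait()
	close(errs)
	if err, ok := <-errs; ok && err != nil {
		return nil, err // surface the first recorded error
	}
	return results, nil
}
```

Indexing results by chunk position keeps the output order deterministic regardless of which worker finishes first.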


Analysis

What Development Gets Right

  1. 10K chunk threshold (ADR-001): The single most impactful change. For dense philosophical texts, the difference between 20K and 10K chunks is not 2x — it's 10-20x. The smaller chunks allow the 7B model to process text within its effective context window.

  2. Universal LLM adapter (ADR-005): Cleaner code path that routes all providers through the same interface. This is a maintainability win rather than a performance win.

  3. Dual GraphDB (ADR-007): Not tested in this experiment (in-memory graph was used for both). The SQLite WAL-mode path would matter for production persistence, not local dev latency.

  4. Async pipeline (ADR-010): Rate limiter and progress tracking work correctly. The total_chunks field in the status response is useful for UX.
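
On the rate limiter in point 4: a common way to keep concurrent workers under a provider's request budget is a shared token bucket. The sketch below uses golang.org/x/time/rate and reuses the hypothetical Result and processChunks names from the earlier sketch (callLLM is likewise a stand-in); the actual ratelimit.go wiring and limits may differ.

```go
package ingest

import (
	"context"

	"golang.org/x/time/rate"
)

// throttledExtract wraps a per-chunk extraction call with a token-bucket
// rate limiter shared by all workers, so parallel chunk processing stays
// under a provider's request budget. Illustrative sketch only.
func throttledExtract(
	limiter *rate.Limiter,
	extract func(context.Context, string) (Result, error),
) func(context.Context, string) (Result, error) {
	return func(ctx context.Context, chunk string) (Result, error) {
		// Block until the limiter grants a token (or the context is cancelled).
		if err := limiter.Wait(ctx); err != nil {
			return Result{}, err
		}
		return extract(ctx, chunk)
	}
}

// Example: at most 2 LLM requests per second with a burst of 4.
//   limiter := rate.NewLimiter(rate.Limit(2), 4)
//   results, err := processChunks(ctx, chunks, 8, throttledExtract(limiter, callLLM))
```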

What Main Does Well

  1. Simple, single-path architecture: Fewer abstractions, easier to understand and debug.
  2. No performance regression: For small texts, main is within 10% of development.
  3. Ollama-native path: main's Ollama-specific path uses 1000-character chunks with concurrent processing, which might actually be faster for some cases (untested).

Where Both Branches Struggle

Neither branch handles large or dense texts well with a 7B model. The entity extraction pipeline becomes the bottleneck for any text over 10K characters. This is a fundamental limitation of the local 7B model and hardware, not the code.


Verdict

| Criterion | main | development | Winner |
|---|---|---|---|
| Small text latency | ~45 s | ~41 s | development (marginal) |
| Medium text latency | >14 min (fails) | ~1.5 min | development (clear) |
| Code complexity | Lower | Higher | main |
| Maintainability | Simpler | More abstractions | main |
| Production readiness | Limited | SQLite + rate limiter | development |

Is the juice worth the squeeze?

For local development with a 7B model: Yes. The 10K chunk threshold alone makes the development branch dramatically more usable for anything beyond trivial texts. The 10x+ improvement on medium texts is decisive.

For production with Vertex AI: Probably neutral. Vertex's larger models handle 20K chunks without issue, so the 20→10K change is less impactful. The dual GraphDB and async pipeline features are the real value in production.

Recommendation: Merge development to main. The improvement for local development is substantial, and the production features (SQLite, rate limiter, progress tracking, graph export) are worth the added complexity.


Full trial data: experiments/trials/20260515-*-e2e-ingest-local.txt
Full test suite: 60+ Go unit tests, all pass on both branches.
