Podpedia App — main vs development Performance Comparison
Date: May 15, 2026
Goal: Determine whether the development branch changes (ADR-001, ADR-005, ADR-007, ADR-010) provide meaningful performance benefits over main to justify the migration and maintenance cost.
Branch Differences
| ADR | Change | Files Changed | Impact |
|---|---|---|---|
| ADR-001 | Chunk threshold: 20K → 10K | transform_entity.go | Smaller chunks = faster per LLM call, more parallelism |
| ADR-005 | Universal LLM adapter | llm.go, llm_openai_compatible.go | Single code path for all providers |
| ADR-007 | Dual GraphDB (SQLite WAL mode) | graph_sqlite.go, graph.go | Persistent graph storage |
| ADR-010 | Async pipeline readiness | process.go, ratelimit.go | Progress tracking, rate limiter, graph export |
Total: 19 files changed, +724 / -316 lines
Experimental Setup
| Parameter | Value |
|---|---|
| Hardware | RTX 3090 (24GB), AMD Ryzen 3700X, 64GB RAM |
| Ollama extraction model | qwen3.5:latest (7B, ~42 tok/s) |
| Backend | Docker Compose, in-memory state + blob store, memory graph |
| Small workload | Apple Inc. text (~500 chars, 10 entities) |
| Medium workload | Aristotle Nicomachean Ethics Book I (~35K chars, dense philosophical prose) |
| Trials | 3 (small), 1 (medium) per branch |
Results
Small Workload (Apple text, ~500 chars)
| Trial | main | development | Δ |
|---|---|---|---|
| 1 | 45,255ms | 40,554ms | -4,701ms |
| 2 | 40,555ms | 38,526ms | -2,029ms |
| 3 | 50,682ms | 46,629ms | -4,053ms |
| p50 | ~45,255ms | ~40,554ms | -4,701ms (-10%) |
Both branches perform similarly for small texts. Development is 5-10% faster, likely due to the streamlined LLM adapter path, but this is not decisive: the spread between trials (~10s on main) is larger than the gap between branches (~2-5s).
Medium Workload (Aristotle, 35K chars)
| Metric | main | development | Δ |
|---|---|---|---|
| Total time | >840s (did not complete) | 90,274ms | ~10x faster |
| Chunk size | 20K chars (2 chunks) | 10K chars (4 chunks) | Smaller chunks |
| Chunk throughput | ~1 chunk / 7+ min | ~1 chunk / 22s | ~20x faster per chunk |
| Entities extracted | N/A (timed out) | 46 nodes, 25 edges | N/A |
Key Finding: Chunk Size Matters Non-Linearly
A 20K character chunk of dense philosophical text on a 7B model takes far more than 2x the time of a 10K chunk. The LLM's attention mechanism scales super-linearly with input length for complex extraction tasks. At 20K characters, the model struggles to maintain coherent entity extraction across the full context.
Development's 10K threshold (ADR-001) effectively solves this: smaller chunks are processed faster individually, and the concurrent worker pool (8 workers) processes them in parallel.
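As a rough illustration of the pattern described above, here is a minimal worker-pool sketch; the function and channel names are hypothetical, not the actual process.go code.

```go
// Hypothetical sketch of an 8-worker chunk pool; names are assumptions.
package ingest

import "sync"

const numWorkers = 8 // matches the pool size cited in this report

// processChunks fans chunks out to numWorkers goroutines. extract stands in
// for the per-chunk LLM extraction call.
func processChunks(chunks []string, extract func(string) error) {
	jobs := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for chunk := range jobs {
				// How much real parallelism this buys depends on how many
				// concurrent requests the local Ollama/GPU setup can serve.
				_ = extract(chunk) // error handling elided in this sketch
			}
		}()
	}
	for _, c := range chunks {
		jobs <- c
	}
	close(jobs)
	wg.Wait()
}
```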
Analysis
What Development Gets Right
10K chunk threshold (ADR-001): The single most impactful change. For dense philosophical texts, the difference between 20K and 10K chunks is not 2x — it's 10-20x. The smaller chunks allow the 7B model to process text within its effective context window.
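To make the threshold concrete, a minimal splitting sketch follows; chunkThreshold and splitIntoChunks are illustrative names, and the real transform_entity.go likely splits on sentence or paragraph boundaries rather than raw offsets.

```go
// Illustrative sketch of the ADR-001 threshold change; names are assumptions.
package ingest

const chunkThreshold = 10_000 // ADR-001: lowered from 20_000 characters

// splitIntoChunks cuts text into pieces of at most chunkThreshold characters.
func splitIntoChunks(text string) []string {
	runes := []rune(text) // count characters, not bytes
	var chunks []string
	for start := 0; start < len(runes); start += chunkThreshold {
		end := start + chunkThreshold
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
	}
	return chunks
}
```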
Universal LLM adapter (ADR-005): Cleaner code path that routes all providers through the same interface. This is a maintainability win rather than a performance win.
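A minimal sketch of what such an adapter boundary can look like, assuming hypothetical names; the actual llm.go / llm_openai_compatible.go types are not shown in this report.

```go
// Hedged sketch of a single-interface LLM adapter; all names are assumptions.
package ingest

import "context"

// LLM is the one interface every provider implements.
type LLM interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

// openAICompatible covers any OpenAI-compatible endpoint (Ollama, hosted
// gateways, etc.), so there is a single request/response path to maintain.
type openAICompatible struct {
	baseURL string // e.g. http://localhost:11434/v1 for Ollama
	model   string
}

func (c *openAICompatible) Complete(ctx context.Context, prompt string) (string, error) {
	// POST {baseURL}/chat/completions with model + prompt; body elided.
	return "", nil
}
```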
Dual GraphDB (ADR-007): Not tested in this experiment (in-memory graph was used for both). The SQLite WAL-mode path would matter for production persistence, not local dev latency.
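For reference, enabling WAL mode in Go is a one-pragma change; this sketch assumes the common mattn/go-sqlite3 driver, which may not be what graph_sqlite.go actually uses.

```go
// Hedged sketch of opening SQLite in WAL mode; the driver is an assumption.
package ingest

import (
	"database/sql"

	_ "github.com/mattn/go-sqlite3" // assumed driver
)

func openGraphDB(path string) (*sql.DB, error) {
	db, err := sql.Open("sqlite3", path)
	if err != nil {
		return nil, err
	}
	// WAL mode lets readers proceed while the ingest pipeline writes,
	// which is what makes the persistent graph usable during processing.
	if _, err := db.Exec("PRAGMA journal_mode=WAL;"); err != nil {
		db.Close()
		return nil, err
	}
	return db, nil
}
```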
Async pipeline (ADR-010): Rate limiter and progress tracking work correctly. The total_chunks field in the status response is useful for UX.
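The shapes implied by ADR-010 might look like the following sketch; apart from total_chunks, every name here is an assumption (the report does not show the ratelimit.go API).

```go
// Hypothetical sketch of the ADR-010 status and rate-limiter shapes.
package ingest

import (
	"context"

	"golang.org/x/time/rate"
)

// Status is a guess at the status-endpoint payload; total_chunks is the one
// field this report confirms.
type Status struct {
	TotalChunks int `json:"total_chunks"`
	DoneChunks  int `json:"done_chunks"` // assumed companion field
}

// limiter caps outbound LLM calls; the 2 req/s, burst-4 numbers are made up.
var limiter = rate.NewLimiter(rate.Limit(2), 4)

func callLLM(ctx context.Context, do func() error) error {
	if err := limiter.Wait(ctx); err != nil { // blocks until a token is free
		return err
	}
	return do()
}
```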
What Main Does Well
- Simple, single-path architecture: Fewer abstractions, easier to understand and debug.
- No performance regression: For small texts, main is within 10% of development.
- Ollama-native path: Main uses 1000-character chunks with concurrent processing on its Ollama path, which might actually be faster in some cases (untested).
Where Both Branches Struggle
Neither branch handles large or dense texts well with a 7B model. The entity extraction pipeline becomes the bottleneck for any text beyond ~10K characters. This is a fundamental limitation of the hardware, not the code (a back-of-envelope sketch follows the list):
- 7B model at ~42 tok/s needs ~60-90s per 10K chunk of dense text
- For a 100K text: ~10 chunks = ~10-15 minutes total pipeline time
- For a 250K text: ~25 chunks = ~25-40 minutes
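The back-of-envelope sketch referenced above, treating the report's ~60-90s-per-chunk figure as an effective serial rate:

```go
// Back-of-envelope pipeline estimate; perChunk is this report's assumed
// 60-90s effective rate for dense text, not a measured constant.
package ingest

import "time"

func estimatePipeline(textChars, chunkChars int, perChunk time.Duration) time.Duration {
	chunks := (textChars + chunkChars - 1) / chunkChars // ceiling division
	return time.Duration(chunks) * perChunk
}

// estimatePipeline(100_000, 10_000, 75*time.Second) = 10 chunks * 75s
// = 12.5 min, inside the ~10-15 minute range above; 250,000 chars gives
// 25 chunks ≈ 31 min, matching the ~25-40 minute estimate.
```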
Verdict
| Criterion | main | development | Winner |
|---|---|---|---|
| Small text latency | ~45s | ~41s | development (marginal) |
| Medium text latency | >14 min (fails) | ~1.5 min | development (clear) |
| Code complexity | Lower | Higher | main |
| Maintainability | Simpler | More abstractions | main |
| Production readiness | Limited | SQLite + rate limiter | development |
Is the juice worth the squeeze?
For local development with a 7B model: Yes. The 10K chunk threshold alone makes development massively more usable for anything beyond trivial texts. The 10x+ improvement for medium texts is decisive.
For production with Vertex AI: Probably neutral. Vertex's larger models handle 20K chunks without issue, so the 20K → 10K change is less impactful. The dual GraphDB and async pipeline features are the real value in production.
Recommendation: Merge development to main. The improvement for local development is substantial, and the production features (SQLite, rate limiter, progress tracking, graph export) are worth the added complexity.
Full trial data: experiments/trials/20260515-*-e2e-ingest-local.txt
Full test suite: 60+ Go unit tests, all pass on both branches.