Podpedia App — Local Docker Compose Deployment
Date: May 15, 2026
Hardware: NVIDIA RTX 3090 (24GB) + AMD Ryzen 3700X + 64GB RAM
Ollama model for entity extraction: qwen3.5:latest (7B, ~42 tok/s)
Overview
Deployed podpedia-app locally using Docker Compose. The project is a podcast knowledge graph and RAG engine with a Go backend and a Vite/TypeScript frontend.
Architecture
```
Browser → localhost:5173 (nginx)
            ↓ proxy /api/*
        localhost:8080 (Go backend)
            ↓ LLM calls
        host.docker.internal:11434 (Ollama)
```
Services
| Service | Container | Port | Role |
|---|---|---|---|
| Frontend | nginx (alpine) | :5173 | Serves built Vite app, proxies /api to backend |
| Backend | Go 1.26 (alpine) | :8080 | HTTP API, entity extraction, graph building, LLM orchestration |
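For orientation, a minimal sketch of how a backend in this role could wire up its HTTP surface. Handler names and routing are assumptions, not the actual podpedia-app code; the real endpoints are listed under API Endpoints below.

```go
// main.go: hypothetical backend skeleton (handler names are illustrative).
package main

import (
	"log"
	"net/http"
	"os"
)

// Stub handlers standing in for the real pipeline logic.
func handleIngest(w http.ResponseWriter, r *http.Request)  { /* enqueue text → graph job, return job ID */ }
func handleStatus(w http.ResponseWriter, r *http.Request)  { /* report PROCESSING / COMPLETED / FAILED */ }
func handleResults(w http.ResponseWriter, r *http.Request) { /* return the finished graph JSON */ }

func main() {
	port := os.Getenv("PORT") // set to 8080 in docker-compose.yml
	if port == "" {
		port = "8080"
	}

	mux := http.NewServeMux()
	mux.HandleFunc("/api/ingest", handleIngest)
	mux.HandleFunc("/api/status/", handleStatus)
	mux.HandleFunc("/api/results/", handleResults)

	log.Printf("backend listening on :%s", port)
	log.Fatal(http.ListenAndServe(":"+port, mux))
}
```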
Files Created
- `docker-compose.yml` — Two-service orchestration (backend + frontend)
- `frontend/Dockerfile` — Multi-stage: pnpm build → nginx static serve
- `frontend/nginx.conf` — SPA routing + `/api/` proxy to the backend container
Model Selection Journey
What didn't work
| Model | Issue |
|---|---|
| `qwen3.6:27b` | Too slow for entity extraction (60-90s per LLM call). Overkill for structured JSON outputs. |
| `qwen2.5:7b` | Could not produce valid JSON. Generated refusal text instead of structured output. |
| `qwen2.5:0.5b` | Too small — entity resolution was wrong (parsed "SpaceX" as Person, "Musk" as Org). |
What works
qwen3.5:latest (7B) handles the structured JSON format correctly. Entity extraction for a short text completes in ~30s per LLM call, with the full pipeline (extract → resolve → graph) finishing in ~54s.
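For illustration only, a hedged Go sketch of what a single extraction call against the OpenAI-compatible /v1 endpoint might look like. The prompt, type names, and the use of response_format are assumptions, not the project's actual code.

```go
// extract.go: sketch of one entity-extraction call against Ollama's
// OpenAI-compatible /v1 endpoint. Prompt, types, and error handling are
// simplified assumptions, not the actual backend code.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	// Recent Ollama versions honor response_format on the /v1 path; if not,
	// the JSON instruction in the system prompt is the fallback.
	ResponseFormat map[string]string `json:"response_format,omitempty"`
}

type chatResponse struct {
	Choices []struct {
		Message chatMessage `json:"message"`
	} `json:"choices"`
}

func extractEntities(text string) (string, error) {
	base := os.Getenv("LLM_BASE_URL") // e.g. http://host.docker.internal:11434/v1
	body, err := json.Marshal(chatRequest{
		Model: os.Getenv("LLM_MODEL"), // e.g. qwen3.5:latest
		Messages: []chatMessage{
			{Role: "system", Content: `Extract entities and relations. Reply with JSON: {"nodes":[...],"edges":[...]}`},
			{Role: "user", Content: text},
		},
		ResponseFormat: map[string]string{"type": "json_object"},
	})
	if err != nil {
		return "", err
	}

	resp, err := http.Post(base+"/chat/completions", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var out chatResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	if len(out.Choices) == 0 {
		return "", fmt.Errorf("no completion returned")
	}
	return out.Choices[0].Message.Content, nil
}

func main() {
	graphJSON, err := extractEntities("Elon Musk founded SpaceX in 2002.")
	if err != nil {
		panic(err)
	}
	fmt.Println(graphJSON)
}
```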
Sample Pipeline Result
Input: "Elon Musk founded SpaceX in 2002. The company launched Falcon 1 in 2008."
Output:
```json
{
  "nodes": [
    {"id": "Elon Musk", "group": "Person"},
    {"id": "SpaceX", "group": "Organization"},
    {"id": "Falcon 1", "group": "Product"}
  ],
  "edges": [
    {"from": "Elon Musk", "to": "SpaceX", "label": "FOUNDED"},
    {"from": "SpaceX", "to": "Falcon 1", "label": "LAUNCHED"}
  ]
}
```
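The payload maps naturally onto a small pair of node/edge types. A plausible Go sketch (field and type names are assumptions, not the backend's actual definitions):

```go
// graph.go: plausible Go types for the graph payload above; names are assumed.
package graph

import "encoding/json"

type Node struct {
	ID    string `json:"id"`    // display name, e.g. "Elon Musk"
	Group string `json:"group"` // entity type: Person, Organization, Product, ...
}

type Edge struct {
	From  string `json:"from"`  // source node ID
	To    string `json:"to"`    // target node ID
	Label string `json:"label"` // relation, e.g. FOUNDED, LAUNCHED
}

type Graph struct {
	Nodes []Node `json:"nodes"`
	Edges []Edge `json:"edges"`
}

// Parse decodes a completed pipeline result into a Graph.
func Parse(raw []byte) (Graph, error) {
	var g Graph
	err := json.Unmarshal(raw, &g)
	return g, err
}
```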
Environment Configuration
From docker-compose.yml:
```yaml
backend:
  environment:
    - PORT=8080
    - OLLAMA_URL=http://host.docker.internal:11434
    - LLM_BASE_URL=http://host.docker.internal:11434/v1
    - LLM_API_KEY=ollama
    - LLM_MODEL=qwen3.5:latest
```
The backend auto-selects the LLM provider based on env vars (see the sketch after this list):
- `LLM_BASE_URL` set → OpenAI-compatible (Ollama via the `/v1` path)
- `GOOGLE_CLOUD_PROJECT` set → Vertex AI
- Neither → Ollama local fallback
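A sketch of what that selection could look like in Go; only the precedence is taken from the behavior above, the names are illustrative:

```go
// provider.go: sketch of the env-driven provider selection described above;
// function and provider names are illustrative, not the actual backend code.
package llm

import "os"

type Provider string

const (
	OpenAICompatible Provider = "openai-compatible" // Ollama reached via the /v1 path
	VertexAI         Provider = "vertex-ai"
	OllamaNative     Provider = "ollama" // local fallback
)

// selectProvider mirrors the precedence above: LLM_BASE_URL wins,
// then GOOGLE_CLOUD_PROJECT, then the local Ollama fallback.
func selectProvider() Provider {
	if os.Getenv("LLM_BASE_URL") != "" {
		return OpenAICompatible
	}
	if os.Getenv("GOOGLE_CLOUD_PROJECT") != "" {
		return VertexAI
	}
	return OllamaNative
}
```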
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/api/ingest` | POST | Text → graph pipeline. Returns a job ID for polling. |
| `/api/status/{job_id}` | GET | Poll job progress (PROCESSING / COMPLETED / FAILED) |
| `/api/results/{job_id}` | GET | Fetch completed graph JSON |
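Putting the three endpoints together, a minimal Go client sketch of the ingest → poll → fetch flow, assuming a job_id field in the ingest response and a status field in the status response (both field names are assumptions):

```go
// client.go: sketch of driving the pipeline end to end: ingest → poll → fetch.
// The request/response field names (text, job_id, status) are assumptions.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"time"
)

const baseURL = "http://localhost:8080"

func main() {
	// 1. Submit text and read back a job ID.
	body, _ := json.Marshal(map[string]string{"text": "Elon Musk founded SpaceX in 2002."})
	resp, err := http.Post(baseURL+"/api/ingest", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	var ingest struct {
		JobID string `json:"job_id"`
	}
	json.NewDecoder(resp.Body).Decode(&ingest)
	resp.Body.Close()

	// 2. Poll the job until it leaves PROCESSING.
	for {
		s, err := http.Get(baseURL + "/api/status/" + ingest.JobID)
		if err != nil {
			log.Fatal(err)
		}
		var status struct {
			Status string `json:"status"`
		}
		json.NewDecoder(s.Body).Decode(&status)
		s.Body.Close()
		if status.Status != "PROCESSING" {
			break
		}
		time.Sleep(2 * time.Second)
	}

	// 3. Fetch the completed graph JSON.
	r, err := http.Get(baseURL + "/api/results/" + ingest.JobID)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Body.Close()
	var graph json.RawMessage
	json.NewDecoder(r.Body).Decode(&graph)
	fmt.Println(string(graph))
}
```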
Ollama Models Loaded
At runtime, multiple models may be loaded on the RTX 3090:
- `qwen3.5:latest` (9.4 GB) — active entity extraction model
- `qwen2.5:7b` (6.8 GB) — loaded but unused after the swap
- `qwen2.5:0.5b` (1.5 GB) — also unused
Total: ~17.7 GB of the 24 GB VRAM, leaving room for context.
URLs
- Frontend: http://localhost:5173
- Backend API: http://localhost:8080
See also: Qwen3.6:27b Capability Evaluation, Qwen3.6:27b Tool-Use Evaluation