Podpedia App — Local Docker Compose Deployment
Date: May 15, 2026
Hardware: NVIDIA RTX 3090 (24GB) + AMD Ryzen 3700X + 64GB RAM
Ollama model for entity extraction: qwen3.5:latest (7B, ~42 tok/s)
Overview
Deployed podpedia-app locally using Docker Compose. The project is a podcast knowledge graph and RAG engine with a Go backend and a Vite/TypeScript frontend.
Architecture
```
Browser → localhost:5173 (nginx)
            ↓ proxy /api/*
        localhost:8080 (Go backend)
            ↓ LLM calls
        host.docker.internal:11434 (Ollama)
```
Services
| Service | Container | Port | Role |
|---|---|---|---|
| Frontend | nginx (alpine) | :5173 | Serves built Vite app, proxies /api to backend |
| Backend | Go 1.26 (alpine) | :8080 | HTTP API, entity extraction, graph building, LLM orchestration |
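For orientation, a minimal sketch of how a backend in this role could wire up its HTTP surface. Handler names and routing are assumptions, not the actual podpedia-app code; the real endpoints are listed under API Endpoints below.

```go
// main.go: hypothetical backend skeleton (handler names are illustrative).
package main

import (
	"log"
	"net/http"
	"os"
)

// Stub handlers standing in for the real pipeline logic.
func handleIngest(w http.ResponseWriter, r *http.Request)  { /* enqueue text → graph job, return job ID */ }
func handleStatus(w http.ResponseWriter, r *http.Request)  { /* report PROCESSING / COMPLETED / FAILED */ }
func handleResults(w http.ResponseWriter, r *http.Request) { /* return the finished graph JSON */ }

func main() {
	port := os.Getenv("PORT") // set to 8080 in docker-compose.yml
	if port == "" {
		port = "8080"
	}

	mux := http.NewServeMux()
	mux.HandleFunc("/api/ingest", handleIngest)
	mux.HandleFunc("/api/status/", handleStatus)
	mux.HandleFunc("/api/results/", handleResults)

	log.Printf("backend listening on :%s", port)
	log.Fatal(http.ListenAndServe(":"+port, mux))
}
```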
Files Created
- `docker-compose.yml` — Two-service orchestration (backend + frontend)
- `frontend/Dockerfile` — Multi-stage: pnpm build → nginx static serve
- `frontend/nginx.conf` — SPA routing + `/api/` proxy to the backend container
Model Selection Journey
What didn't work
| Model | Issue |
|---|---|
| `qwen3.6:27b` | Too slow for entity extraction (60-90s per LLM call). Overkill for structured JSON outputs. |
| `qwen2.5:7b` | Could not produce valid JSON. Generated refusal text instead of structured output. |
| `qwen2.5:0.5b` | Too small — entity resolution was wrong (parsed "SpaceX" as Person, "Musk" as Org). |
What works
qwen3.5:latest (7B) handles the structured JSON format correctly. Entity extraction for a short text completes in ~30s per LLM call, with the full pipeline (extract → resolve → graph) finishing in ~54s.
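For illustration only, a hedged Go sketch of what a single extraction call against the OpenAI-compatible /v1 endpoint might look like. The prompt, type names, and the use of response_format are assumptions, not the project's actual code.

```go
// extract.go: sketch of one entity-extraction call against Ollama's
// OpenAI-compatible /v1 endpoint. Prompt, types, and error handling are
// simplified assumptions, not the actual backend code.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	// Recent Ollama versions honor response_format on the /v1 path; if not,
	// the JSON instruction in the system prompt is the fallback.
	ResponseFormat map[string]string `json:"response_format,omitempty"`
}

type chatResponse struct {
	Choices []struct {
		Message chatMessage `json:"message"`
	} `json:"choices"`
}

func extractEntities(text string) (string, error) {
	base := os.Getenv("LLM_BASE_URL") // e.g. http://host.docker.internal:11434/v1
	body, err := json.Marshal(chatRequest{
		Model: os.Getenv("LLM_MODEL"), // e.g. qwen3.5:latest
		Messages: []chatMessage{
			{Role: "system", Content: `Extract entities and relations. Reply with JSON: {"nodes":[...],"edges":[...]}`},
			{Role: "user", Content: text},
		},
		ResponseFormat: map[string]string{"type": "json_object"},
	})
	if err != nil {
		return "", err
	}

	resp, err := http.Post(base+"/chat/completions", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var out chatResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	if len(out.Choices) == 0 {
		return "", fmt.Errorf("no completion returned")
	}
	return out.Choices[0].Message.Content, nil
}

func main() {
	graphJSON, err := extractEntities("Elon Musk founded SpaceX in 2002.")
	if err != nil {
		panic(err)
	}
	fmt.Println(graphJSON)
}
```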
Sample Pipeline Result
Input: "Elon Musk founded SpaceX in 2002. The company launched Falcon 1 in 2008."
Output:
```json
{
  "nodes": [
    {"id": "Elon Musk", "group": "Person"},
    {"id": "SpaceX", "group": "Organization"},
    {"id": "Falcon 1", "group": "Product"}
  ],
  "edges": [
    {"from": "Elon Musk", "to": "SpaceX", "label": "FOUNDED"},
    {"from": "SpaceX", "to": "Falcon 1", "label": "LAUNCHED"}
  ]
}
```
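The payload maps naturally onto a small pair of node/edge types. A plausible Go sketch (field and type names are assumptions, not the backend's actual definitions):

```go
// graph.go: plausible Go types for the graph payload above; names are assumed.
package graph

import "encoding/json"

type Node struct {
	ID    string `json:"id"`    // display name, e.g. "Elon Musk"
	Group string `json:"group"` // entity type: Person, Organization, Product, ...
}

type Edge struct {
	From  string `json:"from"`  // source node ID
	To    string `json:"to"`    // target node ID
	Label string `json:"label"` // relation, e.g. FOUNDED, LAUNCHED
}

type Graph struct {
	Nodes []Node `json:"nodes"`
	Edges []Edge `json:"edges"`
}

// Parse decodes a completed pipeline result into a Graph.
func Parse(raw []byte) (Graph, error) {
	var g Graph
	err := json.Unmarshal(raw, &g)
	return g, err
}
```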
Environment Configuration
From docker-compose.yml:
```yaml
backend:
  environment:
    - PORT=8080
    - OLLAMA_URL=http://host.docker.internal:11434
    - LLM_BASE_URL=http://host.docker.internal:11434/v1
    - LLM_API_KEY=ollama
    - LLM_MODEL=qwen3.5:latest
```
The backend auto-selects the LLM provider based on env vars (see the sketch after this list):
- `LLM_BASE_URL` set → OpenAI-compatible (Ollama via the `/v1` path)
- `GOOGLE_CLOUD_PROJECT` set → Vertex AI
- Neither → Ollama local fallback
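A sketch of what that selection could look like in Go; only the precedence is taken from the behavior above, the names are illustrative:

```go
// provider.go: sketch of the env-driven provider selection described above;
// function and provider names are illustrative, not the actual backend code.
package llm

import "os"

type Provider string

const (
	OpenAICompatible Provider = "openai-compatible" // Ollama reached via the /v1 path
	VertexAI         Provider = "vertex-ai"
	OllamaNative     Provider = "ollama" // local fallback
)

// selectProvider mirrors the precedence above: LLM_BASE_URL wins,
// then GOOGLE_CLOUD_PROJECT, then the local Ollama fallback.
func selectProvider() Provider {
	if os.Getenv("LLM_BASE_URL") != "" {
		return OpenAICompatible
	}
	if os.Getenv("GOOGLE_CLOUD_PROJECT") != "" {
		return VertexAI
	}
	return OllamaNative
}
```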
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/api/ingest` | POST | Text → graph pipeline. Returns a job ID for polling. |
| `/api/status/{job_id}` | GET | Poll job progress (PROCESSING / COMPLETED / FAILED) |
| `/api/results/{job_id}` | GET | Fetch completed graph JSON |
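Putting the three endpoints together, a minimal Go client sketch of the ingest → poll → fetch flow, assuming a job_id field in the ingest response and a status field in the status response (both field names are assumptions):

```go
// client.go: sketch of driving the pipeline end to end: ingest → poll → fetch.
// The request/response field names (text, job_id, status) are assumptions.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"time"
)

const baseURL = "http://localhost:8080"

func main() {
	// 1. Submit text and read back a job ID.
	body, _ := json.Marshal(map[string]string{"text": "Elon Musk founded SpaceX in 2002."})
	resp, err := http.Post(baseURL+"/api/ingest", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	var ingest struct {
		JobID string `json:"job_id"`
	}
	json.NewDecoder(resp.Body).Decode(&ingest)
	resp.Body.Close()

	// 2. Poll the job until it leaves PROCESSING.
	for {
		s, err := http.Get(baseURL + "/api/status/" + ingest.JobID)
		if err != nil {
			log.Fatal(err)
		}
		var status struct {
			Status string `json:"status"`
		}
		json.NewDecoder(s.Body).Decode(&status)
		s.Body.Close()
		if status.Status != "PROCESSING" {
			break
		}
		time.Sleep(2 * time.Second)
	}

	// 3. Fetch the completed graph JSON.
	r, err := http.Get(baseURL + "/api/results/" + ingest.JobID)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Body.Close()
	var graph json.RawMessage
	json.NewDecoder(r.Body).Decode(&graph)
	fmt.Println(string(graph))
}
```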
Ollama Models Loaded
At runtime, multiple models may be loaded on the RTX 3090:
- `qwen3.5:latest` (9.4 GB) — active entity extraction model
- `qwen2.5:7b` (6.8 GB) — loaded but unused after the swap
- `qwen2.5:0.5b` (1.5 GB) — also unused
Total: ~17.7 GB of the 24 GB VRAM, leaving room for context.
URLs
- Frontend: http://localhost:5173
- Backend API: http://localhost:8080
See also: Qwen3.6:27b Capability Evaluation, Qwen3.6:27b Tool-Use Evaluation