
Podpedia App — Local Docker Compose Deployment

Date: May 15, 2026
Hardware: NVIDIA RTX 3090 (24 GB) + AMD Ryzen 3700X + 64 GB RAM
Ollama model for entity extraction: qwen3.5:latest (7B, ~42 tok/s)


Overview

Deployed podpedia-app locally using Docker Compose. The project is a podcast knowledge graph and RAG engine with a Go backend and a Vite/TypeScript frontend.

Architecture

Browser → localhost:5173 (nginx)
                ↓ proxy /api/*
         localhost:8080 (Go backend)
                ↓ LLM calls
         host.docker.internal:11434 (Ollama)

Services

Service    Container          Port    Role
Frontend   nginx (alpine)     :5173   Serves built Vite app, proxies /api to backend
Backend    Go 1.26 (alpine)   :8080   HTTP API, entity extraction, graph building, LLM orchestration
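The compose file's frontend entry isn't reproduced below, but a sketch along these lines would match the table; the build context and the container-internal port are assumptions, not confirmed values from the project:

frontend:
  build: ./frontend        # assumed build context
  ports:
    - "5173:80"            # assumes nginx listens on 80 inside the container
  depends_on:
    - backend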

Files Created

Model Selection Journey

What didn't work

Model          Issue
qwen3.6:27b    Too slow for entity extraction (60-90s per LLM call); overkill for structured JSON outputs.
qwen2.5:7b     Could not produce valid JSON; generated refusal text instead of structured output.
qwen2.5:0.5b   Too small; entity resolution was wrong (parsed "SpaceX" as Person, "Musk" as Org).

What works

qwen3.5:latest (7B) handles the structured JSON format correctly. Entity extraction for a short text completes in ~30s per LLM call, with the full pipeline (extract → resolve → graph) finishing in ~54s.

Sample Pipeline Result

Input: "Elon Musk founded SpaceX in 2002. The company launched Falcon 1 in 2008."

Output:

{
  "nodes": [
    {"id": "Elon Musk", "group": "Person"},
    {"id": "SpaceX", "group": "Organization"},
    {"id": "Falcon 1", "group": "Product"}
  ],
  "edges": [
    {"from": "Elon Musk", "to": "SpaceX", "label": "FOUNDED"},
    {"from": "SpaceX", "to": "Falcon 1", "label": "LAUNCHED"}
  ]
}
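On the Go side, this payload maps onto a small pair of structs. Below is a minimal sketch with illustrative type and field names; the backend's actual definitions are not shown in this document:

package main

import (
	"encoding/json"
	"fmt"
)

// Node and Edge mirror the JSON shape shown above.
type Node struct {
	ID    string `json:"id"`
	Group string `json:"group"`
}

type Edge struct {
	From  string `json:"from"`
	To    string `json:"to"`
	Label string `json:"label"`
}

type Graph struct {
	Nodes []Node `json:"nodes"`
	Edges []Edge `json:"edges"`
}

func main() {
	raw := `{"nodes":[{"id":"SpaceX","group":"Organization"}],
	         "edges":[{"from":"Elon Musk","to":"SpaceX","label":"FOUNDED"}]}`
	var g Graph
	if err := json.Unmarshal([]byte(raw), &g); err != nil {
		panic(err)
	}
	fmt.Printf("%d nodes, %d edges\n", len(g.Nodes), len(g.Edges))
}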

Environment Configuration

From docker-compose.yml:

backend:
  environment:
    - PORT=8080
    - OLLAMA_URL=http://host.docker.internal:11434
    - LLM_BASE_URL=http://host.docker.internal:11434/v1
    - LLM_API_KEY=ollama
    - LLM_MODEL=qwen3.5:latest

The backend auto-selects the LLM provider based on these environment variables, as sketched below.
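A minimal sketch of selection logic consistent with these variables; pickProvider and the precedence order are assumptions, not the backend's actual code:

package main

import (
	"fmt"
	"os"
)

// pickProvider sketches plausible provider-selection logic;
// it is not the backend's actual implementation.
func pickProvider() string {
	if base := os.Getenv("LLM_BASE_URL"); base != "" {
		return "openai-compatible: " + base // e.g. Ollama's /v1 API
	}
	if url := os.Getenv("OLLAMA_URL"); url != "" {
		return "ollama: " + url // native Ollama API
	}
	return "none"
}

func main() {
	fmt.Println(pickProvider())
}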

API Endpoints

Endpoint                Method   Description
/api/ingest             POST     Text → graph pipeline. Returns job ID for polling.
/api/status/{job_id}    GET      Poll job progress (PROCESSING / COMPLETED / FAILED).
/api/results/{job_id}   GET      Fetch completed graph JSON.
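A client round trip over these endpoints might look like the following sketch; the JSON field names (text, job_id, status) are guesses for illustration and should be checked against the backend:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Submit text for ingestion (request shape is assumed).
	body, _ := json.Marshal(map[string]string{
		"text": "Elon Musk founded SpaceX in 2002.",
	})
	resp, err := http.Post("http://localhost:8080/api/ingest",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	var job struct {
		JobID string `json:"job_id"`
	}
	json.NewDecoder(resp.Body).Decode(&job)
	resp.Body.Close()

	// Poll /api/status/{job_id} until the job leaves PROCESSING.
	for {
		s, err := http.Get("http://localhost:8080/api/status/" + job.JobID)
		if err != nil {
			panic(err)
		}
		var st struct {
			Status string `json:"status"`
		}
		json.NewDecoder(s.Body).Decode(&st)
		s.Body.Close()
		if st.Status != "PROCESSING" {
			fmt.Println("final status:", st.Status)
			break
		}
		time.Sleep(2 * time.Second)
	}
	// On COMPLETED, fetch the graph from /api/results/{job_id}.
}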

Ollama Models Loaded

At runtime, multiple models may be loaded on the RTX 3090; together they occupy ~17.7 GB of the 24 GB VRAM, leaving room for context.

URLs

Frontend:  http://localhost:5173
Backend:   http://localhost:8080
Ollama:    http://host.docker.internal:11434 (from inside containers)

See also: Qwen3.6:27b Capability Evaluation, Qwen3.6:27b Tool-Use Evaluation
