ADR-002 Experiment: Deep Ontology vs Current Entity Extraction on qwen3.5:7b
Date: May 15, 2026
Type: A/B Comparison — entity extraction quality and latency
Model: qwen3.5:latest (7B) via local Ollama
Status: mixed — quality wins, latency neutral, brittle on unusual text
1. Hypothesis
The Deep Ontology system prompt (ADR-002) — a 121K-character prompt with 50+ entity types, 56 few-shot examples, and rigid extraction rules — will produce more specific, graph-useful entity types than the current inline JSON schema prompt, without significant latency increase.
Success Criteria
- Entity types more granular (e.g., `PERSON_FOUNDER` vs `Person`)
- No more than 10% entity count regression
- No more than 50% latency increase
2. Method
Two prompts sent to qwen3.5:7b via Ollama API:
| Condition | System Prompt | User Prompt |
|---|---|---|
| Control | (none) | Extract entities... Return ONLY valid JSON matching this exact structure: {"entities":[...],"relationships":[...]} with generic schema (7 types) |
| Ontology | First 35K chars of ADR-002's Deep Ontology (taxonomy + extraction rules) | Extract entities following the taxonomy and rules above. Return ONLY valid JSON. |
Two texts, 2 trials each:
- Apple (224 chars) — tech/business: companies, people, products, locations
- Aristotle (244 chars) — philosophy: abstract concepts
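The two conditions above can be sketched as request builders for Ollama's `/api/generate` endpoint. This is an illustrative reconstruction, not the actual harness: the prompt constants are placeholders, and `build_request` is a hypothetical helper. Ollama's real `system`, `format`, and `stream` fields are used as documented.

```python
# Sketch of the two experimental conditions as Ollama payloads.
# GENERIC_SCHEMA / ONTOLOGY_PROMPT are placeholder stand-ins for the
# real prompts (the ontology condition used the first 35K chars of
# the ADR-002 Deep Ontology).
GENERIC_SCHEMA = (
    'Extract entities... Return ONLY valid JSON matching this exact '
    'structure: {"entities":[...],"relationships":[...]}'
)
ONTOLOGY_PROMPT = "..."  # first 35K chars of the Deep Ontology

def build_request(condition: str, text: str) -> dict:
    """Build an Ollama /api/generate payload for one condition."""
    if condition == "control":
        system = None
        user = f"{GENERIC_SCHEMA}\n\nText:\n{text}"
    elif condition == "ontology":
        system = ONTOLOGY_PROMPT
        user = (
            "Extract entities following the taxonomy and rules above. "
            f"Return ONLY valid JSON.\n\nText:\n{text}"
        )
    else:
        raise ValueError(f"unknown condition: {condition}")
    payload = {
        "model": "qwen3.5:7b",
        "prompt": user,
        "stream": False,
        "format": "json",  # ask Ollama to constrain output to JSON
    }
    if system is not None:
        payload["system"] = system
    return payload
```

The only structural difference between conditions is the presence of the large `system` prompt; the user prompt and decoding settings are held constant.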
3. Results
Apple Text (224 chars)
| Trial | Control | Ontology |
|---|---|---|
| 1 | 11e/10r, 39.5s, types: Organization, Person, Event, Location, Product | 10e/7r, 39.4s, types: ORG_CORP, PERSON_FOUNDER, LOCATION_CITY, PRODUCT_HARDWARE, PERSON_EXECUTIVE |
| 2 | 12e/10r, 50.8s, types: Organization, Person, Location, Product, Event, Concept | 10e/7r, 27.7s, types: ORG_CORP, PERSON_FOUNDER, PERSON_EXECUTIVE, LOCATION_CITY, LOCATION_REGION, PRODUCT_HARDWARE |
| Avg | 12 ents, 45.1s, 3,741 tok | 10 ents, 33.5s, 2,369 tok |
Aristotle Text (244 chars)
| Trial | Control | Ontology |
|---|---|---|
| 1 | 12e/7r, 54.3s, types: Concept (11), Product (1) | 10e/9r, 116.0s, types: PUBLICATION_BOOK, CONCEPT_DOMAIN (4), CONCEPT_PRINCIPLE, PRODUCT_HARDWARE, Unknown (3) |
| 2 | 13e/8r, 71.9s, types: Concept (12), Product (1) | FAILED — empty {"entities":[]}, 12.6s |
| Avg | 13 ents, 63.1s, 5,208 tok | 5 ents, 64.3s, 4,754 tok |
4. Analysis
Quality: Ontology wins decisively
The ontology produces semantically rich entity types that are immediately useful in a knowledge graph:
| Entity | Control Type | Ontology Type |
|---|---|---|
| Apple Inc. | Organization | ORG_CORP |
| Steve Jobs | Person | PERSON_FOUNDER |
| Steve Wozniak | Person | PERSON_FOUNDER |
| Ronald Wayne | Person | PERSON_FOUNDER |
| Tim Cook | Person | PERSON_EXECUTIVE |
| Cupertino | Location | LOCATION_CITY |
| California | Location | LOCATION_REGION |
| iPhone | Product | PRODUCT_HARDWARE |
| Aristotle text | Concept (generic) | CONCEPT_DOMAIN, CONCEPT_PRINCIPLE, PUBLICATION_BOOK |
The control draws from a flat 7-type schema (6 of which appeared across trials). The ontology draws from 50+ granular types. For a knowledge graph, granular types enable better queries, filtering, and visualization.
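One concrete payoff of granular types: because they share category prefixes (`PERSON_*`, `LOCATION_*`), a single prefix match selects a whole category while preserving the role information the control's flat types discard. A minimal sketch over hypothetical extracted records:

```python
# Hypothetical entity records, in the shape both prompts return.
entities = [
    {"name": "Apple Inc.", "type": "ORG_CORP"},
    {"name": "Steve Jobs", "type": "PERSON_FOUNDER"},
    {"name": "Tim Cook", "type": "PERSON_EXECUTIVE"},
    {"name": "Cupertino", "type": "LOCATION_CITY"},
]

def by_type_prefix(entities: list[dict], prefix: str) -> list[dict]:
    """Select a whole category of granular types by prefix match."""
    return [e for e in entities if e["type"].startswith(prefix)]

people = by_type_prefix(entities, "PERSON_")
# Both people are selected, and founder vs executive roles survive,
# which a flat "Person" type cannot express.
```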
Latency: Surprising — ontology can be faster
For the Apple text (where the ontology rules fit the text well), the ontology was 26% faster (33.5s vs 45.1s), likely because the structured taxonomy constrains the model's output space: less time is spent deciding what to call things.
For the Aristotle text, the first ontology call was 2.1x slower (116s vs 54s) because the dense philosophical text didn't map cleanly to the ontology's categories. The model spent more time trying to fit square pegs in round holes.
Brittleness: Ontology failed on Aristotle Trial 2
The second ontology trial for Aristotle returned an empty entity set. The ontology's rigid categories don't map well to abstract philosophical text — the model gets confused and defaults to empty rather than guessing wrong. This is arguably safer behavior (no hallucinated entities), but produces no usable output.
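This failure mode is detectable and cheap to mitigate: an empty entity list is unambiguous, so a wrapper can retry with the generic prompt. The sketch below assumes two caller-supplied functions (`call_ontology`, `call_control`) that return raw model output; neither is a real API.

```python
import json

def parse_entities(raw: str):
    """Parse a model response; return None when it is unusable.
    An empty entity list (as in Aristotle Trial 2) counts as a
    failed extraction, not a valid result."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    entities = data.get("entities", [])
    return entities or None

def extract_with_fallback(text, call_ontology, call_control):
    """Try the ontology prompt first; fall back to the generic
    prompt when the ontology returns nothing usable."""
    entities = parse_entities(call_ontology(text))
    if entities is not None:
        return entities, "ontology"
    return parse_entities(call_control(text)) or [], "control"
```

This preserves the ontology's type granularity on conforming text while capping the downside on non-conforming text at one extra call.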
5. Conclusion
| Criterion | Verdict |
|---|---|
| Entity type specificity | ✅ Ontology vastly better |
| Latency (well-fitting text) | ✅ Ontology faster (-26%) |
| Latency (unusual text) | ⚠️ Ontology slower (+115%) |
| Entity count stability | ❌ Ontology loses entities (-13% Apple, -60% Aristotle) |
| Robustness | ❌ Ontology produced empty output for 1/4 trials |
Hypothesis Assessment
- Partially confirmed. The ontology produces dramatically better entity types, confirming the core value proposition. But it's brittle for text types that don't match its taxonomy, and entity counts are lower.
Recommendations
| Scenario | Choice | Rationale |
|---|---|---|
| Tech/business podcast transcripts | Ontology | Excellent type granularity for known categories, faster extraction |
| Abstract/philosophical texts | Current prompt | Ontology fails on non-conforming text |
| Production with Vertex AI | Ontology | CachedContent eliminates prompt overhead; type granularity justifies migration |
| Local Ollama dev | Both | Use ontology as opt-in for specific content types |
The 10K chunk threshold (ADR-001) is a much bigger win locally than ADR-002. The ontology's value is in production with Vertex AI's CachedContent, where the full 121K-character prompt is loaded once and referenced cheaply per chunk, eliminating the 35K-character prompt overhead we paid on every call in this experiment.
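Back-of-envelope arithmetic makes the caching argument concrete. The figures below are illustrative assumptions, not measurements: roughly 9K tokens for a 35K-character prompt, over a hypothetical 500-chunk run.

```python
def prompt_overhead_tokens(prompt_tokens: int, calls: int, cached: bool) -> int:
    """Total tokens spent resending the static ontology prompt.
    Without caching, the full prompt travels with every call; with
    CachedContent-style caching, it is paid roughly once up front."""
    return prompt_tokens if cached else prompt_tokens * calls

# Assumed numbers: ~9K-token prompt, 500 chunks.
uncached = prompt_overhead_tokens(9_000, 500, cached=False)
cached = prompt_overhead_tokens(9_000, 500, cached=True)
# Caching reduces prompt overhead by a factor equal to the call count.
```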
See also: main vs development Comparison, ADR-007/010 Pipeline Run 3