
ADR-002 Experiment: Deep Ontology vs Current Entity Extraction on qwen3.5:7b

Date: May 15, 2026
Type: A/B comparison — entity extraction quality and latency
Model: qwen3.5:latest (7B) via local Ollama
Status: mixed — quality wins, latency neutral, brittle on unusual text


1. Hypothesis

The Deep Ontology system prompt (ADR-002) — a 121K-character prompt with 50+ entity types, 56 few-shot examples, and rigid extraction rules — will produce more specific, graph-useful entity types than the current inline JSON schema prompt, without significant latency increase.

Success Criteria


2. Method

Two prompts sent to qwen3.5:7b via Ollama API:

| Condition | System Prompt | User Prompt |
|---|---|---|
| Control | (none) | "Extract entities... Return ONLY valid JSON matching this exact structure: `{"entities":[...],"relationships":[...]}`" with generic schema (7 types) |
| Ontology | First 35K chars of ADR-002's Deep Ontology (taxonomy + extraction rules) | "Extract entities following the taxonomy and rules above. Return ONLY valid JSON." |

Two texts, 2 trials each:

- Apple text (224 chars): factual tech/business content
- Aristotle text (244 chars): abstract philosophical content

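The two conditions can be sketched as a small harness. This is a hypothetical reconstruction, not the experiment's actual code: the helper names (`build_payload`, `run`) are invented, the endpoint is Ollama's default local `/api/generate`, and the prompt strings are taken from the table above.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

GENERIC_SCHEMA_PROMPT = (
    'Extract entities... Return ONLY valid JSON matching this exact '
    'structure: {"entities":[...],"relationships":[...]}'
)
ONTOLOGY_USER_PROMPT = (
    "Extract entities following the taxonomy and rules above. "
    "Return ONLY valid JSON."
)

def build_payload(condition: str, text: str, ontology: str = "") -> dict:
    """Build an Ollama /api/generate request body for one condition."""
    if condition == "control":
        system, user = "", GENERIC_SCHEMA_PROMPT + "\n\n" + text
    elif condition == "ontology":
        # Only the first 35K chars of the Deep Ontology go in the system prompt.
        system, user = ontology[:35_000], ONTOLOGY_USER_PROMPT + "\n\n" + text
    else:
        raise ValueError(f"unknown condition: {condition}")
    return {
        "model": "qwen3.5:latest",
        "system": system,
        "prompt": user,
        "format": "json",   # ask Ollama to constrain output to valid JSON
        "stream": False,
    }

def run(condition: str, text: str, ontology: str = "") -> dict:
    """Send one extraction request and parse the model's JSON response."""
    payload = build_payload(condition, text, ontology)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(json.load(resp)["response"])
```

Timings and token counts in the Results tables would come from the `total_duration` and `eval_count` fields Ollama returns alongside `response`.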

3. Results

Apple Text (224 chars)

(e = entities, r = relationships)

| Trial | Control | Ontology |
|---|---|---|
| 1 | 11e/10r, 39.5s; types: Organization, Person, Event, Location, Product | 10e/7r, 39.4s; types: ORG_CORP, PERSON_FOUNDER, LOCATION_CITY, PRODUCT_HARDWARE, PERSON_EXECUTIVE |
| 2 | 12e/10r, 50.8s; types: Organization, Person, Location, Product, Event, Concept | 10e/7r, 27.7s; types: ORG_CORP, PERSON_FOUNDER, PERSON_EXECUTIVE, LOCATION_CITY, LOCATION_REGION, PRODUCT_HARDWARE |
| Avg | 12 ents, 45.1s, 3,741 tok | 10 ents, 33.5s, 2,369 tok |

Aristotle Text (244 chars)

| Trial | Control | Ontology |
|---|---|---|
| 1 | 12e/7r, 54.3s; types: Concept (11), Product (1) | 10e/9r, 116.0s; types: PUBLICATION_BOOK, CONCEPT_DOMAIN (4), CONCEPT_PRINCIPLE, PRODUCT_HARDWARE, Unknown (3) |
| 2 | 13e/8r, 71.9s; types: Concept (12), Product (1) | FAILED — empty `{"entities":[]}`, 12.6s |
| Avg | 13 ents, 63.1s, 5,208 tok | 5 ents, 64.3s, 4,754 tok |

4. Analysis

Quality: Ontology wins decisively

The ontology produces semantically rich entity types that are immediately useful in a knowledge graph:

| Entity | Control Type | Ontology Type |
|---|---|---|
| Apple Inc. | Organization | ORG_CORP |
| Steve Jobs | Person | PERSON_FOUNDER |
| Steve Wozniak | Person | PERSON_FOUNDER |
| Ronald Wayne | Person | PERSON_FOUNDER |
| Tim Cook | Person | PERSON_EXECUTIVE |
| Cupertino | Location | LOCATION_CITY |
| California | Location | LOCATION_REGION |
| iPhone | Product | PRODUCT_HARDWARE |
| Aristotle text | Concept (generic) | CONCEPT_DOMAIN, CONCEPT_PRINCIPLE, PUBLICATION_BOOK |

The control produced 6 flat types (of its 7-type generic schema); the ontology defines 50+ granular types. For a knowledge graph, granular types enable more precise queries, filtering, and visualization.
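A concrete illustration of why granularity matters downstream: because the ontology's type names share prefixes (`PERSON_`, `LOCATION_`), role-specific graph queries reduce to simple prefix filters, which the flat schema cannot express. The helper below is hypothetical; the entity data is taken from the table above.

```python
# Sample entities as extracted under the ontology condition (from the table above).
entities = [
    {"name": "Steve Jobs",    "type": "PERSON_FOUNDER"},
    {"name": "Tim Cook",      "type": "PERSON_EXECUTIVE"},
    {"name": "Cupertino",     "type": "LOCATION_CITY"},
    {"name": "California",    "type": "LOCATION_REGION"},
]

def by_type(entities: list[dict], prefix: str) -> list[str]:
    """Select entity names whose ontology type starts with a prefix."""
    return [e["name"] for e in entities if e["type"].startswith(prefix)]

founders  = by_type(entities, "PERSON_FOUNDER")  # one exact role  -> ["Steve Jobs"]
people    = by_type(entities, "PERSON_")         # whole type family
locations = by_type(entities, "LOCATION_")       # cities and regions together
```

Under the control schema every person is just `Person`, so "founders only" would require a second inference pass or manual curation.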

Latency: Surprising — ontology can be faster

For the Apple text (where the ontology rules fit the text well), the ontology was 26% faster (33.5s vs 45.1s). This is because the structured taxonomy constrains the model's output space — less time deciding what to call things.

For the Aristotle text, the first ontology call was 2.1x slower (116s vs 54s): the dense philosophical text didn't map cleanly to the ontology's categories, so the model spent more time forcing square pegs into round holes.

Brittleness: Ontology failed on Aristotle Trial 2

The second ontology trial for Aristotle returned an empty entity set. The ontology's rigid categories don't map well to abstract philosophical text — the model gets confused and defaults to empty rather than guessing wrong. This is arguably safer behavior (no hallucinated entities), but produces no usable output.
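This failure mode suggests an obvious mitigation, sketched below under assumptions: `extract_with_fallback` is a hypothetical wrapper (not part of the current pipeline), and the two callables stand in for the ontology and control Ollama calls from the Method section. If the ontology extraction comes back empty, the chunk is retried with the generic prompt instead of being lost.

```python
def extract_with_fallback(text, extract_ontology, extract_generic):
    """Try ontology-constrained extraction; fall back to the flat schema on empty output.

    extract_ontology / extract_generic: callables taking the chunk text and
    returning a parsed {"entities": [...], ...} dict.
    """
    result = extract_ontology(text)
    if result.get("entities"):            # ontology produced usable output
        result["prompt_used"] = "ontology"
        return result
    result = extract_generic(text)        # empty set: retry with the 7-type schema
    result["prompt_used"] = "control"
    return result
```

The trade-off: every non-conforming chunk now costs two inference calls, but the 1/4 empty-output failure observed here becomes a degraded result instead of a hole in the graph.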


5. Conclusion

| Criterion | Verdict |
|---|---|
| Entity type specificity | ✅ Ontology vastly better |
| Latency (well-fitting text) | ✅ Ontology faster (−26%) |
| Latency (unusual text) | ⚠️ Ontology slower (+115%) |
| Entity count stability | ❌ Ontology loses entities (−13% Apple, −60% Aristotle) |
| Robustness | ❌ Ontology produced empty output for 1/4 trials |

Hypothesis Assessment

Recommendations

| Scenario | Choice | Rationale |
|---|---|---|
| Tech/business podcast transcripts | Ontology | Excellent type granularity for known categories; faster extraction |
| Abstract/philosophical texts | Current prompt | Ontology fails on non-conforming text |
| Production with Vertex AI | Ontology | CachedContent eliminates prompt overhead; type granularity justifies migration |
| Local Ollama dev | Both | Use ontology as opt-in for specific content types |

The 10K chunk threshold (ADR-001) is a much bigger win locally than ADR-002. The ontology's value lies in production with Vertex AI's CachedContent, where the 128K prompt is loaded once and referenced cheaply per chunk, eliminating the 35K token overhead we paid on every call in this experiment.
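A back-of-envelope calculation makes the caching argument concrete. Assumptions are loud here: the 35K-token-per-call overhead is the figure stated above, the 200-chunk pipeline run is hypothetical, and cache storage billing is ignored for simplicity.

```python
# Prompt-token overhead with vs without context caching.
PROMPT_OVERHEAD_TOKENS = 35_000   # ontology prompt overhead per uncached call (from this report)
chunks = 200                      # hypothetical size of one pipeline run

uncached = PROMPT_OVERHEAD_TOKENS * chunks  # overhead re-sent on every call
cached = PROMPT_OVERHEAD_TOKENS             # sent once to populate the cache
savings = 1 - cached / uncached

print(f"uncached: {uncached:,} tok, cached: {cached:,} tok, saved: {savings:.1%}")
# -> uncached: 7,000,000 tok, cached: 35,000 tok, saved: 99.5%
```

At any realistic chunk count the savings approach 1 − 1/N, which is why the latency and cost penalty measured locally largely disappears under CachedContent.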


See also: main vs development Comparison, ADR-007/010 Pipeline Run 3
