ADR-002 Experiment: Deep Ontology vs Current Entity Extraction on qwen3.5:7b
Date: May 15, 2026
Type: A/B Comparison — entity extraction quality and latency
Model: qwen3.5:latest (7B) via local Ollama
Status: mixed — quality wins, latency neutral, brittle on unusual text
1. Hypothesis
The Deep Ontology system prompt (ADR-002) — a 121K-character prompt with 50+ entity types, 56 few-shot examples, and rigid extraction rules — will produce more specific, graph-useful entity types than the current inline JSON schema prompt, without significant latency increase.
Success Criteria
- Entity types more granular (e.g., `PERSON_FOUNDER` vs `Person`)
- No more than 10% entity count regression
- No more than 50% latency increase
2. Method
Two prompts sent to qwen3.5:7b via Ollama API:
| Condition | System Prompt | User Prompt |
|---|---|---|
| Control | (none) | Extract entities... Return ONLY valid JSON matching this exact structure: {"entities":[...],"relationships":[...]} with generic schema (7 types) |
| Ontology | First 35K chars of ADR-002's Deep Ontology (taxonomy + extraction rules) | Extract entities following the taxonomy and rules above. Return ONLY valid JSON. |
Two texts, 2 trials each:
- Apple (224 chars) — tech/business: companies, people, products, locations
- Aristotle (244 chars) — philosophy: abstract concepts
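The two conditions above can be sketched as request builders for Ollama's `/api/generate` endpoint. This is an illustrative reconstruction, not the actual harness: the prompt constants are placeholders, and `build_request` is a hypothetical helper. Ollama's real `system`, `format`, and `stream` fields are used as documented.

```python
# Sketch of the two experimental conditions as Ollama payloads.
# GENERIC_SCHEMA / ONTOLOGY_PROMPT are placeholder stand-ins for the
# real prompts (the ontology condition used the first 35K chars of
# the ADR-002 Deep Ontology).
GENERIC_SCHEMA = (
    'Extract entities... Return ONLY valid JSON matching this exact '
    'structure: {"entities":[...],"relationships":[...]}'
)
ONTOLOGY_PROMPT = "..."  # first 35K chars of the Deep Ontology

def build_request(condition: str, text: str) -> dict:
    """Build an Ollama /api/generate payload for one condition."""
    if condition == "control":
        system = None
        user = f"{GENERIC_SCHEMA}\n\nText:\n{text}"
    elif condition == "ontology":
        system = ONTOLOGY_PROMPT
        user = (
            "Extract entities following the taxonomy and rules above. "
            f"Return ONLY valid JSON.\n\nText:\n{text}"
        )
    else:
        raise ValueError(f"unknown condition: {condition}")
    payload = {
        "model": "qwen3.5:7b",
        "prompt": user,
        "stream": False,
        "format": "json",  # ask Ollama to constrain output to JSON
    }
    if system is not None:
        payload["system"] = system
    return payload
```

The only structural difference between conditions is the presence of the large `system` prompt; the user prompt and decoding settings are held constant.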
3. Results
Apple Text (224 chars)
| Trial | Control | Ontology |
|---|---|---|
| 1 | 11e/10r, 39.5s, types: Organization, Person, Event, Location, Product | 10e/7r, 39.4s, types: ORG_CORP, PERSON_FOUNDER, LOCATION_CITY, PRODUCT_HARDWARE, PERSON_EXECUTIVE |
| 2 | 12e/10r, 50.8s, types: Organization, Person, Location, Product, Event, Concept | 10e/7r, 27.7s, types: ORG_CORP, PERSON_FOUNDER, PERSON_EXECUTIVE, LOCATION_CITY, LOCATION_REGION, PRODUCT_HARDWARE |
| Avg | 12 ents, 45.1s, 3,741 tok | 10 ents, 33.5s, 2,369 tok |
Aristotle Text (244 chars)
| Trial | Control | Ontology |
|---|---|---|
| 1 | 12e/7r, 54.3s, types: Concept (11), Product (1) | 10e/9r, 116.0s, types: PUBLICATION_BOOK, CONCEPT_DOMAIN (4), CONCEPT_PRINCIPLE, PRODUCT_HARDWARE, Unknown (3) |
| 2 | 13e/8r, 71.9s, types: Concept (12), Product (1) | FAILED — empty {"entities":[]}, 12.6s |
| Avg | 13 ents, 63.1s, 5,208 tok | 5 ents, 64.3s, 4,754 tok |
4. Analysis
Quality: Ontology wins decisively
The ontology produces semantically rich entity types that are immediately useful in a knowledge graph:
| Entity | Control Type | Ontology Type |
|---|---|---|
| Apple Inc. | Organization | ORG_CORP |
| Steve Jobs | Person | PERSON_FOUNDER |
| Steve Wozniak | Person | PERSON_FOUNDER |
| Ronald Wayne | Person | PERSON_FOUNDER |
| Tim Cook | Person | PERSON_EXECUTIVE |
| Cupertino | Location | LOCATION_CITY |
| California | Location | LOCATION_REGION |
| iPhone | Product | PRODUCT_HARDWARE |
| Aristotle text | Concept (generic) | CONCEPT_DOMAIN, CONCEPT_PRINCIPLE, PUBLICATION_BOOK |
The control draws from a flat 7-type schema (6 of which appeared across trials). The ontology draws from 50+ granular types. For a knowledge graph, granular types enable better queries, filtering, and visualization.
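One concrete payoff of granular types: because they share category prefixes (`PERSON_*`, `LOCATION_*`), a single prefix match selects a whole category while preserving the role information the control's flat types discard. A minimal sketch over hypothetical extracted records:

```python
# Hypothetical entity records, in the shape both prompts return.
entities = [
    {"name": "Apple Inc.", "type": "ORG_CORP"},
    {"name": "Steve Jobs", "type": "PERSON_FOUNDER"},
    {"name": "Tim Cook", "type": "PERSON_EXECUTIVE"},
    {"name": "Cupertino", "type": "LOCATION_CITY"},
]

def by_type_prefix(entities: list[dict], prefix: str) -> list[dict]:
    """Select a whole category of granular types by prefix match."""
    return [e for e in entities if e["type"].startswith(prefix)]

people = by_type_prefix(entities, "PERSON_")
# Both people are selected, and founder vs executive roles survive,
# which a flat "Person" type cannot express.
```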
Latency: Surprising — ontology can be faster
For the Apple text (where the ontology rules fit the text well), the ontology was 26% faster (33.5s vs 45.1s), likely because the structured taxonomy constrains the model's output space: less time is spent deciding what to call things.
For the Aristotle text, the first ontology call was 2.1x slower (116s vs 54s) because the dense philosophical text didn't map cleanly to the ontology's categories. The model spent more time trying to fit square pegs in round holes.
Brittleness: Ontology failed on Aristotle Trial 2
The second ontology trial for Aristotle returned an empty entity set. The ontology's rigid categories don't map well to abstract philosophical text — the model gets confused and defaults to empty rather than guessing wrong. This is arguably safer behavior (no hallucinated entities), but produces no usable output.
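This failure mode is detectable and cheap to mitigate: an empty entity list is unambiguous, so a wrapper can retry with the generic prompt. The sketch below assumes two caller-supplied functions (`call_ontology`, `call_control`) that return raw model output; neither is a real API.

```python
import json

def parse_entities(raw: str):
    """Parse a model response; return None when it is unusable.
    An empty entity list (as in Aristotle Trial 2) counts as a
    failed extraction, not a valid result."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    entities = data.get("entities", [])
    return entities or None

def extract_with_fallback(text, call_ontology, call_control):
    """Try the ontology prompt first; fall back to the generic
    prompt when the ontology returns nothing usable."""
    entities = parse_entities(call_ontology(text))
    if entities is not None:
        return entities, "ontology"
    return parse_entities(call_control(text)) or [], "control"
```

This preserves the ontology's type granularity on conforming text while capping the downside on non-conforming text at one extra call.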
5. Conclusion
| Criterion | Verdict |
|---|---|
| Entity type specificity | ✅ Ontology vastly better |
| Latency (well-fitting text) | ✅ Ontology faster (-26%) |
| Latency (unusual text) | ⚠️ Ontology slower (+115%) |
| Entity count stability | ❌ Ontology loses entities (-13% Apple, -60% Aristotle) |
| Robustness | ❌ Ontology produced empty output for 1/4 trials |
Hypothesis Assessment
- Partially confirmed. The ontology produces dramatically better entity types, confirming the core value proposition. But it's brittle for text types that don't match its taxonomy, and entity counts are lower.
Recommendations
| Scenario | Choice | Rationale |
|---|---|---|
| Tech/business podcast transcripts | Ontology | Excellent type granularity for known categories, faster extraction |
| Abstract/philosophical texts | Current prompt | Ontology fails on non-conforming text |
| Production with Vertex AI | Ontology | CachedContent eliminates prompt overhead; type granularity justifies migration |
| Local Ollama dev | Both | Use ontology as opt-in for specific content types |
The 10K chunk threshold (ADR-001) is a much bigger win locally than ADR-002. The ontology's value is in production with Vertex AI's CachedContent, where the full 121K-character prompt is loaded once and referenced cheaply per chunk, eliminating the 35K-character prompt overhead we paid on every call in this experiment.
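Back-of-envelope arithmetic makes the caching argument concrete. The figures below are illustrative assumptions, not measurements: roughly 9K tokens for a 35K-character prompt, over a hypothetical 500-chunk run.

```python
def prompt_overhead_tokens(prompt_tokens: int, calls: int, cached: bool) -> int:
    """Total tokens spent resending the static ontology prompt.
    Without caching, the full prompt travels with every call; with
    CachedContent-style caching, it is paid roughly once up front."""
    return prompt_tokens if cached else prompt_tokens * calls

# Assumed numbers: ~9K-token prompt, 500 chunks.
uncached = prompt_overhead_tokens(9_000, 500, cached=False)
cached = prompt_overhead_tokens(9_000, 500, cached=True)
# Caching reduces prompt overhead by a factor equal to the call count.
```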
See also: main vs development Comparison, ADR-007/010 Pipeline Run 3