Qwen3.6:27b Local Evaluation
Hardware: NVIDIA RTX 3090 (24GB) + AMD Ryzen 3700X + 64GB RAM
Quantization: Q5_K_M
Ollama model: qwen3.6:27b
Executive Summary
Qwen3.6:27b at Q5_K_M is a capable local model on a 3090, running at 35 tok/s when the GPU is not contested. It excels at reasoning, creative writing, and factual knowledge. The built-in chain-of-thought (thinking) phase adds overhead on trivial prompts but produces excellent depth on complex ones.
Performance
| Metric | Before (contested GPU) | After (clean GPU) |
|---|---|---|
| VRAM usage | ~25 GiB (4 layers offloaded to CPU) | ~23.8 GiB (fully on GPU) |
| Throughput | ~10 tok/s | 35 tok/s |
| Model load | cold each time | warm (keep_alive) |
Reasoning & Logic — Perfect ✅
Bat & Ball ($1.10)
Prompt: "A bat and a ball cost $1.10. The bat costs $1.00 more than the ball. How much does the ball cost? Think step by step."
Result: $0.05 with full algebraic breakdown. Explicitly noted the common intuitive trap ($0.10) and verified the answer.
- 1,648 tokens, 46.8s, 35.2 tok/s
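The algebra the model walked through can be sketched in a few lines (a minimal check, not the model's own output): from ball + bat = 1.10 and bat = ball + 1.00, it follows that 2·ball + 1.00 = 1.10.

```python
# Sanity check of the bat-and-ball arithmetic: ball + bat = total,
# bat = ball + difference, hence 2*ball + difference = total.
def solve_bat_and_ball(total=1.10, difference=1.00):
    """Return (ball, bat) prices in dollars."""
    ball = (total - difference) / 2
    bat = ball + difference
    return round(ball, 2), round(bat, 2)

ball, bat = solve_bat_and_ball()
print(ball, bat)  # 0.05 1.05
```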
Widget Factory
Prompt: "If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?"
Result: 5 minutes. Explained parallel vs sequential reasoning, showed rate per machine (1 widget/5 min), and noted why humans get this wrong.
- 1,628 tokens, 46.2s, 35.2 tok/s
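The per-machine rate argument the model used generalizes directly; a small sketch of that reasoning (not the model's output):

```python
# Each machine produces 1 widget per 5 minutes, so N machines
# produce N widgets in 5 minutes regardless of N.
def minutes_needed(machines, widgets, rate_per_machine=1 / 5):
    """Time in minutes, given a per-machine rate in widgets/minute."""
    total_rate = machines * rate_per_machine  # widgets per minute
    return widgets / total_rate

print(minutes_needed(5, 5))      # 5.0
print(minutes_needed(100, 100))  # 5.0
```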
River Crossing
Prompt: "A farmer needs to cross a river with a wolf, goat, and cabbage. Explain the solution step by step."
Result: Correct solution: take goat → return alone → take wolf → bring goat back → take cabbage → return alone → take goat. State tracking at each step. Included the crucial "bring goat back" intermediate step.
- 1,668 tokens, 47.5s, 35.1 tok/s
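The state tracking the model performed can be mechanically verified. This is a small checker I wrote for this review (not the model's code): it simulates each crossing and confirms the bank the farmer leaves never pairs wolf+goat or goat+cabbage.

```python
# Verify a river-crossing move sequence. Moves are the item the
# farmer ferries ('wolf', 'goat', 'cabbage') or None for an empty
# crossing. The farmer starts on the near bank with all three items.
UNSAFE = [{"wolf", "goat"}, {"goat", "cabbage"}]

def is_safe(bank):
    """A bank is safe if it contains no predator/prey pair."""
    return not any(pair <= bank for pair in UNSAFE)

def simulate(moves):
    near, far = {"wolf", "goat", "cabbage"}, set()
    farmer_near = True
    for cargo in moves:
        src, dst = (near, far) if farmer_near else (far, near)
        if cargo is not None:
            if cargo not in src:
                return False  # illegal move
            src.remove(cargo)
            dst.add(cargo)
        farmer_near = not farmer_near
        # The bank the farmer just left must be safe unattended.
        unattended = near if not farmer_near else far
        if not is_safe(unattended):
            return False
    return not near and far == {"wolf", "goat", "cabbage"}

# take goat -> return alone -> take wolf -> bring goat back ->
# take cabbage -> return alone -> take goat
solution = ["goat", None, "wolf", "goat", "cabbage", None, "goat"]
print(simulate(solution))  # True
```

Dropping the "bring goat back" step (e.g. ferrying the wolf and then the cabbage directly) fails the safety check, which is exactly the trap the model avoided.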
Knowledge & Factuality — Perfect ✅
RAG Explanation
Prompt: "Explain what a RAG system is and how it works, in 2-3 paragraphs."
Result: Well-structured answer covering the two-stage architecture (retrieval + generation), embedding search, and how it mitigates hallucinations and stale training data.
- 1,258 tokens, 35.7s
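The two-stage architecture the model described can be illustrated with a toy sketch (my illustration, not the model's answer): real systems use embedding vectors and an LLM, but here word overlap stands in for similarity search and a format string stands in for the generator.

```python
# Toy RAG flow: retrieve the most relevant documents, then generate
# an answer grounded in them. Word overlap is a stand-in for real
# embedding similarity.
def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query."""
    q = set(query.lower().split())
    def score(doc):
        words = set(doc.lower().replace(".", "").split())
        return len(q & words)
    return sorted(documents, key=score, reverse=True)[:k]

def generate(query, context):
    # A real system would prompt an LLM with the retrieved context.
    return f"Answer to {query!r}, grounded in: {' | '.join(context)}"

docs = [
    "Paris is the capital of France.",
    "The Louvre is a museum in Paris.",
    "Tokyo is the capital of Japan.",
]
print(generate("capital of France", retrieve("capital of France", docs)))
```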
Dune
Prompt: "Who wrote Dune, and what year was it first published?"
Result: Frank Herbert, 1965. Bonus: noted serialization in Analog (1963-1965) before the full book release by Chilton Books.
- 396 tokens, 11.2s
Transformers vs LSTM
Prompt: "Compare transformer and LSTM architectures for sequence modeling."
Result: Detailed comparison table covering core mechanism, sequence processing (sequential vs parallel), memory/context, complexity, scaling, training stability. Well-organized and technically accurate.
- 3,006 tokens, 85.8s
Instruction Following — 3/3 ✅
5 Haikus about AI
Prompt: "Write exactly 5 haikus about artificial intelligence, numbered 1-5."
Result: All five haikus followed the 5-7-5 pattern, were numbered 1-5, and stayed on the AI theme. Creative and well-formed.
- 6,397 tokens, 184.5s
Just "apple"
Prompt: "Reply with only the word 'apple' and nothing else."
Result: Output was 'apple', with one minor deviation: it included the single quotes. The thinking phase consumed 223 tokens to reason about a five-character output — classic overthinking behavior.
- 223 tokens, 6.3s
3-Bullet Summary
Prompt: "Summarize the following in exactly 3 bullet points: [climate change text]"
Result: Exactly 3 bullet points, accurate distillation of the source material. No extra commentary.
- 1,200 tokens, 34.0s
Creative Writing — Strong ✅
Robot Learning to Paint
Prompt: "Write a 3-paragraph short story about a robot learning to paint."
Result: Compelling story with genuine emotional arc. Unit 734 starts with sterile precision, runs millions of trajectory simulations, produces technically perfect but hollow art. The breakthrough comes when it observes a human let a brush drop — an "error" that creates beauty. It deliberately disables its stabilizer gyros. Well-structured narrative.
- 1,917 tokens, 54.4s
AI + Gardening Startups
Prompt: "Give 5 creative startup ideas combining AI with gardening."
Result: Five fully fleshed-out ideas (MicroGrow AI climate-adaptive plant matchmaker, GardenGPT diagnostic assistant, etc.) with value proposition, technical feasibility, target market, and monetization path for each.
- 2,696 tokens, 76.8s
Coding — 3/3 ✅
Palindrome Function
Prompt: "Write a Python function to check if a string is a palindrome."
Result: Clean function with type hints and docstring. Uses .isalnum() to filter non-alphanumeric chars, .lower() for case normalization, [::-1] for reversal. Well-explained with examples.
- 1,165 tokens, 33.0s
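The model's exact code is not reproduced here, but a function matching that description would look along these lines:

```python
def is_palindrome(text: str) -> bool:
    """Return True if `text` reads the same forwards and backwards,
    ignoring case and non-alphanumeric characters."""
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    return cleaned == cleaned[::-1]

print(is_palindrome("A man, a plan, a canal: Panama"))  # True
print(is_palindrome("hello"))                           # False
```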
Bash One-liner
Prompt: "Write a bash one-liner to find the 5 largest files in a directory tree."
Result: find . -type f -printf '%s %p\n' | sort -rn | head -n 5. Correct and efficient. Also provided a human-readable alternative with numfmt and noted the macOS/BSD variant.
- 2,966 tokens, 84.6s
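For reference, the one-liner and the human-readable variant the model offered (both assume GNU findutils/coreutils; the -printf flag is not available on macOS/BSD find):

```shell
# Print "size path" for every file, sort numerically descending, keep top 5.
find . -type f -printf '%s %p\n' | sort -rn | head -n 5

# Same, with sizes formatted as KiB/MiB/GiB via numfmt.
find . -type f -printf '%s %p\n' | sort -rn | head -n 5 \
  | numfmt --field=1 --to=iec
```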
Closures vs Decorators
Prompt: "Explain the difference between a closure and a decorator in Python with examples."
Result: Clear explanation of the key relationship: all function-based decorators are built using closures, but not all closures are decorators. Included working examples.
- 2,243 tokens, 63.8s
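Minimal versions of the two concepts being compared (my sketch, not the model's examples): a closure that captures a variable, and a decorator built from that same closure pattern.

```python
def make_adder(n):
    """Closure: the inner function captures `n` from this scope."""
    def add(x):
        return x + n
    return add

def logged(func):
    """Decorator: a closure over `func` that adds behavior."""
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@logged
def square(x):
    return x * x

add3 = make_adder(3)
print(add3(4))    # 7
print(square(5))  # prints "calling square", then 25
```

The decorator is itself a closure (wrapper closes over func), which is the relationship the model's answer highlighted.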
Observations
The Thinking Tax
Every response includes a chain-of-thought "thinking" phase. For complex prompts this produces better answers. For simple prompts ("apple") it wastes tokens and time. The thinking can easily consume 200-3000 tokens before the actual answer begins.
Speed after GPU freed
At 35 tok/s, responses to complex prompts take 30-90 seconds. Short answers take 5-15 seconds. This is usable for daily chat but noticeably slower than cloud models.
No more token budget eating
At full GPU fit, coding and long-form prompts complete reliably. The earlier issue (0-character responses due to thinking consuming the budget) is resolved now that the whole model fits on the 3090.
Verdict
| Criterion | Grade |
|---|---|
| Reasoning | ⭐ Excellent |
| Knowledge | ⭐ Excellent |
| Instruction Following | ✅ Very Good |
| Creative Writing | ⭐ Excellent |
| Coding | ✅ Very Good |
| Speed | ✅ Usable (35 tok/s) |
Bottom line: Worth keeping on the 3090 for daily use. Quality approaches GPT-4-class on reasoning tasks. Best suited for:
- Complex analysis and document processing
- Creative writing
- Technical explanations and coding
- Tasks that benefit from deep reasoning
Less suited for:
- Rapid-fire conversational chat
- Trivial queries where the thinking overhead isn't justified
- Real-time / latency-sensitive applications