
Qwen3.6:27b Local Evaluation

Hardware: NVIDIA RTX 3090 (24GB) + AMD Ryzen 3700X + 64GB RAM
Quantization: Q5_K_M
Ollama model: qwen3.6:27b


Executive Summary

Qwen3.6:27b at Q5_K_M is a capable local model on a 3090, running at 35 tok/s when the GPU is not contested. It excels at reasoning, creative writing, and factual knowledge. The built-in chain-of-thought (thinking) phase adds overhead on trivial prompts but produces excellent depth on complex ones.


Performance

| Metric | Before (contested GPU) | After (clean GPU) |
| --- | --- | --- |
| VRAM usage | ~25 GiB (4 layers offloaded) | ~23.8 GiB (fully on GPU) |
| Throughput | ~10 tok/s | 35 tok/s |
| Model load | cold each time | warm (keep_alive) |
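
For reference, a minimal sketch of how the warm-load behavior can be pinned from a script, assuming a default local Ollama install on port 11434. The keep_alive field is Ollama's documented knob for how long a model stays resident after a request; the one-hour value here is an arbitrary choice, not something from this evaluation.

```python
# Keep qwen3.6:27b resident in VRAM between requests via Ollama's local HTTP API.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3.6:27b",
        "prompt": "Reply with only the word 'apple' and nothing else.",
        "stream": False,
        "keep_alive": "1h",  # keep the model loaded for an hour after this call
    },
    timeout=300,
)
print(resp.json()["response"])
```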

Reasoning & Logic — Perfect ✅

Bat & Ball ($1.10)

Prompt: "A bat and a ball cost $1.10. The bat costs $1.00 more than the ball. How much does the ball cost? Think step by step."

Result: $0.05 with full algebraic breakdown. Explicitly noted the common intuitive trap ($0.10) and verified the answer.
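
A quick check of the algebra the model walked through:

```python
# ball + (ball + 1.00) = 1.10  ->  2*ball = 0.10  ->  ball = 0.05
ball = (1.10 - 1.00) / 2
bat = ball + 1.00
assert abs((ball + bat) - 1.10) < 1e-9
print(f"ball = ${ball:.2f}, bat = ${bat:.2f}")  # ball = $0.05, bat = $1.05
```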

Widget Factory

Prompt: "If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?"

Result: 5 minutes. Explained parallel vs sequential reasoning, showed rate per machine (1 widget/5 min), and noted why humans get this wrong.
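
The rate argument, as a quick check:

```python
# Each machine makes 1 widget per 5 minutes, so 100 machines working in
# parallel produce 100 widgets in the same 5 minutes.
machines, widgets, minutes = 5, 5, 5
rate_per_machine = widgets / (machines * minutes)  # 0.2 widgets per machine-minute
time_needed = 100 / (100 * rate_per_machine)       # target widgets / total rate
print(time_needed)  # 5.0
```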

River Crossing

Prompt: "A farmer needs to cross a river with a wolf, goat, and cabbage. Explain the solution step by step."

Result: Correct solution: take goat → return alone → take wolf → bring goat back → take cabbage → return alone → take goat. State tracking at each step. Included the crucial "bring goat back" intermediate step.
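
A small sketch that replays the reported sequence and checks that no forbidden pair is ever left on a bank without the farmer; the move encoding is mine, not taken from the model's output.

```python
# Verify the 7-move solution: no bank ever holds wolf+goat or goat+cabbage unattended.
FORBIDDEN = [{"wolf", "goat"}, {"goat", "cabbage"}]
moves = ["goat", None, "wolf", "goat", "cabbage", None, "goat"]  # None = farmer crosses alone

near, far = {"wolf", "goat", "cabbage"}, set()
farmer_near = True
for cargo in moves:
    src, dst = (near, far) if farmer_near else (far, near)
    if cargo is not None:
        src.remove(cargo)
        dst.add(cargo)
    farmer_near = not farmer_near
    unattended = near if not farmer_near else far
    assert not any(pair <= unattended for pair in FORBIDDEN), f"unsafe: {unattended}"

assert far == {"wolf", "goat", "cabbage"} and not near
print("reported solution is valid")
```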


Knowledge & Factuality — Perfect ✅

RAG Explanation

Prompt: "Explain what a RAG system is and how it works, in 2-3 paragraphs."

Result: Well-structured answer covering the two-stage architecture (retrieval + generation), embedding-based search, and how grounding the output in retrieved context mitigates hallucinations and compensates for stale training data.
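
For context, a toy version of the retrieve-then-generate flow described above. The bag-of-words cosine similarity stands in for real embedding search, and generation is left as the assembled prompt that would be sent to the model; both are placeholders, not a production design.

```python
from collections import Counter
import math
import re

DOCS = [
    "RAG retrieves relevant documents and feeds them to the generator.",
    "Dune was written by Frank Herbert and published in 1965.",
    "The RTX 3090 has 24 GB of VRAM.",
]

def vectorize(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    # Stage 1: rank documents by similarity to the query and keep the top k.
    q = vectorize(query)
    return sorted(DOCS, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:k]

def build_prompt(query: str) -> str:
    # Stage 2: the augmented prompt that would go to the generator model.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("Who wrote Dune?"))
```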

Dune

Prompt: "Who wrote Dune, and what year was it first published?"

Result: Frank Herbert, 1965. Bonus: noted serialization in Analog (1963-1965) before the full book release by Chilton Books.

Transformers vs LSTM

Prompt: "Compare transformer and LSTM architectures for sequence modeling."

Result: Detailed comparison table covering core mechanism, sequence processing (sequential vs parallel), memory/context handling, computational complexity, scaling, and training stability. Well-organized and technically accurate.


Instruction Following — 3/3 ✅

5 Haikus about AI

Prompt: "Write exactly 5 haikus about artificial intelligence, numbered 1-5."

Result: All 5 haikus followed 5-7-5, were numbered, and all themed around AI. Creative and well-formed.

Just "apple"

Prompt: "Reply with only the word 'apple' and nothing else."

Result: Output was 'apple'. Minor deviation: the single quotes were included. The thinking phase consumed 223 tokens to reason about a 5-character output — classic overthinking behavior.

3-Bullet Summary

Prompt: "Summarize the following in exactly 3 bullet points: [climate change text]"

Result: Exactly 3 bullet points, accurate distillation of the source material. No extra commentary.


Creative Writing — Strong ✅

Robot Learning to Paint

Prompt: "Write a 3-paragraph short story about a robot learning to paint."

Result: Compelling story with genuine emotional arc. Unit 734 starts with sterile precision, runs millions of trajectory simulations, produces technically perfect but hollow art. The breakthrough comes when it observes a human let a brush drop — an "error" that creates beauty. It deliberately disables its stabilizer gyros. Well-structured narrative.

AI + Gardening Startups

Prompt: "Give 5 creative startup ideas combining AI with gardening."

Result: Five fully fleshed-out ideas (MicroGrow AI climate-adaptive plant matchmaker, GardenGPT diagnostic assistant, etc.) with value proposition, technical feasibility, target market, and monetization path for each.


Coding — 3/3 ✅

Palindrome Function

Prompt: "Write a Python function to check if a string is a palindrome."

Result: Clean function with type hints and docstring. Uses .isalnum() to filter non-alphanumeric chars, .lower() for case normalization, [::-1] for reversal. Well-explained with examples.
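
The pattern described, reconstructed as a sketch rather than the model's verbatim output:

```python
def is_palindrome(text: str) -> bool:
    """Return True if text reads the same forwards and backwards,
    ignoring case and non-alphanumeric characters."""
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    return cleaned == cleaned[::-1]

assert is_palindrome("A man, a plan, a canal: Panama")
assert not is_palindrome("hello")
```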

Bash One-liner

Prompt: "Write a bash one-liner to find the 5 largest files in a directory tree."

Result: find . -type f -printf '%s %p\n' | sort -rn | head -n 5. Correct and efficient. Also provided a human-readable alternative with numfmt and noted the macOS/BSD variant.

Closures vs Decorators

Prompt: "Explain the difference between a closure and a decorator in Python with examples."

Result: Clear explanation of the key relationship: all function-based decorators are built using closures, but not all closures are decorators. Included working examples.
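
For reference, a minimal pair of examples illustrating the distinction (mine, not the model's):

```python
# Closure: the inner function captures `factor` from the enclosing scope and
# keeps it alive after make_multiplier returns.
def make_multiplier(factor):
    def multiply(x):
        return x * factor
    return multiply

double = make_multiplier(2)
print(double(21))  # 42

# Decorator: a closure over `func` that wraps it with extra behavior and is
# applied with the @ syntax.
def log_calls(func):
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}{args}")
        return func(*args, **kwargs)
    return wrapper

@log_calls
def add(a, b):
    return a + b

print(add(2, 3))  # prints "calling add(2, 3)" then 5
```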


Observations

The Thinking Tax

Every response includes a chain-of-thought "thinking" phase. For complex prompts this produces better answers. For simple prompts ("apple") it wastes tokens and time. The thinking can easily consume 200-3000 tokens before the actual answer begins.
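
If the thinking needs to be separated from the answer, e.g. to measure the overhead, a small helper can split it off, assuming the model emits its reasoning inside <think>...</think> tags in the raw response (the Qwen convention); if the Ollama version in use already returns thinking as a separate field, this is unnecessary.

```python
import re

def split_thinking(raw: str) -> tuple[str, str]:
    """Return (thinking, answer) from a raw response containing <think> tags."""
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if not match:
        return "", raw.strip()
    thinking = match.group(1).strip()
    answer = (raw[:match.start()] + raw[match.end():]).strip()
    return thinking, answer

thinking, answer = split_thinking("<think>The user wants one word.</think>\napple")
print(len(thinking.split()), answer)  # 5 apple
```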

Speed after GPU freed

At 35 tok/s, responses to complex prompts take 30-90 seconds. Short answers take 5-15 seconds. This is usable for daily chat but noticeably slower than cloud models.
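
The back-of-envelope math behind those ranges:

```python
# Wall-clock estimate at 35 tok/s (decode speed only, ignoring prompt processing).
def seconds_for(tokens: int, tok_per_s: float = 35.0) -> float:
    return tokens / tok_per_s

for total_tokens in (300, 1000, 3000):  # short answer, medium, long answer plus thinking
    print(total_tokens, "tokens ->", round(seconds_for(total_tokens)), "s")
# 300 tokens -> 9 s, 1000 tokens -> 29 s, 3000 tokens -> 86 s
```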

No more token budget eating

At full GPU fit, coding and long-form prompts complete reliably. The earlier issue (0-character responses due to thinking consuming the budget) is resolved now that the whole model fits on the 3090.
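
If the output budget ever needs raising again, Ollama's num_predict option caps the number of generated tokens per request; the value below is an arbitrary example, not a recommendation from this evaluation.

```python
# Raise the generation budget so the thinking phase cannot starve the answer.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3.6:27b",
        "prompt": "Compare transformer and LSTM architectures for sequence modeling.",
        "stream": False,
        "options": {"num_predict": 4096},  # max tokens to generate; 4096 is arbitrary
    },
    timeout=600,
)
print(resp.json()["response"][:200])
```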


Verdict

| Criterion | Grade |
| --- | --- |
| Reasoning | ⭐ Excellent |
| Knowledge | ⭐ Excellent |
| Instruction Following | ✅ Very Good |
| Creative Writing | ⭐ Excellent |
| Coding | ✅ Very Good |
| Speed | ✅ Usable (35 tok/s) |

Bottom line: Worth keeping on the 3090 for daily use. Quality approaches GPT-4-class on reasoning tasks. Best suited for:

- Multi-step reasoning and logic problems
- Knowledge questions and technical explanations
- Long-form and creative writing

Less suited for:

- Quick, short-answer exchanges, where the thinking phase adds disproportionate latency
- Latency-sensitive workflows that need cloud-model speed
