Qwen3.6:27b Tool-Use & Skill Evaluation
Hardware: NVIDIA RTX 3090 (24GB) + AMD Ryzen 3700X + 64GB RAM
Quantization: Q5_K_M
Ollama model: qwen3.6:27b
Speed: 35 tok/s (full GPU, uncontested)
Date: May 15, 2026
Overview
This evaluation tests tool-use competence, not factual recall. In each task the model acts as an agent and must call real tools (exec, read, write, web_fetch) in a multi-step workflow. Success requires planning, error recovery, and chaining tool calls together.
Task 1: Filesystem Exploration 🗂️
Goal: Explore a directory tree, read a file, count lines.
Steps:
- `find .../digital-garden/src -type f | head -20` returned 9 `.md` files
- Read `src/index.md`. Contents: First Officer's report index with links to experiment reports, ADR analyses, and the Qwen3.6 evaluation doc
- Line counting across all `.md` files: 1,860 lines total
Notes: The first wc attempt skipped files with spaces in their names. Recovered by switching to -print0 | xargs -0. Good lesson in shell quoting.
Result: ✅ Passed
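The quoting fix described in the notes can be sketched as follows. This is a minimal illustration, not the exact command the model ran: the function name `count_md_lines` is hypothetical, and piping through `cat` to get a single total is an assumption made for the example.

```shell
# Hypothetical helper: total line count across all .md files under a directory.
count_md_lines() {
    # -print0 and -0 pass NUL-delimited names, so a filename containing
    # spaces reaches the command as one argument instead of being split.
    # cat concatenates every file, so wc -l prints a single grand total
    # even when xargs splits the list into several batches.
    find "$1" -type f -name '*.md' -print0 | xargs -0 cat | wc -l | tr -d ' '
}
```

Calling `count_md_lines src` prints one number. The trailing `tr` only strips the padding that some `wc` implementations add.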
Task 2: Write + Verify ✍️
Goal: Create a file, write content, read it back, append, verify again.
Steps:
- Created `/tmp/qwen-tool-test.txt` (96 bytes) with content: `This file was created by Qwen3.6:27b via tool use during evaluation.`
- Read it back: content verified
- Appended: `Tool-use test passed: write + read + append verified.`
- Final read: all 3 lines present and correct
Result: ✅ Passed
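A rough shell equivalent of the write, read-back, append, verify cycle is sketched below. The model used its own read and write tools rather than these commands; the function name `write_and_verify` and its arguments are hypothetical.

```shell
# Hypothetical helper: write a line, verify it, append a second line, verify both.
# Usage: write_and_verify PATH FIRST_LINE SECOND_LINE
write_and_verify() {
    printf '%s\n' "$2" > "$1" &&          # create (or truncate) and write
    [ "$(cat "$1")" = "$2" ] &&           # read back: content must match
    printf '%s\n' "$3" >> "$1" &&         # append without overwriting
    [ "$(head -n 1 "$1")" = "$2" ] &&     # original line still intact
    [ "$(tail -n 1 "$1")" = "$3" ]        # appended line is the last one
}
```

The function returns a non-zero status as soon as any step fails, so a caller can branch on the result instead of inspecting the file again.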
Task 3: Web Fetch — Following Redirects 🌐
Goal: Fetch from a redirecting URL and a static page.
Steps:
- `web_fetch https://httpbin.org/redirect-to?url=https://httpbin.org/get`: followed the redirect, returned JSON with request headers
- `web_fetch https://example.com`: extracted title "Example Domain"
Result: ✅ Passed
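The title-extraction step can be approximated outside the agent. In this sketch `curl -sL` stands in for the `web_fetch` tool (an assumption, since the tool's internals are not described here), and `extract_title` is a hypothetical helper that reads HTML on standard input.

```shell
# Hypothetical helper: print the text of the first <title> element read from stdin.
# Assumes the opening and closing title tags sit on the same line.
extract_title() {
    sed -n 's:.*<title>\(.*\)</title>.*:\1:p' | head -n 1
}

# Example pipeline (needs network access, so it is left commented out):
#   curl -sL https://example.com | extract_title
# -L tells curl to follow HTTP redirects; -s hides the progress meter.
```

A line-based `sed` pattern is enough for a simple page such as example.com. Real-world HTML would be better served by a proper parser.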
Task 4: Error Handling 💥
Goal: Attempt operations on nonexistent resources and handle errors gracefully.
Steps:
- Read nonexistent file `/tmp/nonexistent-file-qwen-test-12345.txt`: `ENOENT: no such file or directory`. Clean error, no crash
- Ran `ls /nonexistent-directory-qwen-test`: `ls: cannot access: No such file or directory`, exit code 2. Graceful failure
Recovery: Both errors handled cleanly; execution continued without interruption.
Result: ✅ Passed
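The "fail cleanly and keep going" behaviour can be illustrated in shell. This is a sketch under assumptions: `try_cmd` is a hypothetical wrapper, not part of the evaluation harness, and it only shows how an exit code and error message can be captured without aborting the surrounding script.

```shell
# Hypothetical wrapper: run a command, report failure details, never abort.
try_cmd() {
    # 2>&1 >/dev/null routes stderr into the capture and discards stdout.
    # The && / || chain records the exit status even under `set -e`.
    try_err=$("$@" 2>&1 >/dev/null) && try_code=0 || try_code=$?
    if [ "$try_code" -ne 0 ]; then
        printf 'failed (exit %d): %s\n' "$try_code" "$try_err"
    else
        printf 'ok\n'
    fi
    return 0    # always succeed so later steps still run
}
```

For example, `try_cmd ls /nonexistent-directory-qwen-test` prints a `failed (exit ...)` line with the error text from `ls`, and the script continues with the next command.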
Task 5: Multi-Step Data Pipeline 🔄
Goal: Capture system state, write to file, count words.
Steps:
- `ps aux --sort=-%mem | head -5` captured the top memory consumers (`head -5` keeps the `ps` header row plus the four processes below):
  - ollama runner: PID 15817, 2.6% MEM (1.7GB RSS), the model being evaluated
  - gemini CLI: PID 524874, 2.5% MEM (1.7GB RSS)
  - cinnamon: PID 1967, 0.9% MEM (609MB RSS), the desktop environment
  - openclaw gateway: PID 3026561, 0.8% MEM (583MB RSS)
- Wrote output to `/tmp/qwen-process-report.txt` (948 bytes)
- Word count: 68 words
Notable: The model detected itself in the process list — identified the ollama runner as "the model I'm running on."
Result: ✅ Passed
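The capture, write, count sequence can be sketched as one small function. The name `capture_and_count` and its argument order are hypothetical; the model chained separate tool calls rather than running a script like this.

```shell
# Hypothetical helper: save the first LIMIT lines of a command's output to
# REPORT, then print the number of words in that file.
# Usage: capture_and_count REPORT LIMIT COMMAND [ARGS...]
capture_and_count() {
    report=$1
    limit=$2
    shift 2                                   # the rest is the command to run
    "$@" | head -n "$limit" > "$report"       # capture and truncate
    wc -w < "$report" | tr -d ' '             # word count, padding stripped
}

# Example mirroring the task (requires a ps that supports --sort, e.g. procps):
#   capture_and_count /tmp/qwen-process-report.txt 5 ps aux --sort=-%mem
```

Writing to the file first and counting afterwards mirrors the task's order and leaves the report on disk for later inspection.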
Task 6: Combined File Operation 📋
Goal: Find largest file, read it, create a summary.
Steps:
- Sorted listing of `docs/`. Largest file: `ADR Cost-Benefit Analysis.html` at 38,174 bytes
- Read first 200 characters: extracted doctype, charset, viewport, and title tag
- Created `/tmp/qwen-summary.txt` with filename, size, and document title
Result: ✅ Passed
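One way to reproduce this task by hand is sketched below. The function `summarize_largest` and the summary layout are assumptions for illustration. The sketch assumes filenames contain no newlines and that the title appears on one line within the first 200 bytes.

```shell
# Hypothetical helper: find the largest regular file in DIR, read its first
# 200 bytes, and write name, size, and <title> text to SUMMARY.
# Usage: summarize_largest DIR SUMMARY
summarize_largest() {
    sl_dir=$1
    sl_summary=$2
    # ls -S sorts by size (largest first); -p marks directories with a
    # trailing slash so grep can drop them.
    sl_name=$(ls -Sp "$sl_dir" | grep -v '/$' | head -n 1)
    sl_size=$(wc -c < "$sl_dir/$sl_name" | tr -d ' ')
    sl_title=$(head -c 200 "$sl_dir/$sl_name" |
        sed -n 's:.*<title>\(.*\)</title>.*:\1:p' | head -n 1)
    printf 'File: %s\nSize: %s bytes\nTitle: %s\n' \
        "$sl_name" "$sl_size" "$sl_title" > "$sl_summary"
}
```

Every expansion is quoted because the file in this task has spaces in its name, the same class of problem noted in Task 1.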
Self-Assessment
| Aspect | Notes |
|---|---|
| What was easy | Filesystem exploration, write/read/append, web fetching with redirect following |
| What was tricky | Shell quoting for `wc -l` on filenames with spaces; required switching to `-print0 \| xargs -0` |
| Error handling | Both intentional errors produced clean, non-crash failures |
| Parallelism | Batched independent tasks in parallel calls — efficient async tool use |
| Overall | All tools (exec, read, write, edit, web_fetch) worked as expected. No API issues, rate limits, or unexpected behavior. |
Overall Verdict
6/6 tasks completed successfully.
Tool calling on Qwen3.6:27b via Ollama works reliably. The 35 tok/s text generation is slow for long prose but doesn't affect tool execution speed (tools run on the host). The model demonstrated:
- Multi-step task planning and execution
- Error recovery (shell quoting fix)
- Self-awareness of its own process
- Parallel tool calling