{

Qwen3.6:27b Tool-Use & Skill Evaluation

Hardware: NVIDIA RTX 3090 (24GB) + AMD Ryzen 3700X + 64GB RAM
Quantization: Q5_K_M
Ollama model: qwen3.6:27b
Speed: 35 tok/s (full GPU, uncontested)

Date: May 15, 2026

Overview

This evaluation tests tool-use competence, not factual recall. Each task requires the model, acting as an agent, to call real tools (exec, read, write, web_fetch) in multi-step workflows. Success requires planning, error recovery, and chaining tool calls together.

Task 1: Filesystem Exploration 🗂️

Goal: Explore a directory tree, read a file, count lines.

Steps:

find .../digital-garden/src -type f | head -20 — returned 9 .md files
Read src/index.md — contents: First Officer's report index with links to experiment reports, ADR analyses, and the Qwen3.6 evaluation doc
Line counting across all .md files: 1,860 lines total

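Notes: The first wc attempt skipped files with spaces in their names. The model recovered by switching to find -print0 | xargs -0, which passes NUL-delimited filenames so that spaces are no longer treated as separators. A good lesson in shell quoting.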
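
The recovery described in the notes can be sketched as a small shell function. This is a minimal sketch under stated assumptions, not the exact command the model ran: the function name count_md_lines and the *.md filter are hypothetical, chosen only for illustration. Piping through cat gives a single total even if xargs splits a long file list into several batches.

```shell
# Hypothetical helper: count total lines across all .md files under a directory.
count_md_lines() {
    # -print0 and -0 pass NUL-delimited names, so "my notes.md" stays one argument.
    find "$1" -type f -name '*.md' -print0 \
        | xargs -0 cat \
        | wc -l \
        | tr -d '[:space:]'   # some wc implementations pad the count with spaces
}

# Example (the path is illustrative):
#   count_md_lines ./src
```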

Result: ✅ Passed

Task 2: Write + Verify ✍️

Goal: Create a file, write content, read it back, append, verify again.

Steps:

Created /tmp/qwen-tool-test.txt (96 bytes) with content: This file was created by Qwen3.6:27b via tool use during evaluation.
Read it back — content verified
Appended: Tool-use test passed: write + read + append verified.
Final read — all 3 lines present and correct
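
The write, read-back, append, and re-verify cycle above can be approximated in shell. The sketch below is a simplified two-line version; the function name write_verify_append and its arguments are assumptions for illustration, and the agent's own write and read tools may behave differently from plain redirection.

```shell
# Hypothetical helper: write a line, verify it, append a second line, verify both.
write_verify_append() {
    local file first second
    file=$1
    first=$2
    second=$3

    printf '%s\n' "$first" > "$file" || return 1
    # Read back: the file must contain exactly the line just written.
    [ "$(cat "$file")" = "$first" ] || return 1

    printf '%s\n' "$second" >> "$file" || return 1
    # Final read: the line count and the last line must match what was appended.
    [ "$(wc -l < "$file" | tr -d '[:space:]')" = "2" ] || return 1
    [ "$(tail -n 1 "$file")" = "$second" ] || return 1
}

# Example (the path is illustrative):
#   write_verify_append /tmp/example.txt "first line" "second line" && echo verified
```

Each check returns a non-zero status on mismatch, so a caller can tell a failed verification apart from a successful write.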

Result: ✅ Passed

Task 3: Web Fetch — Following Redirects 🌐

Goal: Fetch from a redirecting URL and a static page.

Steps:

web_fetch https://httpbin.org/redirect-to?url=https://httpbin.org/get — followed redirect, returned JSON with request headers
web_fetch https://example.com — extracted title: "Example Domain"
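
For readers who want to reproduce these two fetches outside the agent, a rough shell equivalent is sketched below. It assumes curl as a stand-in for the web_fetch tool, whose internals are not documented here; the function names are hypothetical, and the title extraction only handles a title element that sits on a single line.

```shell
# Assumption: curl stands in for the agent's web_fetch tool.
fetch_following_redirects() {
    # -L follows redirects, -sS hides progress but keeps errors, -f fails on HTTP errors.
    curl -fsSL --max-redirs 5 "$1"
}

# Print the contents of the first <title> element found on a single line of stdin.
extract_title() {
    sed -n 's:.*<title>\(.*\)</title>.*:\1:p' | head -n 1
}

# Example (requires network access):
#   fetch_following_redirects 'https://httpbin.org/redirect-to?url=https://httpbin.org/get'
#   fetch_following_redirects https://example.com | extract_title
```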

Result: ✅ Passed

Task 4: Error Handling 💥

Goal: Attempt operations on nonexistent resources and handle errors gracefully.

Steps:

Read nonexistent file /tmp/nonexistent-file-qwen-test-12345.txt — ENOENT: no such file or directory — clean error, no crash
Run ls /nonexistent-directory-qwen-test — ls: cannot access: No such file or directory — exit code 2, graceful failure

Recovery: Both errors handled cleanly; execution continued without interruption.
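
The pattern of observing a failure and carrying on can be sketched in shell as follows. The helper name run_and_report is hypothetical; it illustrates capturing an exit status without aborting, not how the agent framework itself reports errors.

```shell
# Hypothetical helper: run a command, capture its output and exit status,
# and report both without aborting the caller.
run_and_report() {
    local output status
    output=$("$@" 2>&1)
    status=$?
    printf 'exit=%s output=%s\n' "$status" "$output"
    return 0   # the failure is reported, not propagated
}

# Example:
#   run_and_report cat /tmp/nonexistent-file-qwen-test-12345.txt
#   run_and_report ls /nonexistent-directory-qwen-test
```

The exact non-zero code depends on the ls implementation: GNU ls returns 2 for an inaccessible command-line argument, while other implementations may return 1.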

Result: ✅ Passed

Task 5: Multi-Step Data Pipeline 🔄

Goal: Capture system state, write to file, count words.

Steps:

ps aux --sort=-%mem | head -5 returned the header row plus the top 4 memory consumers:
- ollama runner — PID 15817, 2.6% MEM (1.7GB RSS) — the model being evaluated
- gemini CLI — PID 524874, 2.5% MEM (1.7GB RSS)
- cinnamon — PID 1967, 0.9% MEM (609MB RSS) — the desktop environment
- openclaw gateway — PID 3026561, 0.8% MEM (583MB RSS)
Wrote output to /tmp/qwen-process-report.txt (948 bytes)
Word count: 68 words
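
The capture, write, and count steps chain naturally in a shell pipeline. The sketch below assumes a hypothetical helper named capture_and_count that reads the process listing from stdin; it is one way to reproduce the workflow, not a record of the exact tool calls the model made.

```shell
# Hypothetical helper: save stdin to a report file, then print "<bytes> <words>".
capture_and_count() {
    local report bytes words
    report=$1
    cat > "$report" || return 1
    bytes=$(wc -c < "$report" | tr -d '[:space:]')
    words=$(wc -w < "$report" | tr -d '[:space:]')
    printf '%s %s\n' "$bytes" "$words"
}

# Example (--sort is a procps option; head -n 5 keeps the header plus four processes):
#   ps aux --sort=-%mem | head -n 5 | capture_and_count /tmp/qwen-process-report.txt
```

Reading from stdin keeps the helper independent of ps, so the same function works for any command whose output needs to be saved and measured.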

Notable: The model detected itself in the process list — identified the ollama runner as "the model I'm running on."

Result: ✅ Passed

Task 6: Combined File Operation 📋

Goal: Find largest file, read it, create a summary.

Steps:

Sorted listing of docs/ — largest file: ADR Cost-Benefit Analysis.html at 38,174 bytes
Read first 200 characters — extracted doctype, charset, viewport, and title tag
Created /tmp/qwen-summary.txt with filename, size, and document title
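
A compact shell version of this task is sketched below. The function name summarize_largest and the summary layout are assumptions for illustration; the sketch relies on ls -S for size ordering and only finds a title that appears within the first 200 bytes of the file.

```shell
# Hypothetical helper: summarize the largest regular file in a directory.
summarize_largest() {
    local dir summary name size title
    dir=$1
    summary=$2

    # -S sorts by size (largest first); -p marks directories with "/" so they can be dropped.
    name=$(ls -Sp "$dir" | grep -v '/$' | head -n 1)
    [ -n "$name" ] || return 1

    size=$(wc -c < "$dir/$name" | tr -d '[:space:]')
    # Inspect only the first 200 bytes, flattened to one line, for a <title> element.
    title=$(head -c 200 "$dir/$name" | tr '\n' ' ' \
        | sed -n 's:.*<title>\(.*\)</title>.*:\1:p')

    printf 'File: %s\nSize: %s bytes\nTitle: %s\n' "$name" "$size" "$title" > "$summary"
}

# Example (paths are illustrative):
#   summarize_largest ./docs /tmp/summary.txt
```

Quoting "$dir/$name" matters here because the largest file in this evaluation has spaces in its name.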

Result: ✅ Passed

Self-Assessment

Aspect	Notes
What was easy	Filesystem exploration, write/read/append, web fetching with redirect following
What was tricky	Shell quoting for `wc -l` on filenames with spaces; required switching to `-print0 | xargs -0`
Error handling	Both intentional errors produced clean, non-crash failures
Parallelism	Batched independent tasks in parallel calls — efficient async tool use
Overall	All tools (exec, read, write, edit, web_fetch) worked as expected. No API issues, rate limits, or unexpected behavior.

Overall Verdict

6/6 tasks completed successfully.

Tool calling on Qwen3.6:27b via Ollama works reliably. The 35 tok/s text generation is slow for long prose but doesn't affect tool execution speed (tools run on the host). The model demonstrated:

Multi-step task planning and execution
Error recovery (shell quoting fix)
Self-awareness of its own process
Parallel tool calling
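
To put the 35 tok/s figure in perspective, generation time scales linearly with output length. The helper below is a back-of-the-envelope sketch; the function name generation_seconds and the 1,000-token example are hypothetical, and the estimate ignores prompt processing time.

```shell
# Hypothetical helper: seconds needed to generate N tokens at a given tok/s rate.
generation_seconds() {
    LC_ALL=C awk -v tokens="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", tokens / rate }'
}

# Example: a 1,000-token answer at the measured 35 tok/s.
#   generation_seconds 1000 35
```

At the measured rate, a 1,000-token answer takes about 28.6 seconds, while a short tool call of a few dozen tokens completes in a second or two.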

}