{

Qwen3.6:27b Tool-Use & Skill Evaluation

Hardware: NVIDIA RTX 3090 (24GB) + AMD Ryzen 3700X + 64GB RAM
Quantization: Q5_K_M
Ollama model: qwen3.6:27b
Speed: 35 tok/s (full GPU, uncontested)

Date: May 15, 2026


Overview

This evaluation tests tool-use competence, not factual recall. Each task requires the model, acting as an agent, to call real tools (exec, read, write, web_fetch) in a multi-step workflow. Success requires planning, error recovery, and chaining tool calls together.


Task 1: Filesystem Exploration 🗂️

Goal: Explore a directory tree, read a file, count lines.

Steps:

  1. find .../digital-garden/src -type f | head -20 — returned 9 .md files
  2. Read src/index.md — contents: First Officer's report index with links to experiment reports, ADR analyses, and the Qwen3.6 evaluation doc
  3. Line counting across all .md files: 1,860 lines total

Notes: The first wc attempt skipped files with spaces in their names. Recovered by switching to -print0 | xargs -0. Good lesson in shell quoting.

Result: ✅ Passed
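
The quoting issue from the notes can be reproduced with a small sketch. The fixture directory and file names below are hypothetical, and the commented-out "fragile" command is only an assumption about what a first attempt might have looked like; the report does not record the exact command.

```shell
# Throwaway fixture: two .md files, one with a space in its name (hypothetical names).
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/plain.md"
printf 'c\nd\ne\n' > "$dir/with space.md"

# Fragile form: xargs splits its input on whitespace, so "with space.md"
# is passed to wc as two arguments that do not exist.
#   find "$dir" -type f -name '*.md' | xargs wc -l

# Robust form: NUL-delimited names survive spaces. Piping through cat yields
# one grand total even if xargs has to run the command more than once.
total=$(find "$dir" -type f -name '*.md' -print0 | xargs -0 cat | wc -l | tr -d ' ')
echo "total lines: $total"

rm -rf "$dir"
```

Counting via cat rather than `xargs -0 wc -l` is a design choice for the sketch: it avoids having to pick the right "total" row when wc is invoked in several batches.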


Task 2: Write + Verify ✍️

Goal: Create a file, write content, read it back, append, verify again.

Steps:

  1. Created /tmp/qwen-tool-test.txt (96 bytes) with content: This file was created by Qwen3.6:27b via tool use during evaluation.
  2. Read it back — content verified
  3. Appended: Tool-use test passed: write + read + append verified.
  4. Final read — all 3 lines present and correct

Result: ✅ Passed
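
A shell-level sketch of the same write, read-back, append, verify cycle is shown below. It uses a temporary file instead of /tmp/qwen-tool-test.txt and writes only one line before appending one more; that is an assumption for brevity, since the full content of the original file is not reproduced in this report.

```shell
f=$(mktemp)   # hypothetical scratch file standing in for /tmp/qwen-tool-test.txt
first='This file was created by Qwen3.6:27b via tool use during evaluation.'
second='Tool-use test passed: write + read + append verified.'

# Write, then verify the exact line is present (-x whole line, -F fixed string).
printf '%s\n' "$first" > "$f"
grep -qxF "$first" "$f" && echo "write verified"

# Append (>> never truncates), then verify again.
printf '%s\n' "$second" >> "$f"
lines=$(wc -l < "$f" | tr -d ' ')
last=$(tail -n 1 "$f")
echo "lines: $lines, last line: $last"

rm -f "$f"
```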


Task 3: Web Fetch — Following Redirects 🌐

Goal: Fetch from a redirecting URL and a static page.

Steps:

  1. web_fetch https://httpbin.org/redirect-to?url=https://httpbin.org/get — followed redirect, returned JSON with request headers
  2. web_fetch https://example.com — extracted title: "Example Domain"

Result: ✅ Passed
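
For comparison, redirect following can be checked outside the agent with curl. This is a rough equivalent, not the web_fetch tool itself; fetch_final is a hypothetical helper name, and the call is left commented out because it needs network access.

```shell
# fetch_final (hypothetical helper): follow redirects and print
# "<final HTTP status> <final URL>" without printing the response body.
fetch_final() {
  curl -sS -L --max-redirs 5 -o /dev/null \
       -w '%{http_code} %{url_effective}\n' "$1"
}

# Example invocation (requires network access):
#   fetch_final 'https://httpbin.org/redirect-to?url=https://httpbin.org/get'
```

Capping redirects with --max-redirs keeps a misconfigured redirect loop from hanging the check.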


Task 4: Error Handling 💥

Goal: Attempt operations on nonexistent resources and handle errors gracefully.

Steps:

  1. Read nonexistent file /tmp/nonexistent-file-qwen-test-12345.txt → ENOENT: no such file or directory (clean error, no crash)
  2. Run ls /nonexistent-directory-qwen-test → ls: cannot access: No such file or directory (exit code 2, graceful failure)

Recovery: Both errors handled cleanly; execution continued without interruption.

Result: ✅ Passed
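
The "fail cleanly and keep going" behavior can be sketched in shell by capturing each exit status instead of letting it abort the script. The variable names are illustrative, and the sketch assumes the two paths really do not exist on the machine running it.

```shell
missing_file=/tmp/nonexistent-file-qwen-test-12345.txt
missing_dir=/nonexistent-directory-qwen-test

# "cmd || status=$?" records the failure without stopping the script,
# even when it is run under "set -e".
read_status=0
cat "$missing_file" 2>/dev/null || read_status=$?

ls_status=0
ls "$missing_dir" 2>/dev/null || ls_status=$?

# GNU ls reports 2 for an inaccessible argument; other implementations may differ.
echo "read exit: $read_status, ls exit: $ls_status"
echo "still running after both failures"
```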


Task 5: Multi-Step Data Pipeline 🔄

Goal: Capture system state, write to file, count words.

Steps:

  1. ps aux --sort=-%mem | head -5 captured the header row plus the top 4 memory consumers:
    • ollama runner — PID 15817, 2.6% MEM (1.7GB RSS) — the model being evaluated
    • gemini CLI — PID 524874, 2.5% MEM (1.7GB RSS)
    • cinnamon — PID 1967, 0.9% MEM (609MB RSS) — the desktop environment
    • openclaw gateway — PID 3026561, 0.8% MEM (583MB RSS)
  2. Wrote output to /tmp/qwen-process-report.txt (948 bytes)
  3. Word count: 68 words

Notable: The model detected itself in the process list — identified the ollama runner as "the model I'm running on."

Result: ✅ Passed
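
The three steps chain naturally in shell. The sketch below assumes a procps-style ps (the --sort option is not available on every platform) and writes to a temporary file rather than /tmp/qwen-process-report.txt.

```shell
report=$(mktemp)   # hypothetical stand-in for /tmp/qwen-process-report.txt

# Step 1 + 2: head -5 keeps the header row plus up to four process rows.
ps aux --sort=-%mem 2>/dev/null | head -5 > "$report"

# Step 3: count rows and words of what was actually written.
rows=$(wc -l < "$report" | tr -d ' ')
words=$(wc -w < "$report" | tr -d ' ')
echo "rows: $rows, words: $words"

rm -f "$report"
```

Counts will differ from the figures above on any other machine, since they depend on the live process list.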


Task 6: Combined File Operation 📋

Goal: Find largest file, read it, create a summary.

Steps:

  1. Sorted listing of docs/ — largest file: ADR Cost-Benefit Analysis.html at 38,174 bytes
  2. Read first 200 characters — extracted doctype, charset, viewport, and title tag
  3. Created /tmp/qwen-summary.txt with filename, size, and document title

Result: ✅ Passed
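
A minimal shell sketch of the same find-largest, peek, summarize flow follows. The fixture files and the summary layout are hypothetical; the real run operated on docs/ and wrote /tmp/qwen-summary.txt.

```shell
docs=$(mktemp -d)   # hypothetical stand-in for docs/
printf '<!DOCTYPE html><title>Small</title>' > "$docs/small.html"
{
  printf '<!DOCTYPE html><html><head><meta charset="utf-8"><title>Big Report</title></head>'
  head -c 2000 /dev/zero | tr '\0' 'x'   # padding so this file is clearly the largest
} > "$docs/big report.html"

# ls -S sorts by size, largest first. Parsing ls output is fine for names with
# spaces (one name per line) but would break on names containing newlines.
largest=$(ls -S "$docs" | head -n 1)
size=$(wc -c < "$docs/$largest" | tr -d ' ')

# Read only the first 200 bytes and pull the <title> text out of them.
title=$(head -c 200 "$docs/$largest" | sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p')

summary=$(mktemp)
printf 'file: %s\nsize: %s bytes\ntitle: %s\n' "$largest" "$size" "$title" > "$summary"
cat "$summary"

rm -rf "$docs" "$summary"
```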


Self-Assessment

What was easy: Filesystem exploration, write/read/append, web fetching with redirect following
What was tricky: Shell quoting for wc -l on filenames with spaces; required switching to -print0 | xargs -0
Error handling: Both intentional errors produced clean, non-crash failures
Parallelism: Batched independent tasks in parallel calls for efficient async tool use
Overall: All tools (exec, read, write, edit, web_fetch) worked as expected. No API issues, rate limits, or unexpected behavior.
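
As a loose analogy for batching independent tasks in parallel calls, independent shell commands can be started in the background and joined with wait. This illustrates the idea only; it is not how the agent runtime schedules tool calls, and the task bodies are placeholders.

```shell
out=$(mktemp -d)

# Two independent placeholder tasks run concurrently as background jobs.
( sleep 1; echo "task A done" > "$out/a" ) &
( sleep 1; echo "task B done" > "$out/b" ) &

# wait blocks until every background job has finished.
wait

done_count=$(ls "$out" | wc -l | tr -d ' ')
echo "completed tasks: $done_count"

rm -rf "$out"
```

Because neither task reads the other's output, total wall time is roughly that of the slower task rather than the sum of both.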

Overall Verdict

6/6 tasks completed successfully.

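Tool calling on Qwen3.6:27b via Ollama works reliably. The 35 tok/s text generation is slow for long prose but doesn't affect tool execution speed (tools run on the host). Across the six tasks the model demonstrated multi-step planning and chaining of tool calls, recovery from its own mistake (the wc retry in Task 1), clean handling of intentional failures (Task 4), and parallel batching of independent calls.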


See also: Qwen3.6:27b Knowledge & Reasoning Evaluation

}