Tool Use

Tool use is documented as a core capability, not an add-on.

The original Qwen README gives tool use, ReAct prompting, system prompts, and code interpreter their own public-facing sections and benchmark tables.


System prompt note

The upstream README says Qwen-1.8B-Chat and Qwen-72B-Chat were trained on more diverse system prompts and multi-round interactions.

The README ties that claim to in-context customization: role playing, style transfer, task setting, and behavior setting.
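The kind of customization described above is typically driven by a system message placed at the start of the conversation. A minimal sketch in OpenAI-style message format (the helper function and example prompt here are illustrative, not from the upstream README):

```python
def build_messages(system_prompt: str, user_message: str) -> list[dict]:
    """Prepend a system prompt that sets role, style, and task behavior."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]

# Role play + style transfer + task setting in one system prompt.
messages = build_messages(
    "You are a meticulous Python tutor. Answer concisely and always "
    "include a runnable example.",
    "How do I reverse a list?",
)
```

The resulting `messages` list is what a chat endpoint would consume; the system entry stays fixed across turns while user/assistant entries accumulate.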

Chinese tool-use benchmark

| Model | Tool selection | Tool input | False positive error |
|---|---|---|---|
| GPT-4 | 98.0% | 0.953 | 23.9% |
| GPT-3.5 | 74.5% | 0.807 | 80.6% |
| Qwen-1.8B-Chat | 85.0% | 0.839 | 27.6% |
| Qwen-7B-Chat | 95.5% | 0.900 | 11.6% |
| Qwen-14B-Chat | 96.9% | 0.917 | 5.6% |
| Qwen-72B-Chat | 98.2% | 0.927 | 1.1% |

Chinese tool-use benchmark version 20231206, as reported in the upstream README.
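A sketch of how metrics like those in the table could be computed, under assumed definitions (the README does not spell them out): tool selection as exact-match accuracy on which tool to call, and false positive error as the fraction of no-tool queries where the model called a tool anyway.

```python
def tool_selection_accuracy(predicted_tools, gold_tools):
    """Exact-match accuracy over queries that require a tool call."""
    hits = sum(p == g for p, g in zip(predicted_tools, gold_tools))
    return hits / len(gold_tools)

def false_positive_rate(preds_on_no_tool_queries):
    """Fraction of tool-free queries where a tool was wrongly invoked.

    Entries are the tool name the model called, or None if it abstained.
    """
    calls = sum(p is not None for p in preds_on_no_tool_queries)
    return calls / len(preds_on_no_tool_queries)
```

Under these definitions a lower false positive error means the model is better at declining to call tools on queries that do not need them.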

Code interpreter benchmark

| Model | Math | Visualization hard | Visualization easy | Executable rate |
|---|---|---|---|---|
| GPT-4 | 82.8 | 66.7 | 60.8 | 82.8 |
| GPT-3.5 | 47.3 | 33.3 | 55.7 | 74.1 |
| Qwen-1.8B-Chat | 25.6 | 21.4 | 22.8 | 65.5 |
| Qwen-7B-Chat | 41.9 | 23.8 | 38.0 | 67.2 |
| Qwen-14B-Chat | 58.4 | 31.0 | 45.6 | 65.5 |
| Qwen-72B-Chat | 72.7 | 41.7 | 43.0 | 82.8 |

Code Interpreter benchmark version 20231206, mirrored from the upstream README.

Implementation framing

The upstream docs point readers to a ReAct prompting example for implementing tool calls and to `openai_api.py` for function calling support.
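A minimal sketch of ReAct-style prompt assembly for tool calling, modeled on the general Thought/Action/Observation pattern; the tool definitions and template wording here are illustrative, not the exact template from the Qwen ReAct prompting example.

```python
# Illustrative tool registry; real deployments would describe their own tools.
TOOLS = [
    {
        "name": "web_search",
        "description": "Search the web for a query.",
        "parameters": '{"query": "string"}',
    }
]

REACT_TEMPLATE = """Answer the following question. You have access to these tools:

{tool_descs}

Use this format:
Question: the input question
Thought: reasoning about what to do next
Action: the tool to use, one of [{tool_names}]
Action Input: the tool's arguments as JSON
Observation: the tool's result
... (Thought/Action/Action Input/Observation can repeat)
Thought: I now know the final answer
Final Answer: the answer

Question: {question}"""

def build_react_prompt(question: str) -> str:
    """Render the ReAct scaffold with the available tools and the question."""
    tool_descs = "\n".join(
        f"{t['name']}: {t['description']} Parameters: {t['parameters']}"
        for t in TOOLS
    )
    tool_names = ", ".join(t["name"] for t in TOOLS)
    return REACT_TEMPLATE.format(
        tool_descs=tool_descs, tool_names=tool_names, question=question
    )
```

The caller sends this prompt to the model, parses any `Action`/`Action Input` pair out of the completion, runs the tool, appends the result as an `Observation`, and loops until a `Final Answer` appears.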

For deeper agent workflows and the benchmark assets behind code-interpreter evaluation, the README points to Qwen-Agent.

Source anchors

Tool Use and System Prompt | Qwen Code