τ^2-Bench: Evaluating Conversational Agents in a Dual-Control Environment
Paper
•
2506.07982
•
Published
•
7
A Qwen3-4B model fine-tuned for multi-turn tool-calling in customer support domains.
This model was supervised fine-tuned on 219 expert trajectories generated by Qwen3-235B-A22B-Thinking for the tau2-bench evaluation framework. It demonstrates tool-first behavior with structured JSON tool calls.
The model produces tool calls in inline JSON format:
<thinking>Analysis of the customer issue...</thinking>
{"name": "tool_name", "arguments": {"param": "value"}}
If you use this model, please cite the tau2-bench paper:
@article{yao2024tau2bench,
title={tau2-Bench: Evaluating Conversational Agents in a Dual-Control Environment},
author={Yao, Shunyu and others},
journal={arXiv preprint arXiv:2506.07982},
year={2024}
}