Testing over the last few days
Thank you to the Xiaomi team for releasing this model.
I tested it extensively over the last few days, and overall it gathers context with good breadth and depth.
Tool calling is mostly robust, but the model sometimes makes up tool calls that are not in the tool registry:
<function=editor>
Maybe the model was RL post-trained on such a tool call. Is this documented somewhere?
Also, chained tool calling works great, and the model follows instructions very well.
Overall, congrats on this great model.
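In case it helps others reproduce this, here is a minimal sketch of how I would flag such calls on the client side. It assumes the model emits calls in the `<function=NAME>` style shown above. The registry contents and the helper name `find_unknown_tools` are hypothetical, not anything from the model's documentation.

```python
import re

# Hypothetical registry: the tool names the client actually exposes.
TOOL_REGISTRY = {"bash", "read_file", "edit_file"}

# Matches the opening tag of a call in the <function=NAME> style (assumed format).
FUNCTION_TAG = re.compile(r"<function=([A-Za-z0-9_.\-]+)>")


def find_unknown_tools(model_output: str, registry: set[str]) -> list[str]:
    """Return the tool names called in model_output that are not in registry."""
    called = FUNCTION_TAG.findall(model_output)
    return [name for name in called if name not in registry]


if __name__ == "__main__":
    # The made-up call from this post is reported as unknown.
    print(find_unknown_tools("<function=editor>", TOOL_REGISTRY))
```

Rejecting unknown names before execution, and returning the list of valid tool names to the model as an error message, at least gives it a chance to retry with a real tool.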
Additionally, the model has difficulty with string replacements in edit-file tool calls. It gets confused by whitespace characters and can't reliably reproduce the old string that should be replaced with the new string.
The question is which matching method the model was trained on: old_str -> new_str replacement, or the unified diff method?
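As a client-side workaround for the old_str -> new_str style, this is a minimal sketch of a whitespace-tolerant replace. It is only an assumption about how an edit tool could be made more forgiving, not how the model was trained or how any particular harness works. The function name `replace_tolerant` is hypothetical.

```python
import re


def replace_tolerant(text: str, old_str: str, new_str: str) -> str:
    """Replace old_str with new_str, falling back to whitespace-insensitive matching."""
    # Exact match first: it is the safest and needs no guessing.
    count = text.count(old_str)
    if count == 1:
        return text.replace(old_str, new_str)
    if count > 1:
        raise ValueError("old_str matches more than once; add more context")

    # Fallback: any run of whitespace in old_str may match any run of whitespace in text.
    tokens = old_str.split()
    if not tokens:
        raise ValueError("old_str is empty or whitespace only")
    pattern = re.compile(r"\s+".join(re.escape(token) for token in tokens))
    matches = list(pattern.finditer(text))
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, found {len(matches)}")
    match = matches[0]
    return text[:match.start()] + new_str + text[match.end():]


if __name__ == "__main__":
    source = "def f():\n\treturn 1\n"          # file uses a tab
    old = "def f():\n    return 1"             # model wrote four spaces
    new = "def f():\n    return 2"
    print(repr(replace_tolerant(source, old, new)))
```

The fallback refuses to edit when the loose pattern matches zero or several places, so a wrong guess fails loudly instead of silently changing the wrong spot. The inserted text keeps whatever whitespace new_str contains, so indentation may still need normalizing afterwards.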
Thank you
As of today, the LLM is no longer able to fill in the bash tool's input arguments correctly. In the days before, it worked as expected (via OpenRouter). Are you replacing the models frequently?
Additionally, function calls leak into the thinking tokens.