Testing over the last few days

#23

by PriNova - opened 4 days ago

4 days ago

Thank you to the Xiaomi team for releasing this model.

I tested it extensively over the last few days, and overall it gathers context with good breadth and depth.
Tool calling is mostly robust, but the model sometimes makes up tool calls that are not in the tool registry:

<function=editor>

Maybe the model was post-trained with RL on such a tool call. Is this documented somewhere?
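
For anyone hitting the same issue, here is a minimal sketch of a client-side guard that rejects calls to tools that were never registered. It assumes the model emits opening tags in the `<function=NAME>` form shown above; the registry contents and the helper name `find_unknown_tools` are hypothetical and only for illustration.

```python
import re

# Hypothetical registry: these tool names are placeholders, not the model's real tool set.
TOOL_REGISTRY = {"bash", "read_file", "edit_file"}

# Matches opening tags of the form seen in this thread, e.g. <function=editor>
FUNCTION_TAG = re.compile(r"<function=([A-Za-z0-9_\-]+)>")


def find_unknown_tools(model_output: str, registry: set[str]) -> list[str]:
    """Return the names of called tools that are not in the registry."""
    called = FUNCTION_TAG.findall(model_output)
    return [name for name in called if name not in registry]


if __name__ == "__main__":
    unknown = find_unknown_tools("<function=editor>", TOOL_REGISTRY)
    if unknown:
        # A client could send this back to the model as a tool error and let it retry.
        print(f"Unknown tool(s) requested: {unknown}")
```

Returning the list of unknown names to the model as an error message, instead of failing silently, gives it a chance to pick a tool that actually exists.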

Also, chained tool calling works great, and the model follows instructions very well.

Overall, congrats on a great model.

PriNova

3 days ago

•

edited 1 day ago

Additionally, the model has difficulty with string replacements in edit-file tool calls. It gets very confused by whitespace characters and can't reliably match the old string that should be replaced with the new string.
The question is which matching method it was trained on: old_str -> new_str replacement, or the unified diff method?
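
Until that is clear, one possible workaround on the tool side is to make the old_str match tolerant of whitespace differences. The sketch below is an assumption about how an edit tool could be written, not how the model was trained or how any particular client works; the function name `replace_once` is hypothetical. It tries an exact match first and falls back to a pattern in which every run of whitespace in old_str matches any run of whitespace in the file.

```python
import re


def replace_once(text: str, old_str: str, new_str: str) -> str:
    """Replace old_str with new_str, tolerating whitespace differences.

    Raises ValueError unless exactly one location matches.
    """
    # Prefer an exact, unambiguous match.
    if text.count(old_str) == 1:
        return text.replace(old_str, new_str, 1)

    # Fallback: split old_str on whitespace and let each gap match
    # one or more whitespace characters (spaces, tabs, newlines).
    tokens = old_str.split()
    if not tokens:
        raise ValueError("old_str is empty or whitespace only")
    pattern = re.compile(r"\s+".join(re.escape(token) for token in tokens))

    matches = list(pattern.finditer(text))
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, found {len(matches)}")

    match = matches[0]
    # Slice instead of re.sub so backslashes in new_str are inserted literally.
    return text[:match.start()] + new_str + text[match.end():]


if __name__ == "__main__":
    source = "def f():\n\treturn 1\n"
    # The model sent spaces where the file uses a tab.
    print(replace_once(source, "def f():\n    return 1", "def f():\n\treturn 2"))
```

The fallback deliberately refuses ambiguous edits: if the relaxed pattern matches zero or several locations, raising an error is safer than guessing which one the model meant.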

Thank you

PriNova

about 21 hours ago

•

edited about 21 hours ago

Since today, the model is no longer able to fill in the bash tool's input arguments correctly. In the days before, it worked as expected (via OpenRouter). Are you replacing the models frequently?
Additionally, function calls leak into the thinking tokens.
