nvidia/Llama-3_1-Nemotron-Ultra-253B-v1

@@ -107,6 +107,7 @@ Llama-3.1-Nemotron-Ultra-253B-v1 is a general purpose reasoning and chat model i
 3. We recommend using greedy decoding (temperature 0\) for Reasoning OFF mode
 4. We do not recommend to add additional system prompts besides the control prompt, all instructions should be put into user query
 5. We have provided a list of prompts to use for evaluation for each benchmark where a specific template is required
+6. The model will include an empty `<think></think>` block if no reasoning was necessary in Reasoning ON mode; this is expected behaviour
 You can try this model out through the preview API, using this link: [Llama-3\_1-Nemotron-Ultra-253B-v1](https://build.nvidia.com/nvidia/llama-3\_1-nemotron-ultra-253b-v1).
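
For client code that consumes Reasoning ON output, the empty `<think></think>` case described in item 6 can be handled the same way as a populated block. The following is a minimal sketch, not part of the model card: the helper name `split_reasoning` is hypothetical, and it assumes the response text contains the tags literally as written above.

```python
import re
from typing import Tuple

# Matches one <think>...</think> block (possibly empty) and any whitespace after it.
_THINK_BLOCK = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split a response into (reasoning, answer).

    Hypothetical helper: reasoning is an empty string when the block is
    empty or when no <think> tags are present at all.
    """
    match = _THINK_BLOCK.search(text)
    if match is None:
        # No tags found, so the whole text is treated as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Remove the matched block and keep everything around it as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

With this approach, an empty reasoning string signals that the model judged no reasoning necessary, so callers do not need to treat `<think></think>` as an error.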