OpenRouter Reasoning? (+ Questions about prompting)
Hi hi. I've been trying out the new model through OpenRouter. I assume they still disable thinking by default, but I was wondering if there's a prompt to enable it? For reference, I use JanitorAI.
I was also wondering if the custom prompt I use works well for R1T2, or if I should look for another: https://rentry.co/molekprompt#೯-𖥻-moleks-base-prompt-version-07-ᰋ
Currently, I still struggle with the feeling that the LLM just isn't... reading my prompts? It has been calling my pink-haired persona's hair silver. Just wondering if there is a general "fix" for any of this.
Greetings,
thanks for your questions.
A) On OpenRouter, reasoning is enabled for R1T2, as you can see by looking at the graph at:
https://openrouter.ai/tngtech/deepseek-r1t2-chimera:free/activity
For example, it is now about 6 hours since the model went live on OpenRouter, and the graph shows 144M input tokens, 7.31M reasoning tokens, and 5.48M completion tokens.
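For readers who call the model through the OpenRouter API directly rather than through a chat frontend, here is a minimal sketch of a request that asks for reasoning and of how the reasoning text might be separated from the answer. The endpoint URL, the `reasoning` request field, and the `reasoning` key in the reply are assumptions based on OpenRouter's OpenAI-compatible API and should be checked against its current documentation. The model slug is taken from the activity URL above, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint; the model slug comes from the activity URL in this thread.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "tngtech/deepseek-r1t2-chimera:free"


def build_payload(user_prompt, system_prompt=None):
    """Build a chat-completions request body that asks for reasoning."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return {
        "model": MODEL,
        "messages": messages,
        # Assumption: OpenRouter's "reasoning" request field.
        "reasoning": {"enabled": True},
    }


def send(payload, api_key):
    """POST the payload to OpenRouter and return the decoded JSON reply."""
    request = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def split_reply(reply):
    """Return (reasoning, answer) from a chat-completions style reply."""
    message = reply["choices"][0]["message"]
    # Assumption: reasoning text, when returned, sits in message["reasoning"].
    return message.get("reasoning") or "", message.get("content") or ""
```

Calling `split_reply(send(build_payload("Who are you?"), api_key))` with a valid key would then give the reasoning and the final answer as two strings, provided the reply has the assumed shape.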
B) Regarding custom RP prompts: we have no experience in that area. If the original R1T Chimera worked for you in that respect, it may be worth sticking with R1T. Alternatively, try some slight prompt variations.
C) If you are using the OpenRouter chat: it has a generic bug that shows up with reasoning models such as R1T2, R1-0528, Microsoft R1 or Qwen3 235B A22B. If you run a long reasoning query, stop or interrupt it while it is still reasoning, and then ask a new question, the previous question is restarted instead of the new one being answered. That can genuinely create the impression that the reasoning LLM is not reading the last prompt, but it is not the LLM's fault. This should also not occur when using a different chat client, of course.
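To illustrate what a chat client can do to avoid this symptom, here is a minimal sketch of one possible policy: before sending a new question, drop any trailing user turn that never received a reply, so the interrupted question is not re-run. The function name and the drop-the-interrupted-turn policy are hypothetical and are not taken from OpenRouter's or any other client's implementation.

```python
def next_request_messages(history, new_question):
    """Return the message list to send for new_question.

    history is a list of {"role": ..., "content": ...} dicts. If the
    previous turn was interrupted, history ends with a user message that
    never received an assistant reply; that turn is dropped here so it
    is not re-run in place of the new question.
    """
    messages = list(history)  # copy, so the caller's history is untouched
    while messages and messages[-1]["role"] == "user":
        messages.pop()  # discard the unanswered, interrupted question
    messages.append({"role": "user", "content": new_question})
    return messages
```

With this policy the newest question is always the last message in the request. A client could equally keep the interrupted turn and any partial output, as long as the new question still comes last.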
D) We designed and optimized R1T2 to be good at topics like mathematics and coding, with big thanks to the DeepSeek parent models. But we also tried to give R1T2 a creative, very funny personality. At least from a nerd's perspective, its programming and mathematical jokes can be hilarious. This naturally overflowing creativity may interfere with RP behaviour, but at the moment I would not know how to quantify this.
I hope this helps.
Thank you for the response! <3
I know this is a bit of a reach, but will TNG ever make a model specifically for roleplay?
Hello,
I guess that is unlikely at the moment. Almost all of us are software developers, so coder models and general-purpose, business-capable models are the most interesting to us.
Cheers!