Openrouter Reasoning? (+ Questions about prompting)

#12

by cinnamoo0 - opened Jul 8, 2025

Discussion

cinnamoo0

Jul 8, 2025

Hi hi. I've been trying out the new model through OpenRouter. I assume they still disable thinking by default, but I was wondering if there's a prompt to enable it? For reference, I use JanitorAI.

I was also wondering whether the custom prompt I use works well for R1T2, or if I should look for another one: https://rentry.co/molekprompt#೯-𖥻-moleks-base-prompt-version-07-ᰋ

Currently, I still struggle with the feeling that the LLM just isn't... reading my prompts? It has been calling my pink-haired persona's hair silver. Just wondering if there is a general "fix" for any of this.

TNGHK

TNG Technology Consulting GmbH org Jul 8, 2025 • edited Jul 8, 2025

Greetings,
thanks for your questions.

A) On OpenRouter, reasoning is enabled for R1T2, as you can see by looking at the graph at:
https://openrouter.ai/tngtech/deepseek-r1t2-chimera:free/activity
For example, about 6 hours after the model went live on OpenRouter, it has processed 144M input tokens, 7.31M reasoning tokens and 5.48M completion tokens.
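
For anyone calling the model through the API instead of a chat frontend, here is a minimal sketch of checking whether reasoning text came back. It assumes OpenRouter's OpenAI-compatible chat completions endpoint, that the reasoning text is returned in a `reasoning` field on the assistant message, and that the model slug matches the activity URL above. The helper names `build_request`, `split_reasoning` and `ask`, and the `OPENROUTER_API_KEY` environment variable, are made up for illustration.

```python
import json
import os
import urllib.request

# Assumed endpoint (OpenAI-compatible) and model slug taken from the activity URL.
API_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "tngtech/deepseek-r1t2-chimera:free"


def build_request(prompt: str) -> dict:
    """Build a single-turn chat completion payload (hypothetical helper)."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }


def split_reasoning(response: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from a parsed response.

    Assumption: reasoning text, if any, sits in message["reasoning"].
    A missing or null field is reported as an empty string.
    """
    message = response["choices"][0]["message"]
    return message.get("reasoning") or "", message.get("content") or ""


def ask(prompt: str) -> tuple[str, str]:
    """Send one prompt and return (reasoning, answer). Needs network access."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={
            # Hypothetical environment variable holding the API key.
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as resp:
        return split_reasoning(json.load(resp))
```

If `ask("...")` returns an empty first element, the provider did not send reasoning text for that request. That is different from the model not reasoning at all, so the token counts on the activity page remain the more reliable indicator.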

B) Regarding custom RP-prompts: We have no experience in that area. If the original R1T Chimera was working for you in that respect, maybe it is worth sticking with R1T? Or try some slight prompt variations?

C) If you are using the OpenRouter chat: it has a generic bug with reasoning models such as R1T2, R1-0528, Microsoft R1 or Qwen3 235B A22B. If you run a long reasoning query, stop or interrupt it while it is still reasoning, and then ask another question, the previous question is restarted instead of the new one being answered. That can really create the impression that the reasoning LLM is not reading the last prompt, but it is not the LLM's fault. This should not occur with a different chat client, of course.
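
The cause of that chat behaviour is not described here. Purely as an illustration, this is one way a client could make sure the newest question is the one that gets sent after an interrupted turn. The `ChatHistory` class is hypothetical and says nothing about how OpenRouter's chat is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ChatHistory:
    """Hypothetical client-side message history (illustration only)."""

    messages: list = field(default_factory=list)
    pending: bool = False  # True while a user turn has no reply yet

    def ask(self, question: str) -> list:
        """Record a new question and return the messages to send."""
        # An interrupted turn leaves an unanswered user message at the end;
        # drop it so the interrupted question is not sent again.
        if self.pending:
            self.messages.pop()
        self.messages.append({"role": "user", "content": question})
        self.pending = True
        return list(self.messages)

    def complete(self, answer: str) -> None:
        """Record the assistant reply for the pending question."""
        self.messages.append({"role": "assistant", "content": answer})
        self.pending = False
```

With this sketch, asking a second question before the first one was answered sends only the second question, while completed turns stay in the history as usual.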

D) We designed and optimized R1T2 to be good at topics like mathematics and coding, with big thanks to the DeepSeek parent models. But we also tried to give R1T2 a creative, very funny personality. At least from a nerd's perspective, its programming and mathematical jokes can be hilarious. This naturally overflowing creativity may interfere with RP behaviour, but at the moment I would not know how to quantify this.

I hope this helps.

cinnamoo0

Jul 8, 2025

Thank you for the response! <3
I know this is a bit far-fetched, but will TNG ever make a model specifically for roleplays?

TNGHK

TNG Technology Consulting GmbH org Jul 9, 2025

Hello,
I guess that is unlikely at the moment. Almost all of us are software developers, which makes coder models and general-purpose, business-capable models the most interesting for us.
Cheers!

TNGHK changed discussion status to closed Jul 9, 2025
