Instructions to use KORMo-Team/KORMo-10B-sft with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use KORMo-Team/KORMo-10B-sft with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="KORMo-Team/KORMo-10B-sft", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("KORMo-Team/KORMo-10B-sft", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use KORMo-Team/KORMo-10B-sft with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "KORMo-Team/KORMo-10B-sft"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "KORMo-Team/KORMo-10B-sft",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/KORMo-Team/KORMo-10B-sft

SGLang

How to use KORMo-Team/KORMo-10B-sft with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "KORMo-Team/KORMo-10B-sft" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "KORMo-Team/KORMo-10B-sft",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "KORMo-Team/KORMo-10B-sft" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "KORMo-Team/KORMo-10B-sft",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use KORMo-Team/KORMo-10B-sft with Docker Model Runner:
```
docker model run hf.co/KORMo-Team/KORMo-10B-sft
```

Resolving inference compatibility issues for the KORMo model with Transformers 5.2

by jungsin3 - opened Mar 18

base: refs/heads/main ← from: refs/pr/3

Files changed: +30 -7

jungsin3

Mar 18

RotaryEmbedding calculates the inv_freq value once in __init__ and reuses it afterwards.
In Transformers 5.2, the model is loaded on the meta device, so this calculation does not take place. To compensate, 5.2 added logic to the _init_weights function that restores inv_freq in an else branch. KORMo uses a custom _init_weights function, so this logic was never applied, and as a result the RoPE values were not used during inference.
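For readers unfamiliar with inv_freq, the following is a minimal sketch of the quantity involved, assuming the standard (default) RoPE formulation. The function name, `head_dim`, and the `rope_theta` default of 10000 are illustrative assumptions and are not taken from the KORMo code.

```python
def default_inv_freq(head_dim: int, rope_theta: float = 10000.0) -> list[float]:
    """Inverse frequencies for standard RoPE: 1 / rope_theta ** (2i / head_dim).

    Assumption: default RoPE with no scaling; names here are illustrative.
    """
    # One frequency per pair of channels, so the list has head_dim // 2 entries.
    return [1.0 / (rope_theta ** (i / head_dim)) for i in range(0, head_dim, 2)]


inv_freq = default_inv_freq(head_dim=8)
print(len(inv_freq))  # number of channel pairs
print(inv_freq)
```

Because these values depend only on configuration and are not read from the checkpoint, they have to be recomputed by some code path whenever the constructor did not produce real values.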
The following changes have been made to the code:

Added logic to KORMoPreTrainedModel's _init_weights that restores inv_freq.
Added the copy_ function used in _init_weights to the top of the file.
Fixed an issue where the original_inv_freq key was not registered in _buffers, by cloning the self.inv_freq value. It previously returned None because it was not calculated. (RotaryEmbedding)
Added the compute_default_rope_parameters function, which was missing under version 5.2. (RotaryEmbedding)
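The idea behind these changes can be illustrated with a small PyTorch sketch. This is not the code from the pull request: `SketchRotaryEmbedding`, `default_inv_freq`, and `restore_rope_buffers` are hypothetical names, default RoPE is assumed, and the sketch re-registers the buffers instead of using the copy_ helper mentioned above.

```python
import torch
from torch import nn


def default_inv_freq(head_dim: int, rope_theta: float = 10000.0) -> torch.Tensor:
    """Standard RoPE inverse frequencies: 1 / rope_theta ** (2i / head_dim)."""
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    return 1.0 / (rope_theta ** exponents)


class SketchRotaryEmbedding(nn.Module):
    """Hypothetical stand-in for the model's rotary embedding module."""

    def __init__(self, head_dim: int, rope_theta: float = 10000.0):
        super().__init__()
        self.head_dim = head_dim
        self.rope_theta = rope_theta
        inv_freq = default_inv_freq(head_dim, rope_theta)
        # Non-persistent buffers are not saved in the checkpoint, so nothing
        # refills them after the module is built on the meta device.
        self.register_buffer("inv_freq", inv_freq, persistent=False)
        self.register_buffer("original_inv_freq", inv_freq.clone(), persistent=False)


def restore_rope_buffers(module: nn.Module) -> None:
    """Recompute the RoPE buffers with real values (what an _init_weights hook could do)."""
    if isinstance(module, SketchRotaryEmbedding):
        inv_freq = default_inv_freq(module.head_dim, module.rope_theta)
        module.register_buffer("inv_freq", inv_freq, persistent=False)
        module.register_buffer("original_inv_freq", inv_freq.clone(), persistent=False)


# Build on the meta device: the buffer has a shape but no values.
with torch.device("meta"):
    rope = SketchRotaryEmbedding(head_dim=8)
print(rope.inv_freq.is_meta)  # True

# Restore the buffers outside the meta context, as an init hook would.
rope.apply(restore_rope_buffers)
print(rope.inv_freq.is_meta)  # False
```

Recomputing from the stored `head_dim` and `rope_theta` works regardless of whether the constructor ran on a real device, which is one way a fix like this can behave the same across library versions.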

These changes are compatible with both Transformers 4.57.1 and 5.2.
Thank you.

Resolving inference compatibility issues for the KORMo model with Transformers 5.2 (commit 81dfd6b7)

mjkmain

KORMo org Mar 18

Great work! Thank you for contributing to our KORMo repository.

mjkmain

KORMo org Mar 18

LGTM

mjkmain changed pull request status to merged Mar 18
