Instructions to use EQUES/MedLLama3-JP-v2 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use EQUES/MedLLama3-JP-v2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="EQUES/MedLLama3-JP-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
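The `pipe(messages)` call returns the whole conversation, not just the reply. Below is a minimal sketch of extracting only the assistant's answer. It assumes the chat-style output shape `[{"generated_text": [..., {"role": "assistant", "content": ...}]}]` that recent Transformers versions return for message-list input, so check this against your installed version. `extract_reply` is a hypothetical helper name, and the sample data is a placeholder, not real model output.

```python
from typing import Any


def extract_reply(pipeline_output: list[dict[str, Any]]) -> str:
    """Return the last assistant message from a chat-style pipeline result.

    Assumes the shape [{"generated_text": [{"role": ..., "content": ...}, ...]}],
    i.e. the input conversation with the model's reply appended at the end.
    """
    conversation = pipeline_output[0]["generated_text"]
    if isinstance(conversation, str):
        # Plain-text (non-chat) prompts yield a single string; return it as-is.
        return conversation
    for message in reversed(conversation):
        if message.get("role") == "assistant":
            return message["content"]
    raise ValueError("no assistant message in pipeline output")


# Placeholder output for illustration only (not a real model response).
sample = [{"generated_text": [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "<model reply>"},
]}]
print(extract_reply(sample))

# With the model loaded: print(extract_reply(pipe(messages)))
```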

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EQUES/MedLLama3-JP-v2")
model = AutoModelForCausalLM.from_pretrained("EQUES/MedLLama3-JP-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
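inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (everything after the prompt),
# dropping special tokens such as <|eot_id|>.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))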


vLLM

How to use EQUES/MedLLama3-JP-v2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "EQUES/MedLLama3-JP-v2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EQUES/MedLLama3-JP-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
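The same request can be sent from Python. Here is a minimal sketch using only the standard library (`urllib`). It assumes the server started above is listening on `http://localhost:8000` and returns the standard OpenAI chat-completions JSON, with the reply at `choices[0].message.content`. `build_payload`, `parse_reply`, and `chat` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # vLLM's default port
MODEL = "EQUES/MedLLama3-JP-v2"


def build_payload(content: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-compatible chat-completions request body."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
    }


def parse_reply(response: dict) -> str:
    """Extract the assistant text from a chat-completions response body."""
    return response["choices"][0]["message"]["content"]


def chat(content: str, base_url: str = BASE_URL, timeout: float = 120.0) -> str:
    """POST a single user message to the server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(content)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_reply(json.load(resp))


# With the server running:
# print(chat("What is the capital of France?"))
```

The servers differ only in their base URL, so setting `base_url` to `http://localhost:30000/v1` should work the same way for the SGLang server described below.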

Use Docker

docker model run hf.co/EQUES/MedLLama3-JP-v2

SGLang

How to use EQUES/MedLLama3-JP-v2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "EQUES/MedLLama3-JP-v2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EQUES/MedLLama3-JP-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "EQUES/MedLLama3-JP-v2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EQUES/MedLLama3-JP-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
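Loading an 8B model inside the container can take a while, and a curl call sent too early may fail with a connection error. Below is a minimal readiness-polling sketch using only the standard library. It assumes the OpenAI-compatible `GET /v1/models` route returns HTTP 200 once the model is loaded. `wait_until_ready` is a hypothetical helper.

```python
import time
import urllib.request


def wait_until_ready(url: str = "http://localhost:30000/v1/models",
                     timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and socket timeouts
            # are all OSError subclasses: the server is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# After `docker run ...`:
# if wait_until_ready():
#     ...  # send requests
```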

Quantized versions of this model can be used with llama.cpp, Ollama, LM Studio, or any other compatible app.

MedLlama3-JP: A Llama3-based Japanese Medical LLM

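This model is a merge of four LLMs, each created by continual pre-training of Llama3. It uses Japanese LLMs as its base and merges in English medical LLMs, with the aim of acquiring medical knowledge in Japanese and the ability to answer medical questions in Japanese.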
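The card does not say which merge method was used. To illustrate the general idea only, the sketch below shows linear merging (a weighted average of parameters with matching names), which is one common technique and not necessarily the one used for this model. Parameters are plain Python lists so the example runs without a deep-learning framework. `linear_merge` is a hypothetical name.

```python
def linear_merge(state_dicts: list[dict[str, list[float]]],
                 weights: list[float]) -> dict[str, list[float]]:
    """Weighted average of parameters that share the same name and length.

    Illustrates linear merging only; real merges operate on tensors and may
    use other methods (e.g. task arithmetic, TIES, SLERP).
    """
    if not state_dicts or len(state_dicts) != len(weights):
        raise ValueError("need at least one model and one weight per model")
    total = sum(weights)
    norm = [w / total for w in weights]  # normalise so the weights sum to 1
    keys = state_dicts[0].keys()
    if any(sd.keys() != keys for sd in state_dicts):
        raise ValueError("models must have identical parameter names")
    merged = {}
    for name in keys:
        vectors = [sd[name] for sd in state_dicts]
        if len({len(v) for v in vectors}) != 1:
            raise ValueError(f"shape mismatch for {name!r}")
        # Average element-wise across models using the normalised weights.
        merged[name] = [sum(w * x for w, x in zip(norm, col)) for col in zip(*vectors)]
    return merged


model_a = {"w": [1.0, 2.0]}
model_b = {"w": [3.0, 6.0]}
print(linear_merge([model_a, model_b], [1, 1]))  # equal weights
```

With four source models, as in this card, `weights` would have four entries. The weights actually used for this model are not stated.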

Do not use this model for medical purposes.
We do not guarantee the accuracy of this model's outputs.

Evaluation

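We evaluated the model by its accuracy on IgakuQA, a dataset of Japanese National Medical Licensing Examination questions.
The evaluation setup follows IgakuQA.
The GPT models' scores are taken from the results reported by Kasai et al., 2023.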

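| Model | Accuracy (all 2485 questions) |
| --- | --- |
| EQUES/MedLLama3-JP-v2 | 46.6% |
| tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1 | 42.2% |
| elyza/Llama-3-ELYZA-JP-8B | 43.9% |
| GPT models (Kasai et al., 2023) | |
| GPT-4 | 78.2% |
| ChatGPT | 54.9% |
| GPT-3 | 42.1% |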
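All models answered the same 2485 questions, so the percentages can be converted into approximate question counts. The small sketch below shows this conversion. The counts are approximate, because the reported accuracies are rounded to 0.1%, which corresponds to roughly ±1 question.

```python
TOTAL_QUESTIONS = 2485

# Accuracies (%) as reported in the table above.
reported = {
    "EQUES/MedLLama3-JP-v2": 46.6,
    "tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1": 42.2,
    "elyza/Llama-3-ELYZA-JP-8B": 43.9,
}


def approx_correct(accuracy_pct: float, total: int = TOTAL_QUESTIONS) -> int:
    """Approximate number of correctly answered questions."""
    return round(total * accuracy_pct / 100)


for model, pct in reported.items():
    print(f"{model}: ~{approx_correct(pct)} / {TOTAL_QUESTIONS}")
```

By this estimate, the 4.4-point gap over Llama-3-Swallow-8B-Instruct-v0.1 corresponds to roughly 109 questions (about 1158 versus 1049).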

We also plotted accuracy for each exam section. The plot suggests that the merged model tends to combine the strengths of the models it was merged from.
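Below is a minimal sketch of computing overall and per-section accuracy from model answers. This is not IgakuQA's official evaluation script. It assumes answers are choice letters separated by commas (e.g. `"a,c"` for questions that require two choices), and it counts a question as correct only when the predicted set of letters exactly matches the gold set. The section labels and records are hypothetical.

```python
from collections import defaultdict


def normalize(answer: str) -> frozenset[str]:
    """Assumed answer format: choice letters separated by commas, e.g. "a,c"."""
    return frozenset(p.strip().lower() for p in answer.split(",") if p.strip())


def accuracy_by_section(records):
    """records: iterable of (section, predicted, gold) string triples.

    Returns (overall_accuracy, {section: accuracy}).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for section, pred, gold in records:
        totals[section] += 1
        # Exact set match: every required choice and no extra ones.
        if normalize(pred) == normalize(gold):
            correct[section] += 1
    if not totals:
        raise ValueError("no records to score")
    per_section = {s: correct[s] / totals[s] for s in totals}
    overall = sum(correct.values()) / sum(totals.values())
    return overall, per_section


# Hypothetical records: (section, model answer, gold answer).
records = [
    ("116A", "b", "b"),
    ("116A", "a, c", "c,a"),
    ("116B", "d", "e"),
    ("116B", "A", "a"),
]
print(accuracy_by_section(records))
```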

Usage

pip install transformers vllm

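from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "EQUES/MedLLama3-JP-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(
    model=model_name,
    tensor_parallel_size=1,  # increase to shard the model across multiple GPUs
)

sampling_params = SamplingParams(
    temperature=0.6, top_p=0.9, max_tokens=512, stop="<|eot_id|>"
)

# Placeholders: replace with your own text.
example_question = "..."  # a demonstration question (one-shot example)
example_answer = "..."    # the answer to the demonstration question
question = "..."          # the question you want the model to answer

messages = [
    {"role": "user", "content": example_question},
    {"role": "assistant", "content": example_answer},
    {"role": "user", "content": question},
]

# Render the conversation with the model's chat template, ending with the
# header that cues the assistant's reply.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
output = llm.generate(prompt, sampling_params)
print(output[0].outputs[0].text)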
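For multiple-choice medical questions, the user message needs both the question stem and its labelled options. Below is a minimal sketch of one way to format them. The layout and the instruction wording are assumptions, not the prompt used in the IgakuQA evaluation. `format_mcq` is a hypothetical helper.

```python
import string


def format_mcq(stem: str, choices: list[str], n_answers: int = 1) -> str:
    """Render a question stem and options labelled a, b, c, ... as one prompt."""
    if len(choices) > len(string.ascii_lowercase):
        raise ValueError("too many choices")
    lines = [stem.strip(), ""]
    for label, choice in zip(string.ascii_lowercase, choices):
        lines.append(f"{label}. {choice}")
    lines.append("")
    # Assumed instruction; adjust the wording (or language) to suit your prompts.
    lines.append(f"Answer with {n_answers} letter(s), separated by commas.")
    return "\n".join(lines)


print(format_mcq("Which of the following ...?", ["option 1", "option 2", "option 3"]))
```

The resulting string can be passed as the `content` of a user message in the code above. `llm.generate` also accepts a list of prompts, which makes it easy to score many questions in a single call.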

Bias, Risks, and Limitations

The models released here are still in the early stages of our research and development and have not been tuned to ensure outputs align with human intent and safety considerations.

Acknowledgement

We acknowledge the developers of each base model.

How to Cite

@misc{MedLLama3-JP-v2,
      title={EQUES/MedLLama3-JP-v2},
      url={https://huggingface.co/EQUES/MedLLama3-JP-v2},
      author={Issey Sukeda},
      year={2024},
}


Model size: 8B params
Tensor type: BF16 (Safetensors)
Quantizations: 1 model

Paper for EQUES/MedLLama3-JP-v2

Kasai et al., 2023. Evaluating GPT-4 and ChatGPT on Japanese Medical Licensing Examinations. arXiv:2303.18027. Published Mar 31, 2023.