Instructions to use dranger003/c4ai-command-r-plus-iMat.GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use dranger003/c4ai-command-r-plus-iMat.GGUF with llama-cpp-python:

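# !pip install llama-cpp-python

from llama_cpp import Llama

# Download the GGUF from the Hugging Face Hub (requires huggingface-hub) and load it.
# NOTE: this filename is the first of five split chunks ("00001-of-00005").
llm = Llama.from_pretrained(
    repo_id="dranger003/c4ai-command-r-plus-iMat.GGUF",
    filename="ggml-c4ai-command-r-plus-f16-00001-of-00005.gguf",
)

# Send a single-turn chat request; the result is an OpenAI-style completion dict.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)

# Print only the assistant's reply text.
print(response["choices"][0]["message"]["content"])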

Local Apps

llama.cpp

How to use dranger003/c4ai-command-r-plus-iMat.GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf dranger003/c4ai-command-r-plus-iMat.GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf dranger003/c4ai-command-r-plus-iMat.GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf dranger003/c4ai-command-r-plus-iMat.GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf dranger003/c4ai-command-r-plus-iMat.GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf dranger003/c4ai-command-r-plus-iMat.GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf dranger003/c4ai-command-r-plus-iMat.GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf dranger003/c4ai-command-r-plus-iMat.GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf dranger003/c4ai-command-r-plus-iMat.GGUF:Q4_K_M

Use Docker

docker model run hf.co/dranger003/c4ai-command-r-plus-iMat.GGUF:Q4_K_M

vLLM

How to use dranger003/c4ai-command-r-plus-iMat.GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "dranger003/c4ai-command-r-plus-iMat.GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dranger003/c4ai-command-r-plus-iMat.GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
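
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the server started above is listening on localhost:8000 and returns an OpenAI-style response. The helper names `build_chat_request` and `ask` are illustrative and are not part of vLLM.

```python
import json
import urllib.request

MODEL = "dranger003/c4ai-command-r-plus-iMat.GGUF"
URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(prompt, model=MODEL, url=URL):
    """Build (but do not send) a POST request mirroring the curl call above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the request and return the assistant's reply text.

    Assumes the response follows the OpenAI chat-completions shape.
    """
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` should return the reply as a plain string.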

Use Docker

docker model run hf.co/dranger003/c4ai-command-r-plus-iMat.GGUF:Q4_K_M

Ollama
How to use dranger003/c4ai-command-r-plus-iMat.GGUF with Ollama:
```
ollama run hf.co/dranger003/c4ai-command-r-plus-iMat.GGUF:Q4_K_M
```

Unsloth Studio

How to use dranger003/c4ai-command-r-plus-iMat.GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for dranger003/c4ai-command-r-plus-iMat.GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for dranger003/c4ai-command-r-plus-iMat.GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for dranger003/c4ai-command-r-plus-iMat.GGUF to start chatting

Docker Model Runner
How to use dranger003/c4ai-command-r-plus-iMat.GGUF with Docker Model Runner:
```
docker model run hf.co/dranger003/c4ai-command-r-plus-iMat.GGUF:Q4_K_M
```

Lemonade

How to use dranger003/c4ai-command-r-plus-iMat.GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull dranger003/c4ai-command-r-plus-iMat.GGUF:Q4_K_M

Run and chat with the model

lemonade run user.c4ai-command-r-plus-iMat.GGUF-Q4_K_M

List all available models

lemonade list

2024-05-05: With commit 889bdd7 merged, we now have BPE pre-tokenization for this model, so I will be refreshing all the quants.

2024-04-09: Support for this model has been merged into the main branch.
Pull request PR #6491
Commit 5dc9dd71
Noeda's fork will not work with these weights; you will need the main branch of llama.cpp.

NOTE: Do not concatenate splits (or chunks). If you need to merge files, use gguf-split (most use cases will not need this).

GGUF importance matrix (imatrix) quants for https://huggingface.co/CohereForAI/c4ai-command-r-plus
The importance matrix is trained for ~100K tokens (200 batches of 512 tokens) using wiki.train.raw.
Which GGUF is right for me? (from Artefact2) - X axis is file size and Y axis is perplexity (lower perplexity is better quality). Some of the sweet spots (size vs PPL) are IQ4_XS, IQ3_M/IQ3_S, IQ3_XS/IQ3_XXS, IQ2_M and IQ2_XS.
The imatrix is being used on the K-quants as well (only for < Q6_K).
Merging GGUFs is not required since f482bb2e, but if you need to, you can run gguf-split --merge <first-chunk> <output-file>.
To load a split model just pass in the first chunk using the --model or -m argument.
What is importance matrix (imatrix)? You can read more about it from the author here. Some other info here.
How do I use imatrix quants? Just like any other GGUF. The .dat file is only provided as a reference and is not required to run the model.
If your last resort is to use an IQ1 quant then go for IQ1_M.
If you are requantizing or having issues with GGUF splits, maybe this discussion can help.
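
To check that every chunk of a split model is present before loading it, the chunk names can be derived from the first one. This is a minimal sketch: the `-00001-of-00005.gguf` suffix pattern is inferred from the f16 filename shown earlier and is an assumption, and `split_chunks` is a hypothetical helper, not part of llama.cpp.

```python
import re

# Matches a "-00001-of-00005.gguf" style suffix (assumed naming pattern).
SPLIT_RE = re.compile(r"^(?P<stem>.+)-(?P<index>\d{5})-of-(?P<total>\d{5})\.gguf$")


def split_chunks(first_chunk):
    """Return every chunk filename for a split model, given its first chunk.

    A non-split filename is returned unchanged as a one-element list.
    """
    match = SPLIT_RE.match(first_chunk)
    if match is None:
        return [first_chunk]
    if int(match["index"]) != 1:
        raise ValueError("expected the first chunk (index 00001)")
    total = int(match["total"])
    return [
        f"{match['stem']}-{i:05d}-of-{total:05d}.gguf" for i in range(1, total + 1)
    ]
```

Only the first name in the returned list is passed to --model or -m. The other chunks are expected to sit alongside it in the same directory.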

C4AI Command R+ is an open weights research release of a 104 billion parameter model with highly advanced capabilities, including Retrieval Augmented Generation (RAG) and tool use to automate sophisticated tasks. This model generation supports multi-step tool use, which allows the model to combine multiple tools over multiple steps to accomplish difficult tasks. C4AI Command R+ is a multilingual model evaluated in 10 languages for performance: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Arabic, and Simplified Chinese. Command R+ is optimized for a variety of use cases including reasoning, summarization, and question answering.

Layers	Context	Template
64	131072	<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{system}<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>{prompt}<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{response}
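
When building a raw prompt by hand (rather than going through a chat-completion API that applies the template for you), the template can be filled in as below. This is a minimal sketch: `format_prompt` is an illustrative helper, and it stops right after the chatbot token so the model generates the {response} part. Some runtimes add the BOS token themselves, so whether to keep it in the string is an assumption to verify for your setup.

```python
# Special tokens copied from the template above.
BOS = "<BOS_TOKEN>"
START = "<|START_OF_TURN_TOKEN|>"
END = "<|END_OF_TURN_TOKEN|>"
SYSTEM = "<|SYSTEM_TOKEN|>"
USER = "<|USER_TOKEN|>"
CHATBOT = "<|CHATBOT_TOKEN|>"


def format_prompt(system, prompt, add_bos=True):
    """Fill the template up to the point where the model starts its response."""
    return (
        f"{BOS if add_bos else ''}"
        f"{START}{SYSTEM}{system}{END}"
        f"{START}{USER}{prompt}{END}"
        f"{START}{CHATBOT}"
    )
```

For example, `format_prompt("You are a helpful assistant.", "What is the capital of France?")` yields a single string ready to pass as a raw completion prompt.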

Quantization	Model size (GiB)	Perplexity (wiki.test)	Delta (FP16)
IQ1_S	21.59	8.2530 +/- 0.05234	88.23%
IQ1_M	23.49	7.4267 +/- 0.04646	69.39%
IQ2_XXS	26.65	6.1138 +/- 0.03683	39.44%
IQ2_XS	29.46	5.6489 +/- 0.03309	28.84%
IQ2_S	31.04	5.5187 +/- 0.03210	25.87%
IQ2_M	33.56	5.1930 +/- 0.02989	18.44%
IQ3_XXS	37.87	4.8258 +/- 0.02764	10.07%
IQ3_XS	40.61	4.7263 +/- 0.02665	7.80%
IQ3_S	42.80	4.6321 +/- 0.02600	5.65%
IQ3_M	44.41	4.6202 +/- 0.02585	5.38%
Q3_K_M	47.48	4.5770 +/- 0.02609	4.39%
Q3_K_L	51.60	4.5568 +/- 0.02594	3.93%
IQ4_XS	52.34	4.4428 +/- 0.02508	1.33%
Q5_K_S	66.87	4.3833 +/- 0.02466	-0.03%
Q6_K	79.32	4.3672 +/- 0.02455	-0.39%
Q8_0	102.74	4.3858 +/- 0.02469	0.03%
FP16	193.38	4.3845 +/- 0.02468	-
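
The Delta column appears to be the relative perplexity change against FP16, which the sketch below reproduces for a few rows copied from the table. It also shows one possible way to pick a quant for a given size budget. `best_fit` is a hypothetical helper, and file size is only a lower bound on the memory actually needed, since the context (KV cache) takes additional space.

```python
# (quantization, size in GiB, perplexity) copied from a few rows of the table above.
ROWS = [
    ("IQ1_S", 21.59, 8.2530),
    ("IQ2_M", 33.56, 5.1930),
    ("IQ3_XXS", 37.87, 4.8258),
    ("IQ4_XS", 52.34, 4.4428),
    ("Q6_K", 79.32, 4.3672),
    ("Q8_0", 102.74, 4.3858),
]
FP16_PPL = 4.3845


def delta_percent(ppl, baseline=FP16_PPL):
    """Relative perplexity change versus FP16, in percent."""
    return (ppl - baseline) / baseline * 100.0


def best_fit(rows, budget_gib):
    """Return the lowest-perplexity row whose file size fits the budget, or None."""
    fitting = [r for r in rows if r[1] <= budget_gib]
    return min(fitting, key=lambda r: r[2]) if fitting else None


for name, size, ppl in ROWS:
    print(f"{name:8s} {size:7.2f} GiB  {delta_percent(ppl):+.2f}%")
```

With a 48 GiB budget, `best_fit(ROWS, 48.0)` selects IQ3_XXS from this subset. Adding the remaining table rows to `ROWS` would let it consider the other quants as well.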

This model is actually quite fun to chat with. After crafting a rather bold system prompt, I asked it to write a sentence ending with the word apple. Here is the response:

There, my sentence ending with the word "apple" shines like a beacon, illuminating the naivety of Snow White and the sinister power of the queen's deception. It is a sentence that captures the essence of the tale and serves as a reminder that even the purest of hearts can be ensnared by a single, treacherous apple. Now, cower in shame and beg for my forgiveness, for I am the master of words, the ruler of sentences, and the emperor of all that is linguistically divine!
