No package metadata was found for bitsandbytes
I am trying to run falcon-7b locally:
pip install -q -U bitsandbytes
import torch
from transformers import BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
Traceback (most recent call last):
  File "...", line 4, in <module>
    quantization_config = BitsAndBytesConfig(
  File "...\Programs\Python\Python310\lib\site-packages\transformers\utils\quantization_config.py", line 212, in __init__
    self.post_init()
  File "...\Programs\Python\Python310\lib\site-packages\transformers\utils\quantization_config.py", line 238, in post_init
    if self.load_in_4bit and not version.parse(importlib.metadata.version("bitsandbytes")) >= version.parse(
  File "...\Programs\Python\Python310\lib\importlib\metadata\__init__.py", line 984, in version
    return distribution(distribution_name).version
  File "...\Programs\Python\Python310\lib\importlib\metadata\__init__.py", line 957, in distribution
    return Distribution.from_name(distribution_name)
  File "...\Local\Programs\Python\Python310\lib\importlib\metadata\__init__.py", line 548, in from_name
    raise PackageNotFoundError(name)
importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes
It seems that bitsandbytes is not working on Windows 10. Does anyone know how to fix this?
I have much the same problem with Python 3.9.
The transformers package is installed from git+https://github.com/huggingface/transformers and reports its version as 4.37.0.dev.
What is funny is that the import statement loads BitsAndBytesConfig from transformers, yet there is also a separate package called bitsandbytes.
Does one have to install that package additionally?
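For what it's worth, the traceback above shows transformers calling importlib.metadata.version("bitsandbytes"), so the config class lives in transformers but it looks up a separately installed bitsandbytes distribution. A minimal sketch of the same lookup, to see what the interpreter you are actually running can find. The helper name installed_version and the list of packages checked are my own choices for illustration, not part of transformers:

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if no metadata is found."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Same exception as in the traceback: nothing by this name is
        # installed for the interpreter running this code.
        return None


# The path shows which Python is running; pip may have installed into another one.
print("interpreter:", sys.executable)
for name in ("bitsandbytes", "accelerate", "transformers", "torch"):
    print(name, "->", installed_version(name) or "not installed in this environment")
```

If bitsandbytes shows up as not installed here even after pip install, one thing to check is whether pip belongs to a different Python than the interpreter path printed above.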
I am getting a similar error. How is this resolved?
Hey guys, has anyone resolved this issue?
I'm also looking at the same issue.
To fix: pip install bitsandbytes
quick fix:
pip install bitsandbytes
Hello,
I tried installing bitsandbytes using pip install bitsandbytes, but I am still getting the same error:
ImportError Traceback (most recent call last)
in <cell line: 1>()
----> 1 model= load_quantized_model(model_name)
3 frames
/usr/local/lib/python3.10/dist-packages/transformers/quantizers/quantizer_bnb_4bit.py in validate_environment(self, *args, **kwargs)
64 raise ImportError("Using bitsandbytes 4-bit quantization requires Accelerate: pip install accelerate")
65 if not is_bitsandbytes_available():
---> 66 raise ImportError(
67 "Using bitsandbytes 4-bit quantization requires the latest version of bitsandbytes: pip install -U bitsandbytes"
68 )
ImportError: Using bitsandbytes 4-bit quantization requires the latest version of bitsandbytes: pip install -U bitsandbytes
What kernel? I restarted the Linux kernel, and it didn't fix anything.
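One possible reading is that "kernel" meant the notebook kernel, that is, the Python process behind the notebook, rather than the Linux kernel. The ImportError above is raised when the is_bitsandbytes_available() check on line 65 of the quoted source fails, and a Python process that was already running when pip installed the package may not pick it up until it is restarted. A rough sketch, assuming you run it in the same notebook or script that fails, to see whether the import itself works in that process. The helper probe_import is hypothetical, written only for this check:

```python
import importlib


def probe_import(module_name):
    """Try to import a module in the running process; return (ok, detail)."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:
        # Covers ImportError as well as errors raised while the package initialises.
        return False, f"{type(exc).__name__}: {exc}"
    return True, getattr(module, "__version__", "unknown version")


# The two packages named in the quoted error messages.
for name in ("bitsandbytes", "accelerate"):
    ok, detail = probe_import(name)
    print(f"{name}: {'ok' if ok else 'FAILED'} ({detail})")
```

If the import fails here, the printed detail is usually more specific than the generic message from transformers. If it succeeds but the error persists, restarting the notebook runtime and re-running from the top is worth trying.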