Model Card for letxbe/qwen2-7b-BoundingDocs-rephrased
letxbe/qwen2-7b-BoundingDocs-rephrased is a Qwen2-VL-7B model fine-tuned for Document Question Answering. It was trained on BoundingDocs, using the rephrased version of the dataset's questions.
Model Details
Model Description
- Developed by: LetXBe
- Model Type: Vision LLM
- Languages: Multilingual
- License: CC BY 4.0
- Finetuned From: Qwen2-VL-7B
- Input Format: Question + document image
- Output Format: JSON
How to Use
Prompt the model in the format described in the Qwen2-VL-7B model card.
Inference Example
import torch
from PIL import Image
from qwen_vl_utils import process_vision_info
from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig

def generate_text_from_sample(model, processor, sample, max_new_tokens=1024, device="cuda"):
    # Prepare the text input by applying the chat template
    text_input = processor.apply_chat_template(
        sample[0:2], tokenize=False, add_generation_prompt=True
    )
    # Process the visual input from the sample
    image_inputs, _ = process_vision_info(sample)
    # Prepare the inputs for the model and move them to the specified device
    model_inputs = processor(
        text=[text_input],
        images=image_inputs,
        return_tensors="pt",
    ).to(device)
    # Generate text with the model
    generated_ids = model.generate(**model_inputs, max_new_tokens=max_new_tokens)
    # Trim the generated ids to remove the input ids
    trimmed_generated_ids = [
        out_ids[len(in_ids):]
        for in_ids, out_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    # Decode the output text
    output_text = processor.batch_decode(
        trimmed_generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )
    return output_text[0]  # Return the first decoded output text

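# Bound the image resolution seen by the model (roughly 256 to 512 visual tokens per image)
min_pixels = 256 * 28 * 28
max_pixels = 512 * 28 * 28
# The processor is loaded from the base Qwen2-VL-7B-Instruct checkpoint
processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", min_pixels=min_pixels, max_pixels=max_pixels, use_fast=True
)
# Load the fine-tuned weights in 4-bit to reduce GPU memory usage
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForImageTextToText.from_pretrained(
    "letxbe/qwen2-7b-BoundingDocs-rephrased",
    device_map="cuda",
    quantization_config=bnb_config,
)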
system_message = """You are a Vision Language Model specialized in extracting information from document images.
Your task is to analyze the provided document image and extract relevant information accurately.
Documents may contain text, tables, forms, and structured or unstructured data.
Ensure responses are precise and concise, without additional explanations unless required for clarity."""
TEMPLATE_PROMPT = """
<starttask>
Answer the following question about the document:
Question: "{QUESTION}"
Answer completing the following format:
'''json
{{"value": ""}}
'''
<endtask>
"""
question = "question about the document"
prompt = TEMPLATE_PROMPT.format(QUESTION=question)
message = [
    # system message
    {
        "role": "system",
        "content": [{"type": "text", "text": system_message}],
    },
    # question
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                # Blank placeholder image; replace with your document image (e.g. loaded with Image.open)
                "image": Image.new("RGB", (512, 512), (255, 255, 255)),
            },
            {
                "type": "text",
                "text": prompt,
            },
        ],
    },
]
output = generate_text_from_sample(model, processor, message)
print(output)
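Because the prompt asks for an answer in the form {"value": ""}, the printed output is usually post-processed to pull that field out. The snippet below is a minimal sketch of one way to do this, not part of the original model card: `parse_answer` is a hypothetical helper name, and it assumes the reply contains a single JSON object, possibly wrapped in '''json or similar markers.

```python
import json
import re


def parse_answer(output_text):
    """Return the "value" field of the model's reply, or None if no valid JSON object is found.

    Hypothetical helper: assumes the reply holds one JSON object shaped like {"value": "..."}.
    """
    # Take the span from the first "{" to the last "}", ignoring any wrapper text around it
    match = re.search(r"\{.*\}", output_text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    return payload.get("value")


# Example with a hand-written reply in the format requested by TEMPLATE_PROMPT
print(parse_answer("'''json\n{\"value\": \"2023-05-01\"}\n'''"))
```

In the inference example, this would be called as `parse_answer(output)`. Returning None on malformed replies lets the caller decide whether to retry or fall back to the raw text.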