Instructions to use letxbe/qwen2-7b-BoundingDocs-rephrased with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use letxbe/qwen2-7b-BoundingDocs-rephrased with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="letxbe/qwen2-7b-BoundingDocs-rephrased")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("letxbe/qwen2-7b-BoundingDocs-rephrased")
model = AutoModelForImageTextToText.from_pretrained("letxbe/qwen2-7b-BoundingDocs-rephrased")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use letxbe/qwen2-7b-BoundingDocs-rephrased with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "letxbe/qwen2-7b-BoundingDocs-rephrased"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "letxbe/qwen2-7b-BoundingDocs-rephrased",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/letxbe/qwen2-7b-BoundingDocs-rephrased

SGLang

How to use letxbe/qwen2-7b-BoundingDocs-rephrased with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "letxbe/qwen2-7b-BoundingDocs-rephrased" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "letxbe/qwen2-7b-BoundingDocs-rephrased",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "letxbe/qwen2-7b-BoundingDocs-rephrased" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "letxbe/qwen2-7b-BoundingDocs-rephrased",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use letxbe/qwen2-7b-BoundingDocs-rephrased with Docker Model Runner:
```
docker model run hf.co/letxbe/qwen2-7b-BoundingDocs-rephrased
```

Model Card for letxbe/qwen2-7b-BoundingDocs-rephrased

letxbe/qwen2-7b-BoundingDocs-rephrased is a fine-tuned Qwen2-VL-7B for the Document Question Answering task. It was trained on BoundingDocs using the rephrased version of the questions.

Model Details

Model Description

Developed by: LetXBe
Model Type: Vision LLM
Languages: Multilingual
License: CC BY 4.0
Finetuned From: Qwen2-VL-7B
Input Format: Question + document image
Output Format: JSON

🚀 How to Use

The model should be prompted in the manner explained in the Qwen2-VL-7B model card, available here.

Inference Example

from transformers import AutoProcessor, AutoModelForImageTextToText
from qwen_vl_utils import process_vision_info
from PIL import Image
import torch
from transformers import BitsAndBytesConfig


def generate_text_from_sample(model, processor, sample, max_new_tokens=1024, device="cuda"):
    # Prepare the text input by applying the chat template
    text_input = processor.apply_chat_template(
        sample[0:2], tokenize=False, add_generation_prompt=True
    )

    # Process the visual input from the sample
    image_inputs, _ = process_vision_info(sample)

    # Prepare the inputs for the model
    model_inputs = processor(
        text=[text_input],
        images=image_inputs,
        return_tensors="pt",
    ).to(
        device
    )  # Move inputs to the specified device

    # Generate text with the model
    generated_ids = model.generate(**model_inputs, max_new_tokens=max_new_tokens)

    # Trim the generated ids to remove the input ids
    trimmed_generated_ids = [out_ids[len(in_ids) :] for in_ids, out_ids in zip(model_inputs.input_ids, generated_ids)]

    # Decode the output text
    output_text = processor.batch_decode(
        trimmed_generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )

    return output_text[0]  # Return the first decoded output text


min_pixels = 256*28*28
max_pixels = 512*28*28
processor = AutoProcessor.from_pretrained('Qwen/Qwen2-VL-7B-Instruct', min_pixels=min_pixels, max_pixels=max_pixels, use_fast=True)


bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForImageTextToText.from_pretrained(
    "letxbe/qwen2-7b-BoundingDocs-rephrased",
    device_map="cuda",
    quantization_config=bnb_config
)

system_message = """You are a Vision Language Model specialized in extracting information from document images.  
Your task is to analyze the provided document image and extract relevant information accurately.  
Documents may contain text, tables, forms, and structured or unstructured data.  
Ensure responses are precise and concise, without additional explanations unless required for clarity."""  

TEMPLATE_PROMPT = """
<starttask>
Answer the following question about the document:
Question: "{QUESTION}"
Answer completing the following format:
'''json
{{"value": ""}}
'''
<endtask>
"""

question = "question about the document"

prompt = TEMPLATE_PROMPT.format(QUESTION=question)
  
message = [
            # system message
            {
                "role": "system",
                "content": [{"type": "text", "text": system_message}],
            },
            # question
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "image": Image.new("RGB", (512, 512), (255, 255, 255)),
                    },
                    {
                        "type": "text",
                        "text": prompt,
                    },
                ],
            }
]

output = generate_text_from_sample(model, processor, message)

print(output)

Downloads last month: 27

Safetensors

Model size

8B params

Tensor type

BF16