pierreguillou/DocLayNet-small
How to use Mit1208/Florence-2-DocLayNet with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("image-text-to-text", model="Mit1208/Florence-2-DocLayNet", trust_remote_code=True)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText
processor = AutoProcessor.from_pretrained("Mit1208/Florence-2-DocLayNet", trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained("Mit1208/Florence-2-DocLayNet", trust_remote_code=True)

How to use Mit1208/Florence-2-DocLayNet with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Mit1208/Florence-2-DocLayNet"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Mit1208/Florence-2-DocLayNet",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'

docker model run hf.co/Mit1208/Florence-2-DocLayNet
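The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the endpoint path and payload fields are copied from the curl example rather than from any model-specific documentation. It assumes a server is already listening at `base_url`.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="Mit1208/Florence-2-DocLayNet",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the decoded JSON response."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect without a running server. Passing a different `base_url` (for example one ending in another port) targets a server started elsewhere.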
How to use Mit1208/Florence-2-DocLayNet with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "Mit1208/Florence-2-DocLayNet" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Mit1208/Florence-2-DocLayNet",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'

docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "Mit1208/Florence-2-DocLayNet" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Mit1208/Florence-2-DocLayNet",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'

How to use Mit1208/Florence-2-DocLayNet with Docker Model Runner:
docker model run hf.co/Mit1208/Florence-2-DocLayNet
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
Use the code below to get started with the model.
!pip install -qU transformers
!pip install -qU accelerate bitsandbytes einops flash_attn timm
!pip install -q datasets
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Run on GPU when available; `device` is used by run_example below.
device = "cuda" if torch.cuda.is_available() else "cpu"

# The base checkpoint supplies the config and the processor.
base_model = AutoModelForCausalLM.from_pretrained("microsoft/Florence-2-base-ft", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("microsoft/Florence-2-base-ft", trust_remote_code=True)
# Load the fine-tuned DocLayNet weights with the base model's config.
model = AutoModelForCausalLM.from_pretrained("Mit1208/Florence-2-DocLayNet", trust_remote_code=True, config=base_model.config).to(device)
def run_example(task_prompt, image, text_input=None):
    """Run one Florence-2 task on an image and return the parsed result."""
    if text_input is None:
        prompt = task_prompt
    else:
        prompt = task_prompt + text_input
    print(prompt)
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)
    # Deterministic beam search (no sampling).
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        early_stopping=False,
        do_sample=False,
        num_beams=3,
    )
    # Keep special tokens: the post-processor needs them to parse the output.
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    print(generated_text)
    parsed_answer = processor.post_process_generation(
        generated_text,
        task=task_prompt,
        image_size=(image.width, image.height)
    )
    return parsed_answer
image = Image.open('form-1.png').convert('RGB')
task_prompt = '<OD>'
# Resize the page to 1000x1000 before running object detection.
results = run_example(task_prompt, image.resize(size=(1000, 1000)))
print(results)
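Because the page is resized to 1000x1000 before inference, the returned boxes are in the resized image's coordinates. The sketch below maps them back to the original page size and draws them with Pillow. It assumes the `<OD>` result has the shape `{'<OD>': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': [...]}}`, which is what Florence-2's post-processing usually returns but is not confirmed by this card. The helpers `rescale_boxes` and `draw_layout` and the sample values are hypothetical.

```python
from PIL import Image, ImageDraw


def rescale_boxes(bboxes, from_size, to_size):
    """Map [x1, y1, x2, y2] boxes from one (width, height) to another."""
    sx = to_size[0] / from_size[0]
    sy = to_size[1] / from_size[1]
    return [[x1 * sx, y1 * sy, x2 * sx, y2 * sy] for x1, y1, x2, y2 in bboxes]


def draw_layout(image, bboxes, labels, color="red"):
    """Return a copy of `image` with each box outlined and labelled."""
    annotated = image.copy()
    draw = ImageDraw.Draw(annotated)
    for (x1, y1, x2, y2), label in zip(bboxes, labels):
        draw.rectangle([x1, y1, x2, y2], outline=color, width=2)
        draw.text((x1 + 2, y1 + 2), label, fill=color)
    return annotated


# Hypothetical result in the assumed shape; real output comes from run_example.
sample = {"<OD>": {"bboxes": [[100.0, 50.0, 900.0, 120.0]], "labels": ["Title"]}}
original = Image.new("RGB", (2000, 3000), "white")  # stand-in for the source page

boxes = rescale_boxes(sample["<OD>"]["bboxes"], (1000, 1000), original.size)
annotated = draw_layout(original, boxes, sample["<OD>"]["labels"])
print(boxes)
```

Drawing on a copy leaves the source image untouched, so the same page can be annotated again with a different result.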