How to use tomasonjo/text2cypher-demo-16bit with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="tomasonjo/text2cypher-demo-16bit")
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages)
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("tomasonjo/text2cypher-demo-16bit")
model = AutoModelForCausalLM.from_pretrained("tomasonjo/text2cypher-demo-16bit")
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
How to use tomasonjo/text2cypher-demo-16bit with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "tomasonjo/text2cypher-demo-16bit"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "tomasonjo/text2cypher-demo-16bit",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
docker model run hf.co/tomasonjo/text2cypher-demo-16bit
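The same OpenAI-compatible endpoint can also be called from Python. The sketch below is a minimal example that uses only the standard library. It assumes the vLLM server started above is listening on http://localhost:8000, and it reuses the system and user prompt wording from the inference example later in this card. The helper names `build_chat_request` and `generate_cypher`, along with the `max_tokens` and `temperature` values, are illustrative assumptions and not part of the model card.

```python
import json
import urllib.request

MODEL_ID = "tomasonjo/text2cypher-demo-16bit"
SYSTEM_PROMPT = "Given an input question, convert it to a Cypher query. No pre-amble."


def build_chat_request(schema: str, question: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat completion payload for a text-to-Cypher request."""
    user_prompt = (
        "Based on the Neo4j graph schema below, write a Cypher query "
        "that would answer the user's question:\n"
        f"{schema}\n"
        f"Question: {question}\n"
        "Cypher query:"
    )
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,  # assumption: mirrors max_new_tokens = 128 used in this card
        "temperature": 0.0,        # assumption: deterministic output is convenient for queries
    }


def generate_cypher(schema: str, question: str,
                    base_url: str = "http://localhost:8000") -> str:
    """Send the request to a running server and return the generated text."""
    payload = build_chat_request(schema, question)
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With a server running, `generate_cypher(schema, question)` should return the model's reply as a string. For the SGLang server described below, passing `base_url="http://localhost:30000"` should work the same way, since it serves the same `/v1/chat/completions` route.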
How to use tomasonjo/text2cypher-demo-16bit with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "tomasonjo/text2cypher-demo-16bit" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "tomasonjo/text2cypher-demo-16bit",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "tomasonjo/text2cypher-demo-16bit" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "tomasonjo/text2cypher-demo-16bit",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
How to use tomasonjo/text2cypher-demo-16bit with Unsloth Studio:
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for tomasonjo/text2cypher-demo-16bit to start chatting

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for tomasonjo/text2cypher-demo-16bit to start chatting

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for tomasonjo/text2cypher-demo-16bit to start chatting
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
model_name="tomasonjo/text2cypher-demo-16bit",
max_seq_length=2048,
)
How to use tomasonjo/text2cypher-demo-16bit with Docker Model Runner:
docker model run hf.co/tomasonjo/text2cypher-demo-16bit
This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
For more information visit this link
Install the dependencies. The commands below target Google Colab; see the Unsloth documentation for installation instructions for other environments.
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.26" trl peft accelerate bitsandbytes
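Then load the model and run inference:
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
# Load the fine-tuned model and its tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "tomasonjo/text2cypher-demo-16bit",
    max_seq_length = 2048,
)
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3",
    map_eos_token = True,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference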
schema = """Node properties:
- **Question**
  - `favorites`: INTEGER Example: "0"
  - `answered`: BOOLEAN
  - `text`: STRING Example: "### This is: Bug ### Specifications OS: Win10"
  - `link`: STRING Example: "https://stackoverflow.com/questions/62224586/playg"
  - `createdAt`: DATE_TIME Min: 2020-06-05T16:57:19Z, Max: 2020-06-05T21:49:16Z
  - `title`: STRING Example: "Playground is not loading with apollo-server-lambd"
  - `id`: INTEGER Min: 62220505, Max: 62224586
  - `upVotes`: INTEGER Example: "0"
  - `score`: INTEGER Example: "-1"
  - `downVotes`: INTEGER Example: "1"
- **Tag**
  - `name`: STRING Example: "aws-lambda"
- **User**
  - `image`: STRING Example: "https://lh3.googleusercontent.com/-NcFYSuXU0nk/AAA"
  - `link`: STRING Example: "https://stackoverflow.com/users/10251021/alexandre"
  - `id`: INTEGER Min: 751, Max: 13681006
  - `reputation`: INTEGER Min: 1, Max: 420137
  - `display_name`: STRING Example: "Alexandre Le"
Relationship properties:

The relationships:
(:Question)-[:TAGGED]->(:Tag)
(:User)-[:ASKED]->(:Question)"""
question = "Identify the top 5 questions with the most downVotes."
messages = [
{"role": "system", "content": "Given an input question, convert it to a Cypher query. No pre-amble."},
{"role": "user", "content": f"""Based on the Neo4j graph schema below, write a Cypher query that would answer the user's question:
{schema}
Question: {question}
Cypher query:"""}
]
inputs = tokenizer.apply_chat_template(
messages,
tokenize = True,
add_generation_prompt = True, # Must add for generation
return_tensors = "pt",
).to("cuda")
outputs = model.generate(input_ids = inputs, max_new_tokens = 128, use_cache = True)
tokenizer.batch_decode(outputs)
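`tokenizer.batch_decode(outputs)` returns the prompt together with the completion, so the Cypher query still has to be separated from the rest. The sketch below shows one possible way to post-process the result. It works on plain Python lists and strings so it can run without a GPU. The helper names `strip_prompt_tokens` and `clean_cypher` are hypothetical, and the assumption that the decoded text ends with a Llama 3 marker such as `<|eot_id|>` or `<|end_of_text|>` should be checked against the actual output.

```python
# Llama 3 end markers that may trail the decoded completion (assumption: the
# tokenizer emits them as literal text when special tokens are not skipped).
END_MARKERS = ("<|eot_id|>", "<|end_of_text|>")


def strip_prompt_tokens(output_ids, prompt_length):
    """Drop the first prompt_length ids of each sequence, keeping only new tokens."""
    return [list(row[prompt_length:]) for row in output_ids]


def clean_cypher(generated_text):
    """Cut the text at the first end marker and trim surrounding whitespace."""
    cut = len(generated_text)
    for marker in END_MARKERS:
        index = generated_text.find(marker)
        if index != -1:
            cut = min(cut, index)
    return generated_text[:cut].strip()


if __name__ == "__main__":
    # Toy ids: three prompt tokens followed by two generated tokens.
    print(strip_prompt_tokens([[11, 12, 13, 21, 22]], prompt_length=3))
    # Hypothetical completion text, used only to exercise the helper.
    print(clean_cypher(" MATCH (q:Question) RETURN q.title LIMIT 5<|eot_id|>"))
```

With the variables from the example above, and assuming `inputs` is the tensor of prompt token ids, the same idea can be expressed directly as `tokenizer.batch_decode(outputs[:, inputs.shape[-1]:], skip_special_tokens = True)`.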