edgeai-docs-embedding-qwen1.5-0.5b-instruct
A lightweight LoRA adapter fine-tuned on 1,794 MDX files from the Edge Impulse / Edge AI documentation, built on top of Qwen/Qwen1.5-0.5B.
Optimized for:
- answering developer questions about Edge Impulse Studio, SDKs, APIs, and tooling
- summarizing technical documentation and tutorials
- generating code snippets for edge ML workflows
- lightweight local/edge deployment with PEFT adapters
Larger variants in training: 1.5B · 7B (Qwen2.5-Coder base)
Model Summary
edgeai-docs-embedding-qwen1.5-0.5b-instruct is a PEFT LoRA adapter trained for documentation-focused text generation and conversational support over Edge Impulse / Edge AI knowledge.
Use cases
- Documentation Q&A for Edge Impulse developers
- Technical explanation of Studio workflows, SDK usage, and hardware deployment
- Generating sample code for API, CLI, and Python SDK integrations
- Retrieval-augmented generation (RAG) over Edge AI docs
Model Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen1.5-0.5B |
| Adapter type | LoRA (PEFT) |
| LoRA rank (r) | 8 |
| LoRA alpha | 32 |
| Target modules | q_proj, v_proj |
| Task type | CAUSAL_LM |
| Trainable parameters | ~786K (0.17% of base) |
| Training epochs | 3 |
| Batch size | 4 (× grad accum 2 = effective 8) |
| Learning rate | 3e-4 |
| Max sequence length | 512 tokens |
| Training hardware | Apple M1 Pro (MPS, fp16) |
| Precision | float16 |
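The trainable-parameter figure in the table can be reproduced with a little arithmetic. The sketch below is an illustration, not the author's training script. It assumes Qwen/Qwen1.5-0.5B uses a hidden size of 1024, 24 decoder layers, and square `q_proj` / `v_proj` projections, and it uses a rounded base size of about 464M parameters; none of these values are stated in this card.

```python
# Assumed architecture values for Qwen/Qwen1.5-0.5B (not stated in this card).
HIDDEN_SIZE = 1024
NUM_LAYERS = 24
ASSUMED_BASE_PARAMS = 464_000_000  # rounded figure, an assumption

# Values taken from the Model Details table.
LORA_RANK = 8
TARGET_MODULES = ("q_proj", "v_proj")
BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 2


def lora_params_per_module(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds two matrices: A (rank x in_features) and B (out_features x rank)."""
    return rank * in_features + out_features * rank


def total_lora_params(hidden_size: int, num_layers: int, rank: int, num_targets: int) -> int:
    """Sum the LoRA parameters over every targeted projection in every layer."""
    per_module = lora_params_per_module(hidden_size, hidden_size, rank)
    return per_module * num_targets * num_layers


trainable = total_lora_params(HIDDEN_SIZE, NUM_LAYERS, LORA_RANK, len(TARGET_MODULES))
share = 100 * trainable / ASSUMED_BASE_PARAMS
effective_batch = BATCH_SIZE * GRAD_ACCUM_STEPS

print(f"trainable LoRA parameters: {trainable:,}")
print(f"share of base parameters:  {share:.2f}%")
print(f"effective batch size:      {effective_batch}")
```

Under these assumptions the count comes to 786,432, which matches the ~786K reported in the table.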
Training Data
| Stat | Value |
|---|---|
| Source | Edge Impulse documentation |
| File format | MDX (Markdown + JSX components) |
| Total files | 1,794 .mdx files |
| Preprocessing | Stripped frontmatter, imports, JSX tags; unwrapped code fences; flattened links |
| Chunk size | 512 tokens |
Topics covered: Studio projects, datasets, data ingestion, DSP and transformation blocks, learning and processing blocks, model deployment, Python SDK, REST API, CLI tools, and edge inference.
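The preprocessing row in the table lists several cleanup steps. The card does not include the script itself, so the following is only a rough sketch of what such a pass might look like. The function name `clean_mdx`, the regular expressions, and the sample document are all hypothetical, and tokenizer-based chunking into 512-token pieces is not covered here.

```python
import re

FENCE = "`" * 3  # literal code-fence marker

# Leading YAML frontmatter block delimited by '---' lines.
FRONTMATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)
# ES-module style imports used by MDX, e.g. import { Tip } from '/snippets/tip.mdx'
ESM_IMPORT = re.compile(r"^\s*import\s.+\sfrom\s+['\"][^'\"]+['\"];?\s*$")
# Opening, closing, or self-closing JSX/HTML tags.
JSX_TAG = re.compile(r"</?[A-Za-z][^>]*>")
# Markdown links and images: keep the link text, drop the target.
MD_LINK = re.compile(r"!?\[([^\]]*)\]\([^)]*\)")


def clean_mdx(text: str) -> str:
    """Strip frontmatter, imports, and JSX tags; unwrap fences; flatten links."""
    text = FRONTMATTER.sub("", text, count=1)
    kept = []
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            # Drop the fence marker itself but keep the code between markers.
            in_fence = not in_fence
            continue
        if not in_fence:
            if ESM_IMPORT.match(line):
                continue
            line = JSX_TAG.sub("", line)
            line = MD_LINK.sub(r"\1", line)
        kept.append(line)
    cleaned = "\n".join(kept)
    # Collapse the runs of blank lines left behind by removed content.
    return re.sub(r"\n{3,}", "\n\n", cleaned).strip()


# Hypothetical sample document, built line by line.
sample = "\n".join([
    "---",
    "title: Upload data",
    "---",
    "import { Tip } from '/snippets/tip.mdx'",
    "",
    "<Tip>Use the [Python SDK](/tools/python-sdk) for uploads.</Tip>",
    "",
    FENCE + "python",
    "import edgeimpulse as ei",
    FENCE,
])

print(clean_mdx(sample))
```

Lines inside code fences are left untouched in this sketch so that content such as `#include <stdio.h>` or a Python `import` is not mistaken for JSX or an MDX import.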
Evaluation
QA evaluation
- Dataset: 5 fixed developer-style prompts
- Base avg keyword count: 8.2
- Adapter avg keyword count: 6.8
- Code snippet presence: 5/5 for both base and adapter
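The card does not define how the keyword count was computed. One plausible reading, sketched below purely as an assumption, is a case-insensitive count of expected keywords found in each response, averaged over the prompts, together with a check for a fenced code block. The keyword list and responses here are made up for illustration.

```python
FENCE = "`" * 3  # literal code-fence marker


def keyword_count(response: str, keywords: list[str]) -> int:
    """Count how many of the expected keywords appear in the response."""
    text = response.lower()
    return sum(1 for keyword in keywords if keyword.lower() in text)


def has_code_snippet(response: str) -> bool:
    """Treat any fenced block as evidence of a code snippet."""
    return FENCE in response


def average(values: list[float]) -> float:
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical keywords and model responses.
keywords = ["api key", "python sdk", "upload"]
responses = [
    "Set your API key, then upload samples with the Python SDK.",
    "You can upload files in Studio.",
]

counts = [keyword_count(response, keywords) for response in responses]
print("per-response keyword counts:", counts)
print("average keyword count:", average(counts))
print("responses with code:", sum(has_code_snippet(r) for r in responses))
```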
Perplexity on Edge AI samples
- Test corpus: 30 sample Edge AI documentation files
- Base mean perplexity: 11.53
- Adapter mean perplexity: 12.02
- Adapter wins: 4 / 30 documents
These metrics are from small validation samples and should be interpreted as a lightweight benchmark rather than a full production evaluation.
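For reference, perplexity is the exponential of the average negative log-likelihood per token, and a "win" here presumably means the adapter scored lower perplexity than the base model on a given document. The card does not say how its numbers were produced, so the sketch below only shows the standard definition on made-up values. In practice the per-token log-probabilities would come from the model's outputs over each test document.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-probability (natural log) per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def compare(base_ppl: list[float], adapter_ppl: list[float]) -> dict:
    """Mean perplexity per model and the number of documents the adapter wins."""
    if len(base_ppl) != len(adapter_ppl):
        raise ValueError("both lists must cover the same documents")
    wins = sum(1 for base, adapter in zip(base_ppl, adapter_ppl) if adapter < base)
    return {
        "base_mean": sum(base_ppl) / len(base_ppl),
        "adapter_mean": sum(adapter_ppl) / len(adapter_ppl),
        "adapter_wins": wins,
        "documents": len(base_ppl),
    }


# A model that assigns probability 0.25 to every token has perplexity 4.
uniform = perplexity([math.log(0.25)] * 4)

# Made-up per-document perplexities, not the card's data.
summary = compare([10.0, 12.0, 11.0], [9.0, 13.0, 12.0])
print(uniform, summary)
```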
Tutorials
- Offline SLMs for Edge AI Development - Part 1: Qwen LoRA Adapter Fine-Tuned on Edge Impulse Docs
- Offline SLMs for Edge AI Development - Part 2: RAG as an Enhancement for Fine-Tuned Models with FAISS
- Offline SLMs for Edge AI Development - Part 3: Agentic Coding with an Arduino Fine-Tuned Adapter via llama.cpp and OpenCode
Usage
Load with PEFT
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
BASE_MODEL = "Qwen/Qwen1.5-0.5B"
ADAPTER = "eoinedge/edgeai-docs-embedding-qwen1.5-0.5b-instruct"
device = "cuda" if torch.cuda.is_available() else ("mps" if torch.backends.mps.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16 if device != "cpu" else torch.float32,
    device_map=device,
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()
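The loading snippet stops once the adapter is attached. A small helper along the following lines could be used to actually generate an answer. It is a sketch that assumes the `model` and `tokenizer` objects created above, a plain text prompt with no chat template, and greedy decoding; the helper name `generate_answer` and the default of 256 new tokens are hypothetical choices.

```python
def generate_answer(model, tokenizer, question, max_new_tokens=256):
    """Generate a completion for a plain-text question and return only the new text."""
    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(question, return_tensors="pt").to(model.device)
    # Greedy decoding keeps the output deterministic for a given prompt.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # The returned sequence starts with the prompt tokens, so slice them off.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

With the model loaded as shown earlier, a call such as `generate_answer(model, tokenizer, "What is a DSP block in Edge Impulse?")` would return the generated text as a string.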
Text generation pipeline
from transformers import pipeline
pipe = pipeline("text-generation", model="eoinedge/edgeai-docs-embedding-qwen1.5-0.5b-instruct")
print(pipe([{"role": "user", "content": "How do I use the Edge Impulse Python SDK to upload data?"}]))
Example prompts
| Task | Prompt |
|---|---|
| Concept explanation | What is a DSP block in Edge Impulse? |
| API usage | How do I use the Edge Impulse Python SDK to upload data? |
| Deployment | How do I deploy a model to an Arduino Nano 33 BLE Sense? |
| Code generation | Write Python code to collect IMU data and upload it to Edge Impulse. |
| Troubleshooting | Why is my Edge Impulse model showing high latency? |
Limitations
- Based on a 0.5B base model; may struggle with long multi-step reasoning
- Training data covers Edge Impulse docs as of mid-2026; newer features may be missing
- May hallucinate or fabricate undocumented APIs or block behavior
- Not validated for safety-critical or production use
- Validate generated code before deploying on hardware
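As a first, very limited validation step, generated Python can at least be checked for syntax errors before anyone tries to run it. The sketch below is an assumption about how that might be done, not part of the author's tooling. It only inspects fenced blocks labelled `python` or `py`, and passing this check says nothing about whether the code is correct or safe on hardware. The helper names are hypothetical.

```python
import ast

FENCE = "`" * 3  # literal code-fence marker


def extract_python_blocks(response: str) -> list[str]:
    """Collect the bodies of fenced blocks labelled python or py."""
    blocks = []
    current = None  # lines of the block being read, or None outside a fence
    language = ""
    for line in response.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            if current is None:
                language = stripped[len(FENCE):].strip().lower()
                current = []
            else:
                if language in ("python", "py"):
                    blocks.append("\n".join(current))
                current = None
        elif current is not None:
            current.append(line)
    return blocks


def syntax_errors(response: str) -> list[tuple[int, str]]:
    """Return (block index, message) for every Python block that fails to parse."""
    errors = []
    for index, code in enumerate(extract_python_blocks(response)):
        try:
            ast.parse(code)
        except SyntaxError as exc:
            errors.append((index, exc.msg))
    return errors


# Hypothetical model responses.
good_response = "Try this:\n" + FENCE + "python\nprint('ok')\n" + FENCE
bad_response = FENCE + "python\ndef broken(:\n    pass\n" + FENCE

print(syntax_errors(good_response))
print(len(syntax_errors(bad_response)))
```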
Related models
| Model | Base | Status |
|---|---|---|
| This model | Qwen/Qwen1.5-0.5B | Available |
| eoinedge/edgeai-qwen2.5coder-1.5b-lora | Qwen2.5-Coder-1.5B-Instruct | Training |
| eoinedge/edgeai-qwen2.5coder-7b-lora | Qwen2.5-Coder-7B-Instruct | Training |
| eoinedge/arduino-qwen0.5-lora | Qwen/Qwen1.5-0.5B | Available |
Citation
@misc{edgeai-docs-embedding-qwen1.5-0.5b-instruct,
author = {Jordan, Eoin},
title = {edgeai-docs-embedding-qwen1.5-0.5b-instruct},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/eoinedge/edgeai-docs-embedding-qwen1.5-0.5b-instruct}}
}