Distil-Localdoc-Qwen3-0.6B
A small language model (SLM) fine-tuned by Distil Labs for generating high-quality Python docstrings in Google style. Optimized to run locally via Ollama, ensuring your proprietary code never leaves your infrastructure.
GitHub demo and code: https://github.com/distil-labs/Distil-localdoc
Model Details
- Developed by: Distil Labs GmbH
- License: Apache 2.0
- Finetuned from: Qwen/Qwen3-0.6B
- Model Size: 0.6B parameters
- Deployment: Local inference via Ollama
Use-case
Given Python functions or methods without docstrings, the model generates complete, properly formatted documentation following the Google style guide.
Before:
def calculate_total(items, tax_rate=0.08, discount=None):
    subtotal = sum(item['price'] * item['quantity'] for item in items)
    if discount:
        subtotal *= (1 - discount)
    return subtotal * (1 + tax_rate)
After:
def calculate_total(items, tax_rate=0.08, discount=None):
    """
    Calculate the total cost of items, applying a tax rate and optionally a discount.

    Args:
        items: List of item objects with price and quantity
        tax_rate: Tax rate expressed as a decimal (default 0.08)
        discount: Discount rate expressed as a decimal; if provided, the subtotal is multiplied by (1 - discount)

    Returns:
        Total amount after applying the tax

    Example:
        >>> items = [{'price': 10, 'quantity': 2}, {'price': 5, 'quantity': 1}]
        >>> round(calculate_total(items, tax_rate=0.1, discount=0.05), 3)
        26.125
    """
    subtotal = sum(item['price'] * item['quantity'] for item in items)
    if discount:
        subtotal *= (1 - discount)
    return subtotal * (1 + tax_rate)
The model handles:
- Functions: Parameter descriptions, return values, exceptions, and usage examples
- Methods: Instance and class method documentation with proper formatting
- Note: The tool skips double-underscore (dunder: __xxx__) methods
Why Local?
Privacy & Security: Proprietary codebases contain intellectual property and trade secrets. Cloud APIs create:
- IP exposure risks
- Compliance violations (GDPR, SOC 2, HIPAA)
- Security audit failures
- Dependency on external services
Speed & Cost: Document entire codebases in minutes without API rate limits or per-token charges.
Training
The tuned model was trained using knowledge distillation, leveraging the teacher model GPT-OSS-120B. We used 28 diverse Python functions and classes as seed data and supplemented them with 10,000 synthetic examples covering various domains:
- Data science and machine learning
- Web development (Flask, FastAPI, Django)
- DevOps and system utilities
- Algorithm implementations
- API clients and wrappers
Training data includes examples with:
- Various function complexities (simple to async patterns)
- Error handling patterns
- Async/await patterns
- Different parameter types and return values
Evaluation
We evaluated the model on 250 held-out test examples using LLM-as-a-judge methodology to assess the overall quality of generated docstrings.
| Model | Size | Accuracy |
|---|---|---|
| GPT-OSS (thinking) | 120B | 0.81 ± 0.02 |
| Qwen3 0.6B (tuned) | 0.6B | 0.76 ± 0.01 |
| Qwen3 0.6B (base) | 0.6B | 0.55 ± 0.04 |
The fine-tuned model achieves 94% of the teacher model's performance while running entirely on local hardware with zero API costs and complete privacy.
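The 94% figure is the ratio of the tuned model's score to the teacher's score in the table above. A quick check of that arithmetic, using only the reported means and ignoring the uncertainty intervals (the dictionary keys are labels chosen for this sketch):

```python
# Mean LLM-as-a-judge accuracies copied from the evaluation table.
scores = {
    "GPT-OSS 120B (teacher)": 0.81,
    "Qwen3 0.6B (tuned)": 0.76,
    "Qwen3 0.6B (base)": 0.55,
}

teacher = scores["GPT-OSS 120B (teacher)"]
tuned = scores["Qwen3 0.6B (tuned)"]
base = scores["Qwen3 0.6B (base)"]

# Fraction of the teacher's accuracy retained by each student model.
tuned_ratio = tuned / teacher
base_ratio = base / teacher

print(f"tuned / teacher = {tuned_ratio:.1%}")
print(f"base / teacher  = {base_ratio:.1%}")
print(f"absolute gain from fine-tuning = {tuned - base:.2f}")
```

The tuned ratio comes out at about 93.8%, which rounds to the 94% quoted above, while the untuned base model reaches about 67.9% of the teacher's score.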
How to Use
Installation
Follow the instructions in the GitHub repository: https://github.com/distil-labs/Distil-localdoc
Quick start:
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Download and build the model
pip install huggingface_hub
hf download distil-labs/Distil-Localdoc-Qwen3-0.6B --local-dir distil-model
cd distil-model
ollama create localdoc_qwen3 -f Modelfile
# Run on your code
python localdoc_cli.py --file your_script.py
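Once the model has been created in Ollama, it can also be queried directly over Ollama's local HTTP API. The following is a minimal sketch, not the code used by localdoc_cli.py: it assumes Ollama is listening on its default port 11434 and that the model was created as localdoc_qwen3, and the prompt wording and the helper names build_request and generate_docstring are hypothetical.

```python
import json
import urllib.request

# Assumptions: Ollama's default local endpoint, and the model name
# given to `ollama create` in the quick start.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "localdoc_qwen3"


def build_request(code: str, model: str = MODEL_NAME) -> dict:
    """Build the JSON payload for Ollama's /api/generate endpoint."""
    # Hypothetical prompt wording; the real CLI may phrase this differently.
    prompt = (
        "Write a Google-style docstring for the following Python function:\n\n"
        + code
    )
    # stream=False asks Ollama for a single JSON object instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}


def generate_docstring(code: str) -> str:
    """Send the payload to the local Ollama server and return the generated text."""
    data = json.dumps(build_request(code)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Inspect the payload without contacting the server.
payload = build_request("def add(a, b):\n    return a + b\n")
print(json.dumps(payload, indent=2))
```

With the Ollama server running, calling generate_docstring on a function's source text returns the model's raw output. Because the request goes to localhost, the code being documented stays on the machine.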
CLI Usage
# Basic usage (generates Google-style docstrings)
python localdoc_cli.py --file my_module.py
# Use specific model
python localdoc_cli.py --file my_module.py --model localdoc_qwen3
The tool will:
- Parse your Python file using AST
- Identify all functions and methods without docstrings (skips dunder methods)
- Generate appropriate docstrings based on code structure
- Preserve all original code and existing docstrings
- Output a new file with the _documented suffix
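The parsing and detection steps can be illustrated with Python's standard ast module. This is a sketch of the general technique under stated assumptions, not the implementation in localdoc_cli.py; the function name find_undocumented and the sample source are made up for illustration.

```python
import ast


def find_undocumented(source: str) -> list[str]:
    """Return names of functions and methods in `source` that lack a docstring."""
    tree = ast.parse(source)
    missing = []
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Skip dunder methods such as __init__ or __repr__.
        if node.name.startswith("__") and node.name.endswith("__"):
            continue
        if ast.get_docstring(node) is None:
            missing.append((node.lineno, node.name))
    # ast.walk is breadth-first, so sort by line number to get source order.
    return [name for _, name in sorted(missing)]


sample = '''
class Cart:
    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

    def total(self):
        """Return the sum of item prices."""
        return sum(i["price"] for i in self.items)

async def fetch(url):
    return url
'''

print(find_undocumented(sample))  # ['add', 'fetch']
```

Here __init__ is skipped as a dunder method and total is skipped because it already has a docstring, so only add and fetch would be sent to the model.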
Model Sources
- Homepage: https://distillabs.ai
- Repository: https://github.com/distil-labs/Distil-localdoc
- Contact: contact@distillabs.ai
Citation
@software{distil_localdoc_2024,
title = {Distil-Localdoc: Local Python Documentation Generation with SLMs},
author = {Distil Labs},
year = {2024},
url = {https://huggingface.co/distil-labs/Distil-Localdoc-Qwen3-0.6B}
}
Community
- Follow us on LinkedIn
- Join our Slack community
- Star us on GitHub