Harmonic-27B-MLX-8bit
MLX 8-bit quantized conversion of DJLougen/Harmonic-27B — the flagship of the Harmonic series. A reasoning-focused fine-tune of Qwen 3.5 27B trained on structurally validated data. Every row passes automated quality gates. No junk, no filler, no shallow traces.
The name comes from harmonic analysis of reasoning patterns — the structural signal that separates genuine thinking from surface-level chain-of-thought.
Support This Work
I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded — balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.
Quantization Details
- Format: MLX SafeTensors
- Bits per weight: 8.50
- Note: 8-bit quantization with near-lossless quality. The weights alone require ~27GB of memory; the KV cache and activations need additional headroom.
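As a rough sanity check on the figures above, the weight footprint can be estimated from the parameter count and the bits per weight. This is only a sketch: the parameter count (27.36B) is the one listed under Architecture, and the explanation of the extra 0.5 bits (a 16-bit scale and a 16-bit bias shared by each group of 64 weights) is an assumption about the quantization layout, not something this card states. It also treats every parameter as quantized, which may not hold for all layers.

```python
# Figures taken from this card.
PARAMS = 27.36e9          # parameter count listed under Architecture
BITS_PER_WEIGHT = 8.50    # listed under Quantization Details

# Assumption: 8-bit values plus a 16-bit scale and a 16-bit bias per group of 64 weights.
GROUP_SIZE = 64
overhead_bits = (16 + 16) / GROUP_SIZE
assert 8 + overhead_bits == BITS_PER_WEIGHT

weight_bytes = PARAMS * BITS_PER_WEIGHT / 8
weight_gb = weight_bytes / 1e9       # decimal gigabytes
weight_gib = weight_bytes / 2**30    # binary gibibytes

print(f"{weight_gb:.1f} GB, {weight_gib:.1f} GiB")  # prints "29.1 GB, 27.1 GiB"
```

Under these assumptions the "~27GB" figure lines up with the binary (GiB) reading of the weight size.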
Training Approach
The pipeline is the same as for Harmonic-9B: 799 rows, a small and precisely curated dataset rather than tens of thousands of unfiltered examples. The base model already has the knowledge from pretraining; the fine-tune teaches it a reasoning behavior pattern.
Every training row contains explicit self-correction ("wait, that's not right"), verification ("let me check by plugging back in"), and multi-path exploration ("alternatively, I could try..."). The data was generated from multiple frontier models and filtered through a custom structural quality pipeline that enforces reasoning depth, coherence, and flow patterns. 100% of rows pass all quality gates simultaneously.
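The structural quality pipeline itself is not described in detail here. Purely to illustrate what a phrase-level check for the three behaviors could look like, here is a hypothetical sketch. The marker lists, the function names `count_markers` and `passes_gate`, and the pass rule are all assumptions made for this example, not the author's method.

```python
import re
from typing import Dict

# Hypothetical marker phrases, loosely based on the quoted examples above.
MARKERS = {
    "self_correction": [r"\bwait\b", r"that's not right", r"\bactually\b"],
    "verification": [r"let me check", r"let me verify", r"plugging back in"],
    "exploration": [r"\balternatively\b", r"another approach", r"i could try"],
}


def count_markers(trace: str) -> Dict[str, int]:
    """Count occurrences of each behavior's marker phrases in a reasoning trace."""
    lowered = trace.lower()
    return {
        name: sum(len(re.findall(pattern, lowered)) for pattern in patterns)
        for name, patterns in MARKERS.items()
    }


def passes_gate(trace: str) -> bool:
    """Toy gate: a trace passes only if all three behaviors appear at least once."""
    return all(count > 0 for count in count_markers(trace).values())


sample = (
    "Wait, that's not right. Let me check by plugging back in. "
    "Alternatively, I could try factoring."
)
print(count_markers(sample))
print(passes_gate(sample))
```

A real pipeline would need far more than keyword matching, since depth, coherence, and flow cannot be judged from marker counts alone.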
Training Data Quality
The same reasoning data as Harmonic-9B and Harmonic-2B, curated using a custom structural process supervision pipeline:
| Metric | Value |
|---|---|
| Signal quality score | 78.7 mean (61.5 min, 90.0 max) |
| Thinking trace depth | 1,667 words average |
| Self-correction | 100% of rows (17.2 per row avg) |
| Verification | 100% of rows (10.3 per row avg) |
| Exploration | 100% of rows (6.3 per row avg) |
| Quality gate pass rate | 100% |
How It Compares
We ran the same structural quality analysis on the public reasoning datasets listed below, which are commonly used for Opus/Qwen distillation. The results:
| Dataset | Rows | Think Words | Self-Correction | Verification | Exploration | Signal Score | Gate Pass |
|---|---|---|---|---|---|---|---|
| Harmonic (ours) | 799 | 1,667 | 100% | 100% | 100% | 78.7 | 100% |
| Crownelius/Opus-3300x | 2,160 | 188 | 5.9% | 22.6% | 5.2% | 28.0 | 0.1% |
| nohurry/Opus-Filtered | 2,326 | 191 | 6.7% | 24.1% | 5.3% | 28.5 | 0.1% |
| TeichAI/Opus-250x | 250 | 323 | 17.2% | 26.8% | 6.8% | 24.6 | 0.4% |
| Jackrong/Qwen-700x | 633 | 6,653 | 97.5% | 97.6% | 69.8% | 75.6 | 22.7% |
| Bespoke-Stratos-17k | 16,710 | 1,322 | 88.2% | 72.7% | 59.7% | 71.7 | 49.0% |
| glaiveai/reasoning-20m | 22M+ | 799 | 64.1% | 41.4% | 37.3% | 46.2 | 12.8% |
| KingNish/reasoning-20k | 19,944 | 132 | 0.7% | 4.2% | 4.3% | 27.4 | 0.0% |
Speculative Decoding
Harmonic-27B pairs with Harmonic-2B for speculative decoding. Both models share the same training data, reasoning format, and architecture family (Qwen 3.5), which keeps draft token acceptance rates high.
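from transformers import AutoModelForCausalLM, AutoTokenizer

# Uses the original BF16 checkpoints via transformers, not this MLX conversion.
# Assumption: both models share a tokenizer, so the target's is loaded once.
tokenizer = AutoTokenizer.from_pretrained("DJLougen/Harmonic-27B")
target = AutoModelForCausalLM.from_pretrained("DJLougen/Harmonic-27B")
draft = AutoModelForCausalLM.from_pretrained("DJLougen/Harmonic-2B")

inputs = tokenizer("What is the sum of the first 100 prime numbers?", return_tensors="pt")
# Passing assistant_model turns on assisted (speculative) generation.
outputs = target.generate(
    **inputs,
    assistant_model=draft,
    max_new_tokens=512,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))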
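To make the point about acceptance rates concrete, here is a toy sketch of one greedy speculative decoding step. It is not how any particular library implements the technique: the "models" are plain Python functions mapping a token list to the next token, and `speculative_step`, `toy_target`, and `toy_draft` are hypothetical names invented for this example. Sampling-based variants use a probabilistic accept/reject rule instead of the exact-match check shown here.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[int], k: int) -> List[int]:
    """Return the tokens produced by one greedy speculative step."""
    # The draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(context + proposal))

    # The target checks each proposed token. A real implementation scores
    # all k positions in a single batched forward pass.
    accepted: List[int] = []
    for token in proposal:
        expected = target(context + accepted)
        if token != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(token)

    # Every proposal matched, so the target contributes one extra token.
    accepted.append(target(context + accepted))
    return accepted


def toy_target(tokens: List[int]) -> int:
    return (tokens[-1] + 1) % 10


def toy_draft(tokens: List[int]) -> int:
    # Agrees with the target except right after token 4.
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10


print(speculative_step(toy_target, toy_draft, [1], k=4))   # prints [2, 3, 4, 5]
print(speculative_step(toy_target, toy_target, [1], k=4))  # prints [2, 3, 4, 5, 6]
```

The more often the draft agrees with the target, the more tokens each target pass yields, which is why a draft trained on the same data and format is attractive.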
Usage
pip install mlx-lm
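from mlx_lm import load, generate

model, tokenizer = load("DJLougen/Harmonic-27B-MLX-8bit")
# Apply the chat template so the model sees the turn markers it was trained with.
messages = [{"role": "user", "content": "What is the sum of the first 100 prime numbers?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
print(response)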
Reasoning format
The model wraps its reasoning in a thinking block and then gives the final answer:
<|thinking|>
The user is asking about X. Let me consider two approaches...
Approach 1: ...
Approach 2: ...
I will go with Approach 1 because...
Wait, I need to be careful here - this assumes Y, which may not hold.
Let me verify by checking a special case...
Yes, that confirms the result.
<|/thinking|>
[Final answer here]
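If only the final answer is needed downstream, the thinking block can be separated from it after generation. The sketch below assumes the delimiters appear in the output exactly as written in the example above (`<|thinking|>` and `<|/thinking|>`); `split_reasoning` is a hypothetical helper name, and real outputs may need more defensive handling, for instance when generation stops before the closing tag.

```python
import re
from typing import Tuple

# Delimiters copied from the format example; assumed to appear verbatim in output.
THINK_RE = re.compile(r"<\|thinking\|>(.*?)<\|/thinking\|>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no complete block is found."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


example = "<|thinking|>\nLet me verify by checking a special case...\n<|/thinking|>\nThe answer is 42."
reasoning, answer = split_reasoning(example)
print(answer)  # prints "The answer is 42."
```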
Training Configuration
base_model: unsloth/Qwen3.5-27B
dataset: 799 curated reasoning rows
epochs: 1
learning_rate: 1e-4
lr_scheduler: cosine
warmup_ratio: 0.1
max_seq_length: 8192
lora_rank: 32
lora_alpha: 32
dropout: 0.05
micro_batch_size: 1
gradient_accumulation_steps: 4
weight_decay: 0.01
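These settings imply a short run. The sketch below works out the step count and the learning-rate curve under stated assumptions: a single device, no sequence packing, and a linear warmup followed by cosine decay to zero, which is a common shape for this kind of schedule. The trainer actually used is not named here, so `NUM_DEVICES` and `lr_at` are illustrative only.

```python
import math

# Values from the configuration above.
ROWS, EPOCHS = 799, 1
MICRO_BATCH, GRAD_ACCUM = 1, 4
PEAK_LR, WARMUP_RATIO = 1e-4, 0.1
LORA_RANK, LORA_ALPHA = 32, 32

NUM_DEVICES = 1  # assumption: the device count is not stated

effective_batch = MICRO_BATCH * GRAD_ACCUM * NUM_DEVICES
total_steps = math.ceil(ROWS / effective_batch) * EPOCHS
warmup_steps = round(WARMUP_RATIO * total_steps)
lora_scale = LORA_ALPHA / LORA_RANK  # standard LoRA scaling factor alpha / r


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed schedule shape)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, total_steps, warmup_steps, lora_scale)  # prints "4 200 20 1.0"
```

With roughly 200 optimizer steps in total, about 20 are spent warming up before the learning rate peaks at 1e-4.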
Intended Use
- Reasoning tasks requiring genuine multi-step thinking
- Mathematical problem-solving with self-correction
- Code analysis and generation with structured verification
- General conversation (conversational ability preserved through training design)
- Target model for speculative decoding with Harmonic-2B
- Base model for Stage 2 agentic fine-tuning
Limitations
- Reasoning traces can be verbose for simple questions
- Not optimized for tool calling — see Harmonic-Hermes-9B for agentic use
- Benchmark evaluation is ongoing
Architecture
- Base: Qwen 3.5 27B (27.36B parameters)
- Training: LoRA fine-tuning, merged into base weights
- Original Precision: BF16
- Context: 8192 tokens
License
Apache 2.0, the same as the base model. All training data comes from Apache 2.0 or MIT licensed sources. Commercial use is permitted.
Links
- Original model: DJLougen/Harmonic-27B
- GGUF quants: DJLougen/Harmonic-27B-GGUF
- MLX 16-bit: DJLougen/Harmonic-27B-MLX-16bit
- MLX 8-bit: DJLougen/Harmonic-27B-MLX-8bit
- MLX 4-bit: DJLougen/Harmonic-27B-MLX-4bit
- 9B variant: DJLougen/Harmonic-9B
- 9B GGUF: DJLougen/Harmonic-9B-GGUF
- 2B draft model: DJLougen/Harmonic-2B
- Agentic variant: DJLougen/Harmonic-Hermes-9B
