Instructions to use srallabandi0225/inframind-0.5b-grpo with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use srallabandi0225/inframind-0.5b-grpo with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="srallabandi0225/inframind-0.5b-grpo") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("srallabandi0225/inframind-0.5b-grpo") model = AutoModelForCausalLM.from_pretrained("srallabandi0225/inframind-0.5b-grpo") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use srallabandi0225/inframind-0.5b-grpo with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "srallabandi0225/inframind-0.5b-grpo" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "srallabandi0225/inframind-0.5b-grpo", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/srallabandi0225/inframind-0.5b-grpo
- SGLang
How to use srallabandi0225/inframind-0.5b-grpo with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "srallabandi0225/inframind-0.5b-grpo" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "srallabandi0225/inframind-0.5b-grpo", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "srallabandi0225/inframind-0.5b-grpo" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "srallabandi0225/inframind-0.5b-grpo", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use srallabandi0225/inframind-0.5b-grpo with Docker Model Runner:
docker model run hf.co/srallabandi0225/inframind-0.5b-grpo
- InfraMind: Infrastructure-as-Code Small Language Model
InfraMind: Infrastructure-as-Code Small Language Model
InfraMind is a 0.5B parameter language model fine-tuned for Infrastructure-as-Code (IaC) generation using reinforcement learning (GRPO/DAPO).
Model Description
| Attribute | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-0.5B-Instruct |
| Parameters | 500M |
| Training Method | GRPO + DAPO (Reinforcement Learning) |
| Domain | Infrastructure-as-Code |
| License | MIT |
Why InfraMind?
Unlike traditional fine-tuning (SFT/LoRA) that memorizes patterns, InfraMind uses reinforcement learning with domain-specific rewards to teach the model to reason about infrastructure.
| Approach | Method | Result |
|---|---|---|
| SFT/LoRA | "Memorize this Terraform example" | Copies patterns, fails on novel tasks |
| InfraMind | "Generate Terraform, I'll score if it's valid" | Learns reasoning, handles new tasks |
Evaluation Results
| Model | Training Method | Accuracy | Pass Threshold |
|---|---|---|---|
| inframind-grpo | GRPO | 97.3% | 0.6 |
| inframind-dapo | DAPO | 96.4% | 0.6 |
| Base (Qwen2.5-0.5B) | None | ~30% | 0.6 |
Evaluated on InfraMind-Bench (110 held-out test samples) across:
- Terraform (AWS, GCP, Azure)
- Kubernetes (Deployments, Services, Ingress)
- Docker (Dockerfile, docker-compose)
- CI/CD (GitHub Actions, GitLab CI)
Comparison with Other Models
| Model | Params | Training | Benchmarks | Edge Deploy |
|---|---|---|---|---|
| qwen3-devops | 1.7B | SFT | None | No |
| devops-slm-v1 | 7B | LoRA | None | No |
| InfraMind | 0.5B | GRPO/DAPO | 97.3% | Yes |
Quick Start
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load model
model = AutoModelForCausalLM.from_pretrained("srallabandi0225/inframind-0.5b-grpo")
tokenizer = AutoTokenizer.from_pretrained("srallabandi0225/inframind-0.5b-grpo")
# Generate Terraform
prompt = """### Instruction:
Create Terraform for AWS EC2 instance
### Input:
t3.micro instance type
### Response:
"""
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
**inputs,
max_new_tokens=512,
temperature=0.7,
do_sample=True,
pad_token_id=tokenizer.pad_token_id
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Example Output
resource "aws_instance" "web" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t3.micro"
tags = {
Name = "web-server"
}
}
Supported IaC Categories
| Category | Examples | Coverage |
|---|---|---|
| Terraform | EC2, S3, VPC, RDS, EKS, Lambda, IAM | AWS, GCP, Azure |
| Kubernetes | Deployment, Service, Ingress, ConfigMap, RBAC | All K8s resources |
| Docker | Dockerfile, docker-compose | Multi-stage builds |
| CI/CD | GitHub Actions, GitLab CI, Jenkins | Workflows, pipelines |
| Ansible | Playbooks, roles | Server configuration |
| Helm | Charts, values.yaml | K8s package management |
Training Details
GRPO (Group Relative Policy Optimization)
First stage training using GRPO:
Training:
epochs: 3
batch_size: 16 (effective)
learning_rate: 5e-6
beta (KL): 0.04
generations_per_prompt: 4
LoRA:
r: 16
alpha: 32
target_modules: [q_proj, k_proj, v_proj, o_proj]
DAPO (Direct Advantage Policy Optimization)
Second stage training with DAPO innovations:
Training:
epochs: 2
batch_size: 16 (effective)
learning_rate: 5e-6
beta (KL): 0.0 # Pure DAPO
generations_per_prompt: 8
DAPO Innovations:
1. Clip-Higher: Asymmetric clipping (ε_low=0.2, ε_high=0.28)
2. Dynamic Sampling: Skip uniform reward batches
3. Token-Level Loss: Per-token policy gradient
4. Overlong Punishment: Soft length penalty
Reward Function
Domain-specific reward for IaC quality:
Reward = α × Syntax + β × Correctness + γ × Format
Where:
- Syntax (α=0.4): Valid resource declarations
- Correctness (β=0.3): Correct resource types
- Format (γ=0.3): Proper structure
Hardware Requirements
| Deployment | Memory | GPU |
|---|---|---|
| Training | 16GB+ | A100/A10G |
| Inference | 2GB | Optional |
| Edge (Raspberry Pi 5) | 4GB | None |
The 0.5B model is small enough to run on edge devices, making it suitable for:
- Air-gapped environments
- Local development
- CI/CD pipelines
- IoT/Edge infrastructure
Limitations
- IaC-specific: Optimized for infrastructure tasks, not general conversation
- English only: Training data is in English
- No execution: Generates code, does not execute or validate against real infrastructure
- Version-sensitive: Generated code may use older API versions
- Security: Always review generated code for security best practices
Out-of-Scope Uses
- Legal or medical advice
- General-purpose chatbot
- Executing infrastructure changes without human review
- Production deployment without validation
Intended Use
Primary Use Cases
- Generating Terraform configurations
- Creating Kubernetes manifests
- Writing Dockerfiles and docker-compose
- Building CI/CD pipelines
- Infrastructure automation scripting
Users
- DevOps engineers
- Platform engineers
- SREs
- Cloud architects
- Infrastructure developers
Training Data
InfraMind-Bench: 2000+ IaC tasks in Alpaca format
| Category | Tasks |
|---|---|
| Terraform | 500+ |
| Kubernetes | 400+ |
| Docker | 300+ |
| CI/CD | 300+ |
| Ansible | 200+ |
| Helm | 150+ |
| Monitoring | 150+ |
Data format:
{
"instruction": "Create Terraform for AWS EC2 instance",
"input": "t3.micro instance type",
"output": ""
}
Ethical Considerations
- Model may generate insecure configurations if not prompted for security
- Generated infrastructure code should always be reviewed before deployment
- Model does not have access to real infrastructure or credentials
- Users are responsible for validating generated code against their security policies
Citation
@misc{rallabandi2024inframind,
title={InfraMind: Fine-tuning Small Language Models for Infrastructure-as-Code Generation with Reinforcement Learning},
author={Rallabandi, Sai Kiran},
year={2024},
publisher={HuggingFace},
url={https://huggingface.co/srallabandi0225/inframind-0.5b-grpo}
}
Links
- GitHub: github.com/saikiranrallabandi/inframind
- GRPO Model: srallabandi0225/inframind-0.5b-grpo
- DAPO Model: srallabandi0225/inframind-0.5b-dapo
Acknowledgments
- Qwen Team for the base model
- DeepSeek for GRPO
- NVIDIA NeMo for DAPO reference
- TRL for training infrastructure
Model Card Contact
Author: Sai Kiran Rallabandi GitHub: @saikiranrallabandi
- Downloads last month
- 12
Model tree for srallabandi0225/inframind-0.5b-grpo
Base model
Qwen/Qwen2.5-0.5BEvaluation results
- GRPO Accuracy on InfraMind-Benchself-reported97.300
- DAPO Accuracy on InfraMind-Benchself-reported96.400