---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-7B-Instruct
---

# AutoL2S-Plus-7B

This is the official model repository for AutoL2S-Plus-7B, a model fine-tuned for efficient reasoning based on [amandaa/AutoL2S-7b](https://huggingface.co/amandaa/AutoL2S-7b/tree/main).

## 💡 Overview

**AutoL2S** is a framework designed to improve reasoning efficiency through a two-stage training pipeline: Supervised Fine-Tuning (SFT) followed by off-policy Reinforcement Learning (RL).

- **Stage 1: Long–Short Concatenated Distillation**
  In this stage, long and short chains of thought (CoT) are paired and trained jointly, using a special `<EASY>` token to enable automatic switching between CoT modes. The resulting SFT model is released as [amandaa/AutoL2S-7b](https://huggingface.co/amandaa/AutoL2S-7b/tree/main).

- **Stage 2: Off-Policy RL with Length-Aware Objective**
  In the second stage, we further refine reasoning efficiency through an RL objective that balances accuracy and length, yielding AutoL2S-Plus. The model is rewarded for generating shorter reasoning paths while maintaining correctness. Because the length objective is non-differentiable, we apply a PPO-style clipped loss and compute per-sample advantages by leveraging long- and short-form outputs from the SFT-based AutoL2S model, which serves as the reference policy.

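The length-aware objective described above can be sketched in a few lines. This is an illustrative reimplementation, not the released training code: the linear length penalty, its coefficient, and the clipping epsilon are all assumptions made for the sketch.

```python
def length_aware_reward(correct: bool, num_tokens: int, max_tokens: int,
                        length_penalty: float = 0.5) -> float:
    """Reward correct answers, discounted by normalized reasoning length.

    The exact reward shape used by AutoL2S-Plus is not specified here;
    this linear penalty is an illustrative assumption.
    """
    if not correct:
        return 0.0
    return 1.0 - length_penalty * (num_tokens / max_tokens)

def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO-style clipped objective for one sample (to be maximized)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

# Per-sample advantage: compare a rollout's reward against the mean reward
# of the long- and short-form outputs from the SFT reference policy.
ref_rewards = [length_aware_reward(True, 900, 1000),   # long CoT from reference
               length_aware_reward(True, 200, 1000)]   # short CoT from reference
baseline = sum(ref_rewards) / len(ref_rewards)

sample_reward = length_aware_reward(True, 300, 1000)
advantage = sample_reward - baseline

# Negate the surrogate so it can be minimized as a loss.
loss = -clipped_surrogate(ratio=1.1, advantage=advantage, eps=0.2)
```

Under this shaping, a correct short answer earns a higher reward than a correct long one, so advantages push the policy toward the shorter reference trajectory.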
This repository contains:

- Model weights
- Configuration files
- Necessary scripts in the `examples/` directory

---
## 🧩 Dependencies
We recommend using the model with [vLLM](https://github.com/vllm-project/vllm).
The code has been tested with:

```
vLLM == 0.6.2
```

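A matching environment can be set up with pip; the pin below simply mirrors the version the authors report testing against (other versions may work but are untested here):

```shell
pip install vllm==0.6.2
```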
---
## 🚀 How to Use

Run the inference example:

```bash
cd examples
python inference.py
```

Alternatively, download `examples/prefixLLM.py` from this repository and place it in your working directory:

```python
from vllm import SamplingParams
from prefixLLM import PrefixLLM

SYSTEM_PROMPT = "You are a helpful and harmless assistant. You should think step-by-step and put your final answer within \\boxed{}."

llm = PrefixLLM(model="amandaa/AutoL2S-Plus-7b")
max_tokens, temp = 32768, 0.7
sampling_params = SamplingParams(max_tokens=max_tokens, temperature=temp)

question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\\theta),$ where $r > 0$ and $0 \\le \\theta < 2 \\pi.$"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": question}
]
responses = llm.chat(messages=messages, sampling_params=sampling_params, use_tqdm=True)

print(responses[0].outputs[0].text)
```

---

## 🔍 Citation

If you use this model in your work, please consider citing:

```bibtex
@article{luo2025autol2s,
  title={AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models},
  author={Luo, Feng and Chuang, Yu-Neng and Wang, Guanchu and Le, Hoang Anh Duy and Zhong, Shaochen and Liu, Hongyi and Yuan, Jiayi and Sui, Yang and Braverman, Vladimir and Chaudhary, Vipin and others},
  journal={arXiv preprint arXiv:2505.22662},
  year={2025}
}
```