---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-7B-Instruct
---

# AutoL2S-Plus-7B

This is the official model repository for AutoL2S-Plus-7B, a model fine-tuned for efficient reasoning based on [amandaa/AutoL2S-7b](https://huggingface.co/amandaa/AutoL2S-7b/tree/main).

## 💡 Overview

**AutoL2S** is a framework designed to improve reasoning efficiency through a two-stage training pipeline: Supervised Fine-Tuning (SFT) followed by off-policy Reinforcement Learning (RL).

- **Stage 1: Long–Short Concatenated Distillation**
  In this stage, long and short chains of thought (CoT) are paired and trained jointly, using a special `<EASY>` token to enable automatic switching between CoT modes. The resulting SFT model is released as [amandaa/AutoL2S-7b](https://huggingface.co/amandaa/AutoL2S-7b/tree/main).

- **Stage 2: Off-Policy RL with Length-Aware Objective**
  In the second stage, we further refine reasoning efficiency through an RL objective that balances accuracy and length, yielding AutoL2S-Plus. The model is rewarded for generating shorter reasoning paths while maintaining correctness. Because the length objective is non-differentiable, we apply a PPO-style clipped loss and compute per-sample advantages by leveraging long- and short-form outputs from the SFT-based AutoL2S model, which serves as the reference policy.

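The length-aware objective described above can be sketched in a few lines. This is an illustrative reimplementation, not the released training code: the linear length penalty, its coefficient, and the clipping epsilon are all assumptions made for the sketch.

```python
def length_aware_reward(correct: bool, num_tokens: int, max_tokens: int,
                        length_penalty: float = 0.5) -> float:
    """Reward correct answers, discounted by normalized reasoning length.

    The exact reward shape used by AutoL2S-Plus is not specified here;
    this linear penalty is an illustrative assumption.
    """
    if not correct:
        return 0.0
    return 1.0 - length_penalty * (num_tokens / max_tokens)

def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO-style clipped objective for one sample (to be maximized)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

# Per-sample advantage: compare a rollout's reward against the mean reward
# of the long- and short-form outputs from the SFT reference policy.
ref_rewards = [length_aware_reward(True, 900, 1000),   # long CoT from reference
               length_aware_reward(True, 200, 1000)]   # short CoT from reference
baseline = sum(ref_rewards) / len(ref_rewards)

sample_reward = length_aware_reward(True, 300, 1000)
advantage = sample_reward - baseline

# Negate the surrogate so it can be minimized as a loss.
loss = -clipped_surrogate(ratio=1.1, advantage=advantage, eps=0.2)
```

Under this shaping, a correct short answer earns a higher reward than a correct long one, so advantages push the policy toward the shorter reference trajectory.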
This repository contains:

- Model weights
- Configuration files
- Necessary scripts in the `examples/` directory

---
## 🧩 Dependencies
We recommend using the model with [vLLM](https://github.com/vllm-project/vllm).
The code has been tested with:

```
vLLM == 0.6.2
```

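A matching environment can be set up with pip; the pin below simply mirrors the version the authors report testing against (other versions may work but are untested here):

```shell
pip install vllm==0.6.2
```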
---
## 🚀 How to Use

Run the inference example:

```bash
cd examples
python inference.py
```

Alternatively, download `examples/prefixLLM.py` from this repository and place it in your working directory:

```python
from vllm import SamplingParams
from prefixLLM import PrefixLLM

SYSTEM_PROMPT = "You are a helpful and harmless assistant. You should think step-by-step and put your final answer within \\boxed{}."

llm = PrefixLLM(model="amandaa/AutoL2S-Plus-7b")
max_tokens, temp = 32768, 0.7
sampling_params = SamplingParams(max_tokens=max_tokens, temperature=temp)

question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\\theta),$ where $r > 0$ and $0 \\le \\theta < 2 \\pi.$"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": question}
]
responses = llm.chat(messages=messages, sampling_params=sampling_params, use_tqdm=True)

print(responses[0].outputs[0].text)
```

---

## 🔍 Citation

If you use this model in your work, please consider citing:

```bibtex
@article{luo2025autol2s,
  title={AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models},
  author={Luo, Feng and Chuang, Yu-Neng and Wang, Guanchu and Le, Hoang Anh Duy and Zhong, Shaochen and Liu, Hongyi and Yuan, Jiayi and Sui, Yang and Braverman, Vladimir and Chaudhary, Vipin and others},
  journal={arXiv preprint arXiv:2505.22662},
  year={2025}
}
```