LLM ideas
updated
Instruction Following without Instruction Tuning (arXiv:2409.14254)
Baichuan Alignment Technical Report (arXiv:2410.14940)
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution (arXiv:2410.16256)
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data (arXiv:2410.18558)
Self-Consistency Preference Optimization (arXiv:2411.04109)
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arXiv:2501.12948)
Demystifying Long Chain-of-Thought Reasoning in LLMs (arXiv:2502.03373)
Qwen2.5-VL Technical Report (arXiv:2502.13923)
Chain of Draft: Thinking Faster by Writing Less (arXiv:2502.18600)
URECA: Unique Region Caption Anything (arXiv:2504.05305)
An Empirical Study of Qwen3 Quantization (arXiv:2505.02214)
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset (arXiv:2505.09568)
WorldPM: Scaling Human Preference Modeling (arXiv:2505.10527)
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning (arXiv:2507.00432)
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful (arXiv:2507.07101)
Scaling Laws for Optimal Data Mixtures (arXiv:2507.09404)
Deep Think with Confidence (arXiv:2508.15260)
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning (arXiv:2508.21113)