LLM ideas
updated
Instruction Following without Instruction Tuning (arXiv:2409.14254)
Baichuan Alignment Technical Report (arXiv:2410.14940)
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution (arXiv:2410.16256)
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data (arXiv:2410.18558)
Self-Consistency Preference Optimization (arXiv:2411.04109)
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arXiv:2501.12948)
Demystifying Long Chain-of-Thought Reasoning in LLMs (arXiv:2502.03373)
Qwen2.5-VL Technical Report (arXiv:2502.13923)
Chain of Draft: Thinking Faster by Writing Less (arXiv:2502.18600)
URECA: Unique Region Caption Anything (arXiv:2504.05305)
An Empirical Study of Qwen3 Quantization (arXiv:2505.02214)
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset (arXiv:2505.09568)
WorldPM: Scaling Human Preference Modeling (arXiv:2505.10527)
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning (arXiv:2507.00432)
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful (arXiv:2507.07101)
Scaling Laws for Optimal Data Mixtures (arXiv:2507.09404)
Deep Think with Confidence (arXiv:2508.15260)
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning (arXiv:2508.21113)