Self-Improving Pretraining: using post-trained models to pretrain better models Paper • 2601.21343 • Published 3 days ago • 10
CGPT: Cluster-Guided Partial Tables with LLM-Generated Supervision for Table Retrieval Paper • 2601.15849 • Published 10 days ago • 14
AVMeme Exam: A Multimodal Multilingual Multicultural Benchmark for LLMs' Contextual and Cultural Knowledge and Thinking Paper • 2601.17645 • Published 7 days ago • 22
Linear representations in language models can change dramatically over a conversation Paper • 2601.20834 • Published 4 days ago • 20
Article Introducing Waypoint-1: Real-time interactive video diffusion from Overworld • 13 days ago • 33
The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models Paper • 2601.10387 • Published 17 days ago • 11
LucaOne Collection Generalized biological foundation model with unified nucleic acid and protein language (Nature Machine Intelligence), https://github.com/LucaOne/LucaOne • 6 items • Updated Dec 31, 2025 • 2
Article M2.1: Multilingual and Multi-Task Coding with Strong Generalization • 27 days ago • 37
TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior Paper • 2512.20757 • Published Dec 23, 2025 • 18
Are We on the Right Way to Assessing LLM-as-a-Judge? Paper • 2512.16041 • Published Dec 17, 2025 • 34
Hierarchical Dataset Selection for High-Quality Data Sharing Paper • 2512.10952 • Published Dec 11, 2025 • 2
Causal Judge Evaluation: Calibrated Surrogate Metrics for LLM Systems Paper • 2512.11150 • Published Dec 11, 2025 • 6
Skywork-Reward-V2 Collection Scaling preference data curation to the extreme • 9 items • Updated Jul 4, 2025 • 26
Reward Models 10-2025 Collection A collection of great reward models for research and production • 7 items • Updated 3 days ago • 12
Olmo 3 Pre-training Collection All artifacts related to Olmo 3 pre-training • 10 items • Updated Dec 23, 2025 • 33
Mitigating Label Length Bias in Large Language Models Paper • 2511.14385 • Published Nov 18, 2025 • 8
Article ViDoRe V3: a comprehensive evaluation of retrieval for enterprise use-cases • Nov 5, 2025 • 59