FineWeb: decanting the web for the finest text data at scale 🍷 • Generate high-quality text data for LLMs using FineWeb • 1.25k
The Smol Training Playbook 📚 • The secrets to building world-class LLMs • 2.8k
The Ultra-Scale Playbook 🌌 • The ultimate guide to training LLM on large GPU Clusters • 3.62k
VideoRoPE: What Makes for Good Video Rotary Position Embedding? Paper • 2502.05173 • Published Feb 7, 2025 • 65
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28, 2025 • 123
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published Jan 21, 2025 • 84
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22, 2025 • 434
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise Paper • 2501.08331 • Published Jan 14, 2025 • 20
Do generative video models learn physical principles from watching videos? Paper • 2501.09038 • Published Jan 14, 2025 • 34