Article KV Cache from scratch in nanoVLM +3 ariG23498, kashif, lusxvr, andito, pcuenq • Jun 4, 2025 • 119
Article From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate +2 mirinflim, aldopareja, muellerzr, stas • Jun 13, 2024 • 62
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs Paper • 2402.12030 • Published Feb 19, 2024 • 3
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization Paper • 2508.07629 • Published Aug 11, 2025 • 43
ChemDFM-R: An Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge Paper • 2507.21990 • Published Jul 29, 2025 • 27