Collections
Discover the best community collections!
Collections including paper arxiv:2402.03300
-
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
Paper • 2602.10693 • Published • 220 -
Reinforced Attention Learning
Paper • 2602.04884 • Published • 29 -
Learning to Reason in 13 Parameters
Paper • 2602.04118 • Published • 6 -
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters
Paper • 2405.17604 • Published • 3
-
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Paper • 2511.22570 • Published • 93 -
DeepSeek-OCR: Contexts Optical Compression
Paper • 2510.18234 • Published • 93 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 447 -
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper • 2505.09343 • Published • 76
-
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
Paper • 2310.16818 • Published • 33 -
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 55 -
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 61 -
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Paper • 2401.14196 • Published • 72
-
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 125 -
LoRA: Low-Rank Adaptation of Large Language Models
Paper • 2106.09685 • Published • 60 -
Training Compute-Optimal Large Language Models
Paper • 2203.15556 • Published • 11 -
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Paper • 2305.10601 • Published • 15
-
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper • 2601.05242 • Published • 230 -
On Predictability of Reinforcement Learning Dynamics for Large Language Models
Paper • 2510.00553 • Published • 9 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 104 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 144
-
Proximal Policy Optimization Algorithms
Paper • 1707.06347 • Published • 11 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 64 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 144 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 447
-
Chat with DeepSeek Coder 33B
🐬300Generate code and answers with a conversational AI
-
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 144 -
Mujoco
🦾Duplicate this space to train your own robots using Mujoco!
-
TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument
Paper • 2502.08939 • Published • 2
-
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 125 -
LoRA: Low-Rank Adaptation of Large Language Models
Paper • 2106.09685 • Published • 60 -
Training Compute-Optimal Large Language Models
Paper • 2203.15556 • Published • 11 -
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Paper • 2305.10601 • Published • 15
-
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
Paper • 2602.10693 • Published • 220 -
Reinforced Attention Learning
Paper • 2602.04884 • Published • 29 -
Learning to Reason in 13 Parameters
Paper • 2602.04118 • Published • 6 -
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters
Paper • 2405.17604 • Published • 3
-
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper • 2601.05242 • Published • 230 -
On Predictability of Reinforcement Learning Dynamics for Large Language Models
Paper • 2510.00553 • Published • 9 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 104 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 144
-
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Paper • 2511.22570 • Published • 93 -
DeepSeek-OCR: Contexts Optical Compression
Paper • 2510.18234 • Published • 93 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 447 -
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper • 2505.09343 • Published • 76
-
Proximal Policy Optimization Algorithms
Paper • 1707.06347 • Published • 11 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 64 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 144 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 447
-
Chat with DeepSeek Coder 33B
🐬300Generate code and answers with a conversational AI
-
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 144 -
Mujoco
🦾Duplicate this space to train your own robots using Mujoco!
-
TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument
Paper • 2502.08939 • Published • 2
-
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
Paper • 2310.16818 • Published • 33 -
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 55 -
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 61 -
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Paper • 2401.14196 • Published • 72