Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2402.03300

DeepSeek Math series

deepseek-ai/DeepSeek-Math-V2

Text Generation • Updated Nov 27, 2025 • 2.75k • 687
deepseek-ai/deepseek-math-7b-instruct

Text Generation • Updated Feb 6, 2024 • 9.38k • 150
deepseek-ai/deepseek-math-7b-rl

Text Generation • 7B • Updated Mar 19, 2024 • 2.91k • 92
deepseek-ai/deepseek-math-7b-base

Text Generation • Updated Feb 6, 2024 • 7.47k • 87

Papers reimplemented

List of research papers, architectures, and techniques reimplemented in LLM-quest or Hugging Face's TRL. Missing: Qwen3.5, Qwen3-Next, GPT-2

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Paper • 2602.10693 • Published Feb 11 • 220
Reinforced Attention Learning

Paper • 2602.04884 • Published Feb 4 • 29
Learning to Reason in 13 Parameters

Paper • 2602.04118 • Published Feb 4 • 6
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

Paper • 2405.17604 • Published May 27, 2024 • 3

DeepSeek Research Papers

This Collection contains all DeepSeek Research Papers

DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

Paper • 2511.22570 • Published Nov 27, 2025 • 93
DeepSeek-OCR: Contexts Optical Compression

Paper • 2510.18234 • Published Oct 21, 2025 • 93
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 447
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

Paper • 2505.09343 • Published May 14, 2025 • 76

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 144

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

Paper • 2310.16818 • Published Oct 25, 2023 • 33
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Paper • 2401.02954 • Published Jan 5, 2024 • 55
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024 • 61
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

Paper • 2401.14196 • Published Jan 25, 2024 • 72

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28, 2025 • 125
LoRA: Low-Rank Adaptation of Large Language Models

Paper • 2106.09685 • Published Jun 17, 2021 • 60
Training Compute-Optimal Large Language Models

Paper • 2203.15556 • Published Mar 29, 2022 • 11
Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Paper • 2305.10601 • Published May 17, 2023 • 15

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published Jan 8 • 230
On Predictability of Reinforcement Learning Dynamics for Large Language Models

Paper • 2510.00553 • Published Oct 1, 2025 • 9
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4, 2025 • 104
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 144

Proximal Policy Optimization Algorithms

Paper • 1707.06347 • Published Jul 20, 2017 • 11
Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 64
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 144
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 447

Running

Featured

300

Chat with DeepSeek Coder 33B

🐬

300

Generate code and answers with a conversational AI
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 144
Sleeping

Mujoco

🦾

Duplicate this space to train your own robots using Mujoco!
TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument

Paper • 2502.08939 • Published Feb 13, 2025 • 2

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 144

DeepSeek Math series

deepseek-ai/DeepSeek-Math-V2

Text Generation • Updated Nov 27, 2025 • 2.75k • 687
deepseek-ai/deepseek-math-7b-instruct

Text Generation • Updated Feb 6, 2024 • 9.38k • 150
deepseek-ai/deepseek-math-7b-rl

Text Generation • 7B • Updated Mar 19, 2024 • 2.91k • 92
deepseek-ai/deepseek-math-7b-base

Text Generation • Updated Feb 6, 2024 • 7.47k • 87

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28, 2025 • 125
LoRA: Low-Rank Adaptation of Large Language Models

Paper • 2106.09685 • Published Jun 17, 2021 • 60
Training Compute-Optimal Large Language Models

Paper • 2203.15556 • Published Mar 29, 2022 • 11
Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Paper • 2305.10601 • Published May 17, 2023 • 15

Papers reimplemented

List of research papers, architectures, and techniques reimplemented in LLM-quest or Hugging Face's TRL. Missing: Qwen3.5, Qwen3-Next, GPT-2

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Paper • 2602.10693 • Published Feb 11 • 220
Reinforced Attention Learning

Paper • 2602.04884 • Published Feb 4 • 29
Learning to Reason in 13 Parameters

Paper • 2602.04118 • Published Feb 4 • 6
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

Paper • 2405.17604 • Published May 27, 2024 • 3

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published Jan 8 • 230
On Predictability of Reinforcement Learning Dynamics for Large Language Models

Paper • 2510.00553 • Published Oct 1, 2025 • 9
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4, 2025 • 104
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 144

DeepSeek Research Papers

This Collection contains all DeepSeek Research Papers

DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

Paper • 2511.22570 • Published Nov 27, 2025 • 93
DeepSeek-OCR: Contexts Optical Compression

Paper • 2510.18234 • Published Oct 21, 2025 • 93
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 447
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

Paper • 2505.09343 • Published May 14, 2025 • 76

Proximal Policy Optimization Algorithms

Paper • 1707.06347 • Published Jul 20, 2017 • 11
Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 64
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 144
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 447

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 144

Running

Featured

300

Chat with DeepSeek Coder 33B

🐬

300

Generate code and answers with a conversational AI
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 144
Sleeping

Mujoco

🦾

Duplicate this space to train your own robots using Mujoco!
TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument

Paper • 2502.08939 • Published Feb 13, 2025 • 2

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

Paper • 2310.16818 • Published Oct 25, 2023 • 33
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Paper • 2401.02954 • Published Jan 5, 2024 • 55
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024 • 61
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

Paper • 2401.14196 • Published Jan 25, 2024 • 72

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 144

Previous
1
2
3
...
8
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs