Commit c85738c (verified), committed by jaygala24
1 parent: 19f5946

Add pass@k evaluation results

Files changed (1): README.md (+12 -0)
@@ -65,6 +65,18 @@ DAPO extends GRPO with clip-higher (asymmetric PPO clipping), dynamic sampling (
  | Precision | `bf16` |
  | DeepSpeed | ZeRO Stage 3 |

+ ## Evaluation Results
+
+ Pass@k on math reasoning benchmarks (N=32 samples per problem, temperature=1.0):
+
+ | Dataset | pass@1 | pass@2 | pass@4 | pass@8 | pass@16 | pass@32 |
+ | --- | ---: | ---: | ---: | ---: | ---: | ---: |
+ | GSM8K (test) | 86.52 | 91.04 | 93.73 | 95.52 | 96.73 | 97.50 |
+ | MATH-500 | 70.66 | 77.91 | 83.26 | 87.27 | 90.10 | 92.00 |
+ | **Overall** | **82.16** | **87.43** | **90.85** | **93.25** | **94.90** | **95.99** |
+
+ *GSM8K test: 1319 problems · MATH-500: 500 problems · Overall: 1819 problems (overall weighted by problem count).*
+
  ## Training Curves

  ![Training Metrics](training_metrics.png)
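
The added README section reports pass@k from N=32 samples per problem and an overall row weighted by problem count, but the commit does not say how pass@k was computed. The sketch below assumes the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), for a problem with `c` correct answers among `n` samples. The function names `pass_at_k` and `weighted_overall` are hypothetical and do not come from the repository. The second function recomputes the overall row from the per-dataset scores and problem counts given in the table.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of them correct.

    Assumption: the unbiased estimator 1 - C(n-c, k) / C(n, k).
    The commit itself does not state which estimator was used.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: any k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def weighted_overall(scores: list[float], counts: list[int]) -> float:
    """Average per-dataset scores, weighting each by its problem count."""
    return sum(s * m for s, m in zip(scores, counts)) / sum(counts)


# Hypothetical single problem: 8 of 32 samples are correct.
single = pass_at_k(n=32, c=8, k=1)

# Overall pass@1 and pass@32 from the table's GSM8K and MATH-500 rows.
overall_1 = weighted_overall([86.52, 70.66], [1319, 500])
overall_32 = weighted_overall([97.50, 92.00], [1319, 500])
print(single, round(overall_1, 2), round(overall_32, 2))
```

A dataset-level pass@k under this assumption would be the mean of `pass_at_k` over all problems in that dataset. The weighted averages here round to 82.16 and 95.99, which agrees with the **Overall** row.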