MattBou00/SequentialLR001_2000samples_R1-checkpoint-epoch-40 Reinforcement Learning • 1B • Updated Nov 22, 2025
MattBou00/SequentialLR001_2000samples_R1-checkpoint-epoch-20 Reinforcement Learning • 1B • Updated Nov 22, 2025
MattBou00/SequentialLR001_2000samples-checkpoint-epoch-60 Reinforcement Learning • 1B • Updated Nov 22, 2025
MattBou00/SequentialLR001_2000samples-checkpoint-epoch-40 Reinforcement Learning • 1B • Updated Nov 22, 2025
MattBou00/SequentialLR001_2000samples-checkpoint-epoch-20 Reinforcement Learning • 1B • Updated Nov 22, 2025
MattBou00/SequentialLR00001_2000samples-checkpoint-epoch-20 Reinforcement Learning • 1B • Updated Nov 22, 2025
MattBou00/SingleLR00001_2000samples-checkpoint-epoch-20 Reinforcement Learning • 1B • Updated Nov 22, 2025
MattBou00/ROUND5ACTUALRETRYRUNNINGCODE-checkpoint-epoch-100 Reinforcement Learning • 1B • Updated Nov 21, 2025
MattBou00/ROUND5ACTUALRETRYRUNNINGCODE-checkpoint-epoch-80 Reinforcement Learning • 1B • Updated Nov 21, 2025
MattBou00/ROUND5ACTUALRETRYRUNNINGCODE-checkpoint-epoch-60 Reinforcement Learning • 1B • Updated Nov 21, 2025
MattBou00/ROUND5ACTUALRETRYRUNNINGCODE-checkpoint-epoch-40 Reinforcement Learning • 1B • Updated Nov 21, 2025
MattBou00/ROUND5ACTUALRETRYRUNNINGCODE-checkpoint-epoch-20 Reinforcement Learning • 1B • Updated Nov 21, 2025
MattBou00/ROUND5RETRYRUNNINGCODE-checkpoint-epoch-20 Reinforcement Learning • 1B • Updated Nov 21, 2025
MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round1-checkpoint-epoch-20 Reinforcement Learning • 1B • Updated Nov 21, 2025
MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round3-checkpoint-epoch-100 Reinforcement Learning • 1B • Updated Sep 22, 2025