WPO
Models and datasets in paper "WPO: Enhancing RLHF with Weighted Preference Optimization".

- wzhouad/Llama3-Instruct-8B-WPO-FP • Text Generation • 8B • Updated Jul 24, 2024
- wzhouad/Llama3-Instruct-8B-WPO-HB • Text Generation • 8B • Updated Aug 22, 2024
- wzhouad/zephyr-7B-WPO-FP • Text Generation • 7B • Updated Jul 24, 2024
- wzhouad/zephyr-7B-WPO-HB • Text Generation • 7B • Updated Aug 21, 2024