Ron Arel

RonusMTG

AI & ML interests

None yet

upvoted a collection over 1 year ago

Tamper-Resistant Safeguards for Open-Weight LLMs

Models & datasets from the paper "Tamper-Resistant Safeguards for Open-Weight LLMs" (https://arxiv.org/pdf/2408.00761) • 9 items • Updated Feb 15, 2025 • 5

upvoted a collection almost 2 years ago

WMDP Benchmark

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning • 9 items • Updated 19 days ago • 10