AkimfromParis
posted an update 4 days ago
🌸 Open Japanese LLM Leaderboard V2 released on Hugging Face 🇯🇵

I am thrilled to announce the launch of version 2 of the Open Japanese LLM Leaderboard. This initiative is driven by the "Fine-tuning and Evaluation" team, led by Professor Miyao at The University of Tokyo, under the Research and Development Center for Large Language Models (LLMC) at Japan's National Institute of Informatics (NII).

Strategic and technical upgrades:
- Our new backend features eight A100 GPUs, enabling the evaluation of open-source models with more than 100B parameters.
- Submissions now require a Hugging Face Hub login to ensure accountability.
- We have added metrics for evaluation time and CO₂ emissions (thanks to CodeCarbon 🌱), alongside reasoning capabilities.

Datasets and evaluation standards:
- New datasets cover reasoning, mathematics, exams, and instruction following.
- Math evaluations now span from grade-school levels to expert-tier challenges (GSM8K, PolyMath, AIME).
- While integrating English-heavy and multilingual benchmarks (including Humanity's Last Exam, GPQA, and BBH in both English and Japanese), we continue to prioritize unique Japanese cultural datasets.

llm-jp/open-japanese-llm-leaderboard-v2

Thank you for your support! 😊