---
title: Arabic Function Calling Leaderboard
emoji: 🏆
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
- arabic
- function-calling
- leaderboard
- llm-evaluation
---

# 🏆 Arabic Function Calling Leaderboard

لوحة تقييم استدعاء الدوال بالعربية (Arabic Function Calling Evaluation Board)

## Overview

The **Arabic Function Calling Leaderboard (AFCL)** evaluates large language models (LLMs) on their ability to:

1. Understand Arabic queries in Modern Standard Arabic (MSA) and regional dialects
2. Select the appropriate function from the available options
3. Extract the correct arguments from Arabic text
4. Handle parallel and complex function calls
5. Detect when no function should be called

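To make the criteria above concrete, here is a minimal sketch of how a single prediction could be checked against a reference answer. The call representation (`{"name": ..., "arguments": {...}}`), the `get_weather` function, and the exact-match rule are all assumptions made for illustration; the leaderboard's actual scoring code may differ.

```python
from typing import Any

# Hypothetical representation: a call is {"name": str, "arguments": dict}.
Call = dict[str, Any]


def _canonical(call: Call) -> tuple[str, tuple[tuple[str, str], ...]]:
    """Turn a call into a sortable form that ignores argument order."""
    args = call.get("arguments", {})
    return call["name"], tuple(sorted((k, repr(v)) for k, v in args.items()))


def calls_match(predicted: list[Call], expected: list[Call]) -> bool:
    """Exact match on the set of calls; the order of parallel calls is ignored.

    An empty `expected` list models the "no function should be called" case.
    """
    return sorted(map(_canonical, predicted)) == sorted(map(_canonical, expected))


# Parallel case: two weather lookups (Cairo and Riyadh), returned in reverse order.
expected = [
    {"name": "get_weather", "arguments": {"city": "القاهرة"}},
    {"name": "get_weather", "arguments": {"city": "الرياض"}},
]
predicted = list(reversed(expected))

print(calls_match(predicted, expected))  # True
print(calls_match([], expected))         # False: the model called nothing
print(calls_match([], []))               # True: correctly abstained
```

Comparing argument values through `repr` is a simplification: it treats `1` and `1.0` as different, so a real scorer would likely normalize types first.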
## Models Evaluated

- **Arabic-Native**: Jais, ALLaM, SILMA, AceGPT
- **Multilingual**: Qwen, Llama, Gemma, Mistral, Phi, BLOOMZ, Aya

## Dataset

📊 **Dataset**: [HeshamHaroon/Arabic_Function_Calling](https://huggingface.co/datasets/HeshamHaroon/Arabic_Function_Calling)

- **1,470 total samples** across 10 categories, covering:
  - Simple, Multiple, Parallel, and Parallel Multiple calls
  - Irrelevance Detection
  - Dialect Handling (Egyptian, Gulf, Levantine)

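Because the samples are split into categories, results are naturally reported per category as well as overall. The sketch below shows one way to aggregate them. The category labels (`"simple"`, `"irrelevance"`) are hypothetical stand-ins, and the README does not say whether the leaderboard averages over samples or over categories, so both helpers are assumptions.

```python
from collections import defaultdict


def accuracy_by_category(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Map each category to the fraction of its samples answered correctly."""
    totals: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for category, is_correct in results:
        totals[category] += 1
        correct[category] += int(is_correct)
    return {category: correct[category] / totals[category] for category in totals}


def overall_accuracy(results: list[tuple[str, bool]]) -> float:
    """Sample-weighted accuracy across all categories (0.0 for no results)."""
    if not results:
        return 0.0
    return sum(is_correct for _, is_correct in results) / len(results)


# (category, was the prediction correct?) pairs with hypothetical labels.
results = [("simple", True), ("simple", False), ("irrelevance", True)]

print(accuracy_by_category(results))  # {'simple': 0.5, 'irrelevance': 1.0}
print(overall_accuracy(results))
```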
## Evaluation

The leaderboard automatically evaluates models using the Hugging Face Inference API when the Space starts.

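The evaluation loop can be sketched roughly as follows. This is an illustration under stated assumptions, not the Space's actual `app.py`: the sample fields (`query`, `expected`), the JSON output format, and the helper names `parse_calls` and `evaluate` are hypothetical. The model call is passed in as a plain `generate` function so the sketch runs offline; in the Space, that function would presumably wrap a request to the Inference API.

```python
import json
from typing import Callable


def parse_calls(raw: str) -> list[dict]:
    """Parse model output as JSON calls; unparseable output counts as no calls."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if isinstance(data, dict):
        data = [data]  # a single call was returned without a wrapping list
    if not isinstance(data, list):
        return []
    return [call for call in data if isinstance(call, dict) and "name" in call]


def evaluate(samples: list[dict], generate: Callable[[str], str]) -> float:
    """Fraction of samples whose parsed calls equal the expected calls."""
    if not samples:
        return 0.0
    hits = 0
    for sample in samples:
        predicted = parse_calls(generate(sample["query"]))
        hits += predicted == sample["expected"]  # order-sensitive comparison
    return hits / len(samples)


def fake_generate(prompt: str) -> str:
    # Stand-in for a real model call; always answers with one weather lookup.
    return '[{"name": "get_weather", "arguments": {"city": "دبي"}}]'


samples = [
    # "What is the weather in Dubai?"
    {"query": "ما حالة الطقس في دبي؟",
     "expected": [{"name": "get_weather", "arguments": {"city": "دبي"}}]},
    # "Tell me a joke" (no function should be called)
    {"query": "احكي لي نكتة", "expected": []},
]

print(evaluate(samples, fake_generate))  # 0.5
```

The stub answers the weather query correctly but also calls a function for the joke request, so it scores one out of two.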
## Citation

```bibtex
@misc{afcl2024,
  title={Arabic Function Calling Leaderboard},
  author={Hesham Haroon},
  year={2024},
  url={https://huggingface.co/spaces/HeshamHaroon/Arabic-Function-Calling-Leaderboard}
}
```