---
title: Arabic Function Calling Leaderboard
emoji: 🏆
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
- arabic
- function-calling
- leaderboard
- llm-evaluation
---

# 🏆 Arabic Function Calling Leaderboard

لوحة تقييم استدعاء الدوال بالعربية (Arabic for "Arabic function-calling evaluation board")

## Overview

The Arabic Function Calling Leaderboard (AFCL) evaluates large language models (LLMs) on their ability to:

1. Understand Arabic queries in Modern Standard Arabic (MSA) and regional dialects
2. Select the appropriate function(s) from the available options
3. Extract the correct arguments from Arabic text
4. Handle parallel and complex function calls
5. Detect when no function should be called
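
The README does not say how a model's output is scored against the reference answer. A common approach for the capabilities above is to compare predicted calls with the expected ones by function name and arguments, ignoring the order of parallel calls and treating an empty expected list as the irrelevance case. The sketch below assumes each call is a Python dict of the form `{"name": ..., "arguments": {...}}`; that record shape, `get_weather`, and the helper names are hypothetical, not the AFCL implementation.

```python
from collections import Counter
from typing import Any

# Hypothetical call format: {"name": str, "arguments": dict}. Not the AFCL schema.
Call = dict[str, Any]


def canonical(call: Call) -> tuple:
    """Hashable key for a call: its name plus its top-level arguments, sorted by key.

    Values are compared via repr() so unhashable values (lists, dicts) still work.
    This is strict: 5 and "5" count as different arguments.
    """
    args = call.get("arguments", {})
    return (call["name"], tuple(sorted((k, repr(v)) for k, v in args.items())))


def calls_match(predicted: list[Call], expected: list[Call]) -> bool:
    """True if the predicted calls equal the expected calls as a multiset.

    - One expected call (simple/multiple): name and arguments must match.
    - Several expected calls (parallel): order is ignored, duplicates count.
    - No expected call (irrelevance): any predicted call is a failure.
    """
    return Counter(map(canonical, predicted)) == Counter(map(canonical, expected))


if __name__ == "__main__":
    # "get_weather" is a made-up function; "القاهرة" is Cairo.
    expected = [{"name": "get_weather", "arguments": {"city": "القاهرة"}}]
    print(calls_match(expected, expected))  # True
    print(calls_match([], expected))  # False: a required call was missed
    print(calls_match([], []))  # True: correctly made no call
```

Real benchmarks often relax the strict comparison, for example by accepting several valid values for one argument; that change would live in `canonical`.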

## Models Evaluated

- Arabic-native: Jais, ALLaM, SILMA, AceGPT
- Multilingual: Qwen, Llama, Gemma, Mistral, Phi, BLOOMZ, Aya

## Dataset

📊 Dataset: [HeshamHaroon/Arabic_Function_Calling](https://huggingface.co/datasets/HeshamHaroon/Arabic_Function_Calling)

The dataset contains 1,470 samples in total across 10 categories, including:

- Simple, Multiple, Parallel, and Parallel Multiple function calls
- Irrelevance detection
- Dialect handling (Egyptian, Gulf, Levantine)
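
The README does not say how per-category results are combined into a single leaderboard score. One option is a macro average, where each category's accuracy counts equally no matter how many samples it holds. The sketch below assumes each scored sample is a dict with hypothetical `category` and `correct` fields; the category names in the example are illustrative only.

```python
from collections import defaultdict


def leaderboard_scores(results: list[dict]) -> dict[str, float]:
    """Per-category accuracy plus an unweighted ("macro") overall average.

    Each result is assumed to look like {"category": str, "correct": bool};
    both field names are hypothetical.
    """
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for r in results:
        totals[r["category"]] += 1
        hits[r["category"]] += int(r["correct"])

    scores = {cat: hits[cat] / totals[cat] for cat in totals}
    # Macro average: each category weighs the same regardless of its size.
    scores["overall"] = sum(scores.values()) / len(scores) if scores else 0.0
    return scores


if __name__ == "__main__":
    results = [
        {"category": "simple", "correct": True},
        {"category": "simple", "correct": False},
        {"category": "irrelevance", "correct": True},
    ]
    print(leaderboard_scores(results))
    # {'simple': 0.5, 'irrelevance': 1.0, 'overall': 0.75}
```

A micro average (total correct divided by total samples) would instead let the largest categories dominate the ranking; which scheme AFCL actually uses is not stated here.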

## Evaluation

When the Space starts, the leaderboard automatically evaluates models through the Hugging Face Inference API.
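
How the Space turns a model's reply into function calls is not described here. Recent versions of `huggingface_hub`'s `InferenceClient.chat_completion` accept a `tools` argument for models with native tool calling, but many models answer in plain text. For that case, a lenient parser might scan the reply for the first JSON value that looks like a call. The sketch below assumes the prompt asks for either one `{"name": ..., "arguments": {...}}` object or a list of them, with `[]` meaning "no function"; that output contract and the `extract_calls` name are hypothetical.

```python
import json
from typing import Any


def extract_calls(reply: str) -> list[dict[str, Any]]:
    """Extract function calls from a model's free-text reply.

    Assumed output contract (hypothetical): the model emits one JSON object
    {"name": ..., "arguments": {...}} or a JSON list of them; "[]" means
    "no function should be called". Surrounding prose is ignored.
    """
    decoder = json.JSONDecoder()
    for i, ch in enumerate(reply):
        if ch not in "[{":
            continue
        try:
            # raw_decode parses one JSON value from the start of the string
            # and ignores whatever text follows it.
            value, _end = decoder.raw_decode(reply[i:])
        except json.JSONDecodeError:
            continue  # no valid JSON starts here; keep scanning
        items = value if isinstance(value, list) else [value]
        calls = [c for c in items if isinstance(c, dict) and "name" in c]
        if calls or value == []:
            return calls  # an explicit [] is a deliberate "no call"
        # Otherwise this was unrelated JSON (e.g. "[1]" in prose); keep scanning.
    return []  # nothing parseable: treated as "no function called"


if __name__ == "__main__":
    reply = 'Sure. [{"name": "get_weather", "arguments": {"city": "Dubai"}}]'
    print(extract_calls(reply))
    # [{'name': 'get_weather', 'arguments': {'city': 'Dubai'}}]
```

Treating unparseable replies as "no call" keeps scoring simple, but it means a malformed answer to an irrelevance query is scored as correct; a stricter harness might record parse failures separately.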

## Citation

```bibtex
@misc{afcl2024,
  title={Arabic Function Calling Leaderboard},
  author={Hesham Haroon},
  year={2024},
  url={https://huggingface.co/spaces/HeshamHaroon/Arabic-Function-Calling-Leaderboard}
}
```