google/fleurs
Viewer • Updated • 768k • 58.2k • 405
How to use junnei/Phi-4-multimodal-instruct-ko-asr with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="junnei/Phi-4-multimodal-instruct-ko-asr", trust_remote_code=True) # Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("junnei/Phi-4-multimodal-instruct-ko-asr", trust_remote_code=True, dtype="auto")This model is fine-tuned from microsoft/Phi-4-multimodal-instruct on Bingsu/zeroth-korean, google/flerus in 5 epochs.
This model is trained 960 steps on datasets for Korean Audio Speech Recognition on H100.
After that, we continue training with CoVoST2 Dataset / CoVoST2-Ko for AST.
AST Finetuned model is Here : Phi-4-multimodal-instruct-ko-speech
Evaluation was done on the following datasets:
Script is retrieved from here.
Compared to Phi-4-mm-inst-zeroth-kor and Phi-4-multimodal-finetune-ko-speech, ASR is significantly improved.
| Model | zeroth-CER | zeroth-WER | fleurs-ko_en-BLEU | fleurs-ko_en-cot-BLEU | fleurs-en_ko-BLEU | fleurs-en_ko-cot-BLEU |
|---|---|---|---|---|---|---|
| original | 198.32 | - | 5.63 | 2.42 | 6.86 | 4.17 |
| daekeun-ml/Phi-4-multimodal-finetune-ko-speech | 1.61 | 3.54 | 7.67 | 8.38 | 12.31 | 9.69 |
| seastar105/Phi-4-mm-inst-zeroth-kor | 7.02 | - | 7.07 | 9.19 | 13.08 | 9.35 |
| ASR finetune(this model) | 1.31 | 2.95 | 7.46 | 6.24 | 12.15 | 8.91 |
| + 1 epoch finetune with Covost-Ko | 3.88 | - | 8.07 | 10.09 | 18.82 | 15.41 |
| AST finetuned model | 1.77 | 2.99 | 8.01 | 9.09 | 17.09 | 11.82 |
Base model
microsoft/Phi-4-multimodal-instruct