| --- |
| language: mt |
| tags: |
| - audio |
| - automatic-speech-recognition |
| - voxpopuli-v2 |
| datasets: |
| - voxpopuli |
| license: cc-by-nc-4.0 |
| inference: false |
| --- |
| |
| # Wav2Vec2-large-VoxPopuli-V2 |
|
|
| [Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) large model pretrained only in **mt** on **9.1** unlabeled data of the [VoxPopuli corpus](https://arxiv.org/abs/2101.00390). |
|
|
| The model is pretrained on speech audio sampled at 16 kHz. When using the model, make sure that your speech input is also sampled at 16 kHz. |
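
To illustrate the sampling-rate requirement, here is a minimal dependency-free sketch that converts a mono signal to 16 kHz by linear interpolation. The function name `resample_linear` is hypothetical and not part of any library. Plain interpolation applies no low-pass filter, so downsampling this way can introduce aliasing; for real data, a dedicated audio library's resampler is likely the better choice.

```python
def resample_linear(samples, src_rate, dst_rate=16_000):
    """Resample a mono signal by linear interpolation (illustrative only).

    `samples` is a sequence of floats recorded at `src_rate` Hz.
    Returns a new list of floats at `dst_rate` Hz.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)

    last = len(samples) - 1
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # distance between output samples, in input samples

    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Weighted average of the two neighbouring input samples.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# Hypothetical input: four samples recorded at 8 kHz, upsampled to 16 kHz.
upsampled = resample_linear([0.0, 1.0, 2.0, 3.0], src_rate=8_000)
```

The output has twice as many samples as the input in this case, because the target rate is twice the source rate.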
|
|
| **Note**: This model does not have a tokenizer, as it was pretrained on audio alone. To use this model for **speech recognition**, a tokenizer should be created and the model should be fine-tuned on labeled (transcribed) speech data in **mt**. Check out [this blog](https://huggingface.co/blog/fine-tune-xlsr-wav2vec2) for a more detailed explanation of how to fine-tune the model. |
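
As a rough sketch of the "create a tokenizer" step, the snippet below builds a character-level vocabulary from transcripts, in the spirit of the blog post linked above. The helper `build_char_vocab` and the example transcripts are hypothetical placeholders, not part of this model or of any library. The resulting JSON is the kind of vocabulary file a CTC tokenizer (for example `Wav2Vec2CTCTokenizer` in Transformers) can be constructed from; normalization choices such as lowercasing are assumptions and should follow your own labeled data.

```python
import json


def build_char_vocab(transcripts):
    """Map every character seen in the transcripts to an integer id."""
    chars = sorted({ch for text in transcripts for ch in text.lower()})
    vocab = {ch: idx for idx, ch in enumerate(chars)}

    # Use a visible word delimiter instead of a raw space.
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")

    # Special tokens: unknown character and padding (often reused as the CTC blank).
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


# Hypothetical placeholder transcripts; replace with your labeled data.
example_transcripts = ["bonġu", "kif int"]
vocab = build_char_vocab(example_transcripts)

# Serialize for later use as a tokenizer vocabulary file.
vocab_json = json.dumps(vocab, ensure_ascii=False, indent=2)
```

Writing `vocab_json` to a file such as `vocab.json` (an assumed file name) gives a starting point for the tokenizer; the fine-tuning itself is covered in the blog post.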
|
|
| **Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)* |
|
|
| **Authors**: *Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux* from *Facebook AI*. |
|
|
| See the [official repository](https://github.com/facebookresearch/voxpopuli/) for more information. |
|
|