---
language: mt
tags:
- audio
- automatic-speech-recognition
- voxpopuli-v2
datasets:
- voxpopuli
license: cc-by-nc-4.0
inference: false
---

# Wav2Vec2-large-VoxPopuli-V2

[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) large model, pretrained only on Maltese (mt) using 9.1k hours of unlabeled speech from the [VoxPopuli corpus](https://arxiv.org/abs/2101.00390).

The model was pretrained on speech audio sampled at 16 kHz. When using the model, make sure that your speech input is also sampled at 16 kHz.
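Recordings at other rates (44.1 kHz and 48 kHz are common) need to be resampled before they reach the model. The sketch below shows one way to do this, assuming NumPy and SciPy are available; the helper `to_16khz` is hypothetical and not part of any library. It relies on `scipy.signal.resample_poly`, which changes the rate by a rational factor and applies an anti-aliasing low-pass filter in the process.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # sampling rate the model was pretrained on


def to_16khz(waveform, orig_sr: int) -> np.ndarray:
    """Resample a mono waveform from orig_sr to 16 kHz (hypothetical helper)."""
    waveform = np.asarray(waveform, dtype=np.float32)
    if orig_sr == TARGET_SR:
        return waveform
    # Express the rate change as the reduced fraction up/down,
    # e.g. 48 kHz -> 16 kHz is up=1, down=3
    g = gcd(orig_sr, TARGET_SR)
    up, down = TARGET_SR // g, orig_sr // g
    # Polyphase resampling filters the signal to avoid aliasing
    return resample_poly(waveform, up, down).astype(np.float32)
```

For example, one second of 48 kHz audio (48,000 samples) comes back as 16,000 samples. If you then pass the result to the `transformers` feature extractor, also passing `sampling_rate=16000` lets it check that the rate matches what the model expects.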

Note: This model does not have a tokenizer, as it was pretrained on audio alone. To use it for speech recognition, you need to create a tokenizer and fine-tune the model on labeled (transcribed) Maltese speech data. Check out [this blog](https://huggingface.co/blog/fine-tune-xlsr-wav2vec2) for a more detailed explanation of how to fine-tune the model.
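A character-level CTC vocabulary, of the kind used in the linked blog post, can be derived directly from the training transcripts. The following is a minimal sketch under a few assumptions: transcripts have already been stripped of punctuation you don't want the model to predict, text is lowercased, `|` marks word boundaries, and `[UNK]`/`[PAD]` are the special tokens. The function names `build_ctc_vocab` and `save_vocab` are hypothetical.

```python
import json
from pathlib import Path


def build_ctc_vocab(transcripts, word_delimiter="|"):
    """Map every character in the lowercased transcripts to an integer id."""
    chars = set()
    for text in transcripts:
        chars.update(text.lower())
    # Spaces are represented by the word delimiter token instead, and the
    # delimiter itself is added below, so drop both from the character set
    chars.discard(" ")
    chars.discard(word_delimiter)
    vocab = {char: idx for idx, char in enumerate(sorted(chars))}
    # Special tokens get the ids after the regular characters
    for token in (word_delimiter, "[UNK]", "[PAD]"):
        vocab[token] = len(vocab)
    return vocab


def save_vocab(vocab, path="vocab.json"):
    """Write the vocabulary as UTF-8 JSON so letters such as ħ and ż stay readable."""
    Path(path).write_text(json.dumps(vocab, ensure_ascii=False), encoding="utf-8")
```

The saved file can then be loaded with `Wav2Vec2CTCTokenizer` from `transformers`, for example `Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")`. The size of this vocabulary also sets the output dimension of the CTC head that is added to the pretrained model for fine-tuning.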

Paper: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*

Authors: Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux from Facebook AI.

See the official repository for more information: [facebookresearch/voxpopuli](https://github.com/facebookresearch/voxpopuli/).