openai/whisper-small fine-tuned for German ASR on Common Voice 15 + FLEURS + MLS German (~1.9k hours of clean audiobook speech).
| Setting | Value |
|---|---|
| Base model | openai/whisper-small |
| Language | German (de) |
| Datasets | Common Voice 15 + FLEURS + MLS |
| Trainable params | Encoder (minus conv) + Decoder (~99%) |
| Scheduler | Cosine decay |
| Augmentation | Speed perturbation (±10%, 30% prob) |
| Eval decoding | Beam search (5) |
| Best WER | 0.1033 (step 14000) |
| Baseline WER | 0.1371 (before fine-tuning) |
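
The augmentation row in the table (speed perturbation of ±10%, applied with 30% probability) could be sketched roughly as follows. This is an illustration under stated assumptions, not the training code used for this model: the function name `speed_perturb`, the linear-interpolation resampler, and the synthetic test tone are all hypothetical choices made for the example.

```python
import numpy as np


def speed_perturb(audio, rng, max_change=0.10, prob=0.30):
    """Resample a 1-D waveform to play faster or slower (hypothetical helper).

    With probability `prob`, a speed factor is drawn uniformly from
    [1 - max_change, 1 + max_change); otherwise the input is returned as-is.
    """
    if rng.random() >= prob:
        return audio  # most examples are left untouched

    factor = rng.uniform(1.0 - max_change, 1.0 + max_change)
    # Faster speech (factor > 1) means fewer output samples.
    new_len = max(1, int(round(len(audio) / factor)))
    # Positions in the original signal that each output sample reads from.
    src_positions = np.linspace(0, len(audio) - 1, num=new_len)
    # Linear interpolation: a crude resampler, assumed here for brevity.
    resampled = np.interp(src_positions, np.arange(len(audio)), audio)
    return resampled.astype(audio.dtype)


rng = np.random.default_rng(0)
# One second of a 440 Hz tone at 16 kHz, standing in for real speech.
clip = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)
augmented = [speed_perturb(clip, rng) for _ in range(1000)]
changed = sum(len(a) != len(clip) for a in augmented)
```

Resampling this way shifts pitch along with tempo. A production pipeline would more likely use a band-limited resampler from an audio library than `np.interp`.
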

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch

processor = WhisperProcessor.from_pretrained("vilassn/whisper-small-german-new")
model = WhisperForConditionalGeneration.from_pretrained("vilassn/whisper-small-german-new")
model.config.use_cache = True  # reuse decoder key/value states during generation

# audio_array: np.ndarray at 16 kHz
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    # Beam search with 5 beams, matching the eval decoding setting
    ids = model.generate(**inputs, language="de", task="transcribe", num_beams=5)
print(processor.batch_decode(ids, skip_special_tokens=True)[0])
```
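
The usage snippet expects `audio_array` to be a NumPy array sampled at 16 kHz. If the audio arrives as integer PCM or with more than one channel, a small conversion step along these lines may help. The helper name `to_mono_float32` is hypothetical, and the sketch does not resample: it assumes the input is already at 16 kHz.

```python
import numpy as np


def to_mono_float32(pcm):
    """Convert PCM of shape (samples,) or (samples, channels) to mono float32.

    Integer input is scaled into [-1, 1); float input is only cast.
    """
    pcm = np.asarray(pcm)
    if np.issubdtype(pcm.dtype, np.integer):
        scale = float(np.iinfo(pcm.dtype).max) + 1.0  # 32768 for int16
        pcm = pcm.astype(np.float32) / scale
    else:
        pcm = pcm.astype(np.float32)
    if pcm.ndim == 2:
        pcm = pcm.mean(axis=1)  # average the channels into one
    return pcm


# Three stereo int16 frames standing in for decoded audio.
stereo = np.array(
    [[16384, -16384], [32767, 32767], [-32768, -32768]], dtype=np.int16
)
audio_array = to_mono_float32(stereo)
```
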
- … fsicoli/common_voice_15_0) is the best openly available option.
- … BasicTextNormalizer for fair comparison.
- … checkpoint-N/ folders in this repo.
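
The notes mention BasicTextNormalizer in the context of fair comparison. The sketch below illustrates why normalization matters when computing WER: casing and punctuation differences count as word errors unless both sides are normalized the same way. Everything here is an assumption for illustration. `simple_normalize` is a simplified stand-in rather than the actual BasicTextNormalizer, `wer` is a plain word-level edit distance, and the sentence pair is made up.

```python
import re


def simple_normalize(text):
    """Simplified stand-in normalizer: lowercase, drop punctuation, collapse spaces."""
    text = text.lower()
    # \w is Unicode-aware in Python 3, so umlauts and ß are kept.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


reference = "Guten Morgen, wie geht es Ihnen?"
hypothesis = "guten morgen wie geht es dir"

raw = wer(reference, hypothesis)  # 3 of 6 words differ -> 0.5
normalized = wer(simple_normalize(reference), simple_normalize(hypothesis))  # 1 of 6
```

Only the last word is a genuine recognition error. The other two mismatches in the raw score come from capitalization and punctuation.
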