Whisper Small — German Fine-tune (opus7)

openai/whisper-small fine-tuned for German ASR on Common Voice 15, FLEURS, and MLS German (~1.9k hours of clean audiobook speech).

Training details

Setting            Value
Base model         openai/whisper-small
Language           German (de)
Datasets           Common Voice 15 + FLEURS + MLS
Trainable params   Encoder (excluding conv layers) + decoder (~99% of parameters)
Scheduler          Cosine decay
Augmentation       Speed perturbation (±10%, applied with 30% probability)
Eval decoding      Beam search (5 beams)
Best WER           0.1033 (step 14000)
Baseline WER       0.1371 (before fine-tuning)
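
The augmentation row can be illustrated with a minimal sketch. The training code is not shown in this card, so everything below is an assumption: the function name speed_perturb is hypothetical, the speed factor is assumed to be drawn uniformly from the ±10% range, and the resampling is done by plain linear interpolation (which shifts pitch along with tempo). The actual run may have used a different resampler or a discrete set of factors.

```python
import numpy as np

def speed_perturb(audio, rng, max_change=0.10, prob=0.30):
    """Hypothetical helper: with probability `prob`, change playback speed
    by a random factor in [1 - max_change, 1 + max_change]."""
    if rng.random() >= prob:
        return audio  # most clips pass through unchanged
    factor = rng.uniform(1.0 - max_change, 1.0 + max_change)
    # A faster clip (factor > 1) has fewer samples at the same sampling rate.
    new_len = max(1, int(round(len(audio) / factor)))
    old_pos = np.arange(len(audio))
    new_pos = np.linspace(0, len(audio) - 1, num=new_len)
    # Linear interpolation onto the new time grid (assumed resampling method).
    return np.interp(new_pos, old_pos, audio).astype(audio.dtype)

rng = np.random.default_rng(0)
# One second of a 440 Hz tone at 16 kHz as stand-in audio.
clip = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)
perturbed = speed_perturb(clip, rng, prob=1.0)  # force the augmentation
untouched = speed_perturb(clip, rng, prob=0.0)  # never augment
```

With the defaults, roughly 30% of training clips would be stretched or compressed by up to 10%, and the rest would be passed through as-is.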

Usage

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch

processor = WhisperProcessor.from_pretrained("vilassn/whisper-small-german-new")
model     = WhisperForConditionalGeneration.from_pretrained("vilassn/whisper-small-german-new")
model.config.use_cache = True

# audio_array: 1-D float np.ndarray (mono) sampled at 16 kHz
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    ids = model.generate(**inputs, language="de", task="transcribe", num_beams=5)
print(processor.batch_decode(ids, skip_special_tokens=True)[0])

Notes

  • Mozilla moved Common Voice 17+ to the Mozilla Data Collective in October 2025, so the CV15 community mirror (fsicoli/common_voice_15_0) is the best openly available option.
  • Labels keep raw German punctuation and capitalisation during training; WER is computed with Whisper's BasicTextNormalizer for fair comparison.
  • Intermediate checkpoints are available under checkpoint-N/ folders in this repo.
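
As a rough illustration of the WER note above, the sketch below computes word error rate after normalising both reference and hypothesis. It is not the evaluation script used for this model: normalize is a simplified stand-in for Whisper's BasicTextNormalizer (it only lowercases and strips punctuation, and the real normalizer may differ in details), and wer is a hypothetical helper implementing the standard word-level edit distance. The last line works out the relative WER reduction from the two figures reported in the table.

```python
import re

def normalize(text):
    """Simplified stand-in for a text normalizer: lowercase, drop punctuation."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware, so ä, ö, ü, ß survive
    return " ".join(text.split())

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words.
    Assumes the reference is non-empty after normalisation."""
    ref, hyp = normalize(reference).split(), normalize(hypothesis).split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

reference = "Guten Morgen, wie geht es Ihnen?"
# Differs only in case and punctuation, so it scores 0 after normalisation.
same_words = wer(reference, "guten morgen wie geht es ihnen")
# One dropped word out of six reference words.
one_deletion = wer(reference, "guten morgen wie geht ihnen")

# Relative WER reduction from the reported baseline (0.1371) to the best score (0.1033).
relative_reduction = (0.1371 - 0.1033) / 0.1371
```

Normalising before scoring means casing and punctuation differences are not counted as errors, which is why a model trained on raw-cased labels can still be compared against baselines that emit differently formatted text. On the reported numbers, the fine-tune removes roughly a quarter of the baseline's word errors.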