Whisper Small — German Fine-tune (opus7)

Fine-tuned `openai/whisper-small` for German ASR on Common Voice 15, FLEURS, and Multilingual LibriSpeech (MLS) German (the MLS portion is ~1.9k hours of clean audiobook speech).

Training details

| Setting | Value |
|---|---|
| Base model | `openai/whisper-small` |
| Language | German (`de`) |
| Datasets | Common Voice 15 + FLEURS + MLS |
| Trainable parameters | Encoder (excluding conv layers) + decoder (~99% of all parameters) |
| Scheduler | Cosine decay |
| Augmentation | Speed perturbation (±10%, applied with 30% probability) |
| Eval decoding | Beam search (5 beams) |
| Best WER | 0.1033 (step 14000) |
| Baseline WER | 0.1371 (`openai/whisper-small` before fine-tuning) |
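The card lists speed perturbation (±10%, 30% probability) but not how it was implemented. As a minimal sketch, assuming the common approach of resampling the waveform so that tempo and pitch change together, a per-example augmentation could look like this. The function name `speed_perturb` and the use of linear interpolation are illustrative assumptions, not the actual training code.

```python
import numpy as np


def speed_perturb(audio: np.ndarray, rng: np.random.Generator,
                  max_delta: float = 0.10, prob: float = 0.30) -> np.ndarray:
    """Hypothetical helper: with probability `prob`, play `audio` back at a
    random speed factor drawn from [1 - max_delta, 1 + max_delta).

    Assumption: the speed change is done by linear-interpolation resampling at a
    fixed 16 kHz output rate, which shifts tempo and pitch together. The real
    training pipeline may use a different resampler (e.g. sox or torchaudio).
    """
    if rng.random() >= prob:
        return audio  # skip augmentation with probability 1 - prob (0.7 by default)
    factor = rng.uniform(1.0 - max_delta, 1.0 + max_delta)
    # factor > 1 means faster speech, hence fewer output samples.
    new_len = max(1, int(round(len(audio) / factor)))
    # Fractional position in the input that each output sample reads from.
    src_pos = np.arange(new_len) * factor
    out = np.interp(src_pos, np.arange(len(audio)), audio)
    return out.astype(audio.dtype)


rng = np.random.default_rng(0)
clip = np.zeros(16_000, dtype=np.float32)  # 1 s placeholder at 16 kHz
augmented = speed_perturb(clip, rng)
```

Applying the augmentation on the fly (rather than pre-computing perturbed copies) means each epoch sees a different mix of speeds without enlarging the dataset on disk.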

Usage

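```python
import numpy as np
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

repo_id = "vilassn/whisper-small-german-new"
processor = WhisperProcessor.from_pretrained(repo_id)
model = WhisperForConditionalGeneration.from_pretrained(repo_id)
model.config.use_cache = True  # reuse past key/values during generation

# audio_array: 1-D float32 np.ndarray of mono speech sampled at 16 kHz.
# One second of silence is used here only as a runnable placeholder.
audio_array = np.zeros(16_000, dtype=np.float32)

# The processor converts raw audio into log-Mel input_features.
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    # Force German transcription and decode with 5 beams, as in evaluation.
    ids = model.generate(**inputs, language="de", task="transcribe", num_beams=5)
print(processor.batch_decode(ids, skip_special_tokens=True)[0])
```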
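By default the Whisper feature extractor pads or truncates every input to 30 seconds, so the snippet above only transcribes the first 30 s of a longer recording. One minimal workaround, sketched here under the assumption that naive fixed-length cuts are acceptable (words straddling a cut may be garbled), is to split the audio into 30 s windows and transcribe them as a batch. `split_into_windows` is a hypothetical helper, not part of this repo or of `transformers`.

```python
import numpy as np

SAMPLE_RATE = 16_000
WINDOW_S = 30  # Whisper's feature extractor pads/truncates inputs to 30 s


def split_into_windows(audio: np.ndarray, window_s: int = WINDOW_S,
                       sr: int = SAMPLE_RATE) -> list[np.ndarray]:
    """Cut a 1-D signal into consecutive, non-overlapping windows of at most
    `window_s` seconds; the last window may be shorter."""
    step = window_s * sr
    return [audio[start:start + step] for start in range(0, len(audio), step)]


# Hypothetical usage with the processor and model loaded above:
# chunks = split_into_windows(audio_array)
# inputs = processor(chunks, sampling_rate=16000, return_tensors="pt")
# ids = model.generate(**inputs, language="de", task="transcribe", num_beams=5)
# text = " ".join(processor.batch_decode(ids, skip_special_tokens=True))
```

If cut-point errors matter, the `transformers` automatic-speech-recognition `pipeline` with `chunk_length_s` set is an alternative that handles overlapping chunks for you.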

Notes

Mozilla moved Common Voice 17+ to the Mozilla Data Collective in October 2025, so the community mirror of CV15 (`fsicoli/common_voice_15_0`) is the best openly available option.
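Training labels keep the raw German punctuation and capitalisation. WER is computed on text normalised with Whisper's `BasicTextNormalizer` (lowercased, punctuation and symbols removed), so casing and punctuation differences do not count as errors and scores stay comparable across models.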
Intermediate checkpoints are available in the `checkpoint-N/` folders of this repo.
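To illustrate why keeping punctuation and casing in the labels does not inflate WER, here is a minimal, self-contained sketch of normalised WER. `basic_normalize` is a simplified stand-in modelled on Whisper's `BasicTextNormalizer`; it is not that class and omits its options (such as diacritic removal). Both function names are hypothetical. In practice you would import the real normalizer and typically compute WER with a library such as `jiwer` or `evaluate`.

```python
import re
import unicodedata


def basic_normalize(text: str) -> str:
    """Simplified stand-in for Whisper's BasicTextNormalizer: lowercase, drop
    [..]/<..> and (..) spans, replace marks/symbols/punctuation with spaces,
    and collapse whitespace. Umlauts and ß are kept."""
    s = text.lower()
    s = re.sub(r"[<\[][^>\]]*[>\]]", "", s)  # remove [tags] and <tags>
    s = re.sub(r"\(([^)]+?)\)", "", s)       # remove (parenthesised) spans
    s = "".join(" " if unicodedata.category(c)[0] in "MSP" else c
                for c in unicodedata.normalize("NFKC", s))
    return re.sub(r"\s+", " ", s).strip()


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = basic_normalize(reference).split()
    hyp = basic_normalize(hypothesis).split()
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)


# Punctuation and casing differ, but only "Morgen" vs "Abend" is an error:
print(wer("Guten Morgen, Berlin!", "guten Abend Berlin"))  # one substitution in three words
```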

Model size: 0.2B params (Safetensors, F32 tensors)

Evaluation results

Word Error Rate on the Common Voice 15 (de) test set (self-reported): 0.103
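For context, the training table's best WER (0.1033) against its baseline WER (0.1371) corresponds to the absolute and relative reductions below. This assumes both numbers were measured on the same evaluation set with the same normalisation and decoding settings, which the card does not state explicitly for the baseline.

```latex
\begin{aligned}
\Delta_{\text{abs}} &= 0.1371 - 0.1033 = 0.0338 \quad (\approx 3.4 \text{ WER points}) \\
\Delta_{\text{rel}} &= \frac{0.1371 - 0.1033}{0.1371} = \frac{0.0338}{0.1371} \approx 0.247 \quad (\approx 24.7\%)
\end{aligned}
```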