Gemma-4-E4B-BF16 + MERaLiON Speech LoRA for Singapore English (MLX)
A composed Singapore-English ASR model that connects the MERaLiON-3 speech encoder to a BF16 Gemma-4-E4B decoder through a trained projector and rank-16 speech LoRA.
This BF16 release is the recommended quality-first edition: it keeps the decoder in native bfloat16, avoids quantization artifacts, and improves on the standalone MERaLiON-3 baseline by 9.69 absolute WER points (25.78% to 16.09%) on the MNSC ASR Part 2 test set.
Important: this is a composed MLX bundle, not a vanilla `transformers.pipeline` checkpoint. Use the `elderwise` runtime (or equivalent wiring) to connect `speech_encoder/`, `projector/`, `decoder/`, and `lora/`.
Result summary
Evaluated on MERaLiON Multitask National Speech Corpus v1 — ASR Part 2 Test (3000 utterance-level clips).
| System | WER ↓ | Notes |
|---|---|---|
| MERaLiON-3 baseline | 25.78% | stock MERaLiON-3 encoder + native decoder |
| 8-bit Gemma-4 + MERaLiON speech LoRA | 18.86% | smaller sibling release |
| This BF16 release | 16.09% | best-quality bundle |
- Absolute improvement vs. MERaLiON-3 baseline: −9.69pp
- Absolute improvement vs. 8-bit sibling: −2.77pp
- Normalization: lowercase, ASCII punctuation stripped, whitespace collapsed, speaker-prefix tags removed from reference and hypothesis.
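The normalization and scoring described above can be sketched as follows. This is a minimal illustration, not the release's evaluation script: the function names `normalize`, `word_errors`, and `corpus_wer` are hypothetical, the speaker-tag pattern (`<Speaker1>:`) is an assumption about how the tags look, and pooling errors over all reference words is an assumed way of aggregating WER.

```python
import re
import string

# Assumption: speaker-prefix tags look like "<Speaker1>:"; the release
# normalizer may match a different pattern.
_SPEAKER_TAG = re.compile(r"<\s*speaker\s*\d*\s*>\s*:?", re.IGNORECASE)
# Delete (rather than space out) ASCII punctuation, so "Delma's" -> "delmas".
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Remove speaker tags, lowercase, strip ASCII punctuation, collapse whitespace."""
    text = _SPEAKER_TAG.sub(" ", text)
    text = text.lower().translate(_PUNCT_TABLE)
    return " ".join(text.split())


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    prev = list(range(len(hyp) + 1))  # distance from an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(pairs) -> float:
    """Pooled WER: total word errors divided by total reference words."""
    errors = words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    return errors / words if words else 0.0
```

Under this sketch, a reference such as `There IS A Food Court Selling Chicken Pasta behind Delmas' House` and a hypothesis such as `There is a food court selling Chicken Pasta behind Delma's house.` normalize to the same string and therefore contribute zero errors.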
Example outputs
These are actual model outputs from artifacts/run_3000_bf16_r16mlp/eval_predictions.jsonl, selected from the held-out MNSC ASR Part 2 test set. Each row scores 0% WER under the release normalizer (lowercase, punctuation removed, whitespace collapsed).
| # | Reference | Model output | WER |
|---|---|---|---|
| 1 | There IS A Food Court Selling Chicken Pasta behind Delmas' House | There is a food court selling Chicken Pasta behind Delma's house. | 0% |
| 2 | what is the distance to The Seletar Mall | What is the distance to The Seletar Mall? | 0% |
| 3 | Number sequence IS S seven six nine Zero four one three A and Date of birth IS thirteen September nineteen seventy seven | Number sequence is S. seven, six, nine, zero, four, one, three, A, and date of birth is thirteen, September, nineteen seventy seven. | 0% |
| 4 | six nine eight four four six eight three five three | Six, nine, eight, four, four, six, eight, three, five, three. | 0% |
| 5 | eight five six four one seven four five | Eight, five, six, four, one, seven, four, five. | 0% |
| 6 | Pita is a Traditional Local Cuisine | Pita is a traditional local cuisine. | 0% |
| 7 | it is faster to take the bus to Jalan Asas | It is faster to take the bus to Jalan Asas. | 0% |
| 8 | a new television show documented the lives of various people including Syed Sheikh Syed Ahmad Al Hadi and Lucien Wang | A new television show documented the lives of various people, including Syed Sheikh Syed Ahmad Al Hadi and Lucien Wang. | 0% |
| 9 | Hiyashi Chuka Takikomi Gohan and Fugu | Hiyashi Chuka Takikomi Gohan and Fugu. | 0% |
| 10 | where can I get cheap food in Kathmandu | Where can I get Cheap Food in Kathmandu? | 0% |
Across the full saved evaluation file, 1071 / 3000 utterances scored 0% WER, and another 857 scored ≤20% WER under the same normalizer.
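The bucket counts above translate into shares of the test set as shown below. The helper `bucket_counts` is a hypothetical illustration of how per-utterance WER values could be grouped; it assumes the values are fractions (0.20 for 20%) and does not reflect the actual schema of the saved predictions file.

```python
from collections import Counter


def bucket_counts(utterance_wers):
    """Group per-utterance WER fractions into exact, <=20%, and >20% buckets."""
    counts = Counter()
    for wer in utterance_wers:
        if wer == 0.0:
            counts["exact"] += 1
        elif wer <= 0.20:
            counts["le_20"] += 1
        else:
            counts["gt_20"] += 1
    return counts


# Shares implied by the counts reported for the 3000-utterance test set.
total = 3000
exact, le_20 = 1071, 857
remaining = total - exact - le_20
print(f"0% WER:           {exact / total:.1%}")
print(f"0% < WER <= 20%:  {le_20 / total:.1%}")
print(f"WER > 20%:        {remaining / total:.1%}")
```

With the reported counts this works out to roughly 35.7%, 28.6%, and 35.7% of utterances respectively, so about 64% of the test set lands at or below 20% WER.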
What is inside
| Path | Contents | Precision |
|---|---|---|
| `decoder/` | Gemma-4-E4B instruction decoder, MLX format | bfloat16 |
| `speech_encoder/` | MERaLiON-3 acoustic encoder + frame adaptor | fp16 |
| `projector/` | LayerNorm -> Linear(3584,3072) -> SiLU -> Linear(3072,2560) -> RMSNorm | fp32 |
| `lora/` | rank-16 speech-alignment LoRA adapters + `lora_config.json` | fp32 |
| `config.json` | composition manifest | JSON |
| `PROVENANCE.md` | chain of custody, evaluation, license notes | Markdown |
The speech path is:
audio -> Whisper-style log-mel -> MERaLiON-3 encoder/adaptor -> 3584-d speech embeddings
-> projector -> 2560-d Gemma embedding space -> Gemma-4-E4B BF16 + speech LoRA -> text
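To make the projector step of this path concrete, here is a shape-level sketch in NumPy. It is an illustration only, not the release code: the weights are random rather than the trained ones, the learned scale and shift parameters of LayerNorm and RMSNorm are omitted, and the epsilon values, bias terms, and the `project` function name are assumptions.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each frame to zero mean and unit variance (no learned scale/shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def rms_norm(x, eps=1e-6):
    """Scale each frame to unit root-mean-square (no learned gain)."""
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def project(speech, w1, b1, w2, b2):
    """LayerNorm -> Linear(3584,3072) -> SiLU -> Linear(3072,2560) -> RMSNorm."""
    h = layer_norm(speech)
    h = silu(h @ w1 + b1)
    return rms_norm(h @ w2 + b2)


rng = np.random.default_rng(0)
# Random stand-ins for the trained projector weights.
w1 = rng.normal(scale=0.02, size=(3584, 3072)).astype(np.float32)
b1 = np.zeros(3072, dtype=np.float32)
w2 = rng.normal(scale=0.02, size=(3072, 2560)).astype(np.float32)
b2 = np.zeros(2560, dtype=np.float32)

frames = rng.normal(size=(50, 3584)).astype(np.float32)  # 50 hypothetical speech frames
out = project(frames, w1, b1, w2, b2)
print(out.shape)  # (50, 2560)
```

Each 3584-d speech embedding comes out as one 2560-d vector, which is the width the Gemma decoder expects at its embedding input.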
Quickstart
Install or clone the elderwise runtime that wires the components together:
pip install git+https://github.com/ajentik/elderwise-mlx.git
# or: git clone https://github.com/ajentik/elderwise-mlx && pip install -e elderwise-mlx
Then load the composed bundle:
from pathlib import Path

from elderwise.inference import load_pipeline, transcribe_with_pipeline
from huggingface_hub import snapshot_download

# Download the composed bundle (or reuse the locally cached copy).
bundle = Path(snapshot_download("majentik/gemma-4-e4b-mlx-elderwise-MERaLiON"))

# Wire the speech encoder, projector, decoder, and LoRA together.
pipeline = load_pipeline(
    meralion_dir=str(bundle / "speech_encoder"),
    gemma_id=str(bundle / "decoder"),
    projector_path=str(bundle / "projector"),
    lora_path=str(bundle / "lora"),  # the directory, not adapters.safetensors
    lora_rank=16,
    lora_target_names=(
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ),
)

text = transcribe_with_pipeline(pipeline, "your_audio.wav", max_tokens=128)
print(text)
Runtime notes:
- `lora_path` should point to the directory containing `adapters.safetensors` (`lora/`), not to the file itself.
- The target module list must match the adapter: `q/k/v/o/gate/up/down` across all 42 decoder layers.
- Use the prompt `Transcribe the following audio:` unless you intentionally fine-tune/evaluate a different prompt contract.
- The speech LoRA is switchable in the runtime: enable speech mode for ASR, disable/scale to `0.0` for plain text generation.
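Because a wrong `lora_path` or an incomplete download is easy to miss, a quick layout check before loading can help. The sketch below is not part of the `elderwise` runtime; `missing_components` is a hypothetical helper, and the expected paths are taken from the bundle table and the notes above.

```python
import tempfile
from pathlib import Path

# Directory and file names as listed for this bundle; adjust if the layout changes.
EXPECTED_DIRS = ("speech_encoder", "projector", "decoder", "lora")
EXPECTED_FILES = ("config.json", "lora/adapters.safetensors", "lora/lora_config.json")


def missing_components(bundle: Path) -> list[str]:
    """Return the expected directories and files that are absent from the bundle."""
    missing = [name + "/" for name in EXPECTED_DIRS if not (bundle / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (bundle / name).is_file()]
    return missing


# Dry run against a temporary directory that only contains lora/adapters.safetensors.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "lora").mkdir()
    (root / "lora" / "adapters.safetensors").touch()
    demo_missing = missing_components(root)

print(demo_missing)
```

On a real download, call `missing_components(bundle)` with the path returned by `snapshot_download` and stop early if the returned list is not empty.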
Intended use
Good fits:
- Singapore English / Singlish automatic speech recognition
- utterance-level voice notes, routing, search, and agent input
- MLX-native speech-language research with a shared text decoder
Not intended for:
- safety-critical or legal/medical transcription
- diarization, timestamps, speaker identification, or streaming ASR
- Mandarin-only ASR; a separate switchable Mandarin LoRA is planned
Limitations
- The LoRA is specialized for Singapore English. Other accents and languages may degrade.
- Residual errors mostly cluster around rare or ambiguous proper nouns, especially code-switched names and places.
- Long-form audio was not the optimization target; split long recordings into utterance-sized chunks.
- This repo is a composed bundle. Generic hub inference widgets will not know how to run it without the `elderwise` runtime.
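For the long-form limitation above, one simple way to get utterance-sized inputs is fixed-length windowing. The sketch below is a naive illustration under stated assumptions: the 16 kHz sample rate and the 15-second window are assumed values, not documented settings of this release, and `split_into_chunks` is a hypothetical helper. Fixed windows can cut words in half, so splitting at silences is likely to work better in practice.

```python
def split_into_chunks(samples, sample_rate=16_000, chunk_seconds=15.0):
    """Split a 1-D sequence of audio samples into consecutive fixed-length chunks.

    The final chunk may be shorter than the others.
    """
    chunk = int(chunk_seconds * sample_rate)
    if chunk <= 0:
        raise ValueError("chunk_seconds * sample_rate must be at least one sample")
    return [samples[start:start + chunk] for start in range(0, len(samples), chunk)]


# Toy example: 35 "samples" at 10 Hz split into 1-second chunks.
toy = list(range(35))
chunks = split_into_chunks(toy, sample_rate=10, chunk_seconds=1.0)
print([len(c) for c in chunks])  # [10, 10, 10, 5]
```

Each chunk can then be written to its own audio file and transcribed separately, with the resulting texts joined in order.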
Architecture details
- Speech encoder output dimension: 3584
- Projector hidden dimension: 3072
- Decoder embedding dimension: 2560
- Decoder depth: 42 layers
- LoRA rank: 16
- LoRA targets: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- Speech-mode LoRA scale used by the release runtime: 20.0
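The effect of the LoRA rank and scale listed above can be illustrated with a single linear layer. This is a generic LoRA sketch, not the runtime's implementation: it assumes the scale multiplies the low-rank update directly, uses random matrices in place of trained weights, and picks an output width of 2560 purely for illustration (the actual projection sizes differ per target module).

```python
import numpy as np


def lora_linear(x, weight, lora_a, lora_b, scale):
    """Base projection plus a scaled low-rank update: x W + scale * (x A) B."""
    return x @ weight + scale * ((x @ lora_a) @ lora_b)


rng = np.random.default_rng(0)
d_in, d_out, rank = 2560, 2560, 16  # d_out is illustrative only
weight = rng.normal(scale=0.02, size=(d_in, d_out))
lora_a = rng.normal(scale=0.02, size=(d_in, rank))
lora_b = rng.normal(scale=0.02, size=(rank, d_out))
x = rng.normal(size=(4, d_in))

speech_out = lora_linear(x, weight, lora_a, lora_b, scale=20.0)  # speech mode
text_out = lora_linear(x, weight, lora_a, lora_b, scale=0.0)     # adapter disabled

# With scale 0.0 the layer reduces exactly to the base decoder projection.
print(np.allclose(text_out, x @ weight))

# Extra parameters added by one adapted module at this size.
extra_params = rank * (d_in + d_out)
print(extra_params)  # 81920
```

This is why scaling the adapter to `0.0` recovers plain text generation: the base weights are never modified, only supplemented.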
Gemma-4's per-layer embedding side channel is handled in the runtime by supplying explicit per-layer inputs for speech positions instead of forcing speech embeddings through token nearest-neighbor recovery.
Provenance and licenses
See PROVENANCE.md for the full chain of custody. Summary:
- Decoder: `google/gemma-4-E4B-it`, converted to MLX bfloat16; Gemma Terms of Use apply.
- Speech tower: `MERaLiON/MERaLiON-3-10B`; MERaLiON release terms apply.
- Training data source: `MERaLiON/Multitask-National-Speech-Corpus-v1`; MNSC terms apply.
- Projector + LoRA: trained alignment components for this composition; distributed with the same upstream obligations.
Internal optimization recipe and hardware details are intentionally omitted from the public package.
Citation
@misc{gemma4_meralion_bf16_speech_lora_mlx_2026,
title = {Gemma-4-E4B-BF16 + MERaLiON Speech LoRA for Singapore English (MLX)},
author = {majentik},
year = {2026},
url = {https://huggingface.co/majentik/gemma-4-e4b-mlx-elderwise-MERaLiON}
}
Related releases
- 8-bit sibling: `majentik/Gemma-4-E4B-MERaLiON-Speech-LoRA-MNSC-MLX` (smaller, 18.86% WER).
- This BF16 edition is the recommended release for best transcription quality.