NeMo Forced Aligner (NFA)
Try it out: HuggingFace Space | Tutorial: "How to use NFA?" | Blog post: "How does forced alignment work?"
NFA is a tool for generating token-, word- and segment-level timestamps of speech in audio using NeMo's CTC-based Automatic Speech Recognition (ASR) models. You can provide your own reference text, or use the ASR-generated transcription. You can use NeMo's ASR model checkpoints out of the box in 14+ languages, or train your own model. NFA can be used on long audio files of an hour or more (subject to your hardware and the ASR model used).
Quickstart
- Install NeMo.
- Prepare a NeMo-style manifest containing the paths of audio files you would like to process, and (optionally) their text.
- Run NFA's align.py script with the desired config, e.g.:

      python <path_to_NeMo>/tools/nemo_forced_aligner/align.py \
          pretrained_name="stt_en_fastconformer_hybrid_large_pc" \
          manifest_filepath=<path to manifest of utterances you want to align> \
          output_dir=<path to where your output files will be saved>
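
For the manifest step, here is a minimal Python sketch of writing a NeMo-style manifest. It assumes the usual NeMo convention of one JSON object per line, with an "audio_filepath" key and an optional "text" key. This README does not spell out those key names, so check them against the NFA documentation. The helper name write_manifest and the example paths are hypothetical and only serve the illustration.

```python
import json
from pathlib import Path


def write_manifest(entries, manifest_path):
    """Write one JSON object per line (JSON Lines) to manifest_path.

    Assumed keys: "audio_filepath" (required) and "text" (optional).
    """
    manifest_path = Path(manifest_path)
    with manifest_path.open("w", encoding="utf-8") as f:
        for entry in entries:
            record = {"audio_filepath": entry["audio_filepath"]}
            # Keep "text" only when a reference transcript is supplied;
            # without it, the ASR-generated transcription would be used.
            if entry.get("text"):
                record["text"] = entry["text"]
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return manifest_path


# Hypothetical example entries, for illustration only.
example_entries = [
    {"audio_filepath": "/data/audio/utt1.wav", "text": "hello world"},
    {"audio_filepath": "/data/audio/utt2.wav"},
]

if __name__ == "__main__":
    write_manifest(example_entries, "manifest.json")
```

The resulting file path is what you would pass as manifest_filepath in the command above.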
Documentation
More documentation is available in the NFA section of the NeMo documentation.