NeMo Forced Aligner (NFA)

Try it out: HuggingFace Space 🎤 | Tutorial: "How to use NFA?" 🚀 | Blog post: "How does forced alignment work?" 📚

NFA is a tool for generating token-, word- and segment-level timestamps of speech in audio using NeMo's CTC-based Automatic Speech Recognition models. You can provide your own reference text, or use an ASR-generated transcription. You can use NeMo's ASR Model checkpoints out of the box in 14+ languages, or train your own model. NFA can process long audio files of an hour or more (subject to your hardware and the ASR model used).

Quickstart

1. Install NeMo.
2. Prepare a NeMo-style manifest containing the paths of audio files you would like to process, and (optionally) their text.
3. Run NFA's align.py script with the desired config, e.g.:

python <path_to_NeMo>/tools/nemo_forced_aligner/align.py \
    pretrained_name="stt_en_fastconformer_hybrid_large_pc" \
    manifest_filepath=<path to manifest of utterances you want to align> \
    output_dir=<path to where your output files will be saved>
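
The manifest mentioned in the Quickstart is a text file with one JSON object per line. Below is a minimal sketch for writing one, assuming each entry uses an `audio_filepath` key and an optional `text` key. The `write_manifest` helper, the audio paths, and the transcripts are hypothetical placeholders for illustration and are not part of NeMo.

```python
import json
from pathlib import Path


def write_manifest(entries, manifest_path):
    """Write one JSON object per line (hypothetical helper, not a NeMo API).

    entries: iterable of (audio_filepath, text) pairs; text may be None
    when no reference text is available for that utterance.
    """
    manifest_path = Path(manifest_path)
    with manifest_path.open("w", encoding="utf-8") as f:
        for audio_filepath, text in entries:
            record = {"audio_filepath": str(audio_filepath)}
            if text is not None:
                # Reference text is optional; omit the key when it is missing.
                record["text"] = text
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return manifest_path


# Placeholder paths and transcripts; replace with your own data.
entries = [
    ("/data/audio/utt_001.wav", "hello world"),
    ("/data/audio/utt_002.wav", None),
]
manifest = write_manifest(entries, "nfa_manifest.json")
```

The resulting file path can then be passed as `manifest_filepath` in the command above.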
