
NeMo Forced Aligner (NFA)

Try it out: HuggingFace Space 🎀 | Tutorial: "How to use NFA?" 🚀 | Blog post: "How does forced alignment work?" 📚

NFA is a tool for generating token-, word- and segment-level timestamps of speech in audio using NeMo's CTC-based Automatic Speech Recognition (ASR) models. You can provide your own reference text or use an ASR-generated transcription. You can use NeMo's ASR model checkpoints out of the box in 14+ languages, or train your own model. NFA can also process long audio files of an hour or more (subject to your hardware and the ASR model used).

Quickstart

  1. Install NeMo.
  2. Prepare a NeMo-style manifest containing the paths of audio files you would like to process, and (optionally) their text.
  3. Run NFA's align.py script with the desired config, e.g.:
    python <path_to_NeMo>/tools/nemo_forced_aligner/align.py \
        pretrained_name="stt_en_fastconformer_hybrid_large_pc" \
        manifest_filepath=<path to manifest of utterances you want to align> \
        output_dir=<path to where your output files will be saved>
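
For step 2, a NeMo-style manifest is a text file with one JSON object per line. The sketch below shows one way to generate such a file in Python. It assumes the usual NeMo field names `audio_filepath` and `text`; the helper `write_manifest`, the example audio paths, and the output name `manifest.json` are hypothetical and only serve as illustration.

```python
import json


def write_manifest(entries, manifest_path):
    """Write a NeMo-style manifest: one JSON object per line.

    entries: iterable of (audio_filepath, text) pairs; text may be None
             when no reference text is available for that utterance.
    """
    with open(manifest_path, "w", encoding="utf-8") as f:
        for audio_filepath, text in entries:
            record = {"audio_filepath": str(audio_filepath)}
            if text is not None:
                # The reference text is optional (see step 2).
                record["text"] = text
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Hypothetical utterances: the first has reference text, the second does not.
entries = [
    ("/data/audio/utt1.wav", "hello world"),
    ("/data/audio/utt2.wav", None),
]
write_manifest(entries, "manifest.json")
```

The resulting file can then be passed as `manifest_filepath` in the command above. Utterances without a `text` field rely on the ASR-generated transcription instead of your own reference text.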
    

Documentation

More documentation is available here.