Automatic Speech Recognition
Transformers
PyTorch
TensorFlow
JAX
TensorBoard
ONNX
Safetensors
whisper
audio
asr
hf-asr-leaderboard
Instructions to use NbAiLabBeta/nb-whisper-base-semantic with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use NbAiLabBeta/nb-whisper-base-semantic with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("automatic-speech-recognition", model="NbAiLabBeta/nb-whisper-base-semantic")# Load model directly from transformers import AutoProcessor, AutoModelForMultimodalLM processor = AutoProcessor.from_pretrained("NbAiLabBeta/nb-whisper-base-semantic") model = AutoModelForMultimodalLM.from_pretrained("NbAiLabBeta/nb-whisper-base-semantic") - Notebooks
- Google Colab
- Kaggle
updated template
Browse files
README.md
CHANGED
|
@@ -114,23 +114,26 @@ asr("king.mp3", generate_kwargs={'task': 'transcribe', 'language': 'no'})
|
|
| 114 |
</details>
|
| 115 |
|
| 116 |
#### Extended HuggingFace
|
| 117 |
-
Examining the output above, we see that there are multiple repetitions at the end. This is because the
|
| 118 |
|
| 119 |
```python
|
| 120 |
# Long Transcripts
|
| 121 |
-
asr("king.mp3", chunk_length_s=
|
|
|
|
|
|
|
|
|
|
| 122 |
|
| 123 |
# Return Timestamps
|
| 124 |
-
asr("king.mp3", chunk_length_s=
|
| 125 |
|
| 126 |
# Return Word Level Timestamps
|
| 127 |
-
asr("king.mp3", chunk_length_s=
|
| 128 |
|
| 129 |
# Transcribe to Nynorsk
|
| 130 |
-
asr("king.mp3", chunk_length_s=
|
| 131 |
|
| 132 |
# Transcribe to English
|
| 133 |
-
asr("king.mp3", chunk_length_s=
|
| 134 |
|
| 135 |
```
|
| 136 |
<details>
|
|
|
|
| 114 |
</details>
|
| 115 |
|
| 116 |
#### Extended HuggingFace
|
| 117 |
+
Examining the output above, we see that there are multiple repetitions at the end. This is because the video is longer than 30 seconds. By passing the ```chunk_lengt_s``` argument, we can transcribe longer file. Our experience is that we get slightly better result by setting that to 28 seconds instead of the default 30 seconds. We also recommend setting the beam size to 5 if possible. This greatly increases the accuracy but takes a bit longer and requires slightly more memory. The examples below also illustrates how to transcribe to English or Nynorsk, and how to get timestamps for sentences and words.
|
| 118 |
|
| 119 |
```python
|
| 120 |
# Long Transcripts
|
| 121 |
+
asr("king.mp3", chunk_length_s=28, generate_kwargs={'task': 'transcribe', 'language': 'no'})
|
| 122 |
+
|
| 123 |
+
# Increase accuracy by setting beam size to 5
|
| 124 |
+
asr("king.mp3", chunk_length_s=28, return_timestamps=True, generate_kwargs={'num_beams': 5, 'task': 'transcribe', 'language': 'no'})
|
| 125 |
|
| 126 |
# Return Timestamps
|
| 127 |
+
asr("king.mp3", chunk_length_s=28, return_timestamps=True, generate_kwargs={'task': 'transcribe', 'language': 'no'})
|
| 128 |
|
| 129 |
# Return Word Level Timestamps
|
| 130 |
+
asr("king.mp3", chunk_length_s=28, return_timestamps="word", generate_kwargs={'task': 'transcribe', 'language': 'no'})
|
| 131 |
|
| 132 |
# Transcribe to Nynorsk
|
| 133 |
+
asr("king.mp3", chunk_length_s=28, generate_kwargs={'task': 'transcribe', 'language': 'nn'})
|
| 134 |
|
| 135 |
# Transcribe to English
|
| 136 |
+
asr("king.mp3", chunk_length_s=28, generate_kwargs={'task': 'transcribe', 'language': 'en'})
|
| 137 |
|
| 138 |
```
|
| 139 |
<details>
|