Update README.md
README.md
CHANGED
@@ -112,8 +112,8 @@ wav = model.generate(
      prompt_text=None, # optional: reference text
      cfg_value=2.0, # LM guidance on LocDiT; higher values adhere more closely to the prompt, possibly at the cost of quality
      inference_timesteps=10, # LocDiT inference timesteps; higher for better results, lower for faster generation
-     normalize=
-     denoise=
      retry_badcase=True, # enable retry mode for some bad cases (e.g., generation that fails to stop)
      retry_badcase_max_times=3, # maximum number of retries
      retry_badcase_ratio_threshold=6.0, # maximum length ratio for bad-case detection (simple but effective); can be adjusted for slow-paced speech
@@ -148,14 +148,14 @@ voxcpm --text "VoxCPM is an innovative end-to-end TTS model from ModelBest, desi
      --prompt-audio path/to/voice.wav \
      --prompt-text "reference transcript" \
      --output out.wav \
-     --denoise

  # (Optional) Voice cloning (reference audio + transcript file)
  voxcpm --text "VoxCPM is an innovative end-to-end TTS model from ModelBest, designed to generate highly expressive speech." \
      --prompt-audio path/to/voice.wav \
      --prompt-file "/path/to/text-file" \
      --output out.wav \
-     --denoise

  # 3) Batch processing (one text per line)
  voxcpm --input examples/input.txt --output-dir outs
@@ -163,7 +163,7 @@ voxcpm --input examples/input.txt --output-dir outs
  voxcpm --input examples/input.txt --output-dir outs \
      --prompt-audio path/to/voice.wav \
      --prompt-text "reference transcript" \
-     --denoise

  # 4) Inference parameters (quality/speed)
  voxcpm --text "..." --output out.wav \
@@ -216,29 +216,38 @@ First, choose how you’d like to input your text:.
  - ✅ Keep "Text Normalization" ON. Type naturally (e.g., "Hello, world! 123"). The system will automatically process numbers, abbreviations, and punctuation using the WeTextProcessing library.
  2. Phoneme Input (Native Mode)
  - ❌ Turn "Text Normalization" OFF. Enter phoneme text like {HH AH0 L OW1} (EN) or {ni3}{hao3} (ZH) for precise pronunciation control. In this mode, VoxCPM also supports native understanding of other complex non-normalized text. Try it out!

  ---
  ### 🍳 Step 2: Choose Your Flavor Profile (Voice Style)

  This is the secret sauce that gives your audio its unique sound.
-
-
-
-
-
-
-

  ---
  ### 🧂 Step 3: The Final Seasoning (Fine-Tuning Your Results)
  You're ready to serve! But for master chefs who want to tweak the flavor, here are two key spices.
-
-
-
-
- -
-
-

  ---
  Happy creating! 🎉 Start with the default settings and tweak from there to suit your project. The kitchen is yours!
@@ -259,4 +268,3 @@ Happy creating! 🎉 Start with the default settings and tweak from there to sui
  ## 📄 License
  The VoxCPM model weights and code are open-sourced under the Apache-2.0 license.

-
@@ -112,8 +112,8 @@ wav = model.generate(
      prompt_text=None, # optional: reference text
      cfg_value=2.0, # LM guidance on LocDiT; higher values adhere more closely to the prompt, possibly at the cost of quality
      inference_timesteps=10, # LocDiT inference timesteps; higher for better results, lower for faster generation
+     normalize=False, # enable the external text-normalization (TN) tool; this disables native raw-text support
+     denoise=False, # enable the external denoising tool; it may cause some distortion and restricts the sampling rate to 16 kHz
      retry_badcase=True, # enable retry mode for some bad cases (e.g., generation that fails to stop)
      retry_badcase_max_times=3, # maximum number of retries
      retry_badcase_ratio_threshold=6.0, # maximum length ratio for bad-case detection (simple but effective); can be adjusted for slow-paced speech
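The three `retry_badcase*` parameters above describe a length-based sanity check with a bounded number of retries. The following is a minimal sketch of that idea, not VoxCPM's actual implementation: `generate_with_retry`, `generate_once`, and `num_text_units` are hypothetical names, and the assumption is that an attempt counts as a bad case when its output length exceeds the threshold times the input text length.

```python
from typing import Callable, List


def generate_with_retry(
    generate_once: Callable[[], List[float]],  # hypothetical: one synthesis attempt
    num_text_units: int,                       # hypothetical: length of the input text
    ratio_threshold: float = 6.0,
    max_retries: int = 3,
) -> List[float]:
    """Retry while the output is longer than ratio_threshold * num_text_units."""
    output = generate_once()
    for _ in range(max_retries):
        if len(output) <= ratio_threshold * num_text_units:
            break  # length looks plausible, accept it
        output = generate_once()  # suspiciously long output: try again
    return output


# Fake generator: two runaway attempts, then a reasonable one.
attempts = iter([[0.0] * 100, [0.0] * 100, [0.0] * 20])
result = generate_with_retry(lambda: next(attempts), num_text_units=5)
print(len(result))  # 20
```

Under this reading, slow-paced speech produces more audio per unit of text, which is why a larger threshold may be needed to avoid rejecting valid output.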
@@ -148,14 +148,14 @@ voxcpm --text "VoxCPM is an innovative end-to-end TTS model from ModelBest, desi
      --prompt-audio path/to/voice.wav \
      --prompt-text "reference transcript" \
      --output out.wav \
+     # --denoise

  # (Optional) Voice cloning (reference audio + transcript file)
  voxcpm --text "VoxCPM is an innovative end-to-end TTS model from ModelBest, designed to generate highly expressive speech." \
      --prompt-audio path/to/voice.wav \
      --prompt-file "/path/to/text-file" \
      --output out.wav \
+     # --denoise

  # 3) Batch processing (one text per line)
  voxcpm --input examples/input.txt --output-dir outs
@@ -163,7 +163,7 @@ voxcpm --input examples/input.txt --output-dir outs
  voxcpm --input examples/input.txt --output-dir outs \
      --prompt-audio path/to/voice.wav \
      --prompt-text "reference transcript" \
+     # --denoise

  # 4) Inference parameters (quality/speed)
  voxcpm --text "..." --output out.wav \
@@ -216,29 +216,38 @@ First, choose how you’d like to input your text:.
  - ✅ Keep "Text Normalization" ON. Type naturally (e.g., "Hello, world! 123"). The system will automatically process numbers, abbreviations, and punctuation using the WeTextProcessing library.
  2. Phoneme Input (Native Mode)
  - ❌ Turn "Text Normalization" OFF. Enter phoneme text like {HH AH0 L OW1} (EN) or {ni3}{hao3} (ZH) for precise pronunciation control. In this mode, VoxCPM also supports native understanding of other complex non-normalized text. Try it out!
+ - **Phoneme Conversion**: For Chinese, phonemes are converted using pinyin. For English, phonemes are converted using CMUDict. Please refer to the relevant documentation for more details.
+
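The brace notation in the phoneme examples can be produced with two small helpers. This is a sketch inferred only from the two examples above: it assumes one brace group per English word (space-separated CMUDict symbols) and one brace group per Chinese syllable (tone-numbered pinyin). The helper names `en_phonemes` and `zh_pinyin` are hypothetical and not part of VoxCPM.

```python
from typing import Iterable


def en_phonemes(symbols: Iterable[str]) -> str:
    """Wrap one English word's CMUDict-style symbols in a single pair of braces."""
    return "{" + " ".join(symbols) + "}"


def zh_pinyin(syllables: Iterable[str]) -> str:
    """Wrap each tone-numbered pinyin syllable in its own pair of braces."""
    return "".join("{" + syllable + "}" for syllable in syllables)


print(en_phonemes(["HH", "AH0", "L", "OW1"]))  # {HH AH0 L OW1}
print(zh_pinyin(["ni3", "hao3"]))              # {ni3}{hao3}
```

The resulting strings would then be passed as the input text with "Text Normalization" turned OFF, as described above.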

  ---
  ### 🍳 Step 2: Choose Your Flavor Profile (Voice Style)

  This is the secret sauce that gives your audio its unique sound.
+
+ #### 1. Cooking with a Prompt Speech (Following a Famous Recipe)
+ - A prompt speech provides the desired acoustic characteristics for VoxCPM. The speaker's timbre, speaking style, and even the background sounds and ambiance will be replicated.
+ - **For a Clean, Studio-Quality Voice:**
+ - ✅ Enable "Prompt Speech Enhancement". This acts like a noise filter, removing background hiss and rumble to give you a pure, clean voice clone.
+
+ #### 2. Cooking au Naturel (Letting the Model Improvise)
+ - If no reference is provided, VoxCPM becomes a creative chef! It will infer a fitting speaking style based on the text itself, thanks to the text-smartness of its foundation model, MiniCPM-4.
+ - **Pro Tip**: Challenge VoxCPM with any text (poetry, song lyrics, dramatic monologues); it may deliver some interesting results!

  ---
  ### 🧂 Step 3: The Final Seasoning (Fine-Tuning Your Results)
+
  You're ready to serve! But for master chefs who want to tweak the flavor, here are two key spices.
+
+ #### CFG Value (How Closely to Follow the Recipe)
+ - **Default**: A great starting point.
+ - **Voice sounds strained or weird?** Lower this value. It tells the model to be more relaxed and improvisational, great for expressive prompts.
+ - **Need maximum clarity and adherence to the text?** Raise it slightly to keep the model on a tighter leash.
+ - **Short sentences?** Consider increasing the CFG value for better clarity and adherence.
+ - **Long texts?** Consider lowering the CFG value to improve stability and naturalness over extended passages.
+
+ #### Inference Timesteps (Simmering Time: Quality vs. Speed)
+ - **Need a quick snack?** Use a lower number. Perfect for fast drafts and experiments.
+ - **Cooking a gourmet meal?** Use a higher number. This lets the model "simmer" longer, refining the audio for superior detail and naturalness.
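The tuning advice above can be turned into a small helper that picks starting values. This is only a sketch: the defaults (`cfg_value=2.0`, `inference_timesteps=10`) come from the `generate()` call shown in this README, while the length thresholds, the step of 0.5, the preset names, and the `suggest_settings` helper itself are illustrative assumptions rather than recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    cfg_value: float
    inference_timesteps: int


# Defaults as shown in the generate() call in this README.
DEFAULT = Settings(cfg_value=2.0, inference_timesteps=10)


def suggest_settings(text: str, quality: str = "default") -> Settings:
    """Pick starting values from text length and a quality preset.

    All thresholds and step sizes below are illustrative assumptions.
    """
    cfg = DEFAULT.cfg_value
    if len(text) < 40:       # short sentence: raise CFG for clarity
        cfg += 0.5
    elif len(text) > 400:    # long text: lower CFG for stability
        cfg -= 0.5

    # "draft" trades quality for speed; "high" lets the model "simmer" longer.
    presets = {"draft": 5, "default": DEFAULT.inference_timesteps, "high": 20}
    return Settings(cfg_value=cfg, inference_timesteps=presets[quality])


print(suggest_settings("Hello there."))
# Settings(cfg_value=2.5, inference_timesteps=10)
print(suggest_settings("word " * 200, quality="high"))
# Settings(cfg_value=1.5, inference_timesteps=20)
```

Treat the returned values as a first guess and adjust by ear, since the right settings depend on the prompt and the text.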

  ---
  Happy creating! 🎉 Start with the default settings and tweak from there to suit your project. The kitchen is yours!
@@ -259,4 +268,3 @@ Happy creating! 🎉 Start with the default settings and tweak from there to sui
  ## 📄 License
  The VoxCPM model weights and code are open-sourced under the Apache-2.0 license.
