Check out our demo video in Chinese: Bilibili Demo
https://github.com/RVC-Boss/GPT-SoVITS/assets/129054828/05bee1fa-bdd8-4d85-9350-80c060ab47fb
Zero-shot TTS: Input a 5-second vocal sample and experience instant text-to-speech conversion.
Few-shot TTS: Fine-tune the model with just 1 minute of training data for improved voice similarity and realism.
Cross-lingual Support: Inference in languages different from the training dataset, currently supporting English, Japanese, and Chinese.
WebUI Tools: Integrated tools include voice accompaniment separation, automatic training set segmentation, Chinese ASR, and text labeling, assisting beginners in creating training datasets and GPT/SoVITS models.
Tested with Python 3.9, PyTorch 2.0.1, and CUDA 11.
conda create -n GPTSoVits python=3.9
conda activate GPTSoVits
bash install.sh
pip install torch numpy scipy tensorboard librosa==0.9.2 numba==0.56.4 pytorch-lightning gradio==3.14.0 ffmpeg-python onnxruntime tqdm cn2an pypinyin pyopenjtalk g2p_en chardet
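After running the pip command above, a quick stdlib-only check can confirm the dependencies actually landed in the active environment (the `missing_packages` helper and the sampled package list are illustrative, not part of the repo):

```python
import importlib.util

# Illustrative helper (not part of GPT-SoVITS): list which of the given
# importable module names cannot be found in the current environment.
def missing_packages(names):
    return [n for n in names if importlib.util.find_spec(n) is None]

# A few of the modules installed by the pip command above.
print(missing_packages(["numpy", "scipy", "librosa", "gradio", "tqdm"]))
```

If the printed list is non-empty, re-run the pip command inside the activated `GPTSoVits` environment.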
If you need Chinese ASR (supported by FunASR), install:
pip install modelscope torchaudio sentencepiece funasr
Conda users:
conda install ffmpeg

Ubuntu/Debian users:
sudo apt install ffmpeg
sudo apt install libsox-dev
conda install -c conda-forge 'ffmpeg<7'

macOS users:
brew install ffmpeg

Windows users: download ffmpeg.exe and ffprobe.exe and place them in the GPT-SoVITS root.
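Whichever route you take, GPT-SoVITS shells out to ffmpeg, so both binaries must be reachable on PATH. A minimal stdlib check (the helper name is ours, not the repo's):

```python
import shutil

# Illustrative check (not part of GPT-SoVITS): report which of the required
# command-line tools cannot be found on PATH.
def missing_tools(names):
    return [n for n in names if shutil.which(n) is None]

print(missing_tools(["ffmpeg", "ffprobe"]))
```

An empty list means both tools were found; otherwise, revisit the install step for your platform.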
Download pretrained models from GPT-SoVITS Models and place them in GPT_SoVITS\pretrained_models.
For Chinese ASR, download models from Damo ASR Models and place them in tools/damo_asr/models.
For UVR5 (Vocals/Accompaniment Separation & Reverberation Removal), download models from UVR5 Weights and place them in tools/uvr5/uvr5_weights.
The TTS annotation .list file format:
vocal_path|speaker_name|language|text
Example:
D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
Language dictionary:
'zh': Chinese
'ja': Japanese
'en': English
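A small sketch of how one annotation line can be split into its four fields (this parser is written for this README, not taken from the repo; the language-code set assumes the three supported languages named above):

```python
# Illustrative parser for the annotation format described above
# (vocal_path|speaker_name|language|text); not part of the GPT-SoVITS codebase.
VALID_LANGS = {"zh", "ja", "en"}  # assumed from the supported languages

def parse_list_line(line):
    # Split on at most 3 pipes so that '|' characters inside the text survive.
    vocal_path, speaker_name, language, text = line.rstrip("\n").split("|", 3)
    if language not in VALID_LANGS:
        raise ValueError(f"unknown language code: {language!r}")
    return {"vocal_path": vocal_path, "speaker_name": speaker_name,
            "language": language, "text": text}

print(parse_list_line("D:\\GPT-SoVITS\\xxx/xxx.wav|xxx|en|I like playing Genshin."))
```

Splitting with `split("|", 3)` keeps any extra pipe characters as part of the transcript text rather than breaking the record.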
Special thanks to the following projects and contributors: