Check out our demo video in Chinese: Bilibili Demo
https://github.com/RVC-Boss/GPT-SoVITS/assets/129054828/05bee1fa-bdd8-4d85-9350-80c060ab47fb
Zero-shot TTS: Input a 5-second vocal sample and experience instant text-to-speech conversion.
Few-shot TTS: Fine-tune the model with just 1 minute of training data for improved voice similarity and realism.
Cross-lingual Support: Inference in languages different from the training dataset, currently supporting English, Japanese, and Chinese.
WebUI Tools: Integrated tools include voice accompaniment separation, automatic training set segmentation, Chinese ASR, and text labeling, assisting beginners in creating training datasets and GPT/SoVITS models.
Tested with Python 3.9, PyTorch 2.0.1, and CUDA 11.
conda create -n GPTSoVits python=3.9
conda activate GPTSoVits
bash install.sh
pip install torch numpy scipy tensorboard librosa==0.9.2 numba==0.56.4 pytorch-lightning gradio==3.14.0 ffmpeg-python onnxruntime tqdm cn2an pypinyin pyopenjtalk g2p_en chardet
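After running the pip command above, a quick stdlib-only check can confirm the dependencies actually landed in the active environment (the `missing_packages` helper and the sampled package list are illustrative, not part of the repo):

```python
import importlib.util

# Illustrative helper (not part of GPT-SoVITS): list which of the given
# importable module names cannot be found in the current environment.
def missing_packages(names):
    return [n for n in names if importlib.util.find_spec(n) is None]

# A few of the modules installed by the pip command above.
print(missing_packages(["numpy", "scipy", "librosa", "gradio", "tqdm"]))
```

If the printed list is non-empty, re-run the pip command inside the activated `GPTSoVits` environment.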
If you need Chinese ASR (supported by FunASR), install:
pip install modelscope torchaudio sentencepiece funasr
Conda users:
conda install ffmpeg

Ubuntu/Debian users:
sudo apt install ffmpeg
sudo apt install libsox-dev
conda install -c conda-forge 'ffmpeg<7'

macOS users:
brew install ffmpeg

Windows users: download ffmpeg.exe and ffprobe.exe and place them in the GPT-SoVITS root.
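Whichever route you take, GPT-SoVITS shells out to ffmpeg, so both binaries must be reachable on PATH. A minimal stdlib check (the helper name is ours, not the repo's):

```python
import shutil

# Illustrative check (not part of GPT-SoVITS): report which of the required
# command-line tools cannot be found on PATH.
def missing_tools(names):
    return [n for n in names if shutil.which(n) is None]

print(missing_tools(["ffmpeg", "ffprobe"]))
```

An empty list means both tools were found; otherwise, revisit the install step for your platform.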
Download pretrained models from GPT-SoVITS Models and place them in GPT_SoVITS\pretrained_models.
For Chinese ASR, download models from Damo ASR Models and place them in tools/damo_asr/models.
For UVR5 (Vocals/Accompaniment Separation & Reverberation Removal), download models from UVR5 Weights and place them in tools/uvr5/uvr5_weights.
The TTS annotation .list file format:
vocal_path|speaker_name|language|text
Example:
D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
Language dictionary:
'zh': Chinese
'ja': Japanese
'en': English
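A small sketch of how one annotation line can be split into its four fields (this parser is written for this README, not taken from the repo; the language-code set assumes the three supported languages named above):

```python
# Illustrative parser for the annotation format described above
# (vocal_path|speaker_name|language|text); not part of the GPT-SoVITS codebase.
VALID_LANGS = {"zh", "ja", "en"}  # assumed from the supported languages

def parse_list_line(line):
    # Split on at most 3 pipes so that '|' characters inside the text survive.
    vocal_path, speaker_name, language, text = line.rstrip("\n").split("|", 3)
    if language not in VALID_LANGS:
        raise ValueError(f"unknown language code: {language!r}")
    return {"vocal_path": vocal_path, "speaker_name": speaker_name,
            "language": language, "text": text}

print(parse_list_line("D:\\GPT-SoVITS\\xxx/xxx.wav|xxx|en|I like playing Genshin."))
```

Splitting with `split("|", 3)` keeps any extra pipe characters as part of the transcript text rather than breaking the record.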
Special thanks to the following projects and contributors: