Fix errors in mkdocs.yml and ipynb (#988)

PoTaTo 2025-06-03 13:02:38 +08:00, committed by GitHub
parent b8369249e3
commit 3d31a80ad1
18 changed files with 748 additions and 748 deletions

docs/en/index.md

@ -1,108 +1,50 @@
# Inference
# Introduction
As the vocoder model has been changed, you need more VRAM than before; 12GB is recommended for fluent inference.
<div>
<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
</a>
</div>
We support command line, HTTP API, and WebUI inference; you can choose whichever method you like.
!!! warning
We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area. <br/>
This codebase is released under the Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
## Download Weights
## Requirements
First you need to download the model weights:
- GPU Memory: 12GB (Inference)
- System: Linux, Windows
## Setup
First, we need to create a conda environment to install the packages.
```bash
huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
conda create -n fish-speech python=3.12
conda activate fish-speech
sudo apt-get install portaudio19-dev # For pyaudio
pip install -e . # This will install all remaining packages.
sudo apt install libsox-dev ffmpeg # If needed.
```
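As a quick sanity check of the new environment (a minimal sketch; it only assumes PyTorch was pulled in by `pip install -e .`):
```bash
# Confirm the interpreter works and that PyTorch can see a GPU.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```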
## Command Line Inference
!!! warning
The `compile` option is not supported on Windows or macOS. If you want to run with `compile`, you need to install Triton yourself.
!!! note
If you plan to let the model randomly choose a voice timbre, you can skip this step.
## Acknowledgements
### 1. Get VQ tokens from reference audio
```bash
python fish_speech/models/dac/inference.py \
-i "ref_audio_name.wav" \
--checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
```
You should get a `fake.npy` and a `fake.wav`.
### 2. Generate semantic tokens from text:
```bash
python fish_speech/models/text2semantic/inference.py \
--text "The text you want to convert" \
--prompt-text "Your reference text" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini" \
--num-samples 2 \
--compile # if you want faster inference
```
This command will create a `codes_N` file in the working directory, where N is an integer starting from 0.
!!! note
You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/second -> ~500 tokens/second).
Conversely, if you do not plan to use acceleration, you can omit the `--compile` flag.
!!! info
For GPUs that do not support bf16, you may need to use the `--half` parameter.
### 3. Generate vocals from semantic tokens:
#### VQGAN Decoder
!!! warning "Future Warning"
We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
```bash
python fish_speech/models/dac/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
```
## HTTP API Inference
We provide an HTTP API for inference. You can use the following command to start the server:
```bash
python -m tools.api_server \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
> If you want to speed up inference, you can add the `--compile` parameter.
After that, you can view and test the API at http://127.0.0.1:8080/.
## GUI Inference
[Download client](https://github.com/AnyaCoder/fish-speech-gui/releases)
## WebUI Inference
You can start the WebUI using the following command:
```bash
python -m tools.run_webui \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
Or simply
```bash
python -m tools.run_webui
```
> If you want to speed up inference, you can add the `--compile` parameter.
!!! note
You can save label files and reference audio files in advance in the `references` folder in the main directory (which you need to create yourself), so that you can use them directly in the WebUI.
!!! note
You can use Gradio environment variables, such as `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, and `GRADIO_SERVER_NAME`, to configure the WebUI.
Enjoy!
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [Transformers](https://github.com/huggingface/transformers)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

docs/en/inference.md (new file, +108)

@ -0,0 +1,108 @@
# Inference
As the vocoder model has been changed, you need more VRAM than before; 12GB is recommended for fluent inference.
We support command line, HTTP API, and WebUI inference; you can choose whichever method you like.
## Download Weights
First you need to download the model weights:
```bash
huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
```
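To confirm the download completed, you can list the checkpoint directory; the steps below only assume that `codec.pth` is among the downloaded files:
```bash
# The decoder weights used throughout this guide live here.
ls -lh checkpoints/openaudio-s1-mini
```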
## Command Line Inference
!!! note
If you plan to let the model randomly choose a voice timbre, you can skip this step.
### 1. Get VQ tokens from reference audio
```bash
python fish_speech/models/dac/inference.py \
-i "ref_audio_name.wav" \
--checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
```
You should get a `fake.npy` and a `fake.wav`.
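If you want to sanity-check the extracted tokens, `fake.npy` is an ordinary NumPy file (the exact array shape depends on the codec configuration and the audio length, so treat this as a sketch):
```bash
# Inspect the VQ tokens extracted from the reference audio.
python -c "import numpy as np; t = np.load('fake.npy'); print(t.shape, t.dtype)"
```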
### 2. Generate semantic tokens from text:
```bash
python fish_speech/models/text2semantic/inference.py \
--text "The text you want to convert" \
--prompt-text "Your reference text" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini" \
--num-samples 2 \
--compile # if you want faster inference
```
This command will create a `codes_N` file in the working directory, where N is an integer starting from 0.
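For example, with `--num-samples 2` you would expect two token files, assuming one file per sample:
```bash
ls codes_*.npy  # e.g. codes_0.npy and codes_1.npy
```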
!!! note
You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/second -> ~500 tokens/second).
Conversely, if you do not plan to use acceleration, you can omit the `--compile` flag.
!!! info
For GPUs that do not support bf16, you may need to use the `--half` parameter.
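If you are unsure whether your GPU supports bf16, PyTorch can tell you directly:
```bash
# If this prints False, add --half to the command above.
python -c "import torch; print(torch.cuda.is_bf16_supported())"
```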
### 3. Generate vocals from semantic tokens:
#### VQGAN Decoder
!!! warning "Future Warning"
We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
```bash
python fish_speech/models/dac/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
```
## HTTP API Inference
We provide an HTTP API for inference. You can use the following command to start the server:
```bash
python -m tools.api_server \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
> If you want to speed up inference, you can add the `--compile` parameter.
After that, you can view and test the API at http://127.0.0.1:8080/.
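As a rough sketch of a client call (the endpoint path and JSON fields below are assumptions for illustration, not the confirmed schema; check the interactive docs page above for the actual API):
```bash
# Hypothetical TTS request; verify the route and payload on the docs page first.
curl -X POST http://127.0.0.1:8080/v1/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from the HTTP API."}' \
  --output output.wav
```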
## GUI Inference
[Download client](https://github.com/AnyaCoder/fish-speech-gui/releases)
## WebUI Inference
You can start the WebUI using the following command:
```bash
python -m tools.run_webui \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
Or simply
```bash
python -m tools.run_webui
```
> If you want to speed up inference, you can add the `--compile` parameter.
!!! note
You can save label files and reference audio files in advance in the `references` folder in the main directory (which you need to create yourself), so that you can use them directly in the WebUI.
!!! note
You can use Gradio environment variables, such as `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, and `GRADIO_SERVER_NAME`, to configure the WebUI.
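For example, to expose the WebUI on all interfaces and a fixed port (7860 is just an illustrative choice):
```bash
GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7860 python -m tools.run_webui
```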
Enjoy!

docs/en/install.md (deleted)

@ -1,50 +0,0 @@
# Introduction
<div>
<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
</a>
</div>
!!! warning
We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area. <br/>
This codebase is released under the Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
## Requirements
- GPU Memory: 12GB (Inference)
- System: Linux, Windows
## Setup
First, we need to create a conda environment to install the packages.
```bash
conda create -n fish-speech python=3.12
conda activate fish-speech
sudo apt-get install portaudio19-dev # For pyaudio
pip install -e . # This will install all remaining packages.
sudo apt install libsox-dev ffmpeg # If needed.
```
!!! warning
The `compile` option is not supported on Windows or macOS. If you want to run with `compile`, you need to install Triton yourself.
## Acknowledgements
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [Transformers](https://github.com/huggingface/transformers)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

docs/ja/index.md

@ -1,107 +1,50 @@
# Inference
# Introduction
As the vocoder model has been changed, you need more VRAM than before; 12GB is recommended for fluent inference.
<div>
<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
</a>
</div>
We support command line, HTTP API, and WebUI inference; you can choose whichever method you like.
!!! warning
We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area. <br/>
This codebase is released under the Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
## Download Weights
## Requirements
First you need to download the model weights:
- GPU Memory: 12GB (Inference)
- System: Linux, Windows
## Setup
First, we need to create a conda environment to install the packages.
```bash
huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
conda create -n fish-speech python=3.12
conda activate fish-speech
sudo apt-get install portaudio19-dev # For pyaudio
pip install -e . # This will install all remaining packages.
sudo apt install libsox-dev ffmpeg # If needed.
```
## Command Line Inference
!!! warning
The `compile` option is not supported on Windows or macOS. If you want to run with `compile`, you need to install Triton yourself.
!!! note
If you plan to let the model randomly choose a voice timbre, you can skip this step.
## Acknowledgements
### 1. Get VQ tokens from reference audio
```bash
python fish_speech/models/dac/inference.py \
-i "ref_audio_name.wav" \
--checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
```
You should get a `fake.npy` and a `fake.wav`.
### 2. Generate semantic tokens from text:
```bash
python fish_speech/models/text2semantic/inference.py \
--text "The text you want to convert" \
--prompt-text "Your reference text" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini" \
--num-samples 2 \
--compile # if you want faster inference
```
This command will create a `codes_N` file in the working directory, where N is an integer starting from 0.
!!! note
You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/second -> ~500 tokens/second).
Conversely, if you do not plan to use acceleration, you can omit the `--compile` flag.
!!! info
For GPUs that do not support bf16, you may need to use the `--half` parameter.
### 3. Generate vocals from semantic tokens:
#### VQGAN Decoder
!!! warning "Future Warning"
We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
```bash
python fish_speech/models/dac/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
```
## HTTP API Inference
We provide an HTTP API for inference. You can use the following command to start the server:
```bash
python -m tools.api_server \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
> If you want to speed up inference, you can add the `--compile` parameter.
After that, you can view and test the API at http://127.0.0.1:8080/.
## GUI Inference
[Download client](https://github.com/AnyaCoder/fish-speech-gui/releases)
## WebUI Inference
You can start the WebUI using the following command:
```bash
python -m tools.run_webui \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
Or simply
```bash
python -m tools.run_webui
```
> If you want to speed up inference, you can add the `--compile` parameter.
!!! note
You can save label files and reference audio files in advance in the `references` folder in the main directory (which you need to create yourself), so that you can use them directly in the WebUI.
!!! note
You can use Gradio environment variables, such as `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, and `GRADIO_SERVER_NAME`, to configure the WebUI.
Enjoy!
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [Transformers](https://github.com/huggingface/transformers)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

docs/ja/inference.md (new file, +107)

@ -0,0 +1,107 @@
# Inference
As the vocoder model has been changed, you need more VRAM than before; 12GB is recommended for fluent inference.
We support command line, HTTP API, and WebUI inference; you can choose whichever method you like.
## Download Weights
First you need to download the model weights:
```bash
huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
```
## Command Line Inference
!!! note
If you plan to let the model randomly choose a voice timbre, you can skip this step.
### 1. Get VQ tokens from reference audio
```bash
python fish_speech/models/dac/inference.py \
-i "ref_audio_name.wav" \
--checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
```
You should get a `fake.npy` and a `fake.wav`.
### 2. Generate semantic tokens from text:
```bash
python fish_speech/models/text2semantic/inference.py \
--text "The text you want to convert" \
--prompt-text "Your reference text" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini" \
--num-samples 2 \
--compile # if you want faster inference
```
This command will create a `codes_N` file in the working directory, where N is an integer starting from 0.
!!! note
You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/second -> ~500 tokens/second).
Conversely, if you do not plan to use acceleration, you can omit the `--compile` flag.
!!! info
For GPUs that do not support bf16, you may need to use the `--half` parameter.
### 3. Generate vocals from semantic tokens:
#### VQGAN Decoder
!!! warning "Future Warning"
We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
```bash
python fish_speech/models/dac/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
```
## HTTP API Inference
We provide an HTTP API for inference. You can use the following command to start the server:
```bash
python -m tools.api_server \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
> If you want to speed up inference, you can add the `--compile` parameter.
After that, you can view and test the API at http://127.0.0.1:8080/.
## GUI Inference
[Download client](https://github.com/AnyaCoder/fish-speech-gui/releases)
## WebUI Inference
You can start the WebUI using the following command:
```bash
python -m tools.run_webui \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
Or simply
```bash
python -m tools.run_webui
```
> If you want to speed up inference, you can add the `--compile` parameter.
!!! note
You can save label files and reference audio files in advance in the `references` folder in the main directory (which you need to create yourself), so that you can use them directly in the WebUI.
!!! note
You can use Gradio environment variables, such as `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, and `GRADIO_SERVER_NAME`, to configure the WebUI.
Enjoy!

docs/ja/install.md (deleted)

@ -1,50 +0,0 @@
# Introduction
<div>
<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
</a>
</div>
!!! warning
We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area. <br/>
This codebase is released under the Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
## Requirements
- GPU Memory: 12GB (Inference)
- System: Linux, Windows
## Setup
First, we need to create a conda environment to install the packages.
```bash
conda create -n fish-speech python=3.12
conda activate fish-speech
sudo apt-get install portaudio19-dev # For pyaudio
pip install -e . # This will install all remaining packages.
sudo apt install libsox-dev ffmpeg # If needed.
```
!!! warning
The `compile` option is not supported on Windows or macOS. If you want to run with `compile`, you need to install Triton yourself.
## Acknowledgements
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [Transformers](https://github.com/huggingface/transformers)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

docs/ko/index.md

@ -1,107 +1,50 @@
# Inference
# Introduction
As the vocoder model has been changed, you need more VRAM than before; 12GB is recommended for fluent inference.
<div>
<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
</a>
</div>
We support command line, HTTP API, and WebUI inference; you can choose whichever method you like.
!!! warning
We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area. <br/>
This codebase is released under the Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
## Download Weights
## Requirements
First you need to download the model weights:
- GPU Memory: 12GB (Inference)
- System: Linux, Windows
## Setup
First, we need to create a conda environment to install the packages.
```bash
huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
conda create -n fish-speech python=3.12
conda activate fish-speech
sudo apt-get install portaudio19-dev # For pyaudio
pip install -e . # This will install all remaining packages.
sudo apt install libsox-dev ffmpeg # If needed.
```
## Command Line Inference
!!! warning
The `compile` option is not supported on Windows or macOS. If you want to run with `compile`, you need to install Triton yourself.
!!! note
If you plan to let the model randomly choose a voice timbre, you can skip this step.
## Acknowledgements
### 1. Get VQ tokens from reference audio
```bash
python fish_speech/models/dac/inference.py \
-i "ref_audio_name.wav" \
--checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
```
You should get a `fake.npy` and a `fake.wav`.
### 2. Generate semantic tokens from text:
```bash
python fish_speech/models/text2semantic/inference.py \
--text "The text you want to convert" \
--prompt-text "Your reference text" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini" \
--num-samples 2 \
--compile # if you want faster inference
```
This command will create a `codes_N` file in the working directory, where N is an integer starting from 0.
!!! note
You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/second -> ~500 tokens/second).
Conversely, if you do not plan to use acceleration, you can omit the `--compile` flag.
!!! info
For GPUs that do not support bf16, you may need to use the `--half` parameter.
### 3. Generate vocals from semantic tokens:
#### VQGAN Decoder
!!! warning "Future Warning"
We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
```bash
python fish_speech/models/dac/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
```
## HTTP API Inference
We provide an HTTP API for inference. You can use the following command to start the server:
```bash
python -m tools.api_server \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
> If you want to speed up inference, you can add the `--compile` parameter.
After that, you can view and test the API at http://127.0.0.1:8080/.
## GUI Inference
[Download client](https://github.com/AnyaCoder/fish-speech-gui/releases)
## WebUI Inference
You can start the WebUI using the following command:
```bash
python -m tools.run_webui \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
Or simply
```bash
python -m tools.run_webui
```
> If you want to speed up inference, you can add the `--compile` parameter.
!!! note
You can save label files and reference audio files in advance in the `references` folder in the main directory (which you need to create yourself), so that you can use them directly in the WebUI.
!!! note
You can use Gradio environment variables, such as `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, and `GRADIO_SERVER_NAME`, to configure the WebUI.
Enjoy!
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [Transformers](https://github.com/huggingface/transformers)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

docs/ko/inference.md (new file, +107)

@ -0,0 +1,107 @@
# Inference
As the vocoder model has been changed, you need more VRAM than before; 12GB is recommended for fluent inference.
We support command line, HTTP API, and WebUI inference; you can choose whichever method you like.
## Download Weights
First you need to download the model weights:
```bash
huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
```
## Command Line Inference
!!! note
If you plan to let the model randomly choose a voice timbre, you can skip this step.
### 1. Get VQ tokens from reference audio
```bash
python fish_speech/models/dac/inference.py \
-i "ref_audio_name.wav" \
--checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
```
You should get a `fake.npy` and a `fake.wav`.
### 2. Generate semantic tokens from text:
```bash
python fish_speech/models/text2semantic/inference.py \
--text "The text you want to convert" \
--prompt-text "Your reference text" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini" \
--num-samples 2 \
--compile # if you want faster inference
```
This command will create a `codes_N` file in the working directory, where N is an integer starting from 0.
!!! note
You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/second -> ~500 tokens/second).
Conversely, if you do not plan to use acceleration, you can omit the `--compile` flag.
!!! info
For GPUs that do not support bf16, you may need to use the `--half` parameter.
### 3. Generate vocals from semantic tokens:
#### VQGAN Decoder
!!! warning "Future Warning"
We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
```bash
python fish_speech/models/dac/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
```
## HTTP API Inference
We provide an HTTP API for inference. You can use the following command to start the server:
```bash
python -m tools.api_server \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
> If you want to speed up inference, you can add the `--compile` parameter.
After that, you can view and test the API at http://127.0.0.1:8080/.
## GUI Inference
[Download client](https://github.com/AnyaCoder/fish-speech-gui/releases)
## WebUI Inference
You can start the WebUI using the following command:
```bash
python -m tools.run_webui \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
Or simply
```bash
python -m tools.run_webui
```
> If you want to speed up inference, you can add the `--compile` parameter.
!!! note
You can save label files and reference audio files in advance in the `references` folder in the main directory (which you need to create yourself), so that you can use them directly in the WebUI.
!!! note
You can use Gradio environment variables, such as `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, and `GRADIO_SERVER_NAME`, to configure the WebUI.
Enjoy!

docs/ko/install.md (deleted)

@ -1,50 +0,0 @@
# Introduction
<div>
<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
</a>
</div>
!!! warning
We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area. <br/>
This codebase is released under the Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
## Requirements
- GPU Memory: 12GB (Inference)
- System: Linux, Windows
## Setup
First, we need to create a conda environment to install the packages.
```bash
conda create -n fish-speech python=3.12
conda activate fish-speech
sudo apt-get install portaudio19-dev # For pyaudio
pip install -e . # This will install all remaining packages.
sudo apt install libsox-dev ffmpeg # If needed.
```
!!! warning
The `compile` option is not supported on Windows or macOS. If you want to run with `compile`, you need to install Triton yourself.
## Acknowledgements
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [Transformers](https://github.com/huggingface/transformers)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

docs/pt/index.md

@ -1,107 +1,50 @@
# Inference
# Introduction
As the vocoder model has been changed, you need more VRAM than before; 12GB is recommended for fluent inference.
<div>
<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
</a>
</div>
We support command line, HTTP API, and WebUI inference; you can choose whichever method you like.
!!! warning
We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area. <br/>
This codebase is released under the Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
## Download Weights
## Requirements
First you need to download the model weights:
- GPU Memory: 12GB (Inference)
- System: Linux, Windows
## Setup
First, we need to create a conda environment to install the packages.
```bash
huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
conda create -n fish-speech python=3.12
conda activate fish-speech
sudo apt-get install portaudio19-dev # For pyaudio
pip install -e . # This will install all remaining packages.
sudo apt install libsox-dev ffmpeg # If needed.
```
## Command Line Inference
!!! warning
The `compile` option is not supported on Windows or macOS. If you want to run with `compile`, you need to install Triton yourself.
!!! note
If you plan to let the model randomly choose a voice timbre, you can skip this step.
## Acknowledgements
### 1. Get VQ tokens from reference audio
```bash
python fish_speech/models/dac/inference.py \
-i "ref_audio_name.wav" \
--checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
```
You should get a `fake.npy` and a `fake.wav`.
### 2. Generate semantic tokens from text:
```bash
python fish_speech/models/text2semantic/inference.py \
--text "The text you want to convert" \
--prompt-text "Your reference text" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini" \
--num-samples 2 \
--compile # if you want faster inference
```
This command will create a `codes_N` file in the working directory, where N is an integer starting from 0.
!!! note
You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/second -> ~500 tokens/second).
Conversely, if you do not plan to use acceleration, you can omit the `--compile` flag.
!!! info
For GPUs that do not support bf16, you may need to use the `--half` parameter.
### 3. Generate vocals from semantic tokens:
#### VQGAN Decoder
!!! warning "Future Warning"
We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
```bash
python fish_speech/models/dac/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
```
## HTTP API Inference
We provide an HTTP API for inference. You can use the following command to start the server:
```bash
python -m tools.api_server \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
> If you want to speed up inference, you can add the `--compile` parameter.
After that, you can view and test the API at http://127.0.0.1:8080/.
## GUI Inference
[Download client](https://github.com/AnyaCoder/fish-speech-gui/releases)
## WebUI Inference
You can start the WebUI using the following command:
```bash
python -m tools.run_webui \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
Or simply
```bash
python -m tools.run_webui
```
> If you want to speed up inference, you can add the `--compile` parameter.
!!! note
You can save label files and reference audio files in advance in the `references` folder in the main directory (which you need to create yourself), so that you can use them directly in the WebUI.
!!! note
You can use Gradio environment variables, such as `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, and `GRADIO_SERVER_NAME`, to configure the WebUI.
Enjoy!
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [Transformers](https://github.com/huggingface/transformers)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

docs/pt/inference.md (new file, +107)

@ -0,0 +1,107 @@
# Inference
As the vocoder model has been changed, you need more VRAM than before; 12GB is recommended for fluent inference.
We support command line, HTTP API, and WebUI inference; you can choose whichever method you like.
## Download Weights
First you need to download the model weights:
```bash
huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
```
## Command Line Inference
!!! note
If you plan to let the model randomly choose a voice timbre, you can skip this step.
### 1. Get VQ tokens from reference audio
```bash
python fish_speech/models/dac/inference.py \
-i "ref_audio_name.wav" \
--checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
```
You should get a `fake.npy` and a `fake.wav`.
### 2. Generate semantic tokens from text:
```bash
python fish_speech/models/text2semantic/inference.py \
--text "The text you want to convert" \
--prompt-text "Your reference text" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini" \
--num-samples 2 \
--compile # if you want faster inference
```
This command will create a `codes_N` file in the working directory, where N is an integer starting from 0.
!!! note
You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/second -> ~500 tokens/second).
Conversely, if you do not plan to use acceleration, you can omit the `--compile` flag.
!!! info
For GPUs that do not support bf16, you may need to use the `--half` parameter.
### 3. Generate vocals from semantic tokens:
#### VQGAN Decoder
!!! warning "Future Warning"
We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
```bash
python fish_speech/models/dac/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
```
## HTTP API Inference
We provide an HTTP API for inference. You can use the following command to start the server:
```bash
python -m tools.api_server \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
> If you want to speed up inference, you can add the `--compile` parameter.
After that, you can view and test the API at http://127.0.0.1:8080/.
## GUI Inference
[Download client](https://github.com/AnyaCoder/fish-speech-gui/releases)
## WebUI Inference
You can start the WebUI using the following command:
```bash
python -m tools.run_webui \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
Or simply
```bash
python -m tools.run_webui
```
> If you want to speed up inference, you can add the `--compile` parameter.
!!! note
You can save label files and reference audio files in advance in the `references` folder in the main directory (which you need to create yourself), so that you can use them directly in the WebUI.
!!! note
You can use Gradio environment variables, such as `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, and `GRADIO_SERVER_NAME`, to configure the WebUI.
Enjoy!

docs/pt/install.md (deleted)

@ -1,50 +0,0 @@
# Introduction
<div>
<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
</a>
</div>
!!! warning
We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area. <br/>
This codebase is released under the Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
## Requirements
- GPU Memory: 12GB (Inference)
- System: Linux, Windows
## Setup
First, we need to create a conda environment to install the packages.
```bash
conda create -n fish-speech python=3.12
conda activate fish-speech
sudo apt-get install portaudio19-dev # For pyaudio
pip install -e . # This will install all remaining packages.
sudo apt install libsox-dev ffmpeg # If needed.
```
!!! warning
The `compile` option is not supported on Windows or macOS. If you want to run with `compile`, you need to install Triton yourself.
## Acknowledgements
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [Transformers](https://github.com/huggingface/transformers)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

docs/zh/index.md

@ -1,107 +1,50 @@
# Inference
# Introduction
As the vocoder model has been changed, you need more VRAM than before; 12GB is recommended for fluent inference.
<div>
<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
</a>
</div>
We support command line, HTTP API, and WebUI inference; you can choose whichever method you like.
!!! warning
We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area. <br/>
This codebase is released under the Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
## Download Weights
## Requirements
First you need to download the model weights:
- GPU Memory: 12GB (Inference)
- System: Linux, Windows
## Setup
First, we need to create a conda environment to install the packages.
```bash
huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
conda create -n fish-speech python=3.12
conda activate fish-speech
sudo apt-get install portaudio19-dev # For pyaudio
pip install -e . # This will install all remaining packages.
sudo apt install libsox-dev ffmpeg # If needed.
```
## Command Line Inference
!!! warning
The `compile` option is not supported on Windows or macOS. If you want to run with `compile`, you need to install Triton yourself.
!!! note
If you plan to let the model randomly choose a voice timbre, you can skip this step.
## Acknowledgements
### 1. Get VQ tokens from reference audio
```bash
python fish_speech/models/dac/inference.py \
-i "ref_audio_name.wav" \
--checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
```
You should get a `fake.npy` and a `fake.wav`.
### 2. Generate semantic tokens from text:
```bash
python fish_speech/models/text2semantic/inference.py \
--text "The text you want to convert" \
--prompt-text "Your reference text" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini" \
--num-samples 2 \
--compile # if you want faster inference
```
This command will create a `codes_N` file in the working directory, where N is an integer starting from 0.
!!! note
You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/second -> ~500 tokens/second).
Conversely, if you do not plan to use acceleration, you can omit the `--compile` flag.
!!! info
For GPUs that do not support bf16, you may need to use the `--half` parameter.
### 3. Generate vocals from semantic tokens:
#### VQGAN Decoder
!!! warning "Future Warning"
We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
```bash
python fish_speech/models/dac/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
```
## HTTP API Inference
We provide an HTTP API for inference. You can use the following command to start the server:
```bash
python -m tools.api_server \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
> If you want to speed up inference, you can add the `--compile` parameter.
After that, you can view and test the API at http://127.0.0.1:8080/.
## GUI Inference
[Download client](https://github.com/AnyaCoder/fish-speech-gui/releases)
## WebUI Inference
You can start the WebUI using the following command:
```bash
python -m tools.run_webui \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
Or simply
```bash
python -m tools.run_webui
```
> If you want to speed up inference, you can add the `--compile` parameter.
!!! note
You can save label files and reference audio files in advance in the `references` folder in the main directory (which you need to create yourself), so that you can use them directly in the WebUI.
!!! note
You can use Gradio environment variables, such as `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, and `GRADIO_SERVER_NAME`, to configure the WebUI.
Enjoy!
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [Transformers](https://github.com/huggingface/transformers)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

docs/zh/inference.md (new file, +107)

@ -0,0 +1,107 @@
# Inference
As the vocoder model has been changed, you need more VRAM than before; 12GB is recommended for fluent inference.
We support command line, HTTP API, and WebUI inference; you can choose whichever method you like.
## Download Weights
First you need to download the model weights:
```bash
huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
```
## Command Line Inference
!!! note
If you plan to let the model randomly choose a voice timbre, you can skip this step.
### 1. Get VQ tokens from reference audio
```bash
python fish_speech/models/dac/inference.py \
-i "ref_audio_name.wav" \
--checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
```
You should get a `fake.npy` and a `fake.wav`.
### 2. Generate semantic tokens from text:
```bash
python fish_speech/models/text2semantic/inference.py \
--text "The text you want to convert" \
--prompt-text "Your reference text" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini" \
--num-samples 2 \
--compile # if you want faster inference
```
This command will create a `codes_N` file in the working directory, where N is an integer starting from 0.
!!! note
You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/second -> ~500 tokens/second).
Conversely, if you do not plan to use acceleration, you can omit the `--compile` flag.
!!! info
For GPUs that do not support bf16, you may need to use the `--half` parameter.
### 3. Generate vocals from semantic tokens:
#### VQGAN Decoder
!!! warning "Future Warning"
We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
```bash
python fish_speech/models/dac/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
```
## HTTP API Inference
We provide an HTTP API for inference. You can use the following command to start the server:
```bash
python -m tools.api_server \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
> If you want to speed up inference, you can add the `--compile` parameter.
After that, you can view and test the API at http://127.0.0.1:8080/.
## GUI Inference
[Download client](https://github.com/AnyaCoder/fish-speech-gui/releases)
## WebUI Inference
You can start the WebUI using the following command:
```bash
python -m tools.run_webui \
--llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
--decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
--decoder-config-name modded_dac_vq
```
Or simply
```bash
python -m tools.run_webui
```
> If you want to speed up inference, you can add the `--compile` parameter.
!!! note
You can save label files and reference audio files in advance in the `references` folder in the main directory (which you need to create yourself), so that you can use them directly in the WebUI.
!!! note
You can use Gradio environment variables, such as `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, and `GRADIO_SERVER_NAME`, to configure the WebUI.
Enjoy!

docs/zh/install.md (deleted)

@ -1,50 +0,0 @@
# Introduction
<div>
<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
</a>
</div>
!!! warning
We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area. <br/>
This codebase is released under the Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
## Requirements
- GPU Memory: 12GB (Inference)
- System: Linux, Windows
## Setup
First, we need to create a conda environment to install the packages.
```bash
conda create -n fish-speech python=3.12
conda activate fish-speech
sudo apt-get install portaudio19-dev # For pyaudio
pip install -e . # This will install all remaining packages.
sudo apt install libsox-dev ffmpeg # If needed.
```
!!! warning
The `compile` option is not supported on Windows or macOS. If you want to run with `compile`, you need to install Triton yourself.
## Acknowledgements
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [Transformers](https://github.com/huggingface/transformers)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

inference.ipynb

@ -61,7 +61,7 @@
"# !set HF_ENDPOINT=https://hf-mirror.com\n",
"# !export HF_ENDPOINT=https://hf-mirror.com \n",
"\n",
"!huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/openaudio-s1-mini/"
"!huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini/"
]
},
{
@ -85,7 +85,7 @@
"source": [
"!python tools/run_webui.py \\\n",
" --llama-checkpoint-path checkpoints/openaudio-s1-mini \\\n",
" --decoder-checkpoint-path checkpoints/openaudio-s1-mini/firefly-gan-vq-fsq-8x1024-21hz-generator.pth \\\n",
" --decoder-checkpoint-path checkpoints/openaudio-s1-mini/codec.pth \\\n",
" # --compile"
]
},
@ -120,9 +120,9 @@
"## Enter the path to the audio file here\n",
"src_audio = r\"D:\\PythonProject\\vo_hutao_draw_appear.wav\"\n",
"\n",
"!python fish_speech/models/vqgan/inference.py \\\n",
"!python fish_speech/models/dac/inference.py \\\n",
" -i {src_audio} \\\n",
" --checkpoint-path \"checkpoints/openaudio-s1-mini/firefly-gan-vq-fsq-8x1024-21hz-generator.pth\"\n",
" --checkpoint-path \"checkpoints/openaudio-s1-mini/codec.pth\"\n",
"\n",
"from IPython.display import Audio, display\n",
"audio = Audio(filename=\"fake.wav\")\n",
@ -180,9 +180,9 @@
},
"outputs": [],
"source": [
"!python fish_speech/models/vqgan/inference.py \\\n",
"!python fish_speech/models/dac/inference.py \\\n",
" -i \"codes_0.npy\" \\\n",
" --checkpoint-path \"checkpoints/openaudio-s1-mini/firefly-gan-vq-fsq-8x1024-21hz-generator.pth\"\n",
" --checkpoint-path \"checkpoints/openaudio-s1-mini/codec.pth\"\n",
"\n",
"from IPython.display import Audio, display\n",
"audio = Audio(filename=\"fake.wav\")\n",

mkdocs.yml

@ -56,7 +56,7 @@ theme:
code: Roboto Mono
nav:
- Installation: en/install.md
- Installation: en/index.md
- Inference: en/inference.md
# Plugins
@ -80,25 +80,25 @@ plugins:
name: 简体中文
build: true
nav:
- 安装: zh/install.md
- 安装: zh/index.md
- 推理: zh/inference.md
- locale: ja
name: 日本語
build: true
nav:
- インストール: ja/install.md
- インストール: ja/index.md
- 推論: ja/inference.md
- locale: pt
name: Português (Brasil)
build: true
nav:
- Instalação: pt/install.md
- Instalação: pt/index.md
- Inferência: pt/inference.md
- locale: ko
name: 한국어
build: true
nav:
- 설치: ko/install.md
- 설치: ko/index.md
- 추론: ko/inference.md
markdown_extensions:

pyproject.toml

@ -50,7 +50,7 @@ dependencies = [
[project.optional-dependencies]
stable = [
"torch<=2.4.1",
"torch>=2.5.1",
"torchaudio",
]