Optimize documents (#994)

* [feature]add dataset class

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [dev]combine agent and tts infer

* [feature]:update inference

* [feature]:update uv.lock

* [Merge]:merge upstream/main

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [fix]:remove unused files

* [fix]:remove unused files

* [fix]:remove unused files

* [fix]:fix infer bugs

* [docs]:update introduction and optimize frontend appearance

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Whale and Dolphin 2025-06-03 21:05:14 +08:00 committed by GitHub
parent 4bf24d8c33
commit 75d7ecb5b5
18 changed files with 727 additions and 182 deletions

BIN docs/assets/openaudio.jpg (new file, 40 KiB)
BIN docs/assets/openaudio.png (new file, 956 B)

docs/en/index.md

@ -1,4 +1,14 @@
# Introduction
# OpenAudio (formerly Fish-Speech)
<div align="center">
<div align="center">
<img src="../assets/openaudio.jpg" alt="OpenAudio" style="display: block; margin: 0 auto; width: 35%;"/>
</div>
<strong>Advanced Text-to-Speech Model Series</strong>
<div>
<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
@ -12,39 +22,114 @@
</a>
</div>
!!! warning
We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area. <br/>
This codebase is released under the Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
## Requirements
- GPU Memory: 12GB (Inference)
- System: Linux, Windows
## Setup
First, we need to create a conda environment to install the packages.
```bash
conda create -n fish-speech python=3.12
conda activate fish-speech
sudo apt-get install portaudio19-dev # For pyaudio
pip install -e . # This will download all remaining packages.
apt install libsox-dev ffmpeg # If needed.
```
!!! warning
The `compile` option is not supported on Windows and macOS; if you want to run with compile, you need to install triton yourself.
## Acknowledgements
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [Transformers](https://github.com/huggingface/transformers)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
<strong>Try it now:</strong> <a href="https://fish.audio">Fish Audio Playground</a> | <strong>Learn more:</strong> <a href="https://openaudio.com">OpenAudio Website</a>
</div>
---
!!! warning "Legal Notice"
We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area.
**License:** This codebase is released under the Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
## **Introduction**
We are excited to announce that we have rebranded to **OpenAudio** - introducing a brand new series of advanced Text-to-Speech models that builds upon the foundation of Fish-Speech with significant improvements and new capabilities.
**Openaudio-S1-mini**: [Video](To Be Uploaded); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
**Fish-Speech v1.5**: [Video](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
## **Highlights**
### **Emotion Control**
OpenAudio S1 **supports a variety of emotion, tone, and special markers** to enhance speech synthesis:
- **Basic emotions**:
```
(angry) (sad) (excited) (surprised) (satisfied) (delighted)
(scared) (worried) (upset) (nervous) (frustrated) (depressed)
(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
(grateful) (confident) (interested) (curious) (confused) (joyful)
```
- **Advanced emotions**:
```
(disdainful) (unhappy) (anxious) (hysterical) (indifferent)
(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
(keen) (disapproving) (negative) (denying) (astonished) (serious)
(sarcastic) (conciliative) (comforting) (sincere) (sneering)
(hesitating) (yielding) (painful) (awkward) (amused)
```
- **Tone markers**:
```
(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
```
- **Special audio effects**:
```
(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
(groaning) (crowd laughing) (background laughter) (audience laughing)
```
You can also write laughter out directly, e.g. "Ha,ha,ha"; many more cases are waiting to be explored.
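For illustration, an input string mixing these markers might look like the following (the sentence is made up; the markers are taken from the lists above):
```
(excited) We finally released the new model! (laughing) Ha,ha,ha...
(whispering) Nobody expected it to sound this natural. (sighing) Now back to work.
```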
### **Excellent TTS quality**
We use the Seed TTS Eval metrics to evaluate model performance. OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, significantly better than previous models (English, automatic evaluation based on OpenAI gpt-4o-transcribe; speaker distance measured with Revai/pyannote-wespeaker-voxceleb-resnet34-LM). The sketch below shows how these metrics are computed.
| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
|-------|----------------------|---------------------------|------------------|
| **S1** | **0.008** | **0.004** | **0.332** |
| **S1-mini** | **0.011** | **0.005** | **0.380** |
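For reference, WER and CER compare a transcription of the generated audio against the input text. Below is a minimal sketch using the `jiwer` library; this is an assumption for illustration only, not the official Seed TTS Eval harness (which first transcribes the audio with gpt-4o-transcribe):
```python
# Minimal WER/CER sketch; assumes `pip install jiwer`.
import jiwer

reference = "portable text to speech models are easy to deploy"
hypothesis = "portable text to speech models are easy to deploy today"

print(f"WER: {jiwer.wer(reference, hypothesis):.3f}")  # word error rate
print(f"CER: {jiwer.cer(reference, hypothesis):.3f}")  # character error rate
```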
### **Two Types of Models**
| Model | Size | Availability | Features |
|-------|------|--------------|----------|
| **S1** | 4B parameters | Available on [fish.audio](https://fish.audio) | Full-featured flagship model |
| **S1-mini** | 0.5B parameters | Available on Hugging Face ([hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini)) | Distilled version with core capabilities |
Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
## **Features**
1. **Zero-shot & Few-shot TTS:** Input a 10 to 30-second vocal sample to generate high-quality TTS output. **For detailed guidelines, see [Voice Cloning Best Practices](https://docs.fish.audio/text-to-speech/voice-clone-best-practices).**
2. **Multilingual & Cross-lingual Support:** Simply copy and paste multilingual text into the input box—no need to worry about the language. Currently supports English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish.
3. **No Phoneme Dependency:** The model has strong generalization capabilities and does not rely on phonemes for TTS. It can handle text in any language script.
4. **Highly Accurate:** Achieves a low CER (Character Error Rate) of around 0.4% and WER (Word Error Rate) of around 0.8% for Seed-TTS Eval.
5. **Fast:** With fish-tech acceleration, the real-time factor is approximately 1:5 on an Nvidia RTX 4060 laptop and 1:15 on an Nvidia RTX 4090.
6. **WebUI Inference:** Features an easy-to-use, Gradio-based web UI compatible with Chrome, Firefox, Edge, and other browsers (see the launch example after this list).
7. **GUI Inference:** Offers a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See GUI](https://github.com/AnyaCoder/fish-speech-gui).
8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux and Windows (macOS coming soon), minimizing speed loss.
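As a concrete illustration of item 6, the Gradio WebUI can be launched with the `tools.run_webui` entry point referenced in the inference guide; the `GRADIO_*` environment variables are the optional Gradio settings mentioned there:
```bash
# Launch the Gradio WebUI; GRADIO_SERVER_PORT (and GRADIO_SHARE,
# GRADIO_SERVER_NAME) are optional Gradio environment variables.
GRADIO_SERVER_PORT=7860 python -m tools.run_webui
```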
## **Disclaimer**
We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.
## **Media & Demos**
#### 🚧 Coming Soon
Video demonstrations and tutorials are currently in development.
## **Documentation**
### Quick Start
- [Build Environment](en/install.md) - Set up your development environment
- [Inference Guide](en/inference.md) - Run the model and generate speech
## **Community & Support**
- **Discord:** Join our [Discord community](https://discord.gg/Es5qTB9BcN)
- **Website:** Visit [OpenAudio.com](https://openaudio.com) for latest updates
- **Try Online:** [Fish Audio Playground](https://fish.audio)

docs/en/inference.md

@ -34,9 +34,7 @@ python fish_speech/models/text2semantic/inference.py \
--text "The text you want to convert" \
--prompt-text "Your reference text" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini" \
--num-samples 2 \
--compile # if you want a faster speed
--compile
```
This command will create a `codes_N` file in the working directory, where N is an integer starting from 0.
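If you want to sanity-check a generated file, the codes are stored as a plain NumPy array; a small illustrative inspection:
```python
# Illustrative check of a generated semantic-token file.
import numpy as np

codes = np.load("codes_0.npy")
print(codes.shape, codes.dtype)  # token ids consumed by the decoder in step 3
```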
@ -50,15 +48,12 @@ This command will create a `codes_N` file in the working directory, where N is a
### 3. Generate vocals from semantic tokens:
#### VQGAN Decoder
!!! warning "Future Warning"
We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
```bash
python fish_speech/models/dac/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
```
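Putting steps 2 and 3 together, a typical session looks like this (an illustrative sketch using the flags shown above; paths assume the default checkpoint location):
```bash
# Step 2: text -> semantic tokens (writes codes_0.npy).
python fish_speech/models/text2semantic/inference.py \
    --text "The text you want to convert" \
    --prompt-text "Your reference text" \
    --prompt-tokens "fake.npy" \
    --checkpoint-path "checkpoints/openaudio-s1-mini"

# Step 3: semantic tokens -> audio.
python fish_speech/models/dac/inference.py \
    -i "codes_0.npy" \
    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
```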
## HTTP API Inference

docs/en/install.md (new file, +31)

@ -0,0 +1,31 @@
## Requirements
- GPU Memory: 12GB (Inference)
- System: Linux, WSL
## Setup
First, you need to install pyaudio and sox, which are used for audio processing.
```bash
apt install portaudio19-dev libsox-dev ffmpeg
```
### Conda
```bash
conda create -n fish-speech python=3.12
conda activate fish-speech
pip install -e .
```
### UV
```bash
uv sync --python 3.12
```
!!! warning
The `compile` option is not supported on Windows and macOS; if you want to run with compile, you need to install triton yourself.
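Either way, a quick import test can confirm the environment is usable (a minimal sanity check, assuming PyTorch was pulled in as a project dependency):
```bash
# Verify that torch imports and that a CUDA device is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```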

docs/ja/index.md

@ -1,4 +1,14 @@
# 紹介
# OpenAudio (旧 Fish-Speech)
<div align="center">
<div align="center">
<img src="../assets/openaudio.jpg" alt="OpenAudio" style="display: block; margin: 0 auto; width: 35%;"/>
</div>
<strong>先進的なText-to-Speechモデルシリーズ</strong>
<div>
<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
@ -12,39 +22,113 @@
</a>
</div>
!!! warning
このコードベースの違法な使用について、当方は一切の責任を負いません。お住まいの地域のDMCAデジタルミレニアム著作権法およびその他の関連法規をご参照ください。<br/>
このコードベースはApache 2.0ライセンスの下でリリースされ、すべてのモデルはCC-BY-NC-SA-4.0ライセンスの下でリリースされています。
## システム要件
- GPU メモリ12GB推論
- システムLinux、Windows
## セットアップ
まず、パッケージをインストールするためのconda環境を作成する必要があります。
```bash
conda create -n fish-speech python=3.12
conda activate fish-speech
sudo apt-get install portaudio19-dev # pyaudio用
pip install -e . # これにより残りのパッケージがすべてダウンロードされます。
apt install libsox-dev ffmpeg # 必要に応じて。
```
!!! warning
`compile`オプションはWindowsとmacOSでサポートされていません。compileで実行したい場合は、tritonを自分でインストールする必要があります。
## 謝辞
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [Transformers](https://github.com/huggingface/transformers)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
<strong>今すぐ試す:</strong> <a href="https://fish.audio">Fish Audio Playground</a> | <strong>詳細情報:</strong> <a href="https://openaudio.com">OpenAudio ウェブサイト</a>
</div>
---
!!! warning "法的通知"
このコードベースの違法な使用について、当方は一切の責任を負いません。お住まいの地域のDMCAデジタルミレニアム著作権法およびその他の関連法規をご参照ください。
**ライセンス:** このコードベースはApache 2.0ライセンスの下でリリースされ、すべてのモデルはCC-BY-NC-SA-4.0ライセンスの下でリリースされています。
## **紹介**
私たちは **OpenAudio** への改名を発表できることを嬉しく思います。Fish-Speechを基盤とし、大幅な改善と新機能を加えた、新しい先進的なText-to-Speechモデルシリーズを紹介します。
**Openaudio-S1-mini**: [動画](アップロード予定); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
**Fish-Speech v1.5**: [動画](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
## **ハイライト**
### **感情制御**
OpenAudio S1は**多様な感情、トーン、特殊マーカーをサポート**して音声合成を強化します:
- **基本感情**
```
(angry) (sad) (excited) (surprised) (satisfied) (delighted)
(scared) (worried) (upset) (nervous) (frustrated) (depressed)
(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
(grateful) (confident) (interested) (curious) (confused) (joyful)
```
- **高度な感情**
```
(disdainful) (unhappy) (anxious) (hysterical) (indifferent)
(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
(keen) (disapproving) (negative) (denying) (astonished) (serious)
(sarcastic) (conciliative) (comforting) (sincere) (sneering)
(hesitating) (yielding) (painful) (awkward) (amused)
```
- **トーンマーカー**
```
(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
```
- **特殊音響効果**
```
(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
(groaning) (crowd laughing) (background laughter) (audience laughing)
```
Ha,ha,haを使用してコントロールすることもでき、他にも多くの使用法があなた自身の探索を待っています。
### **優秀なTTS品質**
Seed TTS評価指標を使用してモデルのパフォーマンスを評価した結果、OpenAudio S1は英語テキストで**0.008 WER**と**0.004 CER**を達成し、以前のモデルより大幅に改善されました。(英語、自動評価、OpenAI gpt-4o-転写に基づく、話者距離はRevai/pyannote-wespeaker-voxceleb-resnet34-LMを使用)
| モデル | 単語誤り率 (WER) | 文字誤り率 (CER) | 話者距離 |
|-------|----------------------|---------------------------|------------------|
| **S1** | **0.008** | **0.004** | **0.332** |
| **S1-mini** | **0.011** | **0.005** | **0.380** |
### **2つのモデルタイプ**
| モデル | サイズ | 利用可能性 | 特徴 |
|-------|------|--------------|----------|
| **S1** | 40億パラメータ | [fish.audio](https://fish.audio) で利用可能 | 全機能搭載のフラッグシップモデル |
| **S1-mini** | 5億パラメータ | huggingface [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) で利用可能 | コア機能を備えた蒸留版 |
S1とS1-miniの両方にオンライン人間フィードバック強化学習RLHFが組み込まれています。
## **機能**
1. **ゼロショット・フューショットTTS** 10〜30秒の音声サンプルを入力するだけで高品質なTTS出力を生成します。**詳細なガイドラインについては、[音声クローニングのベストプラクティス](https://docs.fish.audio/text-to-speech/voice-clone-best-practices)をご覧ください。**
2. **多言語・言語横断サポート:** 多言語テキストを入力ボックスにコピー&ペーストするだけで、言語を気にする必要はありません。現在、英語、日本語、韓国語、中国語、フランス語、ドイツ語、アラビア語、スペイン語をサポートしています。
3. **音素依存なし:** このモデルは強力な汎化能力を持ち、TTSにおいて音素に依存しません。あらゆる言語スクリプトのテキストを処理できます。
4. **高精度:** Seed-TTS Evalで低い文字誤り率CER約0.4%と単語誤り率WER約0.8%を達成します。
5. **高速:** fish-tech加速により、Nvidia RTX 4060ラップトップでリアルタイム係数約1:5、Nvidia RTX 4090で約1:15を実現します。
6. **WebUI推論** Chrome、Firefox、Edge、その他のブラウザと互換性のあるGradioベースの使いやすいWebUIを備えています。
7. **GUI推論** APIサーバーとシームレスに連携するPyQt6グラフィカルインターフェースを提供します。Linux、Windows、macOSをサポートします。[GUIを見る](https://github.com/AnyaCoder/fish-speech-gui)。
8. **デプロイフレンドリー:** Linux、Windows、macOSのネイティブサポートで推論サーバーを簡単にセットアップし、速度低下を最小化します。
## **免責事項**
コードベースの違法な使用について、当方は一切の責任を負いません。お住まいの地域のDMCAやその他の関連法律をご参照ください。
## **メディア・デモ**
#### 🚧 近日公開
動画デモとチュートリアルは現在開発中です。
## **ドキュメント**
### クイックスタート
- [環境構築](install.md) - 開発環境をセットアップ
- [推論ガイド](inference.md) - モデルを実行して音声を生成
## **コミュニティ・サポート**
- **Discord** [Discordコミュニティ](https://discord.gg/Es5qTB9BcN)に参加
- **ウェブサイト:** 最新アップデートは[OpenAudio.com](https://openaudio.com)をご覧ください
- **オンライン試用:** [Fish Audio Playground](https://fish.audio)

docs/ja/inference.md

@ -34,9 +34,7 @@ python fish_speech/models/text2semantic/inference.py \
--text "変換したいテキスト" \
--prompt-text "参照テキスト" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini" \
--num-samples 2 \
--compile # より高速化を求める場合
--compile
```
このコマンドは、作業ディレクトリに `codes_N` ファイルを作成しますNは0から始まる整数)。
@ -50,15 +48,12 @@ python fish_speech/models/text2semantic/inference.py \
### 3. セマンティックトークンから音声を生成:
#### VQGANデコーダー
!!! warning "将来の警告"
元のパスtools/vqgan/inference.pyからアクセス可能なインターフェースを維持していますが、このインターフェースは後続のリリースで削除される可能性があるため、できるだけ早くコードを変更してください。
```bash
python fish_speech/models/dac/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
-i "codes_0.npy"
```
## HTTP API推論
@ -103,5 +98,3 @@ python -m tools.run_webui
!!! note
`GRADIO_SHARE`、`GRADIO_SERVER_PORT`、`GRADIO_SERVER_NAME` などのGradio環境変数を使用してWebUIを設定できます。
お楽しみください!

docs/ja/install.md (new file, +30)

@ -0,0 +1,30 @@
## システム要件
- GPU メモリ12GB推論
- システムLinux、WSL
## セットアップ
まず、音声処理に使用される pyaudio と sox をインストールする必要があります。
```bash
apt install portaudio19-dev libsox-dev ffmpeg
```
### Conda
```bash
conda create -n fish-speech python=3.12
conda activate fish-speech
pip install -e .
```
### UV
```bash
uv sync --python 3.12
```
!!! warning
`compile` オプションは Windows と macOS でサポートされていません。compile で実行したい場合は、triton を自分でインストールする必要があります。

docs/ko/index.md

@ -1,4 +1,14 @@
# 소개
# OpenAudio (구 Fish-Speech)
<div align="center">
<div align="center">
<img src="../assets/openaudio.jpg" alt="OpenAudio" style="display: block; margin: 0 auto; width: 35%;"/>
</div>
<strong>고급 텍스트-음성 변환 모델 시리즈</strong>
<div>
<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
@ -12,39 +22,113 @@
</a>
</div>
!!! warning
코드베이스의 불법적인 사용에 대해서는 일체 책임을 지지 않습니다. 귀하의 지역의 DMCA(디지털 밀레니엄 저작권법) 및 기타 관련 법률을 참고하시기 바랍니다. <br/>
이 코드베이스는 Apache 2.0 라이선스 하에 배포되며, 모든 모델은 CC-BY-NC-SA-4.0 라이선스 하에 배포됩니다.
## 시스템 요구사항
- GPU 메모리: 12GB (추론)
- 시스템: Linux, Windows
## 설치
먼저 패키지를 설치하기 위한 conda 환경을 만들어야 합니다.
```bash
conda create -n fish-speech python=3.12
conda activate fish-speech
sudo apt-get install portaudio19-dev # pyaudio용
pip install -e . # 나머지 모든 패키지를 다운로드합니다.
apt install libsox-dev ffmpeg # 필요한 경우.
```
!!! warning
`compile` 옵션은 Windows와 macOS에서 지원되지 않습니다. compile로 실행하려면 triton을 직접 설치해야 합니다.
## 감사의 말
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [Transformers](https://github.com/huggingface/transformers)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
<strong>지금 체험:</strong> <a href="https://fish.audio">Fish Audio Playground</a> | <strong>자세히 알아보기:</strong> <a href="https://openaudio.com">OpenAudio 웹사이트</a>
</div>
---
!!! warning "법적 고지"
코드베이스의 불법적인 사용에 대해서는 일체 책임을 지지 않습니다. 귀하의 지역의 DMCA(디지털 밀레니엄 저작권법) 및 기타 관련 법률을 참고하시기 바랍니다.
**라이선스:** 이 코드베이스는 Apache 2.0 라이선스 하에 배포되며, 모든 모델은 CC-BY-NC-SA-4.0 라이선스 하에 배포됩니다.
## **소개**
저희는 **OpenAudio**로의 브랜드 변경을 발표하게 되어 기쁩니다. Fish-Speech를 기반으로 하여 상당한 개선과 새로운 기능을 추가한 새로운 고급 텍스트-음성 변환 모델 시리즈를 소개합니다.
**Openaudio-S1-mini**: [동영상](업로드 예정); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
**Fish-Speech v1.5**: [동영상](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
## **주요 특징**
### **감정 제어**
OpenAudio S1은 **다양한 감정, 톤, 특수 마커를 지원**하여 음성 합성을 향상시킵니다:
- **기본 감정**:
```
(angry) (sad) (excited) (surprised) (satisfied) (delighted)
(scared) (worried) (upset) (nervous) (frustrated) (depressed)
(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
(grateful) (confident) (interested) (curious) (confused) (joyful)
```
- **고급 감정**:
```
(disdainful) (unhappy) (anxious) (hysterical) (indifferent)
(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
(keen) (disapproving) (negative) (denying) (astonished) (serious)
(sarcastic) (conciliative) (comforting) (sincere) (sneering)
(hesitating) (yielding) (painful) (awkward) (amused)
```
- **톤 마커**:
```
(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
```
- **특수 음향 효과**:
```
(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
(groaning) (crowd laughing) (background laughter) (audience laughing)
```
Ha,ha,ha를 사용하여 제어할 수도 있으며, 여러분 스스로 탐구할 수 있는 다른 많은 사용법이 있습니다.
### **뛰어난 TTS 품질**
Seed TTS 평가 지표를 사용하여 모델 성능을 평가한 결과, OpenAudio S1은 영어 텍스트에서 **0.008 WER**과 **0.004 CER**을 달성하여 이전 모델보다 현저히 향상되었습니다. (영어, 자동 평가, OpenAI gpt-4o-전사 기반, 화자 거리는 Revai/pyannote-wespeaker-voxceleb-resnet34-LM 사용)
| 모델 | 단어 오류율 (WER) | 문자 오류율 (CER) | 화자 거리 |
|-------|----------------------|---------------------------|------------------|
| **S1** | **0.008** | **0.004** | **0.332** |
| **S1-mini** | **0.011** | **0.005** | **0.380** |
### **두 가지 모델 유형**
| 모델 | 크기 | 가용성 | 특징 |
|-------|------|--------------|----------|
| **S1** | 40억 매개변수 | [fish.audio](https://fish.audio)에서 이용 가능 | 모든 기능을 갖춘 플래그십 모델 |
| **S1-mini** | 5억 매개변수 | huggingface [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini)에서 이용 가능 | 핵심 기능을 갖춘 경량화 버전 |
S1과 S1-mini 모두 온라인 인간 피드백 강화 학습(RLHF)이 통합되어 있습니다.
## **기능**
1. **제로샷 및 퓨샷 TTS:** 10~30초의 음성 샘플을 입력하여 고품질 TTS 출력을 생성합니다. **자세한 가이드라인은 [음성 복제 모범 사례](https://docs.fish.audio/text-to-speech/voice-clone-best-practices)를 참조하세요.**
2. **다국어 및 교차 언어 지원:** 다국어 텍스트를 입력 상자에 복사하여 붙여넣기만 하면 됩니다. 언어에 대해 걱정할 필요가 없습니다. 현재 영어, 일본어, 한국어, 중국어, 프랑스어, 독일어, 아랍어, 스페인어를 지원합니다.
3. **음소 의존성 없음:** 이 모델은 강력한 일반화 능력을 가지고 있으며 TTS에 음소에 의존하지 않습니다. 어떤 언어 스크립트의 텍스트도 처리할 수 있습니다.
4. **높은 정확도:** Seed-TTS Eval에서 약 0.4%의 낮은 문자 오류율(CER)과 약 0.8%의 단어 오류율(WER)을 달성합니다.
5. **빠른 속도:** fish-tech 가속을 통해 Nvidia RTX 4060 노트북에서 실시간 계수 약 1:5, Nvidia RTX 4090에서 약 1:15를 달성합니다.
6. **WebUI 추론:** Chrome, Firefox, Edge 및 기타 브라우저와 호환되는 사용하기 쉬운 Gradio 기반 웹 UI를 제공합니다.
7. **GUI 추론:** API 서버와 원활하게 작동하는 PyQt6 그래픽 인터페이스를 제공합니다. Linux, Windows, macOS를 지원합니다. [GUI 보기](https://github.com/AnyaCoder/fish-speech-gui).
8. **배포 친화적:** Linux, Windows, MacOS의 네이티브 지원으로 추론 서버를 쉽게 설정하여 속도 손실을 최소화합니다.
## **면책 조항**
코드베이스의 불법적인 사용에 대해서는 일체 책임을 지지 않습니다. 귀하 지역의 DMCA 및 기타 관련 법률을 참고하시기 바랍니다.
## **미디어 및 데모**
#### 🚧 곧 출시 예정
동영상 데모와 튜토리얼이 현재 개발 중입니다.
## **문서**
### 빠른 시작
- [환경 구축](install.md) - 개발 환경 설정
- [추론 가이드](inference.md) - 모델 실행 및 음성 생성
## **커뮤니티 및 지원**
- **Discord:** [Discord 커뮤니티](https://discord.gg/Es5qTB9BcN)에 참여하세요
- **웹사이트:** 최신 업데이트는 [OpenAudio.com](https://openaudio.com)을 방문하세요
- **온라인 체험:** [Fish Audio Playground](https://fish.audio)

docs/ko/inference.md

@ -34,9 +34,7 @@ python fish_speech/models/text2semantic/inference.py \
--text "변환하고 싶은 텍스트" \
--prompt-text "참조 텍스트" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini" \
--num-samples 2 \
--compile # 더 빠른 속도를 원한다면
--compile
```
이 명령은 작업 디렉토리에 `codes_N` 파일을 생성합니다. 여기서 N은 0부터 시작하는 정수입니다.
@ -50,15 +48,12 @@ python fish_speech/models/text2semantic/inference.py \
### 3. 의미 토큰에서 음성 생성:
#### VQGAN 디코더
!!! warning "향후 경고"
원래 경로(tools/vqgan/inference.py)에서 액세스 가능한 인터페이스를 유지하고 있지만, 이 인터페이스는 향후 릴리스에서 제거될 수 있으므로 가능한 한 빨리 코드를 변경해 주세요.
```bash
python fish_speech/models/dac/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
-i "codes_0.npy"
```
## HTTP API 추론
@ -103,5 +98,3 @@ python -m tools.run_webui
!!! note
`GRADIO_SHARE`, `GRADIO_SERVER_PORT`, `GRADIO_SERVER_NAME`과 같은 Gradio 환경 변수를 사용하여 WebUI를 구성할 수 있습니다.
즐기세요!

docs/ko/install.md (new file, +30)

@ -0,0 +1,30 @@
## 시스템 요구사항
- GPU 메모리: 12GB (추론)
- 시스템: Linux, WSL
## 설정
먼저 오디오 처리에 사용되는 pyaudio와 sox를 설치해야 합니다.
```bash
apt install portaudio19-dev libsox-dev ffmpeg
```
### Conda
```bash
conda create -n fish-speech python=3.12
conda activate fish-speech
pip install -e .
```
### UV
```bash
uv sync --python 3.12
```
!!! warning
`compile` 옵션은 Windows와 macOS에서 지원되지 않습니다. compile로 실행하려면 triton을 직접 설치해야 합니다.

docs/pt/index.md

@ -1,4 +1,14 @@
# Introdução
# OpenAudio (anteriormente Fish-Speech)
<div align="center">
<div align="center">
<img src="../assets/openaudio.jpg" alt="OpenAudio" style="display: block; margin: 0 auto; width: 35%;"/>
</div>
<strong>Série Avançada de Modelos Text-to-Speech</strong>
<div>
<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
@ -12,39 +22,113 @@
</a>
</div>
!!! warning
Não assumimos nenhuma responsabilidade pelo uso ilegal da base de código. Consulte as leis locais sobre DMCA (Digital Millennium Copyright Act) e outras leis relevantes em sua área. <br/>
Esta base de código é lançada sob a licença Apache 2.0 e todos os modelos são lançados sob a licença CC-BY-NC-SA-4.0.
## Requisitos
- Memória GPU: 12GB (Inferência)
- Sistema: Linux, Windows
## Configuração
Primeiro, precisamos criar um ambiente conda para instalar os pacotes.
```bash
conda create -n fish-speech python=3.12
conda activate fish-speech
sudo apt-get install portaudio19-dev # Para pyaudio
pip install -e . # Isso baixará todos os pacotes restantes.
apt install libsox-dev ffmpeg # Se necessário.
```
!!! warning
A opção `compile` não é suportada no Windows e macOS; se você quiser executar com compile, precisa instalar o triton por conta própria.
## Agradecimentos
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [Transformers](https://github.com/huggingface/transformers)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
<strong>Experimente agora:</strong> <a href="https://fish.audio">Fish Audio Playground</a> | <strong>Saiba mais:</strong> <a href="https://openaudio.com">Site OpenAudio</a>
</div>
---
!!! warning "Aviso Legal"
Não assumimos nenhuma responsabilidade pelo uso ilegal da base de código. Consulte as leis locais sobre DMCA (Digital Millennium Copyright Act) e outras leis relevantes em sua área.
**Licença:** Esta base de código é lançada sob a licença Apache 2.0 e todos os modelos são lançados sob a licença CC-BY-NC-SA-4.0.
## **Introdução**
Estamos empolgados em anunciar que mudamos nossa marca para **OpenAudio** - introduzindo uma nova série de modelos avançados de Text-to-Speech que se baseia na fundação do Fish-Speech com melhorias significativas e novas capacidades.
**Openaudio-S1-mini**: [Vídeo](A ser carregado); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
**Fish-Speech v1.5**: [Vídeo](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
## **Destaques**
### **Controle Emocional**
O OpenAudio S1 **suporta uma variedade de marcadores emocionais, de tom e especiais** para aprimorar a síntese de fala:
- **Emoções básicas**:
```
(angry) (sad) (excited) (surprised) (satisfied) (delighted)
(scared) (worried) (upset) (nervous) (frustrated) (depressed)
(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
(grateful) (confident) (interested) (curious) (confused) (joyful)
```
- **Emoções avançadas**:
```
(disdainful) (unhappy) (anxious) (hysterical) (indifferent)
(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
(keen) (disapproving) (negative) (denying) (astonished) (serious)
(sarcastic) (conciliative) (comforting) (sincere) (sneering)
(hesitating) (yielding) (painful) (awkward) (amused)
```
- **Marcadores de tom**:
```
(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
```
- **Efeitos sonoros especiais**:
```
(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
(groaning) (crowd laughing) (background laughter) (audience laughing)
```
Você também pode usar Ha,ha,ha para controlar; há muitos outros casos esperando para serem explorados por você mesmo.
### **Qualidade TTS Excelente**
Utilizamos as métricas Seed TTS Eval para avaliar o desempenho do modelo, e os resultados mostram que o OpenAudio S1 alcança **0.008 WER** e **0.004 CER** em texto inglês, que é significativamente melhor que modelos anteriores. (Inglês, avaliação automática, baseada na transcrição OpenAI gpt-4o, distância do falante usando Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
| Modelo | Taxa de Erro de Palavras (WER) | Taxa de Erro de Caracteres (CER) | Distância do Falante |
|-------|----------------------|---------------------------|------------------|
| **S1** | **0.008** | **0.004** | **0.332** |
| **S1-mini** | **0.011** | **0.005** | **0.380** |
### **Dois Tipos de Modelos**
| Modelo | Tamanho | Disponibilidade | Características |
|-------|------|--------------|----------|
| **S1** | 4B parâmetros | Disponível em [fish.audio](https://fish.audio) | Modelo principal com todas as funcionalidades |
| **S1-mini** | 0.5B parâmetros | Disponível no huggingface [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Versão destilada com capacidades principais |
Tanto o S1 quanto o S1-mini incorporam Aprendizado por Reforço Online com Feedback Humano (RLHF).
## **Características**
1. **TTS Zero-shot e Few-shot:** Insira uma amostra vocal de 10 a 30 segundos para gerar saída TTS de alta qualidade. **Para diretrizes detalhadas, veja [Melhores Práticas de Clonagem de Voz](https://docs.fish.audio/text-to-speech/voice-clone-best-practices).**
2. **Suporte Multilíngue e Cross-lingual:** Simplesmente copie e cole texto multilíngue na caixa de entrada—não precisa se preocupar com o idioma. Atualmente suporta inglês, japonês, coreano, chinês, francês, alemão, árabe e espanhol.
3. **Sem Dependência de Fonemas:** O modelo tem fortes capacidades de generalização e não depende de fonemas para TTS. Pode lidar com texto em qualquer script de idioma.
4. **Altamente Preciso:** Alcança uma baixa Taxa de Erro de Caracteres (CER) de cerca de 0,4% e Taxa de Erro de Palavras (WER) de cerca de 0,8% para Seed-TTS Eval.
5. **Rápido:** Com aceleração fish-tech, o fator de tempo real é aproximadamente 1:5 em um laptop Nvidia RTX 4060 e 1:15 em um Nvidia RTX 4090.
6. **Inferência WebUI:** Apresenta uma interface web fácil de usar baseada em Gradio, compatível com Chrome, Firefox, Edge e outros navegadores.
7. **Inferência GUI:** Oferece uma interface gráfica PyQt6 que funciona perfeitamente com o servidor API. Suporta Linux, Windows e macOS. [Ver GUI](https://github.com/AnyaCoder/fish-speech-gui).
8. **Amigável para Deploy:** Configure facilmente um servidor de inferência com suporte nativo para Linux, Windows e MacOS, minimizando a perda de velocidade.
## **Isenção de Responsabilidade**
Não assumimos nenhuma responsabilidade pelo uso ilegal da base de código. Consulte suas leis locais sobre DMCA e outras leis relacionadas.
## **Mídia e Demos**
#### 🚧 Em Breve
Demonstrações em vídeo e tutoriais estão atualmente em desenvolvimento.
## **Documentação**
### Início Rápido
- [Configurar Ambiente](install.md) - Configure seu ambiente de desenvolvimento
- [Guia de Inferência](inference.md) - Execute o modelo e gere fala
## **Comunidade e Suporte**
- **Discord:** Junte-se à nossa [comunidade Discord](https://discord.gg/Es5qTB9BcN)
- **Site:** Visite [OpenAudio.com](https://openaudio.com) para as últimas atualizações
- **Experimente Online:** [Fish Audio Playground](https://fish.audio)

docs/pt/inference.md

@ -34,9 +34,7 @@ python fish_speech/models/text2semantic/inference.py \
--text "O texto que você quer converter" \
--prompt-text "Seu texto de referência" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini" \
--num-samples 2 \
--compile # se você quiser uma velocidade mais rápida
--compile
```
Este comando criará um arquivo `codes_N` no diretório de trabalho, onde N é um inteiro começando de 0.
@ -50,15 +48,12 @@ Este comando criará um arquivo `codes_N` no diretório de trabalho, onde N é u
### 3. Gerar vocais a partir de tokens semânticos:
#### Decodificador VQGAN
!!! warning "Aviso Futuro"
Mantivemos a interface acessível do caminho original (tools/vqgan/inference.py), mas esta interface pode ser removida em versões subsequentes, então por favor altere seu código o mais breve possível.
```bash
python fish_speech/models/dac/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
-i "codes_0.npy"
```
## Inferência com API HTTP
@ -103,5 +98,3 @@ python -m tools.run_webui
!!! note
Você pode usar variáveis de ambiente do Gradio, como `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, `GRADIO_SERVER_NAME` para configurar o WebUI.
Divirta-se!

docs/pt/install.md (new file, +30)

@ -0,0 +1,30 @@
## Requisitos
- Memória GPU: 12GB (Inferência)
- Sistema: Linux, WSL
## Configuração
Primeiro você precisa instalar pyaudio e sox, que são usados para processamento de áudio.
```bash
apt install portaudio19-dev libsox-dev ffmpeg
```
### Conda
```bash
conda create -n fish-speech python=3.12
conda activate fish-speech
pip install -e .
```
### UV
```bash
uv sync --python 3.12
```
!!! warning
A opção `compile` não é suportada no Windows e macOS, se você quiser executar com compile, precisa instalar o triton por conta própria.

docs/zh/index.md

@ -1,4 +1,14 @@
# 简介
# OpenAudio (原 Fish-Speech)
<div align="center">
<div align="center">
<img src="../assets/openaudio.jpg" alt="OpenAudio" style="display: block; margin: 0 auto; width: 35%;"/>
</div>
<strong>先进的文字转语音模型系列</strong>
<div>
<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
@ -12,39 +22,113 @@
</a>
</div>
!!! warning
我们不对代码库的任何非法使用承担责任。请参考您所在地区有关 DMCA数字千年版权法和其他相关法律的规定。<br/>
此代码库在 Apache 2.0 许可证下发布,所有模型在 CC-BY-NC-SA-4.0 许可证下发布。
## 系统要求
- GPU 内存12GB推理
- 系统Linux、Windows
## 安装
首先,我们需要创建一个 conda 环境来安装包。
```bash
conda create -n fish-speech python=3.12
conda activate fish-speech
sudo apt-get install portaudio19-dev # 用于 pyaudio
pip install -e . # 这将下载所有其余的包。
apt install libsox-dev ffmpeg # 如果需要的话。
```
!!! warning
`compile` 选项在 Windows 和 macOS 上不受支持,如果您想使用 compile 运行,需要自己安装 triton。
## 致谢
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [Transformers](https://github.com/huggingface/transformers)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
<strong>立即试用:</strong> <a href="https://fish.audio">Fish Audio Playground</a> | <strong>了解更多:</strong> <a href="https://openaudio.com">OpenAudio 网站</a>
</div>
---
!!! warning "法律声明"
我们不对代码库的任何非法使用承担责任。请参考您所在地区有关 DMCA数字千年版权法和其他相关法律的规定。
**许可证:** 此代码库在 Apache 2.0 许可证下发布,所有模型在 CC-BY-NC-SA-4.0 许可证下发布。
## **介绍**
我们很高兴地宣布,我们已经更名为 **OpenAudio** - 推出全新的先进文字转语音模型系列,在 Fish-Speech 的基础上进行了重大改进并增加了新功能。
**Openaudio-S1-mini**: [视频](即将上传); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
**Fish-Speech v1.5**: [视频](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
## **亮点**
### **情感控制**
OpenAudio S1 **支持多种情感、语调和特殊标记**来增强语音合成效果:
- **基础情感**
```
(angry) (sad) (excited) (surprised) (satisfied) (delighted)
(scared) (worried) (upset) (nervous) (frustrated) (depressed)
(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
(grateful) (confident) (interested) (curious) (confused) (joyful)
```
- **高级情感**
```
(disdainful) (unhappy) (anxious) (hysterical) (indifferent)
(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
(keen) (disapproving) (negative) (denying) (astonished) (serious)
(sarcastic) (conciliative) (comforting) (sincere) (sneering)
(hesitating) (yielding) (painful) (awkward) (amused)
```
- **语调标记**
```
(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
```
- **特殊音效**
```
(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
(groaning) (crowd laughing) (background laughter) (audience laughing)
```
您还可以使用 Ha,ha,ha 来控制,还有许多其他用法等待您自己探索。
### **卓越的 TTS 质量**
我们使用 Seed TTS 评估指标来评估模型性能,结果显示 OpenAudio S1 在英文文本上达到了 **0.008 WER** 和 **0.004 CER**,明显优于以前的模型。(英语,自动评估,基于 OpenAI gpt-4o-转录,说话人距离使用 Revai/pyannote-wespeaker-voxceleb-resnet34-LM
| 模型 | 词错误率 (WER) | 字符错误率 (CER) | 说话人距离 |
|-------|----------------------|---------------------------|------------------|
| **S1** | **0.008** | **0.004** | **0.332** |
| **S1-mini** | **0.011** | **0.005** | **0.380** |
### **两种模型类型**
| 模型 | 规模 | 可用性 | 特性 |
|-------|------|--------------|----------|
| **S1** | 40亿参数 | 在 [fish.audio](https://fish.audio) 上可用 | 功能齐全的旗舰模型 |
| **S1-mini** | 5亿参数 | 在 huggingface [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) 上可用 | 具有核心功能的蒸馏版本 |
S1 和 S1-mini 都集成了在线人类反馈强化学习 (RLHF)。
## **功能特性**
1. **零样本和少样本 TTS** 输入 10 到 30 秒的语音样本即可生成高质量的 TTS 输出。**详细指南请参见 [语音克隆最佳实践](https://docs.fish.audio/text-to-speech/voice-clone-best-practices)。**
2. **多语言和跨语言支持:** 只需复制粘贴多语言文本到输入框即可——无需担心语言问题。目前支持英语、日语、韩语、中文、法语、德语、阿拉伯语和西班牙语。
3. **无音素依赖:** 该模型具有强大的泛化能力,不依赖音素进行 TTS。它可以处理任何语言文字的文本。
4. **高度准确:** 在 Seed-TTS Eval 中实现低字符错误率 (CER) 约 0.4% 和词错误率 (WER) 约 0.8%。
5. **快速:** 通过 fish-tech 加速,在 Nvidia RTX 4060 笔记本电脑上实时因子约为 1:5在 Nvidia RTX 4090 上约为 1:15。
6. **WebUI 推理:** 具有易于使用的基于 Gradio 的网络界面,兼容 Chrome、Firefox、Edge 和其他浏览器。
7. **GUI 推理:** 提供与 API 服务器无缝配合的 PyQt6 图形界面。支持 Linux、Windows 和 macOS。[查看 GUI](https://github.com/AnyaCoder/fish-speech-gui)。
8. **部署友好:** 轻松设置推理服务器,原生支持 Linux、Windows 和 macOS最小化速度损失。
## **免责声明**
我们不对代码库的任何非法使用承担责任。请参考您当地关于 DMCA 和其他相关法律的规定。
## **媒体和演示**
#### 🚧 即将推出
视频演示和教程正在开发中。
## **文档**
### 快速开始
- [构建环境](install.md) - 设置您的开发环境
- [推理指南](inference.md) - 运行模型并生成语音
## **社区和支持**
- **Discord** 加入我们的 [Discord 社区](https://discord.gg/Es5qTB9BcN)
- **网站:** 访问 [OpenAudio.com](https://openaudio.com) 获取最新更新
- **在线试用:** [Fish Audio Playground](https://fish.audio)

docs/zh/inference.md

@ -1,6 +1,6 @@
# 推理
由于声码器模型已更改,您需要比以前更多的显存建议使用12GB显存以便流畅推理。
由于声码器模型已更改,您需要比以前更多的 VRAM建议使用 12GB 进行流畅推理。
我们支持命令行、HTTP API 和 WebUI 进行推理,您可以选择任何您喜欢的方法。
@ -17,7 +17,7 @@ huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/ope
!!! note
如果您计划让模型随机选择音色,可以跳过此步骤。
### 1. 从参考音频获取VQ tokens
### 1. 从参考音频获取 VQ 令牌
```bash
python fish_speech/models/dac/inference.py \
@ -27,38 +27,33 @@ python fish_speech/models/dac/inference.py \
您应该会得到一个 `fake.npy` 和一个 `fake.wav`
### 2. 从文本生成语义tokens
### 2. 从文本生成语义令牌
```bash
python fish_speech/models/text2semantic/inference.py \
--text "您想要转换的文本" \
--prompt-text "您的参考文本" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/openaudio-s1-mini" \
--num-samples 2 \
--compile # 如果您想要更快的速度
--compile
```
此命令将在工作目录中创建一个 `codes_N` 文件其中N是从0开始的整数。
此命令将在工作目录中创建一个 `codes_N` 文件,其中 N 是从 0 开始的整数。
!!! note
您可能想要使用 `--compile` 来融合CUDA内核以获得更快的推理速度约30 tokens/秒 -> 约500 tokens/秒)。
相应地,如果您不打算使用加速,可以删除 `--compile` 参数的注释
您可能希望使用 `--compile` 来融合 CUDA 内核以实现更快的推理(~30 令牌/秒 -> ~500 令牌/秒)。
相应地,如果您不计划使用加速,可以注释掉 `--compile` 参数
!!! info
对于不支持bf16的GPU您可能需要使用 `--half` 参数。
对于不支持 bf16 的 GPU您可能需要使用 `--half` 参数。
### 3. 从语义tokens生成人声
#### VQGAN 解码器
### 3. 从语义令牌生成声音:
!!! warning "未来警告"
我们保留了从原始路径tools/vqgan/inference.py访问的接口但此接口可能在后续版本中被移除请尽快更改您的代码。
我们保留了从原始路径tools/vqgan/inference.py访问接口的能力但此接口可能在后续版本中被删除因此请尽快更改您的代码。
```bash
python fish_speech/models/dac/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
```
## HTTP API 推理

docs/zh/install.md (new file, +30)

@ -0,0 +1,30 @@
## 系统要求
- GPU 内存12GB推理
- 系统Linux、WSL
## 安装
首先需要安装 pyaudio 和 sox用于音频处理。
```bash
apt install portaudio19-dev libsox-dev ffmpeg
```
### Conda
```bash
conda create -n fish-speech python=3.12
conda activate fish-speech
pip install -e .
```
### UV
```bash
uv sync --python 3.12
```
!!! warning
`compile` 选项在 Windows 和 macOS 上不受支持,如果您想使用 compile 运行,需要自己安装 triton。

mkdocs.yml

@ -1,4 +1,4 @@
site_name: Fish Speech
site_name: OpenAudio
site_description: Targeting SOTA TTS solutions.
site_url: https://speech.fish.audio
@ -12,7 +12,7 @@ copyright: Copyright &copy; 2023-2025 by Fish Audio
theme:
name: material
favicon: assets/figs/logo-circle.png
favicon: assets/openaudio.png
language: en
features:
- content.action.edit
@ -25,8 +25,7 @@ theme:
- search.highlight
- search.share
- content.code.copy
icon:
logo: fontawesome/solid/fish
logo: assets/openaudio.png
palette:
# Palette toggle for automatic mode
@ -56,7 +55,8 @@ theme:
code: Roboto Mono
nav:
- Installation: en/index.md
- Introduction: en/index.md
- Installation: en/install.md
- Inference: en/inference.md
# Plugins
@ -80,25 +80,29 @@ plugins:
name: 简体中文
build: true
nav:
- 安装: zh/index.md
- 介绍: zh/index.md
- 安装: zh/install.md
- 推理: zh/inference.md
- locale: ja
name: 日本語
build: true
nav:
- インストール: ja/index.md
- はじめに: ja/index.md
- インストール: ja/install.md
- 推論: ja/inference.md
- locale: pt
name: Português (Brasil)
build: true
nav:
- Instalação: pt/index.md
- Introdução: pt/index.md
- Instalação: pt/install.md
- Inferência: pt/inference.md
- locale: ko
name: 한국어
build: true
nav:
- 설치: ko/index.md
- 소개: ko/index.md
- 설치: ko/install.md
- 추론: ko/inference.md
markdown_extensions: