{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fish Speech"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### For Windows Users"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "bat"
    }
   },
   "outputs": [],
   "source": [
    "!chcp 65001"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### For Linux Users"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import locale\n",
    "locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prepare Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Users in mainland China may want to use a mirror to speed up the download.\n",
    "# `!set` and `!export` run in a subshell and do not persist, so use %env instead:\n",
    "# %env HF_ENDPOINT=https://hf-mirror.com\n",
    "\n",
    "!huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## WebUI Inference\n",
    "\n",
    "> You can pass `--compile` to fuse CUDA kernels for faster inference (roughly 10x)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "!python tools/run_webui.py \\\n",
    "    --llama-checkpoint-path checkpoints/openaudio-s1-mini \\\n",
    "    --decoder-checkpoint-path checkpoints/openaudio-s1-mini/codec.pth \\\n",
    "    # --compile"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Break-down CLI Inference"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Encode reference audio\n",
    "\n",
    "You should get a `fake.npy` file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "# Enter the path to the reference audio file here\n",
    "src_audio = r\"D:\\PythonProject\\vo_hutao_draw_appear.wav\"\n",
    "\n",
    "# Quote the path so that spaces and backslashes reach the script intact\n",
    "!python fish_speech/models/dac/inference.py \\\n",
    "    -i \"{src_audio}\" \\\n",
    "    --checkpoint-path \"checkpoints/openaudio-s1-mini/codec.pth\"\n",
    "\n",
    "from IPython.display import Audio, display\n",
    "audio = Audio(filename=\"fake.wav\")\n",
    "display(audio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Generate semantic tokens from text\n",
    "\n",
    "> This command creates a `codes_N.npy` file in the working directory, where N is an integer starting from 0.\n",
    "\n",
    "> You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/second -> ~300 tokens/second)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "!python fish_speech/models/text2semantic/inference.py \\\n",
    "    --text \"hello world\" \\\n",
    "    --prompt-text \"The text corresponding to reference audio\" \\\n",
    "    --prompt-tokens \"fake.npy\" \\\n",
    "    --checkpoint-path \"checkpoints/openaudio-s1-mini\" \\\n",
    "    --num-samples 2\n",
    "    # --compile"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Generate speech from semantic tokens"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "!python fish_speech/models/dac/inference.py \\\n",
    "    -i \"codes_0.npy\" \\\n",
    "    --checkpoint-path \"checkpoints/openaudio-s1-mini/codec.pth\"\n",
    "\n",
    "from IPython.display import Audio, display\n",
    "audio = Audio(filename=\"fake.wav\")\n",
    "display(audio)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}