Merge branch 'w-okada:master' into master

This commit is contained in:
Eidenz 2023-08-09 11:41:11 +02:00 committed by GitHub
commit 11219c11f6
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
67 changed files with 6554 additions and 2612 deletions

View File

@ -117,6 +117,16 @@ body:
id: issue id: issue
attributes: attributes:
label: Situation label: Situation
description: Developers spend a lot of time developing new features and resolving issues. If you really want to get it solved, please provide as much reproducible information and logs as possible. Provide logs on the terminal and capture the window. description: Developers spend a lot of time developing new features and resolving issues. If you really want to get it solved, please provide as much reproducible information and logs as possible. Provide logs on the terminal and capture the application window.
- type: textarea
id: capture
attributes:
label: application window capture
description: the application window.
- type: textarea
id: logs-on-terminal
attributes:
label: logs on terminal
description: logs on terminal.
validations: validations:
required: true required: true

.github/workflows/cla.yml vendored Normal file
View File

@ -0,0 +1,36 @@
name: "CLA Assistant"
on:
issue_comment:
types: [created]
pull_request_target:
types: [opened, closed, synchronize]
jobs:
CLAssistant:
runs-on: ubuntu-latest
steps:
- name: "CLA Assistant"
if: (github.event.comment.body == 'recheck' || github.event.comment.body == 'I have read the CLA Document and I hereby sign the CLA') || github.event_name == 'pull_request_target'
# Beta Release
uses: cla-assistant/github-action@v2.1.3-beta
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# the below token should have repo scope and must be manually added by you in the repository's secret
PERSONAL_ACCESS_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
with:
path-to-signatures: "signatures/version1/cla.json"
path-to-document: "https://raw.githubusercontent.com/w-okada/voice-changer/master/LICENSE-CLA" # e.g. a CLA or a DCO document
# branch should not be protected
branch: "master"
#allowlist: user1,bot*
#below are the optional inputs - If the optional inputs are not given, then default values will be taken
#remote-organization-name: enter the remote organization name where the signatures should be stored (Default is storing the signatures in the same repository)
#remote-repository-name: enter the remote repository name where the signatures should be stored (Default is storing the signatures in the same repository)
#create-file-commit-message: 'For example: Creating file for storing CLA Signatures'
#signed-commit-message: 'For example: $contributorName has signed the CLA in #$pullRequestNo'
#custom-notsigned-prcomment: 'pull request comment with Introductory message to ask new contributors to sign'
#custom-pr-sign-comment: 'The signature to be committed in order to sign the CLA'
#custom-allsigned-prcomment: 'pull request comment when all contributors has signed, defaults to **CLA Assistant Lite bot** All Contributors have signed the CLA.'
#lock-pullrequest-aftermerge: false - if you don't want this bot to automatically lock the pull request after merging (default - true)
#use-dco-flag: true - If you are using DCO instead of CLA

LICENSE
View File

@ -20,7 +20,6 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. SOFTWARE.
MIT License MIT License
Copyright (c) 2022 Isle Tennos Copyright (c) 2022 Isle Tennos
@ -63,4 +62,71 @@ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. SOFTWARE.
MIT License
Copyright (c) 2023 liujing04
Copyright (c) 2023 源文雨
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
MIT License
Copyright (c) 2023 yxlllc
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
MIT License
Copyright (c) 2023 yxlllc
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

LICENSE-CLA Normal file
View File

@ -0,0 +1,27 @@
Contributor License Agreement
Copyright (c) 2022 Wataru Okada
本契約は、当社とあなた(以下、"貢献者"とします)の間で締結され、貢献者が当社に対してソフトウェアプロジェクト(以下、"プロジェクト"とします)に対する貢献(以下、"貢献"とします)を提供する際の条件を定めます。
1. 貢献者は、提供する貢献が、貢献者自身のオリジナルな作品であり、商標、著作権、特許、または他の知的財産権を侵害していないことを保証します。
2. 貢献者は、貢献を当社に対して無償で提供し、当社はそれを無制限に使用、複製、修正、公開、配布、サブライセンスを付与し、またその販売する権利を得ることに同意します。
3. 本契約が終了した場合でも、第 2 項で述べた権利は当社に留保されます。
4. 当社は貢献者の貢献を受け入れる義務を負わず、また貢献者に一切の補償をする義務を負わないことに貢献者は同意します。
5. 本契約は当社と貢献者双方の書面による合意により修正されることがあります。
"This Agreement is made between our Company and you (hereinafter referred to as "Contributor") and outlines the terms under which you provide your Contributions (hereinafter referred to as "Contributions") to our software project (hereinafter referred to as "Project").
1. You warrant that the Contributions you are providing are your original work and do not infringe any trademark, copyright, patent, or other intellectual property rights.
2. You agree to provide your Contributions to the Company for free, and the Company has the unlimited right to use, copy, modify, publish, distribute, and sublicense, and also sell the Contributions.
3. Even after the termination of this Agreement, the rights mentioned in the above clause will be retained by the Company.
4. The Company is under no obligation to accept your Contributions or to compensate you in any way for them, and you agree to this.
5. This Agreement may be modified by written agreement between the Company and the Contributor."

README.md
View File

@ -4,74 +4,19 @@
## What's New! ## What's New!
- v.1.5.3.10b - v.1.5.3.12
- improve:
- logger
- bugfix:
- RMVPE:different device bug (not finding root caused yet)
- RVC: when loading sample model, useIndex issue
- v.1.5.3.10a
- Improvement:
- launch sequence
- onnx export process
- error handling in client
- bugfix:
- RMVPE for mac
- v.1.5.3.10
- New Feature
- Support Diffusion SVC(only combo model)
- System audio capture(only for win)
- Support RMVPE
- improvement
- directml: set device id
- some bugfixes:
- noise suppression2
- etc.
- v.1.5.3.9a
- some improvements:
- keep f0 detector setting
- MMVC: max chunksize for onnx
- etc
- some bugfixs:
- RVC: crepe fail to estimate f0
- RVC: fallback from half-precision when half-precision failed.
- etc
- v.1.5.3.9
- New feature:
- Add Crepe Full/Tiny (onnx)
- some improvements:
- server info includes python version
- contentvec onnx support
- etc
- some bugfixs:
- server device mode chuttering
- new model add sample rate
- etc
- v.1.5.3.8a
- Bugfix(test): force client device samplerate
- Bugfix: server device filter
- v.1.5.3.8
- RVC: performance improvement ([PR](https://github.com/w-okada/voice-changer/pull/371) from [nadare881](https://github.com/nadare881))
- v.1.5.3.7
- Feature: - Feature:
- server device monitor - Pass through mode
- Bugfix: - bugfix:
- device output recorder button is showed in server device mode. - Adapted the GUI to the number of slots.
- v.1.5.3.11
- improve:
- increase slot size
- bugfix:
- m1 mac: eliminate torchaudio
# VC Client とは # VC Client とは
@ -125,32 +70,14 @@
- ダウンロードはこちらから。 - ダウンロードはこちらから。
| Version | OS | フレームワーク | link | サポート VC | サイズ | | Version | OS | フレームワーク | link | サポート VC | サイズ |
| ----------- | --- | ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- | ------ | | ---------- | --- | ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- | ------ |
| v.1.5.3.10b | mac | ONNX(cpu), PyTorch(cpu,mps) | [google](https://drive.google.com/uc?id=1akrb9RicU1-cldisToBedaM08y8pFQae&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, Diffusion-SVC | 795MB | | v.1.5.3.12 | mac | ONNX(cpu), PyTorch(cpu,mps) | [google](https://drive.google.com/uc?id=1rC7IVpzfG68Ps6tBmdFIjSXvTNaUKBf6&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 797MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1eZB0u2u0tEB1tR9mp06YiKx96x2oxgrN&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3237MB | | | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1OqxS_jve4qvj71DdSGOrhI8DGaEVRzgs&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3241MB |
| | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1gzWuEN7oY_WdBwOEwtDdNaK2rT0nHvfN&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3122MB | | | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1HhfmMovujzbOmvCi7WPuqQAuuo7jaM1o&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3126MB |
| v.1.5.3.10a | mac | ONNX(cpu), PyTorch(cpu,mps) | [google](https://drive.google.com/uc?id=1_fLdFVswhOGwjRiQj4YWE-YZTO_GsnrA&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, Diffusion-SVC | 795MB | | v.1.5.3.11 | mac | ONNX(cpu), PyTorch(cpu,mps) | [google](https://drive.google.com/uc?id=1cutPICJa-PI_ww0E3ae9FCuSjY_5PnWE&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, | 795MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1imaTBgWBb9ICkNy9pN6NxBISI6SzhEfL&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3237MB | | | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1aOkc-QhtAj11gI8i335mHhNMUSESeJ5J&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3237MB |
| | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1GoijW29pjscdvxMhi8xgvPSvcHCGYwXO&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3122MB | | | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=16g33cZ925HNty_0Hly7Aw_nXlQlgqxDC&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3122MB |
| v.1.5.3.10 | mac | ONNX(cpu), PyTorch(cpu,mps) | [google](https://drive.google.com/uc?id=1useZ4gcI0la5OhPuvt2j94CbAhWikpV4&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, Diffusion-SVC | 795MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=13abR2xs4KmNIg9b5RJXFez9g6zwZqMj4&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3237MB |
| | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1ZxPp-HF7vSEJ8m00WnQaGbo4bTN4LqYD&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3122MB |
| v.1.5.3.9a | mac | ONNX(cpu), PyTorch(cpu,mps) | [google](https://drive.google.com/uc?id=1GsPTUTUbMvwNwAA8SGvSplwsf-yui0iw&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 794MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1eKZCozh37QDfAr33ZG7lGFUOQv1tOooR&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3237MB |
| | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1sxUNBPkeSPPNOE1ZknVF-0kx2jHP3kN6&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3122MB |
| v.1.5.3.9 | mac | ONNX(cpu), PyTorch(cpu,mps) | [google](https://drive.google.com/uc?id=1pTTcTseSdIfCyNUjB-K1mYPg9YocSYz6&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 795MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1KWg-QoF6XmLbkUav-fmxc7bdAcD3844V&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3238MB |
| | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1_TXUkDcofYz9mJd2L1ajAoyIBCQF29WL&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3123MB |
| v.1.5.3.8a | mac | ONNX(cpu), PyTorch(cpu,mps) | [normal](https://drive.google.com/uc?id=1hg6lynE3wWJTNTParTa2qB2L06OL9KJ9&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 794MB |
| | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1C9PCu8pdafO6jJ2yCaB7x54Ls7LcM0Xc&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3122MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1bzrGhHPc9GdaRAMxkksTGtbuRLEeBx9i&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3237MB |
| v.1.5.3.8 | mac | ONNX(cpu), PyTorch(cpu,mps) | [normal](https://drive.google.com/uc?id=1ptmjFCRDW7M0l80072JVRII5tJpF13__&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 794MB |
| | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=19DfeACmpnzqCVH5bIoFunS2pGPABRuso&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3122MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1AYP_hMdoeacX0KiF31Vd3oEjxwdreSbM&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3237MB |
| v.1.5.3.7 | mac | ONNX(cpu), PyTorch(cpu,mps) | [normal](https://drive.google.com/uc?id=1HdJwgo0__vR6pAkOkekejUZJ0lu2NfDs&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 794MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1JIF4PvKg-8HNUv_fMaXSM3AeYa-F_c4z&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3237MB |
| | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1cJzRHmD3vk6av0Dvwj3v9Ef5KUsQYhKv&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3122MB |
(\*1) Google Drive からダウンロードできない方は[hugging_face](https://huggingface.co/wok000/vcclient000/tree/main)からダウンロードしてみてください (\*1) Google Drive からダウンロードできない方は[hugging_face](https://huggingface.co/wok000/vcclient000/tree/main)からダウンロードしてみてください
(\*2) 開発者が AMD のグラフィックボードを持っていないので動作確認していません。onnxruntime-directml を同梱しただけのものです。 (\*2) 開発者が AMD のグラフィックボードを持っていないので動作確認していません。onnxruntime-directml を同梱しただけのものです。
@ -255,3 +182,7 @@ Github Pages 上で実行できるため、ブラウザのみあれば様々な
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1tmTMJRRggS2Sb4goU-eHlRvUBR88RZDl&export=download) \*1 | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC, DDSP-SVC | 2872MB | | | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1tmTMJRRggS2Sb4goU-eHlRvUBR88RZDl&export=download) \*1 | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC, DDSP-SVC | 2872MB |
| v.1.5.3.1 | mac | ONNX(cpu), PyTorch(cpu,mps) | [normal](https://drive.google.com/uc?id=1oswF72q_cQQeXhIn6W275qLnoBAmcrR_&export=download) \*1 | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 796MB | | v.1.5.3.1 | mac | ONNX(cpu), PyTorch(cpu,mps) | [normal](https://drive.google.com/uc?id=1oswF72q_cQQeXhIn6W275qLnoBAmcrR_&export=download) \*1 | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 796MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1AWjDhW4w2Uljp1-9P8YUJBZsIlnhkJX2&export=download) \*1 | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC, DDSP-SVC | 2872MB | | | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1AWjDhW4w2Uljp1-9P8YUJBZsIlnhkJX2&export=download) \*1 | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC, DDSP-SVC | 2872MB |
# For Contributor
このリポジトリは[CLA](https://raw.githubusercontent.com/w-okada/voice-changer/master/LICENSE-CLA)を設定しています。

View File

@ -52,6 +52,8 @@ $ python3 MMVCServerSIO.py -p 18888 --https true \
``` ```
Access it from a browser (currently only Chrome is supported) to open the GUI (with the options above, typically https://localhost:18888).
2-1. Trouble shoot 2-1. Trouble shoot
(1) OSError: PortAudio library not found (1) OSError: PortAudio library not found

View File

@ -51,6 +51,8 @@ $ python3 MMVCServerSIO.py -p 18888 --https true \
--samples samples.json --samples samples.json
``` ```
ブラウザ(Chrome のみサポート)でアクセスすると画面が表示されます。
2-1. トラブルシュート 2-1. トラブルシュート
(1) OSError: PortAudio library not found (1) OSError: PortAudio library not found

View File

@ -4,74 +4,19 @@
## What's New! ## What's New!
- v.1.5.3.10b - v.1.5.3.12
- improve:
- logger
- bugfix:
- RMVPE:different device bug (not finding root caused yet)
- RVC: when loading sample model, useIndex issue
- v.1.5.3.10a
- Improvement:
- launch sequence
- onnx export process
- error handling in client
- bugfix:
- RMVPE for mac
- v.1.5.3.10
- New Feature
- Support Diffusion SVC(only combo model)
- System audio capture(only for win)
- Support RMVPE
- improvement
- directml: set device id
- some bugfixes:
- noise suppression2
- etc.
- v.1.5.3.9a
- some improvements:
- keep f0 detector setting
- MMVC: max chunksize for onnx
- etc
- some bugfixs:
- RVC: crepe fail to estimate f0
- RVC: fallback from half-precision when half-precision failed.
- etc
- v.1.5.3.9
- New feature:
- Add Crepe Full/Tiny (onnx)
- some improvements:
- server info includes python version
- contentvec onnx support
- etc
- some bugfixs:
- server device mode chuttering
- new model add sample rate
- etc
- v.1.5.3.8a
- Bugfix(test): force client device samplerate
- Bugfix: server device filter
- v.1.5.3.8
- RVC: performance improvement ([PR](https://github.com/w-okada/voice-changer/pull/371) from [nadare881](https://github.com/nadare881))
- v.1.5.3.7
- Feature: - Feature:
- server device monitor - Pass through mode
- Bugfix: - bugfix:
- device output recorder button is showed in server device mode. - Adapted the GUI to the number of slots.
- v.1.5.3.11
- improve:
- increase slot size
- bugfix:
- m1 mac: eliminate torchaudio
# What is VC Client # What is VC Client
@ -122,32 +67,14 @@ It can be used in two main ways, in order of difficulty:
- Download (When you cannot download from google drive, try [hugging_face](https://huggingface.co/wok000/vcclient000/tree/main)) - Download (When you cannot download from google drive, try [hugging_face](https://huggingface.co/wok000/vcclient000/tree/main))
| Version | OS | Framework | link | support VC | size | | Version | OS | Framework | link | support VC | size |
| ----------- | --- | ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- | ------ | | ---------- | --- | ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- | ------ |
| v.1.5.3.10b | mac | ONNX(cpu), PyTorch(cpu,mps) | [google](https://drive.google.com/uc?id=1akrb9RicU1-cldisToBedaM08y8pFQae&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, Diffusion-SVC | 795MB | | v.1.5.3.12 | mac | ONNX(cpu), PyTorch(cpu,mps) | [google](https://drive.google.com/uc?id=1rC7IVpzfG68Ps6tBmdFIjSXvTNaUKBf6&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 797MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1eZB0u2u0tEB1tR9mp06YiKx96x2oxgrN&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3237MB | | | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1OqxS_jve4qvj71DdSGOrhI8DGaEVRzgs&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3241MB |
| | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1gzWuEN7oY_WdBwOEwtDdNaK2rT0nHvfN&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3122MB | | | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1HhfmMovujzbOmvCi7WPuqQAuuo7jaM1o&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3126MB |
| v.1.5.3.10a | mac | ONNX(cpu), PyTorch(cpu,mps) | [google](https://drive.google.com/uc?id=1_fLdFVswhOGwjRiQj4YWE-YZTO_GsnrA&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, Diffusion-SVC | 795MB | | v.1.5.3.11 | mac | ONNX(cpu), PyTorch(cpu,mps) | [google](https://drive.google.com/uc?id=1cutPICJa-PI_ww0E3ae9FCuSjY_5PnWE&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, | 795MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1imaTBgWBb9ICkNy9pN6NxBISI6SzhEfL&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3237MB | | | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1aOkc-QhtAj11gI8i335mHhNMUSESeJ5J&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3237MB |
| | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1GoijW29pjscdvxMhi8xgvPSvcHCGYwXO&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3122MB | | | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=16g33cZ925HNty_0Hly7Aw_nXlQlgqxDC&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3122MB |
| v.1.5.3.10 | mac | ONNX(cpu), PyTorch(cpu,mps) | [google](https://drive.google.com/uc?id=1useZ4gcI0la5OhPuvt2j94CbAhWikpV4&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, Diffusion-SVC | 795MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=13abR2xs4KmNIg9b5RJXFez9g6zwZqMj4&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3237MB |
| | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1ZxPp-HF7vSEJ8m00WnQaGbo4bTN4LqYD&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC, Diffusion-SVC | 3122MB |
| v.1.5.3.9a | mac | ONNX(cpu), PyTorch(cpu,mps) | [google](https://drive.google.com/uc?id=1GsPTUTUbMvwNwAA8SGvSplwsf-yui0iw&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 794MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1eKZCozh37QDfAr33ZG7lGFUOQv1tOooR&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3237MB |
| | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1sxUNBPkeSPPNOE1ZknVF-0kx2jHP3kN6&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3122MB |
| v.1.5.3.9 | mac | ONNX(cpu), PyTorch(cpu,mps) | [google](https://drive.google.com/uc?id=1pTTcTseSdIfCyNUjB-K1mYPg9YocSYz6&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 795MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1KWg-QoF6XmLbkUav-fmxc7bdAcD3844V&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3238MB |
| | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [google](https://drive.google.com/uc?id=1_TXUkDcofYz9mJd2L1ajAoyIBCQF29WL&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3123MB |
| v.1.5.3.8a | mac | ONNX(cpu), PyTorch(cpu,mps) | [normal](https://drive.google.com/uc?id=1hg6lynE3wWJTNTParTa2qB2L06OL9KJ9&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 794MB |
| | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1C9PCu8pdafO6jJ2yCaB7x54Ls7LcM0Xc&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3122MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1bzrGhHPc9GdaRAMxkksTGtbuRLEeBx9i&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3237MB |
| v.1.5.3.8 | mac | ONNX(cpu), PyTorch(cpu,mps) | [normal](https://drive.google.com/uc?id=1ptmjFCRDW7M0l80072JVRII5tJpF13__&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 794MB |
| | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=19DfeACmpnzqCVH5bIoFunS2pGPABRuso&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3122MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1AYP_hMdoeacX0KiF31Vd3oEjxwdreSbM&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3237MB |
| v.1.5.3.7 | mac | ONNX(cpu), PyTorch(cpu,mps) | [normal](https://drive.google.com/uc?id=1HdJwgo0__vR6pAkOkekejUZJ0lu2NfDs&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 794MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1JIF4PvKg-8HNUv_fMaXSM3AeYa-F_c4z&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3237MB |
| | win | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1cJzRHmD3vk6av0Dvwj3v9Ef5KUsQYhKv&export=download), [hugging face](https://huggingface.co/wok000/vcclient000/tree/main) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC, DDSP-SVC | 3122MB |
(\*1) You can also download from [hugging_face](https://huggingface.co/wok000/vcclient000/tree/main) (\*1) You can also download from [hugging_face](https://huggingface.co/wok000/vcclient000/tree/main)
(\*2) The developer does not have an AMD graphics card, so it has not been tested. This package only includes onnxruntime-directml. (\*2) The developer does not have an AMD graphics card, so it has not been tested. This package only includes onnxruntime-directml.

View File

@ -0,0 +1 @@
onnxdirectML-cuda

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@ -11,8 +11,8 @@
"build:dev": "npm-run-all clean webpack:dev", "build:dev": "npm-run-all clean webpack:dev",
"start": "webpack-dev-server --config webpack.dev.js", "start": "webpack-dev-server --config webpack.dev.js",
"build:mod": "cd ../lib && npm run build:dev && cd - && cp -r ../lib/dist/* node_modules/@dannadori/voice-changer-client-js/dist/", "build:mod": "cd ../lib && npm run build:dev && cd - && cp -r ../lib/dist/* node_modules/@dannadori/voice-changer-client-js/dist/",
"build:mod_dos": "cd ../lib && npm run build:dev && cd ../demo && copy ../lib/dist/index.js node_modules/@dannadori/voice-changer-client-js/dist/", "build:mod_dos": "cd ../lib && npm run build:dev && cd ../demo && npm-run-all build:mod_copy",
"build:mod_dos2": "copy ../lib/dist/index.js node_modules/@dannadori/voice-changer-client-js/dist/", "build:mod_copy": "XCOPY ..\\lib\\dist\\* .\\node_modules\\@dannadori\\voice-changer-client-js\\dist\\* /s /e /h /y",
"test": "echo \"Error: no test specified\" && exit 1" "test": "echo \"Error: no test specified\" && exit 1"
}, },
"keywords": [ "keywords": [
@ -26,17 +26,17 @@
"@babel/preset-env": "^7.22.9", "@babel/preset-env": "^7.22.9",
"@babel/preset-react": "^7.22.5", "@babel/preset-react": "^7.22.5",
"@babel/preset-typescript": "^7.22.5", "@babel/preset-typescript": "^7.22.5",
"@types/node": "^20.4.5", "@types/node": "^20.4.6",
"@types/react": "^18.2.17", "@types/react": "^18.2.18",
"@types/react-dom": "^18.2.7", "@types/react-dom": "^18.2.7",
"autoprefixer": "^10.4.14", "autoprefixer": "^10.4.14",
"babel-loader": "^9.1.3", "babel-loader": "^9.1.3",
"copy-webpack-plugin": "^11.0.0", "copy-webpack-plugin": "^11.0.0",
"css-loader": "^6.8.1", "css-loader": "^6.8.1",
"eslint": "^8.45.0", "eslint": "^8.46.0",
"eslint-config-prettier": "^8.8.0", "eslint-config-prettier": "^8.9.0",
"eslint-plugin-prettier": "^5.0.0", "eslint-plugin-prettier": "^5.0.0",
"eslint-plugin-react": "^7.33.0", "eslint-plugin-react": "^7.33.1",
"eslint-webpack-plugin": "^4.0.1", "eslint-webpack-plugin": "^4.0.1",
"html-loader": "^4.2.0", "html-loader": "^4.2.0",
"html-webpack-plugin": "^5.5.3", "html-webpack-plugin": "^5.5.3",
@ -54,11 +54,11 @@
"webpack-dev-server": "^4.15.1" "webpack-dev-server": "^4.15.1"
}, },
"dependencies": { "dependencies": {
"@dannadori/voice-changer-client-js": "^1.0.164", "@dannadori/voice-changer-client-js": "^1.0.166",
"@fortawesome/fontawesome-svg-core": "^6.4.0", "@fortawesome/fontawesome-svg-core": "^6.4.2",
"@fortawesome/free-brands-svg-icons": "^6.4.0", "@fortawesome/free-brands-svg-icons": "^6.4.2",
"@fortawesome/free-regular-svg-icons": "^6.4.0", "@fortawesome/free-regular-svg-icons": "^6.4.2",
"@fortawesome/free-solid-svg-icons": "^6.4.0", "@fortawesome/free-solid-svg-icons": "^6.4.2",
"@fortawesome/react-fontawesome": "^0.2.0", "@fortawesome/react-fontawesome": "^0.2.0",
"protobufjs": "^7.2.4", "protobufjs": "^7.2.4",
"react": "^18.2.0", "react": "^18.2.0",

View File

@ -0,0 +1 @@
onnxdirectML-cuda

View File

@ -63,6 +63,7 @@ type GuiStateAndMethod = {
outputAudioDeviceInfo: MediaDeviceInfo[]; outputAudioDeviceInfo: MediaDeviceInfo[];
audioInputForGUI: string; audioInputForGUI: string;
audioOutputForGUI: string; audioOutputForGUI: string;
audioMonitorForGUI: string;
fileInputEchoback: boolean | undefined; fileInputEchoback: boolean | undefined;
shareScreenEnabled: boolean; shareScreenEnabled: boolean;
audioOutputForAnalyzer: string; audioOutputForAnalyzer: string;
@ -70,6 +71,7 @@ type GuiStateAndMethod = {
setOutputAudioDeviceInfo: (val: MediaDeviceInfo[]) => void; setOutputAudioDeviceInfo: (val: MediaDeviceInfo[]) => void;
setAudioInputForGUI: (val: string) => void; setAudioInputForGUI: (val: string) => void;
setAudioOutputForGUI: (val: string) => void; setAudioOutputForGUI: (val: string) => void;
setAudioMonitorForGUI: (val: string) => void;
setFileInputEchoback: (val: boolean) => void; setFileInputEchoback: (val: boolean) => void;
setShareScreenEnabled: (val: boolean) => void; setShareScreenEnabled: (val: boolean) => void;
setAudioOutputForAnalyzer: (val: string) => void; setAudioOutputForAnalyzer: (val: string) => void;
@ -106,6 +108,7 @@ export const GuiStateProvider = ({ children }: Props) => {
const [outputAudioDeviceInfo, setOutputAudioDeviceInfo] = useState<MediaDeviceInfo[]>([]); const [outputAudioDeviceInfo, setOutputAudioDeviceInfo] = useState<MediaDeviceInfo[]>([]);
const [audioInputForGUI, setAudioInputForGUI] = useState<string>("none"); const [audioInputForGUI, setAudioInputForGUI] = useState<string>("none");
const [audioOutputForGUI, setAudioOutputForGUI] = useState<string>("none"); const [audioOutputForGUI, setAudioOutputForGUI] = useState<string>("none");
const [audioMonitorForGUI, setAudioMonitorForGUI] = useState<string>("none");
const [fileInputEchoback, setFileInputEchoback] = useState<boolean>(false); //最初のmuteが有効になるように。undefined <-- ??? falseしておけばよさそう。undefinedだとwarningがでる。 const [fileInputEchoback, setFileInputEchoback] = useState<boolean>(false); //最初のmuteが有効になるように。undefined <-- ??? falseしておけばよさそう。undefinedだとwarningがでる。
const [shareScreenEnabled, setShareScreenEnabled] = useState<boolean>(false); const [shareScreenEnabled, setShareScreenEnabled] = useState<boolean>(false);
const [audioOutputForAnalyzer, setAudioOutputForAnalyzer] = useState<string>("default"); const [audioOutputForAnalyzer, setAudioOutputForAnalyzer] = useState<string>("default");
@ -270,6 +273,7 @@ export const GuiStateProvider = ({ children }: Props) => {
outputAudioDeviceInfo, outputAudioDeviceInfo,
audioInputForGUI, audioInputForGUI,
audioOutputForGUI, audioOutputForGUI,
audioMonitorForGUI,
fileInputEchoback, fileInputEchoback,
shareScreenEnabled, shareScreenEnabled,
audioOutputForAnalyzer, audioOutputForAnalyzer,
@ -277,6 +281,7 @@ export const GuiStateProvider = ({ children }: Props) => {
setOutputAudioDeviceInfo, setOutputAudioDeviceInfo,
setAudioInputForGUI, setAudioInputForGUI,
setAudioOutputForGUI, setAudioOutputForGUI,
setAudioMonitorForGUI,
setFileInputEchoback, setFileInputEchoback,
setShareScreenEnabled, setShareScreenEnabled,
setAudioOutputForAnalyzer, setAudioOutputForAnalyzer,
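The four hunks above all serve one addition: a monitor-device selection field threaded through the GUI state type, the setter type, the useState call, and the context value. A condensed sketch of that pattern, keeping only the names that appear in the diff (audioMonitorForGUI / setAudioMonitorForGUI) and eliding the rest of the provider:

```typescript
import React, { createContext, useState } from "react";

// Minimal sketch, not the real provider: isolates the audioMonitorForGUI field
// that the hunks above add alongside audioInputForGUI / audioOutputForGUI.
type GuiStateSketch = {
    audioMonitorForGUI: string;
    setAudioMonitorForGUI: (val: string) => void;
};

const GuiStateContext = createContext<GuiStateSketch | null>(null);

export const GuiStateProviderSketch = ({ children }: { children: React.ReactNode }) => {
    // "none" mirrors the default used for the other device selections in the diff.
    const [audioMonitorForGUI, setAudioMonitorForGUI] = useState<string>("none");
    return React.createElement(GuiStateContext.Provider, { value: { audioMonitorForGUI, setAudioMonitorForGUI } }, children);
};
```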

View File

@ -19,7 +19,7 @@ export const MainScreen = (props: MainScreenProps) => {
const guiState = useGuiState(); const guiState = useGuiState();
const messageBuilderState = useMessageBuilder(); const messageBuilderState = useMessageBuilder();
useMemo(() => { useMemo(() => {
messageBuilderState.setMessage(__filename, "change_icon", { ja: "アイコン変更", en: "chage icon" }); messageBuilderState.setMessage(__filename, "change_icon", { ja: "アイコン変更", en: "change icon" });
messageBuilderState.setMessage(__filename, "rename", { ja: "リネーム", en: "rename" }); messageBuilderState.setMessage(__filename, "rename", { ja: "リネーム", en: "rename" });
messageBuilderState.setMessage(__filename, "download", { ja: "ダウンロード", en: "download" }); messageBuilderState.setMessage(__filename, "download", { ja: "ダウンロード", en: "download" });
messageBuilderState.setMessage(__filename, "terms_of_use", { ja: "利用規約", en: "terms of use" }); messageBuilderState.setMessage(__filename, "terms_of_use", { ja: "利用規約", en: "terms of use" });
@ -99,7 +99,7 @@ export const MainScreen = (props: MainScreenProps) => {
const slotRow = serverSetting.serverSetting.modelSlots.map((x, index) => { const slotRow = serverSetting.serverSetting.modelSlots.map((x, index) => {
// モデルのアイコン // モデルのアイコン
const generateIconArea = (slotIndex: number, iconUrl: string, tooltip: boolean) => { const generateIconArea = (slotIndex: number, iconUrl: string, tooltip: boolean) => {
const realIconUrl = iconUrl.length > 0 ? iconUrl : "/assets/icons/noimage.png"; const realIconUrl = iconUrl.length > 0 ? serverSetting.serverSetting.voiceChangerParams.model_dir + "/" + slotIndex + "/" + iconUrl.split(/[\/\\]/).pop() : "/assets/icons/noimage.png";
const iconDivClass = tooltip ? "tooltip" : ""; const iconDivClass = tooltip ? "tooltip" : "";
const iconClass = tooltip ? "model-slot-icon-pointable" : "model-slot-icon"; const iconClass = tooltip ? "model-slot-icon-pointable" : "model-slot-icon";
return ( return (
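The new expression above stops using iconUrl directly and instead serves the icon from the server's model directory, the slot's subfolder, and the file's basename; later hunks in CharacterArea and ModelSlotArea apply the same rule. A hypothetical helper equivalent to that inline expression (the diff itself keeps it inline):

```typescript
// Hypothetical helper mirroring the inline expression in the hunk above:
// icons are served from <model_dir>/<slotIndex>/<basename(iconFile)>, with a
// placeholder when the slot has no icon file.
export const resolveIconUrl = (modelDir: string, slotIndex: number, iconFile: string, fallback = "/assets/icons/noimage.png"): string => {
    if (iconFile.length === 0) {
        return fallback;
    }
    const basename = iconFile.split(/[\/\\]/).pop();
    return `${modelDir}/${slotIndex}/${basename}`;
};

// Usage matching the diff:
// resolveIconUrl(serverSetting.serverSetting.voiceChangerParams.model_dir, slotIndex, iconUrl)
```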

View File

@ -3,158 +3,182 @@ import { useGuiState } from "./001_GuiStateProvider";
import { useAppState } from "../../001_provider/001_AppStateProvider"; import { useAppState } from "../../001_provider/001_AppStateProvider";
import { MergeElement, RVCModelSlot, RVCModelType, VoiceChangerType } from "@dannadori/voice-changer-client-js"; import { MergeElement, RVCModelSlot, RVCModelType, VoiceChangerType } from "@dannadori/voice-changer-client-js";
export const MergeLabDialog = () => { export const MergeLabDialog = () => {
const guiState = useGuiState() const guiState = useGuiState();
const { serverSetting } = useAppState() const { serverSetting } = useAppState();
const [currentFilter, setCurrentFilter] = useState<string>("") const [currentFilter, setCurrentFilter] = useState<string>("");
const [mergeElements, setMergeElements] = useState<MergeElement[]>([]) const [mergeElements, setMergeElements] = useState<MergeElement[]>([]);
// スロットが変更されたときの初期化処理 // スロットが変更されたときの初期化処理
const newSlotChangeKey = useMemo(() => { const newSlotChangeKey = useMemo(() => {
if (!serverSetting.serverSetting.modelSlots) { if (!serverSetting.serverSetting.modelSlots) {
return "" return "";
} }
return serverSetting.serverSetting.modelSlots.reduce((prev, cur) => { return serverSetting.serverSetting.modelSlots.reduce((prev, cur) => {
return prev + "_" + cur.modelFile return prev + "_" + cur.modelFile;
}, "") }, "");
}, [serverSetting.serverSetting.modelSlots]) }, [serverSetting.serverSetting.modelSlots]);
const filterItems = useMemo(() => { const filterItems = useMemo(() => {
return serverSetting.serverSetting.modelSlots.reduce((prev, cur) => { return serverSetting.serverSetting.modelSlots.reduce(
if (cur.voiceChangerType != "RVC") { (prev, cur) => {
return prev if (cur.voiceChangerType != "RVC") {
} return prev;
const curRVC = cur as RVCModelSlot }
const key = `${curRVC.modelType},${cur.samplingRate},${curRVC.embChannels}` const curRVC = cur as RVCModelSlot;
const val = { type: curRVC.modelType, samplingRate: cur.samplingRate, embChannels: curRVC.embChannels } const key = `${curRVC.modelType},${cur.samplingRate},${curRVC.embChannels}`;
const existKeys = Object.keys(prev) const val = { type: curRVC.modelType, samplingRate: cur.samplingRate, embChannels: curRVC.embChannels };
if (!cur.modelFile || cur.modelFile.length == 0) { const existKeys = Object.keys(prev);
return prev if (!cur.modelFile || cur.modelFile.length == 0) {
} return prev;
if (curRVC.modelType == "onnxRVC" || curRVC.modelType == "onnxRVCNono") { }
return prev if (curRVC.modelType == "onnxRVC" || curRVC.modelType == "onnxRVCNono") {
} return prev;
if (!existKeys.includes(key)) { }
prev[key] = val if (!existKeys.includes(key)) {
} prev[key] = val;
return prev }
}, {} as { [key: string]: { type: RVCModelType, samplingRate: number, embChannels: number } }) return prev;
},
}, [newSlotChangeKey]) {} as { [key: string]: { type: RVCModelType; samplingRate: number; embChannels: number } },
);
}, [newSlotChangeKey]);
const models = useMemo(() => { const models = useMemo(() => {
return serverSetting.serverSetting.modelSlots.filter(x => { return serverSetting.serverSetting.modelSlots.filter((x) => {
if (x.voiceChangerType != "RVC") { if (x.voiceChangerType != "RVC") {
return return;
} }
const xRVC = x as RVCModelSlot const xRVC = x as RVCModelSlot;
const filterVals = filterItems[currentFilter] const filterVals = filterItems[currentFilter];
if (!filterVals) { if (!filterVals) {
return false return false;
} }
if (xRVC.modelType == filterVals.type && xRVC.samplingRate == filterVals.samplingRate && xRVC.embChannels == filterVals.embChannels) { if (xRVC.modelType == filterVals.type && xRVC.samplingRate == filterVals.samplingRate && xRVC.embChannels == filterVals.embChannels) {
return true return true;
} else { } else {
return false return false;
} }
}) });
}, [filterItems, currentFilter]) }, [filterItems, currentFilter]);
useEffect(() => { useEffect(() => {
if (Object.keys(filterItems).length > 0) { if (Object.keys(filterItems).length > 0) {
setCurrentFilter(Object.keys(filterItems)[0]) setCurrentFilter(Object.keys(filterItems)[0]);
} }
}, [filterItems]) }, [filterItems]);
useEffect(() => { useEffect(() => {
// models はフィルタ後の配列
const newMergeElements = models.map((x) => { const newMergeElements = models.map((x) => {
return { filename: x.modelFile, strength: 0 } return { slotIndex: x.slotIndex, filename: x.modelFile, strength: 0 };
}) });
setMergeElements(newMergeElements) setMergeElements(newMergeElements);
}, [models]) }, [models]);
const dialog = useMemo(() => { const dialog = useMemo(() => {
const closeButtonRow = ( const closeButtonRow = (
<div className="body-row split-3-4-3 left-padding-1"> <div className="body-row split-3-4-3 left-padding-1">
<div className="body-item-text"> <div className="body-item-text"></div>
</div>
<div className="body-button-container body-button-container-space-around"> <div className="body-button-container body-button-container-space-around">
<div className="body-button" onClick={() => { guiState.stateControls.showMergeLabCheckbox.updateState(false) }} >close</div> <div
className="body-button"
onClick={() => {
guiState.stateControls.showMergeLabCheckbox.updateState(false);
}}
>
close
</div>
</div> </div>
<div className="body-item-text"></div> <div className="body-item-text"></div>
</div> </div>
) );
const filterOptions = Object.keys(filterItems)
const filterOptions = Object.keys(filterItems).map(x => { .map((x) => {
return <option key={x} value={x}>{x}</option> return (
}).filter(x => x != null) <option key={x} value={x}>
{x}
const onMergeElementsChanged = (filename: string, strength: number) => { </option>
const newMergeElements = mergeElements.map((x) => { );
if (x.filename == filename) {
return { filename: x.filename, strength: strength }
} else {
return x
}
}) })
setMergeElements(newMergeElements) .filter((x) => x != null);
}
const onMergeElementsChanged = (slotIndex: number, strength: number) => {
const newMergeElements = mergeElements.map((x) => {
if (x.slotIndex == slotIndex) {
return { slotIndex: x.slotIndex, strength: strength };
} else {
return x;
}
});
setMergeElements(newMergeElements);
};
const onMergeClicked = () => { const onMergeClicked = () => {
const validMergeElements = mergeElements.filter((x) => {
return x.strength > 0;
});
serverSetting.mergeModel({ serverSetting.mergeModel({
voiceChangerType: VoiceChangerType.RVC, voiceChangerType: VoiceChangerType.RVC,
command: "mix", command: "mix",
files: mergeElements files: validMergeElements,
}) });
} };
const modelList = mergeElements.map((x, index) => { const modelList = mergeElements.map((x, index) => {
const name = models.find(model => { return model.modelFile == x.filename })?.name || "" const name =
models.find((model) => {
return model.slotIndex == x.slotIndex;
})?.name || "";
return ( return (
<div key={index} className="merge-lab-model-item"> <div key={index} className="merge-lab-model-item">
<div>{name}</div>
<div> <div>
{name} <input
</div> type="range"
<div> className="body-item-input-slider"
<input type="range" className="body-item-input-slider" min="0" max="100" step="1" value={x.strength} onChange={(e) => { min="0"
onMergeElementsChanged(x.filename, Number(e.target.value)) max="100"
}}></input> step="1"
value={x.strength}
onChange={(e) => {
onMergeElementsChanged(x.slotIndex, Number(e.target.value));
}}
></input>
<span className="body-item-input-slider-val">{x.strength}</span> <span className="body-item-input-slider-val">{x.strength}</span>
</div> </div>
</div> </div>
) );
}) });
const content = ( const content = (
<div className="merge-lab-container"> <div className="merge-lab-container">
<div className="merge-lab-type-filter"> <div className="merge-lab-type-filter">
<div>Type:</div>
<div> <div>
Type: <select
</div> value={currentFilter}
<div> onChange={(e) => {
<select value={currentFilter} onChange={(e) => { setCurrentFilter(e.target.value) }}> setCurrentFilter(e.target.value);
}}
>
{filterOptions} {filterOptions}
</select> </select>
</div> </div>
</div> </div>
<div className="merge-lab-manipulator"> <div className="merge-lab-manipulator">
<div className="merge-lab-model-list"> <div className="merge-lab-model-list">{modelList}</div>
{modelList}
</div>
<div className="merge-lab-merge-buttons"> <div className="merge-lab-merge-buttons">
<div className="merge-lab-merge-buttons-notice"> <div className="merge-lab-merge-buttons-notice">The merged model is stored in the final slot. If you assign this slot, it will be overwritten.</div>
The merged model is stored in the final slot. If you assign this slot, it will be overwritten.
</div>
<div className="merge-lab-merge-button" onClick={onMergeClicked}> <div className="merge-lab-merge-button" onClick={onMergeClicked}>
merge merge
</div> </div>
</div> </div>
</div> </div>
</div> </div>
) );
return ( return (
<div className="dialog-frame"> <div className="dialog-frame">
<div className="dialog-title">MergeLab</div> <div className="dialog-title">MergeLab</div>
@ -166,5 +190,4 @@ export const MergeLabDialog = () => {
); );
}, [newSlotChangeKey, currentFilter, mergeElements, models]); }, [newSlotChangeKey, currentFilter, mergeElements, models]);
return dialog; return dialog;
}; };
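Most of the churn in this file is formatting; the behavioral changes are that merge elements are now keyed by slotIndex rather than filename, and that only entries with a non-zero strength are passed to serverSetting.mergeModel. A stripped-down sketch of the request construction, assuming the element shape used in the hunks above:

```typescript
// Assumed shape, taken from the names used in the diff above.
type MergeElementSketch = { slotIndex: number; filename: string; strength: number };

// Build the payload sent when "merge" is clicked: drop zero-strength entries,
// send the rest so the server mixes the corresponding slots.
export const buildMergeRequest = (mergeElements: MergeElementSketch[]) => {
    const validMergeElements = mergeElements.filter((x) => x.strength > 0);
    return {
        voiceChangerType: "RVC", // VoiceChangerType.RVC in the real client library
        command: "mix" as const,
        files: validMergeElements,
    };
};
```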

View File

@ -1,83 +1,115 @@
import React, { useMemo } from "react" import React, { useMemo, useState } from "react";
import { useAppState } from "../../../001_provider/001_AppStateProvider" import { useAppState } from "../../../001_provider/001_AppStateProvider";
import { useGuiState } from "../001_GuiStateProvider" import { useGuiState } from "../001_GuiStateProvider";
import { useMessageBuilder } from "../../../hooks/useMessageBuilder" import { useMessageBuilder } from "../../../hooks/useMessageBuilder";
import { FontAwesomeIcon } from "@fortawesome/react-fontawesome";
export type ModelSlotAreaProps = { export type ModelSlotAreaProps = {};
}
const SortTypes = {
slot: "slot",
name: "name",
} as const;
export type SortTypes = (typeof SortTypes)[keyof typeof SortTypes];
export const ModelSlotArea = (_props: ModelSlotAreaProps) => { export const ModelSlotArea = (_props: ModelSlotAreaProps) => {
const { serverSetting, getInfo } = useAppState() const { serverSetting, getInfo } = useAppState();
const guiState = useGuiState() const guiState = useGuiState();
const messageBuilderState = useMessageBuilder() const messageBuilderState = useMessageBuilder();
const [sortType, setSortType] = useState<SortTypes>("slot");
useMemo(() => { useMemo(() => {
messageBuilderState.setMessage(__filename, "edit", { "ja": "編集", "en": "edit" }) messageBuilderState.setMessage(__filename, "edit", { ja: "編集", en: "edit" });
}, []) }, []);
const modelTiles = useMemo(() => { const modelTiles = useMemo(() => {
if (!serverSetting.serverSetting.modelSlots) { if (!serverSetting.serverSetting.modelSlots) {
return [] return [];
} }
return serverSetting.serverSetting.modelSlots.map((x, index) => { const modelSlots =
if (!x.modelFile || x.modelFile.length == 0) { sortType == "slot"
return null ? serverSetting.serverSetting.modelSlots
} : serverSetting.serverSetting.modelSlots.slice().sort((a, b) => {
const tileContainerClass = index == serverSetting.serverSetting.modelSlotIndex ? "model-slot-tile-container-selected" : "model-slot-tile-container" return a.name.localeCompare(b.name);
const name = x.name.length > 8 ? x.name.substring(0, 7) + "..." : x.name });
const iconElem = x.iconFile.length > 0 ?
<>
<img className="model-slot-tile-icon" src={x.iconFile} alt={x.name} />
<div className="model-slot-tile-vctype">{x.voiceChangerType}</div>
</>
:
<>
<div className="model-slot-tile-icon-no-entry">no image</div>
<div className="model-slot-tile-vctype">{x.voiceChangerType}</div>
</>
const clickAction = async () => { return modelSlots
const dummyModelSlotIndex = (Math.floor(Date.now() / 1000)) * 1000 + index .map((x, index) => {
await serverSetting.updateServerSettings({ ...serverSetting.serverSetting, modelSlotIndex: dummyModelSlotIndex }) if (!x.modelFile || x.modelFile.length == 0) {
setTimeout(() => { // quick hack return null;
getInfo() }
}, 1000 * 2) const tileContainerClass = x.slotIndex == serverSetting.serverSetting.modelSlotIndex ? "model-slot-tile-container-selected" : "model-slot-tile-container";
} const name = x.name.length > 8 ? x.name.substring(0, 7) + "..." : x.name;
return ( const iconElem =
<div key={index} className={tileContainerClass} onClick={clickAction}> x.iconFile.length > 0 ? (
<div className="model-slot-tile-icon-div"> <>
{iconElem} <img className="model-slot-tile-icon" src={serverSetting.serverSetting.voiceChangerParams.model_dir + "/" + x.slotIndex + "/" + x.iconFile.split(/[\/\\]/).pop()} alt={x.name} />
<div className="model-slot-tile-vctype">{x.voiceChangerType}</div>
</>
) : (
<>
<div className="model-slot-tile-icon-no-entry">no image</div>
<div className="model-slot-tile-vctype">{x.voiceChangerType}</div>
</>
);
const clickAction = async () => {
const dummyModelSlotIndex = Math.floor(Date.now() / 1000) * 1000 + x.slotIndex;
await serverSetting.updateServerSettings({ ...serverSetting.serverSetting, modelSlotIndex: dummyModelSlotIndex });
setTimeout(() => {
// quick hack
getInfo();
}, 1000 * 2);
};
return (
<div key={index} className={tileContainerClass} onClick={clickAction}>
<div className="model-slot-tile-icon-div">{iconElem}</div>
<div className="model-slot-tile-dscription">{name}</div>
</div> </div>
<div className="model-slot-tile-dscription"> );
{name} })
</div> .filter((x) => x != null);
</div > }, [serverSetting.serverSetting.modelSlots, serverSetting.serverSetting.modelSlotIndex, sortType]);
)
}).filter(x => x != null)
}, [serverSetting.serverSetting.modelSlots, serverSetting.serverSetting.modelSlotIndex])
const modelSlotArea = useMemo(() => { const modelSlotArea = useMemo(() => {
const onModelSlotEditClicked = () => { const onModelSlotEditClicked = () => {
guiState.stateControls.showModelSlotManagerCheckbox.updateState(true) guiState.stateControls.showModelSlotManagerCheckbox.updateState(true);
} };
const sortSlotByIdClass = sortType == "slot" ? "model-slot-sort-button-active" : "model-slot-sort-button";
const sortSlotByNameClass = sortType == "name" ? "model-slot-sort-button-active" : "model-slot-sort-button";
return ( return (
<div className="model-slot-area"> <div className="model-slot-area">
<div className="model-slot-panel"> <div className="model-slot-panel">
<div className="model-slot-tiles-container">{modelTiles}</div> <div className="model-slot-tiles-container">{modelTiles}</div>
<div className="model-slot-buttons"> <div className="model-slot-buttons">
<div className="model-slot-sort-buttons">
<div
className={sortSlotByIdClass}
onClick={() => {
setSortType("slot");
}}
>
<FontAwesomeIcon icon={["fas", "arrow-down-1-9"]} style={{ fontSize: "1rem" }} />
</div>
<div
className={sortSlotByNameClass}
onClick={() => {
setSortType("name");
}}
>
<FontAwesomeIcon icon={["fas", "arrow-down-a-z"]} style={{ fontSize: "1rem" }} />
</div>
</div>
<div className="model-slot-button" onClick={onModelSlotEditClicked}> <div className="model-slot-button" onClick={onModelSlotEditClicked}>
{messageBuilderState.getMessage(__filename, "edit")} {messageBuilderState.getMessage(__filename, "edit")}
</div> </div>
</div> </div>
</div> </div>
</div> </div>
) );
}, [modelTiles]) }, [modelTiles, sortType]);
return modelSlotArea return modelSlotArea;
} };

View File

@ -23,6 +23,26 @@ export const DiffusionSVCSettingArea = (_props: DiffusionSVCSettingAreaProps) =>
return <></>; return <></>;
} }
const skipDiffusionClass = serverSetting.serverSetting.skipDiffusion == 0 ? "character-area-toggle-button" : "character-area-toggle-button-active";
const skipDiffRow = (
<div className="character-area-control">
<div className="character-area-control-title">Boost</div>
<div className="character-area-control-field">
<div className="character-area-buttons">
<div
className={skipDiffusionClass}
onClick={() => {
serverSetting.updateServerSettings({ ...serverSetting.serverSetting, skipDiffusion: serverSetting.serverSetting.skipDiffusion == 0 ? 1 : 0 });
}}
>
skip diff
</div>
</div>
</div>
</div>
);
const skipValues = getDivisors(serverSetting.serverSetting.kStep); const skipValues = getDivisors(serverSetting.serverSetting.kStep);
skipValues.pop(); skipValues.pop();
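For context, getDivisors is expected to return the divisors of kStep in ascending order, so the pop() above drops kStep itself from the selectable speed-up values; a minimal sketch of such a helper, assuming that contract:

```
// A minimal sketch, assuming the helper returns all divisors of n in ascending order.
const getDivisors = (n: number): number[] => {
    const divisors: number[] = [];
    for (let i = 1; i * i <= n; i++) {
        if (n % i === 0) {
            divisors.push(i);
            if (i !== n / i) {
                divisors.push(n / i);
            }
        }
    }
    return divisors.sort((a, b) => a - b);
};

// getDivisors(20) -> [1, 2, 4, 5, 10, 20]; after pop() the list is [1, 2, 4, 5, 10].
```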
@ -82,6 +102,7 @@ export const DiffusionSVCSettingArea = (_props: DiffusionSVCSettingAreaProps) =>
); );
return ( return (
<> <>
{skipDiffRow}
{kStepRow} {kStepRow}
{speedUpRow} {speedUpRow}
</> </>

View File

@ -49,7 +49,7 @@ export const CharacterArea = (_props: CharacterAreaProps) => {
return <></>; return <></>;
} }
const icon = selected.iconFile.length > 0 ? selected.iconFile : "./assets/icons/human.png"; const icon = selected.iconFile.length > 0 ? serverSetting.serverSetting.voiceChangerParams.model_dir + "/" + selected.slotIndex + "/" + selected.iconFile.split(/[\/\\]/).pop() : "./assets/icons/human.png";
const selectedTermOfUseUrlLink = selected.termsOfUseUrl ? ( const selectedTermOfUseUrlLink = selected.termsOfUseUrl ? (
<a href={selected.termsOfUseUrl} target="_blank" rel="noopener noreferrer" className="portrait-area-terms-of-use-link"> <a href={selected.termsOfUseUrl} target="_blank" rel="noopener noreferrer" className="portrait-area-terms-of-use-link">
[{messageBuilderState.getMessage(__filename, "terms_of_use")}] [{messageBuilderState.getMessage(__filename, "terms_of_use")}]
@ -122,9 +122,13 @@ export const CharacterArea = (_props: CharacterAreaProps) => {
serverSetting.updateServerSettings({ ...serverSetting.serverSetting, serverAudioStated: 0 }); serverSetting.updateServerSettings({ ...serverSetting.serverSetting, serverAudioStated: 0 });
} }
}; };
const onPassThroughClicked = async () => {
serverSetting.updateServerSettings({ ...serverSetting.serverSetting, passThrough: !serverSetting.serverSetting.passThrough });
};
const startClassName = guiState.isConverting ? "character-area-control-button-active" : "character-area-control-button-stanby"; const startClassName = guiState.isConverting ? "character-area-control-button-active" : "character-area-control-button-stanby";
const stopClassName = guiState.isConverting ? "character-area-control-button-stanby" : "character-area-control-button-active"; const stopClassName = guiState.isConverting ? "character-area-control-button-stanby" : "character-area-control-button-active";
const passThruClassName = serverSetting.serverSetting.passThrough == false ? "character-area-control-passthru-button-stanby" : "character-area-control-passthru-button-active blinking";
console.log("serverSetting.serverSetting.passThrough", passThruClassName, serverSetting.serverSetting.passThrough);
return ( return (
<div className="character-area-control"> <div className="character-area-control">
<div className="character-area-control-buttons"> <div className="character-area-control-buttons">
@ -134,6 +138,9 @@ export const CharacterArea = (_props: CharacterAreaProps) => {
<div onClick={onStopClicked} className={stopClassName}> <div onClick={onStopClicked} className={stopClassName}>
stop stop
</div> </div>
<div onClick={onPassThroughClicked} className={passThruClassName}>
passthru
</div>
</div> </div>
</div> </div>
); );
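Both the model-slot tiles and the character portrait above now build their image URL from voiceChangerParams.model_dir, the slot index, and the basename of the stored icon path; a small illustrative helper (not part of the commit) for that rebuild:

```
// Illustrative helper: rebuild an asset URL from model_dir, slot index, and a stored file name
// that may still contain Windows or POSIX path separators.
const resolveSlotAsset = (modelDir: string, slotIndex: number, storedFile: string): string => {
    const basename = storedFile.split(/[\/\\]/).pop() ?? storedFile;
    return `${modelDir}/${slotIndex}/${basename}`;
};

// resolveSlotAsset("model_dir", 3, "C:\\icons\\amitaro.png") -> "model_dir/3/amitaro.png"
```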

View File

@ -41,68 +41,75 @@ export const ConvertArea = (props: ConvertProps) => {
    const gpuSelect =
        edition.indexOf("onnxdirectML-cuda") >= 0 ? (
            <>
                <div className="config-sub-area-control">
                    <div className="config-sub-area-control-title">GPU(dml):</div>
                    <div className="config-sub-area-control-field">
                        <div className="config-sub-area-buttons">
                            <div
                                onClick={async () => {
                                    await serverSetting.updateServerSettings({
                                        ...serverSetting.serverSetting,
                                        gpu: -1,
                                    });
                                }}
                                className={cpuClassName}
                            >
                                <span className="config-sub-area-button-text-small">cpu</span>
                            </div>
                            <div
                                onClick={async () => {
                                    await serverSetting.updateServerSettings({
                                        ...serverSetting.serverSetting,
                                        gpu: 0,
                                    });
                                }}
                                className={gpu0ClassName}
                            >
                                <span className="config-sub-area-button-text-small">gpu0</span>
                            </div>
                            <div
                                onClick={async () => {
                                    await serverSetting.updateServerSettings({
                                        ...serverSetting.serverSetting,
                                        gpu: 1,
                                    });
                                }}
                                className={gpu1ClassName}
                            >
                                <span className="config-sub-area-button-text-small">gpu1</span>
                            </div>
                            <div
                                onClick={async () => {
                                    await serverSetting.updateServerSettings({
                                        ...serverSetting.serverSetting,
                                        gpu: 2,
                                    });
                                }}
                                className={gpu2ClassName}
                            >
                                <span className="config-sub-area-button-text-small">gpu2</span>
                            </div>
                            <div
                                onClick={async () => {
                                    await serverSetting.updateServerSettings({
                                        ...serverSetting.serverSetting,
                                        gpu: 3,
                                    });
                                }}
                                className={gpu3ClassName}
                            >
                                <span className="config-sub-area-button-text-small">gpu3</span>
                            </div>
                            <div className="config-sub-area-control">
                                <span className="config-sub-area-button-text-small">
                                    <a href="https://github.com/w-okada/voice-changer/issues/410">more info</a>
                                </span>
                            </div>
                        </div>
                    </div>
                </div>
            </>
        ) : (
            <div className="config-sub-area-control">
                <div className="config-sub-area-control-title">GPU:</div>

View File

@ -2,14 +2,14 @@ import React, { useEffect, useMemo, useRef, useState } from "react";
import { useAppState } from "../../../001_provider/001_AppStateProvider"; import { useAppState } from "../../../001_provider/001_AppStateProvider";
import { fileSelectorAsDataURL, useIndexedDB } from "@dannadori/voice-changer-client-js"; import { fileSelectorAsDataURL, useIndexedDB } from "@dannadori/voice-changer-client-js";
import { useGuiState } from "../001_GuiStateProvider"; import { useGuiState } from "../001_GuiStateProvider";
import { AUDIO_ELEMENT_FOR_PLAY_RESULT, AUDIO_ELEMENT_FOR_TEST_CONVERTED, AUDIO_ELEMENT_FOR_TEST_CONVERTED_ECHOBACK, AUDIO_ELEMENT_FOR_TEST_ORIGINAL, INDEXEDDB_KEY_AUDIO_OUTPUT } from "../../../const"; import { AUDIO_ELEMENT_FOR_PLAY_MONITOR, AUDIO_ELEMENT_FOR_PLAY_RESULT, AUDIO_ELEMENT_FOR_TEST_CONVERTED, AUDIO_ELEMENT_FOR_TEST_CONVERTED_ECHOBACK, AUDIO_ELEMENT_FOR_TEST_ORIGINAL, INDEXEDDB_KEY_AUDIO_MONITR, INDEXEDDB_KEY_AUDIO_OUTPUT } from "../../../const";
import { isDesktopApp } from "../../../const"; import { isDesktopApp } from "../../../const";
export type DeviceAreaProps = {}; export type DeviceAreaProps = {};
export const DeviceArea = (_props: DeviceAreaProps) => { export const DeviceArea = (_props: DeviceAreaProps) => {
const { setting, serverSetting, audioContext, setAudioOutputElementId, initializedRef, setVoiceChangerClientSetting, startOutputRecording, stopOutputRecording } = useAppState(); const { setting, serverSetting, audioContext, setAudioOutputElementId, setAudioMonitorElementId, initializedRef, setVoiceChangerClientSetting, startOutputRecording, stopOutputRecording } = useAppState();
const { isConverting, audioInputForGUI, inputAudioDeviceInfo, setAudioInputForGUI, fileInputEchoback, setFileInputEchoback, setAudioOutputForGUI, audioOutputForGUI, outputAudioDeviceInfo, shareScreenEnabled, setShareScreenEnabled } = useGuiState(); const { isConverting, audioInputForGUI, inputAudioDeviceInfo, setAudioInputForGUI, fileInputEchoback, setFileInputEchoback, setAudioOutputForGUI, setAudioMonitorForGUI, audioOutputForGUI, audioMonitorForGUI, outputAudioDeviceInfo, shareScreenEnabled, setShareScreenEnabled } = useGuiState();
const [inputHostApi, setInputHostApi] = useState<string>("ALL"); const [inputHostApi, setInputHostApi] = useState<string>("ALL");
const [outputHostApi, setOutputHostApi] = useState<string>("ALL"); const [outputHostApi, setOutputHostApi] = useState<string>("ALL");
const [monitorHostApi, setMonitorHostApi] = useState<string>("ALL"); const [monitorHostApi, setMonitorHostApi] = useState<string>("ALL");
@ -244,10 +244,10 @@ export const DeviceArea = (_props: DeviceAreaProps) => {
audio_echo.volume = 0; audio_echo.volume = 0;
setFileInputEchoback(false); setFileInputEchoback(false);
// original stream to play. // // original stream to play.
const audio_org = document.getElementById(AUDIO_ELEMENT_FOR_TEST_ORIGINAL) as HTMLAudioElement; // const audio_org = document.getElementById(AUDIO_ELEMENT_FOR_TEST_ORIGINAL) as HTMLAudioElement;
audio_org.src = url; // audio_org.src = url;
audio_org.pause(); // audio_org.pause();
}; };
const echobackClass = fileInputEchoback ? "config-sub-area-control-field-wav-file-echoback-button-active" : "config-sub-area-control-field-wav-file-echoback-button"; const echobackClass = fileInputEchoback ? "config-sub-area-control-field-wav-file-echoback-button-active" : "config-sub-area-control-field-wav-file-echoback-button";
@ -256,7 +256,7 @@ export const DeviceArea = (_props: DeviceAreaProps) => {
<div className="config-sub-area-control-field"> <div className="config-sub-area-control-field">
<div className="config-sub-area-control-field-wav-file left-padding-1"> <div className="config-sub-area-control-field-wav-file left-padding-1">
<div className="config-sub-area-control-field-wav-file-audio-container"> <div className="config-sub-area-control-field-wav-file-audio-container">
<audio id={AUDIO_ELEMENT_FOR_TEST_ORIGINAL} controls hidden></audio> {/* <audio id={AUDIO_ELEMENT_FOR_TEST_ORIGINAL} controls hidden></audio> */}
<audio className="config-sub-area-control-field-wav-file-audio" id={AUDIO_ELEMENT_FOR_TEST_CONVERTED} controls controlsList="nodownload noplaybackrate"></audio> <audio className="config-sub-area-control-field-wav-file-audio" id={AUDIO_ELEMENT_FOR_TEST_CONVERTED} controls controlsList="nodownload noplaybackrate"></audio>
<audio id={AUDIO_ELEMENT_FOR_TEST_CONVERTED_ECHOBACK} controls hidden></audio> <audio id={AUDIO_ELEMENT_FOR_TEST_CONVERTED_ECHOBACK} controls hidden></audio>
</div> </div>
@ -381,7 +381,8 @@ export const DeviceArea = (_props: DeviceAreaProps) => {
const setAudioOutput = async () => { const setAudioOutput = async () => {
const mediaDeviceInfos = await navigator.mediaDevices.enumerateDevices(); const mediaDeviceInfos = await navigator.mediaDevices.enumerateDevices();
[AUDIO_ELEMENT_FOR_PLAY_RESULT, AUDIO_ELEMENT_FOR_TEST_ORIGINAL, AUDIO_ELEMENT_FOR_TEST_CONVERTED_ECHOBACK].forEach((x) => { // [AUDIO_ELEMENT_FOR_PLAY_RESULT, AUDIO_ELEMENT_FOR_TEST_ORIGINAL, AUDIO_ELEMENT_FOR_TEST_CONVERTED_ECHOBACK].forEach((x) => {
[AUDIO_ELEMENT_FOR_PLAY_RESULT, AUDIO_ELEMENT_FOR_TEST_CONVERTED_ECHOBACK].forEach((x) => {
const audio = document.getElementById(x) as HTMLAudioElement; const audio = document.getElementById(x) as HTMLAudioElement;
if (audio) { if (audio) {
if (serverSetting.serverSetting.enableServerAudio == 1) { if (serverSetting.serverSetting.enableServerAudio == 1) {
@ -598,7 +599,88 @@ export const DeviceArea = (_props: DeviceAreaProps) => {
); );
}, [serverSetting.serverSetting, serverSetting.updateServerSettings, serverSetting.serverSetting.enableServerAudio]); }, [serverSetting.serverSetting, serverSetting.updateServerSettings, serverSetting.serverSetting.enableServerAudio]);
    // (6) Monitor
useEffect(() => {
const loadCache = async () => {
const key = await getItem(INDEXEDDB_KEY_AUDIO_MONITR);
if (key) {
setAudioMonitorForGUI(key as string);
}
};
loadCache();
}, []);
useEffect(() => {
const setAudioMonitor = async () => {
const mediaDeviceInfos = await navigator.mediaDevices.enumerateDevices();
[AUDIO_ELEMENT_FOR_PLAY_MONITOR].forEach((x) => {
const audio = document.getElementById(x) as HTMLAudioElement;
if (audio) {
if (serverSetting.serverSetting.enableServerAudio == 1) {
                        // When using Server Audio, do not emit sound from this element.
audio.volume = 0;
} else if (audioMonitorForGUI == "none") {
// @ts-ignore
audio.setSinkId("");
audio.volume = 0;
} else {
const audioOutputs = mediaDeviceInfos.filter((x) => {
return x.kind == "audiooutput";
});
const found = audioOutputs.some((x) => {
return x.deviceId == audioMonitorForGUI;
});
if (found) {
                            // @ts-ignore // the exception cannot be caught, so the device ID apparently has to be checked beforehand.
audio.setSinkId(audioMonitorForGUI);
audio.volume = 1;
} else {
console.warn("No audio output device. use default");
}
}
}
});
};
setAudioMonitor();
}, [audioMonitorForGUI, serverSetting.serverSetting.enableServerAudio]);
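The effect above checks the enumerated output devices before calling setSinkId because a failed call cannot always be caught; a minimal stand-alone sketch of the same pattern (routeAudioTo is an illustrative name):

```
// Illustrative sketch: route an <audio> element to a specific output device only if it exists.
const routeAudioTo = async (audio: HTMLAudioElement, deviceId: string): Promise<boolean> => {
    const devices = await navigator.mediaDevices.enumerateDevices();
    const exists = devices.some((d) => d.kind === "audiooutput" && d.deviceId === deviceId);
    if (!exists) {
        console.warn("No such audio output device; keeping the default.");
        return false;
    }
    // setSinkId is available in Chromium-based browsers; TypeScript may need a cast.
    await (audio as HTMLAudioElement & { setSinkId: (id: string) => Promise<void> }).setSinkId(deviceId);
    audio.volume = 1;
    return true;
};
```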
    // (6-1) Client
const clientMonitorRow = useMemo(() => {
if (serverSetting.serverSetting.enableServerAudio == 1) {
return <></>;
}
return (
<div className="config-sub-area-control">
<div className="config-sub-area-control-title left-padding-1">monitor</div>
<div className="config-sub-area-control-field">
<select
className="body-select"
value={audioMonitorForGUI}
onChange={(e) => {
setAudioMonitorForGUI(e.target.value);
setItem(INDEXEDDB_KEY_AUDIO_MONITR, e.target.value);
}}
>
{outputAudioDeviceInfo.map((x) => {
return (
<option key={x.deviceId} value={x.deviceId}>
{x.label}
</option>
);
})}
</select>
</div>
</div>
);
}, [serverSetting.serverSetting.enableServerAudio, outputAudioDeviceInfo, audioMonitorForGUI]);
useEffect(() => {
console.log("initializedRef.current", initializedRef.current);
setAudioMonitorElementId(AUDIO_ELEMENT_FOR_PLAY_MONITOR);
}, [initializedRef.current]);
    // (6-2) Server
const serverMonitorRow = useMemo(() => { const serverMonitorRow = useMemo(() => {
if (serverSetting.serverSetting.enableServerAudio == 0) { if (serverSetting.serverSetting.enableServerAudio == 0) {
return <></>; return <></>;
@ -675,6 +757,41 @@ export const DeviceArea = (_props: DeviceAreaProps) => {
); );
}, [monitorHostApi, serverSetting.serverSetting, serverSetting.updateServerSettings, serverSetting.serverSetting.enableServerAudio]); }, [monitorHostApi, serverSetting.serverSetting, serverSetting.updateServerSettings, serverSetting.serverSetting.enableServerAudio]);
const monitorGainControl = useMemo(() => {
const currentMonitorGain = serverSetting.serverSetting.enableServerAudio == 0 ? setting.voiceChangerClientSetting.monitorGain : serverSetting.serverSetting.serverMonitorAudioGain;
const monitorValueUpdatedAction =
serverSetting.serverSetting.enableServerAudio == 0
? async (val: number) => {
await setVoiceChangerClientSetting({ ...setting.voiceChangerClientSetting, monitorGain: val });
}
: async (val: number) => {
await serverSetting.updateServerSettings({ ...serverSetting.serverSetting, serverMonitorAudioGain: val });
};
return (
<div className="config-sub-area-control">
<div className="config-sub-area-control-title left-padding-2">gain</div>
<div className="config-sub-area-control-field">
<div className="config-sub-area-control-field-auido-io">
<span className="character-area-slider-control-slider">
<input
type="range"
min="0.1"
max="10.0"
step="0.1"
value={currentMonitorGain}
onChange={(e) => {
monitorValueUpdatedAction(Number(e.target.value));
}}
></input>
</span>
<span className="character-area-slider-control-val">{currentMonitorGain}</span>
</div>
</div>
</div>
);
}, [serverSetting.serverSetting, setting, setVoiceChangerClientSetting, serverSetting.updateServerSettings]);
return ( return (
<div className="config-sub-area"> <div className="config-sub-area">
{deviceModeRow} {deviceModeRow}
@ -685,10 +802,13 @@ export const DeviceArea = (_props: DeviceAreaProps) => {
{audioInputScreenRow} {audioInputScreenRow}
{clientAudioOutputRow} {clientAudioOutputRow}
{serverAudioOutputRow} {serverAudioOutputRow}
{clientMonitorRow}
{serverMonitorRow} {serverMonitorRow}
{monitorGainControl}
{outputRecorderRow} {outputRecorderRow}
<audio hidden id={AUDIO_ELEMENT_FOR_PLAY_RESULT}></audio> <audio hidden id={AUDIO_ELEMENT_FOR_PLAY_RESULT}></audio>
<audio hidden id={AUDIO_ELEMENT_FOR_PLAY_MONITOR}></audio>
</div> </div>
); );
}; };

View File

@ -1,13 +1,15 @@
export const AUDIO_ELEMENT_FOR_PLAY_RESULT = "audio-result" // player for the converted output
export const AUDIO_ELEMENT_FOR_PLAY_MONITOR = "audio-monitor" // player for monitoring the converted output
export const AUDIO_ELEMENT_FOR_TEST_ORIGINAL = "audio-test-original" // ??? possibly unused
export const AUDIO_ELEMENT_FOR_TEST_CONVERTED = "audio-test-converted" // controls for file input
export const AUDIO_ELEMENT_FOR_TEST_CONVERTED_ECHOBACK = "audio-test-converted-echoback" // echo back for file input
export const AUDIO_ELEMENT_FOR_SAMPLING_INPUT = "body-wav-container-wav-input"
export const AUDIO_ELEMENT_FOR_SAMPLING_OUTPUT = "body-wav-container-wav-output"
export const INDEXEDDB_KEY_AUDIO_OUTPUT = "INDEXEDDB_KEY_AUDIO_OUTPUT"
export const INDEXEDDB_KEY_AUDIO_MONITR = "INDEXEDDB_KEY_AUDIO_MONITOR"
export const INDEXEDDB_KEY_DEFAULT_MODEL_TYPE = "INDEXEDDB_KEY_DEFALT_MODEL_TYPE"

View File

@ -757,6 +757,18 @@ body {
max-height: 60vh; max-height: 60vh;
width: 100%; width: 100%;
overflow-y: scroll; overflow-y: scroll;
&::-webkit-scrollbar {
width: 10px;
height: 10px;
}
&::-webkit-scrollbar-track {
background-color: #eee;
border-radius: 3px;
}
&::-webkit-scrollbar-thumb {
background: #f7cfec80;
border-radius: 3px;
}
.model-slot { .model-slot {
height: 5rem; height: 5rem;
@ -1150,12 +1162,30 @@ body {
flex-direction: row; flex-direction: row;
gap: 2px; gap: 2px;
flex-wrap: wrap; flex-wrap: wrap;
overflow-y: scroll;
max-height: 12rem;
&::-webkit-scrollbar {
width: 10px;
height: 10px;
}
&::-webkit-scrollbar-track {
background-color: #eee;
border-radius: 3px;
}
&::-webkit-scrollbar-thumb {
background: #f7cfec80;
border-radius: 3px;
}
/* width: calc(30rem + 40px + 10px); */ /* width: calc(30rem + 40px + 10px); */
} }
.model-slot-buttons { .model-slot-buttons {
display: flex; display: flex;
flex-direction: column-reverse; gap: 5px;
flex-direction: column;
justify-content: space-between;
width: 4rem;
.model-slot-button { .model-slot-button {
border: solid 2px #999; border: solid 2px #999;
color: white; color: white;
@ -1164,10 +1194,41 @@ body {
background: #333; background: #333;
cursor: pointer; cursor: pointer;
padding: 5px; padding: 5px;
text-align: center;
width: 3rem;
} }
.model-slot-button:hover { .model-slot-button:hover {
border: solid 2px #faa; border: solid 2px #faa;
} }
.model-slot-sort-buttons {
height: 50%;
.model-slot-sort-button {
color: white;
font-size: 0.8rem;
border-radius: 4px;
background: #333;
border: solid 2px #444;
cursor: pointer;
padding: 1px;
text-align: center;
width: 3rem;
}
.model-slot-sort-button-active {
color: white;
font-size: 0.8rem;
border-radius: 4px;
background: #595;
border: solid 2px #595;
cursor: pointer;
padding: 1px;
text-align: center;
width: 3rem;
}
.model-slot-sort-button:hover {
border: solid 2px #faa;
background: #343;
}
}
} }
} }
} }
@ -1277,6 +1338,7 @@ body {
.character-area-control { .character-area-control {
display: flex; display: flex;
gap: 3px; gap: 3px;
align-items: center;
.character-area-control-buttons { .character-area-control-buttons {
display: flex; display: flex;
flex-direction: row; flex-direction: row;
@ -1301,6 +1363,34 @@ body {
border: solid 1px #000; border: solid 1px #000;
} }
} }
.character-area-control-passthru-button-stanby {
width: 5rem;
border: solid 1px #999;
border-radius: 15px;
padding: 2px;
background: #aba;
cursor: pointer;
font-weight: 700;
font-size: 0.8rem;
text-align: center;
&:hover {
border: solid 1px #000;
}
}
.character-area-control-passthru-button-active {
width: 5rem;
border: solid 1px #955;
border-radius: 15px;
padding: 2px;
background: #fdd;
cursor: pointer;
font-weight: 700;
font-size: 0.8rem;
text-align: center;
&:hover {
border: solid 1px #000;
}
}
} }
.character-area-control-title { .character-area-control-title {
@ -1344,6 +1434,35 @@ body {
.character-area-button:hover { .character-area-button:hover {
border: solid 2px #faa; border: solid 2px #faa;
} }
.character-area-toggle-button {
border: solid 2px #999;
color: white;
background: #666;
cursor: pointer;
font-size: 0.8rem;
border-radius: 5px;
height: 1.2rem;
padding-left: 2px;
padding-right: 2px;
}
.character-area-toggle-button:hover {
border: solid 2px #faa;
}
.character-area-toggle-button-active {
border: solid 2px #999;
color: white;
background: #844;
cursor: pointer;
font-size: 0.8rem;
border-radius: 5px;
height: 1.2rem;
padding-left: 2px;
padding-right: 2px;
}
} }
} }
} }
@ -1443,6 +1562,10 @@ audio::-webkit-media-controls-overlay-enclosure{
height: 1.2rem; height: 1.2rem;
padding-left: 2px; padding-left: 2px;
padding-right: 2px; padding-right: 2px;
white-space: nowrap;
}
.config-sub-area-button-text-small {
font-size: 0.5rem;
} }
} }
.config-sub-area-control-field-auido-io { .config-sub-area-control-field-auido-io {
@ -1635,6 +1758,21 @@ audio::-webkit-media-controls-overlay-enclosure{
flex-direction: row; flex-direction: row;
.merge-lab-model-list { .merge-lab-model-list {
width: 70%; width: 70%;
overflow-y: scroll;
max-height: 20rem;
&::-webkit-scrollbar {
width: 10px;
height: 10px;
}
&::-webkit-scrollbar-track {
background-color: #eee;
border-radius: 3px;
}
&::-webkit-scrollbar-thumb {
background: #f7cfec80;
border-radius: 3px;
}
.merge-lab-model-item { .merge-lab-model-item {
display: flex; display: flex;
flex-direction: row; flex-direction: row;
@ -1673,3 +1811,18 @@ audio::-webkit-media-controls-overlay-enclosure{
} }
} }
} }
.blinking {
animation: flash 0.7s cubic-bezier(0.91, -0.14, 0, 1.4) infinite;
}
@keyframes flash {
0%,
100% {
opacity: 1;
}
50% {
opacity: 0.5;
}
}

File diff suppressed because it is too large

View File

@ -1,6 +1,6 @@
{ {
"name": "@dannadori/voice-changer-client-js", "name": "@dannadori/voice-changer-client-js",
"version": "1.0.164", "version": "1.0.167",
"description": "", "description": "",
"main": "dist/index.js", "main": "dist/index.js",
"directories": { "directories": {
@ -26,33 +26,33 @@
"author": "wataru.okada@flect.co.jp", "author": "wataru.okada@flect.co.jp",
"license": "ISC", "license": "ISC",
"devDependencies": { "devDependencies": {
"@types/audioworklet": "^0.0.48", "@types/audioworklet": "^0.0.50",
"@types/node": "^20.4.2", "@types/node": "^20.4.8",
"@types/react": "18.2.15", "@types/react": "18.2.18",
"@types/react-dom": "18.2.7", "@types/react-dom": "18.2.7",
"eslint": "^8.45.0", "eslint": "^8.46.0",
"eslint-config-prettier": "^8.8.0", "eslint-config-prettier": "^9.0.0",
"eslint-plugin-prettier": "^5.0.0", "eslint-plugin-prettier": "^5.0.0",
"eslint-plugin-react": "^7.32.2", "eslint-plugin-react": "^7.33.1",
"eslint-webpack-plugin": "^4.0.1", "eslint-webpack-plugin": "^4.0.1",
"npm-run-all": "^4.1.5", "npm-run-all": "^4.1.5",
"prettier": "^3.0.0", "prettier": "^3.0.1",
"raw-loader": "^4.0.2", "raw-loader": "^4.0.2",
"rimraf": "^5.0.1", "rimraf": "^5.0.1",
"ts-loader": "^9.4.4", "ts-loader": "^9.4.4",
"typescript": "^5.1.6", "typescript": "^5.1.6",
"webpack": "^5.88.1", "webpack": "^5.88.2",
"webpack-cli": "^5.1.4", "webpack-cli": "^5.1.4",
"webpack-dev-server": "^4.15.1" "webpack-dev-server": "^4.15.1"
}, },
"dependencies": { "dependencies": {
"@types/readable-stream": "^2.3.15", "@types/readable-stream": "^4.0.0",
"amazon-chime-sdk-js": "^3.15.0", "amazon-chime-sdk-js": "^3.15.0",
"buffer": "^6.0.3", "buffer": "^6.0.3",
"localforage": "^1.10.0", "localforage": "^1.10.0",
"protobufjs": "^7.2.4", "protobufjs": "^7.2.4",
"react": "^18.2.0", "react": "^18.2.0",
"react-dom": "^18.2.0", "react-dom": "^18.2.0",
"socket.io-client": "^4.7.1" "socket.io-client": "^4.7.2"
} }
} }

View File

@ -23,9 +23,11 @@ export class VoiceChangerClient {
private currentMediaStreamAudioSourceNode: MediaStreamAudioSourceNode | null = null private currentMediaStreamAudioSourceNode: MediaStreamAudioSourceNode | null = null
private inputGainNode: GainNode | null = null private inputGainNode: GainNode | null = null
private outputGainNode: GainNode | null = null private outputGainNode: GainNode | null = null
private monitorGainNode: GainNode | null = null
private vcInNode!: VoiceChangerWorkletNode private vcInNode!: VoiceChangerWorkletNode
private vcOutNode!: VoiceChangerWorkletNode private vcOutNode!: VoiceChangerWorkletNode
private currentMediaStreamAudioDestinationNode!: MediaStreamAudioDestinationNode private currentMediaStreamAudioDestinationNode!: MediaStreamAudioDestinationNode
private currentMediaStreamAudioDestinationMonitorNode!: MediaStreamAudioDestinationNode
private promiseForInitialize: Promise<void> private promiseForInitialize: Promise<void>
@ -72,6 +74,12 @@ export class VoiceChangerClient {
this.vcOutNode.connect(this.outputGainNode) // vc node -> output node this.vcOutNode.connect(this.outputGainNode) // vc node -> output node
this.outputGainNode.connect(this.currentMediaStreamAudioDestinationNode) this.outputGainNode.connect(this.currentMediaStreamAudioDestinationNode)
this.currentMediaStreamAudioDestinationMonitorNode = ctx44k.createMediaStreamDestination() // output node
this.monitorGainNode = ctx44k.createGain()
this.monitorGainNode.gain.value = this.setting.monitorGain
this.vcOutNode.connect(this.monitorGainNode) // vc node -> monitor node
this.monitorGainNode.connect(this.currentMediaStreamAudioDestinationMonitorNode)
if (this.vfEnable) { if (this.vfEnable) {
this.vf = await VoiceFocusDeviceTransformer.create({ variant: 'c20' }) this.vf = await VoiceFocusDeviceTransformer.create({ variant: 'c20' })
const dummyMediaStream = createDummyMediaStream(this.ctx) const dummyMediaStream = createDummyMediaStream(this.ctx)
@ -185,6 +193,9 @@ export class VoiceChangerClient {
get stream(): MediaStream { get stream(): MediaStream {
return this.currentMediaStreamAudioDestinationNode.stream return this.currentMediaStreamAudioDestinationNode.stream
} }
get monitorStream(): MediaStream {
return this.currentMediaStreamAudioDestinationMonitorNode.stream
}
start = async () => { start = async () => {
await this.vcInNode.start() await this.vcInNode.start()
@ -239,6 +250,9 @@ export class VoiceChangerClient {
if (this.setting.outputGain != setting.outputGain) { if (this.setting.outputGain != setting.outputGain) {
this.setOutputGain(setting.outputGain) this.setOutputGain(setting.outputGain)
} }
if (this.setting.monitorGain != setting.monitorGain) {
this.setMonitorGain(setting.monitorGain)
}
this.setting = setting this.setting = setting
if (reconstructInputRequired) { if (reconstructInputRequired) {
@ -251,6 +265,9 @@ export class VoiceChangerClient {
if (!this.inputGainNode) { if (!this.inputGainNode) {
return return
} }
if(!val){
return
}
this.inputGainNode.gain.value = val this.inputGainNode.gain.value = val
} }
@ -258,9 +275,22 @@ export class VoiceChangerClient {
if (!this.outputGainNode) { if (!this.outputGainNode) {
return return
} }
if(!val){
return
}
this.outputGainNode.gain.value = val this.outputGainNode.gain.value = val
} }
setMonitorGain = (val: number) => {
if (!this.monitorGainNode) {
return
}
if(!val){
return
}
this.monitorGainNode.gain.value = val
}
///////////////////////////////////////////////////// /////////////////////////////////////////////////////
    // Component settings and operations
///////////////////////////////////////////////////// /////////////////////////////////////////////////////
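A self-contained sketch of the monitor-tap pattern introduced above: the converted signal is forked into a second gain node feeding its own MediaStreamAudioDestinationNode, so the monitor volume can be adjusted independently of the main output (the names below are illustrative, not the class's API):

```
// Sketch: fork one source into a main output path and an independently-gained monitor path.
const buildMonitorTap = (ctx: AudioContext, source: AudioNode, monitorGain = 1.0) => {
    const outputDestination = ctx.createMediaStreamDestination();
    const monitorDestination = ctx.createMediaStreamDestination();

    const monitorGainNode = ctx.createGain();
    monitorGainNode.gain.value = monitorGain;

    source.connect(outputDestination); // main output stream
    source.connect(monitorGainNode); // monitor branch
    monitorGainNode.connect(monitorDestination);

    // Each stream can be attached to its own <audio> element via srcObject.
    return { outputStream: outputDestination.stream, monitorStream: monitorDestination.stream, monitorGainNode };
};
```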

View File

@ -68,6 +68,7 @@ export const RVCModelType = {
export type RVCModelType = typeof RVCModelType[keyof typeof RVCModelType] export type RVCModelType = typeof RVCModelType[keyof typeof RVCModelType]
export const ServerSettingKey = { export const ServerSettingKey = {
"passThrough":"passThrough",
"srcId": "srcId", "srcId": "srcId",
"dstId": "dstId", "dstId": "dstId",
"gpu": "gpu", "gpu": "gpu",
@ -97,6 +98,7 @@ export const ServerSettingKey = {
"serverReadChunkSize": "serverReadChunkSize", "serverReadChunkSize": "serverReadChunkSize",
"serverInputAudioGain": "serverInputAudioGain", "serverInputAudioGain": "serverInputAudioGain",
"serverOutputAudioGain": "serverOutputAudioGain", "serverOutputAudioGain": "serverOutputAudioGain",
"serverMonitorAudioGain": "serverMonitorAudioGain",
"tran": "tran", "tran": "tran",
"noiseScale": "noiseScale", "noiseScale": "noiseScale",
@ -123,6 +125,7 @@ export const ServerSettingKey = {
"threshold": "threshold", "threshold": "threshold",
"speedUp": "speedUp", "speedUp": "speedUp",
"skipDiffusion": "skipDiffusion",
"inputSampleRate": "inputSampleRate", "inputSampleRate": "inputSampleRate",
"enableDirectML": "enableDirectML", "enableDirectML": "enableDirectML",
@ -131,6 +134,7 @@ export type ServerSettingKey = typeof ServerSettingKey[keyof typeof ServerSettin
export type VoiceChangerServerSetting = { export type VoiceChangerServerSetting = {
passThrough: boolean
srcId: number, srcId: number,
dstId: number, dstId: number,
gpu: number, gpu: number,
@ -157,6 +161,7 @@ export type VoiceChangerServerSetting = {
serverReadChunkSize: number serverReadChunkSize: number
serverInputAudioGain: number serverInputAudioGain: number
serverOutputAudioGain: number serverOutputAudioGain: number
serverMonitorAudioGain: number
tran: number // so-vits-svc tran: number // so-vits-svc
@ -184,13 +189,14 @@ export type VoiceChangerServerSetting = {
threshold: number// DDSP-SVC threshold: number// DDSP-SVC
speedUp: number // Diffusion-SVC speedUp: number // Diffusion-SVC
skipDiffusion: number // Diffusion-SVC 0:off, 1:on
inputSampleRate: InputSampleRate inputSampleRate: InputSampleRate
enableDirectML: number enableDirectML: number
} }
type ModelSlot = { type ModelSlot = {
slotIndex: number
voiceChangerType: VoiceChangerType voiceChangerType: VoiceChangerType
name: string, name: string,
description: string, description: string,
@ -303,7 +309,9 @@ export type ServerInfo = VoiceChangerServerSetting & {
memory: number, memory: number,
}[] }[]
maxInputLength: number // MMVCv15 maxInputLength: number // MMVCv15
voiceChangerParams: {
model_dir: string
}
} }
export type SampleModel = { export type SampleModel = {
@ -339,6 +347,7 @@ export type DiffusionSVCSampleModel =SampleModel & {
export const DefaultServerSetting: ServerInfo = { export const DefaultServerSetting: ServerInfo = {
// VC Common // VC Common
passThrough: false,
inputSampleRate: 48000, inputSampleRate: 48000,
crossFadeOffsetRate: 0.0, crossFadeOffsetRate: 0.0,
@ -361,6 +370,7 @@ export const DefaultServerSetting: ServerInfo = {
serverReadChunkSize: 256, serverReadChunkSize: 256,
serverInputAudioGain: 1.0, serverInputAudioGain: 1.0,
serverOutputAudioGain: 1.0, serverOutputAudioGain: 1.0,
serverMonitorAudioGain: 1.0,
// VC Specific // VC Specific
srcId: 0, srcId: 0,
@ -397,6 +407,7 @@ export const DefaultServerSetting: ServerInfo = {
threshold: -45, threshold: -45,
speedUp: 10, speedUp: 10,
skipDiffusion: 1,
enableDirectML: 0, enableDirectML: 0,
// //
@ -405,7 +416,10 @@ export const DefaultServerSetting: ServerInfo = {
serverAudioInputDevices: [], serverAudioInputDevices: [],
serverAudioOutputDevices: [], serverAudioOutputDevices: [],
maxInputLength: 128 * 2048 maxInputLength: 128 * 2048,
voiceChangerParams: {
model_dir: ""
}
} }
/////////////////////// ///////////////////////
@ -466,6 +480,7 @@ export type VoiceChangerClientSetting = {
inputGain: number inputGain: number
outputGain: number outputGain: number
monitorGain: number
} }
/////////////////////// ///////////////////////
@ -496,7 +511,8 @@ export const DefaultClientSettng: ClientSetting = {
noiseSuppression: false, noiseSuppression: false,
noiseSuppression2: false, noiseSuppression2: false,
inputGain: 1.0, inputGain: 1.0,
outputGain: 1.0 outputGain: 1.0,
monitorGain: 1.0
} }
} }
@ -533,7 +549,7 @@ export type OnnxExporterInfo = {
// Merge // Merge
export type MergeElement = { export type MergeElement = {
filename: string slotIndex: number
strength: number strength: number
} }
export type MergeModelRequest = { export type MergeModelRequest = {

View File

@ -47,6 +47,7 @@ export type ClientState = {
clearSetting: () => Promise<void> clearSetting: () => Promise<void>
    // AudioOutputElement settings
setAudioOutputElementId: (elemId: string) => void setAudioOutputElementId: (elemId: string) => void
setAudioMonitorElementId: (elemId: string) => void
ioErrorCount: number ioErrorCount: number
resetIoErrorCount: () => void resetIoErrorCount: () => void
@ -215,6 +216,18 @@ export const useClient = (props: UseClientProps): ClientState => {
} }
} }
const setAudioMonitorElementId = (elemId: string) => {
if (!voiceChangerClientRef.current) {
            console.warn("[voiceChangerClient] is not ready to set the audio monitor.")
return
}
const audio = document.getElementById(elemId) as HTMLAudioElement
if (audio.paused) {
audio.srcObject = voiceChangerClientRef.current.monitorStream
audio.play()
}
}
    // (2-2) Reload info
const getInfo = useMemo(() => { const getInfo = useMemo(() => {
return async () => { return async () => {
@ -286,6 +299,7 @@ export const useClient = (props: UseClientProps): ClientState => {
        // AudioOutputElement settings
setAudioOutputElementId, setAudioOutputElementId,
setAudioMonitorElementId,
ioErrorCount, ioErrorCount,
resetIoErrorCount resetIoErrorCount

View File

@ -18,6 +18,10 @@ npm run build:docker:vcclient
bash start_docker.sh bash start_docker.sh
``` ```
Access it from a browser (only Chrome is supported) and the GUI will be displayed.
## RUN with options
If you do not use a GPU:
``` ```

View File

@ -36,6 +36,10 @@ In root folder of repos.
bash start_docker.sh bash start_docker.sh
``` ```
Access it with a browser (currently only Chrome is supported) and the GUI will be displayed.
## RUN with options
Without GPU Without GPU
``` ```

View File

@ -9,6 +9,7 @@ import argparse
from Exceptions import WeightDownladException from Exceptions import WeightDownladException
from downloader.SampleDownloader import downloadInitialSamples from downloader.SampleDownloader import downloadInitialSamples
from downloader.WeightDownloader import downloadWeight from downloader.WeightDownloader import downloadWeight
from voice_changer.VoiceChangerParamsManager import VoiceChangerParamsManager
from voice_changer.utils.VoiceChangerParams import VoiceChangerParams from voice_changer.utils.VoiceChangerParams import VoiceChangerParams
@ -40,19 +41,19 @@ def setupArgParser():
parser.add_argument("--httpsCert", type=str, default="ssl.cert", help="path for the cert of https") parser.add_argument("--httpsCert", type=str, default="ssl.cert", help="path for the cert of https")
parser.add_argument("--httpsSelfSigned", type=strtobool, default=True, help="generate self-signed certificate") parser.add_argument("--httpsSelfSigned", type=strtobool, default=True, help="generate self-signed certificate")
parser.add_argument("--model_dir", type=str, help="path to model files") parser.add_argument("--model_dir", type=str, default="model_dir", help="path to model files")
parser.add_argument("--sample_mode", type=str, default="production", help="rvc_sample_mode") parser.add_argument("--sample_mode", type=str, default="production", help="rvc_sample_mode")
parser.add_argument("--content_vec_500", type=str, help="path to content_vec_500 model(pytorch)") parser.add_argument("--content_vec_500", type=str, default="pretrain/checkpoint_best_legacy_500.pt", help="path to content_vec_500 model(pytorch)")
parser.add_argument("--content_vec_500_onnx", type=str, help="path to content_vec_500 model(onnx)") parser.add_argument("--content_vec_500_onnx", type=str, default="pretrain/content_vec_500.onnx", help="path to content_vec_500 model(onnx)")
parser.add_argument("--content_vec_500_onnx_on", type=strtobool, default=False, help="use or not onnx for content_vec_500") parser.add_argument("--content_vec_500_onnx_on", type=strtobool, default=True, help="use or not onnx for content_vec_500")
parser.add_argument("--hubert_base", type=str, help="path to hubert_base model(pytorch)") parser.add_argument("--hubert_base", type=str, default="pretrain/hubert_base.pt", help="path to hubert_base model(pytorch)")
parser.add_argument("--hubert_base_jp", type=str, help="path to hubert_base_jp model(pytorch)") parser.add_argument("--hubert_base_jp", type=str, default="pretrain/rinna_hubert_base_jp.pt", help="path to hubert_base_jp model(pytorch)")
parser.add_argument("--hubert_soft", type=str, help="path to hubert_soft model(pytorch)") parser.add_argument("--hubert_soft", type=str, default="pretrain/hubert/hubert-soft-0d54a1f4.pt", help="path to hubert_soft model(pytorch)")
parser.add_argument("--nsf_hifigan", type=str, help="path to nsf_hifigan model(pytorch)") parser.add_argument("--nsf_hifigan", type=str, default="pretrain/nsf_hifigan/model", help="path to nsf_hifigan model(pytorch)")
parser.add_argument("--crepe_onnx_full", type=str, help="path to crepe_onnx_full") parser.add_argument("--crepe_onnx_full", type=str, default="pretrain/crepe_onnx_full.onnx", help="path to crepe_onnx_full")
parser.add_argument("--crepe_onnx_tiny", type=str, help="path to crepe_onnx_tiny") parser.add_argument("--crepe_onnx_tiny", type=str, default="pretrain/crepe_onnx_tiny.onnx", help="path to crepe_onnx_tiny")
parser.add_argument("--rmvpe", type=str, help="path to rmvpe") parser.add_argument("--rmvpe", type=str, default="pretrain/rmvpe.pt", help="path to rmvpe")
return parser return parser
@ -96,6 +97,8 @@ voiceChangerParams = VoiceChangerParams(
rmvpe=args.rmvpe, rmvpe=args.rmvpe,
sample_mode=args.sample_mode, sample_mode=args.sample_mode,
) )
vcparams = VoiceChangerParamsManager.get_instance()
vcparams.setParams(voiceChangerParams)
printMessage(f"Booting PHASE :{__name__}", level=2) printMessage(f"Booting PHASE :{__name__}", level=2)
@ -124,7 +127,8 @@ if __name__ == "MMVCServerSIO":
if __name__ == "__mp_main__": if __name__ == "__mp_main__":
printMessage("サーバプロセスを起動しています。", level=2) # printMessage("サーバプロセスを起動しています。", level=2)
printMessage("The server process is starting up.", level=2)
if __name__ == "__main__": if __name__ == "__main__":
mp.freeze_support() mp.freeze_support()
@ -132,12 +136,13 @@ if __name__ == "__main__":
logger.debug(args) logger.debug(args)
printMessage(f"PYTHON:{sys.version}", level=2) printMessage(f"PYTHON:{sys.version}", level=2)
printMessage("Voice Changerを起動しています。", level=2) # printMessage("Voice Changerを起動しています。", level=2)
printMessage("Activating the Voice Changer.", level=2)
    # Download (weights)
try: try:
downloadWeight(voiceChangerParams) downloadWeight(voiceChangerParams)
except WeightDownladException: except WeightDownladException:
printMessage("RVC用のモデルファイルのダウンロードに失敗しました。", level=2) # printMessage("RVC用のモデルファイルのダウンロードに失敗しました。", level=2)
printMessage("failed to download weight for rvc", level=2) printMessage("failed to download weight for rvc", level=2)
    # Download (samples)
@ -192,29 +197,31 @@ if __name__ == "__main__":
printMessage("-- ---- -- ", level=1) printMessage("-- ---- -- ", level=1)
    # Display the access URLs
printMessage("ブラウザで次のURLを開いてください.", level=2) printMessage("Please open the following URL in your browser.", level=2)
# printMessage("ブラウザで次のURLを開いてください.", level=2)
if args.https == 1: if args.https == 1:
printMessage("https://<IP>:<PORT>/", level=1) printMessage("https://<IP>:<PORT>/", level=1)
else: else:
printMessage("http://<IP>:<PORT>/", level=1) printMessage("http://<IP>:<PORT>/", level=1)
printMessage("多くの場合は次のいずれかのURLにアクセスすると起動します。", level=2) # printMessage("多くの場合は次のいずれかのURLにアクセスすると起動します。", level=2)
printMessage("In many cases, it will launch when you access any of the following URLs.", level=2)
    if "EX_PORT" in locals() and "EX_IP" in locals():  # launched via shell script (docker)
if args.https == 1: if args.https == 1:
printMessage(f"https://localhost:{EX_PORT}/", level=1) printMessage(f"https://127.0.0.1:{EX_PORT}/", level=1)
for ip in EX_IP.strip().split(" "): for ip in EX_IP.strip().split(" "):
printMessage(f"https://{ip}:{EX_PORT}/", level=1) printMessage(f"https://{ip}:{EX_PORT}/", level=1)
else: else:
printMessage(f"http://localhost:{EX_PORT}/", level=1) printMessage(f"http://127.0.0.1:{EX_PORT}/", level=1)
    else:  # launched directly with python
if args.https == 1: if args.https == 1:
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect((args.test_connect, 80)) s.connect((args.test_connect, 80))
hostname = s.getsockname()[0] hostname = s.getsockname()[0]
printMessage(f"https://localhost:{PORT}/", level=1) printMessage(f"https://127.0.0.1:{PORT}/", level=1)
printMessage(f"https://{hostname}:{PORT}/", level=1) printMessage(f"https://{hostname}:{PORT}/", level=1)
else: else:
printMessage(f"http://localhost:{PORT}/", level=1) printMessage(f"http://127.0.0.1:{PORT}/", level=1)
    # Start the server
if args.https: if args.https:
@ -237,15 +244,15 @@ if __name__ == "__main__":
p.start() p.start()
try: try:
if sys.platform.startswith("win"): if sys.platform.startswith("win"):
process = subprocess.Popen([NATIVE_CLIENT_FILE_WIN, "--disable-gpu", "-u", f"http://localhost:{PORT}/"]) process = subprocess.Popen([NATIVE_CLIENT_FILE_WIN, "--disable-gpu", "-u", f"http://127.0.0.1:{PORT}/"])
return_code = process.wait() return_code = process.wait()
logger.info("client closed.") logger.info("client closed.")
p.terminate() p.terminate()
elif sys.platform.startswith("darwin"): elif sys.platform.startswith("darwin"):
process = subprocess.Popen([NATIVE_CLIENT_FILE_MAC, "--disable-gpu", "-u", f"http://localhost:{PORT}/"]) process = subprocess.Popen([NATIVE_CLIENT_FILE_MAC, "--disable-gpu", "-u", f"http://127.0.0.1:{PORT}/"])
return_code = process.wait() return_code = process.wait()
logger.info("client closed.") logger.info("client closed.")
p.terminate() p.terminate()
except Exception as e: except Exception as e:
logger.error(f"[Voice Changer] Launch Exception, {e}") logger.error(f"[Voice Changer] Client Launch Exception, {e}")

View File

@ -169,4 +169,4 @@ def getSampleJsonAndModelIds(mode: RVCSampleMode):
RVC_MODEL_DIRNAME = "rvc" RVC_MODEL_DIRNAME = "rvc"
MAX_SLOT_NUM = 10 MAX_SLOT_NUM = 200

View File

@ -9,6 +9,7 @@ import json
@dataclass @dataclass
class ModelSlot: class ModelSlot:
slotIndex: int = -1
voiceChangerType: VoiceChangerType | None = None voiceChangerType: VoiceChangerType | None = None
name: str = "" name: str = ""
description: str = "" description: str = ""
@ -132,19 +133,26 @@ def loadSlotInfo(model_dir: str, slotIndex: int) -> ModelSlots:
if not os.path.exists(jsonFile): if not os.path.exists(jsonFile):
return ModelSlot() return ModelSlot()
jsonDict = json.load(open(os.path.join(slotDir, "params.json"))) jsonDict = json.load(open(os.path.join(slotDir, "params.json")))
slotInfo = ModelSlot(**{k: v for k, v in jsonDict.items() if k in ModelSlot.__annotations__}) slotInfoKey = list(ModelSlot.__annotations__.keys())
slotInfo = ModelSlot(**{k: v for k, v in jsonDict.items() if k in slotInfoKey})
if slotInfo.voiceChangerType == "RVC": if slotInfo.voiceChangerType == "RVC":
return RVCModelSlot(**jsonDict) slotInfoKey.extend(list(RVCModelSlot.__annotations__.keys()))
return RVCModelSlot(**{k: v for k, v in jsonDict.items() if k in slotInfoKey})
elif slotInfo.voiceChangerType == "MMVCv13": elif slotInfo.voiceChangerType == "MMVCv13":
return MMVCv13ModelSlot(**jsonDict) slotInfoKey.extend(list(MMVCv13ModelSlot.__annotations__.keys()))
return MMVCv13ModelSlot(**{k: v for k, v in jsonDict.items() if k in slotInfoKey})
elif slotInfo.voiceChangerType == "MMVCv15": elif slotInfo.voiceChangerType == "MMVCv15":
return MMVCv15ModelSlot(**jsonDict) slotInfoKey.extend(list(MMVCv15ModelSlot.__annotations__.keys()))
return MMVCv15ModelSlot(**{k: v for k, v in jsonDict.items() if k in slotInfoKey})
elif slotInfo.voiceChangerType == "so-vits-svc-40": elif slotInfo.voiceChangerType == "so-vits-svc-40":
return SoVitsSvc40ModelSlot(**jsonDict) slotInfoKey.extend(list(SoVitsSvc40ModelSlot.__annotations__.keys()))
return SoVitsSvc40ModelSlot(**{k: v for k, v in jsonDict.items() if k in slotInfoKey})
elif slotInfo.voiceChangerType == "DDSP-SVC": elif slotInfo.voiceChangerType == "DDSP-SVC":
return DDSPSVCModelSlot(**jsonDict) slotInfoKey.extend(list(DDSPSVCModelSlot.__annotations__.keys()))
return DDSPSVCModelSlot(**{k: v for k, v in jsonDict.items() if k in slotInfoKey})
elif slotInfo.voiceChangerType == "Diffusion-SVC": elif slotInfo.voiceChangerType == "Diffusion-SVC":
return DiffusionSVCModelSlot(**jsonDict) slotInfoKey.extend(list(DiffusionSVCModelSlot.__annotations__.keys()))
return DiffusionSVCModelSlot(**{k: v for k, v in jsonDict.items() if k in slotInfoKey})
else: else:
return ModelSlot() return ModelSlot()
@ -153,10 +161,13 @@ def loadAllSlotInfo(model_dir: str):
slotInfos: list[ModelSlots] = [] slotInfos: list[ModelSlots] = []
for slotIndex in range(MAX_SLOT_NUM): for slotIndex in range(MAX_SLOT_NUM):
slotInfo = loadSlotInfo(model_dir, slotIndex) slotInfo = loadSlotInfo(model_dir, slotIndex)
        slotInfo.slotIndex = slotIndex  # the slot index is injected dynamically
slotInfos.append(slotInfo) slotInfos.append(slotInfo)
return slotInfos return slotInfos
def saveSlotInfo(model_dir: str, slotIndex: int, slotInfo: ModelSlots): def saveSlotInfo(model_dir: str, slotIndex: int, slotInfo: ModelSlots):
slotDir = os.path.join(model_dir, str(slotIndex)) slotDir = os.path.join(model_dir, str(slotIndex))
json.dump(asdict(slotInfo), open(os.path.join(slotDir, "params.json"), "w")) slotInfoDict = asdict(slotInfo)
    slotInfo.slotIndex = -1  # the slot index is injected dynamically
json.dump(slotInfoDict, open(os.path.join(slotDir, "params.json"), "w"), indent=4)
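loadSlotInfo now keeps only the keys that the target dataclass declares, so stale fields in an old params.json no longer break deserialization; the same defensive idea sketched in TypeScript (types and names are illustrative):

```
// Sketch: hydrate a typed record from parsed JSON, dropping keys the type does not declare.
type RVCSlotLike = { slotIndex: number; name: string; modelFile: string };
const RVC_SLOT_KEYS = ["slotIndex", "name", "modelFile"] as const;

const hydrateSlot = (raw: Record<string, unknown>): RVCSlotLike => {
    const defaults: RVCSlotLike = { slotIndex: -1, name: "", modelFile: "" };
    const picked = Object.fromEntries(Object.entries(raw).filter(([k]) => (RVC_SLOT_KEYS as readonly string[]).includes(k)));
    return { ...defaults, ...picked } as RVCSlotLike;
};

// hydrateSlot({ name: "amitaro", obsoleteField: 1 }) -> { slotIndex: -1, name: "amitaro", modelFile: "" }
```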

View File

@ -1,5 +1,6 @@
import json import json
import os import os
import sys
from concurrent.futures import ThreadPoolExecutor from concurrent.futures import ThreadPoolExecutor
from typing import Any, Tuple from typing import Any, Tuple
@ -7,7 +8,6 @@ from const import RVCSampleMode, getSampleJsonAndModelIds
from data.ModelSample import ModelSamples, generateModelSample from data.ModelSample import ModelSamples, generateModelSample
from data.ModelSlot import DiffusionSVCModelSlot, ModelSlot, RVCModelSlot from data.ModelSlot import DiffusionSVCModelSlot, ModelSlot, RVCModelSlot
from mods.log_control import VoiceChangaerLogger from mods.log_control import VoiceChangaerLogger
from voice_changer.DiffusionSVC.DiffusionSVCModelSlotGenerator import DiffusionSVCModelSlotGenerator
from voice_changer.ModelSlotManager import ModelSlotManager from voice_changer.ModelSlotManager import ModelSlotManager
from voice_changer.RVC.RVCModelSlotGenerator import RVCModelSlotGenerator from voice_changer.RVC.RVCModelSlotGenerator import RVCModelSlotGenerator
from downloader.Downloader import download, download_no_tqdm from downloader.Downloader import download, download_no_tqdm
@ -109,7 +109,7 @@ def _downloadSamples(samples: list[ModelSamples], sampleModelIds: list[Tuple[str
"position": line_num, "position": line_num,
} }
) )
slotInfo.modelFile = modelFilePath slotInfo.modelFile = os.path.basename(sample.modelUrl)
line_num += 1 line_num += 1
if targetSampleParams["useIndex"] is True and hasattr(sample, "indexUrl") and sample.indexUrl != "": if targetSampleParams["useIndex"] is True and hasattr(sample, "indexUrl") and sample.indexUrl != "":
@ -124,7 +124,7 @@ def _downloadSamples(samples: list[ModelSamples], sampleModelIds: list[Tuple[str
"position": line_num, "position": line_num,
} }
) )
slotInfo.indexFile = indexPath slotInfo.indexFile = os.path.basename(sample.indexUrl)
line_num += 1 line_num += 1
if hasattr(sample, "icon") and sample.icon != "": if hasattr(sample, "icon") and sample.icon != "":
@ -139,7 +139,7 @@ def _downloadSamples(samples: list[ModelSamples], sampleModelIds: list[Tuple[str
"position": line_num, "position": line_num,
} }
) )
slotInfo.iconFile = iconPath slotInfo.iconFile = os.path.basename(sample.icon)
line_num += 1 line_num += 1
slotInfo.sampleId = sample.id slotInfo.sampleId = sample.id
@ -153,6 +153,8 @@ def _downloadSamples(samples: list[ModelSamples], sampleModelIds: list[Tuple[str
slotInfo.isONNX = slotInfo.modelFile.endswith(".onnx") slotInfo.isONNX = slotInfo.modelFile.endswith(".onnx")
modelSlotManager.save_model_slot(targetSlotIndex, slotInfo) modelSlotManager.save_model_slot(targetSlotIndex, slotInfo)
elif sample.voiceChangerType == "Diffusion-SVC": elif sample.voiceChangerType == "Diffusion-SVC":
if sys.platform.startswith("darwin") is True:
continue
slotInfo: DiffusionSVCModelSlot = DiffusionSVCModelSlot() slotInfo: DiffusionSVCModelSlot = DiffusionSVCModelSlot()
os.makedirs(slotDir, exist_ok=True) os.makedirs(slotDir, exist_ok=True)
@ -167,7 +169,7 @@ def _downloadSamples(samples: list[ModelSamples], sampleModelIds: list[Tuple[str
"position": line_num, "position": line_num,
} }
) )
slotInfo.modelFile = modelFilePath slotInfo.modelFile = os.path.basename(sample.modelUrl)
line_num += 1 line_num += 1
if hasattr(sample, "icon") and sample.icon != "": if hasattr(sample, "icon") and sample.icon != "":
@ -182,7 +184,7 @@ def _downloadSamples(samples: list[ModelSamples], sampleModelIds: list[Tuple[str
"position": line_num, "position": line_num,
} }
) )
slotInfo.iconFile = iconPath slotInfo.iconFile = os.path.basename(sample.icon)
line_num += 1 line_num += 1
slotInfo.sampleId = sample.id slotInfo.sampleId = sample.id
@ -212,16 +214,19 @@ def _downloadSamples(samples: list[ModelSamples], sampleModelIds: list[Tuple[str
logger.info("[Voice Changer] Generating metadata...") logger.info("[Voice Changer] Generating metadata...")
for targetSlotIndex in slotIndex: for targetSlotIndex in slotIndex:
slotInfo = modelSlotManager.get_slot_info(targetSlotIndex) slotInfo = modelSlotManager.get_slot_info(targetSlotIndex)
modelPath = os.path.join(model_dir, str(slotInfo.slotIndex), os.path.basename(slotInfo.modelFile))
if slotInfo.voiceChangerType == "RVC": if slotInfo.voiceChangerType == "RVC":
if slotInfo.isONNX: if slotInfo.isONNX:
slotInfo = RVCModelSlotGenerator._setInfoByONNX(slotInfo) slotInfo = RVCModelSlotGenerator._setInfoByONNX(modelPath, slotInfo)
else: else:
slotInfo = RVCModelSlotGenerator._setInfoByPytorch(slotInfo) slotInfo = RVCModelSlotGenerator._setInfoByPytorch(modelPath, slotInfo)
modelSlotManager.save_model_slot(targetSlotIndex, slotInfo) modelSlotManager.save_model_slot(targetSlotIndex, slotInfo)
elif slotInfo.voiceChangerType == "Diffusion-SVC": elif slotInfo.voiceChangerType == "Diffusion-SVC":
if slotInfo.isONNX: if sys.platform.startswith("darwin") is False:
pass from voice_changer.DiffusionSVC.DiffusionSVCModelSlotGenerator import DiffusionSVCModelSlotGenerator
else: if slotInfo.isONNX:
slotInfo = DiffusionSVCModelSlotGenerator._setInfoByPytorch(slotInfo) pass
modelSlotManager.save_model_slot(targetSlotIndex, slotInfo) else:
slotInfo = DiffusionSVCModelSlotGenerator._setInfoByPytorch(slotInfo)
modelSlotManager.save_model_slot(targetSlotIndex, slotInfo)

server/fillSlot.sh Normal file
View File

@ -0,0 +1,3 @@
for i in {1..199}; do
cp -r model_dir/0 model_dir/$i
done

View File

@ -113,6 +113,8 @@ class MMVC_Rest_Fileuploader:
return JSONResponse(content=json_compatible_item_data) return JSONResponse(content=json_compatible_item_data)
except Exception as e: except Exception as e:
print("[Voice Changer] post_merge_models ex:", e) print("[Voice Changer] post_merge_models ex:", e)
import traceback
traceback.print_exc()
def post_update_model_default(self): def post_update_model_default(self):
try: try:

View File

@ -6,6 +6,7 @@ import torch
from data.ModelSlot import DDSPSVCModelSlot from data.ModelSlot import DDSPSVCModelSlot
from voice_changer.DDSP_SVC.deviceManager.DeviceManager import DeviceManager from voice_changer.DDSP_SVC.deviceManager.DeviceManager import DeviceManager
from voice_changer.VoiceChangerParamsManager import VoiceChangerParamsManager
if sys.platform.startswith("darwin"): if sys.platform.startswith("darwin"):
baseDir = [x for x in sys.path if x.endswith("Contents/MacOS")] baseDir = [x for x in sys.path if x.endswith("Contents/MacOS")]
@ -69,12 +70,15 @@ class DDSP_SVC:
def initialize(self): def initialize(self):
self.device = self.deviceManager.getDevice(self.settings.gpu) self.device = self.deviceManager.getDevice(self.settings.gpu)
vcparams = VoiceChangerParamsManager.get_instance().params
modelPath = os.path.join(vcparams.model_dir, str(self.slotInfo.slotIndex), "model", self.slotInfo.modelFile)
diffPath = os.path.join(vcparams.model_dir, str(self.slotInfo.slotIndex), "diff", self.slotInfo.diffModelFile)
self.svc_model = SvcDDSP() self.svc_model = SvcDDSP()
self.svc_model.setVCParams(self.params) self.svc_model.setVCParams(self.params)
self.svc_model.update_model(self.slotInfo.modelFile, self.device) self.svc_model.update_model(modelPath, self.device)
self.diff_model = DiffGtMel(device=self.device) self.diff_model = DiffGtMel(device=self.device)
self.diff_model.flush_model(self.slotInfo.diffModelFile, ddsp_config=self.svc_model.args) self.diff_model.flush_model(diffPath, ddsp_config=self.svc_model.args)
def update_settings(self, key: str, val: int | float | str): def update_settings(self, key: str, val: int | float | str):
if key in self.settings.intData: if key in self.settings.intData:
@ -174,5 +178,9 @@ class DDSP_SVC:
if file_path.find("DDSP-SVC" + os.path.sep) >= 0: if file_path.find("DDSP-SVC" + os.path.sep) >= 0:
# print("remove", key, file_path) # print("remove", key, file_path)
sys.modules.pop(key) sys.modules.pop(key)
except: # type:ignore except: # type:ignore # noqa
pass pass
def get_model_current(self):
return [
]

View File

@ -14,7 +14,7 @@ from voice_changer.RVC.embedder.EmbedderManager import EmbedderManager
# from voice_changer.RVC.onnxExporter.export2onnx import export2onnx # from voice_changer.RVC.onnxExporter.export2onnx import export2onnx
from voice_changer.RVC.deviceManager.DeviceManager import DeviceManager from voice_changer.RVC.deviceManager.DeviceManager import DeviceManager
from Exceptions import DeviceCannotSupportHalfPrecisionException, PipelineCreateException from Exceptions import DeviceCannotSupportHalfPrecisionException, PipelineCreateException, PipelineNotInitializedException
logger = VoiceChangaerLogger.get_instance().getLogger() logger = VoiceChangaerLogger.get_instance().getLogger()
@ -28,7 +28,6 @@ class DiffusionSVC(VoiceChangerModel):
InferencerManager.initialize(params) InferencerManager.initialize(params)
self.settings = DiffusionSVCSettings() self.settings = DiffusionSVCSettings()
self.params = params self.params = params
self.pitchExtractor = PitchExtractorManager.getPitchExtractor(self.settings.f0Detector, self.settings.gpu)
self.pipeline: Pipeline | None = None self.pipeline: Pipeline | None = None
@ -84,6 +83,8 @@ class DiffusionSVC(VoiceChangerModel):
if self.pipeline is not None: if self.pipeline is not None:
pipelineInfo = self.pipeline.getPipelineInfo() pipelineInfo = self.pipeline.getPipelineInfo()
data["pipelineInfo"] = pipelineInfo data["pipelineInfo"] = pipelineInfo
else:
data["pipelineInfo"] = "None"
return data return data
def get_processing_sampling_rate(self): def get_processing_sampling_rate(self):
@ -137,6 +138,9 @@ class DiffusionSVC(VoiceChangerModel):
return (self.audio_buffer, self.pitchf_buffer, self.feature_buffer, convertSize, vol) return (self.audio_buffer, self.pitchf_buffer, self.feature_buffer, convertSize, vol)
def inference(self, receivedData: AudioInOut, crossfade_frame: int, sola_search_frame: int): def inference(self, receivedData: AudioInOut, crossfade_frame: int, sola_search_frame: int):
if self.pipeline is None:
logger.info("[Voice Changer] Pipeline is not initialized.")
raise PipelineNotInitializedException()
data = self.generate_input(receivedData, crossfade_frame, sola_search_frame) data = self.generate_input(receivedData, crossfade_frame, sola_search_frame)
audio: AudioInOut = data[0] audio: AudioInOut = data[0]
pitchf: PitchfInOut = data[1] pitchf: PitchfInOut = data[1]
@ -176,7 +180,8 @@ class DiffusionSVC(VoiceChangerModel):
silenceFrontSec, silenceFrontSec,
embOutputLayer, embOutputLayer,
useFinalProj, useFinalProj,
protect protect,
skip_diffusion=self.settings.skipDiffusion,
) )
result = audio_out.detach().cpu().numpy() result = audio_out.detach().cpu().numpy()
return result return result
@ -211,6 +216,10 @@ class DiffusionSVC(VoiceChangerModel):
"key": "defaultTune", "key": "defaultTune",
"val": self.settings.tran, "val": self.settings.tran,
}, },
{
"key": "dstId",
"val": self.settings.dstId,
},
{ {
"key": "defaultKstep", "key": "defaultKstep",
"val": self.settings.kStep, "val": self.settings.kStep,

View File

@ -1,14 +1,14 @@
import os import os
from const import EnumInferenceTypes
from dataclasses import asdict from dataclasses import asdict
import onnxruntime
import json
from data.ModelSlot import DiffusionSVCModelSlot, ModelSlot, RVCModelSlot from data.ModelSlot import DiffusionSVCModelSlot, ModelSlot, RVCModelSlot
from voice_changer.DiffusionSVC.inferencer.diffusion_svc_model.diffusion.unit2mel import load_model_vocoder_from_combo from voice_changer.DiffusionSVC.inferencer.diffusion_svc_model.diffusion.unit2mel import load_model_vocoder_from_combo
from voice_changer.VoiceChangerParamsManager import VoiceChangerParamsManager
from voice_changer.utils.LoadModelParams import LoadModelParams from voice_changer.utils.LoadModelParams import LoadModelParams
from voice_changer.utils.ModelSlotGenerator import ModelSlotGenerator from voice_changer.utils.ModelSlotGenerator import ModelSlotGenerator
def get_divisors(n): def get_divisors(n):
divisors = [] divisors = []
for i in range(1, int(n**0.5)+1): for i in range(1, int(n**0.5)+1):
@ -31,6 +31,7 @@ class DiffusionSVCModelSlotGenerator(ModelSlotGenerator):
slotInfo.name = os.path.splitext(os.path.basename(slotInfo.modelFile))[0] slotInfo.name = os.path.splitext(os.path.basename(slotInfo.modelFile))[0]
# slotInfo.iconFile = "/assets/icons/noimage.png" # slotInfo.iconFile = "/assets/icons/noimage.png"
slotInfo.embChannels = 768 slotInfo.embChannels = 768
slotInfo.slotIndex = props.slot
if slotInfo.isONNX: if slotInfo.isONNX:
slotInfo = cls._setInfoByONNX(slotInfo) slotInfo = cls._setInfoByONNX(slotInfo)
@ -40,7 +41,10 @@ class DiffusionSVCModelSlotGenerator(ModelSlotGenerator):
@classmethod @classmethod
def _setInfoByPytorch(cls, slot: DiffusionSVCModelSlot): def _setInfoByPytorch(cls, slot: DiffusionSVCModelSlot):
diff_model, diff_args, naive_model, naive_args = load_model_vocoder_from_combo(slot.modelFile, device="cpu") vcparams = VoiceChangerParamsManager.get_instance().params
modelPath = os.path.join(vcparams.model_dir, str(slot.slotIndex), os.path.basename(slot.modelFile))
diff_model, diff_args, naive_model, naive_args = load_model_vocoder_from_combo(modelPath, device="cpu")
slot.kStepMax = diff_args.model.k_step_max slot.kStepMax = diff_args.model.k_step_max
slot.nLayers = diff_args.model.n_layers slot.nLayers = diff_args.model.n_layers
slot.nnLayers = naive_args.model.n_layers slot.nnLayers = naive_args.model.n_layers
@ -52,53 +56,4 @@ class DiffusionSVCModelSlotGenerator(ModelSlotGenerator):
@classmethod @classmethod
def _setInfoByONNX(cls, slot: ModelSlot): def _setInfoByONNX(cls, slot: ModelSlot):
tmp_onnx_session = onnxruntime.InferenceSession(slot.modelFile, providers=["CPUExecutionProvider"])
modelmeta = tmp_onnx_session.get_modelmeta()
try:
slot = RVCModelSlot(**asdict(slot))
metadata = json.loads(modelmeta.custom_metadata_map["metadata"])
# slot.modelType = metadata["modelType"]
slot.embChannels = metadata["embChannels"]
slot.embOutputLayer = metadata["embOutputLayer"] if "embOutputLayer" in metadata else 9
slot.useFinalProj = metadata["useFinalProj"] if "useFinalProj" in metadata else True if slot.embChannels == 256 else False
if slot.embChannels == 256:
slot.useFinalProj = True
else:
slot.useFinalProj = False
# Show the ONNX model information
if slot.embChannels == 256 and slot.embOutputLayer == 9 and slot.useFinalProj is True:
print("[Voice Changer] ONNX Model: Official v1 like")
elif slot.embChannels == 768 and slot.embOutputLayer == 12 and slot.useFinalProj is False:
print("[Voice Changer] ONNX Model: Official v2 like")
else:
print(f"[Voice Changer] ONNX Model: ch:{slot.embChannels}, L:{slot.embOutputLayer}, FP:{slot.useFinalProj}")
if "embedder" not in metadata:
slot.embedder = "hubert_base"
else:
slot.embedder = metadata["embedder"]
slot.f0 = metadata["f0"]
slot.modelType = EnumInferenceTypes.onnxRVC.value if slot.f0 else EnumInferenceTypes.onnxRVCNono.value
slot.samplingRate = metadata["samplingRate"]
slot.deprecated = False
except Exception as e:
slot.modelType = EnumInferenceTypes.onnxRVC.value
slot.embChannels = 256
slot.embedder = "hubert_base"
slot.f0 = True
slot.samplingRate = 48000
slot.deprecated = True
print("[Voice Changer] setInfoByONNX", e)
print("[Voice Changer] ############## !!!! CAUTION !!!! ####################")
print("[Voice Changer] This onnxfie is depricated. Please regenerate onnxfile.")
print("[Voice Changer] ############## !!!! CAUTION !!!! ####################")
del tmp_onnx_session
return slot return slot

View File

@ -13,6 +13,7 @@ class DiffusionSVCSettings:
kStep: int = 20 kStep: int = 20
speedUp: int = 10 speedUp: int = 10
skipDiffusion: int = 1 # 0:off, 1:on
silenceFront: int = 1 # 0:off, 1:on silenceFront: int = 1 # 0:off, 1:on
modelSamplingRate: int = 44100 modelSamplingRate: int = 44100
@ -29,6 +30,7 @@ class DiffusionSVCSettings:
"kStep", "kStep",
"speedUp", "speedUp",
"silenceFront", "silenceFront",
"skipDiffusion",
] ]
floatData = ["silentThreshold"] floatData = ["silentThreshold"]
strData = ["f0Detector"] strData = ["f0Detector"]

View File

@ -112,25 +112,27 @@ class DiffusionSVCInferencer(Inferencer):
k_step: int, k_step: int,
infer_speedup: int, infer_speedup: int,
silence_front: float, silence_front: float,
skip_diffusion: bool = True,
) -> torch.Tensor: ) -> torch.Tensor:
with Timer("pre-process") as t: with Timer("pre-process", False) as t:
gt_spec = self.naive_model_call(feats, pitch, volume, spk_id=sid, spk_mix_dict=None, aug_shift=0, spk_emb=None) gt_spec = self.naive_model_call(feats, pitch, volume, spk_id=sid, spk_mix_dict=None, aug_shift=0, spk_emb=None)
# gt_spec = self.vocoder.extract(audio_t, 16000)
# gt_spec = torch.cat((gt_spec, gt_spec[:, -1:, :]), 1)
# print("[ ----Timer::1: ]", t.secs) # print("[ ----Timer::1: ]", t.secs)
with Timer("pre-process") as t: with Timer("pre-process", False) as t:
out_mel = self.__call__(feats, pitch, volume, spk_id=sid, spk_mix_dict=None, aug_shift=0, gt_spec=gt_spec, infer_speedup=infer_speedup, method='dpm-solver', k_step=k_step, use_tqdm=False, spk_emb=None) if skip_diffusion == 0:
out_mel = self.__call__(feats, pitch, volume, spk_id=sid, spk_mix_dict=None, aug_shift=0, gt_spec=gt_spec, infer_speedup=infer_speedup, method='dpm-solver', k_step=k_step, use_tqdm=False, spk_emb=None)
gt_spec = out_mel
# print("[ ----Timer::2: ]", t.secs) # print("[ ----Timer::2: ]", t.secs)
with Timer("pre-process") as t: # NOQA
with Timer("pre-process", False) as t: # NOQA
if self.vocoder_onnx is None: if self.vocoder_onnx is None:
start_frame = int(silence_front * self.vocoder.vocoder_sample_rate / self.vocoder.vocoder_hop_size) start_frame = int(silence_front * self.vocoder.vocoder_sample_rate / self.vocoder.vocoder_hop_size)
out_wav = self.mel2wav(out_mel, pitch, start_frame=start_frame) out_wav = self.mel2wav(gt_spec, pitch, start_frame=start_frame)
out_wav *= mask out_wav *= mask
else: else:
out_wav = self.vocoder_onnx.infer(out_mel, pitch, silence_front, mask) out_wav = self.vocoder_onnx.infer(gt_spec, pitch, silence_front, mask)
# print("[ ----Timer::3: ]", t.secs) # print("[ ----Timer::3: ]", t.secs)
return out_wav.squeeze() return out_wav.squeeze()
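With the diffusion pass skipped, the naive model's spectrogram (gt_spec) goes straight to the vocoder; only when skip_diffusion is 0 does the diffusion call run and overwrite gt_spec. A condensed sketch of that control flow; the function and argument names are illustrative:

def synthesize(naive_mel, run_diffusion, vocoder, skip_diffusion: int = 1):
    gt_spec = naive_mel            # coarse spectrogram from the naive model
    if skip_diffusion == 0:        # 0 re-enables the slower diffusion refinement
        gt_spec = run_diffusion(gt_spec)
    return vocoder(gt_spec)        # mel -> waveform

# identity stand-ins just to show the call shape
wav = synthesize([0.1, 0.2], run_diffusion=lambda m: m, vocoder=lambda m: m, skip_diffusion=1)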

View File

@ -21,11 +21,16 @@ class Inferencer(Protocol):
def infer( def infer(
self, self,
audio_t: torch.Tensor,
feats: torch.Tensor, feats: torch.Tensor,
pitch_length: torch.Tensor, pitch: torch.Tensor,
pitch: torch.Tensor | None, volume: torch.Tensor,
pitchf: torch.Tensor | None, mask: torch.Tensor,
sid: torch.Tensor, sid: torch.Tensor,
k_step: int,
infer_speedup: int,
silence_front: float,
skip_diffusion: bool = True,
) -> torch.Tensor: ) -> torch.Tensor:
... ...
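The Protocol's infer() signature is brought in line with what the Diffusion pipeline actually passes. Since Inferencer is a typing.Protocol, concrete inferencers only need to match this shape structurally; a sketch with a dummy implementation (InferencerSketch and DummyInferencer are illustrative names):

from typing import Protocol
import torch

class InferencerSketch(Protocol):
    def infer(self, feats: torch.Tensor, pitch: torch.Tensor, volume: torch.Tensor,
              mask: torch.Tensor, sid: torch.Tensor, k_step: int, infer_speedup: int,
              silence_front: float, skip_diffusion: bool = True) -> torch.Tensor:
        ...

class DummyInferencer:
    # No inheritance required: matching the method shape satisfies the Protocol.
    def infer(self, feats, pitch, volume, mask, sid, k_step, infer_speedup,
              silence_front, skip_diffusion=True):
        return feats

impl: InferencerSketch = DummyInferencer()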

View File

@ -81,23 +81,6 @@ class Pipeline(object):
@torch.no_grad() @torch.no_grad()
def extract_volume_and_mask(self, audio: torch.Tensor, threshold: float): def extract_volume_and_mask(self, audio: torch.Tensor, threshold: float):
'''
with Timer("[VolumeExt np]") as t:
for i in range(100):
volume = self.volumeExtractor.extract(audio)
time_np = t.secs
with Timer("[VolumeExt pt]") as t:
for i in range(100):
volume_t = self.volumeExtractor.extract_t(audio)
time_pt = t.secs
print("[Volume np]:", volume)
print("[Volume pt]:", volume_t)
print("[Perform]:", time_np, time_pt)
# -> [Perform]: 0.030178070068359375 0.005780220031738281 (RTX4090)
# -> [Perform]: 0.029046058654785156 0.0025115013122558594 (CPU i9 13900KF)
# ---> for work of this size, running it with Torch on the CPU is faster
'''
volume_t = self.volumeExtractor.extract_t(audio) volume_t = self.volumeExtractor.extract_t(audio)
mask = self.volumeExtractor.get_mask_from_volume_t(volume_t, self.inferencer_block_size, threshold=threshold) mask = self.volumeExtractor.get_mask_from_volume_t(volume_t, self.inferencer_block_size, threshold=threshold)
volume = volume_t.unsqueeze(-1).unsqueeze(0) volume = volume_t.unsqueeze(-1).unsqueeze(0)
@ -116,10 +99,11 @@ class Pipeline(object):
silence_front, silence_front,
embOutputLayer, embOutputLayer,
useFinalProj, useFinalProj,
protect=0.5 protect=0.5,
skip_diffusion=True,
): ):
# print("---------- pipe line --------------------") # print("---------- pipe line --------------------")
with Timer("pre-process") as t: with Timer("pre-process", False) as t:
audio_t = torch.from_numpy(audio).float().unsqueeze(0).to(self.device) audio_t = torch.from_numpy(audio).float().unsqueeze(0).to(self.device)
audio16k = self.resamplerIn(audio_t) audio16k = self.resamplerIn(audio_t)
volume, mask = self.extract_volume_and_mask(audio16k, threshold=-60.0) volume, mask = self.extract_volume_and_mask(audio16k, threshold=-60.0)
@ -127,7 +111,7 @@ class Pipeline(object):
n_frames = int(audio16k.size(-1) // self.hop_size + 1) n_frames = int(audio16k.size(-1) // self.hop_size + 1)
# print("[Timer::1: ]", t.secs) # print("[Timer::1: ]", t.secs)
with Timer("pre-process") as t: with Timer("pre-process", False) as t:
# pitch detection
try: try:
# pitch = self.pitchExtractor.extract( # pitch = self.pitchExtractor.extract(
@ -157,7 +141,7 @@ class Pipeline(object):
feats = feats.view(1, -1) feats = feats.view(1, -1)
# print("[Timer::2: ]", t.secs) # print("[Timer::2: ]", t.secs)
with Timer("pre-process") as t: with Timer("pre-process", False) as t:
# embedding # embedding
with autocast(enabled=self.isHalf): with autocast(enabled=self.isHalf):
@ -175,7 +159,7 @@ class Pipeline(object):
feats = F.interpolate(feats.permute(0, 2, 1), size=int(n_frames), mode='nearest').permute(0, 2, 1) feats = F.interpolate(feats.permute(0, 2, 1), size=int(n_frames), mode='nearest').permute(0, 2, 1)
# print("[Timer::3: ]", t.secs) # print("[Timer::3: ]", t.secs)
with Timer("pre-process") as t: with Timer("pre-process", False) as t:
# run inference
try: try:
with torch.no_grad(): with torch.no_grad():
@ -191,7 +175,8 @@ class Pipeline(object):
sid, sid,
k_step, k_step,
infer_speedup, infer_speedup,
silence_front=silence_front silence_front=silence_front,
skip_diffusion=skip_diffusion
).to(dtype=torch.float32), ).to(dtype=torch.float32),
-1.0, -1.0,
1.0, 1.0,
@ -206,7 +191,7 @@ class Pipeline(object):
raise e raise e
# print("[Timer::4: ]", t.secs) # print("[Timer::4: ]", t.secs)
with Timer("pre-process") as t: # NOQA with Timer("pre-process", False) as t: # NOQA
feats_buffer = feats.squeeze(0).detach().cpu() feats_buffer = feats.squeeze(0).detach().cpu()
if pitch is not None: if pitch is not None:
pitch_buffer = pitch.squeeze(0).detach().cpu() pitch_buffer = pitch.squeeze(0).detach().cpu()
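Throughout this file the Timer blocks gain a second argument, False, presumably to switch the per-stage timing off in the hot path; the repository's Timer class is not shown in this diff, so the stand-in below simply treats that second argument as an enable flag:

import time

class TimerSketch:
    """Minimal stand-in for the project's Timer: records elapsed seconds only when enabled."""
    def __init__(self, title: str, enabled: bool = True):
        self.title = title
        self.enabled = enabled
        self.secs = 0.0

    def __enter__(self):
        if self.enabled:
            self.start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        if self.enabled:
            self.secs = time.perf_counter() - self.start
        return False

with TimerSketch("pre-process", False) as t:  # disabled: negligible overhead, secs stays 0.0
    sum(range(1000))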

View File

@ -7,19 +7,23 @@ from voice_changer.DiffusionSVC.pitchExtractor.PitchExtractorManager import Pitc
from voice_changer.RVC.deviceManager.DeviceManager import DeviceManager from voice_changer.RVC.deviceManager.DeviceManager import DeviceManager
from voice_changer.RVC.embedder.EmbedderManager import EmbedderManager from voice_changer.RVC.embedder.EmbedderManager import EmbedderManager
import os
import torch import torch
from torchaudio.transforms import Resample from torchaudio.transforms import Resample
from voice_changer.VoiceChangerParamsManager import VoiceChangerParamsManager
def createPipeline(modelSlot: DiffusionSVCModelSlot, gpu: int, f0Detector: str, inputSampleRate: int, outputSampleRate: int): def createPipeline(modelSlot: DiffusionSVCModelSlot, gpu: int, f0Detector: str, inputSampleRate: int, outputSampleRate: int):
dev = DeviceManager.get_instance().getDevice(gpu) dev = DeviceManager.get_instance().getDevice(gpu)
vcparams = VoiceChangerParamsManager.get_instance().params
# half = DeviceManager.get_instance().halfPrecisionAvailable(gpu) # half = DeviceManager.get_instance().halfPrecisionAvailable(gpu)
half = False half = False
# create the Inferencer
try: try:
inferencer = InferencerManager.getInferencer(modelSlot.modelType, modelSlot.modelFile, gpu) modelPath = os.path.join(vcparams.model_dir, str(modelSlot.slotIndex), os.path.basename(modelSlot.modelFile))
inferencer = InferencerManager.getInferencer(modelSlot.modelType, modelPath, gpu)
except Exception as e: except Exception as e:
print("[Voice Changer] exception! loading inferencer", e) print("[Voice Changer] exception! loading inferencer", e)
traceback.print_exc() traceback.print_exc()

View File

@ -20,6 +20,13 @@ AudioDeviceKind: TypeAlias = Literal["input", "output"]
logger = VoiceChangaerLogger.get_instance().getLogger() logger = VoiceChangaerLogger.get_instance().getLogger()
# See https://github.com/w-okada/voice-changer/issues/620
LocalServerDeviceMode: TypeAlias = Literal[
"NoMonitorSeparate",
"WithMonitorStandard",
"WithMonitorAllSeparate",
]
@dataclass @dataclass
class ServerDeviceSettings: class ServerDeviceSettings:
@ -39,6 +46,7 @@ class ServerDeviceSettings:
serverReadChunkSize: int = 256 serverReadChunkSize: int = 256
serverInputAudioGain: float = 1.0 serverInputAudioGain: float = 1.0
serverOutputAudioGain: float = 1.0 serverOutputAudioGain: float = 1.0
serverMonitorAudioGain: float = 1.0
exclusiveMode: bool = False exclusiveMode: bool = False
@ -59,6 +67,7 @@ EditableServerDeviceSettings = {
"floatData": [ "floatData": [
"serverInputAudioGain", "serverInputAudioGain",
"serverOutputAudioGain", "serverOutputAudioGain",
"serverMonitorAudioGain",
], ],
"boolData": [ "boolData": [
"exclusiveMode" "exclusiveMode"
@ -95,6 +104,14 @@ class ServerDevice:
self.monQueue = Queue() self.monQueue = Queue()
self.performance = [] self.performance = []
# used to detect setting changes
self.currentServerInputDeviceId = -1
self.currentServerOutputDeviceId = -1
self.currentServerMonitorDeviceId = -1
self.currentModelSamplingRate = -1
self.currentInputChunkNum = -1
self.currentAudioSampleRate = -1
def getServerInputAudioDevice(self, index: int): def getServerInputAudioDevice(self, index: int):
audioinput, _audiooutput = list_audio_device() audioinput, _audiooutput = list_audio_device()
serverAudioDevice = [x for x in audioinput if x.index == index] serverAudioDevice = [x for x in audioinput if x.index == index]
@ -111,36 +128,51 @@ class ServerDevice:
else: else:
return None return None
def audio_callback(self, indata: np.ndarray, outdata: np.ndarray, frames, times, status): ###########################################
# Callback Section
###########################################
def _processData(self, indata: np.ndarray):
indata = indata * self.settings.serverInputAudioGain
unpackedData = librosa.to_mono(indata.T) * 32768.0
unpackedData = unpackedData.astype(np.int16)
out_wav, times = self.serverDeviceCallbacks.on_request(unpackedData)
return out_wav, times
def _processDataWithTime(self, indata: np.ndarray):
with Timer("all_inference_time") as t:
out_wav, times = self._processData(indata)
all_inference_time = t.secs
self.performance = [all_inference_time] + times
self.serverDeviceCallbacks.emitTo(self.performance)
self.performance = [round(x * 1000) for x in self.performance]
return out_wav
def audio_callback_outQueue(self, indata: np.ndarray, outdata: np.ndarray, frames, times, status):
try: try:
indata = indata * self.settings.serverInputAudioGain out_wav = self._processDataWithTime(indata)
with Timer("all_inference_time") as t:
unpackedData = librosa.to_mono(indata.T) * 32768.0 self.outQueue.put(out_wav)
unpackedData = unpackedData.astype(np.int16) outputChannels = outdata.shape[1] # Monitorへのアウトプット
out_wav, times = self.serverDeviceCallbacks.on_request(unpackedData) outdata[:] = np.repeat(out_wav, outputChannels).reshape(-1, outputChannels) / 32768.0
outputChannels = outdata.shape[1] outdata[:] = outdata * self.settings.serverMonitorAudioGain
outdata[:] = np.repeat(out_wav, outputChannels).reshape(-1, outputChannels) / 32768.0
outdata[:] = outdata * self.settings.serverOutputAudioGain
all_inference_time = t.secs
self.performance = [all_inference_time] + times
self.serverDeviceCallbacks.emitTo(self.performance)
self.performance = [round(x * 1000) for x in self.performance]
except Exception as e: except Exception as e:
print("[Voice Changer] ex:", e) print("[Voice Changer] ex:", e)
def audioInput_callback(self, indata: np.ndarray, frames, times, status): def audioInput_callback_outQueue(self, indata: np.ndarray, frames, times, status):
try: try:
indata = indata * self.settings.serverInputAudioGain out_wav = self._processDataWithTime(indata)
with Timer("all_inference_time") as t: self.outQueue.put(out_wav)
unpackedData = librosa.to_mono(indata.T) * 32768.0 except Exception as e:
unpackedData = unpackedData.astype(np.int16) print("[Voice Changer][ServerDevice][audioInput_callback] ex:", e)
out_wav, times = self.serverDeviceCallbacks.on_request(unpackedData) # import traceback
self.outQueue.put(out_wav) # traceback.print_exc()
self.monQueue.put(out_wav)
all_inference_time = t.secs def audioInput_callback_outQueue_monQueue(self, indata: np.ndarray, frames, times, status):
self.performance = [all_inference_time] + times try:
self.serverDeviceCallbacks.emitTo(self.performance) out_wav = self._processDataWithTime(indata)
self.performance = [round(x * 1000) for x in self.performance] self.outQueue.put(out_wav)
self.monQueue.put(out_wav)
except Exception as e: except Exception as e:
print("[Voice Changer][ServerDevice][audioInput_callback] ex:", e) print("[Voice Changer][ServerDevice][audioInput_callback] ex:", e)
# import traceback # import traceback
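The refactor above pulls the work the old callbacks duplicated into _processData/_processDataWithTime: apply the input gain, downmix to mono int16, run the converter, and time the whole request. A condensed sketch of the two data-shaping steps, assuming on_request returns (wave, times) like serverDeviceCallbacks.on_request; the function names are illustrative:

import numpy as np
import librosa

def process_block(indata: np.ndarray, input_gain: float, on_request):
    # Mirror _processData: gain, downmix to mono, scale to int16 range, convert.
    indata = indata * input_gain
    mono = librosa.to_mono(indata.T) * 32768.0
    out_wav, times = on_request(mono.astype(np.int16))
    return out_wav, times

def to_outdata(out_wav: np.ndarray, channels: int, gain: float) -> np.ndarray:
    # Mirror the write path: duplicate mono across the device channels, back to float, apply gain.
    return np.repeat(out_wav, channels).reshape(-1, channels) / 32768.0 * gain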
@ -166,15 +198,144 @@ class ServerDevice:
self.monQueue.get() self.monQueue.get()
outputChannels = outdata.shape[1] outputChannels = outdata.shape[1]
outdata[:] = np.repeat(mon_wav, outputChannels).reshape(-1, outputChannels) / 32768.0 outdata[:] = np.repeat(mon_wav, outputChannels).reshape(-1, outputChannels) / 32768.0
outdata[:] = outdata * self.settings.serverOutputAudioGain # reuses the Output gain
outdata[:] = outdata * self.settings.serverMonitorAudioGain
# When monitor mode is enabled, no resampling is needed because the monitor device's sampling rate takes precedence
except Exception as e: except Exception as e:
print("[Voice Changer][ServerDevice][audioMonitor_callback] ex:", e) print("[Voice Changer][ServerDevice][audioMonitor_callback] ex:", e)
# import traceback # import traceback
# traceback.print_exc() # traceback.print_exc()
###########################################
# Main Loop Section
###########################################
def checkSettingChanged(self):
if self.settings.serverAudioStated != 1:
print(f"serverAudioStarted Changed: {self.settings.serverAudioStated}")
return True
elif self.currentServerInputDeviceId != self.settings.serverInputDeviceId:
print(f"serverInputDeviceId Changed: {self.currentServerInputDeviceId} -> {self.settings.serverInputDeviceId}")
return True
elif self.currentServerOutputDeviceId != self.settings.serverOutputDeviceId:
print(f"serverOutputDeviceId Changed: {self.currentServerOutputDeviceId} -> {self.settings.serverOutputDeviceId}")
return True
elif self.currentServerMonitorDeviceId != self.settings.serverMonitorDeviceId:
print(f"serverMonitorDeviceId Changed: {self.currentServerMonitorDeviceId} -> {self.settings.serverMonitorDeviceId}")
return True
elif self.currentModelSamplingRate != self.serverDeviceCallbacks.get_processing_sampling_rate():
print(f"currentModelSamplingRate Changed: {self.currentModelSamplingRate} -> {self.serverDeviceCallbacks.get_processing_sampling_rate()}")
return True
elif self.currentInputChunkNum != self.settings.serverReadChunkSize:
print(f"currentInputChunkNum Changed: {self.currentInputChunkNum} -> {self.settings.serverReadChunkSize}")
return True
elif self.currentAudioSampleRate != self.settings.serverAudioSampleRate:
print(f"currentAudioSampleRate Changed: {self.currentAudioSampleRate} -> {self.settings.serverAudioSampleRate}")
return True
else:
return False
def runNoMonitorSeparate(self, block_frame: int, inputMaxChannel: int, outputMaxChannel: int, inputExtraSetting, outputExtraSetting):
with sd.InputStream(
callback=self.audioInput_callback_outQueue,
dtype="float32",
device=self.settings.serverInputDeviceId,
blocksize=block_frame,
samplerate=self.settings.serverInputAudioSampleRate,
channels=inputMaxChannel,
extra_settings=inputExtraSetting
):
with sd.OutputStream(
callback=self.audioOutput_callback,
dtype="float32",
device=self.settings.serverOutputDeviceId,
blocksize=block_frame,
samplerate=self.settings.serverOutputAudioSampleRate,
channels=outputMaxChannel,
extra_settings=outputExtraSetting
):
while True:
changed = self.checkSettingChanged()
if changed:
break
time.sleep(2)
print(f"[Voice Changer] server audio performance {self.performance}")
print(f" status: started:{self.settings.serverAudioStated}, model_sr:{self.currentModelSamplingRate}, chunk:{self.currentInputChunkNum}")
print(f" input : id:{self.settings.serverInputDeviceId}, sr:{self.settings.serverInputAudioSampleRate}, ch:{inputMaxChannel}")
print(f" output : id:{self.settings.serverOutputDeviceId}, sr:{self.settings.serverOutputAudioSampleRate}, ch:{outputMaxChannel}")
# print(f" monitor: id:{self.settings.serverMonitorDeviceId}, sr:{self.settings.serverMonitorAudioSampleRate}, ch:{self.serverMonitorAudioDevice.maxOutputChannels}")
def runWithMonitorStandard(self, block_frame: int, inputMaxChannel: int, outputMaxChannel: int, monitorMaxChannel: int, inputExtraSetting, outputExtraSetting, monitorExtraSetting):
with sd.Stream(
callback=self.audio_callback_outQueue,
dtype="float32",
device=(self.settings.serverInputDeviceId, self.settings.serverMonitorDeviceId),
blocksize=block_frame,
samplerate=self.settings.serverInputAudioSampleRate,
channels=(inputMaxChannel, monitorMaxChannel),
extra_settings=[inputExtraSetting, monitorExtraSetting]
):
with sd.OutputStream(
callback=self.audioOutput_callback,
dtype="float32",
device=self.settings.serverOutputDeviceId,
blocksize=block_frame,
samplerate=self.settings.serverOutputAudioSampleRate,
channels=outputMaxChannel,
extra_settings=outputExtraSetting
):
while True:
changed = self.checkSettingChanged()
if changed:
break
time.sleep(2)
print(f"[Voice Changer] server audio performance {self.performance}")
print(f" status: started:{self.settings.serverAudioStated}, model_sr:{self.currentModelSamplingRate}, chunk:{self.currentInputChunkNum}")
print(f" input : id:{self.settings.serverInputDeviceId}, sr:{self.settings.serverInputAudioSampleRate}, ch:{inputMaxChannel}")
print(f" output : id:{self.settings.serverOutputDeviceId}, sr:{self.settings.serverOutputAudioSampleRate}, ch:{outputMaxChannel}")
print(f" monitor: id:{self.settings.serverMonitorDeviceId}, sr:{self.settings.serverMonitorAudioSampleRate}, ch:{monitorMaxChannel}")
def runWithMonitorAllSeparate(self, block_frame: int, inputMaxChannel: int, outputMaxChannel: int, monitorMaxChannel: int, inputExtraSetting, outputExtraSetting, monitorExtraSetting):
with sd.InputStream(
callback=self.audioInput_callback_outQueue_monQueue,
dtype="float32",
device=self.settings.serverInputDeviceId,
blocksize=block_frame,
samplerate=self.settings.serverInputAudioSampleRate,
channels=inputMaxChannel,
extra_settings=inputExtraSetting
):
with sd.OutputStream(
callback=self.audioOutput_callback,
dtype="float32",
device=self.settings.serverOutputDeviceId,
blocksize=block_frame,
samplerate=self.settings.serverOutputAudioSampleRate,
channels=outputMaxChannel,
extra_settings=outputExtraSetting
):
with sd.OutputStream(
callback=self.audioMonitor_callback,
dtype="float32",
device=self.settings.serverMonitorDeviceId,
blocksize=block_frame,
samplerate=self.settings.serverMonitorAudioSampleRate,
channels=monitorMaxChannel,
extra_settings=monitorExtraSetting
):
while True:
changed = self.checkSettingChanged()
if changed:
break
time.sleep(2)
print(f"[Voice Changer] server audio performance {self.performance}")
print(f" status: started:{self.settings.serverAudioStated}, model_sr:{self.currentModelSamplingRate}, chunk:{self.currentInputChunkNum}")
print(f" input : id:{self.settings.serverInputDeviceId}, sr:{self.settings.serverInputAudioSampleRate}, ch:{inputMaxChannel}")
print(f" output : id:{self.settings.serverOutputDeviceId}, sr:{self.settings.serverOutputAudioSampleRate}, ch:{outputMaxChannel}")
print(f" monitor: id:{self.settings.serverMonitorDeviceId}, sr:{self.settings.serverMonitorAudioSampleRate}, ch:{monitorMaxChannel}")
###########################################
# Start Section
###########################################
def start(self): def start(self):
currentModelSamplingRate = -1 self.currentModelSamplingRate = -1
while True: while True:
if self.settings.serverAudioStated == 0 or self.settings.serverInputDeviceId == -1: if self.settings.serverAudioStated == 0 or self.settings.serverInputDeviceId == -1:
time.sleep(2) time.sleep(2)
@ -183,9 +344,9 @@ class ServerDevice:
sd._initialize() sd._initialize()
# Current Device ID
currentServerInputDeviceId = self.settings.serverInputDeviceId self.currentServerInputDeviceId = self.settings.serverInputDeviceId
currentServerOutputDeviceId = self.settings.serverOutputDeviceId self.currentServerOutputDeviceId = self.settings.serverOutputDeviceId
currentServerMonitorDeviceId = self.settings.serverMonitorDeviceId self.currentServerMonitorDeviceId = self.settings.serverMonitorDeviceId
# identify the devices
serverInputAudioDevice = self.getServerInputAudioDevice(self.settings.serverInputDeviceId) serverInputAudioDevice = self.getServerInputAudioDevice(self.settings.serverInputDeviceId)
@ -220,17 +381,17 @@ class ServerDevice:
# Sampling rate
# Unify everything to one sampling rate (samples can run short during conversion; once a padding scheme is worked out, each device could get its own rate)
currentAudioSampleRate = self.settings.serverAudioSampleRate self.currentAudioSampleRate = self.settings.serverAudioSampleRate
try: try:
currentModelSamplingRate = self.serverDeviceCallbacks.get_processing_sampling_rate() self.currentModelSamplingRate = self.serverDeviceCallbacks.get_processing_sampling_rate()
except Exception as e: except Exception as e:
print("[Voice Changer] ex: get_processing_sampling_rate", e) print("[Voice Changer] ex: get_processing_sampling_rate", e)
time.sleep(2) time.sleep(2)
continue continue
self.settings.serverInputAudioSampleRate = currentAudioSampleRate self.settings.serverInputAudioSampleRate = self.currentAudioSampleRate
self.settings.serverOutputAudioSampleRate = currentAudioSampleRate self.settings.serverOutputAudioSampleRate = self.currentAudioSampleRate
self.settings.serverMonitorAudioSampleRate = currentAudioSampleRate self.settings.serverMonitorAudioSampleRate = self.currentAudioSampleRate
# Sample Rate Check # Sample Rate Check
inputAudioSampleRateAvailable = checkSamplingRate(self.settings.serverInputDeviceId, self.settings.serverInputAudioSampleRate, "input") inputAudioSampleRateAvailable = checkSamplingRate(self.settings.serverInputDeviceId, self.settings.serverInputAudioSampleRate, "input")
@ -238,7 +399,7 @@ class ServerDevice:
monitorAudioSampleRateAvailable = checkSamplingRate(self.settings.serverMonitorDeviceId, self.settings.serverMonitorAudioSampleRate, "output") if serverMonitorAudioDevice else True monitorAudioSampleRateAvailable = checkSamplingRate(self.settings.serverMonitorDeviceId, self.settings.serverMonitorAudioSampleRate, "output") if serverMonitorAudioDevice else True
print("Sample Rate:") print("Sample Rate:")
print(f" [Model]: {currentModelSamplingRate}") print(f" [Model]: {self.currentModelSamplingRate}")
print(f" [Input]: {self.settings.serverInputAudioSampleRate} -> {inputAudioSampleRateAvailable}") print(f" [Input]: {self.settings.serverInputAudioSampleRate} -> {inputAudioSampleRateAvailable}")
print(f" [Output]: {self.settings.serverOutputAudioSampleRate} -> {outputAudioSampleRateAvailable}") print(f" [Output]: {self.settings.serverOutputAudioSampleRate} -> {outputAudioSampleRateAvailable}")
if serverMonitorAudioDevice is not None: if serverMonitorAudioDevice is not None:
@ -274,153 +435,51 @@ class ServerDevice:
self.serverDeviceCallbacks.setOutputSamplingRate(self.settings.serverOutputAudioSampleRate) self.serverDeviceCallbacks.setOutputSamplingRate(self.settings.serverOutputAudioSampleRate)
# compute the block size
currentInputChunkNum = self.settings.serverReadChunkSize self.currentInputChunkNum = self.settings.serverReadChunkSize
# block_frame = currentInputChunkNum * 128 # block_frame = currentInputChunkNum * 128
block_frame = int(currentInputChunkNum * 128 * (self.settings.serverInputAudioSampleRate / 48000)) block_frame = int(self.currentInputChunkNum * 128 * (self.settings.serverInputAudioSampleRate / 48000))
sd.default.blocksize = block_frame sd.default.blocksize = block_frame
# main loop # main loop
try: try:
with sd.InputStream(
callback=self.audioInput_callback,
dtype="float32",
device=self.settings.serverInputDeviceId,
blocksize=block_frame,
samplerate=self.settings.serverInputAudioSampleRate,
channels=serverInputAudioDevice.maxInputChannels,
extra_settings=inputExtraSetting
):
with sd.OutputStream(
callback=self.audioOutput_callback,
dtype="float32",
device=self.settings.serverOutputDeviceId,
blocksize=block_frame,
samplerate=self.settings.serverOutputAudioSampleRate,
channels=serverOutputAudioDevice.maxOutputChannels,
extra_settings=outputExtraSetting
):
if self.settings.serverMonitorDeviceId != -1:
with sd.OutputStream(
callback=self.audioMonitor_callback,
dtype="float32",
device=self.settings.serverMonitorDeviceId,
blocksize=block_frame,
samplerate=self.settings.serverMonitorAudioSampleRate,
channels=serverMonitorAudioDevice.maxOutputChannels,
extra_settings=monitorExtraSetting
):
while (
self.settings.serverAudioStated == 1 and
currentServerInputDeviceId == self.settings.serverInputDeviceId and
currentServerOutputDeviceId == self.settings.serverOutputDeviceId and
currentServerMonitorDeviceId == self.settings.serverMonitorDeviceId and
currentModelSamplingRate == self.serverDeviceCallbacks.get_processing_sampling_rate() and
currentInputChunkNum == self.settings.serverReadChunkSize and
currentAudioSampleRate == self.settings.serverAudioSampleRate
):
time.sleep(2)
print(f"[Voice Changer] server audio performance {self.performance}")
print(f" status: started:{self.settings.serverAudioStated}, model_sr:{currentModelSamplingRate}, chunk:{currentInputChunkNum}")
print(f" input : id:{self.settings.serverInputDeviceId}, sr:{self.settings.serverInputAudioSampleRate}, ch:{serverInputAudioDevice.maxInputChannels}")
print(f" output : id:{self.settings.serverOutputDeviceId}, sr:{self.settings.serverOutputAudioSampleRate}, ch:{serverOutputAudioDevice.maxOutputChannels}")
print(f" monitor: id:{self.settings.serverMonitorDeviceId}, sr:{self.settings.serverMonitorAudioSampleRate}, ch:{serverMonitorAudioDevice.maxOutputChannels}")
else:
while (
self.settings.serverAudioStated == 1 and
currentServerInputDeviceId == self.settings.serverInputDeviceId and
currentServerOutputDeviceId == self.settings.serverOutputDeviceId and
currentServerMonitorDeviceId == self.settings.serverMonitorDeviceId and
currentModelSamplingRate == self.serverDeviceCallbacks.get_processing_sampling_rate() and
currentInputChunkNum == self.settings.serverReadChunkSize and
currentAudioSampleRate == self.settings.serverAudioSampleRate
):
time.sleep(2)
print(f"[Voice Changer] server audio performance {self.performance}")
print(f" status: started:{self.settings.serverAudioStated}, model_sr:{currentModelSamplingRate}, chunk:{currentInputChunkNum}]")
print(f" input : id:{self.settings.serverInputDeviceId}, sr:{self.settings.serverInputAudioSampleRate}, ch:{serverInputAudioDevice.maxInputChannels}")
print(f" output : id:{self.settings.serverOutputDeviceId}, sr:{self.settings.serverOutputAudioSampleRate}, ch:{serverOutputAudioDevice.maxOutputChannels}")
# See https://github.com/w-okada/voice-changer/issues/620
def judgeServerDeviceMode() -> LocalServerDeviceMode:
if self.settings.serverMonitorDeviceId == -1:
return "NoMonitorSeparate"
else:
if serverInputAudioDevice.hostAPI == serverOutputAudioDevice.hostAPI and serverInputAudioDevice.hostAPI == serverMonitorAudioDevice.hostAPI: # all three the same
return "WithMonitorStandard"
elif serverInputAudioDevice.hostAPI != serverOutputAudioDevice.hostAPI and serverInputAudioDevice.hostAPI != serverMonitorAudioDevice.hostAPI and serverOutputAudioDevice.hostAPI != serverMonitorAudioDevice.hostAPI: # all three different
return "WithMonitorAllSeparate"
elif serverInputAudioDevice.hostAPI == serverOutputAudioDevice.hostAPI: # only in/out are the same
return "WithMonitorAllSeparate"
elif serverInputAudioDevice.hostAPI == serverMonitorAudioDevice.hostAPI: # only in/mon are the same
return "WithMonitorStandard"
elif serverOutputAudioDevice.hostAPI == serverMonitorAudioDevice.hostAPI: # only out/mon are the same
return "WithMonitorAllSeparate"
else:
raise RuntimeError(f"Cannot JudgeServerMode, in:{serverInputAudioDevice.hostAPI}, mon:{serverMonitorAudioDevice.hostAPI}, out:{serverOutputAudioDevice.hostAPI}")
serverDeviceMode = judgeServerDeviceMode()
if serverDeviceMode == "NoMonitorSeparate":
self.runNoMonitorSeparate(block_frame, serverInputAudioDevice.maxInputChannels, serverOutputAudioDevice.maxOutputChannels, inputExtraSetting, outputExtraSetting)
elif serverDeviceMode == "WithMonitorStandard":
self.runWithMonitorStandard(block_frame, serverInputAudioDevice.maxInputChannels, serverOutputAudioDevice.maxOutputChannels, serverMonitorAudioDevice.maxOutputChannels, inputExtraSetting, outputExtraSetting, monitorExtraSetting)
elif serverDeviceMode == "WithMonitorAllSeparate":
self.runWithMonitorAllSeparate(block_frame, serverInputAudioDevice.maxInputChannels, serverOutputAudioDevice.maxOutputChannels, serverMonitorAudioDevice.maxOutputChannels, inputExtraSetting, outputExtraSetting, monitorExtraSetting)
else:
raise RuntimeError(f"Unknown ServerDeviceMode: {serverDeviceMode}")
except Exception as e: except Exception as e:
print("[Voice Changer] processing, ex:", e) print("[Voice Changer] processing, ex:", e)
import traceback
traceback.print_exc()
time.sleep(2) time.sleep(2)
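judgeServerDeviceMode() (added above, see issue #620) picks one of the three stream layouts from whether the input, output, and monitor devices share a host API. Its branching condenses to the mapping sketched below, which uses plain strings instead of the sounddevice device objects; judge_mode and its arguments are illustrative:

from typing import Literal, Optional

LocalServerDeviceMode = Literal["NoMonitorSeparate", "WithMonitorStandard", "WithMonitorAllSeparate"]

def judge_mode(in_api: str, out_api: str, mon_api: Optional[str]) -> LocalServerDeviceMode:
    if mon_api is None:               # no monitor device selected
        return "NoMonitorSeparate"
    if in_api == out_api == mon_api:  # everything on one host API: a duplex input/monitor stream works
        return "WithMonitorStandard"
    if in_api == mon_api:             # only input and monitor share an API
        return "WithMonitorStandard"
    return "WithMonitorAllSeparate"   # every other combination runs three separate streams

assert judge_mode("WASAPI", "WASAPI", None) == "NoMonitorSeparate"
assert judge_mode("WASAPI", "MME", "ASIO") == "WithMonitorAllSeparate"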
def start2(self): ###########################################
# currentInputDeviceId = -1 # Info Section
# currentOutputDeviceId = -1 ###########################################
# currentInputChunkNum = -1
currentModelSamplingRate = -1
while True:
if self.settings.serverAudioStated == 0 or self.settings.serverInputDeviceId == -1:
time.sleep(2)
else:
sd._terminate()
sd._initialize()
sd.default.device[0] = self.settings.serverInputDeviceId
sd.default.device[1] = self.settings.serverOutputDeviceId
serverInputAudioDevice = self.getServerInputAudioDevice(sd.default.device[0])
serverOutputAudioDevice = self.getServerOutputAudioDevice(sd.default.device[1])
print("Devices:", serverInputAudioDevice, serverOutputAudioDevice)
if serverInputAudioDevice is None or serverOutputAudioDevice is None:
time.sleep(2)
print("serverInputAudioDevice or serverOutputAudioDevice is None")
continue
sd.default.channels[0] = serverInputAudioDevice.maxInputChannels
sd.default.channels[1] = serverOutputAudioDevice.maxOutputChannels
currentInputChunkNum = self.settings.serverReadChunkSize
block_frame = currentInputChunkNum * 128
# sample rate precheck(alsa cannot use 40000?)
try:
currentModelSamplingRate = self.serverDeviceCallbacks.get_processing_sampling_rate()
except Exception as e:
print("[Voice Changer] ex: get_processing_sampling_rate", e)
continue
try:
with sd.Stream(
callback=self.audio_callback,
blocksize=block_frame,
# samplerate=currentModelSamplingRate,
dtype="float32",
# dtype="int16",
# channels=[currentInputChannelNum, currentOutputChannelNum],
):
pass
self.settings.serverInputAudioSampleRate = currentModelSamplingRate
self.serverDeviceCallbacks.setInputSamplingRate(currentModelSamplingRate)
self.serverDeviceCallbacks.setOutputSamplingRate(currentModelSamplingRate)
print(f"[Voice Changer] sample rate {self.settings.serverInputAudioSampleRate}")
except Exception as e:
print("[Voice Changer] ex: fallback to device default samplerate", e)
print("[Voice Changer] device default samplerate", serverInputAudioDevice.default_samplerate)
self.settings.serverInputAudioSampleRate = round(serverInputAudioDevice.default_samplerate)
self.serverDeviceCallbacks.setInputSamplingRate(round(serverInputAudioDevice.default_samplerate))
self.serverDeviceCallbacks.setOutputSamplingRate(round(serverInputAudioDevice.default_samplerate))
sd.default.samplerate = self.settings.serverInputAudioSampleRate
sd.default.blocksize = block_frame
# main loop
try:
with sd.Stream(
callback=self.audio_callback,
# blocksize=block_frame,
# samplerate=vc.settings.serverInputAudioSampleRate,
dtype="float32",
# dtype="int16",
# channels=[currentInputChannelNum, currentOutputChannelNum],
):
while self.settings.serverAudioStated == 1 and sd.default.device[0] == self.settings.serverInputDeviceId and sd.default.device[1] == self.settings.serverOutputDeviceId and currentModelSamplingRate == self.serverDeviceCallbacks.get_processing_sampling_rate() and currentInputChunkNum == self.settings.serverReadChunkSize:
time.sleep(2)
print("[Voice Changer] server audio", self.performance)
print(f"[Voice Changer] started:{self.settings.serverAudioStated}, input:{sd.default.device[0]}, output:{sd.default.device[1]}, mic_sr:{self.settings.serverInputAudioSampleRate}, model_sr:{currentModelSamplingRate}, chunk:{currentInputChunkNum}, ch:[{sd.default.channels}]")
except Exception as e:
print("[Voice Changer] ex:", e)
time.sleep(2)
def get_info(self): def get_info(self):
data = asdict(self.settings) data = asdict(self.settings)
try: try:

View File

@ -1,6 +1,7 @@
import sys import sys
import os import os
from data.ModelSlot import MMVCv13ModelSlot from data.ModelSlot import MMVCv13ModelSlot
from voice_changer.VoiceChangerParamsManager import VoiceChangerParamsManager
from voice_changer.utils.VoiceChangerModel import AudioInOut from voice_changer.utils.VoiceChangerModel import AudioInOut
@ -63,19 +64,22 @@ class MMVCv13:
def initialize(self): def initialize(self):
print("[Voice Changer] [MMVCv13] Initializing... ") print("[Voice Changer] [MMVCv13] Initializing... ")
vcparams = VoiceChangerParamsManager.get_instance().params
configPath = os.path.join(vcparams.model_dir, str(self.slotInfo.slotIndex), self.slotInfo.configFile)
modelPath = os.path.join(vcparams.model_dir, str(self.slotInfo.slotIndex), self.slotInfo.modelFile)
self.hps = get_hparams_from_file(self.slotInfo.configFile) self.hps = get_hparams_from_file(configPath)
if self.slotInfo.isONNX: if self.slotInfo.isONNX:
providers, options = self.getOnnxExecutionProvider() providers, options = self.getOnnxExecutionProvider()
self.onnx_session = onnxruntime.InferenceSession( self.onnx_session = onnxruntime.InferenceSession(
self.slotInfo.modelFile, modelPath,
providers=providers, providers=providers,
provider_options=options, provider_options=options,
) )
else: else:
self.net_g = SynthesizerTrn(len(symbols), self.hps.data.filter_length // 2 + 1, self.hps.train.segment_size // self.hps.data.hop_length, n_speakers=self.hps.data.n_speakers, **self.hps.model) self.net_g = SynthesizerTrn(len(symbols), self.hps.data.filter_length // 2 + 1, self.hps.train.segment_size // self.hps.data.hop_length, n_speakers=self.hps.data.n_speakers, **self.hps.model)
self.net_g.eval() self.net_g.eval()
load_checkpoint(self.slotInfo.modelFile, self.net_g, None) load_checkpoint(modelPath, self.net_g, None)
# other settings
self.settings.srcId = self.slotInfo.srcId self.settings.srcId = self.slotInfo.srcId
@ -105,8 +109,10 @@ class MMVCv13:
if key == "gpu" and self.slotInfo.isONNX: if key == "gpu" and self.slotInfo.isONNX:
providers, options = self.getOnnxExecutionProvider() providers, options = self.getOnnxExecutionProvider()
vcparams = VoiceChangerParamsManager.get_instance().params
modelPath = os.path.join(vcparams.model_dir, str(self.slotInfo.slotIndex), self.slotInfo.modelFile)
self.onnx_session = onnxruntime.InferenceSession( self.onnx_session = onnxruntime.InferenceSession(
self.slotInfo.modelFile, modelPath,
providers=providers, providers=providers,
provider_options=options, provider_options=options,
) )
@ -249,3 +255,15 @@ class MMVCv13:
sys.modules.pop(key) sys.modules.pop(key)
except: # NOQA except: # NOQA
pass pass
def get_model_current(self):
return [
{
"key": "srcId",
"val": self.settings.srcId,
},
{
"key": "dstId",
"val": self.settings.dstId,
}
]
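When the gpu setting changes on an ONNX slot, the session is simply rebuilt against the per-slot model path with new providers. getOnnxExecutionProvider() itself is not part of this diff, so the sketch below assumes a typical CUDA-or-CPU selection:

import onnxruntime

def make_session(model_path: str, gpu: int) -> onnxruntime.InferenceSession:
    # Assumed provider choice: CUDA with a device id when gpu >= 0, otherwise CPU only.
    if gpu < 0:
        providers = ["CPUExecutionProvider"]
        options = [{}]
    else:
        providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
        options = [{"device_id": gpu}, {}]
    return onnxruntime.InferenceSession(model_path, providers=providers, provider_options=options)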

View File

@ -1,6 +1,7 @@
import sys import sys
import os import os
from data.ModelSlot import MMVCv15ModelSlot from data.ModelSlot import MMVCv15ModelSlot
from voice_changer.VoiceChangerParamsManager import VoiceChangerParamsManager
from voice_changer.utils.VoiceChangerModel import AudioInOut from voice_changer.utils.VoiceChangerModel import AudioInOut
if sys.platform.startswith("darwin"): if sys.platform.startswith("darwin"):
@ -70,7 +71,11 @@ class MMVCv15:
def initialize(self): def initialize(self):
print("[Voice Changer] [MMVCv15] Initializing... ") print("[Voice Changer] [MMVCv15] Initializing... ")
self.hps = get_hparams_from_file(self.slotInfo.configFile) vcparams = VoiceChangerParamsManager.get_instance().params
configPath = os.path.join(vcparams.model_dir, str(self.slotInfo.slotIndex), self.slotInfo.configFile)
modelPath = os.path.join(vcparams.model_dir, str(self.slotInfo.slotIndex), self.slotInfo.modelFile)
self.hps = get_hparams_from_file(configPath)
self.net_g = SynthesizerTrn( self.net_g = SynthesizerTrn(
spec_channels=self.hps.data.filter_length // 2 + 1, spec_channels=self.hps.data.filter_length // 2 + 1,
@ -96,7 +101,7 @@ class MMVCv15:
self.onxx_input_length = 8192 self.onxx_input_length = 8192
providers, options = self.getOnnxExecutionProvider() providers, options = self.getOnnxExecutionProvider()
self.onnx_session = onnxruntime.InferenceSession( self.onnx_session = onnxruntime.InferenceSession(
self.slotInfo.modelFile, modelPath,
providers=providers, providers=providers,
provider_options=options, provider_options=options,
) )
@ -108,7 +113,7 @@ class MMVCv15:
self.settings.maxInputLength = self.onxx_input_length - (0.012 * self.hps.data.sampling_rate) - 1024 # onnxの場合は入力長固(crossfadeの1024は仮) # NOQA self.settings.maxInputLength = self.onxx_input_length - (0.012 * self.hps.data.sampling_rate) - 1024 # onnxの場合は入力長固(crossfadeの1024は仮) # NOQA
else: else:
self.net_g.eval() self.net_g.eval()
load_checkpoint(self.slotInfo.modelFile, self.net_g, None) load_checkpoint(modelPath, self.net_g, None)
# other settings
self.settings.srcId = self.slotInfo.srcId self.settings.srcId = self.slotInfo.srcId
@ -139,8 +144,10 @@ class MMVCv15:
setattr(self.settings, key, val) setattr(self.settings, key, val)
if key == "gpu" and self.slotInfo.isONNX: if key == "gpu" and self.slotInfo.isONNX:
providers, options = self.getOnnxExecutionProvider() providers, options = self.getOnnxExecutionProvider()
vcparams = VoiceChangerParamsManager.get_instance().params
modelPath = os.path.join(vcparams.model_dir, str(self.slotInfo.slotIndex), self.slotInfo.modelFile)
self.onnx_session = onnxruntime.InferenceSession( self.onnx_session = onnxruntime.InferenceSession(
self.slotInfo.modelFile, modelPath,
providers=providers, providers=providers,
provider_options=options, provider_options=options,
) )
@ -208,7 +215,8 @@ class MMVCv15:
solaSearchFrame: int = 0, solaSearchFrame: int = 0,
): ):
# update maxInputLength (doing it here is inefficient, but fine for now)
self.settings.maxInputLength = self.onxx_input_length - crossfadeSize - solaSearchFrame # for ONNX the input length is fixed (the 1024 for crossfade is provisional) # NOQA
if self.slotInfo.isONNX:
self.settings.maxInputLength = self.onxx_input_length - crossfadeSize - solaSearchFrame # for ONNX the input length is fixed (the 1024 for crossfade is provisional) # NOQA value returned by get_info; not used by the processing inside this function.
newData = newData.astype(np.float32) / self.hps.data.max_wav_value newData = newData.astype(np.float32) / self.hps.data.max_wav_value
@ -310,3 +318,19 @@ class MMVCv15:
sys.modules.pop(key) sys.modules.pop(key)
except: # NOQA except: # NOQA
pass pass
def get_model_current(self):
return [
{
"key": "srcId",
"val": self.settings.srcId,
},
{
"key": "dstId",
"val": self.settings.dstId,
},
{
"key": "f0Factor",
"val": self.settings.f0Factor,
}
]
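For ONNX models the input length is fixed, so maxInputLength is whatever remains of the 8192-sample window once the crossfade and SOLA-search regions are reserved; the change above also stops overwriting the value for PyTorch models. A worked example, where the 24 kHz rate and the window sizes are assumptions for illustration:

onnx_input_length = 8192                 # fixed ONNX input window (self.onxx_input_length)
crossfade_size = 1024
sola_search_frame = int(0.012 * 24000)   # 12 ms of SOLA search at an assumed 24 kHz model rate

max_input_length = onnx_input_length - crossfade_size - sola_search_frame
print(max_input_length)                  # 6880 samples of new audio fit into one fixed-size ONNX call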

View File

@ -1,6 +1,7 @@
import os import os
from data.ModelSlot import MMVCv15ModelSlot from data.ModelSlot import MMVCv15ModelSlot
from voice_changer.VoiceChangerParamsManager import VoiceChangerParamsManager
from voice_changer.utils.LoadModelParams import LoadModelParams from voice_changer.utils.LoadModelParams import LoadModelParams
from voice_changer.utils.ModelSlotGenerator import ModelSlotGenerator from voice_changer.utils.ModelSlotGenerator import ModelSlotGenerator
@ -15,7 +16,9 @@ class MMVCv15ModelSlotGenerator(ModelSlotGenerator):
elif file.kind == "mmvcv15Config": elif file.kind == "mmvcv15Config":
slotInfo.configFile = file.name slotInfo.configFile = file.name
elif file.kind == "mmvcv15Correspondence": elif file.kind == "mmvcv15Correspondence":
with open(file.name, "r") as f: vcparams = VoiceChangerParamsManager.get_instance().params
filePath = os.path.join(vcparams.model_dir, str(props.slot), file.name)
with open(filePath, "r") as f:
slotInfo.speakers = {} slotInfo.speakers = {}
while True: while True:
line = f.readline() line = f.readline()

View File

@ -4,17 +4,17 @@ import torch
from const import UPLOAD_DIR from const import UPLOAD_DIR
from voice_changer.RVC.modelMerger.MergeModel import merge_model from voice_changer.RVC.modelMerger.MergeModel import merge_model
from voice_changer.utils.ModelMerger import ModelMerger, ModelMergerRequest from voice_changer.utils.ModelMerger import ModelMerger, ModelMergerRequest
from voice_changer.utils.VoiceChangerParams import VoiceChangerParams
class RVCModelMerger(ModelMerger): class RVCModelMerger(ModelMerger):
@classmethod @classmethod
def merge_models(cls, request: ModelMergerRequest, storeSlot: int): def merge_models(cls, params: VoiceChangerParams, request: ModelMergerRequest, storeSlot: int):
print("[Voice Changer] MergeRequest:", request) merged = merge_model(params, request)
merged = merge_model(request)
# Store it in the upload folder for now. (historical reasons)
# A subsequent loadmodel call moves it into the persistent model folder.
storeDir = os.path.join(UPLOAD_DIR, f"{storeSlot}") storeDir = os.path.join(UPLOAD_DIR)
print("[Voice Changer] store merged model to:", storeDir) print("[Voice Changer] store merged model to:", storeDir)
os.makedirs(storeDir, exist_ok=True) os.makedirs(storeDir, exist_ok=True)
storeFile = os.path.join(storeDir, "merged.pth") storeFile = os.path.join(storeDir, "merged.pth")
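merge_models() now receives the VoiceChangerParams and drops the merged checkpoint into the flat upload folder; the comment notes that a later loadmodel call is what moves it into a persistent slot. A minimal sketch of the store step; store_merged is an illustrative name:

import os
import torch

def store_merged(merged_state: dict, upload_dir: str) -> str:
    # Keep the merged weights in the upload folder; persisting to a slot happens in loadModel later.
    os.makedirs(upload_dir, exist_ok=True)
    store_file = os.path.join(upload_dir, "merged.pth")
    torch.save(merged_state, store_file)
    return store_file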

View File

@ -5,7 +5,8 @@ import torch
import onnxruntime import onnxruntime
import json import json
from data.ModelSlot import ModelSlot, RVCModelSlot from data.ModelSlot import RVCModelSlot
from voice_changer.VoiceChangerParamsManager import VoiceChangerParamsManager
from voice_changer.utils.LoadModelParams import LoadModelParams from voice_changer.utils.LoadModelParams import LoadModelParams
from voice_changer.utils.ModelSlotGenerator import ModelSlotGenerator from voice_changer.utils.ModelSlotGenerator import ModelSlotGenerator
@ -13,6 +14,7 @@ from voice_changer.utils.ModelSlotGenerator import ModelSlotGenerator
class RVCModelSlotGenerator(ModelSlotGenerator): class RVCModelSlotGenerator(ModelSlotGenerator):
@classmethod @classmethod
def loadModel(cls, props: LoadModelParams): def loadModel(cls, props: LoadModelParams):
vcparams = VoiceChangerParamsManager.get_instance().params
slotInfo: RVCModelSlot = RVCModelSlot() slotInfo: RVCModelSlot = RVCModelSlot()
for file in props.files: for file in props.files:
if file.kind == "rvcModel": if file.kind == "rvcModel":
@ -24,17 +26,20 @@ class RVCModelSlotGenerator(ModelSlotGenerator):
slotInfo.defaultProtect = 0.5 slotInfo.defaultProtect = 0.5
slotInfo.isONNX = slotInfo.modelFile.endswith(".onnx") slotInfo.isONNX = slotInfo.modelFile.endswith(".onnx")
slotInfo.name = os.path.splitext(os.path.basename(slotInfo.modelFile))[0] slotInfo.name = os.path.splitext(os.path.basename(slotInfo.modelFile))[0]
print("RVC:: slotInfo.modelFile", slotInfo.modelFile)
# slotInfo.iconFile = "/assets/icons/noimage.png" # slotInfo.iconFile = "/assets/icons/noimage.png"
modelPath = os.path.join(vcparams.model_dir, str(props.slot), os.path.basename(slotInfo.modelFile))
if slotInfo.isONNX: if slotInfo.isONNX:
slotInfo = cls._setInfoByONNX(slotInfo) slotInfo = cls._setInfoByONNX(modelPath, slotInfo)
else: else:
slotInfo = cls._setInfoByPytorch(slotInfo) slotInfo = cls._setInfoByPytorch(modelPath, slotInfo)
return slotInfo return slotInfo
@classmethod @classmethod
def _setInfoByPytorch(cls, slot: ModelSlot): def _setInfoByPytorch(cls, modelPath: str, slot: RVCModelSlot):
cpt = torch.load(slot.modelFile, map_location="cpu") cpt = torch.load(modelPath, map_location="cpu")
config_len = len(cpt["config"]) config_len = len(cpt["config"])
version = cpt.get("version", "v1") version = cpt.get("version", "v1")
@ -113,8 +118,8 @@ class RVCModelSlotGenerator(ModelSlotGenerator):
return slot return slot
@classmethod @classmethod
def _setInfoByONNX(cls, slot: ModelSlot): def _setInfoByONNX(cls, modelPath: str, slot: RVCModelSlot):
tmp_onnx_session = onnxruntime.InferenceSession(slot.modelFile, providers=["CPUExecutionProvider"]) tmp_onnx_session = onnxruntime.InferenceSession(modelPath, providers=["CPUExecutionProvider"])
modelmeta = tmp_onnx_session.get_modelmeta() modelmeta = tmp_onnx_session.get_modelmeta()
try: try:
slot = RVCModelSlot(**asdict(slot)) slot = RVCModelSlot(**asdict(slot))
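_setInfoByONNX now receives the resolved modelPath and reads the slot properties from the JSON blob that the RVC ONNX export stores in the model's custom metadata. A sketch of that read, with an illustrative example path:

import json
import onnxruntime

def read_onnx_metadata(model_path: str) -> dict:
    session = onnxruntime.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    meta = session.get_modelmeta().custom_metadata_map  # plain dict of str -> str
    return json.loads(meta["metadata"]) if "metadata" in meta else {}

# e.g. read_onnx_metadata("model_dir/0/model.onnx").get("embChannels")  # -> 256 or 768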

View File

@ -0,0 +1,289 @@
'''
VoiceChangerV2向け
'''
from dataclasses import asdict
import numpy as np
import torch
from data.ModelSlot import RVCModelSlot
from mods.log_control import VoiceChangaerLogger
from voice_changer.RVC.RVCSettings import RVCSettings
from voice_changer.RVC.embedder.EmbedderManager import EmbedderManager
from voice_changer.utils.VoiceChangerModel import AudioInOut, PitchfInOut, FeatureInOut, VoiceChangerModel
from voice_changer.utils.VoiceChangerParams import VoiceChangerParams
from voice_changer.RVC.onnxExporter.export2onnx import export2onnx
from voice_changer.RVC.pitchExtractor.PitchExtractorManager import PitchExtractorManager
from voice_changer.RVC.pipeline.PipelineGenerator import createPipeline
from voice_changer.RVC.deviceManager.DeviceManager import DeviceManager
from voice_changer.RVC.pipeline.Pipeline import Pipeline
from Exceptions import DeviceCannotSupportHalfPrecisionException, PipelineCreateException, PipelineNotInitializedException
import resampy
from typing import cast
logger = VoiceChangaerLogger.get_instance().getLogger()
class RVCr2(VoiceChangerModel):
def __init__(self, params: VoiceChangerParams, slotInfo: RVCModelSlot):
logger.info("[Voice Changer] [RVCr2] Creating instance ")
self.deviceManager = DeviceManager.get_instance()
EmbedderManager.initialize(params)
PitchExtractorManager.initialize(params)
self.settings = RVCSettings()
self.params = params
# self.pitchExtractor = PitchExtractorManager.getPitchExtractor(self.settings.f0Detector, self.settings.gpu)
self.pipeline: Pipeline | None = None
self.audio_buffer: AudioInOut | None = None
self.pitchf_buffer: PitchfInOut | None = None
self.feature_buffer: FeatureInOut | None = None
self.prevVol = 0.0
self.slotInfo = slotInfo
# self.initialize()
def initialize(self):
logger.info("[Voice Changer][RVCr2] Initializing... ")
# create the pipeline
try:
self.pipeline = createPipeline(self.params, self.slotInfo, self.settings.gpu, self.settings.f0Detector)
except PipelineCreateException as e: # NOQA
logger.error("[Voice Changer] pipeline create failed. check your model is valid.")
return
# other settings
self.settings.tran = self.slotInfo.defaultTune
self.settings.indexRatio = self.slotInfo.defaultIndexRatio
self.settings.protect = self.slotInfo.defaultProtect
logger.info("[Voice Changer] [RVC] Initializing... done")
def setSamplingRate(self, inputSampleRate, outputSampleRate):
self.inputSampleRate = inputSampleRate
self.outputSampleRate = outputSampleRate
self.initialize()
def update_settings(self, key: str, val: int | float | str):
logger.info(f"[Voice Changer][RVC]: update_settings {key}:{val}")
if key in self.settings.intData:
setattr(self.settings, key, int(val))
if key == "gpu":
self.deviceManager.setForceTensor(False)
self.initialize()
elif key in self.settings.floatData:
setattr(self.settings, key, float(val))
elif key in self.settings.strData:
setattr(self.settings, key, str(val))
if key == "f0Detector" and self.pipeline is not None:
pitchExtractor = PitchExtractorManager.getPitchExtractor(self.settings.f0Detector, self.settings.gpu)
self.pipeline.setPitchExtractor(pitchExtractor)
else:
return False
return True
def get_info(self):
data = asdict(self.settings)
if self.pipeline is not None:
pipelineInfo = self.pipeline.getPipelineInfo()
data["pipelineInfo"] = pipelineInfo
else:
data["pipelineInfo"] = "None"
return data
def get_processing_sampling_rate(self):
return self.slotInfo.samplingRate
def generate_input(
self,
newData: AudioInOut,
crossfadeSize: int,
solaSearchFrame: int,
extra_frame: int
):
# audio arrives here at 16 kHz.
inputSize = newData.shape[0]
newData = newData.astype(np.float32) / 32768.0
newFeatureLength = inputSize // 160 # hopsize:=160
if self.audio_buffer is not None:
# concatenate with previously buffered data
self.audio_buffer = np.concatenate([self.audio_buffer, newData], 0)
if self.slotInfo.f0:
self.pitchf_buffer = np.concatenate([self.pitchf_buffer, np.zeros(newFeatureLength)], 0)
self.feature_buffer = np.concatenate([self.feature_buffer, np.zeros([newFeatureLength, self.slotInfo.embChannels])], 0)
else:
self.audio_buffer = newData
if self.slotInfo.f0:
self.pitchf_buffer = np.zeros(newFeatureLength)
self.feature_buffer = np.zeros([newFeatureLength, self.slotInfo.embChannels])
convertSize = inputSize + crossfadeSize + solaSearchFrame + extra_frame
if convertSize % 160 != 0: # pad so nothing gets truncated by the model's output hop size.
convertSize = convertSize + (160 - (convertSize % 160))
outSize = int(((convertSize - extra_frame) / 16000) * self.slotInfo.samplingRate)
# pad with zeros while the buffer is still filling up
if self.audio_buffer.shape[0] < convertSize:
self.audio_buffer = np.concatenate([np.zeros([convertSize]), self.audio_buffer])
if self.slotInfo.f0:
self.pitchf_buffer = np.concatenate([np.zeros([convertSize // 160]), self.pitchf_buffer])
self.feature_buffer = np.concatenate([np.zeros([convertSize // 160, self.slotInfo.embChannels]), self.feature_buffer])
# trim off the unneeded part
convertOffset = -1 * convertSize
featureOffset = convertOffset // 160
self.audio_buffer = self.audio_buffer[convertOffset:] # keep only the part to be converted
if self.slotInfo.f0:
self.pitchf_buffer = self.pitchf_buffer[featureOffset:]
self.feature_buffer = self.feature_buffer[featureOffset:]
# crop just the output portion and check its volume. (TODO: make the muting gradual)
cropOffset = -1 * (inputSize + crossfadeSize)
cropEnd = -1 * (crossfadeSize)
crop = self.audio_buffer[cropOffset:cropEnd]
vol = np.sqrt(np.square(crop).mean())
vol = max(vol, self.prevVol * 0.0)
self.prevVol = vol
return (self.audio_buffer, self.pitchf_buffer, self.feature_buffer, convertSize, vol, outSize)
def inference(self, receivedData: AudioInOut, crossfade_frame: int, sola_search_frame: int):
if self.pipeline is None:
logger.info("[Voice Changer] Pipeline is not initialized.")
raise PipelineNotInitializedException()
# Processing (pitch, embed, (infer)) runs at 16 kHz
receivedData = cast(
AudioInOut,
resampy.resample(
receivedData,
self.inputSampleRate,
16000,
),
)
crossfade_frame = int((crossfade_frame / self.inputSampleRate) * 16000)
sola_search_frame = int((sola_search_frame / self.inputSampleRate) * 16000)
extra_frame = int((self.settings.extraConvertSize / self.inputSampleRate) * 16000)
# Generate the input data
data = self.generate_input(receivedData, crossfade_frame, sola_search_frame, extra_frame)
audio = data[0]
pitchf = data[1]
feature = data[2]
convertSize = data[3]
vol = data[4]
outSize = data[5]
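# Below the silence threshold, skip conversion and return silence.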
if vol < self.settings.silentThreshold:
return np.zeros(convertSize).astype(np.int16) * np.sqrt(vol)
device = self.pipeline.device
audio = torch.from_numpy(audio).to(device=device, dtype=torch.float32)
repeat = 1 if self.settings.rvcQuality else 0
sid = self.settings.dstId
f0_up_key = self.settings.tran
index_rate = self.settings.indexRatio
protect = self.settings.protect
if_f0 = 1 if self.slotInfo.f0 else 0
embOutputLayer = self.slotInfo.embOutputLayer
useFinalProj = self.slotInfo.useFinalProj
try:
audio_out, self.pitchf_buffer, self.feature_buffer = self.pipeline.exec(
sid,
audio,
pitchf,
feature,
f0_up_key,
index_rate,
if_f0,
# 0,
self.settings.extraConvertSize / self.inputSampleRate if self.settings.silenceFront else 0.,  # extra data size in seconds, computed at the input sampling rate
embOutputLayer,
useFinalProj,
repeat,
protect,
outSize
)
# result = audio_out.detach().cpu().numpy() * np.sqrt(vol)
result = audio_out[-outSize:].detach().cpu().numpy() * np.sqrt(vol)
result = cast(
AudioInOut,
resampy.resample(
result,
self.slotInfo.samplingRate,
self.outputSampleRate,
),
)
return result
except DeviceCannotSupportHalfPrecisionException as e: # NOQA
logger.warn("[Device Manager] Device cannot support half precision. Fallback to float....")
self.deviceManager.setForceTensor(True)
self.initialize()
# raise e
return
def __del__(self):
del self.pipeline
# print("---------- REMOVING ---------------")
# remove_path = os.path.join("RVC")
# sys.path = [x for x in sys.path if x.endswith(remove_path) is False]
# for key in list(sys.modules):
# val = sys.modules.get(key)
# try:
# file_path = val.__file__
# if file_path.find("RVC" + os.path.sep) >= 0:
# # print("remove", key, file_path)
# sys.modules.pop(key)
# except Exception: # type:ignore
# # print(e)
# pass
def export2onnx(self):
modelSlot = self.slotInfo
if modelSlot.isONNX:
logger.warn("[Voice Changer] export2onnx, No pyTorch filepath.")
return {"status": "ng", "path": ""}
if self.pipeline is not None:
del self.pipeline
self.pipeline = None
torch.cuda.empty_cache()
self.initialize()
output_file_simple = export2onnx(self.settings.gpu, modelSlot)
return {
"status": "ok",
"path": f"/tmp/{output_file_simple}",
"filename": output_file_simple,
}
def get_model_current(self):
return [
{
"key": "defaultTune",
"val": self.settings.tran,
},
{
"key": "defaultIndexRatio",
"val": self.settings.indexRatio,
},
{
"key": "defaultProtect",
"val": self.settings.protect,
},
]

View File

@@ -46,7 +46,7 @@ class EmbedderManager:
file = cls.params.content_vec_500_onnx
return OnnxContentvec().loadModel(file, dev)
except Exception as e:  # noqa
-print("[Voice Changer] use torch contentvec")
+print("[Voice Changer] use torch contentvec", e)
file = cls.params.hubert_base
return FairseqHubert().loadModel(file, dev, isHalf)
elif embederType == "hubert-base-japanese":

View File

@@ -8,7 +8,7 @@ from voice_changer.RVC.inferencer.RVCInferencerv2 import RVCInferencerv2
from voice_changer.RVC.inferencer.RVCInferencerv2Nono import RVCInferencerv2Nono
from voice_changer.RVC.inferencer.WebUIInferencer import WebUIInferencer
from voice_changer.RVC.inferencer.WebUIInferencerNono import WebUIInferencerNono
-from voice_changer.RVC.inferencer.VorasInferencebeta import VoRASInferencer
+import sys
class InferencerManager:
@@ -38,7 +38,11 @@ class InferencerManager:
elif inferencerType == EnumInferenceTypes.pyTorchRVCv2 or inferencerType == EnumInferenceTypes.pyTorchRVCv2.value:
return RVCInferencerv2().loadModel(file, gpu)
elif inferencerType == EnumInferenceTypes.pyTorchVoRASbeta or inferencerType == EnumInferenceTypes.pyTorchVoRASbeta.value:
-return VoRASInferencer().loadModel(file, gpu)
+if sys.platform.startswith("darwin") is False:
+from voice_changer.RVC.inferencer.VorasInferencebeta import VoRASInferencer
+return VoRASInferencer().loadModel(file, gpu)
+else:
+raise RuntimeError("[Voice Changer] VoRAS is not supported on macOS")
elif inferencerType == EnumInferenceTypes.pyTorchRVCv2Nono or inferencerType == EnumInferenceTypes.pyTorchRVCv2Nono.value:
return RVCInferencerv2Nono().loadModel(file, gpu)
elif inferencerType == EnumInferenceTypes.pyTorchWebUI or inferencerType == EnumInferenceTypes.pyTorchWebUI.value:

View File

@@ -1,12 +1,14 @@
from typing import Dict, Any
+import os
from collections import OrderedDict
import torch
+from voice_changer.ModelSlotManager import ModelSlotManager
from voice_changer.utils.ModelMerger import ModelMergerRequest
+from voice_changer.utils.VoiceChangerParams import VoiceChangerParams
-def merge_model(request: ModelMergerRequest):
+def merge_model(params: VoiceChangerParams, request: ModelMergerRequest):
def extract(ckpt: Dict[str, Any]):
a = ckpt["model"]
opt: Dict[str, Any] = OrderedDict()
@@ -34,11 +36,16 @@ def merge_model(request: ModelMergerRequest):
weights = []
alphas = []
+slotManager = ModelSlotManager.get_instance(params.model_dir)
for f in files:
strength = f.strength
if strength == 0:
continue
-weight, state_dict = load_weight(f.filename)
+slotInfo = slotManager.get_slot_info(f.slotIndex)
+filename = os.path.join(params.model_dir, str(f.slotIndex), os.path.basename(slotInfo.modelFile))  # up to v.1.5.3.11, slotInfo.modelFile already includes model_dir.
+weight, state_dict = load_weight(filename)
weights.append(weight)
alphas.append(f.strength)

View File

@@ -4,7 +4,7 @@ import torch
from onnxsim import simplify
import onnx
from const import TMP_DIR, EnumInferenceTypes
-from data.ModelSlot import ModelSlot
+from data.ModelSlot import RVCModelSlot
from voice_changer.RVC.deviceManager.DeviceManager import DeviceManager
from voice_changer.RVC.onnxExporter.SynthesizerTrnMs256NSFsid_ONNX import (
SynthesizerTrnMs256NSFsid_ONNX,
@@ -24,10 +24,12 @@ from voice_changer.RVC.onnxExporter.SynthesizerTrnMsNSFsidNono_webui_ONNX import
from voice_changer.RVC.onnxExporter.SynthesizerTrnMsNSFsid_webui_ONNX import (
SynthesizerTrnMsNSFsid_webui_ONNX,
)
+from voice_changer.VoiceChangerParamsManager import VoiceChangerParamsManager
-def export2onnx(gpu: int, modelSlot: ModelSlot):
+def export2onnx(gpu: int, modelSlot: RVCModelSlot):
-modelFile = modelSlot.modelFile
+vcparams = VoiceChangerParamsManager.get_instance().params
+modelFile = os.path.join(vcparams.model_dir, str(modelSlot.slotIndex), os.path.basename(modelSlot.modelFile))
output_file = os.path.splitext(os.path.basename(modelFile))[0] + ".onnx"
output_file_simple = os.path.splitext(os.path.basename(modelFile))[0] + "_simple.onnx"

View File

@@ -18,6 +18,7 @@ from voice_changer.RVC.inferencer.OnnxRVCInferencer import OnnxRVCInferencer
from voice_changer.RVC.inferencer.OnnxRVCInferencerNono import OnnxRVCInferencerNono
from voice_changer.RVC.pitchExtractor.PitchExtractor import PitchExtractor
+from voice_changer.utils.Timer import Timer
logger = VoiceChangaerLogger.get_instance().getLogger()
@@ -89,174 +90,181 @@ class Pipeline(object):
protect=0.5,
out_size=None,
):
+# print(f"pipeline exec input, audio:{audio.shape}, pitchf:{pitchf.shape}, feature:{feature.shape}")
+# print(f"pipeline exec input, silence_front:{silence_front}, out_size:{out_size}")
+with Timer("main-process", False) as t:  # NOQA
# Audio comes in at a 16000 Hz sampling rate; everything from here on is processed at 16000 Hz.
search_index = self.index is not None and self.big_npy is not None and index_rate != 0
# self.t_pad = self.sr * repeat  # 1 second
# self.t_pad_tgt = self.targetSR * repeat  # 1 second, trimmed at output (emitted at the model's sampling rate)
audio = audio.unsqueeze(0)
quality_padding_sec = (repeat * (audio.shape[1] - 1)) / self.sr  # the padding (reflect) size must be smaller than the original size.
self.t_pad = round(self.sr * quality_padding_sec)  # add audio before and after
self.t_pad_tgt = round(self.targetSR * quality_padding_sec)  # add audio before and after, trimmed at output (emitted at the model's sampling rate)
audio_pad = F.pad(audio, (self.t_pad, self.t_pad), mode="reflect").squeeze(0)
p_len = audio_pad.shape[0] // self.window
sid = torch.tensor(sid, device=self.device).unsqueeze(0).long()
# When RVC Quality is on, turn silence_front off.
silence_front = silence_front if repeat == 0 else 0
pitchf = pitchf if repeat == 0 else np.zeros(p_len)
out_size = out_size if repeat == 0 else None
# Pitch detection
try:
if if_f0 == 1:
pitch, pitchf = self.pitchExtractor.extract(
audio_pad,
pitchf,
f0_up_key,
self.sr,
self.window,
silence_front=silence_front,
)
# pitch = pitch[:p_len]
# pitchf = pitchf[:p_len]
pitch = torch.tensor(pitch, device=self.device).unsqueeze(0).long()
pitchf = torch.tensor(pitchf, device=self.device, dtype=torch.float).unsqueeze(0)
else:
pitch = None
pitchf = None
except IndexError as e:  # NOQA
# print(e)
# import traceback
# traceback.print_exc()
raise NotEnoughDataExtimateF0()
# Adjust tensor types
feats = audio_pad
if feats.dim() == 2:  # double channels
feats = feats.mean(-1)
assert feats.dim() == 1, feats.dim()
feats = feats.view(1, -1)
# embedding
+with Timer("main-process", False) as te:
with autocast(enabled=self.isHalf):
try:
feats = self.embedder.extractFeatures(feats, embOutputLayer, useFinalProj)
if torch.isnan(feats).all():
raise DeviceCannotSupportHalfPrecisionException()
except RuntimeError as e:
if "HALF" in e.__str__().upper():
raise HalfPrecisionChangingException()
elif "same device" in e.__str__():
raise DeviceChangingException()
else:
raise e
+# print(f"[Embedding] {te.secs}")
# Index - feature extraction
# if self.index is not None and self.feature is not None and index_rate != 0:
if search_index:
npy = feats[0].cpu().numpy()
# apply silent front for indexsearch
npyOffset = math.floor(silence_front * 16000) // 360
npy = npy[npyOffset:]
if self.isHalf is True:
npy = npy.astype("float32")
# TODO: make k adjustable
k = 1
if k == 1:
_, ix = self.index.search(npy, 1)
npy = self.big_npy[ix.squeeze()]
else:
score, ix = self.index.search(npy, k=8)
weight = np.square(1 / score)
weight /= weight.sum(axis=1, keepdims=True)
npy = np.sum(self.big_npy[ix] * np.expand_dims(weight, axis=2), axis=1)
# recover silent front
npy = np.concatenate([np.zeros([npyOffset, npy.shape[1]], dtype=np.float32), feature[:npyOffset:2].astype("float32"), npy])[-feats.shape[1]:]
feats = torch.from_numpy(npy).unsqueeze(0).to(self.device) * index_rate + (1 - index_rate) * feats
feats = F.interpolate(feats.permute(0, 2, 1), scale_factor=2).permute(0, 2, 1)
if protect < 0.5 and search_index:
feats0 = feats.clone()
# Pitch size adjustment
p_len = audio_pad.shape[0] // self.window
if feats.shape[1] < p_len:
p_len = feats.shape[1]
if pitch is not None and pitchf is not None:
pitch = pitch[:, :p_len]
pitchf = pitchf[:, :p_len]
feats_len = feats.shape[1]
if pitch is not None and pitchf is not None:
pitch = pitch[:, -feats_len:]
pitchf = pitchf[:, -feats_len:]
p_len = torch.tensor([feats_len], device=self.device).long()
# When pitch estimation fails (pitchf=0), mix in the pre-search features.
# How pitchff is built is questionable, but it follows the upstream implementation, so it is kept as is.
# https://github.com/w-okada/voice-changer/pull/276#issuecomment-1571336929
if protect < 0.5 and search_index:
pitchff = pitchf.clone()
pitchff[pitchf > 0] = 1
pitchff[pitchf < 1] = protect
pitchff = pitchff.unsqueeze(-1)
feats = feats * pitchff + feats0 * (1 - pitchff)
feats = feats.to(feats0.dtype)
p_len = torch.tensor([p_len], device=self.device).long()
# apply silent front for inference
if type(self.inferencer) in [OnnxRVCInferencer, OnnxRVCInferencerNono]:
npyOffset = math.floor(silence_front * 16000) // 360
feats = feats[:, npyOffset * 2 :, :]  # NOQA
feats_len = feats.shape[1]
if pitch is not None and pitchf is not None:
pitch = pitch[:, -feats_len:]
pitchf = pitchf[:, -feats_len:]
p_len = torch.tensor([feats_len], device=self.device).long()
# Run inference
try:
with torch.no_grad():
with autocast(enabled=self.isHalf):
audio1 = (
torch.clip(
self.inferencer.infer(feats, p_len, pitch, pitchf, sid, out_size)[0][0, 0].to(dtype=torch.float32),
-1.0,
1.0,
)
* 32767.5
).data.to(dtype=torch.int16)
except RuntimeError as e:
if "HALF" in e.__str__().upper():
print("11", e)
raise HalfPrecisionChangingException()
else:
raise e
feats_buffer = feats.squeeze(0).detach().cpu()
if pitchf is not None:
pitchf_buffer = pitchf.squeeze(0).detach().cpu()
else:
pitchf_buffer = None
del p_len, pitch, pitchf, feats
# torch.cuda.empty_cache()
# The sampling rate of the infer output is the model's sampling rate.
# Input to the pipeline is 16k, for hubert.
if self.t_pad_tgt != 0:
offset = self.t_pad_tgt
end = -1 * self.t_pad_tgt
audio1 = audio1[offset:end]
del sid
# torch.cuda.empty_cache()
+# print("EXEC AVERAGE:", t.avrSecs)
return audio1, pitchf_buffer, feats_buffer
def __del__(self):

View File

@@ -9,15 +9,17 @@ from voice_changer.RVC.embedder.EmbedderManager import EmbedderManager
from voice_changer.RVC.inferencer.InferencerManager import InferencerManager
from voice_changer.RVC.pipeline.Pipeline import Pipeline
from voice_changer.RVC.pitchExtractor.PitchExtractorManager import PitchExtractorManager
+from voice_changer.utils.VoiceChangerParams import VoiceChangerParams
-def createPipeline(modelSlot: RVCModelSlot, gpu: int, f0Detector: str):
+def createPipeline(params: VoiceChangerParams, modelSlot: RVCModelSlot, gpu: int, f0Detector: str):
dev = DeviceManager.get_instance().getDevice(gpu)
half = DeviceManager.get_instance().halfPrecisionAvailable(gpu)
# Create the Inferencer
try:
-inferencer = InferencerManager.getInferencer(modelSlot.modelType, modelSlot.modelFile, gpu)
+modelPath = os.path.join(params.model_dir, str(modelSlot.slotIndex), os.path.basename(modelSlot.modelFile))
+inferencer = InferencerManager.getInferencer(modelSlot.modelType, modelPath, gpu)
except Exception as e:
print("[Voice Changer] exception! loading inferencer", e)
traceback.print_exc()
@@ -40,7 +42,8 @@ def createPipeline(modelSlot: RVCModelSlot, gpu: int, f0Detector: str):
pitchExtractor = PitchExtractorManager.getPitchExtractor(f0Detector, gpu)
# index, feature
-index = _loadIndex(modelSlot)
+indexPath = os.path.join(params.model_dir, str(modelSlot.slotIndex), os.path.basename(modelSlot.indexFile))
+index = _loadIndex(indexPath)
pipeline = Pipeline(
embedder,
@@ -55,21 +58,17 @@ def createPipeline(modelSlot: RVCModelSlot, gpu: int, f0Detector: str):
return pipeline
-def _loadIndex(modelSlot: RVCModelSlot):
+def _loadIndex(indexPath: str):
# Load the index
print("[Voice Changer] Loading index...")
-# None if no file is specified
-if modelSlot.indexFile is None:
-print("[Voice Changer] Index is None, not used")
-return None
# None if a file is specified but does not exist
-if os.path.exists(modelSlot.indexFile) is not True:
+if os.path.exists(indexPath) is not True or os.path.isfile(indexPath) is not True:
+print("[Voice Changer] Index file is not found")
return None
try:
-print("Try loading...", modelSlot.indexFile)
+print("Try loading...", indexPath)
-index = faiss.read_index(modelSlot.indexFile)
+index = faiss.read_index(indexPath)
except:  # NOQA
print("[Voice Changer] load index failed. Use no index.")
traceback.print_exc()

View File

@@ -1,6 +1,7 @@
import sys
import os
from data.ModelSlot import SoVitsSvc40ModelSlot
+from voice_changer.VoiceChangerParamsManager import VoiceChangerParamsManager
from voice_changer.utils.VoiceChangerModel import AudioInOut
from voice_changer.utils.VoiceChangerParams import VoiceChangerParams
@@ -92,13 +93,17 @@ class SoVitsSvc40:
def initialize(self):
print("[Voice Changer] [so-vits-svc40] Initializing... ")
-self.hps = get_hparams_from_file(self.slotInfo.configFile)
+vcparams = VoiceChangerParamsManager.get_instance().params
+configPath = os.path.join(vcparams.model_dir, str(self.slotInfo.slotIndex), self.slotInfo.configFile)
+modelPath = os.path.join(vcparams.model_dir, str(self.slotInfo.slotIndex), self.slotInfo.modelFile)
+self.hps = get_hparams_from_file(configPath)
self.settings.speakers = self.hps.spk
# cluster
try:
if self.slotInfo.clusterFile is not None:
-self.cluster_model = get_cluster_model(self.slotInfo.clusterFile)
+clusterPath = os.path.join(vcparams.model_dir, str(self.slotInfo.slotIndex), self.slotInfo.clusterFile)
+self.cluster_model = get_cluster_model(clusterPath)
else:
self.cluster_model = None
except Exception as e:
@@ -110,7 +115,7 @@ class SoVitsSvc40:
if self.slotInfo.isONNX:
providers, options = self.getOnnxExecutionProvider()
self.onnx_session = onnxruntime.InferenceSession(
-self.slotInfo.modelFile,
+modelPath,
providers=providers,
provider_options=options,
)
@@ -122,7 +127,7 @@ class SoVitsSvc40:
)
net_g.eval()
self.net_g = net_g
-load_checkpoint(self.slotInfo.modelFile, self.net_g, None)
+load_checkpoint(modelPath, self.net_g, None)
def getOnnxExecutionProvider(self):
availableProviders = onnxruntime.get_available_providers()
@@ -379,6 +384,10 @@ class SoVitsSvc40:
except Exception:  # type:ignore
pass
+def get_model_current(self):
+return [
+]
def resize_f0(x, target_len):
source = np.array(x)

View File

@@ -37,8 +37,14 @@ class GPUInfo:
@dataclass()
class VoiceChangerManagerSettings:
modelSlotIndex: int = -1
+passThrough: bool = False  # 0: off, 1: on
# ↓ list only the mutable ones
-intData: list[str] = field(default_factory=lambda: ["modelSlotIndex"])
+boolData: list[str] = field(default_factory=lambda: [
+"passThrough"
+])
+intData: list[str] = field(default_factory=lambda: [
+"modelSlotIndex",
+])
class VoiceChangerManager(ServerDeviceCallbacks):
@@ -121,7 +127,6 @@ class VoiceChangerManager(ServerDeviceCallbacks):
def get_instance(cls, params: VoiceChangerParams):
if cls._instance is None:
cls._instance = cls(params)
-# cls._instance.voiceChanger = VoiceChanger(params)
return cls._instance
def loadModel(self, params: LoadModelParams):
@@ -147,7 +152,7 @@ class VoiceChangerManager(ServerDeviceCallbacks):
os.makedirs(dstDir, exist_ok=True)
logger.info(f"move to {srcPath} -> {dstPath}")
shutil.move(srcPath, dstPath)
-file.name = dstPath
+file.name = os.path.basename(dstPath)
# Create metadata (defined by each VC)
if params.voiceChangerType == "RVC":
@@ -188,6 +193,7 @@ class VoiceChangerManager(ServerDeviceCallbacks):
data["modelSlots"] = self.modelSlotManager.getAllSlotInfo(reload=True)
data["sampleModels"] = getSampleInfos(self.params.sample_mode)
data["python"] = sys.version
+data["voiceChangerParams"] = self.params
data["status"] = "OK"
@@ -214,11 +220,18 @@ class VoiceChangerManager(ServerDeviceCallbacks):
return
elif slotInfo.voiceChangerType == "RVC":
logger.info("................RVC")
-from voice_changer.RVC.RVC import RVC
-self.voiceChangerModel = RVC(self.params, slotInfo)
-self.voiceChanger = VoiceChanger(self.params)
+# from voice_changer.RVC.RVC import RVC
+# self.voiceChangerModel = RVC(self.params, slotInfo)
+# self.voiceChanger = VoiceChanger(self.params)
+# self.voiceChanger.setModel(self.voiceChangerModel)
+from voice_changer.RVC.RVCr2 import RVCr2
+self.voiceChangerModel = RVCr2(self.params, slotInfo)
+self.voiceChanger = VoiceChangerV2(self.params)
self.voiceChanger.setModel(self.voiceChangerModel)
elif slotInfo.voiceChangerType == "MMVCv13":
logger.info("................MMVCv13")
from voice_changer.MMVCv13.MMVCv13 import MMVCv13
@@ -260,10 +273,16 @@ class VoiceChangerManager(ServerDeviceCallbacks):
del self.voiceChangerModel
return
-def update_settings(self, key: str, val: str | int | float):
+def update_settings(self, key: str, val: str | int | float | bool):
self.store_setting(key, val)
-if key in self.settings.intData:
+if key in self.settings.boolData:
+if val == "true":
+newVal = True
+elif val == "false":
+newVal = False
+setattr(self.settings, key, newVal)
+elif key in self.settings.intData:
newVal = int(val)
if key == "modelSlotIndex":
newVal = newVal % 1000
@@ -283,6 +302,9 @@ class VoiceChangerManager(ServerDeviceCallbacks):
return self.get_info()
def changeVoice(self, receivedData: AudioInOut):
+if self.settings.passThrough is True:  # pass-through
+return receivedData, []
if hasattr(self, "voiceChanger") is True:
return self.voiceChanger.on_request(receivedData)
else:
@@ -299,8 +321,8 @@ class VoiceChangerManager(ServerDeviceCallbacks):
req.files = [MergeElement(**f) for f in req.files]
slot = len(self.modelSlotManager.getAllSlotInfo()) - 1
if req.voiceChangerType == "RVC":
-merged = RVCModelMerger.merge_models(req, slot)
+merged = RVCModelMerger.merge_models(self.params, req, slot)
-loadParam = LoadModelParams(voiceChangerType="RVC", slot=slot, isSampleMode=False, sampleId="", files=[LoadModelParamFile(name=os.path.basename(merged), kind="rvcModel", dir=f"{slot}")], params={})
+loadParam = LoadModelParams(voiceChangerType="RVC", slot=slot, isSampleMode=False, sampleId="", files=[LoadModelParamFile(name=os.path.basename(merged), kind="rvcModel", dir="")], params={})
self.loadModel(loadParam)
return self.get_info()

View File

@@ -0,0 +1,17 @@
from voice_changer.utils.VoiceChangerParams import VoiceChangerParams
class VoiceChangerParamsManager:
_instance = None
def __init__(self):
self.params = None
@classmethod
def get_instance(cls):
if cls._instance is None:
cls._instance = cls()
return cls._instance
def setParams(self, params: VoiceChangerParams):
self.params = params
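A minimal usage sketch of the singleton above, assuming a VoiceChangerParams instance named params has already been constructed elsewhere (the call sites changed in this commit, such as export2onnx and SoVitsSvc40, read it back the same way):
from voice_changer.VoiceChangerParamsManager import VoiceChangerParamsManager
VoiceChangerParamsManager.get_instance().setParams(params)  # store once at startup (params: VoiceChangerParams, assumed constructed elsewhere)
vcparams = VoiceChangerParamsManager.get_instance().params  # read back anywhere
model_dir = vcparams.model_dir  # e.g. to resolve model file paths per slot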

View File

@@ -6,7 +6,7 @@
- Applicable VoiceChangerModel
DiffusionSVC
+RVC
'''
from typing import Any, Union
@@ -208,12 +208,13 @@ class VoiceChangerV2(VoiceChangerIF):
block_frame = receivedData.shape[0]
crossfade_frame = min(self.settings.crossFadeOverlapSize, block_frame)
self._generate_strength(crossfade_frame)
+# data = self.voiceChanger.generate_input(newData, block_frame, crossfade_frame, sola_search_frame)
audio = self.voiceChanger.inference(
receivedData,
crossfade_frame=crossfade_frame,
sola_search_frame=sola_search_frame
)
if hasattr(self, "sola_buffer") is True:
np.set_printoptions(threshold=10000)
audio_offset = -1 * (sola_search_frame + crossfade_frame + block_frame)

View File

@@ -5,7 +5,7 @@ from dataclasses import dataclass
@dataclass
class MergeElement:
-filename: str
+slotIndex: int
strength: int

View File

@@ -1,15 +1,43 @@
import time
+import inspect
class Timer(object):
+storedSecs = {}  # Class variable
-def __init__(self, title: str):
+def __init__(self, title: str, enalbe: bool = True):
self.title = title
+self.enable = enalbe
+self.secs = 0
+self.msecs = 0
+self.avrSecs = 0
+if self.enable is False:
+return
+self.maxStores = 10
+current_frame = inspect.currentframe()
+caller_frame = inspect.getouterframes(current_frame, 2)
+frame = caller_frame[1]
+filename = frame.filename
+line_number = frame.lineno
+self.key = f"{title}_{filename}_{line_number}"
+if self.key not in self.storedSecs:
+self.storedSecs[self.key] = []
def __enter__(self):
+if self.enable is False:
+return
self.start = time.time()
return self
def __exit__(self, *_):
+if self.enable is False:
+return
self.end = time.time()
self.secs = self.end - self.start
self.msecs = self.secs * 1000  # millisecs
+self.storedSecs[self.key].append(self.secs)
+self.storedSecs[self.key] = self.storedSecs[self.key][-self.maxStores:]
+self.avrSecs = sum(self.storedSecs[self.key]) / len(self.storedSecs[self.key])
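A minimal usage sketch of the Timer above, assuming it is importable as voice_changer.utils.Timer (the import added in the Pipeline diff); the "demo" label and the sleep are placeholder values:
import time
from voice_changer.utils.Timer import Timer

with Timer("demo", True) as t:  # enalbe=True records this run under a key derived from the call site
    time.sleep(0.01)            # placeholder workload
print(t.secs, t.avrSecs)        # elapsed seconds and the rolling average over the last 10 runs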

View File

@@ -0,0 +1,20 @@
{
"signedContributors": [
{
"name": "w-okada",
"id": 48346627,
"comment_id": 1667673774,
"created_at": "2023-08-07T11:21:42Z",
"repoId": 527419347,
"pullRequestNo": 661
},
{
"name": "w-okada",
"id": 48346627,
"comment_id": 1667674735,
"created_at": "2023-08-07T11:22:28Z",
"repoId": 527419347,
"pullRequestNo": 661
}
]
}

View File

@@ -44,6 +44,8 @@ If you have the old version, be sure to unzip it into a separate folder.
When connecting remotely, please use `.bat` file (win) and `.command` file (mac) where http is replaced with https.
+Access it with a browser (currently only Chrome is supported) to see the GUI.
### Console
When you run a .bat file (Windows) or .command file (Mac), a screen like the following will be displayed and various data will be downloaded from the Internet at the initial start-up. Depending on your environment, it may take 1-2 minutes in many cases.

View File

@@ -44,6 +44,8 @@
リモートから接続する場合は、`.bat`ファイル(win)、`.command`ファイル(mac)の http が https に置き換わっているものを使用してください。
+ブラウザ(Chrome のみサポート)でアクセスすると画面が表示されます。
### コンソール表示
`.bat`ファイル(win)や`.command`ファイル(mac)を実行すると、次のような画面が表示され、初回起動時には各種データをインターネットからダウンロードします。