
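Model list

| Model | Parameters | Features | Streaming | Instruction control |
|---|---|---|---|---|
| Qwen3-TTS-12Hz-1.7B-VoiceDesign | 1.7B | Generates a voice from a text description | | |
| Qwen3-TTS-12Hz-1.7B-CustomVoice | 1.7B | 9 featured voices + instruction control | | |
| Qwen3-TTS-12Hz-1.7B-Base | 1.7B | 3-second rapid voice cloning + fine-tuning | | |
| Qwen3-TTS-12Hz-0.6B-CustomVoice | 0.6B | 9 featured voices | | |
| Qwen3-TTS-12Hz-0.6B-Base | 0.6B | 3-second rapid voice cloning + fine-tuning | | |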

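Performance comparison

| Model | Chinese WER | English WER |
|---|---|---|
| CosyVoice 3 | 0.71 | 1.45 |
| MiniMax-Speech | 0.83 | 1.65 |
| Qwen3-TTS-12Hz-1.7B-Base | 0.77 | 1.24 |
| Qwen3-TTS-12Hz-0.6B-Base | 0.92 | 1.32 |
| FireRedTTS 2 | 1.14 | 1.95 |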
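
WER (word error rate) is, by its standard definition, the word-level edit distance between a reference transcript and a recognized transcript, divided by the number of reference words; lower is better. The sketch below only illustrates that definition and is not the evaluation script behind the numbers above. The whitespace tokenization and the function name `word_error_rate` are assumptions made for this example, and Chinese is commonly scored per character rather than per word.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "she said she would be here by noon"
    recognized = "she said he would be here by"
    # One substitution and one deletion over 8 reference words.
    print(word_error_rate(reference, recognized))
```

In a TTS evaluation the hypothesis would typically come from running a speech recognizer on the synthesized audio, so the score also reflects recognizer errors.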

Environment setup

# Create a Python 3.12 environment
conda create -n qwen3-tts python=3.12 -y
conda activate qwen3-tts

# Install the qwen-tts package
pip install -U qwen-tts

# Recommended: install FlashAttention 2 to reduce GPU memory usage
pip install -U flash-attn --no-build-isolation
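
After installation, a quick check can confirm that the modules used in the examples below are importable. This is a minimal sketch: the helper name `missing_packages` is hypothetical, and the module names are taken from the import statements in this guide plus `flash_attn`, the import name of the flash-attn package.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    required = ["torch", "soundfile", "qwen_tts"]
    optional = ["flash_attn"]  # only needed for attn_implementation="flash_attention_2"

    print("missing required modules:", missing_packages(required))
    print("missing optional modules:", missing_packages(optional))
```

`find_spec` only looks the module up and does not import it, so the check is fast and does not touch the GPU.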

1. Custom voice generation (CustomVoice)

import torch
import soundfile as sf
from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    device_map="cuda:0",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

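# Single-sample inference
wavs, sr = model.generate_custom_voice(
    # Text: "Actually, I've really noticed that I'm someone who is especially
    # good at reading other people's emotions."
    text="其实我真的有发现,我是一个特别善于观察别人情绪的人。",
    language="Chinese",
    speaker="Vivian",
    instruct="用特别愤怒的语气说",  # "Say it in a very angry tone"
)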
sf.write("output_custom_voice.wav", wavs[0], sr)

# Batch inference (each list holds one entry per utterance)
wavs, sr = model.generate_custom_voice(
    text=[
        "其实我真的有发现,我是一个特别善于观察别人情绪的人。",
        "She said she would be here by noon."
    ],
    language=["Chinese", "English"],
    speaker=["Vivian", "Ryan"],
    instruct=["", "Very happy."]
)
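
The batch call above passes four parallel lists. When requests arrive as one record per utterance, it can be easier to build those lists programmatically. The helper below is a sketch; `build_batch` is a hypothetical name, and treating a missing `instruct` as an empty string mirrors the first entry of the batch example rather than any documented default.

```python
def build_batch(items):
    """Convert per-utterance dicts into the parallel lists used by the batch call."""
    keys = ("text", "language", "speaker", "instruct")
    batch = {key: [] for key in keys}
    for index, item in enumerate(items):
        for key in keys:
            # An omitted instruct becomes "", as in the batch example above.
            default = "" if key == "instruct" else None
            value = item.get(key, default)
            if value is None:
                raise ValueError(f"item {index} is missing required field: {key}")
            batch[key].append(value)
    return batch


if __name__ == "__main__":
    items = [
        {"text": "She said she would be here by noon.", "language": "English",
         "speaker": "Ryan", "instruct": "Very happy."},
        {"text": "See you tomorrow.", "language": "English", "speaker": "Aiden"},
    ]
    print(build_batch(items))
    # Assumed usage with the model loaded above:
    #   wavs, sr = model.generate_custom_voice(**build_batch(items))
    #   for i, wav in enumerate(wavs):
    #       sf.write(f"output_custom_voice_{i}.wav", wav, sr)
```

The commented usage assumes that `wavs` holds one waveform per input, in input order, which is consistent with the `wavs[0]` indexing in the single-sample example.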

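Featured voices

| Voice | Description | Native language |
|---|---|---|
| Vivian | Bright, slightly edgy young female voice | Chinese |
| Serena | Warm, gentle young female voice | Chinese |
| Uncle_Fu | Mature male voice, low and mellow | Chinese |
| Dylan | Young Beijing male voice, clear and natural | Chinese (Beijing dialect) |
| Eric | Lively Chengdu male voice, slightly husky | Chinese (Sichuan dialect) |
| Ryan | Dynamic male voice with a strong sense of rhythm | English |
| Aiden | Sunny American male voice with a clear midrange | English |
| Ono_Anna | Playful Japanese female voice, light and nimble | Japanese |
| Sohee | Warm Korean female voice, emotionally rich | Korean |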
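
The table above can be kept as a small lookup so that a speaker is chosen by native language. This is a sketch: the `FEATURED_VOICES` mapping and `voices_for` helper are hypothetical names that only restate the table. Only "Chinese" and "English" appear as `language` arguments in the examples here, so whether the other labels are accepted by the API in this exact spelling is an assumption.

```python
# speaker name -> (native language, dialect or None), copied from the table above
FEATURED_VOICES = {
    "Vivian": ("Chinese", None),
    "Serena": ("Chinese", None),
    "Uncle_Fu": ("Chinese", None),
    "Dylan": ("Chinese", "Beijing dialect"),
    "Eric": ("Chinese", "Sichuan dialect"),
    "Ryan": ("English", None),
    "Aiden": ("English", None),
    "Ono_Anna": ("Japanese", None),
    "Sohee": ("Korean", None),
}


def voices_for(language):
    """Return the featured speaker names whose native language matches."""
    return [
        name for name, (native, _dialect) in FEATURED_VOICES.items()
        if native == language
    ]


if __name__ == "__main__":
    print(voices_for("English"))
    print(voices_for("Chinese"))
```

The returned name would then be passed as the `speaker` argument of `generate_custom_voice`.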

2. Voice design (VoiceDesign)

import torch
import soundfile as sf
from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign",
    device_map="cuda:0",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

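wavs, sr = model.generate_voice_design(
    # Text: "Big brother, you're back! I've been waiting for you for so, so long!"
    text="哥哥,你回来啦,人家等了你好久好久了!",
    language="Chinese",
    # Instruction: "A coquettish, childlike young girl's voice, high-pitched with
    # pronounced pitch variation, sounding clingy, affected, and deliberately cute."
    instruct="体现撒娇稚嫩的萝莉女声,音调偏高且起伏明显,营造出黏人、做作又刻意卖萌的听觉效果。",
)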
sf.write("output_voice_design.wav", wavs[0], sr)

3. Voice cloning (Base)

import torch
import soundfile as sf
from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-Base",
    device_map="cuda:0",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

ref_audio = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone.wav"
ref_text = "Okay. Yeah. I resent you. I love you. I respect you. But you know what? You blew it! And thanks to you."

wavs, sr = model.generate_voice_clone(
    text="I am solving the equation: x = [-b ± √(b²-4ac)] / 2a? Nobody can — it's a disaster (◍•͈⌔•͈◍), very sad!",
    language="English",
    ref_audio=ref_audio,
    ref_text=ref_text,
)
sf.write("output_voice_clone.wav", wavs[0], sr)
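
The model list describes the Base models as cloning from about 3 seconds of audio. When the reference clip is a local file instead of a URL, its length can be checked before calling `generate_voice_clone`. The sketch below uses only the standard library and works for uncompressed PCM WAV files. The helper names and the 3.0-second default are assumptions made for illustration, and the source does not say whether the model enforces a minimum length.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_long_enough(path, minimum_seconds=3.0):
    """Check a reference clip against an assumed minimum duration."""
    return wav_duration_seconds(path) >= minimum_seconds


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        clip = sys.argv[1]
        print(f"{clip}: {wav_duration_seconds(clip):.2f} s, "
              f"long enough: {is_long_enough(clip)}")
```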

4. Launch the local Web UI

# CustomVoice model
qwen-tts-demo Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --ip 0.0.0.0 --port 8000

# VoiceDesign model
qwen-tts-demo Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign --ip 0.0.0.0 --port 8000

# Base model
qwen-tts-demo Qwen/Qwen3-TTS-12Hz-1.7B-Base --ip 0.0.0.0 --port 8000

Then open http://<your-ip>:8000 in a browser to try it out.
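
If the page does not load, it can help to confirm that something is listening on the port before debugging the browser or firewall. The sketch below is a generic TCP reachability check and not part of the qwen-tts package. The helper name `is_port_open` is hypothetical, and `127.0.0.1` assumes the check runs on the same machine as the demo.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port 8000 matches the --port value used in the commands above.
    print("demo reachable:", is_port_open("127.0.0.1", 8000))
```

A `True` result only shows that the port accepts connections; it does not verify that the model finished loading.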
