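Generating long audio with the text-to-speech tool Bark works differently from generating short clips. A few days ago I published a tutorial on making short audio with Bark: "The Most Powerful Text-to-Speech Tool: Bark. Detailed Tutorial on Local Installation, Cloud Deployment, and Online Demo; One-Click AI Generation of Expressive Speech and Singing". Before following this long-audio tutorial, I recommend reading that one first to get familiar with the basic installation steps. The Bark project also has its own notes on long-form generation. Today we will use Bark to generate audio longer than 14 seconds; the steps are shown below.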
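1. Colab cloud deployment tutorial
First open Google Colab and click [File] - [New notebook].
Mount Google Drive first, then create a new code cell and run the following command to install Bark: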
pip install git+https://github.com/suno-ai/bark.git
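Mounting Google Drive, the first step mentioned above, is usually done with Colab's `google.colab.drive` helper. The following is a minimal sketch, assuming the notebook runs in Colab and uses the conventional mount point `/content/drive`; the function name `mount_drive` is made up for illustration and is not part of Bark or Colab.

```python
def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab.

    Returns True if a mount was attempted, False when the google.colab
    package is unavailable (i.e. the code is not running in Colab).
    """
    try:
        # This package only exists inside the Colab runtime.
        from google.colab import drive
    except ImportError:
        return False
    # Opens the authorization prompt and mounts Drive at mount_point.
    drive.mount(mount_point)
    return True


mounted = mount_drive()
```

Outside Colab the call simply returns `False`, so the same notebook can also be run locally.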
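There are three modes for generating long audio: 1. simple mode, 2. advanced mode, 3. dialogue mode.
Let's start with simple mode: it uses nltk to split a longer text into sentences and generates the sentences one by one.
First run the following code to download nltk's punkt sentence tokenizer data: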
import nltk
nltk.download('punkt')
Once punkt has finished downloading, run the following code:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from IPython.display import Audio
import nltk  # we'll use this to split into sentences
import numpy as np

from bark.generation import (
    generate_text_semantic,
    preload_models,
)
from bark.api import semantic_to_waveform
from bark import generate_audio, SAMPLE_RATE

# Download and load the Bark models (slow on the first run)
preload_models()

script = """
Hey, have you heard about this new text-to-audio model called "Bark"?
Apparently, it's the most realistic and natural-sounding text-to-audio model
out there right now. People are saying it sounds just like a real person speaking.
I think it uses advanced machine learning algorithms to analyze and understand the
nuances of human speech, and then replicates those nuances in its own speech output.
It's pretty impressive, and I bet it could be used for things like audiobooks or podcasts.
In fact, I heard that some publishers are already starting to use Bark to create audiobooks.
It would be like having your own personal voiceover artist. I really think Bark is going to
be a game-changer in the world of text-to-audio technology.
""".replace("\n", " ").strip()

# Split the script into sentences so each generated chunk stays short
sentences = nltk.sent_tokenize(script)

SPEAKER = "v2/en_speaker_6"
silence = np.zeros(int(0.25 * SAMPLE_RATE))  # quarter second of silence

pieces = []
for sentence in sentences:
    audio_array = generate_audio(sentence, history_prompt=SPEAKER)
    pieces += [audio_array, silence.copy()]

# Join all sentences (with pauses) and play the result in the notebook
Audio(np.concatenate(pieces), rate=SAMPLE_RATE)
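This code converts the text in `script` into speech. You can choose the voice by changing `SPEAKER`; see Bark's speaker list for all available voices.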
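The split-and-stitch idea in simple mode can be tried without loading Bark at all. The sketch below is only an illustration under stated assumptions: `split_sentences` is a naive regex stand-in for `nltk.sent_tokenize`, `fake_generate_audio` is a hypothetical placeholder for `generate_audio` that returns silence, and `SAMPLE_RATE = 24_000` is assumed to match the value Bark exports.

```python
import re

import numpy as np

SAMPLE_RATE = 24_000  # assumption: same value as bark.SAMPLE_RATE


def split_sentences(text):
    """Naive stand-in for nltk.sent_tokenize: split after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def fake_generate_audio(sentence):
    """Hypothetical placeholder for bark.generate_audio.

    Returns 0.1 s of silence per word so the stitching logic can be tested.
    """
    n_words = len(sentence.split())
    return np.zeros(int(0.1 * SAMPLE_RATE) * n_words)


def stitch(sentences, pause_s=0.25):
    """Generate each sentence separately and join them with short pauses."""
    silence = np.zeros(int(pause_s * SAMPLE_RATE))
    pieces = []
    for sentence in sentences:
        pieces += [fake_generate_audio(sentence), silence.copy()]
    return np.concatenate(pieces)


sentences = split_sentences("Bark is new. It sounds natural! Want to try it?")
audio = stitch(sentences)
print(len(sentences), audio.shape[0] / SAMPLE_RATE)
```

Swapping `fake_generate_audio` for Bark's `generate_audio` gives the same loop as the tutorial code; the pause length is the only knob added here.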
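Converting this text to speech took a very long time. After 1 hour and 32 minutes it still had not finished, and I gave up waiting. This model really is demanding on hardware.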
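A long run like this may mean the session has no GPU or too little memory. As far as I recall from the Bark README, two environment variables, `SUNO_USE_SMALL_MODELS` and `SUNO_OFFLOAD_CPU`, switch to smaller models and move idle models to the CPU. The sketch below assumes those names and that they are read when `bark` is imported; the helper `configure_bark_env` is made up for illustration.

```python
import os


def configure_bark_env(small_models=True, offload_cpu=False):
    """Set Bark's environment switches.

    Assumption: these variables are read when `bark` is first imported,
    so this must run before any `import bark` in the notebook.
    """
    os.environ["SUNO_USE_SMALL_MODELS"] = "True" if small_models else "False"
    os.environ["SUNO_OFFLOAD_CPU"] = "True" if offload_cpu else "False"
    keys = ("SUNO_USE_SMALL_MODELS", "SUNO_OFFLOAD_CPU")
    return {key: os.environ[key] for key in keys}


settings = configure_bark_env(small_models=True)
print(settings)
```

Smaller models trade some audio quality for speed, so this is worth trying mainly when the full models are impractically slow.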
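The second mode: advanced mode
Sometimes Bark produces some extra audio at the end of a prompt. We can fix this by lowering the threshold Bark uses to decide when to stop generating. This is adjusted with the `min_eos_p` parameter of `generate_text_semantic`.
Complete code for generating the audio: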
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from IPython.display import Audio
from scipy.io.wavfile import write as write_wav
import nltk  # we'll use this to split into sentences
import numpy as np

from bark.generation import (
    generate_text_semantic,
    preload_models,
)
from bark.api import semantic_to_waveform
from bark import generate_audio, SAMPLE_RATE

# Download and load the Bark models (slow on the first run)
preload_models()

script = """
Hey, have you heard about this new text-to-audio model called "Bark"?
Apparently, it's the most realistic and natural-sounding text-to-audio model
out there right now. People are saying it sounds just like a real person speaking.
I think it uses advanced machine learning algorithms to analyze and understand the
nuances of human speech, and then replicates those nuances in its own speech output.
It's pretty impressive, and I bet it could be used for things like audiobooks or podcasts.
In fact, I heard that some publishers are already starting to use Bark to create audiobooks.
It would be like having your own personal voiceover artist. I really think Bark is going to
be a game-changer in the world of text-to-audio technology.
""".replace("\n", " ").strip()

sentences = nltk.sent_tokenize(script)

GEN_TEMP = 0.6
SPEAKER = "v2/en_speaker_6"  # change the speaker here
silence = np.zeros(int(0.25 * SAMPLE_RATE))  # quarter second of silence

pieces = []
for sentence in sentences:
    # Step 1: text -> semantic tokens, with an explicit stopping threshold
    semantic_tokens = generate_text_semantic(
        sentence,
        history_prompt=SPEAKER,
        temp=GEN_TEMP,
        min_eos_p=0.05,  # adjust this to trim extra audio at the end
    )
    # Step 2: semantic tokens -> waveform
    audio_array = semantic_to_waveform(semantic_tokens, history_prompt=SPEAKER,)
    pieces += [audio_array, silence.copy()]

Audio(np.concatenate(pieces), rate=SAMPLE_RATE)
# Also save the result as a WAV file
write_wav("bark_generation.wav", SAMPLE_RATE, np.concatenate(pieces))
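You only need to change the values of `script` (the text to convert), `min_eos_p` (the parameter that trims extra audio) and `SPEAKER` (the voice). Anything else you are unsure about can be left alone.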
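To see why a lower `min_eos_p` cuts off trailing audio, it helps to picture the stopping rule. The sketch below is a simplified illustration, not Bark's actual code: it assumes generation stops at the first step where the model's end-of-sequence probability reaches the threshold, and the probabilities in `eos_probs` are made up.

```python
def stop_step(eos_probs, min_eos_p):
    """Return the index of the first step whose end-of-sequence probability
    reaches min_eos_p, or len(eos_probs) if the threshold is never reached."""
    for step, prob in enumerate(eos_probs):
        if prob >= min_eos_p:
            return step
    return len(eos_probs)


# Made-up end-of-sequence probabilities for five generation steps
eos_probs = [0.01, 0.02, 0.06, 0.15, 0.30]

late = stop_step(eos_probs, min_eos_p=0.2)    # higher threshold stops later
early = stop_step(eos_probs, min_eos_p=0.05)  # lower threshold stops earlier
print(late, early)
```

With the higher threshold the loop runs until step 4, while the lower one stops at step 2, which is the effect the tutorial relies on to drop the extra sound after a sentence.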
The third mode is dialogue mode
You can write your own dialogue and give each person a different voice. Here is the complete example code:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from IPython.display import Audio
from scipy.io.wavfile import write as write_wav
import nltk  # we'll use this to split into sentences
import numpy as np

from bark.generation import (
    generate_text_semantic,
    preload_models,
)
from bark.api import semantic_to_waveform
from bark import generate_audio, SAMPLE_RATE

preload_models()

# Map each character name to a Bark voice
speaker_lookup = {"Samantha": "v2/en_speaker_9", "John": "v2/en_speaker_2"}

# Script generated by chat GPT
script = """
Samantha: Hey, have you heard about this new text-to-audio model called "Bark"?
John: No, I haven't. What's so special about it?
Samantha: Well, apparently it's the most realistic and natural-sounding text-to-audio model out there right now. People are saying it sounds just like a real person speaking.
John: Wow, that sounds amazing. How does it work?
Samantha: I think it uses advanced machine learning algorithms to analyze and understand the nuances of human speech, and then replicates those nuances in its own speech output.
John: That's pretty impressive. Do you think it could be used for things like audiobooks or podcasts?
Samantha: Definitely! In fact, I heard that some publishers are already starting to use Bark to create audiobooks. And I bet it would be great for podcasts too.
John: I can imagine. It would be like having your own personal voiceover artist.
Samantha: Exactly! I think Bark is going to be a game-changer in the world of text-to-audio technology.
""".strip().split("\n")
script = [s.strip() for s in script if s]

pieces = []
silence = np.zeros(int(0.5*SAMPLE_RATE))  # half a second between turns
for line in script:
    # Split only on the first ": " so colons inside the text are kept
    speaker, text = line.split(": ", 1)
    audio_array = generate_audio(text, history_prompt=speaker_lookup[speaker], )
    pieces += [audio_array, silence.copy()]

Audio(np.concatenate(pieces), rate=SAMPLE_RATE)
# Write the audio file directly; use a Google Drive path here to save it to Drive automatically
write_wav("bark_generation.wav", SAMPLE_RATE, np.concatenate(pieces))
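If you are not very familiar with the code, you only need to change the voices in `speaker_lookup` and the dialogue text in `script`. The rest can be left alone.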
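Before spending a long time on generation, it can be worth checking that every line of the dialogue parses and maps to a voice. The sketch below runs without Bark; `parse_dialogue` is a helper name made up for illustration, and it assumes each line has the form `Name: text` as in the example script.

```python
speaker_lookup = {"Samantha": "v2/en_speaker_9", "John": "v2/en_speaker_2"}


def parse_dialogue(script, lookup):
    """Turn 'Name: text' lines into (speaker, voice, text) tuples.

    Raises KeyError when a speaker has no voice configured, so the
    mistake shows up before any slow audio generation starts.
    """
    turns = []
    for line in script.strip().split("\n"):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split only on the first ": " so colons inside the text survive
        speaker, text = line.split(": ", 1)
        if speaker not in lookup:
            raise KeyError(f"no voice configured for speaker {speaker!r}")
        turns.append((speaker, lookup[speaker], text))
    return turns


demo = """
Samantha: Have you heard about Bark?
John: No. What is it: a model?
"""
turns = parse_dialogue(demo, speaker_lookup)
for speaker, voice, text in turns:
    print(speaker, voice, text)
```

Each tuple carries everything the generation loop needs: the voice goes to `history_prompt` and the text to `generate_audio`.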
Generation takes quite a long time, so try it out if you have a real need for it.
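The example code notes that the output file can be written to a Google Drive path. A minimal sketch of picking that path follows, assuming Drive is mounted at Colab's conventional location `/content/drive/MyDrive`; the helper `output_path` is made up for illustration and falls back to the current directory when Drive is not mounted.

```python
from pathlib import Path


def output_path(filename, drive_root="/content/drive/MyDrive"):
    """Return a path inside Google Drive if it is mounted,
    otherwise a path in the current working directory."""
    root = Path(drive_root)
    base = root if root.is_dir() else Path.cwd()
    return base / filename


wav_path = output_path("bark_generation.wav")
print(wav_path)
```

The resulting path can be passed as a string, for example `str(wav_path)`, in place of the bare file name given to `write_wav`.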