AI coding tools: code for generating long audio with the text-to-speech tool Bark, and how to get past the 14-second length limit

Default category · Published 1 year ago (2023) · admin

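Generating long audio with the text-to-speech tool Bark works differently from generating short audio. A few days ago I published a tutorial on making short audio with Bark: "The Strongest Text-to-Speech Tool: Bark, a Detailed Tutorial on Local Installation, Cloud Deployment, and Online Demo, with One-Click AI Generation of Expressive Speech and Singing". Before following this long-audio tutorial, I recommend reading that one first to get familiar with the basic installation steps. Bark's own notes on long-form generation: [link missing]. Today we will use Bark to generate audio longer than 14 seconds; the steps are shown below.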

1. Colab cloud deployment tutorial

First open Google Colab (site address: [link missing]) and click [File] > [New notebook].

Connect Google Drive first, then create a new code cell and enter the following code to install Bark:

pip install git+https://github.com/suno-ai/bark.git
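
The Google Drive connection mentioned above is not shown as code in this tutorial. Here is a minimal sketch, assuming the standard google.colab helper and the conventional /content/drive mount point. The folder name bark_audio and both helper functions are made up for illustration.

```python
import posixpath

DRIVE_MOUNT = "/content/drive"  # conventional Colab mount point (assumption)


def mount_drive():
    """Mount Google Drive when running inside Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(DRIVE_MOUNT)
    return True


def drive_output_path(filename, folder="bark_audio"):
    """Build a path inside the mounted Drive (hypothetical folder name)."""
    return posixpath.join(DRIVE_MOUNT, "MyDrive", folder, filename)
```

Calling mount_drive() in a Colab cell opens the authorization prompt. The target folder has to exist before a file is written there, for example by calling os.makedirs with exist_ok=True.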

There are three modes for generating long audio: 1. simple mode, 2. advanced mode, 3. dialogue mode.

Let's start with the first, simple mode: use nltk to split the longer text into sentences and generate the sentences one at a time.

First run the following code to download the punkt tokenizer data that nltk needs for sentence splitting:

import nltk
nltk.download('punkt')

Once punkt has finished downloading, run the following code:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
from IPython.display import Audio
import nltk  # we'll use this to split into sentences
import numpy as np
from bark.generation import (
    generate_text_semantic,
    preload_models,
)
from bark.api import semantic_to_waveform
from bark import generate_audio, SAMPLE_RATE
preload_models()
# text to convert; newlines are replaced with spaces before sentence splitting
script = """
Hey, have you heard about this new text-to-audio model called "Bark"?
Apparently, it's the most realistic and natural-sounding text-to-audio model
out there right now. People are saying it sounds just like a real person speaking.
I think it uses advanced machine learning algorithms to analyze and understand the
nuances of human speech, and then replicates those nuances in its own speech output.
It's pretty impressive, and I bet it could be used for things like audiobooks or podcasts.
In fact, I heard that some publishers are already starting to use Bark to create audiobooks.
It would be like having your own personal voiceover artist. I really think Bark is going to
be a game-changer in the world of text-to-audio technology.
""".replace("\n", " ").strip()
sentences = nltk.sent_tokenize(script)
SPEAKER = "v2/en_speaker_6"
silence = np.zeros(int(0.25 * SAMPLE_RATE))  # quarter second of silence
# generate each sentence separately, with a short pause after each one
pieces = []
for sentence in sentences:
    audio_array = generate_audio(sentence, history_prompt=SPEAKER)
    pieces += [audio_array, silence.copy()]
Audio(np.concatenate(pieces), rate=SAMPLE_RATE)

This code turns the text in script into speech. You can choose the voice by changing SPEAKER; open the link below to see the full list of speakers.
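
Since the speaker list itself is not reproduced here, the naming pattern of the presets can at least be sketched. This assumes Bark's v2 presets are named v2/<language>_speaker_<0 to 9>, following the v2/en_speaker_6 value used in the code above. The helper preset_names is hypothetical and does not check that a preset actually exists.

```python
def preset_names(lang="en", count=10):
    """List candidate preset names for one language code (assumed pattern)."""
    return [f"v2/{lang}_speaker_{i}" for i in range(count)]


# Print the candidate English presets, one per line.
for name in preset_names("en"):
    print(name)
```

Any of these strings can be tried as the SPEAKER value; listening to a short test sentence is the quickest way to pick one.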

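Converting this text to speech took a very long time: after 1 hour 32 minutes it still had not finished, and I gave up waiting. This model really does need fairly capable hardware.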
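
A run this slow may mean that no GPU is attached to the runtime, although that is only a guess about this particular session. The sketch below checks whether PyTorch can see a GPU and sets two environment switches that Bark reads when it is imported, SUNO_USE_SMALL_MODELS and SUNO_OFFLOAD_CPU. The function names are hypothetical, and the smaller models trade some quality for speed and memory.

```python
import os


def configure_bark_env(use_small_models=True, offload_cpu=False):
    """Set Bark's environment switches; call this BEFORE importing bark."""
    os.environ["SUNO_USE_SMALL_MODELS"] = "True" if use_small_models else "False"
    os.environ["SUNO_OFFLOAD_CPU"] = "True" if offload_cpu else "False"


def gpu_available():
    """Return True if PyTorch can see a CUDA GPU; False if not or if torch is missing."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())
```

In Colab, if gpu_available() returns False, switching the runtime type to a GPU before running the generation cell is worth trying.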

The second, advanced mode

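Sometimes Bark produces a little extra audio at the end of the prompt. We can fix this by lowering the threshold at which Bark stops generating text. We adjust it with the min_eos_p parameter of generate_text_semantic.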
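
To make the effect of the threshold concrete, here is a toy sketch. It is not Bark's actual code. It assumes generation stops at the first step whose end-of-sequence probability reaches min_eos_p; the probabilities and the function first_stop_step are invented for illustration.

```python
def first_stop_step(eos_probs, min_eos_p):
    """Return the index of the first step whose EOS probability reaches the threshold."""
    for step, p in enumerate(eos_probs):
        if p >= min_eos_p:
            return step
    return len(eos_probs)  # the threshold was never reached


# Made-up per-step end-of-sequence probabilities.
eos_probs = [0.01, 0.02, 0.06, 0.10, 0.25, 0.90]
print(first_stop_step(eos_probs, 0.2))   # higher threshold: stops at step 4
print(first_stop_step(eos_probs, 0.05))  # lower threshold: stops at step 2
```

With the lower threshold the toy generator stops two steps earlier, which is the behavior that trims trailing audio.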

The complete code for generating the audio:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
from IPython.display import Audio
from scipy.io.wavfile import write as write_wav
import nltk  # we'll use this to split into sentences
import numpy as np
from bark.generation import (
    generate_text_semantic,
    preload_models,
)
from bark.api import semantic_to_waveform
from bark import generate_audio, SAMPLE_RATE
preload_models()
script = """
Hey, have you heard about this new text-to-audio model called "Bark"?
Apparently, it's the most realistic and natural-sounding text-to-audio model
out there right now. People are saying it sounds just like a real person speaking.
I think it uses advanced machine learning algorithms to analyze and understand the
nuances of human speech, and then replicates those nuances in its own speech output.
It's pretty impressive, and I bet it could be used for things like audiobooks or podcasts.
In fact, I heard that some publishers are already starting to use Bark to create audiobooks.
It would be like having your own personal voiceover artist. I really think Bark is going to
be a game-changer in the world of text-to-audio technology.
""".replace("\n", " ").strip()
sentences = nltk.sent_tokenize(script)
GEN_TEMP = 0.6
SPEAKER = "v2/en_speaker_6"  # change the speaker here
silence = np.zeros(int(0.25 * SAMPLE_RATE))  # quarter second of silence
pieces = []
for sentence in sentences:
    semantic_tokens = generate_text_semantic(
        sentence,
        history_prompt=SPEAKER,
        temp=GEN_TEMP,
        min_eos_p=0.05,  # adjust this to trim the extra sound
    )
    audio_array = semantic_to_waveform(semantic_tokens, history_prompt=SPEAKER,)
    pieces += [audio_array, silence.copy()]
audio = np.concatenate(pieces)
write_wav("bark_generation.wav", SAMPLE_RATE, audio)
# keep Audio as the last expression so the notebook displays the player
Audio(audio, rate=SAMPLE_RATE)

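You only need to change the values of script (the text to convert to speech), min_eos_p (the parameter that trims extra sound), and SPEAKER (the speaker); anything else you are unsure about can be left alone.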

The third is dialogue mode

You can write your own dialogue and give each person a different voice. The complete example code follows:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
from IPython.display import Audio
from scipy.io.wavfile import write as write_wav
import nltk  # we'll use this to split into sentences
import numpy as np
from bark.generation import (
    generate_text_semantic,
    preload_models,
)
from bark.api import semantic_to_waveform
from bark import generate_audio, SAMPLE_RATE
preload_models()
# map each speaker name in the script to a Bark voice preset
speaker_lookup = {"Samantha": "v2/en_speaker_9", "John": "v2/en_speaker_2"}
# Script generated by chat GPT
script = """
Samantha: Hey, have you heard about this new text-to-audio model called "Bark"?
John: No, I haven't. What's so special about it?
Samantha: Well, apparently it's the most realistic and natural-sounding text-to-audio model out there right now. People are saying it sounds just like a real person speaking.
John: Wow, that sounds amazing. How does it work?
Samantha: I think it uses advanced machine learning algorithms to analyze and understand the nuances of human speech, and then replicates those nuances in its own speech output.
John: That's pretty impressive. Do you think it could be used for things like audiobooks or podcasts?
Samantha: Definitely! In fact, I heard that some publishers are already starting to use Bark to create audiobooks. And I bet it would be great for podcasts too.
John: I can imagine. It would be like having your own personal voiceover artist.
Samantha: Exactly! I think Bark is going to be a game-changer in the world of text-to-audio technology.
"""
script = script.strip().split("\n")
script = [s.strip() for s in script if s]
pieces = []
silence = np.zeros(int(0.5*SAMPLE_RATE))
for line in script:
    speaker, text = line.split(": ")
    audio_array = generate_audio(text, history_prompt=speaker_lookup[speaker], )
    pieces += [audio_array, silence.copy()]
audio = np.concatenate(pieces)
# write the audio file directly; put a Google Drive path in front of the file name to save it to Google Drive automatically
write_wav("bark_generation.wav", SAMPLE_RATE, audio)
# keep Audio as the last expression so the notebook displays the player
Audio(audio, rate=SAMPLE_RATE)

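If you are not very familiar with the code, you can change just the speakers in speaker_lookup and the text to be converted in script; the rest can be left alone.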
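
When you write your own dialogue, note that the line.split(": ") call in the example fails to unpack if the spoken text itself contains ": ". A slightly more defensive parser can be sketched as follows. The function parse_dialogue is hypothetical, and it only prepares (voice preset, text) pairs; generating the audio for each pair works as in the example.

```python
def parse_dialogue(script, speaker_lookup):
    """Turn 'Name: text' lines into (voice_preset, text) pairs."""
    turns = []
    for raw in script.strip().split("\n"):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split only on the first ": " so colons inside the text survive.
        speaker, sep, text = line.partition(": ")
        if not sep:
            raise ValueError(f"line has no 'Name: ' prefix: {line!r}")
        if speaker not in speaker_lookup:
            raise KeyError(f"no voice configured for speaker {speaker!r}")
        turns.append((speaker_lookup[speaker], text))
    return turns
```

Each returned pair can then be passed to generate_audio as generate_audio(text, history_prompt=preset). A misspelled speaker name raises an error before any slow generation starts.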

Generation takes quite a while; give it a try if you need it.

Original article: