Meta Develops ‘Voicebox’, a Cutting-Edge Generative AI Model Aiming to Revolutionize the Field of Speech Generation
Posted Date – 12:15 AM, Mon – 6/19/23

San Francisco: Meta has developed a cutting-edge generative AI model “Voicebox” that aims to revolutionize the field of speech generation.
Meta said in a blog post that Voicebox is the first model that can generalize, with state-of-the-art performance, to speech generation tasks it was not specifically trained to perform.
According to the company, Voicebox, like generative systems for images and text, can produce output in a wide variety of styles, and it can create output from scratch or modify a sample it is given.
However, instead of creating a picture or a piece of text, Voicebox generates high-quality audio clips.
The model can synthesize speech in six languages: English, French, German, Spanish, Polish, and Portuguese. It can also perform noise removal, content editing, style transfer, and diverse sample generation.
Meta also says Voicebox uses a new approach that learns only from raw audio and the accompanying transcriptions.
Unlike autoregressive audio generation models, which can only continue a clip from its end, Voicebox can modify any part of a given sample.
The tech giant says Voicebox is trained to predict a segment of speech given the surrounding audio and the transcript of that segment.
Once the model has learned to infill speech from context, it can apply that ability to a wide range of speech generation tasks, including generating a portion in the middle of a recording without re-creating the entire recording.
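As a rough illustration of the training setup described above, the Python sketch below builds one "infilling" example: it hides a contiguous span of audio frames, keeps the remaining frames as context alongside the transcript, scores a prediction only on the hidden span, and splices a prediction back into the otherwise untouched recording. Everything in it (the frame representation, `make_infill_example`, `masked_mse`, `splice`, and the `span_len` parameter) is hypothetical and is not Meta's code or masking scheme. Meta's research describes a flow-matching model; this sketch substitutes a plain squared-error loss purely for simplicity.

```python
# Hypothetical sketch of speech infilling; not Meta's implementation.
import random
from dataclasses import dataclass

Frame = list[float]  # one acoustic feature vector (e.g. a spectrogram column)


@dataclass
class InfillExample:
    context: list[Frame]   # input frames, with the hidden span replaced by zeros
    mask: list[bool]       # True for frames the model must predict
    transcript: str        # transcript of the whole utterance, span included
    original: list[Frame]  # untouched frames; the loss only looks at masked ones


def make_infill_example(frames: list[Frame], transcript: str,
                        span_len: int, rng: random.Random) -> InfillExample:
    """Hide one contiguous span of frames; the rest serves as audio context."""
    if not 0 < span_len <= len(frames):
        raise ValueError("span_len must be between 1 and len(frames)")
    start = rng.randrange(len(frames) - span_len + 1)
    mask = [start <= i < start + span_len for i in range(len(frames))]
    context = [[0.0] * len(f) if m else list(f) for f, m in zip(frames, mask)]
    return InfillExample(context, mask, transcript, [list(f) for f in frames])


def masked_mse(pred: list[Frame], ex: InfillExample) -> float:
    """Squared error averaged over masked frames only; context is not scored."""
    errs = [
        (p - t) ** 2
        for p_f, t_f, m in zip(pred, ex.original, ex.mask) if m
        for p, t in zip(p_f, t_f)
    ]
    return sum(errs) / len(errs)


def splice(pred: list[Frame], ex: InfillExample) -> list[Frame]:
    """Keep the original audio outside the span; take model output inside it."""
    return [list(p) if m else list(o)
            for p, o, m in zip(pred, ex.original, ex.mask)]


if __name__ == "__main__":
    rng = random.Random(0)
    frames = [[float(i), float(i) + 0.5] for i in range(10)]
    ex = make_infill_example(frames, "hello world", span_len=3, rng=rng)
    print(sum(ex.mask))  # 3 frames are hidden
```

The `splice` step mirrors the editing use case: only the masked frames change, so the rest of the recording is never regenerated.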
This versatility enables Voicebox to excel at a variety of tasks, including in-context text-to-speech synthesis, cross-language style transfer, speech noise removal and editing, and diverse speech sampling.