Microsoft VALL-E: This new AI tool can simulate any voice in just three seconds

Microsoft Corporation has unveiled VALL-E, a new text-to-speech artificial intelligence (AI) tool that can closely mimic a person’s voice from a three-second audio sample. Once it has learned a particular voice, VALL-E can synthesize audio of that person saying anything while attempting to preserve the speaker’s emotional tone.

VALL-E’s developers believe it can be used for high-quality text-to-speech applications; for speech editing, where a recording of a person is altered by editing its text transcript (making them appear to say something they did not say); and for audio content creation when combined with other generative AI models such as GPT-3.

Microsoft trained VALL-E’s speech synthesis capabilities on LibriLight, an audio library from Meta. It contains 60,000 hours of English-language speech from more than 7,000 speakers, mostly drawn from LibriVox public domain audiobooks. For VALL-E to perform well, the voice in the three-second sample must closely resemble a voice in the training data.

Microsoft VALL-E based on EnCodec by Meta

VALL-E is what Microsoft calls a “neural codec language model,” and it builds on EnCodec, which Meta unveiled in October 2022. Unlike previous text-to-speech techniques, which typically synthesize speech by manipulating waveforms, VALL-E generates discrete audio codec codes from text and acoustic prompts.

It analyzes how a person sounds, uses EnCodec to break that information into discrete components (known as “tokens”), and then draws on its training data to match what it “knows” about how that voice would sound if it spoke phrases beyond the three-second sample.
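To make the idea of turning a voice sample into “tokens” more concrete, here is a minimal Python sketch. It is not VALL-E or EnCodec code. It assumes a 24 kHz sample rate, 75 token frames per second, and 8 codebooks of 1,024 entries each, figures taken from published descriptions of EnCodec rather than from this article. The "encoder" (a per-frame average) and the random codebooks are hypothetical stand-ins for the learned neural components, and all function names are invented for illustration.

```python
import math
import random

# Assumed settings for illustration only (not stated in the article).
SAMPLE_RATE = 24_000   # audio samples per second
FRAME_RATE = 75        # token frames per second
NUM_CODEBOOKS = 8      # residual quantizer stages
CODEBOOK_SIZE = 1024   # entries per codebook


def make_codebooks(seed=0):
    """Build random scalar codebooks; each stage covers a smaller range.

    A real codec learns vector codebooks during training.
    """
    rng = random.Random(seed)
    return [
        [rng.uniform(-1.0, 1.0) / (2 ** stage) for _ in range(CODEBOOK_SIZE)]
        for stage in range(NUM_CODEBOOKS)
    ]


def frame_features(samples):
    """Toy 'encoder': reduce each frame of audio to its mean amplitude."""
    hop = SAMPLE_RATE // FRAME_RATE  # samples per frame
    return [
        sum(samples[start:start + hop]) / hop
        for start in range(0, len(samples) - hop + 1, hop)
    ]


def residual_quantize(value, codebooks):
    """Pick the nearest entry in each codebook, passing the leftover error on."""
    tokens = []
    residual = value
    for codebook in codebooks:
        index = min(range(len(codebook)),
                    key=lambda i: abs(codebook[i] - residual))
        tokens.append(index)
        residual -= codebook[index]
    return tokens


def tokenize(samples, codebooks):
    """Turn raw audio samples into a grid of discrete token IDs."""
    return [residual_quantize(f, codebooks) for f in frame_features(samples)]


# A three-second 220 Hz tone stands in for the speaker's voice sample.
prompt = [math.sin(2 * math.pi * 220 * n / SAMPLE_RATE)
          for n in range(3 * SAMPLE_RATE)]
codebooks = make_codebooks()
tokens = tokenize(prompt, codebooks)
print(len(tokens), len(tokens[0]))
```

Under these assumed settings, a three-second clip becomes a grid of 225 frames by 8 token IDs, or 1,800 integers in total. A language model in the style the article describes would then predict further token frames for new text, conditioned on this grid, and a decoder would convert the predicted tokens back into audio.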

The technology does raise ethical concerns, though. As the voices produced by VALL-E and comparable tools become more convincing, they open the door to spam calls that realistically imitate the voices of real people a potential victim knows.


Team Eela

