Meta’s Latest AI Model Launches Seamless and Expressive Text-To-Speech Communication

Meta has introduced Seamless Communication, a suite of AI translation models designed to make communication across languages more authentic and natural. The suite comprises three primary models: SeamlessExpressive, SeamlessStreaming, and SeamlessM4T v2. They are built to retain the nuances and expressiveness of speech when translating across languages, and they deliver speech and text translations with a latency of approximately two seconds.

Seamless Communication Meta Models: A New Text-to-Speech Generation

The launch of the Seamless Communication models marks a significant stride in overcoming language barriers through fast, expressive, high-quality AI translation. The suite includes:


SeamlessExpressive

A pivotal innovation is SeamlessExpressive, a model designed to preserve nuances of speech such as tone, pacing, emphasis, and emotion during translation. Beyond word choice, these prosodic features convey crucial signals about a speaker’s intention and emotional state. SeamlessExpressive is the first publicly accessible translation system that explicitly accounts for these details. It supports speech-to-speech translation across six languages: English, Spanish, German, French, Italian, and Chinese.


SeamlessStreaming

The second element, SeamlessStreaming, reduces translation lag to enable near real-time conversations. By starting to translate before a speaker has finished talking, SeamlessStreaming delivers speech and text translations with a latency of approximately two seconds while maintaining accuracy. The model covers a broad range of languages, offering automatic speech recognition for nearly 100 languages and speech-to-speech translation across 36 languages.
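To make the idea of translating before the speaker finishes more concrete, here is a minimal sketch of a wait-k schedule, a simple and well-known policy for simultaneous translation. It is an illustration only and not Meta’s method: the article does not describe how SeamlessStreaming decides when to emit output. The names wait_k_events and translate_next are hypothetical, the toy translator just uppercases words, and the sketch assumes one output token per input token.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Event = Tuple[str, str]  # ("READ", source_token) or ("WRITE", target_token)


def wait_k_events(
    source_tokens: Iterable[str],
    translate_next: Callable[[List[str], int], str],
    k: int = 3,
) -> Iterator[Event]:
    """Interleave reading source tokens with writing target tokens.

    The first target token is written once k source tokens have been read.
    After that, one target token is written per new source token. When the
    source ends, the remaining target tokens are flushed.

    translate_next(prefix, i) is a hypothetical stand-in for a model: given
    the source prefix read so far, it returns the i-th target token.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    read: List[str] = []
    written = 0
    for token in source_tokens:
        read.append(token)
        yield ("READ", token)
        if len(read) >= k:
            yield ("WRITE", translate_next(read, written))
            written += 1

    # The speaker has finished: emit the target tokens still outstanding.
    while written < len(read):
        yield ("WRITE", translate_next(read, written))
        written += 1


def toy_translate(prefix: List[str], i: int) -> str:
    # Toy "translation" (an assumption for the demo): uppercase the i-th word.
    return prefix[i].upper()


if __name__ == "__main__":
    for event in wait_k_events(["a", "b", "c", "d"], toy_translate, k=2):
        print(event)
```

A smaller k lowers latency because output starts sooner, but the translator sees less context for each decision; a larger k does the opposite. Production streaming systems typically learn when to read and when to write rather than following a fixed schedule like this one.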

SeamlessM4T v2

The new expressive and streaming models build on SeamlessM4T v2, an upgraded version of the multilingual, multitask SeamlessM4T model that Meta first launched in August. This foundation model handles speech-to-text, text-to-speech, speech-to-speech, and other translation tasks for nearly 100 languages while maintaining state-of-the-art quality. The v2 architecture, which features a non-autoregressive decoder, improves consistency across diverse inputs and outputs.


