Meta acquires voice-AI startup PlayAI and the talent behind it

Published: 02:03, July 14, 2025

Meta has acquired PlayAI, a voice-AI startup that offers ultra-low-latency neural text-to-speech, rapid voice-cloning, and real-time speech APIs for conversational agents.

According to an internal memo seen by Bloomberg, the “entire PlayAI team” will join Meta next week and report to Johan Schalkwyk, who recently joined Meta from voice AI startup Sesame AI.

Financial terms of the deal have not yet been disclosed.

As voice-driven interactions become more common, the acquisition is hardly unexpected.

Why PlayAI?

Co-founded by Mahmoud Felfel and Hammad Syed, PlayAI started in 2016 as a Chrome plug-in called Play.ht that simply read articles aloud. To spread the word, they offered bloggers an embeddable audio player branded “powered by Play.ht.” Interest followed, and so did paid subscriptions.

Early on, the team relied on third-party text-to-speech (TTS) APIs, but users wanted finer control, such as edits, pronunciation tweaks, and tight timing. That need pushed PlayAI to build its own TTS models. The bet paid off.


To keep the edge they’d just gained, the team entered a rapid-fire R&D cycle, shipping successive model generations, raising a $21 million seed round in November 2024, and publishing a steady stream of performance papers.

PlayAI recently developed a clever way of making AI voices sound more human and reliable. Instead of generating just one version of the speech, the system creates several versions in parallel. A specialized aligner model then picks the best take and rejects the weaker ones (those that stumble over words, add strange pauses, and so on).
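The selection step described here resembles what is often called best-of-N sampling with a verifier: produce several candidates, score each one, discard those below a quality bar, and keep the top scorer. The sketch below is a minimal, hypothetical illustration that assumes the candidate takes have already been generated and that the aligner's judgment can be reduced to a few defect counts. The `Candidate` fields, the weights in `aligner_score`, and the `min_score` threshold are all invented for illustration and are not PlayAI's actual scoring criteria; in the real system the aligner is a model, not a hand-written formula.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Candidate:
    """One synthesized take of the same text (hypothetical structure)."""
    take_id: int
    skipped_words: int   # words the aligner could not match to the audio
    repeated_words: int  # words spoken more than once
    long_pauses: int     # unnaturally long silences


def aligner_score(c: Candidate) -> float:
    """Hypothetical stand-in for an aligner model: fewer defects -> higher score.

    The weights are illustrative assumptions, not PlayAI's criteria.
    """
    return 1.0 - 0.3 * c.skipped_words - 0.2 * c.repeated_words - 0.1 * c.long_pauses


def pick_best(
    candidates: Sequence[Candidate],
    score: Callable[[Candidate], float],
    min_score: float = 0.5,  # hypothetical quality bar
) -> Optional[Candidate]:
    """Best-of-N selection: reject takes below min_score, return the top survivor."""
    survivors = [c for c in candidates if score(c) >= min_score]
    if not survivors:
        return None  # every take was rejected; the caller could regenerate
    return max(survivors, key=score)


takes = [
    Candidate(take_id=0, skipped_words=1, repeated_words=0, long_pauses=2),
    Candidate(take_id=1, skipped_words=0, repeated_words=0, long_pauses=1),
    Candidate(take_id=2, skipped_words=0, repeated_words=1, long_pauses=0),
]
best = pick_best(takes, aligner_score)
print(best.take_id if best else "no acceptable take")
```

Here the take with a single long pause and no skipped or repeated words wins. If every take falls below the threshold, `pick_best` returns `None`, which leaves the decision to retry with the caller rather than shipping a flawed take.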

The result is a word-error rate of just 2.2%, roughly two word errors for every hundred words, on standard evaluation sets. But here’s what’s really impressive: they achieved this accuracy without making the voices sound robotic or flat. While they could technically push accuracy even higher, they deliberately chose to keep the voices expressive and natural-sounding.
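Word-error rate is conventionally defined as the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and a hypothesis, divided by the number of words in the reference. For TTS evaluation, the hypothesis is typically an automatic speech recognition transcript of the generated audio. The sketch below computes that ratio for plain whitespace-separated strings; it is a minimal illustration that assumes no text normalization (real evaluations usually lowercase and strip punctuation first), and it is not the specific scoring pipeline behind PlayAI's reported figure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum number of edits turning ref[:i-1] into hyp[:j];
    # we keep only two rows of the classic Levenshtein table.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[len(hyp)] / len(ref)


# A 100-word reference with two substituted words gives a WER of 2/100.
reference = " ".join(f"w{i}" for i in range(100))
hypothesis = reference.replace("w10 ", "x10 ").replace("w50 ", "x50 ")
print(word_error_rate(reference, hypothesis))
```

A 2.2% WER therefore corresponds to about 2.2 word-level edits per 100 reference words, counting skipped and inserted words as well as misread ones.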

The system is fast enough for real-time conversation, generating and delivering speech quickly enough to sustain natural back-and-forth dialogue.

This level of robustness is likely what piqued Meta’s interest.

A strategic piece in Meta’s ambitious AI plans

Meta’s internal memo specifically said that PlayAI’s “work in creating natural voices, along with a platform for easy voice creation, is a great match for our work and road map, across AI Characters, Meta AI, Wearables and audio content creation.”

The PlayAI team joins a company in the midst of a radical AI overhaul. The new Meta Superintelligence Labs (MSL), led by Alexandr Wang, consolidates all of Meta’s AI efforts under one roof with an ambitious goal of building artificial general intelligence and delivering “personal superintelligence for everyone”.

This restructuring appears to be driven by a sense of urgency from the top. Reports indicate that Mark Zuckerberg has been frustrated with Meta’s performance in AI and has taken a more hands-on “founder” role in recruitment. The creation of MSL and Meta’s aggressive campaign to poach AI talent from rivals are direct responses, designed to close the perceived gap with competitors.

PlayAI fits perfectly into this new structure. The startup has demonstrated the exact kind of focused, product-ready technology that a giant like Meta can immediately integrate and scale.

So, is Meta after the tech or the talent?

In this case it’s both, but with different weights.

Talent is front and center. The internal memo stresses that the entire PlayAI staff will be folded into Johan Schalkwyk’s new speech-tech group, mirroring Meta’s recent pattern of buying small AI shops chiefly to stock its benches with specialized researchers and engineers.
The IP comes along for the ride (and matters). Unlike some pure “acquihires,” PlayAI already sells production-grade APIs, owns proprietary low-latency TTS models, and offers a ready-made agent framework. Meta’s memo calls PlayAI’s voice work and platform “a great match for our work and road map,” signaling that the codebase and pretrained models could well be integrated rather than discarded.

Why talent slightly outweighs tech:

Meta likely has more than enough compute to retrain voice models in-house; the harder asset to replicate quickly is a team that already knows how to deliver very low latency, multi-speaker dialogue, and one-tap voice cloning.
The new hires slot neatly into efforts such as the Ray-Ban Meta glasses, AI Characters, Meta AI, and audio content creation, all areas where iterative R&D speed matters.

So, while Meta gains both the tech and the team, the real prize is the know-how to scale real-time, high-fidelity voice AI.

Joseph Nordqvist

Before founding MBN in 2013, Joseph wrote for one of the world’s largest independent medical news websites. He holds a bachelor’s in Marketing and Publicity, a PGP in AI and ML from UT Austin, and is currently completing an MSc in Computer Science at the University of York.