sync. recently launched!

Launch YC: sync. – an API for real-time lip-sync

"Animate anyone in any video to say (or sing) anything you want in any language."
sync.labs is building audio-visual models to generate, modify, and synthesize humans in video.

Founded by
Prady Modukuru, Prajwal K R, and Rudrabha Mukhopadhyay


They’ve built a state-of-the-art lip-sync model – and they’re building towards real-time, face-to-face conversations w/ AI that’s indistinguishable from humans 🦾

Try Sync's playground here:

How does it work?

Theoretically, their models can support any language: they learn phoneme / viseme mappings (the most basic unit, or “token”, of how the sounds we make map to the shapes our mouths form to produce them). It’s simple, but it’s a start towards learning a foundational understanding of humans from video.
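To make the phoneme / viseme idea concrete, here is a minimal Python sketch of a hand-written, many-to-one lookup table. It is not sync.'s method: their models learn these mappings from video. The table, its viseme names (`closed`, `lip_teeth`, etc.), and the `to_visemes` helper are hypothetical and heavily simplified. Phonemes are written in ARPAbet notation with stress markers dropped.

```python
# Hypothetical, simplified viseme classes for illustration only; learned
# models (like sync.'s) use their own, typically finer-grained representations.
PHONEME_TO_VISEME: dict[str, str] = {
    # Bilabials: lips pressed together
    "P": "closed", "B": "closed", "M": "closed",
    # Labiodentals: lower lip against upper teeth
    "F": "lip_teeth", "V": "lip_teeth",
    # Rounded vowels / glides: lips pushed forward
    "UW": "rounded", "OW": "rounded", "W": "rounded",
    # Open vowels: jaw dropped
    "AA": "open", "AE": "open", "AH": "open",
}


def to_visemes(phonemes: list[str]) -> list[str]:
    """Map a phoneme sequence to a viseme sequence.

    Phonemes missing from the (deliberately tiny) table fall back
    to a generic "neutral" mouth shape.
    """
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]


if __name__ == "__main__":
    # "mob" in ARPAbet (stress dropped): M AA B
    print(to_visemes(["M", "AA", "B"]))
    # "pat", "bat", "mat" differ in sound but share one viseme sequence
    for word in (["P", "AE", "T"], ["B", "AE", "T"], ["M", "AE", "T"]):
        print(word, "->", to_visemes(word))
```

Because "pat", "bat", and "mat" all produce the same viseme sequence, the sketch also shows one reason lip-reading from mouth shapes alone is ambiguous: several sounds collapse onto a single mouth shape.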

Why is this useful?

[1] They can dissolve language as a barrier

Check out how they used it to dub the entire 2-hour Tucker Carlson interview with Putin so that Putin speaks fluent English.

Imagine millions gaining access to knowledge, entertainment, and connection — regardless of their native tongue.

Running in real time at the edge takes this further: live multilingual broadcasts + video calls, even walking around Tokyo w/ a Vision Pro 2, speaking English while everyone around you speaks Japanese.

[2] They can move the human-computer interface beyond text-based chat

Keyboards / mice are lossy + low-bandwidth. Human communication is rich and goes beyond just the words we say. What if we could compute through a face-to-face interaction?

Maybe embedding context from facial expressions + body language into inputs / outputs would help us interact w/ computers in a more human way. This thread of research is exciting.

[3] And more

Powerful models small enough to run at the edge could unlock a lot:


Extreme compression for face-to-face video streaming

Enhanced, spatial-aware transcription w/ lip-reading

Detecting deepfakes in the wild

On-device real-time video translation



May 17, 2024
