New AI model can generate sound and speech from silent videos

10/02/2026

Researchers from Apple and Renmin University of China have introduced VSSFlow, a new AI model that generates both sound effects and speech from silent video using a single unified system. Unlike in earlier approaches, training jointly on sound and speech improves performance on both tasks, delivering results that rival task-specific models. The team has open-sourced the code, plans to release the model weights, and has shared demos showing realistic audio and dialogue generated directly from video.

Source: 9to5Mac
