Researchers from Apple and Renmin University of China have introduced VSSFlow, a new AI model that can generate both sound effects and speech from silent video using a single unified system. Unlike earlier approaches, VSSFlow benefits from joint training: training on sound and speech together improves performance on both tasks, delivering results that rival task-specific models. The team has open-sourced the code, plans to release the model weights, and has shared demos showing realistic audio and dialogue generated directly from video.
Source: 9to5Mac