On April 21, ShengShu Technology presented Vidu Q1, a browser-based AI model that generates five-second 1080p videos from two images and a text description. Thanks to its "First-to-Last Frame" approach, motion in the clip stays consistent even when the two source images are unrelated, enabling smooth scene transitions for independent editors.
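To make the workflow concrete, here is a minimal Python sketch of a first-to-last-frame request, assuming a generic HTTP API. The endpoint URL, field names, and base64 image encoding are all hypothetical stand-ins, since the article does not document Vidu's actual interface.

```python
import base64

import requests

# Hypothetical endpoint and field names: the article does not document
# Vidu's real API, so this only sketches the shape of the workflow.
API_URL = "https://api.example.com/vidu/q1/first-to-last-frame"


def encode_image(path: str) -> str:
    """Read an image file and return it as a base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


payload = {
    "model": "vidu-q1",
    "first_frame": encode_image("city_day.png"),   # opening frame
    "last_frame": encode_image("city_night.png"),  # closing frame; may be unrelated
    "prompt": "smooth time-lapse from day to night over the same skyline",
    "duration_seconds": 5,   # Q1 clips are five seconds long
    "resolution": "1080p",
}

# Submit the two frames plus the text description and fetch the result URL.
response = requests.post(API_URL, json=payload, timeout=300)
response.raise_for_status()
print(response.json().get("video_url"))
```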
In the new version, audio is integrated directly into the workflow: text prompts generate background music or sound effects at 48 kHz, support multilayer tracks up to ten seconds long, and accept time commands such as "0–2 s wind." This removes the need for external sound libraries and speeds up editing.
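The time-command format lends itself to programmatic prompt building. The sketch below assembles several timed cues into one multilayer audio prompt; only the "0–2 s wind" syntax and the ten-second limit come from the article, while the helper names and validation logic are illustrative assumptions.

```python
from dataclasses import dataclass

MAX_TRACK_SECONDS = 10  # article: multilayer tracks run up to ten seconds


@dataclass
class AudioCue:
    start: float       # cue start, in seconds
    end: float         # cue end, in seconds
    description: str   # e.g. "wind", "soft piano"


def build_audio_prompt(cues: list[AudioCue]) -> str:
    """Assemble timed cues into a single "0-2 s wind"-style prompt.

    The time-command syntax follows the article's example; how the model
    layers overlapping cues is an assumption on our part.
    """
    for cue in cues:
        if not 0 <= cue.start < cue.end <= MAX_TRACK_SECONDS:
            raise ValueError(f"cue outside the 0-{MAX_TRACK_SECONDS} s window: {cue}")
    return ", ".join(f"{c.start:g}-{c.end:g} s {c.description}" for c in cues)


# Overlapping cues would form separate layers in the generated 48 kHz track.
prompt = build_audio_prompt([
    AudioCue(0, 2, "wind"),
    AudioCue(1, 6, "soft piano"),
    AudioCue(4, 10, "distant thunder"),
])
print(prompt)  # "0-2 s wind, 1-6 s soft piano, 4-10 s distant thunder"
```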
Vidu Q1 also improves anime generation, with sharper lines and more stable frame blending built on the image-integrity preservation method first introduced in Vidu 1.5. In internal tests on the VBench benchmark, the model outperformed Runway Gen-2, OpenAI Sora, and Luma Dream Machine in prompt accuracy and frame consistency.
One of the first companies to test Vidu Q1 was Aura Productions, which reported a several-fold reduction in post-production costs for a fifty-episode anime series. By combining instant image transitions, fast rendering, advanced anime creation, and multilayer audio, the model gives small teams and bloggers access to cinematic production capabilities without dedicated visual-effects or sound specialists.
ShengShu Technology, founded in Singapore in 2023, specializes in multimodal large models. Since opening the Vidu platform to commercial users in July 2024, the company has served creators in over 200 regions and actively collaborates with film studios, advertising agencies, and social media platforms to roll out the new Q1 features.