On Tuesday at Google I/O 2024, Google announced Veo, a new AI video synthesis model that can create high-definition videos from text, image, or video prompts, similar to OpenAI’s Sora. It can generate 1080p videos lasting more than a minute and edit videos from written instructions, but it has not yet been released for widespread use.
Veo reportedly includes the ability to edit existing videos using text commands, maintain visual consistency across frames, and generate video sequences lasting 60 seconds or more from a single prompt or a series of prompts that form a narrative. The company says it can create detailed scenes and apply cinematic effects such as time-lapses, aerial shots, and various visual styles.
Since DALL-E 2’s launch in April 2022, we’ve seen a parade of new image synthesis and video synthesis models that aim to allow anyone who can type a written description to create a detailed image or video. Although neither technology has been fully refined, AI image and video generators are steadily growing more capable.
In February, we covered a preview of OpenAI’s Sora video generator, which many at the time believed represented the best AI video synthesis the industry had to offer. It impressed Tyler Perry enough that he put expansions of his film studio on hold. However, to date, OpenAI has not provided public access to the tool; instead, it has limited its use to a select group of testers.
Now, at first glance, Google’s Veo appears to be capable of producing videos similar to what Sora has achieved. We haven’t tried it ourselves, so we can only go by the hand-picked demo videos the company has provided on its website. This means that anyone viewing them should take Google’s claims with a grain of salt, because the generation results may not be typical.
Veo’s example videos include a cowboy on horseback, a fast tracking shot down a suburban street, kebabs roasting on a grill, a time-lapse of a sunflower opening, and more. Conspicuously absent are any detailed depictions of humans, which have historically been difficult for AI image and video models to generate without obvious distortions.
Google says Veo builds on the company’s previous video generation models, including Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet, and Lumiere. To enhance quality and efficiency, Veo uses compressed “latent” video representations, and Google included more detailed captions for the videos used to train the model, allowing the AI to interpret prompts more accurately.
Veo also seems notable because it supports filmmaking commands: “When given both an input video and editing command, like adding kayaks to an aerial shot of a coastline, Veo can apply this command to the initial video and create a new, edited video,” the company says.
While the demos look impressive at first glance (especially compared to Will Smith eating spaghetti), Google acknowledges that AI video generation is difficult. “Maintaining visual consistency can be a challenge for video generation models,” the company wrote. “Characters, objects, or even entire scenes can flicker, jump, or morph unexpectedly between frames, disrupting the viewing experience.”
Google has tried to mitigate these drawbacks with “cutting-edge latent diffusion transformers,” which is basically meaningless marketing talk with no details. But the company is confident enough in the model that it is working with actor Donald Glover and his studio, Gilga, to create an AI-generated demonstration film that will debut soon.
Initially, Veo will be available to selected creators through VideoFX, a new experimental tool available on Google’s AI Test Kitchen website, labs.google. Creators can join the VideoFX waitlist to potentially gain access to Veo features in the coming weeks. Google plans to integrate some of Veo’s capabilities into YouTube Shorts and other products in the future.
There’s no information yet on where Google obtained Veo’s training data (if we had to guess, YouTube is likely involved). But Google says it’s taking a “responsible” approach with Veo. According to the company, “Videos created by Veo are watermarked using SynthID, our cutting-edge tool for watermarking and identifying AI-generated content, and passed through safety filters and memorization checking processes that help mitigate privacy, copyright and bias risks.”