The future of AI creativity: Google’s Veo 2, Imagen 3, and Whisk

Google has unveiled significant enhancements to its generative AI offerings, introducing a new version of its video model, Veo 2, and an improved Imagen 3 image generation model. The company also debuted Whisk, a fresh visual remixing tool designed to help users combine ideas with images instead of relying solely on text prompts.

The image was created using Google’s latest Imagen 3 generator.

Veo 2: Better Cinematic Realism and Customization
Veo 2, the latest evolution of Google’s video generation technology, is built to produce higher-quality, more realistic videos that reflect an improved understanding of real-world physics and human expression. Users can specify cinematic parameters—such as genres, lenses, and effects—and trust Veo 2 to deliver accordingly. For instance, the model can interpret a request for an “18mm lens” or a “low-angle tracking shot” and produce footage that reflects those choices with greater accuracy. In human evaluations, Veo 2 outperformed other leading video generation models, and early examples show videos that range from a scientist working under fluorescent lights in a lab to a flock of flamingos wading in a tranquil lagoon.

Video of Veo 2 in action

Veo 2 further reduces the unwanted “hallucinations” common in generative models, such as extra limbs or objects that don’t belong in a scene. This shift toward realism marks a step forward in addressing longstanding challenges in video AI. As with Google’s other image and video models, Veo 2 outputs include an invisible SynthID watermark, designed to discreetly indicate that the content is AI-generated, which can help mitigate potential misinformation.

The new Veo 2 capabilities are first coming to Google’s VideoFX tool in Labs, and the company plans to widen access to more users over time. There are also plans to bring Veo 2 to YouTube Shorts and other Google products in the coming year.

Imagen 3: Brighter, Richer, More Faithful Imagery
Alongside Veo 2, Google rolled out an updated version of Imagen 3, its image-generation model. Imagen 3 excels at producing sharper, more detailed outputs, and it can now handle a wider range of artistic styles—from photorealism to impressionism, and even anime—while adhering more closely to user prompts. The model’s improvements mean that those seeking detailed, well-lit images or more nuanced compositions will find it easier to get the exact style they envision.

Created with Imagen 3 – The prompt I used was: “A beautiful snowy landscape”

Imagen 3 is now available globally in ImageFX, Google’s Labs-based image generation tool, covering more than 100 countries. The broader rollout signals Google’s growing confidence in the model’s stability and utility.

Whisk: Image-Guided Creativity
In addition to these model updates, Google introduced Whisk, a new Labs experiment that takes a fresh approach to prompting AI. Instead of relying on detailed text descriptions alone, users can supply images to guide the generative process, effectively merging subjects, scenes, and styles. Imagine combining a snapshot of your pet (the subject), a dreamy forest backdrop (the scene), and a vintage painterly style (the style). Whisk blends these inspirations into a new creation. While the results aren’t always perfect—users might notice subtle differences in size, color, or facial features—Whisk is designed for rapid exploration rather than pixel-perfect accuracy. Users can refine outputs by tweaking the underlying prompts that Whisk automatically generates, making it easier to iterate toward the desired image.
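
To make the subject/scene/style workflow concrete, here is a minimal sketch of how an image-guided remix pipeline in the spirit of Whisk could be wired together. This is not Google’s implementation or API: Inspiration, describe_image, and generate_image are hypothetical placeholders standing in for an image-captioning model and a text-to-image model, and the prompt template is purely an assumption.

    # Hypothetical sketch of an image-guided "remix" pipeline in the spirit of Whisk.
    # describe_image() and generate_image() are placeholders, not Google's API: the
    # first stands in for an image-captioning (vision-language) model, the second
    # for a text-to-image model.
    from dataclasses import dataclass

    @dataclass
    class Inspiration:
        role: str         # "subject", "scene", or "style"
        image_path: str   # path to the user-supplied reference image

    def describe_image(image_path: str) -> str:
        """Placeholder: a vision-language model would caption the image here."""
        raise NotImplementedError("plug in an image-captioning model")

    def generate_image(prompt: str) -> bytes:
        """Placeholder: a text-to-image model would render the prompt here."""
        raise NotImplementedError("plug in a text-to-image model")

    def remix(inspirations: list[Inspiration]) -> tuple[str, bytes]:
        """Caption each reference image, fold the captions into one editable
        text prompt, and hand that prompt to the image generator."""
        captions = {item.role: describe_image(item.image_path) for item in inspirations}
        # Keep the auto-generated prompt so the user can tweak it and regenerate,
        # mirroring how Whisk exposes its underlying prompts for refinement.
        prompt = (
            f"{captions.get('subject', 'a subject')} "
            f"in {captions.get('scene', 'a scene')}, "
            f"rendered in {captions.get('style', 'a distinctive style')}"
        )
        return prompt, generate_image(prompt)

The point this sketch illustrates is that the reference images never reach the generator directly; they are first distilled into text. That is why outputs can drift in size, color, or facial detail, and why editing the intermediate prompt is the natural way to iterate.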

Video of Whisk in action

This tool, currently available to select U.S.-based users in Google Labs, represents an effort to make AI more accessible and playful, lowering the barrier for those who find text-only prompts cumbersome.

Contextual Landscape: Recent AI Advances
Google’s announcement arrives amid a broader wave of AI advances. Google itself recently revealed Gemini 2.0, a model that aims to bring a new “agentic” era of AI capabilities, allowing models to take multiple steps, understand the world around them, and perform more complex tasks. Meanwhile, OpenAI introduced Sora, a video generation model that competes with Google’s Veo line. Though Sora may not yet match Veo 2’s nuanced grasp of cinematography and scene-building, its arrival highlights that the generative video race is heating up.

These concurrent developments underscore a broader trend: AI tools are evolving quickly, moving beyond basic text-only outputs to create dynamic visuals, blend styles, and even coordinate complex tasks across different domains. Google’s own releases, including Veo 2, Imagen 3, and Whisk, reflect a strategic push to stay at the forefront of this rapidly changing field. With more users getting a taste of these technologies through Labs experiments, the company is likely gathering invaluable feedback that will shape the future of generative AI for casual creators and industry professionals alike.