The future of AI creativity: Google’s Veo 2, Imagen 3, and Whisk

Google has unveiled significant enhancements to its generative AI offerings, introducing a new version of its video model, Veo 2, and an improved Imagen 3 image generation model. The company also debuted Whisk, a fresh visual remixing tool designed to help users combine ideas with images instead of relying solely on text prompts.

The image was created using Google’s latest Imagen 3 generator.

Veo 2: Better Cinematic Realism and Customization
Veo 2, the latest evolution of Google’s video generation technology, is built to produce higher-quality, more realistic videos that reflect an improved understanding of real-world physics and human expression. Users can specify cinematic parameters—such as genres, lenses, and effects—and trust Veo 2 to deliver accordingly. For instance, the model can interpret a request for an “18mm lens” or a “low-angle tracking shot” and produce footage that reflects those choices with greater accuracy. In human evaluations, Veo 2 outperformed other leading video generation models, and early examples show videos that range from a scientist working under fluorescent lights in a lab to a flock of flamingos wading in a tranquil lagoon.

Video of Veo 2 in action

Veo 2 further reduces the unwanted “hallucinations” common in generative models, such as extra limbs or objects that don’t belong in a scene. This shift toward realism marks a step forward in addressing longstanding challenges in video AI. As with Google’s other image and video models, Veo 2 outputs include an invisible SynthID watermark, designed to discreetly indicate that the content is AI-generated, which can help mitigate potential misinformation.

The new Veo 2 capabilities are first coming to Google’s VideoFX tool in Labs, and the company plans to widen access to more users over time. There are also plans to bring Veo 2 to YouTube Shorts and other Google products in the coming year.

Imagen 3: Brighter, Richer, More Faithful Imagery
Alongside Veo 2, Google rolled out an updated version of Imagen 3, its image-generation model. Imagen 3 excels at producing sharper, more detailed outputs, and it can now handle a wider range of artistic styles—from photorealism to impressionism, and even anime—while adhering more closely to user prompts. The model’s improvements mean that those seeking detailed, well-lit images or more nuanced compositions will find it easier to get the exact style they envision.

Created with Imagen 3 – The prompt I used was: “A beautiful snowy landscape”

Imagen 3 is now available globally in ImageFX, Google’s Labs-based image generation tool, covering more than 100 countries. The broader rollout signals Google’s growing confidence in the model’s stability and utility.

Whisk: Image-Guided Creativity
In addition to these model updates, Google introduced Whisk, a new Labs experiment that takes a fresh approach to prompting AI. Instead of relying on detailed text descriptions alone, users can supply images to guide the generative process, effectively merging subjects, scenes, and styles. Imagine combining a snapshot of your pet (the subject), a dreamy forest backdrop (the scene), and a vintage painterly style (the style). Whisk blends these inspirations into a new creation. While the results aren’t always perfect—users might notice subtle differences in size, color, or facial features—Whisk is designed for rapid exploration rather than pixel-perfect accuracy. Users can refine outputs by tweaking the underlying prompts that Whisk automatically generates, making it easier to iterate toward the desired image.
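
To make the subject/scene/style workflow concrete, here is a minimal sketch of how an image-guided remix pipeline in the spirit of Whisk could be wired together. This is not Google’s implementation or API: Inspiration, describe_image, and generate_image are hypothetical placeholders standing in for an image-captioning model and a text-to-image model, and the prompt template is purely an assumption.

    # Hypothetical sketch of an image-guided "remix" pipeline in the spirit of Whisk.
    # describe_image() and generate_image() are placeholders, not Google's API: the
    # first stands in for an image-captioning (vision-language) model, the second
    # for a text-to-image model.
    from dataclasses import dataclass

    @dataclass
    class Inspiration:
        role: str         # "subject", "scene", or "style"
        image_path: str   # path to the user-supplied reference image

    def describe_image(image_path: str) -> str:
        """Placeholder: a vision-language model would caption the image here."""
        raise NotImplementedError("plug in an image-captioning model")

    def generate_image(prompt: str) -> bytes:
        """Placeholder: a text-to-image model would render the prompt here."""
        raise NotImplementedError("plug in a text-to-image model")

    def remix(inspirations: list[Inspiration]) -> tuple[str, bytes]:
        """Caption each reference image, fold the captions into one editable
        text prompt, and hand that prompt to the image generator."""
        captions = {item.role: describe_image(item.image_path) for item in inspirations}
        # Keep the auto-generated prompt so the user can tweak it and regenerate,
        # mirroring how Whisk exposes its underlying prompts for refinement.
        prompt = (
            f"{captions.get('subject', 'a subject')} "
            f"in {captions.get('scene', 'a scene')}, "
            f"rendered in {captions.get('style', 'a distinctive style')}"
        )
        return prompt, generate_image(prompt)

The point this sketch illustrates is that the reference images never reach the generator directly; they are first distilled into text. That is why outputs can drift in size, color, or facial detail, and why editing the intermediate prompt is the natural way to iterate.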

Video of Whisk in action

This tool, currently available to select U.S.-based users in Google Labs, represents an effort to make AI more accessible and playful, lowering the barrier for those who find text-only prompts cumbersome.

Contextual Landscape: Recent AI Advances
Google’s announcement arrives amid a broader wave of AI advances. Google itself recently revealed Gemini 2.0, a model that aims to bring a new “agentic” era of AI capabilities, allowing models to take multiple steps, understand the world around them, and perform more complex tasks. Meanwhile, OpenAI introduced Sora, a video generation model that competes with Google’s Veo line. Though Sora may not yet match Veo 2’s nuanced grasp of cinematography and scene-building, its arrival highlights that the generative video race is heating up.

These concurrent developments underscore a broader trend: AI tools are evolving quickly, moving beyond basic text-only outputs to create dynamic visuals, blend styles, and even coordinate complex tasks across different domains. Google’s own releases, including Veo 2, Imagen 3, and Whisk, reflect a strategic push to stay at the forefront of this rapidly changing field. With more users getting a taste of these technologies through Labs experiments, the company is likely gathering invaluable feedback that will shape the future of generative AI for casual creators and industry professionals alike.