Google’s new Gemini AI can make videos and live images — and it's being pitched to businesses

It’s no secret that Google’s flagship AI chatbot Gemini has had some problems. Its production of historically inaccurate images forced Google parent Alphabet to temporarily suspend the product earlier this year.

But Google is trying to turn the page on its early AI mishaps. Keynote speakers at the tech giant’s annual Google Cloud Next conference in Las Vegas on Tuesday showed off new features of Gemini 1.5 Pro, the latest version of its chatbot, which is now publicly available. Spectators watched while demonstrators muttered to themselves and typed prompts into the revamped AI chatbot to highlight its new tools, perhaps the most important of which is its ability to “ground” responses. “Grounding” means responses from Gemini 1.5 Pro are linked to “verifiable sources of information,” the company said Tuesday.

The announcements about Gemini 1.5 Pro included a range of updates to the chatbot as part of Google’s push to sell its AI products to corporate customers. Gemini now has expanded capabilities for “long context understanding,” which means it can process far more information in a single request. It also has multimodal capabilities, meaning it can process not just text but also audio, video, and other formats to generate responses.

“With these two advances, enterprises can do things today that just weren’t possible with AI before,” Google CEO Sundar Pichai said during the presentation.

Businesses have already been piloting the product. Goldman Sachs, Mercedes, and Uber are among the early Gemini 1.5 Pro customers, Google said.

“Customers can process vast amounts of information in a single stream including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words,” Google said in its announcement.

“For example,” the company added, “a gaming company could provide a video analysis of a player’s performance, along with tips to improve. Or an insurance company could combine video, images and text inputs to create an incident report, making the claims process easier.”

Google made other AI announcements, too, a full list of which can be found on the Google Cloud Next 2024 conference website.

Google is launching an AI-powered video creation app, Google Vids. “I simply type in a prompt using an existing document for context. Now based on that prompt, Gemini suggests a narrative outline for the story that I could easily customize and edit,” said Aparna Pappu, Google’s VP of Google Workspace. “I choose an expressive style, and Vids works its magic.”

The latest version of Google’s AI image generator, Imagen 2.0, which is powered by Gemini, can create live images from text prompts. It’s still in “preview” mode, but keynote speakers in Las Vegas showed off the feature.

“Marketing and creative teams can generate animated images from a text prompt, including product images, ads, GIFs, and storyboards,” Pappu said. Another demonstrator noted that the tool creates live images that would otherwise take “days or weeks of scouting and shooting.”

Pappu also announced that images generated by Google’s Imagen can be watermarked using Google DeepMind’s SynthID.
