Google is training its robots with Gemini AI to improve their ability to navigate and complete tasks. DeepMind’s robotics team explained in a new research paper how Gemini 1.5 Pro’s long context window, which determines how much information an AI model can process at once, could let users interact with its RT-2 robots more easily using natural language instructions.
The technology works by filming a video tour of a specific area, such as a home or office space, then using Gemini 1.5 Pro to have the robot “watch” the video and learn about the environment. The robot can then carry out commands based on what it has observed, responding with verbal and/or visual outputs, such as guiding a user to a power outlet after being shown a phone and asked, “Where can I charge it?” DeepMind says its Gemini-powered robot achieved a 90 percent success rate on more than 50 user instructions given across an operating area of more than 9,000 square feet.
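To make the interaction pattern concrete, here is a minimal sketch of long-context video understanding using Google’s public google-generativeai Python SDK. This illustrates the underlying idea only, not DeepMind’s actual robot stack: the file name tour.mp4 and the example question are placeholders, and the real system also has to turn the model’s answer into navigation actions, which this sketch omits.

```python
import time

import google.generativeai as genai

# Assumes a standard API key; DeepMind's robots use an internal pipeline.
genai.configure(api_key="YOUR_API_KEY")

# Upload the video tour of the space (placeholder filename).
tour = genai.upload_file(path="tour.mp4")

# Wait for the service to finish processing the uploaded video.
while tour.state.name == "PROCESSING":
    time.sleep(5)
    tour = genai.get_file(tour.name)

# Gemini 1.5 Pro's long context window lets the entire tour plus the
# user's question fit into a single prompt.
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    [tour, "I'm holding a phone. Where in this space can I charge it?"]
)
print(response.text)
```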
The researchers also found “preliminary evidence” that Gemini 1.5 Pro enabled the robots to plan how to fulfill instructions, not just navigate. For example, when a user with a desk full of Coke cans asked a robot whether their favorite drink was available, the team said Gemini “knew that the robot should navigate to the fridge, check for Coke cans, and then report back to the user with the result.” DeepMind says it plans to investigate these findings further.
Google’s video demos are impressive, though the clean cuts after the robot acknowledges each request obscure the fact that it takes 10 to 30 seconds to process each instruction, according to the research paper. It may be some time before we share our homes with more advanced environment-mapping robots, but at least these ones might be able to find our lost keys or wallets.