Date: 2023-08-05

Google DeepMind has unveiled RT-2, a new model that lets robots perform complex tasks directly from visual input and language instructions. It builds on vision-language models trained on vast web-scale text and image datasets.

RT-2 treats robot actions as a second language: actions are written out as text tokens, which lets the generalization, reasoning, and semantic comprehension of vision-language models carry over to robotic policies. Drawing on visual and linguistic knowledge from the web, RT-2 learns to apply its physical skills in both familiar and unfamiliar scenarios.
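
To make the "actions as text tokens" idea concrete, here is a minimal sketch in Python. It is an illustration, not RT-2's actual tokenizer. The bin count of 256, the normalized action range of [-1, 1], and the function names `encode_action` and `decode_action` are all assumptions chosen for the example.

```python
from typing import List, Sequence

NUM_BINS = 256  # assumption: number of discrete bins per action dimension


def encode_action(action: Sequence[float], low: float = -1.0, high: float = 1.0) -> str:
    """Map each continuous action dimension to an integer bin and join as text."""
    tokens = []
    for value in action:
        # Clip to the assumed valid range before binning.
        clipped = min(max(value, low), high)
        # Scale to [0, NUM_BINS - 1] and round to the nearest bin index.
        bin_index = round((clipped - low) / (high - low) * (NUM_BINS - 1))
        tokens.append(str(bin_index))
    return " ".join(tokens)


def decode_action(text: str, low: float = -1.0, high: float = 1.0) -> List[float]:
    """Invert encode_action: parse bin indices and map them back to values."""
    return [low + int(tok) / (NUM_BINS - 1) * (high - low) for tok in text.split()]
```

Under these assumptions, `encode_action([-1.0, 1.0])` returns the string `"0 255"`, which a language model could emit like any other text. Decoding recovers each value to within half a bin width.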

RT-2 has demonstrated strong capabilities in object recognition, precise object positioning, and complex inference from contextual information. It leverages web-scale vision-language pretraining and has been tested with models of up to 55 billion parameters. Across 6,000 robotic evaluations, RT-2 showed marked improvements in generalization across objects, scenes, and instructions, along with a wide range of emergent abilities.

However, although RT-2 generalizes better across semantic and visual concepts, it does not give robots new motion abilities. To address this limitation, researchers suggest exploring new data-gathering approaches, such as learning from videos of humans, so that robots can acquire new skills.

RT-2 also faces computational challenges: high-frequency control requires real-time inference, which is costly for a model of this size. Researchers propose quantization and distillation as ways to address this.
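
As a rough illustration of what quantization means here, the following Python sketch applies generic symmetric int8 quantization to a small list of weights. It is a textbook technique shown under stated assumptions, not the method used or planned for RT-2. The function names `quantize_int8` and `dequantize` are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to the integer range [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One shared scale maps the largest magnitude to 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integers and the shared scale."""
    return [q * scale for q in quantized]
```

Storing weights as 8-bit integers plus one scale factor reduces memory and can speed up inference, at the cost of a rounding error of at most half the scale per weight.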

Combining vision-language model pretraining with robotic data has proven to be a highly promising strategy. It paves the way for powerful vision-language-action (VLA) models and opens new avenues for research in robot learning.