Date: 2023-08-08

Creating agents that can interact naturally with people using language has been a long-standing goal of artificial intelligence. While current embodied agents can follow simple commands, interactive agents need to understand language in a broader sense: language that transmits knowledge, describes the current situation, or helps coordinate with others.

To teach agents to use language, researchers have explored reinforcement learning techniques. However, most existing techniques focus on training agents to perform specific tasks based on instructions. This is limiting because, in the real world, language does much more than issue instructions.

For instance, suppose the agent hears “I put the bowls away.” If the agent is cleaning, it should continue with the next cleaning step, but if it is serving dinner, it should go and retrieve the bowls. The utterance alone does not determine the best course of action; the context does. When language is not about the task itself, it is only weakly correlated with the action the agent should take, so mapping language directly to actions provides a weak learning signal.

Instead, the researchers propose that a unifying way for agents to use language is to help them predict the future. For example, the phrase “I put the bowls away” lets the agent predict its future observations more accurately, such as where it will find the bowls. Learning to make these predictions lets the agent anticipate changes in its environment and gives it a richer signal for learning what language means and how it relates to the world.
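The idea that an utterance improves future prediction can be illustrated with a deliberately tiny example. The sketch below is not from the paper: the toy world, the second utterance, the observation strings, and the names `make_episode`, `CountPredictor`, and `evaluate` are all assumptions made up for illustration. It compares a predictor that conditions on what it heard with one that ignores language.

```python
import random
from collections import Counter, defaultdict


def make_episode(rng):
    """Hypothetical toy world: an utterance is heard, then the cabinet is opened."""
    bowls_away = rng.random() < 0.5
    utterance = "I put the bowls away" if bowls_away else "the bowls are on the table"
    future_obs = "bowls in cabinet" if bowls_away else "cabinet empty"
    return utterance, future_obs


class CountPredictor:
    """Predicts the most frequent future observation seen for a given context."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, context, future_obs):
        self.counts[context][future_obs] += 1

    def predict(self, context):
        seen = self.counts[context]
        return seen.most_common(1)[0][0] if seen else None


def evaluate(use_language, n_train=500, n_test=500, seed=0):
    """Train on n_train episodes, then return accuracy on n_test new episodes."""
    rng = random.Random(seed)
    model = CountPredictor()
    for _ in range(n_train):
        utterance, obs = make_episode(rng)
        # Without language, every episode shares the same empty context.
        model.update(utterance if use_language else "", obs)
    correct = 0
    for _ in range(n_test):
        utterance, obs = make_episode(rng)
        correct += model.predict(utterance if use_language else "") == obs
    return correct / n_test


with_language = evaluate(use_language=True)
without_language = evaluate(use_language=False)
```

Under these assumptions the utterance fully determines what the agent will see, so the language-conditioned predictor should get every held-out episode right, while the language-blind one can only guess the more common outcome and stays near chance. No action is involved at all, which is the point: the learning signal comes from prediction, not from mapping words to actions.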

To implement this approach, researchers from UC Berkeley introduce Dynalang, an agent that learns a world model of language and images through online experience. Dynalang decouples learning to model the world with language from learning to act using that model. The world model receives visual and textual inputs, compresses them into a latent representation, and is trained to predict future representations. The policy then takes this representation as input and is trained to choose actions that maximize task reward.
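The data flow described above can be sketched in a few dozen lines of plain Python. This is a structural illustration only, not the authors' implementation: the linear encoder, the linear next-latent predictor, the softmax policy updated with a REINFORCE step (swapped in here for simplicity), and all class names, dimensions, and learning rates are assumptions chosen to keep the example small.

```python
import math
import random


class Encoder:
    """Compresses one image feature vector and one text token into a latent vector."""

    def __init__(self, image_dim, vocab_size, latent_dim, rng):
        self.w_image = [[rng.gauss(0, 0.3) for _ in range(image_dim)]
                        for _ in range(latent_dim)]
        self.token_embedding = [[rng.gauss(0, 0.3) for _ in range(latent_dim)]
                                for _ in range(vocab_size)]

    def __call__(self, image, token):
        embedding = self.token_embedding[token]
        return [sum(w * x for w, x in zip(row, image)) + e
                for row, e in zip(self.w_image, embedding)]


class WorldModel:
    """Linear predictor of the next latent from the current latent and action."""

    def __init__(self, latent_dim, num_actions):
        self.num_actions = num_actions
        self.weights = [[0.0] * (latent_dim + num_actions) for _ in range(latent_dim)]

    def _features(self, latent, action):
        one_hot = [1.0 if a == action else 0.0 for a in range(self.num_actions)]
        return latent + one_hot

    def predict(self, latent, action):
        x = self._features(latent, action)
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]

    def update(self, latent, action, next_latent, lr=0.1):
        """One gradient step on squared prediction error; no reward is used."""
        x = self._features(latent, action)
        errors = [p - t for p, t in zip(self.predict(latent, action), next_latent)]
        for row, err in zip(self.weights, errors):
            for j, xj in enumerate(x):
                row[j] -= lr * err * xj
        return sum(e * e for e in errors)  # loss measured before this step


class Policy:
    """Softmax policy over actions, computed from the latent only."""

    def __init__(self, latent_dim, num_actions):
        self.weights = [[0.0] * latent_dim for _ in range(num_actions)]
        self.bias = [0.0] * num_actions

    def probs(self, latent):
        logits = [sum(w * z for w, z in zip(row, latent)) + b
                  for row, b in zip(self.weights, self.bias)]
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def update(self, latent, action, reward, lr=0.5):
        """REINFORCE step: raise the log-probability of `action` in proportion to reward."""
        p = self.probs(latent)
        for a, row in enumerate(self.weights):
            grad = (1.0 if a == action else 0.0) - p[a]
            self.bias[a] += lr * reward * grad
            for j, z in enumerate(latent):
                row[j] += lr * reward * grad * z


rng = random.Random(0)
encoder = Encoder(image_dim=3, vocab_size=5, latent_dim=4, rng=rng)
world_model = WorldModel(latent_dim=4, num_actions=2)
policy = Policy(latent_dim=4, num_actions=2)

# One hypothetical transition: (image, token) -> action -> (next image, next token).
latent = encoder([1.0, 0.0, 0.0], token=2)
next_latent = encoder([0.0, 1.0, 0.0], token=3)
action, reward = 1, 1.0

loss_before = world_model.update(latent, action, next_latent)  # prediction-only signal
loss_after = world_model.update(latent, action, next_latent)
prob_before = policy.probs(latent)[action]
policy.update(latent, action, reward)                           # reward-only signal
prob_after = policy.probs(latent)[action]
```

The design choice worth noticing is that `WorldModel.update` never sees the reward and `Policy.update` never sees the raw image or token. In this toy the two learning problems are fully separated, which mirrors the decoupling the article describes, even though the actual model and training procedure are far richer than this sketch.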

Because the world model is trained by prediction, Dynalang can be pretrained on a single modality, such as text-only or video-only data, without actions or task rewards. The same framework also unifies language generation with perception: what the agent perceives can inform what it says, allowing it to communicate about the environment through language. In tests across various domains, Dynalang demonstrates its ability to understand different forms of language and perform various tasks, and it outperforms other reinforcement learning algorithms in challenging scenarios.

The contributions of this research are the introduction of Dynalang, a demonstration that it outperforms existing reinforcement learning algorithms, and a demonstration that language generation and pretraining can be combined in a single model.