Date: 2023-08-07

Creating chatbots and virtual agents that can communicate naturally with people has been a long-standing goal of artificial intelligence. While present-day agents can follow simple commands, such as picking up specific objects, interactive agents need to understand a wider range of language usage.

Most of the language that children encounter in texts or through conversations relates to conveying information about the world. Researchers have been exploring ways to enable agents to understand these more varied kinds of language. Reinforcement learning (RL) has been used to train language-conditioned agents, but current techniques typically focus on specific tasks and instructions.

Mapping language directly to actions poses challenges since natural language serves various roles in the real world. Consider a remark such as "I put the bowls away": if the task at hand is cleaning, the agent should continue with the next cleaning step, whereas if the task is serving dinner, the agent should collect the bowls. The correlation between language and action becomes weaker when the language doesn’t explicitly refer to the task. To address this, researchers propose that the primary function of language for agents is to aid future prediction.

By enabling agents to predict future outcomes based on language inputs, agents can better understand how language interacts with the world. This learning approach allows agents to anticipate changes in their environment by using prior knowledge, such as understanding that wrenches can be used to tighten nuts. Language instructions assist agents in predicting rewards, contributing to a richer learning signal.

Researchers from UC Berkeley introduce Dynalang, an agent that learns a joint language-and-vision model of the world from online experience. The agent separates learning to model the world with language from learning to act using that model. The visual and textual inputs are compressed into a latent space, and the world model is trained to predict future latent representations from the current one. Based on these representations, the policy is trained to make decisions that maximize task reward.
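The data flow described above can be illustrated with a toy sketch. This is not Dynalang's implementation: the class names, dimensions, and randomly initialized linear layers are all hypothetical stand-ins, and nothing here is trained. It only shows how an image and a text token might be compressed into one latent vector, how a world model could predict the next latent and a reward, and how a policy could act on the latent alone.

```python
import math
import random

random.seed(0)

# Hypothetical sizes, chosen only for illustration.
IMAGE_DIM, VOCAB, LATENT, ACTIONS = 8, 5, 4, 3

def linear(n_in, n_out):
    """Random weight matrix (n_out rows, n_in columns); a stand-in for a learned layer."""
    return [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

def layer(weights, vec):
    """Matrix-vector product followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in weights]

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

class ToyWorldModel:
    def __init__(self):
        self.encoder = linear(IMAGE_DIM + VOCAB, LATENT)   # (image, token) -> latent
        self.dynamics = linear(LATENT + ACTIONS, LATENT)   # (latent, action) -> next latent
        self.reward_head = linear(LATENT, 1)               # latent -> predicted reward

    def encode(self, image, token):
        """Compress one image vector and one text token into a shared latent."""
        return layer(self.encoder, image + one_hot(token, VOCAB))

    def predict_next(self, latent, action):
        """Anticipate the next latent representation."""
        return layer(self.dynamics, latent + one_hot(action, ACTIONS))

    def predict_reward(self, latent):
        return layer(self.reward_head, latent)[0]

class ToyPolicy:
    def __init__(self):
        self.head = linear(LATENT, ACTIONS)

    def act(self, latent):
        """Pick the highest-scoring action; the policy sees only the latent."""
        scores = layer(self.head, latent)
        return max(range(ACTIONS), key=lambda a: scores[a])

def prediction_error(predicted, target):
    """Mean squared error between predicted and observed latents."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

# One step of interaction with made-up observations.
model, policy = ToyWorldModel(), ToyPolicy()
image_t = [random.random() for _ in range(IMAGE_DIM)]
image_next = [random.random() for _ in range(IMAGE_DIM)]

latent_t = model.encode(image_t, token=2)
action = policy.act(latent_t)
predicted = model.predict_next(latent_t, action)
target = model.encode(image_next, token=3)
loss = prediction_error(predicted, target)   # the signal a world model would minimize
```

In a real agent the prediction error would drive updates to the world model, while the policy would be trained separately on the resulting latents to maximize reward, which is the separation the paragraph above describes.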

Dynalang can be pre-trained on single modalities without actions or task rewards, and it has a unified framework for language production. The agent’s language model is influenced by perception, helping it communicate about the environment. Dynalang achieves impressive results across various domains, learning to understand linguistic cues, environment dynamics, and corrections to complete tasks efficiently.
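One plausible way to support single-modality pretraining is to let the world-model loss skip any target that a batch does not contain. The sketch below assumes this masking scheme for illustration; the function name `world_model_loss` and the dictionary layout are hypothetical, and the article does not confirm that Dynalang works exactly this way.

```python
def world_model_loss(prediction, batch):
    """Sum mean squared errors over only the targets present in the batch."""
    total, used = 0.0, []
    for key in ("image", "text", "reward"):
        target = batch.get(key)
        if target is None:
            # Missing modality or reward: this term contributes no loss.
            continue
        error = sum((p - t) ** 2 for p, t in zip(prediction[key], target)) / len(target)
        total += error
        used.append(key)
    return total, used

# Made-up model outputs for one step.
prediction = {"image": [0.0, 0.0], "text": [0.5, 0.5], "reward": [0.0]}

# Text-only pretraining data: no image and no reward.
text_only = {"text": [1.0, 0.0]}
text_loss, text_used = world_model_loss(prediction, text_only)   # 0.25, ['text']

# Full online experience: every target is available.
full = {"image": [1.0, 1.0], "text": [1.0, 0.0], "reward": [1.0]}
full_loss, full_used = world_model_loss(prediction, full)        # 2.25, ['image', 'text', 'reward']
```

Because the same loss function handles both cases, a model pretrained on text alone could later continue training on full experience without changing its structure.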

Overall, the findings demonstrate Dynalang’s ability to comprehend different forms of language and accomplish diverse tasks. It outperforms state-of-the-art RL algorithms and task-specific designs, showcasing the potential of combining language learning with pretraining in a single model.