Date: 2023-07-31

Human input plays a crucial role in enhancing social dialogue models. In the field of reinforcement learning, researchers have made significant progress in learning from feedback by incorporating human annotations. These annotations can come in the form of numerical scores, rankings, natural language comments, or binary assessments provided by users. To gather these signals, most studies rely on crowdworkers, as natural users may not want to contribute or may provide inaccurate information.

Researchers from New York University and Meta AI conducted a study to explore the potential of utilizing implicit signals from real discussions between models and organic users to improve dialogue models. The researchers believe that organic users, despite not providing explicit annotations, closely represent the data distribution for future deployment. Additionally, using implicit signals from previous dialogue episodes eliminates the need for expensive crowdsourcing.

The researchers investigated whether a dialogue model could be improved using implicit feedback signals such as the number, length, sentiment, or responsiveness of the human replies that follow a model utterance. They used publicly available, de-identified data from the BlenderBot online deployment for their analysis. By training sample-and-rerank models and comparing various implicit feedback signals, they found that replies from the new models were preferred over baseline replies in both automated and human evaluations.
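The sample-and-rerank idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Turn` type, the `label_bot_turns` and `rerank` helpers, and the five-word threshold are hypothetical names and values chosen for illustration, and the scoring function stands in for a classifier that would in practice be trained on the implicit labels.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Turn:
    speaker: str  # "bot" or "human"
    text: str


def label_bot_turns(episode: Sequence[Turn], min_words: int = 5) -> List[Tuple[str, int]]:
    """Derive an implicit binary label for each bot turn in a logged episode.

    A bot turn is labeled 1 when the next turn is a human reply of at least
    `min_words` words, else 0. The threshold is an illustrative assumption.
    """
    examples = []
    for i, turn in enumerate(episode):
        if turn.speaker != "bot":
            continue
        nxt = episode[i + 1] if i + 1 < len(episode) else None
        replied = nxt is not None and nxt.speaker == "human"
        label = int(replied and len(nxt.text.split()) >= min_words)
        examples.append((turn.text, label))
    return examples


def rerank(context: str, candidates: Sequence[str],
           score: Callable[[str, str], float]) -> str:
    """Return the sampled candidate with the highest predicted feedback score.

    `score(context, candidate)` is a placeholder for a trained classifier
    that predicts the implicit signal for a candidate reply.
    """
    if not candidates:
        raise ValueError("need at least one candidate reply")
    return max(candidates, key=lambda c: score(context, c))


if __name__ == "__main__":
    episode = [
        Turn("bot", "Hi there"),
        Turn("human", "hello, I just got back from a long hike today"),
        Turn("bot", "Cool"),
        Turn("human", "ok"),
    ]
    print(label_bot_turns(episode))
    # Toy scorer (assumption): prefers candidates that ask a question.
    toy_score = lambda ctx, cand: 1.0 if cand.endswith("?") else 0.0
    print(rerank("I went hiking", ["Nice.", "Where did you go?"], toy_score))
```

In this sketch, labels come for free from logged conversations, and the reranker only chooses among candidates the base model already generated, so the base model itself is left unchanged.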

However, the researchers also considered the potential negative effects of optimizing for certain signals. For example, optimizing for longer conversations might lead the model to offer contentious opinions or respond in a hostile manner. In contrast, optimizing for a favorable response or positive sentiment reduced these negative behaviors compared to the baseline model. The researchers concluded that implicit feedback from humans is a valuable training signal for improving overall performance, but that the choice of signal needs careful consideration because each one shapes the model's behavior differently.

In conclusion, this study highlights the potential of utilizing implicit feedback signals from natural user discussions to enhance dialogue models. By leveraging these signals, researchers can improve model performance while reducing costs associated with crowdsourcing. However, careful examination of the behaviors resulting from specific feedback signals is crucial to avoid undesirable outcomes.