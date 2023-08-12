Large language models (LLMs) have become markedly better at generating coherent, contextually relevant, and semantically meaningful text. However, they still commonly produce inaccurate, unreliable, or nonsensical output. To address this challenge, researchers at Meta AI Research have developed Shepherd, a language model designed to critique the outputs of other models.

Shepherd aims to provide natural language feedback on model-generated text across various domains. By identifying specific problems with factuality, logic, coherence, and alignment, Shepherd can suggest modifications that improve the quality of the generated text.

To train and evaluate Shepherd, the researchers created a high-quality feedback dataset made up of two distinct sets: community feedback collected from online forums and human-annotated feedback on model generations across multiple tasks. The community feedback captured diverse, informal interactions, while the human-annotated data provided more formal evaluations.

After training on a combination of these datasets, Shepherd outperformed ChatGPT models on several downstream tasks. To assess the quality of its feedback, the researchers compared Shepherd with other models, including Alpaca, SelFee, and ChatGPT. Shepherd’s critiques were often preferred over those of the other models because its judgments were more accurate.

The study also found that fine-tuning on high-quality human-annotated data improved overall model performance. Shepherd’s feedback was more consistent across different evaluation settings and showed better judgment.

In conclusion, Shepherd offers comprehensive critiques of language model-generated content that can be used to raise its quality. By carefully analyzing the generated feedback, the study demonstrates Shepherd’s effectiveness across a range of generation tasks. The high-quality feedback dataset is itself an important contribution to future research in this field.