A study published in May explores the idea that rather than being a threat to humanity, generative artificial intelligence (AI) may be a danger to itself. The study, which has not yet undergone peer review, suggests that through training with data generated by AI itself, these tools accumulate errors and eventually cease to function properly. This phenomenon is referred to as “model collapse” and is rooted in the mathematics that form the basis of generative AI, whether used to produce images, sound, or text.

Generative AI systems create content based on textual input, known as a prompt. However, they also require training with contextualized and formatted datasets that are representative of the desired output. For example, text-generating systems such as Bard or ChatGPT need millions of pages of data to accurately predict the next word after a given prompt. Similarly, image-generating AIs require vast quantities of annotated visual elements to automatically fill in missing parts of an image based on probability.

The raw material for these datasets is initially generated by human activity and largely retrieved from the internet. Textual data can include social media posts, buyer reviews on e-commerce sites, or news articles. Image datasets may consist of facial databases or annotated collections of satellite images.

Generative AIs are primarily systems of statistical prediction, where the output is based on probabilities established from real-world data. However, if AIs are trained on their own output, rare occurrences risk disappearing after several generations.

“Models tend to overestimate frequent events and underestimate implausible ones. With each step of recursion, this amplifies,” says Nicolas Papernot, a researcher at the University of Toronto and co-author of the study. The effect resembles making repeated copies with a photocopier: the details present in the original document gradually fade with each subsequent copy.
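The recursion Papernot describes can be illustrated with a toy simulation. The sketch below is not the study's method; it is a minimal example that assumes a "model" is nothing more than a table of event frequencies, refit at each generation on a finite sample drawn from the previous generation. The event names, the 5 common and 45 rare events, and the sample size of 200 are all hypothetical values chosen for illustration.

```python
import random
from collections import Counter


def next_generation(probs, sample_size, rng):
    """Sample from the current model, then refit it by counting frequencies."""
    events = list(probs)
    weights = [probs[e] for e in events]
    samples = rng.choices(events, weights=weights, k=sample_size)
    counts = Counter(samples)
    # Events that were never sampled get no entry, so they can never return.
    return {e: c / sample_size for e, c in counts.items()}


def simulate(generations=30, sample_size=200, seed=0):
    """Return the number of distinct events the model still knows per generation."""
    rng = random.Random(seed)
    # Hypothetical "real-world" data: 5 common events (80% of the mass)
    # and 45 rare events sharing the remaining 20%.
    probs = {f"common_{i}": 0.16 for i in range(5)}
    probs.update({f"rare_{i}": 0.2 / 45 for i in range(45)})
    support = [len(probs)]
    for _ in range(generations):
        probs = next_generation(probs, sample_size, rng)
        support.append(len(probs))
    return support


if __name__ == "__main__":
    for gen, n in enumerate(simulate()):
        print(f"generation {gen}: {n} distinct events remain")
```

In this simplified setting, an event that is missed in one generation's sample has zero probability in every later generation, so the count of distinct events can only stay level or fall. Real generative models are far more complex, but the same one-way loss of rare cases is what the photocopier analogy points to.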

Thus, training AI models solely with their own output could lead to a loss of diversity and originality over time. However, further research and evaluation are necessary to fully understand the implications of this phenomenon.
