Date: 2023-08-07

OpenAI has introduced GPTBot, an automated web crawler, as part of its efforts to collect publicly available data for training AI models. The company aims to ensure transparency and responsible data usage by filtering out paywall-protected sources and removing personally identifiable information (PII) and policy-violating text.

Website owners can block GPTBot, fully or for selected paths, by adding rules for the bot to their site’s robots.txt file. The arrangement is opt-out rather than opt-in: sites are crawled by default, and OpenAI relies on owners who object to limit or deny access themselves.
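As a rough illustration of what such a rule can look like, the sketch below holds two robots.txt variants as strings and checks them with Python’s standard `urllib.robotparser`. The `GPTBot` user-agent token comes from OpenAI’s announcement; the `/public/` and `/private/` paths, the `example.com` URLs, and the `is_allowed` helper are hypothetical names used only for this example. It shows how a standards-following parser reads the rules, not how GPTBot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Block GPTBot from the entire site.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

# Limit GPTBot to part of the site (paths are hypothetical).
PARTIAL = """\
User-agent: GPTBot
Allow: /public/
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for label, rules in (("block all", BLOCK_ALL), ("partial", PARTIAL)):
        for path in ("/public/page", "/private/page"):
            url = "https://example.com" + path
            print(label, path, is_allowed(rules, "GPTBot", url))
```

Rules addressed to `GPTBot` do not affect other crawlers, so a site can deny this bot while staying visible to search engines. Compliance is voluntary on the crawler’s side: robots.txt expresses a request and is not an enforcement mechanism.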

Critics, however, have raised ethical concerns about OpenAI’s web scraping approach. They argue that OpenAI should acknowledge the specific websites used to train its models and practice proper citation.

In addition to launching GPTBot, OpenAI has filed a trademark application for “GPT-5,” suggesting that it is working on a successor to GPT-4. Speculation suggests that GPT-5 could bring OpenAI closer to achieving artificial general intelligence (AGI), in line with the company’s long-term goals.

OpenAI has also discontinued its AI Classifier, a tool it previously offered for detecting AI-generated text. The data GPTBot gathers from the internet is expected to go toward training future models, including the anticipated GPT-5.