Date: 2023-08-07

OpenAI recently announced the launch of GPTBot, a web crawler that gathers information from the internet. This data is then used for AI applications such as ChatGPT, helping the model give more accurate responses to questions or prompts.

GPTBot operates under the user-agent token “GPTBot” and its full user-agent string is “Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)”. This user-agent string provides identification for GPTBot when accessing websites.
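
As a rough illustration of how a server might recognize the crawler from the string above, the following Python sketch checks a request's User-Agent header for the GPTBot token. The function name is_gptbot is hypothetical and chosen only for this example. A User-Agent header can be spoofed, so this is a hint about who is calling, not proof.

```python
def is_gptbot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains the GPTBot token.

    Hypothetical helper: a case-insensitive substring check on the token.
    """
    return "gptbot" in user_agent.lower()


# The full user-agent string quoted in the article.
GPTBOT_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
    "compatible; GPTBot/1.0; +https://openai.com/gptbot)"
)

if __name__ == "__main__":
    print(is_gptbot(GPTBOT_UA))
    print(is_gptbot("Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"))
```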

Website owners have the option to allow or block GPTBot from accessing their websites by adding specific instructions in the website’s robots.txt file. To completely disallow GPTBot from accessing a website, the following can be added to the robots.txt file:

User-agent: GPTBot
Disallow: /

Alternatively, if selective access is desired, specific directories can be allowed while others are blocked. For example:

User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
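
One way to sanity-check rules like these before deploying them is to run them through a robots.txt parser. The sketch below uses Python's standard urllib.robotparser module on the two rule sets above. The helper name can_fetch and the example.com URLs are assumptions made for illustration, and this parser's behavior may differ in detail from how GPTBot itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets from the article, written as consecutive lines.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

SELECTIVE = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain used only for this sketch.
    print(can_fetch(BLOCK_ALL, "GPTBot", "https://example.com/page"))
    print(can_fetch(SELECTIVE, "GPTBot", "https://example.com/directory-1/a"))
    print(can_fetch(SELECTIVE, "GPTBot", "https://example.com/directory-2/a"))
```

Keeping each rule on the line directly after the User-agent line matters here: this parser treats a blank line as the end of a group, so rules separated from their User-agent line by blank lines would not be applied.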

OpenAI has also shared the IP range used by GPTBot. While the current list only includes one IP range, it is possible that more IP ranges will be added in the future.
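
A published IP range can be used to check whether a request that claims to be GPTBot actually comes from an expected address. The following Python sketch shows the general idea with the standard ipaddress module. The range 192.0.2.0/24 is a reserved documentation range used purely as a placeholder, not OpenAI's actual range, and the function name ip_in_ranges is hypothetical; the real values would need to come from OpenAI's published list.

```python
import ipaddress
from typing import Iterable

# Placeholder only: 192.0.2.0/24 is reserved for documentation (TEST-NET-1).
# Replace with the range(s) OpenAI publishes for GPTBot.
PLACEHOLDER_RANGES = ["192.0.2.0/24"]


def ip_in_ranges(ip: str, cidr_ranges: Iterable[str]) -> bool:
    """Return True if ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


if __name__ == "__main__":
    print(ip_in_ranges("192.0.2.10", PLACEHOLDER_RANGES))
    print(ip_in_ranges("198.51.100.7", PLACEHOLDER_RANGES))
```

Taking a list of ranges rather than a single value means the check keeps working if more ranges are added later.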

This announcement matters for website owners who are concerned about their content being used by AI models. By disallowing GPTBot in robots.txt, they can instruct OpenAI's crawler not to access their content.

It is worth noting that this is the same mechanism used to block other web crawlers such as Googlebot and Bingbot. The announcement comes as search engine providers and AI developers explore alternatives to the traditional robots.txt file.

For more information about GPTBot and its functionalities, detailed documentation is available on the OpenAI website.