CityLife

Protecting Your Data From AI Crawlers: What You Need to Know

By Robert Andrew

Aug 16, 2023
Earlier this month, Zoom users expressed their concerns when the company updated its terms of service to allow the use of video call data for training artificial intelligence systems. The backlash prompted Zoom to backtrack on its decision. This incident raises several questions about how comfortable we are with sharing our information with AI systems, especially given the current uncertainties surrounding their capabilities and potential usage.

Videoconferencing platforms like Zoom, FaceTime, and Google Meet collect detailed data about our faces, homes, and voices, making them some of the most personal and data-rich services we use. It is disconcerting to think that this data can be mined to train AI models for any purpose a tech company desires. This situation presents an opportunity for us to reevaluate what information we are willing to provide to tech giants who have been collecting our personal data for decades.

If you post pictures or words on public-facing platforms or websites, chances are that your information is being scraped by crawlers gathering data for AI companies. Websites, personal blogs, and online publications are particularly vulnerable, as their content is often used to train AI systems. OpenAI’s ChatGPT, Google’s Bard, and Meta’s LLaMA are examples of AI systems built on large language models (LLMs), which train on massive datasets made up largely of text gathered from various sources. Google’s “Colossal Clean Crawled Corpus” (C4) alone spans 15 million websites.

To block AI crawlers, website owners can add an entry to their site’s robots.txt file that names the crawler and disallows it from crawling the site. Compliance with robots.txt is voluntary, however, and blocking a crawler will not remove data that has already been scraped. Website owners can also keep AI crawlers out by placing content behind paywalls or password-protected access.
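As a rough illustration of the robots.txt approach, the sketch below embeds a sample robots.txt and uses Python's standard `urllib.robotparser` module to check what it permits. The article does not name specific crawlers; `GPTBot` and `CCBot` are the user-agent tokens published by OpenAI and Common Crawl, used here as examples. The `example.com` URL and the `is_allowed` helper are hypothetical, and the check only shows what a well-behaved crawler would conclude, not what any particular crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt: block two AI-related crawlers, allow everyone else.
# GPTBot (OpenAI) and CCBot (Common Crawl) are example tokens, not a full list.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/blog/my-post"  # hypothetical URL
    for agent in ("GPTBot", "CCBot", "SomeOtherBot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

To use this on a real site, the text inside `ROBOTS_TXT` would be saved as a file named robots.txt at the root of the domain. Crawlers that ignore robots.txt are unaffected, which is why paywalls or password protection are the stronger option.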

The same principles apply to apps. If you create content to be posted publicly on digital platforms, it is highly likely that it will be crawled by AI systems. Social media apps, in particular, rely on analyzing user-generated content to target ads. Only services that offer end-to-end encryption or robust privacy settings can provide some level of protection for user data.

Popular apps like TikTok, which has over a billion users, rely heavily on AI and machine learning. TikTok’s algorithm, based on computer vision and machine learning, tailors content to user preferences. It is important to be aware of how our data is being used in these apps and to consider using privacy settings when available.

In conclusion, it is crucial for individuals and businesses to be informed about how their data is handled by platforms and apps. Taking necessary precautions, such as blocking AI crawlers or utilizing privacy settings, can help protect our data from being used without our consent.

