Date: 2023-08-13

Not long ago, there were concerns that artificial intelligence (AI) could spell doom for Adobe, a software company specializing in creative tools. New AI tools like DALL-E 2 and Midjourney threatened to make Adobe’s image-editing offerings obsolete. Instead, Adobe built its own suite of AI tools, called Firefly, using its vast database of stock photos. By relying on its own stock photos rather than internet images, Adobe has avoided copyright disputes, and its share price has risen 36% since the launch of Firefly.

The rise of AI tools highlights the growing competition for dominance in the AI market. These powerful AI models require massive amounts of data to train. And with specialist AI chips in short supply, model builders are increasingly focused on finding new data sources. According to research firm Epoch AI, the demand for data is growing so rapidly that high-quality training text may be exhausted by 2026. The latest AI models from tech giants Google and Meta have already been trained on over 1 trillion words.

Beyond the size of datasets, the quality of the data is crucial. Models trained on well-written, factually accurate text produce better results. Specialized datasets are also highly valued because they allow models to be fine-tuned for niche applications. Microsoft’s code-writing AI tool, for example, was developed following its acquisition of GitHub.

Accessing data is becoming more challenging as content creators demand compensation for their material. Copyright infringement cases have already been brought against AI model builders in the US. To secure data sources, AI companies are striking deals with content providers like news agencies and stock photography platforms.

As data becomes scarcer, model builders are improving the quality of the data they already have. AI labs employ data annotators to label images and rate answers, and some of this work is being outsourced to countries with cheaper labor. AI firms are also gathering data through user interactions with their tools and using feedback mechanisms to improve the models.

However, there is still an untapped source of data in the AI market.