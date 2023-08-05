Artificial intelligence (AI) and large language models have become essential components of our technological landscape. However, there exists a hidden workforce that operates behind these advanced technologies, often laboring in remote locations for meager wages. The true nature of this workforce is explored in a recent article in New York magazine.

While many people focus on the jobs that language models like OpenAI’s ChatGPT might automate, a whole community of workers is responsible for labeling and clarifying the data used to train these models. The article draws attention to the challenges these workers face in Kenya, where their tasks include categorizing distorted dialogue and uploading photos of different facial expressions. This work is just a small part of the larger process of AI development.

These AI jobs are the opposite of what David Graeber termed “bullshit jobs”: they are roles that people want to automate but that still require human intervention. Annotation, or labeling data, is crucial in training large language models. It was previously assumed that, given enough labeled data, a model would eventually learn on its own. It turns out, however, that continuous annotation is necessary because of the limitations and failures of machine learning.

Inadequate training data can have serious consequences, as a fatal 2018 crash involving an Uber self-driving test car showed. The car failed to correctly identify a woman walking a bicycle across the road, which underscores why more human labelers are needed to address such “edge cases.” The article delves into the various tasks performed by these hidden workers, such as classifying TikTok content, monitoring email spam, and analyzing the emotional aspects of online ads.

The majority of this work is outsourced, and strict non-disclosure agreements are enforced, leading many workers to request anonymity. The exact number of workers involved is hard to estimate; one figure cited puts it in the millions, with the potential to grow to billions. The article also describes the author’s own attempts at annotation tasks, which show how complex and literal the instructions must be because the model lacks context.

Annotators often find themselves in a unique situation where they have to think like robots in order for machines to mimic human behavior. This involves following sometimes nonsensical but consistent rules, akin to taking a standardized test while under the influence of hallucinogens. The article also sheds light on how these workers form communities to navigate the challenges of the system and support each other, particularly in places like Kenya.

In conclusion, the article provides valuable insights into the hidden workforce behind AI and large language models. These workers play a crucial role in training the models and managing the intricacies of annotation tasks. Although their efforts often go unnoticed, they are instrumental in shaping the advancements we see in AI technology.