
We are all AI's unpaid data workers.

6/14/2023

Lately, I've been thinking about the human effort behind advanced AI models. A key technique for making AI chatbots seem intelligent and produce less harmful content is reinforcement learning from human feedback (RLHF), in which people's judgments of a model's responses are used to improve them.

The process relies heavily on human data annotators, who rate text for coherence, fluency, and naturalness. They decide whether a given response should be kept in the data used to train the model or discarded.

Even the most impressive AI chatbots require thousands of hours of human work to behave the way their creators want, and even then their performance can be unreliable. The work can be grueling and distressing, as will be discussed at the ACM Conference on Fairness, Accountability, and Transparency (FAccT). The conference brings together researchers who study topics such as how to make AI systems more accountable and ethical, which aligns with my interests.

One panel I am looking forward to features Timnit Gebru, an AI ethics pioneer who co-led Google's AI ethics team before she was fired. Gebru will address the exploitation of data workers in Ethiopia, Eritrea, and Kenya who are tasked with cleaning up online hate speech and misinformation. In Kenya, data annotators were paid less than $2 per hour to sift through distressing content involving violence and sexual abuse, all to make ChatGPT less toxic. These workers are now unionizing to push for better working conditions.

AI is on the verge of establishing a new global order reminiscent of colonialism, with data workers bearing the brunt of its impact. Shedding light on exploitative labor practices around AI has become increasingly urgent, especially given the surge in popularity of AI chatbots like ChatGPT, Bing, and Bard, and image-generating AI models such as DALL-E 2 and Stable Diffusion.

Data annotators are involved at every stage of AI development, from training models to verifying their outputs and providing feedback that helps fine-tune them after launch. They are often pushed to work at an extremely fast pace to meet demanding targets and deadlines. The notion that large-scale systems can be built without human intervention is utterly false.

Data annotators give AI models the contextual information they need to make decisions at scale and to appear sophisticated. For example, one data annotator in India was asked to sort through images of soda bottles and pick out the ones that looked like Dr. Pepper. Because Dr. Pepper is not sold in India, the annotator was left to work out the distinction on their own.

Annotators are expected to discern the values that matter to the company. They are not only learning about things that are distant and irrelevant to them but also working out the wider context and priorities of the system they are helping to build.

Researchers from the University of California, Berkeley, the University of California, Davis, the University of Minnesota, and Northwestern University argue in a new paper presented at FAccT that we all are data laborers for major technology companies, whether we realize it or not.

Text and image AI models are trained on vast datasets scraped from the internet, which include our data and copyrighted works by artists. The data we generate is permanently embedded in AI models designed to generate profits for these companies. Without realizing it, we contribute our labor for free when we upload photos to public platforms, upvote comments on Reddit, label images on reCAPTCHA, or run online searches.

Currently, the balance of power heavily favors the largest technology companies in the world. Addressing this will require a data revolution and regulation. One way for individuals to reclaim control over their online lives is to push for transparency about how their data is used, and for mechanisms that let them give feedback and share in the revenue their data generates.

Although data labor is the backbone of modern AI, it remains chronically undervalued and invisible around the world, and annotators' wages remain low. The contribution of data work needs to be recognized.
