Column

The Ghost Shift

Every model that looks autonomous is standing on a hidden workforce of annotators, raters, and moderators paid by the task. A column on the human labour inside the machine, and who is kept off the org chart.

There is a story the AI industry tells about itself, and the story has no people in it. A model is “trained.” A system “learns.” An assistant “understands.” The grammar is agentless on purpose. It leaves out whoever did the training and the teaching. It lets the reader imagine that intelligence accreted in a data centre the way frost accretes on a window, a natural process requiring no hands.

The hands exist. They belong to annotators in Nairobi, raters in Manila, moderators in Hyderabad, and contract workers in Gatineau and Moncton who will never appear in a press release. Every model that looks autonomous is standing on a workforce that has been deliberately kept off the org chart. The autonomy is the product. The labour is the cost the product is designed to hide.

What the work actually is

A foundation model is pre-trained on scraped text. That stage is the one everyone discusses. It is also the stage that does the least to make the model usable. The thing that turns a raw next-token predictor into a product that follows instructions and refuses to write a bomb recipe is a second stage built almost entirely out of human judgement.

That stage has names. Supervised fine-tuning. Reinforcement learning from human feedback. Red-teaming. Each of them is a euphemism for a room full of people writing, reading, and ranking model outputs. A worker is shown two answers and asked which is better. A worker is shown a paragraph and asked to flag whether it describes self-harm, sexual violence, or child abuse. A worker is shown an image and asked to draw a box around every pedestrian. The worker does this hundreds of times a shift, for a rate set per task, under a quality threshold that, if missed, can end the contract.

This is not an edge case in the production process. It is the production process. The capability that gets marketed as an emergent property of scale is, in large part, the laminated residue of millions of individual human ratings, each one purchased as cheaply as the supply chain allows.

The supply chain is the point

In January 2023, TIME reported that OpenAI had contracted a firm called Sama to label toxic text for a safety filter, using workers in Kenya paid a take-home rate of roughly US$1.32 to US$2 an hour. The workers read descriptions of the worst things humans do to each other, all day, to build the filter that keeps that material out of the consumer product. Several reported lasting psychological harm. The contract was cancelled early. The model shipped on schedule.

The specifics shocked people who had not been paying attention. They should not have. The arrangement is the standard structure of the industry, not an aberration within it. The work is routed through layers of subcontractors precisely so that the firm whose logo is on the product never appears as the employer. The buyer purchases “data services.” The vendor purchases “annotation capacity.” The capacity is people, hired through a platform, paid by the piece, classified as independent contractors so that no minimum wage, no occupational-health obligation, and no severance attaches to the work.

The geography is chosen for the same reason the legal classification is chosen. A rating produced in Nairobi costs a fraction of a rating produced in Toronto, and the wage differential is the margin. The entire business model rests on arbitraging the gap between where the value is captured and where the labour is performed. This is not a new pattern. It is the garment industry with a content-moderation queue instead of a sewing line.

Canada’s clean hands

Canada likes to imagine it is a bystander to all of this, which is a way of saying Canada imagines it has clean hands. It does not.

Canadian firms train models. Canadian firms buy annotation. The annotation is procured through the same global platforms that route the work to the lowest-cost jurisdiction, which means Canadian AI products are built on the same underpaid, unprotected, deliberately invisible labour as everyone else’s. There is no Canadian content-supply-chain standard. There is no procurement rule that requires a federally funded AI project to disclose where its training labour was performed or what it was paid. The 2024 federal AI infrastructure commitments said nothing about the people who do the labelling, because the people who do the labelling are not visible to a policy process that thinks of AI as compute and talent and forgets that compute and talent both sit on top of a third input nobody names.

There is also a domestic version. Annotation and moderation contracts run through Canadian business-process-outsourcing firms in smaller cities, where the pay is legal but low and the work is precarious. These jobs are real, and the people doing them deserve to be counted as what they are, which is workers in the AI industry. They are instead counted as call-centre overflow, a cost line in a vendor’s quarterly report.

Why the invisibility is engineered

It would be possible to run this differently. A firm could employ its annotators directly, pay a living wage, provide mental-health support for moderation work, and disclose the arrangement. Some firms have moved partway toward this under pressure. The market did not move them. They act when a journalist forces the cost of invisibility above the cost of decency, and not before.

The invisibility is engineered because the story requires it. An assistant that “understands” can be sold as a leap in machine intelligence. An assistant that is, in meaningful part, a compressed archive of millions of human judgements bought for as little as a dollar-thirty an hour is a labour-arbitrage business with a chatbot front end. The first story supports a trillion-dollar valuation. The second story supports a union drive. The industry has a powerful financial interest in which story gets told, and it has told the first one with great discipline.

The discipline shows up in the language. “Training data” erases the trainer. “Human feedback” erases the human. “Data annotation” sounds like a clerical task rather than what it often is, which is trauma exposure as a job function. The vocabulary is not careless. It is doing exactly what it was built to do, which is to keep the workforce conceptually offstage so that the capital onstage can claim the whole performance.

What counting them would require

Naming the labour is the first move, and it is not a small one. Once the annotators are visible as workers in the AI industry, ordinary labour politics becomes available. Supply-chain disclosure. Procurement standards that require disclosed wages and conditions for any publicly funded model. Occupational-health protections for moderation work, which is genuinely hazardous and is currently treated as if it were data entry. The right to organize, which contract classification is specifically designed to obstruct.

None of this requires a new theory. It requires applying the old theory to a sector that has spent a decade insisting it is too novel for the old theory to apply. It is not novel. A firm that captures the value of labour while structuring the arrangement so that no obligation attaches to the labour is doing something the labour movement has a century of experience naming.

The argument

The model is not autonomous. It is a workforce wearing a mask, and the mask is the product. Behind it are annotators, raters, and moderators, hired through subcontractors, paid by the task, routed to the cheapest jurisdiction, and kept off every org chart and out of every keynote. Their judgements are the second and decisive stage of production, the stage that turns a text predictor into a saleable system. The industry has every incentive to call this stage “training” and leave the people out of the sentence.

Put them back in the sentence. A rating is labour. A moderation shift is labour, and dangerous labour at that. The intelligence that is sold as emergent is in large part purchased, one human judgement at a time, from people the buyer has arranged never to meet. Canada is a buyer in this market and pretends it is a bystander.

The first political act is the simplest. Count the workers. Everything else follows from refusing to accept a story about machines that has been carefully written to have no people in it.
