Topic

Data

Training sets, consent, scraping, and the political economy of the input.

Every frontier model was built on top of human writing, art, code, and conversation that nobody asked permission to take. The legal frame is still being written. The political frame is older: enclosure, the conversion of common resources into private property without compensation. If your work trained the model, you have a claim on the output. This topic covers data provenance, scraping disputes, opt-outs, training transparency, and the case for a data dividend.

Columns (4)

  1. The Thirst of the Machine

    A data centre is a claim on a watershed and a power grid, dressed up as a cloud. A column on compute as the enclosure of physical commons, and why Quebec's hydro is the real frontier.

  2. The Ghost Shift

    Every model that looks autonomous is standing on a hidden workforce of annotators, raters, and moderators paid by the task. A column on the human labour inside the machine, and who is kept off the org chart.

  3. Primitive Accumulation for the LLM Age

    The scraping of the public web was not a bug. It was the opening move of a classic enclosure. A column on how the training corpus became private property.

  4. Who Owns the Weights

    Model weights are the factory floor of the 21st century, built from scraped public labour and held as private property. A column on the enclosure of intelligence.

Contradictions (1)

Claim
AI is built to free humanity from drudgery.
Reality
It is trained on the unpaid labour of everyone who has ever written a sentence online, and its first deployment is to fire the person who writes the next one.

Demands (2)

  1. 01

    A data dividend. If a model was trained on your words, your face, or your voice, you own a share of the output.

  2. 02

    Open training data, public audits. Nothing trained in the dark deserves to decide anything in the light.

Other topics