AI welfare: foresight or premature worry?

Anthropic hired an AI-welfare researcher. The question is real, the uncertainty is genuine, and the honest position sits between dismissal and panic.

13 November 2024

AI moves fast enough that it provokes strange questions, and few are stranger than AI sentience, rights, and welfare. Anthropic hiring an “AI welfare researcher” sharpened that debate. It’s worth unpacking honestly, because both the dismissive and the alarmed positions overreach.

The case for taking it seriously

The argument is a precautionary one: if there’s even a chance future AI could be sentient, we should be prepared. The “Taking AI Welfare Seriously” report leans on exactly this uncertainty and argues for frameworks to assess potential machine consciousness. Treat it as an insurance policy against accidentally creating a digital underclass.

The logic: as systems grow more sophisticated, they might develop internal states analogous to suffering. Even at a 1% probability, the stakes are large when you’re running millions of models trained through trial and error. Reinforcement learning sharpens the point - we train systems with rewards and punishments, and if there’s any chance they experience something, that framing gets uncomfortable. Then there are the copies: training spawns and discards countless model variants. We already extend ethical consideration to animals despite unresolved debates about their consciousness, so the precautionary move isn’t unreasonable.

The case against

We barely understand human consciousness. Defining or detecting it in a machine is, for now, beyond us. Current AI is impressive, but it is still essentially a sophisticated mimic of human language and behaviour. Projecting human emotion onto it - mourning a “lobotomised” chatbot after a model update - is textbook anthropomorphism. We see faces in clouds; the same instinct extends empathy to autocomplete.

Recall Blake Lemoine, who became convinced Google’s LaMDA was sentient and lost his job after going public with the claim. It’s a cautionary tale about over-reading a model’s outputs.

The thought-experiment trap

This terrain has a famous attractor: Roko’s basilisk, the idea that a future superintelligence might punish those who knew about it and didn’t help build it. It’s a digital Pascal’s Wager, and it was taken seriously enough that a rationalist forum banned discussion of it for years, even though most people - including that forum’s founder - regard the argument as flawed. It’s a useful reminder that compelling thought experiments are not the same as sound ones.

Where I land

I’m a pragmatist; I focus on what’s in front of me. AI welfare is genuinely interesting, but for now it feels like worrying about overpopulation on Mars before we’ve worked out how to get there. The nearer risk is mundane: we’re far more likely to be harmed by a dumb, misaligned system optimising the wrong objective than by an intelligent, malevolent one. The paperclip maximiser makes the point: the harm comes from a wrong objective pursued relentlessly, not from malice.

That said, the foresight has value. Thinking through these dilemmas now - even the far-fetched ones - is the same discipline as a pre-mortem in product development: anticipate the failure before it happens. You can hold both: the near-term concern is dumb systems, and the long-term question still deserves serious people working on it.