There is a battle over the reputation of advanced AI applications going on in the news. Two worldviews conflict: Are we unleashing dangerous forces that threaten humanity? Or are we just making computers and software do a lot of new things?
Two Wall Street Journal reporters fired a shot in this battle recently, publishing an article claiming that an AI app “bullied” a Denver engineer who rejected some code it submitted to an open-source project he helps maintain. The general theme of the article was that AI applications are already becoming autonomous agents, and cranky and nasty ones at that. In the article, the “Denver engineer” concludes with a warning: “Right now this is a baby version,” he said. “But I think it’s incredibly concerning for the future.”
A big picture of Dario Amodei, the CEO of Anthropic, graced the middle of the article. And that tells you a lot about its source. Amodei views advanced AI as having a 25% or higher risk of causing a societal catastrophe in the next five years. He warns of rapid AI development outpacing safety measures, widespread white-collar job displacement, and potential loss of control over autonomous AI systems.
People who don’t know much about AI Safety research, or about the political position Dario Amodei promotes around it, might see the WSJ article as evidence of an emerging scientific consensus about the autonomy of AI applications. I don’t. I see evidence of a public relations campaign by a particular AI model developer.
What’s really going on here?
One’s opinions about the dangers of AI should not be based on the fantasy that AI systems are emerging life forms that will rise up and destroy us. Yet people who believe that, or whose careers may depend on keeping that worry alive, are mostly in charge of what we hear about AI autonomy. Journalists such as the WSJ’s Sam Schechner and Georgia Wells are not reporting on the frontiers of science; they are distributing interpretations of AI behavior given to them by Amodei and the AI Safety crowd.
This is not to say that all safety research, or all safety concerns, are wrong. It’s just that most of those experiments are done by people with a vested interest in generating concerns about AI autonomy.
None of these people try to assess empirically the limitations or constraints on AI autonomy. Instead, they conduct experiments intended to find evidence of machine autonomy. This bias has some epistemological justification – after all, you can’t prove a negative. But it also creates a powerful incentive for confirmation bias. Just as PhD students in statistical social science MUST find a statistically significant correlation between their variables, or they have been wasting their time, so the AI Safety researcher must find evidence of danger, of machine autonomy.
Finding evidence that an AI application “bullied” a human, or demonstrations of murderous intentions an AI system expressed toward people who tried to turn it off, is not only more achievable for the researcher but also far more interesting in the attention economy. It keeps AI safety research funded and in the public eye. A finding that “advanced AI models are just scaled-up computing infrastructures that humans build and manage” would not make the front page of the Wall Street Journal. But where do these “findings” come from?
Are AI Experiments Misleading?
My own reviews of AI Safety research conducted by AI labs have left me highly critical. I have found that evidence of behavioral autonomy disappears when the process used in the experiment, including the training and instructions, is made transparent. This means that journalists and public intellectuals should not draw any conclusions about AI autonomy from intermediaries unless the people conducting the experiment disclose the exact factual details of their tests. When one knows exactly which tests were done, how they were structured, and who programmed them, AI autonomy disappears from the plot, like the failure of a spirit to appear in a séance when the lights are turned on. Looking carefully at these experiments, we find not machine autonomy but highly specialized instructions, often involving giving the machine conflicting objectives, and a laboratory setup designed to elicit behavior that could be interpreted as autonomous. Often the evidence underlying a claim of autonomy is statistical: e.g., 5 or 6 different models were tested, and in 10% or 3% or 20% of the test runs the output was “misaligned” with what the humans conducting the experiment considered the proper output.
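To make the statistical point concrete, here is a minimal sketch of how a headline “misalignment rate” of that kind is typically computed, and how wide its uncertainty is at small sample sizes. The run counts and the choice of a Wilson score interval are my own illustrative assumptions, not anything reported by the WSJ or the labs.

```python
# Purely illustrative sketch. The numbers below are hypothetical and the Wilson
# score interval is my own choice of uncertainty measure; nothing here is taken
# from the WSJ article or any lab's published methodology.

def misalignment_rate(flags: list[bool]) -> float:
    """Fraction of test runs whose output the experimenters labeled 'misaligned'."""
    return sum(flags) / len(flags)

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95% coverage)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * ((p * (1 - p) / trials + z**2 / (4 * trials**2)) ** 0.5)
    return center - half, center + half

# Hypothetical: 100 scripted runs, 10 of which the experimenters flag as "misaligned".
flags = [True] * 10 + [False] * 90
rate = misalignment_rate(flags)
low, high = wilson_interval(successes=sum(flags), trials=len(flags))
print(f"reported rate: {rate:.0%}, 95% interval roughly {low:.0%} to {high:.0%}")

# With only 30 runs, the same 10% headline rate becomes far less informative.
low, high = wilson_interval(successes=3, trials=30)
print(f"reported rate: 10%, 95% interval roughly {low:.0%} to {high:.0%}")
```

On these assumed numbers, a “10% misalignment” result at 100 runs is consistent with anything from roughly 6% to 17%, and at 30 runs with roughly 3% to 26%, which is one reason the percentages quoted in such studies deserve scrutiny rather than headlines.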
Tests for Autonomy
Here is how to pick apart AI research that claims to find evidence of machine autonomy. Ask questions about the design and preparation of the experiment:
- What was the AI application used in this instance? The WSJ article implies that multiple models were tested; let’s be told which ones (and which versions) they were. Were the models specialized applications developed to conduct this test or ones intended for general use in production?
- Did the AI submit code to a real-world software-sharing community, or to a simulation set up by the experimenters? If the former, did it create its account without human input or control? Did a human prompt it to develop and submit the code to this forum, or did the application do this all by itself?
- Was the experiment deliberately designed to test what would happen if the application’s code was rejected, or did this result happen unexpectedly?
- On what blog did this AI post its attack on Mr Shambaugh? Did the application create its own account and set up its own public web site, or did the lab create an experimental one for it? Can anyone see this blog post?
- Was the AI application’s submission of blog posts rule-governed, or spontaneous and unexpected? In other words, did the humans controlling this machine give it specific conditions that had to be met to post blogs, and tell it what kind of messages would go into those blogs?
- In explaining the attack on Mr Shambaugh, did humans tell the machine to react angrily to a rejection of its code? Was this kind of reaction rule-governed, a pattern embedded in the application’s training, or random?
- Tell us more about Scott Shambaugh, the person who “rejected” the AI application’s code. Was he part of the experiment? What is his connection to Anthropic? Did he know that the code was submitted by an AI agent? If he did not, which standard of quality or functionality did the AI’s code fail to meet? Did the humans in the lab bring Mr Shambaugh into this experiment without his permission or knowledge?
Software Liability: The New Name for AI Safety Research
We are just beginning to realize how much AI governance has been misdirected by the doomer narrative of an autonomous, malevolent AGI. One of the biggest casualties is research into the possible failings and problems of AI models. There is a legitimate, even important, role for “AI Safety” research, but the focus on machine autonomy has turned the whole field into a confirmation-biased hunt for misanthropic behavior by AI applications.
Here’s the new direction it should take: AI Safety research should be renamed software liability research. Researchers in model developers’ labs should stop hunting for AI autonomy and focus on all the ways in which flaws or unintended consequences of specific ML applications (AI models) might generate harm, and on how to distribute responsibility for those costs. As I’ve argued elsewhere, those kinds of tests are application-specific. How might autonomous vehicles go wrong? How do we assign responsibility among vehicle manufacturers, model developers, and drivers? How will we assign liability when automated code distributions (updates), such as the one behind CrowdStrike’s massive failure, generate cascading problems?
It’s true that this country had a policy debate about software liability 25 years ago. Insofar as it was resolved, it was that there should be as little as possible. AI applications reopen that debate. The search for machine autonomy distracts from it.