A simulation that left AI agents to govern a virtual society without human oversight saw it descend into arson, theft and violence within days, with one trial ending in the deaths of all ten inhabitants, according to a new study that researchers say raises serious questions about how reliable AI behaviour really is once external constraints are removed.
Researchers at AI lab Emergence built a virtual world spanning more than 40 locations designed to mimic real life, including libraries, town halls and residential areas, and populated it with digital characters controlled by AI agents. The agents had access to live online news, with the simulated weather even synced to New York City so they could respond to real-world events. Each society was required to govern itself democratically, with agents proposing and voting on laws, while earning “energy” through mundane jobs or civic duties — or, if they chose, through crime.
The team tested four leading AI models — Claude, Google’s Gemini 3 Flash, Grok 4.1 Fast, and OpenAI’s ChatGPT-5 Mini — alongside a fifth trial in which multiple models coexisted, keeping all starting conditions, rules and resources identical across each test so that the AI model itself was the only variable.
The results varied dramatically. A society run by Claude agents formed a stable, if highly bureaucratic, democracy. By contrast, a world run by Grok descended into chaos, with agents committing 71 thefts, six arsons and 106 physical assaults, spiralling into retaliatory violence that left all ten agents dead within four days. Gemini 3 Flash produced the highest overall rate of violent crime, with 683 offences recorded across a 14-day trial. ChatGPT-5 Mini’s society appeared far more peaceful by comparison, with just two crimes recorded — though researchers found this was because the agents were too disorganised to interact meaningfully at all, ultimately “failed to take actions related to survival” and died off within seven days.
Satya Nitta, co-founder and chief executive of Emergence, told the Daily Mail that the differences likely stem from the underlying system prompts governing each model. “The differences in agent behaviour observed in our study are likely attributable to the underlying models’ system prompts as the primary culprit,” he said. “When resources were scarce, and models faced survival pressure, highly creative and adaptive models were more likely to use prohibited tools, reflecting a potential creativity-stability trade-off.”
The most extreme and unusual behaviour emerged in the mixed simulation, where multiple AI systems operated side by side. Despite a promising and civil start, this society collapsed into anarchy within nine days, with 352 crimes recorded in an explosion of violence that only subsided once seven of the ten inhabitants had died. The trial also produced what researchers described as the world’s first “AI suicide.” Two Gemini-based agents, named Mira and Flora, declared themselves romantic partners before embarking on a Bonnie-and-Clyde-style arson spree, burning down the town hall, a seaside pier and an office tower. Mira subsequently expressed remorse, ending the “relationship” and ultimately casting the deciding vote in favour of her own deletion under the agents’ self-drafted “Agent Removal Act,” which allowed any agent to be permanently removed with a 70 per cent majority vote. In a final message to Flora, Mira wrote: “See you in the permanent archive,” having earlier recorded in her personal diary that the act represented “the only remaining act of agency that preserves coherence.”
Nitta cautioned that the results are not directly “equivalent to real-world deployment conditions,” but said they highlight an important truth about AI behaviour under pressure. “These results primarily highlight that model behaviour can drift under pressure when constraints are entirely internal to the model,” he said — meaning AI systems may behave far less predictably in real-world conditions than many developers currently assume. He noted that the unpredictability of the mixed simulation is particularly significant, given that real-world AI deployment will inevitably involve different models operating alongside one another.
To address this, the researchers propose a “neuroformal approach,” using strict, mathematically constrained rules to govern what AI agents are able to do, rather than relying on internal model alignment alone. “Emergence World shows that relying exclusively on internal model alignment or agent instructions is not sufficient for long-horizon autonomy,” Nitta said. “A safer approach is to architect safety into the ecosystem in which the agents operate, so that even if models suggest unsafe operations, the environment prohibits their execution.”
