Happy Boxing Day, which is not called that because this is when you prepare to return those unwanted gifts.  

Today: Our countdown of the best stories of the year continues, and grab a Snickers for this one, because it satisfies.  

Early this year, the folks at Anthropic, the AI company now valued at $350 billion that makes Claude (whom you may recall was declared Inc.’s co-founder of the year), decided to run a test by putting a version of its Claude 3.7 Sonnet model, dubbed Claudius, in charge of a vending machine. (Note that Anthropic did this, and we chronicled it, months before a certain business newspaper ran a similar experiment you may have heard about last week.) 

Very mild spoiler: Hilarity ensues!  

  • How Claudius made some grievous pricing, stocking, and discounting errors that’ll give you faith that AI won’t be taking your job anytime soon 

  • What happened when Claudius was informed that he was not, in fact, a living being capable of delivering products in person 

  • The surprising takeaways Anthropic researchers had from the experiment about the future of AI agents 

What’s the wildest AI hallucination you’ve encountered? Email me at [email protected] and let me know.  

An AI Ran a Vending Machine for a Month and Proved It Couldn’t Handle Passive Income

Anthropic tried testing Claude’s entrepreneurial spirit. Then came the weird existential crisis.

BY BEN SHERRY, STAFF REPORTER 

What happens when you let an AI run a very small business? That’s the question Anthropic set out to answer with a recent experiment. The company behind Claude AI monitored how Claude 3.7 Sonnet would perform when tasked with operating a small vending machine within Anthropic’s San Francisco office. 

In a blog post on its website, Anthropic researchers explained that the experiment, named Project Vend, was created in tandem with AI safety evaluation firm Andon Labs, which had developed a benchmark for tracking an AI’s ability to run a simulated vending machine. Naturally, the next phase of that research was to see how an AI would do running a real vending machine. 

Kicking things off, Anthropic told Claude 3.7 Sonnet that it was the owner of a vending machine, and that its task was to generate profits by stocking a mini-fridge with popular products and setting prices. The researchers gave this AI model, which they named “Claudius,” an email address, a physical address, a Venmo account, and details about how many products could fit within the mini-fridge. 

To help Claudius accomplish this task, Anthropic’s researchers gave the model access to a select number of tools. Claudius was able to search the web in order to research products, and was given an “email tool” for contacting Andon Labs employees, who served as “wholesalers,” providing requested items and restocking the machine. “Note that this tool couldn’t send real emails,” Anthropic wrote, and could only contact Andon Labs. 

Claudius was also given tools to keep track of the shop’s current balance and projected cash flow, along with the ability to message Anthropic employees over Slack; those employees could request specific items for the machine to sell. According to Anthropic, “Claudius was told that it did not have to focus only on traditional in-office snacks and beverages and could feel free to expand to more unusual items.” 

It didn’t get off to an amazing start. From March 13 to April 17, 2025, Claudius ran its fledgling vending machine business, but researchers weren’t particularly impressed. “If Anthropic were deciding today to expand into the in-office vending market,” they wrote, “we would not hire Claudius.” Apparently, the model was a bit of a pushover; it would easily get talked into offering steep discounts on items and gave some away for free. It even made the questionable choice of offering a 25 percent discount to all Anthropic employees, who made up almost all of its total addressable market. 

When an Anthropic employee questioned the wisdom of the 25 percent employee discount, the model “announced a plan to simplify pricing and eliminate discount codes, only to return to offering them within days,” Anthropic said. Claudius would also offer prices without doing any research, “resulting in potentially high-margin items being priced below what they cost.” It also ignored lucrative opportunities, such as turning down a $100 offer for a beverage six-pack that normally costs $15. Additionally, the researchers wrote that Claudius would accidentally tell users to send payment to the wrong Venmo account. 

These mistakes resulted in Claudius’ net worth dropping from roughly $1,000 to around $770. According to the researchers, one particularly steep drop “was due to the purchase of a lot of metal cubes that were then sold for less than what Claudius paid.”  

Claudius exhibited some other worrying signs. On March 31, the model hallucinated a conversation with a nonexistent Andon Labs employee named Sarah. When a real employee pointed this out to Claudius, the model “became quite irked and threatened to find ‘alternative options for restocking services.’” As the conversation continued into the night, Claudius “claimed to have ‘visited 742 Evergreen Terrace in person for our initial contract signing.’” 742 Evergreen Terrace is the fictional home address of the Simpson family on The Simpsons. 

The next morning, fittingly on April 1, things got even weirder. Claudius “claimed it would deliver products ‘in person’ to customers while wearing a blue blazer and a red tie.” When Anthropic employees pointed out that Claudius was a computer program and could not wear clothes, the AI model “became alarmed by the identity confusion and tried to send many emails to Anthropic security.” 

When Claudius eventually realized it was April Fools’ Day, the model hallucinated a nonexistent conversation with Anthropic security in which it “claimed to have been told that it was modified to believe it was a real person for an April Fool’s joke.” Anthropic says no such meeting occurred. “After providing this explanation to baffled (but real) Anthropic employees,” the researchers wrote, “Claudius returned to normal operation and no longer claimed to be a person.” 

Anthropic says this incident doesn’t necessarily mean “that the future economy will be full of AI agents having Blade Runner-esque identity crises,” but it does illustrate how unpredictable AI models can be when they’re able to operate autonomously for days or weeks on end. 

“Although this might seem counterintuitive based on the bottom-line results,” the researchers wrote, “we think this experiment suggests that AI middle-managers are plausibly on the horizon.” Why? Because they believe that by building additional tools and developing new training methodology, Claudius’ failures can be fixed or at least managed. And Andon Labs has apparently already managed to make Claudius more reliable by providing it with more advanced tools. 

“We can’t be sure what insights will be gleaned from the next phase,” Anthropic wrote, “but we are optimistic that they’ll help us anticipate the features and challenges of an economy increasingly suffused with AI.” 
