The last AI tool I watched die took about six weeks.
It was 2023, and a company I was working with had bought an AI prospecting tool. $1,200 a month. The demo was beautiful. The rep showed it finding contacts, writing personalized emails, scheduling follow-ups. Everyone in the room nodded. The VP of Sales signed the contract before the meeting ended.
Six weeks later, nobody was using it. The tool had pulled contacts from one data source, which overlapped with ZoomInfo, which the SDR team already had. The AI-generated emails were technically correct but sounded like a stranger who had read your LinkedIn bio and was trying too hard. And the scheduling feature didn't sync with the CRM, so reps had to manually log everything the tool was supposed to automate.
The tool wasn't bad. It just had no connective tissue to anything else. No shared context with the other five tools in the stack. No way to learn from what was working. No coordination layer. It was a smart employee hired into a company with no onboarding, no manager, and no desk.
I've watched this happen at every company I've worked at. Zendesk. Duo Security. Vapi. Different tools, same collapse.
How bad is it, actually?
For a while, I cited the failure number as coming from Harvard. I was wrong. The "95% of AI projects fail" stat actually comes from MIT's Project NANDA, published in July 2025 -- a study of 300+ publicly disclosed AI initiatives, interviews with 52 organizations, and surveys of 153 senior leaders. Their definition of success: deployment beyond pilot with measurable P&L impact after six months.
Five percent cleared that bar.
But here's the thing -- the MIT number is actually the most aggressive claim in a stack of research that all says roughly the same thing. RAND Corporation found 80% of AI projects fail, based on interviews with 65 experienced data scientists. BCG surveyed 1,000 C-suite executives across 59 countries and found 74% had yet to show tangible value from AI. Only 4% reported anything substantial.
Gartner predicted 30% of GenAI projects would be abandoned after proof-of-concept by end of 2025. By March 2025, S&P Global's survey of over 1,000 professionals found the abandonment rate had already hit 42%.
You can argue about the exact number. I don't think it matters whether it's 74% or 95%. The pattern is the same everywhere: companies buy AI, run a pilot, get a demo that looks great, and then nothing happens.
Why do they fail?
The RAND study named the top cause and it's not what most people expect. It's not data quality. It's not bad models. The number-one reason, based on their 65 interviews: business leaders misunderstand or miscommunicate what problem needs to be solved using AI. The organization focuses on the technology instead of the problem.
I've sat in rooms where the conversation was "we need to use AI" instead of "we need to reduce our cost per lead by 40%." The first conversation produces a pilot. The second produces a tool you actually use.
BCG quantified something I think is the most important finding in any of this research. They call it the 10-20-70 rule: 10% of the value comes from the algorithm, 20% from the technology, and 70% from the people, processes, and organizational work around it.
The MIT NANDA report found something that should bother every CTO reading this: buying AI tools from vendors and building partnerships succeeds about 67% of the time. Internal builds? One-third as often.
So the models work. The algorithms work. The vendors are building real things. What's breaking?
What's actually missing?
Infrastructure. The layer underneath.
I keep coming back to this because it's the answer every study points to, even when they use different words for it. RAND says organizations don't have adequate systems to deploy completed models. BCG says the 5% that succeed built fit-for-purpose technology architecture first.
A Salesforce study found 50% of deployed AI agents operate in complete isolation -- they can't share context or coordinate with each other at all.
That number stopped me. Fifty percent of agents running with no connection to any other agent.
It's the prospecting tool I watched die in 2023, multiplied by every company that bought AI in the last three years. Smart agents, no connective tissue.
The DevOps Research and Assessment report found something that quantifies this gap perfectly: as AI adoption increased, delivery throughput declined by 1.5% and stability declined by 7.2%. Code is being written faster than ever, but it isn't reaching production any quicker. The pipeline between "built" and "running" is the bottleneck, not the building itself.
What Google's research says about why this happens
In March 2026, Google's Paradigms of Intelligence team published a paper that, for me, connected every failure I'd watched to a single idea. The paper argues that intelligence doesn't scale by making one brain bigger. It scales through social organization.
Every intelligence explosion in human history -- language, writing, institutions -- was a social transition, not an individual upgrade. Primate intelligence scaled with social group size, not habitat difficulty.
They coined a term I think about constantly. Centaur actors. Hybrid human-AI combinations where one human directs many agents, or many humans and many AIs collaborate in shifting configurations throughout the day.
But the part that hit hardest was their argument about alignment. Standard AI alignment -- RLHF, one human correcting one AI -- doesn't scale to billions of agents. What scales is what they call institutional alignment: defined roles and protocols, like a courtroom functions because "judge," "attorney," and "jury" are well-defined slots, independent of who occupies them.
That's the problem with the AI tool that died in six weeks. It had no institutional slot. No defined role in relation to the other five tools. No protocol for what information flowed where.
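What would an institutional slot even look like? Here's a minimal sketch, assuming a simple declarative contract for each role: what it reads, what it writes, who reviews it. The Slot class, the role names, and the fields are mine for illustration, not something from the Google paper or from any shipping product.

```python
from dataclasses import dataclass

# A "slot" is the institutional role: what it reads, what it writes,
# and who reviews its output. It exists independently of whoever fills it.
@dataclass
class Slot:
    name: str
    consumes: list[str]             # signal types this role is expected to read
    emits: list[str]                # signal types this role is expected to produce
    reviewed_by: str | None = None  # another slot that checks its output

# Hypothetical roles for a small go-to-market stack.
PROSPECTOR = Slot(
    name="prospector",
    consumes=["icp_definition", "crm_account_list"],
    emits=["qualified_lead"],
    reviewed_by="qa",
)

OUTREACH_WRITER = Slot(
    name="outreach_writer",
    consumes=["qualified_lead", "positioning_doc"],
    emits=["draft_email"],
    reviewed_by="qa",
)

# Any agent, or any human, can occupy a slot as long as it honors the contract.
def describe(agent_name: str, slot: Slot) -> str:
    return f"{agent_name} fills '{slot.name}': reads {slot.consumes}, writes {slot.emits}"

print(describe("acme-prospecting-tool", PROSPECTOR))
```

The tool that died had no such contract. Nothing declared what it should read from the rest of the stack, and nothing downstream was waiting for what it produced.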
The paper's conclusion: the path forward is putting as much effort into building agent institutions as building agents themselves.
Almost nobody is doing that.
What the 5% actually have
The companies that make AI work at scale -- JPMorgan Chase, Netflix, Uber, Walmart, Stitch Fix -- all share something the other 95% don't. They spent years building internal ML platforms before they saw returns.
JPMorgan built OmniAI, an internal ML-as-a-service platform, and JADE, a unified data architecture spanning 500 petabytes. They now run 450+ AI use cases in production and generate $2 billion annually in AI-driven savings.
Netflix built Metaflow, an internal ML framework that cut deployment time from four months to seven days. Before Metaflow, 60% of their code was related to systems plumbing and only 40% was actual data science. They flipped that ratio by building the foundation first.
Uber's Michelangelo platform manages 400 active ML projects serving 10 million predictions per second at peak. Before that platform existed, ML developers spent 70% of their time on plumbing and debugging. Not building. Plumbing.
The pattern across all five: platform-first infrastructure built over years, data science embedded inside business units instead of siloed, and C-suite sponsorship from the start.
They didn't buy better models. They built the institution underneath.
What I'm building in response
I read the Google paper in early 2026 and it crystallized something I'd been circling for months. Everyone is building agents. Nobody is building the organizational structure those agents need to actually work together. No OS. No institutional layer.
That's what steadybase is. The Agent Operating System. Not another agent. The layer underneath all the agents.
My own business runs on it. Drew handles GTM intelligence. Brian runs pipeline. A content agent produces posts. A QA agent reviews outputs before they go live. They coordinate through a shared signal bus -- when one agent detects something, the others respond. That kind of coordination doesn't happen when you have six disconnected tools.
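If "shared signal bus" sounds abstract, the underlying pattern is simple: every agent publishes into, and subscribes to, the same place. Here's a stripped-down sketch of that pattern, an in-memory publish/subscribe bus. It shows the general idea, not steadybase's actual implementation, and the handlers and the "lead.detected" signal are made up for illustration.

```python
from collections import defaultdict
from typing import Callable

# Minimal in-memory signal bus: agents subscribe to signal types,
# and publishing a signal fans it out to every subscriber.
class SignalBus:
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, signal_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[signal_type].append(handler)

    def publish(self, signal_type: str, payload: dict) -> None:
        for handler in self._subscribers[signal_type]:
            handler(payload)

bus = SignalBus()

# Hypothetical handlers standing in for the GTM and pipeline agents.
def gtm_agent_on_new_lead(payload: dict) -> None:
    print(f"GTM agent: enriching {payload['company']}")

def pipeline_agent_on_new_lead(payload: dict) -> None:
    print(f"Pipeline agent: opening a deal for {payload['company']}")

bus.subscribe("lead.detected", gtm_agent_on_new_lead)
bus.subscribe("lead.detected", pipeline_agent_on_new_lead)

# One agent detects something; every subscribed agent responds.
bus.publish("lead.detected", {"company": "Example Corp", "source": "webinar"})
```

The value isn't the twenty lines of code. It's that context stops being trapped inside individual tools.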
The total cost to run this agent workforce is $4.12 a day, roughly $125 a month. The tools it replaced cost $2,400 a month.
I'm wrong about plenty of things. I'm still figuring out the right governance thresholds -- how much autonomy agents should have before a human needs to approve something. I don't have that dialed in yet.
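Here's one plausible shape for that check, assuming each proposed action carries an estimated cost and a flag for whether it touches a customer. The class, the fields, and the limits are placeholders for illustration, not settled numbers.

```python
from dataclasses import dataclass

# One possible shape for a governance check: hold anything expensive
# or customer-facing for a human, let the rest run autonomously.
@dataclass
class ProposedAction:
    agent: str
    description: str
    estimated_cost_usd: float
    customer_facing: bool  # does this action leave the building?

# Placeholder limits -- picking these is the hard part, not coding them.
MAX_AUTONOMOUS_SPEND_USD = 5.00
REQUIRE_REVIEW_FOR_CUSTOMER_FACING = True

def needs_human_approval(action: ProposedAction) -> bool:
    if action.estimated_cost_usd > MAX_AUTONOMOUS_SPEND_USD:
        return True
    if action.customer_facing and REQUIRE_REVIEW_FOR_CUSTOMER_FACING:
        return True
    return False

send_email = ProposedAction(
    agent="outreach_writer",
    description="send first-touch email to a new lead",
    estimated_cost_usd=0.02,
    customer_facing=True,
)
print(needs_human_approval(send_email))  # True: it leaves the building
```

Writing the check is the easy part; choosing the limits is the part I haven't solved. But the thesis holds: the 95% failure rate is not a model problem. It's a foundation problem.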
Google's researchers wrote that the next intelligence explosion won't be a single silicon brain but a complex society specializing and sprawling like a city. Intelligence growing like a city, not a single meta-mind.
steadybase is the city infrastructure. And Agents as a Service is how you move in.