Everyone is saying that 2025 is the year of AI agents.

However, even with 99% accuracy at each step, longer workflows will often fail. Errors compound: a 50-step flow would have only about 60% overall success (0.99^50 ≈ 0.61), and in reality probably even less.
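
The back-of-the-envelope math behind that claim, as a minimal sketch (assuming independent steps with the same per-step accuracy, which real flows rarely have):

```python
# A minimal sketch of how per-step errors compound, assuming each step
# succeeds independently with the same probability p.
def end_to_end_success(p: float, n: int) -> float:
    """Probability that an n-step workflow completes without a single error."""
    return p ** n

for p, n in [(0.99, 15), (0.99, 50), (0.999, 50)]:
    print(f"per-step {p:.1%}, {n} steps -> {end_to_end_success(p, n):.1%} end-to-end")
# per-step 99.0%, 15 steps -> 86.0% end-to-end
# per-step 99.0%, 50 steps -> 60.5% end-to-end
# per-step 99.9%, 50 steps -> 95.1% end-to-end
```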

And what’s worse is when agents don’t just fail outright (that’s fine, as long as they fail quickly).

Worse is when they make mistakes that look correct.

Then the process continues… but with bad data. That’s the really dangerous part: corrupted results that seem fine.

Here’s a simple example: I’ve built a purchase flow with a tool like make.com. It’s a simple thing, but it has more than 15 steps. If an agent gets just one of those wrong, the entire process likely breaks, or worse, continues with the wrong data. At 99% per step, that’s roughly an 86% end-to-end success rate (see the sketch above). That sounds like a lot… but the remaining ~14% is a maintenance nightmare. Which runs failed and which succeeded?

Long flows are also usually super expensive (often exceeding the cost of a human doing the work), so you need to know how to build them and avoid dragging the whole global context around. But often you do need most of that context, otherwise you wouldn’t be using an LLM :)
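
To make that concrete, here’s a hedged sketch of one way to avoid passing the whole global context to every step; everything in it (the state fields, the step name, the `call_llm` hook) is hypothetical:

```python
from typing import Callable

# Hypothetical state accumulated over a long workflow.
state = {
    "customer_email": "jane@example.com",
    "cart_items": ["sku-123", "sku-456"],
    "shipping_address": "123 Example St",
    "full_conversation_log": "(thousands of tokens of history)",
}

def run_step(name: str, needs: list[str], call_llm: Callable[[str], str]) -> str:
    # Each step declares the keys it actually needs, so only that slice of
    # the state goes into the prompt instead of the whole global context.
    context = {k: state[k] for k in needs}
    prompt = f"Step: {name}\nContext: {context}\nPerform the step and return the result."
    return call_llm(prompt)

# e.g. a shipping-validation step never sees the conversation log,
# so its token cost stays flat as the workflow grows:
# run_step("validate_shipping", ["shipping_address", "cart_items"], call_llm)
```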

And workflows aren’t simple. They’re full of decisions, branches, and loops. One small error can send the agent in the wrong direction.

So what’s the better path? I don’t think the near future is about fully autonomous agents.

It’s about expert, smart tools that help us do one thing well. Tools that are:

  • Simple and focused
  • Easy to check
  • Designed to support, not replace, the human

So I’m asking myself: why do human systems succeed (or at least fail gracefully) where autonomous agentic workflows are more likely to break down?

First, I think it’s that no human performs a complex, high-stakes task in a vacuum. We operate within a rich system of social scaffolding. We ask for help. We have our work reviewed by peers (like a code review). We operate based on intent. We have a powerful, non-verbal sense of cognitive dissonance, or what we call a “gut feeling”.

Agents will fail because they lack the robust, multi-layered, and adaptive system for managing mistakes that humans have spent millennia developing.

Until AI agents can replicate this deep capacity for self-monitoring, adaptation, and intent-driven problem-solving, their autonomy will remain weak. We cannot simply give them the same long, complex workflows we give humans and expect success. Maybe we will see improvement, but to do as much internal thinking as a human does, LLM inference costs would need to drop by a factor of thousands or more.