Tokenomic
Frontier

The Era of AI Agents

How AI agents powered by LLMs are bringing a wave of investment, innovation and value.

First published Dec 2025

Democratizing Knowledge Work

If 2023 was the year of the LLM, thanks to ChatGPT and Llama ("hallucinate" actually became the word of the year!), and 2024 was the year of RAG, then 2025 is undoubtedly the year of the agent.

Dec 27, 2025

Aaron Levie

AI agents bring democratization to every form of non-deterministic knowledge work. Now, we can dramatically lower the cost of investment for almost any given task in an organization. The mistake that people make when thinking about ROI is making the "R" the core variable, when the real point of leverage is bringing down the cost of "I". Now, we have the ability to blow up the core constraint driving many of these tradeoffs: the cost of doing these activities. Every business in the world has access to the talent and resources of a Fortune 500 company 10 years ago.

To take just one example, in the cyber domain, talent is in short supply and the best talent tends to be concentrated at the tech giants. However, in security operations centers (SOCs), where human analysts are often overwhelmed by a constant stream of data and alerts, AI agents are taking over the most taxing security operations work, automating manual tasks like alert triage and investigation. This allows human analysts to focus on their most critical work: hunting threats and developing next-generation defenses, freeing up the best cyber talent for the hardest problems.

We see this trend toward broad industry adoption of AI in OpenAI's State of the Enterprise 2025 report, with 9,000 organizations across various industries having now processed over 10 billion tokens each. This points to increasing democratization and a move from experimentation to scaled-up production use of LLM-based, and increasingly agentic, AI.

See the catalog of Industry AI Use Cases for sector-level examples.

What are AI Agents?

LLMs can think better and solve harder problems when given more tokens. If you give an LLM like ChatGPT a complex problem (e.g., plan a 5-day AI conference in Singapore) in a single turn, it will probably suggest an agenda, perhaps some hotels for the conference and some activities for the networking dinner; it tries its best to find a solution within that chat turn. It cannot, however, within that turn verify whether the hotel has vacancies during that period, send out the call for papers, or fully set up the conference review system and email reviewers. To be fair, humans too require multiple steps to organize a successful AI conference.

The idea of AI agents is to solve these complex problems by using the LLM in a loop (giving it much more time and many more tokens). Agentic AI will break a high-level objective into a step-by-step plan, execute actions, observe the results, and iterate. If an agent hits a snag, say it tries to book the hotel venue and the API returns an error, it won't just stop or hallucinate success. Instead, within that loop the agent can read and understand the error, self-correct, and retry (perhaps booking another hotel that has availability). This ability to maintain state and pursue a goal over time is what differentiates an AI agent: it has a level of autonomy, or agency.
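
To make the loop concrete, here is a minimal sketch in Python. Everything in it is illustrative: the call_llm stub stands in for whatever chat-completion API you use, and search_hotels / book_venue are hypothetical tools, not any specific framework's API.

```python
import json

# Hypothetical tool implementations; a real agent would call live APIs.
def search_hotels(city: str, dates: str) -> list[dict]:
    return [{"name": "Marina Conference Hotel", "available": True}]

def book_venue(hotel: str, dates: str) -> dict:
    return {"status": "confirmed", "hotel": hotel}

TOOLS = {"search_hotels": search_hotels, "book_venue": book_venue}

def call_llm(messages: list[dict]) -> dict:
    """Stand-in for a chat-completion call. Assumed to return either a tool
    request {"tool": ..., "args": {...}} or a final answer {"final": ...}."""
    raise NotImplementedError("wire up your LLM provider here")

def run_agent(goal: str, max_steps: int = 10) -> str:
    # The growing message history is the agent's working memory (its state).
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        decision = call_llm(messages)
        if "final" in decision:                # the model declares it is done
            return decision["final"]
        tool = TOOLS[decision["tool"]]
        try:
            result = tool(**decision["args"])  # act on the outside world
        except Exception as err:
            # Feed the error back instead of stopping, so the model can
            # read it, self-correct, and retry on the next iteration.
            result = {"error": str(err)}
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "Stopped: step budget exhausted"
```

The important detail is that errors are fed back into the conversation rather than terminating it, which is what enables the self-correction described above.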

The Economic Thesis: Jevons Paradox in Action

Aaron Levie’s observation above captures the macroeconomic shift perfectly. We are seeing a classic Jevons Paradox in knowledge work: as the efficiency of performing a task increases (driven by lower inference costs and higher agent reliability), the consumption of that task explodes rather than contracts.

  • The "Cost of I": Traditional automation required high implementation costs (Capex). Agents lower this "Cost of Investment" by handling non-deterministic edge cases that previously broke rigid scripts.
  • Infrastructure Spend: This shift is visible in hard dollars. Enterprise spend on AI infrastructure hit $47.4 billion in the first half of 2024 alone (up 97% YoY), signaling that organizations now view agents as foundational rather than experimental.
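
A toy calculation (illustrative numbers only, not from any report) shows how collapsing the cost of "I" flips tasks from not-worth-doing to worth-doing, which is exactly the Jevons dynamic:

```python
# Illustrative numbers only: a back-office task that returns $2,000 of value.
task_return = 2_000

human_cost = 8_000   # hypothetical cost of doing the task manually
agent_cost = 400     # hypothetical cost of running an agent on the same task

def roi(r: float, i: float) -> float:
    return (r - i) / i

print(f"Human ROI: {roi(task_return, human_cost):+.0%}")  # -75%: never gets done
print(f"Agent ROI: {roi(task_return, agent_cost):+.0%}")  # +400%: now worth doing
```

The return never changed; only the investment did, yet the set of tasks worth automating expands dramatically.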

The Technical Inflection: "Flow Engineering"

What changed in 2025 wasn't just model intelligence, but reliability. We moved from "Prompt Engineering" (optimizing a single turn) to "Flow Engineering" (optimizing the system of turns).
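
As a rough illustration of the difference, a "flow" replaces one open-ended prompt with several narrow model calls gated by deterministic feedback. The sketch below is generic Python, not any particular framework: llm and run_tests are assumed caller-supplied callables, and the stage breakdown is illustrative.

```python
# Illustrative "flow": several narrow model calls instead of one big prompt.
# `llm` and `run_tests` are assumed to be caller-supplied callables, where
# run_tests(code) returns an object with .passed and .log attributes.
def flow_engineer(task: str, llm, run_tests, max_repairs: int = 3) -> str:
    plan = llm(f"Write a short implementation plan for: {task}")
    code = llm(f"Implement this plan as Python code:\n{plan}")
    for _ in range(max_repairs):
        report = run_tests(code)      # deterministic feedback, not "vibes"
        if report.passed:
            return code
        code = llm(f"The tests failed:\n{report.log}\nFix this code:\n{code}")
    return code  # best effort after exhausting the repair budget
```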

The Benchmarks

The improvement in agentic performance is quantifiable and steep:

  • SWE-bench (Software Engineering): In 2023, AI systems could solve just 4.4% of real-world GitHub issues. By early 2025, assisted systems jumped to 71.7% resolution rates.
  • Unassisted Autonomy: Even on the more rigorous "unassisted" baselines, agents like Devin moved from ~1.96% to ~13.86% success rates, proving capability in complex, multi-step reasoning without human hand-holding.

The Aider Polyglot benchmark tells a similar story: the leading closed-source assistants keep climbing, with the GPT, Claude, and Gemini model families each posting new highs through 2025.

Frontier coding assistants keep pushing Aider Polyglot accuracy higher. Source: Aider LLM Leaderboards

SWE-bench Bash scores continue to climb among frontier assistant models. Source: swebench.com

Deployment: The "Build vs. Buy" Equilibrium

The market has matured into distinct adoption patterns. According to McKinsey's latest State of AI data, 62% of enterprises are now experimenting with agents, with 23% successfully scaling them into production environments.

The adoption is bifurcated:

  • The "Buy" Side: Organizations are purchasing "Agentic RAG" solutions for customer experience (CX), where agents don't just answer questions but execute tasks (e.g., processing refunds via APIs).
  • The "Build" Side: Developers are utilizing frameworks like LangGraph and the Model Context Protocol (MCP) to build custom orchestration layers. The sweet spot for reliability appears to be agents limited to 5–10 specific tools; beyond this, performance degrades due to context overload.

Coding Agents Prove Value

Coding-focused CLIs are seeing sustained usage growth, signaling real-world value for day-to-day developer workflows.

Average monthly npm downloads across coding agent CLIs. Source: npm

What to Watch: The New Bottlenecks

As we move into 2026, the constraints are shifting from capability to governance.

  • Evaluation is the New Gold: You cannot ship what you cannot measure. Traditional "vibes-based" evaluation is being replaced by rigorous testing sets (like SWE-bench Verified) to prevent regression.
  • Safety & Compliance: With the EU AI Act entering enforcement phases in 2025, "auditability" is no longer optional. Agents must leave a "paper trail" of their reasoning steps to remain compliant in enterprise sectors like Finance and Healthcare.
  • The Human-in-the-Loop: We are not removing humans; we are moving them "up the stack." The human role is shifting from operator to orchestrator, reviewing agent plans rather than executing individual steps.
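
For the auditability point above, here is one sketch of what such a "paper trail" might look like: an append-only JSON-lines log of every plan, tool call, and observation that a reviewer can replay later. The field names are illustrative, not drawn from the EU AI Act or any specific framework.

```python
import json
import time
import uuid

class AuditTrail:
    """Append-only record of an agent run, written as JSON lines."""

    def __init__(self, path: str, agent_id: str):
        self.path = path
        self.agent_id = agent_id
        self.run_id = str(uuid.uuid4())

    def record(self, step_type: str, content: dict) -> None:
        entry = {
            "run_id": self.run_id,
            "agent_id": self.agent_id,
            "ts": time.time(),
            "step_type": step_type,  # e.g. "plan", "tool_call", "observation"
            "content": content,
        }
        with open(self.path, "a") as f:
            f.write(json.dumps(entry) + "\n")

# Usage inside an agent loop (illustrative):
trail = AuditTrail("runs.jsonl", agent_id="refund-agent")
trail.record("plan", {"text": "Verify order A123, refund if within policy"})
trail.record("tool_call", {"tool": "issue_refund", "args": {"order_id": "A123"}})
```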

Conclusion: The agentic revolution is not about replacing workers; it is about giving every worker the leverage of a 10,000-person organization. The cost of "doing" is collapsing, and the value of "directing" is at an all-time high.
