Agentic AI is the New Cloud (And You're Already Overpaying)
Agentic AI is becoming the new Cloud. And we are repeating the exact same lazy, expensive mistakes.

Once upon a time, in a prehistoric digital landscape where “the cloud” was just something that ruined weekend barbecues, software engineers actually had to care about things like optimization.
If an application ran slowly, you didn’t just throw money at the problem. You opened up the code, profiled the execution paths, and looked for bottlenecks. The reason for this wasn’t necessarily pure engineering pride; it was simple math. The time and mental energy required to optimize a loop or refactor a database query was significantly lower than the bureaucratic, multi-week obstacle course required to order, receive, and rack a new physical server in a data center.
And then, the clouds gathered.
Initially, they were light, fluffy things—providing a bit of shade and relief from infrastructure headaches. But over time, they completely blocked out the sun.
The cloud fundamentally rewrote the cost equation. Suddenly, spinning up a new virtual machine or scaling horizontally became a matter of seconds and a few clicks. Why spend a week of expensive developer time squeezing a 10% performance gain out of a legacy service when you can just spin up five more AWS instances in five seconds? It’s just pennies per hour, right?
Except pennies accumulate.
Over time, those easy-breezy cloud bills began quietly compounding into tens of thousands, and eventually millions, of dollars. Very few companies had the time or discipline to meticulously calibrate, size, and monitor their infrastructure.
(Fun fact: I recently audited several hundred database instances across a corporate infrastructure—a topic I’ll be deep-diving into soon on a new, dedicated platform filled with concrete case studies. By simply resizing misconfigured resources and fixing lazy setups, I identified over 50% in infrastructure savings—representing a healthy, six-figure sum saved monthly. All of this was achieved with zero performance loss, and in a few places, actual performance improvements).
But this post isn’t about cloud waste. It’s about history repeating itself.
Because right now, Agentic AI is becoming the new Cloud. And we are repeating the exact same lazy, expensive mistakes.
The Illusion of the Low-Cost “Magic”
Watch any developer run tools like Claude Code, Antigravity, or whatever agentic coding companion they’ve adopted this week. It feels like pure witchcraft. You describe a complex feature, provide a few markdown files, hit enter, and wait. The agent hums in the background, analyzing files, executing shell commands, and generating code.
It is brilliant. It feels effortless.
But let’s pull back the curtain and look at the actual plumbing of what is happening.
Your task context—your workspace, your project structure, your active files—is sitting locally on your machine. Meanwhile, the heavy intellectual lifting is happening on the model provider’s remote servers. Here is the catch: these large language models are stateless.
THE STATELESS CHATTERBOX

[ Your Machine ]                    [ API Provider ]
        │                                   │
        ├──────── Entire Context ──────────►│  (Reads 50,000 tokens)
        │       (Every single step!)        │
        │                                   │
        │◄──────── Next Action ─────────────┤  (Generates 200 tokens)
        │                                   │

This means that with every single iterative step, prompt, or tool call, the agent has to package up your entire context (your markdown instructions, the files you are actively editing, your previous prompts, and the agent’s own intermediate thoughts) and ship it back to the API.
The model doesn’t “remember” the last turn. It has to re-read the entire history of the universe every single time you ask it to change a variable name. It is literally getting hit in the face with a massive wall of text on every single loop. And the model provider’s meter is happily ticking away, charging you per token.
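To put rough numbers on that loop, here is a minimal Python sketch. It borrows the figures from the diagram (50,000 tokens of context, 200 new tokens per step) and makes two simplifying assumptions of my own: nothing is ever trimmed from the history, and no prompt-caching discount applies. The function name is invented for illustration.

```python
def cumulative_input_tokens(base_context: int, added_per_step: int, steps: int) -> int:
    """Total input tokens read when the full history is resent on every step."""
    total = 0
    context = base_context
    for _ in range(steps):
        total += context           # the whole context is read again
        context += added_per_step  # the new output joins the history
    return total


# Hypothetical session: 50,000 tokens of context, 200 new tokens per step.
one_step = cumulative_input_tokens(50_000, 200, 1)
thirty_steps = cumulative_input_tokens(50_000, 200, 30)
print(one_step, thirty_steps)
```

Under these assumptions, thirty steps re-read roughly 1.59 million input tokens in order to produce about 6,000 tokens of output. Caching and context compaction change the price, but not the shape of the curve.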
The $21 Bibliography: A Lesson in Token Drain
To see how quickly this scales out of control, let me share a personal case study.
I recently used Claude Code to build the bibliography for FAvol2 (you can see the final, rendered results in action right here). The task itself was straightforward, repetitive, and data-heavy: taking a 135-page Google Document—roughly 422KB of raw markdown text full of chaotic links, annotations, and research materials—and transforming it into highly structured, strictly formatted YAML files. These YAML files are what I use to programmatically render the sources view on my site.
(I won’t get bogged down in the specific mechanics of how that rendering view works, or why structured YAML is such a massive sanity-saver compared to manually writing hundreds of custom HTML elements—though I might write a dedicated post on how this approach saves me countless hours of maintenance frustration if enough people show interest, so let me know).
The important takeaway is that I was forcing an agent to take a loosely structured, highly repetitive document, clean it up, and squeeze it into a formalized YAML schema.
Naturally, I could have spent a day writing a custom Python script to automate pulling archived links, parsing metadata, and formatting the output. Instead, I decided to let the agent do it.
During the initial run, I naively dumped the entire context into the agent’s lap and let it run its planning loop. For every single chapter, the agent loaded tens of thousands of tokens of context just to output a tiny, clean YAML block. By the time I finished the second chapter, the billing dashboard stopped me cold.
The first two chapters had cost $21 in API fees. At that rate, the entire bibliography would have cost over $200. I stopped, took back the “context management” duties, and shifted to a semi-manual workflow inside the chat: I fed the model only the target YAML structure and pasted the raw text of exactly one chapter at a time.
The result? The remaining 19 chapters cost less than $5 in total.
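The arithmetic behind those figures is easy to check. The sketch below uses only the numbers quoted in this post. It assumes all 21 chapters are of comparable size, and it treats "less than $5" as an upper bound of $5, so the real reduction is at least what it prints.

```python
naive_cost, naive_chapters = 21.00, 2
managed_cost_upper, managed_chapters = 5.00, 19  # "less than $5": an upper bound

naive_per_chapter = naive_cost / naive_chapters
total_chapters = naive_chapters + managed_chapters
projected_naive_total = naive_per_chapter * total_chapters

managed_per_chapter = managed_cost_upper / managed_chapters
reduction = 1 - managed_per_chapter / naive_per_chapter

print(f"Naive cost per chapter:    ${naive_per_chapter:.2f}")
print(f"Projected naive total:     ${projected_naive_total:.2f}")
print(f"Managed cost per chapter:  at most ${managed_per_chapter:.2f}")
print(f"Per-chapter reduction:     at least {reduction:.1%}")
```

That is where the "over $200" projection comes from: $10.50 per chapter across 21 chapters.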
THE BIBLIOGRAPHY EXPERIMENT (FAvol2)

Naive "Agent" Approach (2 Chapters):     ██████████████ $21.00
Managed Context Approach (19 Chapters):  █ < $5.00

The standard counterargument here is predictable: “Developer time is worth more than a few API tokens.” And that is absolutely true, in isolation. Just like developer time was worth more than a single cloud server in 2012. But this logic falls apart when it scales. It works until you have dozens of developers running agentic loops all day, and your monthly LLM API bill suddenly rivals the payroll of your entire engineering department.
The solution to this problem has existed for decades: divide and conquer. By splitting a large task into smaller, highly focused components, we can often shave 80% to 90% off our token consumption; in the bibliography run above, the per-chapter bill dropped even further. But doing that requires developers to do something deeply unfashionable: actually maintain a mental map of their codebase and understand what they are doing.
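As a rough illustration of what divide and conquer looks like for the bibliography task, here is a minimal Python sketch. It is not the workflow I actually used (that was manual copy and paste); the function names, the placeholder schema, and the assumption that every chapter starts with a top-level "# " heading are all hypothetical.

```python
import re


def split_chapters(markdown: str) -> list[str]:
    """Split a markdown document into chunks, one per top-level '# ' heading."""
    parts = re.split(r"(?m)^(?=# )", markdown)
    return [part.strip() for part in parts if part.strip()]


def build_prompts(schema: str, chapters: list[str]) -> list[str]:
    """Build one small prompt per chapter: the target schema plus that chapter only."""
    return [
        f"Target YAML structure:\n{schema}\n\nSource text:\n{chapter}"
        for chapter in chapters
    ]


# Hypothetical two-chapter document and placeholder schema.
doc = "# Chapter 1\nlink a\nlink b\n\n# Chapter 2\nlink c\n"
schema = "- title: <string>\n  url: <string>"

prompts = build_prompts(schema, split_chapters(doc))
print(len(prompts))
```

Each prompt carries the schema and exactly one chapter, so the model never sees (or bills you for) the other twenty.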
The Illusion of Ease: Enter “PromptOps”
Just as the cloud didn’t actually eliminate infrastructure management, Agentic AI isn’t eliminating development overhead. It is simply shifting it to a new layer of abstraction.
When companies migrated to the cloud to “save money on sysadmins,” they quickly realized they needed to build entire departments just to keep the cloud from burning a hole in their wallets. We saw the rise of Platform Engineering, DevOps, SRE, FinOps, and Cloud Security. Yes, these roles existed in some form before, but the sheer complexity of cloud ecosystems made them indispensable, highly specialized, and incredibly expensive.
We are seeing the exact same pattern emerge with Agentic AI.
To make agents work reliably, you don’t just write code anymore. You maintain a complex web of supporting infrastructure:
- Writing and version-controlling instruction files (CLAUDE.md, GEMINI.md, system prompts).
- Tuning prompt chains and routing logic.
- Designing, hosting, and securing MCP (Model Context Protocol) servers to give agents access to your internal databases and APIs.
- Structuring vector databases for Retrieval-Augmented Generation (RAG).
- Developing custom agent plugins and orchestrating massive validation workflows to verify and test them before publishing to dedicated corporate “Plugin Marketplaces” (which, naturally, requires yet another engineering squad to build and maintain).
Before you know it, the engineering hours you supposedly “saved” by letting an AI write your boilerplate code are being reinvested into a newly minted “AI Platform Support” team whose sole job is to keep the prompts from breaking and the API bills from bankrupting the company.
(And naturally, because we are currently at the absolute peak of the hype cycle, the median salary for a “GenAI Enablement Specialist” is comfortably higher than that of the seasoned engineers who actually keep the lights on. Think of it as the trendiness premium—paid for, quite ironically, by the exact same “savings” the AI was supposed to deliver in the first place).
The Illusion of Understanding
The most dangerous byproduct of these layers of abstraction is the illusion of knowledge.
As a platform engineer, a significant part of my job has always been diplomatic: learning how to gently explain to developers that they don’t actually understand how Kubernetes, service meshes, or network routing works, without bruising their egos. To them, spinning up a microservice is just running three lines of Terraform or clicking a button in a portal. Never mind that the security and platform specialists want to curl up into the fetal position and weep at the sight of what is actually being built in the process…
Now, we are importing this exact same illusion into business decision-making.
Management loves fast, confident answers. In many organizations, non-technical executives with a shiny “C” in their title have started demanding dedicated, custom MCP servers. They want to connect an AI agent directly to their operational databases so they can bypass the data team entirely.
They ask two highly logical (to them) questions:
- “Why should I wait an entire sprint for a dashboard from the Data Squad when I can just ask my AI agent to get it for me instantly?”
- “And why should I bother to learn some ‘es-que-el’ when I can just query the data in natural language? Hell, I don’t even need to bother with English!”
It sounds incredibly empowering. But let’s play out the inevitable scenario:
The C-level executive asks the agent a nuanced operational question. The agent, lacking the deep, unwritten context of how the company’s messy databases actually map to real-world operations, hallucinates a join between two incompatible tables and spits out a beautifully formatted, highly confident, but completely incorrect answer.
Meanwhile, the dedicated Data Squad—who actually understand the quirks, the legacy exceptions, and the context of the business—produces a report with a vastly different, far more sober number.
Who do you think the executive is going to trust?
Will they trust the data team that brought them a complicated, nuanced explanation of why business is slow? Or will they trust the stateless, instantly gratifying AI agent that gave them a simple, pleasing answer in five seconds?
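For a feel of how an innocent-looking join goes wrong, here is a small, self-contained sketch using Python’s built-in sqlite3 module. The tables and figures are invented for illustration: orders are stored one row per order, shipments one row per parcel, and one order was shipped in two parcels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE shipments (shipment_id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders VALUES (1, 100.0), (2, 50.0);
    -- Order 1 went out in two parcels, order 2 in one.
    INSERT INTO shipments VALUES (10, 1), (11, 1), (12, 2);
""")

# What the data team reports: revenue summed at the order level.
correct = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# What a naive join produces: order 1 is counted once per parcel.
inflated = conn.execute(
    "SELECT SUM(o.amount) FROM orders o "
    "JOIN shipments s ON s.order_id = o.order_id"
).fetchone()[0]

print(correct, inflated)
conn.close()
```

Both queries run without errors and both return a clean, confident number. Only one of them is revenue.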
Tesler’s Law Always Collects Its Debt
None of this should surprise us. It is simply a modern manifestation of a fundamental truth of systems design: Tesler’s Law of Conservation of Complexity.
As I’ve written about before, complexity in any system is a constant. It cannot be destroyed. It can only be moved.
When you make things simpler for the end-user, you make them more complex for the developer. When you make development “simpler” by introducing cloud scaling, you move the complexity to the infrastructure and the billing department. And when you make coding “simpler” by offloading it to autonomous agents, you move the complexity to context management, prompt architecture, and the cognitive load of verifying code you didn’t actually write.
Agentic AI is an incredibly powerful tool. But if we treat it as a magic wand that allows us to stop thinking, we will find out—just as we did with the cloud—that the bill always arrives. And it’s usually much higher than we expected.
There is an entire chapter dedicated to cloud costs spiraling out of control in Fuckup Almanac Vol. 1. If you found this post interesting, you might want to check it out.



