Agentic AI Development: How to Build Software with AI Agent Workflows
Agentic AI development is not chatting with a model about your code. It is giving an agent real access to your repository and directing it. A working reference on agent mode, the patterns that hold, the failure modes, and what changes day to day.
I have been writing software professionally for over fifteen years. For most of that time, the AI in my editor was an autocomplete that finished the line I was already typing. That is not what this essay is about. This essay is about agent mode: handing a capable model real access to your repository, your terminal, and your test suite, then directing it the way you would direct a fast junior engineer who reads the whole codebase before lunch.
That shift has a name now. People call it agentic AI development, or agentic software development, and the words have started to blur together the way new words do. So let me be precise about what I mean, because the precision is the entire point.
Agentic development is not "use AI to write code faster." It is structuring your work so an autonomous agent can navigate, reason about, and extend a real system, while you supply the judgment it does not have.
I wrote a book about this, AgentSpek, which is free to read in full here and is also available on Amazon. This page is the shorter, practical reference: what the practice actually looks like, the patterns that hold up under production stakes, the failure modes that cost me real time, and how the day changes when the typing goes away.
What agentic development actually is
Three modes of working with a model sit on a spectrum, and the confusion in most discussions comes from collapsing them.
Conversational mode. You paste code into a chat, the model suggests, you copy the suggestion back into your editor. The model has no access to anything you do not hand it. This is where most people still live, and it is genuinely useful for learning and for isolated problems. It is also the slowest, because you are the bus between the model and the work.
Agent mode. The model has tool access. It can read any file in the repository, write changes, run shell commands, execute the test suite, search across the codebase, and report back. You give it a goal at the specification level and it executes the multi-step path to get there. You are no longer the bus. You are the director and the reviewer.
Autonomous mode. The agent runs a long loop with minimal interruption, making and acting on decisions across many steps before checking in. Powerful, and the place where the failure modes get expensive fastest, which is why I treat it with the most caution.
AgentSpek splits these into their own chapters because the working patterns differ. If you want the long version: Chapter 5, the Socratic partner, covers conversational mode done well; Chapter 6, the delegated mind, covers agent mode; Chapter 7, the unleashed intelligence, covers autonomous mode and where its edges are. The rest of this essay lives mostly in agent mode, because that is where the actual leverage is for working engineers in 2026.
Agent mode versus copilot: the difference that matters
The autocomplete generation of AI coding tools, the copilots, operated inside one file at a time and inside your immediate intent. They finished your line. They were a faster keyboard.
Agent mode operates on the system. The distinction is not "smarter autocomplete." It is a different unit of work. A copilot completes a function. An agent reads the codebase, finds the three files that need to change for a feature, makes the changes consistently, runs the tests, notices the one that broke, fixes it, and tells you what it did. The copilot made you type less. The agent makes you type almost nothing, and moves your job up a level to deciding whether the thing it built is the right thing.
I covered the macro version of this in the cornerstone essay How AI Is Changing Software Engineering. The one-line summary holds here: AI did not replace engineers, it moved the bottleneck. The cost of writing code fell to near zero. The cost of deciding what to build, how to architect it, and when to ship became the whole job. Agentic development is the daily practice that sits underneath that sentence.
What changes day to day
This is the part people without hands-on time get wrong, so I want to be concrete about the texture of a working day.
I used to spend a meaningful share of every week typing. Boilerplate, glue code, translating a clear idea into a verbose language, writing the mechanical tests, updating the docs after a refactor. None of it was hard. It was where the hours went anyway.
In agent mode, that share of the day is gone. I describe what I want at the spec level. The agent reads the relevant files, proposes a change, generates the code, runs the suite, and reports. The cycle is minutes, not days. A two-week feature now ships in two days.
What fills the vacated time is not leisure. It is three things.
Reading. I read more code now than at any point in my career, and most of it I did not write. The skill of absorbing an unfamiliar stretch of code, deciding whether it is correct, whether the error handling is right, whether it interacts cleanly with the rest of the system, has become the primary engineering skill. Writing used to be sixty percent of the job. It is closer to ten now. Reading is closer to sixty.
Specifying. A model will build the wrong thing very fast and very confidently. The cost of a bad decision used to be amortized across the weeks it took to implement. Now the wrong thing arrives in an afternoon and you either live with it or back it out. So the work moves upstream, into describing what you want with enough precision that the agent does the right thing on the first pass.
Stopping. Agents keep working. They refactor what did not need refactoring. They add scaffolding for problems you do not have. They make architectural decisions while you are getting coffee. The discipline to interrupt, redirect, and reject is the new craft, and it is the one I underestimated longest.
The shape of the work is different. Not less. More concentrated, more leveraged, and far more dependent on judgment than on output volume.
The patterns that hold
Six months of daily production use, now stretched to well over a year, has turned a handful of patterns into reflexes. These are the ones I would hand to an engineer starting today.
Structure the repository so the agent can navigate it
Good agentic development is downstream of good repository structure. An agent reasons about your system through the artifacts you leave for it, the same way a new hire does. A repository that a human can onboard into quickly is a repository an agent can work in well.
What that looks like in practice:
project/
  README.md                                    # what this is, how it is built
  CLAUDE.md / .github/copilot-instructions.md  # how to work in this repo, conventions
  docs/                                        # architecture, decisions, specs
  scripts/                                     # automation and tooling
  schema/                                      # data contracts and validation
The single most valuable file is the instruction file at the root: the CLAUDE.md or copilot-instructions.md that tells the agent the project philosophy, where things go, the conventions, the gotchas, and what to verify before claiming a task is done. This is not documentation for humans that the agent happens to read. It is the agent's operating manual, and writing it well is one of the highest-leverage things you can do. Chapter 3 of AgentSpek, "Git for the Agentic Age," goes deep on treating the repository as the shared memory between you and the agent.
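As a rough illustration, a repository's layout can be checked mechanically, so that an agent or a CI job notices when a conventional piece is missing. The sketch below is minimal Python and assumes the layout in the tree above. The function name `check_layout` and the exact list of required paths are hypothetical and not part of any tool.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical requirements mirroring the tree above; adjust to your repository.
REQUIRED_FILES = ["README.md"]
REQUIRED_DIRS = ["docs", "scripts", "schema"]
# Either instruction file satisfies the "operating manual" requirement.
INSTRUCTION_FILES = ["CLAUDE.md", ".github/copilot-instructions.md"]


def check_layout(root: str | Path) -> list[str]:
    """Return human-readable problems; an empty list means the layout is complete."""
    root = Path(root)
    problems = []
    for name in REQUIRED_FILES:
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")
    for name in REQUIRED_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing directory: {name}/")
    if not any((root / name).is_file() for name in INSTRUCTION_FILES):
        problems.append("missing instruction file: " + " or ".join(INSTRUCTION_FILES))
    return problems
```

Run from the repository root, `check_layout(".")` returns an empty list when everything is in place. A small script under scripts/ that calls it is one way to add it to the checklist for "done."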
Direct at the specification level, not the keystroke level
The bad prompt is "fix this error" with a pasted traceback. The good prompt tells the agent what you are building, what you expected, and what actually happened.
A real example from this blog. I asked an agent to optimize a seven-part series for search. The instruction was specification-level: maintain the existing frontmatter conventions, keep internal links relative, generate a hub page linking all parts, and run the validator against every post when done. The agent searched for the series posts, read the frontmatter patterns from dozens of existing files, checked the docs directory for the SEO guidelines, modified all seven posts consistently, built the hub page, and ran the validation. I supplied the standard and the goal. It supplied the execution across files I never opened.
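The validator the agent ran is not described here. The minimal Python sketch below shows what such a check might look like. It assumes posts are Markdown files with a header delimited by `---` lines, that `title`, `description`, and `date` are required keys, and that `example.com` stands in for the blog's own domain. All of these are hypothetical placeholders for whatever the real conventions are.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

# Hypothetical conventions; the real ones would live in the repo's instruction file.
REQUIRED_KEYS = {"title", "description", "date"}
SITE_HOST = "example.com"  # placeholder for the blog's own domain

# Markdown inline links: [text](url) or [text](url "title"); captures the url.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Tiny 'key: value' reader for a header delimited by --- lines. Not full YAML."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat the header as missing


def validate_post(text: str) -> list[str]:
    """Return problems found in one post; an empty list means it passes."""
    problems = []
    meta = parse_frontmatter(text)
    if not meta:
        problems.append("missing or unterminated frontmatter")
    for key in sorted(REQUIRED_KEYS - set(meta)):
        problems.append(f"missing frontmatter key: {key}")
    for url in LINK_RE.findall(text):
        host = urlparse(url).hostname or ""
        # Absolute links back to our own site should have been relative.
        if host == SITE_HOST or host.endswith("." + SITE_HOST):
            problems.append(f"internal link should be relative: {url}")
    return problems
```

Running `validate_post` over every post after the agent finishes turns "keep internal links relative" from a request into a check the agent can verify on its own.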
Treat the agent like a junior engineer who reads very fast but cannot read your mind. The precision you put into the spec is the quality you get back in the code.
Let it run the full loop, then verify at the boundaries
An agent that can only write code is half a tool. The leverage shows up when it can run the terminal, manage git, install dependencies, execute tests, and check its own output. The loop closes: write, run, observe, fix, without you mediating each step.
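The loop itself can be sketched in a few lines. This is a hedged illustration in Python and not how any particular agent product works internally. `apply_change` is a hypothetical stand-in for the agent editing files, `run_pytest` assumes the project uses pytest, and `max_attempts` is an assumed budget after which control returns to a human.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from typing import Callable


@dataclass
class SuiteResult:
    passed: bool
    output: str


def run_pytest(cwd: str = ".") -> SuiteResult:
    """Run the suite once; assumes pytest, so swap in your own test command."""
    proc = subprocess.run(["pytest", "-q"], cwd=cwd, capture_output=True, text=True)
    return SuiteResult(passed=proc.returncode == 0, output=proc.stdout + proc.stderr)


def agent_loop(
    apply_change: Callable[[str], None],  # hypothetical: the agent edits files given feedback
    run_tests: Callable[[], SuiteResult],
    max_attempts: int = 3,
) -> tuple[bool, int]:
    """Write, run, observe, fix. Stop when green or when the attempt budget runs out."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        apply_change(feedback)    # write (or fix, once there is feedback)
        result = run_tests()      # run
        if result.passed:         # observe
            return True, attempt
        feedback = result.output  # the failure output drives the next fix
    return False, max_attempts   # out of budget: hand back to a human
```

The attempt cap is a deliberate design choice. An agent that never stops retrying is the same agent that drifts, so the loop ends either green or back in your hands.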
My deploy workflow is the clearest example. I tell the agent to deploy to staging, verify the new content is actually live, and report the URLs for me to check. It checks git status, runs the staging deploy, curls the URLs to confirm, and reports. I verify at the boundary, the moment before production, where I run a diff and read what actually changed. Everything mechanical is the agent's. The judgment call about whether to push to production is mine. Chapter 8, "The Development Loop Reimagined," is the long version of this pattern.
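The step of curling the URLs to confirm can also be written down once instead of improvised each time. The minimal sketch below uses only Python's standard library. It assumes each staging URL should return HTTP 200 and contain a known marker string, such as the new post's title. `verify_live` and the marker convention are hypothetical.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from typing import Callable


def fetch(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Return (HTTP status, body). Network failures and timeouts come back as status 0."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:  # 4xx/5xx responses
        return err.code, ""
    except OSError:  # URLError, timeouts, connection resets
        return 0, ""


def verify_live(
    expected: dict[str, str],
    get: Callable[[str], tuple[int, str]] = fetch,
) -> dict[str, str]:
    """Map each URL to 'ok' or a short reason it is not live with the expected content."""
    report = {}
    for url, marker in expected.items():
        status, body = get(url)
        if status != 200:
            report[url] = f"HTTP {status}"
        elif marker not in body:
            report[url] = "live, but expected content not found"
        else:
            report[url] = "ok"
    return report
```

The `get` parameter lets the check run against a fake in tests. The report is the part to hand to the human at the boundary, just before anything is pushed to production.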
Encode your standards once, in files, not in every prompt
If you find yourself telling the agent the same thing in every conversation, that thing belongs in an instruction file. Code style, where files go, testing protocol, commit conventions, the checklist to run before declaring done. Encode it once. The agent follows it automatically, and you stop spending judgment on enforcement that a file can do.
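Here is one way to encode a "done" checklist in a file, as a minimal Python sketch. The check names, the commands (pytest, ruff, and a hypothetical `scripts/validate.py`), and the `run_checklist` helper are all assumptions for illustration. What matters is that the list lives in the repository and not in every prompt.

```python
from __future__ import annotations

import subprocess
import sys

# Hypothetical "done" checklist: the kind of thing the instruction file would
# point the agent at, so it is never retyped in a prompt.
DONE_CHECKLIST = [
    ("tests pass", ["pytest", "-q"]),
    ("lint clean", ["ruff", "check", "."]),
    ("content validates", [sys.executable, "scripts/validate.py"]),
]


def run_checklist(checklist: list[tuple[str, list[str]]]) -> list[tuple[str, bool]]:
    """Run every check without stopping at the first failure, and report each outcome."""
    results = []
    for name, cmd in checklist:
        try:
            ok = subprocess.run(cmd, capture_output=True).returncode == 0
        except FileNotFoundError:  # a tool that is not installed counts as a failure
            ok = False
        results.append((name, ok))
    return results


def is_done(results: list[tuple[str, bool]]) -> bool:
    """A task is done only when every check passed."""
    return all(ok for _, ok in results)
```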
Keep the human as the architect and the quality gate
This is the load-bearing pattern, and it is the one the hype gets backwards. The agent handles execution autonomy. You handle strategic direction and final verification. What stays yours: deciding what to build, the architecture choices when the agent offers options, security review of anything touching auth or user input or data handling, and the final call on whether the work is good. Chapter 9, "Quality in the Age of Generation," is entirely about how to hold the quality line when the volume of generated code goes up by an order of magnitude.
The failure modes, named honestly
Anyone selling you frictionless agentic development is selling you something. Here are the failure modes I have actually hit, so you can recognize them faster than I did.
Confident wrongness at speed. The most expensive one. An agent will generate a beautifully clean solution to the wrong problem and present it with total assurance. There is no hesitation in the output to warn you. The defense is the spec and the review, not the prompt. If the spec was vague, the confident wrong answer is partly yours.
Architectural drift while you are not watching. In a long autonomous run, an agent makes small decisions that compound. Each one is locally reasonable. The sum is an architecture you did not choose. The defense is to interrupt earlier than feels necessary and to keep the unit of delegated work small enough that you can still review the whole of it.
Refactoring you did not ask for. Agents are eager. They will tidy code that was working, rename things across files, and "improve" patterns that were intentional. Sometimes this is great. Sometimes it quietly breaks a tribal-knowledge invariant that nobody wrote down. The defense is a tight scope and a diff you actually read.
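Keeping the scope tight can be partly mechanical. The minimal Python sketch below flags changes outside an agreed set of path prefixes and any diff larger than the line budget you are willing to read. It parses `git diff --numstat` output, which gives added, deleted, and path per line, with `-` counts for binary files. The helper names and thresholds are hypothetical. Renamed files, which numstat reports as `old => new`, are not handled specially.

```python
from __future__ import annotations

import subprocess


def numstat(base: str = "HEAD") -> str:
    """Raw `git diff --numstat` output for working-tree changes against base."""
    return subprocess.run(
        ["git", "diff", "--numstat", base], capture_output=True, text=True, check=True
    ).stdout


def scope_violations(
    numstat_text: str, allowed_prefixes: tuple[str, ...], max_lines: int
) -> list[str]:
    """Flag files outside the agreed scope and diffs too large to read in full."""
    problems = []
    total = 0
    for line in numstat_text.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        if added != "-":  # binary files report "-" for both counts
            total += int(added) + int(deleted)
        if not path.startswith(allowed_prefixes):
            problems.append(f"out of scope: {path}")
    if total > max_lines:
        problems.append(f"diff too large to review: {total} lines > {max_lines}")
    return problems
```

For example, `scope_violations(numstat(), ("scripts/",), 300)` would surface an unrequested rename in src/ before you approve the change. It does not replace reading the diff. It tells you when the diff has grown past what you can actually read.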
Sandbox-green, production-red. The agent ships code that passes every test in the sandbox and falls over in production, because it does not know your real environment: the IAM posture, the rate limits, the data shapes at scale, the thing that only fails at 3am. Knowing the system you operate in is, if anything, more valuable now, not less. The agent does not have that knowledge and cannot fake it.
The over-trust spiral. The work is so fast and so often correct that you stop reading carefully. Then a subtle bug ships because you reviewed the third PR of the morning at the same depth you reviewed your own code five years ago, which is to say not at all, because you wrote it. The defense is discipline: the agent removed the typing, it did not remove the review.
None of these are reasons to avoid agent mode. They are the reasons to bring real engineering judgment to it. The agents have made it impossible to fake the systems-thinking part of the job. That is a feature.
When agentic development works, and when it does not
It works best when the problem is well-specified and the system is legible. Greenfield features inside a clean architecture. Mechanical migrations across many files. Scaffolding new services that follow an established pattern. Writing the tests for code whose behavior you can describe precisely. Anything where the hard part is the volume of careful, consistent execution rather than the novelty of the idea.
It struggles when the problem is underspecified, when the system carries undocumented invariants, when the right answer depends on context that lives only in someone's head, or when the task is genuinely novel architecture with no pattern to follow. In those cases the agent will still produce something, fast and confident, and that something will be the wrong thing dressed convincingly.
The honest rule: the more precisely you can describe the target and the more legible the system, the more the agent multiplies you. The fuzzier the target, the more it multiplies your mistakes.
This is not vibe coding, the practice of typing a prompt, feeling roughly okay about the output, and shipping. I wrote a separate essay on that distinction, AI-Assisted Development Is Not Vibe Coding. Agent mode in the hands of a disciplined engineer is a force multiplier. The same tools in the hands of someone who never learned to read code carefully produce confident output nobody can maintain. The market will discover the difference at significant expense.
What I have actually built this way
I do not write about this from the sidelines. The patterns above came out of shipping real things, and the most useful proof is the work that is not software in the traditional sense.
Over a handful of sessions I built a small catalog of films entirely from code: stick-figure animation, generated chiptune scores, the whole pipeline driven by an agent in the director's chair. The write-ups Four Films From Code and Humagent, and the Road There document how six films came out of one session and one engine. There is a Python program that stages a rap battle between Plan 9 and a movie character, rendered to audio, all of it agent-built and the code published. The full catalog lives on the films page, and the broader collection of agent-built work is on the projects page.
The point of those is not the films. The point is that the same agentic patterns that ship a CDK stack also ship a music-and-animation pipeline in an afternoon, because the bottleneck in both cases was never the typing. It was the direction. When the direction is clear, the agent fills in an astonishing amount of execution across domains it has never been told are different.
On the infrastructure side, the same year of practice produced a CDK migration that would have been a two-to-three-month project, done in a week; content pipelines that processed over a thousand journal entries; and full-stack applications shipped solo at production quality. The AI Development Revolution series documents that arc in real time, seven parts written as it happened. And the current workstreams are in What I'm Building Right Now.
A staged way in
If you are an engineer who wants to actually practice this rather than read about it, here is the on-ramp I would give you.
Week one, structure. Add a real README and a root instruction file to one repository you own. Write down the conventions, the gotchas, the architecture, and the checklist for "done." This file is the foundation everything else stands on.
Week two, agent mode on a small task. Use a capable model in agent mode, not chat mode. Give it something bounded: "read the codebase and summarize the architecture," then "add a script in scripts/ that follows the existing patterns." Watch whether it respects your conventions. Fix the instruction file where it does not.
Week three, multi-file work. Hand it a task that touches several files at once. Let it search for existing patterns before it writes anything. Review how it organizes the change. Read the whole diff.
Week four, the full loop. Give it a high-level goal and let it run research, implementation, tests, and deployment prep. Supervise without micromanaging each action. Verify at the boundaries: architecture, security, and the final result.
The mindset shift underneath all four weeks is the same one I keep coming back to. You are not chatting with an AI to write code. You are architecting systems an agent can navigate and extend autonomously, and supplying the judgment it does not have. Chapter 4, "Agent Mode (The Way That Works)," is the chapter I would read first if I were starting over.
The skills that compound
Strip away the tooling and the same handful of skills keep paying off, and they are the ones agents make more valuable rather than less.
Specification writing, the ability to describe what you want precisely enough that a fast reader does the right thing the first time. Code reading at speed, because you will read far more than you write. Architecture instincts that hold up under pressure, because the agent moves fast and bad architecture now surfaces in hours instead of weeks while the cleanup cost stays the same. Knowing the real system you operate in, the production environment the agent cannot see. And restraint, the discipline to stop, which is the most expensive thing to lack now that doing more has never been cheaper.
The engineers who already had judgment are operating at multiples of their previous output. The ones who were coasting on typing volume are exposed. The agents do typing volume for free.
That is the honest field report. Agentic development did not make the work easier. It made it more concentrated, more leveraged, and more dependent on the parts of engineering that were always the actual job. The best version of this practice is not faster vibe coding. It is disciplined systems engineering with the keyboard handed to a very fast, very literal partner, while you keep the architecture, the review, and the final call.
Continue reading
- How AI Is Changing Software Engineering, the cornerstone field report on how the bottleneck moved
- DevOps Beyond Automation, what compounds in a platform-engineering career when agents do the typing
- AI-Assisted Development Is Not Vibe Coding, the standalone essay on the distinction
- AgentSpek, the book-length treatment of disciplined agent-mode engineering, free here and on Amazon
- The AI Development Revolution series, seven parts written in real time
- Films built from code and the projects catalog, the agent-built work this practice produced