My AI Code Review Workflow: Review the Change, Not the Typing
The code got cheap. The review did not. Here is the review workflow I actually use when AI agents write most of the diff: three passes, an adversarial second agent, and the discipline of reading what ships.
The code got cheap. The review did not.
That is the whole situation in two sentences. When an agent can produce a competent multi-file change in minutes, the bottleneck moves to the person reading the diff. Most of the advice I see about AI code review is written by people who have not lived there yet. This is the workflow I actually use, every day, on real systems.
The shape of the problem
Reviewing agent code is not like reviewing a colleague's PR. A colleague gets tired and writes less. An agent does not, so the volume is relentless and the failure modes are different. An agent's code usually compiles, usually passes the obvious test, and usually looks right. When it is wrong, it is wrong in a confident, well-formatted way. The typos are gone. The bugs wear suits now.
So the review cannot be a typo hunt. It has to operate at the level of the change itself: what was asked for, what was delivered, and what came along for the ride.
Three passes
I read every diff three times, fast. Each pass asks one question.
Pass one: does it do the thing. Not "does it look like it does the thing." I run it. Tests, build, the actual command, the actual page. If the agent says tests pass, I want to see them pass. A green checkmark I did not watch happen is a rumor.
Pass two: does it belong. This is the architecture pass. Is the change at the right altitude? Did the agent solve the problem where the problem lives, or did it patch the symptom three layers up? Did it invent a new pattern when the codebase already has one? Agents are eager. They will happily add a second way to do something rather than find the first way. This pass is where most of my rejections come from.
Pass three: what came along for the ride. Scope creep, dependency additions, config changes, the helpful little refactor nobody asked for. I read the file list before I read the files. If a one-line fix touches nine files, the diff has a story to tell and I want to hear it before I approve it.
Three passes sounds slow. It is not. Each one is a single question, and a single question reads fast. Ten minutes on a medium diff. The discipline is not the time, it is refusing to let the passes blur together into one vague skim.
The adversarial second agent
Here is the part that changed my practice this year: I have one agent review another agent's work, and I prompt the reviewer to refute, not to assess.
"Assess this change" gets you a polite summary. "Find what is wrong with this change, assume something is" gets you a hunter. The reviewer agent does not get tired, does not get attached to the code, and was not the one who wrote it, so it has no ego in the game. It catches a real bug maybe one time in five. That rate is worth it, because the cost is a few minutes of wall-clock I was spending anyway on pass one.
The verdicts still come to me. The agents generate findings; I generate judgment. That division has not moved and I do not expect it to.
Gates that earned their place
Every automated check in my pipeline is there because something bit me once. Lint, typecheck, a security scan, and a couple of project-specific checks that would sound strange anywhere else but exist because a specific deploy broke a specific way. The pre-commit chain on this site runs four checks, and I can tell you the incident behind each one.
That is my test for adding a gate: it has to map to a real failure, not a hypothetical one. Speculative gates rot. Earned gates get maintained, because you remember the bite.
When an agent's change passes the gates, that clears the floor, not the bar. Gates catch the known failure modes. The review catches the new ones.
What I refuse to automate
The reading. That is the line, though I want to be honest about how the line holds.
I will let agents write the code, run the tests, draft the commit message, and argue with each other about the diff. The standard I hold for myself is that I read what ships, and most of the time I meet it. I am not going to pretend it is absolute. There are changes I end up verifying mostly through testing: I run the suite, I exercise the real behavior, I watch it work, and I read the shape of the diff without tracing every line. A mechanical sweep across forty files, generated test coverage, a migration I have watched succeed before. That is not me recommending it. It is what a high-volume day actually looks like when the writing gets ahead of the reading.
What does not move is the direction. I still have a lot of code to review, always, and reading is the thing I protect first and give up last. The day you stop reading is the day your mental model of the system starts to drift, and a senior engineer with a stale mental model is just a slow junior with better stories.
The review is where you stay current with your own codebase. Hand that away and you have handed away the job.
Where this fits
This workflow is one loop inside a larger working method: specify, delegate, verify, record. I wrote the full manual in The AI-Assisted Engineering HOWTO, and the argument for why this is a discipline and not a shortcut in AI-Assisted Development Is Not Vibe Coding. If agent mode is still a fuzzy term, start with What Is Agent Mode? When the agents start reviewing the systems that build the systems, that is Recursive DevOps, and it is closer than it sounds.
The longer argument is the book: AgentSpek, free to read here.