The AI-Assisted Engineering HOWTO
A practical, plain-spoken manual for building software with an AI agent: the setup, the loop, context management, and verified Claude Code and Copilot commands. Written like the old Linux HOWTOs, by someone who runs it daily.
Revision 1.1, June 2026. This is a HOWTO in the old sense: a plain manual you can read top to bottom and then go do the thing. I grew up on the Linux Documentation Project, learning the internet one HOWTO at a time. I wanted to write one for years. This is that document, for the skill I use most now. It has high-level tips and it has commands you can paste. Come back to it.
0. Introduction
AI-assisted engineering is building software by directing an AI agent instead of typing most of the code yourself. Done well, it is the biggest change to how I work in fifteen years. Done badly, it produces confident garbage faster than you can read it.
This manual is the difference between the two. It is the setup, the loop, the commands, and the failure modes, written for an engineer who has used a chat assistant a few times and now wants to work this way for real, on a codebase that matters.
I run this with two tools and two models. Claude Code in the terminal. GitHub Copilot agent mode in VS Code. Sonnet for most of it, Opus when the thinking gets hard. The method outlives any one tool, but I am going to be specific, because a HOWTO with no commands in it is just an opinion.
0.1 Who this is for
You write or maintain real software. You have opened an AI chat window, pasted some code, and gotten useful answers back. You suspect there is more to it. There is. You do not need to be an expert. You do need to stay responsible for the output.
0.2 What this is not
This is not vibe coding, the practice of prompting and hoping and shipping whatever falls out. If you want that line drawn clearly, read AI-Assisted Development Is Not Vibe Coding first, then come back. It is also not a magic-prompt collection. There is no incantation. There is a method.
1. What you need before you start
A short list. The rest of the document assumes you have it.
- A real project under version control. Git, committed, clean working tree. This is your safety net and it is not optional. If you cannot get back to a known-good state in one command, fix that first.
- A test suite, even a thin one. The agent will run it. Tests are how the machine checks its own work so you do not have to read every line by hand.
- At least one agent with hands. Claude Code in the terminal, or Copilot agent mode in VS Code, or both. A chat box you paste into is the training-wheels version. You want the version that can read the repo, edit files, and run commands.
- A few minutes to write down how your project works. That file beats any prompt trick, and section 4 is about building it.
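One way to make the "clean working tree" requirement checkable before a session starts is a tiny preflight function. This is a minimal sketch, not a feature of git or of either tool: the name tree_is_clean is hypothetical, and it only relies on git status --porcelain printing nothing when there is nothing to commit.

```shell
# Hypothetical helper (the name is illustrative, not part of git or Claude Code).
# Exit status 0: inside a git work tree with nothing staged, modified, or untracked.
# Exit status 1: there are uncommitted changes.
# Exit status 2: the current directory is not a git work tree.
tree_is_clean() {
  git rev-parse --is-inside-work-tree >/dev/null 2>&1 || return 2
  [ -z "$(git status --porcelain)" ]
}

if tree_is_clean; then
  echo "clean: safe to start an agent run"
else
  echo "not clean (or not a repo): commit or stash first"
fi
```

Untracked files count as "not clean" here on purpose, since they are exactly what a later reset would leave behind.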
2. The mental model: you stopped typing
The hardest part of this is not technical. You have to stop thinking of yourself as the person who writes the code.
For most of your career the bottleneck was your hands: how fast you turned an idea into correct syntax. AI moved that bottleneck. The cost of producing code fell to almost nothing. What did not get cheaper is deciding what to build and confirming it is right. That is where your whole job lives now. I wrote about the shape of this in Agent Mode Changes the Shape of Thought, and about its hidden bill in The Cognitive Cost of Modern Software Engineering. You become a director, not a typist. Direction is harder than it sounds.
Here is what that looks like at my desk. This is my own setup, on my own time. On my Organic Arts LLC projects and websites, where I set the rules, I run four to eight Claude Code terminals at once, each on a different task, with VS Code open beside them for review: reading the code and the markdown, making small tweaks by hand, stepping through git compares before anything gets committed. The terminals do the building. VS Code is where I look before I trust. My day job is its own world with its own approved tooling, and I am thankful to work somewhere that actively encourages AI-assisted engineering; there the setup is tighter and editor-first. The number of windows is not the point. The work is parallel now, and your job is to keep the parallel work coherent.
Hold that picture. Every tip below is really about protecting the two things only you can do, deciding and verifying, and handing the rest to the machine.
3. The two tools I use
Pick a tool with real repository access and learn it well. Two windows beat ten you half-know. Here is the stack I actually run.
3.1 Claude Code, in the terminal
This is where most of my building happens. You start it in the project directory:
claude # start an interactive session in the current repo
claude --model sonnet # start on Sonnet (fast, my default)
claude --model opus # start on Opus (deeper reasoning, harder problems)
Inside the session you talk to it in plain language and it does the work: reads files, writes the change across the codebase, runs the tests, reports back. You switch models mid-session when a task gets hard:
/model opus # switch this session to Opus
/model sonnet # switch back
Shift+Tab cycles the permission mode: ask-every-time, auto-accept edits, and plan mode (read-only, where it proposes a plan before touching anything). I live in ask-every-time for real work and plan mode when I am still deciding what to do. Esc interrupts it the moment it heads the wrong way. Do not wait politely for a bad run to finish.
3.2 Copilot agent mode, in VS Code
When I am already in the editor, or my setup is constrained, I use Copilot agent mode. Open the Copilot chat panel, switch it to Agent, and pick the model from the selector. The same two names are there: Claude Sonnet for daily work, Claude Opus when the problem is gnarly. Agent mode does what the terminal does, multi-file edits and a test loop, with the diff right there in the editor where you read it.
3.3 When I reach for which
The terminal wins when I want many things happening at once, or I am scripting, or the job is large. The editor wins when the change is small and visual and I want to eyeball the diff as it shows up in the tree. Neither is the "real" one. They run the same loop. Use the one whose friction is lower for the task in front of you.
4. Set up your workspace: the memory file
Before you ask the agent to do anything, give it a place to stand.
The single most useful thing you can do is write a project memory file: a document the agent reads every session that tells it how this project works. Each tool reads its own, and this is the part people get wrong, so be clear about it:
- Claude Code reads CLAUDE.md at the repo root. It also reads a personal one at ~/.claude/CLAUDE.md for your cross-project preferences, and nested CLAUDE.md files deeper in the tree when it works in those folders.
- Copilot agent mode reads .github/copilot-instructions.md at the repo root.
They are not either-or. They are additional. If you use both tools, you keep both files, and you keep them saying the same thing. This repo carries both. Same content, two front doors.
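Keeping two files "saying the same thing" is easy to promise and easy to forget. A minimal sketch of a drift check, assuming you keep the two files byte-for-byte identical (if you let them differ on purpose, this check does not apply). The function name is hypothetical; cmp -s is the standard silent byte comparison.

```shell
# Hypothetical helper: succeeds only when both memory files exist and are identical.
# Optional first argument is the repo root; it defaults to the current directory.
memory_files_in_sync() {
  cmp -s "${1:-.}/CLAUDE.md" "${1:-.}/.github/copilot-instructions.md"
}

if memory_files_in_sync; then
  echo "memory files match"
else
  echo "memory files differ, or one is missing" >&2
fi
```

When they drift, one possible fix is to pick a source of truth and copy it over the other, for example cp CLAUDE.md .github/copilot-instructions.md, then review the result with git diff before committing.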
Claude Code can write you a starting draft from the code itself:
/init # generate a starter CLAUDE.md from the codebase
/memory # see which memory files are loaded right now
Put in the file what you would otherwise repeat out loud every session:
- What the project is, and how it is built, tested, and run.
- The conventions you actually enforce, in your own words.
- The parts that are dangerous to touch, and the rule for touching them.
- The mistakes the agent already made, so it stops making them.
Keep it lean. A long memory file is a memory file the agent skims and you stop maintaining; aim for something you would actually read. When it grows, split the detail into smaller files and pull them in with @path imports:
See @docs/architecture.md for the system layout.
Testing rules live in @docs/testing.md.
This file turns your standing intent into something permanent. You write your standards once and they apply forever, instead of re-explaining them in every prompt. A good memory file is worth more than any clever phrasing, because it removes the need for clever phrasing. Appendix A has a starter you can copy into either file.
5. The core loop
Everything else is one loop, run over and over: specify, delegate, verify, record. I run it every day and wrote it up in full in An AI Agent Workflow for Software Engineers That Actually Holds Up. The working summary:
Specify. Get clear on what you want before you type a prompt. For anything past a trivial fix, write the intent down: the outcome, the constraints, what is non-negotiable. Vague in, vague out.
Delegate. Hand over the task, not the keystrokes. "Add rate limiting to the payments endpoint and update the tests" is a task. Let the agent read the code, make the change, and run the suite. Do not micromanage the implementation. You set the destination; it drives.
Verify. Read the diff. Run the tests. Check the real behavior, not just that a command exited zero. People skip this step and it is the one that matters most, so it has its own section below.
Record. When you make a real decision, write down why. I work off architecture decision records a lot of the time, one short ADR per real choice, plus a journal note for myself. An ADR pays off twice: it records why you did something, and next time it specifies the work, because a clear decision is already most of a clear prompt.
The loop holds because the first and third moves are exactly the moves an agent cannot make for you. That is the point of the whole method.
6. Context and token management
The agent only knows what is in its context window: the running record of your conversation, the files it has read, the test output it has seen. The window is large but not infinite, and a stuffed, stale window makes the agent dumber. Managing it is a real skill, so treat it like one.
Three commands do most of the work:
/context # see what is filling the window right now
/clear # wipe the conversation, keep the project memory file
/compact # summarize the conversation so far and keep going
Use /clear between unrelated tasks. A fresh window for a new job is faster and sharper than one dragging an hour of irrelevant history. Use /compact when one long task has filled the window but you still need its thread; it keeps the gist and drops the noise. Check /context when answers start drifting; a full window is often the reason.
Model choice is the other lever. Sonnet is fast and cheap and handles most engineering work, so it is my default and I leave it there. I move to Opus when a task needs real reasoning: a thorny architecture decision, a bug that has survived two attempts, a plan with many moving parts. Then I move back. Running everything on the heavier model is slower and costs more without making the easy work any better. Match the model to the difficulty, not to your mood.
There is a sharper reason to care about all of this: tokens. The deeper you get, the more you feel it, especially as you near a plan limit or pay by the token. The skill that compounds is accomplishing more with fewer tokens, and it is worth treating as a craft of its own. A tight task that points the agent at the three files that matter costs a fraction of a vague one that makes it read half the repo just to orient itself. Clear between tasks so you are not paying to carry dead history. Keep the memory file short. Hand big searches to a subagent so the long output stays out of your main window. Reach for Sonnet first and save Opus for where the reasoning earns its higher price. Doing more with less is not a constraint here. It is the game.
7. Tips and tricks
The part I would have read first. Each of these I learned by getting it wrong.
Commit before any big agent run. A clean checkpoint means a bad run costs you a git reset --hard, not an afternoon. Treat the working tree as scratch paper the agent writes on.
Work in small, verifiable units. One coherent change at a time. A pile of changes you cannot review is a pile you cannot trust, however good it looks.
Make the agent run the tests itself. Do not run them and report back. Wire it so it sees the failures directly and iterates. Closing that loop is most of the magic.
Green is necessary, not sufficient. AI-generated code can pass every test and still be wrong, because the test never covered the thing that breaks. Passing tests buy confidence, not certainty.
Verify behavior, not exit codes. A command that exits zero did what it was told, which is not the same as what you wanted. For anything that touches the real world, look at the real world.
Give it the error, not your summary of the error. Paste the actual stack trace, the actual failing test, the actual log line. The agent is good with raw evidence and bad with your paraphrase.
Fence the dangerous areas in writing. If code must not change without care, say so in the memory file, not in your head. The agent honors written rules. It cannot read your worries.
Say a correction once, in the memory file. If you find yourself fixing the same thing twice, that fix belongs in the file, permanently.
When it goes in circles, stop and re-specify. An agent stuck in a loop is almost always an agent you under-specified. The fix is upstream, in what you asked, not in asking again louder.
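The checkpoint tip at the top of this section can be sketched as two small functions. The names checkpoint and rollback and the commit message are hypothetical. One detail worth knowing: git reset --hard restores tracked files but leaves new untracked files in place, so this sketch pairs it with git clean to remove files the agent created.

```shell
# Hypothetical helpers; names and commit message are illustrative.

# Record everything in the working tree as a checkpoint commit.
# --allow-empty lets this succeed even when there is nothing new to commit.
checkpoint() {
  git add -A &&
    git commit -q --allow-empty -m "checkpoint: before agent run"
}

# Throw away everything since the last commit.
# reset --hard restores tracked files; clean -fd deletes untracked files and directories.
rollback() {
  git reset -q --hard HEAD &&
    git clean -q -fd
}
```

Run checkpoint before you hand over a large task, and rollback if the run goes wrong. Because git clean deletes files for good, previewing with git clean -nd first is a sensible habit.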
8. Commands and hacks worth knowing
These are the moves that compound once you are past the basics. All of them are part of how I run a normal day.
Headless mode, for scripting. Outside a session, -p runs one prompt and exits. Good in scripts, git hooks, and CI:
claude -p "summarize the changes in the last commit"
cat error.log | claude -p "what is the root cause here?"
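As a sketch of how the same pipe pattern might sit inside a script, the function below sends the staged diff to a one-shot prompt. The function name and the prompt wording are hypothetical; the only Claude Code feature it uses is -p reading piped input, as in the examples above.

```shell
# Hypothetical helper: ask for a review of whatever is currently staged.
review_staged() {
  # Fail clearly if the CLI is not installed.
  if ! command -v claude >/dev/null 2>&1; then
    echo "claude not found on PATH" >&2
    return 127
  fi
  # git diff --cached --quiet exits 0 when nothing is staged.
  if git diff --cached --quiet; then
    echo "nothing staged" >&2
    return 0
  fi
  git diff --cached |
    claude -p "Review this staged diff. List likely bugs, one per line."
}
```

Call review_staged after git add and before git commit. Treat the answer as one more reviewer, not as a gate: the Verify step is still yours.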
Pick up where you left off. A session is not gone when you close it:
claude --continue # resume the most recent session in this repo
claude --resume # choose from a list of past sessions
Run agents in parallel without collisions. This is how the four-to-eight-terminals setup actually works. Each agent gets its own git worktree, a separate checkout of the same repo on its own branch, so two agents editing at once never step on each other:
git worktree add ../proj-auth -b feature-auth
git worktree add ../proj-logging -b fix-logging
# open a Claude Code session in each directory; they cannot collide
git worktree list
git worktree remove ../proj-auth # clean up when the branch is merged
Teach it a repeatable job once. A custom slash command is a markdown file of instructions you invoke by name. Drop it in .claude/commands/ in the repo (or ~/.claude/commands/ for all your projects):
# .claude/commands/ship.md -> invoked as /ship
# put your standard pre-deploy steps in that file in plain language
After that, /ship runs your checklist the same way every time. This is the lock-it-down move from the next section, made concrete.
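For illustration, here is one way such a file could be created. The checklist steps are hypothetical placeholders, not a recommended release process; replace them with the steps your project actually uses.

```shell
# Create a project-level custom command, invoked inside a session as /ship.
# The checklist text below is an illustrative placeholder.
mkdir -p .claude/commands
cat > .claude/commands/ship.md <<'EOF'
Run the pre-deploy checklist for this repo, in order.
Stop at the first failure and report it instead of continuing.

1. Confirm the working tree is clean.
2. Run the test suite.
3. Run the production build.
4. Summarize what changed since the last release.
EOF
```

Commit the file so everyone who works in the repo gets the same /ship.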
9. Lock it down, or leave it open
There are two modes of working this way and you need both.
Sometimes I want the creativity of not locking things down. I give the agent room, leave the constraints loose, and let it surprise me. I try to encourage artistry in the outcome and in the system itself, not just correctness. This is the same instinct I bring to freewriting, the practice I call consciousness mining: keep the prompt loose, stay honest, get the ego out of the way, so something I did not plan has room to surface. You are fishing, and you want a wide net. A tight spec here kills the thing you were reaching for.
Other times I build for repetition and lock it down hard. A clear ADR. A strict memory file. Fences around the dangerous code. A custom slash command so the steps never drift. A task specified so tightly there is one reasonable result. That is the mode for work that has to be right and has to be the same every time.
The skill is knowing which one you are in. Leave a production change loose and you ship a subtle bug. Lock down an exploration and you kill the surprise you were after. Decide on purpose, at the start, which mode the task is, and do not drift between them by accident.
10. Common failure modes and how to fix them
Symptom, then cause, then fix.
The output looks right and is subtly wrong. Cause: you verified the build, not the behavior. Fix: exercise the actual feature. I once moved a site to a private origin; the change was correct except the new origin did not serve directory index files, so every subpage would have broken. The build was green. I caught it by loading a real page, not by trusting the log.
It keeps reintroducing a mistake you fixed. Cause: the correction lives in your memory, not the project's. Fix: write the rule into the memory file so it survives the session.
Its answers start drifting and getting vague. Cause: the context window is full of stale history. Fix: /clear for a new task, or /compact to keep the thread but drop the noise. Check /context when in doubt.
It changed something you did not ask it to. Cause: the task was broader than you thought, or the danger zone was not fenced. Fix: narrow the task, mark the protected areas in writing, reset to your last commit.
It is slow and you want to just write it yourself. Cause: the task is small enough that specifying it costs more than doing it. Fix: do the small ones yourself. The loop earns its keep on tasks big enough that direction beats typing. Not everything should be delegated.
You shipped fast and now you do not understand your own system. Cause: you skipped the record step, and speed without memory is debt. Fix: slow down on the decisions, write them down, read the diffs you waved through. The cognitive bill is real and it comes due.
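The first failure mode above, a green build hiding a broken page, is the kind of thing a behavior check can catch. A minimal sketch, assuming curl is installed: fetch a real URL and confirm the body contains text a visitor should see. The function name, URL, and expected text are all hypothetical.

```shell
# Hypothetical helper: succeed only if the page at $1 loads and contains the text in $2.
# curl -f turns HTTP errors into failures, -sS hides progress but keeps error
# messages, and -L follows redirects.
page_shows() {
  curl -fsSL "$1" | grep -F -- "$2" >/dev/null
}

# Example with placeholder values:
# page_shows "https://example.com/docs/" "Getting started" || echo "subpage is broken"
```

This checks more than an exit code and less than a person loading the page, so use it alongside a real look, not instead of one.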
11. Frequently asked questions
Claude Code or Copilot agent mode? I use both. Terminal for parallel and large work, the editor for small visual changes. They run the same loop. My longer comparison for engineering work is in Claude vs Copilot for DevOps.
Sonnet or Opus? Sonnet for almost everything; it is fast and it is enough. Opus for the hard reasoning: thorny architecture, a bug that survived two attempts, a plan with many parts. Switch with /model, then switch back.
CLAUDE.md or copilot-instructions.md? Both, if you use both tools. Claude Code reads CLAUDE.md; Copilot agent mode reads .github/copilot-instructions.md. Keep both, keep them in sync. They are additional, not a choice.
Is this just vibe coding with extra steps? No, and the difference is the whole thing. Vibe coding skips specify and verify. This method is built on them. See AI-Assisted Development Is Not Vibe Coding.
Can I let it run on its own? Sometimes, with care. Autonomous mode runs the loop without you in it. When to trust it is its own question, covered in What Is Autonomous Mode?
Does this replace software engineers? No. It moves the work. The cost of writing code fell; the cost of deciding what to build did not. How AI Is Changing Software Engineering is the long answer.
Does this apply to infrastructure? Yes, and the payoff compounds there. The same loop runs on Dockerfiles, CI pipelines, and infrastructure code. Recursive DevOps is where I take that to its end.
12. Where to go next
This document is the entry point. The cluster behind it goes deeper:
- AI-Assisted Engineering, the hub that collects every essay and field note in one place. Start there if you want the map.
- How AI Is Changing Software Engineering, the field report on where the bottleneck moved.
- An AI Agent Workflow for Software Engineers, the core loop in full.
- DevOps Beyond Automation, what compounds across a long engineering career.
- AgentSpek, my book on building this way, free to read here.
All of this is the practical application of one larger idea I keep coming back to, making complexity visible: keeping a system understandable enough to navigate without pretending it is small. AI-assisted engineering is where I apply it every day.
13. A note on the meta
I built this manual the way it tells you to build software. I specified it, delegated the draft to an agent in the exact agent mode described above, verified every command against the live tools, and recorded the decisions as I went. The document is an instance of its own method. That is the meta.
The meta of the meta is the memory file. The CLAUDE.md and the voice rules that governed how this got written are the same kind of standing intent section 4 tells you to keep, pointed at prose instead of code. The thing that shaped the writing is itself a project memory file.
And the Übermeta, with the umlaut it has earned: the method applies to itself, all the way up. Building the thing, writing about building the thing, and setting the rules for how that writing is done are one loop running at three heights. This is Recursive DevOps pointed at language. The system that makes the work and the system that improves the system are the same system. Once you see it you do not unsee it. Good. Now go build something.
Appendix A: A starter memory file
Drop this at the repo root as CLAUDE.md for Claude Code, and as .github/copilot-instructions.md for Copilot agent mode. Same content, both files. Fill in the brackets. Keep it short and true; a file that lies to the agent is worse than no file.
# Project Memory
## What this is
[One or two sentences. What the project does, who uses it.]
## How it is built and run
- Install: [command]
- Dev: [command]
- Test: [command]
- Build: [command]
## Conventions that matter
- [The naming, structure, or style rules you actually enforce.]
- [How errors are handled here.]
- [Anything a newcomer always gets wrong.]
## Do not touch without care
- [Files or systems that are load-bearing and easy to break.]
- [The rule for changing them.]
## Lessons (append as you go)
- [Mistakes the agent made, so it stops repeating them.]
Appendix B: Spec and ADR templates
Two more good ideas worth keeping as templates. The first is for the Specify step: write the intent before you prompt. The second is the ADR I keep one of per real decision.
A spec you hand the agent:
# Intent: [one-line outcome]
## Outcome
[What is true when this is done.]
## Constraints
- [What it must not break.]
- [What it must stay compatible with.]
## Non-negotiable
- [The parts there is no flexibility on.]
## Done when
- [The check that proves it works: behavior, not exit code.]
An ADR you write when you decide something real:
# ADR-[NNN]: [the decision in a few words]
**Status:** [Proposed | Accepted | Superseded]
**Date:** [YYYY-MM-DD]
## Context
[The situation that forced a choice.]
## Decision
[What we are doing, in plain language.]
## Consequences
[What this makes easy, what it makes hard, what we gave up.]
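If you want the ADR template one command away, a small function can stamp out a numbered file. This is a sketch under stated assumptions: the docs/adr directory, the ADR_DIR variable, the file naming, and the function name are all hypothetical, and numbering by counting existing files is a simplification that breaks if you delete an ADR.

```shell
# Hypothetical helper: create the next numbered ADR from the template above.
# Usage: new_adr "Use Postgres for queues"
new_adr() {
  title="$1"
  dir="${ADR_DIR:-docs/adr}"   # assumed location; override with ADR_DIR
  mkdir -p "$dir"
  # Next number = count of existing ADR files + 1, zero-padded to three digits.
  count=$(find "$dir" -maxdepth 1 -name 'ADR-*.md' | wc -l)
  num=$(printf '%03d' $((count + 1)))
  # Lowercase the title and turn every run of other characters into one dash.
  slug=$(printf '%s' "$title" | tr '[:upper:]' '[:lower:]' |
    tr -cs 'a-z0-9' '-' | sed 's/^-//; s/-$//')
  file="$dir/ADR-$num-$slug.md"
  cat > "$file" <<EOF
# ADR-$num: $title
**Status:** Proposed
**Date:** $(date +%Y-%m-%d)
## Context
[The situation that forced a choice.]
## Decision
[What we are doing, in plain language.]
## Consequences
[What this makes easy, what it makes hard, what we gave up.]
EOF
  echo "$file"
}
```

The function prints the path it created, so you can open it straight away or point the agent at it as the spec for the work.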
Appendix C: Command cheat-sheet
The commands from this manual in one place. The claude and git commands run in your shell; the slash commands and keys work inside a Claude Code session.
# Start and choose a model
claude # interactive session in the current repo
claude --model sonnet # start on Sonnet (default for daily work)
claude --model opus # start on Opus (hard reasoning)
# Pick up past work
claude --continue # resume the most recent session here
claude --resume # choose from past sessions
# Headless, for scripts and pipes
claude -p "one-shot prompt" # run once and exit
cat file.log | claude -p "..." # pipe input in
# Parallel agents via git worktrees
git worktree add ../proj-x -b feature-x
git worktree list
git worktree remove ../proj-x
# Inside a session
/model opus # switch this session to Opus
/model sonnet # switch back to Sonnet
/context # what is filling the context window
/clear # wipe conversation, keep the memory file
/compact # summarize and continue
/init # generate a starter CLAUDE.md
/memory # show loaded memory files
/agents # configure subagents
Shift+Tab # cycle permission modes (ask / auto-edit / plan)
Esc # interrupt a run that is going wrong
About this document
Written by Joshua Ayson, a DevOps engineering leader who builds this way every day. Corrections and better tips are welcome; like the HOWTOs I learned from, this one gets revised.
- Revision 1.0, June 2026. First cut: setup, the loop, tips, failure modes.
- Revision 1.1, June 2026. Added the tools I actually use (Claude Code and Copilot agent mode), the memory-file split, context and token management, verified commands, and a cheat-sheet.
If it saved you an afternoon, it did its job.