I Wrote a Linter That Strips the AI Tells From My Own Writing
The em dash is the easy one.
I can catch every em dash I will ever type with about nine lines of shell, and I do, on every commit. The script walks my published posts and the book chapters and fails the commit if it finds a real em dash, the YAML-escaped version of one, a spaced double hyphen that the markdown renderer would turn into a dash, or the space-less word--word that slips past everyone. Then it runs a second time after the build, against the rendered HTML, because the sneaky ones do not come from prose. They come from a title in the frontmatter or a string in a template, and the only place you can be sure to catch them is the page a reader actually sees. Anything that reached the HTML gets caught, no matter where it came from.
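The checks described above can be sketched roughly like this. It is a Python stand-in, not the nine lines of shell I actually run, and several details are assumptions made for illustration: that the YAML-escaped form is the string `\u2014`, that sources end in `.md`, that the rendered pages end in `.html`, and that the HTML pass should also look for the `&mdash;` family of entities. The names `find_dashes`, `check_tree`, and `main` are hypothetical.

```python
import re
from pathlib import Path

# Written as an escape so this file would pass its own check.
EM_DASH = "\u2014"

# Source-side patterns: the four cases listed in the paragraph above.
SOURCE_PATTERNS = {
    "literal em dash": re.compile(EM_DASH),
    # Assumption: the YAML-escaped form is a backslash followed by u2014.
    "escaped em dash": re.compile(r"\\u2014", re.IGNORECASE),
    # Whitespace on both sides, so "---" fences and "--flag" are left alone.
    "spaced double hyphen": re.compile(r"(?<=\s)--(?=\s)"),
    "word--word": re.compile(r"(?<=\w)--(?=\w)"),
}

# Rendered-page patterns: the character itself, or an entity for it (assumed).
HTML_PATTERNS = {
    "literal em dash": re.compile(EM_DASH),
    "em dash entity": re.compile(r"&(?:mdash|#8212|#x2014);", re.IGNORECASE),
}


def find_dashes(text, patterns):
    """Return (line_number, label) for every pattern that hits a line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in patterns.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


def check_tree(root, suffixes, patterns):
    """Scan every file under root with a matching suffix; return report lines."""
    failures = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8")
            for lineno, label in find_dashes(text, patterns):
                failures.append(f"{path}:{lineno}: {label}")
    return failures


def main(argv):
    """argv is [mode, root]; mode is 'source' (pre-build) or 'html' (post-build).

    Returns 1 if anything was found, so a hook can fail the commit on it.
    """
    mode, root = argv
    if mode == "html":
        failures = check_tree(root, {".html"}, HTML_PATTERNS)
    else:
        failures = check_tree(root, {".md"}, SOURCE_PATTERNS)
    for line in failures:
        print(line)
    return 1 if failures else 0
```

A commit hook would call `sys.exit(main(sys.argv[1:]))` once against the source directory before the build and once against the output directory after it. The second pass is the one that catches a dash arriving from frontmatter or a template, because it only looks at what the reader would see.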
That gate works. It has not let a single em dash through since I wired it in. And it taught me the thing this whole post is about, which is that the em dash was never really an AI tell. It is a typographic habit. The real tells are harder, and they do not survive contact with a regular expression.
The ones a script can catch
A few of them are mechanical enough to grep for, so I do.
Figurative "lands" and "resonates," the ones that show up as "the idea lands" or "that really resonated." A literal rocket can land and a literal string can resonate, so the script cannot just ban the words, but it can flag them for me to look at. "The part nobody talks about," in every costume, which is a move that fakes a secret out of a common observation. And the opener I caught myself using on essay after essay: "I have spent fifteen years doing X, and here is how I use AI." Every AI-engineering blog opens that exact way now. I had eight of them.
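A flag-for-review pass over those three tells might look something like the sketch below. The patterns are my guesses at reasonable shapes, not the exact strings in my script, and `REVIEW_PATTERNS` and `flag_for_review` are hypothetical names. The important design choice is that nothing here fails a build: a literal rocket landing gets flagged right alongside a figurative idea landing, and a person decides which is which.

```python
import re

# Assumed patterns for illustration; each one only surfaces a candidate.
REVIEW_PATTERNS = {
    # Matches literal and figurative uses alike; the script cannot tell them apart.
    "lands/resonates": re.compile(
        r"\b(?:land(?:s|ed)|resonat(?:e|es|ed|ing))\b", re.IGNORECASE
    ),
    # "The part nobody talks about" and a couple of its costumes.
    "fake secret": re.compile(
        r"\b(?:nobody|no one)\s+(?:talks|is talking)\s+about\b", re.IGNORECASE
    ),
    # "I have spent fifteen years ..." with a straight or curly apostrophe.
    "credential opener": re.compile(
        r"\bI(?:['\u2019]ve| have)\s+spent\s+(?:\w+\s+){1,3}years\b", re.IGNORECASE
    ),
}


def flag_for_review(text):
    """Return (line_number, label, matched_text) for each candidate tell."""
    flags = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in REVIEW_PATTERNS.items():
            for match in pattern.finditer(line):
                flags.append((lineno, label, match.group(0)))
    return flags


if __name__ == "__main__":
    sample = (
        "The rocket lands at noon.\n"
        "That idea really resonated with me.\n"
        "Here is the part nobody talks about.\n"
    )
    for lineno, label, matched in flag_for_review(sample):
        print(f"line {lineno}: {label}: {matched!r}")
```

The first line of the sample is a literal use and the second is the figurative one, and both get flagged. That is as far as a pattern can take it.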
So I grep for the strings. But here is what I learned the first time I did: the moment you catch one, the writer just reaches for the synonym. Swap "resonated" for "stuck with me," swap one credential opener for another, and the sentence still reads like a machine wrote it. The generated feeling survives the word swap, because the tell was never the word. It was the register the word arrived in.
The ones it cannot
You cannot grep for register. You cannot grep for a sentence that is technically correct and completely lifeless, or for a paragraph that resolves a little too neatly, or for the particular evenness that generated prose has when nothing in it was ever uncertain.
So the second layer is not a linter at all. It is a judgment, and the rule I use for it is older than any of this. Read the thing aloud against a sample of my own older writing, and if it does not sound like me, rewrite it until it does. I keep a body of my freewriting as the reference, and the standard is written down so it cannot quietly drift. The only reliable detector of "does this sound like me" is me, or something that has read enough of me to stand in.
A blocklist cannot do that job, and this is the part most AI-tell guides get wrong. Some of the words on every generic list are genuinely mine. I use "unleash" and "palpable" and "realm" in their old physical sense, the way I have used them for years, and a literal-minded pass that strips them in the name of de-robotizing my prose just deletes my actual voice. The disqualifier is never the word. It is whether the word showed up in the generic figurative register or the real one, and no script knows the difference.
The system leaks, on purpose
I want to be honest about the state of this, because the honesty is the point. It is not a finished detector. I find new tells after they have already shipped, on pages that were live for weeks. When I find one I add it to the gate if it is mechanical and to the written standard if it is not, and lately I have started pointing an agent at my own published archive and asking it where I still sound like a machine. It finds things. Some of them I argue with. The list is not done and it is not going to be done, because the writing keeps maturing and the tells move with it.
That is the same loop I run on everything: when a failure gets past you, you do not just fix the one instance, you build the thing that catches the next one, and you accept that you are never fully ahead of it. I wrote about that habit in the engineering context in Recursive DevOps. It turns out prose has the same shape.
If you write with AI, this is the whole takeaway. You need two different things, and only one of them can be automated. You need a gate for the mechanical tells, the dashes and the dead phrases, which a script handles forever once you write it. And you need a taste oracle for the rest, a real sense of your own register that you either supply yourself or train something to imitate from a corpus of you. The first one I can hand you. The second one is the actual writing, and it was always going to be. The rest of how I work this way lives at AI-assisted engineering.