Handwriting OCR With AI Vision: Reading Cursive Tesseract Cannot

Tesseract could not read a word of it.

I have years of handwritten journals, the actual paper kind, and I wanted them on this site. The obvious first move is optical character recognition, and the obvious tool is Tesseract, which is good, free, and everywhere. It is also built for printed text. Pointed at my cursive it returned something between a ransom note and static. Not a few errors to clean up. Garbage, top to bottom, page after page.

That makes sense once you think about what older OCR actually does. It looks for the shapes of known letters in clean rows. Handwriting has no clean rows. My letters connect and lean and change shape depending on what came before them, and half the time a word is only legible because of the words around it. Reading cursive is not character recognition. It is closer to reading, the comprehension kind, and the old tools do not do that.

What changed

A vision model does. When you hand a modern multimodal model an image of a page, it is not matching letter shapes against a font. It is doing the same thing you do when you read a friend's bad handwriting: using context, expectation, and the whole line at once to decide what a squiggle most likely says. Claude reads my cursive. Not perfectly, but at the level of a patient human who has seen my writing before, which is exactly the level I needed.

So the pipeline is simple, and most of it is not the AI part.

A journal becomes a scanned PDF. A script splits the PDF into one image per page and scrubs the scanner app's watermark out of the corner, because that watermark sits right on top of the text on the last line and the model will dutifully try to read it. Each clean page goes to the vision model, which returns a transcription. Anywhere it is unsure, it marks the spot rather than guessing confidently, which matters more than it sounds, because a confident wrong word is far more expensive than a flagged uncertain one.
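The transcription step and its uncertainty flags can be sketched roughly as follows. This is a minimal illustration, not the actual script: the `[?guess?]` marker format, the prompt wording, and the names `flagged_spots`, `transcribe_journal`, and `transcribe_page` are all assumptions made for the example. The PDF splitting and watermark scrubbing are assumed to have happened already, and the call to the vision model is passed in as a function so the sketch does not depend on any particular API.

```python
import re

# Hypothetical convention: the model wraps any word it is unsure of as [?guess?].
UNSURE = re.compile(r"\[\?(.*?)\?\]")

# Illustrative prompt wording, not the author's actual prompt.
PROMPT = (
    "Transcribe this handwritten page exactly as written, line by line. "
    "Do not correct spelling or smooth the wording. "
    "If you are unsure of a word, wrap your best guess as [?guess?] "
    "instead of committing to it."
)


def flagged_spots(transcription: str) -> list[tuple[int, str]]:
    """Return (line number, guessed word) for every spot the model marked unsure."""
    spots = []
    for line_no, line in enumerate(transcription.splitlines(), start=1):
        for match in UNSURE.finditer(line):
            spots.append((line_no, match.group(1)))
    return spots


def transcribe_journal(pages, transcribe_page):
    """Send each cleaned page image to the model and collect text plus flags.

    pages: iterable of page images (bytes), already split and scrubbed.
    transcribe_page: hypothetical callable (image_bytes, prompt) -> str that
        wraps whatever vision model is in use.
    """
    results = []
    for number, image in enumerate(pages, start=1):
        text = transcribe_page(image, PROMPT)
        results.append({"page": number, "text": text, "flags": flagged_spots(text)})
    return results
```

Keeping the flags as structured data alongside the raw text means the proofreading pass can jump straight to the spots the model doubted, while the text itself stays untouched until a human decides what the word is.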

The part I do not automate, and would not

Then I read it. Every word, against the actual notebook open next to me.

This is the one exception to how I work everywhere else. With code, I have written about how reading turned into triage, because there is more of it than any person can read start to finish. With my own handwriting the opposite is true. There is not that much of it, it is mine, and the whole point is that it comes across as I wrote it and not as a model smoothed it. So I read all of it, slowly, and I fix what the machine misheard. The model does the seeing. I do the deciding about what the words are. That division is the entire design.

How it gets better

The first batch was the worst batch, and on purpose.

After I correct a transcription, the misreads do not just get fixed and forgotten. They go into a lessons file, paired with what the word actually was. Over batches that file becomes a record of how the model tends to misread me specifically, plus a small dictionary of the words and coinages I use that no general model would guess. The next batch starts from those priors. The model is not learning in any deep sense between runs, but I am feeding its next first pass everything the last pass got wrong, and the first-pass accuracy climbs because of it. The work compounds. Batch ten takes much less correcting than batch one.
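One way such a lessons file could work is sketched below. It is an illustration under stated assumptions, not the actual implementation: the JSON layout, the `misread -> actual` key format, and every function name here are hypothetical. It diffs the model's first pass against the hand-corrected text word by word, counts each misread, and renders the most frequent ones as a preamble for the next batch's prompt. The personal dictionary of coinages is left out to keep the sketch short.

```python
import difflib
import json
from pathlib import Path


def extract_misreads(model_text: str, corrected_text: str) -> list[tuple[str, str]]:
    """Pair each span the model got wrong with what the page actually said."""
    model_words = model_text.split()
    true_words = corrected_text.split()
    matcher = difflib.SequenceMatcher(a=model_words, b=true_words, autojunk=False)
    pairs = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        # Only substitutions count as misreads; pure inserts/deletes are skipped.
        if tag == "replace":
            pairs.append((" ".join(model_words[a0:a1]), " ".join(true_words[b0:b1])))
    return pairs


def merge_lessons(lessons: dict[str, int], pairs) -> dict[str, int]:
    """Return a new lessons dict with each (misread, actual) pair counted once more."""
    merged = dict(lessons)
    for misread, actual in pairs:
        key = f"{misread} -> {actual}"
        merged[key] = merged.get(key, 0) + 1
    return merged


def load_lessons(path: Path) -> dict[str, int]:
    """Read the lessons file, or start empty on the first batch."""
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def save_lessons(path: Path, lessons: dict[str, int]) -> None:
    path.write_text(json.dumps(lessons, indent=2, sort_keys=True), encoding="utf-8")


def lessons_preamble(lessons: dict[str, int], limit: int = 50) -> str:
    """Render the most frequent misreads as text to prepend to the next prompt."""
    if not lessons:
        return ""
    ranked = sorted(lessons.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
    lines = [f"- {key} (seen {count}x)" for key, count in ranked]
    return "Known misreads of this writer's hand from earlier batches:\n" + "\n".join(lines)
```

Nothing in this changes the model itself. The counts only decide which past mistakes are worth spending prompt space on, which matches the point above that the model is not learning between runs.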

That is the shape I keep coming back to in all of this. The machine is extraordinary at the mechanical seeing that used to be impossible, and useless at the one judgment that actually matters, which is whether the words are mine. So you build the pipeline to do the part it is good at, you keep yourself in the loop for the part it is not, and you give it a memory of its own mistakes so the boundary between the two keeps moving in your favor. The rest of how I build this way is at AI-assisted engineering.