I Compiled My Whole Blog Into One File at Build Time
There is a 4.8-megabyte TypeScript file in my repository that I did not write. A script writes it, on every build, and I commit the result into git like it was mine. Most build advice would tell me that is a mistake. I do it on purpose, and after a year of living with the decision I would make it again.
The file is the whole blog. Every published post, every draft, every book chapter, around five hundred entries, flattened into one big typed array that the pages import directly. The markdown is still the source of truth. It still lives in folders, one file per post, the way it should. But the pages never read those folders. They read the array.
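To make "they read the array" concrete, here is a minimal sketch of what querying such an array could look like. The field names, the sample entries, and the helpers `byCategory` and `relatedTo` are all invented for illustration; the real file's shape is not shown in this post.

```typescript
// Hypothetical entry shape; the real fields are an assumption.
interface Entry {
  slug: string;
  title: string;
  category: string;
  date: string; // ISO 8601 date, e.g. "2024-05-01"
  draft: boolean;
}

// Stand-in for the generated array. On the real site this would be
// imported from the generated file instead of declared inline.
const entries: readonly Entry[] = [
  { slug: "cache-file", title: "One Big File", category: "build", date: "2024-05-01", draft: false },
  { slug: "link-graph", title: "Link Graph", category: "build", date: "2024-03-10", draft: false },
  { slug: "agents", title: "Agents", category: "ai", date: "2024-04-02", draft: false },
  { slug: "wip", title: "Unfinished", category: "build", date: "2024-06-01", draft: true },
];

// Newest first; ISO dates sort correctly as plain strings.
function newestFirst(a: Entry, b: Entry): number {
  return a.date < b.date ? 1 : a.date > b.date ? -1 : 0;
}

// Category index: published entries in one category, newest first.
function byCategory(all: readonly Entry[], category: string): Entry[] {
  return all
    .filter((e) => !e.draft && e.category === category)
    .sort(newestFirst);
}

// Related posts: other published entries in the same category.
function relatedTo(all: readonly Entry[], slug: string, limit = 3): Entry[] {
  const current = all.find((e) => e.slug === slug);
  if (!current) return [];
  return byCategory(all, current.category)
    .filter((e) => e.slug !== slug)
    .slice(0, limit);
}
```

Nothing in these helpers touches the file system or a parser. They are plain array operations over data that already exists.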
Why not just read the markdown
This is an Astro site, and Astro is perfectly happy to read markdown at build time. For a while it did. The problem was not any single page. It was that almost every page needs to know about the others.
The related-posts block needs the other posts. The category indexes need every post in a category. The hub page needs the whole AI cluster. The sitemap needs all of it, and so does the internal-link graph I built to audit the site. Every one of those was reaching into the same folders, reading the same hundreds of files, parsing the same frontmatter, running the same logic to filter drafts and normalize dates and derive slugs. The work was duplicated everywhere and it got slower as the site grew.
So I moved the reading out of the pages entirely. A prebuild step runs once, reads all the markdown, runs the shared loader logic a single time, and writes the result into one file. After that, every page that needs content imports an array that is already parsed, already filtered, already typed. No file system in the hot path. No parsing repeated two hundred times. The page layer stops being a thing that reads markdown and becomes a thing that queries a list.
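The prebuild step described above could be sketched roughly like this. It is not the actual script. File reading is left out so the sketch stays self-contained: `renderModule` takes a hypothetical map of path to raw markdown and returns the text of the generated module. The `Entry` fields, the tiny frontmatter reader, and the choice to flag drafts rather than drop them are all assumptions.

```typescript
// Hypothetical shape of one parsed entry.
interface Entry {
  slug: string;
  title: string;
  date: string; // normalized to YYYY-MM-DD
  draft: boolean;
}

// Minimal frontmatter reader: handles only flat `key: value` lines between
// `---` fences. A real loader would use a proper YAML parser.
function parseFrontmatter(raw: string): Partial<Record<string, string>> {
  const fields: Partial<Record<string, string>> = {};
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(raw);
  if (!match) return fields;
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return fields;
}

// Derive a slug from a path: "posts/My First Post.md" -> "my-first-post".
function slugFromPath(path: string): string {
  const file = path.split("/").pop() ?? path;
  return file
    .replace(/\.mdx?$/i, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// The shared loader logic, run once per file. Assumes ISO-style dates.
function toEntry(path: string, raw: string): Entry {
  const fm = parseFrontmatter(raw);
  const parsed = new Date(fm.date ?? "");
  if (Number.isNaN(parsed.getTime())) {
    throw new Error(`Invalid or missing date in ${path}`);
  }
  return {
    slug: fm.slug ?? slugFromPath(path),
    title: fm.title ?? slugFromPath(path),
    date: parsed.toISOString().slice(0, 10),
    draft: fm.draft === "true",
  };
}

// Render all entries as the source text of one TypeScript module.
function renderModule(sources: Record<string, string>): string {
  const entries = Object.keys(sources)
    .map((path) => toEntry(path, sources[path]))
    // Newest first, slug as tie-breaker, so the output is deterministic.
    .sort((a, b) =>
      a.date !== b.date ? (a.date < b.date ? 1 : -1) : a.slug < b.slug ? -1 : a.slug > b.slug ? 1 : 0
    );
  return (
    "// AUTO-GENERATED. Do not edit by hand.\n" +
    `export const entries = ${JSON.stringify(entries, null, 2)} as const;\n`
  );
}
```

A real script would read the content folders, call something like `renderModule`, and write the returned string to disk. Sorting before serializing matters for the committed-file approach: the same markdown should always produce byte-identical output, otherwise every build creates a noisy diff.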
The part people will argue with
Committing a generated file is the choice that gets the side-eye, and I understand why. Generated artifacts in version control are usually a smell. They go stale, they bloat diffs, they invite the question of which copy is real.
I keep it anyway, for three reasons that turned out to matter more than the tidiness.
The build is deterministic and fast, because the expensive step already happened and its output is sitting right there. A content change shows up in git as a content diff, which is a readable audit trail of what actually changed on the site, not just which source file I touched. And the site can build even if the loader has a bad day, because the thing the pages depend on is data, not code that has to run correctly at the worst possible moment.
The cost, honestly
There is a real price, and I am not going to pretend there is not. The cache and the markdown can drift. If I edit a post and forget to regenerate the file, the post changes and the site does not, and for a confusing few minutes I am debugging a problem that does not exist because I am looking at the new markdown and the old cache.
That has bitten me. The fix is not to be more careful, because careful is a feeling. The fix is to wire the regeneration into the build so it cannot be skipped, and to treat the commit as part of editing the post rather than a separate chore. The generated file is only safe because the thing that keeps it honest is automatic. A precomputed cache you have to remember to update is a bug with a delay on it.
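One way to make drift impossible to miss is a check that regenerates the output in memory and compares it with the committed file. This is a sketch of that idea, not a description of my actual setup. `checkDrift`, `DriftReport`, and the toy `renderTitles` generator are hypothetical names; any deterministic generator function could be passed in as `render`.

```typescript
type Sources = Record<string, string>;
type Render = (sources: Sources) => string;

interface DriftReport {
  stale: boolean;
  // 1-based line number of the first mismatch, or null when in sync.
  firstDifferentLine: number | null;
}

// Regenerate from the sources and compare against the committed text.
function checkDrift(committed: string, sources: Sources, render: Render): DriftReport {
  const fresh = render(sources);
  if (fresh === committed) {
    return { stale: false, firstDifferentLine: null };
  }
  const oldLines = committed.split("\n");
  const newLines = fresh.split("\n");
  const max = Math.max(oldLines.length, newLines.length);
  for (let i = 0; i < max; i++) {
    if (oldLines[i] !== newLines[i]) {
      return { stale: true, firstDifferentLine: i + 1 };
    }
  }
  // Not reachable when the strings differ; kept so every path returns.
  return { stale: true, firstDifferentLine: null };
}

// Toy generator used only to demonstrate the check: one line per file,
// in sorted path order so the output is deterministic.
const renderTitles: Render = (sources) =>
  Object.keys(sources)
    .sort()
    .map((path) => `${path}: ${sources[path]}`)
    .join("\n") + "\n";
```

In CI, a stale report would fail the job, which turns "I forgot to regenerate" from a confusing debugging session into a red build. For the regeneration itself, npm runs a script named `prebuild` automatically before `build`, which is one way to make the step unskippable.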
That is the whole trade, and it is a general one. When reading is expensive and happens in a lot of places, precompute it once into a shape that is cheap to read, and then pay the standing cost of keeping the precomputed thing in sync with the truth. The win is real and so is the bill. I run the same pattern on the internal-link graph and on the search index, and it is the same lesson every time: the speed is free, the consistency is not, and you buy the consistency with automation or you do not really own it. The rest of how I build this site lives at AI-assisted engineering.