MDX at Scale: What Breaks When You Have More Than 5 Posts

When you have five blog posts, your MDX setup feels solid. You have a directory, a helper function that reads it, and a working page. Then you ship a few more posts, add a tag filter, try to add a sitemap, and suddenly things are slightly wrong in ways that are annoying to debug.

This is the story of what breaks between post five and post fifteen, and how the patterns in this site are designed to prevent it.

The Standard Setup (and What It Assumes)

Most Next.js MDX blogs follow a pattern like this:

// naive version — works fine up to ~5 posts
import fs from "fs";
import path from "path";
import matter from "gray-matter";
 
export function getAllPosts() {
  const dir = path.join(process.cwd(), "src/content/blog");
  return fs.readdirSync(dir)
    .filter((f) => f.endsWith(".mdx"))
    .map((filename) => {
      const raw = fs.readFileSync(path.join(dir, filename), "utf8");
      const { data } = matter(raw);
      // Strip only the trailing extension; a plain string replace removes the first ".mdx" it finds.
      return { slug: filename.replace(/\.mdx$/, ""), ...data };
    });
}

This works. It is also carrying hidden assumptions: the filesystem is available at runtime, every MDX file has valid frontmatter, and the type of the returned object is whatever matter happens to parse. These assumptions hold until they do not.

Problem 1: Manifest Drift

The most common MDX blog failure mode is a post that exists in the filesystem but produces wrong data — missing description, wrong date, typo in a tag — and you only notice because a visitor sees it in a listing or a social preview is blank.

With filesystem scanning, every file is treated equally regardless of whether it is ready to publish. You have to either gate on a published: true frontmatter field (which is easy to forget) or rely on the directory itself as the publication gate (which breaks the moment you commit a draft).

A static manifest, meaning a typed array of post metadata checked into the repo, inverts this. A post does not exist until you add it to the manifest. Forgetting to add it does not produce a broken post; it produces no post. That is a much better failure mode.

// src/lib/blog-manifest.ts
import type { BlogPost } from "./blog";
 
export const blogManifest: BlogPost[] = [
  {
    slug: "mdx-at-scale",
    title: "MDX at Scale: What Breaks When You Have More Than 5 Posts",
    description: "The patterns that work for a small MDX blog start showing cracks...",
    date: "2026-03-20",
    tags: ["mdx", "nextjs", "engineering"],
    readingTime: 7,
    published: true,
  },
  // ...
];

The manifest is the source of truth. The MDX file is the content. They are separate concerns.
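
Separate concerns can still drift apart: a manifest entry can point at an MDX file that was renamed or never committed. A minimal sketch of a check for that, assuming you pass in the manifest slugs and a directory listing yourself (checkManifestSync and SyncReport are hypothetical names, not part of this site's code):

```typescript
// Hypothetical helper: compares manifest slugs with the filenames in the content directory.
interface SyncReport {
  missingFiles: string[]; // slugs in the manifest with no matching .mdx file
  unlistedFiles: string[]; // .mdx files with no manifest entry (usually drafts)
}

function checkManifestSync(manifestSlugs: string[], filenames: string[]): SyncReport {
  // Keep only .mdx files and strip the extension to get a slug.
  const fileSlugs = new Set(
    filenames
      .filter((name) => name.endsWith(".mdx"))
      .map((name) => name.slice(0, -".mdx".length))
  );
  const listed = new Set(manifestSlugs);

  return {
    missingFiles: manifestSlugs.filter((slug) => !fileSlugs.has(slug)),
    unlistedFiles: Array.from(fileSlugs)
      .filter((slug) => !listed.has(slug))
      .sort(),
  };
}

const report = checkManifestSync(
  ["mdx-at-scale", "hello-world"],
  ["mdx-at-scale.mdx", "draft-post.mdx", "README.md"]
);
// report.missingFiles  -> ["hello-world"]
// report.unlistedFiles -> ["draft-post"]
```

An unlisted file is not an error under this pattern. It is a draft. A missing file is the case worth failing a build over.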

Problem 2: Types Are Inference All the Way Down

With gray-matter and filesystem scanning, a blog post object is an untyped bag of keys: gray-matter's own typings declare data as { [key: string]: any }. You can cast it to a post type, but nothing verifies the cast. A post with date: null will pass TypeScript and break your date sort at runtime.

The manifest approach gives you a real type:

// src/lib/blog.ts
export interface BlogPost {
  slug: string;
  title: string;
  description: string;
  date: string;
  tags: string[];
  readingTime: number;
  published: boolean;
}

Every manifest entry is checked against this interface at compile time. A post with a missing description or a readingTime typed as a string is a TypeScript error, not a runtime surprise. This is especially valuable when posts are being written by an automated process — the manifest is the contract.
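
One gap remains: the interface says date is a string, not that the string is a real date. If you want to close it, a small check in a test or build script is enough. The sketch below is an assumption about how you might do that; isIsoDate is a hypothetical helper, and it assumes the YYYY-MM-DD format used in the manifest example.

```typescript
// Hypothetical check: true only for real calendar dates written as YYYY-MM-DD.
function isIsoDate(value: string): boolean {
  // Reject anything that is not exactly four digits, two digits, two digits.
  if (!/^\d{4}-\d{2}-\d{2}$/.test(value)) return false;

  const parsed = new Date(`${value}T00:00:00Z`);
  // An unparseable date has a NaN timestamp. The round trip through
  // toISOString() also rejects engines that roll 2026-02-30 over into March.
  return !Number.isNaN(parsed.getTime()) && parsed.toISOString().slice(0, 10) === value;
}

const validDate = isIsoDate("2026-03-20"); // true
const rolledOver = isIsoDate("2026-02-30"); // false
const freeText = isIsoDate("March 20"); // false
```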

Problem 3: Tag Inconsistency

This one is subtle. At five posts, your tags are fine. At fifteen, you have agentic-engineering in eight posts, agentic_engineering in two, agent in one, and AI in another. Your tag filter shows four different entries for what should be one category, and you have no idea which posts are missing from each.

The fix is a taxonomy document:

# Canonical Tags
 
| Tag                  | Scope                          |
|----------------------|--------------------------------|
| agentic-engineering  | AI agent orchestration work    |
| ai                   | AI/ML concepts, models         |
| nextjs               | Next.js App Router, builds     |
| cloudflare           | CF Workers, Pages, edge        |

Then you enforce it. Not with a linter (though you could), but with a review step before any post goes into the manifest with published: true. The taxonomy document is in docs/blog-tagging-taxonomy.md. The author guidelines reference it. The barrier to adding a new tag is intentionally non-zero.

At fifteen posts this feels like overkill. At fifty posts you will be glad it exists.
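
If you do want the linter, it is a few lines. This is a sketch, not the check this site runs: findUnknownTags and TaggedPost are hypothetical names, and the canonical list holds only the four tags shown in the table above, so a real list would be longer.

```typescript
// Canonical tags copied from the taxonomy table above (assumed to be a partial list).
const CANONICAL_TAGS = ["agentic-engineering", "ai", "nextjs", "cloudflare"] as const;

interface TaggedPost {
  slug: string;
  tags: string[];
}

// Hypothetical helper: maps each offending slug to its non-canonical tags.
function findUnknownTags(posts: TaggedPost[]): Record<string, string[]> {
  const allowed = new Set<string>(CANONICAL_TAGS);
  const problems: Record<string, string[]> = {};

  for (const post of posts) {
    // The comparison is exact, so "AI" and "agentic_engineering" both fail.
    const unknown = post.tags.filter((tag) => !allowed.has(tag));
    if (unknown.length > 0) problems[post.slug] = unknown;
  }
  return problems;
}

const tagProblems = findUnknownTags([
  { slug: "post-a", tags: ["agentic-engineering", "nextjs"] },
  { slug: "post-b", tags: ["agentic_engineering", "AI"] },
]);
// tagProblems -> { "post-b": ["agentic_engineering", "AI"] }
```

The exact match is deliberate. Normalizing case or underscores would hide the inconsistency instead of surfacing it.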

Problem 4: Reading Time Is a Lie

Most reading time implementations count words and divide by 200. That estimate is fine for body text. It is misleading for technical posts with dense code blocks, which readers scan differently than prose.

There is no universally correct answer here, but the manifest-driven approach makes it easy to set readingTime manually per post rather than computing it from the file. A post with five code blocks that each take thirty seconds to parse is not a seven-minute read just because it has 1,400 words.

The convention on this site: estimate at 200 wpm, round up, then add one minute for every three substantial code blocks. It is still an approximation. It is a better approximation.
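
Written out, the convention is simple arithmetic. The sketch below is one reading of it: estimateReadingTime is a hypothetical helper, "substantial" is left to whoever counts the blocks, and it assumes an incomplete group of three blocks adds nothing.

```typescript
const WORDS_PER_MINUTE = 200;
const CODE_BLOCKS_PER_EXTRA_MINUTE = 3;

// Hypothetical helper: suggests a readingTime value to put in the manifest.
function estimateReadingTime(wordCount: number, codeBlockCount: number): number {
  // Prose: 200 words per minute, rounded up.
  const proseMinutes = Math.ceil(wordCount / WORDS_PER_MINUTE);
  // Code: one extra minute per complete group of three blocks (assumption).
  const codeMinutes = Math.floor(codeBlockCount / CODE_BLOCKS_PER_EXTRA_MINUTE);
  return proseMinutes + codeMinutes;
}

const minutes = estimateReadingTime(1400, 5); // 7 prose + 1 code = 8
```

The result is a starting point for the manifest entry, not something to compute at build time. The value of setting readingTime by hand is that you can overrule the number.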

Problem 5: Component Sprawl

MDX lets you use React components inline. This is powerful. It is also a maintenance surface.

The pattern that scales: keep the component list small and stable. If a component is used in one post, it probably should not be a registered MDX component — it should either be promoted to a reusable site component or the post should be rethought. One-off MDX components are technical debt that accrues silently.

The check: before adding a component to the MDX provider config, ask whether it belongs in the design system or whether the content design should change instead.
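
To find candidates for that question, you can count how many posts use each component. A rough sketch, assuming you have already loaded each post's MDX source into a map keyed by slug (countComponentUsage and findOneOffComponents are hypothetical names). The regex is deliberately crude: it treats any capitalized opening tag as a component, including ones inside code samples.

```typescript
// Hypothetical audit: counts how many posts use each capitalized JSX tag.
function countComponentUsage(sources: Record<string, string>): Map<string, number> {
  const usage = new Map<string, number>();

  for (const slug of Object.keys(sources)) {
    // Collect distinct component names per post so repeats count once.
    const seen = new Set<string>();
    const tagPattern = /<([A-Z][A-Za-z0-9]*)/g;
    let match: RegExpExecArray | null;
    while ((match = tagPattern.exec(sources[slug])) !== null) {
      seen.add(match[1]);
    }
    seen.forEach((name) => usage.set(name, (usage.get(name) ?? 0) + 1));
  }
  return usage;
}

// Components that appear in exactly one post are the ones to question.
function findOneOffComponents(sources: Record<string, string>): string[] {
  const oneOffs: string[] = [];
  countComponentUsage(sources).forEach((count, name) => {
    if (count === 1) oneOffs.push(name);
  });
  return oneOffs.sort();
}

const oneOffs = findOneOffComponents({
  "post-a": "<Callout>Note</Callout>\n<Chart data={points} />",
  "post-b": "<Callout>Again</Callout>",
});
// oneOffs -> ["Chart"]
```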

Why Explicit Beats Automatic

The through-line here is that filesystem magic — auto-discovery, inferred types, auto-registration — trades immediate convenience for long-term maintenance cost. Every post is different from every other post. The manifest forces you to register those differences explicitly, which means the differences are visible and verifiable.

This is not an original insight. It is the same reason explicit migrations beat auto-migrate, explicit imports beat barrel files at scale, and seed data beats fixtures. Automatic systems optimize for the common case. Explicit systems give you control over every case.

For a blog that will have fifty posts by this time next year, the manifest is the right bet. For a blog that will stay at five posts forever, the filesystem scanner is fine.

You probably do not know which one you have.

The Manifest Pattern in Practice

The full implementation is not complex. blog-manifest.ts holds the typed array. blog.ts exports getAllPosts(), getPostBySlug(), and getRecentPosts() — all filtering from the manifest. No filesystem reads, no gray-matter, no path.join.

// src/lib/blog.ts
import { blogManifest } from "./blog-manifest";
 
export function getAllPosts(): BlogPost[] {
  return blogManifest
    .filter((post) => post.published)
    .sort((a, b) => new Date(b.date).getTime() - new Date(a.date).getTime());
}
 
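export function getPostBySlug(slug: string): BlogPost | undefined {
  // Match published posts only, so a draft cannot be reached by guessing its slug.
  return blogManifest.find((post) => post.slug === slug && post.published);
}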
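
getRecentPosts() is named above but not shown. A minimal sketch of what it could look like follows. It is an assumption, not this site's code: it takes the manifest as a parameter and uses a trimmed PostSummary type so the example runs on its own, and the default limit of 3 is arbitrary.

```typescript
// Trimmed stand-in for the BlogPost interface, so the sketch is self-contained.
interface PostSummary {
  slug: string;
  date: string;
  published: boolean;
}

// Hypothetical signature: newest published posts first, capped at `limit`.
function getRecentPosts<T extends PostSummary>(manifest: T[], limit = 3): T[] {
  return manifest
    .filter((post) => post.published) // filter() copies, so sort() leaves the manifest alone
    .sort((a, b) => new Date(b.date).getTime() - new Date(a.date).getTime())
    .slice(0, limit);
}

const sampleManifest: PostSummary[] = [
  { slug: "a", date: "2026-01-10", published: true },
  { slug: "b", date: "2026-03-20", published: true },
  { slug: "c", date: "2026-02-01", published: false },
  { slug: "d", date: "2025-12-01", published: true },
];

const recentSlugs = getRecentPosts(sampleManifest, 2).map((post) => post.slug);
// recentSlugs -> ["b", "a"]
```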

The MDX files live in src/content/blog/ and are loaded by slug at render time with a dynamic import(), compiled through @next/mdx. The manifest tells the app which slugs exist. The MDX file provides the content. The separation means you can change either without touching the other.

It is a small pattern. It handles the problems that show up at scale. The cost is one manual step per post. That is a trade worth making.