Fuzzy Matching URLs in the AI and LLM Hallucination Era

In the LLM era, one subtle but important problem has started to show up more often:

If a large language model cites a blog post and slightly hallucinates or mutates the URL slug, the link it generates can easily 404.

That is bad for:

Readers who click a broken link and never find the content.
Creators who lose traffic and context.
The web, because machine-generated citations become fragile.

To make my blog more resilient to this, I implemented fuzzy slug routing using fuse.js, so that even if an LLM or a human slightly messes up the slug, the user can still land on the right post.

The Problem: Generative Text, Fragile URLs

A typical blog URL looks like this:

/blog/my-awesome-post-about-slugs

This slug is usually:

Derived from the post title.
Treated as a strict, exact identifier.
Assumed to match perfectly.

In a world where LLMs generate text, that assumption is increasingly weak.

An LLM might:

Drop a word.
Add an extra word.
Change singular to plural.
Slightly rephrase the title before slugifying.

For example:

Original slug:

/blog/understanding-static-site-generation

LLM-generated slug:

/blog/understanding-static-site-generators

To a human, that is clearly the same idea. To a traditional router, it is a completely different path that results in a 404.
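To make that brittleness concrete, here is a minimal sketch of exact-match routing. The posts map and the exactLookup function are hypothetical names used only for illustration; they are not taken from my blog's code.

```typescript
// Hypothetical in-memory index keyed by the exact slug.
const posts = new Map<string, { title: string }>([
  ['understanding-static-site-generation', { title: 'Understanding Static Site Generation' }],
]);

// Exact-match routing: any deviation from the stored key is a miss.
function exactLookup(slug: string): { title: string } | undefined {
  return posts.get(slug);
}

console.log(exactLookup('understanding-static-site-generation')); // the post
console.log(exactLookup('understanding-static-site-generators')); // undefined, which the router turns into a 404
```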

What I really want is a router that says: you are close enough, so let me find the post you most likely meant.

The Idea: Treat the Slug as a Fuzzy Query

Instead of treating the slug as a hard, exact key, I treat it as a fuzzy search query over all my posts.

Conceptually, the flow is:

Read the incoming slug from the URL.
Use fuse.js to run a fuzzy search against a list of all posts.
If there is a strong enough match, resolve that post and render it.
If not, fall back to a proper 404.
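The flow above can be sketched without any dependencies. Note that this sketch swaps in a normalized Levenshtein distance as a stand-in for fuse.js, so it is not the algorithm fuse.js uses, and the resolveSlug name, the sample titles, and the 0.4 cut-off are assumptions chosen for illustration.

```typescript
type Post = { slug: string; title: string };

// Classic dynamic-programming edit distance (insert, delete, substitute).
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// 0 means identical, 1 means nothing in common.
function distanceScore(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 0 : levenshtein(a, b) / longest;
}

// Steps 1 to 4: take the slug, score every post, accept the best one only
// if it is close enough, otherwise signal "not found" with null.
function resolveSlug(incoming: string, posts: Post[], maxScore = 0.4): Post | null {
  let best: Post | null = null;
  let bestScore = Infinity;
  for (const post of posts) {
    const score = distanceScore(incoming, post.slug);
    if (score < bestScore) {
      best = post;
      bestScore = score;
    }
  }
  return bestScore <= maxScore ? best : null;
}

const posts: Post[] = [
  { slug: 'understanding-static-site-generation', title: 'Understanding Static Site Generation' },
  { slug: 'fuzzy-slug-routing-for-blogs-in-the-llm-era', title: 'Fuzzy Slug Routing for Blogs in the LLM Era' },
];

console.log(resolveSlug('understanding-static-site-generators', posts)?.slug);
console.log(resolveSlug('pricing', posts)); // null, so the caller renders a 404
```

A linear scan like this is fine for a few hundred posts; a library such as fuse.js adds multi-field search and tunable scoring on top of the same idea.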

This means that even if an LLM changes the slug slightly, there is still a good chance the user lands on the intended article.

Imagine the canonical slug is:

/blog/fuzzy-slug-routing-for-blogs-in-the-llm-era

But an LLM outputs:

/blog/fuzzy-slug-routing-in-llm-era

A traditional router sees a completely unknown slug and returns 404.

With this setup, that incoming slug is treated as a query, and as long as the similarity is high enough, it still resolves to the correct post.

Live demo

You can see this idea in action with two URLs.

fuzzy-matching-urls-in-the-ai-and-llm-hallucination-era is the canonical slug for this post.
fuzzy-matching-urls-in-the-hallucination-era is a slightly altered slug that should still land on the same post.

Data Strategy: Precomputing data.json at Build Time

I use a CMS as the source of truth for my blog posts. A naive approach would be to call the CMS API on every request to resolve the slug.

That would be slower, dependent on external network calls, and unnecessary given that the content is mostly static.

Instead, I decided to handle this at build time.

During the build, I fetch all blog posts from the CMS and extract the core fields needed for routing and search, such as title and slug, and optionally tags or summary. I then serialize them into a static data.json file.
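As a rough illustration of that extraction step, the sketch below trims a full record down to the routing fields before serializing. The CmsPost shape, the toFuseEntry helper, and the sample record are hypothetical; a real CMS schema will look different.

```typescript
// Hypothetical shape of a full CMS record; a real schema will have other fields.
type CmsPost = {
  title: string;
  slug: string;
  body: string;
  tags?: string[];
};

// The subset that fuzzy routing needs.
type FuseEntry = Pick<CmsPost, 'title' | 'slug' | 'tags'>;

// Drop everything except the searchable fields; omit tags when there are none.
function toFuseEntry({ title, slug, tags }: CmsPost): FuseEntry {
  return tags && tags.length > 0 ? { title, slug, tags } : { title, slug };
}

const cmsPosts: CmsPost[] = [
  {
    title: 'Understanding Static Site Generation',
    slug: 'understanding-static-site-generation',
    body: 'A long article body that the router never needs to see.',
  },
];

// This string is what would be written to data.json.
const dataJson = JSON.stringify(cmsPosts.map(toFuseEntry));
console.log(dataJson);
```

Keeping the file this small matters because it is loaded wherever slugs are resolved.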

Here is the build time script that generates that file.

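import { mkdir, writeFile } from 'node:fs/promises';
import path from 'node:path';
import { getClient } from './sanity-server';

// Assumption: getClient() returns a Sanity client and posts use a `blog`
// document type with a standard slug field. Adjust the query to your schema.
async function getBlogs(): Promise<Array<{ slug: string; title: string }>> {
  return getClient().fetch(`*[_type == "blog"]{ "slug": slug.current, title }`);
}

async function generateBlogFuseData() {
  // Fetch only the fields the fuzzy matcher needs.
  const blogs = await getBlogs();

  const filePath = 'src/fuse/data.json';

  // Make sure the target directory exists before writing.
  await mkdir(path.dirname(filePath), { recursive: true });
  await writeFile(filePath, JSON.stringify(blogs), 'utf8');
}

// Fail the build loudly if the CMS fetch or the write fails.
generateBlogFuseData().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});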

At runtime, the slug resolution layer does three simple things:

Load data.json.
Instantiate Fuse with that dataset.
Run a fuzzy search against slug and optionally title.

For my use case, slug is the primary field, and title can be a secondary field to catch more aggressive mutations.
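One way to express "slug first, title second" is with weighted keys, which fuse.js accepts as objects with a name and a weight. This is only a sketch: the configuration I actually use lists both keys without weights, and the 0.7 and 0.3 values here are illustrative assumptions rather than tuned numbers.

```typescript
// Options in the shape fuse.js accepts as the second constructor argument,
// for example: new Fuse(posts, fuseOptions).
const fuseOptions = {
  keys: [
    { name: 'slug', weight: 0.7 }, // primary field
    { name: 'title', weight: 0.3 }, // secondary field for heavier mutations
  ],
  ignoreLocation: true,
  threshold: 0.44,
  minMatchCharLength: 6,
};

console.log(fuseOptions.keys.map((key) => `${key.name}:${key.weight}`).join(', '));
```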

In code, the core of the resolution function looks like this.

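import Fuse from 'fuse.js';
import blogs from '../../../src/fuse/data.json';

async function getData(params: { slug: string }) {
  if (!params || !params.slug) throw new Error('No slug found in params');

  const fuse = new Fuse(blogs as Array<{ slug: string; title: string }>, {
    keys: ['slug', 'title'],
    useExtendedSearch: true,
    ignoreLocation: true, // match anywhere in the string, not only near the start
    threshold: 0.44, // 0 accepts only exact matches, 1 accepts anything
    minMatchCharLength: 6
  });

  // Slugs with fewer than three hyphen-separated words carry too little
  // signal to match safely, so they skip the search and resolve to an empty slug.
  const slugLength = params.slug.split('-').length;
  const relatedBlogs =
    slugLength >= 3
      ? fuse.search(params.slug).map(({ item }) => item)
      : [{ slug: '', title: '' }];

  // Fuse sorts results best match first; slug is undefined when nothing matched.
  const slug = relatedBlogs[0]?.slug;

  return { slug };
}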

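In plain terms: take the incoming slug from the URL and run fuse.search on it. Fuse only returns results that fall within the configured threshold, sorted best match first, so the top result is treated as the resolved post. If there are no results, or the slug is too short to match safely, no post is resolved and the request ends in a 404.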
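If you would rather check the score explicitly instead of relying only on the threshold option, fuse.js can attach a score to each result when includeScore is enabled (0 is a perfect match, 1 a complete mismatch). The sketch below works on results of that shape; the pickConfidentMatch helper, the sample scores, and the 0.3 cut-off are hypothetical.

```typescript
// Mirrors the relevant part of a fuse.js result when includeScore is true.
type ScoredResult<T> = { item: T; score?: number };

// Return the top item only if its score is at or below maxScore.
function pickConfidentMatch<T>(results: ScoredResult<T>[], maxScore: number): T | undefined {
  const top = results[0]; // results are assumed to be sorted best match first
  if (!top || top.score === undefined) return undefined;
  return top.score <= maxScore ? top.item : undefined;
}

// Hand-written results standing in for the output of fuse.search(...).
const closeResults = [{ item: { slug: 'understanding-static-site-generation' }, score: 0.08 }];
const weakResults = [{ item: { slug: 'understanding-static-site-generation' }, score: 0.41 }];

console.log(pickConfidentMatch(closeResults, 0.3)?.slug); // resolved post
console.log(pickConfidentMatch(weakResults, 0.3)); // undefined, so render a 404
```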

Because fuse.js runs on a static data.json that is generated at build time, this lookup is cheap and fast.

Why Bother: Robustness in an AI Heavy Web

This entire approach is about robustness.

LLMs will increasingly generate links. Some will be perfect, many will be close enough but technically wrong. Traditional slug routing is brittle: a single character off and the request fails.

Fuzzy routing offers graceful degradation: minor hallucinations still resolve to useful content.

It is also a small step toward making URLs more semantic instead of strictly token based.

Trade Offs and Considerations

This is not magic, and there are trade offs.

Ambiguity can show up if multiple posts are very similar, because fuzzy search might pick the wrong one.
Threshold tuning matters a lot. You need to experiment to find the sweet spot between overly strict and overly forgiving.
SEO still relies on canonical URLs. The fuzzy match is a way to serve content, but the canonical slug remains the source of truth.
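Two of these trade offs can be handled with a small decision step after the search. This is one possible approach rather than what my blog does: refuse to guess when the two best candidates score almost the same, and redirect a fuzzy hit to the canonical URL instead of rendering it in place. The decide function, the 0.1 margin, and the /blog/ prefix are assumptions for illustration.

```typescript
type Scored = { slug: string; score: number }; // lower score means a closer match

type Decision =
  | { kind: 'render'; slug: string }
  | { kind: 'redirect'; to: string }
  | { kind: 'notFound' };

function decide(incoming: string, results: Scored[], minMargin = 0.1): Decision {
  const [top, runnerUp] = results; // assumed to be sorted best match first
  if (!top) return { kind: 'notFound' };

  // Exact hit: this is already the canonical URL.
  if (top.slug === incoming) return { kind: 'render', slug: top.slug };

  // Two candidates too close to call: a 404 is safer than a wrong guess.
  if (runnerUp && runnerUp.score - top.score < minMargin) return { kind: 'notFound' };

  // Confident fuzzy hit: send the visitor to the canonical slug.
  return { kind: 'redirect', to: `/blog/${top.slug}` };
}

console.log(decide('post-on', [
  { slug: 'post-one', score: 0.1 },
  { slug: 'post-two', score: 0.4 },
]));
```

Redirecting keeps a single URL per post in browser history and in anything that later copies the link.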

In practice, for a personal blog or small content site, these trade offs are acceptable.

Conclusion

In a world where content is generated, links are machine written, and hallucinations are inevitable, relying on exact string equality for slugs feels outdated.

By precomputing a simple data.json at build time, using fuse.js to fuzzily match incoming slugs, and falling back to a 404 only when there is truly no good match, I have made my blog more resilient to LLM hallucinations and minor human errors while keeping the implementation simple and friendly to static hosting.

If you run a content site and expect LLMs to link to you, fuzzy slug routing is a small but meaningful upgrade to your URL strategy.

NOTE: LLM was used to help with grammar fixes in this blog post.