Solved: Why ChatGPT Won't Say "Brian Hood" (Blame Regexes)

Solved: Why ChatGPT Won't Say "Brian Hood" (Blame Regexes)

We've figured out why ChatGPT refuses to say certain people's names, no matter what. And it's for the best engineering reason ever: regexes.

Yep, you heard that right. ChatGPT's internal code has regexes looking for specific names and throwing an error anytime it sees them coming out of ChatGPT directly.

The oldest security measure in the book

What's the oldest and best security measure? Regexes, of course. But why are they even doing this?

An arrow points to code that uses a regex of Brian Hood's name. The arrow is labeled "maximum security"

It turns out these are people who tried to sue OpenAI. So how can you get in any more trouble with ChatGPT making up lies about people? Simple: just never allow ChatGPT to mention that person's name. Ever. No matter what.

A news article titled "ChatGPT: Mayor starts legal bid over false bribery claim" with an image of Brian Hood.

But isn't that terrible engineering?

Sure, you might say, "Wow, what terrible engineering. How disgusting." But really? What are they going to do instead?

A meme of "Hackerman"

Are they going to retrain a new model for millions of dollars just to not mention these few random people's names? And do that every time they're worried about lawsuits?

Or if someone's mounting a suit against you, are you going to let yourself continue to be liable to the same thing over and over?

Sometimes simple is best

It's easy to make fun of this approach, but it's such a simple, easy-to-leverage solution that they did in a pinch. Because it was a good idea.

Fun fact: the actual Builder.io codebase for our Figma to code product does literally the same thing. Yes, we have advanced machine learning, malicious content detection, yada yada.

Builder.io code that checks for malicious matchers using regex

But sometimes when we see a very clear pattern that we need to just stop immediately, we just add it to an array of regexes and disable any content that matches. It's fast. It's cheap. It's effective.

A list of names that ChatGPT can't respond with, including Brian Hood

Y'all, you don't have to over-engineer every single problem.

How strict are these regexes?

The regexes are pretty strict. If you had a dot in the name, if you had the person's middle initial or middle name in there, it's fine. It'll output it. But just first name, space, last name? No chance.

And the same goes for other names like this.

The power of regex

While AI hallucination is no new thing, and it's probably the number one problem that LLMs have had forever and probably will have for some time, at least we know when you can't fix it with the world's most advanced AI, you can always fix it with a regex.

A regex with Brian Hood's name with sparkles

Don't let anyone tell you in your pull requests that a regex is no good or a bad practice. The biggest, best, most innovative companies do it. And you should do it too.