
Your AI Sounds Dumb Because of One Missing Step

The fix takes 20 minutes. The upside is every demo you run from here on lands harder.

I was on a call with a developer who's building an AI chatbot product. Smart guy. His team had already deployed web scraping to the cloud, they were handling proxy timeouts, they'd gotten the core scraping functionality working on a live server. Real progress, moving fast.

And then he showed me the chatbot responding to a user question, and it spit out something like: hermanmiller.com/products/seating/office-chairs/aeron-chairs/

Dead. Raw. URL dump. Like a broken phone tree from 2004.

Nobody's wowed by that. Nobody reads that and thinks, wow, this AI is smart. They think the product is half-baked. And the worst part? His AI wasn't dumb. His underlying model was fine. The problem had nothing to do with GPT, nothing to do with his prompts, nothing to do with his training data. It was one missing step at initialization.

The Difference Between a URL and a Colleague

Think about how you talk to a knowledgeable coworker versus how you talk to a search engine. When you ask a coworker, "where do I sign up?" they say, "go to the signup page." They don't paste a URL in your face. They use language that maps to intent. They've internalized what lives where, and they communicate accordingly.

A chatbot that spits raw URLs is not acting like a colleague. It's acting like it has a URL database and no idea what those URLs mean. Which, if you skipped this step, is exactly what it is.

Here's the fix: when you initialize your OpenAI assistant, before any user conversation begins, you make one call where you pass a labeled URL map as context. You tell the AI: this URL is the signup page, this URL is pricing, this URL is the Aeron chairs product page. You give it the spidered list of links and have it - or a separate lightweight OpenAI call - generate human-readable labels for each one.

Now when a user asks "how do I buy one of those chairs," the bot says "you can check out the Aeron chairs page" and links it. Not a raw string. A label. A sentence that sounds like a person wrote it.

That's the whole thing. That one initialization call is responsible for more "wow, this actually feels smart" reactions in demos than any amount of prompt engineering.

Why Most Developers Skip This

The reason this step gets skipped is that the chatbot technically works without it. If you test it yourself, you know what every URL is. You don't notice the problem because you have context the user doesn't have. The URL hermanmiller.com/products/seating/office-chairs/aeron-chairs/ makes sense to you because you built the damn thing. It means nothing to someone who just opened the widget on a website for the first time.

The second reason is scope confusion. On my call, the developer asked me directly: "Is link labeling part of the scraping scope, or is it a prompt scope thing?"

Good question. The answer is: it's both, and that's exactly why it falls through the cracks. Nobody owns it. The scraper pulls the links. The prompt engineer writes the instructions. Nobody is explicitly assigned to the step that says take the links we pulled, label them intelligently, and pass that label map as context at initialization.

So it doesn't get done. And the chatbot sounds mechanical even though the underlying model is perfectly capable of sounding human.

How the Initialization Call Actually Works

The concept is straightforward. When you set up your assistant via the OpenAI API, you have an initialization step - you pass instructions, context, and configuration before any user thread begins. Most developers use this step for persona instructions: "You are a helpful assistant for Company X. Be concise. Don't make things up." That's fine. That's necessary. But it's not enough.

What you want to do in addition to that is pass a structured reference of your site's URL map. Something like:
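For illustration, the map can be as simple as a dictionary keyed by URL. These entries are hypothetical, modeled on the signup, pricing, and Aeron chairs examples above; the exact paths and structure are up to you:

```python
# Hypothetical labeled URL map. The structure is illustrative, not a
# required schema - what matters is that every URL is paired with a
# human-readable page name the assistant can use in conversation.
url_map = {
    "hermanmiller.com/": "Home page",
    "hermanmiller.com/signup/": "Signup page",
    "hermanmiller.com/pricing/": "Pricing page",
    "hermanmiller.com/products/seating/office-chairs/aeron-chairs/": "Aeron chairs product page",
}
```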

You can generate this map automatically by running your scraped URL list through a quick OpenAI call and asking it to label each URL based on the slug and page content. That becomes your reference document. Then you pass that document as context during assistant initialization - not at query time, at startup.
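Both halves - the labeling call and the startup instructions - can be sketched with two small helpers. The function names and prompt wording here are my own assumptions, not a fixed API; the actual OpenAI request (e.g. a chat completion for labeling, or the `instructions` field when creating an assistant) is left to your client code:

```python
import json

def build_labeling_prompt(urls):
    # Prompt for the lightweight labeling call: ask the model to turn
    # each URL's slug into a short, human-readable page name.
    return (
        "For each URL below, return a JSON object mapping the URL to a "
        "short human-readable page name, based on its slug:\n"
        + "\n".join(urls)
    )

def build_init_instructions(persona, url_map):
    # Fold the labeled map into the assistant's startup instructions,
    # so it is passed once at initialization - not at query time.
    return (
        persona
        + "\n\nSite map (URL -> page name). When pointing users at a page, "
        "use its name as the link text, never the raw URL:\n"
        + json.dumps(url_map, indent=2)
    )
```

You would run the labeling prompt through a cheap model once, parse the JSON it returns into your map, and then pass the output of `build_init_instructions` as the assistant's instructions at setup.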

Once the assistant has that map baked in, it doesn't have to guess what hermanmiller.com/products/seating/office-chairs/aeron-chairs/ is. It already knows. It was told at initialization. So when a user asks a question, it can respond with natural language that includes a properly labeled, human-readable link.

This is exactly what tools like Chatbase figured out early. They don't have some proprietary AI model. They're running on the same OpenAI infrastructure everyone else has access to. The difference is in how they structure initialization - they make sure the assistant understands what every piece of content in its knowledge base is and what it's called before it ever talks to a user. That's what makes their bots feel polished.

One More Thing: Scope the Domain

There was a related issue on the same call. The scraper was pulling in links from Herman Miller's investor relations page, and possibly would have gone further - Twitter profiles, LinkedIn company pages, external partner sites. The question came up: do we scrape those too?

No. And this isn't just a technical preference. It's a product integrity issue.

If your chatbot scrapes your client's Twitter profile and then starts serving users answers based on tweet content from three years ago, you've got a problem. If it pulls in an investor relations page and tells a regular customer something that belongs in a shareholder briefing, you've got a bigger problem. If it accidentally indexes a competitor mention or a news article that linked back to the site - now your AI is talking about things it has no business talking about.

The rule is simple: limit the scraper to the root domain. If someone wants to include a subdomain or a specific external page, they add it explicitly as a second input. You don't let the spider follow every outbound link on the page. That's how you end up with a chatbot that knows more about your client's LinkedIn endorsements than their actual product catalog.
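The rule itself is a few lines of code. This is a sketch under my own assumptions (the function name and the explicit allow-list parameter are mine); the key move is an exact host match against the root domain, so subdomains, social profiles, and outbound links are dropped unless someone whitelists them deliberately:

```python
from urllib.parse import urlparse

def in_scope(url, root_domain, extra_allowed=()):
    # Exact host match against the root domain. Subdomains and external
    # sites fail the match and are excluded unless explicitly allowed.
    host = urlparse(url).netloc.lower().removeprefix("www.")
    return host == root_domain or url in extra_allowed
```

So a Twitter profile or an `investors.` subdomain fails the check by default, and only makes it into the crawl if it is passed in as an explicit second input.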

Scoping the domain also makes your URL map cleaner, which makes the initialization call more accurate, which makes the chatbot smarter. Everything downstream of that first decision either works cleanly or doesn't.

The Sandbox Test Before Integration

One thing I want to be clear about: don't integrate until you've tested this in a sandbox. The workflow we discussed on the call was exactly right - build a simple sandbox interface where you can paste any URL and see the chatbot output in real time, before you wire it into your front-end product.

The reason is obvious once you've done it wrong: you don't know what the scraper is actually pulling until you see it. You don't know how the AI is going to label a messy URL slug until you test it. You don't know if the timeout is going to kill the call at 28 seconds until you watch it happen in a sandbox where the only person affected is you.

Build it there first. Approve it. Then integrate. That one step saves you from pushing broken behavior into a product demo and having to explain to a client why your AI is telling their customers to visit /products/legacy-archive/deprecated-2019/index.html.

There's also a front-end piece here that's equally overlooked. When your chatbot outputs a labeled link - "visit the Aeron chairs page" - that text needs to be clickable and visually distinguishable. Underlined, at minimum. Not blue necessarily, not neon, but underlined so the user knows they can click it.

If it looks like regular sentence text, users don't click it. If it's a raw URL, it looks like spam. The sweet spot is: same color as the surrounding text, underlined, and hyperlinked. That's it. You're not designing a UI here. You're just not making it invisible.

This is a rich text rendering problem on the Bubble front-end side, not an AI problem. But it matters because even if your AI is now producing perfectly labeled links at initialization, if those links don't render as clickable text in your chatbot widget, you've solved the intelligence problem and created a usability problem.
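If your widget receives the bot's output with markdown-style links, the rendering rule can be sketched in a few lines. The `[label](url)` convention and the helper name are my assumptions; in Bubble you would express the same styling through its rich-text settings rather than raw HTML:

```python
import re

# Matches markdown-style links: [label](url)
LINK_MD = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

def render_links(text):
    # Same color as the surrounding text, underlined, hyperlinked.
    return LINK_MD.sub(
        r'<a href="\2" style="color: inherit; text-decoration: underline;">\1</a>',
        text,
    )
```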

Fix both. They're equally cheap to solve.

What This Means for Anyone Shipping an AI Product Right Now

I've built SaaS products. I've built and exited SaaS products. And I can tell you that the gap between a demo that gets a "huh, interesting" and a demo that gets a "wait, we need to buy this" almost always comes down to the polish layer - the things that make AI feel like it's actually paying attention to the user's context instead of just completing a query.

URL labeling at initialization is one of the cheapest polish moves available. You're not retraining a model. You're not hiring a prompt engineering consultant. You're making one structured call at startup that tells your AI what your content map looks like in plain English. It takes maybe 20 minutes to implement properly once you understand what you're doing.

The developers who skip this are going to watch a competitor ship a product that uses the exact same model and looks twice as smart. Don't be that developer.

If you're building with web scraping as part of your stack - whether for AI chatbots, lead enrichment, or anything else - the underlying scraper infrastructure matters a lot. For scraping web data at scale, ScraperCity is what I use for B2B data specifically, and it integrates cleanly into workflows like this. For building your URL maps and contact lists, tools like Clay are worth having in the stack for data enrichment and structuring what you pull.

And if you want the full picture on how to build outbound systems that actually convert - from how you source data to how you structure your outreach - the 7-Figure Agency Blueprint covers the whole stack. The technical stuff I talked about here is one piece of a larger machine.

The point is: your AI probably isn't dumb. It's just missing one initialization step. Add it. Watch your demos land differently.
