I was on a Galadon Gold onboarding call with a founder who runs a job search service. Smart guy. Real operation. He's sending 50,000 cold emails a week, running 60,000 LinkedIn connection requests a week, generating around 200 meetings weekly through outreach alone. By any reasonable measure, this is a serious machine.
At some point I said, offhand, that a 0.4% meeting book rate off 50,000 weekly sends wasn't great. He pushed back and said his reply rate was 5%. I told him I wasn't sure I believed that number - not because I thought he was lying, but because the math on meetings-booked didn't support it.
What happened next is what this whole post is about.
He corrected me in real time. "It's 5.65 percent. It wasn't five." And then he shared his screen without me asking, pulled up his dashboard, and showed me the number.
5.65%. Right there. Documented. Verified. He'd been tracking it for a year.
And here's what I noticed: he didn't just share the data. He needed to share the data. The energy behind that screen share wasn't casual. It was the energy of a guy defending something important. He'd been doing cold email for a long time. He knew his numbers. And someone had just questioned the number he was most proud of.
I want to be clear - 5.65% is genuinely a solid reply rate. The benchmarks we track across the Galadon Gold community put a healthy cold email campaign in that zone. He's earned that number. It represents real work, real testing, real iteration over years.
But the screen share told me something more important than the reply rate itself.
When a Number Becomes an Identity
There's a version of metric-tracking that's scientific. You watch numbers the way a lab tech watches a cell culture - with detachment, curiosity, and a willingness to kill the experiment if the results aren't what you hoped. You don't root for the data. You just read it.
Then there's the other version, which is what most founders actually do. You watch a number survive long enough that you start to identify with it. It becomes proof of something about you - your competence, your persistence, your system. And the moment someone questions it, the conversation stops being about optimization and starts being about identity.
The number that made him screen share wasn't just a reply rate. It was, in some meaningful sense, him.
That's the trap.
Because here's what the rest of the conversation revealed: his operation, for all its impressive scale, had some real infrastructure issues that were almost certainly suppressing his true potential. He was running 50,000+ sends per week through Google Workspace inboxes at a time when Google's AI spam detection has been progressively crushing deliverability for high-volume senders. There have been large-scale blanket bans on providers like Instantly that were running Google Workspace infrastructure. There's also what I'd call the shadow spam problem: emails that never even reach the spam folder, they just disappear. Google started routing certain emails into a secondary spam layer that recipients can't even see. No inbox, no spam folder, just gone.
What does that mean practically? His 5.65% reply rate might be measured correctly against delivered emails. But if a significant percentage of his sends are being swallowed by spam filters or disappearing entirely before they hit any inbox, then one of two things is true: either the real reply rate from people who actually saw the email is meaningfully higher than 5.65%, or a deliverability wall is silently capping what he could be achieving. Both possibilities mean the dashboard number understates the opportunity.
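Here's a rough back-of-envelope sketch of that effect. The delivery percentages are illustrative assumptions, not his actual campaign data - nobody outside Google knows the true number. It assumes his dashboard computes replies over total sends, which is how most sequencers report it.

```python
# How deliverability masks the true reply rate.
# The 5.65% figure is from the call; the delivery rates are hypothetical.

weekly_sends = 50_000
replies = round(weekly_sends * 0.0565)  # 2,825 replies at the measured rate

# If only some fraction of sends actually reached an inbox, the reply
# rate among people who saw the email is higher than the dashboard shows.
for delivered_pct in (1.00, 0.80, 0.60):
    delivered = weekly_sends * delivered_pct
    true_rate = replies / delivered
    print(f"{delivered_pct:.0%} delivered -> "
          f"{true_rate:.2%} reply rate among people who saw it")
```

At a hypothetical 60% delivery rate, the same 2,825 replies imply a 9.4% reply rate among delivered emails - which is why "the number is good" and "the number is capped" can both be true at once.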
When I pointed this out, his response was essentially: the system is working, I'm happy with 5%, I've tried switching tools before and it's a pain in the ass, and besides - why would I change something that's performing?
That logic sounds reasonable. It might even be reasonable in isolation. But it's also exactly the reasoning you'd use if a number had become too important to you to risk.
The Infrastructure Problem He Didn't Want to See
I'm not going to spend this whole post on cold email infrastructure, but the specifics matter here because they illustrate the psychological point.
The recommendation I gave him was straightforward: don't turn off what you're doing, but supplement with custom infrastructure. Specifically, I pointed him toward HyperTide, which runs on Microsoft Azure servers rather than standard Google Workspace. The reason this matters is that emails sent through Azure infrastructure aren't filtered through the same AI spam detection algorithms Google uses for its own Workspace emails. They're essentially coming from a different traffic lane entirely, which means they don't carry the same fingerprint that's getting flagged at scale.
The principle is simple: if you're sending 50,000 emails a week through Google Workspace and deliverability has been declining - which it has been, industry-wide, for anyone paying attention - then the same campaigns run through a custom infrastructure stack will hit more inboxes. Not because the emails changed. Just because the routing did.
I also suggested pairing that with EmailBison as a sequencer, which is purpose-built for high-volume sending at the agency and enterprise level. It runs single-tenant infrastructure - your sends never share an IP with anyone else, so another sender's spam behavior can't contaminate your reputation. For someone at his volume, the math on a dedicated stack makes sense pretty quickly.
His pushback wasn't "that's wrong." His pushback was: "We've built a lot of infrastructure around what we have. Switching takes time. And look - the numbers are good."
Which, again, is reasonable. But I'd seen this before. The systems you've built around a metric can eventually become the reason you can't improve it.
The AI Personalization Assumption
There was a second thing I challenged him on, and this one got an even more interesting reaction.
His stack includes AI-generated personalized messaging for every email - the system pulls signals, formats them with AI, and generates tailored copy at scale. This is a popular approach right now, and it sounds good in theory. Personalized messages should get better replies than generic ones, right?
The problem is that when we look across all the active senders we're working with inside Galadon Gold - overseeing millions of cold emails a month - the data has been consistently pointing in a direction that surprises a lot of people: AI-powered personalized messaging is underperforming good spin-text. A well-constructed cold email script with varied spin-text and a clear value prop is beating hyper-personalized AI emails in meeting book rate, repeatedly.
There are a few reasons this might be true. AI-generated personalization often sounds like AI-generated personalization - there's a cadence to it that experienced prospects have started to recognize. It also tends toward longer, more complex sentences, which can hurt deliverability. And at high volume, the "personalization" often amounts to mentioning the prospect's company name and industry in slightly different arrangements - which isn't really personalization, it's decoration.
His reaction to this? Defensive. Not combative, but firm. He'd built the AI personalization into his entire workflow. He'd spent time and money on it. And his system was producing a 5.65% reply rate, which was good. Why would he believe that a simpler approach would outperform it?
This is the exact same pattern as the infrastructure conversation. A decision that made sense at the time has been calcified into a permanent feature of the operation - and now any evidence that contradicts it feels like an attack on the whole system.
I told him to test it. Don't replace the AI personalization. Just let Nabil, our head cold email coach, run one campaign with clean spin-text scripts and compare the reply rates over a few weeks. Let the data tell the story. That's it. If I'm wrong, you lose nothing. If I'm right, you find out what your real ceiling is.
He acknowledged it was worth thinking about. But his energy in that moment was the energy of someone who had already decided the answer.
Scale That Creates Inertia
Here's the thing about running an operation at that volume: it creates a very specific kind of blindness.
When you're sending 50,000 emails a week, 200 meetings a week, 60,000 LinkedIn connections a week - you start to feel like the system is the achievement. The machine is running. The numbers are good. Why break what works?
But I'd argue that the bigger your scale, the more you should be questioning your core metrics - not less. Because at 50,000 sends a week, a 1% improvement in deliverability isn't a marginal gain. It's 500 more inboxed emails every week, which at his reply rate works out to nearly 1,500 additional conversations a year. And a move from 5.65% replies to even 7% doesn't require changing your system - it's roughly a 24% lift in meeting volume at the same cost basis.
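The compounding math is worth seeing spelled out. This sketch uses the figures from the call and assumes replies scale linearly with inboxed volume, which is a simplification:

```python
# Why small rate improvements compound at 50,000 sends/week.
# Numbers from the post; linear scaling is an assumption.

weekly_sends = 50_000
reply_rate = 0.0565

# A 1-point deliverability gain = 500 more inboxed emails per week.
extra_inboxed_per_week = weekly_sends * 0.01
extra_replies_per_year = extra_inboxed_per_week * reply_rate * 52
print(round(extra_replies_per_year))  # ~1,469 extra conversations a year

# Lifting the reply rate itself from 5.65% to 7%:
lift = 0.07 / reply_rate - 1
print(f"{lift:.0%} more meetings at the same send volume")  # ~24%
```

At his 200 meetings a week, a 24% lift is roughly 50 additional meetings weekly without adding a single send.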
The problem is that scale creates inertia. Every API integration, every inbox you've connected, every Slack channel where your team monitors responses - all of it is built around the current stack. Changing infrastructure feels enormous because the switching cost isn't just financial, it's operational and psychological.
And the number - that 5.65% - provides cover for the inertia. It says: you don't need to change anything, you're doing great. Which is exactly what a trap would say.
What I Actually Told Him to Do
My advice was deliberately conservative, because I knew he wasn't going to blow up a working system - and he shouldn't. The move isn't to tear everything down. The move is to add a lane.
First: keep running your current stack. Don't turn it off. Let it keep producing meetings while you test.
Second: stand up HyperTide as a supplement. Run 25,000 of your weekly sends through the custom Azure infrastructure and watch what happens to deliverability. This doesn't require rebuilding your whole system. It's a parallel test. Two weeks of warmup, then compare the reply rates side by side. The difference in deliverability between Google Workspace and custom infrastructure isn't subtle - we've seen it consistently across the community.
Third: get on a call with our cold email coach Nabil and test one campaign without AI personalization. Run a tightly written spin-text script against your current AI-generated version. Track the reply rates. Let the market tell you which one wins.
That's it. Two tests. Neither of them requires stopping what's working. Both of them give you real data about whether 5.65% is your ceiling or your floor.
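When you run these side-by-side comparisons, it helps to know whether a reply-rate gap is signal or noise rather than eyeballing the dashboard. A standard two-proportion z-test does the job; the campaign counts below are made up for illustration:

```python
# Two-proportion z-test for comparing two campaigns' reply rates.
# The reply/send counts here are hypothetical, not real campaign data.
from statistics import NormalDist

def two_proportion_z(replies_a, sends_a, replies_b, sends_b):
    p_a, p_b = replies_a / sends_a, replies_b / sends_b
    pooled = (replies_a + replies_b) / (sends_a + sends_b)
    se = (pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b)) ** 0.5
    z = (p_b - p_a) / se
    # One-sided p-value: probability the observed lift is just noise.
    return z, 1 - NormalDist().cdf(z)

# Control: AI personalization. Test: spin-text script.
z, p = two_proportion_z(replies_a=565, sends_a=10_000,
                        replies_b=700, sends_b=10_000)
print(f"z={z:.2f}, p={p:.4f}")  # with these made-up counts, z is ~3.9
```

At 10,000 sends per arm, a 5.65% vs 7% split clears significance comfortably; at a few hundred sends per arm it wouldn't, which is the argument for letting the test run a few weeks before calling a winner.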
If the tests come back flat, you've confirmed that your current approach is dialed in and you can stop wondering. If they come back positive - and I think they will - you've just found your next real performance jump without changing your core operation.
Either way, you're measuring. You're not defending.
The Metric You'll Fight For
I want to come back to the screen share, because it's the part of this conversation I keep thinking about.
In fifteen-plus years of running sales operations, writing cold emails myself, building and selling companies - I've made this exact mistake with my own numbers. There was a period where I had a metric I was proud of, and I can look back now and see that I stopped being willing to genuinely question it. I tested things around it, but not against it. I optimized within the system rather than examining the system.
The signal that a metric has become an identity is usually this: you feel relief when it holds, and you feel defensive when someone questions it. Those emotional responses have nothing to do with optimization. They're about your self-concept.
A number you'll fight for is a number you've stopped being able to see clearly.
The right relationship with your metrics is much colder than that. It's: this is what the data shows right now, and I'm actively trying to find out if it can be better. Not: this is proof that my system works and I'll screen-share to prove it.
That founder had built something real. The 200 meetings per week, the scale of the operation, the year-over-year tracking - that's legitimate work and legitimate results. I wasn't dismissing any of it. I was asking whether 5.65% was the best he could do, or just the best he'd done so far.
Those are different questions. And only one of them is worth asking.
The Practical Takeaway
If you're running cold email at any serious volume, the questions worth asking right now are:
- What's my actual deliverability rate? Not how many emails I'm sending, but how many are hitting the primary inbox. If you're on Google Workspace, that number has probably been declining and you may not realize it because your reply rate is still "good."
- Is my AI personalization actually helping? Test it against a clean spin-text script. Don't assume the answer. The data from high-volume campaigns has been pointing in a direction most people don't expect.
- What metric am I most reluctant to question? Whatever your answer is - that's where you should look first.
For lead sourcing and building the lists that feed these campaigns, we use a combination of tools depending on the use case - LinkedIn Recruiter for certain profiles, Apollo for broader B2B targeting, and ScraperCity's B2B database for unlimited access without per-lead costs. For finding verified emails once you have a name, ScraperCity's email finder is what we reach for. The infrastructure conversation doesn't matter if your list is weak to begin with.
For the sending side, the stack I recommended on that call was HyperTide for infrastructure and EmailBison for sequencing at high volume. If you're at lower volume or just getting started, Smartlead or Instantly are solid starting points - both integrate cleanly with HyperTide's infrastructure. And grab the top 5 cold email scripts if you want to see what clean, tested copy looks like before you A/B test it against AI personalization.
The bigger point is this: the metric you're most attached to is the one most worth interrogating. Not because you're wrong. But because attachment is the thing that stops you from finding out.
That founder sent me his screen to prove his number was real. The number was real. But the willingness to defend it so hard - that was the more interesting data point.
If you want to work through your own numbers on a live call, that's exactly what we do inside Galadon Gold. Bring your stack, your reply rates, your current scripts - and we'll find out together whether you've hit your ceiling or just your comfort zone.
Ready to Book More Meetings?
Get the exact scripts, templates, and frameworks Alex uses across all his companies.