AI Workflows

How I Built an AI Visibility Audit Tool for B2B SaaS Using Claude Code

Usama Khan
Published: Apr 5, 2026 · 7 min read
I've been using Claude Code a lot lately. Mostly to build small workflows that help me get better results for my clients. Things like content audits, competitor tracking, and data pulls that would take hours to do by hand.

One thing I kept coming back to was AI visibility. When someone asks ChatGPT or Perplexity to recommend a tool in my client's category, does their brand actually show up? And if it does, is their website being used as a source?

There are tools for this already. Peec AI, Otterly, and a few others track AI mentions. But I wanted to build the process myself. Not because those tools are bad. I just wanted to understand the mechanics firsthand. What exactly gets checked, how the scoring works, and what the output should look like for it to be useful.

So I opened Claude Code and started building.

Key takeaways

  • AI visibility and Google rankings are two different things. Your brand can rank on page one and still be invisible when buyers ask ChatGPT or Perplexity for recommendations.
  • Track two signals separately: whether the brand name appeared in the response (recommended) and whether a URL from the brand's domain was used as a source (cited). They tell you different things about your visibility.
  • The tool is only as good as the queries you feed it. Pull from keyword research tools like Ahrefs or SEMrush, focusing on conversion-focused and bottom-of-funnel terms. Layer in sales call transcripts, search console data, and Reddit threads. Then convert those keywords into natural questions buyers actually type into AI.
  • Claude Code doesn't need you to be a developer. You write a clear brief describing what you want, it builds the tool, and you iterate on the logic in plain language until it works.

What the AI visibility audit tool actually does

The best part about building with Claude Code is that you don't need to understand every line of code behind it. You describe what you want, and it builds. Once it's built, you just run it. That makes it accessible even if you're not technical.

[Image: AI Visibility Audit Process]

Put simply, my AI visibility audit tool does this:

  • Takes a brand name, competitor names, and a list of buyer-intent queries as input through a simple config file.
  • Sends each query to four AI platforms (OpenAI, Anthropic, Gemini, and Perplexity) through their APIs and captures the full response.
  • Each query runs twice per platform. LLMs don't give the same answer every time. You can ask the same question five minutes apart and get different brands recommended. Running it twice helps you spot the difference between a brand that consistently shows up and one that appears randomly.
  • Checks each response for two things. First, was the brand name mentioned anywhere in the answer? That's a recommendation. Second, was a URL from the brand's website used as a source? That's a citation. Both are tracked independently.
  • Outputs a branded Excel report with the overall visibility rate, per-platform breakdown, competitor share of voice, and blindspot queries where the brand never showed up.

You feed it a config. It queries the AI platforms. You get a report. That's the loop.
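That loop can be sketched in a few lines. This is a minimal illustration, not the author's actual code: the config keys, the `query_platform` callable, and all names here are placeholders standing in for the real API clients.

```python
# Hypothetical config mirroring the inputs described above.
CONFIG = {
    "brand": "Acme",
    "brand_domain": "acme.com",
    "competitors": ["Widgetly", "Gadgetron"],
    "queries": ["What is the best widget tool for startups?"],
    "platforms": ["openai", "anthropic", "gemini", "perplexity"],
    "runs_per_query": 2,  # LLMs are nondeterministic; two runs separate consistent visibility from flukes
}

def run_audit(config, query_platform):
    """Run every query on every platform, twice.

    query_platform(platform, query) -> response text. It is injected as a
    parameter so the audit loop stays independent of the four API clients.
    """
    results = []
    for query in config["queries"]:
        for platform in config["platforms"]:
            for run in range(config["runs_per_query"]):
                response = query_platform(platform, query)
                results.append({
                    "query": query,
                    "platform": platform,
                    "run": run + 1,
                    "response": response,
                })
    return results
```

Keeping the platform clients behind a single callable also makes the loop testable without spending API credits: pass in a stub that returns canned text.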

Why I planned the logic before touching Claude Code

It's tempting to open Claude Code and start prompting right away. I didn't. I spent time thinking through the architecture first. What goes in, what comes out, and how the tool decides what counts as visibility.

Claude Code builds exactly what you tell it to build. If your brief is vague, the output is vague. So I needed clarity on two things before writing a single prompt: what "visibility" means and which queries to track.

Defining what "visibility" actually means

I track two independent signals for every query.

Recommended means the brand name appeared anywhere in the AI response. First in the list, last in the list, doesn't matter. If a buyer sees the brand name, that's visibility. They now know it exists. They might ask a follow-up or Google it later.

Cited means a URL from the brand's website showed up as a source. The AI pulled content from your site to build its answer. You can be recommended without being cited. And your blog post can be a source without your brand being named as a recommendation. That's why I track them separately.

Scoring is binary. The brand appeared or it didn't. I tried a weighted system early on where being the "first recommendation" scored higher than a passing mention. It sounded smart but didn't hold up. A buyer seeing your brand in a list of nine tools still puts you in their consideration set. Scoring that lower than a direct recommendation doesn't reflect how people actually research.

Picking the right queries to track

The tool is only as useful as the queries you feed it. This part matters more than the code.

Don't guess what buyers ask AI platforms. Pull from real data. Ahrefs and Google Search Console give you the keywords people already search for. Sales call transcripts show you the exact questions prospects ask before buying. Reddit threads reveal how people phrase questions when they're comparing tools.

The key is converting those keywords into natural questions. Nobody types "knowledge management software contact centers" into ChatGPT. They type "What is the best knowledge management tool for contact centers?" The format matters because LLMs respond differently to conversational questions than to keyword fragments.

Most of the queries I track fall into patterns like:

  • "Best [category] tool for [use case]"
  • "[Tool X] alternatives"
  • "[Tool X] vs [Tool Y]"
  • "[Category] software for [industry]"
  • "How to solve [problem]"
  • "Best [category] tools for [customer type]"

I aim for 50 to 60 queries per audit. Enough to see patterns without burning through API credits.

How I built it inside Claude Code

Step 1: Write the brief, not the code

I didn't ask Claude Code to "build me an AI audit tool" and hope for the best. I wrote a detailed brief describing everything: the config file structure, how each AI platform should be queried as a separate module, what the detection logic should check for, how errors should be handled, and what the final Excel report should look like.

Then I told it to read the full brief, ask me clarifying questions, and outline its implementation plan before writing a single line of code.

This is the most important step. A clear brief up front saves hours of back-and-forth later. Think of it like briefing a developer. The more specific your requirements, the closer the first build gets to what you actually want.

Step 2: Review what it builds

Claude Code generated the full project. An entry point script, separate modules for each AI platform, the detection logic, and a report generator. Clean folder structure, error handling, logging. All from one prompt.

But I still reviewed every file. Good thing I did. It used Anthropic's most expensive model (Opus) for a simple task that just needed to extract one sentence from a response. I switched it to Haiku, which costs roughly 30 to 50 times less per call and handles the task just fine.

Step 3: Iterate on the definitions

The first version of the tool classified mentions into three tiers: cited, recommended, and mentioned. "Cited" originally meant the brand was positioned as the top recommendation. But that's not what citation actually means.

I went back and redefined it. "Cited" now means a URL from the brand's domain appeared as a source in the AI response. The AI actually pulled from your content. "Recommended" means the brand name appeared in the answer. These are two different signals, and the first version was conflating them.

This is the reality of building with AI. The first output is rarely the final product. You use it, spot what's off, and refine. Claude Code makes iteration fast because you just describe the change in plain language and it rewrites the logic.

What the report looks like

The tool outputs a branded Excel file with three sheets.

Detailed results is the raw data. Every query across all four platforms, every run, laid out row by row. For each response, you can see whether the brand was recommended, whether it was cited as a source, the actual URLs cited, the sentence where the brand appeared, and which competitors showed up in the same response. If the brand didn't appear at all, the row is flagged in red so gaps are obvious at a glance.

Summary is what you'd show a client or a stakeholder. It starts with the AI visibility rate: the percentage of responses where the brand appeared in any form. Below that, it breaks out the recommendation rate and citation rate separately.

Then it shows per-platform performance so you can see if, say, Perplexity cites your content but Gemini ignores you completely. Share of voice compares your brand against competitors across all queries. Strongest queries show where you consistently appear. Blindspot queries, highlighted in red, show where you're invisible.
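The summary metrics are all straightforward aggregations over the detailed rows. Here is one way they could be computed, as a sketch under my own assumptions about the row format (keys and names are illustrative, not the report generator's actual code):

```python
from collections import Counter

def summarize(rows):
    """rows: list of dicts with keys query, platform, recommended, cited,
    and competitors (names that appeared in the same response)."""
    total = len(rows)
    visible = sum(1 for r in rows if r["recommended"] or r["cited"])
    summary = {
        # Visibility rate: the brand appeared in any form.
        "visibility_rate": visible / total,
        "recommendation_rate": sum(r["recommended"] for r in rows) / total,
        "citation_rate": sum(r["cited"] for r in rows) / total,
    }

    # Blindspot queries: the brand never appeared in any run on any platform.
    by_query = {}
    for r in rows:
        by_query.setdefault(r["query"], []).append(r)
    summary["blindspots"] = [
        q for q, rs in by_query.items()
        if not any(x["recommended"] or x["cited"] for x in rs)
    ]

    # Share of voice: how often each competitor shows up across all responses.
    counts = Counter(c for r in rows for c in r["competitors"])
    summary["share_of_voice"] = {c: n / total for c, n in counts.items()}
    return summary
```

The per-platform breakdown is the same arithmetic with the rows grouped by platform first.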

How to read this report is the third tab in the sheet. It explains what every metric means, how visibility rate is calculated, what the difference between recommended and cited is, and why each query runs twice. I added this because the report will often be opened by someone who wasn't involved in building it. A VP, a client, a teammate. They shouldn't need a walkthrough to understand what they're looking at.

The whole build took two hours. No developer, no sprint cycle, no waiting on engineering capacity. The hard part was never the code. It was deciding what visibility actually means, which queries matter, and how to score it in a way that's honest. Get those decisions right and the tool practically builds itself.

Want me to run a detailed AI visibility audit for your brand? Book a strategy call and we can take it from there.

Usama Khan

Author

AI SEO Strategist for B2B SaaS

7+ years in B2B content marketing and SEO. I help brands rank on Google, get recommended by LLMs, and turn content into pipeline. For agencies, I lead client accounts end to end as a fractional SEO strategist. I also build AI-powered workflows for content teams who want to do more with less. When I’m not in the trenches, I’m probably watching cricket.