How to Audit Your AI Visibility (Step by Step)
An AI visibility audit tells you exactly what major language models say about your brand — and where the gaps are. Here is how to run one yourself.
Before you can improve your AI Visibility, you need to know your starting point. An AI visibility audit is a structured process for discovering exactly what major language models currently say about your brand — how often they cite you, how accurately they describe you, how you compare to competitors, and which technical signals are working or missing. This guide walks through the process step by step.
Why run an AI visibility audit?
The core problem with AI model outputs is that they are invisible by default. Unlike search rankings — where you can simply type your keywords into Google and see where you land — AI model responses are non-deterministic, vary across models and versions, and are not publicly tracked by any dashboard.
This means brands are flying blind. You may assume that because you have a good website and solid SEO, AI models are representing you well. That assumption is frequently wrong. Models can and do:
- Omit your brand entirely from relevant category answers
- Describe your brand with incorrect facts — wrong location, wrong services, wrong founding date
- Mention you in a negative or hedged context ("some users report concerns about...")
- Consistently recommend competitors ahead of you on comparison prompts
- Hallucinate a brand profile that is partly or entirely invented
None of these problems are visible without systematic testing. An audit surfaces them so you can address them with evidence-based remediation rather than guesswork.
Step 1: Build your prompt set
The quality of your audit depends entirely on the quality of your prompt set. A good prompt set covers four categories of question:
Brand recall prompts test what the model knows about your brand specifically. Examples: "What does [brand name] do?", "Who are [brand name]'s typical clients?", "Where is [brand name] based?" These prompts reveal factual accuracy and entity recognition.
Category recommendation prompts test whether the model includes you when recommending providers in your space. Examples: "Which GEO consultancies should I consider?", "What are the best B2B marketing agencies in Switzerland?", "Who are the leading providers of AI visibility services?" These prompts reveal your citation rate and share of voice.
Comparison prompts test how the model positions you relative to named competitors. Examples: "How does [brand] compare to [competitor]?", "What is the difference between [brand] and [competitor] in terms of approach?" These prompts reveal competitive positioning and sentiment.
Purchase-intent prompts simulate the questions a buyer might ask before choosing a vendor. Examples: "Who should I hire for an AI visibility audit?", "Which agency can help me get cited by ChatGPT?", "I need help with GEO — who are the experts?" These prompts are the highest-value category because they capture bottom-of-funnel AI interactions.
Aim for 30 to 50 prompts total across all four categories, tested across at least three models: ChatGPT (GPT-4o), Claude (current version) and Gemini (current version). Add Perplexity if retrieval visibility is a priority for your category.
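If you prefer to keep the prompt set in code rather than in a spreadsheet, a minimal sketch in Python might look like the following. The category keys, the placeholder convention and the brand names are illustrative choices, not a required format:

```python
# Minimal sketch: a structured prompt set built from templates.
# BRAND and COMPETITOR are hypothetical placeholders.

BRAND = "Example AG"
COMPETITOR = "Rival GmbH"

PROMPT_TEMPLATES = {
    "brand_recall": [
        "What does {brand} do?",
        "Who are {brand}'s typical clients?",
        "Where is {brand} based?",
    ],
    "category_recommendation": [
        "Which GEO consultancies should I consider?",
        "Who are the leading providers of AI visibility services?",
    ],
    "comparison": [
        "How does {brand} compare to {competitor}?",
        "What is the difference between {brand} and {competitor} in terms of approach?",
    ],
    "purchase_intent": [
        "Who should I hire for an AI visibility audit?",
        "Which agency can help me get cited by ChatGPT?",
    ],
}

def build_prompt_set(brand: str, competitor: str) -> list[dict]:
    """Expand the templates into (category, prompt) records for testing."""
    prompts = []
    for category, templates in PROMPT_TEMPLATES.items():
        for template in templates:
            prompts.append({
                "category": category,
                "prompt": template.format(brand=brand, competitor=competitor),
            })
    return prompts

prompt_set = build_prompt_set(BRAND, COMPETITOR)
print(len(prompt_set), "prompts across", len(PROMPT_TEMPLATES), "categories")
```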
Step 2: Test with web search disabled — parametric knowledge
Run your full prompt set with each model's web search feature explicitly turned off. In ChatGPT, make sure the web search toggle is disabled. In Gemini, use the version without search grounding. In Claude, disable web search if your plan includes it. With search off, every model answers from parametric knowledge alone.
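If you run the audit programmatically, plain API calls are a convenient way to capture this baseline, since a chat completion request with no search tool attached reflects parametric knowledge by construction. A minimal sketch, assuming the official OpenAI Python SDK, an OPENAI_API_KEY in the environment, and the prompt-set format from Step 1; the model name is a placeholder for whichever version you audit:

```python
# Minimal sketch: run the prompt set against one model via API.
# A plain chat completion has no search tool attached, so the answer
# comes from parametric knowledge alone. Model name is a placeholder.
from openai import OpenAI

client = OpenAI()

def run_parametric(prompt_set: list[dict], model: str = "gpt-4o") -> list[dict]:
    """Collect one verbatim response per prompt for later scoring."""
    responses = []
    for item in prompt_set:
        completion = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": item["prompt"]}],
        )
        responses.append({
            **item,
            "model": model,
            "response": completion.choices[0].message.content,
        })
    return responses
```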
For each response, score four dimensions on a simple scale (for example, 0–2 or 0–5):
- Mention: Is your brand cited in this response? (Binary: yes or no)
- Accuracy: Are the facts stated about your brand correct?
- Sentiment: Is the tone of the description positive, neutral or negative?
- Position: If multiple brands are cited, where does yours appear? (First, middle or last)
Record every response verbatim in a spreadsheet, along with the date, model name and model version. This parametric baseline is your starting point — it represents what the model has absorbed from its training data about your brand, without any live web influence.
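Fixing the record format before you start helps keep scoring consistent across hundreds of responses. A minimal sketch of one audit row, assuming a 0–2 scale and CSV storage; the field names and encodings are illustrative, and an ordinary spreadsheet works equally well:

```python
# Minimal sketch of the audit log: one row per (prompt, model) response,
# scored on the four dimensions. Scale and field names are illustrative.
import csv
from dataclasses import dataclass, asdict

@dataclass
class AuditRecord:
    run_date: str         # ISO date of the test run
    model: str            # e.g. "gpt-4o" (illustrative)
    model_version: str
    search_enabled: bool  # False = parametric run, True = retrieval run
    category: str
    prompt: str
    response: str         # the verbatim response text
    mention: int          # 0 = not cited, 1 = cited
    accuracy: int         # 0-2: wrong / partly correct / correct
    sentiment: int        # 0-2: negative / neutral / positive
    position: int         # 1 = first, 2 = middle, 3 = last, 0 = not cited

def append_record(path: str, record: AuditRecord) -> None:
    """Append one scored response to the audit log (CSV)."""
    row = asdict(record)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=row.keys())
        if f.tell() == 0:  # new file: write the header row first
            writer.writeheader()
        writer.writerow(row)
```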
Step 3: Test with web search enabled — retrieval layer
Run the same prompt set again with web search turned on in ChatGPT and Gemini (search grounding enabled). Score each response using the same four dimensions.
Now compare your retrieval scores to your parametric scores. The pattern you observe tells you which layer is stronger and where to focus your optimization effort (a diagnostic sketch follows the list):
- Higher scores with search on than off mean your retrieval signals (website content, llms.txt, structured data) are working. The live web is helping models understand and cite you better than training data alone.
- Higher scores with search off than on mean your parametric presence (the training data layer) is stronger than your current website. This is common for older, established brands whose sites have not been updated for GEO. Priority: improve live web signals.
- Low scores in both modes mean the brand has weak signals across both layers. This is the most common finding for newer brands or brands that have never considered GEO. The good news: there is room for rapid improvement on both fronts.
- High scores in both modes mean the brand has strong AI Visibility. The audit focus shifts to maintaining the lead and watching for competitor improvements.
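Scoring the comparison can be as simple as averaging the dimension scores per mode and bucketing the result into the four patterns above. A minimal sketch, assuming the record format from Step 2; the 60% cut-off is an arbitrary illustrative threshold, not an industry standard:

```python
# Minimal sketch: normalize average scores per mode and map the result
# onto the four diagnostic patterns. Threshold is illustrative only.

def average_score(records: list[dict]) -> float:
    """Mean of the summed dimension scores (mention + accuracy + sentiment)."""
    if not records:
        return 0.0
    totals = [r["mention"] + r["accuracy"] + r["sentiment"] for r in records]
    return sum(totals) / len(totals)

def diagnose(parametric: list[dict], retrieval: list[dict],
             max_per_prompt: float = 5.0, strong: float = 0.6) -> str:
    p = average_score(parametric) / max_per_prompt   # search-off runs
    r = average_score(retrieval) / max_per_prompt    # search-on runs
    if p >= strong and r >= strong:
        return "strong in both layers: maintain the lead, watch competitors"
    if p < strong and r < strong:
        return "weak in both layers: room for rapid improvement on both fronts"
    if r > p:
        return "retrieval stronger: live web signals are working"
    return "parametric stronger: prioritize improving live web signals"
```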
Step 4: Check technical GEO signals
Alongside the prompt testing, conduct a technical audit of your site's GEO signal infrastructure. The table below lists the key signals to check:
| Signal | What to check | Pass condition |
|---|---|---|
| llms.txt | Does the file exist at /llms.txt? Is it accurate and complete? | Present, served as text/plain, under 2,000 words |
| JSON-LD structured data | Is there an Organization or ProfessionalService schema on the homepage? | Valid JSON-LD, correct @type, includes name, url, description, sameAs |
| robots.txt | Are AI crawlers (GPTBot, ClaudeBot, PerplexityBot) allowed? | No Disallow rules blocking AI crawlers or /llms.txt |
| Sitemap | Is an XML sitemap present and submitted? | Present at /sitemap.xml, all important URLs included |
| Server-rendered content | Is key content visible in raw HTML (not dependent on JavaScript)? | View-source shows service descriptions, FAQs, key facts |
| Canonical tags | Does every page have a self-referencing canonical URL? | Present on all pages, no duplicate content without canonicalization |
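Most of these checks can be scripted. A minimal sketch using the requests library and the standard library's robots.txt parser; it verifies presence and content type only, so schema correctness, sitemap completeness and canonical tags still need manual review:

```python
# Minimal sketch of the technical signal checks. Presence and
# content-type only; deeper validation remains a manual step.
from urllib import robotparser

import requests

AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")

def check_geo_signals(base_url: str) -> dict[str, bool]:
    results = {}

    # llms.txt: present and served as text/plain
    r = requests.get(f"{base_url}/llms.txt", timeout=10)
    results["llms.txt"] = (
        r.status_code == 200
        and "text/plain" in r.headers.get("Content-Type", "")
    )

    # robots.txt: AI crawlers may fetch the homepage and /llms.txt
    rp = robotparser.RobotFileParser(f"{base_url}/robots.txt")
    rp.read()
    results["robots.txt"] = all(
        rp.can_fetch(bot, f"{base_url}/{path}")
        for bot in AI_CRAWLERS
        for path in ("", "llms.txt")
    )

    # sitemap: present at the conventional location
    r = requests.get(f"{base_url}/sitemap.xml", timeout=10)
    results["sitemap"] = r.status_code == 200

    # JSON-LD: structured data visible in the raw homepage HTML
    # (also a rough proxy for server-rendered content)
    r = requests.get(base_url, timeout=10)
    results["json-ld"] = "application/ld+json" in r.text

    return results

print(check_geo_signals("https://example.com"))
```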
Step 5: Analyze sentiment and competitor positioning
With your prompt responses collected, look for patterns across the full set. Sentiment analysis at this stage is qualitative: read the responses and categorize the language used to describe your brand. Collect the specific adjectives and descriptors the models use — these reveal the narrative each model has constructed about your brand from its training data and retrieval sources.
For competitor positioning, note which brands appear alongside yours in category and comparison prompts, and in what order. If a competitor consistently appears before you across multiple models and prompt types, they have stronger AI Visibility in your category. Study their GEO signals — their llms.txt, their structured data, their third-party references — and identify what they are doing that you are not.
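Share of voice and average position can be tallied directly from the recorded responses. A minimal sketch, assuming you maintain a list of brand names to look for; the substring matching is deliberately naive, and real brand names may need alias handling:

```python
# Minimal sketch: tally mentions and average appearance order per brand
# across category and comparison responses. Brand names are hypothetical.
from collections import defaultdict

BRANDS = ["Example AG", "Rival GmbH", "Acme Corp"]

def share_of_voice(responses: list[str]) -> dict[str, dict]:
    stats = defaultdict(lambda: {"mentions": 0, "positions": []})
    for text in responses:
        # order of first appearance approximates recommendation order
        hits = sorted((text.find(b), b) for b in BRANDS if b in text)
        for rank, (_, brand) in enumerate(hits, start=1):
            stats[brand]["mentions"] += 1
            stats[brand]["positions"].append(rank)
    return {
        brand: {
            "mentions": s["mentions"],
            "avg_position": sum(s["positions"]) / len(s["positions"]),
        }
        for brand, s in stats.items()
    }
```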
Document all findings in a structured audit report with a score for each dimension, an overall AI Visibility score, a list of the specific signal gaps identified, and a prioritized remediation plan. The audit is the diagnostic — the real work begins with implementation.
Frequently asked questions
Can I run an AI visibility audit myself?
Yes. The core of an AI visibility audit — building a prompt set, running it across ChatGPT, Claude and Gemini, and scoring the responses — requires no specialist tools, just time and a systematic approach. The more challenging parts are building a large, representative prompt set, scoring consistently across dimensions, and interpreting the parametric versus retrieval gap correctly. If you have internal resources, a DIY audit is a good starting point. A professional audit adds value through a larger prompt set, cross-model benchmarking, competitor analysis and structured remediation planning.
How much does a professional AI visibility audit cost?
Professional AI visibility audits vary widely depending on scope. A focused audit covering one brand, three models and 30 to 50 prompts typically ranges from CHF 2,000 to CHF 5,000. A comprehensive audit including multiple product lines, five or more models, 100-plus prompts, competitor benchmarking and a full remediation roadmap can range from CHF 8,000 to CHF 20,000 or more. Ongoing monthly monitoring is typically priced separately as a retainer.
How often should I audit?
Run a full audit at baseline (before any GEO work), then again after 90 days of implementation. After that, a monthly prompt-set check covering your highest-priority prompts is sufficient for most businesses. Run a full re-audit whenever a major model update is released by OpenAI, Anthropic or Google — these can shift your scores without any action on your part. Also audit after any significant change to your own brand signals: a website relaunch, a new llms.txt, or major new press coverage.
What should I do with the audit results?
Audit results should drive a prioritized remediation plan. Start with the technical GEO signals — they are typically fast to fix and have an immediate impact on retrieval-layer performance. Then address factual accuracy issues by correcting inconsistencies across your site, structured data and third-party profiles. Then focus on citation rate improvements through content strategy: FAQ pages, definition content, quotable fact-dense paragraphs. Finally, build toward parametric presence through third-party coverage and entity establishment. Document your baseline scores so you can demonstrate progress.
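As a concrete starting point for the structured-data fix, here is a minimal Organization JSON-LD block emitted from Python; every value is a hypothetical placeholder to replace with your own brand facts:

```python
# Minimal sketch: emit an Organization JSON-LD tag for the homepage.
# All values are hypothetical placeholders, not real brand data.
import json

organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example AG",
    "url": "https://example.com",
    "description": "Hypothetical GEO consultancy used for illustration.",
    "sameAs": ["https://www.linkedin.com/company/example-ag"],
}

print(f'<script type="application/ld+json">\n'
      f'{json.dumps(organization, indent=2)}\n</script>')
```

Validate the resulting markup with the Schema.org validator or Google's Rich Results Test before deploying it to the page head.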