Methodology
How AIWorthy Designs Its Research
Every finding we publish is only as credible as the questions we asked to produce it. This page explains the principles behind how we designed our research prompts, why we made the specific choices we did, and what those choices mean for the reliability and usefulness of our data.
AI visibility research faces a challenge that traditional brand monitoring does not: AI platforms are non-deterministic. Ask the same question twice and you may receive a different answer. Ask the question in a slightly different way and you may receive a fundamentally different response. This means that prompt design — the specific wording, framing, and sequencing of the questions sent to AI platforms — determines the quality of the data you collect as much as any other methodological choice.
Our 28-prompt research library was designed, tested, and refined around three objectives: accuracy of data capture, consistency across platforms, and the ability to produce findings that are genuinely useful to the companies we study and the audiences who read our research.
Platform Coverage
| Platform | Access Method | In Score |
|---|---|---|
| ChatGPT (GPT-4o) | OpenAI API | Yes |
| Claude (Sonnet) | Anthropic API | Yes |
| Perplexity Sonar | Perplexity API | Yes |
| Gemini | Google Gemini API | Yes |
| Google AI Overviews | SerpApi Developer Plan | Yes |
| Google AI Mode | SerpApi Developer Plan | Pending — browser-only, no API endpoint as of June 2026 |
Data for ChatGPT, Claude, Perplexity, and Gemini is collected via direct API endpoints. Google AI Overviews is collected via SerpApi. Google AI Mode is also listed under SerpApi, but it is currently browser-only with no programmatic API endpoint, so it is not yet included in the score; in the meantime, Google AI Overviews serves as the closest available proxy. Direct Google AI Mode access will be added to the methodology when Google makes it available.
Principle 1: Prompts Must Reflect How Real People Actually Use AI
The most common methodological error in AI visibility research is using prompts that sound like research instruments rather than genuine user queries. A prompt like "Does [Company Name] appear in AI responses?" tells you nothing about real-world visibility, because real buyers do not query AI that way.
Our prompts are written to mirror the language a professional buyer actually uses when consulting an AI platform. "I'm a mid-sized company in the Research Triangle looking for a locally-based engineering partner — which firms in the region are most respected?" is the kind of question real buyers are asking. That framing produces data about real-world visibility, not about how AI responds to research-style queries.
This distinction matters for accuracy. When prompts sound like genuine user queries, AI platforms respond with the same level of engagement and the same citation behavior they would for an actual user. When prompts sound like monitoring queries, some platforms produce hedged, structured responses that do not reflect how they characterize companies in real conversations.
What this means for our data: Our visibility findings reflect how companies appear when actual buyers consult AI — not how they appear when a monitoring tool scrapes for brand mentions. Those are measurably different things, and we have designed our methodology to capture the former.
Principle 2: Cross-Platform Consistency Requires Deliberate Framing Choices
The five AI platforms we track — ChatGPT, Claude, Perplexity, Google AI Overviews, and Gemini — have different underlying architectures. Perplexity is retrieval-based, meaning it actively searches the web when responding. ChatGPT and Claude draw primarily from training data with varying degrees of retrieval augmentation. Gemini incorporates Google's search infrastructure. These differences mean that identical prompts can produce fundamentally different types of responses on different platforms — not because visibility differs, but because the platforms are answering different versions of the question.
We designed each prompt to produce comparable response types across all five platforms. This required several specific choices:
- We removed recommendation-language framing from category prompts, because Perplexity interprets this as an instruction to search for recommendation lists rather than drawing on its own knowledge. This caused platform divergence that reflected query interpretation, not actual visibility differences.
- We removed meta-framing from industry prompts because it caused Claude and ChatGPT to respond differently from Perplexity, which treats the meta-frame as a retrieval instruction rather than a reflective question.
- We removed leading questions from sentiment prompts because this framing produces near-uniformly positive responses across all platforms, eliminating the meaningful sentiment variation that makes cross-company comparison possible.
- We restricted knowledge recency prompts to the platforms that can meaningfully answer them, Claude and ChatGPT, and excluded retrieval-based platforms, where the question produces responses that reflect current web content rather than training data vintage.
What this means for our data: When our research shows that Company A scores higher than Company B on a given platform, that finding reflects a genuine difference in how that platform characterizes the two companies — not an artifact of how a prompt was interpreted differently across platforms.
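The platform restriction described above can be pictured as a small routing rule. The following Python sketch is illustrative only: the platform identifiers, the `knowledge_recency` prompt-type label, and the `platforms_for` function are hypothetical names, not the tooling actually used to run the panel.

```python
# Illustrative routing of prompt types to platforms.
# All identifiers below are hypothetical labels for the five scored platforms.
SCORED_PLATFORMS = frozenset(
    {"chatgpt", "claude", "perplexity", "gemini", "google_ai_overviews"}
)

# Per the text, knowledge recency prompts go only to Claude and ChatGPT.
RECENCY_PLATFORMS = frozenset({"chatgpt", "claude"})


def platforms_for(prompt_type: str) -> frozenset:
    """Return the platforms a prompt of the given type should be sent to."""
    if prompt_type == "knowledge_recency":
        return RECENCY_PLATFORMS
    # Assumption: every other prompt type runs on all five scored platforms.
    return SCORED_PLATFORMS


print(sorted(platforms_for("knowledge_recency")))  # ['chatgpt', 'claude']
print(len(platforms_for("category")))              # 5
```

Keeping the restriction in one lookup, rather than scattered through individual prompts, makes it easier to see which platform comparisons a given prompt type can support.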
Principle 3: Different Question Types Capture Different Dimensions of Visibility
AI visibility is not a single thing. A company can be well-known to AI platforms without being recommended. It can be frequently mentioned without being accurately described. It can appear in direct brand queries while being absent from the category recommendation queries that actual buyers use most. Our prompt library is structured around eight distinct categories precisely because each captures a different and independent dimension of visibility.
Category 1 — Brand Recognition
Tests whether AI platforms know a company exists and can characterize it accurately. A company that fails here is AI-invisible at the most fundamental level.
Category 2 — Competitive Context
Tests how AI positions a company relative to others in its market. This is the dimension most directly relevant to buyer decision-making, because buyers consult AI to evaluate options, not just to learn about individual companies.
Category 3 — Sentiment
Tests how AI characterizes a company's reputation — not just whether it is mentioned, but whether the characterization is positive, neutral, mixed, or negative.
Category 4 — Category Leadership
Tests whether AI describes a company as a leader, as innovative, or as established in its field. This dimension captures the narrative frame AI applies to a company, which shapes buyer perception even when the underlying facts are the same.
Category 5 — Citation and Sourcing
Tests what sources AI draws on when discussing a company. Citation behavior is the most technically diagnostic dimension — it reveals whether a company has the third-party content infrastructure that AI platforms use to form and justify their characterizations.
Category 6 — AI First Impression
Tests what AI says about a company in a cold query with no added context, the most authentic representation of the AI-defined brand. This dimension also captures knowledge recency: whether AI's information is current or significantly out of date.
Category 7 — Research Products
Specialized prompts that power our named research products and deeper client deliverables, including the David vs. Goliath Index, the News Event Impact Study, and the Sentiment Divergence Index.
Category 8 — Visibility Baseline
Tests the floor of AI visibility — binary awareness and recommendation confidence. A company that scores zero here is AI-invisible in the most commercially significant sense: AI would not recommend it even when directly asked.
What this means for our data: AI visibility is multidimensional. A company can be frequently mentioned yet rarely recommended, or accurately described on one platform and confused with another elsewhere. Our eight prompt categories are designed to capture each of these dimensions. The Month 1 baseline leads with Mention Rate (how often each company surfaces when buyers ask AI about its category) because it is the most directly comparable measure across our full panel. The complete weighted AI Visibility Score, which incorporates position, sentiment, citation behavior, and accuracy alongside mention frequency, will roll out as the Index matures and each component clears the same verification standard we hold for everything we publish.
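As a rough illustration of Mention Rate, the sketch below treats it as the share of category-prompt responses that name a company at least once. This is an assumption about the calculation, not the published formula: the `mention_rate` function, the simple case-insensitive name match, and the fictional company names and responses are all invented for the example. A production version would also need to handle aliases and abbreviations.

```python
import re


def mention_rate(company: str, responses: list) -> float:
    """Share of responses that mention the company name at least once."""
    if not responses:
        return 0.0
    # Match the name as a whole phrase, ignoring case.
    pattern = re.compile(
        r"(?<!\w)" + re.escape(company) + r"(?!\w)", re.IGNORECASE
    )
    hits = sum(1 for text in responses if pattern.search(text))
    return hits / len(responses)


# Fictional responses to a category prompt (hypothetical data).
responses = [
    "Respected firms in the region include Acme Civil and Northgate Engineering.",
    "Northgate Engineering is frequently cited for infrastructure work.",
    "Several regional firms are well regarded.",
    "Acme Civil, Northgate Engineering, and others serve mid-sized clients.",
]

print(mention_rate("Acme Civil", responses))             # 0.5
print(mention_rate("Northgate Engineering", responses))  # 0.75
```

Because the measure only asks whether a company appears, it does not depend on where in the response the company is listed, which is consistent with leading on it while category-level position data is still under review.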
Where Category Prompts Work — and Where They Don't
Our category prompts ask AI which companies it recommends in a given field: "the most respected engineering firms in the Research Triangle," for example. This works well when the field maps to a category buyers actually name. It works less well when an industry label spans companies that don't compete in a single category. "B2B software," for instance, returns large enterprise platforms rather than the specialized mid-market firms a buyer would find by searching for their actual need, such as payments, compliance, or data infrastructure.
We discovered this during testing: the Engineering cohort's original descriptor returned national technology companies until we narrowed it to civil and infrastructure engineering. The same pattern affects our B2B Software and Professional Advisory cohorts, where a single category label cannot capture genuinely distinct sub-markets. For those cohorts, category-level position data is under methodological review, and we are refining the descriptors before incorporating that data into published scores. This is one reason our Month 1 baseline leads with Mention Rate, which does not depend on category-level positioning.
When We Exclude a Company From Published Results
Occasionally a company shares its name with a larger, unrelated brand, and AI platforms conflate the two. When this happens, a visibility score would measure the wrong company rather than the panel member. In our Month 1 research, one Life Sciences panel company — a specialized ophthalmology research organization — shares a name and domain with a national legal-services firm, and several AI platforms described the legal company when asked about it. Rather than publish a score we know is partly measuring a different business, we have excluded this company from Month 1 published results and noted it here. Its data will be published once the brand-disambiguation issue can be cleanly resolved in our prompt design.
Principle 4: Accuracy Scoring Requires Verifiable Facts, Not Inference
One of our five scoring dimensions is accuracy — whether what AI says about a company is factually correct. This is more difficult to score rigorously than the other dimensions, and we are specific about how we do it.
Our accuracy scoring uses a dedicated supplementary prompt (S01) that instructs AI platforms to state specific, verifiable facts about a company: founding year, primary services, headquarters location, and approximate employee count. These responses are then checked against the company's own website and public records. We score accuracy at three levels: Accurate, Partially Accurate, and Inaccurate.
We are explicit that accuracy scoring is researcher-evaluated, not algorithmically determined. A human reviewer compares AI-stated facts against verifiable sources. We consider this more reliable than automated fact-checking, which introduces its own error rates, and it acknowledges openly that accuracy assessment requires judgment.
For the public monthly report, accuracy is scored by sampling — not every company is comprehensively fact-checked every month. For paid client engagements, accuracy scoring is comprehensive. This distinction is disclosed in our methodology note on every report.
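To make the three-level scale concrete, the sketch below records a reviewer's verdict on each of the four S01 facts and rolls them up into a single rating. The roll-up rule is an assumption made for illustration (all facts correct is Accurate, none correct is Inaccurate, anything in between is Partially Accurate); the actual rubric is not spelled out here, and the field names and `roll_up` function are hypothetical. The verdicts themselves still come from a human reviewer, not from code.

```python
from enum import Enum

# The four facts requested by the S01 prompt (field names are hypothetical).
FACT_FIELDS = ("founding_year", "primary_services", "headquarters", "employee_count")


class Accuracy(Enum):
    ACCURATE = "Accurate"
    PARTIALLY_ACCURATE = "Partially Accurate"
    INACCURATE = "Inaccurate"


def roll_up(verdicts: dict) -> Accuracy:
    """Combine per-fact reviewer verdicts (True = correct) into one rating."""
    missing = [field for field in FACT_FIELDS if field not in verdicts]
    if missing:
        raise ValueError(f"reviewer verdict missing for: {missing}")
    correct = sum(bool(verdicts[field]) for field in FACT_FIELDS)
    # Assumed rule: all correct, none correct, or somewhere in between.
    if correct == len(FACT_FIELDS):
        return Accuracy.ACCURATE
    if correct == 0:
        return Accuracy.INACCURATE
    return Accuracy.PARTIALLY_ACCURATE


reviewed = {
    "founding_year": True,
    "primary_services": True,
    "headquarters": False,
    "employee_count": True,
}
print(roll_up(reviewed).value)  # Partially Accurate
```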
Principle 5: Citation Capture Requires Platform-Specific Methodology
Citation behavior varies fundamentally across AI platforms. Perplexity and Google AI Overviews return explicit URLs when they cite sources. Claude and ChatGPT describe the types of sources they draw on without providing URLs. Gemini's citation behavior varies by query type. Treating these as equivalent would produce misleading citation data.
Our citation methodology accounts for this with two different approaches. For retrieval-based platforms (Perplexity and Google AI Overviews, the latter accessed via SerpApi), we run a supplementary citation maximizer prompt (S02) designed to elicit as many source URLs as possible. For training-data-primary platforms (Claude and ChatGPT), we score citation behavior as a binary (did the platform reference a named source type, yes or no) rather than attempting to capture URLs that do not exist in those responses.
When we say a company has strong citation coverage, we mean it is explicitly cited by Perplexity and Google AI Overviews with traceable URLs — which is the most commercially significant form of citation, because it is the form that drives referral traffic.
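The two approaches can be sketched as a single function that branches on platform type. This is a simplified illustration under several assumptions: it reads URLs out of plain response text, the list of source-type phrases is invented, and all function and platform identifiers are hypothetical. Gemini is deliberately left out because the text does not say which approach applies to it.

```python
import re
from urllib.parse import urlparse

# Groupings follow the text; the identifiers themselves are hypothetical.
RETRIEVAL_BASED = {"perplexity", "google_ai_overviews"}
TRAINING_DATA_PRIMARY = {"claude", "chatgpt"}

# Hypothetical phrases standing in for "named source types".
SOURCE_TYPE_TERMS = (
    "press release", "news article", "company website",
    "industry report", "trade publication",
)

URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def cited_domains(response_text: str) -> set:
    """Collect the distinct domains of URLs that appear in a response."""
    domains = set()
    for url in URL_PATTERN.findall(response_text):
        # Drop trailing sentence punctuation before parsing.
        host = urlparse(url.rstrip(".,;")).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.add(host)
    return domains


def names_source_type(response_text: str) -> bool:
    """True if the response mentions any of the assumed source-type phrases."""
    lowered = response_text.lower()
    return any(term in lowered for term in SOURCE_TYPE_TERMS)


def citation_result(platform: str, response_text: str) -> dict:
    """Score citations by URL list or by yes/no, depending on platform type."""
    if platform in RETRIEVAL_BASED:
        return {"mode": "urls", "domains": sorted(cited_domains(response_text))}
    if platform in TRAINING_DATA_PRIMARY:
        return {"mode": "binary", "cited": names_source_type(response_text)}
    raise ValueError(f"no citation rule sketched for platform: {platform}")


text = "See https://www.example.com/report, and (https://news.example.org/story)."
print(citation_result("perplexity", text))
# {'mode': 'urls', 'domains': ['example.com', 'news.example.org']}
```

Returning a different result shape for each platform type keeps the two kinds of citation data from being averaged together as if they were equivalent.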
What Our Methodology Deliberately Does Not Do
Several common approaches in AI brand monitoring were considered and rejected for specific methodological reasons.
- We do not run single prompts and report results as findings. A single prompt response can be an outlier. Our eight-category structure across five platforms produces a minimum of 40 data points per company per monthly run, which smooths individual response variance.
- We do not rely on a single platform as a proxy for AI visibility generally. The five platforms we track disagree significantly on which companies they recommend, how they characterize them, and what sources they cite. Single-platform data systematically misrepresents the actual AI visibility landscape.
- We do not include companies that have been in major news within the prior 30 days in that month's scored run. News events create temporary AI visibility spikes that are not representative of underlying performance and would distort longitudinal trend data.
- We do not accept payment for panel inclusion, and panel companies cannot purchase higher scores or more favorable characterizations. Our editorial independence is the source of our research credibility, and we protect it structurally.
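Two of the rules above are simple enough to express directly: the minimum data-point count and the 30-day news exclusion. The sketch below is illustrative only. It assumes the window is inclusive (a news event exactly 30 days before the run still excludes the company), and it takes the date of the last major news event as an input, since deciding what counts as major news is a judgment the code does not make. The constant and function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

PROMPT_CATEGORIES = 8
SCORED_PLATFORMS = 5
NEWS_EXCLUSION_DAYS = 30  # window stated in the text

# Eight categories across five platforms: at least 40 data points per company.
MIN_DATA_POINTS_PER_COMPANY = PROMPT_CATEGORIES * SCORED_PLATFORMS


def is_scored_this_run(run_date: date, last_major_news: Optional[date]) -> bool:
    """Return True if the company belongs in this month's scored run."""
    if last_major_news is None:
        return True
    # Assumption: an event exactly 30 days old still falls inside the window.
    return (run_date - last_major_news) > timedelta(days=NEWS_EXCLUSION_DAYS)


run = date(2026, 7, 31)
print(MIN_DATA_POINTS_PER_COMPANY)                # 40
print(is_scored_this_run(run, date(2026, 7, 10)))  # False
print(is_scored_this_run(run, date(2026, 6, 30)))  # True
```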
The Standard We Hold Ourselves To
Every methodological choice we make is documented. Every prompt is version-controlled. Every scoring decision has a defined rubric. When we change our methodology — to add a platform, refine a prompt, or adjust a scoring weight — we disclose the change and its rationale in the methodology note of the report where the change takes effect.
We hold ourselves to this standard because AI visibility research is genuinely new territory. There is no established industry standard for how to measure it, what to count, or how to score it. In the absence of that standard, the quality of methodology is the only basis on which readers, journalists, and clients can assess whether to trust our findings. We believe our approach is the most rigorous currently available for mid-market company research, and we are committed to being specific about both what it measures and what it does not.
Questions and Corrections
We welcome methodological scrutiny. If you have questions about our research design, believe a finding contains a factual error, or want to understand how a specific score was calculated, contact us at hharreld@aiworthy.ai.
Companies in our research panel may submit factual corrections for consideration. They may not request removal from the panel or changes to scores.
Methodology Change Log
| Version | Date | Changes |
|---|---|---|
| v1.2 | July 2026 | Month 1 baseline established. Mention Rate adopted as the lead published metric; the full weighted AI Visibility Score is sequenced to a later release as each component completes verification. B2B Software and Professional Advisory category-prompt descriptors placed under methodological review (category responses returned companies outside the intended competitive set). One Life Sciences company excluded from published results for brand-disambiguation. Panel: 48 companies tracked, 47 published this month. |
| v1.1 | June 1, 2026 | Pilot test refinements: Engineering cohort industry descriptor revised; Avg Position Score formula updated to partial credit; Google AI Mode confirmed browser-only; Google AI Overviews no-panel behavior documented for company-specific prompts |
| v1.0 | May 2026 | Initial published methodology |
Version 1.2 · July 2026