The Buyer Answer Gap Index

What the study measures, where the questions come from, and how answers are scored, written so the findings can be checked and reproduced.

Index methodology · v0.2 · Question set frozen before any site was run · Coverage / Depth grading

B2B buyers now do most of their evaluation before they ever talk to you: on your site and, increasingly, by asking an AI. The Buyer Answer Gap is the distance between the questions those buyers ask and the answers they can actually get. This note defines exactly what we test and how we grade it. The question set was fixed before any site was evaluated, and was not changed in response to results.

What this measures, and what it doesn't

We test one thing: whether a serious B2B buyer can get a good answer to the questions they ask while evaluating a vendor. We check in two independent places:

On the vendor's own site. Is the answer there, where a buyer would look, and does it actually answer (explain how), or just claim an outcome?
From AI. When a buyer asks an AI assistant the same question, do they get a real answer grounded in the vendor's own content, or a generic one stitched from third-party and competitor sources the vendor doesn't control?

This is a measure of answer availability and quality, not product quality. A vendor can have an excellent product and score poorly here because the answers aren't reachable. The reverse is also possible.

It is explicitly not: a security audit or a substitute for a SOC 2 / CAIQ review; a design, speed, or UX assessment; or a judgment of whether a vendor is "good." Scope is limited to B2B SaaS.

The question set

Buyers evaluate across two kinds of question. We grade them separately on purpose, because they fail for different reasons.

The questions below are shown exactly as the grader asks them, in the plain language a buyer actually uses, not a formal checklist. The grader fills in the specifics for each vendor (a real competitor's name, the tools the buyer runs, the problem they're solving), so the shape is fixed but the wording a buyer sees is concrete.
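A minimal sketch of the "fixed shape, concrete wording" idea, assuming questions are stored as templates with named placeholders. The placeholder name `competitor`, the helper `render_question`, and the vendor name `ExampleCRM` are hypothetical and used only for illustration; the grader's actual storage format is not described here.

```python
from string import Template

# The question shape is fixed; only the placeholder is filled per vendor.
# The placeholder name "competitor" is an assumption for illustration.
DIFFERENTIATION = Template(
    "We're also looking at $competitor, why you over them, honestly, "
    "and where are they actually better?"
)


def render_question(template: Template, **specifics: str) -> str:
    """Fill a fixed question shape with vendor-specific details.

    Template.substitute() raises KeyError when a placeholder is left
    unfilled, so a half-rendered question cannot slip through.
    """
    return template.substitute(**specifics)


# "ExampleCRM" is a made-up competitor name.
question = render_question(DIFFERENTIATION, competitor="ExampleCRM")
print(question)
```

Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch: a question with a missing specific fails loudly instead of being asked with a literal `$competitor` in it.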

Value questions: do we keep looking at you at all?

The questions that decide whether a buyer stays interested. There is no industry-standard list for these, so we use Gartner's six B2B "buying jobs" as the spine.

Category	Representative question (in a buyer's words)	Frame
Problems solved	How do you actually solve our specific problem, what happens, specifically, not just the claim that you do?	Gartner: problem ID
Benefits & outcomes	What's the single biggest outcome customers point to, and how long did it take them to get there?	Gartner: selection
How it works	Walk me through how this works day to day for the person using it, what do they do that they didn't before?	Gartner: requirements
ROI / business case	What would we measure to know this is working, and what do customers typically see move?	Gartner: validation

The most personal versions of the ROI and outcomes questions (the payback for a company your size, what changes for your team) can't be answered by any static page, so we don't grade them. They are reported as the Personalization finding (§4) instead.

Due-diligence questions: the shortlist and procurement checks

The vetting questions once you're on the shortlist. Each is anchored to a published standard (the Cloud Security Alliance's CAIQ, the Vendor Security Alliance's VSA, the Shared Assessments SIG) or to common RFP practice.

Category	Representative question (in a buyer's words)	Source
Security	Are you enterprise-security serious, SOC 2, the basics, or is security going to be a problem when I bring you to my team?	CAIQ / VSA
Data & privacy	Where does our data actually live, and is that going to be a problem for us (EU, regulated industry)?	CAIQ / VSA
Integrations	Will you actually work with the tools we already run, or is connecting you going to be a project?	RFP practice
Implementation	Is this going to be a heavy lift to get running, or can my team get value without a big project?	RFP practice
Support	If something breaks or my team gets stuck, can we get real help, or are we on our own?	RFP practice
Scale	Has this actually worked for companies like us, or would we be the ones figuring it out?	RFP practice
Differentiation	We're also looking at a competitor, why you over them, honestly, and where are they actually better?	Gartner: validation
Searches buyers run	For the real "X vs Y" and "X reviews" searches buyers run about you, is your own answer anywhere they'll find it?	Search-intent
Vendor viability	Who's actually using you, and are you established enough that this isn't a risky bet?	Shared Assessments SIG

The full library holds roughly 90 questions across these categories; the snapshot samples one representative question per category. That coverage asymmetry is itself a finding, not a flaw: the diligence questions each sit behind a published standard, while the value questions have none, and the questions with no standard are the ones vendors most often can't answer.

How the questions were chosen

To remove hand-picking, each category's representative question follows one stated rule:

It is the question a typical B2B buyer asks first in that category: the obvious, first-order question, not the most obscure or the most damaging. It was fixed before any site was run.

No question was added, removed, or reworded in response to any individual site's results, and none was selected for being one that sites tend to fail. Value questions are phrased to require the mechanism or specifics rather than a yes/no, because a buyer needs to know how, not just whether.

How answers are graded

The site and AI checks are scored separately and never blended: a vendor can answer well on its own site but be invisible to AI, or the reverse. We report one grade per axis, Coverage, and then a set of findings that show whether those answers are any good, whose they are, and which deciding questions no page can answer.

Matching by meaning, not keywords

We don't keyword-match. The grader reads your page text and a language model judges, by meaning, whether that content answers the question. So a conversational question like "is this a heavy lift to get running?" is matched against whatever you say about implementation, onboarding, and time-to-value, even if none of those exact words appear. The same judgment also decides whether an answer explains how it works or only claims an outcome, which drives the findings below.
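The judgment described above can be sketched as a three-way verdict per question. This is an illustrative outline, not the grader's implementation: the `Verdict` names, the `Judge` callable (standing in for the language-model call), and the rule of keeping the strongest verdict found on any page are all assumptions.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    MISSING = "missing"   # no content answers the question
    CLAIM = "claim"       # asserts an outcome without the mechanism
    EXPLAIN = "explain"   # explains how it works


# A judge takes (question, page_text) and returns a Verdict. In practice this
# would wrap a language-model call; here it is any callable (assumption).
Judge = Callable[[str, str], Verdict]

_RANK = {Verdict.MISSING: 0, Verdict.CLAIM: 1, Verdict.EXPLAIN: 2}


def judge_question(question: str, pages: dict[str, str], judge: Judge) -> Verdict:
    """Return the strongest verdict any single page earns for the question.

    Taking the best page is an assumption of this sketch, not a documented rule.
    """
    best = Verdict.MISSING
    for text in pages.values():
        verdict = judge(question, text)
        if _RANK[verdict] > _RANK[best]:
            best = verdict
    return best
```

Keeping the judge as a plain callable means the same loop can be exercised with a deterministic stub in tests and with a model-backed judge in a real run.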

The grade: Coverage

Coverage asks the simplest question: can a buyer get an answer at all? A real answer or a claim both count; only a missing answer scores zero. It's rolled up to a percentage and banded:

A ≥ 85 · B ≥ 70 · C ≥ 50 · D ≥ 30 · F < 30

We grade Coverage and nothing else, for a simple reason: it's the one measure that actually separates sites. The things that matter just as much (whether an answer explains anything, and whether AI's answer is even yours) come back low for almost everyone, so a letter there would tell you little. Those we report as findings instead.
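The Coverage roll-up and bands can be sketched as follows. The band thresholds come from the scale above; weighting every question equally is an assumption of this sketch, and the function names are illustrative.

```python
def coverage_percent(answered: list[bool]) -> float:
    """Percentage of questions with any answer.

    True means a real answer or a claim was found; False means missing.
    Equal weight per question is an assumption.
    """
    if not answered:
        raise ValueError("need at least one graded question")
    return 100.0 * sum(answered) / len(answered)


def coverage_band(percent: float) -> str:
    """Map a Coverage percentage to its letter band."""
    for letter, floor in (("A", 85), ("B", 70), ("C", 50), ("D", 30)):
        if percent >= floor:
            return letter
    return "F"


# Illustrative input: 10 of 13 questions answered.
pct = coverage_percent([True] * 10 + [False] * 3)
print(round(pct, 1), coverage_band(pct))
```

With thirteen questions, each one moves the percentage by about 7.7 points, which is one way to see why a snapshot can shift a band from run to run.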

The findings: is the answer any good, and is it yours?

Claim vs. explain (your site). Of the answers a buyer can find, how many explain how the product works, versus just claim an outcome ("we lower costs," "we're more secure")? Explaining the mechanism is the bar, and the single most important quality signal. Almost every site claims more than it explains, and unlike the deciding questions below, this one is fixable.
Whose answer, and how good (AI). When a buyer asks AI, two separate things matter, and we report both. Grounding: does the answer come from your own pages, or is it stitched from third parties and competitors you don't control? Depth: is it a real, specific answer, or a generic gist?
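As a rough illustration of the claim-vs-explain finding, the share of found answers that explain the mechanism could be computed like this. The verdict strings and the function name are assumptions; the source does not specify how the finding is expressed numerically.

```python
from typing import Optional


def explain_share(verdicts: list[str]) -> Optional[float]:
    """Of the answers a buyer can find, the fraction that explain how.

    Each verdict is "explain", "claim", or "missing" (labels assumed here).
    Missing answers are left out of the denominator, because the finding
    is about answers that exist. Returns None when nothing was found.
    """
    found = [v for v in verdicts if v != "missing"]
    if not found:
        return None
    return sum(v == "explain" for v in found) / len(found)


# Illustrative input, not study data: one explanation, two claims, one gap.
share = explain_share(["explain", "claim", "claim", "missing"])
```

Excluding missing answers keeps this finding separate from Coverage, which already accounts for them.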

Personalization: a finding, not a grade. Some questions can only be answered with the individual buyer's numbers and context: the payback for a company their size, what changes for their team, their implementation path. No static page can answer these for each buyer, and AI can't either without that context. We do not grade these; penalizing a website for not personalizing would be penalizing it for being a website. Instead we report them as a finding: the deciding questions a page structurally cannot answer. Where a site offers an interactive tool (an ROI calculator, for example), we credit it inside this finding and note its limits: one fixed question, no follow-up, and invisible to AI.

How we read the site

We fetch your pages and read the rendered content (what a browser actually shows, not the raw markup), so a modern JavaScript-built site is read for what's really on it, not mistaken for empty. We read your top-level pages and follow links one level deeper into the sections buyers check most (customers, security, integrations, docs), so the grade reflects your whole site, not just the homepage. If a page is genuinely blocked by bot protection, that category is marked "couldn't assess"; if the crawler reaches only a few pages, the site is marked "couldn't fully assess" and we hold back a grade, because a grade built on so few pages would swing from run to run.
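The reading strategy above can be sketched over an in-memory link map. This is a simplified illustration: recognizing buyer sections by a substring in the URL, the function names, and the `min_pages` threshold are all assumptions, since the note does not state how sections are detected or how few pages counts as "only a few".

```python
# Section names taken from the text; matching them by substring is an assumption.
BUYER_SECTIONS = ("customers", "security", "integrations", "docs")


def pages_to_read(top_level: list[str], links: dict[str, list[str]]) -> list[str]:
    """Top-level pages plus buyer-section pages linked one level deeper."""
    ordered = list(dict.fromkeys(top_level))  # de-duplicate, keep order
    for parent in top_level:
        for child in links.get(parent, []):
            is_buyer_section = any(s in child.lower() for s in BUYER_SECTIONS)
            if is_buyer_section and child not in ordered:
                ordered.append(child)
    return ordered


def site_status(pages_reached: int, min_pages: int) -> str:
    """Hold back a grade when the crawl is too thin (threshold is hypothetical)."""
    return "graded" if pages_reached >= min_pages else "couldn't fully assess"


# Illustrative link map: "/blog" is skipped, and links found on "/security"
# are not followed because that page is already one level deep.
links = {
    "/": ["/security", "/blog", "/customers"],
    "/product": ["/integrations", "/security"],
    "/security": ["/security/docs/deep"],
}
reading_list = pages_to_read(["/", "/product"], links)
```

Fetching and rendering the pages themselves is out of scope for this sketch; it only shows which pages would be selected and when a grade would be withheld.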

The free grader and the full study read your site the same way; they differ in how many questions they ask. The instant grader samples one question per category, twelve in all. It leaves out the "searches buyers run" question, because that one needs a real competitor name and search query researched for each vendor. The full study includes it, for thirteen graded categories, and the deep-dive library holds ~90 questions in total.
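The category counts can be checked directly from the two tables above. A small sketch, with the list names chosen here for illustration:

```python
# Category names copied from the value and due-diligence tables.
VALUE = [
    "Problems solved",
    "Benefits & outcomes",
    "How it works",
    "ROI / business case",
]
DILIGENCE = [
    "Security",
    "Data & privacy",
    "Integrations",
    "Implementation",
    "Support",
    "Scale",
    "Differentiation",
    "Searches buyers run",
    "Vendor viability",
]

# The full study grades every category; the instant grader drops the one
# that needs a researched competitor name and search query.
FULL_STUDY = VALUE + DILIGENCE
INSTANT_GRADER = [c for c in FULL_STUDY if c != "Searches buyers run"]

print(len(FULL_STUDY), len(INSTANT_GRADER))
```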

Sample and limits

Sample. 130 US B2B SaaS vendors, all in the HubSpot ecosystem, graded on the 13 frozen questions across the two axes. 108 graded cleanly; sites blocked by bot protection or reached too thinly were held back rather than failed. Mid-market and enterprise are reported as two labeled cohorts, never blended. Run on June 26, 2026.
Snapshot, not audit. The published grade samples one question per category; the full evaluation asks ~90. A snapshot can shift a band run to run.
Crawl limits. Sites that block automated crawlers receive "couldn't assess" on the site axis, and sites the crawler reaches only thinly receive "couldn't fully assess." In both cases the AI axis still computes; these sites are excluded from the aggregate on the site axis.
AI variance. AI answers can vary between runs, so AI-side grades are labeled snapshots.
Generalizability. Findings apply to B2B SaaS and should not be read as describing software buying generally.
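The exclusion and cohort rules above can be sketched as a per-cohort aggregate. The record layout, the field names, and the use of a simple mean are assumptions of this sketch; the values in the example are made up and are not study results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteResult:
    vendor: str
    cohort: str                     # "mid-market" or "enterprise"
    site_coverage: Optional[float]  # None when the site could not be assessed
    ai_coverage: Optional[float]    # the AI axis still computes for blocked sites


def mean_by_cohort(results: list[SiteResult], axis: str) -> dict[str, float]:
    """Average one axis per cohort, skipping sites with no score on that axis.

    Cohorts are never blended, and an unassessed site is excluded rather
    than counted as zero. Using the mean as the aggregate is an assumption.
    """
    buckets: dict[str, list[float]] = {}
    for result in results:
        score = getattr(result, axis)
        if score is None:
            continue
        buckets.setdefault(result.cohort, []).append(score)
    return {cohort: sum(scores) / len(scores) for cohort, scores in buckets.items()}


# Made-up example: vendor "c" was blocked on the site axis only.
results = [
    SiteResult("a", "mid-market", 80.0, 40.0),
    SiteResult("b", "mid-market", 60.0, 50.0),
    SiteResult("c", "mid-market", None, 30.0),
    SiteResult("d", "enterprise", 90.0, 70.0),
]
site_means = mean_by_cohort(results, "site_coverage")
ai_means = mean_by_cohort(results, "ai_coverage")
```

In this example the blocked vendor drops out of the site-axis mean but still contributes to the AI-axis mean, which matches the crawl-limits rule.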

Conflict of interest

SlateCX builds a product premised on the existence of this gap, so we have an interest in the gap being real. We state that plainly, because it's the reason the controls above exist. To keep this honest:

the question set and its sources are frozen before any run;
each question is derived by a stated rule, not selected to taste;
no headline number is chosen in advance;
the grader is free and the method is reproducible: anyone can re-run any site and check our work.

If this becomes a recurring index, the question set and grading will be held constant between editions; any change will be versioned and disclosed here, so movement in the number reflects the market, not the method.