We just switched GEO tools at Flywheel. That decision sent me down a rabbit hole on a question more marketers are asking me lately: can you actually measure your visibility inside ChatGPT, Gemini, and Claude the way you measure rankings in Google?
Short answer: yes, but not the way you measure SEO. LLM visibility is real and trackable. The data is just thinner, the framework is different, and it’s easy to fool yourself into reporting wins that aren’t there.
Here’s how I think about measuring GEO, where it breaks from SEO, and the one number that tells you whether any of it is working.
Key Takeaways
- GEO and SEO are measured in opposite directions. SEO is top-down, where tools see almost the whole world of search. GEO is bottom-up, where tools infer a small, growing slice of it.
- You can only see a fraction of where you’re cited in AI, and that probably won’t change. AI platforms deliberately keep their query data private, so you track a subset and infer the rest.
- Probabilistic doesn’t mean unmeasurable. Most tools run a set of prompts daily and report how often you appear. That’s a valid signal, just a noisier one than a Google ranking.
- Your tool choice matters more than it did in SEO. The crawlers are young and uneven, and the most mature tool tends to win. That’s why we moved from engines.so to Peec.
- Visibility is a leading indicator, not the goal. The real scoreboard is LLM-driven traffic and conversions. Everything else is a proxy for getting there.
Can You Even Measure GEO?
There’s a loud version of this debate, and SparkToro’s Rand Fishkin is at the center of it. In research published in early 2026 with Gumshoe.ai, he ran close to 3,000 prompts across ChatGPT, Claude, and Google’s AI and found that the same question rarely returns the same list of brands twice. The takeaway a lot of people ran with was simple: AI visibility is random, so don’t bother tracking it.
I think that overstates it. Rand gets pretty emotional about this stuff, and the conclusion he points people toward is too pessimistic.
Probability is still something you can measure. If a result only shows up some of the time, you measure how often it shows up. Most GEO tools already do exactly this. They run a fixed set of prompts every day and report the percentage of the time your brand appears.
That’s not far off from how Ahrefs estimates search volume. You sample, you repeat, you infer. The randomness is something to account for, not a wall.
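To make "measure how often it shows up" concrete, here is a minimal Python sketch. It is not how Peec or any other tool actually computes its numbers. The `PromptRun` shape and the `appearance_rate` helper are names I made up for illustration, and the Wilson interval is my own addition to show how much noise a small sample carries. It also treats every run as an independent sample, which is a simplification.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class PromptRun:
    """One execution of a tracked prompt (hypothetical shape, not a tool's export format)."""
    prompt: str          # the tracked prompt text
    brands: list[str]    # brands named in the model's answer for this run


def appearance_rate(runs: list[PromptRun], brand: str) -> tuple[float, float, float]:
    """Return (rate, low, high): share of runs naming `brand`, with a ~95% Wilson interval."""
    n = len(runs)
    if n == 0:
        raise ValueError("need at least one run")
    target = brand.lower()
    hits = sum(1 for run in runs if target in (b.lower() for b in run.brands))
    p = hits / n
    z = 1.96  # z-score for roughly 95% confidence
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, center - margin), min(1.0, center + margin)


if __name__ == "__main__":
    # Made-up data: 30 daily runs of one prompt, brand named in 12 of them.
    runs = [PromptRun("best crm for startups", ["Acme"] if i < 12 else ["Other"]) for i in range(30)]
    rate, low, high = appearance_rate(runs, "Acme")
    print(f"{rate:.0%} ({low:.0%} to {high:.0%})")  # 40% (25% to 58%)
```

With only 30 runs, a 40% appearance rate is consistent with anything from roughly 25% to 58%. That is the practical meaning of "noisier than a Google ranking": more runs narrow the band, but a single week of data rarely supports a precise claim.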
So the honest position sits between the hype and the doom. You can measure GEO. You just can’t measure it like SEO, and pretending otherwise is how you end up with garbage reports.
Top-Down vs Bottom-Up Measurement
The cleanest way I’ve found to explain the difference is direction. SEO measurement works from the top down. GEO measurement works from the bottom up. That single distinction explains almost every other gap between the two.
How SEO Measurement Works
In SEO, you start with a near-complete picture of the entire world of search.
Google Search Console shows you everywhere your brand already appears, including queries you barely rank for. Tools like Ahrefs sit on top of close to the full universe of things people search for.
Because search is long-tail and the crawlers are mature, those tools estimate volume with real confidence. Ahrefs has been estimating search volume independently from Search Console for years, built on a huge pile of data and close keyword variants.
So you work from the total set and draw conclusions down from it. You know roughly what’s searched, how often, and where you stand. This is also the foundation for understanding how GEO and SEO differ as disciplines, not just as measurement problems.
How GEO Measurement Works
GEO flips that. You start with almost nothing and build up.
There’s no Search Console for AI. The platforms don’t hand over how people are prompting them, and that’s by design. The GEO tools’ crawlers are newer and less mature, so the tools infer demand instead of reporting it.
Take volume. Peec gives you an estimate of prompt volume, but it’s openly beta. You get a band like “very low” to “very high,” not “9,200 searches a month.” That’s the honest state of the data right now.
On top of that, people use AI completely differently than search. They type much longer queries. Those queries live inside conversations, with context you never get to see. There’s a whole world of prompts where you’d want to be cited, and you can only ever track a small subset of it.
Why You Only See a Small Slice of GEO
This is the part most people underestimate. In SEO you can see most of the field. In GEO you’re looking through a keyhole, and the room on the other side is enormous.
That smaller window changes how you work. You can’t measure everything, so you have to make smart bets about what’s worth measuring in the first place.
It also means your reporting needs humility built in. A 10-point jump in visibility might be real, or it might just mean you happened to track the slice of prompts where you were always going to look good.
I don’t think this keyhole problem fully goes away, even as the tools improve. The data scarcity is structural, not a temporary gap. The skill isn’t waiting for perfect data. It’s getting good at working with partial data.
What to Get Right When You Measure GEO
Because you’re working with a small, noisy window, the decisions around your measurement matter more than the dashboard itself. Three choices do most of the damage, or most of the good.
Decide Which Questions Matter
You can’t track every prompt, so you have to choose. That means making a real call on which questions matter to your business and inferring which ones will have meaningful volume.
This is judgment work, not a setting you toggle on. Get the prompt set wrong and every number downstream is measuring the wrong thing.
Choose the Most Mature Tool
In SEO, tool choice was mostly preference. In GEO, it’s a genuine fork in the road, because the crawlers are young and some collect far more reliable data than others.
What I’m seeing generally is that the most mature tool wins. We recently moved from engines.so to Peec because it had a more mature measurement framework and a more reliable, better-built crawler. When the underlying data is shaky, the quality of the tool’s collection method is the whole ballgame.
Draw Conclusions Carefully
This is where good teams still trip. It’s very easy to look at a chart and announce “our visibility is improving here,” when all you’ve really done is watch a favorable corner of the world.
If you’re tracking the wrong slice, you can’t draw a valid conclusion from it. Before you report a win, ask whether the improvement reflects the broader picture or just the spot you happened to point your tools at. That same discipline carries over to the work of actually getting cited in ChatGPT and Perplexity, where it’s tempting to celebrate movement that doesn’t mean much.
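One rough way to run that check is to split your tracked prompts into segments and see where the lift actually came from. The sketch below assumes you can tag each run with a segment label (topic, funnel stage, whatever fits your prompt set). The `rate_by_segment` and `lift_by_segment` helpers and the sample data are hypothetical, not something a GEO tool gives you out of the box.

```python
from collections import defaultdict

# Each run is (segment, appeared): the segment label is one you assign yourself.
Run = tuple[str, bool]


def rate_by_segment(runs: list[Run]) -> dict[str, float]:
    """Appearance rate per segment for one reporting period."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for segment, appeared in runs:
        totals[segment] += 1
        hits[segment] += int(appeared)
    return {segment: hits[segment] / totals[segment] for segment in totals}


def lift_by_segment(before: list[Run], after: list[Run]) -> dict[str, float]:
    """Change in appearance rate per segment, for segments present in both periods."""
    rates_before = rate_by_segment(before)
    rates_after = rate_by_segment(after)
    return {s: rates_after[s] - rates_before[s] for s in rates_after if s in rates_before}


if __name__ == "__main__":
    # Made-up data: overall visibility goes from 3/8 to 5/8, a 25-point jump.
    before = [("pricing", True), ("pricing", True), ("pricing", False), ("pricing", False),
              ("comparisons", True), ("comparisons", False), ("comparisons", False), ("comparisons", False)]
    after = [("pricing", True), ("pricing", True), ("pricing", True), ("pricing", True),
             ("comparisons", True), ("comparisons", False), ("comparisons", False), ("comparisons", False)]
    print(lift_by_segment(before, after))  # {'pricing': 0.5, 'comparisons': 0.0}
```

In this made-up example the headline number improved by 25 points, but all of it came from one segment. That is a narrower story than "our visibility is improving," and it only covers the prompts you chose to track in the first place.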
Visibility Is a Leading Indicator, Not the Goal
Visibility is not the scoreboard. It’s a leading indicator of the scoreboard.
Think about why you ever did the keyword research and the volume estimates in SEO. It was never about the rankings themselves. It was about more organic traffic and more conversions.
GEO is the same. Ultimately you want more LLM-driven traffic and more conversions from it. That’s measurable, it’s objective, and it reflects your actual performance inside these models.
Everything in the visibility layer, such as appearance rate, share of voice, and prompt coverage, is a subset of that goal or a step toward it. Useful, but not the point. If your visibility is climbing and your LLM traffic and conversions aren’t, your measurement is telling you a story that doesn’t end in revenue.
That’s the lens I’d bring to any GEO report. Track the visibility signals so you can learn and react quickly, but judge the program on traffic and conversions.
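If you want a starting point for the traffic side, here is a minimal sketch that tags sessions by referrer and counts conversions per AI source. The hostname list is my assumption and is not exhaustive, the `(referrer, converted)` session shape is a hypothetical export format, and plenty of AI-driven visits arrive with no referrer at all, so treat the result as a floor rather than a full count.

```python
from typing import Iterable, Optional
from urllib.parse import urlparse

# Assumed referrer hostnames for AI assistants; extend or correct for your own data.
LLM_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "gemini.google.com": "Gemini",
    "claude.ai": "Claude",
    "perplexity.ai": "Perplexity",
    "www.perplexity.ai": "Perplexity",
}


def llm_source(referrer: str) -> Optional[str]:
    """Map a referrer URL to an AI source name, or None if it is not a known AI referrer."""
    host = (urlparse(referrer).hostname or "").lower()
    return LLM_REFERRER_HOSTS.get(host)


def summarize(sessions: Iterable[tuple[str, bool]]) -> dict[str, tuple[int, int]]:
    """Return {source: (sessions, conversions)} for sessions with a known AI referrer."""
    summary: dict[str, tuple[int, int]] = {}
    for referrer, converted in sessions:
        source = llm_source(referrer)
        if source is None:
            continue  # not attributable to an AI assistant by referrer alone
        count, conversions = summary.get(source, (0, 0))
        summary[source] = (count + 1, conversions + int(converted))
    return summary


if __name__ == "__main__":
    # Made-up sessions: (referrer URL, whether the session converted).
    sessions = [
        ("https://chatgpt.com/", True),
        ("https://chatgpt.com/", False),
        ("https://www.perplexity.ai/search", True),
        ("https://www.google.com/", True),
        ("", False),
    ]
    print(summarize(sessions))  # {'ChatGPT': (2, 1), 'Perplexity': (1, 1)}
```

Put those counts next to your appearance rate over the same period. If visibility climbs while these numbers stay flat, that is the mismatch worth investigating.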
Frequently Asked Questions
A few questions I get asked a lot about measuring GEO and LLM visibility.
What is LLM visibility?
LLM visibility is how often and how prominently your brand appears in answers from AI tools like ChatGPT, Gemini, Perplexity, and Claude. It’s the GEO equivalent of search rankings, measured as an appearance rate across a set of prompts rather than a fixed position.
Can you actually measure GEO performance?
Yes. Tools run a set of prompts repeatedly and report how often your brand shows up, which turns AI’s randomness into a trackable percentage. The data is less complete than SEO data, so treat it as a strong signal rather than a precise count.
How is measuring GEO different from measuring SEO?
SEO is top-down, where tools like Google Search Console and Ahrefs see almost the entire world of search and estimate volume with confidence. GEO is bottom-up, where there’s no equivalent data source, so tools infer a small slice of prompt demand and build up from there.
Why can’t GEO tools give exact search volumes?
AI platforms don’t share how people prompt them, and prompts are longer and buried inside conversations. So tools estimate demand in broad bands, like low to high, rather than precise monthly numbers.
What is the best GEO tracking tool?
The most mature tool with the most reliable crawler tends to give the most trustworthy data, which is why we use Peec at Flywheel. The right pick depends on your needs, but prioritize collection quality over dashboard features when the underlying data is this young.
What metric should I actually report for GEO?
Track visibility signals like appearance rate and share of voice to learn and adjust, but judge success on LLM-driven traffic and conversions. Visibility is the leading indicator; traffic and conversions are the outcome that matters.