How ChatGPT Decides What to Cite
Ever wonder why AI mentions some companies but not others? Here's what we know about how LLMs choose their sources.
Key insight: ChatGPT doesn't rank pages like Google does. It draws from training data and real-time search to synthesize answers. Understanding how it selects sources is the first step to getting cited.
When you ask ChatGPT a question, it usually doesn't search the web in real time. It answers from its training data and, in some modes, from web search results. So how does it decide what to cite?
Training Data Influence
LLMs are trained on massive text datasets. Content that appears frequently and in authoritative contexts tends to have more influence on what the model learns. This includes:
- Wikipedia and reference sites
- News publications
- Academic papers
- Popular documentation
Training data is the foundation. If your company is mentioned in authoritative contexts such as Wikipedia, news coverage, or academic papers, it is more likely to appear in AI answers, even without real-time search.
Real-Time Search Integration
When AI assistants do search the web, they look for:
- Relevance: Does the content directly answer the query?
- Recency: Is the information current?
- Authority signals: Is there clear authorship and sourcing?
- Extractability: Can they pull a clean quote?
The Citation Threshold
AI doesn't cite everything it knows. It cites when:
- The claim is specific and verifiable
- The source is clearly identifiable
- The information adds credibility to the answer
This is why structured data and citation signals matter so much. You're not just helping AI find your content; you're making it easy to cite.
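To make "structured data" concrete, here is a minimal sketch of schema.org Article markup generated as JSON-LD. The helper names (`article_jsonld`, `to_script_tag`), the author, the date, and the URL are all hypothetical placeholders, and nothing here guarantees a citation; it only shows one common way to state authorship, publication date, and canonical URL in machine-readable form.

```python
import json


def article_jsonld(headline, author_name, date_published, url):
    """Build a schema.org Article description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,  # ISO 8601 date string
        "url": url,  # the page's canonical URL
    }


def to_script_tag(data):
    """Wrap the JSON-LD in the script tag that is placed in the page's HTML."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# All values below are illustrative assumptions, not real metadata.
markup = article_jsonld(
    headline="How ChatGPT Decides What to Cite",
    author_name="Jane Doe",
    date_published="2025-01-15",
    url="https://example.com/how-chatgpt-decides-what-to-cite",
)
print(to_script_tag(markup))
```

The printed tag can be pasted into a page template. Keeping the values in one function makes it easier to keep the visible byline and date consistent with the markup.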
What You Can Control
Some factors are outside your control, but several are not:
Can't control
- What's in the LLM's training data
- How the model weighs different sources
- Which queries users ask
Can control
- Structured data on your pages
- Clear author attribution
- Publication dates and canonical URLs
- Extractable content (FAQs, definitions)
- AI crawler access via robots.txt
These are the levers of GEO (generative engine optimization).
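Crawler access is the easiest of these levers to check yourself. The sketch below uses Python's standard `urllib.robotparser` to test which AI crawlers a robots.txt file admits. The robots.txt content, the `crawler_access` helper, and the example URLs are hypothetical. The user-agent tokens are ones the respective vendors have published, but treat the list as an assumption and confirm it against each vendor's current documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; substitute your site's real file.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

# Assumed list of AI crawler user-agent tokens; verify before relying on it.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]


def crawler_access(robots_txt, url, agents=AI_CRAWLERS):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no network call
    return {agent: parser.can_fetch(agent, url) for agent in agents}


access = crawler_access(ROBOTS_TXT, "https://example.com/blog/post")
for agent, allowed in access.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With this sample file, PerplexityBot is reported as blocked and the other three as allowed for the blog URL. GPTBot would be blocked only for paths under /private/.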