Deep Dive January 25, 2026 10 min read

How ChatGPT Decides What to Cite

Ever wonder why AI mentions some companies but not others? Here's what we know about how LLMs choose their sources.

Key insight: ChatGPT doesn't rank pages like Google does. It draws from training data and real-time search to synthesize answers. Understanding how it selects sources is the first step to getting cited.

When you ask ChatGPT a question, it usually answers from its training data alone. Only in some modes does it also run a web search and fold those results into the answer. So how does it decide what to cite?

Training Data Influence

LLMs are trained on massive text datasets. Content that appears frequently and in authoritative contexts tends to carry more weight in what the model learns. This includes:

  • Wikipedia and reference sites
  • News publications
  • Academic papers
  • Popular documentation

Training data is the foundation. If your company is mentioned in authoritative contexts (Wikipedia, news, academic papers), you're more likely to appear in AI answers, even without real-time search.

Real-Time Search Integration

When AI assistants do search the web, they look for:

  • Relevance: Does the content directly answer the query?
  • Recency: Is the information current?
  • Authority signals: Is there clear authorship and sourcing?
  • Extractability: Can they pull a clean quote?
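One rough way to see whether a page exposes these signals in machine-readable form is to scan its HTML for them. The sketch below is only an illustration: the `SignalAudit` class, the `audit` helper, and the particular tags it checks are assumptions made for this example, not a description of what any AI assistant actually inspects.

```python
from html.parser import HTMLParser


class SignalAudit(HTMLParser):
    """Collects a few machine-readable signals from an HTML page.

    The set of signals is an assumption for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.signals = {
            "author": False,
            "published_date": False,
            "canonical_url": False,
            "json_ld": False,
        }

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (e.g. <meta name>), so normalize them.
        a = {key: (value or "") for key, value in attrs}
        if tag == "meta":
            name = (a.get("name") or a.get("property") or "").lower()
            if name in ("author", "article:author") and a.get("content"):
                self.signals["author"] = True
            if name == "article:published_time" and a.get("content"):
                self.signals["published_date"] = True
        elif tag == "time" and a.get("datetime"):
            self.signals["published_date"] = True
        elif tag == "link":
            if "canonical" in a.get("rel", "").lower().split() and a.get("href"):
                self.signals["canonical_url"] = True
        elif tag == "script":
            if a.get("type", "").lower() == "application/ld+json":
                self.signals["json_ld"] = True


def audit(html: str) -> dict:
    """Return a dict mapping each signal name to True or False."""
    parser = SignalAudit()
    parser.feed(html)
    return parser.signals
```

Calling `audit(page_html)` on a page's source returns a dictionary such as `{"author": True, "published_date": False, ...}`. A `False` entry only means the signal was not found in the form this sketch looks for.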

The Citation Threshold

An AI assistant doesn't cite everything it knows. It tends to cite when:

  1. The claim is specific and verifiable
  2. The source is clearly identifiable
  3. The information adds credibility to the answer

This is why structured data and citation signals matter so much. You're not just helping AI find your content; you're making it easy to cite.
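As one illustration of structured data, a page can carry a schema.org `Article` description as JSON-LD. The sketch below builds that tag in Python. The function name `article_json_ld` and the choice of fields are assumptions for this example, and the author name, URL, and publisher in the usage lines are placeholders.

```python
import json
from datetime import date


def article_json_ld(headline, author_name, published, url, publisher_name):
    """Build a <script> tag holding schema.org Article markup as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": published.isoformat(),
        "mainEntityOfPage": url,
        "publisher": {"@type": "Organization", "name": publisher_name},
    }
    # Escape "</" so a stray "</script>" inside a value cannot close the tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical values; substitute the real author, URL, and publisher.
tag = article_json_ld(
    headline="How ChatGPT Decides What to Cite",
    author_name="Jane Doe",
    published=date(2026, 1, 25),
    url="https://example.com/blog/how-chatgpt-decides-what-to-cite",
    publisher_name="Example Co",
)
print(tag)
```

The returned string is meant to be placed in the page's `<head>`. It states the author, publication date, and canonical location explicitly, so a crawler does not have to infer them from the layout.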

What You Can Control

Some of this is out of your hands, and some of it isn't:

Can't control

  • What's in the LLM's training data
  • How the model weighs different sources
  • Which queries users ask

Can control

  • Structured data on your pages
  • Clear author attribution
  • Publication dates and canonical URLs
  • Extractable content (FAQs, definitions)
  • AI crawler access via robots.txt

These are the levers of GEO (generative engine optimization).
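Of these levers, crawler access is the easiest to check mechanically. The sketch below uses Python's standard `urllib.robotparser` to test a robots.txt body against a few AI crawler user agents. The `crawler_access` helper is hypothetical, and the agent list is an assumption: these are user-agent tokens that vendors have published, but verify them against each vendor's current documentation.

```python
from urllib.robotparser import RobotFileParser

# Assumed list of AI crawler user-agent tokens; check vendor docs before relying on it.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]


def crawler_access(robots_txt: str, url: str, agents=AI_CRAWLERS) -> dict:
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Example rules: block GPTBot everywhere, allow every other crawler.
rules = """User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

for agent, allowed in crawler_access(rules, "https://example.com/blog/post").items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With the example rules, `GPTBot` is reported as blocked and the other three agents as allowed. To check a live site, fetch its `/robots.txt` and pass the text in as `robots_txt`.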
