ChatGPT often answers with links. Behind those links lies a retrieval pipeline that attempts to find sources that are both relevant and diverse enough to address the question. Two ideas shape that choice: how to fuse candidate results and how to spread coverage across subtopics.
This article explains a common playbook many modern assistants use. We focus on Reciprocal Rank Fusion (RRF) for combining results and on topical breadth for avoiding one‑note answers. You will walk away knowing what RRF does, why breadth matters, which knobs to turn, and how to apply this in your projects.


TL;DR
- RRF blends results from multiple searches by summing 1/(k + rank), favoring items that rank well across retrievers.
- Topical breadth aims to cover the likely subtopics or intents behind a query, not just the single strongest match.
- Use RRF to stabilize hybrid search across lexical and vector retrievers; it removes the need for fragile score normalization.
- Add diversification methods to widen coverage so citations reflect the question’s facets.
- In many modern assistants (including ChatGPT Search), the system rewrites queries, retrieves from several angles, fuses rankings, and, when using search, cites a small set of stable, diverse sources.
How Web Assistants Typically Pick Sources
While exact implementations vary, the principles here will help you reason about citations, tune your own RAG stack, and judge answer quality.
The High‑Level Pipeline
A typical pattern for LLM assistants that browse or search looks like this:
- Query understanding: The model may rewrite your question into several targeted queries to probe different angles.
- Multi‑retrieval: Many systems run lexical search (e.g., BM25), dense vector search, and sometimes specialized filters or metadata retrievers.
- Fusion: They combine ranked lists into one unified ranking. RRF is a popular, robust choice.
- Diversification: Re‑rank to balance relevance with coverage of distinct subtopics or intents.
- Synthesis and attribution: The model reads candidates, drafts an answer, and cites the most useful, trustworthy, and representative sources.
OpenAI’s public help docs confirm that ChatGPT Search rewrites queries, uses the web, and returns answers with citations, but the exact ranking code is not published. Treat the steps above as a well‑established pattern used across industry search and RAG systems.
What Reciprocal Rank Fusion Is
Reciprocal Rank Fusion is a simple way to merge multiple ranked lists. For each document, take its position in each list, convert that position to a reciprocal score 1/(k + rank), and sum across lists.
The constant k controls how quickly contributions decay as rank positions get worse. The effect is intuitive: a page that appears high in several lists floats to the top, even if raw scores are not comparable.
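To make the arithmetic concrete, here is a minimal Python sketch of plain RRF. The function name rrf_fuse and the toy document IDs are illustrative, not part of any particular library.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Merge several best-first lists of doc IDs with Reciprocal Rank Fusion.

    Ranks are 1-based; a document's fused score is the sum of 1 / (k + rank)
    over every list in which it appears.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A page that ranks well in both lists rises to the top of the fused ranking.
bm25_ids = ["guideline", "blog_post", "forum_thread"]
dense_ids = ["guideline", "explainer", "blog_post"]
print(rrf_fuse([bm25_ids, dense_ids]))
# -> ['guideline', 'blog_post', 'explainer', 'forum_thread']
```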
Practitioners like it for several reasons:
- Score‑scale agnostic: Lexical and neural retrievers produce scores on different scales. RRF sidesteps score normalization games.
- Stable and fast to tune: A single k parameter and optional per‑retriever weights keep it practical in production.
- Hybrid by design: It works well to blend BM25, vector search, and learned signals.
What Topical Breadth Means
Topical breadth means the returned set of sources covers the likely subtopics behind your query, not just near‑duplicates of the top hit. In information retrieval, this is called diversification.
Classic methods such as Maximal Marginal Relevance (MMR) penalize redundancy while preserving relevance (a minimal sketch appears after the list below). Others, like intent‑aware diversification, explicitly model the distinct user intents behind an ambiguous query. Breadth matters when:
- The query is ambiguous or multi‑intent (jaguar the animal vs the car).
- The question has several parts (causes, risks, and treatments).
- Freshness or perspectives differ (official docs vs independent evaluations).
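To make the MMR idea concrete, here is a minimal greedy re‑ranking sketch. The function and argument names are assumptions for illustration, and the similarity function is whatever you supply (for example, cosine similarity over embeddings); this is not a reference implementation.

```python
def mmr_select(candidates, relevance, similarity, top_k=5, lam=0.5):
    """Greedy Maximal Marginal Relevance re-ranking.

    candidates: iterable of doc IDs (e.g., a fused RRF list)
    relevance:  dict of doc ID -> relevance score
    similarity: function (doc_id_a, doc_id_b) -> similarity in [0, 1]
    lam:        1.0 = pure relevance, 0.0 = pure novelty
    """
    selected, pool = [], list(candidates)
    while pool and len(selected) < top_k:
        def mmr_score(d):
            # Penalize documents that look like something already selected.
            redundancy = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance[d] - (1.0 - lam) * redundancy
        best = max(pool, key=mmr_score)
        selected.append(best)
        pool.remove(best)
    return selected
```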


RRF And Breadth Work Together
RRF selects consensus‑strong candidates across retrieval methods. Diversification then pushes the final list to cover subtopics. In practice:
- Start with RRF to get a high‑confidence pool of candidates.
- Apply a diversification step to avoid redundancy in the top‑k (a combined sketch follows this list).
- Let the LLM read across that set to produce a balanced answer and pick citations.
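Putting the two steps together might look like the sketch below, which reuses the hypothetical rrf_fuse and mmr_select helpers from earlier. Here bm25_ids, dense_ids, and doc_text stand in for your retrievers' ranked output and your document store; in practice you would likely replace the crude word‑overlap similarity with embedding cosine similarity.

```python
# Hypothetical pipeline: fuse first, then diversify the fused pool.
def word_overlap(a, b, doc_text):
    """Crude Jaccard word overlap between two documents' texts."""
    wa, wb = set(doc_text[a].split()), set(doc_text[b].split())
    return len(wa & wb) / max(len(wa | wb), 1)

fused = rrf_fuse([bm25_ids, dense_ids], k=60)               # 1. consensus ranking
pool = fused[:100]                                          # 2. candidate window
relevance = {d: 1.0 / (i + 1) for i, d in enumerate(pool)}  # rank-based proxy score
final = mmr_select(pool, relevance,
                   lambda a, b: word_overlap(a, b, doc_text),
                   top_k=8, lam=0.5)                        # 3. diversify the top-k
```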
RRF or Diversification First: Comparing the Core Trade‑Off
The table below compares the RRF fusion step with the diversification step by goal, strengths, and weaknesses, so you can judge when to favor one over the other for a given search scenario.
| Choice | Goal | Strengths | Watchouts | When To Favor It |
| --- | --- | --- | --- | --- |
| RRF fusion | Blend multiple retrievers into one stable ranking | Simple; robust across score scales; strong at consensus relevance | Can over‑concentrate near‑duplicates if upstream candidates are similar | You combine BM25 + vector + rules and want a dependable top‑k |
| Diversification (e.g., MMR) | Increase topical breadth and reduce redundancy | Surfaces varied facets; better coverage for ambiguous queries | If applied too early, may promote off‑topic items | Your query is multi‑intent, or the answer needs multiple perspectives |
Tip: Fuse first with RRF, then diversify the fused top‑N into a final top‑k.
Practical Knobs And Defaults
These are the knobs that most affect fusion and diversification quality, with sensible starting points for tuning your own system; a weighted RRF sketch follows the list.
- RRF rank constant k: Higher k spreads credit more evenly to lower ranks; lower k concentrates on the very top. Values around k ≈ 60 are common defaults in popular libraries and are a good starting point.
- Per‑retriever weights: If one retriever is your workhorse, give it a slightly higher weight in the RRF sum.
- Candidate window size: RRF quality improves when each retriever contributes a moderately sized candidate window (for example, 50 to 200 results) before fusion.
- Diversification lambda (MMR): Lambda balances relevance vs novelty. Values around 0.5 are common starting points; tune per domain.
- Topical templates: For breadth, generate multiple queries that target definitions, how‑to, comparison, and authoritative sources; retrieve and fuse.
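The first three knobs can be folded into a weighted variant of the earlier RRF sketch. The signature below is illustrative, and the weights and window values are defaults you would tune, not recommendations from any specific system.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights=None, k=60, window=200):
    """RRF with per-retriever weights and a candidate window.

    ranked_lists: one best-first list of doc IDs per retriever
    weights:      one multiplier per retriever (defaults to 1.0 each)
    k:            rank constant; ~60 is a common library default
    window:       how many candidates to keep from each retriever before fusing
    """
    weights = weights or [1.0] * len(ranked_lists)
    scores = defaultdict(float)
    for weight, ranking in zip(weights, ranked_lists):
        for rank, doc_id in enumerate(ranking[:window], start=1):
            scores[doc_id] += weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# e.g., trust BM25 slightly more than the dense retriever:
# fused = weighted_rrf([bm25_ids, dense_ids], weights=[1.2, 1.0], k=60, window=100)
```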
Examples
The use cases below show how RRF and diversification play out for different query types, yielding answers that are both authoritative and accessible.
Health Overview Question
A user asks: What is prediabetes, and how is it treated? The system generates queries for definition, risk factors, lifestyle interventions, and medication options. BM25 returns authoritative guideline pages; a dense retriever finds plain‑language explainers.
RRF fuses them, then diversification ensures the top items include one definition source and one treatment source. The final answer cites a medical guideline and a patient‑friendly explainer, giving both authority and accessibility.
Developer API Troubleshooting
A developer asks: Why is my search relevance worse after switching to vector embeddings? The system runs: BM25 for docs sections, vector search over forum threads, and a site filter for official release notes. RRF elevates pages that appear across methods.
Diversification keeps both a how‑to tuning guide and a release note in the top results. The answer cites an official configuration page and a community post that describes a pitfall, saving the developer time.
Actionable Steps / Checklist
Here is a simple process for building a retrieval pipeline that returns relevant, diverse results.
- Map your retrievers: Identify which lexical, vector, and rule‑based signals you will combine.
- Start with RRF: Use a sensible k (around 60) and gather a window of candidates from each retriever.
- Add diversification: Apply MMR or an intent‑aware method to reduce redundancy in the top‑k.
- Generate query variants: Produce targeted rewrites to cover definitions, procedures, comparisons, and edge cases.
- Calibrate trust: Prefer official documentation or primary sources for facts; mix in high‑quality secondary explainers for clarity.
- Log and review: Track which sources are cited, detect over‑reliance on one domain, and adjust weights or diversification lambda.
- Test with ambiguous queries: Measure both relevance and diversity using standard metrics before deployment.
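For the diversity side of that last check, subtopic recall (the fraction of a query's labeled subtopics covered by the top‑k) is a simple metric to start with. The sketch below assumes you have hand‑labeled subtopics per document; the names are illustrative.

```python
def subtopic_recall(top_k_ids, doc_subtopics, query_subtopics):
    """Share of a query's subtopics covered by the top-k results.

    doc_subtopics:   dict of doc ID -> set of subtopic labels
    query_subtopics: the subtopics the query is known to have
    """
    covered = set()
    for doc_id in top_k_ids:
        covered |= doc_subtopics.get(doc_id, set())
    return len(covered & set(query_subtopics)) / max(len(query_subtopics), 1)

# subtopic_recall(["guideline", "explainer"],
#                 {"guideline": {"definition", "treatment"}, "explainer": {"definition"}},
#                 {"definition", "risk factors", "treatment"})  # -> 0.666...
```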


Glossary
These terms come up repeatedly when implementing or discussing retrieval fusion and diversification.
- Reciprocal Rank Fusion (RRF): A method that merges ranked lists by summing 1/(k + rank) across lists, rewarding items that rank well in multiple lists.
- Rank Constant (k): The parameter in RRF that controls how fast the contribution decays as the rank gets worse.
- Diversification: Re‑ranking to increase topical breadth and reduce redundancy among top results.
- Maximal Marginal Relevance (MMR): A classic diversification method that balances relevance with novelty to avoid duplicates.
- Hybrid Search: Combining lexical search with vector or semantic search to capture different relevance signals.
- Query Rewriting: Generating targeted query variants to probe different facets of the user’s question.
- Intent: A plausible subtopic or user goal behind an ambiguous query, such as brand vs animal for the word jaguar.
FAQ
Does ChatGPT always use RRF?
Implementations evolve and are not fully public. RRF is widely used in search and RAG systems because it is simple and robust, so it is a reasonable mental model.
Why not just sort by neural similarity?
Dense scores are not directly comparable to lexical scores. RRF avoids fragile normalization and favors consensus across methods.
How many sources should be cited?
There should be enough sources to support the main claims with authority and breadth. Good answers usually cite one primary source plus one or two complementary references.
Is diversification the same as credibility?
No. Diversification widens topical coverage, while credibility concerns how trustworthy a source is. Strong answers need both.
Final Thoughts
When you see citations in an AI answer, imagine a two‑step dance: RRF finds the consensus winners across retrieval methods, and diversification ensures those winners do not all say the same thing. That pairing yields answers that are both on target and well-rounded. Bring the same dance to your own RAG stack, and your users will feel the difference.

