RAG (Retrieval-Augmented Generation)
RAG is an AI architecture in which a language model retrieves relevant documents from a knowledge base before generating a response, reducing hallucinations by grounding the answer in retrieved sources.
Standard LLMs generate output based purely on their training data and prompt context, which can lead to hallucinations when the model lacks information. RAG addresses this by giving the model a 'lookup' step: query a vector database, retrieve relevant documents, then generate an answer that cites or quotes those documents. Tools like Perplexity use RAG over the live web; enterprise tools use RAG over private documentation.
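To make the retrieve-then-generate pipeline concrete, here is a minimal sketch in Python. It uses a toy bag-of-words retriever in place of a real vector database and embedding model; the corpus, `embed`, `retrieve`, and `build_prompt` names are illustrative assumptions, not any particular library's API. The final prompt would be passed to an LLM for the generation step.

```python
import math
from collections import Counter

# Toy corpus standing in for a knowledge base (hypothetical documents).
CORPUS = [
    "RAG retrieves documents from a knowledge base before generating an answer.",
    "Fine-tuning updates model weights to teach the model new patterns.",
    "Vector databases store embeddings and support similarity search.",
]

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a simple bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    # The 'lookup' step: rank documents by similarity to the query, keep top k.
    q = embed(query)
    ranked = sorted(CORPUS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    # Generation step: the LLM is asked to answer using only the retrieved
    # sources and to cite them, which is what makes RAG answers verifiable.
    docs = retrieve(query)
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    return f"Answer using only these sources, citing them by number:\n{context}\n\nQuestion: {query}"

print(build_prompt("How does RAG reduce hallucinations?"))
```

In a production system, the bag-of-words retriever would be replaced by an embedding model plus a vector database, but the shape of the pipeline is the same: embed the query, retrieve the most similar documents, and generate an answer constrained to those documents.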
A user asks ChatGPT a question about a niche topic. Standard ChatGPT might hallucinate. ChatGPT with web search (a form of RAG) first searches the web, retrieves the top 5 relevant pages, then generates an answer citing those pages. Tools like NotebookLM let users upload their own documents and use RAG over that private corpus.
RAG vs fine-tuning: Fine-tuning teaches the model new patterns by updating its weights; RAG gives the model new facts at query time. RAG is generally cheaper, easier to keep current, and more verifiable, since answers can cite the retrieved sources. RAG vs context-window stuffing: RAG selects only the most relevant context; context stuffing dumps everything into the prompt. RAG scales better because prompt size stays bounded as the knowledge base grows, while stuffing runs into context-window limits and rising cost, as the sketch below illustrates.
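The scaling difference can be made concrete with a toy comparison. The document counts and the word-count token estimate below are hypothetical, purely to show that stuffing grows with the corpus while RAG stays bounded by k:

```python
def prompt_tokens(docs: list[str]) -> int:
    # Crude token estimate: whitespace-split word count (illustrative only).
    return sum(len(d.split()) for d in docs)

knowledge_base = ["doc " * 200] * 50   # hypothetical: 50 documents of ~200 words each
top_k = knowledge_base[:3]             # RAG: only the few most relevant documents

print("context stuffing:", prompt_tokens(knowledge_base), "tokens")  # grows with the corpus
print("RAG (top-3):     ", prompt_tokens(top_k), "tokens")           # bounded by k
```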