RAG (Retrieval-Augmented Generation)
RAG is an AI architecture in which a language model retrieves relevant documents from a knowledge base before generating a response, reducing hallucinations by grounding the answer in retrieved sources.
Standard LLMs generate output based purely on their training data and prompt context, which can lead to hallucinations when the model lacks information. RAG addresses this by giving the model a 'lookup' step: query a vector database, retrieve relevant documents, then generate an answer that cites or quotes those documents. Tools like Perplexity use RAG over the live web; enterprise tools use RAG over private documentation.
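To make the retrieve-then-generate pipeline concrete, here is a minimal sketch in Python. It uses a toy bag-of-words retriever in place of a real vector database and embedding model; the corpus, `embed`, `retrieve`, and `build_prompt` names are illustrative assumptions, not any particular library's API. The final prompt would be passed to an LLM for the generation step.

```python
import math
from collections import Counter

# Toy corpus standing in for a knowledge base (hypothetical documents).
CORPUS = [
    "RAG retrieves documents from a knowledge base before generating an answer.",
    "Fine-tuning updates model weights to teach the model new patterns.",
    "Vector databases store embeddings and support similarity search.",
]

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a simple bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    # The 'lookup' step: rank documents by similarity to the query, keep top k.
    q = embed(query)
    ranked = sorted(CORPUS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    # Generation step: the LLM is asked to answer using only the retrieved
    # sources and to cite them, which is what makes RAG answers verifiable.
    docs = retrieve(query)
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    return f"Answer using only these sources, citing them by number:\n{context}\n\nQuestion: {query}"

print(build_prompt("How does RAG reduce hallucinations?"))
```

In a production system, the bag-of-words retriever would be replaced by an embedding model plus a vector database, but the shape of the pipeline is the same: embed the query, retrieve the most similar documents, and generate an answer constrained to those documents.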
A user asks ChatGPT a question about a niche topic. Standard ChatGPT might hallucinate. ChatGPT with web search (a form of RAG) first searches the web, retrieves the top 5 relevant pages, then generates an answer citing those pages. Tools like NotebookLM let users upload their own documents and use RAG over that private corpus.
RAG vs fine-tuning: Fine-tuning teaches the model new patterns by updating its weights; RAG gives the model new facts at query time. RAG is generally cheaper, easier to keep current, and more verifiable, since answers can cite the retrieved sources. RAG vs context-window stuffing: RAG selects only the most relevant context; context stuffing dumps everything into the prompt. RAG scales better because prompt size stays bounded as the knowledge base grows, while stuffing runs into context-window limits and rising cost, as the sketch below illustrates.
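The scaling difference can be made concrete with a toy comparison. The document counts and the word-count token estimate below are hypothetical, purely to show that stuffing grows with the corpus while RAG stays bounded by k:

```python
def prompt_tokens(docs: list[str]) -> int:
    # Crude token estimate: whitespace-split word count (illustrative only).
    return sum(len(d.split()) for d in docs)

knowledge_base = ["doc " * 200] * 50   # hypothetical: 50 documents of ~200 words each
top_k = knowledge_base[:3]             # RAG: only the few most relevant documents

print("context stuffing:", prompt_tokens(knowledge_base), "tokens")  # grows with the corpus
print("RAG (top-3):     ", prompt_tokens(top_k), "tokens")           # bounded by k
```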