
Context Window

The context window is the maximum amount of text (tokens) a language model can consider in a single request. Larger windows allow for more complex inputs and longer documents.

Definition: The maximum amount of text — measured in tokens, where one token ≈ 0.75 English words — that a large language model can process in a single request. The context window includes both the input prompt and the generated output. Once the window is exceeded, older content is truncated or the model errors.
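The budget arithmetic in the definition can be sketched in a few lines. This is a rough heuristic using the ~0.75 words-per-token ratio given above, not a real tokenizer (libraries such as `tiktoken` give exact counts); the function names are illustrative.

```python
# Rough token estimate from the ~0.75 English-words-per-token heuristic
# in the definition above. Real tokenizers give exact, model-specific counts.
WORDS_PER_TOKEN = 0.75

def estimate_tokens(text: str) -> int:
    """Approximate token count from a whitespace word count."""
    return round(len(text.split()) / WORDS_PER_TOKEN)

def fits_in_window(prompt: str, max_output_tokens: int, window: int) -> bool:
    """The window covers both input and output, so budget for both."""
    return estimate_tokens(prompt) + max_output_tokens <= window

prompt = "Summarize this report in three bullet points."
print(fits_in_window(prompt, max_output_tokens=500, window=4096))  # True
```

Note that the output budget (`max_output_tokens`) must be reserved up front: a prompt that fills the entire window leaves no room for the model to respond.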

How it works

Context windows have grown dramatically: early GPT models had 2K-token windows (~1,500 words), while modern models support 100K to 2M+ tokens (75K to 1.5M+ words). A larger window lets a model analyze long documents, process entire codebases, or maintain long conversation histories without losing earlier context. At the time of writing, Claude offers 200K+ token windows, while GPT-4 Turbo and Gemini 1.5 Pro offer 128K and 1M+ respectively.

Example

An analyst uploads a 50-page research report (~30,000 words / ~40,000 tokens) to Claude with a 200K context window. Claude processes the entire document and answers questions citing specific sections. The same task in a model with a 4K context window would require manually splitting the document.
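The manual splitting that a small window forces can be sketched as a simple chunker. This is a minimal illustration, assuming the same 0.75 words-per-token heuristic as above and a fixed per-chunk token budget; real pipelines split on semantic boundaries (sections, paragraphs) rather than raw word counts.

```python
def chunk_words(words: list[str], chunk_tokens: int,
                words_per_token: float = 0.75) -> list[list[str]]:
    """Split a word list into pieces that each fit a token budget."""
    words_per_chunk = int(chunk_tokens * words_per_token)
    return [words[i:i + words_per_chunk]
            for i in range(0, len(words), words_per_chunk)]

doc = ["word"] * 30_000  # ~40K tokens, as in the example above
# Budget 3K tokens per chunk to leave room for the question and the
# answer inside a 4K window.
chunks = chunk_words(doc, chunk_tokens=3_000)
print(len(chunks))  # 14 separate requests instead of one
```

Each chunk then becomes its own request, and answers that depend on cross-chunk context (e.g. "compare section 2 with section 40") require extra bookkeeping that a single large-window request avoids.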

Comparison + context

Larger context window vs. RAG: Large context windows let you include everything in the prompt; retrieval-augmented generation (RAG) retrieves only the relevant pieces. Both approaches work: large windows are simpler, while RAG scales to corpora far larger than any window.

Token cost implications: Most APIs charge per token, so filling a larger context means a higher per-request cost. Free tiers usually have lower context limits.
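The per-token cost point above can be made concrete with a small calculation. The prices here are purely illustrative placeholders, not any provider's actual rates; the pattern of separate input and output prices per million tokens is common across API vendors.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Dollar cost of one request, given per-million-token prices."""
    return (input_tokens / 1e6 * in_price_per_m
            + output_tokens / 1e6 * out_price_per_m)

# A 40K-token document (the 50-page report above) plus a 1K-token answer,
# at hypothetical prices of $3/M input and $15/M output tokens.
cost = request_cost(40_000, 1_000, in_price_per_m=3.00, out_price_per_m=15.00)
print(f"${cost:.3f}")
```

The takeaway is linear scaling: sending the same 40K-token document with every follow-up question multiplies the input cost with each turn, which is one reason RAG or prompt caching can be cheaper for repeated queries over large documents.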
