LLM Context Windows: Managing Tokens in Production AI Apps

Source: DEV Community
## The Token Budget Problem

Claude (claude-sonnet-4-6) has a 200k-token context window. GPT-4o has 128k. These sound enormous until you're building a RAG application that needs to pass document context, conversation history, system prompts, and tool definitions simultaneously. Running out of context window mid-conversation is an unrecoverable failure. Managing it is an engineering discipline.

## Counting Tokens

```typescript
import Anthropic from '@anthropic-ai/sdk';
import { encoding_for_model, type TiktokenModel } from 'tiktoken'; // for OpenAI

// Anthropic: use the API's token-counting endpoint
const anthropic = new Anthropic();

async function countTokens(messages: Anthropic.MessageParam[]) {
  const response = await anthropic.messages.countTokens({
    model: 'claude-sonnet-4-6',
    messages,
    system: 'You are a helpful assistant.',
  });
  return response.input_tokens;
}

// OpenAI: use tiktoken locally (no API call needed)
function countOpenAITokens(text: string, model: TiktokenModel = 'gpt-4o'): number {
  const enc = encoding_for_model(model);
  const tokens = enc.encode(text);
  enc.free(); // tiktoken encoders hold WASM memory; release it when done
  return tokens.length;
}
```
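Once you can count tokens, the budget itself is simple arithmetic: reserve space for the fixed parts of the request and give whatever remains to retrieved documents. Here is a minimal sketch of that bookkeeping; the `Reservations` shape and the numbers in the example are illustrative assumptions, not part of any SDK:

```typescript
// Hypothetical budget helper: given a model's context window and fixed
// reservations, compute how many tokens remain for RAG document context.
interface Reservations {
  system: number;   // system prompt
  tools: number;    // tool definitions
  history: number;  // conversation history
  output: number;   // max_tokens reserved for the model's reply
}

function remainingForDocuments(contextWindow: number, r: Reservations): number {
  const reserved = r.system + r.tools + r.history + r.output;
  const remaining = contextWindow - reserved;
  if (remaining <= 0) {
    throw new Error(`Budget exceeded: reserved ${reserved} of ${contextWindow}`);
  }
  return remaining;
}

// Example: Claude's 200k window with example reservations
const docBudget = remainingForDocuments(200_000, {
  system: 1_000,
  tools: 3_000,
  history: 20_000,
  output: 8_000,
});
console.log(docBudget); // 168000
```

Reserving `output` up front matters: the context window covers input plus generated tokens on some APIs, so a request that fits on the way in can still fail if you leave no room for the reply.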