Prompt and context
Prompt and context are fundamental concepts in prompt engineering for large language models (LLMs).
Context is a critically important component of a prompt, determining the LLM's ability to generate accurate, relevant, and useful responses. Effective prompt engineering largely consists of the art of collecting, filtering, structuring, and presenting the right context to the model at the right time, while overcoming the limitations of modern LLMs.
Definition of a Prompt
A prompt is the complete set of input data provided to an LLM to generate a response. It is not just a question or a command, but structured text that can include:
- Instructions: Direct guidance for the model on what to do, in what format, style, or tone.
- Primary Query: The user's direct question or a description of the main task.
- Context: Additional information needed to correctly fulfill the request.
- Examples (Few-shot examples): A demonstration of the desired format or style of response for similar tasks.
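The assembly of these components into a single input can be sketched as follows; this is a minimal illustration, and the section labels and function name are hypothetical, not a standard API:

```python
# Illustrative sketch: assembling a prompt from instructions, context,
# few-shot examples, and the user's query. All names are hypothetical.

def build_prompt(instructions, context, examples, query):
    """Concatenate prompt components into one input string for an LLM."""
    parts = [f"Instructions:\n{instructions}"]
    if context:
        parts.append(f"Context:\n{context}")
    for i, (q, a) in enumerate(examples, 1):
        parts.append(f"Example {i}:\nQ: {q}\nA: {a}")
    parts.append(f"Query:\n{query}")
    return "\n\n".join(parts)

prompt = build_prompt(
    instructions="Answer concisely, in one sentence.",
    context="The store is open 9:00-18:00 on weekdays.",
    examples=[("When does the store close on Monday?", "At 18:00.")],
    query="Is the store open at 8:30 on Tuesday?",
)
```

Real systems usually separate these components into distinct chat messages (system, user, assistant) rather than one string, but the principle is the same.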
The quality and completeness of the prompt directly determine the relevance, accuracy, and usefulness of the LLM's response.
Definition of Context
Context within prompt engineering is any information inside the prompt that helps the model better understand the task, the specifics of the situation, or the user's expectations, but is not part of its original training data. Context provides the situational details necessary to generate an appropriate response.
Context can include:
- The history of the previous conversation (in chatbots).
- Data retrieved from external sources (documents, databases, web pages), the basis of retrieval-augmented generation (RAG).
- Information about the user (profile, preferences).
- Specific details of the current task or situation.
- Examples of how similar tasks were completed (as part of few-shot prompting).
It is important to distinguish the context provided in the prompt from the general knowledge the model acquired during its pre-training. Prompt engineering focuses on effectively providing precisely this situational context.
Relationship and Influence
From the model's perspective, prompt and context define how the transformer perceives input data and what it uses as a basis for generating output. Architecturally, a transformer does not distinguish between "prompt" and "context": both are parts of the input token sequence that are converted into embeddings, enriched with positional information, and processed collectively by self-attention layers.
Prompt and context are inextricably linked: context is an integral part of the prompt.
- Context shapes the prompt: The engineer selects and structures relevant context for inclusion in the prompt.
- Context guides the model: The provided information allows the LLM to narrow down the space of possible answers, focus on relevant aspects, and avoid hallucinations.
- The quality of context determines the quality of the response: Insufficient, irrelevant, or contradictory context leads to inaccurate, generic, or erroneous responses. Accurate and complete context increases the specificity and usefulness of the generation.
- Influence on instruction interpretation: Context can clarify or modify the interpretation of general instructions in the prompt.
The effectiveness of a prompt largely depends on how successfully the engineer has managed to collect, filter, and present relevant context to the model.
Types of Context
Context can be classified according to various criteria:
By Source:
- User-Provided: Explicit input from the user, their question, or task description.
- From Dialogue History: Previous messages from the user and the assistant (short-term memory).
- Retrieved: Data from external sources (documents, DBs, web) using RAG.
- From Profile/Knowledge Base: Long-term information about the user or domain.
- Static/Instructional: Information embedded by the engineer into the prompt template (instructions, examples, role definitions).
By Dynamism:
- Static Context: The unchanging part of the prompt (instructions, definitions, examples). It defines the general task.
- Dynamic Context: Information that changes from one request to another (user data, RAG results, current time). It provides specific details.
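The separation of static and dynamic context is often implemented as a prompt template: the static part is fixed text, and the dynamic part is filled in per request. A minimal sketch using the standard library (the template text and field names are illustrative):

```python
from string import Template

# Static context: the unchanging part of the prompt (role, instructions).
# Dynamic context: per-request values filled into the $placeholders.
PROMPT_TEMPLATE = Template(
    "You are a support assistant. Answer using only the context below.\n\n"
    "Context:\n$retrieved\n\n"
    "Current time: $now\n"
    "User question: $question"
)

def render_prompt(retrieved: str, now: str, question: str) -> str:
    """Fill the static template with dynamic, per-request context."""
    return PROMPT_TEMPLATE.substitute(
        retrieved=retrieved, now=now, question=question
    )
```

Frameworks such as LangChain provide richer template classes, but they follow the same static-template, dynamic-values pattern.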
By Storage Duration (Memory):
- Short-Term Context: The history of the current dialogue session.
- Long-Term Context: Saved data about the user or previous interactions, requiring storage and retrieval mechanisms.
Context Management
Effective context management is a key task in prompt engineering, especially given the limitations of an LLM's context window. The main methods include:
- RAG: The most common method for working with large volumes of information. It dynamically finds and includes only the most relevant fragments from a large knowledge base in the prompt. Semantic retrieval typically requires a vector database, while lexical (keyword) search can use a conventional inverted index.
- Chunking: The process of breaking down large documents into semantically related or fixed-size parts for indexing and subsequent retrieval via RAG.
- Summarization: Compressing a long conversation history or voluminous documents to convey the main meaning within a limited context window.
- Memory Management: Using various strategies for storing and retrieving conversation history in chatbots and agents (e.g., ConversationBufferMemory, ConversationSummaryBufferMemory in LangChain).
- Sliding Window: Retaining only the last N messages of a conversation in the context.
- Filtering and Prioritization: Selecting the most relevant context fragments based on their importance (e.g., using relevance scores provided by a search) before assembling the final prompt.
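Two of the simpler strategies above, the sliding window and score-based filtering, can be sketched as follows. This is an illustration only: token counting is approximated by character length, and the relevance scores are assumed to come from an upstream retriever.

```python
def sliding_window(history: list[str], n: int) -> list[str]:
    """Keep only the last n messages of the conversation."""
    return history[-n:]

def filter_by_relevance(chunks: list[tuple[str, float]], budget: int) -> list[str]:
    """Keep the highest-scoring retrieved fragments until a rough
    character budget (a stand-in for a token budget) is used up."""
    selected, used = [], 0
    # Greedily take fragments in descending order of relevance score.
    for text, _score in sorted(chunks, key=lambda c: c[1], reverse=True):
        if used + len(text) <= budget:
            selected.append(text)
            used += len(text)
    return selected
```

Production systems would count actual tokens with the model's tokenizer and may deduplicate or reorder the selected fragments before assembling the final prompt.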
The Role of Context in Prompt Engineering
- Increasing Relevance: Context allows the model to generate responses that are precisely tailored to the user's query and situation.
- Reducing Hallucinations: Providing factual information (via RAG) forces the model to rely on it rather than inventing facts.
- Personalization: Context about the user (preferences, history) enables the adaptation of responses.
- State Management: In dialogues and multi-step processes, context (history) ensures continuity and keeps the model aware of previous steps.
- Overcoming Model Knowledge Limitations: RAG enables the model to answer questions about events that occurred after its training cutoff or about specific/private data.
Limitations
- Context Window Limit: Although some modern models accept context windows of one to two million tokens, processing such volumes can be expensive and slow, so effective compression and selection strategies (RAG, summarization) remain necessary.
- Finding Relevant Context: The effectiveness of RAG depends on the quality of the search. Incorrectly retrieved context can confuse the model ("garbage in, garbage out").
- "Lost in the Middle" Problem: Information located in the middle of a very long prompt may be ignored by the model. This requires structuring the prompt (e.g., using the "sandwich" technique).
- Risk of Contextual Injection: If context is retrieved from untrusted sources (e.g., web pages), it may contain malicious instructions (prompt injections).
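The "sandwich" layout mentioned above, one common mitigation for the lost-in-the-middle effect, can be sketched as a hypothetical helper that repeats the key instruction at both ends of the prompt:

```python
def sandwich_prompt(instruction: str, long_context: str, question: str) -> str:
    """Place the key instruction at both the start and the end of the
    prompt, positions long-context models tend to attend to most
    reliably, instead of letting it sit in the middle of a large
    context block."""
    return "\n\n".join([instruction, long_context, question, instruction])
```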
See Also
- Large language models
- Context window
- Vector databases
- LangChain