Core prompt engineering techniques
Prompt engineering encompasses a variety of techniques and methods aimed at optimizing interaction with large language models (LLMs) to achieve desired results. These techniques involve structuring prompts, providing context, managing output style, and enhancing the model's reasoning abilities. Since LLMs generate responses by predicting subsequent tokens conditioned on the input prompt, the quality and structure of the prompt directly influence the outcome. Core prompt engineering techniques allow developers and users to control the model's behavior, reduce errors, and adapt the model to specific tasks without modifying its internal parameters.
Basic Principles and Prompt Structure
While specific implementations vary, effective prompts are often constructed based on common principles and structures:
- Empirical guidelines: One practical framework (though not an industry standard) suggests following these principles: Provide Direction, Specify Format, Provide Examples, Evaluate Quality, and Decompose the Task.
- An effective prompt often includes the following components:
- Introduction/Role: Sets the context for the task or defines the model's role/persona.
- Instructions: Clear directions on what needs to be done.
- Context: Necessary information (either static or dynamically retrieved via RAG).
- Examples (Few-shot): Demonstrations of the desired format/style.
- Output cue: An explicit instruction for the model to begin generating the response.
- System Prompt: In chat-based APIs (e.g., from OpenAI, Anthropic), this allows setting global instructions and a role for the entire session.
- Formatting: Using Markdown, JSON, XML, or YAML helps structure the prompt and simplifies parsing the response. Delimiters (```, `"""`, XML tags) are important for separating instructions from data.
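The components above can be assembled programmatically. The sketch below shows one possible assembly in Python; the section labels, the `"""` delimiters, and the Q/A example format are illustrative choices, not a standard.

```python
# Sketch: combining the common prompt components (role, instructions,
# context, few-shot examples, output cue) into a single prompt string.
# All labels and delimiters here are one possible convention.

def build_prompt(role, instructions, context, examples, cue="Answer:"):
    """Assemble a prompt from the components described above."""
    example_block = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return (
        f"{role}\n\n"
        f"Instructions: {instructions}\n\n"
        f'Context:\n"""\n{context}\n"""\n\n'   # delimiters separate data from instructions
        f"Examples:\n{example_block}\n\n"
        f"{cue}"                               # output cue: model starts generating here
    )

prompt = build_prompt(
    role="You are a concise technical support assistant.",
    instructions="Answer using only the context. If unsure, say 'I don't know'.",
    context="Resetting the router restores factory settings.",
    examples=[("How do I reboot?", "Hold the power button for 5 seconds.")],
)
```

Keeping assembly in one function makes it easy to vary a single component (e.g., swap the examples) while holding the rest of the prompt fixed.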
Core Instruction and Example Techniques
- Clear and specific instructions: Describe the task, desired outcome, and constraints as precisely as possible. Avoid ambiguity.
- Role Prompting: Assigning the model a role ("You are an expert in...") to control its tone, style, and knowledge base.
- Zero-shot Prompting: A prompt without any examples. Effective for simple tasks or those the model is already very familiar with.
- Few-Shot Prompting: Providing a few (typically 2 to 5) "question-answer" examples to demonstrate the task. A special case is One-shot Prompting with a single example. This is particularly useful for complex formats or styles. Care must be taken that the model does not anchor on the examples or pick up spurious patterns from them; the order of examples can also affect results.
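The difference between zero-shot and few-shot framing can be seen on a simple sentiment task. The wording and example set below are illustrative:

```python
# Zero-shot: the task is described, but no examples are given.
zero_shot = (
    "Classify the sentiment of this review as positive or negative.\n"
    "Review: The battery died after a week.\nSentiment:"
)

# Few-shot: the same task, demonstrated with three labeled examples
# before the actual query.
few_shot_examples = [
    ("Great screen, fast shipping.", "positive"),
    ("Stopped working on day two.", "negative"),
    ("Exactly as described, very happy.", "positive"),
]
few_shot = "Classify the sentiment of each review as positive or negative.\n\n"
few_shot += "\n".join(f"Review: {r}\nSentiment: {s}" for r, s in few_shot_examples)
few_shot += "\nReview: The battery died after a week.\nSentiment:"
```

Both prompts end with the same cue (`Sentiment:`), so the model completes the label; the few-shot version additionally demonstrates the expected label vocabulary and format.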
Context Management Techniques
- Retrieval-Augmented Generation (RAG): Dynamically adding relevant information from external knowledge bases to the prompt before sending it to the LLM. First proposed by Lewis et al. at Facebook AI Research in 2020, this method is key to combating hallucinations and ensuring data currency.
- Chunking: Breaking down large texts into smaller pieces (chunks) to fit within the model's context window. Strategies include splitting by sentences, paragraphs, or tokens (with overlap).
- Summarization: Compressing long texts or conversation histories to convey the main ideas within a limited context.
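A minimal chunking sketch, splitting by characters with overlap so that sentences cut at a boundary still appear intact in at least one chunk (real pipelines more often split by tokens or sentences, but the overlap logic is the same):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each chunk starts `step` characters after the previous one
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("a" * 500, chunk_size=200, overlap=50)
# The tail of each chunk repeats as the head of the next, preserving context
# across chunk boundaries.
```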
Reasoning Enhancement Techniques
These techniques aim to make the model "think" more carefully and in a more structured way. The effectiveness of many of these, especially CoT, becomes apparent in large-scale models (typically those with over 100 billion parameters).
- Chain-of-Thought (CoT): Instructing the model to generate a step-by-step reasoning process before providing the final answer. This significantly improves performance on math, logic, and multi-step problems.
- Zero-shot CoT: The simplest form, requiring no examples. Simply adding a phrase like "Let's think step by step" to the prompt can trigger a chain of reasoning.
- CoT Variations:
- Auto-CoT: Automatically generating reasoning examples for few-shot prompting.
- Self-Consistency: Generating multiple reasoning chains and selecting the most frequent answer through a "voting" process, which enhances reliability.
- Tree-of-Thoughts (ToT): Exploring multiple reasoning paths in a tree-like structure, with the ability to backtrack and evaluate intermediate steps. This technique has shown high efficacy in complex problems, for example, increasing the success rate for solving the "Game of 24" from ~4% to ~74%.
- Graph-of-Thoughts (GoT), LogiCoT, and others, which aim for more complex or logically verified reasoning.
- ReAct (Reason and Act): A technique that extends CoT. It involves an iterative cycle where the model alternates between Reasoning steps (Thought) and taking Actions with tools (Act), updating its understanding based on Observations.
- Self-Refine: An iterative process where the model first generates a response, then critiques it, and finally refines it based on its own critique. This "generate-critique-refine" cycle can be repeated multiple times.
- Take a Step Back Prompting: The model first formulates general principles or abstractions and then applies them to the specific problem.
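Of the techniques above, Self-Consistency is the simplest to sketch: the LLM is sampled several times (at a temperature above zero) to produce independent reasoning chains, a final answer is extracted from each, and a majority vote decides. The sampling and answer extraction are done against the model API; only the voting step is shown here:

```python
from collections import Counter

def self_consistency(answers: list[str]) -> str:
    """Return the most frequent final answer across sampled reasoning chains."""
    return Counter(answers).most_common(1)[0][0]

# Final answers extracted from, e.g., five independently sampled CoT chains
# (illustrative values):
sampled = ["42", "42", "41", "42", "40"]
voted = self_consistency(sampled)
```

The vote smooths over individual chains that went wrong partway through, which is why it improves reliability over a single greedy CoT generation.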
Advanced Techniques: Agents and Tools
- Tool Usage / Function Calling: Giving an LLM the ability to call external APIs (e.g., search, calculator, database). The model generates a structured request to call a tool, which is executed by the application, and the result is returned to the model. Modern LLMs (like GPT-4 and Claude 3) have built-in support for this feature.
- Agents: LLM-based systems that can autonomously plan, use tools, and act to achieve a goal. They often use ReAct-like cycles. Frameworks like LangChain, AutoGen, and CrewAI simplify their creation.
- Multi-agent systems: The interaction of multiple specialized agents to solve complex problems.
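The tool-calling loop common to these approaches can be sketched without any real LLM API. In the sketch below, `mock_model` stands in for the LLM, and the `call:`/`final:` string convention is an assumption for illustration only; production APIs (OpenAI, Anthropic) return structured tool-call objects instead:

```python
# Minimal ReAct-style loop: the "model" requests a tool, the application
# executes it, and the observation is appended back into the prompt.

TOOLS = {
    # eval with empty builtins, for demo arithmetic only -- not safe in general
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def mock_model(prompt: str) -> str:
    """Stand-in for an LLM: requests a tool once, then gives a final answer."""
    if "Observation:" not in prompt:
        return "call:calculator:2 + 2 * 10"
    return "final:22"

def run_agent(question: str, max_steps: int = 5) -> str:
    prompt = question
    for _ in range(max_steps):
        out = mock_model(prompt)
        if out.startswith("final:"):
            return out[len("final:"):]
        _, name, arg = out.split(":", 2)
        observation = TOOLS[name](arg)             # the application executes the tool
        prompt += f"\nObservation: {observation}"  # the result is returned to the model
    return "max steps reached"

answer = run_agent("What is 2 + 2 * 10?")
```

The `max_steps` bound is the standard guard against an agent that never emits a final answer.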
Techniques for Reducing Errors and Hallucinations
- RAG: Grounding responses in retrieved factual data.
- Response constraint instructions: Requiring the model to answer only based on the provided context or to indicate uncertainty ("If the answer is unknown, say 'I don't know'").
- Citation prompting: Requiring the model to cite its sources or quote parts of the context on which its answer is based.
- Verification and self-critique: Using techniques such as Chain-of-Verification (CoVe) (generating a response and then verifying and correcting it), Self-Refine, or directly prompting the model to check its answer for errors.
- CoT: The step-by-step reasoning process itself can reduce logical errors.
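Several of these measures (response constraints, citation prompting, RAG grounding) combine naturally into a single prompt template. The wording below is one possible phrasing, not a canonical one:

```python
def grounded_prompt(question: str, passages: list[str]) -> str:
    """Build a prompt that restricts the model to retrieved context,
    asks for citations by passage number, and allows an explicit refusal."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the numbered passages below. "
        "Cite passage numbers like [1]. "
        "If the answer is not in the passages, reply exactly: I don't know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

p = grounded_prompt(
    "When was RAG proposed?",
    ["RAG was proposed by Lewis et al. in 2020."],
)
```

Numbering the passages makes the requested citations checkable: the application can verify that every cited `[n]` actually exists in the supplied context.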
Evaluation and Iteration Techniques
Although evaluation is a separate process, some of its aspects are part of prompt engineering:
- A/B Testing Prompts: Comparing the effectiveness of different prompt versions on the same tasks.
- Using an LLM for evaluation: Applying a powerful model (e.g., GPT-4) to assess the quality of responses generated by another prompt or model (LLM-as-a-Judge).
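An A/B comparison reduces to scoring the outputs of two prompt variants on the same task set. The sketch below uses exact-match accuracy as the metric; in practice the grading step is often itself an LLM-as-a-Judge call:

```python
def ab_test(outputs_a: list[str], outputs_b: list[str],
            gold: list[str]) -> dict[str, float]:
    """Compare two prompt variants by exact-match accuracy on the same tasks."""
    def accuracy(outputs: list[str]) -> float:
        return sum(o == g for o, g in zip(outputs, gold)) / len(gold)
    return {"A": accuracy(outputs_a), "B": accuracy(outputs_b)}

# Illustrative outputs of two prompt variants on three tasks:
scores = ab_test(["4", "9", "16"], ["4", "8", "16"], gold=["4", "8", "16"])
```

Holding the task set and metric fixed across variants is what makes the comparison meaningful; changing only the prompt isolates its effect.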
Prompt Patterns
Common reusable structures:
- Persona Pattern.
- Output Customization Pattern.
- Question Refinement Pattern.
- Cognitive Verifier / Self-Critique Pattern.
- Step-by-Step / Chain-of-Thought Pattern.
- Template Pattern.
References
- Radford, A. et al. (2019). Language Models are Unsupervised Multitask Learners.
- Brown, T. B. et al. (2020). Language Models are Few-Shot Learners. arXiv:2005.14165.
- Lewis, P. et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv:2005.11401.
- Li, X. L.; Liang, P. (2021). Prefix-Tuning: Optimizing Continuous Prompts for Generation. arXiv:2101.00190.
- Lu, Y. et al. (2021). Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity. arXiv:2104.08786.
- Bai, Y. et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073.
- Wei, J. et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903.
- Wang, X. et al. (2022). Self-Consistency Improves Chain of Thought Reasoning in Language Models. arXiv:2203.11171.
- Kojima, T. et al. (2022). Large Language Models are Zero-Shot Reasoners. arXiv:2205.11916.
- Zhang, Z. et al. (2022). Automatic Chain of Thought Prompting in Large Language Models. arXiv:2210.03493.
- Zhou, D. et al. (2022). Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. arXiv:2205.10625.
- Yao, S. et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629.
- Besta, M. et al. (2023). Graph of Thoughts: Solving Elaborate Problems with Large Language Models. arXiv:2308.09687.
- Madaan, A. et al. (2023). Self-Refine: Iterative Refinement with Self-Feedback. arXiv:2303.17651.
- Schick, T. et al. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools. arXiv:2302.04761.
- Rafailov, R. et al. (2023). Direct Preference Optimization: Your Language Model is Secretly a Reward Model. arXiv:2305.18290.
- Wang, Y. et al. (2023). Self-Instruct: Aligning Language Models with Self-Generated Instructions. arXiv:2212.10560.
- Yao, S. et al. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. arXiv:2305.10601.
- Chen, S. Y. et al. (2023). Extending Context Window of Large Language Models via Positional Interpolation. arXiv:2306.15595.
- Chang, K. et al. (2024). Efficient Prompting Methods for Large Language Models: A Survey. arXiv:2404.01077.
- Genkina, D. (2024). AI Prompt Engineering Is Dead. IEEE Spectrum.
- Li, Z. et al. (2024). Prompt Compression for Large Language Models: A Survey. arXiv:2410.12388.
- Liang, X. et al. (2024). Internal Consistency and Self-Feedback in Large Language Models: A Survey. arXiv:2407.14507.
- Han, H. et al. (2025). Retrieval-Augmented Generation with Graphs (GraphRAG). arXiv:2501.00309.
- Li, W. et al. (2025). A Survey of Automatic Prompt Engineering: An Optimization Perspective. arXiv:2502.11560.
- Wu, Z. et al. (2025). The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models. EMNLP 2025.
- Yang, B. et al. (2025). Hallucination Detection in Large Language Models with Metamorphic Relations. arXiv:2502.15844.
See Also
- Large Language Models
- Chain-of-Thought Prompting
- LLM Hallucinations and Incorrect Responses