Multi-agent prompting
Multi-agent prompting is a method in prompt engineering and artificial intelligence systems where multiple autonomous agents based on large language models (LLMs) interact with each other to solve complex problems through a structured exchange of instructions and responses[1].
In other words, a multi-agent system consists of several LLM agents that work together on a user's complex query by distributing reasoning steps (subtasks) among agents with different "roles" and competencies. The main goal of this approach is to overcome the limitations of a single model on complex tasks through collective problem-solving. The use of multiple interacting agents is intended to improve the quality of reasoning, factual accuracy, and the reliability of the response[2]. A key feature is its strict instructiveness: each LLM is assigned a specific role or task within the overall problem-solving framework.
Methods and architectural patterns
Researchers have proposed a number of multi-agent prompting schemes that differ in the nature of agent interaction and their roles.
Role-based expert modeling
One or more agents are assigned as domain experts with narrow specializations. For example, in a multi-agent group, different agents can represent different areas of knowledge (a physicist, a chemist, a biologist) or different stages of problem-solving (a planner, an executor, a critic)[1]. This approach allows for effective few-shot prompting, where each expert agent receives example demonstrations in its field, improving overall performance.
Self-correction and critique (Self-reflection)
An agent can act as a "critic" or reflect on the solutions of another agent or its own previous responses. The self-reflection or self-refinement strategy involves an LLM first generating a response, and then the same or another model analyzes and corrects errors in that response[1]. This allows for the iterative improvement of the final result.
Debates between agents
A competitive variant of multi-agent prompting that involves organizing a discussion or debate among several LLMs. In the LLM-Debate scheme, two or more agents argue about the correct answer to a problem (e.g., a mathematical one) and critique each other's arguments[3]. This debate format improves the model's ability for logical reasoning and increases the factual accuracy of answers compared to a single-agent solution.
Planning and task decomposition
One agent acts as a planner, breaking down a complex query into a sequence of steps or subtasks, which are then solved by itself or other agents. Techniques such as ReAct and Reflexion implement a similar principle of iterative planning with feedback. The LLM first generates a plan for the solution before proceeding to execute it, which helps in handling long chains of reasoning[1].
Multi-persona collaboration
Instead of different models, the same LLM can be used, prompting it to "role-play" several agents with different personas or viewpoints. In the multi-persona self-collaboration approach, a single model successively takes on multiple roles during a dialogue and conducts a discussion as if with itself. Although research shows that separate, independent agents provide higher efficiency, this method allows for simulating a team of experts within a single LLM[1].
Applications and results
The multi-agent prompting approach has proven effective in several areas where single LLMs previously faced difficulties.
Mathematical and logical reasoning
Using multiple agents significantly improves accuracy on tasks requiring multi-step inference (complex arithmetic, mathematical proofs, logic puzzles). In a paper by Du et al. (2023), a multi-agent "debate" approach improved results compared to a single agent. The analysis showed that as the number of agents participating in the discussion increases, the accuracy of the answer rises[3].
Scientific and technical tasks
For complex domain-specific problems (physics, chemistry), the CoMM (Collaborative Multi-Agent, Multi-Reasoning-Path Prompting) method was proposed, in which several LLM agents with different roles (experts) apply various reasoning strategies in parallel. In tests on college-level physics problems, CoMM significantly outperformed baseline approaches like chain-of-thought, making fewer errors in formulas and calculations[1].
Code generation and debugging
In the field of programming, multi-agent systems are used to improve code quality and reduce the number of errors. The PromptV system uses several agents to sequentially write, verify, and correct Verilog code. The distribution of roles (generation, review, testing) improved the model's ability to detect and fix errors, resulting in the proportion of successfully compiling solutions increasing to 96.5% on one of the benchmarks[4].
Information retrieval and analysis
Multi-agent systems are particularly useful for open-ended, poorly structured queries. The company Anthropic developed a multi-agent mode for its Claude model, designed for web research. In this system, a lead agent analyzes the query and spawns several parallel subtask agents, each of which performs searches on different aspects of the topic. This architecture was 90% more effective at handling complex search queries compared to a single Claude model[2].
Text classification and NLP tasks
For NLP tasks, Principle-Based Prompting was developed. In this technique, LLM agents first generate a set of "principles" (rules for solving the task), and then a finalizing agent selects the best ones, based on which another agent performs the classification. This approach improved the macro-F1 metric by 1.5–19% compared to baseline methods, approaching the quality of traditional few-shot learning[5].
Limitations and challenges
Computational complexity and cost
The main disadvantage is the sharply increased computational load. Each agent requires its own generation session, leading to significant consumption of tokens and resources. According to Anthropic, their system consumes on average 4 times more tokens per dialogue, and in some cases, up to 15 times more[2]. This makes the approach justifiable only for high-value tasks.
Design and coordination complexity
Successful operation requires careful prompt engineering: it is necessary to clearly define the role of each agent, the message exchange format, and the stopping criteria. Otherwise, agents might duplicate work, get stuck in endless loops, or create useless subtasks[2].
Security and reliability
New attack vectors are emerging. Researchers have demonstrated the Prompt Infection phenomenon, where a malicious fragment of an instruction from one agent is passed to another, spreading through the entire reasoning chain like a virus. This type of LLM-to-LLM attack reveals the vulnerability of multi-agent systems to hidden injections and manipulations, which requires the development of special protective measures, such as tagging the output of each agent (LLM Tagging)[6].
Links
- "How we built our multi-agent research system" — a detailed analysis by Anthropic
- "More Agents Is All You Need" — a scientific paper on the effectiveness of agent ensembles
Further reading
- Chen, P. et al. (2024). CoMM: Collaborative Multi-Agent, Multi-Reasoning-Path Prompting for Complex Problem Solving. arXiv:2404.17729.
- Li, J. et al. (2024). More Agents Is All You Need. arXiv:2402.05120.
- Mi, Y. et al. (2024). PromptV: Leveraging LLM-Powered Multi-Agent Prompting for High-Quality Verilog Generation. arXiv:2412.11014.
- Wei, P. et al. (2024). Don’t Just Demo, Teach Me the Principles: A Principle-Based Multi-Agent Prompting Strategy for Text Classification. arXiv:2502.07165.
- Lee, D.; Tiwari, A. (2024). Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems. arXiv:2410.07283.
- Fernando, C. et al. (2023). PromptBreeder: Self-Referential Self-Improvement via Prompt Evolution. arXiv:2309.16797.
- Wu, Q. et al. (2023). AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. arXiv:2308.08155.
- Liu, X. et al. (2023). AgentBench: Evaluating LLMs as Agents. arXiv:2308.03688.
- Li, G. et al. (2024). Multi-LLM Debate: Framework, Principles, and Interventions. PDF.
- Du, N. et al. (2023). Improving Factuality and Reasoning in Language Models through Multi-Agent Debate. arXiv:2305.14325.
Notes
- ↑ 1.0 1.1 1.2 1.3 1.4 1.5 Chen, Y. et al. "CoMM: Collaborative Multi-Agent, Multi-Reasoning-Path Prompting for Complex Problem Solving". arXiv, 2024. [1]
- ↑ 2.0 2.1 2.2 2.3 "How we built our multi-agent research system". Anthropic. [2]
- ↑ 3.0 3.1 Li, G. et al. "More Agents Is All You Need". arXiv, 2024. [3]
- ↑ Mi, Y. et al. "PromptV: Leveraging LLM-powered Multi-Agent Prompting for High-quality Verilog Generation". ResearchGate, 2024. [4]
- ↑ Wei, J. et al. "Don't Just Demo, Teach Me the Principles: A Principle-Based Multi-Agent Prompting Strategy for Text Classification". arXiv, 2024. [5]
- ↑ Lee, K. & Tiwari, A. "Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems". OpenReview, 2024. [6]