Graph of Thoughts
Graph-of-Thoughts (GoT) is a concept in the field of working with large language models (LLMs) that involves representing a model's reasoning process as an arbitrary graph of interconnected "thoughts" (intermediate problem-solving steps)[1]. This approach was proposed by a group of researchers led by Maciej Besta of ETH Zurich and was published at the 2024 AAAI conference[2]. The goal of Graph-of-Thoughts is to expand the capabilities of prompt engineering beyond existing frameworks such as Chain-of-Thought and Tree-of-Thoughts[1]. Unlike those frameworks, GoT allows any piece of reasoning (a thought) generated by the model to refer to any other, forming a network of ideas rather than a strictly linear or tree-like structure[3]. According to the authors, this more accurately reflects the complex, non-linear nature of human thinking and potentially brings the LLM's reasoning mechanism closer to the workings of the human brain, with its recurrent neural connections[1].
Graph-of-Thoughts is a prompting framework, meaning it does not require additional training or fine-tuning of the model itself. Instead, it organizes the dialogue with the LLM in a specific way, guiding the model through a series of "thought" steps connected in a graph structure[1]. This structure allows for combining and reusing different lines of reasoning: for example, the model can consider several hypotheses or parts of a problem in parallel and then merge the most successful ones into a single solution[1]. Graph-of-Thoughts can thus be defined as a generalization of previous structured reasoning strategies for LLMs, providing maximum flexibility in organizing thought processes within a single complex prompt[1].
Background: Chain-of-Thought and Tree-of-Thoughts
The Graph-of-Thoughts method evolved from earlier approaches that use an explicit reasoning structure when working with language models. The foundational approach is Chain-of-Thought (CoT). In the CoT method, the prompt includes not only the problem statement but also intermediate reasoning steps that lead to the answer, typically demonstrated through worked examples[1]. Studies have shown that this approach significantly improves an LLM's ability to solve complex tasks, such as mathematical or logical puzzles, without any changes to the model's parameters[1]. In effect, CoT encourages the model to break down a complex problem into simpler, step-by-step stages, imitating a sequential thought process.
An extension of this idea is the Self-Consistency technique: instead of one chain of thought, several independent chains are generated, after which the most consistent final answer is selected by majority vote[1]. This allows the model to consider different approaches to a solution and reduces the risk of an incorrect answer resulting from following a single flawed line of reasoning. However, even multiple CoT chains do not allow for "merging ideas": each chain is considered in isolation, and only one answer is kept, without combining the chains' content.
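The selection step above can be sketched in a few lines: sample several independent chains and keep the final answer that appears most often. This is an illustrative sketch, not the original paper's implementation; `sample_chain` is a hypothetical stand-in for an LLM call that runs one chain of thought and returns its final answer.

```python
from collections import Counter

def self_consistency(sample_chain, question, n_chains=5):
    """Sample several independent chains of thought and majority-vote
    on their final answers. `sample_chain` stands in for an LLM call
    that returns the final answer of one sampled chain."""
    answers = [sample_chain(question) for _ in range(n_chains)]
    winner, _ = Counter(answers).most_common(1)[0]
    return winner

# Toy stand-in: three of five sampled chains agree on "42".
fake_outputs = iter(["42", "41", "42", "42", "40"])
result = self_consistency(lambda q: next(fake_outputs), "6*7?", n_chains=5)
# result == "42"
```

Note that the chains never exchange content: only their final answers are compared, which is exactly the limitation GoT later removes.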
To overcome this limitation, the Tree-of-Thoughts (ToT) framework was proposed[1]. In ToT, the reasoning process is organized as a tree: at each point, the model can generate several possible next thoughts (branches), after which these intermediate states are evaluated, and the most promising ones are selected for further exploration[1]. By using a tree search algorithm (e.g., breadth-first search, BFS, or depth-first search, DFS) and the ability to backtrack to nodes and explore a different branch, Tree-of-Thoughts provides the language model with a more powerful mechanism for solving complex problems than linear CoT[1]. New capabilities emerge, such as backtracking and revision: if one branch leads to a dead end, it is possible to return to a previous node and try a different path[1]. This methodology has proven effective in solving logical and search-based tasks where enumeration of options and planning play a significant role.
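The search procedure described above can be sketched as a simple beam-style breadth-first search over thoughts. This is an illustrative sketch rather than the original ToT implementation; `expand` and `score` are hypothetical stand-ins for the LLM calls that generate candidate next thoughts and evaluate intermediate states.

```python
def tot_bfs(root, expand, score, depth=3, beam=2):
    """Breadth-first Tree-of-Thoughts sketch: at each level, generate
    candidate next thoughts for every frontier state, score them, and
    keep only the `beam` most promising (the rest are pruned)."""
    frontier = [root]
    for _ in range(depth):
        candidates = [t for s in frontier for t in expand(s)]
        if not candidates:
            break
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return max(frontier, key=score)

# Toy problem: thoughts are numbers, expanding a state adds 1 or 2,
# and the score prefers larger values; the best state reachable from
# 0 in three steps is 6.
best = tot_bfs(0, expand=lambda s: [s + 1, s + 2], score=lambda s: s,
               depth=3, beam=2)
# best == 6
```

Pruned branches are simply discarded here; a fuller ToT controller would also support backtracking into them, but branches still never merge.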
However, the tree of thoughts also imposes strict constraints: each thought (a node in the tree) belongs to only one branch, interaction occurs only between parent and child nodes, and different branches cannot merge or exchange information[3]. In other words, cross-pollination of ideas between different hypotheses within a single solution is difficult: the branches of the tree develop independently and are only brought together at the root when the best chain of reasoning is selected[3]. In real creative or analytical thinking, a person often returns to a previously considered idea and combines it with another line of reasoning. Such an intertwining of thoughts goes beyond the structure of a tree[1].
These observations led researchers to the idea of a more flexible structure—a graph, where connections between thoughts are unrestricted and can form a complex network. As noted in a 2024 analytical review, the emergence of chains, trees, and graphs of thoughts reflects the rise of a new class of methods capable of significantly enhancing the capabilities of LLMs through the explicit structuring of the reasoning process[4]. Specifically, structured prompts have led to notable improvements in LLM performance in many areas—from solving mathematical problems and logical puzzles to planning and even creative writing[4]. It is against this general background that the Graph-of-Thoughts framework emerged as the next step in the evolution of structured prompting methods.
The Graph-of-Thoughts Concept: A Graph Structure for Thoughts
Graph-of-Thoughts proposes representing the task execution process of a language model as an arbitrary directed graph. Formally, a graph of thoughts in GoT is a set of vertices (thoughts) and edges (dependencies between thoughts)[1]. A vertex in the graph represents a single model thought—a term that refers to any meaningful unit dependent on the task's context: it could be a single statement, a solution step, a fragment of text, a paragraph, a block of code, etc., generated by the model in response to a prompt[1]. An edge between vertices means that one thought was used to generate another—i.e., the prompt explicitly states that the model should build upon a specific previous result to obtain a new one[1]. Thus, the edges capture dependencies: which previously obtained data the current reasoning step depends on.
The most important difference of GoT from simpler structures is the ability for aggregation and merging of thoughts. In a graph, a vertex (a new thought) can have multiple predecessors[1]. This corresponds to a situation where two or more separate chains of reasoning are combined: the model receives several previously generated fragments as input and forms a synthesized output based on them[1]. For example, when solving a problem, the model can consider two hypotheses in parallel and then create a new thought that combines the positive aspects of both hypotheses and eliminates their shortcomings[1]. Such aggregation operations were impossible within a tree-like framework (where each node has only one parent) but are naturally implemented in a graph[1]. In addition to merging ideas, a graph also allows for feedback loops: in principle, the GoT structure does not prohibit cycles, meaning the model can return a result to an earlier stage of reasoning for reprocessing or refinement[1]. The authors draw an analogy to recurrent connections in the brain's neural networks, where the output of one group of neurons can influence previous layers, forming closed-loop thought processes[1].
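The distinction from a tree can be made concrete with a minimal data structure in which a vertex may record several predecessors. This is an illustrative sketch, not the authors' code; the `ThoughtGraph` class and its method names are hypothetical.

```python
class ThoughtGraph:
    """Minimal sketch of a GoT-style graph: vertices are thoughts,
    edges record which earlier thoughts a new thought was built from.
    Unlike a tree, a vertex may have several predecessors, which is
    exactly what an aggregation step produces."""

    def __init__(self):
        self.thoughts = {}  # thought id -> content
        self.parents = {}   # thought id -> list of predecessor ids

    def add(self, tid, content, parents=()):
        self.thoughts[tid] = content
        self.parents[tid] = list(parents)
        return tid

g = ThoughtGraph()
a = g.add("h1", "hypothesis A")
b = g.add("h2", "hypothesis B")
# Aggregation: a new thought with TWO predecessors, impossible in a
# tree, where every node has exactly one parent.
m = g.add("merge", "combined strengths of A and B", parents=[a, b])
# g.parents["merge"] == ["h1", "h2"]
```

In a real GoT run, the content of the `merge` vertex would come from an LLM call whose prompt includes both predecessor thoughts.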
Practically, implementing Graph-of-Thoughts requires a special organization of the dialogue with the model. The researchers developed a modular architectural framework for GoT[1]. It includes components for: (1) fine-grained control over individual steps (thoughts)—a "controller" manages the order and logic of thought generation; (2) dynamic prompt generation—for each step, a special module forms a prompt based on the current context and selected graph vertices (predecessors); (3) parsing and evaluating the model's responses—the fragments received from the LLM are analyzed and evaluated for quality, utility, or compliance with task criteria[5]. The GoT architecture thereby allows for the interactive construction of a reasoning graph: after each step, a decision is made about which new vertices to add, how to connect them to previous ones, and which branches to continue or merge. Thanks to its modularity, this framework can be extended with new types of "thought transformations" (e.g., special graph operations) and adapted for various models (the authors successfully tested GoT with LLMs from the GPT-3.5, GPT-4, LLAMA 2 families, and others)[1]. An important property is that GoT does not require changing the parameters of the language model itself—all improvements are achieved through more intelligent construction of prompts and processing of responses[1]. This means that existing powerful LLMs can be used "as is," with Graph-of-Thoughts acting as a superstructure that manages their operation.
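The interaction of the three components can be sketched as a single loop: a controller walks a plan of graph operations, a prompt is built dynamically from the selected predecessor thoughts, and each model reply is parsed and scored. This is a simplified illustration, not the published framework's API; `llm`, the plan format, and the scoring functions are all hypothetical stand-ins.

```python
def got_controller(llm, task, plan):
    """Sketch of the modular GoT loop: for each planned step, build a
    prompt from the chosen predecessor thoughts, call the model, then
    score the parsed reply. `llm` stands in for a model call; each
    plan step names its id, its parent thoughts, a prompt template,
    and a scoring function."""
    graph = {}  # thought id -> (content, score)
    for step in plan:
        context = [graph[p][0] for p in step["parents"]]
        prompt = step["template"].format(task=task,
                                         context="\n".join(context))
        reply = llm(prompt)  # dynamically generated prompt -> model
        graph[step["id"]] = (reply, step.get("score", len)(reply))
    # Final answer: the highest-scoring thought produced so far.
    return max(graph.values(), key=lambda t: t[1])[0]

# Toy run: the "model" just upper-cases its prompt; the second step
# builds on the first step's output, forming a two-vertex graph.
plan = [
    {"id": "draft", "parents": [], "template": "{task}", "score": len},
    {"id": "refine", "parents": ["draft"],
     "template": "improve: {context}", "score": len},
]
answer = got_controller(lambda p: p.upper(), "solve x", plan)
# answer == "IMPROVE: SOLVE X"
```

A real controller would also decide the plan adaptively, adding, merging, or abandoning vertices based on the scores, rather than following a fixed list.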
It should be noted that the term Graph-of-Thought also appeared in another, independent development, distinct from the approach of Besta and colleagues. In 2023, Yao Yao and co-authors proposed a method for improving reasoning in LLMs through an additional graph-of-thoughts encoder module, which required model fine-tuning[6]. Their paper, titled "Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models," describes a two-stage architecture: first, a graph of interconnected intermediate statements is generated, then it is transformed by a special encoder and integrated into the model via a gated fusion mechanism[6]. This hybrid approach with training demonstrated some accuracy improvements on tasks; for example, on the multimodal ScienceQA question set, accuracy increased from 85.2% to 87.6% when using the T5-base model[6]. However, this approach, although similar in name, is fundamentally different: it requires model modification (fine-tuning) and is not a prompt engineering framework. The authors of the original GoT approach (AAAI 2024) explicitly state that they do not consider the model by Yao et al. in their work, as they focus specifically on methods that do not require updating LLM parameters[1]. Thus, Graph-of-Thoughts in the context of this review is specifically a prompt-level framework, not a modification of the neural network architecture.
Application and Results
The authors of GoT demonstrated its advantages on a range of tasks that are difficult to solve with a single direct prompt (input-output prompting) or even with a chain of thoughts. A characteristic feature of such tasks is that they can be broken down into several parts (subtasks), these parts can be solved separately, and then a complete answer can be synthesized from the partial results[1]. Among the examples considered are: sorting an unordered list, counting keywords in a text (e.g., for document summarization), performing set operations (union, intersection of lists, etc.), and merging text documents (combining information from multiple sources)[1]. In all these cases, Graph-of-Thoughts allows for a natural decomposition of the task. For example, for sorting, the list is divided into parts, each part is sorted separately as an independent branch of thoughts, and then the results are merged (imitating an algorithm like merge sort); or when analyzing texts, the model can extract information from different documents in parallel and then consolidate it.
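The sorting example above can be sketched as follows: the list is split into parts, each part is sorted in an independent branch, and the branch results are aggregated. In the real framework each branch and the merge would be LLM calls with validation; here plain Python functions stand in for the model, purely to illustrate the decomposition pattern.

```python
import heapq

def got_sort(items, chunks=4):
    """GoT-style sorting sketch: split the input into parts, solve
    each part in an independent branch, then aggregate the branch
    results (imitating merge sort, as described in the text)."""
    n = max(1, len(items) // chunks)
    # Independent branches: each sublist is "solved" separately.
    branches = [sorted(items[i:i + n]) for i in range(0, len(items), n)]
    # Aggregation vertex: merge all branch results into one answer.
    return list(heapq.merge(*branches))

print(got_sort([5, 3, 8, 1, 9, 2, 7, 4]))  # [1, 2, 3, 4, 5, 7, 8, 9]
```

The point of the decomposition is not efficiency here but reliability: each LLM branch handles a short list it is unlikely to get wrong, and the aggregation step only has to combine already-sorted fragments.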
Experimental results confirm the effectiveness of the GoT framework. According to the report by Besta and colleagues, in the sorting task, the graph of thoughts significantly improved the quality of the solution compared to previous approaches[1]. Specifically, the sorting accuracy using GoT was 70% higher than with a simple Chain-of-Thought (CoT) and 62% higher than with a Tree-of-Thoughts (ToT)[1]. At the same time, the method reduces computational costs: the number of calls to the model (and, consequently, the tokenized volume of prompts) decreased by 31% compared to Tree-of-Thoughts for the same task[1]. This means that the graph-based organization of reasoning not only improved the final result but also made the solution more economical by avoiding redundant computations through the smart combination of intermediate conclusions. Similar gains were achieved on other test tasks, especially those requiring the aggregation of diverse information. The researchers note that GoT is most effective for composite tasks consisting of multiple elements: "Graph-of-Thoughts is particularly well-suited for tasks that naturally decompose into smaller subtasks that are solved separately with subsequent aggregation of results"[1]. In such cases, the graph of thoughts can cover all aspects of the problem and synthesize a more complete solution than when following a single line of reasoning.
For a more nuanced understanding of why one prompting method outperforms another, the GoT authors proposed a special metric—the "volume" of a thought[1]. The volume is defined for each individual thought (a vertex in the graph) as the number of preceding thoughts from which it can be reached via directed edges—simply put, how many earlier steps could have contributed information to it[1]. In Chain-of-Thought, a late thought can in principle draw on all of its predecessors, but only at the cost of latency proportional to the chain's length, since every step must be generated sequentially. In a tree of thoughts, branches do not exchange information, so the volume of any thought is limited to the single path leading to it. In a graph, thanks to aggregation, a single vertex can accumulate contributions from many branches at once, achieving a high volume without a correspondingly long sequential chain[1]. It has been shown that GoT allows final conclusions to be based on a much larger volume of preceding thoughts by integrating their content. This reflects deeper coverage of the solution space and serves as a quantitative explanation for the advantage of graph-based reasoning over simpler schemes.
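Under this definition, the volume of a thought is simply the size of its ancestor set, which can be computed by walking the predecessor edges. The sketch below is illustrative; the `parents` mapping is a hypothetical representation of the graph's edges.

```python
def volume(parents, tid):
    """Volume of a thought: the number of other thoughts from which it
    can be reached along directed edges, i.e. the size of its ancestor
    set. `parents` maps each thought id to the ids it was built from."""
    seen, stack = set(), list(parents.get(tid, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return len(seen)

# A chain a -> b -> c gives c a volume of 2 (both a and b contributed).
# Aggregating d from b AND an independent thought x gives d a larger
# ancestor set than any same-depth node in a pure chain or tree.
parents = {"a": [], "b": ["a"], "c": ["b"], "x": [], "d": ["b", "x"]}
# volume(parents, "c") == 2; volume(parents, "d") == 3
```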
Comparison and Significance
Graph-of-Thoughts currently represents the most generalized form of structured prompting for LLMs. In comparison tables of various schemes (CoT, CoT with self-consistency, ToT, and GoT), it is emphasized that only GoT supports an arbitrary topology of the thought process[1]. It encompasses the capabilities of all previous approaches: it can function as a single chain, a tree with branches, or a combination of multiple chains if that is suitable for solving the task[1]. The key is that there are no rigid constraints on the connections between steps, which theoretically makes the space of possible reasoning strategies as broad as possible[1].
It is important to understand that the flexibility of GoT comes at the cost of greater control complexity. Implementing a graph of thoughts requires an external orchestrator algorithm to decide when and which thoughts to generate, which ones to select or combine, and when to stop the process and provide an answer. In simple CoT, no such decisions are needed—the model itself generates a linear line of reasoning to the end. In ToT, part of the control is handled by the tree search algorithm (e.g., choosing a node to expand). In GoT, the degree of freedom is higher, and the method's effectiveness depends on the quality of the heuristics used to evaluate intermediate results and on the proper construction of prompts at each step[1]. Thus, Graph-of-Thoughts can be seen not just as a prompt format, but as a reasoning structure imposed on the interaction process with an LLM—a kind of dynamic plan by which the model solves the task, while the user (or a controller program) guides this process.
The emergence of Graph-of-Thoughts reflects a drive to make the operation of large language models more interpretable and controllable. By explicitly defining the structure of the solution, researchers not only achieve better quality but also gain the ability to analyze how the model arrived at a particular conclusion. This brings approaches in NLP closer to classical methods of algorithmic search and planning, but now the execution of steps is entrusted to a neural network model. A number of experts view structured prompts (chains, trees, and graphs of thoughts) as a promising direction capable of overcoming the "black box" limitations of deep models and increasing their reliability on complex tasks[4].
The Graph-of-Thoughts methodology continues to evolve actively. The authors have made the code and examples for implementing GoT open source[1], allowing the community to experiment with the new approach. Extensions are also emerging, such as multimodal versions of the graph of thoughts that combine text with images and other data types[3], as well as attempts to integrate GoT ideas directly into model architectures (as in the aforementioned work by Yao et al., 2023). In 2024, a detailed taxonomy and review of Chain-of-Thought, Tree-of-Thoughts, and Graph-of-Thoughts methods was published, systematizing the accumulated knowledge and describing the theoretical foundations of such approaches[4]. All this indicates a strong interest from the scientific community in the structured management of LLM reasoning. Graph-of-Thoughts has already established itself as an effective tool for solving complex problems and is likely to become a foundation for further innovations in AI solutions that combine the power of large language models with the transparency and logic of classical algorithms.
Links
- Original paper "Graph of Thoughts: Solving Elaborate Problems with Large Language Models" on arXiv
- HTML version of the original paper
- Review "Demystifying Chains, Trees, and Graphs of Thoughts" on arXiv
- Paper "Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models" on arXiv
- Multimodal Graph-of-Thoughts — Deepgram article
- LLMs Graph of Thoughts Framework — Medium article
Literature
- Besta, M. et al. (2024). Graph of Thoughts: Solving Elaborate Problems with Large Language Models. arXiv:2308.09687.
- Yao, S. et al. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. arXiv:2305.10601.
- Yao, Y. et al. (2023). Beyond Chain-of-Thought: Effective Graph-of-Thought Reasoning in Language Models. arXiv:2305.16582.
- Wei, J. et al. (2022). Chain of Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903.
- Wang, X. et al. (2022). Self-Consistency Improves Chain of Thought Reasoning in Language Models. arXiv:2203.11171.
- Besta, M. et al. (2024). Demystifying Chains, Trees, and Graphs of Thoughts. arXiv:2401.14295.
- Huang, S. et al. (2023). Language Is Not All You Need: Aligning Perception with Language Models (Kosmos-1). arXiv:2302.14045.
- Mitra, C. et al. (2024). Compositional Chain-of-Thought Prompting for Large Multimodal Models. In CVPR 2024. PDF.
- Zheng, G. et al. (2023). DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models. arXiv:2310.16436.
- Mu, J. et al. (2023). Learning to Compress Prompts with Gist Tokens. arXiv:2304.08467.
Notes
- Besta, Maciej et al. "Graph of Thoughts: Solving Elaborate Problems with Large Language Models". ar5iv.labs.arxiv.org. [1]
- Besta, Maciej et al. "Graph of Thoughts: Solving Elaborate Problems with Large Language Models". arXiv. [2]
- Grygiel, Jacek. "Multimodal Graph-of-Thoughts: How Text, Images, and Graphs Lead to Better Reasoning". Deepgram. [3]
- Besta, Maciej et al. "Demystifying Chains, Trees, and Graphs of Thoughts". arXiv. [4]
- Wo, Jacek. "LLMs Graph of Thoughts Framework. Case study". Medium. [5]
- Yao, Yao et al. "Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models". arXiv. [6]