Generated Knowledge Prompting

Generated Knowledge Prompting (GKP) is a prompt engineering technique designed to improve the performance of large language models (LLMs) on tasks that require commonsense reasoning and factual knowledge^[1]. The core idea of GKP is to have the model work in two stages: first, generate a set of facts relevant to the query, and then formulate the final answer based on this knowledge^[2].

This approach allows LLMs to activate and use their internal, parametric knowledge, which is implicitly encoded in the model's parameters but is often not elicited by a standard prompt. GKP addresses the problem that models "don't know what they know": it prompts them to surface relevant facts and helps them connect these facts into a correct inference^[2].

History and Origin

The Generated Knowledge Prompting method was first introduced in the paper "Generated Knowledge Prompting for Commonsense Reasoning" by Jiacheng Liu and colleagues. The initial version was posted as an arXiv preprint on October 15, 2021, and the final version was presented at the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022)^[1].

GKP was one of the early methods to shift interaction with LLMs from single-step answer generation to a two-stage process in which the model first states what it knows and then answers.

Two-Stage Mechanism

The GKP mechanism divides a complex task into two simpler sub-processes: retrieving relevant information and then using it for inference.

Stage 1: Knowledge Generation

In the first stage, a language model (the "knowledge generator") produces several (M) knowledge statements relevant to the original question. The generator is prompted few-shot: the prompt contains a handful of demonstrations from which the model picks up the task through in-context learning.

The prompt for knowledge generation has a strictly defined structure:

Instruction: A general directive, for example: "Generate some facts about the topic."
Demonstration Examples: A few human-written question-knowledge pairs. These examples play a critical role, as they show the model what kind of information is useful. The knowledge statements should support reasoning about the question without stating the answer directly.
New Question: The user's original query for which knowledge needs to be generated.

For each question, M knowledge statements are sampled (M = 20 in the original paper) to obtain a diverse set of facts for the second stage^[1].
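A minimal Python sketch of the first stage, assuming a generic `sample(prompt) -> str` function that returns one sampled completion from a language model. The `Input:`/`Knowledge:` layout, the instruction wording, and the demonstrations are hypothetical and not copied from the original paper; `generate_knowledge` also drops empty and duplicate samples, so it may return fewer than M statements.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

# Hypothetical demonstrations: each pairs a question with a helpful fact
# that does not state the answer outright.
DEMOS: List[Tuple[str, str]] = [
    ("Spiders have <mask> legs.",
     "Spiders are arachnids. Arachnids have four pairs of legs."),
    ("A week has <mask> days.",
     "A weekend is two days long. Weekdays run from Monday to Friday."),
]
INSTRUCTION = "Generate some knowledge about the input."


def build_knowledge_prompt(question: str,
                           demos: Sequence[Tuple[str, str]] = DEMOS,
                           instruction: str = INSTRUCTION) -> str:
    """Instruction, then question-knowledge demonstrations, then the new question."""
    lines = [instruction, ""]
    for demo_q, demo_k in demos:
        lines += [f"Input: {demo_q}", f"Knowledge: {demo_k}", ""]
    lines += [f"Input: {question}", "Knowledge:"]
    return "\n".join(lines)


def generate_knowledge(question: str, sample: Callable[[str], str], m: int = 20) -> List[str]:
    """Call the (hypothetical) sampler m times; keep non-empty, unique statements."""
    prompt = build_knowledge_prompt(question)
    statements: List[str] = []
    for _ in range(m):
        text = sample(prompt).strip()
        if text and text not in statements:
            statements.append(text)
    return statements


# Usage with a stub sampler standing in for a real LLM call.
fake_outputs = itertools.cycle(["Bicycles have two wheels.",
                                "A tricycle has three wheels.",
                                "Bicycles have two wheels."])
facts = generate_knowledge("A bicycle has <mask> wheels.",
                           lambda _prompt: next(fake_outputs), m=3)
print(facts)
```

Sampling several statements rather than one gives the second stage a set of candidate contexts to choose from.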

Stage 2: Knowledge Integration and Answer Formation

In the second stage, a language model (the "inference model"), which may differ from the knowledge generator, answers the question. It can be used either zero-shot or after fine-tuning on the specific task.

The integration process is as follows:

Query Augmentation: Each of the M generated knowledge statements (k_m) is concatenated with the original question (q), giving the augmented queries q_1, ..., q_M. Together with the original question without any knowledge (q_0 = q), this yields M+1 queries.
Answer Evaluation and Selection: For each query, the inference model scores every answer choice (a) by its conditional probability. The final answer is the choice with the single highest probability across all M+1 queries, that is, the prediction that one knowledge statement (or the bare question) supports most confidently.
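In symbols, the integration step can be summarized as follows, where [k_m || q] denotes knowledge statement k_m concatenated with question q, A_q is the set of answer choices for q, and p_I is the probability assigned by the inference model (notation chosen for this summary):

```latex
\begin{aligned}
q_0 &= q, \qquad q_m = [\,k_m \,\|\, q\,], \quad m = 1, \dots, M, \\
\hat{a} &= \arg\max_{a \in A_q} \; \max_{0 \le m \le M} \; p_I(a \mid q_m).
\end{aligned}
```

The inner maximum finds, for each answer choice, the query that supports it best; the outer argmax then picks the choice whose best support is highest.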

This two-stage mechanism introduces a kind of metacognitive process: "Before you answer, think and formulate what you know on this topic".
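The integration step can be sketched in Python as follows. `score(query, choices)` is a hypothetical stand-in for the inference model that returns a probability for each answer choice; in practice it would be derived from the model's likelihood of each choice. Joining a statement to the question with a single space is also an assumption of this sketch.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Scorer = Callable[[str, Sequence[str]], Dict[str, float]]


def build_queries(question: str, knowledge: Sequence[str]) -> List[str]:
    """q_0 is the bare question; q_m prepends the m-th knowledge statement."""
    return [question] + [f"{k} {question}" for k in knowledge]


def select_answer(question: str, choices: Sequence[str],
                  knowledge: Sequence[str], score: Scorer) -> Tuple[str, int]:
    """Return the choice with the single highest probability over all M+1 queries,
    together with the index m of the query that supported it."""
    best_p, best_choice, best_m = float("-inf"), "", -1
    for m, query in enumerate(build_queries(question, knowledge)):
        probs = score(query, choices)  # hypothetical inference-model call
        for choice in choices:
            if probs[choice] > best_p:
                best_p, best_choice, best_m = probs[choice], choice, m
    return best_choice, best_m


def toy_score(query: str, choices: Sequence[str]) -> Dict[str, float]:
    """Stand-in scorer: favours a choice that the query states as '<choice> wheels'."""
    hits = [c for c in choices if f"{c} wheels" in query]
    if not hits:
        return {c: 1 / len(choices) for c in choices}
    return {c: 0.8 if c == hits[0] else 0.2 / (len(choices) - 1) for c in choices}


answer, m = select_answer("A bicycle has <mask> wheels.", ["one", "two", "three"],
                          ["Bicycles have two wheels.", "Some bicycles have training wheels."],
                          toy_score)
print(answer, m)
```

Here the bare question gives no preference, the statement about two wheels gives "two" a probability of 0.8, and the other statement changes nothing, so the sketch returns "two" with index 1. Returning the index is a design choice that makes it easy to inspect which knowledge statement drove the answer.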

Effectiveness and Test Results

The effectiveness of GKP has been tested on four academic benchmarks for commonsense reasoning. On each of them, GKP improved accuracy over the corresponding baseline.

Summary of GKP results on major benchmarks (data from Liu et al., 2022)^[1]
Benchmark	Task	Baseline accuracy (%)	Accuracy with GKP (%)	Improvement (percentage points)
NumerSense	Numerical commonsense	64.05	72.47	+8.42
CommonsenseQA	General commonsense	39.89	47.26	+7.37
CommonsenseQA 2.0	General commonsense	70.20	73.03	+2.83
QASC	Scientific commonsense	76.74	80.33	+3.59

The largest improvements are observed in zero-shot mode, suggesting that GKP can effectively activate the model's internal knowledge without additional fine-tuning.
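The last column of the table is the difference between the two accuracy columns, so it is measured in percentage points rather than as a relative change. A quick Python check recomputes it from the table's own numbers and, for comparison, the relative gain over each baseline.

```python
# (benchmark, baseline accuracy %, accuracy with GKP %), copied from the table above
RESULTS = [
    ("NumerSense", 64.05, 72.47),
    ("CommonsenseQA", 39.89, 47.26),
    ("CommonsenseQA 2.0", 70.20, 73.03),
    ("QASC", 76.74, 80.33),
]


def gains(baseline: float, with_gkp: float) -> tuple:
    """Absolute gain in percentage points and relative gain in percent of the baseline."""
    delta = with_gkp - baseline
    return round(delta, 2), round(100 * delta / baseline, 1)


for name, base, gkp in RESULTS:
    points, relative = gains(base, gkp)
    print(f"{name}: +{points:.2f} points ({relative:+.1f}% relative to baseline)")
```

The relative gains come out at about 13.1% and 18.5% for NumerSense and CommonsenseQA, versus about 4.0% and 4.7% for CommonsenseQA 2.0 and QASC.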

Comparative Analysis with Other Techniques

GKP vs. Chain-of-Thought (CoT)

The key difference between GKP and Chain-of-Thought (CoT) lies in the type of information generated:

GKP generates declarative knowledge—facts, definitions, and statements about the world ("what"). It provides the model with additional context.
CoT generates procedural knowledge—logical steps, calculations, and a sequence of inferences ("how"). It provides the model with a reasoning path.

Thus, GKP provides a factual basis, while CoT provides a logical structure for inference^[3].
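To make the contrast concrete, here is a hedged Python sketch of the two prompt shapes. The function names, the `Question:`/`Q:`/`A:` layouts, and the example text are illustrative assumptions rather than templates from either paper: in the GKP-style prompt a factual statement precedes the question, while in the few-shot CoT-style prompt a demonstration carries worked reasoning steps that the model is expected to imitate.

```python
def gkp_prompt(question: str, knowledge: str) -> str:
    """GKP-style: a declarative statement ("what") is placed before the question."""
    return f"{knowledge}\nQuestion: {question}\nAnswer:"


def cot_prompt(question: str, demo_question: str, demo_rationale: str, demo_answer: str) -> str:
    """Few-shot CoT-style: a worked demonstration shows reasoning steps ("how")."""
    return (f"Q: {demo_question}\nA: {demo_rationale} The answer is {demo_answer}.\n\n"
            f"Q: {question}\nA:")


question = "How many eggs are in 4 cartons?"
print(gkp_prompt(question, "A standard egg carton holds 12 eggs."))
print(cot_prompt(question,
                 "How many wheels do 3 bicycles have?",
                 "Each bicycle has 2 wheels, so 3 bicycles have 3 * 2 = 6 wheels.",
                 "6"))
```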

GKP vs. Retrieval-Augmented Generation (RAG)

Unlike GKP, the Retrieval-Augmented Generation (RAG) method uses external, non-parametric knowledge sources.

GKP uses internal knowledge that the model acquired during training. It prompts the model to "recall" what it already knows.
RAG uses external knowledge from databases, documents, or the internet. A retriever "searches for" relevant passages, which are then supplied to the model as context.

The choice between GKP and RAG depends on the task: GKP is effective if the required knowledge is common and well-represented in the training data, whereas RAG is indispensable for highly specialized, recent, or proprietary data.
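The difference in where the context comes from can be sketched as follows. Both functions are hypothetical: `gkp_context` asks a generator (any `generate(prompt) -> str` callable) to write knowledge, whereas `rag_context` ranks documents from an external corpus. The word-overlap ranking is a toy stand-in for a real retriever such as BM25 or a dense embedding index.

```python
from typing import Callable, List


def gkp_context(question: str, generate: Callable[[str], str], m: int = 3) -> List[str]:
    """Parametric: the context is text the model produces from its own weights."""
    prompt = f"Generate some knowledge about the input.\nInput: {question}\nKnowledge:"
    return [generate(prompt) for _ in range(m)]


def rag_context(question: str, corpus: List[str], k: int = 2) -> List[str]:
    """Non-parametric: rank external documents by a toy word-overlap score."""
    q_words = set(question.lower().split())
    # sorted() is stable (also with reverse=True), so ties keep their corpus order
    ranked = sorted(corpus, key=lambda doc: len(q_words & set(doc.lower().split())),
                    reverse=True)
    return ranked[:k]


corpus = [
    "Bicycles have two wheels.",
    "The internal 2024 price list sets the premium plan at 40 dollars per month.",
    "The office is closed on public holidays.",
]
print(rag_context("What does the premium plan cost?", corpus))
print(gkp_context("What does the premium plan cost?",
                  lambda _p: "Premium plans usually cost more than basic plans."))
```

In the example, the plan price exists only in the (hypothetical) internal document, so only retrieval can surface it; a knowledge generator can at best produce a generic statement, or a hallucinated price.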

Limitations and Risks

The "Hallucination" Problem: The main risk of GKP is that the generated facts may be wrong. The inference model treats a false statement from the first stage like any other context, and because the final answer is taken from the single most confident query, one misleading statement can lead to a confident but completely wrong answer.
Computational Costs: Each question requires M knowledge samples from the generator plus M+1 scoring passes by the inference model, which significantly increases response time (latency) and usage costs compared to standard prompting.
Prompt Development Complexity: The effectiveness of GKP heavily depends on the quality of the few-shot examples, the creation of which is a non-trivial and labor-intensive task.
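The cost and hallucination points can be illustrated with a small Python sketch. The call count assumes one generator call per knowledge sample and one inference pass per query; batched sampling, or scoring each answer choice in a separate pass, would change the number. The scores are invented for illustration, and `mean_aggregate` is shown only for contrast: it is not part of GKP.

```python
from typing import Dict, List


def model_calls_per_question(m: int) -> int:
    """m knowledge samples plus m + 1 inference passes (bare question + one per statement)."""
    return m + (m + 1)


def max_aggregate(scores_per_query: List[Dict[str, float]]) -> str:
    """GKP-style selection: the single most confident (query, choice) pair wins."""
    return max((p, c) for probs in scores_per_query for c, p in probs.items())[1]


def mean_aggregate(scores_per_query: List[Dict[str, float]]) -> str:
    """For comparison only (not part of GKP): average each choice over all queries."""
    choices = scores_per_query[0].keys()
    return max(choices,
               key=lambda c: sum(s[c] for s in scores_per_query) / len(scores_per_query))


# Hypothetical scores for "A bicycle has <mask> wheels." with choices two/three.
scores = [
    {"two": 0.70, "three": 0.30},  # q_0: bare question
    {"two": 0.85, "three": 0.15},  # q_1: correct statement "Bicycles have two wheels."
    {"two": 0.05, "three": 0.95},  # q_2: hallucinated "Bicycles have three wheels."
]
print(model_calls_per_question(20))
print(max_aggregate(scores), mean_aggregate(scores))
```

With M = 20 this gives 41 model invocations per question. In the toy scores, the single hallucinated statement wins under the max rule even though the other two queries favour the correct answer, while averaging over queries would still select "two".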

Evolution and Hybrid Approaches

The ideas behind GKP catalyzed the development of more complex and reliable prompting techniques, such as:

Hint-before-Solving (HSP): A direct conceptual successor to GKP, which applies the two-stage principle ("knowledge first, then action") not to a simple answer, but to the more complex reasoning process in CoT^[4].
Verify-and-Edit (VE): A hybrid framework that addresses the problem of "hallucinations" in GKP and CoT. VE first generates a chain of reasoning (like CoT), then automatically verifies key facts using an external search (like RAG), and edits the reasoning before generating the final answer^[5].
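A minimal sketch of a Verify-and-Edit style control flow follows. All callables are hypothetical stand-ins for LLM and search calls. The sketch gates editing on agreement among several sampled reasoning chains, in the spirit of self-consistency (Wang et al., 2022, listed in Literature); the agreement rule and its threshold are assumptions of this sketch rather than values taken from the VE paper.

```python
from collections import Counter
from typing import Callable, List, Tuple

Chain = Tuple[List[str], str]  # (reasoning steps, final answer)


def verify_and_edit(question: str,
                    sample_cot: Callable[[str], Chain],
                    make_verifying_question: Callable[[str], str],
                    retrieve: Callable[[str], str],
                    edit_step: Callable[[str, str], str],
                    answer_from: Callable[[str, List[str]], str],
                    n_samples: int = 5,
                    min_agreement: float = 0.5) -> str:
    # 1. Sample several reasoning chains and check how often their answers agree.
    chains = [sample_cot(question) for _ in range(n_samples)]
    top_answer, votes = Counter(ans for _, ans in chains).most_common(1)[0]
    if votes / n_samples > min_agreement:
        return top_answer  # agreement is high: keep the unedited CoT answer
    # 2. Otherwise verify each step of a chain against retrieved evidence and edit it.
    steps = next(steps for steps, ans in chains if ans == top_answer)
    edited = [edit_step(step, retrieve(make_verifying_question(step))) for step in steps]
    # 3. Produce the final answer from the edited reasoning.
    return answer_from(question, edited)


# Usage with stubs: three sampled chains that disagree.
_drafts = iter([(["Paris is in Germany."], "Germany"),
                (["Paris is in France."], "France"),
                (["Paris is in Italy."], "Italy")])
result = verify_and_edit(
    "Which country is Paris in?",
    sample_cot=lambda q: next(_drafts),
    make_verifying_question=lambda step: "Where is Paris?",
    retrieve=lambda vq: "Paris is the capital of France.",
    edit_step=lambda step, evidence: evidence,  # replace the step with the evidence
    answer_from=lambda q, steps: "France" if "France" in " ".join(steps) else "unknown",
    n_samples=3,
)
print(result)
```

In the usage example the sampled chains disagree, so the sketch retrieves evidence, replaces the wrong step, and answers from the edited chain.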

Links

Generated Knowledge Prompting in the Prompt Engineering Guide

Literature

Liu, J. et al. (2021). Generated Knowledge Prompting for Commonsense Reasoning. arXiv:2110.08387
Liu, J. et al. (2022). Generated Knowledge Prompting for Commonsense Reasoning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022).
Wei, J. et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903
Wang, X. et al. (2022). Self-Consistency Improves Chain-of-Thought Reasoning in Language Models. arXiv:2203.11171
Fu, J. et al. (2024). Hint-before-Solving Prompting: Guiding LLMs to Effectively Utilize Encoded Knowledge. arXiv:2402.14310
Zhao, R. et al. (2023). Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework. arXiv:2305.03268
Lin, B. et al. (2020). NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models. In Proceedings of EMNLP 2020.
Talmor, A. et al. (2019). CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. In Proceedings of NAACL-HLT 2019.
Khot, T. et al. (2019). QASC: A Dataset for Question Answering via Sentence Composition. arXiv:1910.11473
Mu, J. et al. (2023). Learning to Compress Prompts with Gist Tokens. arXiv:2304.08467

Notes

[1] Liu, J., Liu, A., Lu, X., et al. (2022). "Generated Knowledge Prompting for Commonsense Reasoning". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics.
[2] Liu, J., Liu, A., Lu, X., et al. (2021). "Generated Knowledge Prompting for Commonsense Reasoning". arXiv preprint arXiv:2110.08387.
[3] Wei, J., Wang, X., Schuurmans, D., et al. (2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models". arXiv preprint arXiv:2201.11903.
[4] Fu, J., et al. (2024). "Hint-before-Solving Prompting: Guiding LLMs to Effectively Utilize Encoded Knowledge". arXiv preprint arXiv:2402.14310.
[5] Zhao, R., et al. (2023). "Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework". arXiv preprint arXiv:2305.03268.
