LLM hallucinations

From Systems Analysis Wiki

Hallucination in the context of large language models (LLMs) is a phenomenon where the model confidently generates a plausible-looking response that is factually incorrect, not grounded in the provided context, or internally inconsistent[1][2]. The model "invents" facts, details, or logical conclusions that are absent from the source data.

It is important to note that a hallucination is not a failure or a bug in the traditional sense. The model is operating as designed: it predicts the most probable continuation of a text based on patterns learned from its training data. It has no built-in mechanism for truth verification[3]. Hallucinations differ from simple errors in that they present confidently delivered but false information, often including non-existent facts, citations, or events[4]. This phenomenon has become so significant that in 2023, the Cambridge Dictionary added a new definition for the term "hallucination" related to artificial intelligence[5].

Definitions and Classification of Hallucinations

Although various terms are used (e.g., "confabulation," "fabrication"), hallucinations in LLMs fall into two broad categories: those concerning factual accuracy and those concerning faithfulness to the provided source (contextual consistency)[6]. Of the subtypes below, factual hallucinations and logical errors belong to the first category; contextual hallucinations and inconsistency belong to the second.

Factual Hallucinations

This occurs when the model provides factually incorrect information about the real world. The model asserts a false "fact" as truth[1].

  • Example: "Charles Lindbergh was the first person to walk on the moon" — a completely fabricated fact.
  • False citations and references: The model might invent a reference to a non-existent scientific paper or law, mimicking the format of a real citation[2]. This undermines trust in models, especially in applications requiring accuracy (education, news, consulting)[7].

Logical Errors

The model commits an error or inconsistency in reasoning. Individual facts in the response may be correct, but the conclusion is illogical or contradicts basic logic[2]. This often happens in complex reasoning, math, and causality tasks, where the model operates on probabilistic word associations rather than formal logic[2].

  • Example: "Since birds fly, astronauts do not experience gravity" — the text appears coherent but is logically incorrect.

Contextual Hallucinations

The model's response does not align with the provided context or instruction. The model "drifts" from the context, adding extraneous information or ignoring required details[1].

  • Instruction Violation: When asked to "translate the text into Spanish," the model responds in English[1].
  • Information Not from the Source: In a summarization task, the model "adds" facts that were not in the original document or misrepresents them[1].
  • Context Blending: In the middle of a response, the model may suddenly start discussing something from a different domain. For example, when asked about NBA commissioner Adam Silver, the model might switch to his predecessor David Stern, blending two different contexts[6].

Inconsistency

A type of hallucination where the model contradicts itself within a single response or across a series of responses[6]. One study found a self-contradiction rate of around 14% in ChatGPT's responses[6].

  • Example: "Company X was founded in 1990... and a few sentences later... Company X, established in 2000..."

Hallucinations in Code

LLMs trained on code can generate syntactically correct but non-functional snippets that use non-existent functions, libraries, or parameters[2]. For example, a model might generate `import quantum` in Python, although no such standard module exists. In 2024, the term "code hallucination" was proposed, and the CodeMirage benchmark was created to systematize this problem[8].
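Invented imports of this kind can be caught with a cheap static check. The sketch below is illustrative, not part of any cited benchmark (the function name `find_unresolvable_imports` is hypothetical): it parses a generated snippet and flags imported modules that cannot be resolved in the current environment. It will not catch hallucinated functions or parameters inside real modules.

```python
import ast
import importlib.util

def find_unresolvable_imports(source: str) -> list[str]:
    """Return module names imported in `source` that cannot be found
    in the current environment -- a first-pass filter for hallucinated
    dependencies."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            # Only the top-level package needs to exist for this check.
            if importlib.util.find_spec(name.split(".")[0]) is None:
                missing.append(name)
    return missing

snippet = "import json\nimport quantum  # hallucinated module\n"
print(find_unresolvable_imports(snippet))  # ['quantum']
```

A production linter would go further, e.g. checking that the attributes and call signatures used actually exist in the resolved modules.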

Causes

The phenomenon of hallucination is driven by a combination of factors, ranging from model architecture to data quality.

  • Architecture and Training Principle: Most LLMs (e.g., GPT) are autoregressive transformers trained to predict the next token. Their goal is to maximize the likelihood of the text, not to verify the truthfulness of statements[2]. The model does not distinguish between facts and fiction in its training data, treating everything as text patterns[2].
  • Quality of Training Data: LLMs are trained on vast text corpora from the internet, which contain numerous inaccuracies, myths, and outdated information[1]. The model memorizes and reproduces these errors. The knowledge cutoff, the point in time up to which the model has information, also matters: when asked about later events, the model may confidently fill the gap with invented details.
  • Text Generation Method: The stochastic nature of generation (sampling with temperature) allows the model to create more "creative" but less accurate responses. A limited context length can cause the model to "forget" earlier details of a conversation and start contradicting itself[6].
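The effect of temperature mentioned above can be made concrete with a minimal softmax-with-temperature sketch (the logits are toy values, not from a real model): raising the temperature flattens the next-token distribution, shifting probability mass toward less likely, potentially hallucinated continuations.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw model scores (logits) into a next-token probability
    distribution. Higher temperature flattens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy scores: the model strongly prefers the first candidate token.
logits = [5.0, 2.0, 1.0]
print(softmax_with_temperature(logits, 0.5))  # sharply peaked on token 0
print(softmax_with_temperature(logits, 2.0))  # noticeably flatter
```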

Evaluation and Measurement Methods

Automated metrics, human evaluation, and specialized benchmarks are used to detect and measure hallucinations.

  • Automated Metrics: These include approaches where another LLM acts as a "judge" (LLM-as-a-judge) to assess the correctness of a response[9], or analyzing the model's entropy (uncertainty) during generation[10].
  • Human Annotation: Considered the "gold standard." Experts or crowd workers manually evaluate responses and flag errors. The same kind of annotation is used when training models with RLHF[11].
  • Benchmarks and Stress Tests: Special datasets have been created, such as TruthfulQA, which contains questions designed to provoke the model into reproducing common myths[12]. There are also leaderboards, like the Hugging Face Hallucination Leaderboard, where models are compared based on their hallucination levels[13].
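As a toy illustration of the entropy-based signal mentioned above: the sketch below averages the Shannon entropy of the per-step token distributions recorded during generation. The distributions and the threshold are invented for illustration; a real detector would calibrate the threshold on labeled data.

```python
import math

def mean_token_entropy(step_probs):
    """Average Shannon entropy (in bits) of per-step next-token
    distributions. High values mean the model was uncertain, a signal
    correlated with hallucination."""
    entropies = [-sum(p * math.log2(p) for p in dist if p > 0)
                 for dist in step_probs]
    return sum(entropies) / len(entropies)

# Confident generation: probability mass concentrated on one token.
confident = [[0.95, 0.03, 0.02], [0.90, 0.05, 0.05]]
# Uncertain generation: mass spread over many tokens.
uncertain = [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]]

ENTROPY_THRESHOLD = 1.0  # bits; an assumed cutoff, tune in practice
print(mean_token_entropy(confident) > ENTROPY_THRESHOLD)  # False
print(mean_token_entropy(uncertain) > ENTROPY_THRESHOLD)  # True
```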

Mitigation and Prevention Methods

  • Retrieval-Augmented Generation (RAG): One of the most effective approaches, which "grounds" the model in external knowledge. Before generating a response, the model retrieves relevant information from a database, search engine, or API, and bases its answer on this retrieved evidence rather than on speculation[2].
  • Chain-of-Thought Reasoning and Self-Verification: The model first generates a step-by-step reasoning process before giving the final answer, which improves accuracy. In more advanced methods like Self-Verification, the model generates a draft response and is then tasked with reviewing and correcting it[14].
  • Built-in Rules and Filters: Models are trained to refuse to answer if they are uncertain. For example, Claude models from Anthropic follow a "truthfulness" principle and often respond with "I don't know for sure..." instead of inventing facts[11].
  • Integration with External Tools: Models like Gemini can automatically recognize when they need an external tool (e.g., a calculator for computations or a search for recent news) and use it, significantly reducing the number of hallucinations[11].
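The RAG approach described above can be sketched in a few lines. This is a deliberately minimal illustration, with keyword overlap standing in for the embedding search a production system would use; all names and the prompt template are assumptions of this sketch.

```python
def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query -- a stand-in
    for real embedding-based retrieval."""
    q_words = set(query.lower().split())
    return sorted(documents,
                  key=lambda d: len(q_words & set(d.lower().split())),
                  reverse=True)[:k]

def build_grounded_prompt(query: str, documents: list[str]) -> str:
    """Prepend retrieved evidence so the model answers from it
    rather than from its parametric memory."""
    context = "\n".join(retrieve(query, documents))
    return ("Answer using ONLY the context below. If the answer is "
            "not in the context, say you don't know.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")

docs = [
    "Company X was founded in 1990 in Berlin.",
    "The warranty covers repairs for two years after purchase.",
]
prompt = build_grounded_prompt("When was Company X founded?", docs)
# `prompt` is then sent to the LLM in place of the bare question.
```

The explicit "say you don't know" instruction matters: grounding works best when the model is also told how to behave when retrieval comes up empty.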

Risks and Consequences

  • Legal and Reputational Risks: In the legal field, hallucinations can have serious consequences. The Mata v. Avianca (2023) case gained widespread attention, where a lawyer used ChatGPT to find legal precedents, and it fabricated several non-existent cases. The lawyers were fined, and the incident served as a lesson about the unacceptability of trusting AI without verification[1].
  • Spread of Disinformation: On a societal scale, LLMs can amplify the problem of fake news. A well-known case is the Galactica model from Meta, created to assist scientists, which began generating pseudoscientific texts with fabricated experiments and citations. Public access to the model was shut down after three days[15].
  • Making Flawed Decisions: Users, especially inexperienced ones, tend to trust confidently phrased AI responses, which can lead to poor decisions in finance, medicine, and other critical areas[7].

Practical Examples

  • The Air Canada Case: In 2022, the airline's chatbot invented a non-existent retroactive ticket refund policy. When a customer demanded its application, the company refused. In 2024, the Civil Resolution Tribunal of Canada held Air Canada responsible for the information provided by its chatbot and ordered it to compensate the customer for their losses[9].
  • Defamation Lawsuit Against OpenAI (2023): Radio host Mark Walters sued OpenAI because ChatGPT, in response to a journalist's query, falsely accused him of fraud. This case highlighted the legal liability of companies for the content generated by their models[6].

Notes

  1. “The Beginner's Guide to Hallucinations in Large Language Models”. Lakera.
  2. “What Is LLM Hallucination and How To Prevent It”. Astera.
  3. “Hallucination (artificial intelligence)”. Wikipedia.
  4. “OpenAI describes LLM hallucinations as 'making up facts' in moments of uncertainty”.
  5. “Cambridge Dictionary adds new definition for 'hallucinate'”.
  6. “LLM Hallucination—Types, Causes, and Solutions”. Nexla.
  7. “Effective Tips to Prevent AI Hallucinations in Generative AI”. QuickCreator.
  8. “CodeMirage: Hallucinations in Code Generated by Large Language Models”. arXiv:2408.08333.
  9. “LLM hallucinations and failures: lessons from 4 examples”. Evidently AI Blog.
  10. “How to Perform Hallucination Detection for LLMs”. Kolena.
  11. “ChatGPT vs Google Gemini vs Anthropic Claude: Comprehensive Comparison & Report”. DataStudios.
  12. “Mastering LLM Accuracy: How to Test, Detect, and Fix Hallucinations in AI Models”. Stephen Weber on Medium.
  13. “LLM Benchmarks and Leaderboards: Avoiding Foundation Model Mistakes”. Arize Blog.
  14. “Improving the Reliability of LLMs: Combining Chain-of-Thought Reasoning and Retrieval-Augmented Generation”. arXiv.
  15. “Why Meta Took Down its 'Hallucinating' AI Model Galactica?”. Analytics India Magazine.