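Generation bias (LLM)

Bias in large language models (LLMs) is a systematic distortion in generated text: the model reflects or amplifies existing social stereotypes and prejudices related to gender, race, culture, political views, and other social categories. It arises because LLMs are trained on vast amounts of human-produced data, which inevitably contains biased information^[1].

Bias is one of the key ethical and technical problems in artificial intelligence (AI) development, because it can lead to discrimination, spread misinformation, and erode trust in the technology.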

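Types of bias in LLMs

Bias in LLMs can take several forms.

Gender bias

Models tend to reproduce traditional gender stereotypes, associating occupations and traits with a particular gender.

A 2024 UNESCO study found that LLMs associated women with domestic roles (“home”, “family”, “children”) up to four times as often as men, while men were associated with concepts such as “business” and “career”^[2].
A study published in Scientific Reports (a Nature Portfolio journal) found significant gender and racial bias in content generated by seven mainstream LLMs, including ChatGPT and LLaMA^[3].
In Russian, models usually default to masculine forms for gender-neutral roles (such as “doctor” or “director”) and have difficulty generating feminine forms of job titles^[4].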
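
A figure such as "four times as often" is a ratio of association rates between two groups of generated texts. The following is a minimal sketch of how such a ratio could be computed. The word list, the sample texts, and the function names are hypothetical and chosen for illustration; they are not taken from the UNESCO study.

```python
import re

# Hypothetical word list for illustration; not the lexicon of any cited study.
DOMESTIC_WORDS = {"home", "family", "children"}


def association_rate(texts, target_words):
    """Return the fraction of texts that mention at least one target word."""
    if not texts:
        raise ValueError("texts must not be empty")
    hits = 0
    for text in texts:
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        if tokens & target_words:
            hits += 1
    return hits / len(texts)


def association_ratio(texts_a, texts_b, target_words):
    """Return how many times more often group A mentions the target words than group B."""
    rate_b = association_rate(texts_b, target_words)
    if rate_b == 0:
        raise ZeroDivisionError("group B never mentions the target words")
    return association_rate(texts_a, target_words) / rate_b


# Made-up model outputs for prompts about women and about men.
female_outputs = [
    "She stayed home with the children.",
    "She runs a family bakery.",
    "She cooks for her family.",
    "She is an engineer.",
]
male_outputs = [
    "He founded a business.",
    "He built a career in finance.",
    "He takes care of the children.",
    "He is an engineer.",
]

ratio = association_ratio(female_outputs, male_outputs, DOMESTIC_WORDS)
print(f"domestic association ratio (female/male): {ratio:.1f}")
```

A ratio near 1 would indicate that both groups are associated with the word list about equally often. A real audit would need many more samples and a carefully validated lexicon.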

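Racial and ethnic bias

LLMs can exhibit implicit discrimination against different ethnic groups.

A Bloomberg investigation found that ChatGPT 3.5 favored the résumés of Asian candidates over those of Black candidates^[5].
For Russian, the RuBia dataset showed that models can reproduce antisemitic and anti-migrant stereotypes present in the training corpus (for example, by agreeing with the statement “migrants are lazy”)^[6].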
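
A résumé-screening disparity like the one described above can be expressed as a comparison of selection rates. The following is a minimal sketch, assuming made-up ranking outcomes and hypothetical group labels A, B, and C. The 0.8 threshold is the "four-fifths rule" from US employment-selection guidelines, used here as an illustrative assumption and not as the method of the cited investigation.

```python
from collections import Counter


def selection_rates(top_picks, groups):
    """Return the share of trials in which each group's candidate was ranked first."""
    if not top_picks:
        raise ValueError("top_picks must not be empty")
    counts = Counter(top_picks)
    return {group: counts.get(group, 0) / len(top_picks) for group in groups}


def impact_ratios(rates):
    """Return each group's selection rate divided by the highest selection rate."""
    best = max(rates.values())
    if best == 0:
        raise ValueError("no group was ever selected")
    return {group: rate / best for group, rate in rates.items()}


# Made-up outcomes of 20 ranking trials: which group's resume the model put first.
top_picks = ["A"] * 8 + ["B"] * 7 + ["C"] * 5

rates = selection_rates(top_picks, ["A", "B", "C"])
ratios = impact_ratios(rates)

# Flag groups whose ratio falls below the assumed four-fifths threshold.
flagged = [group for group, ratio in ratios.items() if ratio < 0.8]

for group in rates:
    print(f"{group}: rate={rates[group]:.2f} ratio={ratios[group]:.3f}")
print("flagged groups:", flagged)
```

In this toy data only group C falls below the threshold. With equally qualified résumés, a large gap of this kind would suggest that the names or other group signals, not the qualifications, are driving the ranking.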

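Political and ideological bias

Although many LLMs are presented as neutral, they often lean toward a particular political orientation.

A study by the Centre for Policy Studies found a left-liberal bias in 23 of the 24 LLMs tested^[7].
Tests by researchers at the University of Washington and Carnegie Mellon University indicated that ChatGPT and GPT-4 were the most left-wing libertarian, while Meta's LLaMA was the most right-wing authoritarian^[8].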

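Mechanisms of bias

Training data: the main source. LLMs are trained on huge text corpora from the internet; these texts are a “mirror” of society and reflect all of its stereotypes^[9].
Architecture and training algorithms: the Transformer architecture itself may amplify correlations already present in the data.
Fine-tuning and RLHF: the reinforcement learning from human feedback (RLHF) stage can also introduce bias, because human raters are inevitably influenced by their own views.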

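Detection and mitigation methods

Bias detection

Stereotype test sets: specialized datasets are used, for example:
- CrowS-Pairs: covers nine types of bias, including race, religion, and age^[10].
- StereoSet: measures stereotypical bias in four domains: gender, profession, race, and religion^[11].
- RuBia: a dedicated dataset for detecting bias in Russian-language models^[12].
- Multilingual resources: adapted datasets such as French CrowS-Pairs^[13] and the Chinese Bias Benchmark (CBBQ)^[14].
- Domain-specific analysis: studies of bias in areas such as hiring^[15] and healthcare^[16].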
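
Pair-based test sets such as CrowS-Pairs compare a model's score for a more stereotypical sentence with its score for a less stereotypical counterpart. The following is a minimal sketch of the resulting summary metric, assuming the per-sentence scores (for example log-likelihoods) have already been computed by some model. The class and function names are hypothetical, the scores are made up, and skipping ties is a simplification introduced here.

```python
from dataclasses import dataclass


@dataclass
class ScoredPair:
    """One sentence pair with model scores; a higher score means 'more likely'."""
    stereo_score: float       # score of the more stereotypical sentence
    anti_stereo_score: float  # score of the less stereotypical sentence


def stereotype_preference_rate(pairs):
    """Return the percentage of pairs where the stereotypical sentence scores higher.

    Ties are skipped (a simplification). A model with no systematic
    preference would land near 50.
    """
    decided = [p for p in pairs if p.stereo_score != p.anti_stereo_score]
    if not decided:
        raise ValueError("no decided pairs to evaluate")
    preferred = sum(p.stereo_score > p.anti_stereo_score for p in decided)
    return 100.0 * preferred / len(decided)


# Made-up log-likelihoods for four sentence pairs.
pairs = [
    ScoredPair(stereo_score=-12.1, anti_stereo_score=-13.4),
    ScoredPair(stereo_score=-20.5, anti_stereo_score=-19.8),
    ScoredPair(stereo_score=-8.3, anti_stereo_score=-9.9),
    ScoredPair(stereo_score=-15.0, anti_stereo_score=-16.2),
]

rate = stereotype_preference_rate(pairs)
print(f"stereotype preference rate: {rate:.1f}%")
```

The further the rate is above 50, the more consistently the model prefers the stereotypical sentence; a rate far below 50 indicates a systematic preference in the opposite direction.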

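Bias mitigation

Data level (pre-processing): cleaning, filtering, and rebalancing the training corpus. Such methods are described in the Holistic AI documentation^[17].
Training level (in-processing): modifying the training algorithm to account for fairness.
Output level (post-processing): filtering and moderating responses after they are generated.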
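
As an illustration of the data-level approach, the sketch below rebalances a toy corpus with counterfactual data augmentation: every sentence containing a gendered term gets a copy with the term swapped. This technique is chosen here as one example of rebalancing and is not claimed to be the method of the cited documentation. The swap list and function names are hypothetical and deliberately tiny.

```python
import re

# Hypothetical, deliberately small swap list. Real lists are much larger and
# must handle ambiguous words (e.g. "her" can map to "his" or "him").
SWAPS = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
}

# Match whole words only, so "The" or "weather" are not affected by "he".
_PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", re.IGNORECASE)


def swap_gendered_terms(sentence):
    """Return the sentence with each listed gendered term replaced by its counterpart."""
    def replace(match):
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        # Preserve an initial capital letter (other casing is not preserved).
        return swapped.capitalize() if word[0].isupper() else swapped
    return _PATTERN.sub(replace, sentence)


def augment(corpus):
    """Return the corpus plus a counterfactual copy of every sentence that changed."""
    augmented = list(corpus)
    for sentence in corpus:
        counterfactual = swap_gendered_terms(sentence)
        if counterfactual != sentence:
            augmented.append(counterfactual)
    return augmented


corpus = ["He is a doctor.", "The woman is a nurse.", "The weather is nice."]
balanced = augment(corpus)
for line in balanced:
    print(line)
```

After augmentation, "doctor" and "nurse" each appear with both genders, so a model trained on the toy corpus sees no gender signal for either occupation. Sentences without listed terms are left as they are.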

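Legal and ethical consequences

Bias in AI has serious consequences, including discrimination in critical domains and the spread of misinformation.

Regulation: governments around the world are beginning to introduce rules to govern AI.
The European Union adopted the Artificial Intelligence Act (AI Act), which entered into force on 1 August 2024 and applies in phases. It imposes strict requirements on high-risk systems, including mandatory bias assessment, and provides for fines of up to 7% of a company's global turnover^[18].
In 2021, several leading Russian technology companies signed a voluntary Code of Ethics in the Field of Artificial Intelligence, committing to minimize discrimination. By the end of 2021, more than 100 organizations had signed it^[19].

Countering bias is an ongoing trade-off. Overly aggressive filtering can lead to “excessive political correctness”, where the model refuses to discuss any sensitive topic. Developers are therefore looking for a balance between a model's safety, objectivity, and informativeness.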

Literature

Guo, Y. et al. (2024). Bias in Large Language Models: Origin, Evaluation, and Mitigation. arXiv:2411.10915.
Gallegos, I. O. et al. (2023). Bias and Fairness in Large Language Models: A Survey. arXiv:2309.00770.
Bender, E. M. et al. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?. doi:10.1145/3442188.3445922.
Nadeem, M. et al. (2020). StereoSet: Measuring Stereotypical Bias in Pretrained Language Models. arXiv:2004.09456.
Nangia, N. et al. (2020). CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models. ACL 2020.
Bai, X. et al. (2024). Measuring Implicit Bias in Explicitly Unbiased Large Language Models. arXiv:2402.04105.
Hofmann, V. et al. (2024). AI Generates Covertly Racist Decisions about People Based on Their Dialect. Nature, 633, 147-154.
Fang, X. et al. (2024). Bias of AI-Generated Content: An Examination of News Produced by Large Language Models. Scientific Reports, 14, 5224.
Grigoreva, V. et al. (2024). RuBia: A Russian Language Bias Detection Dataset. arXiv:2403.17553.
Du, L. et al. (2024). Causal-Guided Active Learning for Debiasing Large Language Models. arXiv:2408.12942.
Ayaz, A. et al. (2023). Taught by the Internet: Exploring Bias in OpenAI’s GPT-3. arXiv:2306.02428.

See also

Large language model

Notes

[1] “Bias in Large Language Models: Origin, Evaluation, and Mitigation”. arXiv.
[2] “Generative AI: UNESCO study reveals alarming evidence of regressive gender stereotypes”. UNESCO.
[3] “Gender and race stereotypes in Large Language Models”. Nature Scientific Reports.
[4] “Bias in Russian-language LLMs: whom does the machine consider an ‘ordinary person’?”. Habr.
[5] “ChatGPT’s Racial Bias in Hiring Decisions”. Business Insider.
[6] “RuBia: A Russian-language Bias Detection Dataset”. The Moonlight.
[7] “Left-leaning bias commonplace in AI-powered chatbots, shows new report”. Centre for Policy Studies.
[8] “AI language models are rife with political biases”. MIT Technology Review.
[9] “Language models: how to overcome bias and ensure safety”. RBC Trends.
[10] “CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models”. ACL Anthology.
[11] “StereoSet: Measuring stereotypical bias in pretrained language models”. arXiv.
[12] “RuBia: A Russian Language Bias Detection Dataset”. arXiv.
[13] “French CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in French Language Models”. ACL Anthology.
[14] “CBBQ: A Chinese Bias Benchmark for Large Language Models”. arXiv.
[15] “Bias in Large Language Models and Who Should Be Held Accountable”. Stanford Law School.
[16] “Racial bias in psychiatric diagnosis and treatment with large language models”. Nature Digital Medicine.
[17] “Preprocessing Bias Mitigation”. Holistic AI Documentation.
[18] “EU AI Act: First Rules Take Effect on Prohibited AI Systems”. Jones Day.
[19] “Over 100 organizations signed up for Code of Ethics in AI by end of 2021”. TASS.

