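Top-p sampling

Top-p sampling, also called nucleus sampling, is a stochastic decoding method for generating text with large language models (LLMs). It was proposed by Ari Holtzman et al. in 2019 as an improved alternative to fixed top-k sampling. The idea is to choose the candidate set dynamically at each generation step, based on a cumulative probability threshold $p$.^[1]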

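Concept

The core idea of top-p is to select, at each step, the smallest set of highest-probability tokens whose total probability is at least a given threshold $p$ (the "nucleus"). Formally, for the conditional distribution $P(x \mid x_{1:i-1})$ over the vocabulary $V$, the nucleus $V^{(p)}$ can be defined as follows: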

$$\sum_{x \in V^{(p)}} P(x \mid x_{1:i-1}) \geq p \quad \text{and} \quad \forall S \subsetneq V^{(p)}:\ \sum_{x \in S} P(x \mid x_{1:i-1}) < p.$$

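Equivalently: sort all tokens in descending order of $P(x \mid x_{1:i-1})$ and take the shortest prefix whose cumulative probability mass is ≥ $p$.^[1]

Once the nucleus is determined, the probabilities of tokens outside $V^{(p)}$ are set to zero, and the probabilities inside the nucleus are renormalized so that they sum to 1. The next token is sampled from this truncated distribution.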
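The procedure just described (sort, take the shortest prefix, zero out the rest, renormalize with $P'(x) = P(x \mid x_{1:i-1}) / \sum_{x' \in V^{(p)}} P(x' \mid x_{1:i-1})$ for $x \in V^{(p)}$, then sample) can be sketched in NumPy as follows. This is a minimal illustration assuming the model's next-token probabilities are already available as a 1-D array; the function name `top_p_sample` is hypothetical and not part of any library.

```python
import numpy as np


def top_p_sample(probs, p, rng):
    """Draw one token index with nucleus (top-p) sampling.

    probs: 1-D array of next-token probabilities that sums to 1.
    p: cumulative probability threshold in (0, 1].
    rng: a numpy.random.Generator.
    """
    probs = np.asarray(probs, dtype=float)
    # Token indices sorted by probability, highest first.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    # Length of the shortest prefix whose mass reaches p
    # (clamped in case float round-off leaves the total slightly below p).
    size = min(int(np.searchsorted(cumulative, p)) + 1, len(probs))
    nucleus = order[:size]
    # Zero out tokens outside the nucleus and renormalize the rest.
    truncated = np.zeros_like(probs)
    truncated[nucleus] = probs[nucleus]
    truncated /= truncated.sum()
    return int(rng.choice(len(probs), p=truncated))


rng = np.random.default_rng(0)
probs = [0.5, 0.25, 0.125, 0.125]
# With p = 0.7 the nucleus is {0, 1}, because 0.5 < 0.7 <= 0.5 + 0.25.
draws = [top_p_sample(probs, 0.7, rng) for _ in range(1000)]
print(sorted(set(draws)))  # [0, 1]: tokens 2 and 3 are never drawn
```

Inside the nucleus the renormalized probabilities here are 2/3 and 1/3; tokens 2 and 3 have probability exactly zero after truncation.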

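Dynamic adjustment

When the distribution is "peaked" (the model is very confident), the nucleus is small: a few tokens already account for probability mass ≥ $p$, which helps keep the text coherent.
When the distribution is "flat" (there are many plausible continuations), the nucleus is large: the candidate pool widens, which increases diversity.^[1]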
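A small worked example makes the contrast concrete. The two four-token distributions below are illustrative values chosen for this example (not taken from the cited paper), with threshold $p = 0.9$; the cumulative sums are taken in descending order of probability:

$$
\begin{aligned}
\text{peaked } (0.85, 0.10, 0.03, 0.02):&\quad 0.85 < 0.9,\quad 0.85 + 0.10 = 0.95 \geq 0.9 &&\Rightarrow |V^{(p)}| = 2,\\
\text{flat } (0.25, 0.25, 0.25, 0.25):&\quad 0.25,\ 0.50,\ 0.75 < 0.9,\quad 1.00 \geq 0.9 &&\Rightarrow |V^{(p)}| = 4.
\end{aligned}
$$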

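Comparison with other decoding methods

Top-p vs. top-k

Top-k always samples from a fixed number $k$ of the most probable tokens. On a peaked distribution this can pad the candidate set with unnecessary low-probability options just to reach $k$; on a flat distribution it can cut off reasonable continuations that happen to fall outside the top $k$.
Top-p adapts the size of the candidate set to the distribution at the current step, which makes it more flexible and more stable across different distribution shapes.^[1]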
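To make the contrast concrete, the NumPy sketch below compares candidate-set sizes for top-k ($k = 3$) and top-p ($p = 0.9$) on two illustrative distributions: a peaked four-token one and a flat eight-token one. The helper names `top_k_set` and `top_p_set` are hypothetical, and the probabilities are chosen for the example (the flat case uses 1/8 so that the cumulative sums are exact in floating point).

```python
import numpy as np


def top_k_set(probs, k):
    """Indices of the k most probable tokens: always exactly k of them."""
    return np.argsort(probs)[::-1][:k]


def top_p_set(probs, p):
    """Indices of the shortest highest-probability prefix with mass >= p."""
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    size = min(int(np.searchsorted(cumulative, p)) + 1, len(probs))
    return order[:size]


peaked = np.array([0.85, 0.10, 0.03, 0.02])
flat = np.full(8, 0.125)

for name, dist in [("peaked", peaked), ("flat", flat)]:
    print(name, len(top_k_set(dist, 3)), len(top_p_set(dist, 0.9)))
# peaked 3 2
# flat 3 8
```

Top-k keeps three tokens in both cases: one more than the nucleus needs on the peaked distribution (it admits the 0.03 token), and five fewer than the nucleus on the flat one.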

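Top-p vs. temperature

Temperature reshapes the whole distribution (making it sharper or flatter) but removes no tokens: even very low-probability options keep a nonzero chance of being selected.^[2]
Top-p imposes a hard cutoff on the tail of the distribution: low-probability tokens are excluded from sampling entirely, which helps prevent clearly inappropriate continuations.^[1]

Service providers generally recommend adjusting either `temperature` or `top_p`, but not both, when tuning style or randomness; this avoids a "double" effect on the distribution and simplifies debugging.^[3]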
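Both points, and why changing the two parameters together compounds their effects, can be seen in a short NumPy sketch. The logits are made-up values, the helper names are hypothetical, and applying temperature before the top-p cut is an assumption of this sketch rather than a statement about any particular provider. Lowering the temperature sharpens the distribution, which in turn shrinks the nucleus for the same $p$:

```python
import numpy as np


def softmax_with_temperature(logits, temperature=1.0):
    """Temperature-scaled softmax: reshapes the distribution, removes nothing."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def nucleus_size(probs, p):
    """Number of tokens in the top-p nucleus of `probs`."""
    cumulative = np.cumsum(np.sort(probs)[::-1])
    return min(int(np.searchsorted(cumulative, p)) + 1, len(probs))


logits = [2.0, 1.0, 0.0, -1.0]
for t in (1.0, 0.5):
    probs = softmax_with_temperature(logits, t)
    # Temperature alone keeps every token in play...
    nonzero = int(np.count_nonzero(probs))
    # ...while top-p (p = 0.9) drops the tail, and where it cuts depends on t.
    print(t, nonzero, nucleus_size(probs, 0.9))
# 1.0 4 3
# 0.5 4 2
```

At $T = 1$ the probabilities are roughly (0.644, 0.237, 0.087, 0.032), so three tokens are needed to reach 0.9; at $T = 0.5$ they are roughly (0.865, 0.117, 0.016, 0.002), and two suffice.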

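Practical use and recommendations

Top-p is widely used with modern LLMs because it combines flexibility with controllability.

Typical values. In practice, values of $p \approx 0.90$–$0.95$ are common (see the Transformers guides and examples; in many SDKs, 0.95 appears as a "default" or recommended value).^[2]^[4]
- Values close to 1.0 (e.g., 0.98–0.99) increase diversity, because more tokens enter the nucleus.
- Smaller values (e.g., 0.80–0.90) make the output more deterministic and "conservative".
- With $p = 1$ there is no truncation: sampling covers the entire vocabulary (subject only to the effect of temperature).^[2]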
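How the nucleus grows with $p$ can be checked on a small example. The distribution below is illustrative (powers of two, so the cumulative sums are exact in floating point), and `nucleus_size` is a hypothetical helper:

```python
import numpy as np


def nucleus_size(probs, p):
    """Number of tokens in the top-p nucleus of `probs`."""
    cumulative = np.cumsum(np.sort(probs)[::-1])  # 0.5, 0.75, 0.875, ...
    return min(int(np.searchsorted(cumulative, p)) + 1, len(probs))


probs = np.array([0.5, 0.25, 0.125, 0.0625, 0.03125, 0.03125])
for p in (0.8, 0.9, 0.95, 1.0):
    print(p, nucleus_size(probs, p))
# 0.8 3
# 0.9 4
# 0.95 5
# 1.0 6
```

At $p = 1$ all six tokens are kept, which is the "no truncation" case.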

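Compatibility with libraries and APIs.
- Transformers implements `TopPLogitsWarper`, which additionally uses a `min_tokens_to_keep` threshold (at least 1) so that the nucleus does not degenerate when $p$ is very small and the distribution is peaked.^[5]
- Some APIs expose a `top_p` parameter but may not offer `top_k`; which parameters are supported, and what exactly they mean, depends on the specific model or provider (for example, some reasoning models may restrict randomness controls). Consult the official OpenAI, Azure, and Google documentation.^[6]^[3]^[4]

Long text and repetition. A number of experiments show that, compared with greedy search, beam search, and fixed top-k, nucleus sampling reduces the tendency toward text degeneration (repetition, formulaic phrasing), especially on long sequences.^[1]^[7]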
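The role of a minimum-kept threshold can be illustrated with a simplified NumPy function. This is a hypothetical re-implementation of the idea written for this example, not the Transformers source code (which operates on logit tensors); only the parameter name `min_tokens_to_keep` is borrowed from the library.

```python
import numpy as np


def top_p_keep_mask(probs, top_p, min_tokens_to_keep=1):
    """Boolean mask of tokens kept by top-p, never fewer than min_tokens_to_keep."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    size = min(int(np.searchsorted(cumulative, top_p)) + 1, len(probs))
    # Enforce the floor, but never exceed the vocabulary size.
    size = min(max(size, min_tokens_to_keep), len(probs))
    mask = np.zeros(len(probs), dtype=bool)
    mask[order[:size]] = True
    return mask


sharp = [0.97, 0.02, 0.01]
print(top_p_keep_mask(sharp, 0.5).sum())                        # 1
print(top_p_keep_mask(sharp, 0.5, min_tokens_to_keep=2).sum())  # 2
```

With a small $p$ on a sharp distribution the nucleus collapses to a single token, which makes sampling effectively greedy; the floor guarantees that a minimum number of candidates survives.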

References

Holtzman, A. et al. (2020). The Curious Case of Neural Text Degeneration. ICLR 2020. arXiv:1904.09751.
Fan, A. et al. (2018). Hierarchical Neural Story Generation. arXiv:1805.04833.
Meister, C. et al. (2023). Locally Typical Sampling. arXiv:2202.00666.
Su, Y.; Collier, N. (2022). Contrastive Search Is What You Need for Neural Text Generation. arXiv:2210.14140.
O’Brien, S.; Lewis, M. (2023). Contrastive Decoding Improves Reasoning in Large Language Models. arXiv:2309.09117.
Yu, S. et al. (2023). Conformal Nucleus Sampling. ACL Findings 2023.
Tan, Q. et al. (2024). A Thorough Examination of Decoding Methods in the Era of Large Language Models. arXiv:2402.06925.
Finlayson, M. et al. (2024). Basis-Aware Truncation Sampling for Neural Text Generation. arXiv:2412.14352.
Chen, S. J. et al. (2025). Decoding Game: On Minimax Optimality of Heuristic Text Generation Methods. arXiv:2410.03968.
Sen, J. et al. (2025). Advancing Decoding Strategies: Enhancements in Locally Typical Sampling for LLMs. arXiv:2506.05387.

Notes

[1] Holtzman, A., Buys, J., Du, L., Forbes, M., & Choi, Y. (2019). The Curious Case of Neural Text Degeneration. arXiv:1904.09751.
[2] Hugging Face Transformers. Generation strategies (top-k, top-p, temperature).
[3] Microsoft Learn (Azure OpenAI). Text/Chat Completions: parameters. Recommendation to "alter temperature or top_p but not both".
[4] Google AI / Vertex AI. Generation parameters (topP/topK) for text/Gemini. Examples with topP ≈ 0.95.
[5] Transformers API. TopPLogitsWarper (parameters and behavior, including `min_tokens_to_keep`).
[6] OpenAI API Reference. top_p.
[7] Tan, Q. et al. (2024). A Thorough Examination of Decoding Methods in the Era of Large Language Models. arXiv:2402.06925.

See also

Temperature
Large language model
