Stop sequences (language models)

From Systems Analysis Wiki

A stop sequence, in the context of large language models (LLMs), is a special sequence of characters or tokens that signals the model to stop generating text[1]. This mechanism is a crucial component of autoregressive language models, ensuring controlled and predictable completion of the response.

When a stop sequence is used, the inference system checks at each generation step whether the text generated so far ends with one of the specified sequences. If a match is found, the process immediately terminates, and the stop sequence itself is not included in the final output[2]. This allows a developer to precisely control the boundaries of the response without altering the prompt itself.

Basic Principles of Operation

In autoregressive language models, text generation occurs sequentially, token by token. At each step, the model predicts the next token based on the entire preceding sequence (the input prompt and the text already generated). Mathematically, this is expressed as a conditional probability:

P(y_t | y_{<t}, x)

where y_t is the token currently being generated, y_{<t} is the sequence of previously generated tokens, and x is the input sequence[3].

The stop sequence mechanism functions as an external criterion for interrupting this iterative process.
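This external criterion can be sketched as a loop around the sampling step. In the sketch below, `toy_next_token` is a hypothetical stand-in for a real model's sampling step, used only so the example is self-contained:

```python
def toy_next_token(step: int) -> str:
    # Hypothetical stand-in for a real model's sampling step:
    # emits a fixed token stream that ends in an EOS-style marker.
    vocab = ["Hello", ",", " world", "!", "<|endoftext|>"]
    return vocab[step % len(vocab)]

def generate(prompt: str, stop_sequences: list[str], max_tokens: int = 50) -> str:
    output = ""
    for step in range(max_tokens):
        output += toy_next_token(step)
        # External stopping criterion: does the text generated so far
        # end with any of the configured stop sequences?
        for stop in stop_sequences:
            if output.endswith(stop):
                # The stop sequence itself is excluded from the result.
                return output[: -len(stop)]
    return output
```

With the toy token stream above, `generate("", ["<|endoftext|>"])` returns `"Hello, world!"` with the marker stripped.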

Types of Stop Sequences

There are several main types of stopping mechanisms that can be used either individually or in combination.

1. End-of-Sequence (EOS) Tokens

End-of-Sequence (EOS) tokens are special tokens (e.g., `<|endoftext|>`) built into the model's vocabulary, designed to signify the end of a logical text segment. The model is trained to generate an EOS token when it considers the response complete, as all texts in the training dataset end with this token[4]. Upon detection of an EOS token, generation automatically stops.

Research indicates that the presence of EOS tokens influences the attention architecture: models develop internal position-counting mechanisms, which, however, can limit their ability to extrapolate to sequences significantly longer than the training examples[5].

2. Custom Sequences

These are arbitrary strings that a developer specifies for a particular task. They are not part of the model's vocabulary but are tracked at the character level. Examples include:

  • Newline characters: `\n` or `\n\n` to stop after a paragraph.
  • Contextual markers: `Human:`, `User:`, or `Q:` to separate turns in a dialogue.
  • Special markers: `###`, `</output>`, or `END`.
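The effect of a custom sequence can also be reproduced after the fact. Below is a minimal sketch (the helper name `truncate_at_stop` is mine, not a library function) that cuts a completion at the earliest occurrence of any of several markers, mirroring how API back ends typically truncate:

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence,
    excluding the stop sequence itself (as APIs typically do)."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```

For example, `truncate_at_stop("Paris.\nQ: And Spain?", ["\nQ:", "\nHuman:"])` returns `"Paris."`, keeping the assistant from answering on the user's behalf.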

3. Structural Sequences

These are specialized markers used to terminate specific structural elements, which is critical when generating formatted content[1]:

  • Code: Triple backticks (```) to terminate a code block.
  • JSON/XML: Closing brackets (`}`) or tags (`</element>`).
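A fixed `}` stop sequence would fire on the first closing brace of a nested object, so structural stopping in practice needs to track nesting depth. A simplified sketch (the helper name is mine; it ignores braces inside string literals):

```python
def json_complete(text: str) -> bool:
    """Return True once every opened `{` has been closed.
    Simplified sketch: does not skip braces inside string literals."""
    depth = 0
    opened = False
    for ch in text:
        if ch == "{":
            depth += 1
            opened = True
        elif ch == "}":
            depth -= 1
            if opened and depth == 0:
                return True
    return False
```

Here `json_complete('{"a": {"b": 1}}')` is true, while stopping on the bare `}` after `"b": 1` would have cut the object short.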

Technical Implementation and Challenges

Effective detection of a stop sequence is non-trivial and involves several challenges.

Detection Algorithm and Optimization

The detection process in real-world systems includes:

  1. Check at each step: After generating each new token, the system checks if the current output ends with one of the specified stop sequences.
  2. Handling partial matches: The system must track situations where part of a sequence has been generated, but a full match has not yet occurred.
  3. Multi-criteria check: Most systems allow several stop sequences to be tracked simultaneously; the OpenAI API, for example, accepts up to four[2].
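Steps 1 and 2 can be combined in a small streaming detector. This is an illustrative sketch (the class and its interface are mine, not from any framework): it withholds any suffix of the output that could be the start of a stop sequence until the next token resolves the ambiguity.

```python
class StopDetector:
    """Incremental stop-sequence detector for streamed output (sketch).

    Text that could be the beginning of a stop sequence is buffered
    rather than emitted, so a marker split across several tokens is
    still caught."""

    def __init__(self, stop_sequences: list[str]):
        self.stops = stop_sequences
        self.buffer = ""

    def feed(self, token_text: str) -> tuple[str, bool]:
        """Consume one decoded token; return (text_safe_to_emit, stopped)."""
        self.buffer += token_text
        # Full match anywhere in the buffer: stop, emit only what precedes it.
        for stop in self.stops:
            idx = self.buffer.find(stop)
            if idx != -1:
                return self.buffer[:idx], True
        # Partial match: hold back the longest suffix that is a
        # prefix of some stop sequence.
        hold = 0
        for stop in self.stops:
            for k in range(1, len(stop)):
                if self.buffer.endswith(stop[:k]):
                    hold = max(hold, k)
        emitted = self.buffer[: len(self.buffer) - hold]
        self.buffer = self.buffer[len(self.buffer) - hold:]
        return emitted, False
```

Feeding `"abc#"` to a detector configured with `"###"` emits only `"abc"` and holds the trailing `"#"`; a subsequent `"##"` completes the marker and stops generation.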

In the Hugging Face Transformers framework, this is implemented through the abstract `StoppingCriteria` class, which allows for the creation of custom stopping criteria, such as `MaxLengthCriteria` (by length) or `EosTokenCriteria` (by EOS token)[4].

Problems and Limitations

  • Tokenization problem: This is the primary technical challenge. The same sequence of characters (e.g., `\nUser:`) can be split into tokens differently depending on the context. This complicates reliable detection, as a stop sequence might be divided across multiple tokens[5].
  • Performance: Checking for multiple long stop sequences at each step can slow down generation, especially when working with long sequences in real time.
  • False positives: A specified sequence might accidentally appear in the middle of the desired response, leading to premature termination. Therefore, it is important to choose sufficiently unique and specific markers (e.g., `\n###\n`)[6].

Applications and Use Cases

Stop sequences are a powerful tool for controlling the behavior of LLMs.

  • Controlling length and cost: They allow limiting the maximum response size and, consequently, reducing token consumption, which is important when using paid APIs.
  • Dialogue systems: Used to clearly separate turns between speakers, preventing the model-assistant from generating a response on behalf of the user.
  • Generating structured content: Indispensable for obtaining correct output in formats like JSON, XML, or when writing code, preventing the addition of extraneous information after the structure is complete[7].
  • Preventing undesirable behavior: Help to interrupt generation when repetitive or incorrect content (hallucinations) appears.
  • Training and fine-tuning: In training datasets, unique markers (e.g., `###`) are often used as stop sequences to teach the model to end the response at the correct point[6].

Current Research Directions

  • Adaptive stopping criteria: Development of methods that dynamically determine the termination point based on the context and quality of the generated text.
  • Entropy-based approaches: Using the entropy of the token distribution as a criterion. High entropy can indicate model uncertainty and serve as a signal to stop generation.
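The entropy criterion can be sketched in a few lines; the threshold value below is an illustrative choice, not an established standard:

```python
import math

def entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_stop(probs: list[float], threshold: float = 1.5) -> bool:
    """Hedged sketch: treat a high-entropy (uncertain) next-token
    distribution as a stopping signal. The threshold is illustrative."""
    return entropy(probs) > threshold
```

A sharply peaked distribution such as `[0.97, 0.01, 0.01, 0.01]` has low entropy and lets generation continue, while a uniform distribution over many tokens exceeds the threshold.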

Literature

  • Sutskever, I.; Vinyals, O.; Le, Q. V. (2014). Sequence to Sequence Learning with Neural Networks. arXiv:1409.3215.
  • Vaswani, A. et al. (2017). Attention Is All You Need. arXiv:1706.03762.
  • Keskar, N. S. et al. (2019). CTRL: A Conditional Transformer Language Model for Controllable Generation. arXiv:1909.05858.
  • Holtzman, A. et al. (2020). The Curious Case of Neural Text Degeneration. arXiv:1904.09751.
  • Brown, T. et al. (2020). Language Models are Few-Shot Learners. arXiv:2005.14165.
  • Zong, M.; Krishnamachari, B. (2022). A Survey on GPT-3. arXiv:2212.00857.
  • Zhao, Y. et al. (2022). Calibrating Sequence Likelihood Improves Conditional Language Generation. arXiv:2210.00045.
  • Hu, J. C.; Cavicchioli, R.; Capotondi, A. (2023). A Request for Clarity over the End-of-Sequence Token in the Self-Critical Sequence Training. arXiv:2305.12254.
  • Zhu, W. et al. (2024). Improving Open-Ended Text Generation via Adaptive Decoding. arXiv:2402.18223.
  • Zhang, H. et al. (2024). Adaptable Logical Control for Large Language Models. arXiv:2406.13892.
  • Suh, Y. J. et al. (2022). The Curious Case of Sequentially Mis-calibrated Language Models. arXiv:2205.11916.

Notes

  1. “Stop Sequence: Understanding & Setting It Correctly”. Promptitude.io Help Center.
  2. “How do I use stop sequences in the OpenAI API?”. OpenAI Help Center.
  3. “How to use stop sequences?”. Vellum.
  4. Zong, M.; Krishnamachari, B. “A Survey on GPT-3”. arXiv:2212.00857 [cs.CL], 1 Dec. 2022.
  5. Suh, Y. J., et al. “The Curious Case of Sequentially Mis-calibrated Language Models”. arXiv:2205.11916 [cs.CL], 24 May 2022.
  6. Eric, Mihail. “How to Finetune GPT3”. mihaileric.com.
  7. Corin, Daniel. “Way Enough - Cursor Triple Backticks Stop Sequence”. danielcorin.com.