YandexGPT (language model)
Jump to navigation
Jump to search
YandexGPT (Yet another GPT) is a family of large language models developed by Yandex and first introduced in May 2023.[1] YandexGPT models are used in the Alisa voice assistant, Yandex Search, and other services, and are also available via the public API of the Yandex Cloud platform.[2]
YaLM-100B (2022) was a preceding open-source research model with 100 billion parameters. It served as a "proof of concept," but YandexGPT was developed separately for commercial use.[3]
Release History
| Date | Release | Key Features |
|---|---|---|
| Jun 2022 | YaLM-100B | 100B parameters, 1.7 TB of data; Apache 2.0.[3] |
| May 17, 2023 | YandexGPT 1.0 | Integration into Alisa.[1] |
| Sep 7, 2023 | YandexGPT 2 | +67% quality improvement based on internal benchmarks.[4] |
| Mar 28, 2024 | YandexGPT 3 Pro / Lite | New enterprise API lineup.[5] |
| Oct 24, 2024 | YandexGPT 4 Pro / Lite | 32,000-token context; hidden reasoning (chain-of-thought).[6] |
| Feb 25, 2025 | YandexGPT 5 Pro | Parity with GPT-4o in 64% of tasks.[7] |
| Mar 31, 2025 | YandexGPT 5 Lite Instruct | 8-billion parameter model released open-source; Llama format.[8] |
Architecture and Training
- Base architecture: Transformer, optimized for the Russian language.
- YandexGPT 5 Lite: Llama-compatible; pre-training ≈ 15 trillion tokens, subsequent fine-tuning ≈ 320 billion.[8]
Context and Limits
- Architectural context limit: 32,000 tokens (versions 4/5).[6]
- The public API limits a single request (prompt + completion) to 7,400 tokens.[9]
- The maximum **response** size is 2,000 tokens, according to the "Quotas and limits" section.[10]
Current Models (June 2025)
| Model | Parameters | Context | License | Notes |
|---|---|---|---|---|
| YandexGPT 5 Pro | N/A | 32,000 | Proprietary | Available via API and Alisa Pro.[7] |
| YandexGPT 5 Lite | 8 billion | 32,000 | Yandex GPT-Lite License | Open-source; Llama-compatible.[8] |
| YaLM-100B | 100 billion | 2,048 | Apache 2.0 | Original project.[3] |
Benchmarks
- Internal tests: 5 Pro achieved parity with GPT-4o in 64% of tasks; performance improvement over 4 Pro is 67%.[7]
- ru-LLM Arena: YandexGPT holds the leading position in ELO rating among Russian-language models.[11]
Fine-tuning
The LoRA method is officially supported for 5 Lite; a usage example is published in the model card.[8]
API Modes
- Synchronous — for fast responses (Lite).
- Asynchronous — for resource-intensive tasks (Pro).[2]
Multimodality
The YandexGPT family remains text-based; multimodal services ("Neuro", "YandexArt", "Yandex Vision") are developed separately.[6]
Links
- YandexGPT in Yandex Cloud — service page
- YandexGPT-5 Lite weights on Hugging Face
- YaLM-100B repository on GitHub
Literature
- Matkin, N. et al. (2024). Comparative Analysis of Encoder-Based NER and Large Language Models for Skill Extraction from Russian Job Vacancies. arXiv:2407.19816.
- Tsanda, A.; Bruches, E. (2024). Russian-Language Multimodal Dataset for Automatic Summarization of Scientific Papers. arXiv:2405.07886.
- Goloburda, M. et al. (2025). Qorǵau: Evaluating LLM Safety in Kazakh-Russian Bilingual Contexts. arXiv:2502.13640.
- Togmanov, M. et al. (2025). KazMMLU: Evaluating Language Models on Kazakh, Russian, and Regional Knowledge of Kazakhstan. arXiv:2502.12829.
- Noels, S. et al. (2025). What Large Language Models Do Not Talk About: An Empirical Study of Moderation and Censorship Practices. arXiv:2504.03803.
Notes
- ↑ 1.0 1.1 "Yandex adds ChatGPT analog to Alisa". RBC. [1]
- ↑ 2.0 2.1 "Getting started with YandexGPT (Quickstart)". Yandex Cloud Docs. [2]
- ↑ 3.0 3.1 3.2 "yandex/YaLM-100B: Pretrained language model with 100B". GitHub. [3]
- ↑ "How Yandex decided to monetize its ChatGPT analog". RBC. [4]
- ↑ "Yandex introduced the third generation of YandexGPT neural networks". RBC. [5]
- ↑ 6.0 6.1 6.2 "A more powerful family of YandexGPT 4 models". Habr. [6]
- ↑ 7.0 7.1 7.2 "Yandex integrates YandexGPT 5 Pro into the Alisa Pro chat". AdIndex. [7]
- ↑ 8.0 8.1 8.2 8.3 "yandex/YandexGPT-5-Lite-8B-pretrain". Hugging Face. [8]
- ↑ "ChatYandexGPT API Reference (max_tokens = 7400)". LangChain Docs. [9]
- ↑ "Yandex Cloud service quotas and limits → Foundation Models". Yandex Cloud Docs. [10]
- ↑ "llmarena/llmarena — a Russian crowdsourcing platform for LLM evaluation". GitHub. [11]