YandexGPT (language model)

YandexGPT (Yet another GPT) is a family of large language models developed by Yandex and first introduced in May 2023.^[1] YandexGPT models are used in the Alisa voice assistant, Yandex Search, and other services, and are also available via the public API of the Yandex Cloud platform.^[2]

YaLM-100B (2022) was a preceding open-source research model with 100 billion parameters. It served as a "proof of concept," but YandexGPT was developed separately for commercial use.^[3]

Release History

Major Versions
Date	Release	Key Features
Jun 2022	YaLM-100B	100B parameters, 1.7 TB of data; Apache 2.0.^[3]
May 17, 2023	YandexGPT 1.0	Integration into Alisa.^[1]
Sep 7, 2023	YandexGPT 2	+67% quality improvement based on internal benchmarks.^[4]
Mar 28, 2024	YandexGPT 3 Pro / Lite	New enterprise API lineup.^[5]
Oct 24, 2024	YandexGPT 4 Pro / Lite	32,000-token context; hidden reasoning (chain-of-thought).^[6]
Feb 25, 2025	YandexGPT 5 Pro	Parity with GPT-4o in 64% of tasks.^[7]
Mar 31, 2025	YandexGPT 5 Lite Instruct	8-billion parameter model released open-source; Llama format.^[8]

Architecture and Training

Base architecture: Transformer, optimized for the Russian language.
YandexGPT 5 Lite: Llama-compatible; pre-training ≈ 15 trillion tokens, subsequent fine-tuning ≈ 320 billion.^[8]

Context and Limits

Architectural context limit: 32,000 tokens (versions 4/5).^[6]
The public API limits a single request (prompt + completion) to 7,400 tokens.^[9]
The maximum **response** size is 2,000 tokens, according to the "Quotas and limits" section.^[10]

Current Models (June 2025)

Model	Parameters	Context	License	Notes
YandexGPT 5 Pro	N/A	32,000	Proprietary	Available via API and Alisa Pro.^[7]
YandexGPT 5 Lite	8 billion	32,000	Yandex GPT-Lite License	Open-source; Llama-compatible.^[8]
YaLM-100B	100 billion	2,048	Apache 2.0	Original project.^[3]

Benchmarks

Internal tests: 5 Pro achieved parity with GPT-4o in 64% of tasks; performance improvement over 4 Pro is 67%.^[7]
ru-LLM Arena: YandexGPT holds the leading position in ELO rating among Russian-language models.^[11]

Fine-tuning

The LoRA method is officially supported for 5 Lite; a usage example is published in the model card.^[8]

API Modes

Synchronous — for fast responses (Lite).
Asynchronous — for resource-intensive tasks (Pro).^[2]

Multimodality

The YandexGPT family remains text-based; multimodal services ("Neuro", "YandexArt", "Yandex Vision") are developed separately.^[6]

Links

YandexGPT in Yandex Cloud — service page
YandexGPT-5 Lite weights on Hugging Face
YaLM-100B repository on GitHub

Literature

Matkin, N. et al. (2024). Comparative Analysis of Encoder-Based NER and Large Language Models for Skill Extraction from Russian Job Vacancies. arXiv:2407.19816.
Tsanda, A.; Bruches, E. (2024). Russian-Language Multimodal Dataset for Automatic Summarization of Scientific Papers. arXiv:2405.07886.
Goloburda, M. et al. (2025). Qorǵau: Evaluating LLM Safety in Kazakh-Russian Bilingual Contexts. arXiv:2502.13640.
Togmanov, M. et al. (2025). KazMMLU: Evaluating Language Models on Kazakh, Russian, and Regional Knowledge of Kazakhstan. arXiv:2502.12829.
Noels, S. et al. (2025). What Large Language Models Do Not Talk About: An Empirical Study of Moderation and Censorship Practices. arXiv:2504.03803.

Notes

↑ ^1.0 ^1.1 "Yandex adds ChatGPT analog to Alisa". RBC. [1]
↑ ^2.0 ^2.1 "Getting started with YandexGPT (Quickstart)". Yandex Cloud Docs. [2]
↑ ^3.0 ^3.1 ^3.2 "yandex/YaLM-100B: Pretrained language model with 100B". GitHub. [3]
↑ "How Yandex decided to monetize its ChatGPT analog". RBC. [4]
↑ "Yandex introduced the third generation of YandexGPT neural networks". RBC. [5]
↑ ^6.0 ^6.1 ^6.2 "A more powerful family of YandexGPT 4 models". Habr. [6]
↑ ^7.0 ^7.1 ^7.2 "Yandex integrates YandexGPT 5 Pro into the Alisa Pro chat". AdIndex. [7]
↑ ^8.0 ^8.1 ^8.2 ^8.3 "yandex/YandexGPT-5-Lite-8B-pretrain". Hugging Face. [8]
↑ "ChatYandexGPT API Reference (max_tokens = 7400)". LangChain Docs. [9]
↑ "Yandex Cloud service quotas and limits → Foundation Models". Yandex Cloud Docs. [10]
↑ "llmarena/llmarena — a Russian crowdsourcing platform for LLM evaluation". GitHub. [11]

[rbc-may17-1] 1.0 ^1.1 "Yandex adds ChatGPT analog to Alisa". RBC. [1]

[cloud-api-2] 2.0 ^2.1 "Getting started with YandexGPT (Quickstart)". Yandex Cloud Docs. [2]

[github-yalm-3] 3.0 ^3.1 ^3.2 "yandex/YaLM-100B: Pretrained language model with 100B". GitHub. [3]

[rbc-sep8-4] "How Yandex decided to monetize its ChatGPT analog". RBC. [4]

[rbc-mar28-5] "Yandex introduced the third generation of YandexGPT neural networks". RBC. [5]

[habr-gpt4-6] 6.0 ^6.1 ^6.2 "A more powerful family of YandexGPT 4 models". Habr. [6]

[adindex-5pro-7] 7.0 ^7.1 ^7.2 "Yandex integrates YandexGPT 5 Pro into the Alisa Pro chat". AdIndex. [7]

[hf-5lite-8] 8.0 ^8.1 ^8.2 ^8.3 "yandex/YandexGPT-5-Lite-8B-pretrain". Hugging Face. [8]

[langchain-7400-9] "ChatYandexGPT API Reference (max_tokens = 7400)". LangChain Docs. [9]

[cloud-limits-10] "Yandex Cloud service quotas and limits → Foundation Models". Yandex Cloud Docs. [10]

[llmarena-11] "llmarena/llmarena — a Russian crowdsourcing platform for LLM evaluation". GitHub. [11]

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]