Gemini (Google)
Google Gemini is a family of multimodal large language models (LLMs) developed by Google DeepMind. The Gemini models, first introduced in December 2023, are built on the Transformer neural-network architecture with native support for processing and generating data across multiple modalities, including text, images, audio, video, and code.
As of February 2026, the current generation is the Gemini 3.x lineup. Architectural development is focused on integrating scalable inference-time reasoning mechanisms (inference-time scaling) and optimizing models for use within autonomous agentic systems (Agentic AI). The Gemini app has over 750 million monthly active users.
Naming and philosophy
The name "Gemini" (Latin for Twins) symbolizes the merger of two leading Google research groups — Google Brain and DeepMind — to create this project. Jeff Dean, co-technical lead of Google DeepMind, confirmed this in an official blog post (May 2024): "The twins here are the folks in the legacy Brain team and the legacy DeepMind team." The project was originally codenamed "Titan"; Dean proposed the name "Gemini" in April 2023 — the same month as the formal merger of Google Brain and DeepMind. The name also references NASA's Gemini program (1965–1968), whose role in preparing for the Apollo program resonated with the development team.
A key feature and philosophical foundation of Gemini is native multimodality. Unlike many previous models, where multimodal capabilities were layered on top of an existing text-based foundation, Gemini was designed from the ground up for simultaneous understanding, manipulation, and combination of different types of information. The Gemini 1.0 technical report (arXiv:2312.11805) confirms that the model was "trained jointly across image, audio, video, and text data." This enables the model not merely to translate data between modalities, but to form a deeper, holistic understanding of them.
Architecture and key technologies
The capabilities of Gemini models are defined by a number of fundamental architectural decisions. Google does not publish the complete low-level design of all internal Gemini components; however, public sources establish the architecture class: all models from the 1.5 family onward are sparse mixture-of-experts transformer-based models with native multimodal support (confirmed by the Gemini 2.5 Flash model card).
Native multimodal architecture
Gemini's architecture is based on the concept of early fusion. Image pixel patches, video frames, audio frames, and text tokens are projected into a unified latent space. The Gemini 2.5 technical report describes this approach as "Unified Multimodal Token Interleaving." Since tokens from all modalities are processed within a shared sequence, standard self-attention mechanisms naturally provide cross-modal integration at every layer. Audio signals are processed by specialized encoders directly from the raw waveform, preserving acoustic characteristics (intonation, timbre, background noise) that are lost when an intermediate Speech-to-Text transcription stage is used.
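The early-fusion idea can be sketched in a few lines: each modality gets its own learned projection into a shared embedding space, after which the projected tokens form one interleaved sequence that self-attention processes jointly. All dimensions, tables, and projection matrices below are illustrative random stand-ins, not Gemini's actual components.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64  # shared embedding width (illustrative)

# Per-modality inputs: text token ids, image patches, audio frames.
text_ids = np.array([5, 17, 42])                  # 3 text tokens
image_patches = rng.normal(size=(4, 768))         # 4 flattened image patches
audio_frames = rng.normal(size=(2, 128))          # 2 raw-waveform frame features

# Learned projections into the unified latent space (random stand-ins here).
text_emb = rng.normal(size=(100, d_model))        # vocabulary embedding table
W_img = rng.normal(size=(768, d_model))
W_aud = rng.normal(size=(128, d_model))

tokens = np.concatenate([
    text_emb[text_ids],          # (3, d_model)
    image_patches @ W_img,       # (4, d_model)
    audio_frames @ W_aud,        # (2, d_model)
])                               # one interleaved sequence: (9, d_model)

print(tokens.shape)  # (9, 64)
```

Once everything lives in one sequence, no special cross-modal machinery is needed: ordinary self-attention over `tokens` mixes modalities at every layer.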
For the transformer class, the fundamental operation is the scaled dot-product attention mechanism:

Attention(Q, K, V) = softmax(QKᵀ / √d_k) V

where Q is the query matrix, K is the key matrix, V is the value matrix, and d_k is the key dimensionality.
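A minimal reference implementation of this formula (plain NumPy, a single attention head, no masking):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q Kᵀ / √d_k) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_q, n_k) similarity logits
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)              # numerically stable row softmax
    return w @ V                                    # each output row is a convex
                                                    # combination of value rows

rng = np.random.default_rng(1)
Q, K, V = rng.normal(size=(3, 4, 8))                # three (4, 8) matrices
out = attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Because the softmax weights are non-negative and sum to one per query, every output row stays inside the convex hull of the value rows.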
Sparse Mixture-of-Experts (MoE)
Starting with version 1.5, Gemini models use a Sparse Mixture-of-Experts (MoE) architecture. Gemini 1.0 used a dense transformer; the transition to MoE is explicitly described in the 1.5 technical report: "This is our first release from Gemini 1.5, a new family… which incorporates a novel mixture-of-experts architecture."
In the MoE architecture, standard feed-forward network (FFN) layers are replaced by a set of specialized sub-networks, the "experts." For an input token x, the output y is computed as the weighted sum of outputs from the k active experts (k ≪ N, where N is the total number of experts):

y = Σ_{i ∈ T} g_i(x) · E_i(x)

where E_i is the nonlinear function of the i-th expert, T is the index set of the k selected sub-networks, and each routing weight g_i(x) is computed by a learned routing function applying a Softmax over the top-k router values.
This approach allows the total parametric capacity of the model to be significantly increased while keeping computational costs (FLOPs) low, since only a subset of parameters is activated for each token. Google has not disclosed the actual parameter count of Gemini models.
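The routing logic above can be illustrated with a toy layer. The experts, router matrix, and top-k choice below are random stand-ins, not Gemini's actual configuration; the point is that only k of N expert networks run per token while all N contribute to parameter count.

```python
import numpy as np

def moe_layer(x, experts, router_W, k=2):
    """Sparse MoE: route token x to the top-k of N experts, mix their outputs.

    experts:  list of N callables (stand-ins for FFN sub-networks).
    router_W: (d, N) learned routing matrix (random stand-in here).
    """
    logits = x @ router_W                     # one router score per expert
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                  # softmax over the top-k only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(2)
d, N = 16, 8
# Each lambda captures its own weight matrix via the default argument.
experts = [lambda x, W=rng.normal(size=(d, d)): np.tanh(x @ W) for _ in range(N)]
router_W = rng.normal(size=(d, N))
y = moe_layer(rng.normal(size=d), experts, router_W, k=2)
print(y.shape)  # (16,) -- only 2 of the 8 experts did any work
```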
Long context and in-context learning
Gemini 1.5 achieved a breakthrough by expanding the context window to 1 million tokens in production mode (with experimental testing up to 10 million tokens). This is an order of magnitude larger than previous models (e.g., GPT-4 Turbo with 128K tokens). Google reported a 99% score on the Needle In A Haystack test at a context length of 1 million tokens. For subsequent generations, long context became one of the key features of the lineup. This large-scale context enables the model to:
- Analyze entire books, hours-long videos (up to 3 hours), or large codebases within a single query.
- Perform in-context learning on vast amounts of data provided in the prompt, enabling highly customized responses without the need for fine-tuning.
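The Needle In A Haystack protocol mentioned above is simple to sketch: hide a short "needle" fact at a controlled depth inside filler text, then check whether it can be recovered from the full context. In this toy harness a regex stands in for the model call, so only the evaluation procedure is illustrated, not any model capability.

```python
import re

def build_haystack(needle, filler, n_filler, depth):
    """Insert `needle` at fractional `depth` (0.0 = start, 1.0 = end) of filler."""
    parts = [filler] * n_filler
    pos = int(depth * n_filler)
    return " ".join(parts[:pos] + [needle] + parts[pos:])

def recall_needle(context):
    """Stand-in for querying the model: pull the hidden number back out."""
    m = re.search(r"secret number is (\d+)", context)
    return m.group(1) if m else None

needle = "The secret number is 7241."
for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    haystack = build_haystack(needle, "Filler text.", 10_000, depth)
    assert recall_needle(haystack) == "7241"
print("needle recovered at every depth")
```

A real evaluation sweeps both context length and depth and reports recall per cell; the 99% figure Google reported is the aggregate over such a grid.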
Thinking models and inference-time compute scaling
Starting with Gemini 2.5, Google designates thinking as a separate operating mode. Official documentation defines it as an internal computational process that improves multi-step planning and reasoning. Version 2.5 models (described as "thinking models") are capable of internally generating and evaluating intermediate reasoning steps before producing a final answer. This significantly improves accuracy on complex logical and mathematical tasks.
It is important to distinguish between two mechanisms:
- Built-in Thinking: The base mode for 2.5 and 3-series models, generating a hidden chain-of-thought (CoT). The API can return thought summaries (brief digests of internal reasoning rather than the full stream of raw "thoughts"). Starting with the 3.1 Pro model, the thinking budget is controlled by the `thinking_level` parameter, with values from Low to Max.
- Deep Think: A separate experimental enhanced-reasoning mode that uses parallel hypothesis generation and requires significantly greater computational resources. It was announced at Google I/O on May 20, 2025, and made available to AI Ultra subscribers on August 1, 2025. Deep Think should not be conflated with the base thinking mechanism.
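Google has not disclosed Deep Think's internals. As a loose analogy only, parallel hypothesis generation with voting (the self-consistency technique, in the spirit of the CoT@32 protocol used for Gemini 1.0 Ultra on MMLU) shows why sampling many reasoning chains buys accuracy at extra compute. The solver below is a synthetic stand-in, not a model.

```python
import random
from collections import Counter

def noisy_solver(x, error_rate, rng):
    """Stand-in for one sampled reasoning chain: correct with prob 1 - error_rate."""
    if rng.random() < error_rate:
        return x * x + rng.randint(1, 5)  # a plausible-looking wrong answer
    return x * x

def deep_vote(x, n_chains, error_rate, seed=0):
    """Sample many hypotheses in parallel, then majority-vote the final answer."""
    rng = random.Random(seed)
    answers = [noisy_solver(x, error_rate, rng) for _ in range(n_chains)]
    return Counter(answers).most_common(1)[0][0]

# One chain errs 30% of the time; 31 voted chains almost never do,
# because wrong answers scatter while the right one concentrates.
voted = deep_vote(12, n_chains=31, error_rate=0.3)
```

The compute cost scales linearly with the number of chains, which is why such modes are priced and latency-budgeted separately.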
Agentic capabilities
Starting with version 2.0, Gemini can interact with the external world: invoke tools, perform Google Search, execute code, and control UI elements. Google explicitly positioned Gemini 2.0 as a model for the "new agentic era" with native tool use support.
As of February 2026, the Gemini API includes a formally established agentic capabilities layer with support for tools: Google Search, Google Maps, Code Execution, URL Context, Computer Use, File Search, and Live API for bidirectional real-time interaction.
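Conceptually, every such tool reduces to the same loop: the model emits a structured tool call, the runtime executes it, and the observation is fed back for the next turn. The registry and message format below are hypothetical illustrations, not the Gemini API's actual wire format (which declares function schemas to the API).

```python
import json

# Hypothetical tool registry; names chosen to echo the tools listed above.
TOOLS = {
    "code_execution": lambda args: str(eval(args["expression"], {"__builtins__": {}})),
    "search": lambda args: f"results for {args['query']!r}",
}

def run_agent_step(model_output):
    """If the model emitted a tool call, execute it and return the observation."""
    msg = json.loads(model_output)
    if msg.get("type") == "tool_call":
        result = TOOLS[msg["name"]](msg["args"])
        return {"type": "tool_result", "name": msg["name"], "content": result}
    return {"type": "final", "content": msg["content"]}

# Simulated model turn asking the runtime to evaluate an expression.
obs = run_agent_step('{"type": "tool_call", "name": "code_execution", '
                     '"args": {"expression": "2**10"}}')
print(obs["content"])  # 1024
```

The production loop differs mainly in sandboxing (code runs in isolated environments, not `eval`) and in that tool results are appended to the model's context for another generation turn.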
Evolution of Gemini models
The Gemini family evolves at an exceptionally rapid pace: four major model generations were released between December 2023 and February 2026.
Gemini 1.0 (December 2023)
The first generation, establishing the foundation of native multimodality. Publicly introduced on December 6, 2023.
- Versions: Ultra (flagship for the most complex tasks), Pro (general-purpose model), and Nano (compact for mobile devices; subdivided into Nano-1 with 1.8B parameters and Nano-2 with 3.25B).
- Context window: 32,768 tokens for all versions.
- Achievements: Gemini 1.0 Ultra became the first model to reach and surpass human-expert performance on the MMLU benchmark with a score of 90.04% (using the CoT@32 technique — chain-of-thought with 32 sampled chains and majority voting; under standard 5-shot prompting, the score was approximately 83.7%). It achieved SOTA results on 30 out of 32 academic benchmarks.
- Deprecation: Gemini 1.0 Pro was deprecated on February 18, 2025.
Gemini 1.5 (February–May 2024)
A breakthrough in context length and efficiency.
- Architecture: Transition from dense transformer to Mixture-of-Experts (MoE).
- Context window: Up to 1 million tokens in production (2 million via waitlist for 1.5 Pro, announced at Google I/O in May 2024).
- Versions: 1.5 Pro (announced February 2024; quality on par with 1.0 Ultra at significantly lower cost) and 1.5 Flash (lightweight and fast version, added May 2024).
- Deprecation: All Gemini 1.5 models (Pro, Flash, Flash-8B) were shut down on September 29, 2025.
Gemini 2.0 (December 2024–February 2025)
The transition to the "agentic era."
- Timeline: December 11, 2024 — announcement of 2.0 Flash Experimental (multimodal input, text output); February 5, 2025 — broad availability (GA) of 2.0 Flash, release of 2.0 Pro Experimental and 2.0 Flash-Lite.
- Key innovations: Built-in agentic capabilities (tool use), native image and audio generation (initially in limited mode for early-access partners), focus on agentic scenarios.
- Context window: Up to 2M tokens (2.0 Pro); up to 1M tokens (2.0 Flash-Lite).
- Deprecation: 2.0 Flash and Flash-Lite models are scheduled for shutdown on June 1, 2026.
Gemini 2.5 (March–June 2025)
The first "thinking model" with configurable reasoning budgets.
- Timeline: March 25, 2025 — announcement of 2.5 Pro Experimental; April 17 — 2.5 Flash (the first fully hybrid reasoning model with toggleable thinking); May 20 (Google I/O) — updates to 2.5 Pro and Flash, Deep Think announcement; June 17, 2025 — simultaneous GA for 2.5 Pro and 2.5 Flash; same day — 2.5 Flash-Lite preview (GA July 22). August 1 — Deep Think made available to AI Ultra subscribers.
- Key innovations: Built-in "thinking" mechanism with configurable budgets; Deep Think as a separate enhanced mode. SOTA results on complex mathematical, logical, and software benchmarks (AIME 2025 — 86.7%, GPQA Diamond — 84.0%, Humanity's Last Exam — 18.8% without tools).
- Context window: 1 million tokens input, up to 64,000 tokens output. The promised expansion to 2M tokens for 2.5 Pro was never confirmed as delivered during the model's lifecycle.
- Specialized variants: Gemini 2.5 Flash Image (codename "Nano Banana"; appeared anonymously on LMArena on August 12, officially released August 26, 2025; went viral for photorealistic "3D figurine" images, attracting 10 million new users); Computer Use Preview (October 7, 2025, based on 2.5 Pro); Text-to-Speech models (2.5 Flash TTS, 2.5 Pro TTS).
- Technical report: The combined Gemini 2.X report was published on arXiv on July 7, 2025 (arXiv:2507.06261), listing over 3,300 authors and covering models 2.5 Pro, 2.5 Flash, 2.0 Flash, and 2.0 Flash-Lite.
Gemini 3.x (November 2025–February 2026)
The third generation marked the transition from basic generation to long-running agentic workflows and interdisciplinary scientific problem-solving.
- Gemini 3 Pro (November 18, 2025): Announced by Alphabet CEO Sundar Pichai and DeepMind CEO Demis Hassabis as "Google's most intelligent model." The first Gemini model deployed to Google Search on launch day. Became the first model to break the 1,500 Elo barrier on LMArena (1,501 at launch). Results: GPQA Diamond — 91.9%; SWE-bench Verified — 76.2%; Humanity's Last Exam — 37.5% (without tools); SimpleQA — 72.1%.
- Gemini 3 Flash (December 17, 2025): Became the default model in the Gemini app. At a price of $0.50/1M input tokens, it outperformed 3 Pro on SWE-bench Verified (78%) while using 30% fewer tokens on reasoning tasks. GPQA Diamond — 90.4%; HLE — 33.7%.
- Gemini 3.1 Pro (February 19, 2026): The flagship model as of the publication date. The first incremental ".1" release (previous generations used .5 intervals). Key result on ARC-AGI-2: 77.1% (more than double 3 Pro's 31.1%). AIME 2025 — 91.2%; GPQA Diamond — 94.3%; SWE-bench Verified — 80.6%. Introduced a new MEDIUM thinking level via the `thinking_level` parameter and a dedicated endpoint `gemini-3.1-pro-preview-customtools` for bash terminal and custom function use. Resolved output truncation issues on long generations. Channels: Gemini App, Vertex AI, AI Studio, Gemini API, NotebookLM.
- Gemini 3 Deep Think (updated February 12, 2026): A major update to the specialized "thinking" mode. Expanded beyond mathematics and programming: gold-medal-level results on the 2025 International Physics Olympiad (IPhO) and International Chemistry Olympiad (IChO); ARC-AGI-2 — 84.6%; Humanity's Last Exam — 48.4%; CMT-Benchmark (condensed-matter theoretical physics) — 50.5%; Codeforces Elo — 3,455. The Deep Think-based research agent Aletheia autonomously solved several open problems from the Erdős problem collection (including the Erdős-1051 conjecture).
Summary table of Gemini generations
| Generation | Release year | Key versions | Max context window | Key architectural innovations and improvements |
|---|---|---|---|---|
| Gemini 1.0 | 2023 | Ultra, Pro, Nano | 32,768 tokens | Native multimodality from scratch; dense transformer; surpassing human expert on MMLU (90.04% CoT@32). |
| Gemini 1.5 | 2024 | Pro, Flash | 1,000,000 tokens (2M via waitlist) | Mixture-of-Experts (MoE) architecture; revolutionary context expansion; 99% Needle In A Haystack. |
| Gemini 2.0 | 2024–2025 | Pro, Flash, Flash-Lite | 1,000,000–2,000,000 tokens | The "agentic AI" era: native tool integration, image and audio generation, Live API. |
| Gemini 2.5 | 2025 | Pro, Flash, Flash-Lite | 1,000,000 tokens (input), 64,000 (output) | "Thinking model"; configurable reasoning budgets; Deep Think; image generation (Nano Banana); Computer Use. |
| Gemini 3.x | 2025–2026 | 3 Pro, 3 Flash, 3.1 Pro, 3 Deep Think | 1,000,000 tokens | Agentic workflows; thinking_level parameter; breakthroughs on ARC-AGI-2 and science olympiads; Aletheia. |
Key results and benchmarks
As classical benchmarks (such as MMLU) have become saturated, evaluation of Gemini model performance has shifted toward abstract reasoning, scientific modeling, and autonomous software engineering tasks. Results are reported from official Google data (self-reported); comparisons are valid only when inference mode, tool use presence, sampling method (single-attempt vs. majority voting), and specific model-id all match.
| Benchmark | Task description | Gemini 2.5 Pro (Jun 2025) | Gemini 3 Pro (Nov 2025) | Gemini 3.1 Pro (Feb 2026) | Gemini 3 Deep Think (Feb 2026) |
|---|---|---|---|---|---|
| MMLU | Multitask language understanding (saturated; no longer headlined) | — | — | — | — |
| GPQA Diamond | PhD-level science questions | 84.0% | 91.9% | 94.3% | N/A |
| Humanity's Last Exam | Frontier domain knowledge | 18.8% | 37.5% | 44.4% | 48.4% |
| ARC-AGI-2 | Abstract logical puzzles | 4.9% | 31.1% | 77.1% | 84.6% |
| SWE-bench Verified | Autonomous GitHub issue resolution | 63.8%* | 76.2% | 80.6% | N/A |
| AIME 2025 | Olympiad-level math problems | 86.7% | — | 91.2% | — |
| Codeforces (Elo) | Competitive programming rating | — | — | 2,887 | 3,455 |
* The 2.5 Pro result on SWE-bench was obtained with a custom agent setup.
LMArena rankings (late February 2026 snapshot)
LMArena (formerly Chatbot Arena) is an independent platform for blind pairwise voting. Rankings are dynamically recalculated; values at a model's launch date may differ from current ones.
| Model | Rating | Rank | Votes | Note |
|---|---|---|---|---|
| Gemini 3.1 Pro Preview | 1,500 ± 9 | #3 | 4,060 | Preliminary |
| Gemini 3 Pro | 1,486 ± 4 | #5 | 37,854 | |
| Gemini 3 Flash | 1,473 ± 5 | #7 | 28,847 | |
| Gemini 2.5 Pro | 1,464 ± 3 | #9 | 97,296 | |
| Gemini 2.5 Flash | 1,411 ± 3 | #64 | 96,163 |
At launch on November 18, 2025, Gemini 3 Pro reached a rating of 1,501 Elo, becoming the first model to break the 1,500 barrier on LMArena.
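LMArena-style ratings follow the standard Elo update rule from pairwise votes; a sketch with an illustrative K-factor:

```python
def elo_update(r_winner, r_loser, k=32):
    """One Elo update after a pairwise vote: winner gains what the loser sheds."""
    expected = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))  # P(winner wins)
    delta = k * (1.0 - expected)            # surprise-weighted adjustment
    return r_winner + delta, r_loser - delta

# A favorite beating an underdog moves little; an upset moves much more.
favorite_gain = elo_update(1501.0, 1450.0)[0] - 1501.0
upset_gain = elo_update(1450.0, 1501.0)[0] - 1450.0
assert upset_gain > favorite_gain
```

This is also why rankings drift after launch: each model's rating keeps updating as votes accumulate, which the table's snapshot caveat reflects.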
Specialized and agentic systems
The Gemini ecosystem has been extended with models and platforms capable of performing multi-step actions in digital and physical environments.
Autonomous agents
- Jules — an autonomous coding agent operating asynchronously in secure cloud virtual machines. It creates branches and pull requests on GitHub. Entered public beta at Google I/O on May 20, 2025 (over 140,000 code improvements during the beta period); GA on August 6, 2025. By late 2025, it had become one of the largest contributors to Google's internal repositories.
- Project Mariner — a research prototype of a browser-based agent for multi-step web tasks. Migrated to cloud VMs supporting up to 10 parallel tasks and a "Teach & Repeat" feature. Achieved 83.5% on the WebVoyager benchmark. Computer Use capabilities were ported to the Gemini API.
- Google Antigravity — an integrated development environment (IDE) for managing AI agents, introduced in November 2025. Agents autonomously modify code, interact with the terminal and a built-in browser, returning verifiable artifacts (e.g., code diffs) for developer approval.
- Aletheia agent — a specialized mathematical research agent built on Gemini 3 Deep Think. Equipped with a natural-language verification module and web-search tools for literature review. In early 2026, it autonomously solved several open mathematical problems from the Erdős collection and co-authored scientific publications.
Consumer AI agents
- Phone Automations — integration of an autonomous agent at the Android OS level (beta for Pixel 10 and Samsung Galaxy S26). Operates within a secure sandbox, capable of navigating third-party applications based on visual GUI analysis.
- Gemini in Chrome (Auto Browse) — a browser agent for automating multi-step web tasks, available to all Chrome users since September 2025 (updated to Gemini 3 in January 2026).
Computer Use
Gemini 2.5 Computer Use models are optimized for controlling graphical user interfaces (GUIs). The system takes screenshots and action history as input, generating coordinates for programmatic cursor simulation and keyboard input commands.
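The screenshot-to-action cycle can be sketched as a generic agent loop. The environment and policy below are deliberate stand-ins (a fake GUI object and a scripted rule), not the actual Computer Use model; only the loop structure mirrors the description above.

```python
from dataclasses import dataclass

@dataclass
class FakeGUI:
    """Stand-in environment: one button, plus a flag recording a click."""
    button_at: tuple = (100, 200)
    clicked: bool = False

    def screenshot(self):
        return {"button_at": self.button_at, "clicked": self.clicked}

    def act(self, action):
        if action["type"] == "click" and action["xy"] == self.button_at:
            self.clicked = True

def scripted_policy(screenshot, history):
    """Stand-in for the model: inspect the screen state, emit the next action."""
    if not screenshot["clicked"]:
        return {"type": "click", "xy": screenshot["button_at"]}
    return {"type": "done"}

def run_loop(env, policy, max_steps=10):
    """Generic screenshot -> decide -> execute loop with an action history."""
    history = []
    for _ in range(max_steps):
        action = policy(env.screenshot(), history)
        if action["type"] == "done":
            break
        env.act(action)
        history.append(action)
    return history

env = FakeGUI()
trace = run_loop(env, scripted_policy)
```

The real system differs in that the "screenshot" is pixels and the policy must ground coordinates visually, but the observe-act-re-observe control flow is the same.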
Gemini Robotics
Vision-Language-Action (VLA) and Embodied Reasoning (ER) models introduced in March 2025. These architectures process spatiotemporal information and predict 3D trajectories of robotic manipulators as a native output modality (arXiv:2503.20020).
Specialized generative models (early 2026)
- Nano Banana 2 (Gemini 3.1 Flash Image) — released February 26, 2026; a visual model combining Flash-architecture speed with Pro-level quality. Provides strict character consistency across different scenes, native text-in-image generation (typography), and integration of SynthID cryptographic watermarks with C2PA metadata.
- Lyria 3 — a music model integrated into the Gemini app on February 18, 2026. Generates 30-second musical compositions (including vocals and instrumentals) from text prompts, uploaded photos, or videos.
- Veo 3.1 — a video generation model. Supports creating clips using up to three reference images ("Ingredients to Video"), transition generation between specified first and last frames, native vertical video rendering (9:16), and upscaling to 4K resolution.
- Med-Gemini — a domain-specific model for medical tasks (arXiv:2404.18416, arXiv:2405.03162).
Applications and ecosystem
Google deeply integrates Gemini into its consumer and developer products.
Consumer products
- Gemini app: A chatbot (formerly Bard, renamed February 8, 2024) using Gemini family models as a universal AI assistant. As of February 2026, it has over 750 million active users. Current rollout includes the 3.1 Pro model. Subscriptions: Google AI Pro ($19.99/month, replaced Google One AI Premium) and Google AI Ultra ($249.99/month, with access to Deep Think, Veo 3, and priority features).
- Google Workspace: Gemini integration in Gmail, Docs, Sheets, and Meet for writing assistance, data analysis, and content generation (rebranded from Duet AI).
- Google Search: The AI Overviews feature generates summary answers to complex queries using a specialized Gemini model. AI Mode, launched at Google I/O 2025, provides deep search with agentic capabilities (booking, shopping).
- Android and Pixel: Gemini Nano (v3 on Pixel 10 with Tensor G5 chip, August 2025) runs locally on smartphones, providing smart replies, summarization, scam call detection, and accessibility features while preserving data privacy. ML Kit GenAI APIs for developers support on-device summarization, proofreading, and speech recognition.
- NotebookLM: Evolved from a note-taking tool into a full creative platform. Joined Google Workspace in March 2025. Supports interactive Audio Overviews, Video Overviews, Mind Maps, slides, and infographics. Updated to Gemini 3 in December 2025; full 1M-token context window for chat available from February 2026.
- Gemini Live: Camera and screen-sharing features from Project Astra became free for all Android and iOS users.
Developer platforms
- Google AI Studio and Gemini API: Primary interfaces for accessing Gemini models via API. As of February 2026, they support capability blocks: Thinking, Thought signatures, Long context, Tools and agents (Google Search, Maps, Code Execution, URL Context, Computer Use, File Search, Deep Research, Live API).
- Vertex AI: Enterprise platform with enhanced security and management capabilities.
- Google Gen AI SDK: Reached GA for Python, JavaScript/TypeScript, Go, and Java by May 2025, providing unified access to the Gemini Developer API and Vertex AI. Supports Model Context Protocol (MCP).
- Gemini CLI: A command-line tool for AI coding in the terminal (launched June 2025).
- Interactions API: A unified interface for models and agents (beta since December 2025).
API lifecycle and version management
Gemini models in the API are categorized as stable, preview, latest, and experimental. A specific model_id and a model family are not the same thing; for production scenarios, binding to a specific version and its support timeline is critical. The API documentation maintains a deprecation registry with shutdown dates.
To support long-running autonomous tasks, the following were introduced: Session Resumption (server-side session state storage for up to 24 hours) and Context Compression (a sliding-window mechanism for automatic context compression when limits are exceeded).
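Context Compression is described only at a high level; one common sliding-window variant keeps a pinned prefix (e.g. the system prompt) plus the newest turns that fit a token budget, dropping or summarizing the middle. The whitespace tokenizer and marker string below are illustrative assumptions, not Google's implementation.

```python
def compress_context(turns, budget, pinned=1, tokens=lambda t: len(t.split())):
    """Keep the first `pinned` turns plus the newest turns that fit in `budget`."""
    if sum(tokens(t) for t in turns) <= budget:
        return turns                              # nothing to do yet
    head = turns[:pinned]
    kept, used = [], sum(tokens(t) for t in head)
    for turn in reversed(turns[pinned:]):         # walk back from the newest turn
        cost = tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return head + ["[earlier turns compressed]"] + kept[::-1]

turns = ["system: be helpful"] + [f"user turn {i} with some words" for i in range(20)]
small = compress_context(turns, budget=40)
```

A summarizing variant would replace the marker with a model-generated digest of the dropped turns instead of discarding them outright.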
In December 2025, Google reduced free-tier API quotas by approximately 92% (without prior notice), causing a sharp reaction from the developer community. Meanwhile, Gemini serving unit costs fell by 78% over 2025 through model optimizations.
Limitations and open problems
- Hallucinations and confabulations: Models retain a tendency to generate factually incorrect information, especially when grounding features (Search Grounding) are disabled. Gemini 3.1 Pro reduced hallucination rates on the SimpleQA benchmark compared to previous versions, but the problem remains systemic across all LLMs.
- Subconscious plagiarism: Experiments with the Aletheia agent revealed a problem where the model reproduces non-trivial proofs from its training set, presenting them as autonomous discoveries, complicating the validation of novelty in AI research.
- Long-context degradation: When processing contexts of 1 million tokens or more, models are subject to the "Lost in the Middle" effect — reduced accuracy in retrieving facts located in the middle of a document.
- High computational costs: Inference with maximum Deep Think settings requires significantly more time and resources (TPUs), limiting application in synchronous real-time scenarios.
- Over-refusals: Due to strict alignment algorithms, reasoning models tend to reject legitimate requests by falsely classifying them as potentially harmful (especially in the context of code analysis and information security). Model cards also note issues with "preachy" refusal tones.
- Reasoning limitations: Model cards for the 2.5 and 3 series list limitations in causal understanding, complex logical deduction, and counterfactual reasoning, as well as incomplete predictability in adhering to thinking budgets.
Ethical aspects and safety
The deployment of Gemini models is accompanied by a multi-layered safety system.
General frameworks
Secure AI Framework (SAIF) is Google's general approach to AI-system safety (announced June 2023); it forms the development context for Gemini but is not a Gemini-specific standard. Frontier Safety Framework v3 (September 2025) covers CBRN, cybersecurity, ML R&D, harmful manipulation, and an exploratory approach to misalignment risks.
Gemini-specific measures
- Model cards are the primary sources of information on limitations and safety for specific models. They contain sections on Intended Usage and Limitations, Ethics and Content Safety, and Frontier Safety. The Gemini 3 Pro model card confirmed that the model did not reach any Critical Capability Levels (CCLs) across CBRN and cybersecurity domains.
- Bias and toxicity testing: Analysis and mitigation of bias in training data and content generation.
- Red teaming: Attack simulation to identify vulnerabilities and undesirable behavior. Independent misalignment testing found "some upticks in situational awareness" but no critical risks.
Safety probes
To prevent the generation of harmful content, hidden activation classification is used. To address signal loss in long contexts, the MultiMax architecture is employed: for each token t in a sequence of length T, the probe p extracts the maximum score across all L layers of hidden states h:

s_t = max_{1 ≤ l ≤ L} p(h_t^(l)),   t = 1, …, T
Probes are combined with base models into cascade classifiers, improving filtering accuracy at low computational cost (arXiv:2601.11516).
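The per-token max-over-layers idea can be sketched in NumPy. The linear probe direction and sigmoid output are illustrative assumptions about the probe's form; real probes are trained classifiers.

```python
import numpy as np

def multimax_probe(hidden_states, probe_w):
    """MultiMax-style scoring sketch: per-token max of a linear probe over layers.

    hidden_states: (L, T, d) activations from L layers for T tokens.
    probe_w:       (d,) probe direction (random stand-in for a trained one).
    """
    logits = hidden_states @ probe_w          # (L, T) per-layer, per-token scores
    per_token = logits.max(axis=0)            # max over layers for each token
    return 1.0 / (1.0 + np.exp(-per_token))   # sigmoid -> per-token probabilities

rng = np.random.default_rng(3)
L, T, d = 4, 6, 32
scores = multimax_probe(rng.normal(size=(L, T, d)), rng.normal(size=d))
```

Taking the max means a harmful signal detected strongly at any layer survives, which is the stated motivation for long-context robustness; a cascade would then pass high-scoring sequences to a heavier classifier.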
Cryptographic watermarking (SynthID)
Audio data generated through the Live API and images (from Nano Banana / Flash Image models) are watermarked using the SynthID algorithm. An invisible watermark is embedded at the pixel or audio spectrum level, enabling machine detection of generated content. The Nano Banana 2 model (February 2026) integrates SynthID with C2PA metadata.
Thinking and the question of transparency
Models with thinking mode (2.5/3 series) can return thought summaries — brief summaries of internal reasoning rather than the full stream of intermediate tokens. This provides a certain level of transparency, but has been criticized for hiding actual "raw" reasoning chains behind simplified summaries.
Regulatory aspects
Under the EU AI Act, Google signed the EU AI Code of Practice (published July 10, 2025) alongside OpenAI and Anthropic. Gemini is classified as a general-purpose AI (GPAI) model with systemic risk, which entails additional safety obligations (effective August 2, 2025).
Competitive landscape
The November–December 2025 period became the most compressed competitive cycle in AI history: Gemini 3 Pro (November 18), Claude Opus 4.5 by Anthropic (November 24), and GPT-5.2 by OpenAI (December 11) were all released within 24 days. As of February 2026, no single model dominates all categories: Gemini 3 Pro leads LMArena in text, vision, search, and multilingual; GPT-5.2 leads in pure math (100% AIME 2025 without tools) and SWE-bench Pro; Claude Opus 4.5 competes on SWE-bench Verified. In terms of API pricing, Gemini is approximately 42% cheaper than GPT-5 for comparable calls.
Business metrics
According to Alphabet's Q4 2025 earnings report (published February 4, 2026): Google Cloud revenue was $17.7 billion for the quarter (+48% year-over-year); operating margin was 29.9%; Cloud backlog reached $240 billion (doubled year-over-year). Over 120,000 enterprises use Gemini. In January 2026, Apple announced plans to integrate Gemini into Siri. Google processes over 10 billion tokens per minute via the API. Google's internal AI agents generate approximately 50% of the company's own code. Capital expenditures for 2026 are planned at $175–185 billion (nearly double 2025's $91.45B).
External links
- Google DeepMind model index
- Gemini API documentation for developers
- Gemini API model catalog
- Gemini Thinking documentation
- Gemini model deprecation registry
- Google DeepMind model card index
References
Primary Gemini technical reports
- Gemini Team, Google (2023). Gemini: A Family of Highly Capable Multimodal Models. arXiv:2312.11805.
- Gemini Team, Google (2024). Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv:2403.05530.
- Comanici, G. et al. (2025). Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities. arXiv:2507.06261.
Specialized models and applications
- Saab, K. et al. (2024). Capabilities of Gemini Models in Medicine. arXiv:2404.18416.
- Yang, L. et al. (2024). Advancing Multimodal Medical Capabilities of Gemini. arXiv:2405.03162.
- Gemini Robotics Team (2025). Gemini Robotics: Bringing AI into the Physical World. arXiv:2503.20020.
- Feng, T., Trinh, T., Bingham, G. et al. (2026). Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erdős Problems. arXiv:2601.22401.
- DeepMind Research Team (2026). Building Production-Ready Probes For Gemini. arXiv:2601.11516.
- Fu, Y., Wang, X., Tian, Y., Zhao, J. (2025). Deep Think with Confidence. arXiv:2508.15260.
Background literature (surveys and methods)
- Wei, J. et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903.
- Zhang, Z. et al. (2022). Automatic Chain of Thought Prompting in Large Language Models. arXiv:2210.03493.
- Zhang, Z. et al. (2023). Multimodal Chain-of-Thought Reasoning in Language Models. arXiv:2302.00923.
- Cai, W. et al. (2024). A Survey on Mixture of Experts in Large Language Models. arXiv:2407.06204.
- Dai, Z. et al. (2019). Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. arXiv:1901.02860.
- Ding, J. et al. (2023). LongNet: Scaling Transformers to 1,000,000,000 Tokens. arXiv:2307.02486.
- Yin, S. et al. (2024). A Survey on Multimodal Large Language Models. arXiv:2306.13549.
- Wang, X. et al. (2023). Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey. arXiv:2302.10035.
- Chen, Q. et al. (2025). Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models. arXiv:2503.09567.
Official Google blog posts
- Google (2023). Introducing Gemini: Google's most capable AI model yet. The Keyword, 12/06/2023.
- Google DeepMind (2024). Introducing Gemini 1.5. The Keyword, 02/15/2024.
- Google (2024). Introducing Gemini 2.0: A new AI model for the agentic era. The Keyword, 12/11/2024.
- Google DeepMind (2025). Gemini 2.0 model updates. The Keyword, 02/05/2025.
- Google DeepMind (2025). Gemini 2.5: Our newest Gemini model with thinking. The Keyword, 03/25/2025.
- Google DeepMind (2025). Google I/O 2025: Updates to Gemini 2.5. The Keyword, 05/20/2025.
- Google (2025). Gemini 3: Introducing the latest Gemini AI model. The Keyword, 11/18/2025.
- The Deep Think Team (2026). Gemini 3 Deep Think: Advancing science, research and engineering. Google Blog, 02/12/2026.
- The Gemini Team (2026). Gemini 3.1 Pro: A smarter model for your most complex tasks. Google Blog, 02/19/2026.