Falcon (language model family)

Falcon is a family of large language models (LLMs), multimodal models, and computer-vision models developed by the Technology Innovation Institute (TII) in Abu Dhabi, UAE. Many Falcon releases provide publicly downloadable model weights, but the family is not distributed under a single license: Falcon-RW-1B, Falcon-7B, and Falcon-40B use Apache 2.0; Falcon-180B uses a separate Falcon-180B TII License and acceptable-use policy; many later language-model checkpoints use variants of TII's Falcon-specific licenses, including the Falcon-LLM and Falcon-Mamba licenses; and Falcon Perception and Falcon OCR use Apache 2.0^[1]^[2]^[3]^[4]^[5]^[6]^[7]. Consequently, the Falcon family cannot be described uniformly as open-source: the availability of weights and the applicable terms must be assessed for each release, while many individual models are more precisely described as open-weight.

The family ranges from 90-million-parameter specialized models intended for constrained devices to a 180-billion-parameter dense language model requiring multi-GPU infrastructure. Its technical development includes the RefinedWeb training corpus, Transformer models optimized with shared key/value attention, a pure Mamba state-space model, hybrid Transformer–Mamba models, ternary-weight BitNet models, and early-fusion vision-language systems. In June 2025, TII reported that the Falcon ecosystem had exceeded 55 million downloads worldwide^[8].

History and Development

TII introduced the Falcon project in March 2023 and publicly released the first widely used model weights, Falcon-7B and Falcon-40B, in May 2023^[9]^[10]. The initial Falcon LLM license included a revenue-based royalty condition for some commercial uses. On 31 May 2023, TII relicensed Falcon-7B and Falcon-40B under Apache 2.0, removing that royalty provision^[11].

On 6 September 2023, TII released Falcon-180B, a 180-billion-parameter dense decoder-only model trained on 3.5 trillion tokens. It was among the largest openly accessible dense language models released at the time and was substantially larger than Meta's Llama 2 70B, although access to its weights required accepting a model-specific license and acceptable-use policy^[12]^[13]^[4].

Further development of the family included new generations and specialized releases:

Falcon 2 (13 May 2024): a second-generation 11B language model trained on 5.5 trillion tokens, accompanied by Falcon 2 11B VLM, the family's first publicly released vision-language model^[14]^[15].
Falcon Mamba 7B (12 August 2024): the first Falcon language model based entirely on the Mamba state-space model architecture rather than Transformer attention^[16].
Falcon 3 (17 December 2024): a compact family with five base models—1B, 3B, Mamba-7B, 7B, and 10B—and corresponding instruction-tuned and quantized variants. The 7B model was trained in a 14-trillion-token run; the smaller and larger variants used additional pruning, distillation, depth up-scaling, or continued-training procedures rather than each being trained independently on 14 trillion tokens^[17].
Falcon-E or Falcon-Edge (15 May 2025): nominal 1B and 3B model series trained with ternary weights in the BitNet 1.58-bit format, with base, instruction-tuned, pre-quantized, and bfloat16-compatible checkpoints^[18].
Falcon Arabic and Falcon-H1 (21 May 2025): an Arabic-focused adaptation of Falcon 3-7B and a six-size hybrid Transformer–Mamba family ranging from 0.5B to 34B parameters^[19].
Falcon-H1R and Falcon-H1 Arabic (5 January 2026): respectively, a reasoning-specialized 7B model and an Arabic-focused hybrid family in 3B, 7B, and 34B sizes^[20]^[21].
Falcon-H1-Tiny (January 2026): a group of specialized hybrid models at approximately 90M–0.6B scale for general instruction following, multilingual use, coding, tool calling, and mathematical or coding reasoning^[22]^[23].
Falcon Perception and Falcon OCR (31 March–2 April 2026): compact early-fusion multimodal models for open-vocabulary object grounding and segmentation, and for document text recognition. TII announced Falcon Perception on 31 March, and TII's technical release article published on 2 April also introduced Falcon OCR^[24]^[25]^[6]^[7].

As of 11 July 2026, TII's official Falcon catalog and Hugging Face organization page did not list a later Falcon-branded model release than Falcon Perception and Falcon OCR^[23]^[26].

Key Models in the Falcon Family
Model	Parameters (billions)	Key Features	License
Falcon-180B	180	Largest first-generation model; trained on 3.5 trillion tokens; full bfloat16 inference requires approximately 400 GB of accelerator memory^[4].	Falcon-180B TII License 1.0 and Acceptable Use Policy; shared or managed inference/fine-tuning API hosting requires separate permission from TII^[27]
Falcon-40B	40	First-generation model trained on 1 trillion tokens; multilingual capabilities concentrated in English and several European languages^[3].	Apache 2.0 since 31 May 2023^[11]
Falcon-7B	7	First-generation compact model trained on 1.5 trillion tokens; the official model card recommends at least 16 GB of memory for straightforward inference^[2].	Apache 2.0
Falcon-RW-1B	1	Research model trained only on 350 billion tokens of RefinedWeb; English-language and intended primarily for studying web-only pretraining^[1].	Apache 2.0
Falcon 2 11B	11	Second generation; trained on 5.5 trillion tokens; 8K context; text-only and VLM versions^[15]^[28].	TII Falcon License 2.0^[29]
Falcon Mamba 7B	7	Pure Mamba state-space language model trained on 5.8 trillion tokens; 8K training context^[30].	TII Falcon-Mamba License 2.0^[31]
Falcon 3	1–10	Five base models; Transformer and Mamba variants; context up to 32K tokens except 8K for the 1B model; text models support English, French, Spanish, and Portuguese^[17]^[5].	TII Falcon-LLM License 2.0^[5]
Falcon-E (Edge)	1; 3 (nominal sizes)	Ternary {-1, 0, 1} weights, corresponding to about 1.58 bits per weight; trained on approximately 1.5 trillion tokens; multiple checkpoint formats for inference and fine-tuning^[18].	Falcon-LLM License^[32]
Falcon Arabic	7	Arabic-focused adaptation of Falcon 3-7B; 32K context; tokenizer extended with 32,000 Arabic-specific tokens; trained on native Arabic data covering Modern Standard Arabic and regional dialects^[33].	TII announced the model under the TII Falcon License; see the availability qualification below^[19]
Falcon-H1	0.5–34 (marketed sizes)	Six model sizes, including a deeper 1.5B variant; parallel Transformer-attention and Mamba-2 heads; context up to 256K tokens; evaluated across 18 languages^[34]^[35].	Falcon-LLM License^[36]
Falcon-H1R	7 (marketed size; repository metadata lists about 8B)	Reasoning-specialized Falcon-H1 derivative trained with long reasoning traces and GRPO reinforcement learning^[37].	Falcon-LLM License^[38]
Falcon-H1 Arabic	3; 7; 34	Arabic-focused hybrid family with context up to 256K tokens and expanded coverage of dialects, STEM, code, and long-context tasks^[21].	The cited launch announcement does not state checkpoint-specific license terms; users should verify the license supplied with the distributed model^[21]
Falcon-H1-Tiny	0.09–0.6	Specialized hybrid models for English, multilingual text, coding, tool calling, and reasoning; includes 90M/100M variants and 0.6B reasoning checkpoints^[22]^[23].	Falcon-LLM License^[39]
Falcon Perception / Falcon OCR	0.6 / 0.3	Early-fusion vision-language models for grounding and instance segmentation, and for document OCR with text, LaTeX, and HTML-table output^[6]^[7].	Apache 2.0

Architecture and Technical Features

Transformer Architecture

Most Falcon text models are causal decoder-only Transformer language models. Important architectural choices vary by generation:

Shared key/value attention: Falcon-7B uses Multi-Query Attention (MQA), in which all query heads share a single set of key and value projections. Falcon-40B and Falcon-180B use TII's multigroup variant of multi-query attention, with separate key/value groups aligned to tensor-parallel partitions. These designs reduce the size of the key/value cache and improve autoregressive inference efficiency compared with conventional Multi-Head Attention^[13]^[40]^[41]. Falcon 3 Transformer models explicitly use Grouped-Query Attention (GQA)^[5].
Positional encoding: Falcon-7B, Falcon-40B, Falcon-180B, and later Transformer models use Rotary Positional Embeddings (RoPE). Falcon-RW-1B is an exception and uses ALiBi positional biases^[2]^[1]^[42]^[43].
Efficient attention kernels: First-generation Falcon models used FlashAttention-compatible implementations to reduce memory traffic during exact attention computation. FlashAttention is an implementation technique rather than a separate model architecture^[44]^[2].
Parallel blocks: Falcon-7B and related first-generation models compute attention and the feed-forward network in parallel within a decoder block and use a reduced number of layer-normalization operations^[2]^[13].
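
The effect of sharing key/value heads on cache size can be illustrated with a short calculation. The sketch below is a minimal estimate, not a measurement: the configuration (32 layers, 64 query heads, head dimension 64, 2,048 cached tokens, 2-byte values) is hypothetical and is not taken from any Falcon model card, and the function name `kv_cache_bytes` is illustrative only.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Estimate the bytes needed to cache keys and values during decoding."""
    # The leading factor 2 accounts for the separate key and value tensors
    # that every layer stores for every cached token.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical configuration, chosen only for illustration.
NUM_LAYERS = 32
NUM_QUERY_HEADS = 64
HEAD_DIM = 64
SEQ_LEN = 2048

# Multi-Head Attention: one key/value head per query head.
mha_bytes = kv_cache_bytes(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
# Grouped or multigroup sharing: here 8 key/value groups (an assumed value).
grouped_bytes = kv_cache_bytes(NUM_LAYERS, 8, HEAD_DIM, SEQ_LEN)
# Multi-Query Attention: a single key/value head shared by all query heads.
mqa_bytes = kv_cache_bytes(NUM_LAYERS, 1, HEAD_DIM, SEQ_LEN)

for name, size in [("MHA", mha_bytes), ("grouped", grouped_bytes), ("MQA", mqa_bytes)]:
    print(f"{name}: {size / 2**20:.0f} MiB")
```

In this estimate the cache shrinks in direct proportion to the number of key/value heads, which is why the saving grows with batch size and context length.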

Mamba Architecture (State Space Model)

Falcon Mamba 7B, released in August 2024, replaces Transformer self-attention with the selective state-space mechanism introduced by Mamba. Its sequence computation scales linearly with sequence length, and autoregressive generation maintains a fixed-size recurrent state instead of a Transformer key/value cache that grows with the retained context^[45]^[30]. This improves memory scaling for long generation, but it does not by itself guarantee that useful information can be retained with equal accuracy over arbitrarily long sequences.
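
The fixed-size recurrent state can be illustrated with a toy linear recurrence. This is a deliberately simplified sketch, assuming constant scalar decay and fixed input/output weights; Mamba's selective mechanism makes these quantities depend on the input, which the example does not model. The function name and all numeric values are illustrative.

```python
def scan_linear_recurrence(inputs, decay, in_weights, out_weights):
    """Run h_t = decay * h_{t-1} + in_weights * x_t and y_t = <out_weights, h_t>."""
    # The state has one entry per weight and never grows with sequence length.
    state = [0.0] * len(in_weights)
    outputs = []
    for x in inputs:
        state = [decay * h + b * x for h, b in zip(state, in_weights)]
        outputs.append(sum(c * h for c, h in zip(out_weights, state)))
    return outputs, state


outputs, final_state = scan_linear_recurrence(
    [1.0, 0.0, 0.0, 2.0], decay=0.5, in_weights=[1.0, 2.0], out_weights=[1.0, 1.0]
)
print(outputs)           # one output per input token
print(len(final_state))  # state size is set by the weights, not by the input length
```

Each step costs the same amount of work and memory regardless of how many tokens came before, in contrast to a key/value cache that gains one entry per retained token. The decaying state also shows why older inputs may be represented less precisely over time.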

The original model was trained on 5.8 trillion tokens. In December 2024, TII continued training it on an additional 1.5 trillion high-quality tokens and included the resulting checkpoint in the Falcon 3 family as Falcon3-Mamba-7B^[17].

Hybrid Architecture (Falcon-H1)

The Falcon-H1 generation combines Transformer attention and Mamba-2 state-space processing. Within its hybrid mixer blocks, attention heads and Mamba-2 heads operate in parallel; TII can vary the proportion of the two mechanisms by model size. The family includes 0.5B, 1.5B, 1.5B-Deep, 3B, 7B, and 34B base and instruction-tuned models^[34]^[35].

Falcon-H1 supports context windows of up to 256K tokens and was evaluated by TII across 18 languages: Arabic, Czech, German, English, Spanish, French, Hindi, Italian, Japanese, Korean, Dutch, Polish, Portuguese, Romanian, Russian, Swedish, Urdu, and Chinese. A tokenizer designed to scale to more languages is not equivalent to verified native model performance in all of those languages^[35]^[8]. The same broad hybrid design underlies Falcon-H1R, Falcon-H1 Arabic, and Falcon-H1-Tiny.

Early-Fusion Multimodal Architecture

Falcon Perception and Falcon OCR use an early-fusion design in which image and text tokens are processed by a single dense autoregressive Transformer rather than by a separately frozen vision encoder connected to a language model at a late stage. Falcon Perception accepts natural-language prompts for open-vocabulary grounding and instance segmentation, while Falcon OCR converts document images into plain text, mathematical LaTeX, or HTML-formatted tables^[46]^[6]^[7].

Training Data

The foundational dataset associated with the first Falcon generation is RefinedWeb, constructed from Common Crawl through filtering and large-scale deduplication. TII publicly released a 600-billion-token extract for research^[47].
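
The idea of near-duplicate removal can be sketched with a toy filter. This is an illustrative stand-in only, assuming word-level 3-gram shingles, exact Jaccard similarity, and a 0.8 threshold; none of these choices is taken from the source, and a web-scale pipeline such as the one used for RefinedWeb needs approximate methods rather than the pairwise comparison shown here.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams in a lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(max(1, len(words) - size + 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def deduplicate(documents, threshold=0.8):
    """Keep a document only if it is not too similar to any document already kept."""
    kept, kept_shingles = [], []
    for doc in documents:
        current = shingles(doc)
        if all(jaccard(current, seen) < threshold for seen in kept_shingles):
            kept.append(doc)
            kept_shingles.append(current)
    return kept


docs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the lazy dog today",
    "falcons hunt by diving at high speed",
]
print(deduplicate(docs))
```

The second document shares seven of eight shingles with the first and is dropped, while the unrelated third document is kept.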

Falcon-RW-1B was trained on 350 billion RefinedWeb tokens only, making it a research artifact for evaluating web-only pretraining^[1].
Falcon-7B was trained on 1.5 trillion tokens, mostly RefinedWeb English, with smaller book, conversation, code, French-language, and technical corpora. Falcon-40B used 1 trillion tokens^[2]^[3].
Falcon-180B was trained on 3.5 trillion tokens, predominantly English and European-language RefinedWeb data, supplemented by books, dialogue, code, and technical material^[4]^[13].
Falcon 2 11B was trained on 5.5 trillion tokens from web, code, conversational, and multilingual sources^[15].
For Falcon 3, the 7B Transformer model received the 14-trillion-token pretraining run. The 10B model was produced by depth up-scaling the 7B model and continuing training on 2 trillion additional high-quality tokens; the 1B and 3B models were created using pruning and knowledge distillation with less than 100 billion curated training tokens; and Falcon3-Mamba-7B received 1.5 trillion additional tokens^[17].
Falcon-E was trained on an internal mixture of approximately 1.5 trillion tokens while simulating ternary-weight and quantized-activation behavior during training^[18].
Falcon Arabic was continuously pretrained from Falcon 3-7B on native, non-machine-translated Arabic data, with a curriculum covering Modern Standard Arabic, dialect-rich material, mathematics, code, and reasoning. TII's product page states a total of 600 billion Arabic, multilingual, and technical tokens^[33]^[48].

Specialized Models

Falcon Arabic

Falcon Arabic is a 7B Arabic-focused adaptation of Falcon 3-7B announced on 21 May 2025. Its vocabulary was expanded with 32,000 Arabic-specific tokens, and it supports a 32K-token context window. TII states that its training data was native rather than machine-translated and covered Modern Standard Arabic as well as Gulf, Levantine, and other regional dialects^[33]^[48].

In TII's OALL v2 evaluation at release, Falcon Arabic outperformed the comparison models in its size class and some models up to four times larger. This is a developer-reported, point-in-time result whose meaning depends on the benchmark version, prompt format, scoring procedure, and compared checkpoints^[33]. TII's separate press release and product page used a broader promotional claim of competing with models up to ten times its size^[19]^[48].

There is an important availability qualification. Although TII announced Falcon Arabic as an open model, the official TII Hugging Face collection checked on 11 July 2026 contained a translation demo and two evaluation-detail datasets, but did not list downloadable Falcon-Arabic base or instruction model-weight repositories^[49]. The model could therefore be tested through hosted interfaces, but the availability of its weights should not be inferred from the evaluation artifacts alone^[48].

In January 2026, TII announced Falcon-H1 Arabic in 3B, 7B, and 34B sizes. The family uses the Falcon-H1 hybrid architecture and supports contexts of up to 256K tokens. In TII's published Open Arabic LLM Leaderboard evaluation at launch, the models scored 61.87, 71.47, and 75.36 respectively; TII reported that the 34B version exceeded Qwen2.5-72B and Llama 3.3-70B in that specific evaluation^[21]. These figures should be read as launch-time benchmark results, not as a permanent or universal ranking. The launch announcement linked to TII's public playground rather than to checkpoint repositories, so model-weight availability and checkpoint-specific license terms should be verified separately^[21].

Falcon-E (Edge)

Falcon-E is a series of nominal 1B and 3B models trained with ternary weights {-1, 0, 1}, which carry log2(3), or approximately 1.58, bits of information per weight. TII provides native BitNet checkpoints, pre-quantized checkpoints intended for further training, and bfloat16 counterparts derived from the same training procedure^[18].
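
The 1.58-bit figure and the ternary mapping can be illustrated as follows. The sketch assumes the absmean rounding rule described in the BitNet b1.58 paper (Ma et al., 2024): weights are divided by their mean absolute value, rounded, and clipped to {-1, 0, 1}. Whether Falcon-E's training code uses exactly this rule is an assumption, and the function name and sample weights are illustrative.

```python
import math


def ternary_quantize(weights, eps=1e-8):
    """Map real-valued weights to {-1, 0, 1} using an absmean scale."""
    # Scale by the mean absolute value (assumed rule, following BitNet b1.58).
    scale = sum(abs(w) for w in weights) / len(weights)
    quantized = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return quantized, scale


# Three equally likely states carry log2(3) bits of information.
BITS_PER_TERNARY_WEIGHT = math.log2(3)

weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]  # arbitrary example values
quantized, scale = ternary_quantize(weights)
print(quantized)
print(f"{BITS_PER_TERNARY_WEIGHT:.3f} bits per weight")
```

Small weights collapse to 0 and larger ones to their sign, so each stored weight needs only one of three symbols plus a shared scale factor.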

The ternary representation can reduce model storage and multiplication cost, especially with optimized BitNet kernels. Actual speed and energy gains depend on runtime, processor support, memory layout, and whether the deployment stack can exploit ternary operations; merely downloading a 1.58-bit checkpoint does not guarantee efficient execution on every general-purpose CPU^[18]^[50].

Reasoning Model Falcon-H1R

Falcon H1R 7B was released on 5 January 2026 as a reasoning-specialized derivative of Falcon-H1-7B. TII first applied supervised fine-tuning using long reasoning traces of up to 48K tokens and then reinforcement learning with Group Relative Policy Optimization (GRPO)^[51]^[37].
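
The group-relative part of GRPO can be sketched briefly. In the method as commonly described, several responses are sampled for the same prompt and each reward is normalized against the mean and standard deviation of its group, which removes the need for a separate learned value model. The code below is a minimal illustration under that assumption; it is not TII's training code, and the use of the population standard deviation, the `eps` constant, and the example rewards are illustrative choices.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the statistics of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every response receives the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one prompt (1.0 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above the group average receive positive advantages and the rest negative ones; a group with identical rewards yields advantages of zero and therefore no learning signal.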

The technical report states that Falcon H1R matched or exceeded selected reasoning models two to seven times larger on mathematics, coding, logic, and instruction-following tests, and studied performance scaling when the model was allowed to generate longer reasoning traces. These comparisons are results reported by the model's developers under their evaluation setup rather than independent proof of superiority across all tasks^[37]^[20].

Falcon-H1-Tiny

Falcon-H1-Tiny extends the hybrid architecture to very small, task-focused models. The official collection includes 90M general base and instruction models, approximately 100M multilingual models, 90M coding and tool-calling models, and reasoning models at approximately 90M and 0.6B parameters. Some checkpoints also have GGUF or pre-GRPO versions^[22]^[23].

The series accompanied research on learnable multipliers, which introduces learned row-, column-, or matrix-level scaling parameters to relax weight-norm constraints arising during optimization. TII's paper reports improvements over tuned muP baselines with Adam and Muon optimizers; those optimization results should not be interpreted as evidence that every Tiny variant is a general-purpose substitute for a much larger model^[52].

Multimodal Capabilities

Falcon 2 11B VLM was the first released multimodal model in the family. It combines Falcon 2's language component with a CLIP ViT-L/14 vision encoder and converts image inputs into text outputs^[28].
Falcon 3 began as a text-model release in December 2024. TII subsequently made separate Falcon 3 multimodal models for image, video, and audio inputs; the official Falcon 3 page identifies these multimodal variants as English-language systems and describes text as their output modality^[53].
Falcon Perception is a 0.6B early-fusion model for open-vocabulary object grounding and instance segmentation through natural-language prompts. TII also provides a 0.3B detection-focused variant^[24]^[6].
Falcon OCR is a 0.3B early-fusion model for document recognition. Its model card describes output formats including plain text, LaTeX for mathematical expressions, and HTML for tables^[7].

Performance and Issues

Comparison with Competitors

Benchmark claims for Falcon span different model types, dates, test suites, and evaluation harnesses. They should be treated as point-in-time comparisons rather than a single continuous ranking.

Falcon-40B reached the top of the Hugging Face Open LLM Leaderboard shortly after its 2023 release, a historically important result that helped attract attention to the family^[54].
The Falcon-180B technical report found substantial gains over Llama 2 and Inflection-1 and performance approaching PaLM 2 Large on the authors' aggregate evaluation. It does not support a blanket claim that Falcon-180B outperforms every version of GPT-3.5 on most tasks^[13].
At the release of Falcon 2 11B, TII reported a Hugging Face Open LLM Leaderboard score of 64.28, essentially tied with Gemma 7B at 64.29 and above the cited Llama 3 8B result in that leaderboard snapshot^[14].
The Falcon Mamba 7B paper reported stronger aggregate results than several contemporary 7B–8B Transformer baselines, including Mistral 7B and Llama 3.1 8B, under the authors' evaluation suite^[30].
TII's internal Falcon 3 evaluation reported strong results for the 7B and 10B models in reasoning, mathematics, coding, and instruction following, but the model card also shows that Falcon3-7B did not lead every individual benchmark against all comparison models^[17]^[5].
The Falcon-H1 technical report states that Falcon-H1-34B matched or exceeded models including Qwen3-32B, Qwen2.5-72B, and Llama 3.3-70B on the authors' aggregate evaluations, while several smaller H1 models compared favorably with larger baselines. Reproduction depends on matching the report's prompts, quantization, sampling, and benchmark settings^[35].
TII reported launch-time leading results for Falcon-H1R on selected compact reasoning-model comparisons and for Falcon-H1 Arabic on the Open Arabic LLM Leaderboard^[37]^[21].

Limitations and Issues

Language coverage: Early Falcon models were trained predominantly on English and a smaller number of European-language sources. Falcon-RW-1B is English-only, and the first-generation model cards warn that generalization to unsupported languages is limited. Falcon 3 text models officially support English, French, Spanish, and Portuguese; Falcon-H1 was evaluated across 18 languages, including Russian^[1]^[2]^[5]^[35].
Base versus instruction models: Base checkpoints are next-token predictors rather than chat assistants. TII's model cards recommend additional fine-tuning for many practical uses; safety behavior and instruction-following quality differ materially between base, instruction-tuned, and reasoning variants^[2]^[4]^[5].
Hallucination, bias, and unsafe output: Falcon models can generate false, unsupported, biased, or harmful content, particularly because large portions of their pretraining data derive from the public web. TII model cards recommend task-specific evaluation, guardrails, and risk mitigation before production use. The NIST Generative AI Profile likewise treats confabulation and harmful bias as material risks for generative systems^[2]^[4]^[55].
Benchmark comparability: Scores can change with benchmark revisions, chat templates, prompt wording, few-shot examples, contamination controls, decoding settings, and evaluation software. Several Falcon comparisons come from TII's internal pipeline or release materials; they are useful evidence but should be attributed to the developer and not described as timeless global rankings^[5]^[35].
Long-context claims: A maximum context-window setting does not guarantee uniform accuracy throughout the window. Mamba's fixed-size recurrent state reduces cache growth, and Falcon-H1 supports configurations up to 256K tokens, but retrieval and reasoning quality over long inputs still require workload-specific testing^[30]^[35].
Compute requirements: Compact models can run on consumer hardware after quantization, but the largest models remain expensive. The official Falcon-180B card estimates approximately 400 GB of memory, or roughly eight A100 80 GB GPUs, for full-bfloat16 inference^[4].
Licensing differences: Falcon-RW-1B, Falcon-7B, and Falcon-40B are Apache 2.0 models. Falcon-180B uses a custom license that prohibits unapproved "Hosting Use"—shared instances or managed third-party inference/fine-tuning APIs—while permitting integrated end-user products that use the model in the background. Falcon 2, Falcon Mamba, Falcon 3, Falcon-E, Falcon-H1, Falcon-H1R, and Falcon-H1-Tiny use Falcon-specific licenses with acceptable-use restrictions. Users should review the exact license bundled with each checkpoint rather than infer terms from the family name^[27]^[29]^[31]^[5]^[32]^[36]^[38]^[39].
Model availability: A press announcement, hosted demonstration, benchmark dataset, or evaluation-results repository is not necessarily a downloadable model-weight release. Falcon Arabic is a documented example: TII announced the model in May 2025, but its official Hugging Face collection did not list weight repositories when checked on 11 July 2026^[19]^[49].
Modality and task limits: The Falcon 3 multimodal page describes its image, video, and audio models as English-language systems. Their existence does not imply that every Falcon 3 text checkpoint can process non-text inputs or that multimodal support is equally strong across languages. Falcon Perception is designed for grounding and segmentation rather than open-ended visual question answering, while Falcon OCR's model card identifies degraded scans and very small text as continuing challenges^[53]^[6]^[7].
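
The Falcon-180B memory estimate under Compute requirements can be sanity-checked with weight-only arithmetic. The sketch below assumes 2 bytes per parameter for bfloat16 and decimal gigabytes; it ignores the key/value cache, activations, and framework overhead, which is why the model card's figure of approximately 400 GB is higher than the weight total. The 4-bit line is a hypothetical comparison, not a figure from the source.

```python
def weight_memory_gb(num_parameters, bytes_per_parameter):
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return num_parameters * bytes_per_parameter / 1e9


FALCON_180B_PARAMS = 180e9
BF16_BYTES = 2

bf16_gb = weight_memory_gb(FALCON_180B_PARAMS, BF16_BYTES)
# Hypothetical 4-bit quantization, shown only for comparison.
int4_gb = weight_memory_gb(FALCON_180B_PARAMS, 0.5)

print(f"bfloat16 weights: {bf16_gb:.0f} GB")
print(f"4-bit weights:    {int4_gb:.0f} GB")
```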

Literature

Ainslie, J. et al. (2023). GQA: Training Generalized Multi‑Query Transformer Models from Multi‑Head Checkpoints. arXiv:2305.13245.
Almazrouei, E. et al. (2023). The Falcon Series of Open Language Models. arXiv:2311.16867.
Dao, T. et al. (2022). FlashAttention: Fast and Memory‑Efficient Exact Attention with IO‑Awareness. arXiv:2205.14135.
Falcon LLM Team et al. (2026). Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling. arXiv:2601.02346.
Falcon Perception Team (2026). Falcon Perception. arXiv:2603.27365.
Gu, A.; Dao, T. (2023). Mamba: Linear‑Time Sequence Modeling with Selective State Spaces. arXiv:2312.00752.
Ma, S. et al. (2024). The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits. arXiv:2402.17764.
Malartic, Q. et al. (2024). Falcon2-11B Technical Report. arXiv:2407.14885.
Penedo, G. et al. (2023). The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only. arXiv:2306.01116.
Press, O. et al. (2021). Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. arXiv:2108.12409.
Shazeer, N. (2019). Fast Transformer Decoding: One Write‑Head is All You Need. arXiv:1911.02150.
Su, J. et al. (2021). RoFormer: Enhanced Transformer with Rotary Position Embedding. arXiv:2104.09864.
Velikanov, M. et al. (2026). Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers. arXiv:2601.04890.
Zuo, J. et al. (2024). Falcon Mamba: The First Competitive Attention-free 7B Language Model. arXiv:2410.05355.
Zuo, J. et al. (2025). Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance. arXiv:2507.22448.

References

[1] "Falcon-RW-1B". Hugging Face model card.
[2] "Falcon-7B". Hugging Face model card.
[3] "Falcon-40B". Hugging Face model card.
[4] "Falcon-180B". Hugging Face model card.
[5] "Falcon3-7B-Base". Hugging Face model card.
[6] "Falcon-Perception". Hugging Face model card.
[7] "Falcon-OCR". Hugging Face model card.
[8] "Technology Innovation Institute Announces Falcon-H1 Model Availability as NVIDIA NIM to Deliver Sovereign AI at Scale". TII News, 12 June 2025.
[9] "Abu Dhabi-based Technology Innovation Institute Introduces Falcon LLM: Foundational Large Language Model". TII News, 15 March 2023.
[10] "UAE's TII Launches Open-Source Falcon 40B Large Language Model". TII News, 25 May 2023.
[11] "UAE's Falcon 40B is now Royalty Free". TII News, 31 May 2023.
[12] "Technology Innovation Institute Introduces Falcon 180B". TII News, 6 September 2023.
[13] Almazrouei, Ebtesam, et al. "The Falcon Series of Open Language Models". arXiv, 29 November 2023.
[14] "Falcon 2: UAE’s Technology Innovation Institute Releases New AI Model Series". TII News, 13 May 2024.
[15] Malartic, Quentin, et al. "Falcon2-11B Technical Report". arXiv, 20 July 2024.
[16] "UAE's Technology Innovation Institute Revolutionizes AI Language Models with a New Architecture". TII News, 12 August 2024.
[17] "Welcome to the Falcon 3 Family of Open Models!". Hugging Face Blog, 17 December 2024.
[18] "Falcon-Edge: A Series of Powerful, Universal, Fine-tunable 1.58bit Language Models". Hugging Face Blog, 15 May 2025.
[19] "TII Launches Two New AI Models: Falcon Arabic and Falcon-H1". TII News, 21 May 2025.
[20] "TII Launches Falcon Reasoning: Best 7B AI Model Globally, Also Outperforms Larger Models". TII News, 5 January 2026.
[21] "Abu Dhabi's TII Launches Falcon-H1 Arabic". TII News, 5 January 2026.
[22] "Falcon-H1-Tiny". TII collection on Hugging Face, updated 2 March 2026.
[23] "Falcon Models". Falcon LLM — TII.
[24] "TII Launches Falcon Perception, a New Multimodal AI Model". TII News, 31 March 2026.
[25] "Falcon Perception". Hugging Face Blog, 2 April 2026.
[26] "Technology Innovation Institute". Hugging Face organization page, checked 11 July 2026.
[27] "Falcon 180B TII License Version 1.0". TII/Hugging Face, September 2023.
[28] "Falcon 2 11B VLM". Hugging Face model card.
[29] "Falcon 2 11B". Hugging Face model card.
[30] Zuo, Jingwei, et al. "Falcon Mamba: The First Competitive Attention-free 7B Language Model". arXiv, 7 October 2024.
[31] "Falcon Mamba 7B". Hugging Face model card.
[32] "Falcon-E-1B-Base". Hugging Face model card.
[33] "Falcon-Arabic: A Breakthrough in Arabic Language Models". Hugging Face Blog, 21 May 2025.
[34] "Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance". Falcon LLM Blog, 20 May 2025.
[35] Zuo, Jingwei, et al. "Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance". arXiv, 30 July 2025.
[36] "Falcon-H1-7B-Base". Hugging Face model card.
[37] Falcon LLM Team, et al. "Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling". arXiv, January 2026.
[38] "Falcon-H1R-7B". Hugging Face model card.
[39] "Falcon-H1-Tiny-90M-Instruct". Hugging Face model card.
[40] Shazeer, Noam. "Fast Transformer Decoding: One Write-Head is All You Need". arXiv, 2019.
[41] Ainslie, Joshua, et al. "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints". arXiv, 2023.
[42] Su, Jianlin, et al. "RoFormer: Enhanced Transformer with Rotary Position Embedding". arXiv, 2021.
[43] Press, Ofir, et al. "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation". arXiv, 2021.
[44] Dao, Tri, et al. "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness". arXiv, 2022.
[45] Gu, Albert; Dao, Tri. "Mamba: Linear-Time Sequence Modeling with Selective State Spaces". arXiv, 2023.
[46] Falcon Perception Team. "Falcon Perception". arXiv, March 2026.
[47] Penedo, Guilherme, et al. "The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only". arXiv, 1 June 2023.
[48] "Falcon Arabic". Falcon LLM — TII.
[49] "Falcon-Arabic". TII collection on Hugging Face, checked 11 July 2026.
[50] Ma, Shuming, et al. "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits". arXiv, 2024.
[51] "Introducing Falcon H1R 7B". Falcon LLM Blog, 5 January 2026.
[52] Velikanov, Maksim, et al. "Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers". arXiv, 8 January 2026.
[53] "Introducing the Technology Innovation Institute's Falcon 3". Falcon LLM — TII.
[54] "Falcon: The 7B and 40B Models Democratizing the LLM Landscape". Hugging Face Blog, 2023.
[55] National Institute of Standards and Technology. Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile. NIST AI 600-1, July 2024.
