Jamba (language model)
Jamba is a family of large language models (LLMs) developed by the Israeli research company AI21 Labs. Jamba introduces a first-of-its-kind hybrid architecture that combines key elements from two dominant approaches in AI development: Transformers and State Space Models (SSMs), specifically the Mamba architecture[1].
Jamba's primary goal is to address a fundamental trade-off in modern LLMs: the high quality and performance (characteristic of Transformers) versus the efficiency and ability to process ultra-long contexts (characteristic of SSMs). By combining these approaches and adding sparsity through Mixture-of-Experts (MoE), Jamba offers a model that is simultaneously powerful, efficient, and capable of handling vast amounts of text in a single query.
Jamba's Architecture in Detail
Jamba does not simply alternate Transformer and Mamba layers one-for-one. It employs a carefully designed block structure in which each block consists of eight layers.
Structure of a single Jamba block:
- One Transformer Layer: This layer is responsible for "deep" understanding and complex reasoning. The Mixture-of-Experts (MoE) architecture is built into this layer.
- Seven Mamba Layers: These layers follow the Transformer layer and are responsible for efficient sequence processing and propagating information across a long context[2].
This asymmetric structure allows the model to manage computational resources efficiently: the heavy but powerful Transformer operations are performed less frequently, while the lightweight and fast Mamba operations are performed more often.
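The 1:7 layout described above can be sketched as a simple layer schedule. The layer names here are illustrative labels for this sketch, not module names from AI21's implementation:

```python
def jamba_block_layers(n_blocks: int) -> list:
    """Build the layer schedule implied by the text: each 8-layer block
    contains one Transformer layer (with MoE) followed by seven Mamba layers."""
    block = ["transformer_moe"] + ["mamba"] * 7
    return block * n_blocks

# For example, a 4-block stack yields 32 layers, only 4 of which are Transformer layers.
layers = jamba_block_layers(4)
```

This makes the asymmetry concrete: in a 32-layer stack, the quadratic-cost attention computation runs in only 4 layers, while the linear-cost Mamba computation handles the remaining 28.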
Mixture-of-Experts (MoE) Integration
Jamba utilizes the MoE architecture to further enhance its efficiency.
- MoE is applied only to the feed-forward network (FFN) blocks within the Transformer layers[3]. The Mamba layers remain dense.
- The first Jamba model uses 16 experts per MoE layer.
- For each token, a router network selects the top 2 experts (Top-2 gating).
This means that although the model's total parameter count is large (52 billion), only 2 of the 16 experts are active for each token in a Transformer layer, so the per-token computation involves only about 12 billion active parameters.
Evolution of Jamba Models
Jamba-v0.1 (March 2024)
The first model introduced in this family has the following specifications:
| Specification | Value |
|---|---|
| Total Parameters | 52 billion |
| Active Parameters | ~12 billion |
| Number of Experts (MoE) | 16 (2 active) |
| Context Window | 256,000 tokens |
| License | Apache 2.0[4] |
Thanks to its hybrid architecture, Jamba-v0.1 can process a context length of 256,000 tokens, equivalent to an approximately 400-page novel, and can be deployed on a single GPU with 80 GB of memory[5].
Jamba-1.5 (2024)
In 2024, AI21 Labs introduced the updated Jamba 1.5 family of models, which includes two versions: Jamba 1.5 Mini (12B active parameters out of 52B total) and Jamba 1.5 Large (94B active parameters out of 398B total)[6]. These models demonstrate significant performance improvements:
- Up to 2.5 times faster inference on long contexts compared to competitors.
- Support for nine languages, including English, Spanish, French, and Arabic[7].
Key Advantages and Performance
- Massive Context Window: at 256,000 tokens, Jamba offered one of the largest context windows among all available models (including proprietary ones) at the time of its release. This makes it well suited to tasks requiring the analysis of large documents, such as legal contracts, scientific papers, entire codebases, or long dialogues.
- High Performance and Efficiency: In benchmarks, Jamba demonstrates performance comparable to or exceeding that of leading open models of a similar size, such as Llama and Mixtral, while achieving 3 times higher throughput on long contexts.
- Openness and Accessibility: Jamba is distributed under the permissive Apache 2.0 license, allowing for free use in commercial and research applications. The model weights are available on the Hugging Face platform.
Benchmark Results
Jamba 1.5 shows competitive results on various benchmarks:
- Jamba 1.5 Mini scored 46.1 on Arena Hard, making it the leading public model in its category[8].
- Jamba 1.5 Large scored 65.4 on Arena Hard, outperforming Llama 3.1 70B and 405B.
Applications and Availability
Jamba is optimized for business applications and supports capabilities such as function calling, structured JSON output, and document processing. The model is available on multiple platforms, including:
- Hugging Face
- Google Cloud Vertex AI
- Microsoft Azure
- NVIDIA API catalog
- Amazon Bedrock
- AI21 Studio
To support cost-effective inference, AI21 Labs introduced ExpertsInt8, a new quantization technique that allows Jamba 1.5 Large to be hosted on a machine with eight 80GB GPUs without quality loss when processing a 256K token context[9].
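ExpertsInt8 stores the MoE expert weights in INT8 and dequantizes them inside fused kernels at inference time. The following is a generic sketch of symmetric, per-row, weight-only INT8 quantization to illustrate the underlying idea; it is not AI21's actual kernel implementation:

```python
import numpy as np

def int8_quantize(w: np.ndarray):
    """Symmetric per-output-row INT8 quantization: store int8 weights
    plus one float32 scale per row (so each row uses the full [-127, 127] range)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def int8_dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate float32 weight matrix from int8 values and scales."""
    return q.astype(np.float32) * scale

# Toy expert weight matrix: 8 output rows, 16 input columns.
w = np.random.default_rng(1).normal(size=(8, 16)).astype(np.float32)
q, s = int8_quantize(w)
w_hat = int8_dequantize(q, s)
```

Storing each weight in one byte instead of two (BF16) roughly halves the memory footprint of the experts, which dominate the parameter count of an MoE model; this is what allows a 398B-parameter model to fit on an eight-GPU node.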
Further Reading
- Lieber, O.; et al. (2024). Jamba: A Hybrid Transformer‑Mamba Language Model. arXiv:2403.19887.
- Lieber, O.; et al. (2024). Jamba‑1.5 Models and ExpertsInt8 Quantization. OpenReview JFPaD7lpBD.
- Gu, A.; Dao, T. (2023). Mamba: Linear‑Time Sequence Modeling with Selective State Spaces. arXiv:2312.00752.
- Gu, A.; et al. (2021). S4: Efficiently Modeling Long Sequences with Structured State Spaces. arXiv:2111.00396.
- Fedus, W.; et al. (2021). Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. arXiv:2101.03961.
- Yun, L.; et al. (2024). Toward Inference‑Optimal Mixture‑of‑Expert Large Language Models. arXiv:2404.02852.
- Liu, J.; et al. (2024). A Survey on Mixture of Experts in Large Language Models. arXiv:2407.06204.
- Gupta, V.; et al. (2024). Lynx: Enabling Efficient MoE Inference through Dynamic Batch‑Aware Expert Selection. arXiv:2411.08982.
- Liu, J.; et al. (2024). A Survey on Inference Optimization Techniques for Mixture of Experts Models. arXiv:2412.14219.
- Hsieh, C.‑P.; et al. (2024). RULER: What's the Real Context Size of Your Long‑Context Language Models?. arXiv:2404.06654.
References
1. "Announcing Jamba: AI21's Groundbreaking SSM-Transformer Model". AI21 Labs Blog.
2. Lieber, O., et al. (2024). Jamba: A Hybrid Transformer-Mamba Language Model. arXiv:2403.19887.
3. "Jamba Documentation". Hugging Face Transformers.
4. "ai21labs/Jamba-v0.1". Hugging Face.
5. "AI21 Labs' Jamba: A New Hybrid LLM Architecture". Gradient Flow.
6. "Announcing the Jamba-1.5 model family". AI21 Labs Blog.
7. "ai21labs/AI21-Jamba-Large-1.5". Hugging Face.
8. "Jamba-1.5 family of models by AI21 Labs is now available in Amazon Bedrock". AWS What's New.
9. "ExpertsInt8: A new paradigm for efficient inference of MoE-based LLMs". OpenReview.