Few-Shot and Zero-Shot Learning

From Systems Analysis Wiki

Few-shot Learning (FSL) and Zero-shot Learning (ZSL) are machine learning paradigms aimed at solving the problem of labeled data scarcity. They enable models to learn and generalize knowledge from a very limited amount of information, which is crucial for applying artificial intelligence in real-world scenarios where collecting large datasets is impossible or impractical.

The Pervasive Problem of Data Scarcity

Modern deep learning models demonstrate impressive results, but their effectiveness typically depends directly on vast amounts of labeled data. Collecting and annotating such data is a costly, labor-intensive, and often impossible process. This problem, known as "data scarcity," is particularly acute in fields such as:

  • Diagnosing rare diseases.
  • Specialized industrial manufacturing.
  • Robotics and interaction with new objects.
  • Categorizing constantly emerging new products or topics.

FSL and ZSL offer a solution by shifting the focus from "big data" to "smart data," concentrating on efficient knowledge transfer and generalization.

Few-shot Learning (FSL)

Few-shot Learning (FSL) is a paradigm in which a model learns to recognize new classes based on a very small number of labeled examples (typically 1 to 5), called the support set.

Key Idea

The core idea of FSL is not just to learn features for specific classes but to learn the process of learning itself (learning to learn). The model must quickly adapt to new, previously unseen tasks using a minimal number of examples. This is achieved by leveraging prior knowledge and adaptation strategies.

Main Approaches in FSL

  • Meta-learning ("Learning to learn"): This is the dominant paradigm in FSL. The model is trained on a multitude of diverse tasks to learn how to adapt effectively to new ones.
    • Metric-based: Models (e.g., Siamese or Prototypical Networks) learn to construct an embedding space where the distance between vectors reflects semantic similarity. A new example is classified by comparing its embedding with the embeddings from the support set.
    • Optimization-based: Models (e.g., MAML) learn to find a parameter initialization that allows them to quickly fine-tune for a new task within a few gradient descent steps.
  • In-Context Learning: With the advent of Large Language Models (LLMs), an approach has become popular where examples from the support set are provided to the model directly in the prompt. The model performs "implicit meta-learning" at inference time, adapting to the task without updating its weights.
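The metric-based approach above can be sketched in a few lines. This is a minimal illustration of the Prototypical Networks idea, assuming the embeddings have already been produced by some trained encoder; the toy 2-D vectors and function names are purely illustrative.

```python
import numpy as np

def prototypes(support_emb, support_labels, n_classes):
    # One prototype per class: the mean of its support embeddings
    # (the core idea of Prototypical Networks, Snell et al., 2017).
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_emb, protos):
    # Assign each query to the nearest prototype (squared Euclidean distance).
    d = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

# Toy 2-way 2-shot episode in a 2-D embedding space.
support = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
labels = np.array([0, 0, 1, 1])
protos = prototypes(support, labels, n_classes=2)
pred = classify(np.array([[0.1, 0.1], [4.9, 5.1]]), protos)
# pred -> array([0, 1])
```

Note that no weights are updated at test time: adaptation to the new classes consists entirely of computing the prototypes from the support set.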

Advanced Approaches in FSL

  • One-shot Learning: This is a special case of FSL where the number of examples for each class is one (K=1 in "N-way K-shot" notation). Classic examples of architectures include Matching Networks and Siamese Networks.
  • Generative Methods and Data Augmentation: FSL actively uses generative models (GANs, VAEs, diffusion models) to synthesize additional data examples. This allows for the artificial expansion of the support set and improves classification quality, especially for rare or unusual classes.
  • Transductive FSL: Unlike the standard (inductive) approach, here the model considers not only the labeled support set but also the entire set of unlabeled test examples (queries) during adaptation. This allows it to capture the data structure within the queries, for example, using label propagation techniques, and improve classification robustness.
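Meta-learning methods are trained and evaluated on episodes in the "N-way K-shot" notation mentioned above. The following sketch shows how one such episode is sampled; the dictionary-based dataset layout is an assumption for illustration, not a fixed API.

```python
import random

def sample_episode(dataset, n_way, k_shot, n_query):
    """Sample one N-way K-shot episode from {class_name: [examples]}.

    Returns (support, query) as lists of (example, class_index) pairs.
    """
    classes = random.sample(sorted(dataset), n_way)
    support, query = [], []
    for idx, cls in enumerate(classes):
        # Draw K support and n_query query examples without overlap.
        examples = random.sample(dataset[cls], k_shot + n_query)
        support += [(x, idx) for x in examples[:k_shot]]
        query += [(x, idx) for x in examples[k_shot:]]
    return support, query

# Toy dataset: 5 classes with 10 examples each.
data = {c: [f"{c}_{i}" for i in range(10)] for c in "abcde"}
support, query = sample_episode(data, n_way=3, k_shot=1, n_query=2)
# One-shot setting (K=1): 3 support examples, one per class, and 6 queries.
```

In the transductive setting described above, the model would additionally be allowed to look at all the unlabeled query examples jointly during adaptation, rather than classifying each one independently.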

Zero-shot Learning (ZSL)

Zero-shot Learning (ZSL) is a paradigm where a model is capable of recognizing classes for which it has seen zero examples during training.

Key Idea

This is achieved by leveraging auxiliary semantic information that describes both seen and unseen classes. The model learns a mapping from the input feature space (e.g., images) to a common semantic space; classifying an example from an unseen class then reduces to finding the nearest class description in that space.

Mechanisms of Semantic Information

  • Semantic Attributes: Classes are described by a set of human-defined attributes (e.g., for the class "zebra": [has_stripes, has_hooves, is_a_mammal]).
  • Word Embeddings: Class names or their textual descriptions are converted into embeddings using pre-trained language models (e.g., Word2Vec, BERT).
  • Prompting in LLMs: With the emergence of multimodal models like CLIP, ZSL can be performed by comparing an image's embedding with the embeddings of textual descriptions of classes (e.g., "a photo of a dog," "a photo of a cat").
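The attribute mechanism can be made concrete with a small sketch. It assumes an attribute predictor, trained only on seen classes, that emits soft attribute scores for an input; the attribute table and scores below are hypothetical.

```python
import numpy as np

# Hypothetical attribute table: [has_stripes, has_hooves, is_mammal, flies].
# "zebra" is an unseen class: no zebra images were used in training,
# only this human-defined description.
class_attrs = {
    "zebra": np.array([1.0, 1.0, 1.0, 0.0]),
    "horse": np.array([0.0, 1.0, 1.0, 0.0]),
    "eagle": np.array([0.0, 0.0, 0.0, 1.0]),
}

def zero_shot_classify(pred_attrs, class_attrs):
    # Match the predicted attribute vector to the most similar
    # class description by cosine similarity.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(class_attrs, key=lambda c: cos(pred_attrs, class_attrs[c]))

# Soft attribute scores emitted by the (seen-class) attribute predictor.
pred = np.array([0.9, 0.8, 0.95, 0.05])
pred_class = zero_shot_classify(pred, class_attrs)  # "zebra"
```

CLIP-style ZSL works the same way, except that the class descriptions are text embeddings ("a photo of a dog") and the input embedding comes from the image encoder rather than an attribute predictor.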

Types of ZSL: Conventional, Generalized, Inductive, and Transductive

  • Conventional ZSL: At test time, the model must classify examples only from unseen classes.
  • Generalized ZSL (GZSL): A more realistic and challenging scenario where the model must classify examples from both seen and unseen classes. This requires the model not only to recognize the new but also to distinguish it from the familiar, combating the bias towards seen classes.
  • Inductive ZSL: The standard setting where the model is trained only on data and semantic descriptions of seen classes, without any access to information about unseen ones.
  • Transductive ZSL: A more advanced setup where the model can use unlabeled examples from unseen classes during training. This allows it to pre-adapt the semantic space and improve final performance.
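One simple remedy for the seen-class bias in GZSL is calibrated stacking: a fixed margin is subtracted from seen-class scores so that unseen classes are not systematically out-scored. The sketch below uses toy numbers; the margin gamma is a hyperparameter tuned on validation data.

```python
import numpy as np

def calibrated_scores(scores, seen_mask, gamma):
    # Calibrated stacking: penalize seen-class scores by a margin gamma
    # to counteract the classifier's bias toward seen classes.
    return scores - gamma * seen_mask

# Scores over 4 classes; the first three are seen, the last is unseen.
scores = np.array([2.0, 1.5, 1.8, 1.9])
seen_mask = np.array([1.0, 1.0, 1.0, 0.0])

biased_pred = scores.argmax()                                         # 0 (seen)
calibrated_pred = calibrated_scores(scores, seen_mask, 0.5).argmax()  # 3 (unseen)
```

Without calibration the seen class wins even though the unseen class scores almost as high; after subtracting the margin, the unseen class is correctly preferred.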

Generative Approaches in ZSL

  • Feature Generation for Unseen Classes: To address the bias problem in GZSL, generative models (VAEs, GANs) are widely used. They synthesize not the images themselves, but their feature vectors (embeddings) for unseen classes based on their semantic descriptions. This allows for balancing the dataset for training the final classifier.
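The feature-generation idea can be sketched as follows. A real system would use a trained conditional GAN or VAE; here a random linear map plus noise stands in for the generator, purely to show the data flow from semantic description to synthesized feature vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_features(class_semantic, generator_w, n_samples, noise_std=0.1):
    # Stand-in for a trained conditional generator: project the class's
    # semantic vector into feature space and add noise to obtain a
    # cloud of pseudo-feature vectors for that class.
    base = class_semantic @ generator_w
    return base + noise_std * rng.standard_normal((n_samples, base.shape[0]))

# Hypothetical 3-dim semantic descriptions and 4-dim visual features.
generator_w = rng.standard_normal((3, 4))
unseen_semantic = np.array([1.0, 0.0, 1.0])
fake_feats = synthesize_features(unseen_semantic, generator_w, n_samples=50)
# These pseudo-features are pooled with real seen-class features, and an
# ordinary classifier is then trained over seen + unseen classes jointly.
```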

Comparative Analysis and Synergy

Comparison of Limited-Data Learning Paradigms

  • Core Idea:
    • FSL: learn to adapt quickly to new classes from a few examples.
    • ZSL: learn to recognize new classes from their semantic descriptions.
  • Data Requirements for a New Task/Class:
    • FSL: a few labeled examples (1-5) for each new class.
    • ZSL: zero labeled examples; a semantic description is required.
  • Knowledge Transfer:
    • FSL: procedural knowledge ("how to adapt") or a good feature space.
    • ZSL: semantic relationships and attributes, transferred via a common semantic space.
  • Typical Use Cases:
    • FSL: rapid prototyping, personalization, rare object recognition, robotics.
    • ZSL: new species detection, categorization of emerging topics, handling entirely new product types.
  • Distinction between Zero-shot Learning and Zero-shot Prompting: It's important to distinguish between these concepts. ZSL is an architectural machine learning paradigm that requires a special model and semantic information. Zero-shot Prompting is an applied prompt engineering technique where a large language model (e.g., GPT-4) solves a task without examples in the prompt, relying solely on its internal knowledge.
  • The Role of Large-Scale Pre-training: The modern success of FSL and ZSL is largely due to powerful foundation models (BERT, CLIP, GPT-4). Pre-training on massive datasets gives these models a universal, semantically rich embedding space, which serves as a strong foundation for rapid adaptation and semantic inference under data scarcity.
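The distinction between zero-shot and few-shot prompting comes down to whether support-set examples appear in the prompt. The sketch below builds both variants for an assumed sentiment task; build_prompt is an illustrative helper, not a library API.

```python
def build_prompt(task, examples=None):
    # Zero-shot prompting: the instruction alone.
    # Few-shot (in-context) prompting: the same instruction plus
    # support-set examples embedded directly in the prompt.
    lines = [task]
    for text, label in examples or []:
        lines.append(f"Text: {text}\nSentiment: {label}")
    lines.append("Text: The plot dragged on.\nSentiment:")
    return "\n\n".join(lines)

instruction = "Classify the sentiment as positive or negative."
zero_shot = build_prompt(instruction)
few_shot = build_prompt(
    instruction,
    examples=[("I loved it.", "positive"), ("Terrible acting.", "negative")],
)
```

In both cases the model's weights stay frozen; the few-shot variant relies on the implicit meta-learning behavior of LLMs described in the In-Context Learning section above.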

FSL and ZSL represent different points on the spectrum of data efficiency and are often used in conjunction. For example, ZSL can be used to initialize representations that are then fine-tuned using FSL when the first examples of a new class become available.

Key Research Institutions and Contributors

Research in FSL and ZSL is actively pursued in both academic circles and industrial labs.

  • Universities: Stanford University, Peking University, National University of Singapore.
  • Industrial Labs: Google AI, Meta AI, OpenAI.

With the advent of powerful foundation models, the research focus has shifted from developing specialized architectures for FSL/ZSL to methods for effectively adapting these models.

Literature

  • Finn, C. et al. (2017). Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. arXiv:1703.03400.
  • Koch, G. et al. (2015). Siamese Neural Networks for One-Shot Image Recognition. ICML Deep Learning Workshop.
  • Vinyals, O. et al. (2016). Matching Networks for One Shot Learning. arXiv:1606.04080.
  • Snell, J. et al. (2017). Prototypical Networks for Few-Shot Learning. arXiv:1703.05175.
  • Sung, F. et al. (2018). Learning to Compare: Relation Network for Few-Shot Learning. arXiv:1711.06025.
  • Chen, Y. et al. (2021). Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning. arXiv:2003.04390.
  • Wang, Y. et al. (2020). Generalizing from a Few Examples: A Survey on Few-Shot Learning. arXiv:1904.05046.
  • Xian, Y. et al. (2018). Zero-Shot Learning: A Comprehensive Evaluation of the Good, the Bad and the Ugly. arXiv:1707.00600.
  • Verma, V. K. et al. (2018). Generalized Zero-Shot Learning via Synthesized Examples. arXiv:1712.03878.
  • Radford, A. et al. (2021). Learning Transferable Visual Models from Natural Language Supervision. arXiv:2103.00020.
  • Yu, H., Lee, B. (2019). Zero-Shot Learning via Simultaneous Generating and Learning. arXiv:1910.09446.
  • Verma, V. K. et al. (2017). Zero-Shot Learning via Generative Adversarial Training of Class-Conditional Feature Vectors. arXiv:1712.00981.

See Also

  • Large Language Models