Deep knowledge transfer for generalization across tasks and domains under data scarcity
Over the last decade, deep learning approaches have achieved remarkable performance in a wide variety of fields, e.g., computer vision and natural language understanding, and across several sectors such as healthcare, industrial manufacturing, and driverless mobility. Most deep learning successes were accomplished in learning scenarios that fulfill the following two requirements. First, large amounts of data are available for training the deep learning model, and access to the data is unrestricted. Second, the data used for training and testing is independent and identically distributed (i.i.d.). However, many real-world applications violate at least one of these requirements, which results in challenging learning problems. The present thesis comprises four contributions that address four such learning problems. In each contribution, we propose a novel method and empirically demonstrate its effectiveness in the corresponding problem setting.
The first part addresses the underexplored intersection of few-shot learning and one-class classification. In this learning scenario, the model has to learn a new task from only a few examples, all from the majority class, without overfitting to those few examples or to the majority class. This scenario arises in real-world anomaly detection applications where data is scarce. We propose an episode sampling technique that adapts meta-learning algorithms designed for class-balanced few-shot classification to this few-shot one-class classification problem, by optimizing for a model initialization tailored to the scenario. In addition, we provide theoretical and empirical analyses of whether second-order derivatives are needed to learn such parameter initializations. Our experiments on 8 image and time-series datasets, including a real-world dataset of industrial sensor readings, demonstrate the effectiveness of our method.
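To make the episode-sampling idea concrete, the sketch below builds one meta-training episode for few-shot one-class classification. It assumes a scheme in which the support set contains only normal (majority-class) examples, matching what is available at test time, while the query set is class-balanced so that the outer-loop loss still penalizes a model that labels everything as normal. The function name `sample_one_class_episode` and its parameters are hypothetical illustrations, not the thesis's implementation.

```python
import random
from typing import List, Sequence, Tuple

# (input, label) pairs; label 0 = normal (majority class), 1 = anomaly
Example = Tuple[object, int]


def sample_one_class_episode(
    normal: Sequence[object],
    anomalous: Sequence[object],
    k_shot: int,
    q_per_class: int,
    rng: random.Random,
) -> Tuple[List[Example], List[Example]]:
    """Sample one meta-training episode for few-shot one-class classification.

    Assumed (illustrative) scheme: the support set holds only k_shot normal
    examples, while the query set is class-balanced so the outer-loop loss
    penalizes a model that simply predicts "normal" for everything.
    """
    # Draw support and query normals together so they never overlap.
    normals = rng.sample(list(normal), k_shot + q_per_class)
    support = [(x, 0) for x in normals[:k_shot]]
    query = [(x, 0) for x in normals[k_shot:]]
    query += [(x, 1) for x in rng.sample(list(anomalous), q_per_class)]
    rng.shuffle(query)
    return support, query


rng = random.Random(0)
support, query = sample_one_class_episode(range(100), range(100, 150), 5, 10, rng)
print(len(support), len(query))  # 5 20
```

A meta-learner such as MAML would adapt on `support` in its inner loop and compute its meta-loss on `query`; only the sampling step changes relative to class-balanced few-shot training.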
The second part tackles the intersection of continual learning and anomaly detection, which, to the best of our knowledge, we are the first to explore. In this learning scenario, the model is exposed to a stream of anomaly detection tasks (i.e., tasks for which only examples from the normal class are available) that it has to learn sequentially. Such problem settings are encountered in anomaly detection applications where the data distribution changes continuously. We propose a meta-learning approach that learns parameter-specific initializations and learning rates suitable for continual anomaly detection. Our empirical evaluations show that a model trained with our algorithm is able to learn up to 100 anomaly detection tasks sequentially with minimal catastrophic forgetting and minimal overfitting to the majority class.
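The sketch below illustrates what "parameter-specific initializations and learning rates" mean at adaptation time: each parameter has its own step size, and tasks from a stream are learned one after another, each starting from the parameters left by the previous task. It is a Meta-SGD-style update on a toy quadratic loss, assumed here for illustration; the meta-training that would produce `theta0` and `alpha` is not shown, and all names are hypothetical.

```python
from typing import Callable, List, Sequence

Params = List[float]


def adapt(
    theta0: Sequence[float],
    alpha: Sequence[float],
    grad_fn: Callable[[Sequence[float]], Sequence[float]],
    steps: int,
) -> Params:
    """Inner-loop adaptation with one learning rate per parameter.

    theta0 and alpha stand in for a meta-learned initialization and
    per-parameter learning rates; here they are simply given as inputs.
    """
    theta = list(theta0)
    for _ in range(steps):
        g = grad_fn(theta)
        # Element-wise update: theta_i <- theta_i - alpha_i * dL/dtheta_i
        theta = [t - a * gi for t, a, gi in zip(theta, alpha, g)]
    return theta


def quadratic_grad(target: Sequence[float]) -> Callable[[Sequence[float]], List[float]]:
    """Gradient of the toy task loss L(theta) = sum_i (theta_i - target_i)^2."""
    return lambda theta: [2.0 * (t - c) for t, c in zip(theta, target)]


# Toy task stream: each task is adapted to in turn, starting from the
# parameters left by the previous task (the sequential, continual setting).
theta = [0.0, 0.0]
alpha = [0.5, 0.25]
history = []
for target in ([1.0, 2.0], [3.0, -2.0]):
    theta = adapt(theta, alpha, quadratic_grad(target), steps=1)
    history.append(theta)
print(history)  # [[1.0, 1.0], [3.0, -0.5]]
```

Because each parameter has its own step size, a meta-learner could in principle assign small rates to parameters that should stay stable across tasks, which is one way to limit catastrophic forgetting.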
In the third part, we address the domain generalization problem, in which a model trained on several source domains is expected to generalize well to data from a previously unseen target domain, without any modification or exposure to that domain's data. This challenging learning scenario arises in applications involving domain shift, e.g., different clinical centers using different MRI scanners or data acquisition protocols. We hypothesize that learning to extract a richer set of features improves transfer to a wider set of unknown domains. Motivated by this, we propose an algorithm that identifies the features the model has already learned and corrupts them, thereby enforcing the discovery of new features. We leverage methods from the explainable machine learning literature to identify these features, and apply the targeted corruption at multiple representation levels, including the input data and high-level embeddings. Our extensive empirical evaluation shows that our approach outperforms 18 domain generalization algorithms on multiple benchmark datasets.
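As a rough illustration of targeted feature corruption, the sketch below scores input features with gradient × input attribution and overwrites the most salient ones. Gradient × input is one common explainability method, chosen here as an assumption (the abstract does not name the method used), and a linear score is used so that the gradient is simply the weight vector. Only the input level is shown; the helpers `gradient_x_input` and `corrupt_top_k` are hypothetical.

```python
from typing import List, Sequence


def gradient_x_input(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """Gradient x input attribution for a linear score s = w . x.

    For a linear model ds/dx_i = w_i, so the attribution of feature i is w_i * x_i.
    """
    return [wi * xi for wi, xi in zip(w, x)]


def corrupt_top_k(
    x: Sequence[float], saliency: Sequence[float], k: int, fill: float = 0.0
) -> List[float]:
    """Overwrite the k features with the largest |saliency| by `fill`.

    Masking the features the model already relies on pushes further training
    to discover other predictive features.
    """
    ranked = sorted(range(len(x)), key=lambda i: abs(saliency[i]), reverse=True)
    corrupted = list(x)
    for i in ranked[:k]:
        corrupted[i] = fill
    return corrupted


x = [1.0, 2.0, 3.0, 4.0]
w = [0.1, -5.0, 0.2, 0.0]
print(corrupt_top_k(x, gradient_x_input(x, w), k=2))  # [1.0, 0.0, 0.0, 4.0]
```

Ranking by absolute saliency treats strongly negative evidence (feature 1 here) as "already learned" just like strongly positive evidence; whether to do so is a design choice of this sketch.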
The last part of the thesis addresses the intersection of domain generalization and data-free learning, which, to the best of our knowledge, we are the first to explore. Here, we address the learning scenario where a model robust to domain shift is needed, and only models trained on the same task but on different domains are available instead of the original datasets. This learning scenario is relevant to any domain generalization application where access to the source-domain data is restricted, e.g., due to data privacy concerns or intellectual property infringement. We develop an approach that extracts domain-specific knowledge from the available teacher models and fuses it into a student model robust to domain shift, by generating synthetic cross-domain data. Our empirical evaluation demonstrates the effectiveness of our method, which outperforms ensemble and data-free knowledge distillation baselines. Most importantly, the proposed approach substantially reduces the gap between the best data-free baseline and the upper-bound baseline that uses the original private data.
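The distillation step can be sketched as follows: on an input (in this setting, a synthetic cross-domain sample), the softened predictions of the domain-specific teachers are fused and the student is trained to match them via a temperature-scaled KL divergence. Averaging the teachers is an assumption borrowed from standard ensemble distillation, not necessarily how the thesis fuses knowledge, and the synthetic-data generator is not shown. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fused_teacher_targets(
    teacher_logits: Sequence[Sequence[float]], temperature: float
) -> List[float]:
    """Average the softened predictions of all domain-specific teachers (assumed fusion rule)."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    return [sum(p[c] for p in probs) / len(probs) for c in range(len(probs[0]))]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q); terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """Student-vs-fused-teachers loss for one (e.g., synthetic) input.

    The T^2 factor keeps gradient magnitudes comparable across temperatures.
    """
    target = fused_teacher_targets(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(target, student)


# Two teachers trained on different domains disagree; a student that copies
# only one of them is penalized more than one that hedges between them.
teachers = [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
print(distillation_loss(teachers, [3.0, 0.0, 0.0]) > distillation_loss(teachers, [1.5, 1.5, 0.0]))  # True
```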
Multi-Modal Fusion by Meta-Initialization
When experience is scarce, models may have insufficient information to adapt
to a new task. In this case, auxiliary information, such as a textual
description of the task, can enable improved task inference and adaptation. In
this work, we propose an extension to the Model-Agnostic Meta-Learning
algorithm (MAML), which allows the model to adapt using auxiliary information
as well as task experience. Our method, Fusion by Meta-Initialization (FuMI),
conditions the model initialization on auxiliary information using a
hypernetwork, rather than learning a single, task-agnostic initialization.
Furthermore, motivated by the shortcomings of existing multi-modal few-shot
learning benchmarks, we construct iNat-Anim, a large-scale image
classification dataset with succinct and visually pertinent textual class
descriptions. On iNat-Anim, FuMI significantly outperforms uni-modal baselines
such as MAML in the few-shot regime. The code for this project and a dataset
exploration tool for iNat-Anim are publicly available at
https://github.com/s-a-malik/multi-few.
Comment: The first two authors contributed equally.
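The core idea of conditioning the initialization on auxiliary information can be sketched with a tiny linear hypernetwork that maps each class's text embedding to an initial row of the classifier head. That the hypernetwork outputs exactly one head row per class, and that it is linear, are assumptions made for illustration; `hyper_init_head`, `W_h`, and `b_h` are hypothetical names, and the subsequent MAML inner-loop steps are omitted.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def linear(W: Matrix, b: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute W x + b."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def hyper_init_head(
    class_text_embeddings: Sequence[Sequence[float]], W_h: Matrix, b_h: Sequence[float]
) -> Matrix:
    """Hypothetical linear hypernetwork: one initial classifier row per class.

    Row c of the returned head is W_h e_c + b_h, where e_c embeds the textual
    description of class c. MAML-style inner-loop steps would then start from
    this head instead of a single task-agnostic initialization.
    """
    return [linear(W_h, b_h, e) for e in class_text_embeddings]


def logits(head: Matrix, features: Sequence[float]) -> List[float]:
    """Class scores for one image feature vector under the generated head."""
    return [sum(w * f for w, f in zip(row, features)) for row in head]


W_h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # maps 2-d text embeddings to 3-d weights
b_h = [0.0, 0.0, 0.5]
head = hyper_init_head([[1.0, 2.0], [3.0, 4.0]], W_h, b_h)
print(head)  # [[1.0, 2.0, 3.5], [3.0, 4.0, 7.5]]
```

Since the head is produced per class from its description, the same hypernetwork can initialize a classifier for any new set of described classes, which is what distinguishes this from learning one fixed initialization.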
Writer adaptation for offline text recognition: An exploration of neural network-based methods
Handwriting recognition has seen significant success with the use of deep
learning. However, a persistent shortcoming of neural networks is that they are
not well-equipped to deal with shifting data distributions. In the field of
handwritten text recognition (HTR), this manifests as poor recognition
accuracy for writers who are dissimilar to those seen during training. An
ideal HTR model should adapt to new writing styles in order to handle the
vast number of possible styles. In this paper, we explore how HTR
models can be made writer adaptive by using only a handful of examples from a
new writer (e.g., 16 examples) for adaptation. Two HTR architectures are used
as base models, using a ResNet backbone along with either an LSTM or
Transformer sequence decoder. Using these base models, two methods are
considered to make them writer adaptive: 1) model-agnostic meta-learning
(MAML), an algorithm commonly used for tasks such as few-shot classification,
and 2) writer codes, an idea originating from automatic speech recognition.
Results show that an HTR-specific version of MAML known as MetaHTR improves
performance over the baseline, with a 1.4 to 2.0 improvement in word error
rate (WER). The improvement due to writer adaptation is between 0.2 and
0.7 WER, with a deeper model appearing to lend itself better to adaptation
with MetaHTR than a shallower model. However, applying MetaHTR to larger HTR
models or to sentence-level HTR may become prohibitive due to its high
computational and memory requirements. Lastly, writer codes based on learned
features or Hinge statistical features did not improve recognition performance.
Comment: 21 pages including appendices, 6 figures, 10 tables
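Since the results above are stated in word error rate, the sketch below shows the standard way WER is computed: the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. Multiplying by 100 gives the percentage form in which WER is commonly reported. The function name is illustrative and not taken from the paper's code.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed as the word-level Levenshtein distance, keeping only one row of
    the dynamic-programming table at a time.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                cur[j - 1] + 1,      # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = cur
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{100 * wer:.1f}%")  # 33.3%
```

Here one substitution ("sat" to "sit") and one deletion ("the") over six reference words give 2/6.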
Learning Transferable Adversarial Robust Representations via Multi-view Consistency
Despite their success on few-shot learning problems, most meta-learned models
focus only on achieving good performance on clean examples and thus break down
easily when given adversarially perturbed samples. While some recent works
have shown that a combination of adversarial learning and meta-learning could
enhance the robustness of a meta-learner against adversarial attacks, they fail
to achieve generalizable adversarial robustness to unseen domains and tasks,
which is the ultimate goal of meta-learning. To address this challenge, we
propose a novel meta-adversarial multi-view representation learning framework
with dual encoders. Specifically, we create a discrepancy between two
differently augmented samples of the same data instance, first by updating the
encoder parameters with them and then by applying a novel label-free
adversarial attack that maximizes their discrepancy. Then, we maximize the
consistency across the views to learn transferable robust representations
across domains and tasks. Through experimental validation on multiple
benchmarks, we demonstrate the effectiveness of our framework on few-shot
learning tasks from unseen domains, achieving over 10% robust accuracy
improvements over previous adversarial meta-learning baselines.
Comment: *Equal contribution (author ordering determined by coin flip).
NeurIPS SafetyML workshop 2022, under review.
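The two ingredients named above, a label-free attack that pushes two views apart and a consistency objective that pulls them back together, can be sketched with a toy linear encoder. The squared-distance discrepancy, the single FGSM-style sign step, and the cosine-based consistency loss are all assumptions chosen for illustration; the framework's dual encoders and parameter updates are not modeled, and every name below is hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def encode(W: Matrix, x: Sequence[float]) -> List[float]:
    """Toy linear encoder z = W x (stand-in for the real encoders)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def consistency_loss(z1: Sequence[float], z2: Sequence[float]) -> float:
    """1 - cosine similarity; 0 when the two view embeddings point the same way."""
    dot = sum(a * b for a, b in zip(z1, z2))
    norm = math.sqrt(sum(a * a for a in z1)) * math.sqrt(sum(b * b for b in z2))
    return 1.0 - (dot / norm if norm > 0 else 0.0)


def discrepancy(W: Matrix, x1: Sequence[float], x2: Sequence[float]) -> float:
    """Squared distance between the embeddings of two views; needs no labels."""
    return sum((a - b) ** 2 for a, b in zip(encode(W, x1), encode(W, x2)))


def _sign(g: float) -> float:
    return float((g > 0) - (g < 0))


def label_free_attack(
    W: Matrix, x1: Sequence[float], x2: Sequence[float], eps: float
) -> List[float]:
    """One FGSM-style step on view 1 that increases discrepancy(W, x1, x2).

    For the linear encoder, d/dx1 ||W x1 - W x2||^2 = 2 W^T (W x1 - W x2).
    """
    diff = [a - b for a, b in zip(encode(W, x1), encode(W, x2))]
    grad = [2.0 * sum(W[r][c] * diff[r] for r in range(len(W))) for c in range(len(x1))]
    return [xi + eps * _sign(g) for xi, g in zip(x1, grad)]


W = [[1.0, 0.0], [0.0, 1.0]]
x1, x2 = [1.0, 0.0], [0.0, 1.0]  # two augmented views of one instance
x1_adv = label_free_attack(W, x1, x2, eps=0.1)
print(discrepancy(W, x1_adv, x2) > discrepancy(W, x1, x2))  # True
# Training would then minimize this loss to make the adversarial views consistent.
loss = consistency_loss(encode(W, x1_adv), encode(W, x2))
```

The attack needs no class labels because it only uses the two views of the same instance, which is what allows it to be applied during meta-training on arbitrary tasks.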
ARCADe: A Rapid Continual Anomaly Detector
Although continual learning and anomaly detection have separately been
well-studied in previous works, their intersection remains rather unexplored.
The present work addresses a learning scenario where a model has to
incrementally learn a sequence of anomaly detection tasks, i.e., tasks from
which only examples from the normal (majority) class are available for
training. We define this novel learning problem of continual anomaly detection
(CAD) and formulate it as a meta-learning problem. Moreover, we propose A Rapid
Continual Anomaly Detector (ARCADe), an approach to train neural networks to be
robust against the major challenges of this new learning problem, namely
catastrophic forgetting and overfitting to the majority class. The results of
our experiments on three datasets show that, in the CAD problem setting, ARCADe
substantially outperforms baselines from the continual learning and anomaly
detection literature. Finally, we provide deeper insights into the learning
strategy yielded by the proposed meta-learning algorithm.