A Comprehensive Empirical Evaluation on Online Continual Learning
Online continual learning aims to get closer to a live learning experience by
learning directly on a stream of data with a temporally shifting distribution,
storing only a minimal amount of data from that stream. In this empirical
evaluation, we assess various methods from the literature that tackle online
continual learning. More specifically, we focus on the class-incremental
setting in the context of image classification, where the learner must learn
new classes incrementally from a stream of data. We compare these methods on
the Split-CIFAR100 and Split-TinyImagenet benchmarks, measuring their average
accuracy, forgetting, stability, and representation quality, to evaluate
various aspects of each algorithm both at the end of training and throughout
the training period. We find that most methods suffer from stability and
underfitting issues. However, the learned representations are comparable to
i.i.d. training under the same computational budget. No clear winner emerges
from the results, and basic experience replay, when properly tuned and
implemented, is a very strong baseline. We release our modular and extensible
codebase at https://github.com/AlbinSou/ocl_survey, based on the Avalanche
framework, to reproduce our results and encourage future research.
Comment: ICCV Visual Continual Learning Workshop 2023 accepted paper
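As an illustration of why well-tuned experience replay is such a strong baseline, the reservoir-sampling buffer that underlies basic replay methods can be sketched in a few lines. This is a minimal sketch, not code from the ocl_survey codebase; the class and method names are illustrative:

```python
import random


class ReservoirBuffer:
    """Fixed-size replay buffer filled by reservoir sampling over a stream."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.seen = 0  # total number of stream examples observed so far

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            # Each of the `seen` stream examples remains in the buffer
            # with equal probability capacity / seen.
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k):
        """Draw a replay mini-batch (without replacement)."""
        return random.sample(self.data, min(k, len(self.data)))
```

During training, each incoming mini-batch would be combined with `buffer.sample(k)` so that gradients mix current-task and past-task examples, which is the core of experience replay.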
Open Set Classification for Deep Learning in Large-Scale and Continual Learning Models
Supervised classification methods often assume the train and test data distributions are the same and that all classes in the test set are present in the training set. However, deployed classifiers require the ability to recognize inputs from outside the training set as unknowns and to update representations in near real-time to account for novel concepts unknown during offline training. This problem has been studied under multiple paradigms, including out-of-distribution detection and open set recognition; for convolutional neural networks, however, there have been two major approaches: 1) inference methods to separate known inputs from unknown inputs and 2) feature space regularization strategies to improve model robustness to novel inputs. In this dissertation, we explore the relationship between the two approaches and directly compare performance on large-scale datasets that have more than a few dozen categories. Using the ImageNet large-scale classification dataset, we identify novel combinations of regularization and specialized inference methods that perform best across multiple open set classification problems of increasing difficulty. We find that input perturbation and temperature scaling yield significantly better performance on large-scale datasets than the other inference methods tested, regardless of the feature space regularization strategy. Conversely, we also find that advanced regularization schemes applied during training yield better performance when baseline inference techniques are used; however, this often requires supplementing the training data with additional background samples, which is difficult in large-scale problems.
To overcome this problem we further propose a simple regularization technique that can be easily applied to existing convolutional neural network architectures that improves open set robustness without the requirement for a background dataset. Our novel method achieves state-of-the-art results on open set classification baselines and easily scales to large-scale problems.
Finally, we explore the intersection of open set and continual learning to establish, for the first time, baselines for novelty detection while learning from online data streams. To accomplish this, we introduce a novel dataset created for evaluating the image open set classification capabilities of streaming learning algorithms. Using our new baselines, we draw conclusions about the most computationally efficient means of detecting novelty in pre-trained models and about the properties an efficient open set learning algorithm operating in the streaming paradigm should possess.
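The temperature-scaling inference method highlighted above can be illustrated as a maximum-softmax-probability score computed over scaled logits; ODIN-style methods pair this with input perturbation. This is a simplified sketch, and the function name and default temperature are illustrative assumptions, not the dissertation's settings:

```python
import math


def max_softmax_score(logits, temperature=1000.0):
    """Maximum softmax probability after temperature scaling.

    High temperatures flatten the softmax distribution, which tends to
    widen the score gap between known and unknown inputs; inputs whose
    score falls below a chosen threshold are flagged as unknown.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]  # subtract max for stability
    return max(exps) / sum(exps)
```

A deployed open set classifier would compare this score against a threshold tuned on validation data to decide whether to reject an input as unknown.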
Graceful Degradation and Related Fields
When machine learning models encounter data outside the distribution on
which they were trained, they have a tendency to behave poorly, most
prominently by displaying over-confidence in erroneous predictions. Such
behaviour can have disastrous effects on real-world machine learning
systems. In this field,
graceful degradation refers to the optimisation of model performance as it
encounters this out-of-distribution data. This work presents a definition and
discussion of graceful degradation and where it can be applied in deployed
visual systems. Following this, a survey of relevant areas is undertaken,
splitting the graceful degradation problem, for the first time, into active
and passive approaches. In passive approaches, graceful degradation is
handled by the model in a self-contained manner; in active approaches, the
model is updated upon encountering epistemic uncertainties. This work
communicates the
importance of the problem and aims to prompt the development of machine
learning strategies that are aware of graceful degradation.
Omnidirectional Transfer for Quasilinear Lifelong Learning
In biological learning, data are used to improve performance not only on the
current task, but also on previously encountered and as yet unencountered
tasks. In contrast, classical machine learning starts from a blank slate, or
tabula rasa, using data only for the single task at hand. While typical
transfer learning algorithms can improve performance on future tasks, their
performance on prior tasks degrades upon learning new tasks (called
catastrophic forgetting). Many recent approaches for continual or lifelong
learning have attempted to maintain performance given new tasks. But striving
to avoid forgetting sets the goal unnecessarily low: the goal of lifelong
learning, whether biological or artificial, should be to improve performance on
all tasks (including past and future) with any new data. We propose
omnidirectional transfer learning algorithms, which include two special cases
of interest: decision forests and deep networks. Our key insight is the
development of the omni-voter layer, which ensembles representations learned
independently on all tasks to jointly decide how to proceed on any given new
data point, thereby improving performance on both past and future tasks. Our
algorithms demonstrate omnidirectional transfer in a variety of simulated and
real data scenarios, including tabular data, image data, spoken data, and
adversarial tasks. Moreover, they do so with quasilinear space and time
complexity.
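The ensembling idea behind the omni-voter layer can be illustrated, in simplified form, as averaging class posteriors produced by independently trained task learners. This sketch operates on probability dictionaries rather than learned representations, and all names are illustrative rather than taken from the paper:

```python
from statistics import mean


def omni_vote(posteriors_per_learner):
    """Average class posteriors from independently trained task learners.

    posteriors_per_learner: one dict per task-specific learner, mapping
    class label -> probability. Because every learner votes on every
    input, data from a new task can sharpen decisions on past tasks too.
    """
    classes = set().union(*posteriors_per_learner)
    averaged = {
        c: mean(p.get(c, 0.0) for p in posteriors_per_learner)
        for c in classes
    }
    return max(averaged, key=averaged.get)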
On the Domain Adaptation and Generalization of Pretrained Language Models: A Survey
Recent advances in NLP have been driven by a range of large-scale pretrained
language models (PLMs). These PLMs have brought significant performance gains
for a range of NLP tasks, circumventing the need to customize complex designs
for specific tasks. However, most current work focuses on finetuning PLMs on
domain-specific datasets, ignoring the fact that the domain gap can lead to
overfitting and even performance drops. Therefore, it is practically important
to find an appropriate method to effectively adapt PLMs to a target domain of
interest. Recently, a range of methods have been proposed to achieve this
purpose. Early surveys on domain adaptation are not suitable for PLMs, because
PLMs exhibit more sophisticated behavior than traditional models trained from
scratch and because domain adaptation of PLMs needs to be redesigned to be
effective. This paper aims to provide a survey of these newly proposed methods
and shed light on how to apply traditional machine learning methods to newly
evolved and future technologies. By examining the issues of deploying PLMs for
downstream tasks, we propose a taxonomy of domain adaptation approaches from a
machine learning system view, covering methods for input augmentation, model
optimization and personalization. We discuss and compare those methods and
suggest promising future research directions.
DyAnNet: A Scene Dynamicity Guided Self-Trained Video Anomaly Detection Network
Unsupervised approaches for video anomaly detection may not perform as well
as supervised approaches. However, learning unknown types of anomalies using an
unsupervised approach is more practical than a supervised approach as
annotation is an extra burden. In this paper, we use isolation tree-based
unsupervised clustering to partition the deep feature space of the video
segments. The RGB stream generates a pseudo anomaly score and the flow stream
generates a pseudo dynamicity score of a video segment. These scores are then
fused using a majority voting scheme to generate preliminary bags of positive
and negative segments. However, these bags may not be accurate as the scores
are generated only using the current segment which does not represent the
global behavior of a typical anomalous event. We then use a refinement strategy
based on a cross-branch feed-forward network designed using a popular I3D
network to refine both scores. The bags are then refined through a segment
re-mapping strategy. The intuition behind combining the dynamicity score of a
segment with its anomaly score is to enhance the quality of the evidence. The method
has been evaluated on three popular video anomaly datasets, i.e., UCF-Crime,
CCTV-Fights, and UBI-Fights. Experimental results reveal that the proposed
framework achieves competitive accuracy as compared to the state-of-the-art
video anomaly detection methods.
Comment: 10 pages, 8 figures, and 4 tables. (Accepted at WACV 2023)
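The score-fusion step described above can be sketched as follows. With only two streams, majority voting reduces to agreement between the two scores, and the threshold value here is an illustrative assumption rather than the paper's setting:

```python
def preliminary_bags(anomaly_scores, dynamicity_scores, threshold=0.5):
    """Split video segments into preliminary positive/negative bags.

    Each segment carries a pseudo anomaly score (RGB stream) and a pseudo
    dynamicity score (flow stream). A segment index goes into the positive
    (anomalous) bag only when both streams exceed the threshold; otherwise
    it goes into the negative bag.
    """
    positive, negative = [], []
    for i, (a, d) in enumerate(zip(anomaly_scores, dynamicity_scores)):
        bag = positive if (a > threshold and d > threshold) else negative
        bag.append(i)
    return positive, negative
```

These preliminary bags would then be refined by the cross-branch feed-forward network and the segment re-mapping strategy, since per-segment scores alone do not capture the global behavior of an anomalous event.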