Sit Back and Relax: Learning to Drive Incrementally in All Weather Conditions
In autonomous driving scenarios, current object detection models show strong
performance when tested in clear weather. However, their performance
deteriorates significantly when tested in degrading weather conditions. In
addition, even when adapted to perform robustly in a sequence of different
weather conditions, they are often unable to perform well in all of them and
suffer from catastrophic forgetting. To efficiently mitigate forgetting, we
propose Domain-Incremental Learning through Activation Matching (DILAM), which
employs unsupervised feature alignment to adapt only the affine parameters of a
clear weather pre-trained network to different weather conditions. We propose
to store these affine parameters as a memory bank for each weather condition
and to plug in the weather-specific parameters during driving (i.e., at test
time) when the respective condition is encountered. Our memory bank is
extremely lightweight, since affine parameters account for less than 2% of a
typical object detector. Furthermore, contrary to previous domain-incremental
learning approaches, we do not require the weather label when testing and
propose to automatically infer the weather condition with a majority-voting
linear classifier. Comment: Intelligent Vehicle Conference (oral presentation).
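As a minimal illustration of the memory-bank idea, the PyTorch sketch below stores only the affine (weight/bias) parameters of normalization layers per weather condition and plugs them back in at test time; the class and method names are illustrative placeholders, not DILAM's actual code.

```python
# A minimal sketch, assuming a PyTorch detector whose normalization layers
# carry the affine parameters to be swapped per weather condition.
import torch
import torch.nn as nn

class AffineMemoryBank:
    """Stores per-condition affine parameters of all normalization layers."""

    def __init__(self):
        self.bank = {}  # condition name -> {layer name: (weight, bias)}

    def save(self, model: nn.Module, condition: str):
        params = {}
        for name, m in model.named_modules():
            if isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)) and m.affine:
                params[name] = (m.weight.detach().clone(),
                                m.bias.detach().clone())
        self.bank[condition] = params

    def load(self, model: nn.Module, condition: str):
        params = self.bank[condition]
        with torch.no_grad():
            for name, m in model.named_modules():
                if name in params:
                    w, b = params[name]
                    m.weight.copy_(w)
                    m.bias.copy_(b)

# Usage: after adapting the affine parameters to, e.g., fog, snapshot them;
# at test time, plug in the parameters for the predicted condition:
#   bank = AffineMemoryBank()
#   bank.save(detector, "fog")
#   bank.load(detector, predicted_condition)
```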
E2-AEN: End-to-End Incremental Learning with Adaptively Expandable Network
Expandable networks have demonstrated their advantages in dealing with
catastrophic forgetting problem in incremental learning. Considering that
different tasks may need different structures, recent methods design dynamic
structures adapted to different tasks via sophisticated techniques. Their
routine is to first search for expandable structures and then train on the new
tasks; this, however, breaks each task into multiple training stages, leading
to suboptimal performance or excessive computational cost. In this paper, we propose an
end-to-end trainable adaptively expandable network named E2-AEN, which
dynamically generates lightweight structures for new tasks without any accuracy
drop in previous tasks. Specifically, the network contains a series of powerful
feature adapters for augmenting the previously learned representations for new
tasks while avoiding task interference. These adapters are controlled via an
adaptive gate-based pruning strategy which decides whether the expanded
structures can be pruned, making the network structure dynamically changeable
according to the complexity of the new tasks. Moreover, we introduce a novel
sparsity-activation regularization to encourage the model to learn
discriminative features with limited parameters. E2-AEN reduces computational
cost and can be built upon any feed-forward architecture in an end-to-end
manner. Extensive experiments on both classification (i.e., CIFAR and VDD) and
detection (i.e., COCO, VOC and the ICCV2021 SSLAD challenge) benchmarks
demonstrate the effectiveness of the proposed method, which achieves remarkable
new results.
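A hedged sketch of the gated feature-adapter idea follows: a lightweight residual adapter whose contribution is scaled by a learnable gate, so adapters whose gate stays near zero can be pruned away. The names and the exact gating form are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch, assuming per-layer residual adapters with scalar gates;
# a sparsity penalty on the gates would encourage prunable adapters.
import torch
import torch.nn as nn

class GatedAdapter(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.adapter = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )
        self.gate = nn.Parameter(torch.zeros(1))  # learnable scalar gate

    def forward(self, x):
        # Frozen base features plus a gated task-specific correction.
        return x + torch.sigmoid(self.gate) * self.adapter(x)

    def prunable(self, threshold: float = 0.05) -> bool:
        # If the gate has collapsed, the expanded structure can be removed.
        return torch.sigmoid(self.gate).item() < threshold
```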
Continual Learning of Natural Language Processing Tasks: A Survey
Continual learning (CL) is an emerging learning paradigm that aims to emulate
the human capability of continually learning and accumulating knowledge without
forgetting previously learned knowledge, while also transferring that knowledge
to new tasks to learn them better. This survey presents a comprehensive review
of the recent progress of CL in the NLP field. It covers (1) all CL settings,
with a taxonomy of existing techniques, and, beyond dealing with forgetting,
also focuses on (2) knowledge transfer, which is of particular importance to
NLP. Neither (1) nor (2) is covered by existing surveys. Finally, a list of
future directions is also discussed.
SRCD: Semantic Reasoning with Compound Domains for Single-Domain Generalized Object Detection
This paper provides a novel framework for single-domain generalized object
detection (i.e., Single-DGOD), where we are interested in learning and
maintaining the semantic structures of self-augmented compound cross-domain
samples to enhance the model's generalization ability. Different from DGOD
trained on multiple source domains, Single-DGOD is far more challenging to
generalize well to multiple target domains with only one single source domain.
Existing methods mostly adopt a treatment similar to DGOD, learning
domain-invariant features by decoupling or compressing the semantic space.
However, this has two potential limitations: 1) pseudo attribute-label
correlations, caused by extremely scarce single-domain data; and 2) neglect of
semantic structural information, i.e., we found that the affinities of
instance-level semantic relations in samples are crucial to model
generalization. In this paper, we introduce Semantic Reasoning with Compound
Domains (SRCD) for Single-DGOD. Specifically, our SRCD contains two main
components, namely, the texture-based self-augmentation (TBSA) module, and the
local-global semantic reasoning (LGSR) module. TBSA aims to eliminate the
effects of irrelevant attributes associated with labels, such as light, shadow,
color, etc., at the image level by a light-yet-efficient self-augmentation.
Moreover, LGSR is used to further model the semantic relationships on instance
features to uncover and maintain the intrinsic semantic structures. Extensive
experiments on multiple benchmarks demonstrate the effectiveness of the
proposed SRCD. Comment: 10 pages, 5 figures.
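The abstract describes TBSA only at a high level; the sketch below shows one plausible image-level realization of attribute perturbation, mixing per-channel statistics between samples of the single source domain (a MixStyle-like operation). This is an illustrative assumption, not the paper's exact module.

```python
# A minimal sketch, assuming batched image tensors; re-normalizing each sample
# with statistics mixed from another sample perturbs lighting/color-like
# attributes while preserving content.
import torch

def mix_channel_statistics(x: torch.Tensor, alpha: float = 0.1) -> torch.Tensor:
    """x: (B, C, H, W). Mixes per-channel mean/std across a random
    permutation of the batch to self-augment a single source domain."""
    b = x.size(0)
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True) + 1e-6
    perm = torch.randperm(b)
    lam = torch.distributions.Beta(alpha, alpha).sample((b, 1, 1, 1)).to(x.device)
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sigma_mix = lam * sigma + (1 - lam) * sigma[perm]
    return sigma_mix * (x - mu) / sigma + mu_mix
```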
Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI
Influenced by the great success of deep learning via cloud computing and the
rapid development of edge chips, research in artificial intelligence (AI) has
shifted to both of the computing paradigms, i.e., cloud computing and edge
computing. In recent years, we have witnessed significant progress in
developing more advanced AI models on cloud servers that surpass traditional
deep learning models owing to model innovations (e.g., Transformers, Pretrained
families), explosion of training data and soaring computing capabilities.
However, edge computing, especially edge-cloud collaborative computing, is
still in its infancy, since resource-constrained IoT scenarios permit only very
limited algorithms to be deployed. In this survey, we conduct a systematic
review of both cloud and edge AI. Specifically, we are the first to set up the
collaborative learning mechanism for cloud and edge modeling, with a thorough
review of the architectures that enable such a mechanism. We also
discuss potentials and practical experiences of some on-going advanced edge AI
topics including pretraining models, graph neural networks and reinforcement
learning. Finally, we discuss the promising directions and challenges in this
field. Comment: 20 pages, Transactions on Knowledge and Data Engineering.
Toward Efficient and Robust Computer Vision for Large-Scale Edge Applications
The past decade has witnessed remarkable advancements in computer vision and deep learning algorithms, ushering in a transformative wave of large-scale edge applications across various industries. These image processing methods, however, still encounter numerous challenges when it comes to meeting real-world demands, especially in terms of accuracy and latency at scale. Indeed, striking a balance among efficiency, robustness, and scalability remains a common obstacle. This dissertation investigates these issues in the context of different computer vision tasks, including image classification, semantic segmentation, depth estimation, and object detection, and introduces novel solutions based on adjustable neural networks, joint multi-task architecture search, and generalized supervision interpolation.

The first obstacle revolves around the ability to trade off between speed and accuracy in convolutional neural networks (CNNs) during inference on resource-constrained platforms. Despite their progress, CNNs are typically monolithic at runtime, which can present practical difficulties since computational budgets may vary over time. To address this, we introduce the Any-Width Network, an adjustable-width CNN architecture that utilizes a novel Triangular Convolution module to enable fine-grained control over speed and accuracy during inference.

The second challenge concerns the computationally demanding nature of dense prediction tasks such as semantic segmentation and depth estimation, which is especially problematic for edge platforms with limited resources. To tackle this, we propose a novel and scalable framework named EDNAS, which leverages the synergistic relationship between Multi-Task Learning and hardware-aware Neural Architecture Search to significantly enhance the on-device speed and accuracy of dense predictions.

Finally, to improve the robustness of object detection, we introduce a novel data mixing augmentation. While mixing techniques such as Mixup have proven successful in image classification, their application to object detection is non-trivial due to spatial misalignment, foreground/background distinction, and instance multiplicity. To address these issues, we propose a generalized data mixing principle, Supervision Interpolation, and its simple yet effective implementation, LossMix.

By addressing these challenges, this dissertation aims to facilitate better efficiency, accuracy, and scalability of computer vision and deep learning algorithms, and to contribute to the advancement of large-scale edge applications across different domains.
Doctor of Philosophy
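As a minimal illustration of the supervision-interpolation idea behind LossMix: rather than mixing labels, which is ill-defined for detection where boxes do not align, one can mix the inputs and interpolate the losses computed against each original target set. The `detector.compute_loss` interface below is a hypothetical placeholder, not the dissertation's actual API.

```python
# A minimal sketch, assuming a detector exposing a per-batch loss function;
# the mixed image is scored against both original target sets and the losses
# are interpolated with the same mixing coefficient.
import torch

def lossmix_step(detector, images_a, targets_a, images_b, targets_b, alpha=1.0):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    mixed = lam * images_a + (1 - lam) * images_b
    loss_a = detector.compute_loss(mixed, targets_a)  # hypothetical interface
    loss_b = detector.compute_loss(mixed, targets_b)
    return lam * loss_a + (1 - lam) * loss_b
```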
Regularization, Adaptation and Generalization of Neural Networks
The ability to generalize to unseen data is one of the fundamental, desired
properties in a learning system. This thesis reports different research efforts
in improving the generalization properties of machine learning systems at
different levels, focusing on neural networks for computer vision tasks.

First, a novel regularization method is presented, Curriculum Dropout. It
combines Curriculum Learning and Dropout, and shows better regularization
effects than the original algorithm in a variety of tasks, without requiring
substantially any additional implementation effort.

While regularization methods are extremely powerful for generalizing better
to unseen data from the same distribution as the training one, they are not
very successful in mitigating the dataset bias issue. This problem consists
in models learning the peculiarities of the training set and generalizing
poorly to unseen domains. Unsupervised domain adaptation has been one of the main
solutions to this problem. Two novel adaptation approaches are presented in
this thesis. First, we introduce the DIFA algorithm, which combines domain
invariance and feature augmentation to better adapt models to new domains
by relying on adversarial training. Next, we propose an original procedure that
exploits the "mode collapse" behavior of Generative Adversarial Networks.

Finally, the general applicability of domain adaptation algorithms is
questioned (due to the assumptions of knowing the target distribution a
priori and being able to sample from it). A novel framework is presented to
overcome its liabilities, where the goal is to generalize to unseen domains by
relying only on data from a single source distribution. We face this problem
through the lens of robust statistics, defining a worst-case formulation where
the model parameters are optimized with respect to populations which are
ρ-distant from the source domain in a semantic space.
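Where the thesis describes Curriculum Dropout only at a high level, the sketch below illustrates the scheduling idea: the retain probability starts at 1 (no dropout) and decays toward its target value as training progresses, so regularization strength grows over time. The exponential schedule and its constants here are assumptions for illustration, not necessarily the thesis's exact formulation.

```python
# A minimal sketch, assuming an exponential curriculum on the keep probability.
import math
import torch.nn.functional as F

def curriculum_keep_prob(step: int, final_keep: float = 0.5,
                         gamma: float = 1e-4) -> float:
    # Starts at 1.0 (no units dropped), decays toward `final_keep`.
    return (1.0 - final_keep) * math.exp(-gamma * step) + final_keep

def curriculum_dropout(x, step: int, training: bool = True):
    # Dropout probability grows as training progresses.
    p_drop = 1.0 - curriculum_keep_prob(step)
    return F.dropout(x, p=p_drop, training=training)
```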
Federated Learning for Medical Image Analysis: A Survey
Machine learning in medical imaging often faces a fundamental dilemma, namely
the small sample size problem. Many recent studies suggest using multi-domain
data pooled from different acquisition sites/datasets to improve statistical
power. However, medical images from different sites cannot be easily shared to
build large datasets for model training due to privacy protection reasons. As a
promising solution, federated learning, which enables collaborative training of
machine learning models based on data from different sites without cross-site
data sharing, has attracted considerable attention recently. In this paper, we
conduct a comprehensive survey of the recent development of federated learning
methods in medical image analysis. We first introduce the background and
motivation of federated learning for dealing with privacy protection and
collaborative learning issues in medical imaging. We then present a
comprehensive review of recent advances in federated learning methods for
medical image analysis. Specifically, existing methods are categorized based on
three critical aspects of a federated learning system, including client end,
server end, and communication techniques. In each category, we summarize the
existing federated learning methods according to specific research problems in
medical image analysis and also provide insights into the motivations of
different approaches. In addition, we provide a review of existing benchmark
medical imaging datasets and software platforms for current federated learning
research. We also conduct an experimental study to empirically evaluate typical
federated learning methods for medical image analysis. This survey can help to
better understand the current research status, challenges and potential
research opportunities in this promising research field. Comment: 19 pages, 6 figures.
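As a reference point for the client-end/server-end/communication taxonomy above, the sketch below implements the canonical FedAvg round (McMahan et al., 2017), in which each site trains locally and only model weights are shared; `client_train` is a hypothetical local-training routine, not a method from the survey.

```python
# A minimal sketch, assuming `client_train(model, client)` runs local epochs
# on a site's private data and returns its sample count.
import copy
import torch

def fedavg_round(global_model, clients, client_train):
    """One communication round: sites train locally, the server averages
    weights by sample count, so raw images never leave a site."""
    states, sizes = [], []
    for client in clients:
        local = copy.deepcopy(global_model)
        n_samples = client_train(local, client)  # local training, private data
        states.append(local.state_dict())
        sizes.append(n_samples)
    total = float(sum(sizes))
    # Weighted average of all parameters and buffers.
    avg = {k: sum(s[k].float() * (n / total) for s, n in zip(states, sizes))
           for k in states[0]}
    global_model.load_state_dict(avg)
    return global_model
```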