Common and Unique Feature Learning for Data Fusion
University of Technology Sydney, Faculty of Engineering and Information Technology. In today's era of big data, information about a phenomenon of interest is available from multiple acquisitions. The data captured from each of these acquisition frameworks is commonly known as a modality, and each modality provides information in a complementary manner. Despite the evident benefits and the plethora of works on data fusion, two challenging issues persist: 1) feature representation: how to exploit the data diversity that multiple modalities offer, and 2) feature fusion: how to combine the heterogeneous information for better decision making.
To address these challenges, this thesis presents significantly improved models of two widely utilised fusion techniques: a) early fusion, which combines features from multiple modalities for joint prediction, and b) late fusion, which combines modality-specific predictions at the decision level. I illustrate how both techniques have their own specific limitations: late fusion is unable to harness inter-modality benefits, while early fusion's reliance on a single model causes failure when the information from any modality is futile. To overcome these drawbacks, I developed novel multimodal systems that perform feature extraction and feature fusion in a consolidated framework. Technically, I designed feature extraction schemes to capture both unique information from individual modalities and common information from multimodal representations. I then combine these two kinds of information for supervised prediction by designing efficient fusion schemes that enable the framework to perform information discovery and feature fusion simultaneously.
In this thesis, I also demonstrate the benefits of fusing both the common and unique information in supervised learning and validate the significance of the developed techniques on multimodal, multiview, and multisource datasets. The designed methods leverage the multimodal benefits by creating additional diversity and obtain a more unified view of the underlying phenomenon for better decision making.
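As a rough illustration of the two baseline fusion schemes this thesis improves upon, consider the following minimal PyTorch sketch (the linear heads and layer sizes are illustrative assumptions, not the thesis's architecture):

    import torch
    import torch.nn as nn

    class EarlyFusion(nn.Module):
        # Early fusion: concatenate modality features, then predict jointly.
        def __init__(self, dim_a, dim_b, n_classes):
            super().__init__()
            self.head = nn.Linear(dim_a + dim_b, n_classes)

        def forward(self, xa, xb):
            # A single model sees both modalities; if one modality is
            # uninformative, the joint prediction can degrade.
            return self.head(torch.cat([xa, xb], dim=-1))

    class LateFusion(nn.Module):
        # Late fusion: per-modality predictions combined at decision level.
        def __init__(self, dim_a, dim_b, n_classes):
            super().__init__()
            self.head_a = nn.Linear(dim_a, n_classes)
            self.head_b = nn.Linear(dim_b, n_classes)

        def forward(self, xa, xb):
            # Averaging decisions cannot exploit inter-modality interactions.
            return 0.5 * (self.head_a(xa) + self.head_b(xb))

The thesis's consolidated framework sits between these extremes, extracting common and unique information before fusing it.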
Guided data augmentation for improved semi-supervised image classification in low data regime.
Deep learning models have achieved state-of-the-art performance, especially in computer vision applications. Much of this recent success can be attributed to the existence of large, high-quality, labeled datasets. However, in many real-world applications, collecting similar datasets is often cumbersome and time consuming. For instance, developing robust automatic target recognition models from infrared images still faces major challenges, mainly due to the difficulty of acquiring high-resolution inputs and sensitivity to the thermal sensors' calibration, meteorological conditions, and variations in targets' scale and viewpoint. Ideally, a good training set should contain enough variation within each class for the model to learn the most optimal decision boundaries. However, when there are under-represented regions in the training feature space, especially in the low data regime or in the presence of low-quality inputs, the model risks learning sub-optimal decision boundaries, resulting in sub-optimal predictions. This dissertation presents novel data augmentation (DA) strategies aimed at improving the performance of machine learning models in low data regimes. The proposed techniques are designed to augment limited labeled datasets, providing the models with additional information to learn from.

The first contribution of this work is Confidence-Guided Generative Augmentation (CGG-DA), a technique that trains a generative model, such as a Variational Autoencoder (VAE) or a Deep Convolutional Generative Adversarial Network (DCGAN), to generate synthetic augmentations. These generative models can produce labeled and/or unlabeled data drawn from the same distribution as the under-performing samples identified by a baseline reference model. By augmenting the training dataset with these synthetic images, CGG-DA aims to bridge the performance gap across different regions of the training feature space.

We also introduce a Tool-Supported Contextual Augmentation (TSC-DA) technique that leverages existing ML models, such as classifiers or object detectors, to label available unlabeled data. Samples with consistent, high-confidence predictions are used as labeled augmentations. Samples with low-confidence predictions, on the other hand, might still contain useful information even though they are more likely to be noisy and inconsistent; hence, we keep them and use them as unlabeled samples during training.

Our third proposed DA technique explores the use of existing ML tools and external image repositories for data augmentation. This approach, called Guided External Data Augmentation (EG-DA), leverages external image repositories to augment the available dataset. External repositories are typically noisy and might include many out-of-distribution (OOD) samples. If included in the training process without proper handling, OOD samples can confuse the model and degrade performance. To tackle this issue, we design and train a VAE-based anomaly detection component and use it to filter out any OOD samples. Since our DA includes both labeled data and a larger set of unlabeled data, we use semi-supervised training to exploit the information contained in the generated augmentations. This can guide the network to learn complex representations and generalize to new data.

The proposed data augmentation techniques are evaluated on two computer vision applications, using multiple scenarios.
We also compare our approach, using benchmark datasets, to baseline models trained on the initial labeled data only and to existing data augmentation techniques. We show that each proposed augmentation consistently improves the results, and we perform an in-depth analysis to justify the observed improvements.
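The TSC-DA step described above follows a standard confidence-based routing pattern. A minimal sketch of that general idea (the threshold tau and the routing are assumptions for illustration, not the dissertation's exact procedure):

    import torch

    def split_by_confidence(model, unlabeled_loader, tau=0.9):
        # Route unlabeled samples: high-confidence predictions become
        # labeled augmentations; the rest are kept as unlabeled samples
        # for semi-supervised training.
        labeled_aug, unlabeled_aug = [], []
        model.eval()
        with torch.no_grad():
            for x in unlabeled_loader:          # x: a batch of images
                probs = torch.softmax(model(x), dim=-1)
                conf, pseudo = probs.max(dim=-1)
                for xi, ci, yi in zip(x, conf, pseudo):
                    if ci >= tau:
                        labeled_aug.append((xi, yi.item()))
                    else:
                        unlabeled_aug.append(xi)
        return labeled_aug, unlabeled_aug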
Domain Generalization -- A Causal Perspective
Machine learning models rely on various assumptions to attain high accuracy.
One of the preliminary assumptions of these models is that the data are
independent and identically distributed (i.i.d.), which suggests that the train and test data are sampled
from the same distribution. However, this assumption seldom holds in the real
world due to distribution shifts. As a result, models that rely on this
assumption exhibit poor generalization capabilities. In recent years,
dedicated efforts have been made to improve the generalization capabilities of
these models, collectively known as \textit{domain generalization methods}.
The primary idea behind these methods is to identify stable features or
mechanisms that remain invariant across the different distributions. Many
generalization approaches employ causal theories to describe invariance since
causality and invariance are inextricably intertwined. However, current surveys
deal with causality-aware domain generalization methods at a very high level.
Furthermore, we argue that the methods can be categorized based on how
causality is leveraged in each method and in which part of the model pipeline
it is used. To this end, we categorize the causal domain
generalization methods into three categories, namely, (i) Invariance via Causal
Data Augmentation methods which are applied during the data pre-processing
stage, (ii) Invariance via Causal representation learning methods that are
utilized during the representation learning stage, and (iii) Invariance via
Transferring Causal mechanisms methods that are applied during the
classification stage of the pipeline. Furthermore, this survey includes
in-depth insights into benchmark datasets and code repositories for domain
generalization methods. We conclude the survey with insights and discussions on
future directions.
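To make category (i) concrete, causal data augmentation treats an augmentation as an intervention on non-causal (style) factors while leaving the label-causing content untouched. A minimal torchvision sketch of that idea (the specific transforms and magnitudes are illustrative assumptions):

    import torchvision.transforms as T

    # Intervene on style (appearance) factors assumed not to cause the label;
    # the label y is left unchanged, so the model is pushed to rely on
    # content features that stay invariant under the intervention.
    style_intervention = T.Compose([
        T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
        T.RandomGrayscale(p=0.2),
    ])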
Domain Generalization by Rejecting Extreme Augmentations
Data augmentation is one of the most effective techniques for regularizing
deep learning models and improving their recognition performance in a variety
of tasks and domains. However, this holds for standard in-domain settings, in
which the training and test data follow the same distribution. For the
out-of-domain case, where the test data follow a different and unknown
distribution, the best recipe for data augmentation is unclear. In this paper,
we show that for out-of-domain and domain generalization settings, data
augmentation can provide a conspicuous and robust improvement in performance.
To do that, we propose a simple training procedure: (i) use uniform sampling on
standard data augmentation transformations; (ii) increase the strength of the
transformations to account for the higher data variance expected when working
out-of-domain; and (iii) devise a new reward function to reject extreme
transformations that can harm the training. With this procedure, our data
augmentation scheme achieves a level of accuracy that is comparable to or
better than state-of-the-art methods on benchmark domain generalization
datasets. Code: \url{https://github.com/Masseeh/DCAug}
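A hedged sketch of the three-step recipe as the abstract states it (the op interface, strength range, and the confidence-based reward are assumptions; see the repository above for the authors' implementation):

    import random
    import torch

    def augment_with_rejection(x, y, ops, model, min_reward=0.1):
        # (i) uniform sampling over standard augmentation transformations;
        # each op is assumed to be a callable taking (image, strength).
        op = random.choice(ops)
        # (ii) apply the transformation at high strength.
        x_aug = op(x, strength=random.uniform(0.5, 1.0))
        # (iii) reward function: here, the model's probability of the true
        # label on the augmented image (an assumed proxy).
        with torch.no_grad():
            probs = torch.softmax(model(x_aug.unsqueeze(0)), dim=-1)
            reward = probs[0, y].item()
        # Reject extreme transformations that would harm training.
        return x_aug if reward >= min_reward else x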
A Probabilistic Approach to Self-Supervised Learning using Cyclical Stochastic Gradient MCMC
In this paper we present a practical Bayesian self-supervised learning method
with Cyclical Stochastic Gradient Hamiltonian Monte Carlo (cSGHMC). Within this
framework, we place a prior over the parameters of a self-supervised learning
model and use cSGHMC to approximate the high dimensional and multimodal
posterior distribution over the embeddings. By exploring an expressive
posterior over the embeddings, Bayesian self-supervised learning produces
interpretable and diverse representations. Marginalizing over these
representations yields a significant gain in performance, calibration and
out-of-distribution detection on a variety of downstream classification tasks.
We provide experimental results on multiple classification tasks on four
challenging datasets. Moreover, we demonstrate the effectiveness of the
proposed method in out-of-distribution detection using the SVHN and CIFAR-10
datasets.
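For context, the "cyclical" part of cSGHMC refers to a cyclical step-size schedule: large steps early in a cycle explore new posterior modes, and small steps late in a cycle sample within a mode. A minimal sketch of the common cosine variant (hyperparameters are illustrative):

    import math

    def cyclical_stepsize(k, total_steps, n_cycles, alpha0=0.01):
        # The step size restarts at alpha0 at the start of each cycle and
        # decays toward 0 by its end, following a cosine curve.
        cycle_len = math.ceil(total_steps / n_cycles)
        pos = k % cycle_len                     # position within the cycle
        return (alpha0 / 2.0) * (math.cos(math.pi * pos / cycle_len) + 1.0)

Samples collected near the end of each cycle can then be marginalized over, which is where the reported gains in calibration and OOD detection come from.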
Efficient Teacher: Semi-Supervised Object Detection for YOLOv5
Semi-Supervised Object Detection (SSOD) has been successful in improving the
performance of both R-CNN series and anchor-free detectors. However, one-stage
anchor-based detectors lack the structure to generate high-quality or flexible
pseudo labels, leading to serious inconsistency problems in SSOD. In this
paper, we propose the Efficient Teacher framework for scalable and effective
one-stage anchor-based SSOD training, consisting of Dense Detector, Pseudo
Label Assigner, and Epoch Adaptor. Dense Detector is a baseline model that
extends RetinaNet with dense sampling techniques inspired by YOLOv5. The
Efficient Teacher framework introduces a novel pseudo label assignment
mechanism, named Pseudo Label Assigner, which makes more refined use of pseudo
labels from Dense Detector. Epoch Adaptor is a method that enables a stable and
efficient end-to-end semi-supervised training schedule for Dense Detector. The
Pseudo Label Assigner prevents the occurrence of bias caused by a large number
of low-quality pseudo labels that may interfere with the Dense Detector during
the student-teacher mutual learning mechanism, and the Epoch Adaptor utilizes
domain and distribution adaptation to allow Dense Detector to learn globally
distributed consistent features, making the training independent of the
proportion of labeled data. Our experiments show that the Efficient Teacher
framework achieves state-of-the-art results on VOC, COCO-standard, and
COCO-additional using fewer FLOPs than previous methods. To the best of our
knowledge, this is the first attempt to apply Semi-Supervised Object Detection
to YOLOv5.
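The Pseudo Label Assigner described above divides teacher outputs by confidence. A generic two-threshold sketch of that idea (the thresholds and tuple layout are assumptions, not the paper's exact values):

    def assign_pseudo_labels(teacher_preds, t_low=0.3, t_high=0.7):
        # teacher_preds: iterable of (bbox, confidence score, class id).
        # Confident detections become reliable pseudo labels; mid-range
        # detections are kept as uncertain, score-weighted targets; the
        # rest are discarded to limit noise in student-teacher learning.
        reliable, uncertain = [], []
        for box, score, cls in teacher_preds:
            if score >= t_high:
                reliable.append((box, cls))
            elif score >= t_low:
                uncertain.append((box, cls, score))
        return reliable, uncertain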
Data Optimization in Deep Learning: A Survey
Large-scale, high-quality data are considered an essential factor for the
successful application of many deep learning techniques. Meanwhile, numerous
real-world deep learning tasks still have to contend with the lack of
sufficient amounts of high-quality data. Additionally, issues such as model
robustness, fairness, and trustworthiness are also closely related to training
data. Consequently, a huge number of studies in the existing literature have
focused on the data aspect in deep learning tasks. Some typical data
optimization techniques include data augmentation, logit perturbation, sample
weighting, and data condensation. These techniques usually come from different
deep learning divisions and their theoretical inspirations or heuristic
motivations may seem unrelated to each other. This study aims to organize a
wide range of existing data optimization methodologies for deep learning from
the previous literature and to construct a comprehensive
taxonomy for them. The constructed taxonomy considers the diversity of split
dimensions, and deep sub-taxonomies are constructed for each dimension. On the
basis of the taxonomy, connections among the extensive data optimization
methods for deep learning are built in terms of four aspects. We also discuss
several promising and interesting future directions. The constructed
taxonomy and the revealed connections should foster a better understanding
of existing methods and guide the design of novel data optimization techniques.
Furthermore, our aspiration for this survey is to promote data optimization as
an independent subdivision of deep learning. A curated, up-to-date list of
resources related to data optimization in deep learning is available at
\url{https://github.com/YaoRujing/Data-Optimization}.
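As a pointer to what one of the surveyed families looks like in practice, sample weighting rescales per-example losses so that selected samples count more or less during training. A minimal PyTorch sketch (the weighting scheme itself, e.g. by difficulty or cleanliness, is left abstract):

    import torch
    import torch.nn.functional as F

    def weighted_cross_entropy(logits, targets, sample_weights):
        # Per-example losses, rescaled by externally supplied weights
        # (e.g. up-weighting rare or clean samples) and renormalized.
        per_example = F.cross_entropy(logits, targets, reduction="none")
        return (sample_weights * per_example).sum() / sample_weights.sum()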
Dual Contrastive Learning for Spatio-temporal Representation
Contrastive learning has shown promising potential in self-supervised
spatio-temporal representation learning. Most works naively sample different
clips to construct positive and negative pairs. However, we observe that this
formulation inclines the model towards a background scene bias. The
underlying reasons are twofold. First, the scene difference is usually more
noticeable and easier to discriminate than the motion difference. Second, the
clips sampled from the same video often share similar backgrounds but have
distinct motions. Simply regarding them as positive pairs will draw the model
to the static background rather than the motion pattern. To tackle this
challenge, this paper presents a novel dual contrastive formulation.
Concretely, we decouple the input RGB video sequence into two complementary
modes, static scene and dynamic motion. Then, the original RGB features are
pulled closer to the static features and the aligned dynamic features,
respectively. In this way, the static scene and the dynamic motion are
simultaneously encoded into the compact RGB representation. We further conduct
the feature space decoupling via activation maps to distill static- and
dynamic-related features. We term our method as \textbf{D}ual
\textbf{C}ontrastive \textbf{L}earning for spatio-temporal
\textbf{R}epresentation (DCLR). Extensive experiments demonstrate that DCLR
learns effective spatio-temporal representations and obtains state-of-the-art
or comparable performance on UCF-101, HMDB-51, and Diving-48 datasets.
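The dual formulation as described pulls the RGB feature toward both its static-scene and dynamic-motion counterparts. A minimal InfoNCE-style sketch of that structure (the feature extractors, negative bank, and temperature are assumptions, not DCLR's exact design):

    import torch
    import torch.nn.functional as F

    def info_nce(anchor, positive, negatives, tau=0.07):
        # Standard InfoNCE: one positive per anchor vs. a bank of negatives.
        a = F.normalize(anchor, dim=-1)
        pos = (a * F.normalize(positive, dim=-1)).sum(-1, keepdim=True)
        neg = a @ F.normalize(negatives, dim=-1).t()
        logits = torch.cat([pos, neg], dim=-1) / tau
        labels = torch.zeros(a.size(0), dtype=torch.long, device=a.device)
        return F.cross_entropy(logits, labels)

    def dual_contrastive_loss(f_rgb, f_static, f_dynamic, neg_bank):
        # Pull the RGB representation toward the static scene features and
        # the dynamic motion features simultaneously.
        return (info_nce(f_rgb, f_static, neg_bank)
                + info_nce(f_rgb, f_dynamic, neg_bank))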