4 research outputs found
Learning reliable representations when proxy objectives fail
Representation learning involves using an objective to learn a mapping from data space to a representation space. When the downstream task for which a mapping must be learned is unknown, or is too costly to cast as an objective, we must rely on proxy objectives for learning. In this thesis I focus on representation learning for images, and address three cases where proxy objectives fail to produce a mapping that performs well on the downstream tasks.
When learning neural network mappings from image space to a discrete hash space for fast content-based image retrieval, a proxy objective is needed that captures the requirement for relevant responses to be nearer to the hash of any query than irrelevant ones. At the same time, it is important to ensure an even distribution of image hashes across the whole hash space for efficient information use and high discrimination. Proxy objectives fail when they do not meet these requirements. I propose composing hash codes in two parts. First, a standard classifier is used to predict class labels that are converted to a binary representation for state-of-the-art performance on the image retrieval task. Second, a binary deep decision tree layer (DDTL) is used to model further intra-class differences and produce approximately evenly distributed hash codes. The DDTL requires no discretisation during learning and produces hash codes that enable better discrimination between data in the same class than previous methods, while remaining robust to real-world augmentations in the data space.
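A minimal sketch of the two-part composition described above, not the thesis implementation. It assumes the class label is encoded as a fixed-width binary number and that the intra-class bits come from thresholding per-level routing probabilities at 0.5. The functions `class_bits`, `tree_bits`, `compose_hash` and `hamming` are hypothetical names, and `tree_bits` is only a stand-in for the DDTL, whose internals are not given here.

```python
def class_bits(label: int, n_bits: int) -> list[int]:
    """Encode a class index as a fixed-width binary code, most significant bit first."""
    if not 0 <= label < 2 ** n_bits:
        raise ValueError("label does not fit in n_bits")
    return [(label >> shift) & 1 for shift in range(n_bits - 1, -1, -1)]


def tree_bits(routing_probs: list[float]) -> list[int]:
    """Hypothetical stand-in for the DDTL output.

    Each entry is assumed to be the probability of taking the right branch
    at one tree level; the bit is 1 when that probability exceeds 0.5.
    """
    return [1 if p > 0.5 else 0 for p in routing_probs]


def compose_hash(label: int, n_class_bits: int, routing_probs: list[float]) -> list[int]:
    """Concatenate the class part and the intra-class part into one hash code."""
    return class_bits(label, n_class_bits) + tree_bits(routing_probs)


def hamming(a: list[int], b: list[int]) -> int:
    """Hamming distance between two equal-length binary codes."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Two images of the same class share the class part and differ only in the tree part.
query = compose_hash(5, 4, [0.9, 0.2, 0.7])
same_class = compose_hash(5, 4, [0.1, 0.2, 0.7])
other_class = compose_hash(10, 4, [0.9, 0.2, 0.7])
```

Under these assumptions, codes from the same class can differ only in the trailing tree bits, so the class part dominates retrieval ranking while the tree part separates items within a class.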
In the scenario where we require a neural network to partition the data into clusters that correspond well with ground-truth labels, a proxy objective is needed to define how these clusters are formed. One such proxy objective involves maximising the mutual information between cluster assignments made by a neural network from multiple views. In this context, views are different augmentations of the same image and the cluster assignments are the representations computed by a neural network. I demonstrate that this proxy objective produces parameters for the neural network that are sub-optimal in that a better set of parameters can be found using the same objective and a different training method. I introduce deep hierarchical object grouping (DHOG) as a method to learn a hierarchy (in the sense of easy-to-hard orderings, not structure) of solutions to the proxy objective and show how this improves performance on the downstream task.
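A minimal sketch of the kind of proxy objective described above: the mutual information between soft cluster assignments computed from two views of the same images. It assumes each assignment is a probability vector over clusters and that the joint distribution is estimated by averaging outer products over the batch and then symmetrising; both choices are assumptions for illustration, and `mutual_information` is a hypothetical name. This is not the DHOG training procedure.

```python
import math


def mutual_information(view_a, view_b, eps=1e-12):
    """Mutual information (in nats) between cluster assignments from two views.

    view_a[n] and view_b[n] are probability vectors over k clusters for the
    n-th image under two different augmentations.
    """
    n = len(view_a)
    k = len(view_a[0])

    # Estimate the joint distribution over (cluster in view A, cluster in view B).
    joint = [[0.0] * k for _ in range(k)]
    for pa, pb in zip(view_a, view_b):
        for i in range(k):
            for j in range(k):
                joint[i][j] += pa[i] * pb[j] / n

    # Symmetrise, since the order of the two views is arbitrary (assumption).
    joint = [[0.5 * (joint[i][j] + joint[j][i]) for j in range(k)] for i in range(k)]

    row = [sum(joint[i]) for i in range(k)]
    col = [sum(joint[i][j] for i in range(k)) for j in range(k)]

    mi = 0.0
    for i in range(k):
        for j in range(k):
            p = joint[i][j]
            if p > eps:  # skip empty cells to avoid log(0)
                mi += p * math.log(p / (row[i] * col[j]))
    return mi


# Consistent, balanced assignments give the maximum for two clusters, log(2).
consistent = mutual_information([[1, 0], [0, 1]], [[1, 0], [0, 1]])
# Assigning every image to one cluster gives zero mutual information.
collapsed = mutual_information([[1, 0], [1, 0]], [[1, 0], [1, 0]])
```

The objective rewards assignments that agree across views and are spread over clusters, which is why a collapsed solution scores zero.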
When the training data contain features from which class predictions are easy to compute (e.g., background colour) alongside features from which they are relatively harder to compute (e.g., digit type), standard classification objectives (e.g., cross-entropy) fail to produce robust classifiers. The problem is that a model that learns to rely on 'easy' features will also ignore 'complex' features (easy and complex are purely relative here). I introduce latent adversarial debiasing (LAD) to decouple easy features from the class labels. LAD first models the underlying structure of the training data as a latent representation using a vector-quantised variational autoencoder. It then uses a gradient-based procedure to adjust the features in this representation so as to confuse a constrained classifier trained to predict class labels from the same representation. The adjusted representations are then decoded to produce an augmented training dataset that can be used for training in a standard manner.
I show in the aforementioned scenarios that proxy objectives can fail and demonstrate that alternative approaches can mitigate the associated failures. I suggest an analytic approach to understanding the limits of proxy objectives for every use case, so that the data or the objectives can be adjusted to ensure good performance on downstream tasks.
Learning Debiased Classifier with Biased Committee
Neural networks are prone to be biased towards spurious correlations between
classes and latent attributes exhibited in a major portion of training data,
which ruins their generalization capability. We propose a new method for
training debiased classifiers with no spurious attribute label. The key idea is
to employ a committee of classifiers as an auxiliary module that identifies
bias-conflicting data, i.e., data without spurious correlation, and assigns
large weights to them when training the main classifier. The committee is
learned as a bootstrapped ensemble so that a majority of its classifiers are
biased as well as being diverse, and intentionally fail to predict classes of
bias-conflicting data accordingly. The consensus within the committee on
prediction difficulty thus provides a reliable cue for identifying and
weighting bias-conflicting data. Moreover, the committee is also trained with
knowledge transferred from the main classifier so that it gradually becomes
debiased along with the main classifier and emphasizes more difficult data as
training progresses. On five real-world datasets, our method outperforms prior
methods that, like ours, use no spurious attribute labels, and occasionally even
surpasses those relying on bias labels.
Comment: Conference on Neural Information Processing Systems (NeurIPS), New
Orleans, 202
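A minimal sketch of the reweighting idea described in this abstract: samples that most committee members misclassify are treated as likely bias-conflicting and receive larger weights. The inverse-agreement formula, the smoothing constant `alpha`, and the name `committee_weights` are assumptions for illustration, not the authors' exact definition.

```python
def committee_weights(committee_preds, labels, alpha=0.02):
    """Per-sample weights from committee consensus.

    committee_preds[c][n] is the class predicted by committee member c for
    sample n; labels[n] is the ground-truth class. The weight grows as fewer
    members predict the sample correctly (assumed form: 1 / (agreement + alpha)).
    """
    m = len(committee_preds)
    weights = []
    for idx, y in enumerate(labels):
        correct = sum(1 for preds in committee_preds if preds[idx] == y)
        weights.append(1.0 / (correct / m + alpha))
    return weights


# Three committee members, three samples with true classes 0, 1, 1.
preds = [
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]
w = committee_weights(preds, labels=[0, 1, 1])
# Sample 0 is classified correctly by all members, sample 2 by only one,
# so the weights increase from sample 0 to sample 2.
```

These weights would then scale each sample's loss when training the main classifier.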
Self-supervised and semi-supervised learning for road condition estimation from distributed road-side cameras
Monitoring road conditions, e.g., water build-up due to intense rainfall, plays a fundamental role in ensuring road safety while increasing resilience to the effects of climate change. Distributed cameras provide an easy and affordable alternative to instrumented weather stations, enabling diffused and capillary road monitoring. Here, we propose a deep learning-based solution to automatically detect wet road events in continuous video streams acquired by road-side surveillance cameras. Our contribution is two-fold: first, we employ a convolutional Long Short-Term Memory model (convLSTM) to detect subtle changes in the road appearance, introducing a novel temporally consistent data augmentation to increase robustness to outdoor illumination conditions. Second, we present a contrastive self-supervised framework that is uniquely tailored to surveillance camera networks. The proposed technique was validated on a large-scale dataset comprising roughly 2000 full-day sequences (roughly 400K video frames, of which 300K are unlabelled), acquired from several road-side cameras over a span of two years. Experimental results show the effectiveness of self-supervised and semi-supervised learning, increasing the frame classification performance (measured by the area under the ROC curve) from 0.86 to 0.92. From the standpoint of event detection, we show that incorporating temporal features through a convLSTM model both improves the detection rate of wet road events (+10%) and reduces false positive alarms (–45%). The proposed techniques could also benefit other tasks related to weather analysis from road-side and vehicle-mounted cameras.
Group Robust Classification Without Any Group Information
Empirical risk minimization (ERM) is sensitive to spurious correlations in
the training data, which poses a significant risk when deploying systems
trained under this paradigm in high-stake applications. While the existing
literature focuses on maximizing group-balanced or worst-group accuracy,
estimating these accuracies is hindered by costly bias annotations. This study
contends that current bias-unsupervised approaches to group robustness continue
to rely on group information to achieve optimal performance. Firstly, these
methods implicitly assume that all group combinations are represented during
training. To illustrate this, we introduce a systematic generalization task on
the MPI3D dataset and discover that current algorithms fail to improve the ERM
baseline when combinations of observed attribute values are missing. Secondly,
bias labels are still crucial for effective model selection, restricting the
practicality of these methods in real-world scenarios. To address these
limitations, we propose a revised methodology for training and validating
debiased models in an entirely bias-unsupervised manner. We achieve this by
employing pretrained self-supervised models to reliably extract bias
information, which enables the integration of a logit adjustment training loss
with our validation criterion. Our empirical analysis on synthetic and
real-world tasks provides evidence that our approach overcomes the identified
challenges and consistently enhances robust accuracy, attaining performance
that is competitive with or better than that of state-of-the-art methods,
which, conversely, rely on bias labels for validation.
Comment: Accepted at the 37th Conference on Neural Information Processing
Systems (NeurIPS 2023). Code is available at https://github.com/tsirif/uL
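A minimal sketch of a logit adjustment loss of the kind this abstract mentions: the log of a class prior is added to the logits before the cross-entropy is computed, so classes that are rare under the prior incur a larger loss when missed. How the prior is obtained (for example, from class frequencies within a bias pseudo-group extracted by a pretrained model) is an assumption here, and the function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def logit_adjusted_loss(logits, label, class_prior):
    """Cross-entropy on logits shifted by the log class prior.

    class_prior[c] must be strictly positive; it is assumed to be the
    estimated probability of class c for this sample's (pseudo-)bias group.
    """
    adjusted = [z + math.log(p) for z, p in zip(logits, class_prior)]
    return -log_softmax(adjusted)[label]


logits = [2.0, 0.0]
# With a uniform prior the adjustment cancels and the loss is plain cross-entropy.
plain = logit_adjusted_loss(logits, label=1, class_prior=[0.5, 0.5])
# With a skewed prior, the rare class (index 1) is penalised more when missed.
skewed = logit_adjusted_loss(logits, label=1, class_prior=[0.9, 0.1])
```

At test time the unadjusted logits would be used for prediction, so the prior shift only affects training.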