Infinite Class Mixup
Mixup is a widely adopted strategy for training deep networks, where
additional samples are augmented by interpolating inputs and labels of training
pairs. Mixup has been shown to improve classification performance, network
calibration, and out-of-distribution generalisation. While effective, a
cornerstone of Mixup, namely that networks learn linear behaviour patterns
between classes, is only indirectly enforced since the output interpolation is
performed at the probability level. This paper seeks to address this limitation
by mixing the classifiers directly instead of mixing the labels for each mixed
pair. We propose to define the target of each augmented sample as a uniquely
new classifier, whose parameters are a linear interpolation of the classifier
vectors of the input pair. The space of all possible classifiers is continuous
and spans all interpolations between classifier pairs. To make optimisation
tractable, we propose a dual-contrastive Infinite Class Mixup loss, where we
contrast the classifier of a mixed pair to both the classifiers and the
predicted outputs of other mixed pairs in a batch. Infinite Class Mixup is
generic in nature and applies to many variants of Mixup. Empirically, we show
that it outperforms standard Mixup and variants such as RegMixup and Remix on
balanced, long-tailed, and data-constrained benchmarks, highlighting its broad
applicability.
Comment: BMVC 202
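The core idea can be sketched in a few lines: instead of interpolating the one-hot labels of a mixed pair, interpolate the classifier weight vectors of the two source classes to obtain a uniquely new target classifier. This is an illustrative sketch under assumed shapes, not the paper's implementation; `mixup_classifier` and all variable names are hypothetical.

```python
import numpy as np

def mixup_classifier(W, y_a, y_b, lam):
    """Interpolate the classifier vectors of classes y_a and y_b.

    W   : (num_classes, feat_dim) matrix of per-class classifier vectors
    lam : mixing coefficient in [0, 1]
    Returns the new classifier vector targeted by the mixed sample.
    """
    return lam * W[y_a] + (1.0 - lam) * W[y_b]

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 4))            # 10 classes, 4-dim features
w_mix = mixup_classifier(W, 2, 7, 0.3)  # mix classes 2 and 7

# The mixed classifier lies on the line segment between the two class vectors,
# so the space of all such classifiers is continuous, as the abstract notes.
assert np.allclose(w_mix, 0.3 * W[2] + 0.7 * W[7])
```

Because every value of `lam` yields a distinct target classifier, the label space becomes continuous, which is what motivates the paper's contrastive formulation for tractable optimisation.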
Incremental Cluster Validity Index-Guided Online Learning for Performance and Robustness to Presentation Order
In streaming data applications, incoming samples are processed and discarded, so intelligent decision-making is crucial for the performance of lifelong learning systems. In addition, the order in which samples arrive may heavily affect the performance of incremental learners. The recently introduced incremental cluster validity indices (iCVIs) provide valuable aid in addressing this class of problems. Their primary use case has been cluster quality monitoring; nonetheless, they have recently been integrated into a streaming clustering method. In this context, the work presented here introduces the first adaptive resonance theory (ART)-based model that uses iCVIs for unsupervised and semi-supervised online learning. Moreover, it shows how to use iCVIs to regulate ART vigilance via an iCVI-based match tracking mechanism. The model achieves improved accuracy and robustness to ordering effects by integrating an online iCVI module as module B of a topological ART predictive mapping (TopoARTMAP)—thereby being named iCVI-TopoARTMAP—and using iCVI-driven postprocessing heuristics at the end of each learning step. The online iCVI module provides assignments of input samples to clusters at each iteration in accordance with any of several iCVIs. iCVI-TopoARTMAP maintains useful properties shared by the ART predictive mapping (ARTMAP) models, such as stability, immunity to catastrophic forgetting, and many-to-one mapping capability via the map field module. The performance and robustness to presentation order of iCVI-TopoARTMAP were evaluated via experiments with synthetic and real-world datasets.
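The defining property of an iCVI is that each new sample updates the index's underlying statistics without revisiting past data. A minimal sketch of that building block, assuming a Welford-style running mean and within-cluster sum of squares per cluster (the class and names are illustrative, not the paper's model):

```python
import numpy as np

class IncrementalClusterStats:
    """Per-cluster running mean and within-cluster sum of squared deviations.

    Each sample updates its cluster's statistics in O(d) time and is then
    discarded, which is the access pattern iCVIs rely on in streaming data.
    """

    def __init__(self):
        self.n = {}     # cluster label -> sample count
        self.mean = {}  # cluster label -> running mean vector
        self.ssq = {}   # cluster label -> sum of squared deviations

    def update(self, x, label):
        x = np.asarray(x, dtype=float)
        if label not in self.n:
            self.n[label] = 0
            self.mean[label] = np.zeros_like(x)
            self.ssq[label] = 0.0
        self.n[label] += 1
        delta = x - self.mean[label]                       # deviation from old mean
        self.mean[label] += delta / self.n[label]          # Welford mean update
        self.ssq[label] += float(delta @ (x - self.mean[label]))

stats = IncrementalClusterStats()
for point in [[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]]:
    stats.update(point, label=0)   # stream points one at a time, then discard
```

Compactness-style indices can then be read off these running statistics at any iteration, which is what allows the online iCVI module to score cluster assignments sample by sample.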
Efficacy of Neural Prediction-Based NAS for Zero-Shot NAS Paradigm
In prediction-based Neural Architecture Search (NAS), performance indicators
derived from graph convolutional networks have shown significant success. These
indicators, achieved by representing feed-forward structures as component
graphs through one-hot encoding, face a limitation: their inability to evaluate
architecture performance across varying search spaces. In contrast, handcrafted
performance indicators (zero-shot NAS), which use the same architecture with
random initialization, can generalize across multiple search spaces. Addressing
this limitation, we propose a novel approach for zero-shot NAS using deep
learning. Our method employs Fourier sum of sines encoding for convolutional
kernels, enabling the construction of a computational feed-forward graph with a
structure similar to the architecture under evaluation. These encodings are
learnable and offer a comprehensive view of the architecture's topological
information. An accompanying multi-layer perceptron (MLP) then ranks these
architectures based on their encodings. Experimental results show that our
approach surpasses previous methods using graph convolutional networks in terms
of correlation on the NAS-Bench-201 dataset and exhibits a higher convergence
rate. Moreover, our extracted feature representation trained on each
NAS-Benchmark is transferable to other NAS-Benchmarks, showing promising
generalizability across multiple search spaces. The code is available at:
https://github.com/minh1409/DFT-NPZS-NAS
Comment: 12 pages, 6 figures
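The appeal of a sum-of-sines encoding is that it maps a kernel of any shape to a fixed-length feature vector, which is what lets one predictor span search spaces with differing kernel sizes. A hedged sketch of that property (the function, frequencies, and phases below are illustrative assumptions; the paper's exact encoding may differ):

```python
import numpy as np

def sum_of_sines_encoding(kernel, freqs, phases):
    """Encode a flattened conv kernel as a fixed-length sum-of-sines feature.

    Component k is sum_i sin(freqs[k] * w_i + phases[k]) over the kernel
    weights w_i, so the output length equals len(freqs) regardless of the
    kernel's shape. In a learnable variant, freqs and phases are trained.
    """
    w = np.asarray(kernel, dtype=float).ravel()            # (num_weights,)
    # (num_freqs, num_weights) grid of sines, summed over the weight axis
    return np.sin(np.outer(freqs, w) + phases[:, None]).sum(axis=1)

rng = np.random.default_rng(0)
k3 = rng.normal(size=(3, 3))   # 3x3 kernel
k5 = rng.normal(size=(5, 5))   # 5x5 kernel
freqs = np.array([0.5, 1.0, 2.0])
phases = np.array([0.0, 0.3, 0.7])

e3 = sum_of_sines_encoding(k3, freqs, phases)  # length-3 embedding
e5 = sum_of_sines_encoding(k5, freqs, phases)  # same length despite larger kernel
```

Because both embeddings share one fixed dimensionality, a single MLP ranker can consume architectures drawn from heterogeneous search spaces.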