Topic Identification for Speech without ASR
Modern topic identification (topic ID) systems for speech use automatic
speech recognition (ASR) to produce speech transcripts, and perform supervised
classification on such ASR outputs. However, under resource-limited conditions,
the manually transcribed speech required to develop standard ASR systems can be
severely limited or unavailable. In this paper, we investigate alternative
unsupervised solutions to obtaining tokenizations of speech in terms of a
vocabulary of automatically discovered word-like or phoneme-like units, without
depending on the supervised training of ASR systems. Moreover, using automatic
phoneme-like tokenizations, we demonstrate that a convolutional neural network
based framework for learning spoken document representations provides
competitive performance compared to a standard bag-of-words representation, as
evidenced by comprehensive topic ID evaluations on both single-label and
multi-label classification tasks.
Comment: 5 pages, 2 figures; accepted for publication at Interspeech 201
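The baseline the abstract compares against, a bag-of-words representation over automatically discovered units, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation; the unit IDs, documents, and vocabulary size are toy values.

```python
from collections import Counter

import numpy as np

def bag_of_units(doc_units, vocab_size):
    """Count occurrences of each discovered unit ID in one spoken document."""
    vec = np.zeros(vocab_size)
    for unit, count in Counter(doc_units).items():
        vec[unit] = count
    return vec

# Toy tokenized "spoken documents": sequences of discovered unit IDs
# produced by an unsupervised tokenizer instead of a trained ASR system.
docs = [[0, 1, 1, 2], [2, 3, 3, 3], [0, 0, 1, 2]]
X = np.stack([bag_of_units(d, vocab_size=4) for d in docs])

# A supervised topic classifier (e.g. a linear model) would then be
# trained on X together with topic labels for each document.
```

A CNN-based representation, as the paper proposes, would instead operate on the unit sequences directly rather than on these order-free counts.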
Bi-directional representation learning for multi-label classification
Multi-label classification is a central problem in many application domains. In this paper, we present a novel supervised bi-directional model that learns a low-dimensional mid-level representation for multi-label classification. Unlike traditional multi-label learning methods, which identify intermediate representations from either the input space or the output space but not both, the mid-level representation in our model has two complementary parts that capture intrinsic information of the input data and the output labels respectively under the autoencoder principle, while augmenting each other for the target output label prediction. The resulting optimization problem can be solved efficiently using an iterative procedure with alternating steps, while closed-form solutions exist for one major step. Our experiments conducted on a variety of multi-label data sets demonstrate the efficacy of the proposed bi-directional representation learning model for multi-label classification.
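The two-part mid-level representation described above can be sketched minimally with linear encoders. Everything here (dimensions, random weights, the linear maps themselves) is a hypothetical stand-in for the paper's learned model; it only shows how input-derived and label-derived parts combine into one code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multi-label data: n samples, d input features, k binary labels.
n, d, k, m = 8, 5, 3, 2          # m = size of each mid-level part
X = rng.normal(size=(n, d))
Y = (rng.random(size=(n, k)) > 0.5).astype(float)

# Hypothetical linear encoders: one part of the mid-level code comes
# from the inputs, the other from the labels (autoencoder principle).
Wx = rng.normal(size=(d, m))
Wy = rng.normal(size=(k, m))
Z = np.hstack([X @ Wx, Y @ Wy])  # two complementary parts, concatenated

# A decoder from Z back to the label space would close the loop; the
# paper alternates updates over such weights, with one step in closed form.
```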
Weakly Supervised Semantic Segmentation by Knowledge Graph Inference
Currently, existing efforts in Weakly Supervised Semantic Segmentation (WSSS)
based on Convolutional Neural Networks (CNNs) have predominantly focused on
enhancing the multi-label classification network stage, with limited attention
given to the equally important downstream segmentation network. Furthermore,
CNN-based local convolutions lack the ability to model the extensive
inter-category dependencies. Therefore, this paper introduces a graph
reasoning-based approach to enhance WSSS. The aim is to improve WSSS
holistically by simultaneously enhancing both the multi-label classification
and segmentation network stages. In the multi-label classification network
segment, external knowledge is integrated, coupled with GCNs, to globally
reason about inter-class dependencies. This encourages the network to uncover
features in non-salient regions of images, thereby refining the completeness of
generated pseudo-labels. In the segmentation network segment, the proposed
Graph Reasoning Mapping (GRM) module is employed to leverage knowledge obtained
from textual databases, facilitating contextual reasoning for class
representation within image regions. This GRM module enhances feature
representation in high-level semantics of the segmentation network's local
convolutions, while dynamically learning semantic coherence for individual
samples. Using solely image-level supervision, we have achieved
state-of-the-art performance in WSSS on the PASCAL VOC 2012 and MS-COCO
datasets. Extensive experimentation on both the multi-label classification and
segmentation network stages underscores the effectiveness of the proposed graph
reasoning approach for advancing WSSS.
Semi-Supervised Learning with Scarce Annotations
While semi-supervised learning (SSL) algorithms provide an efficient way to
make use of both labelled and unlabelled data, they generally struggle when the
number of annotated samples is very small. In this work, we consider the
problem of SSL multi-class classification with very few labelled instances. We
introduce two key ideas. The first is a simple but effective one: we leverage
the power of transfer learning among different tasks and self-supervision to
initialize a good representation of the data without making use of any label.
The second idea is a new algorithm for SSL that can exploit well such a
pre-trained representation.
The algorithm works by alternating two phases, one fitting the labelled
points and one fitting the unlabelled ones, with carefully-controlled
information flow between them. The benefits are greatly reduced overfitting of
the labelled data and the avoidance of issues with balancing labelled and unlabelled
losses during training. We show empirically that this method can successfully
train competitive models with as few as 10 labelled data points per class. More
generally, we show that the idea of bootstrapping features using
self-supervised learning always improves SSL on standard benchmarks. We show
that our algorithm works increasingly well compared to other methods when
refining from other tasks or datasets.
Comment: Workshop on Deep Vision, CVPR 202
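The alternation between a labelled-fitting phase and an unlabelled-fitting phase can be illustrated with a nearest-class-mean classifier refined by pseudo-labels. This is a simplified sketch under assumed toy embeddings, not the paper's algorithm or its information-flow controls.

```python
import numpy as np

def alternate_ssl(Z_lab, y_lab, Z_unlab, n_classes, n_rounds=3):
    """Alternate two phases: pseudo-label the unlabelled points against
    current class means, then refit the means on labelled + pseudo data."""
    means = np.stack([Z_lab[y_lab == c].mean(axis=0) for c in range(n_classes)])
    Z_all = np.vstack([Z_lab, Z_unlab])
    for _ in range(n_rounds):
        # Phase fitting the unlabelled points: assign pseudo-labels.
        pseudo = np.argmin(((Z_unlab[:, None] - means) ** 2).sum(-1), axis=1)
        # Phase fitting the labelled points: refit on all points.
        y_all = np.concatenate([y_lab, pseudo])
        means = np.stack([Z_all[y_all == c].mean(axis=0) for c in range(n_classes)])
    return means

# Toy embeddings standing in for a self-supervised, pre-trained representation.
Z_lab = np.array([[0.0, 0.0], [10.0, 10.0]])
y_lab = np.array([0, 1])
Z_unlab = np.array([[1.0, 1.0], [9.0, 9.0]])
means = alternate_ssl(Z_lab, y_lab, Z_unlab, n_classes=2)
```

Starting from a good pre-trained representation matters precisely because the pseudo-labelling phase is only as reliable as the geometry of the embedding space.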
Radio Galaxy Zoo: Towards building the first multi-purpose foundation model for radio astronomy with self-supervised learning
In this work, we apply self-supervised learning with instance differentiation
to learn a robust, multi-purpose representation for image analysis of resolved
extragalactic continuum images. We train a multi-use model which compresses our
unlabelled data into a structured, low dimensional representation which can be
used for a variety of downstream tasks (e.g. classification, similarity
search). We exceed baseline supervised Fanaroff-Riley classification
performance by a statistically significant margin, with our model reducing the
test set error by up to half. Our model is also able to maintain high
classification accuracy with very few labels, achieving only 7.79% error when
using 145 labels. We further demonstrate that by using our foundation model,
users can efficiently trade off compute, human labelling cost and test set
accuracy according to their respective budgets, allowing for efficient
classification in a wide variety of scenarios. We highlight the
generalizability of our model by showing that it enables accurate
classification in a label scarce regime with data from the new MIGHTEE survey
without any hyper-parameter tuning, where it improves upon the baseline by ~8%.
Visualizations of our labelled and unlabelled data show that our model's
representation space is structured with respect to physical properties of the
sources, such as angular source extent. We show that the learned representation
is scientifically useful even if no labels are available by performing a
similarity search, finding hybrid sources in the RGZ DR1 data-set without any
labels. We show that good augmentation design and hyper-parameter choice can
help achieve peak performance, while emphasising that optimal hyper-parameters
are not required to obtain benefits from self-supervised pre-training.
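Self-supervised learning with instance differentiation is commonly instantiated as a contrastive objective of the InfoNCE form, sketched below in NumPy. This is an illustrative loss under assumed toy embeddings, not necessarily the exact objective or architecture used in the paper.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss for one batch: z1[i] and z2[i] are embeddings of two
    augmented views of the same image; all other pairs act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature             # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # positives lie on the diagonal

# Toy check: identical, well-separated views give a near-zero loss.
z = np.eye(4)
loss = info_nce(z, z)
```

Minimizing such a loss is what produces a representation space structured by image content, which is why downstream classification and similarity search work with few or no labels.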
A generic self-supervised learning (SSL) framework for representation learning from spectra-spatial feature of unlabeled remote sensing imagery
Remote sensing data has been widely used for various Earth Observation (EO)
missions such as land use and cover classification, weather forecasting,
agricultural management, and environmental monitoring. Most existing remote
sensing data-based models are based on supervised learning that requires large
and representative human-labelled data for model training, which is costly and
time-consuming. Recently, self-supervised learning (SSL) enables the models to
learn a representation from orders of magnitude more unlabelled data. This
representation has been proven to boost the performance of downstream tasks and
has potential for remote sensing applications. The success of SSL is heavily
dependent on a pre-designed pretext task, which introduces an inductive bias
into the model from a large amount of unlabelled data. Since remote sensing
imagery has rich spectral information beyond the standard RGB colour space, the
pretext tasks established in computer vision based on RGB images may not be
straightforward to be extended to the multi/hyperspectral domain. To address
this challenge, this work designs a novel SSL framework that is capable of
learning representations from both the spectral and spatial information of unlabelled
data. The framework contains two novel pretext tasks for object-based and
pixel-based remote sensing data analysis methods, respectively. Through the
evaluation of two typical downstream tasks (a multi-label land cover
classification task on Sentinel-2 multispectral datasets and a ground soil
parameter retrieval task on hyperspectral datasets), the results demonstrate that the
representation obtained through the proposed SSL achieved a significant
improvement in model performance.
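One plausible spectra-aware pretext task (illustrative only; the abstract does not specify the paper's two tasks) is to predict a held-out spectral band from the remaining bands, which requires no human labels. Here a least-squares fit stands in for the network that would be trained in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multispectral "pixels": n samples, b spectral bands (beyond RGB).
n, b = 100, 6
X = rng.normal(size=(n, b))

# Pretext task: hold one band out and regress it from the other bands.
held = 2
others = np.delete(X, held, axis=1)          # shape (n, b - 1)
w, *_ = np.linalg.lstsq(others, X[:, held], rcond=None)
pred = others @ w

# The supervision signal comes entirely from the data itself; the learned
# mapping (here w) is the kind of representation SSL aims to transfer.
```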
A review of multi-instance learning assumptions
Multi-instance (MI) learning is a variant of inductive machine learning, where each learning example contains a bag of instances instead of a single feature vector. The term commonly refers to the supervised setting, where each bag is associated with a label. This type of representation is a natural fit for a number of real-world learning scenarios, including drug activity prediction and image classification, and hence many MI learning algorithms have been proposed. Any MI learning method must relate instances to bag-level class labels, but many types of relationships between instances and class labels are possible. Although all early work in MI learning assumes a specific MI concept class known to be appropriate for the drug activity prediction domain, this ‘standard MI assumption’ is not guaranteed to hold in other domains. Much of the recent work in MI learning has concentrated on a relaxed view of the MI problem, where the standard MI assumption is dropped and alternative assumptions are considered instead. However, it is often not clearly stated what particular assumption is used and how it relates to other assumptions that have been proposed. In this paper, we aim to clarify the use of alternative MI assumptions by reviewing the work done in this area.
Extreme multi-label learning with Gaussian processes
In modern probabilistic machine learning, Gaussian process models have provided both powerful and principled ways to approach a series of challenging problems. Nonetheless, their applicability can be significantly limited in cases where the number of training data points is large, something very typical in many modern machine learning applications. An additional restriction can be imposed when the posterior distribution is intractable due to the non-Gaussian likelihoods used. Despite the fact that these two limitations have been efficiently addressed over the last decade, applications of Gaussian process models under extreme regimes, where the number of training data points and the dimensionality of both the input and output spaces are extremely large, have not appeared in the literature so far. This thesis is focused on this kind of application of Gaussian processes, where supervised tasks such as multi-class and multi-label classification are considered. We start by discussing the main mathematical tools required in order to successfully cope with the large scale of the datasets. Those include a variational inference framework suitably tailored for Gaussian processes. Furthermore, in our attempt to alleviate the computational burden, we introduce a new parametrization for the variational distribution, while a representation trick for reducing storage requirements for large input dimensions is also discussed. A methodology is then presented which is based on this variational inference framework and a computationally efficient bound on the softmax function that allows the use of Gaussian processes for multi-class classification problems involving an arbitrarily large number of classes. A series of experiments tests and compares the performance of this methodology with other methods.
Finally, we move to the more general multi-label classification task and develop a method, also relying on the same variational inference framework, which can deal with datasets involving hundreds of thousands of data points, input dimensions and labels. The effectiveness of our method is supported by experiments on several real-world multi-label datasets.