
    Topic Identification for Speech without ASR

    Modern topic identification (topic ID) systems for speech use automatic speech recognition (ASR) to produce speech transcripts, and perform supervised classification on such ASR outputs. However, under resource-limited conditions, the manually transcribed speech required to develop standard ASR systems can be severely limited or unavailable. In this paper, we investigate alternative unsupervised solutions to obtaining tokenizations of speech in terms of a vocabulary of automatically discovered word-like or phoneme-like units, without depending on the supervised training of ASR systems. Moreover, using automatic phoneme-like tokenizations, we demonstrate that a convolutional neural network based framework for learning spoken document representations provides competitive performance compared to a standard bag-of-words representation, as evidenced by comprehensive topic ID evaluations on both single-label and multi-label classification tasks. Comment: 5 pages, 2 figures; accepted for publication at Interspeech 201
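The bag-of-words baseline mentioned above can be sketched in a few lines: given a spoken document tokenized into automatically discovered units, count unit occurrences over a fixed vocabulary and normalize. The unit IDs and vocabulary here are hypothetical stand-ins, not the paper's actual discovered units.

```python
from collections import Counter

def bag_of_words(token_seq, vocab):
    """Map a sequence of discovered word/phoneme-like units to a
    normalized bag-of-words vector over a fixed unit vocabulary."""
    counts = Counter(t for t in token_seq if t in vocab)
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[t] / total for t in vocab]

# Hypothetical tokenization of one spoken document into unit IDs.
vocab = ["u1", "u2", "u3"]
doc = ["u1", "u2", "u1", "u3", "u1"]
vec = bag_of_words(doc, vocab)
```

A topic ID classifier can then be trained directly on such vectors, which is the representation the CNN-based framework is compared against.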

    Bi-directional representation learning for multi-label classification

    Multi-label classification is a central problem in many application domains. In this paper, we present a novel supervised bi-directional model that learns a low-dimensional mid-level representation for multi-label classification. Unlike traditional multi-label learning methods which identify intermediate representations from either the input space or the output space but not both, the mid-level representation in our model has two complementary parts that capture intrinsic information of the input data and the output labels respectively under the autoencoder principle while augmenting each other for the target output label prediction. The resulting optimization problem can be solved efficiently using an iterative procedure with alternating steps, while closed-form solutions exist for one major step. Our experiments conducted on a variety of multi-label data sets demonstrate the efficacy of the proposed bi-directional representation learning model for multi-label classification.
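The core structural idea, a mid-level representation with one part encoded from the input and one from the labels, can be sketched as two linear encoders feeding a shared code. The dimensions and weight matrices below are toy placeholders, not the paper's learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: input features, labels, size of each mid-level part.
d_x, d_y, d_z = 8, 4, 3
W_x = rng.standard_normal((d_x, d_z))  # encoder from the input space
W_y = rng.standard_normal((d_y, d_z))  # encoder from the label space

def mid_level(x, y):
    """Two complementary parts of the mid-level representation:
    one derived from the input, one from the labels
    (autoencoder-style), concatenated into a single code."""
    return np.concatenate([x @ W_x, y @ W_y])

x = rng.standard_normal(d_x)
y = np.array([1.0, 0.0, 1.0, 0.0])  # multi-label indicator vector
z = mid_level(x, y)
```

In the actual model both encoders are trained jointly with reconstruction and prediction objectives via the alternating procedure the abstract describes; this sketch only shows the bi-directional structure of the representation.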

    Weakly Supervised Semantic Segmentation by Knowledge Graph Inference

    Currently, existing efforts in Weakly Supervised Semantic Segmentation (WSSS) based on Convolutional Neural Networks (CNNs) have predominantly focused on enhancing the multi-label classification network stage, with limited attention given to the equally important downstream segmentation network. Furthermore, CNN-based local convolutions lack the ability to model the extensive inter-category dependencies. Therefore, this paper introduces a graph reasoning-based approach to enhance WSSS. The aim is to improve WSSS holistically by simultaneously enhancing both the multi-label classification and segmentation network stages. In the multi-label classification network segment, external knowledge is integrated, coupled with GCNs, to globally reason about inter-class dependencies. This encourages the network to uncover features in non-salient regions of images, thereby refining the completeness of generated pseudo-labels. In the segmentation network segment, the proposed Graph Reasoning Mapping (GRM) module is employed to leverage knowledge obtained from textual databases, facilitating contextual reasoning for class representation within image regions. This GRM module enhances feature representation in high-level semantics of the segmentation network's local convolutions, while dynamically learning semantic coherence for individual samples. Using solely image-level supervision, we have achieved state-of-the-art performance in WSSS on the PASCAL VOC 2012 and MS-COCO datasets. Extensive experimentation on both the multi-label classification and segmentation network stages underscores the effectiveness of the proposed graph reasoning approach for advancing WSSS.
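The GCN-based global reasoning over inter-class dependencies rests on the standard graph-convolution propagation rule: features on class nodes are mixed through a symmetrically normalized adjacency matrix. A minimal numpy sketch, with a hypothetical 3-class co-occurrence graph standing in for the knowledge graph:

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution step over class nodes:
    ReLU(D^{-1/2} (A + I) D^{-1/2} X W), i.e. each class feature is
    averaged with its graph neighbours before a linear transform."""
    A_hat = A + np.eye(A.shape[0])        # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

# Hypothetical 3-class dependency graph and stand-in class features.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.eye(3)          # one-hot class features as placeholders
W = np.ones((3, 2))    # toy weight matrix
H = gcn_layer(A, X, W)
```

In the paper the node features would come from external knowledge (e.g. word embeddings) rather than one-hot placeholders; the propagation structure is what lets each class representation absorb information from related classes.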

    Semi-Supervised Learning with Scarce Annotations

    While semi-supervised learning (SSL) algorithms provide an efficient way to make use of both labelled and unlabelled data, they generally struggle when the number of annotated samples is very small. In this work, we consider the problem of SSL multi-class classification with very few labelled instances. We introduce two key ideas. The first is a simple but effective one: we leverage the power of transfer learning among different tasks and self-supervision to initialize a good representation of the data without making use of any label. The second idea is a new algorithm for SSL that can exploit such a pre-trained representation well. The algorithm works by alternating two phases, one fitting the labelled points and one fitting the unlabelled ones, with carefully-controlled information flow between them. The benefits are greatly reduced overfitting of the labelled data and the avoidance of issues with balancing labelled and unlabelled losses during training. We show empirically that this method can successfully train competitive models with as few as 10 labelled data points per class. More generally, we show that the idea of bootstrapping features using self-supervised learning always improves SSL on standard benchmarks. We show that our algorithm works increasingly well compared to other methods when refining from other tasks or datasets. Comment: Workshop on Deep Vision, CVPR 202
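The alternating two-phase scheme can be illustrated with a deliberately simple stand-in for the paper's method: fit class prototypes on the labelled points, pseudo-label the unlabelled points with those prototypes, then refit on the union, and repeat. The nearest-class-mean classifier and toy data are assumptions for illustration, not the authors' actual model.

```python
import numpy as np

def alternate_ssl(X_l, y_l, X_u, iters=5):
    """Alternate two phases: (1) pseudo-label unlabelled points with the
    current class means; (2) refit the means on labelled + pseudo-labelled
    data. A toy analogue of alternating labelled/unlabelled fitting."""
    classes = np.unique(y_l)
    means = np.stack([X_l[y_l == c].mean(axis=0) for c in classes])
    for _ in range(iters):
        # Phase on the unlabelled points: nearest-mean pseudo-labels.
        d = ((X_u[:, None, :] - means[None]) ** 2).sum(-1)
        y_u = classes[d.argmin(axis=1)]
        # Phase on the labelled points: refit means on both sets.
        X_all = np.concatenate([X_l, X_u])
        y_all = np.concatenate([y_l, y_u])
        means = np.stack([X_all[y_all == c].mean(axis=0) for c in classes])
    return means

# Two well-separated toy clusters with one labelled point each.
rng = np.random.default_rng(1)
X_u = np.concatenate([rng.normal(0, .1, (20, 2)), rng.normal(5, .1, (20, 2))])
X_l = np.array([[0., 0.], [5., 5.]])
y_l = np.array([0, 1])
means = alternate_ssl(X_l, y_l, X_u)
```

With a good pre-trained representation, clusters are well separated as in this toy setup, which is precisely why the self-supervised initialization in the first idea makes the alternating scheme in the second idea effective.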

    Radio Galaxy Zoo: Towards building the first multi-purpose foundation model for radio astronomy with self-supervised learning

    In this work, we apply self-supervised learning with instance differentiation to learn a robust, multi-purpose representation for image analysis of resolved extragalactic continuum images. We train a multi-use model which compresses our unlabelled data into a structured, low dimensional representation which can be used for a variety of downstream tasks (e.g. classification, similarity search). We exceed baseline supervised Fanaroff-Riley classification performance by a statistically significant margin, with our model reducing the test set error by up to half. Our model is also able to maintain high classification accuracy with very few labels, with only 7.79% error when using only 145 labels. We further demonstrate that by using our foundation model, users can efficiently trade off compute, human labelling cost and test set accuracy according to their respective budgets, allowing for efficient classification in a wide variety of scenarios. We highlight the generalizability of our model by showing that it enables accurate classification in a label-scarce regime with data from the new MIGHTEE survey without any hyper-parameter tuning, where it improves upon the baseline by ~8%. Visualizations of our labelled and unlabelled data show that our model's representation space is structured with respect to physical properties of the sources, such as angular source extent. We show that the learned representation is scientifically useful even if no labels are available by performing a similarity search, finding hybrid sources in the RGZ DR1 dataset without any labels. We show that good augmentation design and hyper-parameter choice can help achieve peak performance, while emphasising that optimal hyper-parameters are not required to obtain benefits from self-supervised pre-training.
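Instance differentiation is typically trained with a contrastive objective in which two augmented views of the same image must match each other against every other instance in the batch. A minimal numpy sketch of such an InfoNCE-style loss (the temperature and batch are illustrative assumptions, not the paper's training configuration):

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """Instance-differentiation objective: each view z1[i] should match
    its partner z2[i] (diagonal) against all other instances (rows)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                     # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_p))              # positives on the diagonal

rng = np.random.default_rng(0)
z = rng.standard_normal((4, 8))   # toy batch of 4 embeddings
loss = info_nce(z, z)             # identical views: positives dominate
```

Minimizing this loss pulls views of the same source together and pushes different sources apart, which is what structures the representation space with respect to physical properties of the sources.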

    A generic self-supervised learning (SSL) framework for representation learning from spectra-spatial feature of unlabeled remote sensing imagery

    Remote sensing data has been widely used for various Earth Observation (EO) missions such as land use and cover classification, weather forecasting, agricultural management, and environmental monitoring. Most existing remote sensing data-based models are based on supervised learning that requires large and representative human-labelled data for model training, which is costly and time-consuming. Recently, self-supervised learning (SSL) enables models to learn representations from orders of magnitude more unlabelled data. This representation has been proven to boost the performance of downstream tasks and has potential for remote sensing applications. The success of SSL is heavily dependent on a pre-designed pretext task, which introduces an inductive bias into the model from a large amount of unlabelled data. Since remote sensing imagery has rich spectral information beyond the standard RGB colour space, the pretext tasks established in computer vision based on RGB images may not be straightforward to extend to the multi/hyperspectral domain. To address this challenge, this work has designed a novel SSL framework that is capable of learning representations from the spectra-spatial information of unlabelled data. The framework contains two novel pretext tasks for object-based and pixel-based remote sensing data analysis methods, respectively. Through evaluation on two typical downstream tasks (a multi-label land cover classification task on Sentinel-2 multispectral datasets and a ground soil parameter retrieval task on hyperspectral datasets), the results demonstrate that the representation obtained through the proposed SSL achieved a significant improvement in model performance.
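To make the notion of a spectral pretext task concrete, one common pattern is band reconstruction: hide one spectral band of each patch and ask the model to predict it from the remaining bands, so no human labels are needed. This is an illustrative example of a spectra-aware pretext task, not the specific tasks proposed in the paper.

```python
import numpy as np

def band_reconstruction_batch(patches, target_band):
    """Hypothetical pretext task for multispectral imagery: remove one
    spectral band from the inputs and use it as the prediction target,
    giving a supervision signal derived purely from unlabelled data."""
    inputs = np.delete(patches, target_band, axis=-1)  # remaining bands
    targets = patches[..., target_band]                # band to predict
    return inputs, targets

# Toy batch: 2 patches of 4x4 pixels with 6 spectral bands.
rng = np.random.default_rng(0)
patches = rng.standard_normal((2, 4, 4, 6))
x, y = band_reconstruction_batch(patches, target_band=3)
```

A model trained to map `x` to `y` must learn spectral-spatial correlations, which is the kind of inductive bias a multispectral pretext task is designed to introduce.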

    A review of multi-instance learning assumptions

    Multi-instance (MI) learning is a variant of inductive machine learning, where each learning example contains a bag of instances instead of a single feature vector. The term commonly refers to the supervised setting, where each bag is associated with a label. This type of representation is a natural fit for a number of real-world learning scenarios, including drug activity prediction and image classification, hence many MI learning algorithms have been proposed. Any MI learning method must relate instances to bag-level class labels, but many types of relationships between instances and class labels are possible. All early work in MI learning assumes a specific MI concept class known to be appropriate for a drug activity prediction domain; however, this ‘standard MI assumption’ is not guaranteed to hold in other domains. Much of the recent work in MI learning has concentrated on a relaxed view of the MI problem, where the standard MI assumption is dropped, and alternative assumptions are considered instead. However, often it is not clearly stated what particular assumption is used and how it relates to other assumptions that have been proposed. In this paper, we aim to clarify the use of alternative MI assumptions by reviewing the work done in this area.
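The standard MI assumption mentioned above has a very compact form: a bag is positive if and only if at least one of its instances is positive, i.e. a logical OR over instance labels. A minimal sketch:

```python
def bag_label(instance_labels):
    """Standard MI assumption: a bag is positive iff at least one
    instance in it is positive (a logical OR over the bag)."""
    return any(instance_labels)

# In drug activity prediction, each instance might be one molecular
# conformation; one active conformation makes the molecule active.
positive_bag = [False, True, False]
negative_bag = [False, False, False]
```

The relaxed assumptions the paper reviews replace this OR with other instance-to-bag relationships, e.g. requiring some fraction of positive instances or a more general function of the bag.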

    Extreme multi-label learning with Gaussian processes

    In modern probabilistic machine learning, Gaussian process models have provided both powerful and principled ways to approach a series of challenging problems. Nonetheless, their applicability can be significantly limited by cases where the number of training data points is large, something very typical in many modern machine learning applications. An additional restriction can be imposed when the posterior distribution is intractable due to the non-Gaussian likelihoods used. Despite the fact that these two limitations have been efficiently addressed over the last decade, applications of Gaussian process models under extreme regimes, where the number of training data points and the dimensionality of both the input and output spaces are extremely large, have not appeared in the literature so far. This thesis is focused on this kind of application of Gaussian processes, where supervised tasks such as multi-class and multi-label classification are considered. We start by discussing the main mathematical tools required in order to successfully cope with the large scale of the datasets. Those include a variational inference framework, suitably tailored for Gaussian processes. Furthermore, in our attempt to alleviate the computational burden, we introduce a new parametrization for the variational distribution, while a representation trick for reducing storage requirements for large input dimensions is also discussed. A methodology is then presented which is based on this variational inference framework and a computationally efficient bound on the softmax function that allows the use of Gaussian processes for multi-class classification problems that involve an arbitrarily large number of classes. A series of experiments test and compare the performance of this methodology with other methods.
Finally, we move to the more general multi-label classification task and we develop a method, also based on the same variational inference framework, which can deal with datasets involving hundreds of thousands of data points, input dimensions and labels. The effectiveness of our method is supported by experiments on several real-world multi-label datasets.
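One well-known example of a computationally efficient lower bound on the softmax, of the kind the thesis relies on, is the one-vs-each bound (Titsias, 2016): the softmax probability of a class is bounded below by a product of pairwise sigmoid terms, avoiding the full normalizing sum over all classes. Whether this is the exact bound used in the thesis is an assumption here; the sketch simply verifies the inequality numerically.

```python
import numpy as np

def softmax_prob(f, y):
    """Exact softmax probability of class y under logits f."""
    f = f - f.max()  # numerical stability
    return np.exp(f[y]) / np.exp(f).sum()

def one_vs_each_bound(f, y):
    """One-vs-each lower bound on the softmax:
    p(y) >= prod_{j != y} sigmoid(f_y - f_j).
    Each factor only compares class y with one other class, so the
    bound decomposes over class pairs instead of the full sum."""
    diffs = f[y] - np.delete(f, y)
    return np.prod(1.0 / (1.0 + np.exp(-diffs)))

rng = np.random.default_rng(0)
f = rng.standard_normal(1000)   # logits for 1000 classes
p = softmax_prob(f, y=0)
b = one_vs_each_bound(f, y=0)
```

Because the bound factorizes over pairs, it can be subsampled over classes, which is what makes training with an arbitrarily large number of classes tractable in the variational framework.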