149,827 research outputs found

    Semi-supervised binary classification with latent distance learning

    Binary classification (BC) is a practical task that is ubiquitous in real-world problems, such as distinguishing healthy and unhealthy objects in biomedical diagnostics and defective and non-defective products in manufacturing inspections. Nonetheless, fully annotated data are commonly required to solve this problem effectively, and their collection by domain experts is a tedious and expensive procedure. In contrast to BC, several significant semi-supervised learning techniques that rely heavily on stochastic data augmentation have been devised for multi-class classification. In this study, we demonstrate that stochastic data augmentation is less suitable for typical BC problems because it can omit the crucial features that strictly distinguish positive from negative samples. To address this issue, we propose a new learning representation that solves the BC problem using a few labels with a random k-pair cross-distance learning mechanism. First, by harnessing a few labeled samples, the encoder network learns to project positive and negative samples in angular space so as to maximize their inter-class distance and minimize their intra-class distance. Second, the classifier learns to discriminate between positive and negative samples using on-the-fly labels generated from the angular space and the labeled samples. Extensive experiments were conducted on four real-world publicly available BC datasets. With few labels and without any data augmentation, the proposed method outperformed state-of-the-art semi-supervised and self-supervised learning methods. Moreover, with 10% labeling, our semi-supervised classifier obtained accuracy competitive with a fully supervised setting.
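
    As a rough illustration of the angular-space objective described above, the sketch below implements a toy pairwise loss that pulls same-class embeddings together and pushes different-class embeddings apart on the unit hypersphere. It is a hedged stand-in for the paper's random k-pair cross-distance mechanism, not the authors' exact formulation; the margin value and the mean reduction are illustrative assumptions.

```python
import numpy as np

def angular_pair_loss(z, y, margin=0.5):
    """Toy angular pairwise loss (illustrative, not the paper's exact
    k-pair scheme): pull same-class embeddings together and push
    different-class embeddings apart on the unit hypersphere.

    z : (n, d) encoder outputs for the few labeled samples
    y : (n,) binary labels in {0, 1}
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # project to angular space
    cos = z @ z.T                                      # pairwise cosine similarities
    same = (y[:, None] == y[None, :]).astype(float)
    np.fill_diagonal(same, 0.0)                        # drop self-pairs
    diff = 1.0 - same
    np.fill_diagonal(diff, 0.0)
    # intra-class pairs: drive cosine toward 1; inter-class pairs: below the margin
    intra = ((1.0 - cos) * same).sum() / max(same.sum(), 1.0)
    inter = (np.maximum(cos - margin, 0.0) * diff).sum() / max(diff.sum(), 1.0)
    return intra + inter
```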

    Self-taught semi-supervised dictionary learning with non-negative constraint

    This paper investigates classification by dictionary learning. A novel unified framework, termed self-taught semi-supervised dictionary learning with non-negative constraint (NNST-SSDL), is proposed for simultaneously optimizing the components of a dictionary and a graph Laplacian. Specifically, an atom graph Laplacian regularization is built using the sparse coefficients to effectively capture the underlying manifold structure. It is more robust to noisy samples and outliers because atoms are more concise and representative than training samples. A non-negative constraint imposed on the sparse coefficients guarantees that each sample lies in the middle of its related atoms; in this way, the dependency between samples and atoms is made explicit. Furthermore, a self-taught mechanism is introduced to feed back the manifold structure induced by the atom graph Laplacian regularization, together with the supervised information hidden in unlabeled samples, in order to learn a better dictionary. An efficient algorithm, combining a block coordinate descent method with the alternating direction method of multipliers, is derived to optimize the unified framework. Experimental results on several benchmark datasets show the effectiveness of the proposed model.
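
    A minimal sketch of one sub-problem a framework of this kind alternates over: solving for non-negative sparse codes under an atom-graph Laplacian penalty. The projected-gradient update and the code-co-activation graph assumed here are simplifications for illustration, not the paper's block coordinate descent/ADMM derivation.

```python
import numpy as np

def nn_sparse_codes(X, D, lam=0.1, beta=0.1, iters=200, lr=1e-3):
    """Projected-gradient sketch of one sub-problem (assumed form):
        min_{A >= 0}  ||X - D A||_F^2 + lam * ||A||_1 + beta * tr(A^T L A)
    where L is an atom-graph Laplacian rebuilt from the current codes,
    standing in for the paper's self-taught graph update.

    X : (d, n) data, D : (d, k) dictionary; returns codes A : (k, n)
    """
    k, n = D.shape[1], X.shape[1]
    A = np.full((k, n), 0.01)                # small non-negative initialization
    for _ in range(iters):
        W = A @ A.T                          # atom affinity via code co-activation
        np.fill_diagonal(W, 0.0)
        L = np.diag(W.sum(axis=1)) - W       # combinatorial Laplacian over atoms
        grad = 2 * D.T @ (D @ A - X) + lam + 2 * beta * L @ A
        A = np.maximum(A - lr * grad, 0.0)   # gradient step, then project to A >= 0
    return A
```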

    Constructing a Non-Negative Low Rank and Sparse Graph with Data-Adaptive Features

    This paper aims at constructing a good graph for discovering intrinsic data structures in a semi-supervised learning setting. First, we propose to build a non-negative low-rank and sparse (NNLRS) graph for the given data representation. Specifically, the weights of edges in the graph are obtained by seeking a non-negative, low-rank, and sparse matrix that represents each data sample as a linear combination of the others. The resulting NNLRS-graph captures both the global mixture-of-subspaces structure (through the low-rankness) and the locally linear structure (through the sparseness) of the data, and is hence both generative and discriminative. Second, as good features are extremely important for constructing a good graph, we propose to learn the data embedding matrix and construct the graph jointly within one framework, termed NNLRS with embedded features (NNLRS-EF). Extensive experiments on three publicly available datasets demonstrate that the proposed method outperforms state-of-the-art graph construction methods by a large margin for both semi-supervised classification and discriminative analysis, which verifies its effectiveness.
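
    The following sketch builds a simplified version of such a self-expressive graph: each sample is regressed on the others with a non-negative sparse (Lasso) fit, and the coefficients become edge weights. The low-rank (nuclear-norm) term of the full NNLRS objective is deliberately omitted, so this is an approximation of the idea rather than the paper's method.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_nonneg_graph(X, alpha=0.05):
    """Simplified stand-in for the NNLRS graph: encode each sample as a
    non-negative sparse combination of the others (the low-rank term of
    the full objective is omitted here for brevity).

    X : (n, d) data matrix; returns a symmetric (n, n) affinity matrix.
    """
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.arange(n) != i                  # exclude the sample itself
        reg = Lasso(alpha=alpha, positive=True, max_iter=5000)
        reg.fit(X[idx].T, X[i])                  # X[i] ~ sum_j w_j * X[j]
        W[i, idx] = reg.coef_                    # non-negative sparse weights
    return (W + W.T) / 2.0                       # symmetrize edge weights
```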

    Semi-supervised heterogeneous evolutionary co-clustering

    One of the challenges in machine learning is the absence of a sufficient number of labeled training instances; at the same time, generating labeled data is expensive and time-consuming. The semi-supervised approach has shown promising results in addressing the problem of datasets with few labeled instances. The key challenge is incorporating semi-supervised knowledge into heterogeneous data that is evolving in nature; most of the prior work that uses semi-supervised knowledge has been performed on static heterogeneous data. The semi-supervised knowledge, provided as constraint-based or distance-based information, is incorporated into the data to aid the clustering algorithm in obtaining better clusters. I propose a framework for incorporating prior knowledge to perform co-clustering on evolving heterogeneous data. This framework can be used to solve a wide range of problems in text analysis, web analysis, and image grouping. In the semi-supervised approach, domain knowledge is incorporated by placing constraints that guide the algorithm toward an effective clustering of the data. In the proposed framework, I use a constraint-based semi-supervised non-negative matrix factorization approach to obtain the co-clustering of the heterogeneous evolving data, where user-provided must-link and cannot-link constraints are applied to the central data type before co-clustering. To process the original datasets efficiently in terms of time and space, I use a low-rank approximation technique, the Dynamic Colibri approach, to obtain a sparse representation of the input data matrix.
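
    As a hedged illustration of constraint-based semi-supervised NMF, the sketch below encodes must-link pairs as a graph Laplacian penalty on one factor, using standard multiplicative updates. It factorizes a single data matrix rather than co-clustering evolving heterogeneous data, and omits cannot-link handling and the Dynamic Colibri low-rank step, so it captures only the constrained-factorization core of the framework.

```python
import numpy as np

def constrained_nmf(X, must_link, k=5, beta=1.0, iters=300, eps=1e-9):
    """Multiplicative-update sketch of constraint-based semi-supervised NMF:
    must-link pairs form a graph whose Laplacian penalty pulls the factor
    rows of linked samples together (cannot-link handling omitted).

    X : (n, m) non-negative data; must_link : list of (i, j) index pairs
    """
    n, m = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((n, k))                       # sample factor (cluster indicators)
    H = rng.random((k, m))                       # feature factor
    S = np.zeros((n, n))
    for i, j in must_link:                       # must-link affinity graph
        S[i, j] = S[j, i] = 1.0
    Dg = np.diag(S.sum(axis=1))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)     # standard NMF update for H
        # graph-regularized update for W: penalty beta * tr(W^T (Dg - S) W)
        W *= (X @ H.T + beta * S @ W) / (W @ H @ H.T + beta * Dg @ W + eps)
    return W, H
```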

    Noisy multi-label semi-supervised dimensionality reduction

    Noisy labeled data are a rich source of information that is often easily accessible and cheap to obtain, but label noise can have many negative consequences if not accounted for. How to fully utilize noisy labels has been studied extensively within the framework of standard supervised machine learning over several decades. However, very little research has addressed the challenge posed by noisy labels in non-standard settings, including situations where only a fraction of the samples are labeled (semi-supervised) and each high-dimensional sample is associated with multiple labels. In this work, we present a novel semi-supervised multi-label dimensionality reduction method that effectively utilizes information from both noisy multi-labels and unlabeled data. With the proposed noisy multi-label semi-supervised dimensionality reduction (NMLSDR) method, the noisy multi-labels are denoised and the unlabeled data are labeled simultaneously via a specially designed label propagation algorithm. NMLSDR then learns a projection matrix for reducing the dimensionality by maximizing the dependence between the enlarged and denoised multi-label space and the features in the projected space. Extensive experiments on synthetic data, benchmark datasets, and a real-world case study demonstrate the effectiveness of the proposed algorithm and show that it outperforms state-of-the-art multi-label feature extraction algorithms.
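
    The label-propagation step can be sketched with the classic diffusion iteration below: labels spread over a normalized affinity graph with soft clamping, so noisy multi-labels can be revised rather than fixed. The normalization and update rule shown are standard choices assumed for illustration, not necessarily NMLSDR's specially designed variant.

```python
import numpy as np

def propagate_multilabels(W, Y, alpha=0.9, iters=100):
    """Sketch of a label-propagation step in an NMLSDR-style pipeline:
    iterate F <- alpha * S @ F + (1 - alpha) * Y, which spreads (possibly
    noisy) multi-labels over the graph instead of clamping them hard.

    W : (n, n) symmetric affinity graph
    Y : (n, c) initial multi-label matrix (zero rows for unlabeled samples)
    """
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d) + 1e-12)  # symmetric normalization D^-1/2 W D^-1/2
    F = Y.astype(float).copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y  # soft clamping tolerates label noise
    return F                                 # denoised / propagated label scores
```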

    Few-Shot Non-Parametric Learning with Deep Latent Variable Model

    Most real-world problems that machine learning algorithms are expected to solve face 1) an unknown data distribution; 2) little domain-specific knowledge; and 3) datasets with limited annotation. We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV), a learning framework for any dataset with abundant unlabeled data but very few labeled examples. By training only a generative model in an unsupervised way, the framework uses the data distribution to build a compressor. Using a compressor-based distance metric derived from Kolmogorov complexity, together with the few labeled examples, NPC-LV classifies without further training. We show that NPC-LV outperforms supervised methods on image classification on all three datasets in the low-data regime, and even outperforms semi-supervised learning methods on CIFAR-10. We demonstrate how and when the negative evidence lower bound (nELBO) can be used as an approximate compressed length for classification. By revealing the correlation between compression rate and classification accuracy, we show that under NPC-LV, improvements in generative models can enhance downstream classification accuracy.
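
    The compressor-based classification idea can be illustrated with an off-the-shelf compressor: the normalized compression distance (NCD) approximates the uncomputable Kolmogorov-complexity distance, and a nearest-neighbor rule over it classifies with no training. Using gzip below is a stand-in assumption; NPC-LV instead derives code lengths from the trained generative model's nELBO.

```python
import gzip
import numpy as np

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: approximates the Kolmogorov-
    complexity distance with a real compressor (gzip as a stand-in)."""
    cx = len(gzip.compress(x))
    cy = len(gzip.compress(y))
    cxy = len(gzip.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify_1nn(test_item: bytes, labeled: list[tuple[bytes, int]]) -> int:
    """Training-free 1-NN classification under the compressor metric:
    return the label of the labeled example closest to test_item."""
    dists = [ncd(test_item, x) for x, _ in labeled]
    return labeled[int(np.argmin(dists))][1]
```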