Semi-supervised binary classification with latent distance learning
Binary classification (BC) is a practical task that is ubiquitous in
real-world problems, such as distinguishing healthy and unhealthy objects in
biomedical diagnostics and defective and non-defective products in
manufacturing inspections. Nonetheless, fully annotated data are commonly
required to effectively solve this problem, and their collection by domain
experts is a tedious and expensive procedure. In contrast to BC, several
significant semi-supervised learning techniques that heavily rely on stochastic
data augmentation techniques have been devised for solving multi-class
classification. In this study, we demonstrate that the stochastic data
augmentation technique is less suitable for solving typical BC problems because
it can omit crucial features that strictly distinguish between positive and
negative samples. To address this issue, we propose a new representation
learning approach that solves the BC problem using only a few labels, via a
random k-pair cross-distance learning mechanism. First, by harnessing a few labeled samples,
the encoder network learns the projection of positive and negative samples in
angular spaces to maximize and minimize their inter-class and intra-class
distances, respectively. Second, the classifier learns to discriminate between
positive and negative samples using on-the-fly labels generated based on the
angular space and labeled samples to solve BC tasks. Extensive experiments were
conducted using four real-world publicly available BC datasets. With few labels
and without any data augmentation techniques, the proposed method outperformed
state-of-the-art semi-supervised and self-supervised learning methods.
Moreover, with only 10% of the labels, our semi-supervised classifier obtained
accuracy competitive with a fully supervised setting.
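The angular-space pair objective described above can be sketched as a cosine-distance pair loss. The margin value, the sampling scheme, and all function names here are illustrative stand-ins, not the authors' implementation:

```python
import numpy as np

def cosine_distance(a, b):
    # Angular (cosine) distance between two embedding vectors.
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def pair_loss(anchor, other, same_class, margin=0.5):
    """Contrastive-style loss on angular distance: pull same-class
    pairs together, push different-class pairs apart up to a margin.
    Illustrative only; the margin is an assumed hyperparameter."""
    d = cosine_distance(anchor, other)
    if same_class:
        return d                       # minimize intra-class distance
    return max(0.0, margin - d)        # maximize inter-class distance

def random_k_pair_loss(embeddings, labels, k, rng):
    # Sample k random pairs and average their pair losses -- a toy
    # stand-in for the random k-pair cross-distance step.
    n = len(labels)
    total = 0.0
    for _ in range(k):
        i, j = rng.choice(n, size=2, replace=False)
        total += pair_loss(embeddings[i], embeddings[j], labels[i] == labels[j])
    return total / k
```

In a real training loop this quantity would be backpropagated through the encoder; here it only illustrates the geometry of the objective.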
Self-taught semi-supervised dictionary learning with non-negative constraint
This paper investigates classification by dictionary learning. A novel unified framework termed self-taught semi-supervised dictionary learning with non-negative constraint (NNST-SSDL) is proposed for simultaneously optimizing the components of a dictionary and a graph Laplacian. Specifically, an atom graph Laplacian regularization is built by using sparse coefficients to effectively capture the underlying manifold structure. It is more robust to noisy samples and outliers because atoms are more concise and representative than training samples. A non-negative constraint imposed on the sparse coefficients guarantees that each sample lies among its related atoms; in this way, the dependency between samples and atoms is made explicit. Furthermore, a self-taught mechanism is introduced to effectively feed back the manifold structure induced by the atom graph Laplacian regularization, together with the supervised information hidden in unlabeled samples, in order to learn a better dictionary. An efficient algorithm, combining a block coordinate descent method with the alternating direction method of multipliers, is derived to optimize the unified framework. Experimental results
on several benchmark datasets show the effectiveness of the proposed model.
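The non-negative sparse-coding subproblem at the heart of such frameworks can be sketched with projected ISTA. This is a generic sketch of non-negative sparse coding only, not the paper's full NNST-SSDL solver; the dictionary `D` and the parameters are assumptions:

```python
import numpy as np

def nonneg_sparse_code(D, x, lam=0.1, n_iter=200):
    """Solve min_a 0.5*||x - D a||^2 + lam*||a||_1  s.t. a >= 0
    with projected ISTA (gradient step, then a one-sided soft
    threshold that also enforces non-negativity)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        # Combined step: max(a - grad/L - lam/L, 0) is the proximal
        # operator of lam*||.||_1 restricted to the non-negative orthant.
        a = np.maximum(a - grad / L - lam / L, 0.0)
    return a
```

With `D` equal to the identity, the update reduces to one-sided soft thresholding of `x`, which makes the behavior easy to check by hand.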
Constructing a Non-Negative Low Rank and Sparse Graph with Data-Adaptive Features
This paper aims at constructing a good graph for discovering intrinsic data
structures in a semi-supervised learning setting. Firstly, we propose to build
a non-negative low-rank and sparse (referred to as NNLRS) graph for the given
data representation. Specifically, the weights of edges in the graph are
obtained by seeking a nonnegative low-rank and sparse matrix that represents
each data sample as a linear combination of others. The so-obtained NNLRS-graph
can capture both the global mixture of subspaces structure (by the low
rankness) and the locally linear structure (by the sparseness) of the data,
hence is both generative and discriminative. Secondly, as good features are
extremely important for constructing a good graph, we propose to learn the data
embedding matrix and construct the graph jointly within one framework, which is
termed NNLRS with embedded features (referred to as NNLRS-EF). Extensive
experiments on three publicly available datasets demonstrate that the proposed
method outperforms state-of-the-art graph construction methods by a large
margin in both semi-supervised classification and discriminative analysis,
which verifies the effectiveness of the proposed method.
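The graph-construction step from self-representation coefficients can be sketched as follows. For brevity this sketch uses a closed-form ridge penalty in place of the paper's nuclear-norm plus l1 objective, so it is only an assumed stand-in for the coefficient solver; the post-processing (clip to non-negative, zero the diagonal, symmetrize) reflects the common recipe:

```python
import numpy as np

def self_representation_graph(X, alpha=0.1):
    """Build a graph from self-representation coefficients X ~= X @ Z.
    X: (d, n) data matrix whose columns are samples; alpha: ridge
    strength (assumed hyperparameter)."""
    G = X.T @ X                                 # (n, n) Gram matrix
    Z = np.linalg.solve(G + alpha * np.eye(G.shape[0]), G)
    Z = np.maximum(Z, 0.0)                      # non-negative edge weights
    np.fill_diagonal(Z, 0.0)                    # no self-loops
    return 0.5 * (Z + Z.T)                      # symmetric affinity matrix
```

The resulting affinity matrix can then be fed to any graph-based semi-supervised learner.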
Semi-supervised heterogeneous evolutionary co-clustering
One of the challenges in machine learning is the absence of a sufficient number of labeled (training) instances, and generating labeled data is expensive and time consuming. The semi-supervised approach has shown promising results on datasets with few or insufficient labeled instances. The key challenge is incorporating semi-supervised knowledge into heterogeneous data that is evolving in nature; most prior work using semi-supervised knowledge has been performed on static heterogeneous data. The semi-supervised knowledge, provided as constraint-based or distance-based information, is incorporated into the data and aids the clustering algorithm in obtaining better clusters. I propose a framework that incorporates prior knowledge to perform co-clustering on evolving heterogeneous data. This framework can be used to solve a wide range of problems in text analysis, web analysis, and image grouping. In the semi-supervised approach, we incorporate domain knowledge by placing constraints that aid the clustering process in performing effective clustering of the data. In the proposed framework, I use a constraint-based semi-supervised non-negative matrix factorization approach to obtain the co-clustering on the heterogeneous evolving data. The constraint-based semi-supervised approach uses user-provided must-link or cannot-link constraints on the central data type before performing co-clustering. To process the original datasets efficiently in terms of time and space, I use a low-rank approximation technique to obtain a sparse representation of the input data matrix via the Dynamic Colibri approach.
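The constraint-based semi-supervised NMF idea can be sketched with standard multiplicative updates plus a simple must-link heuristic. This is a toy stand-in, not the framework's actual update rules; the averaging step for must-linked columns is an assumed simplification of how constraints could be enforced:

```python
import numpy as np

def constrained_nmf(X, k, must_link, n_iter=200, rng=None):
    """Plain multiplicative-update NMF (X ~= W @ H) where, after each
    update, the coefficient columns of must-linked samples are averaged
    so linked samples end up in the same co-cluster.
    X: (d, n) non-negative matrix; must_link: list of (i, j) pairs."""
    rng = rng or np.random.default_rng(0)
    d, n = X.shape
    W = rng.random((d, k)) + 1e-3
    H = rng.random((k, n)) + 1e-3
    eps = 1e-10
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)     # standard MU step for H
        W *= (X @ H.T) / (W @ H @ H.T + eps)     # standard MU step for W
        for i, j in must_link:                   # pull linked samples together
            m = 0.5 * (H[:, i] + H[:, j])
            H[:, i] = H[:, j] = m
    return W, H
```

Cluster assignments are then read off as the arg-max over the k rows of `H` for each sample column.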
Noisy multi-label semi-supervised dimensionality reduction
Noisy labeled data are a rich source of information that is often easily
accessible and cheap to obtain, but label noise can have many negative
consequences if not accounted for. How to fully utilize noisy labels
has been studied extensively within the framework of standard supervised
machine learning over a period of several decades. However, very little
research has been conducted on solving the challenge posed by noisy labels in
non-standard settings. This includes situations where only a fraction of the
samples are labeled (semi-supervised) and each high-dimensional sample is
associated with multiple labels. In this work, we present a novel
semi-supervised and multi-label dimensionality reduction method that
effectively utilizes information from both noisy multi-labels and unlabeled
data. With the proposed Noisy multi-label semi-supervised dimensionality
reduction (NMLSDR) method, the noisy multi-labels are denoised and unlabeled
data are labeled simultaneously via a specially designed label propagation
algorithm. NMLSDR then learns a projection matrix for reducing the
dimensionality by maximizing the dependence between the enlarged and denoised
multi-label space and the features in the projected space. Extensive
experiments on synthetic data, benchmark datasets, as well as a real-world case
study, demonstrate the effectiveness of the proposed algorithm and show that it
outperforms state-of-the-art multi-label feature extraction algorithms.
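The basic mechanism behind graph-based label propagation can be sketched with the classic normalized-affinity iteration. This shows only the generic propagation step, not the specially designed denoising variant the abstract describes; `alpha` and the iteration count are assumed parameters:

```python
import numpy as np

def propagate_labels(W, Y, alpha=0.9, n_iter=100):
    """Standard graph label propagation: iterate F <- alpha*S@F + (1-alpha)*Y.
    W: (n, n) symmetric non-negative affinity matrix;
    Y: (n, c) initial label matrix with zero rows for unlabeled samples."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt            # symmetrically normalized affinity
    F = Y.copy().astype(float)
    for _ in range(n_iter):
        F = alpha * S @ F + (1 - alpha) * Y    # spread labels, keep anchors
    return F
```

On a three-node chain with one labeled endpoint, the node adjacent to the labeled sample receives a stronger score for that class than the far node, which is the behavior one expects of the propagation step.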
Few-Shot Non-Parametric Learning with Deep Latent Variable Model
Most real-world problems that machine learning algorithms are expected to
solve face a combination of 1) an unknown data distribution, 2) little
domain-specific knowledge, and 3) datasets with limited annotation. We propose
Non-Parametric learning by Compression with Latent Variables (NPC-LV), a
learning framework for any dataset with abundant unlabeled data but very few
labeled ones. By only training a generative model in an unsupervised way, the
framework utilizes the data distribution to build a compressor. Using a
compressor-based distance metric derived from Kolmogorov complexity, together
with few labeled data, NPC-LV classifies without further training. We show that
NPC-LV outperforms supervised methods on image classification on all three
datasets in the low-data regime and even outperforms semi-supervised learning
methods on CIFAR-10. We demonstrate how and when the negative evidence lower
bound (nELBO) can be used as an approximate compressed length for classification. By
revealing the correlation between compression rate and classification accuracy,
we illustrate that under NPC-LV, the improvement of generative models can
enhance downstream classification accuracy.
Comment: Accepted to NeurIPS202
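The compressor-based distance family that NPC-LV builds on can be sketched with the Normalized Compression Distance (NCD) using an off-the-shelf compressor. NPC-LV itself uses a trained generative model (via the nELBO) as the compressor; gzip here is only an assumed stand-in to make the metric concrete:

```python
import gzip

def clen(b: bytes) -> int:
    # Approximate Kolmogorov complexity by gzip-compressed length.
    return len(gzip.compress(b))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(x: bytes, labeled: list[tuple[bytes, int]]) -> int:
    # 1-nearest-neighbor under NCD over the few labeled samples,
    # mirroring how a compressor-based metric classifies without training.
    return min(labeled, key=lambda pair: ncd(x, pair[0]))[1]
```

Swapping `clen` for a model-based code length (e.g. the nELBO of a latent variable model) recovers the kind of learned-compressor classification the abstract describes.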