Topics in Graph Construction for Semi-Supervised Learning
Graph-based Semi-Supervised Learning (SSL) methods have had empirical success in a variety of domains, ranging from natural language processing to bioinformatics. Such methods consist of two phases: in the first, a graph is constructed from the available data; in the second, labels are inferred for the unlabeled nodes of that graph. While many algorithms have been developed for label inference, little attention has so far been paid to the crucial graph construction phase, and only recently has its importance for the success of label inference been recognized. In this report, we shall review some of the recently proposed graph construction methods for graph-based SSL. We shall also present suggestions for future research in this area.
Regular graph construction for semi-supervised learning
Semi-supervised learning (SSL) stands out for using a small amount of labeled points for data clustering and classification. In this scenario, graph-based methods allow the analysis of local and global characteristics of the available data by identifying classes or groups regardless of the data distribution and by representing submanifolds in Euclidean space. Most SSL classification methods in the literature pay little attention to graph construction. However, regular graphs can achieve better classification accuracy than traditional constructions such as the k-nearest neighbor (kNN) graph, since kNN favors the formation of hubs and is not appropriate for high-dimensional data. Nevertheless, the methods commonly used for generating regular graphs have high computational cost. We tackle this problem by introducing an alternative method for generating regular graphs with better runtime performance than the methods usually found in the area. Our technique is based on the preferential selection of vertices according to topological measures, such as closeness, yielding a regular graph at the end of the process. Experiments using the local and global consistency method for label propagation show that our method provides classification rates better than or equal to kNN.
Sao Paulo Research Foundation (FAPESP) (Grant 2011/21880-3); National Council for Scientific and Technological Development (CNPq).
2nd International Conference on Mathematical Modeling in Physical Sciences (IC-MSQUARE 2013), Prague, Czech Republic, 1-5 September 2013
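The hub formation that this abstract attributes to kNN graphs is easy to observe: in a directed kNN graph every vertex has out-degree exactly k, but in-degree is unbounded, and in high-dimensional data a few points become disproportionately popular neighbors. The sketch below demonstrates this on synthetic Gaussian data with illustrative parameters; it does not implement the paper's closeness-based construction.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 50))                  # high-dimensional points
A = kneighbors_graph(X, n_neighbors=10).toarray()   # row i -> i's 10 nearest neighbors

out_deg = A.sum(axis=1)   # uniform: every vertex points to exactly k neighbors
in_deg = A.sum(axis=0)    # skewed: "hub" points are chosen far more often than k times

# A k-regular graph, by contrast, has every in- and out-degree equal to k.
```

The gap between `in_deg.max()` and k grows with the data dimensionality, which is the hubness phenomenon the abstract refers to.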
Similarity Learning via Kernel Preserving Embedding
Data similarity is a key concept in many data-driven applications, and many algorithms are sensitive to the choice of similarity measure. To tackle this fundamental problem, automatic learning of similarity information from data via self-expression has been developed and successfully applied in various models, such as low-rank representation, sparse subspace learning, and semi-supervised learning. However, self-expression merely tries to reconstruct the original data, so valuable information, e.g., the manifold structure, is largely ignored. In this paper, we argue that it is beneficial to preserve the overall relations when extracting similarity information. Specifically, we propose a novel similarity learning framework that minimizes the reconstruction error of kernel matrices, rather than the reconstruction error of the original data adopted by existing work. Taking clustering as an example task to evaluate our method, we observe considerable improvements over other state-of-the-art methods. More importantly, our proposed framework is very general and provides a novel and fundamental building block for many other similarity-based tasks. Moreover, the proposed kernel-preserving scheme opens up many possibilities for embedding high-dimensional data into a low-dimensional space.
Comment: Published in AAAI 201
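A minimal version of the kernel self-expression idea can be written in closed form: with a ridge penalty, minimizing ||K - KZ||_F^2 + lam*||Z||_F^2 over the self-expression matrix Z gives Z = (K^2 + lam*I)^(-1) K^2. The RBF kernel, the value of `lam`, and the symmetrization step below are illustrative simplifications, not the paper's exact formulation (which may use different regularizers).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 5))

# RBF kernel matrix: the pairwise relations the method tries to preserve
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-0.5 * sq_dists)

# Kernel self-expression with a ridge penalty:
#   min_Z ||K - K Z||_F^2 + lam ||Z||_F^2   =>   Z = (K^2 + lam I)^{-1} K^2
lam = 0.1
Z = np.linalg.solve(K @ K + lam * np.eye(len(K)), K @ K)

# Symmetrize to obtain a similarity matrix usable by, e.g., spectral clustering
S = 0.5 * (np.abs(Z) + np.abs(Z.T))
```

Reconstructing K rather than X is the key difference from plain self-expression: Z is fit to reproduce the pairwise relations, not the raw coordinates.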
Distributed Low-rank Subspace Segmentation
Vision problems ranging from image clustering to motion segmentation to
semi-supervised learning can naturally be framed as subspace segmentation
problems, in which one aims to recover multiple low-dimensional subspaces from
noisy and corrupted input data. Low-Rank Representation (LRR), a convex
formulation of the subspace segmentation problem, is provably and empirically
accurate on small problems but does not scale to the massive sizes of modern
vision datasets. Moreover, past work aimed at scaling up low-rank matrix
factorization is not applicable to LRR given its non-decomposable constraints.
In this work, we propose a novel divide-and-conquer algorithm for large-scale
subspace segmentation that can cope with LRR's non-decomposable constraints and
maintains LRR's strong recovery guarantees. This has immediate implications for
the scalability of subspace segmentation, which we demonstrate on a benchmark
face recognition dataset and in simulations. We then introduce novel
applications of LRR-based subspace segmentation to large-scale semi-supervised
learning for multimedia event detection, concept detection, and image tagging.
In each case, we obtain state-of-the-art results and order-of-magnitude speedups.
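In the noiseless case, LRR has a well-known closed form: if X = U S V^T is the skinny SVD of the data, the minimizer of the nuclear norm ||Z||_* subject to X = XZ is the shape interaction matrix Z = V V^T, which is block-diagonal when the subspaces are independent. The sketch below checks this on synthetic data from two subspaces; the sizes and dimensions are illustrative, and this is the noiseless special case rather than the full convex program with corruptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# 30 points from each of two independent 2-D subspaces of R^20
B1 = rng.standard_normal((20, 2))
B2 = rng.standard_normal((20, 2))
X = np.hstack([B1 @ rng.standard_normal((2, 30)),
               B2 @ rng.standard_normal((2, 30))])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = int((s > 1e-8).sum())        # numerical rank: 2 + 2 = 4
V = Vt[:r].T
Z = V @ V.T                      # noiseless LRR solution (shape interaction matrix)

# Symmetric affinity for spectral clustering of the columns of X
A = np.abs(Z) + np.abs(Z.T)
```

The block-diagonal structure of Z (near-zero entries linking points from different subspaces) is what makes the subsequent spectral clustering step recover the segmentation.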
Analysis of label noise in graph-based semi-supervised learning
In machine learning, one must acquire labels to help supervise a model that
will be able to generalize to unseen data. However, the labeling process can be
tedious, long, costly, and error-prone. It is often the case that most of our
data is unlabeled. Semi-supervised learning (SSL) alleviates that by making
strong assumptions about the relation between the labels and the input data
distribution. This paradigm has been successful in practice, but most SSL
algorithms end up fully trusting the few available labels. In real life, both humans and automated systems are prone to mistakes; it is essential that our algorithms be able to work with labels that are both scarce and unreliable.
Our work aims to perform an extensive empirical evaluation of existing graph-based semi-supervised algorithms, such as Gaussian Fields and Harmonic Functions, Local and Global Consistency, Laplacian Eigenmaps, and Graph Transduction Through Alternating Minimization. To do so, we compare the accuracy of classifiers while varying the amount of labeled data and the label noise over many different samples. Our results show that, if the dataset is consistent with SSL assumptions, we are able to detect the noisiest instances, although this becomes harder when the number of available labels decreases. Also, the Laplacian Eigenmaps algorithm performed better than label propagation when the data came from high-dimensional clusters.