186 research outputs found
The Normalized Graph Cut and Cheeger Constant: from Discrete to Continuous
Let M be a bounded domain of a Euclidean space with smooth boundary. We
relate the Cheeger constant of M and the conductance of a neighborhood graph
defined on a random sample from M. By restricting the minimization defining the
latter to a particular class of subsets, we obtain consistency (after
normalization) as the sample size increases, and show that any minimizing
sequence of subsets has a subsequence converging to a Cheeger set of M.
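For orientation, here is a hedged sketch of the two quantities the abstract relates; the paper's exact normalization, kernel, and class of admissible subsets may differ. The consistency statement is that, after a deterministic rescaling depending on the sample size n and the connectivity radius r, the restricted graph conductance converges to a constant multiple of h(M).

```latex
% Cheeger constant of the domain M (isoperimetric quotient):
h(M) \;=\; \inf_{A \subset M}
  \frac{\operatorname{Per}(A;\, M)}{\min\{\operatorname{vol}(A),\, \operatorname{vol}(M \setminus A)\}}

% Conductance (normalized cut) of the neighborhood graph G_{n,r} built on a
% sample X_1, \dots, X_n from M, with an edge whenever |X_i - X_j| \le r:
\Phi(G_{n,r}) \;=\; \min_{\emptyset \neq S \subsetneq V}
  \frac{\operatorname{cut}(S,\, V \setminus S)}{\min\{\operatorname{vol}(S),\, \operatorname{vol}(V \setminus S)\}}
```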
Continuum limit of total variation on point clouds
We consider point clouds obtained as random samples of a measure on a
Euclidean domain. A graph representing the point cloud is obtained by assigning
weights to edges based on the distance between the points they connect. Our
goal is to develop mathematical tools needed to study the consistency, as the
number of available data points increases, of graph-based machine learning
algorithms for tasks such as clustering. In particular, we study when the
cut capacity, and more generally the total variation, on these graphs is a good
approximation of the perimeter (total variation) in the continuum setting. We
address this question in the setting of Γ-convergence. We obtain almost
optimal conditions on the scaling, as the number of points increases, of the size
of the neighborhood over which the points are connected by an edge for the
Γ-convergence to hold. Taking the limit is enabled by a transportation-based
metric which allows one to suitably compare functionals defined on different
point clouds.
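A minimal sketch of the discrete functional in question, under illustrative assumptions: a 0/1 kernel over an ε-neighborhood graph and the rescaling 1/(n²ε^(d+1)); the paper's kernel and scaling regime may differ.

```python
import numpy as np
from scipy.spatial.distance import cdist

def graph_total_variation(points, u, eps):
    """Graph total variation of u on an eps-neighborhood graph.

    points : (n, d) array of samples from the domain
    u      : (n,) array, e.g. the indicator of a candidate set
    eps    : connectivity radius (the paper studies how fast eps may
             shrink with n while Gamma-convergence still holds)
    """
    n, d = points.shape
    dists = cdist(points, points)
    weights = (dists <= eps).astype(float)   # simple 0/1 kernel (an assumption)
    np.fill_diagonal(weights, 0.0)
    diffs = np.abs(u[:, None] - u[None, :])  # |u(x_i) - u(x_j)|
    # Rescale by 1 / (n^2 * eps^(d+1)) so the discrete energy is comparable
    # to the continuum perimeter up to a kernel-dependent constant.
    return (weights * diffs).sum() / (n ** 2 * eps ** (d + 1))

# Usage: the energy of a half-space cut approximates (a multiple of) its perimeter.
rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 2))
u = (X[:, 0] < 0.5).astype(float)
print(graph_total_variation(X, u, eps=0.05))
```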
Compressive Embedding and Visualization using Graphs
Visualizing high-dimensional data has been a focus in data analysis
communities for decades, which has led to the design of many algorithms, some
of which are now considered reference methods (such as t-SNE). In our era
of overwhelming data volumes, the scalability of such methods has become more
and more important. In this work, we present a method that allows any
visualization or embedding algorithm to be applied to very large datasets by
considering only a fraction of the data as input and then extending the
information to all data points using a graph encoding its global similarity.
We show that in most cases, using only a small subset of samples is sufficient
to diffuse the information to all data points. In addition, we propose
quantitative methods to measure the quality of embeddings and demonstrate the
validity of our technique on both synthetic and real-world datasets.
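A hedged sketch of the sample-then-diffuse idea described above. The use of t-SNE, a k-NN graph, and simple neighborhood averaging are illustrative assumptions, not the paper's exact sampling rule, graph construction, or diffusion operator.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import kneighbors_graph

def compressive_embedding(X, n_samples=500, n_neighbors=10, n_iters=50):
    """Sample-then-diffuse embedding (illustrative sketch).

    1. Run an expensive embedding (t-SNE here) on a random subsample only.
    2. Build a cheap global similarity graph on all points.
    3. Propagate the low-dimensional coordinates to the remaining points by
       repeated neighborhood averaging, keeping the subsample fixed.
    """
    n = X.shape[0]
    rng = np.random.default_rng(0)
    idx = rng.choice(n, size=n_samples, replace=False)

    Y = np.zeros((n, 2))
    Y[idx] = TSNE(n_components=2).fit_transform(X[idx])

    W = kneighbors_graph(X, n_neighbors=n_neighbors, mode="connectivity")
    W = 0.5 * (W + W.T)                              # symmetric adjacency
    inv_deg = 1.0 / np.asarray(W.sum(axis=1)).ravel()

    known = np.zeros(n, dtype=bool)
    known[idx] = True
    for _ in range(n_iters):
        Y_avg = inv_deg[:, None] * (W @ Y)           # neighborhood average
        Y[~known] = Y_avg[~known]                    # embedded samples stay fixed
    return Y
```

The point of the construction, as described in the abstract, is that the graph encodes the global similarity of all points, so diffusing over it lets a small embedded subsample determine the layout of the full dataset.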
Multiclass Semi-Supervised Learning on Graphs using Ginzburg-Landau Functional Minimization
We present a graph-based variational algorithm for classification of
high-dimensional data, generalizing the binary diffuse interface model to the
case of multiple classes. Motivated by total variation techniques, the method
involves minimizing an energy functional made up of three terms. The first two
terms promote a stepwise continuous classification function with sharp
transitions between classes, while preserving symmetry among the class labels.
The third term is a data fidelity term, allowing us to incorporate prior
information into the model in a semi-supervised framework. The performance of
the algorithm on synthetic data, as well as on the COIL and MNIST benchmark
datasets, is competitive with state-of-the-art graph-based multiclass
segmentation methods.

Comment: 16 pages, to appear in Springer's Lecture Notes in Computer Science
volume "Pattern Recognition Applications and Methods 2013", part of the series
Advances in Intelligent and Soft Computing.
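A hedged sketch of the three-term energy described in the abstract, for a node-wise assignment matrix U whose rows u_i are relaxed class indicators; the exact multi-well potential, norms, and Laplacian normalization used in the paper may differ.

```latex
E(U) \;=\;
  \underbrace{\tfrac{\epsilon}{2}\,\operatorname{tr}\!\left(U^{\top} L\, U\right)}_{\text{graph Dirichlet energy (smoothness)}}
\;+\;
  \underbrace{\tfrac{1}{\epsilon} \sum_{i} W_{\mathrm{well}}(u_i)}_{\text{multi-well term: pushes } u_i \text{ toward a class vertex } e_k}
\;+\;
  \underbrace{\sum_{i} \tfrac{\mu_i}{2}\, \lVert u_i - \hat{u}_i \rVert^2}_{\text{fidelity to known labels } \hat{u}_i}
```

Here L is a graph Laplacian, the multi-well potential has minima at the K class indicator vectors (preserving symmetry among labels), and μ_i > 0 only on labeled nodes, which is how prior information enters the semi-supervised model. The first two terms together favor a nearly piecewise-constant classification function with sharp transitions between classes as ε decreases.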