186 research outputs found
The Normalized Graph Cut and Cheeger Constant: from Discrete to Continuous
Let M be a bounded domain of a Euclidean space with smooth boundary. We
relate the Cheeger constant of M and the conductance of a neighborhood graph
defined on a random sample from M. By restricting the minimization defining the
latter to a particular class of subsets, we obtain consistency (after
normalization) as the sample size increases, and show that any minimizing
sequence of subsets has a subsequence converging to a Cheeger set of M.
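For orientation, here is a hedged sketch of the two quantities the abstract relates; the paper's exact normalization, kernel, and class of admissible subsets may differ. The consistency statement is that, after a deterministic rescaling depending on the sample size n and the connectivity radius r, the restricted graph conductance converges to a constant multiple of h(M).

```latex
% Cheeger constant of the domain M (isoperimetric quotient):
h(M) \;=\; \inf_{A \subset M}
  \frac{\operatorname{Per}(A;\, M)}{\min\{\operatorname{vol}(A),\, \operatorname{vol}(M \setminus A)\}}

% Conductance (normalized cut) of the neighborhood graph G_{n,r} built on a
% sample X_1, \dots, X_n from M, with an edge whenever |X_i - X_j| \le r:
\Phi(G_{n,r}) \;=\; \min_{\emptyset \neq S \subsetneq V}
  \frac{\operatorname{cut}(S,\, V \setminus S)}{\min\{\operatorname{vol}(S),\, \operatorname{vol}(V \setminus S)\}}
```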
Continuum limit of total variation on point clouds
We consider point clouds obtained as random samples of a measure on a
Euclidean domain. A graph representing the point cloud is obtained by assigning
weights to edges based on the distance between the points they connect. Our
goal is to develop mathematical tools needed to study the consistency, as the
number of available data points increases, of graph-based machine learning
algorithms for tasks such as clustering. In particular, we study when the
cut capacity, and more generally the total variation, on these graphs is a good
approximation of the perimeter (total variation) in the continuum setting. We
address this question in the setting of Γ-convergence. We obtain almost
optimal conditions on the scaling, as the number of points increases, of the size
of the neighborhood over which the points are connected by an edge for the
Γ-convergence to hold. Taking the limit is enabled by a transportation-based
metric which allows one to suitably compare functionals defined on different
point clouds.
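A minimal sketch of the discrete functional in question, under illustrative assumptions: a 0/1 kernel over an ε-neighborhood graph and the rescaling 1/(n²ε^(d+1)); the paper's kernel and scaling regime may differ.

```python
import numpy as np
from scipy.spatial.distance import cdist

def graph_total_variation(points, u, eps):
    """Graph total variation of u on an eps-neighborhood graph.

    points : (n, d) array of samples from the domain
    u      : (n,) array, e.g. the indicator of a candidate set
    eps    : connectivity radius (the paper studies how fast eps may
             shrink with n while Gamma-convergence still holds)
    """
    n, d = points.shape
    dists = cdist(points, points)
    weights = (dists <= eps).astype(float)   # simple 0/1 kernel (an assumption)
    np.fill_diagonal(weights, 0.0)
    diffs = np.abs(u[:, None] - u[None, :])  # |u(x_i) - u(x_j)|
    # Rescale by 1 / (n^2 * eps^(d+1)) so the discrete energy is comparable
    # to the continuum perimeter up to a kernel-dependent constant.
    return (weights * diffs).sum() / (n ** 2 * eps ** (d + 1))

# Usage: the energy of a half-space cut approximates (a multiple of) its perimeter.
rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 2))
u = (X[:, 0] < 0.5).astype(float)
print(graph_total_variation(X, u, eps=0.05))
```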
Compressive Embedding and Visualization using Graphs
Visualizing high-dimensional data has been a focus in data analysis
communities for decades, which has led to the design of many algorithms, some
of which are now considered reference methods (such as t-SNE). In our era
of overwhelming data volumes, the scalability of such methods has become more
and more important. In this work, we present a method that allows any
visualization or embedding algorithm to be applied to very large datasets by
considering only a fraction of the data as input and then extending the
information to all data points using a graph encoding its global similarity.
We show that in most cases, using only a small subset of samples is sufficient
to diffuse the information to all data points. In addition, we propose
quantitative methods to measure the quality of embeddings and demonstrate the
validity of our technique on both synthetic and real-world datasets.
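A hedged sketch of the sample-then-diffuse idea described above. The use of t-SNE, a k-NN graph, and simple neighborhood averaging are illustrative assumptions, not the paper's exact sampling rule, graph construction, or diffusion operator.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import kneighbors_graph

def compressive_embedding(X, n_samples=500, n_neighbors=10, n_iters=50):
    """Sample-then-diffuse embedding (illustrative sketch).

    1. Run an expensive embedding (t-SNE here) on a random subsample only.
    2. Build a cheap global similarity graph on all points.
    3. Propagate the low-dimensional coordinates to the remaining points by
       repeated neighborhood averaging, keeping the subsample fixed.
    """
    n = X.shape[0]
    rng = np.random.default_rng(0)
    idx = rng.choice(n, size=n_samples, replace=False)

    Y = np.zeros((n, 2))
    Y[idx] = TSNE(n_components=2).fit_transform(X[idx])

    W = kneighbors_graph(X, n_neighbors=n_neighbors, mode="connectivity")
    W = 0.5 * (W + W.T)                              # symmetric adjacency
    inv_deg = 1.0 / np.asarray(W.sum(axis=1)).ravel()

    known = np.zeros(n, dtype=bool)
    known[idx] = True
    for _ in range(n_iters):
        Y_avg = inv_deg[:, None] * (W @ Y)           # neighborhood average
        Y[~known] = Y_avg[~known]                    # embedded samples stay fixed
    return Y
```

The point of the construction, as described in the abstract, is that the graph encodes the global similarity of all points, so diffusing over it lets a small embedded subsample determine the layout of the full dataset.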
Multiclass Semi-Supervised Learning on Graphs using Ginzburg-Landau Functional Minimization
We present a graph-based variational algorithm for classification of
high-dimensional data, generalizing the binary diffuse interface model to the
case of multiple classes. Motivated by total variation techniques, the method
involves minimizing an energy functional made up of three terms. The first two
terms promote a stepwise continuous classification function with sharp
transitions between classes, while preserving symmetry among the class labels.
The third term is a data fidelity term, allowing us to incorporate prior
information into the model in a semi-supervised framework. The performance of
the algorithm on synthetic data, as well as on the COIL and MNIST benchmark
datasets, is competitive with state-of-the-art graph-based multiclass
segmentation methods.

Comment: 16 pages, to appear in Springer's Lecture Notes in Computer Science
volume "Pattern Recognition Applications and Methods 2013", part of the series
Advances in Intelligent and Soft Computing.
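A hedged sketch of the three-term energy described in the abstract, for a node-wise assignment matrix U whose rows u_i are relaxed class indicators; the exact multi-well potential, norms, and Laplacian normalization used in the paper may differ.

```latex
E(U) \;=\;
  \underbrace{\tfrac{\epsilon}{2}\,\operatorname{tr}\!\left(U^{\top} L\, U\right)}_{\text{graph Dirichlet energy (smoothness)}}
\;+\;
  \underbrace{\tfrac{1}{\epsilon} \sum_{i} W_{\mathrm{well}}(u_i)}_{\text{multi-well term: pushes } u_i \text{ toward a class vertex } e_k}
\;+\;
  \underbrace{\sum_{i} \tfrac{\mu_i}{2}\, \lVert u_i - \hat{u}_i \rVert^2}_{\text{fidelity to known labels } \hat{u}_i}
```

Here L is a graph Laplacian, the multi-well potential has minima at the K class indicator vectors (preserving symmetry among labels), and μ_i > 0 only on labeled nodes, which is how prior information enters the semi-supervised model. The first two terms together favor a nearly piecewise-constant classification function with sharp transitions between classes as ε decreases.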