1,978 research outputs found
Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck operators
This paper presents a diffusion based probabilistic interpretation of
spectral clustering and dimensionality reduction algorithms that use the
eigenvectors of the normalized graph Laplacian. Given the pairwise adjacency
matrix of all points, we define a diffusion distance between any two data
points and show that the low dimensional representation of the data by the
first few eigenvectors of the corresponding Markov matrix is optimal under a
certain mean squared error criterion. Furthermore, assuming that data points
are random samples from a density p(\x) = e^{-U(\x)} we identify these
eigenvectors as discrete approximations of eigenfunctions of a Fokker-Planck
operator in a potential 2U(\x) with reflecting boundary conditions. Finally,
applying known results regarding the eigenvalues and eigenfunctions of the
continuous Fokker-Planck operator, we provide a mathematical justification for
the success of spectral clustering and dimensional reduction algorithms based
on these first few eigenvectors. This analysis elucidates, in terms of the
characteristics of diffusion processes, many empirical findings regarding
spectral clustering algorithms.Comment: submitted to NIPS 200
Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are
both voluminous and complex, calls for the development of adequate processing
and analytical techniques. One method for condensing and simplifying such
datasets is graph summarization. It denotes a series of application-specific
algorithms designed to transform graphs into more compact representations while
preserving structural patterns, query answers, or specific property
distributions. As this problem is common to several areas studying graph
topologies, different approaches, such as clustering, compression, sampling, or
influence detection, have been proposed, primarily based on statistical and
optimization methods. The focus of our chapter is to pinpoint the main graph
summarization methods, but especially to focus on the most recent approaches
and novel research trends on this topic, not yet covered by previous surveys.Comment: To appear in the Encyclopedia of Big Data Technologie
- …