    Cheeger Inequalities for Vertex Expansion and Reweighted Eigenvalues

    The classical Cheeger's inequality relates the edge conductance $\phi$ of a graph and the second smallest eigenvalue $\lambda_2$ of the Laplacian matrix. Recently, Olesker-Taylor and Zanetti discovered a Cheeger-type inequality $\psi^2 / \log |V| \lesssim \lambda_2^* \lesssim \psi$ connecting the vertex expansion $\psi$ of a graph $G=(V,E)$ and the maximum reweighted second smallest eigenvalue $\lambda_2^*$ of the Laplacian matrix. In this work, we first improve their result to $\psi^2 / \log d \lesssim \lambda_2^* \lesssim \psi$ where $d$ is the maximum degree in $G$, which is optimal assuming the small-set expansion conjecture. Also, the improved result holds for weighted vertex expansion, answering an open question by Olesker-Taylor and Zanetti. Building on this connection, we then develop a new spectral theory for vertex expansion. We discover that several interesting generalizations of Cheeger inequalities relating edge conductances and eigenvalues have a close analog in relating vertex expansions and reweighted eigenvalues. These include an analog of Trevisan's result on bipartiteness, an analog of higher order Cheeger's inequality, and an analog of improved Cheeger's inequality. Finally, inspired by this connection, we present negative evidence to the $0/1$-polytope edge expansion conjecture by Mihail and Vazirani. We construct $0/1$-polytopes whose graphs have very poor vertex expansion. This implies that the fastest mixing time to the uniform distribution on the vertices of these $0/1$-polytopes is almost linear in the graph size. This does not provide a counterexample to the conjecture, but this is in contrast with known positive results which proved poly-logarithmic mixing time to the uniform distribution on the vertices of subclasses of $0/1$-polytopes.
    Comment: 65 pages, 1 figure. Minor change
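
As a small illustration of the classical (edge) Cheeger inequality that this abstract starts from, the sketch below checks $\lambda_2 / 2 \le \phi \le \sqrt{2 \lambda_2}$ numerically on an 8-cycle. It is not taken from the paper. It assumes the normalized Laplacian $I - D^{-1/2} A D^{-1/2}$ and the volume-based definition of conductance, and it assumes the graph is connected. The helper names `edge_conductance` and `lambda_2` are hypothetical, the conductance is found by brute force over all vertex subsets (so only tiny graphs are feasible), and $\lambda_2$ is estimated by a deflated power iteration rather than a library eigensolver. The reweighted eigenvalue $\lambda_2^*$ and the vertex expansion $\psi$ studied in the paper are not computed here.

```python
import itertools
import math


def edge_conductance(adj):
    """Brute-force min over S with vol(S) <= vol(V)/2 of cut(S) / vol(S)."""
    n = len(adj)
    deg = [len(adj[v]) for v in range(n)]
    total_vol = sum(deg)
    best = math.inf
    for size in range(1, n):
        for subset in itertools.combinations(range(n), size):
            s = set(subset)
            vol = sum(deg[v] for v in s)
            if 2 * vol > total_vol:
                continue  # only the smaller-volume side is considered
            cut = sum(1 for u in s for v in adj[u] if v not in s)
            best = min(best, cut / vol)
    return best


def lambda_2(adj, iters=2000):
    """Second smallest eigenvalue of the normalized Laplacian L.

    Runs power iteration on M = 2I - L after projecting out the known
    eigenvector D^{1/2} 1 (eigenvalue 0 of L). Assumes a connected graph.
    """
    n = len(adj)
    deg = [len(adj[v]) for v in range(n)]
    norm = math.sqrt(sum(deg))
    top = [math.sqrt(d) / norm for d in deg]

    def apply_m(x):
        # M = 2I - L = I + D^{-1/2} A D^{-1/2}
        return [
            x[u] + sum(x[v] / math.sqrt(deg[u] * deg[v]) for v in adj[u])
            for u in range(n)
        ]

    def project_and_normalise(x):
        dot = sum(a * b for a, b in zip(x, top))
        x = [a - dot * b for a, b in zip(x, top)]
        length = math.sqrt(sum(a * a for a in x))
        return [a / length for a in x]

    # Deterministic, generic starting vector (an arbitrary choice).
    x = project_and_normalise([math.sin(i + 1) for i in range(n)])
    for _ in range(iters):
        x = project_and_normalise(apply_m(x))
    rayleigh = sum(a * b for a, b in zip(x, apply_m(x)))
    return 2.0 - rayleigh


# Example graph: the cycle on 8 vertices, as adjacency lists.
cycle = [[(i - 1) % 8, (i + 1) % 8] for i in range(8)]
phi = edge_conductance(cycle)
lam = lambda_2(cycle)
print(phi, lam, lam / 2 <= phi <= math.sqrt(2 * lam))
```

On the 8-cycle the best cut is an arc of four consecutive vertices, giving $\phi = 2/8$, and $\lambda_2 = 1 - \cos(2\pi/8)$, so both sides of the inequality hold with room to spare.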

    Sparse Methods for Learning Multiple Subspaces from Large-scale, Corrupted and Imbalanced Data

    In many practical applications in machine learning, computer vision, data mining and information retrieval, one is confronted with datasets whose intrinsic dimension is much smaller than the dimension of the ambient space. This has given rise to the challenge of effectively learning multiple low-dimensional subspaces from such data. Multi-subspace learning methods based on sparse representation, such as sparse representation based classification (SRC) and sparse subspace clustering (SSC), have become very popular due to their conceptual simplicity and empirical success. However, there have been very limited theoretical explanations for the correctness of such approaches in the literature. Moreover, the applicability of existing algorithms to real-world datasets is limited by their high computational and memory complexity, their sensitivity to data corruptions, and their sensitivity to imbalanced data distributions. This thesis attempts to advance our theoretical understanding of sparse representation based multi-subspace learning methods, as well as to develop new algorithms for handling large-scale, corrupted and imbalanced data. The first contribution of this thesis is a theoretical analysis of the correctness of such methods. In our geometric and randomized analysis, we answer important theoretical questions such as the effect of subspace arrangement, data distribution, subspace dimension, data sampling density, and so on. The second contribution of this thesis is the development of practical subspace clustering algorithms that are able to deal with large-scale, corrupted and imbalanced datasets. To deal with large-scale data, we study different approaches based on active support and divide-and-conquer ideas, and show that these approaches offer a good tradeoff between high accuracy and low running time. To deal with corrupted data, we construct a Markov chain whose stationary distribution can be used to separate inliers from outliers. Finally, we propose an efficient exemplar selection and subspace clustering method that outperforms traditional methods on imbalanced data.