
    Approximating the Spectrum of a Graph

    The spectrum of a network or graph $G=(V,E)$ with adjacency matrix $A$ consists of the eigenvalues of the normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$. This set of eigenvalues encapsulates many aspects of the structure of the graph, including the extent to which the graph possesses community structures at multiple scales. We study the problem of approximating the spectrum $\lambda = (\lambda_1,\dots,\lambda_{|V|})$, $0 \le \lambda_1 \le \dots \le \lambda_{|V|} \le 2$, of $G$ in the regime where the graph is too large to explicitly calculate the spectrum. We present a sublinear-time algorithm that, given the ability to query a random node in the graph and select a random neighbor of a given node, computes a succinct representation of an approximation $\widetilde\lambda = (\widetilde\lambda_1,\dots,\widetilde\lambda_{|V|})$, $0 \le \widetilde\lambda_1 \le \dots \le \widetilde\lambda_{|V|} \le 2$, such that $\|\widetilde\lambda - \lambda\|_1 \le \epsilon |V|$. Our algorithm has query complexity and running time $\exp(O(1/\epsilon))$, independent of the size of the graph, $|V|$. We demonstrate the practical viability of our algorithm on 15 different real-world graphs from the Stanford Large Network Dataset Collection, including social networks, academic collaboration graphs, and road networks. For the smallest of these graphs, we are able to validate the accuracy of our algorithm by explicitly calculating the true spectrum; for the larger graphs, such a calculation is computationally prohibitive. In addition, we study the implications of our algorithm for property testing in the bounded-degree graph model.
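    The core sampling primitive is easy to state. The sketch below (a minimal illustration with our own naming, not the authors' code) estimates the spectral moments $m_k = \frac{1}{|V|}\,\mathrm{tr}(P^k)$ of the random-walk matrix $P = D^{-1}A$ using only the two queries the algorithm assumes: sample a uniformly random node and step to a random neighbor. Since $P$ is similar to $D^{-1/2} A D^{-1/2}$, the eigenvalues of $L$ are one minus the eigenvalues whose moments are estimated here; recovering $\widetilde\lambda$ from the estimated moments is the paper's remaining moment-inversion step, which this sketch omits.

    import random

    def estimate_spectral_moments(neighbors, max_k, num_walks, rng=random.Random(0)):
        """neighbors: dict mapping each node to a list of its neighbors.
        Returns estimates of m_k = (1/|V|) tr(P^k) for k = 1..max_k, using
        the fact that m_k is the probability that a k-step random walk
        returns to its uniformly random start node."""
        nodes = list(neighbors)
        returns = [0] * (max_k + 1)
        for _ in range(num_walks):
            start = rng.choice(nodes)          # query a uniformly random node
            v = start
            for k in range(1, max_k + 1):
                v = rng.choice(neighbors[v])   # query a random neighbor
                if v == start:                 # walk returned at step k
                    returns[k] += 1
        return [returns[k] / num_walks for k in range(1, max_k + 1)]

    # Example: a 4-cycle. P has eigenvalues {1, 0, 0, -1}, so the true
    # moments are m_1 = 0, m_2 = 1/2, m_3 = 0, m_4 = 1/2.
    cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
    print(estimate_spectral_moments(cycle, max_k=4, num_walks=20000))

    Note that the total work is a fixed number of constant-length walks, which is why the query complexity can be independent of $|V|$.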

    Efficient Classification for Metric Data

    Recent advances in large-margin classification of data residing in general metric spaces (rather than Hilbert spaces) enable classification under various natural metrics, such as string edit and earthmover distance. A general framework developed for this purpose by von Luxburg and Bousquet [JMLR, 2004] left open the questions of computational efficiency and of providing direct bounds on generalization error. We design a new algorithm for classification in general metric spaces, whose runtime and accuracy depend on the doubling dimension of the data points, and which can thus achieve superior classification performance in many common scenarios. The algorithmic core of our approach is an approximate (rather than exact) solution to the classical problems of Lipschitz extension and of Nearest Neighbor Search. The algorithm's generalization performance is guaranteed via the fat-shattering dimension of Lipschitz classifiers, and we present experimental evidence of its superiority to some common kernel methods. As a by-product, we offer a new perspective on the nearest neighbor classifier, which yields significantly sharper risk asymptotics than the classic analysis of Cover and Hart [IEEE Trans. Info. Theory, 1967].
    Comment: This is the full version of an extended abstract that appeared in Proceedings of the 23rd COLT, 2010.
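    To make the setting concrete, here is a minimal sketch (our own illustration, not the paper's algorithm) of metric-space classification with the plain nearest-neighbor rule under string edit distance. The paper improves on exactly this baseline: it replaces the brute-force exact search below with approximate nearest-neighbor search whose cost is controlled by the doubling dimension, and smooths the decision rule into an approximately Lipschitz classifier with generalization guarantees.

    def edit_distance(a, b):
        """Levenshtein (string edit) distance via the classic dynamic program."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # delete ca
                               cur[j - 1] + 1,              # insert cb
                               prev[j - 1] + (ca != cb)))   # substitute
            prev = cur
        return prev[-1]

    def nn_classify(train, x, metric=edit_distance):
        """train: list of (point, label) pairs. Returns the label of the
        nearest training point under the given metric (exact brute force)."""
        return min(train, key=lambda pair: metric(pair[0], x))[1]

    train = [("kitten", "noun"), ("sitting", "verb"), ("quickly", "adverb")]
    print(nn_classify(train, "sittin"))  # -> "verb" (edit distance 1)

    Nothing here requires an inner product; any metric can be plugged in for edit_distance, which is the point of working in general metric spaces.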

    Bourgain's discretization theorem

    Bourgain's discretization theorem asserts that there exists a universal constant $C\in(0,\infty)$ with the following property. Let $X,Y$ be Banach spaces with $\dim X = n$. Fix $D\in(1,\infty)$ and set $\delta = e^{-n^{Cn}}$. Assume that $\mathcal N$ is a $\delta$-net in the unit ball of $X$ and that $\mathcal N$ admits a bi-Lipschitz embedding into $Y$ with distortion at most $D$. Then the entire space $X$ admits a bi-Lipschitz embedding into $Y$ with distortion at most $CD$. This mostly expository article is devoted to a detailed presentation of a proof of Bourgain's theorem. We also obtain an improvement of Bourgain's theorem in the important case when $Y = L_p$ for some $p\in[1,\infty)$: in this case it suffices to take $\delta = C^{-1} n^{-5/2}$ for the same conclusion to hold true. The case $p=1$ of this improved discretization result has the following consequence. For arbitrarily large $n\in\mathbb{N}$ there exists a family $\mathscr Y$ of $n$-point subsets of $\{1,\dots,n\}^2 \subseteq \mathbb{R}^2$ such that if we write $|\mathscr Y| = N$, then any $L_1$ embedding of $\mathscr Y$, equipped with the Earthmover metric (a.k.a. transportation cost metric or minimum-weight matching metric), incurs distortion at least a constant multiple of $\sqrt{\log\log N}$; the previously best known lower bound for this problem was a constant multiple of $\sqrt{\log\log\log N}$.
    Comment: Proof of Lemma 5.1 corrected; its statement remains unchanged.
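    For intuition about the metric in the lower bound, the sketch below (our own illustration, assuming a Euclidean ground metric on the grid; not from the article) computes the Earthmover distance between two equal-size point sets. For $n$-point sets this distance is exactly the cost of a minimum-weight perfect matching, which SciPy's Hungarian-algorithm solver computes.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def earthmover(P, Q):
        """Minimum total cost of perfectly matching the points of P to the
        points of Q (equal-size 2-D point sets, Euclidean ground metric)."""
        P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
        cost = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)
        rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
        return cost[rows, cols].sum()

    # Example: two 3-point subsets of the grid {1, ..., 3}^2.
    print(earthmover([(1, 1), (2, 2), (3, 3)], [(1, 2), (2, 3), (3, 1)]))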