
    Approximating the Spectrum of a Graph

    The spectrum of a network or graph $G=(V,E)$ with adjacency matrix $A$ consists of the eigenvalues of the normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$. This set of eigenvalues encapsulates many aspects of the structure of the graph, including the extent to which the graph possesses community structures at multiple scales. We study the problem of approximating the spectrum $\lambda = (\lambda_1,\dots,\lambda_{|V|})$, $0 \le \lambda_1 \le \dots \le \lambda_{|V|} \le 2$, of $G$ in the regime where the graph is too large to explicitly calculate the spectrum. We present a sublinear-time algorithm that, given the ability to query a random node in the graph and select a random neighbor of a given node, computes a succinct representation of an approximation $\widetilde\lambda = (\widetilde\lambda_1,\dots,\widetilde\lambda_{|V|})$, $0 \le \widetilde\lambda_1 \le \dots \le \widetilde\lambda_{|V|} \le 2$, such that $\|\widetilde\lambda - \lambda\|_1 \le \epsilon |V|$. Our algorithm has query complexity and running time $\exp(O(1/\epsilon))$, independent of the size of the graph, $|V|$. We demonstrate the practical viability of our algorithm on 15 different real-world graphs from the Stanford Large Network Dataset Collection, including social networks, academic collaboration graphs, and road networks. For the smallest of these graphs, we are able to validate the accuracy of our algorithm by explicitly calculating the true spectrum; for the larger graphs, such a calculation is computationally prohibitive. In addition, we study the implications of our algorithm for property testing in the bounded-degree graph model.
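    The core sampling primitive is easy to state. The sketch below (a minimal illustration with our own naming, not the authors' code) estimates the spectral moments $m_k = \frac{1}{|V|}\,\mathrm{tr}(P^k)$ of the random-walk matrix $P = D^{-1}A$ using only the two queries the algorithm assumes: sample a uniformly random node and step to a random neighbor. Since $P$ is similar to $D^{-1/2} A D^{-1/2}$, the eigenvalues of $L$ are one minus the eigenvalues whose moments are estimated here; recovering $\widetilde\lambda$ from the estimated moments is the paper's remaining moment-inversion step, which this sketch omits.

    import random

    def estimate_spectral_moments(neighbors, max_k, num_walks, rng=random.Random(0)):
        """neighbors: dict mapping each node to a list of its neighbors.
        Returns estimates of m_k = (1/|V|) tr(P^k) for k = 1..max_k, using
        the fact that m_k is the probability that a k-step random walk
        returns to its uniformly random start node."""
        nodes = list(neighbors)
        returns = [0] * (max_k + 1)
        for _ in range(num_walks):
            start = rng.choice(nodes)          # query a uniformly random node
            v = start
            for k in range(1, max_k + 1):
                v = rng.choice(neighbors[v])   # query a random neighbor
                if v == start:                 # walk returned at step k
                    returns[k] += 1
        return [returns[k] / num_walks for k in range(1, max_k + 1)]

    # Example: a 4-cycle. P has eigenvalues {1, 0, 0, -1}, so the true
    # moments are m_1 = 0, m_2 = 1/2, m_3 = 0, m_4 = 1/2.
    cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
    print(estimate_spectral_moments(cycle, max_k=4, num_walks=20000))

    Note that the total work is a fixed number of constant-length walks, which is why the query complexity can be independent of $|V|$.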

    Efficient Classification for Metric Data

    Recent advances in large-margin classification of data residing in general metric spaces (rather than Hilbert spaces) enable classification under various natural metrics, such as string edit and earthmover distance. A general framework developed for this purpose by von Luxburg and Bousquet [JMLR, 2004] left open the questions of computational efficiency and of providing direct bounds on generalization error. We design a new algorithm for classification in general metric spaces, whose runtime and accuracy depend on the doubling dimension of the data points, and which can thus achieve superior classification performance in many common scenarios. The algorithmic core of our approach is an approximate (rather than exact) solution to the classical problems of Lipschitz extension and of Nearest Neighbor Search. The algorithm's generalization performance is guaranteed via the fat-shattering dimension of Lipschitz classifiers, and we present experimental evidence of its superiority to some common kernel methods. As a by-product, we offer a new perspective on the nearest neighbor classifier, which yields significantly sharper risk asymptotics than the classic analysis of Cover and Hart [IEEE Trans. Info. Theory, 1967].
    Comment: This is the full version of an extended abstract that appeared in Proceedings of the 23rd COLT, 2010.
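    To make the setting concrete, here is a minimal sketch (our own illustration, not the paper's algorithm) of metric-space classification with the plain nearest-neighbor rule under string edit distance. The paper improves on exactly this baseline: it replaces the brute-force exact search below with approximate nearest-neighbor search whose cost is controlled by the doubling dimension, and smooths the decision rule into an approximately Lipschitz classifier with generalization guarantees.

    def edit_distance(a, b):
        """Levenshtein (string edit) distance via the classic dynamic program."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # delete ca
                               cur[j - 1] + 1,              # insert cb
                               prev[j - 1] + (ca != cb)))   # substitute
            prev = cur
        return prev[-1]

    def nn_classify(train, x, metric=edit_distance):
        """train: list of (point, label) pairs. Returns the label of the
        nearest training point under the given metric (exact brute force)."""
        return min(train, key=lambda pair: metric(pair[0], x))[1]

    train = [("kitten", "noun"), ("sitting", "verb"), ("quickly", "adverb")]
    print(nn_classify(train, "sittin"))  # -> "verb" (edit distance 1)

    Nothing here requires an inner product; any metric can be plugged in for edit_distance, which is the point of working in general metric spaces.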

    Bourgain's discretization theorem

    Bourgain's discretization theorem asserts that there exists a universal constant $C\in(0,\infty)$ with the following property. Let $X,Y$ be Banach spaces with $\dim X = n$. Fix $D\in(1,\infty)$ and set $\delta = e^{-n^{Cn}}$. Assume that $\mathcal N$ is a $\delta$-net in the unit ball of $X$ and that $\mathcal N$ admits a bi-Lipschitz embedding into $Y$ with distortion at most $D$. Then the entire space $X$ admits a bi-Lipschitz embedding into $Y$ with distortion at most $CD$. This mostly expository article is devoted to a detailed presentation of a proof of Bourgain's theorem. We also obtain an improvement of Bourgain's theorem in the important case when $Y = L_p$ for some $p\in[1,\infty)$: in this case it suffices to take $\delta = C^{-1} n^{-5/2}$ for the same conclusion to hold true. The case $p=1$ of this improved discretization result has the following consequence. For arbitrarily large $n\in\mathbb{N}$ there exists a family $\mathscr Y$ of $n$-point subsets of $\{1,\dots,n\}^2 \subseteq \mathbb{R}^2$ such that if we write $|\mathscr Y| = N$, then any $L_1$ embedding of $\mathscr Y$, equipped with the Earthmover metric (a.k.a. transportation cost metric or minimum-weight matching metric), incurs distortion at least a constant multiple of $\sqrt{\log\log N}$; the previously best known lower bound for this problem was a constant multiple of $\sqrt{\log\log\log N}$.
    Comment: Proof of Lemma 5.1 corrected; its statement remains unchanged.
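    For intuition about the metric in the lower bound, the sketch below (our own illustration, assuming a Euclidean ground metric on the grid; not from the article) computes the Earthmover distance between two equal-size point sets. For $n$-point sets this distance is exactly the cost of a minimum-weight perfect matching, which SciPy's Hungarian-algorithm solver computes.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def earthmover(P, Q):
        """Minimum total cost of perfectly matching the points of P to the
        points of Q (equal-size 2-D point sets, Euclidean ground metric)."""
        P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
        cost = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)
        rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
        return cost[rows, cols].sum()

    # Example: two 3-point subsets of the grid {1, ..., 3}^2.
    print(earthmover([(1, 1), (2, 2), (3, 3)], [(1, 2), (2, 3), (3, 1)]))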