Approximating the Spectrum of a Graph
The spectrum of a network or graph $G=(V,E)$ with adjacency matrix $A$
consists of the eigenvalues of the normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$,
where $D$ is the diagonal degree matrix. This set of eigenvalues encapsulates many aspects of the structure
of the graph, including the extent to which the graph possesses community
structures at multiple scales. We study the problem of approximating the
spectrum $\lambda = (\lambda_1, \dots, \lambda_{|V|})$, $0 \le \lambda_1 \le \dots \le \lambda_{|V|} \le 2$, of $G$ in the regime where the graph is too
large to explicitly calculate the spectrum. We present a sublinear time
algorithm that, given the ability to query a random node in the graph and
select a random neighbor of a given node, computes a succinct representation of
an approximation $\tilde\lambda = (\tilde\lambda_1, \dots, \tilde\lambda_{|V|})$, such that $\|\tilde\lambda - \lambda\|_1 \le \epsilon |V|$. Our algorithm has query complexity and running time $\exp(O(1/\epsilon))$,
independent of the size of the graph, $|V|$. We demonstrate the practical
viability of our algorithm on 15 different real-world graphs from the Stanford
Large Network Dataset Collection, including social networks, academic
collaboration graphs, and road networks. For the smallest of these graphs, we
are able to validate the accuracy of our algorithm by explicitly calculating
the true spectrum; for the larger graphs, such a calculation is computationally
prohibitive.
In addition, we study the implications of our algorithm for property testing in
the bounded-degree graph model.
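The query model in the abstract (sample a uniformly random node; sample a uniformly random neighbor of a given node) already suffices to estimate spectral information, because the average return probability of a $k$-step random walk equals $\frac{1}{|V|}\operatorname{tr}(P^k)$ for the walk matrix $P = D^{-1}A$, whose eigenvalues are $1-\lambda_i$. The sketch below is not the paper's algorithm, only a minimal illustration of this moment-estimation idea; the function and variable names are our own.

```python
import random

def estimate_spectral_moments(neighbors, nodes, k_max=6, trials=20000, rng=None):
    """Estimate m_k = (1/n) * tr(P^k) for k = 1..k_max, where P = D^{-1} A,
    using only the two queries from the abstract: sample a uniformly random
    node, and sample a uniformly random neighbor of a given node.
    Since P is similar to D^{-1/2} A D^{-1/2}, m_k equals
    (1/n) * sum_i (1 - lambda_i)^k, where lambda_i are the eigenvalues of
    the normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    rng = rng or random.Random(0)
    hits = [0] * (k_max + 1)
    for _ in range(trials):
        start = rng.choice(nodes)          # query a uniformly random node
        v = start
        for k in range(1, k_max + 1):
            v = rng.choice(neighbors[v])   # select a random neighbor
            if v == start:                 # walk returned at step k
                hits[k] += 1
    return [hits[k] / trials for k in range(1, k_max + 1)]
```

On a 4-cycle, for example, odd moments are exactly zero (the graph is bipartite and has no self-loops) while the even moments converge to 1/2, matching $\frac{1}{4}\sum_i (1-\lambda_i)^k$ for eigenvalues $0, 1, 1, 2$ of the normalized Laplacian.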
Efficient Classification for Metric Data
Recent advances in large-margin classification of data residing in general
metric spaces (rather than Hilbert spaces) enable classification under various
natural metrics, such as string edit and earthmover distance. A general
framework developed for this purpose by von Luxburg and Bousquet [JMLR, 2004]
left open the questions of computational efficiency and of providing direct
bounds on generalization error.
We design a new algorithm for classification in general metric spaces, whose
runtime and accuracy depend on the doubling dimension of the data points, and
can thus achieve superior classification performance in many common scenarios.
The algorithmic core of our approach is an approximate (rather than exact)
solution to the classical problems of Lipschitz extension and of Nearest
Neighbor Search. The algorithm's generalization performance is guaranteed via
the fat-shattering dimension of Lipschitz classifiers, and we present
experimental evidence of its superiority to some common kernel methods. As a
by-product, we offer a new perspective on the nearest neighbor classifier,
which yields significantly sharper risk asymptotics than the classic analysis
of Cover and Hart [IEEE Trans. Info. Theory, 1967].
Comment: This is the full version of an extended abstract that appeared in
Proceedings of the 23rd COLT, 2010.
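To make the Lipschitz-extension idea concrete, here is a minimal sketch (not the paper's efficient algorithm, which uses approximate nearest neighbor search and exploits the doubling dimension): given $\pm 1$-labeled points in a metric space, the McShane upper extension $f(x) = \min_i \,(y_i + L\, d(x, x_i))$ is $L$-Lipschitz, and thresholding it at zero gives a classifier. We pair it with string edit distance, one of the metrics the abstract mentions; all names here are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic O(len(a)*len(b)) dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete ca
                           cur[-1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def lipschitz_classifier(train, d, lip):
    """train: list of (point, label) pairs with labels in {-1, +1}.
    Returns x -> sign of the McShane extension f(x) = min_i (y_i + lip*d(x, x_i)),
    which is lip-Lipschitz by construction.  Brute force O(n) per query;
    the paper replaces this scan with approximate nearest neighbor search."""
    def f(x):
        return min(y + lip * d(x, xi) for xi, y in train)
    return lambda x: 1 if f(x) > 0 else -1
```

Taking `lip` at least $2/\gamma$, where $\gamma$ is the minimum distance between oppositely labeled training points, makes the classifier consistent on the training set.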
Bourgain's discretization theorem
Bourgain's discretization theorem asserts that there exists a universal
constant $C \in (0,\infty)$ with the following property. Let $X, Y$ be Banach
spaces with $\dim X = n$. Fix $D \in (1,\infty)$ and set $\delta = e^{-n^{Cn}}$.
Assume that $\mathcal{N}$ is a $\delta$-net in the unit ball of $X$ and that $\mathcal{N}$
admits a bi-Lipschitz embedding into $Y$ with distortion at most
$D$. Then the entire space $X$ admits a bi-Lipschitz embedding into $Y$ with
distortion at most $CD$. This mostly expository article is devoted to a
detailed presentation of a proof of Bourgain's theorem.
We also obtain an improvement of Bourgain's theorem in the important case
when $Y = L_p$ for some $p \in [1,\infty)$: in this case it suffices to take
$\delta = C^{-1} n^{-5/2}$ for the same conclusion to hold true. The case $p = 1$
of this improved discretization result has the following consequence. For
arbitrarily large $n \in \mathbb{N}$ there exists a family $\mathcal{Y}$ of
$n$-point subsets of $\{1,\dots,n\}^2 \subseteq \mathbb{R}^2$ such that if we write
$|\mathcal{Y}| = N$ then any $L_1$ embedding of $\mathcal{Y}$, equipped with the
Earthmover metric (a.k.a. transportation cost metric or minimum weight
matching metric), incurs distortion at least a constant multiple of
$\sqrt{\log\log N}$; the previously best known lower bound for this problem was
a constant multiple of $\sqrt{\log\log N / \log\log\log N}$.
Comment: Proof of Lemma 5.1 corrected; its statement remains unchanged.
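The Earthmover metric between two equal-size point sets is the cost of a minimum-weight perfect matching between them. The toy sketch below computes it by brute force over all matchings (feasible only for tiny sets, since there are $n!$ of them); the function name and the choice of Euclidean edge costs are our own illustrative assumptions.

```python
from itertools import permutations

def earthmover(A, B):
    """Minimum-weight perfect matching distance between two equal-size
    point sets A and B in the plane, with Euclidean edge costs.
    Brute force over all |A|! matchings -- for tiny illustrative instances
    only; real instances call for an assignment-problem solver."""
    assert len(A) == len(B)

    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

    return min(sum(dist(a, b) for a, b in zip(A, perm))
               for perm in permutations(B))
```

For example, matching $\{(0,0),(1,0)\}$ to $\{(0,1),(1,1)\}$ point-by-point costs $1+1=2$, which beats the crossed matching of cost $2\sqrt{2}$, so the metric evaluates to 2.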