Fault-tolerant routing in peer-to-peer systems
We consider the problem of designing an overlay network and routing mechanism
that permits finding resources efficiently in a peer-to-peer system. We argue
that many existing approaches to this problem can be modeled as the
construction of a random graph embedded in a metric space whose points
represent resource identifiers, where the probability of a connection between
two nodes depends only on the distance between them in the metric space. We
study the performance of a peer-to-peer system where nodes are embedded at grid
points in a simple metric space: a one-dimensional real line. We prove upper
and lower bounds on the message complexity of locating particular resources in
such a system, under a variety of assumptions about failures of either nodes or
the connections between them. Our lower bounds in particular show that the use
of inverse power-law distributions in routing, as suggested by Kleinberg
(1999), is close to optimal. We also give efficient heuristics to dynamically
maintain such a system as new nodes arrive and old nodes depart. Finally, we
give experimental results that suggest promising directions for future work.
Comment: Full version of PODC 2002 paper. New version corrects missing
conditioning in Lemma 9 and some related details in the proof of Theorem 10,
with no changes to main result.
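The routing model described in this abstract can be illustrated with a minimal sketch, which is not the paper's construction. It assumes nodes at integer grid points 0..n-1 on a line, gives each node its immediate neighbors plus a few long-range links drawn with probability proportional to 1/distance (the inverse power-law choice attributed to Kleinberg), and routes greedily. The names build_links, greedy_route, and links_per_node are hypothetical, and no node or link failures are modeled.

```python
import random

def build_links(n, links_per_node, rng):
    """For each node on a line of n grid points, return its immediate
    neighbors plus long-range links drawn with probability proportional
    to 1/distance (an assumed inverse power-law with exponent 1)."""
    links = []
    for u in range(n):
        others = [v for v in range(n) if v != u]
        weights = [1.0 / abs(u - v) for v in others]
        long_range = rng.choices(others, weights=weights, k=links_per_node)
        neighbors = {v for v in (u - 1, u + 1) if 0 <= v < n}
        links.append(sorted(neighbors | set(long_range)))
    return links

def greedy_route(links, source, target):
    """Forward to the linked node closest to the target until it is reached.
    Returns the number of hops (messages) used."""
    current, hops = source, 0
    while current != target:
        # A neighbor one step closer always exists, so this terminates.
        current = min(links[current], key=lambda v: abs(v - target))
        hops += 1
    return hops
```

Calling greedy_route(build_links(200, 3, random.Random(0)), 0, 199) returns a hop count that is at most 199, the cost of walking neighbor to neighbor; the long-range links are what allow shorter routes.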
Delay Performance and Mixing Times in Random-Access Networks
We explore the achievable delay performance in wireless random-access
networks. While relatively simple and inherently distributed in nature,
suitably designed queue-based random-access schemes provide the striking
capability to match the optimal throughput performance of centralized
scheduling mechanisms in a wide range of scenarios. The specific type of
activation rules for which throughput optimality has been established may,
however, yield excessive queues and delays.
Motivated by that issue, we examine whether the poor delay performance is
inherent to the basic operation of these schemes, or caused by the specific
kind of activation rules. We derive delay lower bounds for queue-based
activation rules, which offer fundamental insight into the cause of the excessive
delays. For fixed activation rates we obtain lower bounds indicating that
delays and mixing times can grow dramatically with the load in certain
topologies as well
HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces
Nearest neighbor searching of large databases in high-dimensional spaces is
inherently difficult due to the curse of dimensionality. A flavor of
approximation is, therefore, necessary to practically solve the problem of
nearest neighbor search. In this paper, we propose a novel yet simple indexing
scheme, HD-Index, to solve the problem of approximate k-nearest neighbor
queries in massive high-dimensional databases. HD-Index consists of a set of
novel hierarchical structures called RDB-trees built on Hilbert keys of
database objects. The leaves of the RDB-trees store distances of database
objects to reference objects, thereby allowing efficient pruning using distance
filters. In addition to the triangle inequality, we also use the Ptolemaic inequality
to produce better lower bounds. Experiments on massive (up to billion scale)
high-dimensional (up to 1000+) datasets show that HD-Index is effective,
efficient, and scalable.
Comment: PVLDB 11(8):906-919, 201
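The distance-filter idea mentioned in this abstract can be sketched as follows. This is an exact linear scan with a triangle-inequality filter, not the HD-Index structure itself: there are no RDB-trees, no Hilbert keys, and no Ptolemaic bound. The function names and the choice of reference objects are assumptions made for illustration.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_with_reference_filter(query, points, references, k):
    """Return the k nearest (distance, index) pairs and the number of full
    distance computations performed after filtering."""
    # Stored at index time: distance from every point to every reference.
    ref_dists = [[euclidean(p, r) for r in references] for p in points]
    q_ref = [euclidean(query, r) for r in references]
    best = []      # sorted list of (distance, index), at most k entries
    computed = 0
    for i, p in enumerate(points):
        # Triangle inequality: |d(q, r) - d(p, r)| <= d(q, p) for any r.
        lower = max(abs(qd - pd) for qd, pd in zip(q_ref, ref_dists[i]))
        if len(best) == k and lower >= best[-1][0]:
            continue  # cannot beat the current k-th best, so skip it
        computed += 1
        best.append((euclidean(query, p), i))
        best.sort()
        del best[k:]
    return best, computed
```

The filter only skips a point when its lower bound already reaches the current k-th best distance, so the returned distances match a brute-force scan; how many points are skipped depends on how well the references are placed.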
Efficient k-NN search on vertically decomposed data
Applications like multimedia retrieval require efficient support for similarity search on large data collections. Yet, nearest neighbor search is a difficult problem in high dimensional spaces, rendering efficient applications hard to realize: index structures degrade rapidly with increasing dimensionality, while sequential search is not an attractive solution for repositories with millions of objects. This paper approaches the problem from a different angle. A solution is sought in an unconventional storage scheme that opens up a new range of techniques for processing k-NN queries, especially suited for high dimensional spaces. The suggested (physical) database design accommodates well a novel variant of branch-and-bound search, t
Compressive Mining: Fast and Optimal Data Mining in the Compressed Domain
Real-world data typically contain repeated and periodic patterns. This
suggests that they can be effectively represented and compressed using only a
few coefficients of an appropriate basis (e.g., Fourier, Wavelets, etc.).
However, distance estimation when the data are represented using different sets
of coefficients is still a largely unexplored area. This work studies the
optimization problems related to obtaining the tightest lower/upper
bound on Euclidean distances when each data object is potentially compressed
using a different set of orthonormal coefficients. Our technique leads to
tighter distance estimates, which translates into more accurate search,
learning and mining operations directly in the compressed domain.
We formulate the problem of estimating lower/upper distance bounds as an
optimization problem. We establish the properties of optimal solutions, and
leverage the theoretical analysis to develop a fast algorithm to obtain an
exact solution to the problem. The suggested solution provides the
tightest estimation of the L2-norm or the correlation. We show that typical
data-analysis operations, such as k-NN search or k-Means clustering, can
operate more accurately using the proposed compression and distance
reconstruction technique. We compare it with many other prevalent compression
and reconstruction techniques, including random projections and PCA-based
techniques. We highlight a surprising result, namely that when the data are
highly sparse in some basis, our technique may even outperform PCA-based
compression.
The contributions of this work are generic as our methodology is applicable
to any sequential or high-dimensional data as well as to any orthogonal data
transformation used for the underlying data compression scheme.
Comment: 25 pages, 20 figures, accepted in VLD
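To make the notion of lower/upper distance bounds in the compressed domain concrete, here is a minimal sketch of a simpler case than the one this abstract addresses. It assumes both objects are already expressed in the same orthonormal basis and keep the same first m coefficients plus the energy of the discarded remainder; the paper's optimization for differing coefficient sets is not reproduced. The names compress and distance_bounds are hypothetical.

```python
import math

def compress(coeffs, m):
    """Keep the first m coefficients and the norm of the discarded rest."""
    kept = list(coeffs[:m])
    residual = math.sqrt(sum(c * c for c in coeffs[m:]))
    return kept, residual

def distance_bounds(a, b):
    """Lower and upper bounds on the Euclidean distance between two objects
    compressed with compress() using the same m."""
    kept_a, res_a = a
    kept_b, res_b = b
    # Exact contribution of the coefficients both objects retained.
    known = sum((x - y) ** 2 for x, y in zip(kept_a, kept_b))
    # The discarded parts differ by at least |res_a - res_b| and
    # by at most res_a + res_b (triangle inequality on the residuals).
    lower = math.sqrt(known + (res_a - res_b) ** 2)
    upper = math.sqrt(known + (res_a + res_b) ** 2)
    return lower, upper
```

Because the basis is orthonormal, distances between coefficient vectors equal distances between the original objects, so the true Euclidean distance always lies between the two returned values.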