Ptolemaic Indexing
This paper discusses a new family of bounds for use in similarity search,
related to those used in metric indexing, but based on Ptolemy's inequality,
rather than the metric axioms. Ptolemy's inequality holds for the well-known
Euclidean distance, but is also shown here to hold for quadratic form metrics
in general, with Mahalanobis distance as an important special case. The
inequality is examined empirically on both synthetic and real-world data sets
and is also found to hold approximately, with a very low degree of error, for
important distances such as the angular pseudometric and several Lp norms.
Indexing experiments demonstrate substantially greater filtering power than
existing triangular methods (those based on the triangle inequality). It is also
shown that combining Ptolemaic and triangular filtering can lead to better
results than using either approach on its own.
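The filtering idea can be sketched in a few lines. This is a minimal Python illustration, not the paper's index structure: it assumes Euclidean distance and a single hypothetical pivot pair (p, s), and the function names are illustrative. Rearranging Ptolemy's inequality gives a lower bound on d(q, o) that needs only pivot distances, which can then be combined with the usual triangular bounds by taking the maximum.

```python
import math
import random

# Hypothetical helper names; a sketch, not the paper's index structure.

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triangular_lb(d_qp, d_op):
    """Lower bound on d(q, o) from one pivot p (triangle inequality)."""
    return abs(d_qp - d_op)

def ptolemaic_lb(d_qp, d_qs, d_op, d_os, d_ps):
    """Lower bound on d(q, o) from a pivot pair (p, s), rearranged from
    Ptolemy's inequality: d(q,o) * d(p,s) >= |d(q,p) * d(o,s) - d(q,s) * d(o,p)|."""
    if d_ps == 0.0:
        return 0.0  # coincident pivots carry no information
    return abs(d_qp * d_os - d_qs * d_op) / d_ps

def combined_lb(q, o, p, s, dist=euclidean):
    """Maximum of the triangular and Ptolemaic bounds. In a range query of
    radius r, o can be discarded without computing d(q, o) if this exceeds r."""
    d_qp, d_qs = dist(q, p), dist(q, s)  # computed once per query
    d_op, d_os, d_ps = dist(o, p), dist(o, s), dist(p, s)  # precomputed in a real index
    return max(triangular_lb(d_qp, d_op),
               triangular_lb(d_qs, d_os),
               ptolemaic_lb(d_qp, d_qs, d_op, d_os, d_ps))

if __name__ == "__main__":
    rng = random.Random(0)
    q, o, p, s = ([rng.random() for _ in range(5)] for _ in range(4))
    print(combined_lb(q, o, p, s), "<=", euclidean(q, o))
```

Passing a different `dist` only yields a guaranteed bound if that distance satisfies Ptolemy's inequality (such as a quadratic form metric); for distances where it holds only approximately, the Ptolemaic term may occasionally overshoot the true distance.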
Iterative Residual Rescaling: An Analysis and Generalization of LSI
We consider the problem of creating document representations in which
inter-document similarity measurements correspond to semantic similarity. We
first present a novel subspace-based framework for formalizing this task. Using
this framework, we derive a new analysis of Latent Semantic Indexing (LSI),
showing a precise relationship between its performance and the uniformity of
the underlying distribution of documents over topics. This analysis helps
explain the improvements gained by Ando's (2000) Iterative Residual Rescaling
(IRR) algorithm: IRR can compensate for distributional non-uniformity. A
further benefit of our framework is that it provides a well-motivated,
effective method for automatically determining the rescaling factor IRR depends
on, leading to further improvements. A series of experiments over various
settings and with several evaluation metrics validates our claims. Comment: To appear in the proceedings of SIGIR 2001. 11 pages.
Lower Bounds for Sparse Recovery
We consider the following k-sparse recovery problem: design an m x n matrix
A such that, for any signal x, given Ax we can efficiently recover x'
satisfying
$\|x - x'\|_1 \le C \min_{k\text{-sparse } x''} \|x - x''\|_1$.
It is known that there exist matrices A with this property that have only
$O(k \log(n/k))$ rows.
In this paper we show that this bound is tight. Our bound holds even for the
more general randomized version of the problem, where A is a random variable
and the recovery algorithm is required to work for any fixed x with constant
probability (over A). Comment: 11 pages. Appeared at SODA 201
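The right-hand side of the recovery guarantee is easy to compute directly: the best k-sparse approximation of x keeps its k largest-magnitude entries, so the minimum error is the l1 norm of the remaining tail. A small Python sketch (function names are illustrative, not from the paper) that checks whether a candidate x' meets the guarantee for a given C; it says nothing about how A is designed.

```python
# Illustrative helpers for the l1/l1 guarantee; not the paper's code.

def best_k_term_error(x, k):
    """min over k-sparse x'' of ||x - x''||_1: keep the k largest-magnitude
    entries of x, so the error is the l1 norm of the remaining tail."""
    mags = sorted((abs(v) for v in x), reverse=True)
    return sum(mags[k:])

def l1_distance(a, b):
    return sum(abs(u - v) for u, v in zip(a, b))

def satisfies_l1_l1(x, x_rec, k, C):
    """True if ||x - x_rec||_1 <= C * min_{k-sparse x''} ||x - x''||_1."""
    return l1_distance(x, x_rec) <= C * best_k_term_error(x, k)
```

Note that when x is exactly k-sparse the tail error is zero, so the guarantee demands exact recovery.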
Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms
We present a technical survey of state-of-the-art approaches to data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower-bounding techniques.
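Of the data-reduction tools listed above, random projection is the simplest to show in a few lines. This is a minimal Python sketch of a Gaussian, Johnson-Lindenstrauss-style projection, not a construction taken from the survey; the target dimension m used here is an arbitrary choice, since the dimension needed for a guaranteed distortion depends on the number of points and the desired accuracy.

```python
import math
import random

# Hypothetical helper names; a sketch of a Gaussian random projection.

def gaussian_projection(d, m, seed=0):
    """m x d matrix with i.i.d. N(0, 1/m) entries, so E||Rx||^2 = ||x||^2."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(m)]

def apply_projection(R, x):
    """Map a d-dimensional point x to m dimensions (matrix-vector product)."""
    return [sum(r_i * x_i for r_i, x_i in zip(row, x)) for row in R]

def norm(v):
    return math.sqrt(sum(t * t for t in v))
```

Because the map is linear, pairwise distances are preserved up to a small relative distortion with high probability, which is what lets later algorithms work in m rather than d dimensions.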
Combined 3D thinning and greedy algorithm to approximate realistic particles with corrected mechanical properties
The shape of irregular particles has a significant influence on the micro- and
macroscopic behavior of granular systems. This paper presents a combined 3D
thinning and greedy set-covering algorithm to approximate realistic particles
with a clump of overlapping spheres for discrete element method (DEM)
simulations. First, the particle medial surface (or surface skeleton), from
which all candidate (maximal inscribed) spheres can be generated, is computed
by topological 3D thinning. Then, clump generation is cast as a set-covering
problem (SCP) and solved with a greedy algorithm.
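The greedy set-covering step can be sketched as follows. This Python illustration assumes each candidate sphere has already been reduced to the set of particle voxels it covers and stops once a target fraction of voxels is covered; the voxel representation, the sphere ids and the coverage-based stopping rule are assumptions for illustration, and candidate generation from the medial surface is not shown.

```python
# Hypothetical representation: sphere id -> set of covered voxel ids.

def greedy_clump(candidates, target_voxels, coverage=0.85):
    """Greedily pick candidate spheres until `coverage` of the particle's
    voxels are covered. Returns the chosen sphere ids in selection order."""
    uncovered = set(target_voxels)
    needed = coverage * len(target_voxels)
    covered = 0
    chosen = []
    remaining = dict(candidates)
    while covered < needed and remaining:
        # Pick the sphere covering the most still-uncovered voxels
        # (ties go to the first candidate in insertion order).
        best = max(remaining, key=lambda s: len(remaining[s] & uncovered))
        gain = remaining[best] & uncovered
        if not gain:
            break  # no candidate adds coverage; the target is unreachable
        chosen.append(best)
        uncovered -= gain
        covered += len(gain)
        del remaining[best]
    return chosen
```

The greedy rule is the classic approximation for set cover; stopping at a coverage fraction rather than full coverage keeps the number of spheres per clump small.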
To correct the mass distribution distorted by heavily overlapping spheres inside
the clump, linear programming (LP) is used to adjust the density of each component
sphere so that the aggregate properties (mass, center of mass and inertia
tensor) are identical or sufficiently close to those of the prototype particle. To
find the optimal approximation accuracy (volume coverage: ratio of the clump's
volume to the original particle's volume), rotating-drum simulations of particle
flow are conducted for 3 different particle shapes. The dynamic angle of
repose was observed to start converging for all particle shapes at 85% volume
coverage (spheres per clump < 30), which suggests a possible optimal resolution
for capturing the mechanical behavior of the system. Comment: 34 pages, 13 figures.
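To make the matched quantities concrete, this Python sketch (not the paper's LP formulation) computes the mass, center of mass and inertia tensor of a clump from per-sphere densities, treating each sphere as an independent solid; overlapping regions are therefore counted once per sphere, which is the over-counting that density adjustment compensates for. The tuple layout is an assumption.

```python
import math

# Hypothetical input layout: list of ((x, y, z), radius, density) tuples.

def clump_properties(spheres):
    """Aggregate mass, center of mass and inertia tensor (about the center
    of mass) of a clump, each sphere treated as a separate solid body."""
    masses = [rho * 4.0 / 3.0 * math.pi * r ** 3 for _, r, rho in spheres]
    total = sum(masses)
    com = [sum(m * c[i] for m, (c, _, _) in zip(masses, spheres)) / total
           for i in range(3)]
    inertia = [[0.0] * 3 for _ in range(3)]
    for m, (c, r, _) in zip(masses, spheres):
        d = [c[i] - com[i] for i in range(3)]
        d2 = sum(x * x for x in d)
        for i in range(3):
            for j in range(3):
                own = 0.4 * m * r * r if i == j else 0.0  # solid sphere about its center
                shift = m * ((d2 if i == j else 0.0) - d[i] * d[j])  # parallel-axis term
                inertia[i][j] += own + shift
    return total, com, inertia
```

Mass and the first moment (the sum of each sphere's mass times its center) are linear in the densities, which makes a linear-programming formulation over the densities natural.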