15,562 research outputs found

    Ptolemaic Indexing

    This paper discusses a new family of bounds for use in similarity search, related to those used in metric indexing but based on Ptolemy's inequality rather than the metric axioms. Ptolemy's inequality holds for the well-known Euclidean distance, but is also shown here to hold for quadratic form metrics in general, with Mahalanobis distance as an important special case. The inequality is examined empirically on both synthetic and real-world data sets and is also found to hold approximately, with a very low degree of error, for important distances such as the angular pseudometric and several Lp norms. Indexing experiments demonstrate substantially greater filtering power than existing triangular methods. It is also shown that combining Ptolemaic and triangular filtering can lead to better results than using either approach on its own.
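As a rough illustration of the kind of bound this abstract describes, the sketch below compares a triangle-inequality lower bound with a lower bound obtained by rearranging Ptolemy's inequality for a query q, an object o, and two pivots p and s. The function names, the pivot choice, and the sample points are assumptions made for this example; they are not taken from the paper.

```python
import math

def triangle_lower_bound(d_qp, d_op):
    # From the triangle inequality: d(q, o) >= |d(q, p) - d(o, p)|.
    return abs(d_qp - d_op)

def ptolemaic_lower_bound(d_qp, d_qs, d_op, d_os, d_ps):
    # Ptolemy's inequality for q, o, p, s gives
    #   d(q, p) * d(o, s) <= d(q, o) * d(p, s) + d(q, s) * d(o, p),
    # and symmetrically with p and s swapped, hence
    #   d(q, o) >= |d(q, p) * d(o, s) - d(q, s) * d(o, p)| / d(p, s).
    return abs(d_qp * d_os - d_qs * d_op) / d_ps

# Hypothetical points: q, p, s, o are the corners of a rectangle.
q, o = (0.0, 0.0), (0.0, 2.0)
p, s = (1.0, 0.0), (1.0, 2.0)

true_distance = math.dist(q, o)
tri = max(
    triangle_lower_bound(math.dist(q, p), math.dist(o, p)),
    triangle_lower_bound(math.dist(q, s), math.dist(o, s)),
)
pto = ptolemaic_lower_bound(
    math.dist(q, p), math.dist(q, s),
    math.dist(o, p), math.dist(o, s),
    math.dist(p, s),
)
print(f"true={true_distance:.3f} triangle={tri:.3f} ptolemaic={pto:.3f}")
```

In this particular configuration the Ptolemaic bound is tighter than either triangle bound, but that depends on where the pivots lie; in other configurations the triangle bound is the stronger one, which is consistent with the abstract's remark that the two filters can be combined.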

    Iterative Residual Rescaling: An Analysis and Generalization of LSI

    We consider the problem of creating document representations in which inter-document similarity measurements correspond to semantic similarity. We first present a novel subspace-based framework for formalizing this task. Using this framework, we derive a new analysis of Latent Semantic Indexing (LSI), showing a precise relationship between its performance and the uniformity of the underlying distribution of documents over topics. This analysis helps explain the improvements gained by Ando's (2000) Iterative Residual Rescaling (IRR) algorithm: IRR can compensate for distributional non-uniformity. A further benefit of our framework is that it provides a well-motivated, effective method for automatically determining the rescaling factor IRR depends on, leading to further improvements. A series of experiments over various settings and with several evaluation metrics validates our claims.
    Comment: To appear in the proceedings of SIGIR 2001. 11 pages.

    Lower Bounds for Sparse Recovery

    We consider the following k-sparse recovery problem: design an m x n matrix A such that, for any signal x, given Ax we can efficiently recover x' satisfying ||x - x'||_1 <= C min_{k-sparse x"} ||x - x"||_1. It is known that there exist matrices A with this property that have only O(k log (n/k)) rows. In this paper we show that this bound is tight. Our bound holds even for the more general randomized version of the problem, where A is a random variable and the recovery algorithm is required to work for any fixed x with constant probability (over A).
    Comment: 11 pages. Appeared at SODA 201
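To make the recovery guarantee in this abstract concrete, here is a minimal sketch that checks whether a candidate reconstruction meets the l1/l1 condition. It relies on the fact that the best k-sparse approximation in the l1 norm keeps the k largest-magnitude entries of x. The function names and sample vectors are hypothetical, and the sketch does not implement any recovery algorithm or measurement matrix.

```python
def best_k_sparse_l1_error(x, k):
    # min over k-sparse x'' of ||x - x''||_1: keep the k largest
    # magnitudes, so the error is the sum of the remaining ones.
    magnitudes = sorted((abs(v) for v in x), reverse=True)
    return sum(magnitudes[k:])

def satisfies_guarantee(x, x_hat, k, C):
    # Checks ||x - x_hat||_1 <= C * min_{k-sparse x''} ||x - x''||_1.
    error = sum(abs(a - b) for a, b in zip(x, x_hat))
    return error <= C * best_k_sparse_l1_error(x, k)

# Hypothetical signal with two dominant entries and small noise.
x = [5.0, 0.1, -3.0, 0.2, 0.0]
good = [5.0, 0.0, -3.0, 0.0, 0.0]   # keeps the two large entries
bad = [0.0, 0.0, -3.0, 0.0, 0.0]    # misses the largest entry

print(satisfies_guarantee(x, good, k=2, C=2.0))
print(satisfies_guarantee(x, bad, k=2, C=2.0))
```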

    Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms

    We present a technical survey of state-of-the-art approaches in data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower-bounding techniques.

    Combined 3D thinning and greedy algorithm to approximate realistic particles with corrected mechanical properties

    The shape of irregular particles has a significant influence on the micro- and macroscopic behavior of granular systems. This paper presents a combined 3D thinning and greedy set-covering algorithm to approximate realistic particles with a clump of overlapping spheres for discrete element method (DEM) simulations. First, the particle medial surface (or surface skeleton), from which all candidate (maximal inscribed) spheres can be generated, is computed by topological 3D thinning. Then, the clump generation procedure is cast as a set-covering problem (SCP) and solved greedily. To correct the mass distribution distorted by highly overlapping spheres inside the clump, linear programming (LP) is used to adjust the density of each component sphere, such that the aggregate properties (mass, center of mass and inertia tensor) are identical or close enough to those of the prototypical particle. To find the optimal approximation accuracy (volume coverage: the ratio of the clump's volume to the original particle's volume), particle flow simulations with 3 different shapes in a rotating drum are conducted. It was observed that the dynamic angle of repose starts to converge for all particle shapes at 85% volume coverage (spheres per clump < 30), which suggests that this may be the optimal resolution for capturing the mechanical behavior of the system.
    Comment: 34 pages, 13 figures.
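The greedy set-covering step mentioned in this abstract can be sketched as follows. This is a generic greedy cover under stated assumptions, not the authors' implementation: the particle is assumed to be discretized into voxel indices, each candidate sphere is represented by the set of voxels it covers, and the names `greedy_cover`, `candidates`, and `target` are hypothetical.

```python
def greedy_cover(universe, candidates, target=1.0):
    """Pick candidates greedily until `target` fraction of the universe is covered.

    universe:   iterable of voxel ids making up the particle (assumed non-empty)
    candidates: dict mapping a sphere label to the set of voxel ids it covers
    """
    uncovered = set(universe)
    total = len(uncovered)
    if total == 0:
        raise ValueError("universe must not be empty")
    chosen, covered = [], 0
    while covered < target * total:
        # Choose the sphere that covers the most still-uncovered voxels.
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:  # no remaining candidate adds coverage
            break
        chosen.append(best)
        uncovered -= gain
        covered += len(gain)
    return chosen, covered / total

# Toy example: 10 voxels and five hypothetical candidate spheres.
voxels = range(10)
spheres = {
    "A": {0, 1, 2, 3, 4},
    "B": {3, 4, 5, 6},
    "C": {6, 7, 8},
    "D": {9},
    "E": {0, 1},
}
clump, coverage = greedy_cover(voxels, spheres, target=0.85)
print(clump, coverage)
```

Stopping at a coverage target rather than at full coverage mirrors the abstract's trade-off between volume coverage and the number of spheres per clump; the density correction by linear programming is a separate step and is not shown here.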