
    A Model of Optimal Network Structure for Decentralized Nearest Neighbor Search

    One of the approaches to the nearest neighbor search problem is to build a network whose nodes correspond to the given set of indexed objects. The search for the closest object can then be viewed as the search for a node in the network. A search procedure is called decentralized if it uses only local information about the visited nodes and their neighbors. Networks whose structure allows a decentralized search procedure, started from any node, to perform the nearest neighbor search efficiently are of particular interest, especially for purely distributed systems. Several algorithms that construct such networks have been proposed in the literature. However, the following questions arise: "Are there network models in which decentralized search can be performed faster?"; "What are the optimal networks for decentralized search?"; "What are their properties?". In this paper we give partial answers to these questions. We propose a mathematical programming model for the problem of determining an optimal network structure for decentralized nearest neighbor search. We have found an exact solution for a regular lattice of size 4×4 and heuristic solutions for sizes from 5×5 to 7×7. As distance functions we use the L1, L2 and L∞ metrics. We hope that our results and the proposed model will initiate the study of optimal network structures for decentralized nearest neighbor search.
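    The decentralized search procedure the abstract refers to can be illustrated by a greedy routing sketch: from the current node, move to whichever neighbor is closest to the query, and stop at a local minimum. This is only a minimal illustration under assumed choices (a 4×4 lattice with rook adjacency, L2 distance, node coordinates as object positions); it is not the paper's optimization model.

```python
import math

def greedy_decentralized_search(neighbors, coords, start, query, dist=math.dist):
    """Greedy routing: repeatedly move to the neighbor closest to the query;
    stop when no neighbor improves on the current node (a local minimum)."""
    current = start
    while True:
        best = min(neighbors[current], key=lambda n: dist(coords[n], query),
                   default=current)
        if dist(coords[best], query) >= dist(coords[current], query):
            return current  # no neighbor is closer: stop here
        current = best

# Illustrative 4x4 regular lattice (assumed layout, not the paper's optimal network).
coords = {(i, j): (float(i), float(j)) for i in range(4) for j in range(4)}
neighbors = {
    (i, j): [(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
             if (i + di, j + dj) in coords]
    for (i, j) in coords
}
print(greedy_decentralized_search(neighbors, coords, start=(0, 0), query=(2.7, 3.1)))
```
    The distance function can be swapped for L1 or L∞ to mirror the metrics considered in the paper; the question the paper studies is which edge set makes this greedy procedure reach the true nearest neighbor quickly from any starting node.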

    Entropy-scaling search of massive biological data

    Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains---high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology.
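    The covering-hypersphere idea behind the time bound can be sketched as a two-level search: first cover the dataset with metric balls, then, for a query, scan only the clusters whose centers are close enough (by the triangle inequality) to possibly contain hits. This is a minimal illustration of that scheme, not the Ammolite, MICA, or esFragBag implementations; the greedy covering, Euclidean metric, radius, and cutoff are assumed for the example.

```python
import numpy as np

def build_covering(points, radius):
    """Greedily cover the data with metric balls: assign each point to the first
    existing center within `radius`, otherwise make it a new center."""
    centers, members = [], []
    for idx, p in enumerate(points):
        for center, mem in zip(centers, members):
            if np.linalg.norm(p - center) <= radius:
                mem.append(idx)
                break
        else:
            centers.append(p)
            members.append([idx])
    return centers, members

def coarse_then_fine_search(points, centers, members, query, cutoff, radius):
    """Scan only clusters whose center lies within cutoff + radius of the query
    (triangle inequality guarantees no true hit is missed), then search inside."""
    hits = []
    for center, mem in zip(centers, members):
        if np.linalg.norm(query - center) <= cutoff + radius:
            hits.extend(i for i in mem
                        if np.linalg.norm(query - points[i]) <= cutoff)
    return hits

rng = np.random.default_rng(1)
points = rng.normal(size=(2000, 8))
centers, members = build_covering(points, radius=2.5)
query = points[0] + 0.05
print(len(centers), coarse_then_fine_search(points, centers, members, query,
                                            cutoff=0.5, radius=2.5))
```
    The coarse pass touches only the cluster centers, which is why the search time scales with the number of covering hyperspheres (the metric entropy) rather than with the full dataset size when the fractal dimension is low.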

    Efficient data structures for model-free data-driven computational mechanics

    Get PDF
    The data-driven computing paradigm initially introduced by Kirchdoerfer & Ortiz (2016) enables finite element computations in solid mechanics to be performed directly from material data sets, without an explicit material model. From a computational effort point of view, the most challenging task is the projection of admissible states at material points onto their closest states in the material data set. In this study, we compare and develop several possible data structures for solving the nearest-neighbor problem. We show that approximate nearest-neighbor (ANN) algorithms can accelerate material data searches by several orders of magnitude relative to exact searching algorithms. The approximations are suggested by, and adapted to, the structure of the data-driven iterative solver and result in no significant loss of solution accuracy. We assess the performance of the ANN algorithm with respect to material data set size with the aid of a 3D elasticity test case. We show that computations on a single processor with up to one billion material data points are feasible within a few seconds execution time, with a speedup of more than 10⁶ with respect to exact k-d trees.
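    The projection step described above amounts to, at every material point and solver iteration, a nearest-neighbor query of the current admissible state against the material data set. The sketch below shows that step using an exact k-d tree (SciPy's cKDTree), i.e. the baseline the paper's ANN structures are compared against; the synthetic strain-stress data, one-dimensional setting, and diagonal phase-space scaling are illustrative assumptions, not the paper's 3D elasticity setup.

```python
import numpy as np
from scipy.spatial import cKDTree

# Illustrative material data set: (strain, stress) pairs scattered around a
# linear elastic response; in the real method this would be measured data.
rng = np.random.default_rng(0)
strain = rng.uniform(-0.01, 0.01, size=(100_000, 1))
stress = 200e9 * strain + rng.normal(0.0, 1e6, size=strain.shape)
data = np.hstack([strain, stress])

# Assumed diagonal scaling so strain and stress contribute comparably to the
# phase-space distance; the tree is built on the rescaled points.
scale = np.array([1.0 / strain.std(), 1.0 / stress.std()])
tree = cKDTree(data * scale)

def project_to_data(admissible_states):
    """Return, for each admissible state, the closest point in the material
    data set. This per-iteration projection is what ANN structures accelerate."""
    _, idx = tree.query(admissible_states * scale, k=1)
    return data[idx]

states = np.array([[0.002, 4.1e8], [-0.005, -9.8e8]])
print(project_to_data(states))
```
    Because this query is repeated for every material point in every iteration, replacing the exact tree with an approximate index is where the reported orders-of-magnitude speedups come from, provided the approximation error stays within what the iterative solver tolerates.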