7,520 research outputs found
A Model of Optimal Network Structure for Decentralized Nearest Neighbor Search
One approach to the nearest neighbor search problem is to build a network whose
nodes correspond to the given set of indexed objects. The search for the closest
object can then be thought of as a search for a node in the network. A search
procedure in a network is called decentralized if it uses only local information
about the visited nodes and their neighbors. Networks whose structure allows a
decentralized search procedure, started from any node, to perform the nearest
neighbor search efficiently are of particular interest, especially for purely
distributed systems. Several algorithms that construct such networks have been
proposed in the literature. However, the following questions arise: "Are there
network models in which decentralized search can be performed faster?"; "What
are the optimal networks for decentralized search?"; "What are their
properties?". In this paper we give partial answers to these questions. We
propose a mathematical programming model for the problem of determining an
optimal network structure for decentralized nearest neighbor search. We have
found an exact solution for a regular lattice of size 4x4 and heuristic
solutions for sizes from 5x5 to 7x7. As distance functions we use the L1, L2,
and L_inf metrics. We hope that our results and the proposed model will initiate
the study of optimal network structures for decentralized nearest neighbor search.
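The decentralized search procedure the abstract describes can be sketched in a few lines: each node knows only its own position and its neighbors' (local information), and the search greedily hops to whichever neighbor is closer to the query. The 4x4 grid construction and the L1 metric below are illustrative choices matching the paper's lattice setting, not its optimization model.

```python
# Minimal sketch of decentralized greedy nearest-neighbour search on a 4x4 lattice.
# Each node uses only local information: its own coordinates and its neighbours'.

def l1(a, b):
    """L1 (Manhattan) distance between two coordinate tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))

def grid_network(n):
    """Regular n x n lattice: maps each node to its 4-connected neighbours."""
    nodes = [(i, j) for i in range(n) for j in range(n)]
    nbrs = {v: set() for v in nodes}
    for (i, j) in nodes:
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            u = (i + di, j + dj)
            if u in nbrs:
                nbrs[(i, j)].add(u)
    return nbrs

def greedy_search(nbrs, start, query, dist=l1):
    """Hop to the closest neighbour until no neighbour improves the distance."""
    current, hops = start, 0
    while True:
        best = min(nbrs[current], key=lambda v: dist(v, query))
        if dist(best, query) >= dist(current, query):
            return current, hops
        current, hops = best, hops + 1

nbrs = grid_network(4)
node, hops = greedy_search(nbrs, start=(0, 0), query=(3, 2))
# On a grid with the L1 metric, each hop reduces the distance by exactly 1,
# so the search reaches the query in dist(start, query) hops.
```

On richer network structures greedy routing can stall at a local minimum, which is exactly why the choice of network structure matters for decentralized search.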
Entropy-scaling search of massive biological data
Many datasets exhibit a well-defined structure that can be exploited to
design faster search tools, but it is not always clear when such acceleration
is possible. Here, we introduce a framework for similarity search based on
characterizing a dataset's entropy and fractal dimension. We prove that
searching scales in time with metric entropy (number of covering hyperspheres),
if the fractal dimension of the dataset is low, and scales in space with the
sum of metric entropy and information-theoretic entropy (randomness of the
data). Using these ideas, we present accelerated versions of standard tools,
with no loss in specificity and little loss in sensitivity, for use in three
domains---high-throughput drug screening (Ammolite, 150x speedup), metagenomics
(MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search
(esFragBag, 10x speedup of FragBag). Our framework can be used to achieve
"compressive omics," and the general theory can be readily applied to data
science problems outside of biology.
Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
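The core idea behind searching in time proportional to metric entropy can be illustrated with a two-stage scan: cover the data set with metric balls of radius r (the "covering hyperspheres"), compare a query only against the ball centres, then descend into balls that could contain a match by the triangle inequality. The greedy cover and the 1-D data below are illustrative stand-ins for the paper's domain-specific tools, not its implementation.

```python
# Sketch of entropy-scaling similarity search: coarse search over cluster
# centres, fine search only inside clusters that can reach the query.

def greedy_cover(points, r, dist):
    """Greedy metric cover: every point lies within r of some chosen centre."""
    clusters = []  # list of (centre, members)
    for p in points:
        for centre, members in clusters:
            if dist(p, centre) <= r:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters

def coarse_fine_search(clusters, q, eps, r, dist):
    """Return all points within eps of q, scanning only candidate clusters."""
    hits = []
    for centre, members in clusters:
        # Triangle inequality: a ball of radius r around `centre` can only
        # contain a point within eps of q if dist(q, centre) <= eps + r.
        if dist(q, centre) <= eps + r:
            hits.extend(p for p in members if dist(p, q) <= eps)
    return hits

dist = lambda a, b: abs(a - b)
data = [0.0, 0.1, 0.2, 5.0, 5.1, 9.7, 9.8, 10.0]
clusters = greedy_cover(data, r=0.5, dist=dist)        # 3 covering balls
hits = coarse_fine_search(clusters, q=5.05, eps=0.1, r=0.5, dist=dist)
```

The coarse stage touches one centre per covering ball, which is why the time cost scales with the metric entropy rather than the data set size when the fractal dimension is low.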
Efficient data structures for model-free data-driven computational mechanics
The data-driven computing paradigm initially introduced by Kirchdoerfer & Ortiz (2016) enables finite element computations in solid mechanics to be performed directly from material data sets, without an explicit material model. From a computational effort point of view, the most challenging task is the projection of admissible states at material points onto their closest states in the material data set. In this study, we compare and develop several possible data structures for solving this nearest-neighbor problem. We show that approximate nearest-neighbor (ANN) algorithms can accelerate material data searches by several orders of magnitude relative to exact searching algorithms. The approximations are suggested by—and adapted to—the structure of the data-driven iterative solver and result in no significant loss of solution accuracy. We assess the performance of the ANN algorithm with respect to material data set size with the aid of a 3D elasticity test case. We show that computations on a single processor with up to one billion material data points are feasible within a few seconds of execution time, with a speedup of more than 10⁶ relative to exact k-d trees.
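The projection step this abstract centres on, mapping an admissible state at a material point to its closest entry in the material data set, is a classic nearest-neighbor query. A minimal sketch, assuming 2-D (strain, stress) pairs and a plain Euclidean metric rather than the paper's energy-weighted norm, using a small pure-Python k-d tree in place of a library implementation:

```python
# Sketch of the data-driven projection step: find the material data point
# closest to a trial (strain, stress) state using a k-d tree.

import math

def build_kdtree(points, depth=0):
    """Recursively split points along alternating axes at the median."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, q, best=None):
    """Exact nearest neighbour via branch-and-bound descent of the tree."""
    if node is None:
        return best
    p, axis = node["point"], node["axis"]
    if best is None or math.dist(q, p) < math.dist(q, best):
        best = p
    near, far = ((node["left"], node["right"]) if q[axis] < p[axis]
                 else (node["right"], node["left"]))
    best = nearest(near, q, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if abs(q[axis] - p[axis]) < math.dist(q, best):
        best = nearest(far, q, best)
    return best

# Illustrative material data set: (strain, stress) samples of a linear law.
data = [(e / 10.0, 2.0 * e / 10.0) for e in range(11)]
tree = build_kdtree(data)
closest = nearest(tree, (0.42, 0.9))  # projection of a trial state
```

In the paper's setting this exact tree is the baseline; the reported 10⁶ speedup comes from replacing it with approximate nearest-neighbor searches adapted to the iterative solver.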