Indexing Metric Spaces for Exact Similarity Search
With the continued digitalization of societal processes, we are seeing an
explosion in available data. This is referred to as big data. In a research
setting, three aspects of the data are often viewed as the main sources of
challenges when attempting to enable value creation from big data: volume,
velocity and variety. Many studies address volume or velocity, while far fewer
studies concern variety. Metric space is ideal for addressing variety
because it can accommodate any type of data as long as its associated distance
notion satisfies the triangle inequality. To accelerate search in metric space,
a collection of indexing techniques for metric data has been proposed.
However, existing surveys each offer only narrow coverage, and no
comprehensive empirical study of those techniques exists. We offer a survey of
all the existing metric indexes that can support exact similarity search, by i)
summarizing all the existing partitioning, pruning and validation techniques
used for metric indexes, ii) providing time and storage complexity analyses
of index construction, and iii) reporting on a comprehensive empirical
comparison of their similarity query processing performance. Empirical
comparisons are used to evaluate index performance during search because
complexity analysis reveals little difference in similarity query
processing, and because query performance depends on pruning and validation
abilities, which in turn depend on the data distribution. This article aims to
reveal the strengths and weaknesses of the different indexing techniques in
order to offer guidance on selecting an appropriate indexing technique for a
given setting and to direct future research on metric indexes.
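As an illustration of how the triangle inequality enables pruning in exact similarity search, here is a minimal sketch of a pivot-table range query. It is not taken from the survey: the function names (`build_pivot_table`, `range_query`), the Euclidean distance, and the choice of the first few objects as pivots are all assumptions made for the example.

```python
import random


def euclidean(a, b):
    """Euclidean distance; any metric satisfying the triangle inequality works."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def build_pivot_table(data, pivots, dist):
    """Precompute d(o, p) for every object o and every pivot p."""
    return [[dist(o, p) for p in pivots] for o in data]


def range_query(q, radius, data, pivots, table, dist):
    """Return all objects within `radius` of q, plus the number of
    object distances that actually had to be computed."""
    q_to_pivots = [dist(q, p) for p in pivots]
    results, computed = [], 0
    for o, row in zip(data, table):
        # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o) for every pivot p,
        # so the largest such gap is a lower bound on d(q, o).
        lower = max(abs(qp - op) for qp, op in zip(q_to_pivots, row))
        if lower > radius:
            continue  # pruned without computing d(q, o)
        computed += 1
        if dist(q, o) <= radius:  # validation by an exact distance computation
            results.append(o)
    return results, computed


# Hypothetical demo data: 300 random 2-D points, first 4 objects used as pivots.
rng = random.Random(1)
data = [(rng.random(), rng.random()) for _ in range(300)]
pivots = data[:4]
table = build_pivot_table(data, pivots, euclidean)
hits, computed = range_query((0.5, 0.5), 0.1, data, pivots, table, euclidean)
print(len(hits), "results;", computed, "of", len(data), "distances computed")
```

The result set is identical to a linear scan; only the number of distance computations changes, and how much it drops depends on the pivots and the data distribution.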
Improved dynamical particle swarm optimization method for structural dynamics
A methodology for the multiobjective structural design of buildings based on an improved particle swarm optimization algorithm is presented; the algorithm has proved to be very efficient and robust in nonlinear problems and when the optimization objectives are in conflict. In particular, the behaviour of the classical particle swarm optimization (PSO) algorithm is improved by dynamically adding autoadaptive mechanisms that enhance the exploration/exploitation trade-off and the diversity of the proposed algorithm, so that it avoids getting trapped in local minima. A novel integrated optimization system, called DI-PSO, was developed to solve this problem; it is able to control and even improve the structural behaviour under seismic excitations. To demonstrate the effectiveness of the proposed approach, the methodology is tested against some benchmark problems. Then a 3-story building model is optimized under different objective cases, concluding that the improved multiobjective optimization methodology using DI-PSO is more efficient than designs obtained using single-objective optimization. Peer Reviewed. Postprint (published version
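For readers unfamiliar with the baseline that the abstract improves on, the following is a minimal sketch of a classical single-objective, global-best PSO. It is not DI-PSO and contains none of the autoadaptive mechanisms described above; the function name `pso`, the parameter values (`w`, `c1`, `c2`), and the sphere test function are assumptions chosen for illustration.

```python
import random


def pso(f, dim, bounds, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Classical global-best PSO minimizing f over a box [lo, hi]^dim.

    w is the inertia weight; c1 and c2 weight the pull towards each
    particle's own best position and the swarm's best position.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Hypothetical benchmark: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


best_pos, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_pos, best_val)
```

With fixed `w`, `c1`, and `c2`, the balance between exploration and exploitation is static, which is the limitation that dynamically adapted variants aim to address.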
Entropy-scaling search of massive biological data
Many datasets exhibit a well-defined structure that can be exploited to
design faster search tools, but it is not always clear when such acceleration
is possible. Here, we introduce a framework for similarity search based on
characterizing a dataset's entropy and fractal dimension. We prove that
searching scales in time with metric entropy (number of covering hyperspheres),
if the fractal dimension of the dataset is low, and scales in space with the
sum of metric entropy and information-theoretic entropy (randomness of the
data). Using these ideas, we present accelerated versions of standard tools,
with no loss in specificity and little loss in sensitivity, for use in three
domains---high-throughput drug screening (Ammolite, 150x speedup), metagenomics
(MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search
(esFragBag, 10x speedup of FragBag). Our framework can be used to achieve
"compressive omics," and the general theory can be readily applied to data
science problems outside of biology. Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
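The idea of searching over covering hyperspheres can be sketched as a two-stage range search: a coarse pass over cluster centers, then a fine pass inside the clusters that survive. This is a simplified illustration under assumptions, not the implementation used in Ammolite, MICA, or esFragBag; the greedy cover, the names `build_cover` and `search`, and the Euclidean distance are all chosen for the example.

```python
import random


def euclidean(a, b):
    """Stand-in metric; the approach only needs the triangle inequality."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def build_cover(data, cluster_radius, dist):
    """Greedy cover: each object joins the first center within
    cluster_radius, or becomes a new center itself."""
    centers, members = [], []
    for o in data:
        for i, c in enumerate(centers):
            if dist(o, c) <= cluster_radius:
                members[i].append(o)
                break
        else:
            centers.append(o)
            members.append([o])
    return centers, members


def search(q, radius, centers, members, cluster_radius, dist):
    """Range search: coarse pass over centers, fine pass inside kept clusters."""
    hits = []
    for c, group in zip(centers, members):
        # If some member o has d(q, o) <= radius, the triangle inequality gives
        # d(q, c) <= d(q, o) + d(o, c) <= radius + cluster_radius,
        # so clusters failing this test cannot contain a hit.
        if dist(q, c) <= radius + cluster_radius:
            hits.extend(o for o in group if dist(q, o) <= radius)
    return hits


# Hypothetical demo data: 500 random 2-D points.
rng = random.Random(7)
data = [(rng.random(), rng.random()) for _ in range(500)]
centers, members = build_cover(data, 0.1, euclidean)
hits = search((0.3, 0.7), 0.05, centers, members, 0.1, euclidean)
print(len(centers), "clusters;", len(hits), "hits")
```

The coarse pass costs one distance computation per cluster, so the fewer hyperspheres needed to cover the data, the cheaper the search becomes.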
Optimal scheduling for refueling multiple autonomous aerial vehicles
The scheduling of multiple unmanned aerial vehicles (UAVs) for autonomous refueling is posed as a combinatorial optimization problem. An efficient dynamic programming (DP) algorithm is introduced for finding the optimal initial refueling sequence. The optimal sequence needs to be recalculated when conditions change, such as when UAVs join or leave the queue unexpectedly. We develop a systematic shuffle scheme to reconfigure the UAV sequence using the fewest shuffle steps. A similarity metric over UAV sequences is introduced to quantify the reconfiguration effort, which is treated as an additional cost and integrated into the DP algorithm. The feasibility and limitations of this novel approach are also discussed.