Indexing Metric Spaces for Exact Similarity Search
With the continued digitalization of societal processes, we are seeing an
explosion in available data. This is referred to as big data. In a research
setting, three aspects of the data are often viewed as the main sources of
challenges when attempting to enable value creation from big data: volume,
velocity and variety. Many studies address volume or velocity, while far fewer
studies concern variety. A metric space is ideal for addressing variety
because it can accommodate any type of data as long as its associated distance
notion satisfies the triangle inequality. To accelerate search in metric spaces,
a collection of indexing techniques for metric data has been proposed.
However, existing surveys each offer only narrow coverage, and no
comprehensive empirical study of those techniques exists. We offer a survey of
all the existing metric indexes that can support exact similarity search, by i)
summarizing all the existing partitioning, pruning and validation techniques
used for metric indexes, ii) providing time and storage complexity analyses
of index construction, and iii) reporting on a comprehensive empirical
comparison of their similarity query processing performance. Empirical
comparisons are used to evaluate index performance during search because
complexity analysis reveals little difference in similarity query
processing, and query performance depends on pruning and validation
abilities that are tied to the data distribution. This article aims to reveal
the strengths and weaknesses of different indexing techniques in order to
offer guidance on selecting an appropriate indexing technique for a given
setting, and to direct future research on metric indexes.
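As an illustration of how the triangle inequality can accelerate exact search, the following minimal sketch shows pivot-based pruning for a range query. It is not taken from the surveyed paper: the function names (`build_pivot_table`, `range_query`), the choice of Euclidean distance, and the flat pivot table are assumptions made for this example only.

```python
import math


def euclidean(a, b):
    """Euclidean distance, used here as one example of a metric."""
    return math.dist(a, b)


def build_pivot_table(data, pivots, dist):
    """Precompute d(o, p) for every object o and every pivot p."""
    return [[dist(o, p) for p in pivots] for o in data]


def range_query(data, pivots, table, q, radius, dist):
    """Return all objects within `radius` of q, plus the number of
    object distances that actually had to be computed."""
    q_to_pivots = [dist(q, p) for p in pivots]
    results = []
    computed = 0
    for o, row in zip(data, table):
        # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o) for any pivot p,
        # so the largest such difference is a lower bound on d(q, o).
        lower = max(abs(dq - dp) for dq, dp in zip(q_to_pivots, row))
        if lower > radius:
            continue  # pruned without computing d(q, o)
        computed += 1
        if dist(q, o) <= radius:  # validation with the real distance
            results.append(o)
    return results, computed
```

Because the bound never exceeds the true distance, pruning discards only objects that cannot qualify, so the result remains exact. The second return value makes it possible to compare the number of distance computations against a linear scan.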
Generic Subsequence Matching Framework: Modularity, Flexibility, Efficiency
Subsequence matching appears to be an ideal approach for solving many
problems in the fields of data mining and similarity retrieval. It has
been shown that almost any data class (audio, image, biometrics, signals) is or
can be represented by some kind of time series or string of symbols, which can
be seen as an input for various subsequence matching approaches. The variety of
data types, specific tasks and their partial or full solutions is so wide that
the choice, implementation and parametrization of a suitable solution for a
given task might be complicated and time-consuming; a possibly fruitful
combination of fragments from different research areas may be neither obvious
nor easy to realize. The leading authors in this field also mention the
implementation bias that makes a proper comparison of competing
approaches difficult. Therefore, we present a new generic Subsequence Matching
Framework (SMF) that tries to overcome the aforementioned problems with a
uniform framework that simplifies and speeds up the design, development and
evaluation of subsequence matching systems. We identify several relatively
separate subtasks that are solved differently across the literature, and SMF
makes it possible to combine them in a straightforward manner, achieving new
quality and efficiency. The framework can be used in many application domains,
and its components can be reused effectively. Its strictly modular architecture
and openness also enable the involvement of efficient solutions from different
fields, for instance efficient metric-based indexes. This is an extended
version of a paper published at DEXA 2012.
Ptolemaic Indexing
This paper discusses a new family of bounds for use in similarity search,
related to those used in metric indexing, but based on Ptolemy's inequality,
rather than the metric axioms. Ptolemy's inequality holds for the well-known
Euclidean distance, but is also shown here to hold for quadratic form metrics
in general, with Mahalanobis distance as an important special case. The
inequality is examined empirically on both synthetic and real-world data sets
and is also found to hold approximately, with a very low degree of error, for
important distances such as the angular pseudometric and several Lp norms.
Indexing experiments demonstrate a highly increased filtering power compared to
existing, triangular methods. It is also shown that combining the Ptolemaic and
triangular filtering can lead to better results than using either approach on
its own.
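To make the kind of bound concrete, here is a small sketch of a Ptolemaic lower bound next to the familiar triangular one. It assumes a query q, an object o, and two pivots p and s. Ptolemy's inequality gives d(q,p)·d(o,s) ≤ d(q,o)·d(p,s) + d(q,s)·d(o,p), which rearranges into a lower bound on d(q,o). The function names and the two-pivot setup are illustrative assumptions, not the paper's implementation.

```python
def ptolemaic_lower_bound(d_qp, d_qs, d_op, d_os, d_ps):
    """Lower bound on d(q, o) from Ptolemy's inequality with pivots p and s.

    Valid only for distances that satisfy Ptolemy's inequality
    (for example, Euclidean distance).
    """
    if d_ps <= 0:
        raise ValueError("pivots must be distinct")
    return abs(d_qp * d_os - d_qs * d_op) / d_ps


def triangular_lower_bound(d_qp, d_qs, d_op, d_os):
    """Lower bound on d(q, o) from the triangle inequality with the same pivots."""
    return max(abs(d_qp - d_op), abs(d_qs - d_os))


def combined_lower_bound(d_qp, d_qs, d_op, d_os, d_ps):
    """Use whichever bound is tighter; both are valid, so their maximum is too."""
    return max(
        ptolemaic_lower_bound(d_qp, d_qs, d_op, d_os, d_ps),
        triangular_lower_bound(d_qp, d_qs, d_op, d_os),
    )
```

An object can be filtered out of a range query whenever the combined bound exceeds the query radius. Which of the two bounds is tighter depends on the positions of the pivots relative to q and o, which is one reason for taking the maximum.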
Non-Linear Shallow Water Equations numerical integration on curvilinear boundary-conforming grids
An Upwind Weighted Essentially Non-Oscillatory scheme for the solution of the Shallow Water Equations on generalized curvilinear coordinate systems is proposed. The Shallow Water Equations are expressed in a contravariant formulation in which Christoffel symbols are avoided. The equations are solved using a high-resolution finite-volume method combined with an exact Riemann solver. A procedure is presented that corrects errors arising from the difficulty of numerically satisfying the metric identities on generalized boundary-conforming grids; this procedure allows the numerical scheme to satisfy the freestream preservation property on highly distorted grids. The capacity of the proposed model is verified against test cases from the literature. The results obtained are compared with analytical solutions and alternative numerical solutions.
Net Energy Index: A New Way To Measure Energy Efficient Buildings
Energy efficiency indexes are useful for providing tangible measurements of energy efficiency in buildings. Buildings use approximately 70% of all electricity in the USA. Using that energy efficiently has two primary benefits: limiting greenhouse gas emissions and reducing grid strain. Utilizing local renewable energy sources contributes to the same benefits. Currently, there is no index that considers renewable energy sources when measuring energy efficiency. Therefore, this paper proposes the Net Energy Index, which compares the net power usage of a building to the floor area of the building in order to determine energy efficiency. If renewable energy supplies power to a building, this index is not only useful and justified, but also practical thanks to advances in energy meters.
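The abstract does not state the exact formula, so the following is only one plausible reading of "net power usage compared to floor area": energy drawn by the building minus locally generated renewable energy, divided by floor area. The function name, parameter names, and units (kWh and square metres) are assumptions for this sketch.

```python
def net_energy_index(consumed_kwh, renewable_kwh, floor_area_m2):
    """Hypothetical Net Energy Index: net energy use per unit of floor area.

    consumed_kwh  -- total energy used by the building over the period
    renewable_kwh -- energy supplied by local renewable sources over the period
    floor_area_m2 -- floor area of the building (must be positive)
    """
    if floor_area_m2 <= 0:
        raise ValueError("floor area must be positive")
    return (consumed_kwh - renewable_kwh) / floor_area_m2
```

Under this reading, a building that uses 120,000 kWh, generates 30,000 kWh locally, and has 1,500 m² of floor area would score 60 kWh/m²; a lower value indicates less net energy per unit of area.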