256,411 research outputs found
Efficient Motion Retrieval in Large Motion Databases
There has been a recent paradigm shift in the computer animation industry with an increasing use of pre-recorded motion for animating virtual characters. A fundamental requirement to using motion capture data is an efficient method for indexing and retrieving motions. In this paper, we propose a flexible, efficient method for searching arbitrarily complex motions in large motion databases. Motions are encoded using keys which represent a wide array of structural, geometric and, dynamic features of human motion. Keys provide a representative search space for indexing motions and users can specify sequences of key values as well as multiple combination of key sequences to search for complex motions. We use a trie-based data structure to provide an efficient mapping from key sequences to motions. The search times (even on a single CPU) are very fast, opening the possibility of using large motion data sets in real-time applications
MASCOT: a mechanism for attention-based scale-invariant object recognition in images
The efficient management of large multimedia databases requires the development of new techniques to process, characterize, and search for multimedia objects. Especially in the case of image data, the rapidly growing amount of documents prohibits a manual description of the images’ content. Instead, the automated characterization is highly desirable to support annotation and retrieval of digital images. However, this is a very complex and still unsolved task. To contribute to a solution of this problem, we have developed a mechanism for recognizing objects in images based on the query by example paradigm. Therefore, the most salient image features of an example image representing the searched object are extracted to obtain a scale-invariant object model. The use of this model provides an efficient and robust strategy for recognizing objects in images independently of their size. Further applications of the mechanism are classical recognition tasks such as scene decomposition or object tracking in video sequences
Fast Multivariate Search on Large Aviation Datasets
Multivariate Time-Series (MTS) are ubiquitous, and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical monitoring, and financial systems. Domain experts are often interested in searching for interesting multivariate patterns from these MTS databases which can contain up to several gigabytes of data. Surprisingly, research on MTS search is very limited. Most existing work only supports queries with the same length of data, or queries on a fixed set of variables. In this paper, we propose an efficient and flexible subsequence search framework for massive MTS databases, that, for the first time, enables querying on any subset of variables with arbitrary time delays between them. We propose two provably correct algorithms to solve this problem (1) an R-tree Based Search (RBS) which uses Minimum Bounding Rectangles (MBR) to organize the subsequences, and (2) a List Based Search (LBS) algorithm which uses sorted lists for indexing. We demonstrate the performance of these algorithms using two large MTS databases from the aviation domain, each containing several millions of observations Both these tests show that our algorithms have very high prune rates (>95%) thus needing actua
Fast and Flexible Multivariate Time Series Subsequence Search
Multivariate Time-Series (MTS) are ubiquitous, and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical monitoring, and financial systems. Domain experts are often interested in searching for interesting multivariate patterns from these MTS databases which often contain several gigabytes of data. Surprisingly, research on MTS search is very limited. Most of the existing work only supports queries with the same length of data, or queries on a fixed set of variables. In this paper, we propose an efficient and flexible subsequence search framework for massive MTS databases, that, for the first time, enables querying on any subset of variables with arbitrary time delays between them. We propose two algorithms to solve this problem (1) a List Based Search (LBS) algorithm which uses sorted lists for indexing, and (2) a R*-tree Based Search (RBS) which uses Minimum Bounding Rectangles (MBR) to organize the subsequences. Both algorithms guarantee that all matching patterns within the specified thresholds will be returned (no false dismissals). The very few false alarms can be removed by a post-processing step. Since our framework is also capable of Univariate Time-Series (UTS) subsequence search, we first demonstrate the efficiency of our algorithms on several UTS datasets previously used in the literature. We follow this up with experiments using two large MTS databases from the aviation domain, each containing several millions of observations. Both these tests show that our algorithms have very high prune rates (>99%) thus needing actual disk access for only less than 1% of the observations. To the best of our knowledge, MTS subsequence search has never been attempted on datasets of the size we have used in this paper
GraphFind: enhancing graph searching by low support data mining techniques
<p>Abstract</p> <p>Background</p> <p>Biomedical and chemical databases are large and rapidly growing in size. Graphs naturally model such kinds of data. To fully exploit the wealth of information in these graph databases, a key role is played by systems that search for all exact or approximate occurrences of a query graph. To deal efficiently with graph searching, advanced methods for indexing, representation and matching of graphs have been proposed.</p> <p>Results</p> <p>This paper presents GraphFind. The system implements efficient graph searching algorithms together with advanced filtering techniques that allow approximate search. It allows users to select candidate subgraphs rather than entire graphs. It implements an effective data storage based also on low-support data mining.</p> <p>Conclusions</p> <p>GraphFind is compared with Frowns, GraphGrep and gIndex. Experiments show that GraphFind outperforms the compared systems on a very large collection of small graphs. The proposed low-support mining technique which applies to any searching system also allows a significant index space reduction.</p
Efficient image copy detection using multi-scale fingerprints
Inspired by multi-resolution histogram, we propose
a multi-scale SIFT descriptor to improve the discriminability.
A series of SIFT descriptions with different scale are first
acquired by varying the actual size of each spatial bin. Then
principle component analysis (PCA) is employed to reduce them
to low dimensional vectors, which are further combined into one
128-dimension multi-scale SIFT description. Next, an entropy
maximization based binarization is employed to encode the
descriptions into binary codes called fingerprints for indexing
the local features. Furthermore, an efficient search architecture
consisting of lookup tables and inverted image ID list is designed
to improve the query speed. Since the fingerprint building is
of low-complexity, this method is very efficient and scalable to
very large databases. In addition, the multi-scale fingerprints
are very discriminative such that the copies can be effectively
distinguished from similar objects, which leads to an improved
performance in the detection of copies. The experimental evaluation shows that our approach outperforms the state of the art
methods.Inspired by multi-resolution histogram, we propose a multi-scale SIFT descriptor to improve the discriminability. A series of SIFT descriptions with different scale are first acquired by varying the actual size of each spatial bin. Then principle component analysis (PCA) is employed to reduce them to low dimensional vectors, which are further combined into one 128-dimension multi-scale SIFT description. Next, an entropy maximization based binarization is employed to encode the descriptions into binary codes called fingerprints for indexing the local features. Furthermore, an efficient search architecture consisting of lookup tables and inverted image ID list is designed to improve the query speed. Since the fingerprint building is of low-complexity, this method is very efficient and scalable to very large databases. In addition, the multi-scale fingerprints are very discriminative such that the copies can be effectively distinguished from similar objects, which leads to an improved performance in the detection of copies. The experimental evaluation shows that our approach outperforms the state of the art methods
EmbAssi: Embedding Assignment Costs for Similarity Search in Large Graph Databases
The graph edit distance is an intuitive measure to quantify the dissimilarity
of graphs, but its computation is NP-hard and challenging in practice. We
introduce methods for answering nearest neighbor and range queries regarding
this distance efficiently for large databases with up to millions of graphs. We
build on the filter-verification paradigm, where lower and upper bounds are
used to reduce the number of exact computations of the graph edit distance.
Highly effective bounds for this involve solving a linear assignment problem
for each graph in the database, which is prohibitive in massive datasets.
Index-based approaches typically provide only weak bounds leading to high
computational costs verification. In this work, we derive novel lower bounds
for efficient filtering from restricted assignment problems, where the cost
function is a tree metric. This special case allows embedding the costs of
optimal assignments isometrically into space, rendering efficient
indexing possible. We propose several lower bounds of the graph edit distance
obtained from tree metrics reflecting the edit costs, which are combined for
effective filtering. Our method termed EmbAssi can be integrated into existing
filter-verification pipelines as a fast and effective pre-filtering step.
Empirically we show that for many real-world graphs our lower bounds are
already close to the exact graph edit distance, while our index construction
and search scales to very large databases
- …