Generic Subsequence Matching Framework: Modularity, Flexibility, Efficiency
Subsequence matching has emerged as an ideal approach for solving many problems in data mining and similarity retrieval. It has been shown that almost any data class (audio, image, biometrics, signals) is or can be represented by some kind of time series or string of symbols, which can serve as input for various subsequence matching approaches. The variety of data types, specific tasks, and their partial or full solutions is so wide that choosing, implementing, and parametrizing a suitable solution for a given task can be complicated and time-consuming; a potentially fruitful combination of fragments from different research areas may be neither obvious nor easy to realize. Leading authors in this field also mention the implementation bias that makes a proper comparison of competing approaches difficult. We therefore present a new generic Subsequence Matching Framework (SMF) that tries to overcome these problems with a uniform framework that simplifies and speeds up the design, development, and evaluation of subsequence-matching systems. We identify several relatively separate subtasks that are solved differently across the literature, and SMF makes it straightforward to combine them, achieving new quality and efficiency. The framework can be used in many application domains, and its components can be reused effectively. Its strictly modular architecture and openness also enable the use of efficient solutions from other fields, for instance efficient metric-based indexes. This is an extended version of a paper published at DEXA 2012.
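The abstract describes SMF only at the architectural level, so the following is not SMF's code or API. It is a minimal, hypothetical sketch of the idea of splitting subsequence matching into separately replaceable subtasks: a segmenter (here `sliding_windows`) and a distance function (here `euclidean` or `manhattan`) are plugged into a `SubsequenceMatcher` that performs a sequential scan. All names are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical component signatures chosen for illustration; NOT SMF's real interfaces.
Window = Tuple[int, Sequence[float]]
Segmenter = Callable[[Sequence[float], int], List[Window]]
Distance = Callable[[Sequence[float], Sequence[float]], float]


def sliding_windows(series: Sequence[float], width: int) -> List[Window]:
    """Cut a series into every overlapping window of the given width."""
    return [(i, series[i:i + width]) for i in range(len(series) - width + 1)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


@dataclass
class SubsequenceMatcher:
    """Couples independently replaceable components into one matching pipeline."""
    segmenter: Segmenter
    distance: Distance

    def best_match(self, series: Sequence[float], query: Sequence[float]) -> Tuple[int, float]:
        """Return (offset, distance) of the window closest to the query.

        Uses a plain sequential scan; an index component could replace this loop.
        Raises ValueError if the query is longer than the series (no windows).
        """
        windows = self.segmenter(series, len(query))
        return min(((offset, self.distance(window, query)) for offset, window in windows),
                   key=lambda pair: pair[1])
```

For example, `SubsequenceMatcher(sliding_windows, euclidean).best_match([0, 1, 2, 3, 2, 1, 0], [3, 2, 1])` returns `(3, 0.0)`. Swapping in `manhattan` changes only one component; in the same spirit, the sequential scan could be replaced by a metric index, the kind of cross-field reuse the abstract mentions.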
A general state-based temporal pattern recognition
Time-series and state-sequences are ubiquitous patterns in temporal logic and are widely used to represent temporal data in data mining. Generally speaking, there are three known choices for the time primitive: points, intervals, or both points and intervals. In this thesis, a formal characterization of time-series and state-sequences is presented for both complete and incomplete situations, where a state-sequence is defined as a list of sequential data validated on the corresponding time-series. In addition, subsequence matching is addressed to associate state-sequences, taking into account both non-temporal aspects and rich temporal aspects, including temporal order, temporal duration, and temporal gap.
First, building on typed point-based time-elements and time-series, the formal characterization of time-series and state-sequences is introduced for both complete and incomplete situations. A time-series is formalized as a tetrad (T, R, Tdur, Tgap), which denotes, respectively, the temporal order of time-elements, the temporal relationship between time-elements, the temporal duration of each time-element, and the temporal gap between each adjacent pair of time-elements.
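The tetrad can be made concrete with a small data structure. The following is a hypothetical encoding, not the thesis's formalism: each time-element has a start and a duration (0 for a point, positive for an interval), `Tgap` is measured from the end of one element to the start of the next, and `R` uses simplified Allen-style labels (`meets`, `before`, `overlaps`) chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TimeElement:
    start: float
    duration: float  # 0 for a time point, > 0 for an interval

    @property
    def end(self) -> float:
        return self.start + self.duration


def tetrad(elements: List[TimeElement]) -> Tuple[List[TimeElement], List[str], List[float], List[float]]:
    """Derive a hypothetical (T, R, Tdur, Tgap) view of a time-series."""
    T = sorted(elements, key=lambda e: e.start)  # temporal order of time-elements
    Tdur = [e.duration for e in T]  # duration of each time-element
    Tgap = [b.start - a.end for a, b in zip(T, T[1:])]  # gap between adjacent elements
    # Simplified relation labels (assumption): any negative gap is labelled "overlaps".
    R = ["meets" if g == 0 else "before" if g > 0 else "overlaps" for g in Tgap]
    return T, R, Tdur, Tgap
```

With elements `(0, 2)`, `(2, 1)` and `(5, 0)`, the sketch yields `Tdur = [2, 1, 0]`, `Tgap = [0, 2]` and `R = ['meets', 'before']`.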
Second, building on this formal characterization, a general similarity measurement (GSM) for subsequence matching is introduced that takes into account both non-temporal and rich temporal information, including temporal order, temporal duration, and temporal gap. This measurement is general enough to subsume most popular existing measurements as special cases. In particular, a new concept of temporal common subsequence is proposed, and a new LCS-based algorithm that takes rich temporal information into account, named Optimal Temporal Common Subsequence (OTCS), is designed. Experimental results on 6 benchmark datasets demonstrate the effectiveness and robustness of GSM and its new special case OTCS. Compared with binary-valued distance measurements, GSM can distinguish between distances caused by different states in the same operation; compared with real-penalty distance measurements, it can filter out noise that may push the similarity to abnormal levels.
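OTCS is described as LCS-based, but its temporal extensions are not specified here. As background only, the following shows the classic longest-common-subsequence dynamic program with a pluggable `match` predicate. The `duration_tolerant` predicate, which lets (state, duration) pairs match when their durations differ by at most a tolerance, is a hypothetical way of injecting temporal information and should not be read as the OTCS definition.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def lcs_length(a: Sequence[T], b: Sequence[T],
               match: Callable[[T, T], bool] = lambda x, y: x == y) -> int:
    """Classic O(len(a) * len(b)) dynamic program for the longest common subsequence."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if match(a[i - 1], b[j - 1]):
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def duration_tolerant(tol: float) -> Callable[[tuple, tuple], bool]:
    """Hypothetical predicate: (state, duration) pairs match if the states are
    equal and the durations differ by at most tol."""
    return lambda x, y: x[0] == y[0] and abs(x[1] - y[1]) <= tol
```

With `a = [('A', 2), ('B', 3), ('C', 1)]` and `b = [('A', 3), ('B', 8), ('C', 1)]`, exact matching gives an LCS length of 1, whereas `duration_tolerant(1)` gives 2 because the `A` states now match.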
Finally, two case studies are investigated for temporal pattern recognition: basketball zone-defence detection and video copy detection.
In the basketball zone-defence case, a computational technique and algorithm for detecting zone-defence patterns in basketball videos is introduced. The Laplacian Matrix-based algorithm is extended to account for the effects of zoom and of a single defender's translation in zone-defence graph matching, and a set of character-angle based features is proposed to describe the zone-defence graph. The experimental results show that the approach helps the coach of the defensive side check whether the players are keeping to the correct zone-defence strategy, and also helps detect the opponent's strategy. It can describe the structural relationships between defender lines in a zone defence and performs robustly in both simulation and real-life applications, especially when disturbances are present.
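The extended Laplacian-matrix-based matching (handling zoom and single-defender translation) and the character-angle features are not detailed here. The following shows only the textbook object such methods start from: the combinatorial Laplacian L = D - A of a graph whose nodes are defenders. The 2-3 zone layout and its edges are a hypothetical example.

```python
from typing import List, Tuple


def laplacian(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Combinatorial Laplacian L = D - A of an undirected, unweighted graph.

    Assumes nodes 0..n-1 and no duplicate edges or self-loops.
    """
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1  # off-diagonal entries: minus the adjacency
        L[v][u] -= 1
        L[u][u] += 1  # diagonal entries: node degree
        L[v][v] += 1
    return L


# Hypothetical 2-3 zone: defenders 0-1 form the front line, 2-3-4 the back line.
ZONE_2_3 = [(0, 1), (2, 3), (3, 4), (0, 2), (0, 3), (1, 3), (1, 4)]
L = laplacian(5, ZONE_2_3)
```

Each row of L sums to zero, and its eigenvalues do not change when the defenders are relabelled, one reason Laplacian spectra are a common starting point for graph matching.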
In the video copy detection case, a framework for subsequence matching is introduced, and a hybrid similarity framework is proposed that addresses both non-temporal and temporal relationships between state-sequences, represented by bipartite graphs. Experimental results on real-life video databases demonstrate that the proposed similarity framework is robust to alignments of states with different numbers and different values, and to various reorderings, including inversion and crossover.
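The hybrid framework combines non-temporal and temporal relationships; its exact formulation is not given here. The following illustrates only a non-temporal ingredient under our own assumptions: states of two sequences are matched one-to-one, as in a bipartite graph, maximizing the total pairwise similarity while ignoring order, which is why a reversed sequence still scores as a perfect match. The function names are hypothetical, and the brute-force search over permutations is exponential and suitable only for tiny inputs; `scipy.optimize.linear_sum_assignment` solves the same assignment problem efficiently.

```python
from itertools import permutations
from typing import Callable, Sequence


def exact(x: object, y: object) -> float:
    """Hypothetical state similarity: 1.0 for identical states, else 0.0."""
    return 1.0 if x == y else 0.0


def assignment_similarity(a: Sequence, b: Sequence,
                          sim: Callable[[object, object], float]) -> float:
    """Best one-to-one matching score between the states of a and b, ignoring order.

    Brute force over all injective mappings from the shorter to the longer
    sequence; the score is normalized by the longer length.
    """
    if len(a) > len(b):
        a, b = b, a  # ensure a is the shorter sequence
    if not a:
        return 0.0
    best = max(sum(sim(x, b[j]) for x, j in zip(a, perm))
               for perm in permutations(range(len(b)), len(a)))
    return best / len(b)
```

For instance, `assignment_similarity('ABCD', 'DCBA', exact)` is 1.0, and `assignment_similarity('ABC', 'CAXB', exact)` is 0.75 because one state of the longer sequence stays unmatched.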
Efficient Motion Retrieval in Large Motion Databases
There has been a recent paradigm shift in the computer animation industry toward increasing use of pre-recorded motion for animating virtual characters. A fundamental requirement for using motion capture data is an efficient method for indexing and retrieving motions. In this paper, we propose a flexible, efficient method for searching arbitrarily complex motions in large motion databases. Motions are encoded using keys that represent a wide array of structural, geometric, and dynamic features of human motion. Keys provide a representative search space for indexing motions, and users can specify sequences of key values, as well as multiple combinations of key sequences, to search for complex motions. We use a trie-based data structure to provide an efficient mapping from key sequences to motions. Search times are very fast, even on a single CPU, opening the possibility of using large motion data sets in real-time applications.
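A trie fits the described mapping from key sequences to motions. The following is a minimal, hypothetical version, not the paper's data structure: it inserts every suffix of each motion's key sequence so that any contiguous run of keys can be looked up, at a cost of memory quadratic in the sequence length. The class name and key names such as `step` and `crouch` are invented for the example.

```python
from typing import Dict, Hashable, Optional, Sequence, Set


class KeyTrie:
    """Hypothetical suffix-trie mapping contiguous key runs to motion IDs."""

    def __init__(self) -> None:
        self.children: Dict[Hashable, "KeyTrie"] = {}
        self.motions: Set[str] = set()

    def add_motion(self, motion_id: str, keys: Sequence[Hashable]) -> None:
        # Insert every suffix so that any contiguous key run can be looked up.
        for start in range(len(keys)):
            node = self
            for key in keys[start:]:
                node = node.children.setdefault(key, KeyTrie())
                node.motions.add(motion_id)

    def search(self, query: Sequence[Hashable]) -> Set[str]:
        """Return the IDs of motions containing the query as a contiguous run."""
        node: Optional[KeyTrie] = self
        for key in query:
            node = node.children.get(key)
            if node is None:
                return set()
        return set(node.motions)


trie = KeyTrie()
trie.add_motion("walk", ["stand", "step", "step", "stand"])
trie.add_motion("jump", ["stand", "crouch", "air", "stand"])
```

Here `trie.search(['step', 'stand'])` returns `{'walk'}` and `trie.search(['stand'])` returns both motions. A combination of key sequences can be answered by intersecting result sets, e.g. `trie.search(q1) & trie.search(q2)`.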
A quick search method for audio signals based on a piecewise linear representation of feature trajectories
This paper presents a new method for quick similarity-based search through long unlabeled audio streams to detect and locate audio clips provided by users. The method involves feature-dimension reduction based on a piecewise linear representation of a sequential feature trajectory extracted from a long audio stream. Two techniques enable us to obtain a piecewise linear representation: the dynamic segmentation of feature trajectories and the segment-based Karhunen-Loève (KL) transform. In principle, the proposed search method guarantees the same search results as a search without the proposed feature-dimension reduction. Experimental results indicate significant improvements in search speed. For example, the proposed method reduced the total search time to approximately 1/12 that of previous methods and detected queries in approximately 0.3 seconds from a 200-hour audio database.
Comment: 20 pages, to appear in IEEE Transactions on Audio, Speech and Language Processing
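The guarantee that dimension reduction preserves the exact search results rests on a lower-bounding argument. The following illustrates that principle with a deliberately simpler substitute: piecewise aggregate approximation (fixed-length segment means) instead of the paper's dynamic segmentation and segment-based KL transform. For segments of length m, the Cauchy-Schwarz inequality gives sqrt(m) * ||PAA(w) - PAA(q)|| <= ||w - q||, so windows pruned on the reduced representation can never be true matches. The function names and the fixed segment length are assumptions; the naive loop demonstrates correctness, not the reported speed-up.

```python
import math
from typing import List, Sequence


def euclid(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def paa(x: Sequence[float], seg: int) -> List[float]:
    """Mean of each non-overlapping segment of length seg."""
    return [sum(x[i:i + seg]) / seg for i in range(0, len(x), seg)]


def filtered_search(stream: Sequence[float], query: Sequence[float],
                    seg: int, threshold: float) -> List[int]:
    """Offsets whose exact Euclidean distance to the query is <= threshold.

    sqrt(seg) * ||paa(w) - paa(q)|| <= ||w - q|| (Cauchy-Schwarz per segment),
    so pruning on the reduced representation never discards a true match.
    """
    n = len(query)
    if n % seg:
        raise ValueError("query length must be a multiple of seg")
    q_red = paa(query, seg)
    hits = []
    for off in range(len(stream) - n + 1):
        window = stream[off:off + n]
        if math.sqrt(seg) * euclid(paa(window, seg), q_red) > threshold:
            continue  # pruned: the exact distance is at least this large
        if euclid(window, query) <= threshold:  # verify the surviving candidate
            hits.append(off)
    return hits
```

Because every pruned window provably exceeds the threshold, the result list equals that of an exhaustive scan; only the number of exact distance computations changes.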