Efficient time series matching by wavelets.
by Chan, Kin Pong.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1999.
Includes bibliographical references (leaves 100-105).
Abstracts in English and Chinese.
Acknowledgments
Abstract
Chapter 1 --- Introduction
Chapter 1.1 --- Wavelet Transform
Chapter 1.2 --- Time Warping
Chapter 1.3 --- Outline of the Thesis
Chapter 2 --- Related Work
Chapter 2.1 --- Similarity Models for Time Series
Chapter 2.2 --- Dimensionality Reduction
Chapter 2.3 --- Wavelet Transform
Chapter 2.4 --- Similarity Search under Time Warping
Chapter 3 --- Dimension Reduction by Wavelets
Chapter 3.1 --- The Proposed Approach
Chapter 3.1.1 --- Haar Wavelets
Chapter 3.1.2 --- DFT versus Haar Transform
Chapter 3.1.3 --- Guarantee of no False Dismissal
Chapter 3.2 --- The Overall Strategy
Chapter 3.2.1 --- Pre-processing
Chapter 3.2.2 --- Range Query
Chapter 3.2.3 --- Nearest Neighbor Query
Chapter 3.3 --- Performance Evaluation
Chapter 3.3.1 --- Stock Data
Chapter 3.3.2 --- Synthetic Random Walk Data
Chapter 3.3.3 --- Scalability Test
Chapter 3.3.4 --- Other Wavelets
Chapter 4 --- Time Warping
Chapter 4.1 --- Similarity Search based on K-L Transform
Chapter 4.2 --- Low Resolution Time Warping
Chapter 4.2.1 --- Resolution Reduction of Sequences
Chapter 4.2.2 --- Distance Compensation
Chapter 4.2.3 --- Time Complexity
Chapter 4.3 --- Adaptive Time Warping
Chapter 4.3.1 --- Time Complexity
Chapter 4.4 --- Performance Evaluation
Chapter 4.4.1 --- Accuracy versus Runtime
Chapter 4.4.2 --- Precision versus Recall
Chapter 4.4.3 --- Overall Runtime
Chapter 4.4.4 --- Starting Up Evaluation
Chapter 5 --- Conclusion and Future Work
Chapter 5.1 --- Conclusion
Chapter 5.2 --- Future Work
Chapter 5.2.1 --- Application of Wavelets on Biomedical Signals
Chapter 5.2.2 --- Moving Average Similarity
Chapter 5.2.3 --- Clusters-based Matching in Time Warping
Bibliography
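The thesis record above names Haar wavelets and a guarantee of no false dismissal without spelling them out. The following is a minimal sketch of the general idea, not the thesis's implementation: an orthonormal Haar transform preserves Euclidean distance, so a distance computed on only the first k coefficients can never exceed the true distance. The function names `haar_transform`, `euclidean` and `reduced_distance` are hypothetical and chosen for illustration.

```python
import math


def haar_transform(series):
    """Orthonormal Haar transform; the length must be a power of two."""
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(x) for x in series]
    details = []
    # Repeatedly replace the sequence by scaled pairwise sums and keep
    # the scaled pairwise differences as detail coefficients.
    while len(coeffs) > 1:
        pairs = range(0, len(coeffs), 2)
        sums = [(coeffs[i] + coeffs[i + 1]) / math.sqrt(2) for i in pairs]
        diffs = [(coeffs[i] - coeffs[i + 1]) / math.sqrt(2) for i in pairs]
        details = diffs + details  # coarser details end up in front
        coeffs = sums
    return coeffs + details


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reduced_distance(a, b, k):
    """Distance using only the first k Haar coefficients (assumed index key)."""
    return euclidean(haar_transform(a)[:k], haar_transform(b)[:k])
```

Because the reduced distance is a lower bound, a range query that filters candidates with `reduced_distance(query, candidate, k) <= radius` keeps every true match; the surviving candidates then need a check against the full distance to discard false alarms.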
De Novo Assembly of Nucleotide Sequences in a Compressed Feature Space
Sequencing technologies allow for an in-depth analysis
of biological species, but the size of the generated datasets
introduces a number of analytical challenges. Recently, we
demonstrated the application of numerical sequence representations
and data transformations for the alignment of short
reads to a reference genome. Here, we extend our approach
to de novo assembly of short reads. Our results demonstrate
that highly compressed data can encapsulate the signal sufficiently
to accurately assemble reads into large contigs or complete
genomes.
Generic Subsequence Matching Framework: Modularity, Flexibility, Efficiency
Subsequence matching appears to be an ideal approach for solving many
problems related to the fields of data mining and similarity retrieval. It has
been shown that almost any data class (audio, image, biometrics, signals) is or
can be represented by some kind of time series or string of symbols, which can
be seen as an input for various subsequence matching approaches. The variety of
data types, specific tasks and their partial or full solutions is so wide that
the choice, implementation and parametrization of a suitable solution for a
given task might be complicated and time-consuming; a possibly fruitful
combination of fragments from different research areas may be neither obvious nor
easy to realize. The leading authors of this field also mention the
implementation bias that makes a proper comparison of competing
approaches difficult. Therefore we present a new generic Subsequence Matching Framework
(SMF) that tries to overcome the aforementioned problems by a uniform frame
that simplifies and speeds up the design, development and evaluation of
systems related to subsequence matching. We identify several relatively separate
subtasks that are solved differently across the literature, and SMF makes it possible to combine them
in a straightforward manner, achieving new quality and efficiency. This framework
can be used in many application domains and its components can be reused
effectively. Its strictly modular architecture and openness also enable the
integration of efficient solutions from different fields, for instance
efficient metric-based indexes. This is an extended version of a paper
published at DEXA 2012.
The Extended Edit Distance Metric
Similarity search is an important problem in information retrieval. This
similarity is based on a distance. Symbolic representation of time series has
attracted many researchers recently, since it reduces the dimensionality of
these high-dimensional data objects. We propose a new distance metric that is
applied to symbolic data objects and we test it on time series databases in a
classification task. We compare it to other distances that are well known in
the literature for symbolic data objects. We also prove, mathematically, that
our distance is a metric. Comment: Technical report
Piecewise Linear Representation Segmentation as a Multiobjective Optimization Problem
Proceedings of: Fourth International Workshop on User-Centric Technologies and applications (CONTEXTS 2010). Valencia, September 7-10, 2010.
Real-world time series contain huge amounts of data, and processing them directly requires an unaffordable computational load, which motivates approximate representations. Segmentation addresses this issue by dividing a time series into a certain number of segments and approximating each segment with a basic function. Among the most widespread segmentation approaches, piecewise linear representation stands out for its simplicity. This work presents an approach based on formalizing the segmentation process as a multiobjective optimization problem and solving that problem with an evolutionary algorithm.
This work was supported in part by Projects CICYT TIN2008-06742-C02-02/TSI, CICYT TEC2008-06732-C02-02/TEC, CAM CONTEXTS (S2009/TIC-1485) and DPS2008-07029-C02-02.
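As a rough illustration of the segmentation idea in this abstract, the sketch below scores one candidate piecewise linear segmentation on two competing objectives. The abstract does not say which objectives the authors optimize, so the pair used here (number of segments and sum of squared errors) is an assumption, as are the names `plr_error` and `objectives`. The evolutionary search itself is not shown.

```python
def plr_error(series, breakpoints):
    """Sum of squared errors when each segment is replaced by the
    straight line joining its two endpoints.

    `breakpoints` is a strictly increasing list of indices that starts
    at 0 and ends at len(series) - 1.
    """
    total = 0.0
    for start, end in zip(breakpoints, breakpoints[1:]):
        span = end - start
        rise = series[end] - series[start]
        for i in range(start, end + 1):
            # Linear interpolation between the segment endpoints.
            fitted = series[start] + rise * (i - start) / span
            total += (series[i] - fitted) ** 2
    return total


def objectives(series, breakpoints):
    """Two objectives to minimize (assumed): segment count and error."""
    return len(breakpoints) - 1, plr_error(series, breakpoints)
```

For the triangular series `[0, 1, 2, 3, 2, 1, 0]`, the single segment `[0, 6]` gives `(1, 19.0)` while the two segments `[0, 3, 6]` give `(2, 0.0)`. Neither candidate dominates the other, which is the kind of trade-off a multiobjective formulation has to resolve.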
DRSP : Dimension Reduction For Similarity Matching And Pruning Of Time Series Data Streams
Similarity matching and joining of time series data streams have gained considerable
relevance now that large volumes of streaming data are common. This process is
widely applied in location tracking, sensor networks,
object positioning and monitoring, to name a few. However, as the size of the
data stream increases, the cost involved to retain all the data in order to aid
the process of similarity matching also increases. We develop a novel framework
to address the following objectives. Firstly, dimension reduction is
performed in the preprocessing stage, where large stream data is segmented and
reduced into a compact representation such that it retains all the crucial
information by a technique called Multi-level Segment Means (MSM). This reduces
the space complexity associated with the storage of large time-series data
streams. Secondly, it incorporates an effective similarity matching technique to
analyze whether new data objects are symmetric to the existing data stream.
Finally, a pruning technique filters out the pseudo data object pairs
and joins only the relevant pairs. The computational cost for MSM is O(l*ni) and
the cost for pruning is O(DRF*wsize*d), where DRF is the Dimension Reduction
Factor. We have performed exhaustive experimental trials to show that the
proposed framework is both efficient and competent in comparison with earlier
works. Comment: 20 pages, 8 figures, 6 tables
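The abstract names Multi-level Segment Means (MSM) but does not define it. The sketch below assumes one common formulation, in which level j (counting from 0) splits the series into 2^j equal segments and stores the mean of each. That layout and the function names `segment_means` and `multi_level_segment_means` are assumptions for illustration, not the paper's definition.

```python
def segment_means(series, num_segments):
    """Mean of each of `num_segments` equal-length, contiguous segments."""
    n = len(series)
    if num_segments <= 0 or n % num_segments:
        raise ValueError("series length must be a multiple of num_segments")
    size = n // num_segments
    return [sum(series[i:i + size]) / size for i in range(0, n, size)]


def multi_level_segment_means(series, levels):
    """Assumed layout: level j holds the means of 2**j equal segments."""
    return [segment_means(series, 2 ** level) for level in range(levels)]
```

With `series = [1, 3, 5, 7, 2, 4, 6, 8]` and `levels = 3`, the result is `[[4.5], [4.0, 5.0], [2.0, 6.0, 3.0, 7.0]]`. Coarse levels are cheap to store and compare, and finer levels can be consulted only for pairs that the coarse levels fail to prune.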