Fast Subsequence Matching in Time-Series Databases
We present an efficient indexing method to locate 1-dimensional
subsequences within a collection of sequences, such that the
subsequences match a given (query) pattern within a specified tolerance.
The idea is to map each data sequence into a small set of
multidimensional rectangles in feature space.
Then, these rectangles can be readily indexed using traditional
spatial access methods, like the R*-tree \cite{Beckmann90R}.
In more detail, we use a sliding window over the data sequence
and extract its features; the result is a trail in feature space.
We propose an efficient and effective algorithm to divide such trails
into sub-trails, which are subsequently represented by their
Minimum Bounding Rectangles (MBRs). We also examine queries of
varying lengths, and we show how to handle each case efficiently.
We implemented our method and carried out
experiments on synthetic and real data (stock price movements).
We compared the method to sequential scanning,
which is the only obvious competitor. The results were excellent:
our method sped up the search by a factor of 3 to 100.
Appeared in ACM SIGMOD 1994, pp. 419-429. Given the "Best Paper" award.
(Also cross-referenced as UMIACS-TR-93-131)
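As a rough illustration of the sliding-window step described in the abstract above, the following Python sketch maps each window to a few DFT coefficients (giving a trail in feature space) and bounds consecutive trail points with rectangles. The choice of DFT features, the function names, and the fixed-size grouping of points are assumptions made for illustration; the paper proposes its own algorithm for dividing trails into sub-trails, which is not reproduced here.

```python
import cmath
import math


def dft_features(window, num_coeffs=2):
    """First `num_coeffs` DFT coefficients (orthonormal scaling) as a flat real vector."""
    n = len(window)
    feats = []
    for k in range(num_coeffs):
        c = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window))
        c /= math.sqrt(n)
        feats.extend([c.real, c.imag])
    return feats


def trail(sequence, window_len, num_coeffs=2):
    """Slide a window over the sequence; each offset yields one point in feature space."""
    return [dft_features(sequence[i:i + window_len], num_coeffs)
            for i in range(len(sequence) - window_len + 1)]


def mbrs(points, points_per_box):
    """Bound each run of consecutive trail points by its low and high corners.

    Fixed-size runs are a simplification (assumption), not the paper's heuristic.
    """
    boxes = []
    for start in range(0, len(points), points_per_box):
        chunk = points[start:start + points_per_box]
        low = [min(col) for col in zip(*chunk)]
        high = [max(col) for col in zip(*chunk)]
        boxes.append((start, low, high))
    return boxes


# Synthetic series used only for demonstration.
series = [math.sin(0.3 * t) + 0.1 * t for t in range(40)]
points = trail(series, window_len=8)
boxes = mbrs(points, points_per_box=5)
```

Each `(start, low, high)` tuple could then be inserted into a spatial access method such as an R*-tree; that indexing step is omitted from this sketch.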
Learning the Number of Autoregressive Mixtures in Time Series Using the Gap Statistics
Using a proper model to characterize a time series is crucial in making
accurate predictions. In this work we use a time-varying autoregressive (TVAR)
process to describe a non-stationary time series and model it as a mixture of
multiple stable autoregressive (AR) processes. We introduce a new model
selection technique based on Gap statistics to learn the appropriate number of
AR filters needed to model a time series. We define a new distance measure
between stable AR filters and draw a reference curve that is used to measure
how much adding a new AR filter improves the performance of the model, and then
choose the number of AR filters that has the maximum gap with the reference
curve. To that end, we propose a new method to generate uniformly random
stable AR filters in the root domain. Numerical results are provided demonstrating
the performance of the proposed approach.
Comment: This paper has been accepted by 2015 IEEE International Conference on
Data Minin
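One way to picture "random stable AR filters in the root domain" is to draw the roots of the characteristic polynomial inside the unit circle and then expand them into AR coefficients. The Python sketch below does this under stated assumptions: "uniform" is taken to mean uniform over the area of the unit disk for each conjugate pair, and the function names are hypothetical. The abstract does not give the paper's sampling scheme, so this is not a reproduction of it.

```python
import cmath
import math
import random


def random_stable_roots(order, rng):
    """Draw `order` roots strictly inside the unit circle, complex ones in conjugate pairs."""
    roots = []
    for _ in range(order // 2):
        radius = math.sqrt(rng.random())   # sqrt gives uniform density over disk area
        angle = rng.uniform(0.0, math.pi)  # upper half plane; the conjugate covers the lower
        z = cmath.rect(radius, angle)
        roots.extend([z, z.conjugate()])
    if order % 2:
        # Odd order: one extra real root, redrawn in the unlikely case it hits +/-1.
        real_root = rng.uniform(-1.0, 1.0)
        while abs(real_root) >= 1.0:
            real_root = rng.uniform(-1.0, 1.0)
        roots.append(complex(real_root, 0.0))
    return roots


def ar_coefficients(roots):
    """Expand prod(z - r_i) and return a_1..a_p for x_t = sum_i a_i * x_{t-i} + e_t."""
    poly = [1 + 0j]  # polynomial coefficients, highest degree first
    for r in roots:
        poly = [a - r * b for a, b in zip(poly + [0], [0] + poly)]
    # Characteristic polynomial is z^p - a_1 z^(p-1) - ... - a_p.
    return [-c.real for c in poly[1:]]


rng = random.Random(0)
roots = random_stable_roots(5, rng)
coeffs = ar_coefficients(roots)
```

Because every root has modulus below one, the resulting AR filter is stable by construction, which avoids rejection sampling in coefficient space.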
Efficient Retrieval of Similar Time Sequences Using DFT
We propose an improvement of the known DFT-based indexing technique for fast
retrieval of similar time sequences. We use the last few Fourier coefficients
in the distance computation without storing them in the index since every
coefficient at the end is the complex conjugate of a coefficient at the
beginning and as strong as its counterpart. We show analytically that this
observation can accelerate the search time of the index by more than a factor
of two. This result was confirmed by our experiments, which were carried out on
real stock prices and synthetic data.
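The conjugate-symmetry observation in the abstract above can be checked numerically. For a real sequence of length n, the DFT coefficient at position n-k is the complex conjugate of the one at position k, so each stored leading coefficient can be counted twice when bounding the Euclidean distance from below. The Python sketch below is a minimal illustration with hypothetical function names and synthetic data; it shows the tighter bound only, not the authors' index structure.

```python
import cmath
import math


def dft(x):
    """Orthonormal DFT, so Euclidean distances are preserved (Parseval)."""
    n = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
            / math.sqrt(n) for k in range(n)]


def lower_bounds(x, y, m):
    """Lower bounds on the distance between x and y from coefficients 0..m.

    Returns (plain, symmetric): `plain` uses each coefficient once,
    `symmetric` also credits the mirrored coefficients n-m..n-1.
    """
    n = len(x)
    if not (0 < m and 2 * m < n):
        raise ValueError("need 0 < m < n/2 so mirrored coefficients do not overlap")
    diff = [a - b for a, b in zip(dft(x), dft(y))]
    dc = abs(diff[0]) ** 2
    head = sum(abs(c) ** 2 for c in diff[1:m + 1])
    return math.sqrt(dc + head), math.sqrt(dc + 2 * head)


# Synthetic real-valued sequences used only for demonstration.
x = [math.sin(0.4 * t) for t in range(16)]
y = [math.sin(0.4 * t + 0.5) + 0.05 * t for t in range(16)]
plain, symmetric = lower_bounds(x, y, m=3)
true_distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

Both values never exceed the true distance, so neither causes false dismissals; the symmetric bound is the larger of the two and therefore prunes more candidates.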
Generic Subsequence Matching Framework: Modularity, Flexibility, Efficiency
Subsequence matching has proven to be an ideal approach for solving many
problems related to the fields of data mining and similarity retrieval. It has
been shown that almost any data class (audio, image, biometrics, signals) is or
can be represented by some kind of time series or string of symbols, which can
be seen as an input for various subsequence matching approaches. The variety of
data types, specific tasks and their partial or full solutions is so wide that
the choice, implementation and parametrization of a suitable solution for a
given task might be complicated and time-consuming; a possibly fruitful
combination of fragments from different research areas may be neither obvious nor
easy to realize. The leading authors of this field also mention the
implementation bias that makes a proper comparison of competing approaches
difficult. Therefore we present a new generic Subsequence Matching Framework
(SMF) that tries to overcome the aforementioned problems by a uniform frame
that simplifies and speeds up the design, development and evaluation of
subsequence matching related systems. We identify several relatively separate
subtasks solved differently across the literature, and SMF makes it possible to
combine them in a straightforward manner, achieving new quality and efficiency.
This framework can be used in many application domains and its components can be
reused effectively. Its strictly modular architecture and openness also enable the
incorporation of efficient solutions from different fields, for instance
efficient metric-based indexes. This is an extended version of a paper
published at DEXA 2012.
Detection of Email Virus "Worms" Using Wavelet Analysis of DNS Query Flows
Investigations into the detection of mail worms using wavelet transforms are presented. The article takes a step towards a better understanding of email worms and studies their impact on the characteristics of Domain Name System (DNS) query flows generated by user machines. Wavelet analysis, namely the discrete and continuous wavelet transforms, statistical clustering algorithms, numerical methods and other methods of mathematical analysis were used for modeling and experimental calculations.
Key words: mail worms; DNS requests; discrete wavelet transformation; Haar wavelet; data compression.
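To make the Haar wavelet mentioned in the key words concrete, the Python sketch below applies a discrete Haar transform to a short series of per-interval DNS query counts. The counts are hypothetical, the function names are illustrative, and the abstract does not describe the article's actual processing pipeline, so this only shows how a sudden burst surfaces in the detail coefficients.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: pairwise averages and differences."""
    scale = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_dwt(signal, levels):
    """Multi-level Haar DWT; returns the final approximation and details, finest first."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Hypothetical DNS queries per time interval, with one burst in the seventh interval.
counts = [10, 11, 10, 12, 11, 10, 60, 12]
approx, details = haar_dwt(counts, levels=2)

# Position of the largest finest-scale detail coefficient.
finest = details[0]
burst_pair = max(range(len(finest)), key=lambda i: abs(finest[i]))
```

Here `burst_pair` indexes the pair of intervals whose counts differ most, which is where the burst sits in this toy series.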