9,042 research outputs found
Self-organising symbolic aggregate approximation for real-time fault detection and diagnosis in transient dynamic systems
The development of accurate fault detection and diagnosis (FDD) techniques is an important aspect of monitoring system health, whether the system is an industrial machine or a human one. In FDD systems where real-time or mobile monitoring is required, computational overhead must be minimised while maintaining detection and diagnosis accuracy. Symbolic Aggregate Approximation (SAX) is one such method, whereby reduced representations of signals are converted into symbolic representations for similarity search. Data reduction is achieved through the Piecewise Aggregate Approximation (PAA) algorithm. However, this can lead to the loss of key characteristics of the signal, resulting in misclassification of signal types and a high risk of false alarms. This paper proposes a novel SAX-based methodology for generating more accurate symbolic representations, called Self-Organising Symbolic Aggregate Approximation (SOSAX). Data reduction is achieved through an optimised PAA algorithm, Self-Organising Piecewise Aggregate Approximation (SOPAA). The approach is validated through the classification of electrocardiogram (ECG) signals, where it is shown to outperform standard SAX in terms of inter-class separation and intra-class distance of signal types.
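The SOSAX/SOPAA optimisation itself is not detailed in the abstract, but the standard PAA-plus-SAX pipeline it builds on is well documented. A minimal sketch of classic SAX (z-normalisation, PAA reduction, then discretisation against Gaussian breakpoints), assuming an alphabet of size 4:

```python
import numpy as np

def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of equal-width segments."""
    parts = np.array_split(np.asarray(series, dtype=float), n_segments)
    return np.array([p.mean() for p in parts])

def sax(series, n_segments, alphabet="abcd"):
    """Classic SAX: z-normalise, reduce with PAA, then map each segment
    mean to a symbol using breakpoints that divide N(0,1) into
    equiprobable regions (here hard-coded for alphabet size 4)."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()
    reduced = paa(x, n_segments)
    breakpoints = np.array([-0.6745, 0.0, 0.6745])  # N(0,1) quartiles
    return "".join(alphabet[i] for i in np.digitize(reduced, breakpoints))
```

A monotonically increasing series of 16 points reduced to a 4-symbol word yields `"abcd"`, one symbol per quartile of the normalised range.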
Generic Subsequence Matching Framework: Modularity, Flexibility, Efficiency
Subsequence matching has proven to be an effective approach for solving many problems in the fields of data mining and similarity retrieval. It has been shown that almost any data class (audio, image, biometrics, signals) is, or can be, represented by some kind of time series or string of symbols, which can serve as input for various subsequence matching approaches. The variety of data types, specific tasks, and their partial or full solutions is so wide that choosing, implementing, and parametrising a suitable solution for a given task can be complicated and time-consuming; a potentially fruitful combination of fragments from different research areas may be neither obvious nor easy to realise. The leading authors in this field also mention the implementation bias that makes a proper comparison of competing approaches difficult. We therefore present a new generic Subsequence Matching Framework (SMF) that aims to overcome these problems through a uniform framework that simplifies and speeds up the design, development, and evaluation of subsequence matching systems. We identify several relatively separate subtasks that are solved differently across the literature; SMF enables them to be combined in a straightforward manner, achieving new levels of quality and efficiency. The framework can be used in many application domains and its components can be reused effectively. Its strictly modular architecture and openness also enable the incorporation of efficient solutions from other fields, for instance efficient metric-based indexes.
This is an extended version of a paper published at DEXA 2012.
DRSP : Dimension Reduction For Similarity Matching And Pruning Of Time Series Data Streams
Similarity matching and join of time series data streams has gained considerable relevance in today's world of large-scale streaming data. This process finds wide application in areas such as location tracking, sensor networks, and object positioning and monitoring. However, as the size of the data stream increases, so does the cost of retaining all the data needed for similarity matching. We develop a novel framework that addresses the following objectives. First, dimension reduction is performed in the preprocessing stage, where large stream data is segmented and reduced into a compact representation that retains all the crucial information, using a technique called Multi-level Segment Means (MSM). This reduces the space complexity associated with storing large time-series data streams. Second, the framework incorporates an effective similarity matching technique to analyse whether new data objects are similar to the existing data stream. Finally, a pruning technique filters out spurious data-object pairs and joins only the relevant pairs. The computational cost of MSM is O(l*ni) and the cost of pruning is O(DRF*wsize*d), where DRF is the Dimension Reduction Factor. Exhaustive experimental trials show that the proposed framework is both efficient and competitive with earlier work.
Comment: 20 pages, 8 figures, 6 tables
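The abstract does not specify the internals of MSM, but a segment-means reduction computed at several granularities can be sketched as follows; the multi-level structure here is an assumption for illustration, not the paper's exact procedure:

```python
import numpy as np

def segment_means(series, n_segments):
    """Mean of each of n_segments equal-width windows of the series."""
    parts = np.array_split(np.asarray(series, dtype=float), n_segments)
    return np.array([p.mean() for p in parts])

def multilevel_segment_means(series, levels):
    """Hypothetical multi-level reduction: successively coarser
    segment-mean summaries of the same stream window. The actual MSM
    algorithm is not detailed in the abstract."""
    return [segment_means(series, n) for n in levels]
```

Coarser levels can then serve as cheap pre-filters before comparing finer representations, which is the usual motivation for multi-resolution summaries in stream joins.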
Modifying the Symbolic Aggregate Approximation Method to Capture Segment Trend Information
The Symbolic Aggregate approXimation (SAX) is a very popular symbolic dimensionality reduction technique for time series data, as it has several advantages over other dimensionality reduction techniques. One of its major advantages is its efficiency, since it uses precomputed distances. Another is that the distance measure SAX defines on the reduced space lower-bounds the distance measure defined on the original space, which enables SAX to return exact results in query-by-content tasks. Yet SAX has an inherent drawback: its inability to capture segment trend information. Several researchers have attempted to enhance SAX with modifications that include trend information, but at the expense of giving up one or more of its advantages. In this paper we investigate three modifications of SAX that add trend-capturing ability. These modifications retain the features of SAX in terms of simplicity and efficiency, as well as the exact results it returns. They are simple procedures based on a different segmentation of the time series than that used in classic-SAX. We test the performance of these three modifications on 45 time series datasets of different sizes, dimensions, and nature on a classification task, and compare it to that of classic-SAX. The results show that one of the modifications outperforms classic-SAX, and another gives slightly better results than classic-SAX.
Comment: International Conference on Modeling Decisions for Artificial Intelligence (MDAI 2020), pp. 230-23
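The lower-bounding property mentioned above comes from the standard SAX MINDIST function, which can be sketched as follows for an alphabet of size 4 (the breakpoints are the N(0,1) quartiles; symbols that are equal or adjacent contribute zero distance):

```python
import numpy as np

BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])  # N(0,1) quartiles, alphabet "abcd"

def symbol_dist(a, b, alphabet="abcd"):
    """Lookup-style distance between two SAX symbols: zero for equal or
    adjacent symbols, otherwise the gap between the breakpoints that
    separate them."""
    r, c = alphabet.index(a), alphabet.index(b)
    if abs(r - c) <= 1:
        return 0.0
    return BREAKPOINTS[max(r, c) - 1] - BREAKPOINTS[min(r, c)]

def mindist(word1, word2, n):
    """SAX MINDIST: lower-bounds the Euclidean distance between the two
    original length-n series represented by the SAX words."""
    w = len(word1)
    return np.sqrt(n / w) * np.sqrt(sum(symbol_dist(a, b) ** 2
                                        for a, b in zip(word1, word2)))
```

Because MINDIST never exceeds the true Euclidean distance, pruning with it in query-by-content search discards no true matches, which is what makes the exact-result guarantee possible.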
Piecewise Trend Approximation: A Ratio-Based Time Series Representation
A time series representation, Piecewise Trend Approximation (PTA), is proposed to improve the efficiency of time series data mining in large, high-dimensional databases. PTA represents a time series in concise form while retaining its main trends; the dimensionality of the original data is therefore reduced while the key features are maintained. Unlike representations based on the original data space, PTA transforms the original data space into the feature space of ratios between consecutive data points in the original time series, whose sign and magnitude indicate the direction and degree of the local trend, respectively. Based on this ratio feature space, segmentation is performed such that every two adjacent segments have different trends, and each segment is then approximated by the ratio between its first and last points. To validate PTA, it is compared with the classical time series representations PAA and APCA on two classical datasets using the commonly used K-NN classification algorithm. PTA achieves 3.55% and 2.33% higher classification accuracy on the ControlChart dataset, and 8.94% and 7.07% higher on the Mixed-BagShapes dataset, respectively. This indicates that the proposed PTA is effective for high-dimensional time series data mining.
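The ratio feature space and segment approximation described above can be sketched as follows, assuming a positive-valued series; the trend-change segmentation rule here (split where the ratio crosses 1) is an illustrative assumption, since the abstract does not give the exact criterion:

```python
import numpy as np

def ratio_features(series):
    """Ratios between consecutive points: values above 1 indicate a
    rising local trend, below 1 a falling one, and the magnitude
    indicates the degree of change."""
    x = np.asarray(series, dtype=float)
    return x[1:] / x[:-1]

def pta(series):
    """Hypothetical PTA sketch: split wherever the local trend
    direction flips, then approximate each segment by the ratio of its
    last point to its first point."""
    x = np.asarray(series, dtype=float)
    r = ratio_features(x)
    rising = r >= 1.0
    cuts = ([0]
            + [i + 1 for i in range(1, len(r)) if rising[i] != rising[i - 1]]
            + [len(x) - 1])
    return [x[b] / x[a] for a, b in zip(cuts[:-1], cuts[1:])]
```

For a series that rises from 1 to 4 and falls back to 1, this yields one ratio per monotone run, e.g. `[2.0, 0.5]` for `[1, 2, 4, 2, 1]`.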
Similarity Measures and Dimensionality Reduction Techniques for Time Series Data Mining
The chapter is organized as follows. Section 2 introduces the similarity matching problem on time series. We note the importance of using efficient data structures to perform search and of choosing an adequate distance measure. Section 3 presents some of the most widely used distance measures for time series data mining. Section 4 reviews the aforementioned dimensionality reduction techniques.
Particle Swarm Optimization of Information-Content Weighting of Symbolic Aggregate Approximation
Bio-inspired optimization algorithms have been gaining popularity recently. One of the most important of these is particle swarm optimization (PSO). PSO is based on the collective intelligence of a swarm of particles. Each particle explores a part of the search space looking for the optimal position and adjusts its position according to two factors: the first is its own experience and the second is the collective experience of the whole swarm. PSO has been successfully used to solve many optimization problems. In this work we use PSO to improve the performance of a well-known time series representation method, the Symbolic Aggregate approXimation (SAX). As with other time series representation methods, SAX results in a loss of information when applied to represent time series. In this paper we use PSO to propose a new weighted minimum distance (WMD) for SAX to remedy this problem. Unlike the original minimum distance, the new distance assigns different weights to different segments of the time series according to their information content. This weighted minimum distance enhances the performance of SAX, as we show through experiments using different time series datasets.
Comment: The 8th International Conference on Advanced Data Mining and Applications (ADMA 2012)
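The weighted-distance idea can be sketched as a per-segment scaling of the standard SAX minimum distance; the weight vector here is a placeholder for whatever the PSO search would produce, and the alphabet-4 breakpoints are the usual N(0,1) quartiles:

```python
import numpy as np

BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])  # N(0,1) quartiles, alphabet "abcd"

def symbol_dist(a, b, alphabet="abcd"):
    """Standard SAX symbol distance: zero for equal/adjacent symbols."""
    r, c = alphabet.index(a), alphabet.index(b)
    if abs(r - c) <= 1:
        return 0.0
    return BREAKPOINTS[max(r, c) - 1] - BREAKPOINTS[min(r, c)]

def weighted_mindist(word1, word2, n, weights):
    """Weighted minimum distance sketch: each segment's squared symbol
    distance is scaled by a weight (assumed here to come from an
    optimiser such as PSO) before summing."""
    w = len(word1)
    total = sum(wt * symbol_dist(a, b) ** 2
                for wt, a, b in zip(weights, word1, word2))
    return np.sqrt(n / w) * np.sqrt(total)
```

With uniform weights this reduces to the ordinary MINDIST; informative segments would receive larger weights so that mismatches there count more in classification.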