5,989 research outputs found

    One-Step or Two-Step Optimization and the Overfitting Phenomenon: A Case Study on Time Series Classification

    For the last few decades, optimization has been developing at a fast rate. Bio-inspired optimization algorithms are metaheuristics inspired by nature. These algorithms have been applied to solve different problems in engineering, economics, and other domains. Bio-inspired algorithms have also been applied in different branches of information technology such as networking and software engineering. Time series data mining is a field of information technology that has its share of these applications too. In previous works we showed how bio-inspired algorithms such as genetic algorithms and differential evolution can be used to find the locations of the breakpoints used in the symbolic aggregate approximation representation of time series, and in another work we showed how particle swarm optimization, one of the best-known bio-inspired algorithms, can be used to set weights for the different segments in the symbolic aggregate approximation representation. In this paper we present, in two different approaches, a new meta-optimization process that produces optimal locations of the breakpoints in addition to optimal weights of the segments. The time series classification experiments that we conducted show an interesting example of how the overfitting phenomenon, a frequently encountered problem in data mining which happens when the model overfits the training set, can interfere with the optimization process and hide the superior performance of an optimization algorithm.

    Temporal Feature Selection with Symbolic Regression

    Building and discovering useful features when constructing machine learning models is the central task for the machine learning practitioner. Good features are useful not only in increasing the predictive power of a model but also in illuminating the underlying drivers of a target variable. In this research we propose a novel feature learning technique in which Symbolic regression is endowed with a "Range Terminal" that allows it to explore functions of the aggregate of variables over time. We test the Range Terminal on a synthetic data set and a real-world data set in which we predict seasonal greenness using satellite-derived temperature and snow data over a portion of the Arctic. On the synthetic data set we find Symbolic regression with the Range Terminal outperforms standard Symbolic regression and Lasso regression. On the Arctic data set we find it outperforms standard Symbolic regression, fails to beat the Lasso regression, but finds useful features describing the interaction between Land Surface Temperature, Snow, and seasonal vegetative growth in the Arctic.

    Particle Swarm Optimization of Information-Content Weighting of Symbolic Aggregate Approximation

    Bio-inspired optimization algorithms have been gaining more popularity recently. One of the most important of these algorithms is particle swarm optimization (PSO). PSO is based on the collective intelligence of a swarm of particles. Each particle explores a part of the search space looking for the optimal position and adjusts its position according to two factors: the first is its own experience and the second is the collective experience of the whole swarm. PSO has been successfully used to solve many optimization problems. In this work we use PSO to improve the performance of a well-known representation method of time series data, the symbolic aggregate approximation (SAX). As with other time series representation methods, SAX results in loss of information when applied to represent time series. In this paper we use PSO to propose a new minimum distance, WMD, for SAX to remedy this problem. Unlike the original minimum distance, the new distance sets different weights to different segments of the time series according to their information content. This weighted minimum distance enhances the performance of SAX, as we show through experiments using different time series datasets. Comment: The 8th International Conference on Advanced Data Mining and Applications (ADMA 2012)
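
The particle update described in this abstract (each particle moves according to its own best position and the swarm's best position) can be illustrated with a minimal, generic PSO sketch. It is not the authors' WMD procedure: the function name `pso_minimize`, the parameter values (`inertia`, `c_personal`, `c_social`), and the `sphere` test objective are assumptions chosen only for illustration.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_social=1.5, seed=0):
    """Minimise `objective` over a box with a basic global-best PSO (illustrative)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position (its "own experience").
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    # The swarm remembers the best position found by any particle.
    g = min(range(n_particles), key=best_val.__getitem__)
    swarm_pos, swarm_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (swarm_pos[d] - pos[i][d]))
                # Clamp the new position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < swarm_val:
                    swarm_val, swarm_pos = val, pos[i][:]
    return swarm_pos, swarm_val


def sphere(x):
    """Toy objective (assumed for the demo): sum of squares, minimum at the origin."""
    return sum(v * v for v in x)


best_x, best_f = pso_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
```

In a setting like the one the abstract describes, the position vector would presumably hold one weight per SAX segment and the objective would be an error measure on training data; that mapping is an assumption here, not something the abstract specifies.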

    Self-organising symbolic aggregate approximation for real-time fault detection and diagnosis in transient dynamic systems

    The development of accurate fault detection and diagnosis (FDD) techniques is an important aspect of monitoring system health, whether it be an industrial machine or human system. In FDD systems where real-time or mobile monitoring is required there is a need to minimise computational overhead whilst maintaining detection and diagnosis accuracy. Symbolic Aggregate Approximation (SAX) is one such method, whereby reduced representations of signals are used to create symbolic representations for similarity search. Data reduction is achieved through application of the Piecewise Aggregate Approximation (PAA) algorithm. However, this can often lead to the loss of key information characteristics, resulting in misclassification of signal types and a high risk of false alarms. This paper proposes a novel methodology based on SAX for generating more accurate symbolic representations, called Self-Organising Symbolic Aggregate Approximation (SOSAX). Data reduction is achieved through the application of an optimised PAA algorithm, Self-Organising Piecewise Aggregate Approximation (SOPAA). The approach is validated through the classification of electrocardiogram (ECG) signals, where it is shown to outperform standard SAX in terms of inter-class separation and intra-class distance of signal types.
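
As a rough illustration of the standard SAX pipeline this abstract builds on (not the SOSAX/SOPAA variants it proposes), the sketch below z-normalises a series, reduces it with equal-width PAA, and maps each segment mean to a letter. The helper names and the choice of a four-letter alphabet are assumptions; the breakpoints are the approximate quartiles of the standard normal distribution commonly used for that alphabet size.

```python
import bisect
import statistics

# Approximate standard-normal quartiles: breakpoints for a 4-letter alphabet.
BREAKPOINTS_4 = (-0.6745, 0.0, 0.6745)


def znormalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return [0.0] * len(series)
    return [(v - mean) / std for v in series]


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-width segment."""
    n = len(series)
    if n % n_segments != 0:
        raise ValueError("this sketch needs len(series) divisible by n_segments")
    width = n // n_segments
    return [statistics.fmean(series[i:i + width]) for i in range(0, n, width)]


def sax_word(series, n_segments, breakpoints=BREAKPOINTS_4):
    """Convert a numeric series to a SAX word over the letters a, b, c, ..."""
    means = paa(znormalize(series), n_segments)
    # bisect_right counts the breakpoints <= mean, which is the symbol index.
    return "".join(chr(ord("a") + bisect.bisect_right(breakpoints, m))
                   for m in means)


word = sax_word([0, 0, 1, 1, 2, 2, 3, 3], n_segments=4)
```

Averaging within fixed-width segments is where the information loss mentioned in the abstract can arise: any feature narrower than a segment is smoothed into its mean.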

    Modelling Medical Time Series Using Grammar-Guided Genetic Programming

    The analysis of time series is extremely important in the field of medicine, because this is the format of many medical data types. Most of the approaches that address this problem are based on numerical algorithms that calculate distances, clusters, reference models, etc. However, a symbolic rather than numerical analysis is sometimes needed to search for the characteristics of time series. Symbolic information helps users to efficiently analyse and compare time series in the same or in a similar way as a domain expert would. This paper describes the definition of the symbolic domain, the process of converting numerical into symbolic time series, and a distance for comparing symbolic temporal sequences. Then, the paper focuses on a method to create the symbolic reference model for a certain population using grammar-guided genetic programming. The work is applied to the isokinetics domain within an application called I4.

    Representation and Analysis of Multi-Modal, Nonuniform Time Series Data: An Application to Survival Prognosis of Oncology Patients in an Outpatient Setting

    The representation of nonuniform, multi-modal, time-limited time series data is complex and is explored through the use of discrete representation, dimensionality reduction with segmentation-based techniques, and behavioral representation approaches. These explorations are done with a focus on an outpatient oncology setting, with classification and regression analysis being used for length of survival prognosis. The representation and analysis decisions are not independent: each choice of method has implications for how the data are represented and which analysis technique is then used. One unique aspect of the work is the use of outpatient clinical data for patients, which was explored initially through discrete sampling and behavioral representation. The length of survival was evaluated initially with both classification and regression methods. The first conclusion was that including more discrete samples in the model showed no statistical benefit, while the addition of behavioral approaches did improve the prognostic accuracy. Following this result, Piecewise Aggregate Approximation was adapted to accommodate the multi-modal time series of the outpatient clinical data and evaluated with the regression methodologies. This representation approach showed promise due to its simplicity but had decreased performance in the length of survival prognosis compared with the behavioral representation and discrete samples approaches. As a solution, a new representation approach was developed that incorporates a genetic algorithm to select the window boundaries of the Piecewise Aggregate Approximation method. This selection is based on the fraction of the Piecewise Aggregate Approximation windows that contain values other than zero. In some cases the new representation improved performance, with a 20% reduction in median relative error.
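
The selection criterion mentioned at the end of this abstract, the fraction of PAA windows that contain values other than zero, can be sketched as a score for one candidate set of window boundaries. The function name, the boundary encoding (strictly increasing interior cut indices), and the toy series are assumptions made for illustration; the genetic algorithm itself and the thesis's actual fitness definition are not reproduced here.

```python
def nonzero_window_fraction(series, boundaries):
    """Fraction of windows, split at `boundaries`, holding any non-zero value.

    `boundaries` are interior cut indices, strictly increasing and within
    (0, len(series)); this encoding is an assumption of the sketch.
    """
    edges = [0, *boundaries, len(series)]
    if any(a >= b for a, b in zip(edges, edges[1:])):
        raise ValueError("boundaries must be strictly increasing interior indices")
    windows = [series[a:b] for a, b in zip(edges, edges[1:])]
    nonzero = sum(1 for w in windows if any(v != 0 for v in w))
    return nonzero / len(windows)


# Sparse toy series: most samples are zero, as in irregularly sampled records.
toy = [0, 0, 0, 5, 0, 0, 2, 0]
even_split = nonzero_window_fraction(toy, [2, 4, 6])   # windows of width 2
uneven_split = nonzero_window_fraction(toy, [3, 6])    # widths 3, 3, 2
```

A genetic algorithm could treat the boundary list as a chromosome and use a score of this kind as part of its fitness, so that window edges shift toward where the data actually occur.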