600 research outputs found

    One-Step or Two-Step Optimization and the Overfitting Phenomenon: A Case Study on Time Series Classification

    For the last few decades, optimization has been developing at a fast rate. Bio-inspired optimization algorithms are metaheuristics inspired by nature. These algorithms have been applied to solve different problems in engineering, economics, and other domains. Bio-inspired algorithms have also been applied in different branches of information technology such as networking and software engineering. Time series data mining is a field of information technology that has its share of these applications too. In previous works we showed how bio-inspired algorithms such as genetic algorithms and differential evolution can be used to find the locations of the breakpoints used in the symbolic aggregate approximation representation of time series, and in another work we showed how particle swarm optimization, a well-known bio-inspired algorithm, can be used to assign weights to the different segments in the symbolic aggregate approximation representation. In this paper we present, in two different approaches, a new meta-optimization process that produces optimal locations of the breakpoints in addition to optimal weights of the segments. The time series classification experiments that we conducted show an interesting example of how the overfitting phenomenon, a frequently encountered problem in data mining that occurs when the model overfits the training set, can interfere with the optimization process and hide the superior performance of an optimization algorithm.

    Self-organising symbolic aggregate approximation for real-time fault detection and diagnosis in transient dynamic systems

    The development of accurate fault detection and diagnosis (FDD) techniques is an important aspect of monitoring system health, whether it be an industrial machine or human system. In FDD systems where real-time or mobile monitoring is required, there is a need to minimise computational overhead whilst maintaining detection and diagnosis accuracy. Symbolic Aggregate Approximation (SAX) is one such method, whereby reduced representations of signals are used to create symbolic representations for similarity search. Data reduction is achieved through application of the Piecewise Aggregate Approximation (PAA) algorithm. However, this can often lead to the loss of key information characteristics, resulting in misclassification of signal types and a high risk of false alarms. This paper proposes a novel methodology based on SAX for generating more accurate symbolic representations, called Self-Organising Symbolic Aggregate Approximation (SOSAX). Data reduction is achieved through the application of an optimised PAA algorithm, Self-Organising Piecewise Aggregate Approximation (SOPAA). The approach is validated through the classification of electrocardiogram (ECG) signals, where it is shown to outperform standard SAX in terms of inter-class separation and intra-class distance of signal types.
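
As a rough illustration of the standard SAX pipeline the abstract builds on (PAA reduction followed by symbol assignment), here is a minimal Python sketch. It is not the SOSAX/SOPAA method proposed in the paper. The function names `paa` and `sax`, the default alphabet, and the requirement that the series length be divisible by the number of segments are assumptions made for this example; the breakpoints are the usual equiprobable cut points of a standard normal distribution.

```python
from statistics import NormalDist

import numpy as np


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment.

    Assumes len(series) is divisible by n_segments (a simplification).
    """
    x = np.asarray(series, dtype=float)
    return x.reshape(n_segments, -1).mean(axis=1)


def sax(series, n_segments, alphabet="abcd"):
    """Convert a series into a SAX word (hypothetical helper for illustration)."""
    x = np.asarray(series, dtype=float)
    std = x.std()
    # z-normalise; a constant series is only centred to avoid division by zero
    z = (x - x.mean()) / std if std > 0 else x - x.mean()
    k = len(alphabet)
    # Breakpoints split the standard normal into k equiprobable regions
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    # Each PAA mean maps to the index of the region it falls into
    indices = np.searchsorted(breakpoints, paa(z, n_segments))
    return "".join(alphabet[i] for i in indices)
```

Calling `sax(signal, n_segments=4)` on a signal of length 8 returns a four-letter word. Averaging within segments is where short-lived features can be lost, which is the weakness of plain PAA that the abstract describes.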

    k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

    Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier -- classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance are not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification, focusing on: mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods. Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN
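
The classification rule described in this abstract (find the nearest neighbours of a query, then let them determine its class) can be sketched in a few lines of Python. This is an illustrative sketch only, not the code from the paper's appendix. The function name `knn_predict`, the choice of Euclidean distance, and simple majority voting are assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest neighbours.

    Assumes Euclidean distance; other similarity measures can be swapped in.
    """
    # Pair each training example's distance to the query with its label
    scored = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Sort by distance only, so labels never need to be comparable
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Usage: two small clusters with labels "a" and "b"
train_X = [(0, 0), (0, 1), (5, 5), (6, 5)]
train_y = ["a", "a", "b", "b"]
prediction = knn_predict(train_X, train_y, (0.2, 0.3), k=3)
```

This brute-force version computes the distance to every training example for each query, which is the run-time cost the abstract refers to.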

    Contributions to time series data mining towards the detection of outliers/anomalies

    148 p. Recent technological advances have brought great progress in data collection, making it possible to gather large amounts of data over time. These data commonly take the form of time series, in which observations are recorded chronologically and are correlated in time. These temporal dependencies often carry meaningful and useful information, so in recent years there has been great interest in extracting it. The research area devoted to this task is called time series data mining. The research community in this area has worked on tasks such as classification, forecasting, clustering, and outlier/anomaly detection. Outliers or anomalies are observations that do not follow the expected behaviour of a time series. They usually represent unwanted measurements or events of interest, so detecting them is often relevant, since they may degrade data quality or reflect phenomena of interest to the analyst. This thesis presents several contributions to the field of time series data mining, specifically on outlier or anomaly detection. These contributions can be divided into two parts. On the one hand, the thesis presents contributions to outlier or anomaly detection in time series: it offers a review of the techniques in the literature and presents a new anomaly detection technique for univariate time series, based on self-supervised learning, for the detection of water leaks. On the other hand, the thesis introduces contributions related to the handling of time series with missing values and demonstrates their applicability to anomaly detection.

    Mining typical load profiles in buildings to support energy management in the smart city context

    Mining typical load profiles in buildings to drive energy management strategies is a fundamental task to be addressed in a smart city environment. In this work, a general framework for load profile characterisation in buildings, based on the recent scientific literature, is proposed. The process relies on the combination of different pattern recognition and classification algorithms in order to provide robust insight into energy usage patterns at different levels and at different scales (from a single building to a stock of buildings). Several implications related to energy profiling in buildings, including tariff design, demand side management and advanced energy diagnosis, are discussed. Moreover, a robust methodology to mine typical energy patterns to support advanced energy diagnosis in buildings is introduced by analysing the monitored energy consumption of a cooling/heating mechanical room.