
    Using Empirical Recurrence Rates Ratio For Time Series Data Similarity

    Several methods exist in the classification literature to quantify the similarity between two time series data sets. These range from traditional Euclidean-type metrics to the more advanced Dynamic Time Warping metric. Most adequately address structural similarity but fail to meet goals outside it; for example, a tool that excels at identifying the seasonal similarity between two time series vectors might prove inadequate in the presence of outliers. In this paper, we propose a unifying measure for binary classification that performs well while embracing several aspects of dissimilarity. This statistic is gaining prominence in fields such as geology and finance, and is crucial in time series database formation and clustering studies.
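As a hedged illustration of the contrast this abstract draws, the sketch below compares a lock-step (Euclidean-style) cost with a standard dynamic-programming DTW distance on two phase-shifted sine waves. The data and functions are illustrative only, not taken from the paper:

```python
import numpy as np

def euclidean(a, b):
    # Lock-step comparison: requires equal lengths and penalizes time shifts.
    return float(np.sqrt(np.sum((a - b) ** 2)))

def dtw(a, b):
    # Classic O(len(a) * len(b)) dynamic-programming DTW distance.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

# Two sine waves, one phase-shifted: structurally similar, misaligned in time.
t = np.linspace(0, 2 * np.pi, 100)
x, y = np.sin(t), np.sin(t + 0.5)

# The diagonal warping path reproduces the lock-step cost, so DTW can only
# do better when the series are shifted relative to each other.
print(f"lock-step cost: {np.sum(np.abs(x - y)):.2f}, DTW cost: {dtw(x, y):.2f}")
```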

    Highly comparative feature-based time-series classification

    A highly comparative, feature-based approach to time series classification is introduced that uses an extensive database of algorithms to extract thousands of interpretable features from time series. These features are derived from across the scientific time-series analysis literature, and include summaries of time series in terms of their correlation structure, distribution, entropy, stationarity, scaling properties, and fits to a range of time-series models. After computing thousands of features for each time series in a training set, those that are most informative of the class structure are selected using greedy forward feature selection with a linear classifier. The resulting feature-based classifiers automatically learn the differences between classes using a reduced number of time-series properties, and circumvent the need to calculate distances between time series. Representing time series in this way results in orders of magnitude of dimensionality reduction, allowing the method to perform well on very large datasets containing long time series or time series of different lengths. For many of the datasets studied, classification performance exceeded that of conventional instance-based classifiers, including one-nearest-neighbor classifiers using Euclidean distances and dynamic time warping, and, most importantly, the features selected provide an understanding of the properties of the dataset, insight that can guide further scientific investigation.
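A minimal sketch of the pipeline described above, under stated assumptions: only a handful of hand-picked interpretable features (the method draws on thousands), toy two-class data, and a simple nearest-centroid rule standing in for the linear classifier. The greedy forward selection loop mirrors the abstract's selection step:

```python
import numpy as np

rng = np.random.default_rng(0)

def features(ts):
    # A few interpretable summaries, in the spirit of the feature library:
    # distribution, correlation structure, and change statistics.
    half = len(ts) // 2
    return np.array([
        ts.mean(),
        ts.std(),
        np.corrcoef(ts[:-1], ts[1:])[0, 1],          # lag-1 autocorrelation
        np.abs(np.diff(ts)).mean(),                   # mean absolute change
        ts[:half].std() / (ts[half:].std() + 1e-9),   # stationarity proxy
    ])

# Toy two-class set: noisy sinusoids vs. white noise, with varying lengths
# (the feature vector has fixed size regardless of series length).
def make_ts(cls):
    n = rng.integers(80, 120)
    t = np.linspace(0, 4 * np.pi, n)
    return np.sin(t) + 0.1 * rng.standard_normal(n) if cls else rng.standard_normal(n)

y = np.array([i % 2 for i in range(60)])
X = np.array([features(make_ts(c)) for c in y])

def centroid_acc(X, y, idx):
    # Training accuracy of a nearest-class-centroid classifier
    # restricted to the chosen feature subset.
    Z = X[:, idx]
    c0, c1 = Z[y == 0].mean(0), Z[y == 1].mean(0)
    pred = (np.linalg.norm(Z - c1, axis=1) < np.linalg.norm(Z - c0, axis=1)).astype(int)
    return (pred == y).mean()

# Greedy forward selection: repeatedly add the single feature that most
# improves accuracy; stop when no candidate helps.
chosen, remaining, best = [], list(range(X.shape[1])), 0.0
while remaining:
    score, feat = max((centroid_acc(X, y, chosen + [f]), f) for f in remaining)
    if score <= best:
        break
    best, chosen = score, chosen + [feat]
    remaining.remove(feat)
print("selected features:", chosen, "training accuracy:", round(best, 2))
```

The selected indices are the "reduced number of time-series properties" the abstract refers to: classification then needs only those few features, not pairwise distances between series.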

    Superconducting radio-frequency cavity fault classification using machine learning at Jefferson Laboratory

    We report on the development of machine learning models for classifying C100 superconducting radio-frequency (SRF) cavity faults in the Continuous Electron Beam Accelerator Facility (CEBAF) at Jefferson Lab. CEBAF is a continuous-wave recirculating linac that uses 418 SRF cavities to accelerate electrons up to 12 GeV over five passes. Of these, 96 cavities (12 cryomodules) are equipped with a digital low-level RF system configured so that a cavity fault triggers waveform recordings of 17 RF signals for each of the 8 cavities in the cryomodule. Subject matter experts (SMEs) analyze the collected time-series data to identify which of the eight cavities faulted first and to classify the type of fault. This information is used to find trends and strategically deploy mitigations to problematic cryomodules. However, manually labeling the data is laborious and time-consuming. By leveraging machine learning, near real-time (rather than post-mortem) identification of the offending cavity and classification of the fault type have been implemented. We discuss the performance of the ML models during a recent physics run. Results show the cavity identification and fault classification models have accuracies of 84.9% and 78.2%, respectively.
    Comment: 20 pages, 10 figures; submitted to Physical Review Accelerators and Beams
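The two-stage task the abstract describes (first identify the faulted cavity, then classify the fault type) can be sketched on synthetic waveforms. The array shape follows the 8-cavity, 17-signal recordings mentioned above, but the injected fault and the two simple heuristics are purely illustrative stand-ins for the paper's trained models:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical fault event: 8 cavities x 17 RF signals x 500 samples.
event = rng.standard_normal((8, 17, 500)) * 0.1
faulted = 5
event[faulted, :, 300:] += 3.0   # inject a step-like fault on one cavity

def identify_cavity(event):
    # Stage 1: pick the cavity whose signals deviate most from their
    # pre-fault baseline (a heuristic stand-in for the learned model).
    baseline = event[:, :, :100].mean(axis=2, keepdims=True)
    deviation = np.abs(event - baseline).mean(axis=(1, 2))
    return int(np.argmax(deviation))

def classify_fault(cavity_signals):
    # Stage 2: a toy rule on a summary feature; the paper instead trains
    # a supervised classifier over SME-labeled fault categories.
    step = cavity_signals[:, 400:].mean() - cavity_signals[:, :100].mean()
    return "step-like" if abs(step) > 1.0 else "other"

cav = identify_cavity(event)
print(cav, classify_fault(event[cav]))  # → 5 step-like
```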

    Modèle statistique de prévision long terme de production hydro-électrique (Statistical model for long-term hydroelectric production forecasting)

    A lack of data is one of the limiting factors in deploying reliable hydroelectric production forecasting models for some producers. The predictive capacity of conventional conceptual or physically based models depends entirely on the quality and quantity of the physical data available for analysis. Often, for data of equal quality, a model's gain in accuracy is directly related to its complexity; past a certain point, the benefits of that gain no longer justify the investment required to further instrument a watershed. Moreover, the quality of the data used for modeling can be undermined by measurement or sampling errors: measurement devices are often imprecise or located in remote areas, making maintenance and data collection costly. To limit additional instrumentation costs and reduce the risks of using erroneous data, the available, reliable data must be exploited to their full value. This research project proposes using a minimum of reliable data, namely the daily historical production of more than one hundred hydroelectric plants over nearly 40 years, to develop a statistical (not physically based) model for forecasting hydroelectric production. The model combines several advanced statistical techniques with data-mining algorithms such as dynamic time warping. It improved forecast accuracy over the long-term average (LTA) on every horizon tested.
Although the project's main objective is to improve the accuracy of hydroelectric production forecasts, the model also generates results probabilistically, making it possible to communicate forecast uncertainty and to support decision-making based on these results.
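A minimal analog-forecasting sketch in the spirit of the thesis: match the ongoing year's production trajectory against historical years, then build quantile forecasts from the closest analogs. Everything here is a hedged illustration on synthetic data, with plain Euclidean matching standing in for the dynamic time warping the thesis uses:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical daily production history: 40 "years" of 100 samples each,
# sharing a seasonal cycle with year-specific phase and noise.
t = np.linspace(0, 2 * np.pi, 100)
history = np.array([10 + 5 * np.sin(t + rng.normal(0, 0.2))
                    + rng.normal(0, 0.5, t.size) for _ in range(40)])

def analog_forecast(history, observed, k=5, quantiles=(0.1, 0.5, 0.9)):
    # Match the observed start-of-year trajectory against past years
    # (Euclidean here; the thesis applies dynamic time warping), then
    # forecast the remainder from the k closest analog years as quantiles.
    n = observed.size
    dists = np.linalg.norm(history[:, :n] - observed, axis=1)
    analogs = history[np.argsort(dists)[:k], n:]
    return np.quantile(analogs, quantiles, axis=0)   # probabilistic output

observed = history[-1, :30]              # pretend the last year is ongoing
lo, med, hi = analog_forecast(history[:-1], observed)
lta = history[:-1, 30:].mean(axis=0)     # long-term-average baseline
print("median forecast vs LTA, first 3 steps:", med[:3].round(1), lta[:3].round(1))
```

The quantile band (`lo` to `hi`) is the kind of probabilistic output the abstract describes: it communicates forecast uncertainty rather than a single point estimate.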

    Deep Cellular Recurrent Neural Architecture for Efficient Multidimensional Time-Series Data Processing

    Efficient processing of time series data is a fundamental yet challenging problem in pattern recognition. Though recent developments in machine learning and deep learning have enabled remarkable improvements in processing large scale datasets in many application domains, most are designed and regulated to handle inputs that are static in time. Many real-world data, such as in biomedical, surveillance and security, financial, manufacturing and engineering applications, are rarely static in time, and demand models able to recognize patterns in both space and time. Current machine learning (ML) and deep learning (DL) models adapted for time series processing tend to grow in complexity and size to accommodate the additional dimensionality of time. Specifically, the biologically inspired learning based models known as artificial neural networks that have shown extraordinary success in pattern recognition, tend to grow prohibitively large and cumbersome in the presence of large scale multi-dimensional time series biomedical data such as EEG. Consequently, this work aims to develop representative ML and DL models for robust and efficient large scale time series processing. First, we design a novel ML pipeline with efficient feature engineering to process a large scale multi-channel scalp EEG dataset for automated detection of epileptic seizures. With the use of a sophisticated yet computationally efficient time-frequency analysis technique known as harmonic wavelet packet transform and an efficient self-similarity computation based on fractal dimension, we achieve state-of-the-art performance for automated seizure detection in EEG data. Subsequently, we investigate the development of a novel efficient deep recurrent learning model for large scale time series processing. For this, we first study the functionality and training of a biologically inspired neural network architecture known as cellular simultaneous recurrent neural network (CSRN). 
We obtain a generalization of this network for multiple topological image processing tasks and investigate the learning efficacy of the complex cellular architecture using several state-of-the-art training methods. Finally, we develop a novel deep cellular recurrent neural network (DCRNN) architecture based on the biologically inspired distributed processing used in CSRN for processing time series data. The proposed DCRNN leverages the cellular recurrent architecture to promote extensive weight sharing and efficient, individualized, synchronous processing of multi-source time series data. Experiments on a large scale multi-channel scalp EEG dataset and a machine fault detection dataset show that the proposed DCRNN offers state-of-the-art recognition performance while using substantially fewer trainable recurrent units.
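The weight-sharing idea behind a cellular recurrent design can be sketched minimally: a single shared recurrent cell processes every channel of a multi-channel series in parallel, so the trainable parameter count is independent of the number of channels. This is illustrative NumPy only, not the authors' architecture:

```python
import numpy as np

rng = np.random.default_rng(2)

# One small recurrent cell with a single shared weight set; each channel
# keeps its own hidden state, but all channels reuse the same weights.
hidden = 8
W_x = rng.standard_normal((1, hidden)) * 0.1       # shared input weights
W_h = rng.standard_normal((hidden, hidden)) * 0.1  # shared recurrent weights

def cellular_encode(x):
    # x: (channels, timesteps). Channels are processed synchronously
    # with identical weights (extensive weight sharing), then pooled.
    channels, steps = x.shape
    h = np.zeros((channels, hidden))
    for step in range(steps):
        h = np.tanh(x[:, step:step + 1] @ W_x + h @ W_h)
    return h.mean(axis=0)   # pool channel states into one feature vector

eeg_like = rng.standard_normal((22, 256))  # e.g. a 22-channel EEG window
z = cellular_encode(eeg_like)
print(z.shape)  # (8,)
```

Because `W_x` and `W_h` are shared, doubling the channel count adds no trainable parameters, which is the efficiency argument the abstract makes for multi-source data.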