Automatic Feature Engineering for Time Series Classification: Evaluation and Discussion
Time Series Classification (TSC) has received much attention in the past two
decades and is still a crucial and challenging problem in data science and
knowledge engineering. Indeed, along with the increasing availability of time
series data, many TSC algorithms have been suggested by the research community
in the literature. Besides state-of-the-art methods based on similarity
measures, intervals, shapelets, dictionaries, deep learning methods or hybrid
ensemble methods, several tools for extracting unsupervised informative summary
statistics, aka features, from time series have been designed in recent
years. Originally designed for descriptive analysis and visualization of time
series with informative and interpretable features, very few of these feature
engineering tools have been benchmarked for TSC problems and compared with
state-of-the-art TSC algorithms in terms of predictive performance. In this
article, we aim at filling this gap and propose a simple TSC process to
evaluate the potential predictive performance of the feature sets obtained with
existing feature engineering tools. Thus, we present an empirical study of 11
feature engineering tools combined with 9 supervised classifiers over 112 time
series data sets. The analysis of the results of more than 10,000 learning
experiments indicates that feature-based methods perform as accurately as
current state-of-the-art TSC algorithms, and thus should rightfully be
considered further in the TSC literature.
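The two-stage process described in this abstract (extract summary features from each series, then train a supervised classifier on the resulting feature table) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' benchmark: the five features, the `NearestCentroid` class, and the `classify` helper are hypothetical stand-ins for the 11 feature engineering tools and 9 classifiers the study actually evaluates.

```python
import math
import statistics


def extract_features(series):
    """Map one univariate series to a fixed-length vector of summary statistics.

    The feature set (mean, std, min, max, linear trend) is an assumption
    chosen for illustration; it is not taken from the paper.
    """
    n = len(series)
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    # Least-squares slope of the values against the time index 0..n-1.
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (x - mean) for t, x in enumerate(series)) / denom
    return [mean, std, min(series), max(series), slope]


class NearestCentroid:
    """Toy supervised classifier: predict the class whose mean feature vector is closest."""

    def fit(self, X, y):
        self.centroids_ = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            # Column-wise mean of all feature vectors belonging to this class.
            self.centroids_[label] = [statistics.fmean(col) for col in zip(*rows)]
        return self

    def predict(self, X):
        return [
            min(self.centroids_, key=lambda lab: math.dist(x, self.centroids_[lab]))
            for x in X
        ]


def classify(train_series, train_labels, test_series):
    """Feature extraction followed by classification on the feature table."""
    X_train = [extract_features(s) for s in train_series]
    X_test = [extract_features(s) for s in test_series]
    return NearestCentroid().fit(X_train, train_labels).predict(X_test)


# Toy usage: rising series versus constant series.
train = [[0, 1, 2, 3, 4], [1, 2, 3, 4, 5], [2, 2, 2, 2, 2], [3, 3, 3, 3, 3]]
labels = ["up", "up", "flat", "flat"]
print(classify(train, labels, [[0, 2, 4, 6, 8], [1, 1, 1, 1, 1]]))
```

Because the classifier only sees the feature table, either stage can be swapped independently, which is what makes a tool-by-classifier comparison like the one in the study possible.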
Optimising maintenance operations in photovoltaic solar plants using data analysis for predictive maintenance
In PV (photovoltaic) solar power plants, high reliability of critical assets must be ensured; these include inverters, which combine the power from multiple solar cell modules. While avoiding unexpected failures and downtime, maintenance schedules aim to take advantage of the full equipment lifetime. Predictive maintenance schedules trigger maintenance actions by modelling the current equipment condition and the time until a particular failure type occurs, known as residual useful lifetime (RUL). However, predicting the RUL of a piece of equipment is complex in this case since the equipment condition is not directly measurable; it is affected by numerous error types with corresponding influencing factors. This work compares statistical and machine learning models using sensor and weather data for the purpose of optimising maintenance decisions. Our methods allow the user to perform maintenance before failure occurs and hence contribute to maximising reliability.
We present two distinct data handling and analysis pipelines for predictive maintenance. The first method is based on a Hidden Markov Model, which estimates the degree of degradation on a discrete scale of latent states. The multivariate input time series is transformed using PCA to reduce dimensionality. This approach delivers a profound statistical model providing insight into the temporal dynamics of the degradation process. The second method pursues a machine learning approach, applying a Random Forest Regression algorithm after a feature selection step on the time series data. Both methods are assessed by their ability to predict the RUL from a random point in time prior to failure. The machine learning approach is able to exploit its favourable properties in high-dimensional input data and delivers high predictive performance. Further, we discuss qualitative aspects, such as the interpretability of model parameters and results. Both approaches are benchmarked and compared to one another. We conclude that both approaches have practical merits and may contribute to more favourable decisions and optimised maintenance operations.
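To make the RUL prediction task concrete, the sketch below shows one way the supervised training pairs for a regression approach could be assembled from a single run-to-failure signal. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the trailing-window mean and standard deviation features, and the unit time step are all hypothetical, and the regressor itself (a Random Forest in the abstract) is left out.

```python
import statistics


def rul_targets(failure_index):
    """RUL at each step t = number of steps remaining until the failure index."""
    if failure_index < 0:
        raise ValueError("failure_index must be non-negative")
    return [failure_index - t for t in range(failure_index + 1)]


def window_features(signal, width):
    """Trailing-window (mean, std) for every step that has a full window behind it."""
    feats = []
    for end in range(width, len(signal) + 1):
        window = signal[end - width:end]
        feats.append((statistics.fmean(window), statistics.pstdev(window)))
    return feats


def build_training_pairs(signal, failure_index, width):
    """Pair each window's features with the RUL at the window's last step.

    Only samples up to and including the failure are used, so every
    target is a non-negative remaining lifetime.
    """
    ruls = rul_targets(failure_index)
    feats = window_features(signal[: failure_index + 1], width)
    # Feature row i describes the window ending at step i + width - 1.
    return [(f, ruls[i + width - 1]) for i, f in enumerate(feats)]


# Toy usage: a sensor reading that drifts upward until failure at step 4.
pairs = build_training_pairs([1.0, 1.0, 2.0, 3.0, 5.0, 8.0], failure_index=4, width=2)
for features, rul in pairs:
    print(features, rul)
```

A regressor trained on such pairs can then be queried at an arbitrary point before failure, which matches the evaluation setting the abstract describes.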
pyts: A Python Package for Time Series Classification
pyts is an open-source Python package for time series classification. This versatile toolbox provides implementations of many algorithms published in the literature, preprocessing functionalities, and data set loading utilities. pyts relies on the standard scientific Python packages numpy, scipy, scikit-learn, joblib, and numba, and is distributed under the BSD-3-Clause license. Documentation contains installation instructions, a detailed user guide, a full API description, and concrete self-contained examples. Source code and documentation can be downloaded from https://github.com/johannfaouzi/pyts
- …