80 research outputs found
Deep learning for time series classification: a review
Time Series Classification (TSC) is an important and challenging problem in
data mining. With the increase of time series data availability, hundreds of
TSC algorithms have been proposed. Among these methods, only a few have
considered Deep Neural Networks (DNNs) to perform this task. This is surprising
as deep learning has seen very successful applications in recent years. DNNs
have indeed revolutionized the field of computer vision especially with the
advent of novel deeper architectures such as Residual and Convolutional Neural
Networks. Apart from images, sequential data such as text and audio can also be
processed with DNNs to reach state-of-the-art performance for document
classification and speech recognition. In this article, we study the current
state-of-the-art performance of deep learning algorithms for TSC by presenting
an empirical study of the most recent DNN architectures for TSC. We give an
overview of the most successful deep learning applications in various time
series domains under a unified taxonomy of DNNs for TSC. We also provide an
open source deep learning framework to the TSC community where we implemented
each of the compared approaches and evaluated them on a univariate TSC
benchmark (the UCR/UEA archive) and 12 multivariate time series datasets. By
training 8,730 deep learning models on 97 time series datasets, we propose the
most exhaustive study of DNNs for TSC to date.
Comment: Accepted at Data Mining and Knowledge Discovery
GENDIS : genetic discovery of shapelets
In the time series classification domain, shapelets are subsequences that are discriminative of a certain class. It has been shown that classifiers can achieve state-of-the-art results by taking the distances from the input time series to different discriminative shapelets as their input. Additionally, these shapelets can be visualized and are therefore interpretable, making them appealing in critical domains, where longitudinal data are ubiquitous. In this study, a new paradigm for shapelet discovery is proposed, based on evolutionary computation. The advantages of the proposed approach are that: (i) it is gradient-free, which may allow it to escape local optima more easily and supports non-differentiable objectives; (ii) no brute-force search is required, making the algorithm scalable; (iii) the total number of shapelets and the length of each shapelet are evolved jointly with the shapelets themselves, alleviating the need to specify these beforehand; (iv) entire sets are evaluated at once, as opposed to single shapelets, resulting in smaller final sets with fewer redundant shapelets and similar predictive performance; and (v) the discovered shapelets do not need to be subsequences of the input time series. We present the results of experiments which validate the enumerated advantages.
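The distance-based feature extraction this abstract builds on is easy to sketch. The snippet below is a minimal illustration, not GENDIS itself: it computes the classic shapelet-transform feature (minimum sliding-window Euclidean distance from a series to each shapelet). The series and shapelet values are invented for the example.

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between a shapelet and all
    equal-length subsequences of the series (sliding window)."""
    n, l = len(series), len(shapelet)
    return min(np.linalg.norm(series[i:i + l] - shapelet)
               for i in range(n - l + 1))

def shapelet_transform(series, shapelets):
    """Map a time series to its vector of shapelet distances,
    which a downstream classifier can consume as features."""
    return np.array([shapelet_distance(series, s) for s in shapelets])

# Toy example: a series containing a bump vs. a flat series.
bump = np.array([0., 0., 1., 2., 1., 0., 0.])
flat = np.zeros(7)
shapelets = [np.array([1., 2., 1.])]  # hypothetical discriminative shape
print(shapelet_transform(bump, shapelets))  # distance 0: exact match
print(shapelet_transform(flat, shapelets))  # strictly larger distance
```

Evolutionary approaches such as the one described above search over the shapelet values themselves, using this distance-based transform inside the fitness evaluation.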
Generalised Interpretable Shapelets for Irregular Time Series
The shapelet transform is a form of feature extraction for time series, in
which a time series is described by its similarity to each of a collection of
`shapelets'. However it has previously suffered from a number of limitations,
such as being limited to regularly-spaced fully-observed time series, and
having to choose between efficient training and interpretability. Here, we
extend the method to continuous time, and in doing so handle the general case
of irregularly-sampled partially-observed multivariate time series.
Furthermore, we show that a simple regularisation penalty may be used to train
efficiently without sacrificing interpretability. The continuous-time
formulation additionally allows for learning the length of each shapelet
(previously a discrete object) in a differentiable manner. Finally, we
demonstrate that the measure of similarity between time series may be
generalised to a learnt pseudometric. We validate our method by demonstrating
its performance and interpretability on several datasets; for example we
discover (purely from data) that the digits 5 and 6 may be distinguished by the
chirality of their bottom loop, and that a kind of spectral gap exists in
spoken audio classification.
Contrastive Shapelet Learning for Unsupervised Multivariate Time Series Representation Learning
Recent studies have shown great promise in unsupervised representation
learning (URL) for multivariate time series, because URL can learn
generalizable representations for many downstream tasks without using
inaccessible labels. However, existing approaches usually adopt the models
originally designed for other domains (e.g., computer vision) to encode the
time series data and rely on strong assumptions to design learning objectives,
which limits their ability to perform well. To deal with these problems, we
propose a novel URL framework for multivariate time series by learning
time-series-specific shapelet-based representations through a popular
contrastive learning paradigm. To the best of our knowledge, this is the first
work that explores shapelet-based embeddings in unsupervised
general-purpose representation learning. A unified shapelet-based encoder and a
novel learning objective with multi-grained contrasting and multi-scale
alignment are particularly designed to achieve our goal, and a data
augmentation library is employed to improve the generalization. We conduct
extensive experiments using tens of real-world datasets to assess the
representation quality on many downstream tasks, including classification,
clustering, and anomaly detection. The results demonstrate the superiority of
our method against not only URL competitors, but also techniques specially
designed for downstream tasks. Our code has been made publicly available at
https://github.com/real2fish/CSL
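The multi-grained contrasting objective described above is a variant of contrastive learning; the paper's exact loss is more elaborate, but its core ingredient can be sketched with a standard InfoNCE loss over pairs of embeddings. The array shapes, noise level, and temperature below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE contrastive loss: each anchor should be closest to its
    own positive (augmented view) among all positives in the batch."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # matched pairs on the diagonal

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))                   # hypothetical shapelet embeddings
aligned = info_nce(emb, emb + 0.01 * rng.normal(size=emb.shape))
mismatched = info_nce(emb, np.roll(emb, 1, axis=0))
print(aligned < mismatched)  # correctly paired views score a lower loss
```

In a framework like the one described, the embeddings would come from a shapelet-based encoder applied to two augmented views of the same series.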
A scalable machine learning system for anomaly detection in manufacturing
Reports of recalls in the automotive industry have become part of everyday media coverage. In fact, their frequency and the number of affected vehicles have continued to increase in recent years. Most recall campaigns can be traced back to faults in production. For manufacturers, beyond improvements in quality management, the intelligent and automated analysis of production process data represents a largely untapped potential. The technical challenges, however, are enormous: the data volumes are vast, and the data patterns characteristic of a fault are necessarily unknown. The use of machine learning (ML) is a promising approach to enable this search for the proverbial needle in a haystack. Algorithms are to learn autonomously from the data to distinguish between normal and anomalous process behaviour, so that process experts can be warned at an early stage. Industry and research have been trying for years to establish such ML systems in production environments. Most ML projects, however, fail before reaching the productive phase, or consume enormous resources in operation while delivering no economic value.
The goal of this thesis is the development of a technical framework for implementing a scalable ML system for anomaly detection in process data. The training processes for initialising and adapting the models are designed to be highly automatable in order to enable a structured scaling process. The developed DM/ML method makes it possible to reduce the long-term effort of system operation through an initial additional investment in the model training process, and has proven both relatively and absolutely scalable in practice. This reduces system-level complexity to a manageable level, enabling subsequent productive operation.
Deep learning for time series classification
Time series analysis is a field of data science which is interested in
analyzing sequences of numerical values ordered in time. Time series are
particularly interesting because they allow us to visualize and understand the
evolution of a process over time. Their analysis can reveal trends,
relationships and similarities across the data. There exist numerous fields
containing data in the form of time series: health care (electrocardiogram,
blood sugar, etc.), activity recognition, remote sensing, finance (stock market
price), industry (sensors), etc. Time series classification consists of
constructing algorithms dedicated to automatically label time series data. The
sequential aspect of time series data requires the development of algorithms
that are able to harness this temporal property, thus making the existing
off-the-shelf machine learning models for traditional tabular data suboptimal
for solving the underlying task. In this context, deep learning has emerged in
recent years as one of the most effective methods for tackling the supervised
classification task, particularly in the field of computer vision. The main
objective of this thesis was to study and develop deep neural networks
specifically constructed for the classification of time series data. We thus
carried out the first large scale experimental study allowing us to compare the
existing deep methods and to position them relative to other non-deep-learning
state-of-the-art methods. Subsequently, we made numerous contributions in
this area, notably in the context of transfer learning, data augmentation,
ensembling and adversarial attacks. Finally, we have also proposed a novel
architecture, based on the famous Inception network (Google), which ranks among
the most efficient to date.
Comment: PhD thesis
timeXplain -- A Framework for Explaining the Predictions of Time Series Classifiers
Modern time series classifiers display impressive predictive capabilities,
yet their decision-making processes mostly remain black boxes to the user. At
the same time, model-agnostic explainers, such as the recently proposed SHAP,
promise to make the predictions of machine learning models interpretable,
provided there are well-designed domain mappings. We bring both worlds together
in our timeXplain framework, extending the reach of explainable artificial
intelligence to time series classification and value prediction. We present
novel domain mappings for the time and the frequency domain as well as series
statistics and analyze their explicative power as well as their limits. We
employ timeXplain in a large-scale experimental comparison of several
state-of-the-art time series classifiers and discover similarities between
seemingly distinct classification concepts such as residual neural networks and
elastic ensembles.
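The time-domain mapping idea, perturbing stretches of the series and observing the model's response, can be illustrated with a simple occlusion baseline. This is not timeXplain's implementation (which feeds such perturbations into SHAP); the toy `predict` function below is a made-up stand-in for a real classifier.

```python
import numpy as np

def occlusion_importance(predict, series, n_slices=7, background=0.0):
    """Crude time-domain attribution: replace each contiguous slice with a
    background value and record how much the model's score drops."""
    base = predict(series)
    bounds = np.linspace(0, len(series), n_slices + 1).astype(int)
    drops = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        perturbed = series.copy()
        perturbed[lo:hi] = background
        drops.append(base - predict(perturbed))
    return np.array(drops)

# Hypothetical "classifier": scores a series by the height of its peak.
predict = lambda s: float(s.max())
series = np.array([0., 0., 0., 5., 0., 0., 0.])
imp = occlusion_importance(predict, series)
print(imp)  # only the slice covering the peak matters
```

A SHAP-style explainer generalises this idea by attributing the score change over coalitions of slices rather than one slice at a time.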
Time Series Anomaly Detection using Diffusion-based Models
Diffusion models have been recently used for anomaly detection (AD) in
images. In this paper we investigate whether they can also be leveraged for AD
on multivariate time series (MTS). We test two diffusion-based models and
compare them to several strong neural baselines. We also extend the PA%K
protocol, by computing a ROCK-AUC metric, which is agnostic to both the
detection threshold and the ratio K of correctly detected points. Our models
outperform the baselines on synthetic datasets and are competitive on
real-world datasets, illustrating the potential of diffusion-based methods for
AD in multivariate time series.
Comment: Accepted at the AI4TS workshop of the 23rd IEEE International
Conference on Data Mining (ICDM 2023), 9 pages, 7 figures, 2 tables
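The PA%K evaluation protocol the paper extends can be sketched compactly: a ground-truth anomaly segment counts as detected once at least K% of its points are flagged. The following is an assumed reading of the protocol, not the paper's evaluation code; the scores, labels, and threshold are invented.

```python
import numpy as np

def pa_percent_k(scores, labels, threshold, k):
    """Point adjustment at level K: within each ground-truth anomaly
    segment, if at least K% of points exceed the threshold, the whole
    segment is marked as detected."""
    preds = scores > threshold
    adjusted = preds.copy()
    # Find contiguous anomalous segments in the binary labels.
    padded = np.concatenate([[0], labels, [0]])
    starts = np.flatnonzero(np.diff(padded) == 1)
    ends = np.flatnonzero(np.diff(padded) == -1)
    for lo, hi in zip(starts, ends):
        if preds[lo:hi].mean() * 100 >= k:
            adjusted[lo:hi] = True
    return adjusted

labels = np.array([0, 1, 1, 1, 1, 0, 0])
scores = np.array([.1, .9, .2, .8, .1, .3, .2])
print(pa_percent_k(scores, labels, threshold=0.5, k=50))
# 2 of the 4 in-segment points exceed 0.5 -> whole segment flagged
```

Sweeping K from 0 to 100 and aggregating the resulting detection rates yields a threshold-and-K-agnostic summary, which is the role a metric like the paper's ROCK-AUC plays.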
- 
