A review on distance based time series classification
Time series classification is a growing research topic due to the vast amount of time series data being created in a wide variety of fields. The particular nature of the data makes it a challenging task, and different approaches have been taken, including the distance-based approach. 1-NN has been a widely used method within distance-based time series classification due to its simplicity and good performance. However, its supremacy may be attributed to its ability to use time-series-specific distances within the classification process, rather than to the classifier itself. With the aim of exploiting these distances within more complex classifiers, new approaches have arisen in the past few years that are competitive with, or outperform, the 1-NN-based approaches. In some cases, these new methods use the distance measure to transform the series into feature vectors, bridging the gap between time series and traditional classifiers. In other cases, the distances are employed to obtain a time series kernel and enable the use of kernel methods for time series classification. One of the main challenges is that a kernel function must be positive semi-definite, a matter that is also addressed within this review. The presented review includes a taxonomy of all those methods that aim to classify time series using a distance-based approach, as well as a discussion of the strengths and weaknesses of each method. TIN2016-78365-
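To make the distance-based approach concrete, here is a minimal Python sketch, assuming NumPy and an unconstrained dynamic time warping (DTW) distance as the time-series-specific measure. It shows 1-NN classification with DTW and one common way of turning a distance into a kernel, exp(-γ·d²). The helper names (`dtw`, `nn1_predict`, `distance_kernel`, `is_psd`) are hypothetical and not taken from the review. Because DTW does not satisfy the triangle inequality, the resulting matrix is not guaranteed to be positive semi-definite, which is why the sketch checks the smallest eigenvalue.

```python
import numpy as np


def dtw(a, b):
    """Unconstrained dynamic time warping distance between two 1-D series."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible warping steps.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))


def nn1_predict(train_X, train_y, query, dist=dtw):
    """1-NN: return the label of the training series closest to the query."""
    dists = [dist(query, x) for x in train_X]
    return train_y[int(np.argmin(dists))]


def distance_kernel(X, gamma=1.0, dist=dtw):
    """Kernel matrix exp(-gamma * d^2) built from pairwise distances."""
    n = len(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dist(X[i], X[j])
    return np.exp(-gamma * D ** 2)


def is_psd(K, tol=1e-9):
    """True if the symmetric matrix K has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(K).min() >= -tol)


train_X = [np.array([0.0, 1.0, 2.0, 1.0, 0.0]), np.array([3.0, 3.0, 3.0, 3.0, 3.0])]
train_y = ["bump", "flat"]
query = np.array([0.0, 0.0, 1.0, 2.0, 1.0])  # the bump, shifted by one step
print(nn1_predict(train_X, train_y, query))  # bump
print(is_psd(distance_kernel(train_X + [query])))
```

Swapping `dist` for another measure leaves the 1-NN and kernel code unchanged, which mirrors the review's point that the distance, not the classifier, carries most of the modelling.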
Contributions to Time Series Classification: Meta-Learning and Explainability
This thesis includes three contributions of different types to the area of supervised time series classification, a growing field of research due to the amount of time series collected daily in a wide variety of domains. In this context, the number of methods available for classifying time series is increasing, and the classifiers are becoming ever more competitive and varied. The first contribution of the thesis is a taxonomy of distance-based time series classifiers, accompanied by an exhaustive review of the existing methods and their computational costs. Moreover, for a non-expert user (and even for an expert), choosing a suitable classifier for a given problem is a difficult task. The second contribution therefore addresses the recommendation of time series classifiers using a meta-learning approach. Finally, the third contribution is a method to explain the predictions of time series classifiers by calculating the relevance of each region of a series to the prediction. This explanation method is based on perturbations, for which we consider specific and realistic transformations for time series. BES-2016-07689
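The abstract does not say which transformations the explanation method uses, so the following is only a minimal sketch of the general perturbation idea, assuming NumPy: split a series into equal-length regions, perturb one region at a time by replacing it with a straight line between its neighbouring values (our illustrative choice of a "realistic" transformation), and score each region by how much the predicted probability of the target class drops. `toy_predict_proba` is a hypothetical stand-in for any probabilistic classifier, and the sketch assumes `n_regions <= len(x)`.

```python
import numpy as np


def perturb_segment(x, start, end):
    """Replace x[start:end] with a straight line between its neighbouring values."""
    y = np.array(x, dtype=float)
    left = y[start - 1] if start > 0 else y[start]
    right = y[end] if end < len(y) else y[end - 1]
    # Interior points of a line from `left` to `right`, one per replaced sample.
    y[start:end] = np.linspace(left, right, end - start + 2)[1:-1]
    return y


def region_relevance(x, predict_proba, target, n_regions=4):
    """Relevance of each region = drop in target-class probability after perturbing it."""
    bounds = np.linspace(0, len(x), n_regions + 1).astype(int)
    base = predict_proba(x)[target]
    return np.array([
        base - predict_proba(perturb_segment(x, s, e))[target]
        for s, e in zip(bounds[:-1], bounds[1:])
    ])


def toy_predict_proba(x):
    """Hypothetical classifier: class 1 becomes likely when the series has a high peak."""
    p = 1.0 / (1.0 + np.exp(-(np.max(x) - 2.0)))
    return np.array([1.0 - p, p])


x = np.zeros(8)
x[5] = 5.0  # a single peak inside the third region (samples 4-5)
print(region_relevance(x, toy_predict_proba, target=1))
```

Only the third region gets a non-zero score here, because it is the only one whose perturbation removes the peak the toy classifier relies on.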
Contributions to Time Series Classification: Meta-Learning and Explainability
141 p. This thesis includes three contributions of different types to the area of supervised time series classification, a booming field owing to the number of time series collected day after day in a wide variety of domains. In this context, the number of methods available for classifying time series keeps growing, with classifiers becoming ever more competitive and varied. The first contribution of the thesis is a taxonomy of distance-based time series classifiers, including an exhaustive review of the existing methods and their computational costs. Moreover, from the point of view of a non-expert user (and even of an expert), choosing a suitable classifier for a specific problem is a difficult task. The second contribution therefore addresses the recommendation of time series classifiers, for which a meta-learning approach is used. Finally, the third contribution is a method to explain the predictions of time series classifiers, in which the relevance of each region of a series to the prediction is calculated. This explanation method is based on perturbations, for which specific and realistic transformations for time series are considered.
Deep Time-Series Clustering: A Review
We present a comprehensive, detailed review of time-series data analysis, with emphasis on deep time-series clustering (DTSC), and a case study in the context of movement behavior clustering using the deep clustering method. Specifically, in our prior deep clustering work we modified the DCAE architectures to suit time-series data. Several works on deep clustering of time-series data have appeared recently; we review these works, identify the state of the art, and present an outlook on this important field from five key perspectives.
Fast, Scalable, and Accurate Algorithms for Time-Series Analysis
Time is a critical element for the understanding of natural processes (e.g., earthquakes and weather) or human-made artifacts (e.g., stock market and speech signals). The analysis of time series, the result of sequentially collecting observations of such processes and artifacts, is becoming increasingly prevalent across scientific and industrial applications. The extraction of non-trivial features (e.g., patterns, correlations, and trends) in time series is a critical step for devising effective time-series mining methods for real-world problems and has been the subject of active research for decades. In this dissertation, we address this fundamental problem by studying and presenting computational methods for efficient unsupervised learning of robust feature representations from time series. Our objective is to (i) simplify and unify the design of scalable and accurate time-series mining algorithms; and (ii) provide a set of readily available tools for effective time-series analysis. We focus on applications operating solely over time-series collections and on applications where the analysis of time series complements the analysis of other types of data, such as text and graphs.
For applications operating solely over time-series collections, we propose a generic computational framework, GRAIL, to learn low-dimensional representations that natively preserve the invariances offered by a given time-series comparison method. GRAIL represents a departure from classic approaches in the time-series literature, where representation methods are agnostic to the similarity function used in subsequent learning processes. GRAIL relies on the attractive idea that, once we construct the data-to-data similarity matrix, most time-series mining tasks can be trivially solved. To overcome the scalability issues associated with approaches relying on such matrices, GRAIL exploits time-series clustering to construct a small set of landmark time series and learns representations that reduce the data-to-data matrix to a data-to-landmark matrix. To demonstrate the effectiveness of GRAIL, we first present domain-independent, highly accurate, and scalable time-series clustering methods to facilitate exploration and summarization of time-series collections. Then, we show that GRAIL representations, when combined with suitable methods, significantly outperform state-of-the-art methods, in terms of both efficiency and accuracy, in major time-series mining tasks such as querying, clustering, classification, sampling, and visualization. Overall, GRAIL emerges as a new primitive for highly accurate, yet scalable, time-series analysis.
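The landmark idea can be illustrated with a Nyström-style approximation, which is our choice for this sketch: pick a few landmark series with k-means, compute only the data-to-landmark similarity matrix, and derive low-dimensional vectors whose inner products approximate the full data-to-data matrix. This is a minimal NumPy sketch under stated assumptions: series are equal-length rows of an array, and a Gaussian kernel on Euclidean distance stands in for whatever time-series comparison method GRAIL would actually preserve. The function names are hypothetical.

```python
import numpy as np


def rbf(A, B, gamma):
    """Gaussian kernel between rows of A and rows of B (stand-in similarity)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kmeans_landmarks(X, k, iters=20, seed=0):
    """Plain k-means; the k centroids serve as landmark series."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(0)
    return C


def nystrom_features(X, landmarks, gamma, eps=1e-10):
    """Map each series to a vector Z[i] such that Z @ Z.T approximates rbf(X, X)."""
    W = rbf(landmarks, landmarks, gamma)  # landmark-to-landmark matrix
    C = rbf(X, landmarks, gamma)          # data-to-landmark matrix
    vals, vecs = np.linalg.eigh(W)
    keep = vals > eps                     # drop numerically null directions
    W_inv_sqrt = vecs[:, keep] @ np.diag(1.0 / np.sqrt(vals[keep])) @ vecs[:, keep].T
    return C @ W_inv_sqrt


rng = np.random.default_rng(1)
X = rng.standard_normal((30, 16))  # 30 toy series of length 16
Z = nystrom_features(X, kmeans_landmarks(X, k=5), gamma=0.05)
print(Z.shape)  # (30, 5)
```

The cost of building `C` grows with the number of series times the number of landmarks, instead of with the square of the number of series as for the full similarity matrix.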
For applications where the analysis of time series complements the analysis of other types of data, such as text and graphs, we propose generic, simple, and lightweight methodologies to learn features from time-varying measurements. Such applications often organize operations over different types of data in a pipeline such that one operation provides input (in the form of feature vectors) to subsequent operations. To reason about the temporal patterns and trends in the underlying features, we need to (i) track the evolution of features over different time periods; and (ii) transform these time-varying features into actionable knowledge (e.g., forecasting an outcome). To address this challenging problem, we propose principled approaches to model time-varying features and study two large-scale, real-world applications. Specifically, we first study the problem of predicting the impact of scientific concepts through temporal analysis of characteristics extracted from the metadata and full text of scientific articles. Then, we explore the promise of harnessing temporal patterns in behavioral signals extracted from web search engine logs for early detection of devastating diseases. In both applications, combining features with time-series-relevant features had a greater impact than any other indicator considered in our analysis. We believe that our simple methodology, along with the interesting domain-specific findings that our work revealed, will motivate new studies across different scientific and industrial settings.
Shapelet Transforms for Univariate and Multivariate Time Series Classification
Time Series Classification (TSC) is a growing field of machine learning research. One particular algorithm from the TSC literature is the Shapelet Transform (ST). Shapelets are phase-independent subsequences extracted from time series to form discriminatory features. It has been shown that using shapelets to transform the datasets into a new space can improve performance. One of the major problems with ST is that the algorithm is O(n²m⁴), where n is the number of time series and m is the length of the series. As a problem increases in size, or additional dimensions are added, the algorithm quickly becomes computationally infeasible.
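The quoted cost follows from brute-force search: a series of length m has O(m²) candidate subsequences across all lengths, so n series give O(nm²) candidates; each candidate is scored against all n series, and each of those scans costs O(m²) (O(m) alignments times O(m) per comparison), giving O(n²m⁴) overall. The minimal NumPy sketch below shows the basic building block, assuming z-normalised squared Euclidean distance as the subsequence measure; the function names are hypothetical. Taking the minimum over every alignment is what makes a shapelet phase-independent.

```python
import numpy as np


def znorm(x, eps=1e-8):
    """Z-normalise a subsequence so matching ignores offset and scale."""
    x = np.asarray(x, dtype=float)
    s = x.std()
    return (x - x.mean()) / s if s > eps else x - x.mean()


def shapelet_distance(series, shapelet):
    """Smallest squared Euclidean distance between the z-normalised shapelet
    and every z-normalised subsequence of the same length (any position)."""
    series = np.asarray(series, dtype=float)
    length = len(shapelet)
    target = znorm(shapelet)
    return float(min(
        np.sum((znorm(series[i:i + length]) - target) ** 2)
        for i in range(len(series) - length + 1)
    ))


def shapelet_transform(X, shapelets):
    """One feature per shapelet: the series-to-shapelet distance."""
    return np.array([[shapelet_distance(x, s) for s in shapelets] for x in X])


shapelet = [1.0, 3.0, 1.0]
X = [
    [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0],  # pattern in the middle
    [2.0, 6.0, 2.0, 0.0, 0.0, 0.0, 0.0],  # same shape, shifted and scaled
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # no such pattern
]
print(shapelet_transform(X, [shapelet]))
```

The resulting matrix has one row per series and one column per shapelet, so any standard vector classifier can be trained on it.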
The research question addressed is whether the shapelet transform can be improved in terms of accuracy and speed. Making algorithmic improvements to shapelets will enable the development of multivariate shapelet algorithms that can attempt to solve much larger problems in realistic time frames.
In support of this thesis, a new distance early-abandon method is proposed. A class-balancing algorithm is implemented that uses a one-vs-all multi-class information gain, enabling heuristics originally developed for two-class problems to be applied. To support these improvements, a large-scale analysis of the best shapelet algorithms is conducted as part of a larger experimental evaluation. ST is shown to be one of the most accurate algorithms in TSC on the UCR-UEA datasets. Contract classification is proposed for shapelets, where a fixed run time is set and the number of shapelets is bounded. Four search algorithms are evaluated with fixed run times of one hour and one day, three of which are not significantly worse than a full enumeration. Finally, three multivariate shapelet algorithms are developed and compared to benchmark results and multivariate dynamic time warping.
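The abstract names these techniques without detailing them, so here is a generic minimal sketch rather than the thesis's implementation, assuming NumPy and plain (not z-normalised) squared Euclidean distance for brevity. Early abandoning stops accumulating a subsequence distance once it already exceeds the best distance found so far; the one-vs-all information gain treats the target class as positive and every other class as negative, so a binary split criterion can be reused in multi-class problems. Function names are hypothetical.

```python
import numpy as np


def early_abandon_distance(series, shapelet):
    """Minimum squared Euclidean distance over all alignments, abandoning a
    partial sum as soon as it can no longer beat the best found so far."""
    length = len(shapelet)
    best = np.inf
    for start in range(len(series) - length + 1):
        total = 0.0
        for i in range(length):
            total += (series[start + i] - shapelet[i]) ** 2
            if total >= best:
                break  # cannot improve on `best`; skip the rest of this alignment
        else:
            best = total  # loop finished without abandoning, so total < best
    return float(best)


def binary_entropy(flags):
    """Entropy (bits) of a boolean array."""
    if len(flags) == 0:
        return 0.0
    p = float(np.mean(flags))
    if p == 0.0 or p == 1.0:
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))


def one_vs_all_info_gain(dists, labels, target):
    """Best information gain of a distance threshold for `target` vs all other classes."""
    order = np.argsort(dists)
    d = np.asarray(dists, dtype=float)[order]
    is_target = (np.asarray(labels) == target)[order]
    n = len(d)
    base = binary_entropy(is_target)
    best = 0.0
    for split in range(1, n):
        if d[split] == d[split - 1]:
            continue  # only split between distinct distance values
        left, right = is_target[:split], is_target[split:]
        gain = base - (split / n) * binary_entropy(left) - ((n - split) / n) * binary_entropy(right)
        best = max(best, gain)
    return best


print(early_abandon_distance([0.0, 1.0, 3.0, 1.0, 0.0], [1.0, 3.0, 1.0]))  # 0.0
print(one_vs_all_info_gain([0.1, 0.2, 0.9, 1.0], ["a", "a", "b", "c"], target="a"))  # 1.0
```

Early abandoning never changes the returned minimum; it only skips work on alignments that provably cannot win.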
Data Augmentation for Time-Series Classification: An Extensive Empirical Study and Comprehensive Survey
Data Augmentation (DA) has emerged as an indispensable strategy in Time Series Classification (TSC), primarily due to its capacity to amplify training samples, thereby bolstering model robustness, diversifying datasets, and curtailing overfitting. However, the current landscape of DA in TSC is plagued by fragmented literature reviews, nebulous methodological taxonomies, inadequate evaluative measures, and a dearth of accessible, user-oriented tools. In light of these challenges, this study embarks on an exhaustive dissection of DA methodologies within the TSC realm. Our initial approach involved an extensive literature review spanning a decade, revealing that contemporary surveys scarcely capture the breadth of advancements in DA for TSC, prompting us to meticulously analyze over 100 scholarly articles to distill more than 60 unique DA techniques. This rigorous analysis precipitated the formulation of a novel taxonomy, purpose-built for the intricacies of DA in TSC, categorizing techniques into five principal categories: Transformation-Based, Pattern-Based, Generative, Decomposition-Based, and Automated Data Augmentation. Our taxonomy promises to serve as a robust navigational aid for scholars, offering clarity and direction in method selection. Addressing the conspicuous absence of holistic evaluations for prevalent DA techniques, we executed an all-encompassing empirical assessment, wherein upwards of 15 DA strategies were subjected to scrutiny across 8 UCR time-series datasets, employing ResNet and a multi-faceted evaluation paradigm encompassing Accuracy, Method Ranking, and Residual Analysis, yielding a benchmark accuracy of 88.94 ± 11.83%. Our investigation underscored the inconsistent efficacies of DA techniques, with..
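As an illustration of the simplest kind of augmentation, here is a minimal NumPy sketch of three transformations commonly used in TSC: jittering, scaling and window slicing, which, as far as we can tell, fall under the Transformation-Based category of the taxonomy. The parameter defaults (`sigma`, `ratio`) are illustrative assumptions, not values from the study, and the function names are hypothetical.

```python
import numpy as np


def jitter(x, sigma=0.03, rng=None):
    """Add i.i.d. Gaussian noise to every time step."""
    rng = np.random.default_rng() if rng is None else rng
    return x + rng.normal(0.0, sigma, size=x.shape)


def scale(x, sigma=0.1, rng=None):
    """Multiply the whole series by one random factor drawn around 1."""
    rng = np.random.default_rng() if rng is None else rng
    return x * rng.normal(1.0, sigma)


def window_slice(x, ratio=0.9, rng=None):
    """Crop a random window covering `ratio` of the series and stretch it back to full length."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    w = max(2, int(np.ceil(ratio * n)))
    start = int(rng.integers(0, n - w + 1))
    window = x[start:start + w]
    # Linear interpolation resamples the w-point window onto n points.
    return np.interp(np.linspace(0, w - 1, n), np.arange(w), window)


rng = np.random.default_rng(42)
x = np.sin(np.linspace(0, 2 * np.pi, 100))
augmented = [jitter(x, rng=rng), scale(x, rng=rng), window_slice(x, rng=rng)]
print([a.shape for a in augmented])  # [(100,), (100,), (100,)]
```

All three keep the series length fixed, so augmented samples can be appended directly to the training set of a fixed-input model such as ResNet.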