702 research outputs found
Classification of time series by shapelet transformation
Time-series classification (TSC) problems present a specific challenge for classification algorithms: how to measure similarity between series. A \emph{shapelet} is a time-series subsequence that allows for TSC based on local, phase-independent similarity in shape. Shapelet-based classification uses the similarity between a shapelet and a series as a discriminatory feature. One benefit of the shapelet approach is that shapelets are comprehensible, and can offer insight into the problem domain. The original shapelet-based classifier embeds the shapelet-discovery algorithm in a decision tree, and uses information gain to assess the quality of candidates, finding a new shapelet at each node of the tree through an enumerative search. Subsequent research has focused mainly on techniques to speed up the search. We examine how best to use the shapelet primitive to construct classifiers. We propose a single-scan shapelet algorithm that finds the best shapelets, which are used to produce a transformed dataset, where each of the features represent the distance between a time series and a shapelet. The primary advantages over the embedded approach are that the transformed data can be used in conjunction with any classifier, and that there is no recursive search for shapelets. We demonstrate that the transformed data, in conjunction with more complex classifiers, gives greater accuracy than the embedded shapelet tree. We also evaluate three similarity measures that produce equivalent results to information gain in less time. Finally, we show that by conducting post-transform clustering of shapelets, we can enhance the interpretability of the transformed data. We conduct our experiments on 29 datasets: 17 from the UCR repository, and 12 we provide ourselve
Feature-based time-series analysis
This work presents an introduction to feature-based time-series analysis. The
time series as a data type is first described, along with an overview of the
interdisciplinary time-series analysis literature. I then summarize the range
of feature-based representations for time series that have been developed to
aid interpretable insights into time-series structure. Particular emphasis is
given to emerging research that facilitates wide comparison of feature-based
representations that allow us to understand the properties of a time-series
dataset that make it suited to a particular feature-based representation or
analysis algorithm. The future of time-series analysis is likely to embrace
approaches that exploit machine learning methods to partially automate human
learning to aid understanding of the complex dynamical patterns in the time
series we measure from the world.Comment: 28 pages, 9 figure
An Experimental Evaluation of Nearest Neighbour Time Series Classification
Data mining research into time series classification (TSC) has focussed on alternative distance measures for nearest neighbour classifiers. It is standard practice to use 1-NN with Euclidean or dynamic time warping (DTW) distance as a straw man for comparison. As part of a wider investigation into elastic distance measures for TSC~\cite{lines14elastic}, we perform a series of experiments to test whether this standard practice is valid. Specifically, we compare 1-NN classifiers with Euclidean and DTW distance to standard classifiers, examine whether the performance of 1-NN Euclidean approaches that of 1-NN DTW as the number of cases increases, assess whether there is any benefit of setting for -NN through cross validation whether it is worth setting the warping path for DTW through cross validation and finally is it better to use a window or weighting for DTW. Based on experiments on 77 problems, we conclude that 1-NN with Euclidean distance is fairly easy to beat but 1-NN with DTW is not, if window size is set through cross validation
Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling
Spambot detection in online social networks is a long-lasting challenge
involving the study and design of detection techniques capable of efficiently
identifying ever-evolving spammers. Recently, a new wave of social spambots has
emerged, with advanced human-like characteristics that allow them to go
undetected even by current state-of-the-art algorithms. In this paper, we show
that efficient spambots detection can be achieved via an in-depth analysis of
their collective behaviors exploiting the digital DNA technique for modeling
the behaviors of social network users. Inspired by its biological counterpart,
in the digital DNA representation the behavioral lifetime of a digital account
is encoded in a sequence of characters. Then, we define a similarity measure
for such digital DNA sequences. We build upon digital DNA and the similarity
between groups of users to characterize both genuine accounts and spambots.
Leveraging such characterization, we design the Social Fingerprinting
technique, which is able to discriminate among spambots and genuine accounts in
both a supervised and an unsupervised fashion. We finally evaluate the
effectiveness of Social Fingerprinting and we compare it with three
state-of-the-art detection algorithms. Among the peculiarities of our approach
is the possibility to apply off-the-shelf DNA analysis techniques to study
online users behaviors and to efficiently rely on a limited number of
lightweight account characteristics
A Review of Subsequence Time Series Clustering
Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and backgrounds related to subsequence time series clustering. The categorization of the literature reviews is divided into three groups: preproof, interproof, and postproof period. Moreover, various state-of-the-art approaches in performing subsequence time series clustering are discussed under each of the following categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies
Interpretable time series neural representation for classification purposes
Deep learning has made significant advances in creating efficient
representations of time series data by automatically identifying complex
patterns. However, these approaches lack interpretability, as the time series
is transformed into a latent vector that is not easily interpretable. On the
other hand, Symbolic Aggregate approximation (SAX) methods allow the creation
of symbolic representations that can be interpreted but do not capture complex
patterns effectively. In this work, we propose a set of requirements for a
neural representation of univariate time series to be interpretable. We propose
a new unsupervised neural architecture that meets these requirements. The
proposed model produces consistent, discrete, interpretable, and visualizable
representations. The model is learned independently of any downstream tasks in
an unsupervised setting to ensure robustness. As a demonstration of the
effectiveness of the proposed model, we propose experiments on classification
tasks using UCR archive datasets. The obtained results are extensively compared
to other interpretable models and state-of-the-art neural representation
learning models. The experiments show that the proposed model yields, on
average better results than other interpretable approaches on multiple
datasets. We also present qualitative experiments to asses the interpretability
of the approach.Comment: International Conference on Data Science and Advanced Analytics
(DSAA) 202
- …