52,856 research outputs found
A proposal of a methodological framework with experimental guidelines to investigate clustering stability on financial time series
We present in this paper an empirical framework motivated by the practitioner
point of view on stability. The goal is to both assess clustering validity and
yield market insights by providing through the data perturbations we propose a
multi-view of the assets' clustering behaviour. The perturbation framework is
illustrated on an extensive credit default swap time series database available
online at www.datagrapple.com.Comment: Accepted at ICMLA 201
Using Empirical Recurrence Rates Ratio For Time Series Data Similarity
Several methods exist in classification literature to quantify the similarity between two time series data sets. Applications of these methods range from the traditional Euclidean type metric to the more advanced Dynamic Time Warping metric. Most of these adequately address structural similarity but fail in meeting goals outside it. For example, a tool that could be excellent to identify the seasonal similarity between two time series vectors might prove inadequate in the presence of outliers. In this paper, we have proposed a unifying measure for binary classification that performed well while embracing several aspects of dissimilarity. This statistic is gaining prominence in various fields, such as geology and finance, and is crucial in time series database formation and clustering studies
Feature-based time-series analysis
This work presents an introduction to feature-based time-series analysis. The
time series as a data type is first described, along with an overview of the
interdisciplinary time-series analysis literature. I then summarize the range
of feature-based representations for time series that have been developed to
aid interpretable insights into time-series structure. Particular emphasis is
given to emerging research that facilitates wide comparison of feature-based
representations that allow us to understand the properties of a time-series
dataset that make it suited to a particular feature-based representation or
analysis algorithm. The future of time-series analysis is likely to embrace
approaches that exploit machine learning methods to partially automate human
learning to aid understanding of the complex dynamical patterns in the time
series we measure from the world.Comment: 28 pages, 9 figure
Time Series Cluster Kernel for Learning Similarities between Multivariate Time Series with Missing Data
Similarity-based approaches represent a promising direction for time series
analysis. However, many such methods rely on parameter tuning, and some have
shortcomings if the time series are multivariate (MTS), due to dependencies
between attributes, or the time series contain missing data. In this paper, we
address these challenges within the powerful context of kernel methods by
proposing the robust \emph{time series cluster kernel} (TCK). The approach
taken leverages the missing data handling properties of Gaussian mixture models
(GMM) augmented with informative prior distributions. An ensemble learning
approach is exploited to ensure robustness to parameters by combining the
clustering results of many GMM to form the final kernel.
We evaluate the TCK on synthetic and real data and compare to other
state-of-the-art techniques. The experimental results demonstrate that the TCK
is robust to parameter choices, provides competitive results for MTS without
missing data and outstanding results for missing data.Comment: 23 pages, 6 figure
Highly comparative feature-based time-series classification
A highly comparative, feature-based approach to time series classification is
introduced that uses an extensive database of algorithms to extract thousands
of interpretable features from time series. These features are derived from
across the scientific time-series analysis literature, and include summaries of
time series in terms of their correlation structure, distribution, entropy,
stationarity, scaling properties, and fits to a range of time-series models.
After computing thousands of features for each time series in a training set,
those that are most informative of the class structure are selected using
greedy forward feature selection with a linear classifier. The resulting
feature-based classifiers automatically learn the differences between classes
using a reduced number of time-series properties, and circumvent the need to
calculate distances between time series. Representing time series in this way
results in orders of magnitude of dimensionality reduction, allowing the method
to perform well on very large datasets containing long time series or time
series of different lengths. For many of the datasets studied, classification
performance exceeded that of conventional instance-based classifiers, including
one nearest neighbor classifiers using Euclidean distances and dynamic time
warping and, most importantly, the features selected provide an understanding
of the properties of the dataset, insight that can guide further scientific
investigation
Proceedings of the 2011 New York Workshop on Computer, Earth and Space Science
The purpose of the New York Workshop on Computer, Earth and Space Sciences is
to bring together the New York area's finest Astronomers, Statisticians,
Computer Scientists, Space and Earth Scientists to explore potential synergies
between their respective fields. The 2011 edition (CESS2011) was a great
success, and we would like to thank all of the presenters and participants for
attending. This year was also special as it included authors from the upcoming
book titled "Advances in Machine Learning and Data Mining for Astronomy". Over
two days, the latest advanced techniques used to analyze the vast amounts of
information now available for the understanding of our universe and our planet
were presented. These proceedings attempt to provide a small window into what
the current state of research is in this vast interdisciplinary field and we'd
like to thank the speakers who spent the time to contribute to this volume.Comment: Author lists modified. 82 pages. Workshop Proceedings from CESS 2011
in New York City, Goddard Institute for Space Studie
- …