Time Series Cluster Kernel for Learning Similarities between Multivariate Time Series with Missing Data
Similarity-based approaches represent a promising direction for time series
analysis. However, many such methods rely on parameter tuning, and some have
shortcomings if the time series are multivariate (MTS), due to dependencies
between attributes, or the time series contain missing data. In this paper, we
address these challenges within the powerful context of kernel methods by
proposing the robust time series cluster kernel (TCK). The approach
taken leverages the missing data handling properties of Gaussian mixture models
(GMM) augmented with informative prior distributions. An ensemble learning
approach is exploited to ensure robustness to parameters by combining the
clustering results of many GMMs to form the final kernel.
We evaluate the TCK on synthetic and real data and compare to other
state-of-the-art techniques. The experimental results demonstrate that the TCK
is robust to parameter choices, provides competitive results for MTS without
missing data and outstanding results for missing data.
Comment: 23 pages, 6 figures
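The ensemble step described in this abstract can be illustrated with a minimal sketch. It assumes each ensemble member has already produced soft cluster assignments (posterior probabilities) for every series, and it builds a kernel by summing the inner products of those assignments across members. The function name `ensemble_kernel` and the toy posteriors are hypothetical. Fitting the GMMs, the informative priors, and the missing-data handling are not shown, and this is not the authors' implementation.

```python
def ensemble_kernel(posteriors):
    """Sum inner products of soft cluster assignments over ensemble members.

    posteriors[q][i] is the list of cluster probabilities assigned to
    series i by ensemble member q (hypothetical input format).
    """
    n = len(posteriors[0])
    kernel = [[0.0] * n for _ in range(n)]
    for member in posteriors:
        for i in range(n):
            for j in range(n):
                # Two series are similar under this member when their
                # probability vectors over clusters overlap.
                kernel[i][j] += sum(a * b for a, b in zip(member[i], member[j]))
    return kernel


# Toy data: 2 ensemble members, 3 series, 2 clusters each (made up for illustration).
toy_posteriors = [
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]],
]
K = ensemble_kernel(toy_posteriors)
```

Because every entry is a sum of inner products, the resulting matrix is symmetric and positive semidefinite, which is what allows it to be used as a kernel.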
Nonparametric Bayesian multiple testing for longitudinal performance stratification
This paper describes a framework for flexible multiple hypothesis testing of
autoregressive time series. The modeling approach is Bayesian, though a blend
of frequentist and Bayesian reasoning is used to evaluate procedures.
Nonparametric characterizations of both the null and alternative hypotheses
will be shown to be the key robustification step necessary to ensure reasonable
Type-I error performance. The methodology is applied to part of a large
database containing up to 50 years of corporate performance statistics on
24,157 publicly traded American companies, where the primary goal of the
analysis is to flag companies whose historical performance is significantly
different from that expected due to chance.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org) at http://dx.doi.org/10.1214/09-AOAS252
Consensus clustering and functional interpretation of gene-expression data
Microarray analysis using clustering algorithms can suffer from a lack of inter-method consistency in assigning related gene-expression profiles to clusters. Obtaining a consensus set of clusters from a number of clustering methods should improve confidence in gene-expression analysis. Here we introduce consensus clustering, which provides such an advantage. When coupled with a statistically based gene functional analysis, our method allowed the identification of novel genes regulated by NFκB and the unfolded protein response in certain B-cell lymphomas.
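The abstract does not say how the consensus is computed. One common way to combine several clusterings is a co-association matrix, which records how often each pair of items lands in the same cluster. The sketch below shows that technique only, as an assumption rather than the authors' procedure. The function name `co_association` and the toy labelings are hypothetical.

```python
def co_association(labelings):
    """Fraction of clusterings in which each pair of items shares a cluster.

    labelings[m][i] is the cluster label given to item i by method m.
    Labels only need to be consistent within a single method.
    """
    n = len(labelings[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    return [[c / len(labelings) for c in row] for row in counts]


# Toy data: 3 clustering methods applied to 4 items (made up for illustration).
toy_labelings = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
C = co_association(toy_labelings)
```

Pairs with a value near 1 are grouped together by most methods. A consensus set of clusters could then be obtained by clustering this matrix, for example by thresholding it.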
Clustering of discretely observed diffusion processes
In this paper, a new dissimilarity measure for identifying groups of asset
dynamics is proposed. The underlying generating process is assumed to be a
diffusion process that solves a stochastic differential equation and is
observed at discrete times. The mesh of observations is not required to shrink
to zero. The distance between two observed paths is taken to be the quadratic
distance between the corresponding estimated Markov operators. Analysis of both
synthetic data and real financial data from NYSE/NASDAQ stocks gives evidence
that this distance can capture differences in both the drift and the
diffusion coefficients, unlike other commonly used metrics.