Toward a generic representation of random variables for machine learning
This paper presents a pre-processing step and a distance that improve the
performance of machine learning algorithms working on independent and
identically distributed stochastic processes. We introduce a novel
non-parametric approach to represent random variables which splits apart
dependency and distribution without losing any information. We also propound an
associated metric leveraging this representation and its statistical estimate.
Besides experiments on synthetic datasets, the benefits of our contribution are
illustrated through the example of clustering financial time series, for
instance prices from the credit default swaps market. Results are available on
the website www.datagrapple.com and an IPython Notebook tutorial is available
at www.datagrapple.com/Tech for reproducible research.
Comment: submitted to Pattern Recognition Letters
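
The split of dependency and distribution described in the abstract can be illustrated with a minimal sketch. It assumes, as an illustration only and not as the authors' confirmed implementation, that the dependence part of a sample is captured by its normalized ranks and the distribution part by its sorted values. The function names `split_sample` and `rebuild_sample` are hypothetical.

```python
import numpy as np

def split_sample(x):
    """Split a 1-D sample into (normalized ranks, sorted values).

    Assumption for illustration: ranks carry the dependence information,
    sorted values carry the marginal distribution.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")   # indices that sort x
    ranks = np.empty(len(x), dtype=int)
    ranks[order] = np.arange(len(x))       # 0-based rank of each observation
    return (ranks + 1) / len(x), x[order]  # ranks scaled into (0, 1], sorted values

def rebuild_sample(norm_ranks, sorted_values):
    """Recover the original sample from the two parts."""
    sorted_values = np.asarray(sorted_values, dtype=float)
    n = len(sorted_values)
    idx = np.rint(np.asarray(norm_ranks) * n).astype(int) - 1
    return sorted_values[idx]

x = np.array([3.0, 1.0, 2.0])
ranks, values = split_sample(x)
restored = rebuild_sample(ranks, values)   # equals x: nothing was lost
```

Under this assumption, a strictly increasing transform of the sample, such as `np.exp(x)`, changes the sorted values but leaves the normalized ranks untouched, which is why the two parts can be compared separately.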