Toward a generic representation of random variables for machine learning

Donnat, Philippe; Marti, Gautier; Very, Philippe

Authors: Philippe Donnat, Gautier Marti, Philippe Very
Publication date: 3 September 2015

Abstract

This paper presents a pre-processing step and a distance that improve the performance of machine learning algorithms working on independent and identically distributed stochastic processes. We introduce a novel non-parametric approach to representing random variables that separates dependency from distribution without losing any information. We also propose an associated metric that leverages this representation, together with its statistical estimate. Besides experiments on synthetic datasets, the benefits of our contribution are illustrated through the example of clustering financial time series, for instance prices from the credit default swaps market. Results are available on the website www.datagrapple.com, and an IPython Notebook tutorial is available at www.datagrapple.com/Tech for reproducible research.

Comment: submitted to Pattern Recognition Letters
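The abstract does not spell out the construction, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the "dependency" part of a sample can be carried by its normalized ranks and the "distribution" part by its sorted values, and it shows that the original sample can be rebuilt from the two. The function names (`split_representation`, `reconstruct`, `blended_distance`), the mixing weight `theta`, the factor 3 on the rank term, and the histogram-based Hellinger term are all assumptions made for illustration.

```python
import numpy as np


def split_representation(x):
    """Split a sample into normalized ranks (ordering) and sorted values (marginal)."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x)
    ranks = np.empty(x.size, dtype=int)
    ranks[order] = np.arange(1, x.size + 1)  # rank 1 is the smallest value
    return ranks / x.size, x[order]


def reconstruct(u, sorted_values):
    """Rebuild the original sample, showing that the split loses no information."""
    idx = np.rint(u * sorted_values.size).astype(int) - 1
    return sorted_values[idx]


def blended_distance(x, y, theta=0.5, bins=20):
    """Illustrative distance mixing a rank term and a marginal term.

    Assumption: theta in [0, 1] weights the rank (dependency) term against
    the histogram (distribution) term. This is a hypothetical blend.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.size != y.size:
        raise ValueError("samples must have the same length")

    # Dependency term: mean squared gap between normalized ranks.
    # The factor 3 is an assumption that keeps this term below 1.
    u_x, _ = split_representation(x)
    u_y, _ = split_representation(y)
    dep_sq = 3.0 * np.mean((u_x - u_y) ** 2)

    # Distribution term: squared Hellinger distance between histograms
    # computed on shared bin edges.
    edges = np.histogram_bin_edges(np.concatenate([x, y]), bins=bins)
    p = np.histogram(x, bins=edges)[0] / x.size
    q = np.histogram(y, bins=edges)[0] / y.size
    dist_sq = 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)

    return float(np.sqrt(theta * dep_sq + (1.0 - theta) * dist_sq))
```

In this sketch, `theta=1.0` ignores the marginals entirely, so a sample and any strictly increasing transform of it are at distance zero, while `theta=0.0` ignores the ordering, so a sample and any shuffle of it are at distance zero. Intermediate values trade one kind of information against the other.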

Available Versions

HAL-Polytechnique

oai:HAL:hal-01196883v1

Last time updated on 20/04/2018