Testing and Learning on Distributions with Symmetric Noise Invariance

Authors: Law Ho Chung Leon; Sejdinovic Dino; Yau Christopher
Publication date: 01/01/2017

Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD), the resulting distance between distributions, are useful tools for fully nonparametric two-sample testing and learning on distributions. However, it is rare that all possible differences between samples are of interest -- discovered differences can be due to different types of measurement noise, data collection artefacts or other irrelevant sources of variability. We propose distances between distributions which encode invariance to additive symmetric noise, aimed at testing whether the assumed true underlying processes differ. Moreover, we construct invariant features of distributions, leading to learning algorithms robust to the impairment of the input distributions with symmetric additive noise. Comment: 22 pages
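For readers unfamiliar with the MMD, the following is a minimal sketch of the standard biased (V-statistic) estimate of the squared MMD between two one-dimensional samples. It illustrates the plain MMD only, not the noise-invariant distances proposed in the paper. The Gaussian kernel, the bandwidth of 1.0, the sample sizes, and the function names `gaussian_kernel` and `mmd2_biased` are all assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars; the bandwidth is an assumed value."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two 1-D samples."""

    def mean_kernel(a, b):
        # Average kernel value over all pairs (u, v) with u in a and v in b.
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Toy data (hypothetical): two samples from N(0, 1) and one from N(2, 1).
rng = random.Random(0)
same_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
same_b = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [rng.gauss(2.0, 1.0) for _ in range(200)]

mmd_same = mmd2_biased(same_a, same_b)      # small: same underlying distribution
mmd_shifted = mmd2_biased(same_a, shifted)  # larger: the means differ
print(mmd_same, mmd_shifted)
```

In this toy setting the estimate for the two samples from the same distribution should be close to zero, while the estimate for the shifted sample should be clearly larger. A two-sample test would additionally need a threshold, for example from a permutation procedure, which this sketch omits.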

arXiv.org e-Print Archive

University of Birmingham Research Portal

Oxford University Research Archive

The University of Manchester - Institutional Repository

Change Detection in Multivariate Datastreams: Likelihood and Detectability Loss

Authors: Alippi Cesare; Boracchi Giacomo; Carrera Diego; Roveri Manuel
Publication date: 01/01/2016

We address the problem of detecting changes in multivariate datastreams, and we investigate the intrinsic difficulty that change-detection methods have to face when the data dimension scales. In particular, we consider a general approach where changes are detected by comparing the distribution of the log-likelihood of the datastream over different time windows. Despite the fact that this approach constitutes the frame of several change-detection methods, its effectiveness when data dimension scales has never been investigated, which is indeed the goal of our paper. We show that the magnitude of the change can be naturally measured by the symmetric Kullback-Leibler divergence between the pre- and post-change distributions, and that the detectability of a change of a given magnitude worsens when the data dimension increases. This problem, which we refer to as detectability loss, is due to the linear relationship between the variance of the log-likelihood and the data dimension. We analytically derive the detectability loss on Gaussian-distributed datastreams, and empirically demonstrate that this problem also holds on real-world datasets and that it can be harmful even at low data dimensions (say, 10).
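The linear relationship between log-likelihood variance and dimension can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not taken from the paper: the data are Gaussian with identity covariance, and the symmetric Kullback-Leibler divergence is taken as the sum of the two directed divergences. Under those assumptions the divergence between N(m0, I) and N(m1, I) equals the squared distance between the means, and the log-likelihood variance is d/2 for dimension d. All function names are hypothetical.

```python
import math
import random


def log_likelihood(x, mean):
    """Log-density of N(mean, I) at point x (identity covariance assumed)."""
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * d * math.log(2.0 * math.pi) - 0.5 * sq_dist


def symmetric_kl(mean0, mean1):
    """KL(p0||p1) + KL(p1||p0) for N(mean0, I) and N(mean1, I).

    Each directed divergence is half the squared distance between the means,
    so their sum is the squared distance itself.
    """
    return sum((a - b) ** 2 for a, b in zip(mean0, mean1))


def sample_variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def loglik_variance(dim, n_samples=4000, seed=0):
    """Monte Carlo estimate of the log-likelihood variance under N(0, I_dim)."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    lls = [
        log_likelihood([rng.gauss(0.0, 1.0) for _ in range(dim)], mean)
        for _ in range(n_samples)
    ]
    return sample_variance(lls)


# The estimated variance should grow roughly as dim / 2.
variances = {dim: loglik_variance(dim) for dim in (2, 8, 32)}
for dim, var in variances.items():
    print(dim, round(var, 2), "theory:", dim / 2)
```

Because a change of fixed magnitude keeps the symmetric divergence constant while the variance of the monitored log-likelihood grows with the dimension, the same change becomes harder to separate from random fluctuation as the dimension increases.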

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano