Hedging predictions in machine learning
Recent advances in machine learning make it possible to design efficient
prediction algorithms for data sets with huge numbers of parameters. This paper
describes a new technique for "hedging" the predictions output by many such
algorithms, including support vector machines, kernel ridge regression, kernel
nearest neighbours, and many other state-of-the-art methods. The hedged
predictions for the labels of new objects include quantitative measures of
their own accuracy and reliability. These measures are provably valid under the
assumption of randomness, traditional in machine learning: the objects and
their labels are assumed to be generated independently from the same
probability distribution. In particular, it becomes possible to control (up to
statistical fluctuations) the number of erroneous predictions by selecting a
suitable confidence level. Validity being achieved automatically, the remaining
goal of hedged prediction is efficiency: taking full account of the new
objects' features and other available information to produce as accurate
predictions as possible. This can be done successfully using the powerful
machinery of modern machine learning.
Comment: 24 pages; 9 figures; 2 tables; a version of this paper (with discussion and rejoinder) is to appear in "The Computer Journal".
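As a concrete illustration of the technique described above, here is a minimal sketch (ours, not the paper's code) of full conformal prediction with a 1-nearest-neighbour nonconformity measure; it assumes each candidate label already occurs in the training data and at least two distinct labels are present. All function names are our own.

```python
import numpy as np

def nonconformity(X, y, i):
    """1-NN nonconformity score for example i: distance to the nearest
    example with the same label divided by distance to the nearest
    example with a different label (larger = stranger)."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf  # exclude the example itself
    same = d[y == y[i]].min()
    other = d[y != y[i]].min()
    return same / other

def conformal_predict(X, y, x_new, labels, epsilon=0.05):
    """Return the set of labels whose conformal p-value exceeds epsilon.
    Under the randomness (i.i.d.) assumption, the true label is excluded
    with probability at most epsilon."""
    prediction = set()
    for lab in labels:
        X_aug = np.vstack([X, x_new])
        y_aug = np.append(y, lab)
        scores = np.array([nonconformity(X_aug, y_aug, i)
                           for i in range(len(y_aug))])
        # p-value: fraction of examples at least as strange as the new one
        if np.mean(scores >= scores[-1]) > epsilon:
            prediction.add(lab)
    return prediction
```

With epsilon = 0.05 the predicted sets contain the true label in roughly 95% of trials; lowering epsilon enlarges the sets, which is exactly the validity/efficiency trade-off the abstract describes.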
Application of Kolmogorov complexity and universal codes to identity testing and nonparametric testing of serial independence for time series
We show that Kolmogorov complexity and its estimators, such as universal codes (or data compression methods), can be applied to hypothesis testing within the framework of classical mathematical statistics. Methods for identity testing and nonparametric testing of serial independence for time series are suggested.
Comment: submitted
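In the same spirit, the following rough sketch is our illustrative permutation-test variant, not the construction in the paper: a universal code should compress a serially dependent sequence better than random shufflings of it, so the compressed length can serve as a test statistic for serial independence.

```python
import zlib
import numpy as np

def code_length(seq: np.ndarray) -> int:
    """Length in bytes of the zlib-compressed sequence: a computable
    upper bound standing in for Kolmogorov complexity."""
    return len(zlib.compress(seq.tobytes(), 9))

def serial_independence_pvalue(seq, n_perm=499, seed=0):
    """Permutation p-value: under serial independence, the compressed
    length of the sequence is exchangeable with that of its shufflings,
    so a small p-value is evidence of serial dependence."""
    rng = np.random.default_rng(seed)
    seq = np.asarray(seq, dtype=np.uint8)
    observed = code_length(seq)
    hits = sum(code_length(rng.permutation(seq)) <= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)
```

For example, a strongly persistent binary Markov chain yields a small p-value, while an i.i.d. Bernoulli sequence does not.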
On-line predictive linear regression
We consider the on-line predictive version of the standard problem of linear
regression; the goal is to predict each consecutive response given the
corresponding explanatory variables and all the previous observations. We are
mainly interested in prediction intervals rather than point predictions. The
standard treatment of prediction intervals in linear regression analysis has
two drawbacks: (1) the classical prediction intervals guarantee that the
probability of error is equal to the nominal significance level epsilon, but
this property per se does not imply that the long-run frequency of error is
close to epsilon; (2) it is not suitable for prediction of complex systems as
it assumes that the number of observations exceeds the number of parameters. We
state a general result showing that in the on-line protocol the frequency of
error for the classical prediction intervals does equal the nominal
significance level, up to statistical fluctuations. We also describe
alternative regression models in which informative prediction intervals can be
found before the number of observations exceeds the number of parameters. One
of these models, which only assumes that the observations are independent and
identically distributed, is popular in machine learning but greatly underused
in the statistical theory of regression.
Comment: 34 pages; 6 figures; 1 table. arXiv admin note: substantial text overlap with arXiv:0906.312
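To make the on-line protocol concrete, here is a minimal sketch (ours; names and details are assumptions) that issues the classical studentized prediction interval at each step from all preceding observations and records the long-run frequency of coverage, which by the result stated above should be close to 1 - epsilon. It assumes a full-rank design matrix.

```python
import numpy as np
from scipy import stats

def online_coverage(X, y, eps=0.05):
    """At each step t, fit least squares to the first t observations,
    issue the classical level-(1 - eps) prediction interval for y[t],
    and record whether it covers.  Returns the empirical coverage,
    which should be close to 1 - eps up to statistical fluctuations."""
    n, p = X.shape
    hits = []
    for t in range(p + 2, n):               # need dof = t - p >= 2
        Xt, yt = X[:t], y[:t]
        beta, *_ = np.linalg.lstsq(Xt, yt, rcond=None)
        resid = yt - Xt @ beta
        s2 = resid @ resid / (t - p)        # residual variance estimate
        G = np.linalg.inv(Xt.T @ Xt)        # assumes full-rank design
        x = X[t]
        se = np.sqrt(s2 * (1.0 + x @ G @ x))
        half = stats.t.ppf(1 - eps / 2, t - p) * se
        hits.append(abs(y[t] - x @ beta) <= half)
    return np.mean(hits)
```

Running this on data simulated from a Gaussian linear model with eps = 0.05 gives empirical coverage near 0.95, illustrating the frequency statement for the on-line protocol.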
Multi-level conformal clustering: A distribution-free technique for clustering and anomaly detection
In this work we present a clustering technique called multi-level conformal clustering (MLCC). The technique is hierarchical in nature because it can be performed at multiple significance levels, which yields greater insight into the data than performing it at just one level. We describe the theoretical underpinnings of MLCC, compare and contrast it with standard hierarchical clustering, and then apply it to real-world datasets to assess its performance. There are several advantages to using MLCC over more classical clustering techniques: once a significance level has been set, MLCC automatically selects the number of clusters. Furthermore, thanks to the conformal prediction framework, the resulting clustering model has a clear statistical meaning without any assumptions about the distribution of the data. This statistical robustness also allows us to perform clustering and anomaly detection simultaneously. Moreover, due to the flexibility of the conformal prediction framework, our algorithm can be used on top of many other machine learning algorithms.
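A rough sketch of the conformal core of such a method (ours; the full MLCC algorithm additionally clusters the retained region and builds a dendrogram across levels): leave-one-out k-NN nonconformity scores give each point a conformal p-value, and thresholding at several significance levels flags anomalies level by level.

```python
import numpy as np

def knn_nonconformity(X, x, k=3):
    """Mean distance from x to its k nearest neighbours in X."""
    d = np.sort(np.linalg.norm(X - x, axis=1))
    return d[:k].mean()

def conformal_pvalues(X, k=3):
    """Leave-one-out conformal p-value of each example: the fraction
    of examples at least as nonconforming as it is."""
    scores = np.array([knn_nonconformity(np.delete(X, i, axis=0), X[i], k)
                       for i in range(len(X))])
    return np.array([np.mean(scores >= s) for s in scores])

def multilevel_anomalies(X, levels=(0.01, 0.05, 0.10), k=3):
    """At each significance level eps, points with p-value <= eps are
    declared anomalous; under i.i.d. sampling, approximately a fraction
    eps of the data is flagged, up to statistical fluctuations."""
    p = conformal_pvalues(X, k)
    return {eps: np.flatnonzero(p <= eps) for eps in levels}
```

Scanning eps from small to large gives the multi-level view: points flagged even at tiny eps are the clearest anomalies, while the retained region at each level is what a full conformal clustering procedure would then partition into clusters.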