Search CORE

372 research outputs found

Fast rates in statistical and online learning

Author: Grünwald Peter D.
Mehta Nishant A.
Reid Mark D.
van Erven Tim
Williamson Robert C.
Publication venue
Publication date: 01/01/2015
Field of study

The speed with which a learning algorithm converges as it is presented with more data is a central problem in machine learning --- a fast rate of convergence means less data is needed for the same level of performance. The pursuit of fast rates in online and statistical learning has led to the discovery of many conditions in learning theory under which fast learning is possible. We show that most of these conditions are special cases of a single, unifying condition, that comes in two forms: the central condition for 'proper' learning algorithms that always output a hypothesis in the given model, and stochastic mixability for online algorithms that may make predictions outside of the model. We show that under surprisingly weak assumptions both conditions are, in a certain sense, equivalent. The central condition has a re-interpretation in terms of convexity of a set of pseudoprobabilities, linking it to density estimation under misspecification. For bounded losses, we show how the central condition enables a direct proof of fast rates and we prove its equivalence to the Bernstein condition, itself a generalization of the Tsybakov margin condition, both of which have played a central role in obtaining fast rates in statistical learning. Yet, while the Bernstein condition is two-sided, the central condition is one-sided, making it more suitable to deal with unbounded losses. In its stochastic mixability form, our condition generalizes both a stochastic exp-concavity condition identified by Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying conditions thus provide a substantial step towards a characterization of fast rates in statistical learning, similar to how classical mixability characterizes constant regret in the sequential prediction with expert advice setting.Comment: 69 pages, 3 figure

arXiv.org e-Print Archive

CWI's Institutional Repository

Leiden University Scholary Publications

Divergence Measures

Author
Publication venue: 'MDPI AG'
Publication date: 21/06/2022
Field of study

Data science, information theory, probability theory, statistical learning and other related disciplines greatly benefit from non-negative measures of dissimilarity between pairs of probability measures. These are known as divergence measures, and exploring their mathematical foundations and diverse applications is of significant interest. The present Special Issue, entitled “Divergence Measures: Mathematical Foundations and Applications in Information-Theoretic and Statistical Problems”, includes eight original contributions, and it is focused on the study of the mathematical properties and applications of classical and generalized divergence measures from an information-theoretic perspective. It mainly deals with two key generalizations of the relative entropy: namely, the R_ényi divergence and the important class of f -divergences. It is our hope that the readers will find interest in this Special Issue, which will stimulate further research in the study of the mathematical foundations and applications of divergence measures

Directory of Open Access Books (DOAB)

Recommended from our members

Converting to Optimization in Machine Learning: Perturb-and-MAP, Differential Privacy, and Program Synthesis

Author: Balog Matej
Publication venue: University of Cambridge
Publication date: 01/02/2020
Field of study

On a mathematical level, most computational problems encountered in machine learning are instances of one of four abstract, fundamental problems: sampling, integration, optimization, and search. Thanks to the rich history of the respective mathematical fields, disparate methods with different properties have been developed for these four problem classes. As a result it can be beneficial to convert a problem from one abstract class into a problem of a different class, because the latter might come with insights, techniques, and algorithms well suited to the particular problem at hand. In particular, this thesis contributes four new methods and generalizations of existing methods for converting specific non-optimization machine learning tasks into optimization problems with more appealing properties. The first example is partition function estimation (an integration problem), where an existing algorithm -- the Gumbel trick -- for converting to the MAP optimization problem is generalized into a more general family of algorithms, such that other instances of this family have better statistical properties. Second, this family of algorithms is further generalized to another integration problem, the problem of estimating Rényi entropies. The third example shows how an intractable sampling problem arising when wishing to publicly release a database containing sensitive data in a safe ("differentially private") manner can be converted into an optimization problem using the theory of Reproducing Kernel Hilbert Spaces. Finally, the fourth case study casts the challenging discrete search problem of program synthesis from input-output examples as a supervised learning task that can be efficiently tackled using gradient-based optimization. In all four instances, the conversions result in novel algorithms with desirable properties. In the first instance, new generalizations of the Gumbel trick can be used to construct statistical estimators of the partition function that achieve the same estimation error while using up to 40% fewer samples. The second instance shows that unbiased estimators of the Rényi entropy can be constructed in the Perturb-and-MAP framework. The main contribution of the third instance is theoretical: the conversion shows that it is possible to construct an algorithm for releasing synthetic databases that approximate databases containing sensitive data in a mathematically precise sense, and to prove results about their approximation errors. Finally, the fourth conversion yields an algorithm for synthesising program source code from input-output examples that is able to solve test problems 1-3 orders of magnitude faster than a wide range of baselines

Apollo (Cambridge)

MPG.PuRe

Estimating labels from label proportions

Author: Caetano TS
Le QV
Quadrianto N
Smola AJ
Publication venue: Microtome Publishing
Publication date: 01/01/2008
Field of study

Consider the following problem: given sets of unlabeled observations, each set with known label proportions, predict the labels of another set of observations, also with known label proportions. This problem appears in areas like e-commerce, spam filtering and improper content detection. We present consistent estimators which can reconstruct the correct labels with high probability in a uniform convergence sense. Experiments show that our method works well in practice.

CiteSeerX

Crossref

Sussex Research Online

CUED - Cambridge University Engineering Department

Distinguishing cause from effect using observational data: methods and benchmarks

Author: Janzing Dominik
Mooij Joris M.
Peters Jonas
Schölkopf Bernhard
Zscheischler Jakob
Publication venue
Publication date: 01/01/2015
Field of study

The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y. An example is to decide whether altitude causes temperature, or vice versa, given only joint measurements of both variables. Even under the simplifying assumptions of no confounding, no feedback loops, and no selection bias, such bivariate causal discovery problems are challenging. Nevertheless, several approaches for addressing those problems have been proposed in recent years. We review two families of such methods: Additive Noise Methods (ANM) and Information Geometric Causal Inference (IGCI). We present the benchmark CauseEffectPairs that consists of data for 100 different cause-effect pairs selected from 37 datasets from various domains (e.g., meteorology, biology, medicine, engineering, economy, etc.) and motivate our decisions regarding the "ground truth" causal directions of all pairs. We evaluate the performance of several bivariate causal discovery methods on these real-world benchmark data and in addition on artificially simulated data. Our empirical results on real-world data indicate that certain methods are indeed able to distinguish cause from effect using only purely observational data, although more benchmark data would be needed to obtain statistically significant conclusions. One of the best performing methods overall is the additive-noise method originally proposed by Hoyer et al. (2009), which obtains an accuracy of 63+-10 % and an AUC of 0.74+-0.05 on the real-world benchmark. As the main theoretical contribution of this work we prove the consistency of that method.Comment: 101 pages, second revision submitted to Journal of Machine Learning Researc

arXiv.org e-Print Archive

MPG.PuRe

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Adaptivity in Online and Statistical Learning

Author: Mhammedi Zakaria
Publication venue
Publication date: 01/01/2021
Field of study

Many modern machine learning algorithms, though successful, are still based on heuristics. In a typical application, such heuristics may manifest in the choice of a specific Neural Network structure, its number of parameters, or the learning rate during training. Relying on these heuristics is not ideal from a computational perspective (often involving multiple runs of the algorithm), and can also lead to over-fitting in some cases. This motivates the following question: for which machine learning tasks/settings do there exist efficient algorithms that automatically adapt to the best parameters? Characterizing the settings where this is the case and designing corresponding (parameter-free) algorithms within the online learning framework constitutes one of this thesis' primary goals. Towards this end, we develop algorithms for constrained and unconstrained online convex optimization that can automatically adapt to various parameters of interest such as the Lipschitz constant, the curvature of the sequence of losses, and the norm of the comparator. We also derive new performance lower-bounds characterizing the limits of adaptivity for algorithms in these settings. Part of systematizing the choice of machine learning methods also involves having ``certificates'' for the performance of algorithms. In the statistical learning setting, this translates to having (tight) generalization bounds. Adaptivity can manifest here through data-dependent bounds that become small whenever the problem is ``easy''. In this thesis, we provide such data-dependent bounds for the expected loss (the standard risk measure) and other risk measures. We also explore how such bounds can be used in the context of risk-monotonicity

The Australian National University

Dimension-reduction and discrimination of neuronal multi-channel signals

Author: Kremper Helmut Alexander
Publication venue: Philipps-Universität Marburg
Publication date: 01/01/2006
Field of study

Dimensionsreduktion und Trennung neuronaler Multikanal-Signale

Publikations- und Dokumentenserver der Universitätsbibliothek Marburg