Transitions, Losses, and Re-parameterizations: Elements of Prediction Games
This thesis presents some geometric insights into three types of
two-player prediction games: the general learning task, prediction
with expert advice, and online convex optimization. These games
differ in the nature of the opponent (stochastic, adversarial, or
intermediate), the order of the players' moves, and the utility
function. The insights shed light on the intrinsic barriers of
these prediction problems and on the design of computationally
efficient learning algorithms with strong theoretical guarantees
(such as generalizability, statistical consistency, and constant
regret). The main contributions of the thesis are:
• Leveraging concepts from statistical decision theory, we
develop the toolkit needed to formalize the prediction games
mentioned above and to quantify their objectives.
• We investigate the cost-sensitive classification problem, an
instantiation of the general learning task, and demonstrate its
hardness by deriving lower bounds on its minimax risk.
Then we analyse the impact of imposing constraints (such as a
corruption level or privacy requirements) on the general learning
task. This naturally leads us to a further investigation of strong
data processing inequalities, a fundamental concept in information
theory.
Furthermore, by extending the hypothesis testing interpretation
of standard privacy definitions, we propose an asymmetric
(prioritized) privacy definition.
• We study efficient merging schemes for the prediction with
expert advice problem and the geometric properties (mixability and
exp-concavity) of the loss functions that guarantee constant
regret bounds. As a result of this study, we construct two types
of link functions (one via a calculus approach and another via a
geometric approach) that can re-parameterize any binary mixable
loss into an exp-concave loss.
• We focus on some recent algorithms for online convex
optimization that exploit the easy nature of the data (such as
sparsity, predictable sequences, and curved losses) to achieve
better regret bounds while maintaining protection against the
worst-case scenario. We unify some of these existing techniques to
obtain new update rules for cases where these easy instances occur
together, and analyse their regret bounds.
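As an illustration of the kind of data-adaptive update rule this line of work builds on, here is a minimal sketch of a diagonal AdaGrad-style step, whose per-coordinate step sizes adapt to observed gradient magnitudes and thus exploit sparsity. This is a generic textbook example for context, not the thesis's own update rules; the function name and parameters are illustrative.

```python
import math

def adagrad_step(x, g, accum, lr=0.5, eps=1e-8):
    """One diagonal-AdaGrad update.

    x:     current iterate (list of floats), updated in place.
    g:     gradient at x.
    accum: running sum of squared gradients per coordinate.
    Coordinates that rarely receive gradient signal (sparse data)
    keep a larger effective step size.
    """
    for i in range(len(x)):
        accum[i] += g[i] ** 2
        x[i] -= lr * g[i] / (math.sqrt(accum[i]) + eps)
    return x, accum
```

On a simple quadratic objective the accumulated squared gradients stay bounded, so the iterate contracts geometrically toward the minimizer.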
Fast rates in statistical and online learning
The speed with which a learning algorithm converges as it is presented with
more data is a central problem in machine learning --- a fast rate of
convergence means less data is needed for the same level of performance. The
pursuit of fast rates in online and statistical learning has led to the
discovery of many conditions in learning theory under which fast learning is
possible. We show that most of these conditions are special cases of a single,
unifying condition that comes in two forms: the central condition for 'proper'
learning algorithms that always output a hypothesis in the given model, and
stochastic mixability for online algorithms that may make predictions outside
of the model. We show that under surprisingly weak assumptions both conditions
are, in a certain sense, equivalent. The central condition has a
re-interpretation in terms of convexity of a set of pseudoprobabilities,
linking it to density estimation under misspecification. For bounded losses, we
show how the central condition enables a direct proof of fast rates and we
prove its equivalence to the Bernstein condition, itself a generalization of
the Tsybakov margin condition, both of which have played a central role in
obtaining fast rates in statistical learning. Yet, while the Bernstein
condition is two-sided, the central condition is one-sided, making it more
suitable to deal with unbounded losses. In its stochastic mixability form, our
condition generalizes both a stochastic exp-concavity condition identified by
Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying
conditions thus provide a substantial step towards a characterization of fast
rates in statistical learning, similar to how classical mixability
characterizes constant regret in the sequential prediction with expert advice
setting.

Comment: 69 pages, 3 figures
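For reference, the two conditions compared in this abstract can be stated roughly as follows (our paraphrase of their standard forms; here F is the model, f* its best element, ℓ_f the loss of hypothesis f, and Z the data):

```latex
% Central condition (one-sided): for some fixed \eta > 0 and all f \in \mathcal{F},
\mathbb{E}_{Z}\!\left[ e^{-\eta \left( \ell_f(Z) - \ell_{f^*}(Z) \right)} \right] \le 1 .

% Bernstein condition (two-sided): for some B > 0, \beta \in (0, 1], and all f \in \mathcal{F},
\mathbb{E}_{Z}\!\left[ \left( \ell_f(Z) - \ell_{f^*}(Z) \right)^{2} \right]
  \le B \left( \mathbb{E}_{Z}\!\left[ \ell_f(Z) - \ell_{f^*}(Z) \right] \right)^{\beta} .
```

The one-sidedness of the central condition is what makes it usable for unbounded losses, as the abstract notes.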
Composite multiclass losses
We consider loss functions for multiclass prediction problems. We show when a multiclass loss can be expressed as a "proper composite loss", which is the composition of a proper loss and a link function. We extend existing results for binary losses to multiclass losses. We subsume results on "classification calibration" by relating it to properness. We determine the stationarity condition, Bregman representation, order-sensitivity, and quasi-convexity of multiclass proper losses. We then characterise the existence and uniqueness of the composite representation for multiclass losses. We show how the composite representation is related to other core properties of a loss: mixability, admissibility, and (strong) convexity of multiclass losses, which we characterise in terms of the Hessian of the Bayes risk. We show that the simple integral representation for binary proper losses cannot be extended to multiclass losses, but offer concrete guidance regarding how to design different loss functions. The conclusion drawn from these results is that the proper composite representation is a natural and convenient tool for the design of multiclass loss functions.
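As a reminder of the notation standard in this line of work (not specific to this abstract), a proper composite loss combines a proper loss λ over the probability simplex with an invertible link ψ that maps class probabilities to prediction values:

```latex
% Composite representation: \lambda is a proper loss over probability
% vectors p \in \Delta^{n}, and \psi \colon \Delta^{n} \to V is an invertible link.
\ell(y, v) = \lambda\!\left( y, \psi^{-1}(v) \right), \qquad v \in V .

% Properness of \lambda: the true class-probability vector minimizes expected loss,
p \in \operatorname*{arg\,min}_{q \in \Delta^{n}} \; \mathbb{E}_{Y \sim p}\!\left[ \lambda(Y, q) \right].
```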
Exp-Concavity of Proper Composite Losses
The goal of online prediction with expert advice is to find a decision strategy which will perform almost as well as the best expert in a given pool of experts, on any sequence of outcomes. This problem has been widely studied, and O(√T) and O(log T) regret bounds can be achieved for convex losses and for strictly convex losses with bounded first and second derivatives, respectively. In special cases, such as the Aggregating Algorithm with mixable losses and the Weighted Average Algorithm with exp-concave losses, it is possible to achieve O(1) regret bounds. But mixability and exp-concavity are roughly equivalent under certain conditions. Thus, by understanding the underlying relationship between these two notions, we can gain the best of both algorithms (the strong theoretical performance guarantees of the Aggregating Algorithm and the computational efficiency of the Weighted Average Algorithm). In this paper we provide a complete characterization of the exp-concavity of any proper composite loss. Using this characterization and the mixability condition of proper losses, we show that it is possible to transform (re-parameterize) any β-mixable binary proper loss into a β-exp-concave composite loss with the same β. In the multiclass case, we propose an approximation approach for this transformation.
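To make the O(1) claim concrete, here is a minimal sketch of the Weighted Average Algorithm under log loss (which is 1-exp-concave). For an η-exp-concave loss, this scheme's regret against the best of N experts is at most (ln N)/η, independent of the horizon T. The function name and interface are illustrative, not from the paper.

```python
import math

def weighted_average_forecaster(expert_preds, outcomes, eta=1.0):
    """Weighted Average Algorithm for an eta-exp-concave loss (log loss here).

    expert_preds: one sequence of probabilities in (0, 1) per expert.
    outcomes:     binary outcomes in {0, 1}.
    Returns (learner_loss, best_expert_loss); their difference is the
    regret, at most ln(num_experts) / eta for eta-exp-concave losses.
    """
    n = len(expert_preds)
    log_w = [0.0] * n            # log-weights; uniform prior over experts
    learner_loss = 0.0
    expert_loss = [0.0] * n
    for t, y in enumerate(outcomes):
        m = max(log_w)           # normalize in log-space for stability
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        # Predict with the weight-averaged expert prediction.
        p = sum(wi * expert_preds[i][t] for i, wi in enumerate(w)) / total
        learner_loss += -math.log(p if y == 1 else 1.0 - p)
        # Charge each expert its loss and downweight it exponentially.
        for i in range(n):
            q = expert_preds[i][t]
            li = -math.log(q if y == 1 else 1.0 - q)
            expert_loss[i] += li
            log_w[i] -= eta * li
    return learner_loss, min(expert_loss)
```

With η = 1 and log loss this is exactly the Bayes mixture forecaster with a uniform prior, so the cumulative regret never exceeds ln N no matter how long the outcome sequence is.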