On Classification-Calibration of Gamma-Phi Losses
Gamma-Phi losses constitute a family of multiclass classification loss
functions that generalize the logistic and other common losses, and have found
application in the boosting literature. We establish the first general
sufficient condition for the classification-calibration of such losses. In
addition, we show that a previously proposed sufficient condition is in fact
not sufficient.
Comment: 21 pages
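The abstract does not spell out the loss family, so the following is only a sketch under an assumed definition: a Gamma-Phi loss of the form gamma(sum over j != y of phi(v_y - v_j)), where v is the score vector and y the true class. Under that assumption, choosing gamma(x) = log(1 + x) and phi(t) = exp(-t) should recover the multinomial logistic loss. The function names `gamma_phi_loss` and `cross_entropy` are illustrative, not taken from the paper.

```python
import math

def gamma_phi_loss(scores, y, gamma, phi):
    """Assumed Gamma-Phi form: gamma(sum_{j != y} phi(scores[y] - scores[j]))."""
    margin_terms = (phi(scores[y] - s) for j, s in enumerate(scores) if j != y)
    return gamma(sum(margin_terms))

def cross_entropy(scores, y):
    """Multinomial logistic (softmax cross-entropy) loss, used as a reference."""
    log_partition = math.log(sum(math.exp(s) for s in scores))
    return log_partition - scores[y]

# Illustrative scores for a three-class problem; class 0 is the true label.
scores = [2.0, -1.0, 0.5]
logistic_as_gamma_phi = gamma_phi_loss(
    scores, 0, gamma=math.log1p, phi=lambda t: math.exp(-t)
)
reference = cross_entropy(scores, 0)
```

With these choices the two values agree up to floating-point error, which is the sense in which the family generalizes the logistic loss.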
NIPS - Not Even Wrong? A Systematic Review of Empirically Complete Demonstrations of Algorithmic Effectiveness in the Machine Learning and Artificial Intelligence Literature
Objective: To determine the completeness of argumentative steps necessary to
conclude effectiveness of an algorithm in a sample of current ML/AI supervised
learning literature.
Data Sources: Papers published in the Neural Information Processing Systems
(NeurIPS, née NIPS) journal where the official record showed a 2017 year of
publication.
Eligibility Criteria: Studies reporting a (semi-)supervised model, or
pre-processing fused with (semi-)supervised models for tabular data.
Study Appraisal: Three reviewers applied the assessment criteria to determine
argumentative completeness. The criteria were split into three groups,
including: experiments (e.g. real and/or synthetic data), baselines (e.g.
uninformed and/or state-of-art) and quantitative comparison (e.g. performance
quantifiers with confidence intervals and formal comparison of the algorithm
against baselines).
Results: Of the 121 eligible manuscripts (from the sample of 679 abstracts),
99% used real-world data and 29% used synthetic data. 91% of manuscripts did
not report an uninformed baseline and 55% reported a state-of-art baseline.
32% reported confidence intervals for performance but none provided references
or exposition for how these were calculated. 3% reported formal comparisons.
Limitations: The use of one journal as the primary information source may not
be representative of all ML/AI literature. However, the NeurIPS conference is
recognised to be amongst the top tier concerning ML/AI studies, so it is
reasonable to consider its corpus to be representative of high-quality
research.
Conclusion: Using the 2017 sample of the NeurIPS supervised learning corpus
as an indicator for the quality and trustworthiness of current ML/AI research,
it appears that complete argumentative chains in demonstrations of algorithmic
effectiveness are rare.
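To make two of the appraisal criteria concrete, here is a minimal sketch of an uninformed baseline and a documented confidence interval for accuracy. The choice of a majority-class baseline and of the Wilson score interval is an assumption made for illustration; the review does not prescribe either, and the function names are hypothetical.

```python
import math
from collections import Counter

def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width

def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(1 for y in test_labels if y == majority) / len(test_labels)

# Hypothetical numbers: a model that gets 80 of 100 test points right.
low, high = wilson_interval(correct=80, n=100)
baseline = majority_baseline_accuracy([0, 0, 1], [0, 1, 0, 0])
```

Reporting the interval together with the baseline accuracy lets a reader judge whether the model's interval sits clearly above what an uninformed predictor achieves.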
New Directions in Online Learning: Boosting, Partial Information, and Non-Stationarity
Online learning, where a learning algorithm fits a model on-the-fly with streaming data, has become an important research area in machine learning. Batch learning, where the entire data set has to be available to the learning algorithm, is not always a suitable paradigm for the big data era. It is increasingly common in many practical situations, such as online ads prediction or control of self-driving cars, that data instances naturally arrive in a sequential manner. In these situations, researchers want to update their model in an online fashion. This dissertation pursues several topics at the frontier of online learning research.
In Chapter 2 and Chapter 3, the journey starts with online boosting. Online boosting studies how to combine multiple online weak learners to get a stronger learner. Chapter 2 considers online multi-class classification problems. Chapter 3 focuses on the more challenging multi-label ranking problem where there are multiple correct labels and the learner outputs a ranking of labels based on their relevance. In both chapters, an optimal algorithm and an adaptive algorithm are proposed. The optimal algorithms require a minimal number of weak learners to attain the desired accuracy. The adaptive algorithms are practically more useful since they do not require a priori knowledge about the strength of weak learners and are more computationally efficient. The adaptive algorithms are not statistically optimal but they still come with reasonable performance guarantees. The empirical results on real data sets support the theoretical findings and the proposed boosting algorithms outperformed existing competitors on benchmark data sets.
Chapter 4 considers the partial information setting, where the learner does not receive the true labels. Partial feedback is common in practice as obtaining complete feedback can be costly.
The chapter revisits the boosting algorithms that are presented in Chapter 2 and Chapter 3 and extends them to work with partial information feedback. Despite the learner receiving much less information, comparable performance guarantees can be made.
Later in Chapter 5 and Chapter 6, we move on to another interesting area in online learning called restless bandit problems. Unlike the classical (stochastic) multi-armed bandit problems where the reward distributions are unknown but stationary, in restless bandit problems the distributions can change over time. This extra layer of complexity allows us to study more complicated models, but the analysis becomes even more difficult. In restless bandit problems, it is assumed that each arm has a state that evolves according to an unknown Markov process, and the reward distribution depends on the arm's current state. This setting can be thought of as a sub-class of reinforcement learning and the partial observability inherent in this problem makes the analysis very challenging. The well known Thompson Sampling algorithm is analyzed and a Bayesian regret bound for it is derived. Chapter 5 considers the episodic case where the system periodically resets. Chapter 6 extends the analysis to the more challenging non-episodic (i.e., infinite time horizon) case. In both settings, Thompson Sampling algorithms (with slight modifications) enjoy sub-linear regret bounds, and the empirical results on simulated data support this fact. The experiments also suggest the possibility that the algorithm can be used in the frequentist setting even though the theoretical bounds are only shown for the Bayesian regret.
PHD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/155110/1/yhjung_1.pd
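As background for the Thompson Sampling discussion, the following is a minimal sketch of the classical algorithm for stationary Bernoulli arms with Beta(1, 1) priors. It is not the modified restless-bandit variant analyzed in the dissertation: there are no arm states or Markov dynamics here, and the function name, arm means, and horizon are illustrative assumptions.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Classical Beta-Bernoulli Thompson Sampling; returns pull counts per arm."""
    rng = random.Random(seed)
    num_arms = len(true_means)
    successes = [0] * num_arms
    failures = [0] * num_arms
    pulls = [0] * num_arms
    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta posterior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(num_arms)
        ]
        # Act greedily with respect to the sampled means.
        arm = max(range(num_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Simulated two-armed problem with hypothetical success probabilities.
pulls = thompson_sampling([0.2, 0.8], horizon=2000)
```

In a run like this the better arm is expected to receive most of the pulls, which is the behaviour that sub-linear regret describes in the stationary setting.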
A Boosting Approach to Reinforcement Learning
Reducing reinforcement learning to supervised learning is a well-studied and
effective approach that leverages the benefits of compact function
approximation to deal with large-scale Markov decision processes.
Independently, the boosting methodology (e.g. AdaBoost) has proven to be
indispensable in designing efficient and accurate classification algorithms by
combining inaccurate rules-of-thumb.
In this paper, we take a further step: we reduce reinforcement learning to a
sequence of weak learning problems. Since weak learners perform only marginally
better than random guesses, such subroutines constitute a weaker assumption
than the availability of an accurate supervised learning oracle. We prove that
the sample complexity and running time bounds of the proposed method do not
explicitly depend on the number of states.
While existing results on boosting operate on convex losses, the value
function over policies is non-convex. We show how to use a non-convex variant
of the Frank-Wolfe method for boosting, that additionally improves upon the
known sample complexity and running time even for reductions to supervised
learning.
Comment: Now in sync with camera ready for NeurIPS 202
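For readers unfamiliar with Frank-Wolfe, here is a minimal sketch of the classical convex version over the probability simplex. It is not the non-convex variant the paper develops. The loose analogy to boosting, stated here as an interpretation rather than as the paper's construction, is that each call to the linear minimization step returns a single "vertex" and the iterate stays a convex combination of the vertices chosen so far. The function name and the quadratic objective are illustrative.

```python
def frank_wolfe_simplex(grad, dim, iterations=2000):
    """Classical Frank-Wolfe on the probability simplex with step size 2 / (t + 2)."""
    x = [1.0 / dim] * dim
    for t in range(iterations):
        g = grad(x)
        # Linear minimization over the simplex picks the vertex with the
        # smallest gradient coordinate.
        vertex = min(range(dim), key=g.__getitem__)
        step = 2.0 / (t + 2)
        # Move toward that vertex; the iterate remains a convex combination.
        x = [(1 - step) * xi for xi in x]
        x[vertex] += step
    return x

# Illustrative objective: squared distance to a point inside the simplex.
target = [0.2, 0.3, 0.5]
gradient = lambda x: [2 * (xi - ti) for xi, ti in zip(x, target)]
solution = frank_wolfe_simplex(gradient, dim=3)
```

No projection step is needed, since every update is a convex combination of feasible points.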
Boosted Off-Policy Learning
We investigate boosted ensemble models for off-policy learning from logged
bandit feedback. Toward this goal, we propose a new boosting algorithm that
directly optimizes an estimate of the policy's expected reward. We analyze this
algorithm and prove that the empirical risk decreases (possibly exponentially
fast) with each round of boosting, provided a "weak" learning condition is
satisfied. We further show how the base learner reduces to standard supervised
learning problems. Experiments indicate that our algorithm can outperform deep
off-policy learning and methods that simply regress on the observed rewards,
thereby demonstrating the benefits of both boosting and choosing the right
learning objective.
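The abstract does not say which estimate of expected reward is optimized. As one common possibility, assumed here purely for illustration, the sketch below computes the inverse propensity scoring (IPS) estimate of a target policy's value from logged bandit feedback. The log format, the function name `ips_value`, and the toy policies are all hypothetical.

```python
def ips_value(logs, target_policy):
    """IPS estimate of a policy's expected reward.

    logs: iterable of (context, action, reward, logging_propensity) tuples,
          where logging_propensity is the probability with which the logging
          policy chose the recorded action.
    target_policy: function (context, action) -> probability of that action.
    """
    logs = list(logs)
    total = 0.0
    for context, action, reward, propensity in logs:
        # Reweight each logged reward by how much more (or less) likely the
        # target policy is to take the logged action.
        total += reward * target_policy(context, action) / propensity
    return total / len(logs)

# Toy log: a uniform logging policy over two actions; action 0 always pays 1.
logs = [("x", 0, 1.0, 0.5), ("x", 1, 0.0, 0.5),
        ("x", 0, 1.0, 0.5), ("x", 1, 0.0, 0.5)]
always_zero = lambda context, action: 1.0 if action == 0 else 0.0
always_one = lambda context, action: 1.0 if action == 1 else 0.0
value_zero = ips_value(logs, always_zero)
value_one = ips_value(logs, always_one)
```

An objective of this kind depends on the policy only through the probabilities it assigns to logged actions, which is what makes it possible to optimize a new policy without collecting fresh data.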