Online Learning with Optimism and Delay
Inspired by the demands of real-time climate and weather forecasting, we
develop optimistic online learning algorithms that require no parameter tuning
and have optimal regret guarantees under delayed feedback. Our algorithms --
DORM, DORM+, and AdaHedgeD -- arise from a novel reduction of delayed online
learning to optimistic online learning that reveals how optimistic hints can
mitigate the regret penalty caused by delay. We pair this delay-as-optimism
perspective with a new analysis of optimistic learning that exposes its
robustness to hinting errors and a new meta-algorithm for learning effective
hinting strategies in the presence of delay. We conclude by benchmarking our
algorithms on four subseasonal climate forecasting tasks, demonstrating low
regret relative to state-of-the-art forecasting models.
Comment: ICML 2021. 9 pages of main paper and 26 pages of appendix text.
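To make the delay-as-optimism idea concrete, below is a minimal sketch of an optimistic exponential-weights (hedge) learner under a fixed feedback delay. This is an illustration only, not the paper's DORM+ or AdaHedgeD: the learning rate eta, the fixed-delay model, and the hint function hint_fn are assumptions made for the example. The paper's point is that a hint approximating the outstanding (undelivered) losses can offset the regret penalty of the delay.

```python
# A minimal sketch of optimistic hedge with delayed feedback (illustrative
# only; hint_fn and eta are assumptions, not the paper's tuning-free scheme).
import numpy as np

def optimistic_hedge_delayed(losses, delay, eta, hint_fn):
    """losses: (T, d) per-expert losses; round-t feedback arrives at t + delay."""
    T, d = losses.shape
    cum_observed = np.zeros(d)           # sum of loss vectors seen so far
    plays = np.zeros((T, d))
    for t in range(T):
        if t - delay >= 0:               # feedback for round t - delay arrives now
            cum_observed += losses[t - delay]
        # optimistic hint: a guess at the `delay` still-outstanding loss vectors
        hint = hint_fn(t, cum_observed)
        w = np.exp(-eta * (cum_observed + hint))
        plays[t] = w / w.sum()
    return plays

# With a zero hint this reduces to plain delayed hedge:
# plays = optimistic_hedge_delayed(L, delay=2, eta=0.1,
#                                  hint_fn=lambda t, c: 0.0)
```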
Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation
This paper studies the robustness of reinforcement learning algorithms in
the presence of errors. Specifically, we revisit the benchmark problem of
discrete-time linear quadratic regulation (LQR) and study the long-standing
open question: Under what conditions is the policy iteration method robustly
stable for dynamical systems with unbounded, continuous state and action
spaces? Using advanced stability results in control theory, it is shown that
policy iteration for LQR is inherently robust to small errors and enjoys local
input-to-state stability: whenever the error in each iteration is bounded and
small, the solutions of the policy iteration algorithm are also bounded, and,
moreover, enter and stay in a small neighborhood of the optimal LQR solution.
As an application, a novel off-policy optimistic least-squares policy iteration
is proposed for the LQR problem in which the system dynamics are subject to
additive stochastic disturbances. These new results in robust reinforcement
learning are validated by a numerical example.
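As a reference point, below is a sketch of the idealized (error-free) policy iteration recursion for discrete-time LQR that the paper analyzes under perturbations. It assumes A, B, Q, R are given and K0 is a stabilizing initial gain; it is not the paper's off-policy least-squares variant, where each evaluation step would carry an estimation error.

```python
# A minimal sketch of exact policy iteration (Hewer's algorithm) for
# discrete-time LQR; K0 is assumed stabilizing.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_policy_iteration(A, B, Q, R, K0, iters=50):
    K = K0
    for _ in range(iters):
        # Policy evaluation: the cost matrix P of u = -K x solves the
        # Lyapunov equation Acl' P Acl - P + (Q + K' R K) = 0.
        Acl = A - B @ K
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
        # Policy improvement: one-step greedy gain.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P
```

The paper's robustness question amounts to what happens when P (or K) above is computed only approximately at every iteration.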
A Winnow-Based Approach to Context-Sensitive Spelling Correction
A large class of machine-learning problems in natural language require the
characterization of linguistic context. Two characteristic properties of such
problems are that their feature space is of very high dimensionality, and their
target concepts refer to only a small subset of the features in the space.
Under such conditions, multiplicative weight-update algorithms such as Winnow
have been shown to have exceptionally good theoretical properties. We present
an algorithm combining variants of Winnow and weighted-majority voting, and
apply it to a problem in the aforementioned class: context-sensitive spelling
correction. This is the task of fixing spelling errors that happen to result in
valid words, such as substituting "to" for "too", "casual" for "causal", etc.
We evaluate our algorithm, WinSpell, by comparing it against BaySpell, a
statistics-based method representing the state of the art for this task. We
find: (1) When run with a full (unpruned) set of features, WinSpell achieves
accuracies significantly higher than BaySpell was able to achieve in either the
pruned or unpruned condition; (2) When compared with other systems in the
literature, WinSpell exhibits the highest performance; (3) The primary reason
that WinSpell outperforms BaySpell is that WinSpell learns a better linear
separator; (4) When run on a test set drawn from a different corpus than the
training set was drawn from, WinSpell is better able than BaySpell to adapt,
using a strategy we will present that combines supervised learning on the
training set with unsupervised learning on the (noisy) test set.
Comment: To appear in Machine Learning, Special Issue on Natural Language Learning, 1999. 25 pages.
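For reference, the core Winnow update the paper builds on takes only a few lines: multiplicative promotion and demotion of the weights of active features on mistakes. Below is a minimal sketch with standard parameter choices (the threshold and promotion factor are conventional defaults, not WinSpell's settings); WinSpell itself combines Winnow variants with weighted-majority voting, which is omitted here.

```python
# A minimal sketch of the classic Winnow update over binary features
# (illustrative defaults; not the WinSpell configuration).
import numpy as np

def winnow_train(X, y, alpha=2.0, epochs=5):
    """X: (n, d) binary feature matrix; y: labels in {0, 1}."""
    n, d = X.shape
    w = np.ones(d)
    theta = d / 2.0                          # one common threshold choice
    for _ in range(epochs):
        for x, label in zip(X, y):
            pred = 1 if w @ x >= theta else 0
            if pred == 0 and label == 1:     # promote active features
                w[x == 1] *= alpha
            elif pred == 1 and label == 0:   # demote active features
                w[x == 1] /= alpha
    return w, theta
```

The multiplicative updates are what give Winnow its mistake bounds that scale with the number of relevant features rather than the (very large) total feature count.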
Semi-supervised Contrastive Outlier removal for Pseudo Expectation Maximization (SCOPE)
Semi-supervised learning is the problem of training an accurate predictive
model by combining a small labeled dataset with a presumably much larger
unlabeled dataset. Many methods for semi-supervised deep learning have been
developed, including pseudolabeling, consistency regularization, and
contrastive learning techniques. Pseudolabeling methods, however, are highly
susceptible to confounding: erroneous pseudolabels are treated as true labels
in early iterations, causing the model to reinforce its own biases and fail
to reach strong predictive performance.
We present a new approach to suppress confounding errors through a method we
describe as Semi-supervised Contrastive Outlier removal for Pseudo Expectation
Maximization (SCOPE). Like basic pseudolabeling, SCOPE is related to
Expectation Maximization (EM), a latent variable framework which can be
extended toward understanding cluster-assumption deep semi-supervised
algorithms. However, unlike basic pseudolabeling, which fails to adequately
account for the probability of the unlabeled samples given the model, SCOPE
introduces an outlier suppression term designed to improve the behavior of the
EM iteration with a discriminative DNN backbone in the presence of outliers. Our
results show that SCOPE greatly improves semi-supervised classification
accuracy over a baseline and, furthermore, when combined with consistency
regularization, achieves the highest reported accuracy for the semi-supervised
CIFAR-10 classification task using 250 and 4000 labeled samples. Moreover, we
show that SCOPE reduces the prevalence of confounding errors during
pseudolabeling iterations by pruning erroneous high-confidence pseudolabeled
samples that would otherwise contaminate the labeled set in subsequent
retraining iterations.
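To illustrate the outlier-suppression idea, below is a minimal sketch of one pseudolabel-selection round: keep only high-confidence predictions, then drop points far from their pseudo-class centroid in feature space. The thresholds and the centroid-distance criterion are illustrative assumptions standing in for SCOPE's actual contrastive outlier term.

```python
# A minimal sketch of pseudolabel pruning with a simple outlier test
# (hypothetical thresholds; a simplification of SCOPE's contrastive criterion).
import numpy as np

def prune_pseudolabels(feats, probs, conf_thresh=0.95, dist_quantile=0.9):
    """feats: (n, k) embeddings; probs: (n, c) softmax outputs on unlabeled data."""
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    keep = conf >= conf_thresh                 # basic high-confidence filter
    # Outlier suppression: drop samples far from their pseudo-class centroid.
    for c in np.unique(labels[keep]):
        idx = np.where(keep & (labels == c))[0]
        centroid = feats[idx].mean(axis=0)
        dist = np.linalg.norm(feats[idx] - centroid, axis=1)
        keep[idx[dist > np.quantile(dist, dist_quantile)]] = False
    return labels, keep
```

Only the samples flagged by `keep` would be added to the labeled set for the next retraining iteration, which is how this kind of pruning limits the confounding feedback loop described above.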