Multi-View Active Learning in the Non-Realizable Case
The sample complexity of active learning under the realizability assumption
has been well-studied. The realizability assumption, however, rarely holds in
practice. In this paper, we theoretically characterize the sample complexity of
active learning in the non-realizable case under the multi-view setting. We prove
that, with unbounded Tsybakov noise, the sample complexity of multi-view active
learning can be $\widetilde{O}(\log\frac{1}{\epsilon})$, in contrast to the
single-view setting, where a polynomial improvement is the best possible
achievement. We also prove that in the general multi-view setting the sample
complexity of active learning with unbounded Tsybakov noise is
$\widetilde{O}(\frac{1}{\epsilon})$, where the order of $1/\epsilon$ is
independent of the parameter in Tsybakov noise, in contrast to previous
polynomial bounds, where the order of $1/\epsilon$ is related to the parameter
in Tsybakov noise. Comment: 22 pages, 1 figure
Batch Policy Learning under Constraints
When learning policies for real-world domains, two important questions arise:
(i) how to efficiently use pre-collected off-policy, non-optimal behavior data;
and (ii) how to mediate among different competing objectives and constraints.
We thus study the problem of batch policy learning under multiple constraints,
and offer a systematic solution. We first propose a flexible meta-algorithm
that admits any batch reinforcement learning and online learning procedure as
subroutines. We then present a specific algorithmic instantiation and provide
performance guarantees for the main objective and all constraints. To certify
constraint satisfaction, we propose a new and simple method for off-policy
policy evaluation (OPE) and derive PAC-style bounds. Our algorithm achieves
strong empirical results in different domains, including in a challenging
problem of simulated car driving subject to multiple constraints such as lane
keeping and smooth driving. We also show experimentally that our OPE method
outperforms other popular OPE techniques on a standalone basis, especially in a
high-dimensional setting.
Thermodynamic graph-rewriting
We develop a new thermodynamic approach to stochastic graph-rewriting. The
ingredients are a finite set of reversible graph-rewriting rules called
generating rules, a finite set of connected graphs P called energy patterns and
an energy cost function. The idea is that the generators define the qualitative
dynamics, by showing which transformations are possible, while the energy
patterns and cost function specify the long-term probability $\pi$ of any
reachable graph. Given the generators and energy patterns, we construct a
finite set of rules which (i) has the same qualitative transition system as the
generators; and (ii) when equipped with suitable rates, defines a
continuous-time Markov chain of which $\pi$ is the unique fixed point. The
construction relies on the use of site graphs and a technique of `growth
policy' for quantitative rule refinement which is of independent interest. This
division of labour between the qualitative and long-term quantitative aspects
of the dynamics leads to intuitive and concise descriptions for realistic
models (see the examples in §4 and §5). It also guarantees thermodynamical
consistency (AKA detailed balance), otherwise known to be undecidable, which is
important for some applications. Finally, it leads to parsimonious
parameterizations of models, again an important point in some applications.
Predictor-Rejector Multi-Class Abstention: Theoretical Analysis and Algorithms
We study the key framework of learning with abstention in the multi-class
classification setting. In this setting, the learner can choose to abstain from
making a prediction with some pre-defined cost. We present a series of new
theoretical and algorithmic results for this learning problem in the
predictor-rejector framework. We introduce several new families of surrogate
losses for which we prove strong non-asymptotic and hypothesis set-specific
consistency guarantees, thereby resolving positively two existing open
questions. These guarantees provide upper bounds on the estimation error of the
abstention loss function in terms of that of the surrogate loss. We analyze
both a single-stage setting where the predictor and rejector are learned
simultaneously and a two-stage setting crucial in applications, where the
predictor is learned in a first stage using a standard surrogate loss such as
cross-entropy. These guarantees suggest new multi-class abstention algorithms
based on minimizing these surrogate losses. We also report the results of
extensive experiments comparing these algorithms to the current
state-of-the-art algorithms on CIFAR-10, CIFAR-100 and SVHN datasets. Our
results demonstrate empirically the benefit of our new surrogate losses and
show the remarkable performance of our broadly applicable two-stage abstention
algorithm.
Fairness Behind a Veil of Ignorance: A Welfare Analysis for Automated Decision Making
We draw attention to an important, yet largely overlooked aspect of
evaluating fairness for automated decision making systems---namely risk and
welfare considerations. Our proposed family of measures corresponds to the
long-established formulations of cardinal social welfare in economics, and is
justified by the Rawlsian conception of fairness behind a veil of ignorance.
The convex formulation of our welfare-based measures of fairness allows us to
integrate them as a constraint into any convex loss minimization pipeline. Our
empirical analysis reveals interesting trade-offs between our proposal and (a)
prediction accuracy, (b) group discrimination, and (c) Dwork et al.'s notion of
individual fairness. Furthermore and perhaps most importantly, our work
provides both heuristic justification and empirical evidence suggesting that a
lower bound on our measures often leads to bounded inequality in algorithmic
outcomes; hence presenting the first computationally feasible mechanism for
bounding individual-level inequality. Comment: Conference: Thirty-second Conference on Neural Information Processing
Systems (NIPS 2018)
Explicit Learning Curves for Transduction and Application to Clustering and Compression Algorithms
Inductive learning is based on inferring a general rule from a finite data
set and using it to label new data. In transduction one attempts to solve the
problem of using a labeled training set to label a set of unlabeled points,
which are given to the learner prior to learning. Although transduction seems
at the outset to be an easier task than induction, there have not been many
provably useful algorithms for transduction. Moreover, the precise relation
between induction and transduction has not yet been determined. The main
theoretical developments related to transduction were presented by Vapnik more
than twenty years ago. One of Vapnik's basic results is a rather tight error
bound for transductive classification based on an exact computation of the
hypergeometric tail. While tight, this bound is given implicitly via a
computational routine. Our first contribution is a somewhat looser but explicit
characterization of a slightly extended PAC-Bayesian version of Vapnik's
transductive bound. This characterization is obtained using concentration
inequalities for the tail of sums of random variables obtained by sampling
without replacement. We then derive error bounds for compression schemes such
as (transductive) support vector machines and for transduction algorithms based
on clustering. The main observation used for deriving these new error bounds
and algorithms is that the unlabeled test points, which in the transductive
setting are known in advance, can be used in order to construct useful
data-dependent prior distributions over the hypothesis space.