Quantifying Epistemic Uncertainty in Deep Learning
Uncertainty quantification is at the core of the reliability and robustness
of machine learning. In this paper, we provide a theoretical framework to
dissect the uncertainty in deep learning, especially the epistemic component,
into procedural variability (arising from the training procedure) and data
variability (arising from the training data); to the best of our knowledge,
this is the first such attempt in the literature. We then propose two
approaches to estimate these uncertainties, one based on influence functions
and one on batching. We demonstrate how our approaches overcome the
computational difficulties in applying classical statistical methods.
Experimental evaluations on multiple problem settings corroborate our theory
and illustrate how our framework and estimation can provide direct guidance on
modeling and data-collection effort to improve deep learning performance.
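The procedural/data decomposition above can be illustrated on a toy model. The sketch below is not the paper's estimator: it uses a simple linear model trained by SGD, measures procedural variability as prediction variance across training seeds on fixed data, and data variability as prediction variance across bootstrap resamples with a fixed seed. All names and constants are illustrative.

```python
# Illustrative sketch (not the paper's method): splitting prediction
# variance into procedural variability (seed of the training procedure)
# and data variability (bootstrap resampling of the training set).
import numpy as np

def train(X, y, seed, epochs=50, lr=0.1):
    """Fit y ~ X @ w by SGD from a random init; the seed controls both
    the initialization and the minibatch order (the 'procedure')."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            w -= lr * (X[i] @ w - y[i]) * X[i]
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
x_test = np.array([1.0, 1.0, 1.0])

# Procedural variability: fixed data, varying training seeds.
preds_proc = [x_test @ train(X, y, seed=s) for s in range(10)]
var_proc = np.var(preds_proc)

# Data variability: fixed seed, bootstrap resamples of the data.
preds_data = []
for b in range(10):
    idx = rng.integers(0, len(X), size=len(X))
    preds_data.append(x_test @ train(X[idx], y[idx], seed=0))
var_data = np.var(preds_data)

print(var_proc, var_data)
```

The two variances estimate the two components of epistemic uncertainty at the test point; the paper's batching and influence-function estimators make this tractable for deep networks, where retraining many times as above is infeasible.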
Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits
We study pure exploration with infinitely many bandit arms generated i.i.d.
from an unknown distribution. Our goal is to efficiently select a single
high-quality arm whose average reward is, with probability at least
1 - \delta, within \epsilon of being among the top \rho-fraction of arms;
this is a natural adaptation of the classical PAC guarantee to infinite
action sets. We consider both the fixed-confidence and fixed-budget settings,
aiming respectively for minimal expected and fixed sample complexity.
For fixed confidence, we give an algorithm whose expected sample complexity
is optimal up to a logarithmic factor, and whose \delta-dependence closes a
quadratic gap in the literature. For fixed budget, we characterize the
asymptotically optimal sample complexity to leading order. Equivalently, the
optimal failure probability given exactly T samples decays exponentially in
T, up to lower-order factors inside the exponent. The constant in the
exponent depends explicitly on the problem parameters (including the unknown
arm distribution) through a certain Fisher information distance. Even the
strictly super-linear dependence of the exponent on the budget was not known
and resolves a question of Grossman and Moshkovitz (FOCS 2016; SIAM Journal
on Computing, 2020).
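The (\epsilon, \delta, \rho)-PAC objective can be made concrete with the naive baseline the paper improves on: sample enough arms from the reservoir that one lies in the top \rho-fraction with high probability, pull each enough times for accurate estimates, and return the empirical best. The sketch below is illustrative (Bernoulli rewards, a hypothetical Uniform(0, 1) reservoir), not the paper's optimal algorithm.

```python
# Naive (eps, delta)-PAC baseline for infinite-armed pure exploration
# (illustrative; NOT the paper's asymptotically optimal algorithm).
import math
import numpy as np

def naive_pac_select(pull, sample_arm, eps, delta, rho):
    # With K = ceil(log(2/delta)/rho) i.i.d. arms, at least one lies in
    # the top rho-fraction with probability >= 1 - delta/2.
    K = math.ceil(math.log(2 / delta) / rho)
    arms = [sample_arm() for _ in range(K)]
    # Pull each arm enough that all empirical means are within eps/2 of
    # their true means w.p. >= 1 - delta/2 (Hoeffding + union bound).
    n = math.ceil(2 / eps**2 * math.log(4 * K / delta))
    means = [np.mean([pull(a) for _ in range(n)]) for a in arms]
    return arms[int(np.argmax(means))]

rng = np.random.default_rng(0)
sample_arm = lambda: rng.uniform()          # arm's true mean, from the reservoir
pull = lambda mu: float(rng.random() < mu)  # Bernoulli reward

chosen = naive_pac_select(pull, sample_arm, eps=0.1, delta=0.1, rho=0.1)
print(chosen)  # true mean of the returned arm
```

This baseline's log(1/\delta) appears multiplicatively in both the number of arms and the pulls per arm, giving the quadratic \delta-dependence gap that the paper's fixed-confidence algorithm closes.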
Online Learning of Energy Consumption for Navigation of Electric Vehicles
Energy efficient navigation constitutes an important challenge in electric
vehicles, due to their limited battery capacity. We employ a Bayesian approach
to model the energy consumption at road segments for efficient navigation. In
order to learn the model parameters, we develop an online learning framework
and investigate several exploration strategies such as Thompson Sampling and
Upper Confidence Bound. We then extend our online learning framework to the
multi-agent setting, where multiple vehicles adaptively navigate and learn the
parameters of the energy model. We analyze Thompson Sampling and establish
rigorous regret bounds on its performance in the single-agent and multi-agent
settings, through an analysis of the algorithm under batched feedback. Finally,
we demonstrate the performance of our methods via experiments on several
real-world city road networks.
Comment: Extension of arXiv:2003.0141
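The Thompson Sampling idea in this abstract can be sketched in miniature. The example below is an illustrative reading, not the paper's full framework: each road segment's energy consumption is Normal with unknown mean and known noise, a conjugate Normal prior is kept per segment, and each trip the vehicle drives the route whose posterior-sampled total energy is lowest. The network (two hypothetical routes over four segments) and all constants are assumptions.

```python
# Minimal Thompson Sampling sketch for energy-efficient route choice
# (illustrative; segment costs N(mu_e, sigma^2), Normal prior on mu_e).
import numpy as np

rng = np.random.default_rng(1)
true_mu = np.array([1.0, 2.0, 1.5, 0.5])   # per-segment mean energy (unknown)
sigma = 0.5                                # known observation noise
routes = [[0, 1], [2, 3]]                  # hypothetical segment lists

post_mean = np.zeros(4)                    # Normal posterior per segment
post_prec = np.full(4, 1e-2)               # weak prior precision

for trip in range(500):
    # Sample segment means from the posterior; drive the cheapest sampled route.
    sampled = rng.normal(post_mean, 1 / np.sqrt(post_prec))
    r = min(routes, key=lambda seg: sampled[seg].sum())
    obs = rng.normal(true_mu[r], sigma)    # observed per-segment energy
    for e, x in zip(r, obs):
        # Conjugate Normal-Normal update (precision-weighted mean).
        post_prec[e] += 1 / sigma**2
        post_mean[e] = (post_mean[e] * (post_prec[e] - 1 / sigma**2)
                        + x / sigma**2) / post_prec[e]

best = min(routes, key=lambda seg: post_mean[seg].sum())
print(best)  # should settle on the lower-energy route
```

In the paper's multi-agent setting, several vehicles update a shared model with batched feedback; the regret analysis quantifies the cost of such delayed, batched posterior updates.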
Modeling Persistent Trends in Distributions
We present a nonparametric framework to model a short sequence of probability
distributions that vary both due to underlying effects of sequential
progression and confounding noise. To distinguish between these two types of
variation and estimate the sequential-progression effects, our approach
leverages an assumption that these effects follow a persistent trend. This work
is motivated by the recent rise of single-cell RNA-sequencing experiments over
a brief time course, which aim to identify genes relevant to the progression of
a particular biological process across diverse cell populations. While
classical statistical tools focus on scalar-response regression or
order-agnostic differences between distributions, it is desirable in this
setting to consider both the full distributions as well as the structure
imposed by their ordering. We introduce a new regression model for ordinal
covariates where responses are univariate distributions and the underlying
relationship reflects consistent changes in the distributions over increasing
levels of the covariate. This concept is formalized as a "trend" in
distributions, which we define as an evolution that is linear under the
Wasserstein metric. Implemented via a fast alternating projections algorithm,
our method exhibits numerous strengths in simulations and analyses of
single-cell gene expression data.
Comment: To appear in: Journal of the American Statistical Association
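For univariate distributions, evolution that is linear under the Wasserstein metric corresponds to quantile functions that move linearly in the covariate, since the 2-Wasserstein geodesic between 1-D distributions is linear in quantile space. The sketch below illustrates this reading on synthetic data by fitting an ordinary least-squares line to each empirical quantile across ordered levels; it is not the paper's algorithm (which enforces validity constraints via alternating projections).

```python
# Illustrative sketch: a "trend in distributions" as a linear evolution of
# the quantile function across ordered covariate levels (1-D Wasserstein).
import numpy as np

rng = np.random.default_rng(0)
T, q_grid = 5, np.linspace(0.05, 0.95, 19)

# Synthetic data: the distribution at level t is N(0.5 * t, 1).
samples = [rng.normal(0.5 * t, 1.0, size=300) for t in range(T)]
Q = np.array([np.quantile(s, q_grid) for s in samples])  # shape (T, 19)

# Fit Q[t, q] ~ a[q] + b[q] * t independently for each quantile level q.
design = np.vstack([np.ones(T), np.arange(T)]).T
coef, *_ = np.linalg.lstsq(design, Q, rcond=None)
a, b = coef  # intercept and slope, each a function of the quantile level

print(b.mean())  # average slope; close to the true mean shift of 0.5 per level
```

A genuine trend estimate must also keep each fitted quantile function nondecreasing so that it defines a valid distribution; the paper's alternating-projections algorithm handles that constraint, which this unconstrained OLS sketch omits.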
Generalized Probabilistic Bisection for Stochastic Root-Finding
This thesis studies the stochastic root-finding problem, which consists of estimating the point x∗ that solves the equation h(x∗) = 0, where the function h : (0,1) → R is learned via a stochastic simulator (oracle). Instead of focusing on modeling h(·), we develop statistical methodologies that directly infer x∗ following a fully Bayesian approach. To do so, we investigate procedures that generalize the Probabilistic Bisection Algorithm (PBA), first introduced in Horstein (1963). The PBA is a one-dimensional stochastic root-finding routine that builds an explicit Bayesian representation (i.e., a posterior density) for x∗ based on the history of noisy function evaluations and sampling locations. The PBA starts by assuming that x∗ is the realized value of an absolutely continuous random variable, X∗ ∼ g0, with prior density g0. It then recursively updates a posterior, gn, leveraging the information provided by the signs (positive/negative) of the noisy function evaluations, which indicate the direction in which x∗ lies relative to a given sampling location x. Due to observational noise, the oracle responses are correct only with probability p(x). Waeber et al. (2013) showed that sampling at the median of gn is an optimal sampling strategy and established exponential convergence of the posterior gn to a Dirac mass at the true x∗ under the very restrictive assumption that the probability of correct response p(x) is known and constant for all x; in most practical settings, however, this condition no longer holds and the only way to implement the PBA is to estimate p(·).
In the first part of this thesis, we state the Generalized PBA (G-PBA), in which the above assumption is relaxed to the case where the sampling distribution of the oracle is unknown and location-dependent. Namely, as in the standard PBA, we rely on a knowledge state to approximate the posterior of the root location. To implement the corresponding Bayesian updating, we also carry out inference on p(·). To this end we utilize batched querying in combination with a variety of frequentist and Bayesian estimators based on majority vote, as well as the underlying functional responses, if available. To guide sampling selection we propose two families of sampling policies: batched Information Directed Sampling and Randomized Quantile Sampling, the latter reminiscent of Thompson Sampling and a generalization of the median sampling of the classical PBA. This leads to the first main conclusion: the G-PBA is able to efficiently learn p(·) and X∗ simultaneously.
In the second part of this thesis, we propose to leverage the spatial structure of a typical oracle by constructing a nonparametric statistical surrogate for p(·) based on binomial regression. This leads to the second main conclusion: surrogate modeling allows the batch size for querying the oracle to be chosen adaptively as a function of the estimated predictive uncertainty of p(·).
In the last part of this thesis, we present extensive numerical experiments to evaluate our sampling strategies (information-based or randomized). In particular, we demonstrate the efficiency of randomized quantile sampling in balancing exploration and exploitation; moreover, we show that spatial surrogate modeling yields significant gains relative to the local estimators, as quantified by the improved quality of the resulting root estimates (namely, lower absolute residuals, narrower credible intervals, and dramatically higher coverage probability). Our work is motivated by the root-finding subroutine in the pricing of Bermudan financial derivatives, illustrated in the last section of this thesis.
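The classical PBA that the G-PBA generalizes is compact enough to sketch directly. The example below implements the restrictive known-constant-p setting on a discretized posterior: query the noisy direction of the root at the posterior median, then reweight the two sides of the density by p and 1 - p. The test function and constants are illustrative.

```python
# Classical Probabilistic Bisection Algorithm sketch (known, constant p;
# the restrictive setting that the G-PBA in this thesis relaxes).
import numpy as np

rng = np.random.default_rng(0)
x_star, p = 0.3, 0.7                        # true root; P(correct response)

def oracle(x):
    """Noisy direction: +1 claims x* > x, -1 claims x* < x."""
    truth = 1.0 if x_star > x else -1.0
    return truth if rng.random() < p else -truth

grid = np.linspace(0.0, 1.0, 2001)
post = np.full_like(grid, 1.0 / len(grid))  # discretized prior g0 = Uniform(0,1)

for _ in range(200):
    m = grid[np.searchsorted(np.cumsum(post), 0.5)]  # posterior median
    z = oracle(m)
    # Bayes update: weight p on the side the response points to, 1-p opposite.
    post *= np.where(np.sign(grid - m) == z, p, 1.0 - p)
    post /= post.sum()

estimate = grid[np.argmax(post)]
print(estimate)
```

Because sampling at the median splits the posterior mass evenly, each response multiplies the correct side by 2p after normalization, which is the source of the exponential posterior concentration established by Waeber et al. (2013); when p(·) is unknown and location-dependent, this update is no longer available as-is, which is the G-PBA's starting point.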
Steady-State Co-Kriging Models
In deterministic computer experiments, a computer code can often be run at different levels of complexity/fidelity, yielding a hierarchy of levels of code. The higher the fidelity, and hence the computational cost, the more accurate the output data. Methods based on the co-kriging methodology (Cressie, 2015) for predicting the output of a high-fidelity computer code by combining data generated at varying levels of fidelity have become popular over the last two decades. For instance, Kennedy and O'Hagan (2000) first proposed building a metamodel for multi-level computer codes using an auto-regressive model structure. Forrester et al. (2007) provide details on estimation of the model parameters and further investigate the use of co-kriging for multi-fidelity optimization based on the efficient global optimization algorithm of Jones et al. (1998). Qian and Wu (2008) propose a Bayesian hierarchical modeling approach for combining low-accuracy and high-accuracy experiments. More recently, Gratiet and Cannamela (2015) propose sequential design strategies using fast cross-validation techniques for multi-fidelity computer codes.
This research extends the co-kriging metamodeling methodology to steady-state simulation experiments. First, the mathematical structure of co-kriging is extended to take into account heterogeneous simulation output variances. Next, efficient steady-state simulation experimental designs are investigated for co-kriging to achieve high prediction accuracy in the estimation of steady-state parameters. Specifically, designs consisting of replicated longer simulation runs at a few design points and replicated shorter simulation runs at a larger set of design points are considered. A design with no replicated runs at the longer simulation level is also studied, along with different methods for calculating the output variance in the absence of replicated outputs.
The stochastic co-kriging (SCK) method is applied to an M/M/1 as well as an M/M/5 queueing system. In both examples, the prediction performance of the SCK model is promising. It is also shown that the SCK method provides better response surfaces than the stochastic kriging (SK) method.
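The deterministic auto-regressive structure of Kennedy and O'Hagan (2000) that this research builds on can be sketched in a few lines: model the high-fidelity code as y_hi(x) = rho * y_lo(x) + delta(x), interpolate the cheap code with a GP, estimate rho by regression, and fit a second GP to the residual delta. The two test codes, kernel length-scale, and design sizes below are illustrative assumptions; the thesis's stochastic extension additionally models heterogeneous output variances.

```python
# Two-fidelity co-kriging sketch in the Kennedy-O'Hagan auto-regressive
# style (deterministic, illustrative; not the thesis's SCK model).
import numpy as np

def gp_fit_predict(X, y, Xs, ell=0.2, nugget=1e-8):
    """Zero-mean GP interpolation with a squared-exponential kernel."""
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)
    K = k(X, X) + nugget * np.eye(len(X))
    return k(Xs, X) @ np.linalg.solve(K, y)

f_lo = lambda x: 0.5 * np.sin(3 * x) + 0.2   # cheap, low-fidelity code
f_hi = lambda x: np.sin(3 * x) + 0.1 * x     # expensive, high-fidelity code

X_lo = np.linspace(0, 1, 11)                 # many cheap runs
X_hi = np.linspace(0, 1, 5)                  # few expensive runs
Xs = np.linspace(0, 1, 101)                  # prediction grid

# Level 1: interpolate the low-fidelity code everywhere.
ylo_at_hi = gp_fit_predict(X_lo, f_lo(X_lo), X_hi)
ylo_at_s = gp_fit_predict(X_lo, f_lo(X_lo), Xs)

# Level 2: regress high on low (scalar rho), then GP on the residual delta.
rho = (ylo_at_hi @ f_hi(X_hi)) / (ylo_at_hi @ ylo_at_hi)
delta = f_hi(X_hi) - rho * ylo_at_hi
pred = rho * ylo_at_s + gp_fit_predict(X_hi, delta, Xs)

print(np.max(np.abs(pred - f_hi(Xs))))  # co-kriging error on the test grid
```

The stochastic setting replaces exact interpolation with kriging against noisy steady-state outputs, where the per-design-point output variance (replication-based or estimated without replication, as studied here) enters the model in place of the small numerical nugget above.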