Optimal Transport in the Face of Noisy Data
Optimal transport distances are popular and theoretically well understood in
the context of data-driven prediction. A flurry of recent work has popularized
these distances for data-driven decision-making as well, although their merits
in this context are far less well understood. This is in contrast to the more
classical entropic distances, which are known to enjoy optimal statistical
properties. This raises the question of when, if ever, optimal transport
distances enjoy similar statistical guarantees. Optimal transport methods are
shown here to enjoy optimal statistical guarantees for decision problems faced
with noisy data.
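As a minimal, self-contained illustration of the optimal transport distances referred to above (not the estimator analyzed in the paper), the sketch below computes the 1-Wasserstein distance between two equal-size one-dimensional empirical samples, where the distance reduces to the mean absolute difference of sorted observations; the sample sizes, noise level, and helper name are illustrative.

```python
# Minimal sketch: 1-Wasserstein (optimal transport) distance between two
# equal-size 1-D empirical samples.  Illustrative only; not the paper's method.
import numpy as np

def wasserstein_1d(x, y):
    """For equal-size 1-D samples, W1 equals the mean absolute difference
    between the sorted observations."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    assert x.shape == y.shape, "sketch assumes equal sample sizes"
    return float(np.mean(np.abs(x - y)))

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=500)           # noiseless observations
noisy = clean + rng.normal(0.0, 0.3, size=500)   # the same data corrupted by noise
print(wasserstein_1d(clean, noisy))
```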
Optimal Learning for Structured Bandits
We study structured multi-armed bandits, which is the problem of online
decision-making under uncertainty in the presence of structural information. In
this problem, the decision-maker needs to discover the best course of action
despite observing only uncertain rewards over time. The decision-maker is aware
of certain structural information regarding the reward distributions and would
like to minimize its regret by exploiting this information, where regret
measures its performance gap relative to a benchmark policy that knows the best
action ahead of time. In the absence of structural information, the classical
upper confidence bound (UCB) and Thompson sampling algorithms are well known to
suffer only minimal regret. As recently pointed out, however, neither algorithm
is capable of exploiting structural information that is commonly available in
practice. We propose a novel learning algorithm, DUSA, whose worst-case regret
matches the information-theoretic regret lower bound up to a constant factor
and which can handle a wide range of structural information. Our
algorithm DUSA solves a dual counterpart of the regret lower bound at the
empirical reward distribution and follows its suggested play. Our proposed
algorithm is the first computationally viable learning policy for structured
bandit problems that achieves asymptotically minimal regret.
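For reference, the sketch below implements the classical UCB1 baseline mentioned in the abstract on an unstructured Bernoulli bandit; it does not reproduce DUSA itself, whose dual formulation of the regret lower bound is not detailed here. The arm means and horizon are illustrative.

```python
# Minimal UCB1 sketch for an unstructured Bernoulli bandit -- the classical
# baseline mentioned in the abstract, not the DUSA algorithm.
import numpy as np

def ucb1(true_means, horizon, seed=0):
    """Run UCB1 and return the cumulative (pseudo-)regret."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    counts = np.zeros(k)
    means = np.zeros(k)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:                                    # play every arm once first
            arm = t - 1
        else:
            bonus = np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(means + bonus))
        reward = float(rng.random() < true_means[arm])   # Bernoulli reward
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        regret += max(true_means) - true_means[arm]
    return regret

print(ucb1([0.3, 0.5, 0.7], horizon=10_000))
```

The printed cumulative pseudo-regret grows roughly logarithmically in the horizon, which is the minimal-regret behavior the abstract attributes to UCB in the unstructured case.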
A General Framework for Optimal Data-Driven Optimization
We propose a statistically optimal approach to construct data-driven
decisions for stochastic optimization problems. Fundamentally, a data-driven
decision is simply a function that maps the available training data to a
feasible action. It can always be expressed as the minimizer of a surrogate
optimization model constructed from the data. The quality of a data-driven
decision is measured by its out-of-sample risk. An additional quality measure
is its out-of-sample disappointment, which we define as the probability that
the out-of-sample risk exceeds the optimal value of the surrogate optimization
model. An ideal data-driven decision should minimize the out-of-sample risk
simultaneously with respect to every conceivable probability measure as the
true measure is unknown. Unfortunately, such ideal data-driven decisions are
generally unavailable. This prompts us to seek data-driven decisions that
minimize the out-of-sample risk subject to an upper bound on the out-of-sample
disappointment. We prove that such Pareto-dominant data-driven decisions exist
under conditions that allow for interesting applications: the unknown
data-generating probability measure must belong to a parametric ambiguity set,
and the corresponding parameters must admit a sufficient statistic that
satisfies a large deviation principle. We can further prove that the surrogate
optimization model must be a distributionally robust optimization problem
constructed from the sufficient statistic and the rate function of its large
deviation principle. Hence the optimal method for mapping data to decisions is
to solve a distributionally robust optimization model. Perhaps surprisingly, this
result holds even when the training data is non-i.i.d. Our analysis reveals how
the structural properties of the data-generating stochastic process impact the
shape of the ambiguity set underlying the optimal distributionally robust
model.

Comment: 52 pages
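As a hedged, much-simplified illustration of a distributionally robust surrogate, the sketch below evaluates the worst-case expected loss of a fixed decision over a Kullback-Leibler ball around the empirical distribution via its one-dimensional convex dual; for i.i.d. data the relative entropy is the relevant large-deviation rate function, whereas the paper's ambiguity sets are built more generally from a sufficient statistic and its rate function. The radius, loss sample, and helper name are illustrative.

```python
# Hedged sketch: worst-case expected loss over a KL ball around the empirical
# distribution, computed through the standard 1-D convex dual.
import numpy as np
from scipy.optimize import minimize_scalar

def kl_worst_case_mean(losses, radius):
    """sup { E_Q[loss] : KL(Q || P_hat) <= radius } via its scalar dual."""
    losses = np.asarray(losses, dtype=float)

    def dual(alpha):
        # alpha * radius + alpha * log E_P[exp(loss / alpha)], evaluated stably
        shifted = losses / alpha
        log_mgf = np.log(np.mean(np.exp(shifted - shifted.max()))) + shifted.max()
        return alpha * radius + alpha * log_mgf

    res = minimize_scalar(dual, bounds=(1e-6, 1e3), method="bounded")
    return res.fun

losses = np.random.default_rng(1).exponential(1.0, size=200)  # sampled losses of one decision
print(np.mean(losses), kl_worst_case_mean(losses, radius=0.05))
```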
Exterior-point Optimization for Nonconvex Learning
In this paper we present the nonconvex exterior-point optimization solver
(NExOS) -- a novel first-order algorithm tailored to constrained nonconvex
learning problems. We consider the problem of minimizing a convex function over
nonconvex constraints, where the projection onto the constraint set is
single-valued around local minima. A wide range of nonconvex learning problems
have this structure, including (but not limited to) sparse and low-rank
optimization problems. By exploiting the underlying geometry of the constraint
set, NExOS finds a locally optimal point by solving a sequence of penalized
problems with strictly decreasing penalty parameters. NExOS solves each
penalized problem by applying a first-order algorithm, which converges linearly
to a local minimum of the corresponding penalized formulation under regularity
conditions. Furthermore, the local minima of the penalized problems converge to
a local minimum of the original problem as the penalty parameter goes to zero.
We implement NExOS in the open-source Julia package NExOS.jl, which has been
extensively tested on many instances from a wide variety of learning problems.
We demonstrate that our algorithm, in spite of being general purpose,
outperforms specialized methods on several examples of well-known nonconvex
learning problems involving sparse and low-rank optimization. For sparse
regression problems, NExOS finds locally optimal solutions that dominate glmnet
in terms of support recovery while achieving a training loss that is an order of
magnitude smaller. For low-rank optimization with real-world data, NExOS
recovers solutions with a threefold reduction in training loss and a proportion
of explained variance that is twice as high as that of the nuclear norm
heuristic.

Comment: 40 pages, 6 figures
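The sketch below illustrates only the penalty-continuation idea, not the actual NExOS iteration or its Julia implementation: a convex least-squares loss plus a quadratic exterior penalty on the distance to the nonconvex set of k-sparse vectors is minimized with gradient steps, and the penalty parameter is decreased between outer passes. Projection onto the sparsity set is hard-thresholding, which is single-valued whenever the k-th and (k+1)-th largest magnitudes differ; all problem dimensions and schedules are illustrative.

```python
# Hedged sketch of penalty continuation for constrained nonconvex learning;
# this is NOT the actual NExOS algorithm, only the high-level idea.
import numpy as np

def project_sparse(x, k):
    """Projection onto the set of k-sparse vectors (hard-thresholding)."""
    z = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]
    z[keep] = x[keep]
    return z

def penalty_continuation(A, b, k, mus=(1.0, 0.1, 0.01, 1e-3), inner=500):
    x = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2                  # smoothness of 0.5*||Ax - b||^2
    for mu in mus:                                 # strictly decreasing penalties
        lr = 1.0 / (L + 1.0 / mu)                  # safe step for the penalized objective
        for _ in range(inner):
            grad_f = A.T @ (A @ x - b)
            grad_pen = (x - project_sparse(x, k)) / mu   # gradient of dist^2 / (2*mu)
            x = x - lr * (grad_f + grad_pen)
    return project_sparse(x, k)                    # land exactly on the constraint set

rng = np.random.default_rng(2)
A = rng.normal(size=(100, 50))
x_true = np.zeros(50)
x_true[:5] = np.array([1.5, -2.0, 1.0, -1.2, 2.5])
b = A @ x_true + 0.01 * rng.normal(size=100)
print(np.nonzero(penalty_continuation(A, b, k=5))[0])   # recovered support
```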
Energy-optimal Timetable Design for Sustainable Metro Railway Networks
We present our collaboration with Thales Canada Inc., the largest provider of
communication-based train control (CBTC) systems worldwide. We study the
problem of designing energy-optimal timetables in metro railway networks to
minimize the effective energy consumption of the network, which corresponds to
simultaneously minimizing total energy consumed by all the trains and
maximizing the transfer of regenerative braking energy from suitable braking
trains to accelerating trains. We propose a novel data-driven linear
programming model that minimizes the total effective energy consumption in a
metro railway network, capable of computing the optimal timetable in real time,
even for some of the largest CBTC systems in the world. In contrast with
existing approaches, which rely on formulations that are either NP-hard or
require multiple stages of extensive simulation, our model is a single linear
programming model capable of
computing the energy-optimal timetable subject to the constraints present in
the railway network. Furthermore, our model can predict the total energy
consumption of the network without requiring time-consuming simulations, making
it suitable for widespread use in managerial settings. We apply our model to
Shanghai Railway Network's Metro Line 8 -- one of the largest and busiest
railway services in the world -- and empirically demonstrate that our model
computes energy-optimal timetables for thousands of active trains spanning an
entire service period of one day in real time (solution time less than one
second on a standard desktop), achieving energy savings between approximately
20.93% and 28.68%. Given these compelling advantages, our model is in the
process of being integrated into Thales Canada Inc.'s industrial timetable
compiler.

Comment: 28 pages, 8 figures, 2 tables
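A drastically simplified, hedged illustration of the synchronization idea (not the paper's model): the toy linear program below, solved with scipy's linprog, picks departure times for two trains so that the first train's regenerative-braking window overlaps the second train's acceleration window as much as possible, subject to a minimum headway. All durations and variable names are invented for the example.

```python
# Toy LP sketch: maximize the overlap between one train's braking window and
# another train's acceleration window by shifting departure times.
import numpy as np
from scipy.optimize import linprog

BRAKE_START, BRAKE_LEN, ACCEL_LEN, HEADWAY = 100.0, 30.0, 30.0, 90.0  # seconds

# Variables: x = [t1, t2, overlap].  Maximize overlap -> minimize -overlap.
c = np.array([0.0, 0.0, -1.0])
A_ub = np.array([
    [-1.0,  1.0, 1.0],   # overlap <= (t1 + BRAKE_START + BRAKE_LEN) - t2
    [ 1.0, -1.0, 1.0],   # overlap <= (t2 + ACCEL_LEN) - (t1 + BRAKE_START)
    [ 1.0, -1.0, 0.0],   # headway: t2 - t1 >= HEADWAY
])
b_ub = np.array([BRAKE_START + BRAKE_LEN, ACCEL_LEN - BRAKE_START, -HEADWAY])
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, 200), (0, 200), (0, None)])
t1, t2, overlap = res.x
print(f"t1={t1:.0f}s, t2={t2:.0f}s, braking/acceleration overlap={overlap:.0f}s")
```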
Generalized Gauss Inequalities via Semidefinite Programming
A sharp upper bound on the probability of a random vector falling outside a
polytope, based solely on the first and second moments of its distribution, can
be computed efficiently using semidefinite programming. However, this
Chebyshev-type bound tends to be overly conservative since it is determined by a
discrete worst-case distribution. In this paper we obtain a less pessimistic
Gauss-type bound by imposing the additional requirement that the random vector's
distribution must be unimodal. We prove that this generalized Gauss bound still
admits an exact and tractable semidefinite representation. Moreover, we
demonstrate that both the Chebyshev and Gauss bounds can be obtained within a
unified framework using a generalized notion of unimodality. We also offer new
perspectives on the computational solution of generalized moment problems, since
we use concepts from Choquet theory instead of traditional duality arguments to
derive semidefinite representations for worst-case probability bounds.
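As a rough, hedged illustration of the semidefinite-programming machinery (Chebyshev-type bound only; the unimodality requirement behind the Gauss-type bound is not imposed), the sketch below bounds the probability that a random vector with given mean and covariance leaves a polytope by searching for a quadratic function that is nonnegative everywhere and at least one on each violated half-space. It assumes cvxpy with an SDP-capable solver is available; the moments and polytope are illustrative.

```python
# Hedged sketch: a Chebyshev-type SDP bound on P(x outside {x : A x <= b})
# given only the mean and covariance of x.
import cvxpy as cp
import numpy as np

def chebyshev_outside_polytope(mu, Sigma, A, b):
    n = len(mu)
    Omega = np.block([[Sigma + np.outer(mu, mu), mu[:, None]],
                      [mu[None, :], np.ones((1, 1))]])   # second-moment matrix of [x; 1]
    M = cp.Variable((n + 1, n + 1), symmetric=True)      # quadratic q(x) = [x;1]' M [x;1]
    tau = cp.Variable(len(b), nonneg=True)
    E11 = np.zeros((n + 1, n + 1)); E11[n, n] = 1.0
    cons = [M >> 0]                                      # q(x) >= 0 everywhere
    for i in range(len(b)):
        Hi = np.zeros((n + 1, n + 1))
        Hi[:n, n] = A[i] / 2.0; Hi[n, :n] = A[i] / 2.0; Hi[n, n] = -b[i]
        cons.append(M - E11 - tau[i] * Hi >> 0)          # q >= 1 where a_i'x >= b_i
    prob = cp.Problem(cp.Minimize(cp.trace(Omega @ M)), cons)
    prob.solve()
    return min(prob.value, 1.0)                          # E[q(x)] upper-bounds the probability

mu = np.zeros(2)
Sigma = np.eye(2)
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])  # box |x_i| <= 3
b = np.array([3.0, 3.0, 3.0, 3.0])
print(chebyshev_outside_polytope(mu, Sigma, A, b))
```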
Branch-and-Bound Performance Estimation Programming: A Unified Methodology for Constructing Optimal Optimization Methods
We present the Branch-and-Bound Performance Estimation Programming (BnB-PEP),
a unified methodology for constructing optimal first-order methods for convex
and nonconvex optimization. BnB-PEP poses the problem of finding the optimal
optimization method as a nonconvex but practically tractable quadratically
constrained quadratic optimization problem and solves it to certifiable global
optimality using a customized branch-and-bound algorithm. By directly
confronting the nonconvexity, BnB-PEP offers significantly more flexibility and
removes the many limitations of the prior methodologies. Our customized
branch-and-bound algorithm, through exploiting specific problem structures,
outperforms the latest off-the-shelf implementations by orders of magnitude,
reducing solution times from hours to seconds and from weeks to minutes. We
apply BnB-PEP to several setups for which the prior methodologies do not apply
and obtain methods with bounds that improve upon prior state-of-the-art
results. Finally, we use the BnB-PEP methodology to find proofs with potential
function structures, thereby systematically generating analytical convergence
proofs.

Comment: 65 pages, 7 figures, 17 tables
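To give a flavor of only the branch-and-bound ingredient (the actual BnB-PEP problem is a structured nonconvex QCQP and is not reproduced here), the toy sketch below certifies the global minimum of a one-dimensional nonconvex function over an interval, using a Lipschitz-constant lower bound to prune boxes; the function, Lipschitz constant, and tolerance are illustrative.

```python
# Toy branch-and-bound: certify the global minimum of a 1-D nonconvex function
# to a set tolerance, pruning boxes with a Lipschitz lower bound.
import heapq

def f(x):
    return (x ** 2 - 1.0) ** 2 + 0.3 * x      # nonconvex: two local minima

LIP = 25.0                                     # valid Lipschitz constant of f on [-2, 2]

def branch_and_bound(lo=-2.0, hi=2.0, tol=1e-4):
    best_x, best_val = lo, f(lo)
    # each heap entry is (lower_bound_over_box, box_lo, box_hi)
    heap = [(f((lo + hi) / 2) - LIP * (hi - lo) / 2, lo, hi)]
    while heap:
        lb, a, b = heapq.heappop(heap)
        if lb > best_val - tol:                # box cannot improve the incumbent: prune
            continue
        mid = (a + b) / 2
        if f(mid) < best_val:                  # update incumbent
            best_x, best_val = mid, f(mid)
        for c, d in ((a, mid), (mid, b)):      # branch into two half-boxes
            child_lb = f((c + d) / 2) - LIP * (d - c) / 2
            if child_lb < best_val - tol:
                heapq.heappush(heap, (child_lb, c, d))
    return best_x, best_val                    # within tol of the certified global minimum

print(branch_and_bound())
```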