Regret Bounds and Experimental Design for Estimate-then-Optimize
In practical applications, data is used to make decisions in two steps:
estimation and optimization. First, a machine learning model estimates
parameters for a structural model relating decisions to outcomes. Second, a
decision is chosen to optimize the structural model's predicted outcome as if
its parameters were correctly estimated. Due to its flexibility and simple
implementation, this "estimate-then-optimize" approach is often used for
data-driven decision-making. Errors in the estimation step can lead
estimate-then-optimize to sub-optimal decisions that result in regret, i.e., a
difference in value between the decision made and the best decision available
with knowledge of the structural model's parameters. We provide a novel bound
on this regret for smooth and unconstrained optimization problems. Using this
bound, in settings where estimated parameters are linear transformations of
sub-Gaussian random vectors, we provide a general procedure for experimental
design to minimize the regret resulting from estimate-then-optimize. We
demonstrate our approach on simple examples and a pandemic control application.
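
As a rough illustration of the two-step procedure and the regret it can incur, the sketch below uses a toy problem that is not taken from the paper. It assumes a one-dimensional smooth, unconstrained objective value(x, theta) = theta*x - x^2/2, whose maximizer is x = theta, and estimates theta by a sample mean of noisy observations. The names value, estimate_then_optimize, and regret are hypothetical and chosen only for this example.

```python
import random

def value(x, theta):
    # Toy structural model (an assumption for illustration):
    # smooth, unconstrained, maximized at x = theta.
    return theta * x - 0.5 * x * x

def estimate_then_optimize(samples):
    # Step 1 (estimation): estimate theta by the sample mean.
    theta_hat = sum(samples) / len(samples)
    # Step 2 (optimization): act as if theta_hat were the true parameter.
    # For this toy objective the maximizer is x = theta_hat.
    x_hat = theta_hat
    return theta_hat, x_hat

def regret(x, theta):
    # Value of the best decision under the true theta minus the value
    # of the decision actually taken.
    return value(theta, theta) - value(x, theta)

rng = random.Random(0)
theta_true = 2.0
samples = [theta_true + rng.gauss(0.0, 1.0) for _ in range(50)]

theta_hat, x_hat = estimate_then_optimize(samples)
r = regret(x_hat, theta_true)
print(f"theta_hat={theta_hat:.4f}  regret={r:.6f}")
```

In this toy case the regret equals (theta_hat - theta_true)^2 / 2, so it is driven by the squared estimation error; the paper's bound addresses a far more general setting than this sketch.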
Optimizer's Information Criterion: Dissecting and Correcting Bias in Data-Driven Optimization
In data-driven optimization, the sample performance of the obtained decision
typically incurs an optimistic bias against the true performance, a phenomenon
commonly known as the Optimizer's Curse and intimately related to overfitting
in machine learning. Common techniques to correct this bias, such as
cross-validation, require repeatedly solving additional optimization problems
and are therefore computationally expensive. We develop a general bias
correction approach, building on what we call Optimizer's Information Criterion
(OIC), that directly approximates the first-order bias and does not require
solving any additional optimization problems. Our OIC generalizes the
celebrated Akaike Information Criterion to evaluate the objective performance
in data-driven optimization, which crucially involves not only model fitting
but also its interplay with the downstream optimization. As such it can be used
for decision selection instead of only model selection. We apply our approach
to a range of data-driven optimization formulations comprising empirical and
parametric models, their regularized counterparts, and furthermore contextual
optimization. Finally, we provide numerical validation of the superior
performance of our approach on synthetic and real-world datasets.
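
The optimistic bias described above can be seen in a small simulation. The sketch below is not the OIC correction itself; it only illustrates the Optimizer's Curse under assumed conditions: several candidate decisions with identical true expected value (zero), each evaluated by a sample mean, with the apparently best one selected. All names and parameter values (n_decisions, n_samples, n_trials) are hypothetical.

```python
import random

def run_trial(rng, n_decisions, n_samples):
    # Every decision has true expected value 0 (an assumption made so
    # that any positive in-sample value of the chosen decision is bias).
    sample_means = []
    for _ in range(n_decisions):
        draws = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        sample_means.append(sum(draws) / n_samples)
    # Data-driven optimization: pick the decision that looks best in-sample.
    best = max(range(n_decisions), key=lambda k: sample_means[k])
    in_sample_value = sample_means[best]
    true_value = 0.0  # known here only because the data are simulated
    return in_sample_value, true_value

rng = random.Random(1)
n_decisions, n_samples, n_trials = 10, 20, 500

results = [run_trial(rng, n_decisions, n_samples) for _ in range(n_trials)]
avg_in_sample = sum(v for v, _ in results) / n_trials
avg_true = sum(t for _, t in results) / n_trials

print(f"average in-sample value of chosen decision: {avg_in_sample:.3f}")
print(f"average true value of chosen decision:      {avg_true:.3f}")
```

The gap between the two averages is the optimistic bias. Cross-validation would estimate it by re-solving the selection problem on held-out splits, which is the computational cost the abstract says OIC is designed to avoid.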