Robust Super-Level Set Estimation using Gaussian Processes
This paper focuses on the problem of determining as large a region as
possible where a function exceeds a given threshold with high probability. We
assume that we only have access to a noise-corrupted version of the function
and that function evaluations are costly. To select the next query point, we
propose maximizing the expected volume of the domain identified as above the
threshold, as predicted by a Gaussian process and robustified by a variance
term. We also give asymptotic guarantees on the exploration effect of the
algorithm, regardless of prior misspecification. Various numerical examples
show that our approach also outperforms existing techniques from the
literature in practice.
Comment: Accepted to ECML 201
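The central object in this abstract is the region where the function exceeds the threshold with high probability. The Python below is a small sketch of that idea under stated assumptions, not the paper's algorithm. It fits an exact GP regression (squared-exponential kernel with an assumed length scale of 0.3 and assumed noise of 0.1) to toy samples of sin(2πx). It then keeps the grid points whose posterior probability of exceeding the threshold is at least 0.95, and sums the exceedance probabilities into a rough expected-volume estimate. All function names and parameter values are hypothetical. The sketch does not implement the paper's query-selection rule or its variance-based robustification.

```python
import math

def rbf(a, b, length=0.3, var=1.0):
    # Squared-exponential kernel; length scale and variance are assumed values.
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    # Lower-triangular L with A = L L^T (A must be symmetric positive definite).
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    # Forward substitution for L y = b.
    y = []
    for i in range(len(L)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    # Back substitution for L^T x = y.
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / L[i][i]
    return x

def gp_posterior(X, y, Xs, noise=0.1):
    # Exact GP regression assuming Gaussian observation noise (std = noise).
    K = [[rbf(a, b) + (noise ** 2 if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    L = cholesky(K)
    weights = solve_upper_t(L, solve_lower(L, y))  # (K + noise^2 I)^{-1} y
    means, stds = [], []
    for s in Xs:
        k = [rbf(s, a) for a in X]
        means.append(sum(ki * wi for ki, wi in zip(k, weights)))
        v = solve_lower(L, k)
        var = rbf(s, s) - sum(vi * vi for vi in v)
        stds.append(math.sqrt(max(var, 1e-12)))
    return means, stds

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def super_level_set(Xs, means, stds, threshold, confidence=0.95):
    # Posterior probability that f(x) > threshold at each candidate point.
    probs = [normal_cdf((m - threshold) / s) for m, s in zip(means, stds)]
    # Keep only points that exceed the threshold with high probability.
    chosen = [x for x, p in zip(Xs, probs) if p >= confidence]
    return chosen, probs

# Toy observations of f(x) = sin(2*pi*x) on [0, 1] (illustrative only).
X = [0.0, 0.25, 0.5, 0.75, 1.0]
y = [math.sin(2 * math.pi * x) for x in X]
grid = [i / 20 for i in range(21)]
mu, sd = gp_posterior(X, y, grid)
chosen, probs = super_level_set(grid, mu, sd, threshold=0.5)
expected_volume = sum(probs) * (1 / 20)  # rough Riemann sum of P(f > t)
print(chosen, round(expected_volume, 3))
```

Raising `confidence` shrinks the identified region. This is the trade-off between the size of the set and the probability that it truly lies above the threshold, which the abstract describes.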
Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling
Obtaining large annotated datasets is critical for training successful
machine learning models, and it is often a bottleneck in practice. Weak
supervision offers a promising alternative for producing labeled datasets
without ground truth annotations by generating probabilistic labels using
multiple noisy heuristics. This process can scale to large datasets and has
demonstrated state-of-the-art performance in diverse domains such as healthcare
and e-commerce. One practical issue with learning from user-generated
heuristics is that their creation requires creativity, foresight, and domain
expertise from those who hand-craft them, a process which can be tedious and
subjective. We develop the first framework for interactive weak supervision in
which a method proposes heuristics and learns from user feedback given on each
proposed heuristic. Our experiments demonstrate that only a small number of
feedback iterations are needed to train models that achieve highly competitive
test set performance without access to ground truth training labels. We conduct
user studies, which show that users are able to effectively provide feedback on
heuristics and that test set results track the performance of simulated
oracles.
Comment: Accepted as a conference paper at ICLR 202
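Weak supervision, as described in this abstract, turns several noisy heuristics into probabilistic labels. The Python below is a minimal sketch of that step. It assumes binary labels and keyword heuristics that return 1, 0, or None (abstain), and it aggregates them with a smoothed vote count. The heuristics, example texts, and `soft_majority` function are all hypothetical. The vote is a simple stand-in for whatever label model the paper actually uses. The interactive loop of proposing heuristics and collecting user feedback is not shown.

```python
from typing import Callable, List, Optional

# A heuristic (labeling function) returns 1, 0, or None to abstain.
Heuristic = Callable[[str], Optional[int]]

def lf_refund(text: str) -> Optional[int]:
    return 1 if "refund" in text.lower() else None

def lf_broken(text: str) -> Optional[int]:
    return 1 if "broken" in text.lower() else None

def lf_thanks(text: str) -> Optional[int]:
    return 0 if "thank" in text.lower() else None

def soft_majority(texts: List[str], lfs: List[Heuristic],
                  prior: float = 0.5, strength: float = 1.0) -> List[float]:
    """Probabilistic label P(y = 1) per example from smoothed vote counts."""
    probs = []
    for t in texts:
        votes = [lf(t) for lf in lfs]
        pos = sum(1 for v in votes if v == 1)
        neg = sum(1 for v in votes if v == 0)
        # Beta-style smoothing: examples with no votes fall back to the prior.
        probs.append((pos + prior * strength) / (pos + neg + strength))
    return probs

texts = ["Item arrived broken, I want a refund",
         "Thank you, works great",
         "Where is my order?"]
probs = soft_majority(texts, [lf_refund, lf_broken, lf_thanks])
print(probs)
```

Every heuristic abstains on the third text, so its label falls back to the uninformative prior of 0.5. Low coverage of this kind is one reason the quality of the heuristic set matters.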
Sequential Bayesian Risk Set Inference for Robust Discrete Optimization via Simulation
Optimization via simulation (OvS) procedures that assume the simulation
inputs are generated from the real-world distributions are subject to the risk
of selecting a suboptimal solution when the distributions are substituted with
input models estimated from finite real-world data -- known as input model
risk. Focusing on discrete OvS, this paper proposes a new Bayesian framework
for analyzing the input model risk of implementing an arbitrary solution,
where uncertainty about the input models is captured by a posterior
distribution. We define the risk set of a solution at a given probability
level as the set of solutions whose expected performance is better than that
of the given solution by a practically meaningful margin, given common input
models, with significant probability (exceeding the user-specified level)
under the posterior distribution. The two user-specified parameters, the
margin and the probability level, control the robustness of the procedure to
the desired level and guard against unnecessary conservatism. An empty risk
set implies that, with significant probability, no solution is practically
better than the given one, even though the real-world input distributions are
unknown. For efficient estimation of the risk set, the conditional mean
performance of a solution given a set of input distributions is modeled as a
Gaussian process (GP) that takes the solution-distributions pair as an input.
In particular, our GP model allows both parametric and nonparametric input
models. We propose a sequential risk set inference procedure that estimates
the risk set and selects the next solution-distributions pair to simulate using
the posterior GP at each iteration. We show that simulating the pair expected
to change the risk set estimate the most in the next iteration is the
asymptotic one-step optimal sampling rule that minimizes the number of
incorrectly classified solutions, if the procedure runs without stopping.
Comment: Under review since September 201
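The risk set definition in this abstract can be made concrete with a Monte Carlo sketch. The sketch makes several assumptions:

- larger performance is better;
- a single scalar input-model parameter has a stand-in normal posterior;
- a hypothetical closed-form `performance(x, lam)` replaces both simulation and the paper's GP model;
- the strict inequalities for the margin and the probability level are chosen for illustration.

Each posterior sample acts as the common input model under which a candidate and the reference solution are compared. The sequential sampling rule itself is not shown.

```python
import random

def performance(x: int, lam: float) -> float:
    # Hypothetical conditional mean performance of solution x given an input
    # model with parameter lam; larger is better in this sketch.
    return -(x - lam) ** 2

def risk_set(x0, solutions, posterior_samples, margin, level):
    """Solutions better than x0 by more than `margin` with posterior
    probability greater than `level`, under common input models."""
    result = []
    for x in solutions:
        if x == x0:
            continue
        # Compare x and x0 under the same sampled input model each time.
        wins = sum(1 for lam in posterior_samples
                   if performance(x, lam) - performance(x0, lam) > margin)
        if wins / len(posterior_samples) > level:
            result.append(x)
    return result

rng = random.Random(0)
# Stand-in posterior over the input-model parameter (assumed normal).
samples = [rng.gauss(3.0, 0.3) for _ in range(2000)]
solutions = [1, 2, 3, 4, 5]
print(risk_set(1, solutions, samples, margin=0.5, level=0.2))
print(risk_set(3, solutions, samples, margin=0.5, level=0.2))
```

In this toy setting, solution 3 is best near the posterior mean of the parameter, so its estimated risk set is empty. Solution 1 is beaten by every other solution with enough posterior probability. Raising `margin` or `level` makes the risk set smaller, which matches the abstract's point that these parameters guard against unnecessary conservatism.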