3 research outputs found

    Robust Super-Level Set Estimation using Gaussian Processes

    This paper focuses on the problem of determining as large a region as possible where a function exceeds a given threshold with high probability. We assume that we only have access to a noise-corrupted version of the function and that function evaluations are costly. To select the next query point, we propose maximizing the expected volume of the domain identified as above the threshold as predicted by a Gaussian process, robustified by a variance term. We also give asymptotic guarantees on the exploration effect of the algorithm, regardless of the prior misspecification. We show by various numerical examples that our approach also outperforms existing techniques in the literature in practice. Comment: Accepted to ECML 201

    Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling

    Obtaining large annotated datasets is critical for training successful machine learning models and it is often a bottleneck in practice. Weak supervision offers a promising alternative for producing labeled datasets without ground truth annotations by generating probabilistic labels using multiple noisy heuristics. This process can scale to large datasets and has demonstrated state-of-the-art performance in diverse domains such as healthcare and e-commerce. One practical issue with learning from user-generated heuristics is that their creation requires creativity, foresight, and domain expertise from those who hand-craft them, a process which can be tedious and subjective. We develop the first framework for interactive weak supervision in which a method proposes heuristics and learns from user feedback given on each proposed heuristic. Our experiments demonstrate that only a small number of feedback iterations are needed to train models that achieve highly competitive test set performance without access to ground truth training labels. We conduct user studies, which show that users are able to effectively provide feedback on heuristics and that test set results track the performance of simulated oracles. Comment: Accepted as a conference paper at ICLR 202

    Sequential Bayesian Risk Set Inference for Robust Discrete Optimization via Simulation

    Optimization via simulation (OvS) procedures that assume the simulation inputs are generated from the real-world distributions are subject to the risk of selecting a suboptimal solution when the distributions are substituted with input models estimated from finite real-world data -- known as input model risk. Focusing on discrete OvS, this paper proposes a new Bayesian framework for analyzing input model risk of implementing an arbitrary solution, $x$, where uncertainty about the input models is captured by a posterior distribution. We define the $\alpha$-level risk set of solution $x$ as the set of solutions whose expected performance is better than $x$ by a practically meaningful margin ($>\delta$) given common input models with significant probability ($>\alpha$) under the posterior distribution. The user-specified parameters, $\delta$ and $\alpha$, control robustness of the procedure to the desired level as well as guard against unnecessary conservatism. An empty risk set implies that there is no practically better solution than $x$ with significant probability even though the real-world input distributions are unknown. For efficient estimation of the risk set, the conditional mean performance of a solution given a set of input distributions is modeled as a Gaussian process (GP) that takes the solution-distributions pair as an input. In particular, our GP model allows both parametric and nonparametric input models. We propose the sequential risk set inference procedure that estimates the risk set and selects the next solution-distributions pair to simulate using the posterior GP at each iteration. We show that simulating the pair expected to change the risk set estimate the most in the next iteration is the asymptotic one-step optimal sampling rule that minimizes the number of incorrectly classified solutions, if the procedure runs without stopping. Comment: Under review since September 201