
    On Consistent Surrogate Risk Minimization and Property Elicitation

    Abstract: Surrogate risk minimization is a popular framework for supervised learning; property elicitation is a widely studied area in probability forecasting, machine learning, statistics, and economics. In this paper, we connect these two themes by showing that calibrated surrogate losses in supervised learning can essentially be viewed as eliciting or estimating certain properties of the underlying conditional label distribution that are sufficient to construct an optimal classifier under the target loss of interest. Our study helps to shed light on the design of convex calibrated surrogates. We also give a new framework for designing convex calibrated surrogates under low-noise conditions by eliciting properties that allow one to construct 'coarse' estimates of the underlying distribution.
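    As a minimal worked example of this connection (a standard binary-classification fact, not the paper's general construction), take 0-1 loss as the target and the logistic loss as the surrogate, with labels Y in {-1, +1}. The population minimizer of the logistic surrogate is, in LaTeX notation,

        f^*(x) = \arg\min_{t \in \mathbb{R}} \, \mathbb{E}\left[ \log\left(1 + e^{-Yt}\right) \mid X = x \right] = \log \frac{\eta(x)}{1 - \eta(x)}, \qquad \eta(x) = \Pr(Y = +1 \mid X = x),

    an invertible function of the conditional label probability \eta(x). Since the Bayes-optimal classifier for 0-1 loss predicts +1 exactly when \eta(x) > 1/2, thresholding the surrogate minimizer at zero, i.e. \operatorname{sign}(f^*(x)), recovers it: the calibrated surrogate elicits a property (here \eta itself) that suffices to construct the optimal classifier for the target loss.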

    Trading off Consistency and Dimensionality of Convex Surrogates for the Mode

    In multiclass classification over n outcomes, the outcomes must be embedded into the reals with dimension at least n-1 in order to design a consistent surrogate loss that leads to the "correct" classification, regardless of the data distribution. For large n, such as in information retrieval and structured prediction tasks, optimizing a surrogate in n-1 dimensions is often intractable. We investigate ways to trade off surrogate loss dimension, the number of problem instances, and restricting the region of consistency in the simplex for multiclass classification. Following past work, we examine an intuitive embedding procedure that maps outcomes into the vertices of convex polytopes in a low-dimensional surrogate space. We show that full-dimensional subsets of the simplex exist around each point mass distribution for which consistency holds, but also that, with fewer than n-1 dimensions, there exist distributions for which a phenomenon called hallucination occurs, in which the optimal report under the surrogate loss is an outcome with zero probability. Looking towards application, we derive a result to check whether consistency holds under a given polytope embedding and low-noise assumption, providing insight into when to use a particular embedding. We provide examples of embedding n = 2^d outcomes into the d-dimensional unit cube and n = d! outcomes into the d-dimensional permutahedron under low-noise assumptions. Finally, we demonstrate that with multiple problem instances, we can learn the mode with n/2 dimensions over the whole simplex.
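    To make the embedding idea concrete, here is a small Python sketch (an illustration of the kind of polytope embedding described above, assuming a nearest-vertex decoding rule; it is not the paper's exact surrogate construction): n = 2^d outcomes are identified with the vertices of the d-dimensional unit cube, a surrogate report lives in [0, 1]^d, and a prediction is read off by rounding the report to the nearest vertex.

        # Sketch: embed n = 2^d outcomes as vertices of the d-dimensional unit cube
        # and decode a surrogate report by rounding to the nearest vertex.
        import itertools
        import numpy as np

        d = 3                                                            # surrogate dimension
        vertices = np.array(list(itertools.product([0, 1], repeat=d)))  # 2^d outcomes, one per vertex

        def embed(outcome_index: int) -> np.ndarray:
            """Map an outcome index in {0, ..., 2^d - 1} to its cube vertex."""
            return vertices[outcome_index]

        def decode(report: np.ndarray) -> int:
            """Map a surrogate report in [0, 1]^d to the outcome at the nearest vertex."""
            rounded = (report >= 0.5).astype(int)                        # coordinate-wise rounding
            return int(np.where((vertices == rounded).all(axis=1))[0][0])

        report = np.array([0.9, 0.2, 0.7])            # e.g. a learned surrogate prediction
        print(decode(report), embed(decode(report)))  # -> 5 [1 0 1]

    With fewer than n-1 surrogate dimensions, such a report can round to a vertex whose outcome has zero probability under some distributions, which is the hallucination phenomenon described above and the reason the region of consistency must be restricted, e.g. via low-noise assumptions.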