689 research outputs found
On Consistent Surrogate Risk Minimization and Property Elicitation
Abstract: Surrogate risk minimization is a popular framework for supervised learning; property elicitation is a widely studied area in probability forecasting, machine learning, statistics and economics. In this paper, we connect these two themes by showing that calibrated surrogate losses in supervised learning can essentially be viewed as eliciting or estimating certain properties of the underlying conditional label distribution that are sufficient to construct an optimal classifier under the target loss of interest. Our study helps to shed light on the design of convex calibrated surrogates. We also give a new framework for designing convex calibrated surrogates under low-noise conditions by eliciting properties that allow one to construct 'coarse' estimates of the underlying distribution.
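Read as a pair of conditions, the abstract's claim is that a calibrated surrogate implicitly elicits a property of the conditional label distribution whose value suffices to act optimally under the target loss. The rendering below is a schematic sketch in our own notation, not the paper's: $p$ is a conditional label distribution over labels $Y$, $\psi$ is the surrogate, $\ell$ the target loss, and $\operatorname{pred}$ a link map we introduce for illustration.

```latex
% Schematic notation (ours, not the paper's); requires amsmath.
\[
  \Gamma(p) \;=\; \operatorname*{arg\,min}_{u \in \mathbb{R}^d}
      \mathbb{E}_{Y \sim p}\bigl[\psi(u, Y)\bigr]
  \qquad \text{($\psi$ elicits the property $\Gamma$),}
\]
\[
  \operatorname{pred}\bigl(\Gamma(p)\bigr) \;\in\;
      \operatorname*{arg\,min}_{t} \mathbb{E}_{Y \sim p}\bigl[\ell(t, Y)\bigr]
  \qquad \text{(the link $\operatorname{pred}$ makes $\psi$ calibrated for $\ell$).}
\]
```

Under this reading, the elicited value $\Gamma(p)$ is exactly the "sufficient" summary of $p$ from which a target-optimal classification can be reconstructed.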
Trading off Consistency and Dimensionality of Convex Surrogates for the Mode
In multiclass classification over $n$ outcomes, the outcomes must be embedded
into the reals with dimension at least $n-1$ in order to design a consistent
surrogate loss that leads to the "correct" classification, regardless of the
data distribution. For large $n$, such as in information retrieval and
structured prediction tasks, optimizing a surrogate in $n-1$ dimensions is
often intractable. We investigate ways to trade off surrogate loss dimension,
the number of problem instances, and restricting the region of consistency in
the simplex for multiclass classification. Following past work, we examine an
intuitive embedding procedure that maps outcomes into the vertices of convex
polytopes in a low-dimensional surrogate space. We show that full-dimensional
subsets of the simplex exist around each point-mass distribution for which
consistency holds, but also that, with fewer than $n-1$ dimensions, there exist
distributions for which a phenomenon called hallucination occurs: the
optimal report under the surrogate loss is an outcome with zero
probability. Looking towards application, we derive a result to check whether
consistency holds under a given polytope embedding and low-noise assumption,
providing insight into when to use a particular embedding. We provide examples
of embedding $2^d$ outcomes into the $d$-dimensional unit cube and $d!$ outcomes into the $d$-dimensional permutahedron under low-noise
assumptions. Finally, we demonstrate that with multiple problem instances, we
can learn the mode with fewer than $n-1$ dimensions over the whole simplex.
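To make the unit-cube embedding and the hallucination phenomenon concrete, here is a minimal sketch, not code from the paper: it assumes squared loss as the surrogate and nearest-vertex decoding, both chosen purely for illustration. Identifying $n = 2^d$ outcomes with the vertices of $[0,1]^d$, the Bayes-optimal squared-loss report is the mean embedding, and for a suitable distribution it decodes to a zero-probability outcome.

```python
import itertools
import numpy as np

# Hypothetical sketch (not the paper's code): n = 2**d outcomes are
# identified with the vertices of the d-dimensional unit cube [0,1]^d.
d = 3
vertices = np.array(list(itertools.product([0, 1], repeat=d)))  # (2**d, d)

def decode(report):
    """Map a surrogate report in [0,1]^d to the outcome at the nearest vertex."""
    return int(np.argmin(np.linalg.norm(vertices - report, axis=1)))

# Toy surrogate: squared loss ||u - phi(y)||^2, whose Bayes-optimal report is
# the mean embedding E[phi(Y)]. Put mass 1/3 on each of (0,1,1), (1,0,1),
# (1,1,0); every coordinate's marginal is 2/3, so the mean rounds to (1,1,1),
# an outcome of probability zero.
p = np.zeros(2 ** d)
p[[3, 5, 6]] = 1 / 3   # indices of (0,1,1), (1,0,1), (1,1,0) above
report = p @ vertices  # optimal squared-loss report = (2/3, 2/3, 2/3)
y_hat = decode(report)  # -> 7, the vertex (1,1,1)
print(y_hat, p[y_hat])  # prints "7 0.0": the decoded outcome has zero
                        # probability, i.e. the hallucination phenomenon.
```

Here $d = \log_2 n$, well below the $n-1$ dimensions needed for consistency over the whole simplex, which is exactly the regime in which the abstract says hallucination can occur.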
…