Calibrated Surrogate Losses for Classification with Label-Dependent Costs
We present surrogate regret bounds for arbitrary surrogate losses in the
context of binary classification with label-dependent costs. Such bounds relate
a classifier's risk, assessed with respect to a surrogate loss, to its
cost-sensitive classification risk. Two approaches to surrogate regret bounds
are developed. The first is a direct generalization of Bartlett et al. [2006],
who focus on margin-based losses and cost-insensitive classification, while the
second adopts the framework of Steinwart [2007] based on calibration functions.
Nontrivial surrogate regret bounds are shown to exist precisely when the
surrogate loss satisfies a "calibration" condition that is easily verified for
many common losses. We apply this theory to the class of uneven margin losses,
and characterize when these losses are properly calibrated. The uneven hinge,
squared error, exponential, and sigmoid losses are then treated in detail.
Comment: 33 pages, 7 figures
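A small numerical illustration of the calibration idea, under assumptions of our own: the sketch uses a label-weighted margin loss that weights positive examples by 1 - alpha and negative examples by alpha (not necessarily the paper's uneven margin family). With this weighting, the cost-sensitive Bayes rule predicts +1 exactly when eta = P(Y=1|x) exceeds alpha. The loss is treated as calibrated here if the minimizer of the conditional surrogate risk always has the sign of eta - alpha. The helper names (conditional_risk, minimizer, looks_calibrated) are hypothetical, and a grid search is only a sanity check, not a proof.

```python
import math

# Margin losses phi(t), where t is the real-valued classifier output.
LOSSES = {
    "hinge": lambda t: max(0.0, 1.0 - t),
    "squared": lambda t: (1.0 - t) ** 2,
    "exponential": lambda t: math.exp(-t),
}


def conditional_risk(phi, eta, alpha, t):
    """Conditional surrogate risk of output t when P(Y=1|x) = eta.

    Assumed weighting: positives by (1 - alpha), negatives by alpha, so the
    cost-sensitive Bayes rule predicts +1 exactly when eta > alpha.
    """
    return eta * (1 - alpha) * phi(t) + (1 - eta) * alpha * phi(-t)


def minimizer(phi, eta, alpha, lo=-3.0, hi=3.0, steps=6000):
    """Approximate argmin over t of the conditional risk by grid search."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda t: conditional_risk(phi, eta, alpha, t))


def looks_calibrated(phi, alpha, etas):
    """True if, for every eta tested, the minimizer has the sign of eta - alpha."""
    return all((minimizer(phi, eta, alpha) > 0) == (eta > alpha) for eta in etas)


alpha = 0.3
# Skip eta values very close to alpha, where the minimizer is near 0.
etas = [0.05 * k for k in range(1, 20) if abs(0.05 * k - alpha) > 0.02]
for name, phi in LOSSES.items():
    print(name, looks_calibrated(phi, alpha, etas))  # True for all three
```

For the exponential loss the minimizer has the closed form t* = (1/2) log(eta(1 - alpha) / ((1 - eta) alpha)), which is positive exactly when eta > alpha; the grid search lets the same check run on losses without a closed form.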
Class Proportion Estimation with Application to Multiclass Anomaly Rejection
This work addresses two classification problems that fall under the heading
of domain adaptation, wherein the distributions of training and testing
examples differ. The first problem studied is that of class proportion
estimation, which is the problem of estimating the class proportions in an
unlabeled testing data set given labeled examples of each class. Compared to
previous work on this problem, our approach has the novel feature that it does
not require labeled training data from one of the classes. This property allows
us to address the second domain adaptation problem, namely, multiclass anomaly
rejection. Here, the goal is to design a classifier that has the option of
assigning a "reject" label, indicating that the instance did not arise from a
class present in the training data. We establish consistent learning strategies
for both of these domain adaptation problems, which to our knowledge are the
first of their kind. We also implement the class proportion estimation
technique and demonstrate its performance on several benchmark data sets.
Comment: Accepted to AISTATS 2014. 15 pages, 2 figures
Information Constraints and Financial Aid Policy
One justification for public support of higher education is that prospective students, particularly those from underprivileged groups, lack complete information about the costs and benefits of a college degree. Beyond financial considerations, students may also lack information about what they need to do academically to prepare for and successfully complete college. Yet until recently, college aid programs have typically paid little attention to students' information constraints, and the complexity of some programs can exacerbate the problem. This chapter describes the information problems facing prospective students as well as their consequences, drawing upon economic theory and empirical evidence.
Pell Grants as Performance-Based Scholarships? An Examination of Satisfactory Academic Progress Requirements in the Nation's Largest Need-Based Aid Program
The Federal Pell Grant Program is the nation’s largest need-based grant program. While students’ initial eligibility for the Pell is based on financial need, renewal is contingent on meeting minimum academic standards similar to those in models of performance-based scholarships, including a grade point average (GPA) requirement and a required ratio of credits completed to credits attempted. In this study, we describe federal satisfactory academic progress (SAP) requirements and illustrate the policy’s implementation in a statewide community college system. Using state administrative data, we demonstrate that a substantial portion of Pell recipients are at risk for Pell ineligibility due to their failure to meet SAP GPA or credit completion requirements. We then leverage the GPA component of the policy to explore the impacts of failure to meet standards on early college persistence and achievement, earning a credential, and transferring to a four-year college using two methodological approaches: regression discontinuity (RD) and difference-in-differences (DD). Our results across the two approaches are mixed, with the RD providing null estimates and the DD indicating some statistically significant impacts, including a negative effect on early college persistence. We conclude by discussing the implications for future research.
Educational Leadership and Policy
Query Learning with Exponential Query Costs
In query learning, the goal is to identify an unknown object while minimizing
the number of "yes" or "no" questions (queries) posed about that object. A
well-studied algorithm for query learning is known as generalized binary search
(GBS). We show that GBS is a greedy algorithm to optimize the expected number
of queries needed to identify the unknown object. We also generalize GBS in two
ways. First, we consider the case where the cost of querying grows
exponentially in the number of queries and the goal is to minimize the expected
exponential cost. Then, we consider the case where the objects are partitioned
into groups, and the objective is to identify only the group to which the
object belongs. We derive algorithms to address these issues in a common,
information-theoretic framework. In particular, we present an exact formula for
the objective function in each case involving Shannon or Renyi entropy, and
develop a greedy algorithm for minimizing it. Our algorithms are demonstrated
on two applications of query learning: active learning and emergency response.
Comment: 15 pages
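Standard GBS (not the exponential-cost or group-identification variants introduced in the paper) can be sketched in a few lines: among the queries that still split the surviving candidates, ask the one whose yes/no split is most balanced under the prior, then discard every object inconsistent with the answer. The function name gbs and the objects, prior, and 0/1 answer table below are hypothetical toy inputs.

```python
def gbs(prior, answers, target):
    """Generalized binary search over a finite set of objects.

    prior:   dict mapping object -> prior probability
    answers: dict mapping object -> tuple of 0/1 answers, one per query
    target:  the unknown object (used only to answer the queries asked)
    Returns (set of surviving objects, list of query indices asked).
    """
    alive = set(prior)
    asked = []
    n_queries = len(next(iter(answers.values())))
    while len(alive) > 1:
        mass = sum(prior[o] for o in alive)

        def imbalance(q):
            # |P(yes) - P(no)| among survivors; 0 means a perfectly even split.
            yes = sum(prior[o] for o in alive if answers[o][q] == 1)
            return abs(2 * yes - mass)

        # Only queries that separate at least two survivors are informative.
        candidates = [q for q in range(n_queries)
                      if 0 < sum(answers[o][q] for o in alive) < len(alive)]
        if not candidates:
            break  # the remaining objects cannot be told apart
        q = min(candidates, key=imbalance)  # greedy choice
        asked.append(q)
        reply = answers[target][q]
        alive = {o for o in alive if answers[o][q] == reply}
    return alive, asked


prior = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
# answers[o][q] is object o's answer (1 = yes) to query q
answers = {"A": (1, 1, 0), "B": (0, 1, 0), "C": (0, 0, 1), "D": (0, 0, 0)}
print(gbs(prior, answers, "C"))  # ({'C'}, [0, 1, 2])
```

In this toy example A, B, C, and D need 1, 2, 3, and 3 queries, so the expected number of queries is 0.5*1 + 0.25*2 + 0.125*3 + 0.125*3 = 1.75, which equals the Shannon entropy of this dyadic prior in bits.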