Active Sampling for Class Probability Estimation and Ranking
In many cost-sensitive environments class probability estimates are used by decision
makers to evaluate the expected utility from a set of alternatives. Supervised
learning can be used to build class probability estimates; however, it is often very
costly to obtain training data with class labels. Active sampling acquires data incrementally,
at each phase identifying especially useful additional data for labeling,
and can be used to economize on examples needed for learning. We outline the
critical features for an active sampling approach and present an active sampling
method for estimating class probabilities and ranking. BOOTSTRAP-LV identifies particularly
informative new data for learning based on the variance in probability estimates,
and by accounting for a particular data item's informative value for the
rest of the input space. We show empirically that the method reduces the number
of data items that must be obtained and labeled, across a wide variety of domains.
We investigate the contribution of the components of the algorithm and show that
each provides valuable information to help identify informative examples. We also
compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active sampling
method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV uses fewer examples to reach a given level of class probability estimation accuracy,
and they provide insights into the behavior of the algorithms. Finally, to further our
understanding of the contributions made by the elements of BOOTSTRAP-LV, we experiment
with a new active sampling algorithm drawing from both UNCERTAINTY
SAMPLING and BOOTSTRAP-LV and show that it is significantly more competitive
with BOOTSTRAP-LV than UNCERTAINTY SAMPLING is. The analysis suggests
more general implications for improving existing active sampling algorithms for
classification.
Information Systems Working Papers Series
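The local-variance idea behind BOOTSTRAP-LV can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the k-nearest-neighbour probability estimator, the function names, and the parameter defaults are all assumptions made for the sketch.

```python
import numpy as np

def knn_proba(X_train, y_train, X_query, k=5):
    """Estimate class-1 probability as the fraction of positives
    among the k nearest training neighbours (illustrative estimator)."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return y_train[idx].mean(axis=1)

def bootstrap_lv_scores(X_labeled, y_labeled, X_pool, n_models=10, seed=0):
    """Score unlabeled pool examples by the variance of their class
    probability estimates across an ensemble trained on bootstrap
    resamples of the labeled data ("local variance"): high variance
    marks examples whose label would be especially informative."""
    rng = np.random.default_rng(seed)
    n = len(X_labeled)
    probs = []
    for _ in range(n_models):
        b = rng.integers(0, n, size=n)  # bootstrap resample indices
        probs.append(knn_proba(X_labeled[b], y_labeled[b], X_pool))
    return np.var(probs, axis=0)
```

An active-sampling loop would then label the pool examples with the highest scores first; the full algorithm also weights each example's score by its estimated value for the rest of the input space, which this sketch omits.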