Active Learning for Regression with Aggregated Outputs
In some real-world applications, privacy protection or the difficulty of data
collection prevents us from observing individual outputs for each instance;
instead, we can observe only aggregated outputs that are summed over multiple
instances in a set. To reduce the labeling cost of training regression models on
such aggregated data, we propose an active learning method that sequentially
selects sets to be labeled to improve the predictive performance with fewer
labeled sets. As the selection criterion, the proposed method uses mutual
information, which quantifies how much observing the aggregated output reduces
the uncertainty of the model parameters. With Bayesian linear basis
functions for modeling outputs given an input, which include approximated
Gaussian processes and neural networks, we can efficiently calculate the mutual
information in closed form. In experiments on various datasets, we
demonstrate that the proposed method achieves better predictive performance
with fewer labeled sets than existing methods.
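
To make the closed-form selection criterion concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a Gaussian posterior N(m, S) over the weights of a linear basis function model and independent Gaussian noise with variance noise_var on each individual output, so that the sum over a set of n instances has noise variance n * noise_var. The abstract does not state the exact noise model or formulas, and all function names here are hypothetical. Under these assumptions, the mutual information between the weights and one aggregated output is 0.5 * log(1 + phi^T S phi / (n * noise_var)), where phi is the sum of the basis vectors in the set.

```python
import numpy as np


def mutual_information(S, Phi_set, noise_var):
    """Mutual information between the weights and one aggregated output.

    S         : (d, d) current posterior covariance of the weights
    Phi_set   : (n, d) basis-function vectors of the n instances in the set
    noise_var : variance of the (assumed) per-instance Gaussian noise
    """
    phi = Phi_set.sum(axis=0)                 # features of the summed output
    agg_noise = Phi_set.shape[0] * noise_var  # assumed noise of the sum
    return 0.5 * np.log1p(phi @ S @ phi / agg_noise)


def update_posterior(m, S, Phi_set, y_agg, noise_var):
    """Rank-one Gaussian posterior update after observing the sum y_agg."""
    phi = Phi_set.sum(axis=0)
    agg_noise = Phi_set.shape[0] * noise_var
    S_phi = S @ phi
    denom = agg_noise + phi @ S_phi
    m_new = m + S_phi * (y_agg - phi @ m) / denom
    S_new = S - np.outer(S_phi, S_phi) / denom  # Sherman-Morrison identity
    return m_new, S_new


def select_set(S, candidate_sets, noise_var):
    """Index of the candidate set with the largest mutual information."""
    scores = [mutual_information(S, P, noise_var) for P in candidate_sets]
    return int(np.argmax(scores))
```

In an active learning loop under these assumptions, one would call select_set on the unlabeled sets, query the aggregated output of the chosen set, and pass it to update_posterior before the next round. The score equals half the reduction in the log-determinant of the posterior covariance, which is why it can be evaluated without observing the output itself.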