Bayesian batch active learning as sparse subset approximation
Leveraging the wealth of unlabeled data produced in recent years provides
great potential for improving supervised models. When the cost of acquiring
labels is high, probabilistic active learning methods can be used to greedily
select the most informative data points to be labeled. However, for many
large-scale problems, standard greedy procedures become computationally
infeasible and suffer from negligible model change. In this paper, we introduce
a novel Bayesian batch active learning approach that mitigates these issues.
Our approach is motivated by approximating the complete data posterior of the
model parameters. While naive batch construction methods result in correlated
queries, our algorithm produces diverse batches that enable efficient active
learning at scale. We derive interpretable closed-form solutions akin to
existing active learning procedures for linear models, and generalize to
arbitrary models using random projections. We demonstrate the benefits of our
approach on several large-scale regression and classification tasks.
Comment: NeurIPS 2019
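As a rough illustration of the sparse-subset idea, the sketch below (plain NumPy) represents each pool point by a feature vector standing in for its random-projected expected log-likelihood, then grows a batch greedily so that a re-weighted combination of the chosen vectors approximates the full-pool sum. The function `select_batch`, the feature matrix `L`, and the least-squares re-weighting are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def select_batch(L, batch_size):
    """Greedily build a batch whose weighted span approximates sum(L, axis=0).

    L : (n_pool, d) array, one (e.g. random-projected) feature vector per
        candidate point, standing in for its expected log-likelihood.
    """
    target = L.sum(axis=0)          # the "complete data" quantity to approximate
    chosen, residual = [], target.copy()
    for _ in range(batch_size):
        scores = L @ residual       # alignment with what is still unexplained
        if chosen:
            scores[chosen] = -np.inf
        chosen.append(int(np.argmax(scores)))
        # re-fit weights on the chosen subset (plain least squares here)
        A = L[chosen].T                                  # (d, |chosen|)
        w, *_ = np.linalg.lstsq(A, target, rcond=None)
        residual = target - A @ w
    return chosen

# toy usage: 200 pool points, 16-dim projections, batch of 5
rng = np.random.default_rng(0)
print(select_batch(rng.normal(size=(200, 16)), 5))
```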
Understanding and Comparing Scalable Gaussian Process Regression for Big Data
As a non-parametric Bayesian model that produces informative predictive distributions, the Gaussian process (GP) has been widely used in various fields, such as regression, classification and optimization. The cubic complexity of the standard GP, however, leads to poor scalability, which poses challenges in the era of big data. Hence, various scalable GPs have been developed in the literature to improve scalability while retaining desirable prediction accuracy. This paper investigates the methodological characteristics and performance of representative global and local scalable GPs, including sparse approximations and local aggregations, from four main perspectives: scalability, capability, controllability and robustness. Numerical experiments on two toy examples and five real-world datasets with up to 250K points offer the following findings. In terms of scalability, most scalable GPs have a time complexity that is linear in the training size. In terms of capability, sparse approximations capture long-term spatial correlations, whereas local aggregations capture local patterns but suffer from over-fitting in some scenarios. In terms of controllability, the performance of sparse approximations can be improved by simply increasing the number of inducing points, which is not the case for local aggregations. In terms of robustness, local aggregations are robust to various hyperparameter initializations owing to the local attention mechanism. Finally, we highlight that a proper hybrid of global and local scalable GPs may be a promising way to improve both model capability and scalability for big data.
Comment: 25 pages, 15 figures, preprint submitted to KB
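For a concrete (toy) picture of the two families compared above, the sketch below contrasts a subset-of-regressors inducing-point approximation, whose cost is linear in the training size, with a simple product-of-experts aggregation of independent local GPs. The RBF kernel, fixed hyperparameters, and the functions `sor_predict` and `poe_predict` are illustrative stand-ins, not the specific scalable GPs evaluated in the paper.

```python
import numpy as np

def rbf(A, B, ls=1.0, var=1.0):
    """Squared-exponential kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / ls ** 2)

def sor_predict(X, y, Z, Xs, noise=0.1):
    """Subset-of-regressors predictive mean with m inducing points Z.

    Cost is O(n m^2) for n training points, versus O(n^3) for an exact GP.
    """
    Kuu = rbf(Z, Z) + 1e-8 * np.eye(len(Z))
    Kun = rbf(Z, X)
    A = noise ** 2 * Kuu + Kun @ Kun.T           # (m, m) system
    w = np.linalg.solve(A, Kun @ y)
    return rbf(Xs, Z) @ w

def poe_predict(X, y, Xs, n_experts=4, noise=0.1):
    """Product-of-experts fusion of independent local exact GPs.

    Each expert sees n/n_experts points; predictions are combined by
    precision-weighted averaging (a simple PoE, not the robust BCM).
    """
    means, precs = [], []
    for Xi, yi in zip(np.array_split(X, n_experts), np.array_split(y, n_experts)):
        K = rbf(Xi, Xi) + noise ** 2 * np.eye(len(Xi))
        Ks = rbf(Xs, Xi)
        mu = Ks @ np.linalg.solve(K, yi)
        var = rbf(Xs, Xs).diagonal() + noise ** 2 \
            - np.einsum("ij,ij->i", Ks, np.linalg.solve(K, Ks.T).T)
        means.append(mu)
        precs.append(1.0 / var)
    precs = np.array(precs)
    return (precs * np.array(means)).sum(axis=0) / precs.sum(axis=0)

# toy usage on a 1-D regression problem
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=400)
Xs = np.linspace(-3, 3, 5)[:, None]
Z = X[rng.choice(len(X), 20, replace=False)]     # random inducing points
print(sor_predict(X, y, Z, Xs))
print(poe_predict(X, y, Xs))
```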
Large-scale Heteroscedastic Regression via Gaussian Process
Heteroscedastic regression, which accounts for varying noise levels among observations, has many applications in fields like machine learning and statistics. Here we focus on heteroscedastic Gaussian process (HGP) regression, which integrates the latent function and the noise function in a unified non-parametric Bayesian framework. Though it shows remarkable performance, HGP suffers from cubic time complexity, which severely limits its application to big data. To improve scalability, we first develop a variational sparse inference algorithm, named VSHGP, to handle large-scale datasets. Furthermore, two variants are developed to improve the scalability and capability of VSHGP. The first is stochastic VSHGP (SVSHGP), which derives a factorized evidence lower bound, thus enabling efficient stochastic variational inference. The second is distributed VSHGP (DVSHGP), which (i) follows the Bayesian committee machine formalism to distribute computations over multiple local VSHGP experts with many inducing points, and (ii) adopts hybrid parameters for the experts to guard against over-fitting and capture local variation. The superiority of DVSHGP and SVSHGP over existing scalable heteroscedastic/homoscedastic GPs is then extensively verified on various datasets.
Comment: 14 pages, 15 figures
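For intuition about the heteroscedastic ingredient itself, the sketch below assumes the per-observation noise variances are already known (in HGP they are modeled by a latent noise function and inferred jointly): the usual sigma^2 I in an exact GP is replaced by diag(noise_var). The dense O(n^3) solve shown here is exactly what VSHGP and its variants avoid through sparse variational inference; the kernel and hyperparameters are illustrative.

```python
import numpy as np

def rbf(A, B, ls=1.0, var=1.0):
    """Squared-exponential kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / ls ** 2)

def heteroscedastic_gp_mean(X, y, noise_var, Xs):
    """Exact GP predictive mean with a per-observation noise variance.

    Swapping sigma^2 * I for diag(noise_var) is the heteroscedastic part;
    the dense O(n^3) solve is what the sparse variational variants avoid.
    """
    K = rbf(X, X) + np.diag(noise_var)
    return rbf(Xs, X) @ np.linalg.solve(K, y)

# toy usage: noise grows with |x|, so each observation has its own variance
rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 1))
noise_var = 0.05 + 0.2 * np.abs(X[:, 0])
y = np.sin(2 * X[:, 0]) + np.sqrt(noise_var) * rng.normal(size=200)
Xs = np.linspace(-3, 3, 5)[:, None]
print(heteroscedastic_gp_mean(X, y, noise_var, Xs))
```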
Batch Active Learning from the Perspective of Sparse Approximation
Active learning enables efficient model training by leveraging interactions between machine learning agents and human annotators. We propose and study a novel framework that formulates batch active learning from the perspective of sparse approximation. Our active learning method aims to find an informative subset of the unlabeled data pool such that the corresponding training loss approximates its full-data-pool counterpart. We realize the framework as sparsity-constrained discontinuous optimization problems, which explicitly balance uncertainty and representation for large-scale applications and can be solved by greedy or proximal iterative hard thresholding algorithms. The proposed method adapts to various settings, including both Bayesian and non-Bayesian neural networks. Numerical experiments show that our method achieves competitive performance across different settings at lower computational cost.
Comment: NeurIPS 2022 Workshop on Human in the Loop Learning
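As a hedged sketch of how such a sparsity-constrained problem can be attacked, the code below runs projected iterative hard thresholding: it seeks at most k non-negative weights over the pool so that a weighted combination of per-point gradient embeddings approximates the full-pool gradient. The embedding matrix `G`, the least-squares surrogate, and `iht_select` are illustrative assumptions, not the paper's exact objective or solver.

```python
import numpy as np

def iht_select(G, k, n_iters=200, step=None):
    """Projected iterative hard thresholding for batch selection.

    Seeks at most k non-negative weights w such that G.T @ w approximates
    the full-pool quantity g = G.sum(axis=0), e.g. the full-data gradient.
    """
    n, _ = G.shape
    g = G.sum(axis=0)
    if step is None:
        step = 1.0 / np.linalg.norm(G, 2) ** 2   # 1 / Lipschitz constant of the gradient
    w = np.zeros(n)
    for _ in range(n_iters):
        grad = G @ (G.T @ w - g)                 # gradient of 0.5 * ||G.T w - g||^2
        w = np.maximum(w - step * grad, 0.0)     # gradient step, keep weights >= 0
        small = np.argsort(w)[:-k]               # hard-threshold: zero all but the k largest
        w[small] = 0.0
    return np.flatnonzero(w)                     # indices of the selected batch

# toy usage: 300 pool points with 32-dim gradient embeddings, batch of 10
rng = np.random.default_rng(3)
print(iht_select(rng.normal(size=(300, 32)), k=10))
```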