Diversity in Machine Learning
Machine learning methods have achieved strong performance and are widely applied in real-world applications. They learn a model adaptively and can therefore fit the specific requirements of different tasks. Generally, a good machine learning system is composed of plentiful training data, a well-designed model training process, and an accurate inference procedure. Many factors affect the performance of the machine learning process, among which diversity is an important one. Diversity helps each stage contribute to an overall good result: diversity of the training data ensures that the data provides more discriminative information for the model; diversity of the learned model (diversity in the parameters of each model, or diversity among different base models) lets each parameter or model capture unique or complementary information; and diversity in inference provides multiple choices, each corresponding to a specific plausible local optimum. Even though diversity plays an important role in the machine learning process, there is no systematic analysis of diversification in machine learning systems. In this paper, we systematically summarize methods for data diversification, model diversification, and inference diversification in the machine learning process, respectively. In addition, we survey typical applications where diversity techniques have improved machine learning performance, including remote sensing imaging tasks, machine translation, camera relocalization, image segmentation, object detection, topic modeling, and others. Finally, we discuss some challenges of diversity techniques in machine learning and point out directions for future work. Comment: Accepted by IEEE Access
Fast determinantal point processes via distortion-free intermediate sampling
Given a fixed matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$, where $n \gg d$, we study the
complexity of sampling from a distribution over all subsets of rows where the
probability of a subset is proportional to the squared volume of the
parallelepiped spanned by the rows (a.k.a. a determinantal point process). In
this task, it is important to minimize the preprocessing cost of the procedure
(performed once) as well as the sampling cost (performed repeatedly). To that
end, we propose a new determinantal point process algorithm which has the
following two properties, both of which are novel: (1) a preprocessing step which runs in time $O(\mathrm{nnz}(\mathbf{X}) \cdot \log n) + \mathrm{poly}(d)$, and (2) a sampling step which runs in $\mathrm{poly}(d)$ time, independent of the number of rows $n$. We achieve this by introducing a
new regularized determinantal point process (R-DPP), which serves as an
intermediate distribution in the sampling procedure by reducing the number of
rows from $n$ to $\mathrm{poly}(d)$. Crucially, this intermediate distribution
does not distort the probabilities of the target sample. Our key novelty in
defining the R-DPP is the use of a Poisson random variable for controlling the
probabilities of different subset sizes, leading to new determinantal formulas
such as the normalization constant for this distribution. Our algorithm has
applications in many diverse areas where determinantal point processes have
been used, such as machine learning, stochastic optimization, data
summarization and low-rank matrix reconstruction.
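To make the target distribution concrete, here is a minimal brute-force sketch (ours, not the paper's algorithm; it also fixes the subset size $k$ for simplicity, whereas the paper samples over all subset sizes). It enumerates all size-$k$ row subsets of a small matrix and samples one with probability proportional to the squared volume $\det(\mathbf{X}_S \mathbf{X}_S^\top)$, which is exactly the exponential cost that the paper's preprocessing and intermediate sampling are designed to avoid:

    import itertools
    import numpy as np

    def volume_sample(X, k, rng=np.random.default_rng()):
        """Sample a size-k subset S of rows of X with P(S) proportional to
        det(X_S X_S^T), the squared volume spanned by the rows in S.
        Brute force (exponential in n); for illustration on tiny inputs only.
        """
        n = X.shape[0]
        subsets = list(itertools.combinations(range(n), k))
        vols = np.array([np.linalg.det(X[list(S)] @ X[list(S)].T)
                         for S in subsets])
        return subsets[rng.choice(len(subsets), p=vols / vols.sum())]

    X = np.random.default_rng(0).normal(size=(8, 3))
    print(volume_sample(X, 3))  # a tuple of three row indices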
Diversifying Sparsity Using Variational Determinantal Point Processes
We propose a novel diverse feature selection method based on determinantal
point processes (DPPs). Our model enables one to flexibly define diversity
based on the covariance of features (similar to orthogonal matching pursuit) or
alternatively based on side information. We introduce our approach in the
context of Bayesian sparse regression, employing a DPP as a variational
approximation to the true spike and slab posterior distribution. We
subsequently show how this variational DPP approximation generalizes and
extends mean-field approximation, and can be learned efficiently by exploiting
the fast sampling properties of DPPs. Our motivating application comes from
bioinformatics, where we aim to identify a diverse set of genes whose
expression profiles predict a tumor type, where diversity is defined with
respect to a gene-gene interaction network. We also explore an application in
spatial statistics. In both cases, we demonstrate that the proposed method
yields significantly more diverse feature sets than classic sparse methods,
without compromising accuracy. Comment: 9 pages, 3 figures
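As background for readers unfamiliar with L-ensembles (standard DPP material rather than this paper's specific construction), a DPP over feature supports $S$ can encode both quality and diversity through a factored kernel:

    P(S) = \frac{\det(\mathbf{L}_S)}{\det(\mathbf{L} + \mathbf{I})}, \qquad L_{ij} = q_i \, K_{ij} \, q_j,

where $q_i$ is a per-feature quality weight and $K$ is a similarity matrix that can be built from the feature covariance or, as in the side-information variant above, from a gene-gene interaction network: supports containing mutually similar features yield small determinants and are therefore unlikely to be selected.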
Determinantal Point Processes Implicitly Regularize Semi-parametric Regression Problems
Semi-parametric regression models are used in several applications which
require comprehensibility without sacrificing accuracy. Typical examples are
spline interpolation in geophysics, or non-linear time series problems, where
the system includes a linear and non-linear component. We discuss here the use
of a finite Determinantal Point Process (DPP) for approximating semi-parametric
models. Recently, Barthelmé, Tremblay, Usevich, and Amblard introduced a
novel representation of some finite DPPs. These authors formulated extended
L-ensembles that can conveniently represent partial-projection DPPs and suggest
their use for optimal interpolation. With the help of this formalism, we derive
a key identity illustrating the implicit regularization effect of determinantal
sampling for semi-parametric regression and interpolation. Also, a novel
projected Nyström approximation is defined and used to derive a bound on the
expected risk for the corresponding approximation of semi-parametric
regression. This work naturally extends similar results obtained for kernel
ridge regression. Comment: 26 pages. Extended results. Typos corrected.
Dissimilarity-based Sparse Subset Selection
Finding an informative subset of a large collection of data points or models
is at the center of many problems in computer vision, recommender systems,
bio/health informatics as well as image and natural language processing. Given
pairwise dissimilarities between the elements of a `source set' and a `target
set,' we consider the problem of finding a subset of the source set, called
representatives or exemplars, that can efficiently describe the target set. We
formulate the problem as a row-sparsity regularized trace minimization problem.
Since the proposed formulation is, in general, NP-hard, we consider a convex
relaxation. The solution of our optimization finds representatives and the
assignment of each element of the target set to each representative, hence,
obtaining a clustering. We analyze the solution of our proposed optimization as
a function of the regularization parameter. We show that when the two sets
jointly partition into multiple groups, our algorithm finds representatives
from all groups and reveals clustering of the sets. In addition, we show that
the proposed framework can effectively deal with outliers. Our algorithm works
with arbitrary dissimilarities, which can be asymmetric or violate the triangle
inequality. To efficiently implement our algorithm, we consider an Alternating
Direction Method of Multipliers (ADMM) framework, which results in quadratic
complexity in the problem size. We show that the ADMM implementation allows the algorithm to be parallelized, hence further reducing the computational time.
Finally, by experiments on real-world datasets, we show that our proposed
algorithm improves the state of the art on the two problems of scene
categorization using representative images and time-series modeling and
segmentation using representative models.
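A plausible explicit form of the row-sparsity regularized trace minimization described above (our reconstruction from the abstract; the notation is ours) is as follows: given dissimilarities $\mathbf{D} \in \mathbb{R}^{M \times N}$ between $M$ source and $N$ target elements, find an assignment matrix $\mathbf{Z} \in \mathbb{R}^{M \times N}$ by solving

    \min_{\mathbf{Z} \ge 0} \; \lambda \sum_{i=1}^{M} \|\mathbf{z}^{i}\|_{\infty} + \mathrm{tr}(\mathbf{D}^{\top}\mathbf{Z}) \quad \text{s.t.} \quad \mathbf{1}^{\top}\mathbf{Z} = \mathbf{1}^{\top},

where $\mathbf{z}^{i}$ is the $i$-th row of $\mathbf{Z}$: the trace term favors assigning each target element to representatives with small dissimilarity, while the row-sparsity penalty drives all but a few rows of $\mathbf{Z}$ to zero, and the surviving nonzero rows index the representatives.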
Determinantal Point Processes in Randomized Numerical Linear Algebra
Randomized Numerical Linear Algebra (RandNLA) uses randomness to develop
improved algorithms for matrix problems that arise in scientific computing,
data science, machine learning, etc. Determinantal Point Processes (DPPs), a seemingly unrelated topic in pure and applied mathematics, are a class of stochastic point processes with probability distributions characterized by
sub-determinants of a kernel matrix. Recent work has uncovered deep and
fruitful connections between DPPs and RandNLA which lead to new guarantees and
improved algorithms that are of interest to both areas. We provide an overview
of this exciting new line of research, including brief introductions to RandNLA
and DPPs, as well as applications of DPPs to classical linear algebra tasks
such as least squares regression, low-rank approximation and the Nyström
method. For example, random sampling with a DPP leads to new kinds of unbiased
estimators for least squares, enabling more refined statistical and inferential
understanding of these algorithms; a DPP is, in some sense, an optimal
randomized algorithm for the Nyström method; and a RandNLA technique called
leverage score sampling can be derived as the marginal distribution of a DPP.
We also discuss recent algorithmic developments, illustrating that, while not
quite as efficient as standard RandNLA techniques, DPP-based algorithms are
only moderately more expensive.
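As one concrete instance of the last connection (a standard fact about projection DPPs, stated here for illustration): for the projection DPP over the rows of a full-rank $\mathbf{X} \in \mathbb{R}^{n \times d}$ with marginal kernel $\mathbf{K} = \mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}$, the probability that row $i$ appears in the sample is exactly its leverage score,

    \Pr[i \in S] = K_{ii} = \mathbf{x}_i^{\top}(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{x}_i = \ell_i,

so i.i.d. leverage score sampling can be viewed as the marginal distribution of a DPP that has forgotten the negative correlations between rows.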
Towards Bursting Filter Bubble via Contextual Risks and Uncertainties
A rising topic in computational journalism is how to enhance the diversity in
news served to subscribers to foster exploration behavior in news reading.
Despite the success of preference learning in personalized news recommendation, its over-exploitation causes a filter bubble that isolates readers from opposing viewpoints and hurts long-term user experience through a lack of serendipity. Since news providers can recommend neither opposing nor diversified opinions if the unpopularity of these articles is predicted with certainty, they can only bet on articles whose click-through rate forecasts involve high variability (risks) or high estimation errors (uncertainties). We propose
a novel Bayesian model of uncertainty-aware scoring and ranking for news
articles. The Bayesian binary classifier models the probability of success (defined as a news click) as a Beta-distributed random variable conditional on a context vector (user features, article features, and other contextual features). The posterior of the contextual coefficients can be computed
efficiently using a low-rank version of Laplace's method via thin Singular
Value Decomposition. Efficiencies in personalized targeting of exceptional articles, which are chosen by each subscriber in the test period, are evaluated on real-world news datasets. The proposed estimator slightly outperforms existing training and scoring algorithms in terms of efficiency in identifying successful outliers. Comment: The filter bubble problem; Uncertainty-aware scoring; Empirical-Bayes method; Low-rank Laplace's method
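As a rough sketch of uncertainty-aware scoring in this spirit (a generic Bayesian logistic regression with a Laplace-approximated posterior and an optimism bonus; the paper's Beta-distributed, thin-SVD low-rank model is more refined than this), one can score an article context by its predicted logit plus a posterior-variance bonus:

    import numpy as np
    from scipy.optimize import minimize

    def fit_laplace(X, y, alpha=1.0):
        """MAP fit of logistic regression with a Gaussian prior, plus the
        Laplace approximation: posterior covariance = inverse Hessian at MAP.
        Labels y must be in {-1, +1}. Generic stand-in, not the paper's model.
        """
        d = X.shape[1]
        nll = lambda w: np.sum(np.log1p(np.exp(-y * (X @ w)))) + 0.5 * alpha * (w @ w)
        w = minimize(nll, np.zeros(d)).x
        p = 1.0 / (1.0 + np.exp(-X @ w))
        H = (X * (p * (1.0 - p))[:, None]).T @ X + alpha * np.eye(d)
        return w, np.linalg.inv(H)

    def score(x, w, S, beta=1.0):
        # Optimism in the face of uncertainty: articles whose click-through
        # forecasts are uncertain receive a variance bonus and rank higher.
        return x @ w + beta * np.sqrt(x @ S @ x)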
On the Generalization Error Bounds of Neural Networks under Diversity-Inducing Mutual Angular Regularization
Recently, diversity-inducing regularization methods for latent variable models
(LVMs), which encourage the components in LVMs to be diverse, have been studied
to address several issues involved in latent variable modeling: (1) how to
capture long-tail patterns underlying data; (2) how to reduce model complexity
without sacrificing expressivity; (3) how to improve the interpretability of
learned patterns. While the effectiveness of diversity-inducing regularizers
such as the mutual angular regularizer has been demonstrated empirically, a
rigorous theoretical analysis of them is still missing. In this paper, we aim
to bridge this gap and analyze how the mutual angular regularizer (MAR) affects
the generalization performance of supervised LVMs. We use neural networks (NNs) as a model instance to carry out the study, and the analysis shows that increasing the diversity of hidden units in a NN reduces the estimation error but increases the approximation error. In addition to the theoretical analysis, we also present an empirical study demonstrating that the MAR can greatly improve the performance of NNs, and the empirical observations are in accordance with the theoretical analysis.
Repulsive Mixture Models of Exponential Family PCA for Clustering
The mixture extension of exponential family principal component analysis
(EPCA) was designed to encode much more structural information about data
distribution than the traditional EPCA does. For example, due to the linearity
of EPCA's essential form, nonlinear cluster structures cannot be easily
handled, but they are explicitly modeled by the mixture extensions. However, the traditional mixture of local EPCAs suffers from model redundancy, i.e., overlaps among mixture components, which may cause ambiguity in data clustering. To alleviate this problem, in this paper, a
repulsiveness-encouraging prior is introduced among mixing components and a
diversified EPCA mixture (DEPCAM) model is developed in the Bayesian framework.
Specifically, a determinantal point process (DPP) is exploited as a
diversity-encouraging prior distribution over the joint local EPCAs. As
required, a matrix-valued measure for the L-ensemble kernel is designed, within which constraints are imposed to facilitate the selection of effective PCs of local EPCAs, and an angle-based similarity measure is proposed. An efficient
variational EM algorithm is derived to perform parameter learning and hidden
variable inference. Experimental results on both synthetic and real-world
datasets confirm the effectiveness of the proposed method in terms of model
parsimony and generalization ability on unseen test data.
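Schematically (our reconstruction of the general recipe, not the paper's exact kernel), a repulsiveness-encouraging DPP prior over the $K$ local EPCA parameter sets $\Theta_1, \dots, \Theta_K$ has the form

    p(\Theta_1, \dots, \Theta_K) \propto \det \mathbf{L}, \qquad L_{kl} = q(\Theta_k) \, s(\Theta_k, \Theta_l) \, q(\Theta_l),

where $s(\cdot,\cdot)$ is an angle-based similarity between components and $q(\cdot)$ a quality term; the determinant shrinks toward zero as components become similar, so overlapping mixture components are penalized a priori.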
Latent Variable Modeling with Diversity-Inducing Mutual Angular Regularization
Latent Variable Models (LVMs) are a large family of machine learning models
providing a principled and effective way to extract underlying patterns,
structure and knowledge from observed data. Due to the dramatic growth of
volume and complexity of data, several new challenges have emerged and cannot
be effectively addressed by existing LVMs: (1) How to capture long-tail
patterns that carry crucial information when the popularity of patterns is
distributed in a power-law fashion? (2) How to reduce model complexity and
computational cost without compromising the modeling power of LVMs? (3) How to
improve the interpretability and reduce the redundancy of discovered patterns?
To address the three challenges discussed above, we develop a novel regularization technique for LVMs which controls the geometry of the latent space during learning, so that the learned latent components of LVMs are encouraged to be mutually different, achieving long-tail coverage, low redundancy, and better interpretability. We propose a mutual angular regularizer (MAR) to encourage
the components in LVMs to have larger mutual angles. The MAR is non-convex and
non-smooth, entailing great challenges for optimization. To cope with this
issue, we derive a smooth lower bound of the MAR and optimize the lower bound
instead. We show that the monotonicity of the lower bound is closely aligned
with the MAR to qualify the lower bound as a desirable surrogate of the MAR.
Using neural networks (NNs) as an instance, we analyze how the MAR affects the generalization performance of NNs. On two popular latent variable models, restricted Boltzmann machines and distance metric learning, we demonstrate that MAR can effectively capture long-tail patterns, reduce model complexity without sacrificing expressivity, and improve interpretability.
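As a minimal sketch of a mutual-angle diversity score in this spirit (the mean pairwise non-obtuse angle between components; the paper's exact MAR definition may differ, e.g., by also involving the variance of the angles), the following score can be subtracted from a training loss to favor mutually different components:

    import numpy as np

    def mutual_angle_score(W, eps=1e-12):
        """Mean pairwise angle (radians) between the rows of W.
        W: (K, d) array of latent components; larger score = more diverse.
        Non-smooth and non-convex, as noted in the abstract.
        """
        U = W / (np.linalg.norm(W, axis=1, keepdims=True) + eps)
        C = np.clip(U @ U.T, -1.0, 1.0)        # pairwise cosine similarities
        iu = np.triu_indices(W.shape[0], k=1)  # each unordered pair once
        return np.arccos(np.abs(C[iu])).mean()

    rng = np.random.default_rng(0)
    print(mutual_angle_score(rng.normal(size=(5, 20))))              # high: diverse
    print(mutual_angle_score(np.tile(rng.normal(size=20), (5, 1))))  # ~0: redundant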
- …