Bethe Projections for Non-Local Inference
Many inference problems in structured prediction are naturally solved by
augmenting a tractable dependency structure with complex, non-local auxiliary
objectives. This includes the mean field family of variational inference
algorithms, soft- or hard-constrained inference using Lagrangian relaxation or
linear programming, collective graphical models, and forms of semi-supervised
learning such as posterior regularization. We present a method to
discriminatively learn broad families of inference objectives, capturing
powerful non-local statistics of the latent variables, while maintaining
tractable and provably fast inference using non-Euclidean projected gradient
descent with a distance-generating function given by the Bethe entropy. We
demonstrate the performance and flexibility of our method by (1) extracting
structured citations from research papers by learning soft global constraints,
(2) achieving state-of-the-art results on a widely used handwriting recognition
task using a novel learned non-convex inference procedure, and (3) providing a
fast and highly scalable algorithm for the challenging problem of inference in
a collective graphical model applied to bird migration.
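To make the projection step concrete: non-Euclidean projected gradient
descent with an entropy distance-generating function is mirror descent, and
on the probability simplex the resulting Bregman projection reduces to
renormalization (the exponentiated-gradient update). The sketch below is a
minimal, hypothetical illustration of that template, which the paper's
Bethe-entropy projections generalize to structured models; it is not the
authors' code, and all names are illustrative.

```python
import numpy as np

def entropic_mirror_descent(grad, x0, steps=100, lr=0.1):
    # Mirror descent on the probability simplex with negative entropy
    # as the distance-generating function (exponentiated gradient).
    # The Bethe-entropy projections in the paper generalize this
    # template from the simplex to the marginals of a structured model.
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        # Gradient step in the dual (log) space ...
        x = x * np.exp(-lr * g)
        # ... then the Bregman projection back onto the simplex,
        # which for negative entropy is just renormalization.
        x /= x.sum()
    return x

# Example: minimize <c, x> over the simplex; the iterates concentrate
# on the coordinate with the smallest cost.
c = np.array([0.8, 0.2, 0.5])
print(entropic_mirror_descent(lambda x: c, np.ones(3) / 3))
```

The simplex case is the simplest instance; the paper applies the same
machinery to richer dependency structures with learned non-local objectives.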
Pairwise Learning via Stagewise Training in Proximal Setting
Pairwise objectives are an essential paradigm in machine learning.
Examples of machine learning approaches that use pairwise objective
functions include differential networks in face recognition, metric
learning, bipartite learning, multiple kernel learning, and maximization
of the area under the curve (AUC). Compared to pointwise learning, the
number of training pairs in pairwise learning grows quadratically with
the number of samples, and so does the computational complexity.
Researchers have mostly addressed this challenge with online learning
methods. Recent research has, however, proposed adaptive sample size
training for smooth loss functions as a better strategy in terms of convergence
and complexity, but without a comprehensive theoretical study. In a distinct
line of research, importance sampling has attracted considerable
interest in finite pointwise-sum minimization, because the variance of
the stochastic gradient can slow convergence considerably. In this
paper, we combine adaptive sample size and importance
sampling techniques for pairwise learning, with convergence guarantees for
nonsmooth convex pairwise loss functions. In particular, the model is trained
stochastically using an expanded training set for a predefined number of
iterations derived from the stability bounds. In addition, we demonstrate
that sampling instances with opposite labels at each iteration reduces
the variance of the
gradient, hence accelerating convergence. Experiments on a broad variety of
datasets in AUC maximization confirm the theoretical results.
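A rough sketch of the training scheme described above, under stated
assumptions: the stage schedule here simply doubles the subset (the paper
derives per-stage iteration counts from stability bounds), the loss is the
nonsmooth pairwise hinge surrogate for AUC, and all function names are
illustrative rather than the authors' code.

```python
import numpy as np

def pairwise_hinge_grad(w, x_pos, x_neg):
    # Subgradient of max(0, 1 - w.(x_pos - x_neg)), a standard
    # nonsmooth convex surrogate for the AUC of a linear scorer.
    d = x_pos - x_neg
    return -d if w @ d < 1.0 else np.zeros_like(w)

def stagewise_pairwise_sgd(X, y, stages=5, iters_per_stage=200,
                           lr=0.01, seed=0):
    # Stagewise (adaptive sample size) training: start on a small
    # subset, repeatedly double it, and warm-start each stage from
    # the previous weights.
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    w = np.zeros(dim)
    m = max(2, n // 2 ** (stages - 1))  # initial subset size
    for _ in range(stages):
        idx = rng.choice(n, size=min(m, n), replace=False)
        pos = idx[y[idx] == 1]
        neg = idx[y[idx] == 0]
        if len(pos) and len(neg):
            for _ in range(iters_per_stage):
                # Every stochastic gradient comes from one
                # opposite-label (positive, negative) pair.
                i, j = rng.choice(pos), rng.choice(neg)
                w -= lr * pairwise_hinge_grad(w, X[i], X[j])
        m *= 2  # expand the training set for the next stage
    return w
```

The fixed per-stage iteration count is a simplification; in the paper that
number is derived from the stability bounds mentioned in the abstract.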
A survey of cost-sensitive decision tree induction algorithms
The past decade has seen significant interest in the problem of inducing decision trees that take into account both the costs of misclassification and the costs of acquiring the features used for decision making. This survey identifies over 50 algorithms, including approaches that directly adapt accuracy-based methods, approaches that use genetic algorithms or anytime methods, and approaches that utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a taxonomy and a historical timeline of how the field has developed, and should serve as a useful reference point for future research in this field.
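As a concrete illustration of the direct-adaptation family the survey
covers, a cost-sensitive tree can replace an accuracy-based split criterion
(Gini, entropy) with an expected-cost criterion that folds in a
misclassification cost matrix and a per-feature acquisition cost. The
sketch below is a hypothetical example of that idea, not any specific
surveyed algorithm; the cost matrix, feature cost, and function names are
assumptions.

```python
import numpy as np

def expected_misclassification_cost(labels, cost_matrix):
    # Cost of labeling a leaf with whichever class minimizes the
    # expected misclassification cost, given the class counts there.
    counts = np.bincount(labels, minlength=cost_matrix.shape[0])
    # Predicting class k costs sum_j counts[j] * cost_matrix[j, k].
    return (counts @ cost_matrix).min()

def split_cost(feature_values, labels, threshold, cost_matrix,
               feature_cost):
    # Score a candidate binary split by the total expected
    # misclassification cost of its two children, plus the cost of
    # acquiring the feature for every example reaching this node.
    # Lower is better; an accuracy-based criterion would use Gini or
    # entropy here and ignore both cost terms.
    left = feature_values <= threshold
    children = (expected_misclassification_cost(labels[left], cost_matrix)
                + expected_misclassification_cost(labels[~left], cost_matrix))
    return children + feature_cost * len(labels)

# Example: false negatives (true class 1 predicted as 0) cost 5x more
# than false positives, and measuring the feature costs 0.01 per case.
cost_matrix = np.array([[0.0, 1.0],
                        [5.0, 0.0]])
x = np.array([0.2, 0.4, 0.6, 0.8])
y = np.array([0, 0, 1, 1])
print(split_cost(x, y, threshold=0.5, cost_matrix=cost_matrix,
                 feature_cost=0.01))
```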