19,222 research outputs found
Structured prediction models via the matrix-tree theorem
This paper provides an algorithmic framework for learning statistical models involving directed spanning trees, or equivalently non-projective dependency structures. We show how partition functions and marginals for directed spanning trees can be computed by an adaptation of Kirchhoff’s Matrix-Tree Theorem. To demonstrate an application of the method, we perform experiments which use the algorithm in training both log-linear and max-margin dependency parsers. The new training methods give improvements in accuracy over perceptron-trained models.Peer ReviewedPostprint (author’s final draft
Kernel methods in machine learning
We review machine learning methods employing positive definite kernels. These
methods formulate learning and estimation problems in a reproducing kernel
Hilbert space (RKHS) of functions defined on the data domain, expanded in terms
of a kernel. Working in linear spaces of function has the benefit of
facilitating the construction and analysis of learning algorithms while at the
same time allowing large classes of functions. The latter include nonlinear
functions as well as functions defined on nonvectorial data. We cover a wide
range of methods, ranging from binary classifiers to sophisticated methods for
estimation with structured data.Comment: Published in at http://dx.doi.org/10.1214/009053607000000677 the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Melding the Data-Decisions Pipeline: Decision-Focused Learning for Combinatorial Optimization
Creating impact in real-world settings requires artificial intelligence
techniques to span the full pipeline from data, to predictive models, to
decisions. These components are typically approached separately: a machine
learning model is first trained via a measure of predictive accuracy, and then
its predictions are used as input into an optimization algorithm which produces
a decision. However, the loss function used to train the model may easily be
misaligned with the end goal, which is to make the best decisions possible.
Hand-tuning the loss function to align with optimization is a difficult and
error-prone process (which is often skipped entirely).
We focus on combinatorial optimization problems and introduce a general
framework for decision-focused learning, where the machine learning model is
directly trained in conjunction with the optimization algorithm to produce
high-quality decisions. Technically, our contribution is a means of integrating
common classes of discrete optimization problems into deep learning or other
predictive models, which are typically trained via gradient descent. The main
idea is to use a continuous relaxation of the discrete problem to propagate
gradients through the optimization procedure. We instantiate this framework for
two broad classes of combinatorial problems: linear programs and submodular
maximization. Experimental results across a variety of domains show that
decision-focused learning often leads to improved optimization performance
compared to traditional methods. We find that standard measures of accuracy are
not a reliable proxy for a predictive model's utility in optimization, and our
method's ability to specify the true goal as the model's training objective
yields substantial dividends across a range of decision problems.Comment: Full version of paper accepted at AAAI 201
On the Bayes-optimality of F-measure maximizers
The F-measure, which has originally been introduced in information retrieval,
is nowadays routinely used as a performance metric for problems such as binary
classification, multi-label classification, and structured output prediction.
Optimizing this measure is a statistically and computationally challenging
problem, since no closed-form solution exists. Adopting a decision-theoretic
perspective, this article provides a formal and experimental analysis of
different approaches for maximizing the F-measure. We start with a Bayes-risk
analysis of related loss functions, such as Hamming loss and subset zero-one
loss, showing that optimizing such losses as a surrogate of the F-measure leads
to a high worst-case regret. Subsequently, we perform a similar type of
analysis for F-measure maximizing algorithms, showing that such algorithms are
approximate, while relying on additional assumptions regarding the statistical
distribution of the binary response variables. Furthermore, we present a new
algorithm which is not only computationally efficient but also Bayes-optimal,
regardless of the underlying distribution. To this end, the algorithm requires
only a quadratic (with respect to the number of binary responses) number of
parameters of the joint distribution. We illustrate the practical performance
of all analyzed methods by means of experiments with multi-label classification
problems
A Framework for Unbiased Model Selection Based on Boosting
Variable selection and model choice are of major concern in many statistical applications, especially in high-dimensional regression models. Boosting is a convenient statistical method that combines model fitting with intrinsic model selection.
We investigate the impact of base-learner specification on the performance of boosting as a model selection procedure.
We show that variable selection may be biased if the covariates are of different nature.
Important examples are models combining continuous and categorical covariates, especially if the number of categories is large. In this case, least squares base-learners offer increased flexibility for the categorical covariate and lead to a preference even if the categorical covariate is non-informative.
Similar difficulties arise when comparing linear and nonlinear base-learners for a continuous covariate. The additional flexibility in the nonlinear base-learner again yields a preference of the more complex modeling alternative.
We investigate these problems from a theoretical perspective and suggest a framework for unbiased model selection based on a general class of penalized least squares base-learners.
Making all base-learners comparable in terms of their degrees of freedom strongly reduces the selection bias observed in naive boosting specifications. The importance of unbiased model selection is demonstrated in simulations and an application to forest health models
- …