New multicategory boosting algorithms based on multicategory Fisher-consistent losses
Fisher-consistent loss functions play a fundamental role in the construction
of successful binary margin-based classifiers. In this paper we establish the
Fisher-consistency condition for multicategory classification problems. Our
approach uses the margin vector concept which can be regarded as a
multicategory generalization of the binary margin. We characterize a wide class
of smooth convex loss functions that are Fisher-consistent for multicategory
classification. We then consider using the margin-vector-based loss functions
to derive multicategory boosting algorithms. In particular, we derive two new
multicategory boosting algorithms by using the exponential and logistic
regression losses.

Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/08-AOAS198.
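As a toy illustration of the Fisher-consistency condition for the exponential loss (a sketch of my own construction, not the paper's boosting algorithm), the following minimizes the expected exponential loss of a margin vector under the sum-to-zero constraint by projected gradient descent, and checks that the minimizer ranks the classes exactly as the true class probabilities do. The probability vector `p` is a hypothetical example:

```python
import numpy as np

def expected_exp_loss(f, p):
    # population risk E[exp(-f_Y)] for margin vector f and class probabilities p
    return float(np.sum(p * np.exp(-f)))

def minimize_margin_vector(p, lr=0.1, steps=5000):
    # projected gradient descent on the sum-to-zero margin constraint
    f = np.zeros_like(p)
    for _ in range(steps):
        grad = -p * np.exp(-f)
        f = f - lr * grad
        f = f - f.mean()          # project back onto sum(f) == 0
    return f

p = np.array([0.5, 0.3, 0.2])     # hypothetical conditional class probabilities
f_star = minimize_margin_vector(p)
```

For the exponential loss the constrained population minimizer works out to f_j = log p_j - mean(log p), so the argmax of the fitted margin vector coincides with the Bayes class: this is exactly what Fisher consistency guarantees.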
On the Consistency of Ordinal Regression Methods
Many of the ordinal regression models that have been proposed in the
literature can be seen as methods that minimize a convex surrogate of the
zero-one, absolute, or squared loss functions. A key property that makes it
possible to study the statistical implications of such approximations is Fisher
consistency. Fisher consistency is a desirable property for surrogate loss
functions and implies that in the population setting, i.e., if the probability
distribution that generates the data were available, then optimization of the
surrogate would yield the best possible model. In this paper we will
characterize the Fisher consistency of a rich family of surrogate loss
functions used in the context of ordinal regression, including support vector
ordinal regression, ORBoosting and least absolute deviation. We will see that,
for a family of surrogate loss functions that subsumes support vector ordinal
regression and ORBoosting, consistency can be fully characterized by the
derivative of a real-valued function at zero, as happens for convex
margin-based surrogates in binary classification. We also derive excess risk
bounds for a surrogate of the absolute error that generalize existing risk
bounds for binary classification. Finally, our analysis suggests a novel
surrogate of the squared error loss. We compare this novel surrogate with
competing approaches on 9 different datasets. Our method proves highly
competitive in practice, outperforming the least squares loss on 7 out of 9
datasets.

Comment: Journal of Machine Learning Research 18 (2017).
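A small numerical check (my own, not code from the paper) of what consistency with respect to the absolute error means in the population setting: the Bayes prediction for an ordinal label under the absolute loss is the median of the conditional label distribution, so that is the target any Fisher-consistent surrogate must recover. The distribution `p` is a hypothetical example:

```python
import numpy as np

def expected_absolute_error(p, r):
    # E|Y - r| for a label distribution p over ordered classes 0..k-1
    labels = np.arange(len(p))
    return float(np.sum(p * np.abs(labels - r)))

def bayes_prediction(p):
    # brute-force the integer prediction minimizing expected absolute error
    return min(range(len(p)), key=lambda r: expected_absolute_error(p, r))

p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])  # hypothetical conditional label distribution
pred = bayes_prediction(p)
median = int(np.searchsorted(np.cumsum(p), 0.5))  # median of the distribution
```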
On the use of the l(2)-norm for texture analysis of polarimetric SAR data
In this paper, the use of the l2-norm, or Span, of the scattering vectors is suggested for texture analysis of polarimetric synthetic aperture radar (SAR) data, with the benefit that neither a separate analysis of the polarimetric channels nor a filtering of the data is needed to analyze the statistics. Based on the product model, the distribution of the l2-norm is studied. Closed expressions of the probability density functions under the assumptions of several texture distributions are provided. To utilize the statistical properties of the l2-norm, quantities including normalized moments and log-cumulants are derived, along with corresponding estimators and estimation variances. Results on both simulated and real SAR data show that statistics based on the l2-norm bring advantages in several aspects with respect to the normalized intensity moments and matrix variate log-cumulants.
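The product model and the log-cumulant estimators can be sketched as follows (a simulation of my own with hypothetical parameters: Gamma-distributed unit-mean texture with shape `alpha`, and a `d`-channel circular complex Gaussian scattering vector; not the authors' implementation). Because texture and speckle are independent, texture inflates the second log-cumulant of the Span relative to pure speckle:

```python
import numpy as np

rng = np.random.default_rng(0)

# product model: observed Span = texture * speckle power
n, d, alpha = 100_000, 3, 5.0
texture = rng.gamma(alpha, 1.0 / alpha, size=n)      # unit-mean Gamma texture
s = (rng.standard_normal((n, d)) + 1j * rng.standard_normal((n, d))) / np.sqrt(2)
speckle_span = np.sum(np.abs(s) ** 2, axis=1)        # squared l2-norm (Span) of speckle
span = texture * speckle_span

def log_cumulants(x):
    # first two sample log-cumulants: mean and variance of the log data
    lx = np.log(x)
    return lx.mean(), lx.var()

k1_span, k2_span = log_cumulants(span)
k1_spk, k2_spk = log_cumulants(speckle_span)
```

Under independence the second log-cumulants add, so `k2_span - k2_spk` estimates the texture contribution (the trigamma function at `alpha`, about 0.221 for `alpha = 5`).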
Tensor Regression with Applications in Neuroimaging Data Analysis
Classical regression methods treat covariates as a vector and estimate a
corresponding vector of regression coefficients. Modern applications in medical
imaging generate covariates of more complex form such as multidimensional
arrays (tensors). Traditional statistical and computational methods are proving
insufficient for analysis of these high-throughput data due to their ultrahigh
dimensionality as well as complex structure. In this article, we propose a new
family of tensor regression models that efficiently exploit the special
structure of tensor covariates. Under this framework, ultrahigh dimensionality
is reduced to a manageable level, resulting in efficient estimation and
prediction. A fast and highly scalable estimation algorithm is proposed for
maximum likelihood estimation and its associated asymptotic properties are
studied. Effectiveness of the new methods is demonstrated on both synthetic and
real MRI imaging data.

Comment: 27 pages, 4 figures.
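The simplest member of this family can be sketched as a rank-1 matrix regression fitted by alternating least squares over the two factors (a toy construction of my own under Gaussian data, not the authors' estimation algorithm): the coefficient matrix is the outer product b1 b2^T, so p1·p2 coefficients are parameterized by only p1 + p2 numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

# rank-1 matrix regression: y_i = <b1 b2^T, X_i> + noise
p1, p2, n = 8, 6, 2000
b1, b2 = rng.standard_normal(p1), rng.standard_normal(p2)
X = rng.standard_normal((n, p1, p2))
y = np.einsum('ijk,j,k->i', X, b1, b2) + 0.01 * rng.standard_normal(n)

v = rng.standard_normal(p2)               # random initial second factor
for _ in range(50):
    Zv = np.einsum('ijk,k->ij', X, v)     # design matrix for u with v fixed
    u = np.linalg.lstsq(Zv, y, rcond=None)[0]
    Zu = np.einsum('ijk,j->ik', X, u)     # design matrix for v with u fixed
    v = np.linalg.lstsq(Zu, y, rcond=None)[0]

rel_err = (np.linalg.norm(np.outer(u, v) - np.outer(b1, b2))
           / np.linalg.norm(np.outer(b1, b2)))
```

Each half-step is an ordinary least squares problem, which is what makes this kind of factorized estimation fast and scalable; the outer product resolves the scale indeterminacy between the two factors.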
Stochastic filtering via L2 projection on mixture manifolds with computer algorithms and numerical examples
We examine some differential geometric approaches to finding approximate
solutions to the continuous time nonlinear filtering problem. Our primary focus
is a new projection method for the optimal filter infinite dimensional
Stochastic Partial Differential Equation (SPDE), based on the direct L2 metric
and on a family of normal mixtures. We compare this method to earlier
projection methods based on the Hellinger distance/Fisher metric and
exponential families, and we compare the L2 mixture projection filter with a
particle method with the same number of parameters, using the Levy metric. We
prove that for a simple choice of the mixture manifold the L2 mixture
projection filter coincides with a Galerkin method, whereas for more general
mixture manifolds the equivalence does not hold and the L2 mixture filter is
more general. We study particular systems that may illustrate the advantages of
this new filter over other algorithms when comparing outputs with the optimal
filter. We finally consider a specific software design that is suited for a
numerically efficient implementation of this filter and provide numerical
examples.

Comment: Updated and expanded version published in the journal reference below. Preprint updates: January 2016 (v3) added the projection of the Zakai equation and the difference with the projection of Kushner-Stratonovich (Section 4.1); August 2014 (v2) added the Galerkin equivalence proof (Section 5) to the March 2013 (v1) version.
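The finite-dimensional linear step behind the Galerkin interpretation can be illustrated by a static L2 projection of a density onto the span of a fixed Gaussian basis (a sketch of my own with hypothetical basis locations and target, not the filtering algorithm itself): the best L2 approximation solves the normal equations with Gram matrix G_ij = ⟨φ_i, φ_j⟩.

```python
import numpy as np

# L2 projection of a target density onto the span of a fixed Gaussian basis
x = np.linspace(-6.0, 6.0, 2001)
dx = x[1] - x[0]

def gaussian(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

# hypothetical basis: three unit-width Gaussians
basis = np.stack([gaussian(x, m, 1.0) for m in (-2.0, 0.0, 2.0)])  # (3, npts)

# target density outside the basis span
target = 0.6 * gaussian(x, -1.8, 1.1) + 0.4 * gaussian(x, 2.1, 0.9)

# Galerkin normal equations: G c = b, G_ij = <phi_i, phi_j>, b_i = <phi_i, target>
G = basis @ basis.T * dx
b = basis @ target * dx
c = np.linalg.solve(G, b)
proj = c @ basis

l2_err = np.sqrt(np.sum((proj - target) ** 2) * dx)
```

The defining property of the projection, residual orthogonality to the basis, is what ties the direct-L2-metric projection filter to a Galerkin method in the simple-manifold case the abstract mentions.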
A General Family of Penalties for Combining Differing Types of Penalties in Generalized Structured Models
Penalized estimation has become an established tool for regularization and model selection in regression models.
A variety of penalties with specific features are available
and effective algorithms for specific penalties have been proposed.
But little is available for fitting models that call for a combination of different penalties.
When modeling rent data, which will be considered as an example, various types of predictors call for a combination of a Ridge, a grouped Lasso, and a Lasso-type penalty within one model.
Algorithms that can deal with such problems are in demand.
We propose to approximate penalties that are (semi-)norms of scalar linear transformations of the coefficient vector in generalized structured models.
The penalty is very general such that the Lasso, the fused Lasso, the Ridge, the smoothly clipped absolute deviation penalty (SCAD), the elastic net and many more penalties are embedded.
The approximation makes it possible to combine all these penalties within one model.
The computation is based on conventional penalized iteratively re-weighted least squares (PIRLS) algorithms and is hence easy to implement.
Moreover, new penalties can be incorporated quickly.
The approach is also extended to penalties with vector-based arguments, that is, to penalties with norms of linear transformations of the coefficient vector.
Some illustrative examples and the model for the Munich rent data show promising results.
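The quadratic-approximation idea behind such algorithms can be sketched for a linear model combining a Ridge block and a Lasso-type block (a toy construction of my own with hypothetical data and tuning parameters, not the paper's algorithm): replace |b| by the local quadratic b² / sqrt(b_prev² + eps), so every iteration reduces to a single weighted ridge solve.

```python
import numpy as np

rng = np.random.default_rng(0)

# combine a Ridge penalty on the first block and a Lasso-type penalty on the
# second via the local quadratic approximation |b| ~ b^2 / sqrt(b_prev^2 + eps)
n, p_ridge, p_lasso = 200, 5, 10
X = rng.standard_normal((n, p_ridge + p_lasso))
beta_true = np.concatenate([rng.standard_normal(p_ridge),
                            np.array([2.0, -1.5] + [0.0] * (p_lasso - 2))])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

lam_r, lam_l, eps = 1.0, 5.0, 1e-6
beta = np.zeros(p_ridge + p_lasso)
for _ in range(100):
    # penalty weights: constant for the ridge block, 1/sqrt(b^2+eps) for lasso
    w = np.concatenate([np.full(p_ridge, 2.0 * lam_r),
                        lam_l / np.sqrt(beta[p_ridge:] ** 2 + eps)])
    beta = np.linalg.solve(X.T @ X + np.diag(w), X.T @ y)
```

Because each step only changes the diagonal penalty weights, a new (semi-)norm penalty is incorporated by supplying its weight update, which is the flexibility the abstract advertises.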