
    New multicategory boosting algorithms based on multicategory Fisher-consistent losses

    Fisher-consistent loss functions play a fundamental role in the construction of successful binary margin-based classifiers. In this paper we establish the Fisher-consistency condition for multicategory classification problems. Our approach uses the margin-vector concept, which can be regarded as a multicategory generalization of the binary margin. We characterize a wide class of smooth convex loss functions that are Fisher-consistent for multicategory classification. We then use margin-vector-based loss functions to derive multicategory boosting algorithms; in particular, we derive two new multicategory boosting algorithms from the exponential and logistic regression losses. Comment: Published at http://dx.doi.org/10.1214/08-AOAS198 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
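    A minimal numerical sketch of the Fisher-consistency property the abstract describes (my own illustration, not the authors' code): for the exponential margin-vector loss exp(-f_y) under the sum-to-zero constraint on the margin vector f, the population risk minimizer recovers the Bayes rule. The class probabilities below are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

p = np.array([0.5, 0.3, 0.2])            # hypothetical class probabilities P(Y=k | x)

def risk(f):
    # population risk of the exponential margin-vector loss: E[exp(-f_Y) | x]
    return np.dot(p, np.exp(-f))

# enforce the sum-to-zero constraint that defines the margin vector
cons = {"type": "eq", "fun": lambda f: f.sum()}
res = minimize(risk, x0=np.zeros(3), constraints=cons)

print(res.x)                              # minimizer f*; analytically log p_k - mean(log p)
print(np.argmax(res.x) == np.argmax(p))   # True: argmax f* matches the Bayes rule
```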

    On the Consistency of Ordinal Regression Methods

    Many of the ordinal regression models that have been proposed in the literature can be seen as methods that minimize a convex surrogate of the zero-one, absolute, or squared loss functions. A key property that allows one to study the statistical implications of such approximations is Fisher consistency. Fisher consistency is a desirable property for surrogate loss functions: it implies that in the population setting, i.e., if the probability distribution that generates the data were available, optimization of the surrogate would yield the best possible model. In this paper we characterize the Fisher consistency of a rich family of surrogate loss functions used in the context of ordinal regression, including support vector ordinal regression, ORBoosting and least absolute deviation. We show that, for a family of surrogate loss functions that subsumes support vector ordinal regression and ORBoosting, consistency can be fully characterized by the derivative of a real-valued function at zero, as happens for convex margin-based surrogates in binary classification. We also derive excess risk bounds for a surrogate of the absolute error that generalize existing risk bounds for binary classification. Finally, our analysis suggests a novel surrogate of the squared error loss. We compare this novel surrogate with competing approaches on 9 different datasets; our method proves highly competitive in practice, outperforming the least squares loss on 7 out of 9 datasets. Comment: Journal of Machine Learning Research 18 (2017).
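    A hedged sketch of one member of the surrogate family discussed above (my own illustration, not the paper's code): an "all-threshold" surrogate of the absolute error, in which K ordered classes are separated by K-1 thresholds and each threshold contributes one binary logistic loss. The thresholds below are hypothetical.

```python
import numpy as np

def logistic(z):
    # binary logistic surrogate: log(1 + exp(-z))
    return np.log1p(np.exp(-z))

def all_threshold_loss(f, y, theta):
    """f: real-valued score; y: class in {0,...,K-1}; theta: K-1 sorted thresholds."""
    left = sum(logistic(f - theta[j]) for j in range(y))             # thresholds below y
    right = sum(logistic(theta[j] - f) for j in range(y, len(theta)))  # thresholds at/above y
    return left + right

def predict(f, theta):
    # predicted class = number of thresholds the score crosses
    return int(np.sum(f > theta))

theta = np.array([-1.0, 0.0, 1.5])   # hypothetical thresholds for K = 4 classes
print(all_threshold_loss(0.3, 2, theta), predict(0.3, theta))
```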

    On the use of the l2-norm for texture analysis of polarimetric SAR data

    In this paper, the use of the l2-norm, or Span, of the scattering vectors is suggested for texture analysis of polarimetric synthetic aperture radar (SAR) data, with the benefit that neither a separate analysis of the polarimetric channels nor a filtering of the data is needed to analyze the statistics. Based on the product model, the distribution of the l2-norm is studied, and closed expressions for the probability density functions under several assumed texture distributions are provided. To exploit the statistical properties of the l2-norm, quantities including normalized moments and log-cumulants are derived, along with corresponding estimators and their estimation variances. Results on both simulated and real SAR data show that statistics based on the l2-norm bring advantages in several respects over normalized intensity moments and matrix variate log-cumulants.
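    An illustrative sketch of the setup (assumptions mine, not the paper's code): simulate scattering vectors under the product model k = sqrt(tau) * g, take the Span (here the squared l2-norm of k), and compute sample log-cumulants, the texture statistics the paper builds on. The Gamma texture and unit-covariance speckle below are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100_000, 3                        # samples, polarimetric channels

# Gamma-distributed texture with unit mean (the K-distribution case)
L_tex = 4.0
tau = rng.gamma(shape=L_tex, scale=1.0 / L_tex, size=n)

# speckle: zero-mean circular complex Gaussian with identity covariance
g = (rng.standard_normal((n, d)) + 1j * rng.standard_normal((n, d))) / np.sqrt(2)

# Span: squared l2-norm of the scattering vector sqrt(tau) * g
span = tau * np.sum(np.abs(g) ** 2, axis=1)

logs = np.log(span)
k1 = logs.mean()                          # first sample log-cumulant
k2 = logs.var()                           # second sample log-cumulant
k3 = np.mean((logs - k1) ** 3)            # third sample log-cumulant
print(k1, k2, k3)
```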

    Tensor Regression with Applications in Neuroimaging Data Analysis

    Classical regression methods treat covariates as a vector and estimate a corresponding vector of regression coefficients. Modern applications in medical imaging generate covariates of more complex form, such as multidimensional arrays (tensors). Traditional statistical and computational methods are proving insufficient for the analysis of these high-throughput data due to their ultrahigh dimensionality and complex structure. In this article, we propose a new family of tensor regression models that efficiently exploit the special structure of tensor covariates. Under this framework, ultrahigh dimensionality is reduced to a manageable level, resulting in efficient estimation and prediction. A fast and highly scalable algorithm is proposed for maximum likelihood estimation, and its associated asymptotic properties are studied. The effectiveness of the new methods is demonstrated on both synthetic and real MRI data. Comment: 27 pages, 4 figures.
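    A minimal sketch of the dimensionality-reduction idea (a hypothetical illustration under my own assumptions, not the authors' implementation): a rank-R CP structure on the coefficient tensor turns the inner product <B, X> into cheap factor contractions, so a p1 x p2 x p3 coefficient array needs only R*(p1+p2+p3) parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
p1, p2, p3, R = 16, 16, 8, 3

# CP factor matrices of the coefficient tensor B = sum_r b1_r (x) b2_r (x) b3_r
B1, B2, B3 = (rng.standard_normal((p, R)) for p in (p1, p2, p3))
X = rng.standard_normal((p1, p2, p3))     # one tensor covariate (e.g. an image patch)

# linear predictor <B, X> computed without ever forming the full tensor B
eta = sum(np.einsum("ijk,i,j,k->", X, B1[:, r], B2[:, r], B3[:, r])
          for r in range(R))

# sanity check against the explicitly materialized coefficient tensor
B_full = np.einsum("ir,jr,kr->ijk", B1, B2, B3)
assert np.isclose(eta, np.sum(B_full * X))
print(eta)
```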

    Stochastic filtering via L2 projection on mixture manifolds with computer algorithms and numerical examples

    We examine some differential geometric approaches to finding approximate solutions to the continuous-time nonlinear filtering problem. Our primary focus is a new projection method for the optimal-filter infinite-dimensional stochastic partial differential equation (SPDE), based on the direct L2 metric and on a family of normal mixtures. We compare this method to earlier projection methods based on the Hellinger distance/Fisher metric and exponential families, and we compare the L2 mixture projection filter with a particle method with the same number of parameters, using the Lévy metric. We prove that for a simple choice of the mixture manifold the L2 mixture projection filter coincides with a Galerkin method, whereas for more general mixture manifolds the equivalence does not hold and the L2 mixture filter is more general. We study particular systems that illustrate the advantages of this new filter over other algorithms when comparing outputs with the optimal filter. We finally consider a specific software design suited to a numerically efficient implementation of this filter and provide numerical examples. Comment: Updated and expanded version published in the journal reference below. Preprint updates: January 2016 (v3) added the projection of the Zakai equation and its difference from the projection of Kushner-Stratonovich (Section 4.1); August 2014 (v2) added the Galerkin equivalence proof (Section 5) to the March 2013 (v1) version.
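    A rough sketch of the Galerkin connection mentioned above (my simplification, not the paper's implementation): for a mixture manifold with fixed Gaussian components and free weights, the direct-L2 projection of a density reduces to a linear, Galerkin-type system G w = b with Gram matrix G_ij = <phi_i, phi_j> in L2. The components and target density below are hypothetical.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

centers = [-2.0, 0.0, 2.0]                           # hypothetical fixed mixture components
phis = [lambda x, m=m: norm.pdf(x, loc=m, scale=1.0) for m in centers]
target = lambda x: norm.pdf(x, loc=0.5, scale=1.3)   # density to project

# Gram matrix and load vector of the L2 inner products
n = len(phis)
G = np.array([[quad(lambda x: phis[i](x) * phis[j](x), -10, 10)[0]
               for j in range(n)] for i in range(n)])
b = np.array([quad(lambda x: phis[i](x) * target(x), -10, 10)[0]
              for i in range(n)])

w = np.linalg.solve(G, b)                            # L2-optimal mixture weights
print(w)
```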

    A General Family of Penalties for Combining Differing Types of Penalties in Generalized Structured Models

    Penalized estimation has become an established tool for regularization and model selection in regression models. A variety of penalties with specific features are available, and effective algorithms have been proposed for specific penalties, but little is available for fitting models that call for a combination of different penalties. When modeling rent data, which will be considered as an example, various types of predictors call for a combination of a Ridge, a grouped Lasso and a Lasso-type penalty within one model. Algorithms that can deal with such problems are in demand. We propose to approximate penalties that are (semi-)norms of scalar linear transformations of the coefficient vector in generalized structured models. The penalty class is very general, so that the Lasso, the fused Lasso, the Ridge, the smoothly clipped absolute deviation penalty (SCAD), the elastic net and many more penalties are embedded. The approximation allows one to combine all these penalties within one model. The computation is based on conventional penalized iteratively re-weighted least squares (PIRLS) algorithms and is hence easy to implement; moreover, new penalties can be incorporated quickly. The approach is also extended to penalties with vector-based arguments, that is, to penalties with norms of linear transformations of the coefficient vector. Some illustrative examples and the model for the Munich rent data show promising results.
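    A sketch under my own assumptions (not the authors' implementation) of the kind of approximation that makes penalty mixing possible in one PIRLS-style loop: a Ridge penalty on one coefficient block and a Lasso penalty on another, the latter replaced by the local quadratic approximation |b| ~ b^2 / sqrt(b_old^2 + eps). Every approximated penalty is quadratic, so each iteration is a weighted ridge regression.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = np.r_[np.ones(3), np.zeros(p - 3)]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

ridge_idx = np.arange(0, 5)          # block penalized by Ridge
lasso_idx = np.arange(5, p)          # block penalized by (approximated) Lasso
lam_ridge, lam_lasso, eps = 1.0, 5.0, 1e-8

beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from the unpenalized fit
for _ in range(50):
    d = np.zeros(p)
    d[ridge_idx] = lam_ridge                                        # exact quadratic penalty
    d[lasso_idx] = lam_lasso / np.sqrt(beta[lasso_idx] ** 2 + eps)  # LQA weight for |b|
    # one weighted ridge step: solve (X'X + D) beta = X'y
    beta = np.linalg.solve(X.T @ X + np.diag(d), X.T @ y)
print(np.round(beta, 3))
```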