
    On the Consistency of Ordinal Regression Methods

    Many of the ordinal regression models that have been proposed in the literature can be seen as methods that minimize a convex surrogate of the zero-one, absolute, or squared loss functions. A key property that makes it possible to study the statistical implications of such approximations is Fisher consistency. Fisher consistency is a desirable property for surrogate loss functions: it implies that in the population setting, i.e., if the probability distribution that generates the data were available, optimizing the surrogate would yield the best possible model. In this paper we characterize the Fisher consistency of a rich family of surrogate loss functions used in the context of ordinal regression, including support vector ordinal regression, ORBoosting, and least absolute deviation. We show that, for a family of surrogate loss functions that subsumes support vector ordinal regression and ORBoosting, consistency can be fully characterized by the derivative of a real-valued function at zero, as happens for convex margin-based surrogates in binary classification. We also derive excess risk bounds for a surrogate of the absolute error that generalize existing risk bounds for binary classification. Finally, our analysis suggests a novel surrogate of the squared error loss. We compare this novel surrogate with competing approaches on 9 different datasets. Our method proves highly competitive in practice, outperforming the least squares loss on 7 out of 9 datasets. Comment: Journal of Machine Learning Research 18 (2017).
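    As a concrete illustration of the kind of surrogate this family contains, below is a minimal numpy sketch of an all-threshold hinge surrogate in the style of support vector ordinal regression. The function name, the zero-based label convention, and the threshold placement are assumptions made for the example, not the paper's notation; with the hinge, phi'(0) = -1 < 0, the sign that the derivative-at-zero characterization associates with consistency for margin-based surrogates.

```python
import numpy as np

def all_threshold_loss(f, y, thetas, phi=lambda t: np.maximum(0.0, 1.0 - t)):
    """All-threshold surrogate for ordinal regression (SVOR-style sketch).

    f      : real-valued score assigned to one example
    y      : its ordinal label in {0, ..., K-1}
    thetas : array of K-1 non-decreasing thresholds splitting the real line
    phi    : convex margin loss; hinge by default (illustrative choice)
    """
    thetas = np.asarray(thetas, dtype=float)
    below = thetas[:y]   # thresholds the score should lie above
    above = thetas[y:]   # thresholds the score should stay below
    return phi(f - below).sum() + phi(above - f).sum()

# Example: 4 classes, thresholds at -1, 0, 1; a score of 0.5 for label 2
print(all_threshold_loss(0.5, 2, [-1.0, 0.0, 1.0]))  # -> 1.0
```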

    A General Framework for Multivariate Analysis with Optimal Scaling: The R Package aspect

    In a series of papers, De Leeuw developed a general framework for multivariate analysis with optimal scaling. The basic idea of optimal scaling is to transform the observed variables (categories) into numeric quantifications. In the approach presented here, the multivariate data are collected into a multivariable. An aspect of a multivariable is a function used to measure how well the multivariable satisfies some criterion. Broadly, two families of aspects unify many well-known multivariate methods: correlational aspects, based on sums of correlations, eigenvalues, and determinants, which cover multiple regression, path analysis, correspondence analysis, nonlinear PCA, etc.; and non-correlational aspects, which linearize bivariate regressions and can be used for SEM preprocessing with categorical data. Other aspects can also be formulated that do not correspond to classical techniques at all. With the R package aspect we provide a unified majorization-based implementation of this methodology. Using various data examples, we show the flexibility of this approach and how the optimally scaled results can be represented using the graphical tools provided by the package.
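    To make the quantification idea tangible, here is a small Python sketch of one classical optimal-scaling step: scoring each category by the within-category mean of a numeric target, which maximizes the correlation between the quantified variable and that target. This is a toy single-variable instance of the idea, not the aspect package's API (which is in R), and all names are illustrative.

```python
import numpy as np

def optimal_quantification(categories, target):
    """Toy optimal-scaling step: replace each category with a numeric score.

    Scoring each category by the within-category mean of a numeric target
    maximizes the correlation between the quantified variable and the
    target -- one concrete way of 'transforming observed categories into
    quantifications'.
    """
    categories = np.asarray(categories)
    target = np.asarray(target, dtype=float)
    scores = {c: target[categories == c].mean() for c in np.unique(categories)}
    quantified = np.array([scores[c] for c in categories])
    # Standardize, since optimal-scaling methods typically fix scale and origin
    return (quantified - quantified.mean()) / quantified.std()

cats = np.array(["low", "mid", "mid", "high", "low", "high"])
y = np.array([1.0, 2.0, 2.5, 4.0, 1.5, 3.5])
z = optimal_quantification(cats, y)
print(np.corrcoef(z, y)[0, 1])  # correlation achieved by the quantification
```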

    Ordinal Hyperplane Loss

    The problem of ordinal classification occurs in a large and growing number of areas. Common sources and applications of ordinal data include rating scales, medical classification scales, socio-economic scales, meaningful groupings of continuous data, facial emotional intensity, facial age estimation, etc. Predicting ordinal classes is typically addressed either by performing n-1 binary classifications for n ordinal classes or by treating ordinal classes as continuous values for regression. However, the first strategy does not fully exploit the ordering information of the classes, and the second imposes a strong continuity assumption on ordinal classes. In this paper, we propose a novel loss function called Ordinal Hyperplane Loss (OHPL) designed specifically for data with ordinal classes. OHPL is a significant advancement in predicting ordinal class data because it enables deep learning techniques to be applied to the ordinal classification problem on both structured and unstructured data. By minimizing OHPL, a deep neural network learns to map data to an optimal space in which the distance between points and their class centroids is minimized while a nontrivial ordinal relationship among classes is maintained. Experimental results show that a deep neural network with OHPL not only outperforms state-of-the-art alternatives on classification accuracy but also scales well to large ordinal classification problems.
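    The abstract does not spell out the loss itself, so the following PyTorch sketch is only one plausible reading of the two stated goals: pull embeddings toward their class centroid and keep the centroids in label order. The function name ohpl_style_loss, the 1-D embedding, and the unit margin are assumptions for illustration, not the authors' formulation.

```python
import torch

def ohpl_style_loss(z, y, num_classes, margin=1.0):
    """Centroid + ordering loss sketch inspired by the OHPL idea.

    z : (batch,) 1-D embeddings produced by any network
    y : (batch,) integer ordinal labels in {0, ..., num_classes-1}

    Two terms: pull each embedding toward its class centroid, and keep
    consecutive centroids at least `margin` apart in label order.  A
    plausible reading of the abstract, not the authors' exact loss.
    """
    # Class centroids from the current batch (every class must be present)
    centroids = torch.stack([z[y == k].mean() for k in range(num_classes)])
    pull = ((z - centroids[y]) ** 2).mean()            # points near own centroid
    gaps = centroids[1:] - centroids[:-1]              # should all be positive
    order = torch.clamp(margin - gaps, min=0.0).sum()  # hinge on centroid order
    return pull + order

# Tiny usage example with random embeddings standing in for a network's output
z = torch.randn(12, requires_grad=True)
y = torch.tensor([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3])
loss = ohpl_style_loss(z, y, num_classes=4)
loss.backward()
```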