
    Classification Trees for Problems with Monotonicity Constraints

    For classification problems with ordinal attributes, very often the class attribute should increase with each or some of the explaining attributes. These are called classification problems with monotonicity constraints. Classical decision tree algorithms such as CART or C4.5 generally do not produce monotone trees, even if the dataset is completely monotone. This paper surveys the methods that have so far been proposed for generating decision trees that satisfy monotonicity constraints. A distinction is made between methods that work only for monotone datasets and methods that work for monotone and non-monotone datasets alike.
    Keywords: classification tree; decision tree; monotone; monotonicity constraint; ordinal data
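The monotonicity property this survey is built around can be stated directly as code: a dataset is monotone if no example that dominates another in every attribute carries a smaller class label. A minimal sketch (the function name and brute-force pairwise check are illustrative, not from the paper):

```python
def is_monotone(X, y):
    """Check whether a labelled dataset satisfies monotonicity:
    whenever one example dominates another in every attribute,
    its class label must be at least as large."""
    n = len(X)
    for i in range(n):
        for j in range(n):
            # x_i is dominated by x_j, yet its class is larger: violation
            if all(a <= b for a, b in zip(X[i], X[j])) and y[i] > y[j]:
                return False
    return True

# A monotone toy dataset: higher attribute values never lower the class.
X = [(1, 1), (2, 1), (2, 3)]
print(is_monotone(X, [0, 1, 2]))  # True
print(is_monotone(X, [2, 1, 0]))  # False
```

The quadratic scan is fine for illustration; real monotone-tree algorithms exploit the partial order more cleverly.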

    Using monotonicity constraints for the treatment of ordinal data in regression analysis

    A regression model is proposed for the analysis of an ordinal response variable depending on a set of multiple covariates containing ordinal and potentially other types of variables. The ordinal predictors are treated neither as nominal-scaled variables nor transformed into interval-scaled variables; therefore, the information provided by the order of their categories is neither ignored nor overstated. The proportional odds cumulative logit model (POCLM, see McCullagh (1980)) is used for the ordinal response, and constrained maximum likelihood estimation is used to account for the ordinality of covariates. Ordinal predictors are coded by dummy variables, and the parameters associated with the categories of each ordinal predictor are constrained to be monotonic (isotonic or antitonic). A monotonicity direction classification procedure (MDCP) is proposed for classifying the monotonicity direction of the coefficients of the ordinal predictors; it also indicates whether observations are compatible with both monotonicity directions or with neither. The MDCP consists of three steps and offers the researcher two decision points. Asymptotic theory of the constrained MLE (CMLE) for the POCLM is discussed. Some results of the asymptotic theory of the unconstrained MLE developed by Fahrmeir and Kaufmann (1985) are made explicit for the POCLM, and these results are then adapted to extend the asymptotic analysis to the constrained case. Asymptotic existence and strong consistency of the CMLE for the POCLM are proved, and asymptotic normality is also discussed. Different scenarios are identified in the analysis of confidence regions of the CMLE for the POCLM, which leads to the definition of three alternative confidence regions; their coverage probabilities are compared through simulations.
    Similarly, different scenarios are identified in the analysis of confidence intervals of the CMLE, and alternative definitions are provided. However, the fact that monotonicity is a feature of a parameter vector rather than of a single parameter becomes a problem for their computation, which is also discussed. Two monotonicity tests for the set of parameters of an ordinal predictor are proposed: one is based on a Bonferroni correction of the confidence intervals associated with the parameters of an ordinal predictor, and the other uses the analysis of confidence regions. Six constrained estimation methods are proposed, corresponding to different approaches for deciding whether or not to impose the monotonicity constraints on the parameters of an ordinal predictor; each of them uses the steps of the MDCP or one of the two monotonicity tests. The constrained estimation methods are compared to the unconstrained proportional odds cumulative logit model through simulations under several settings. The results of using different scoring systems that transform ordinal variables into interval-scaled variables in regression analysis are compared, through simulations, to those obtained with the proposed constrained regression methods. Finally, the constrained model is applied to real data explaining a 10-point Likert-scale quality-of-life self-assessment variable by ordinal and other predictors.
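The thesis constrains the dummy coefficients of each ordinal predictor to be isotonic or antitonic. A standard building block for such isotonic constraints, used here purely as an illustration and not as the thesis's actual CMLE routine, is the pool-adjacent-violators algorithm (PAVA), which projects a coefficient sequence onto the nondecreasing cone under squared-error loss:

```python
def pava(values):
    """Pool Adjacent Violators: project a sequence onto the
    nondecreasing (isotonic) cone under squared-error loss."""
    blocks = []  # each block holds [sum, count] of pooled values
    for v in values:
        blocks.append([v, 1])
        # pool while the previous block's mean exceeds the new one's
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)  # every pooled entry gets the block mean
    return out

# The violating pair (0.5, 0.3) is pooled to its mean:
print(pava([0.2, 0.5, 0.3, 0.9]))  # approximately [0.2, 0.4, 0.4, 0.9]
```

An antitonic fit is the same projection applied to the negated sequence.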

    Selection of Ordinally Scaled Independent Variables

    Ordinal categorical variables are a common case in regression modeling. Although the case of ordinal response variables has been well investigated, less work has been done concerning ordinal predictors. This article deals with the selection of ordinally scaled independent variables in the classical linear model, where the ordinal structure is taken into account by use of a difference penalty on adjacent dummy coefficients. It is shown how the Group Lasso can be used for the selection of ordinal predictors, and an alternative blockwise Boosting procedure is proposed. Emphasis is placed on the application of the presented methods to the (Comprehensive) ICF Core Set for chronic widespread pain. The paper is a preprint of an article accepted for publication in the Journal of the Royal Statistical Society, Series C (Applied Statistics); please use the journal version for citation.
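The idea of combining a difference penalty on adjacent dummy coefficients with a group-wise norm, so that a whole ordinal predictor is either selected or dropped, can be sketched as follows. The helper name is hypothetical and the details are a simplification of the paper's penalty, with the reference category fixed at zero:

```python
import math

def ordinal_difference_penalty(beta_groups):
    """Group-Lasso-type penalty on first differences of adjacent dummy
    coefficients: one Euclidean norm per ordinal predictor, so the whole
    predictor enters or leaves the model together.
    beta_groups: list of coefficient vectors, one per ordinal predictor
    (the reference category is coded as 0 and omitted)."""
    total = 0.0
    for beta in beta_groups:
        # first differences, including the jump from the reference category
        diffs = [beta[0]] + [beta[k] - beta[k - 1] for k in range(1, len(beta))]
        total += math.sqrt(sum(d * d for d in diffs))
    return total

# A flat group is cheap: only the jump from the reference category counts.
print(ordinal_difference_penalty([[1.0, 1.0, 1.0]]))  # 1.0
```

Penalizing differences rather than the coefficients themselves is what encodes the ordinal structure: neighboring categories are shrunk toward each other, not toward zero.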

    On the Consistency of Ordinal Regression Methods

    Many of the ordinal regression models that have been proposed in the literature can be seen as methods that minimize a convex surrogate of the zero-one, absolute, or squared loss functions. A key property that allows one to study the statistical implications of such approximations is Fisher consistency. Fisher consistency is a desirable property for surrogate loss functions and implies that in the population setting, i.e., if the probability distribution that generates the data were available, then optimization of the surrogate would yield the best possible model. In this paper we characterize the Fisher consistency of a rich family of surrogate loss functions used in the context of ordinal regression, including support vector ordinal regression, ORBoosting and least absolute deviation. We show that, for a family of surrogate loss functions that subsumes support vector ordinal regression and ORBoosting, consistency can be fully characterized by the derivative of a real-valued function at zero, as happens for convex margin-based surrogates in binary classification. We also derive excess risk bounds for a surrogate of the absolute error that generalize existing risk bounds for binary classification. Finally, our analysis suggests a novel surrogate of the squared error loss. We compare this novel surrogate with competing approaches on 9 different datasets. Our method proves highly competitive in practice, outperforming the least squares loss on 7 out of 9 datasets.
    Comment: Journal of Machine Learning Research 18 (2017).
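As a concrete member of the threshold-based surrogate family discussed in abstracts like this one, the all-threshold hinge loss used in support vector ordinal regression can be written in a few lines. This is an illustrative sketch of one common surrogate, not the paper's full family or its novel squared-error surrogate:

```python
def all_threshold_hinge(f, y, thetas):
    """All-threshold hinge surrogate for ordinal regression:
    the score f should clear (with margin 1) every threshold below
    class y, and stay below every threshold at or above it.
    y is a 0-based class index; thetas is a sorted list of thresholds."""
    loss = 0.0
    for k, theta in enumerate(thetas):
        if k < y:   # threshold below the true class: f should exceed it
            loss += max(0.0, 1.0 - (f - theta))
        else:       # threshold at or above it: f should fall below it
            loss += max(0.0, 1.0 + (f - theta))
    return loss

# Score 1.0 sits with margin 1 inside the interval for class 1:
print(all_threshold_hinge(1.0, 1, [0.0, 2.0]))  # 0.0
```

Summing one hinge term per threshold is what makes the surrogate convex in `f` and sensitive to how far an ordinal prediction misses, not just whether it misses.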

    Sparsity with sign-coherent groups of variables via the cooperative-Lasso

    We consider the problems of estimation and selection of parameters endowed with a known group structure, when the groups are assumed to be sign-coherent, that is, gathering either nonnegative, nonpositive or null parameters. To tackle this problem, we propose the cooperative-Lasso penalty. We derive the optimality conditions defining the cooperative-Lasso estimate for generalized linear models, and propose an efficient active set algorithm suited to high-dimensional problems. We study the asymptotic consistency of the estimator in the linear regression setup and derive its irrepresentable conditions, which are milder than those of the group-Lasso regarding the matching of groups with the sparsity pattern of the true parameters. We also address the problem of model selection in linear regression by deriving an approximation of the degrees of freedom of the cooperative-Lasso estimator. Simulations comparing the proposed estimator to the group and sparse group-Lasso agree with our theoretical results, showing consistent improvements in support recovery for sign-coherent groups. We finally propose two examples illustrating the wide applicability of the cooperative-Lasso: first to the processing of ordinal variables, where the penalty acts as a monotonicity prior; second to the processing of genomic data, where the set of differentially expressed probes is enriched by incorporating all the probes of the microarray that are related to the corresponding genes.
    Comment: Published at http://dx.doi.org/10.1214/11-AOAS520 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
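The cooperative-Lasso penalty described above is simple to write down: each group is charged the Euclidean norm of its positive part plus the Euclidean norm of its negative part, so a sign-coherent group pays the ordinary group-Lasso norm while a sign-mixed group pays strictly more. A minimal sketch of the penalty value (the fitting algorithm itself is the paper's active-set method, not reproduced here):

```python
import math

def coop_lasso_penalty(groups):
    """Cooperative-Lasso penalty: per group, the norm of the positive
    part plus the norm of the negative part, favouring sign-coherent
    (all-nonnegative or all-nonpositive) groups."""
    total = 0.0
    for g in groups:
        pos = math.sqrt(sum(max(v, 0.0) ** 2 for v in g))
        neg = math.sqrt(sum(max(-v, 0.0) ** 2 for v in g))
        total += pos + neg
    return total

# A sign-coherent group pays the plain group-Lasso norm...
print(coop_lasso_penalty([[3.0, 4.0]]))   # 5.0
# ...while a sign-mixed group of the same magnitude pays strictly more.
print(coop_lasso_penalty([[3.0, -4.0]]))  # 7.0
```

Applied to the dummy coefficients of an ordinal variable's first differences, this extra charge on mixed signs is exactly what lets the penalty act as a monotonicity prior.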