
    A constrained regression model for an ordinal response with ordinal predictors

    A regression model is proposed for the analysis of an ordinal response variable depending on a set of multiple covariates containing ordinal and potentially other variables. The proportional odds model (McCullagh in J R Stat Soc Ser B (Methodol) 109–142, 1980) is used for the ordinal response, and constrained maximum likelihood estimation is used to account for the ordinality of covariates. Ordinal predictors are coded by dummy variables. The parameters associated with the categories of the ordinal predictor(s) are constrained, enforcing them to be monotonic (isotonic or antitonic). A decision rule is introduced for classifying the ordinal predictors’ monotonicity directions, also providing information on whether observations are compatible with both or no monotonicity direction. In addition, a monotonicity test for the parameters of any ordinal predictor is proposed. The monotonicity constrained model is proposed together with five estimation methods and compared to the unconstrained one based on simulations. The model is applied to real data explaining a 10-point Likert scale quality of life self-assessment variable by ordinal and other predictors.
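
As a rough illustration of the ingredients named in this abstract, the sketch below computes category probabilities under a proportional odds model and checks whether the dummy-variable coefficients of one ordinal predictor are monotonic. The sign convention `logit P(Y <= j | x) = alpha_j - eta`, the choice of the first category as reference, and all function names are assumptions made for illustration. The check works on point estimates only and is not the paper's decision rule or test.

```python
import numpy as np

def category_probs(alpha, eta):
    """Category probabilities of a proportional odds model.

    Assumed convention: logit P(Y <= j | x) = alpha_j - eta, with
    alpha strictly increasing and eta the linear predictor.
    """
    alpha = np.asarray(alpha, dtype=float)
    cum = 1.0 / (1.0 + np.exp(-(alpha - eta)))       # P(Y <= j), j = 1..k-1
    cum = np.concatenate(([0.0], cum, [1.0]))         # add P(Y <= 0) and P(Y <= k)
    return np.diff(cum)                               # P(Y = j), j = 1..k

def monotonicity_of(beta, tol=1e-12):
    """Classify dummy coefficients of one ordinal predictor.

    beta holds the coefficients of categories 2..p; category 1 is the
    reference with coefficient 0 (an assumption of this sketch).
    """
    steps = np.diff(np.concatenate(([0.0], np.asarray(beta, dtype=float))))
    up = np.all(steps >= -tol)
    down = np.all(steps <= tol)
    if up and down:
        return "constant"
    if up:
        return "isotonic"
    if down:
        return "antitonic"
    return "non-monotonic"

# Hypothetical values: 4 response categories, predictor observed in category 3.
alpha = [-1.0, 0.0, 1.5]
beta = [0.2, 0.5, 0.9]
probs = category_probs(alpha, eta=beta[1])
print(probs, monotonicity_of(beta))
```

A constrained fit would maximize the likelihood built from these probabilities subject to the coefficient steps all having the same sign.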

    Using monotonicity constraints for the treatment of ordinal data in regression analysis

    A regression model is proposed for the analysis of an ordinal response variable depending on a set of multiple covariates containing ordinal and potentially other types of variables. The ordinal predictors are neither treated as nominal-scaled variables nor transformed into interval-scaled variables. Therefore, the information provided by the order of their categories is neither ignored nor overstated. The proportional odds cumulative logit model (POCLM, see McCullagh (1980)) is used for the ordinal response, and constrained maximum likelihood estimation is used to account for the ordinality of covariates. Ordinal predictors are coded by dummy variables. The parameters associated with the categories of the ordinal predictor(s) are constrained, enforcing them to be monotonic (isotonic or antitonic). A monotonicity direction classification procedure (MDCP) is proposed for classifying the monotonicity direction of the coefficients of the ordinal predictors, also providing information on whether observations are compatible with both or no monotonicity direction. The MDCP consists of three steps, which offer two instances of decisions to be made by the researcher. Asymptotic theory of the constrained MLE (CMLE) for the POCLM is discussed. Some results of the asymptotic theory of the unconstrained MLE developed by Fahrmeir and Kaufmann (1985) are made explicit for the POCLM. These results are further adapted to extend the analysis of asymptotic theory to the constrained case. Asymptotic existence and strong consistency of the CMLE for the POCLM are proved. Asymptotic normality is also discussed. Different scenarios are identified in the analysis of confidence regions of the CMLE for the POCLM, which leads to the definition of three alternative confidence regions. Their results are compared through simulations in terms of their coverage probability.
    Similarly, different scenarios are identified in the analysis of confidence intervals of the CMLE and alternative definitions are provided. However, the fact that monotonicity is a feature of a parameter vector rather than of a single parameter value becomes a problem for their computation, which is also discussed. Two monotonicity tests for the set of parameters of an ordinal predictor are proposed. One of them is based on a Bonferroni correction of the confidence intervals associated with the parameters of an ordinal predictor, and the other uses the analysis of confidence regions. Six constrained estimation methods are proposed, depending on different approaches for deciding whether or not to impose the monotonicity constraints on the parameters of an ordinal predictor. Each one of them uses the steps of the MDCP or one of the two monotonicity tests. The constrained estimation methods are compared to the unconstrained proportional odds cumulative logit model through simulations under several settings. The results of using different scoring systems that transform ordinal variables into interval-scaled variables in regression analysis are compared, based on simulations, to the ones obtained when using the proposed constrained regression methods. The constrained model is applied to real data explaining a 10-point Likert scale quality of life self-assessment variable by ordinal and other predictors.

    Regularized Ordinal Regression and the ordinalNet R Package

    Regularization techniques such as the lasso (Tibshirani 1996) and elastic net (Zou and Hastie 2005) can be used to improve regression model coefficient estimation and prediction accuracy, as well as to perform variable selection. Ordinal regression models are widely used in applications where the use of regularization could be beneficial; however, these models are not included in many popular software packages for regularized regression. We propose a coordinate descent algorithm to fit a broad class of ordinal regression models with an elastic net penalty. Furthermore, we demonstrate that each model in this class generalizes to a more flexible form, for instance to accommodate unordered categorical data. We introduce an elastic net penalty class that applies to both model forms. Additionally, this penalty can be used to shrink a non-ordinal model toward its ordinal counterpart. Finally, we introduce the R package ordinalNet, which implements the algorithm for this model class.
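
To make the elastic net penalty and the coordinate-wise update concrete, here is a minimal sketch. It uses the common parametrization `lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2)` and the closed-form minimizer of a one-dimensional quadratic plus that penalty. This parametrization and the function names are assumptions for illustration; the code is not taken from ordinalNet and does not reproduce its algorithm for ordinal likelihoods.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrink z toward 0 by gamma."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def elastic_net_penalty(beta, lam, alpha):
    """lam * (alpha * L1 norm + (1 - alpha) / 2 * squared L2 norm)."""
    beta = np.asarray(beta, dtype=float)
    l1 = np.sum(np.abs(beta))
    l2 = np.sum(beta ** 2)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)

def coordinate_update(z, lam, alpha):
    """Minimizer over b of 0.5 * (b - z)**2 + elastic_net_penalty([b], lam, alpha).

    In coordinate descent, z would be the unpenalized one-dimensional
    solution for the current coefficient with all others held fixed.
    """
    return soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))

# Small z is set exactly to zero (variable selection); large z is shrunk.
print(coordinate_update(0.3, lam=1.0, alpha=1.0))
print(coordinate_update(2.0, lam=1.0, alpha=0.5))
```

With `alpha = 1` the update reduces to the lasso, and with `alpha = 0` to ridge shrinkage without thresholding.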

    A new specification of generalized linear models for categorical data

    Regression models for categorical data are specified in heterogeneous ways. We propose to unify the specification of such models. This allows us to define the family of reference models for nominal data. We introduce the notion of reversible models for ordinal data that distinguishes adjacent and cumulative models from sequential ones. The combination of the proposed specification with the definition of reference and reversible models and various invariance properties leads to a new view of regression models for categorical data. Comment: 31 pages, 13 figures

    Geoadditive Regression Modeling of Stream Biological Condition

    Indices of biotic integrity (IBI) have become an established tool to quantify the condition of small non-tidal streams and their watersheds. To investigate the effects of watershed characteristics on stream biological condition, we present a new technique for regressing IBIs on watershed-specific explanatory variables. Since IBIs are typically evaluated on an ordinal scale, our method is based on the proportional odds model for ordinal outcomes. To avoid overfitting, we do not use classical maximum likelihood estimation but a component-wise functional gradient boosting approach. Because component-wise gradient boosting has an intrinsic mechanism for variable selection and model choice, determinants of biotic integrity can be identified. In addition, the method offers a relatively simple way to account for spatial correlation in ecological data. An analysis of the Maryland Biological Streams Survey shows that nonlinear effects of predictor variables on stream condition can be quantified while, in addition, accurate predictions of biological condition at unsurveyed locations are obtained.

    Penalized Regression with Ordinal Predictors

    Ordered categorial predictors are a common case in regression modeling. In contrast to the case of ordinal response variables, ordinal predictors have been largely neglected in the literature. In this article, penalized regression techniques are proposed. Based on dummy coding, two types of penalization are explicitly developed: the first imposes a difference penalty; the second is a ridge-type refitting procedure. A Bayesian motivation as well as alternative ways of derivation are provided. Simulation studies and real-world data serve for illustration and to compare the approach to methods often seen in practice, namely linear regression on the group labels and pure dummy coding. The proposed regression techniques turn out to be highly competitive. On the basis of GLMs, the concept is generalized to the case of non-normal outcomes by performing penalized likelihood estimation. The paper is a preprint of an article published in the International Statistical Review. Please use the journal version for citation.
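
The idea of a difference penalty on dummy coefficients can be sketched for the normal-response case as penalized least squares with a squared first-difference penalty between adjacent categories. The version below is a simplified variant under stated assumptions: it uses a full one-hot coding without intercept or reference category, and the function name and data are hypothetical. The paper's own coding and its refitting procedure are not reproduced here.

```python
import numpy as np

def fit_difference_penalty(levels, y, n_levels, lam):
    """Minimize ||y - X b||^2 + lam * sum_j (b[j+1] - b[j])^2.

    levels: integer category codes 0..n_levels-1 of one ordinal predictor.
    X is the one-hot design matrix; every level must be observed if lam == 0,
    otherwise the linear system is singular.
    """
    levels = np.asarray(levels)
    y = np.asarray(y, dtype=float)
    X = np.eye(n_levels)[levels]               # (n, K) one-hot coding
    D = np.diff(np.eye(n_levels), axis=0)      # (K-1, K) first differences
    A = X.T @ X + lam * (D.T @ D)              # normal equations with penalty
    return np.linalg.solve(A, X.T @ y)

# Hypothetical data: three ordered categories, two observations each.
levels = [0, 0, 1, 1, 2, 2]
y = [1.0, 1.0, 5.0, 5.0, 3.0, 3.0]
for lam in (0.0, 1.0, 1e6):
    print(lam, fit_difference_penalty(levels, y, 3, lam))
```

With `lam = 0` the estimates are the category means (pure dummy coding); as `lam` grows, adjacent coefficients are pulled together until all categories share the overall mean.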

    Sparse modeling of categorial explanatory variables

    Shrinking methods in regression analysis are usually designed for metric predictors. In this article, however, shrinkage methods for categorial predictors are proposed. As an application we consider data from the Munich rent standard, where, for example, urban districts are treated as a categorial predictor. If independent variables are categorial, some modifications to usual shrinking procedures are necessary. Two L1-penalty based methods for factor selection and clustering of categories are presented and investigated. The first approach is designed for nominal scale levels, the second one for ordinal predictors. Besides applying them to the Munich rent standard, methods are illustrated and compared in simulation studies. Comment: Published at http://dx.doi.org/10.1214/10-AOAS355 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    cgam: An R Package for the Constrained Generalized Additive Model

    The cgam package contains routines to fit the generalized additive model where the components may be modeled with shape and smoothness assumptions. The main routine is cgam and nineteen symbolic routines are provided to indicate the relationship between the response and each predictor, which satisfies constraints such as monotonicity, convexity, their combinations, tree, and umbrella orderings. The user may specify constrained splines to fit the components for continuous predictors, and various types of orderings for the ordinal predictors. In addition, the user may specify parametrically modeled covariates. The set over which the likelihood is maximized is a polyhedral convex cone, and a least-squares solution is obtained by projecting the data vector onto the cone. For generalized models, the fit is obtained through iteratively re-weighted cone projections. The cone information criterion is provided and may be used to compare fits for combinations of variables and shapes. In addition, the routine wps implements monotone regression in two dimensions using warped-plane splines, without an additivity assumption. The graphical routine plotpersp will plot an estimated mean surface for a selected pair of predictors, given an object of either cgam or wps. This package is now available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=cgam
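
The projection onto a cone can be illustrated in its simplest case: for a monotone (isotonic) ordering, the least-squares projection of a data vector onto the cone of non-decreasing vectors can be computed with the pool-adjacent-violators algorithm. The sketch below is written in Python for illustration only; the function name is hypothetical, and it is not the cone projection routine used by cgam, which handles more general shapes and orderings.

```python
import numpy as np

def pava(y, w=None):
    """Weighted least-squares projection of y onto non-decreasing vectors.

    Pool-adjacent-violators: scan left to right, and whenever the last two
    blocks violate the ordering, merge them into their weighted mean.
    """
    y = np.asarray(y, dtype=float)
    w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
    means, weights, counts = [], [], []
    for yi, wi in zip(y, w):
        means.append(yi)
        weights.append(wi)
        counts.append(1)
        # Merge backwards while the ordering constraint is violated.
        while len(means) > 1 and means[-2] > means[-1]:
            m2, w2, c2 = means.pop(), weights.pop(), counts.pop()
            m1, w1, c1 = means.pop(), weights.pop(), counts.pop()
            means.append((w1 * m1 + w2 * m2) / (w1 + w2))
            weights.append(w1 + w2)
            counts.append(c1 + c2)
    return np.repeat(means, counts)

print(pava([1.0, 3.0, 2.0, 4.0]))   # -> [1.  2.5 2.5 4. ]
```

For a generalized model, a fit of this kind would be repeated inside an iteratively re-weighted loop, with the working response and weights updated at each step.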