
    Penalized Regression with Ordinal Predictors

    Ordered categorical predictors are a common case in regression modeling. In contrast to the case of ordinal response variables, ordinal predictors have been largely neglected in the literature. In this article, penalized regression techniques are proposed. Based on dummy coding, two types of penalization are explicitly developed; the first imposes a difference penalty, the second is a ridge-type refitting procedure. A Bayesian motivation as well as alternative ways of derivation are provided. Simulation studies and real-world data serve for illustration and to compare the approach to methods often seen in practice, namely linear regression on the group labels and pure dummy coding. The proposed regression techniques turn out to be highly competitive. On the basis of GLMs, the concept is generalized to the case of non-normal outcomes by performing penalized likelihood estimation. The paper is a preprint of an article published in the International Statistical Review; please use the journal version for citation.
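    To make the difference-penalty idea concrete, here is a minimal sketch in Python (the function name, the toy data, and the penalty weight lam=5.0 are illustrative assumptions, not the authors' implementation): penalizing squared differences of adjacent dummy coefficients shrinks the fit toward coefficients that vary smoothly across the ordered levels.

```python
import numpy as np

def ordinal_difference_penalty_fit(X, y, lam):
    """Ridge-type fit with a first-order difference penalty on the dummy
    coefficients of an ordinal predictor (hypothetical sketch).
    Solves min ||y - X b||^2 + lam * ||D b||^2 in closed form."""
    p = X.shape[1]
    # First-difference matrix D: each row penalizes the jump between
    # two adjacent ordered levels
    D = np.diff(np.eye(p), axis=0)
    return np.linalg.solve(X.T @ X + lam * D.T @ D, X.T @ y)

# Toy example: one ordinal predictor with 5 levels, dummy-coded
rng = np.random.default_rng(0)
levels = rng.integers(0, 5, size=200)
X = np.eye(5)[levels]                             # one dummy column per level
true_beta = np.array([0.0, 0.5, 1.0, 1.4, 1.6])   # smooth in the level order
y = X @ true_beta + rng.normal(scale=0.3, size=200)

beta_hat = ordinal_difference_penalty_fit(X, y, lam=5.0)
```

    Compared with pure dummy coding (ordinary least squares on the indicators), the penalized estimate necessarily has smaller adjacent-level differences, which is exactly what exploiting the ordering buys.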

    Semiparametric GEE analysis in partially linear single-index models for longitudinal data

    In this article, we study a partially linear single-index model for longitudinal data under a general framework which includes both the sparse and dense longitudinal data cases. A semiparametric estimation method based on a combination of local linear smoothing and generalized estimating equations (GEE) is introduced to estimate the two parameter vectors as well as the unknown link function. Under some mild conditions, we derive the asymptotic properties of the proposed parametric and nonparametric estimators in different scenarios, from which we find that the convergence rates and asymptotic variances of the proposed estimators for sparse longitudinal data would be substantially different from those for dense longitudinal data. We also discuss the estimation of the covariance (or weight) matrices involved in the semiparametric GEE method. Furthermore, we provide some numerical studies including Monte Carlo simulation and an empirical application to illustrate our methodology and theory. Comment: Published at http://dx.doi.org/10.1214/15-AOS1320 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    The noise of cluster mass reconstructions from a source redshift distribution

    The parameter-free reconstruction of the surface-mass density of clusters of galaxies is one of the principal applications of weak gravitational lensing. From the observable ellipticities of images of background galaxies, the tidal gravitational field (shear) of the mass distribution is estimated, and the corresponding surface mass density is constructed. The noise of the resulting mass map is investigated here, generalizing previous work which included mainly the noise due to the intrinsic galaxy ellipticities. Whereas this dominates the noise budget if the lens is very weak, other sources of noise become important, or even dominant, in the medium-strong lensing regime close to the center of clusters. In particular, shot noise due to a Poisson distribution of galaxy images, and increased shot noise owing to the correlation of galaxies in angular position and redshift, can yield significantly larger levels of noise than that from the intrinsic ellipticities only. We estimate the contributions from these various effects for two widely used smoothing operations, showing that one of them effectively removes the Poisson and correlation noises related to the angular positions of galaxies. Noise sources due to the spread in redshift of galaxies are still present in the optimized estimator and are shown to be relevant in many cases. We show how (even approximate) redshift information can be profitably used to reduce the noise in the mass map. The dependence of the various noise terms on the relevant parameters (lens redshift, strength, smoothing length, redshift distribution of background galaxies) is explicitly calculated, and simple estimates are provided. Comment: 18 pages, A&A in press.

    Recursive Monte Carlo filters: Algorithms and theoretical analysis

    Recursive Monte Carlo filters, also called particle filters, are a powerful tool to perform computations in general state space models. We discuss and compare the accept-reject version with the more common sampling importance resampling version of the algorithm. In particular, we show how auxiliary variable methods and stratification can be used in the accept-reject version, and we compare different resampling techniques. In a second part, we show laws of large numbers and a central limit theorem for these Monte Carlo filters by simple induction arguments that need only weak conditions. We also show that, under stronger conditions, the required sample size is independent of the length of the observed series. Comment: Published at http://dx.doi.org/10.1214/009053605000000426 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
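    The sampling-importance-resampling (bootstrap) variant mentioned in the abstract can be sketched in a few lines. The toy state space model, particle count, and multinomial resampling below are illustrative choices, not the paper's setup; the paper also treats an accept-reject variant and stratified resampling, which are not shown here.

```python
import numpy as np

def bootstrap_particle_filter(y, n_particles=1000, phi=0.9,
                              sig_x=1.0, sig_y=0.5, seed=0):
    """Minimal SIR (bootstrap) particle filter for the toy model
    x_t = phi * x_{t-1} + N(0, sig_x^2),  y_t = x_t + N(0, sig_y^2).
    Returns the sequence of filtered posterior means."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sig_x, n_particles)   # initial particle cloud
    means = []
    for obs in y:
        x = phi * x + rng.normal(0.0, sig_x, n_particles)  # propagate
        logw = -0.5 * ((obs - x) / sig_y) ** 2             # importance weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * x))                        # filtered mean
        x = rng.choice(x, size=n_particles, p=w)           # multinomial resampling
    return np.array(means)

# Simulate a trajectory from the same model and filter the observations
rng = np.random.default_rng(1)
T, phi, sig_x, sig_y = 50, 0.9, 1.0, 0.5
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = phi * x_true[t - 1] + rng.normal(0, sig_x)
y = x_true + rng.normal(0, sig_y, T)
est = bootstrap_particle_filter(y)
```

    Resampling after every step, as here, is the simplest scheme; the comparison of resampling techniques in the paper concerns exactly this step.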

    Bayesian Synthesis: Combining subjective analyses, with an application to ozone data

    Bayesian model averaging enables one to combine the disparate predictions of a number of models in a coherent fashion, leading to superior predictive performance. The improvement in performance arises from averaging models that make different predictions. In this work, we tap into perhaps the biggest driver of different predictions, different analysts, in order to gain the full benefits of model averaging. In a standard implementation of our method, several data analysts work independently on portions of a data set, eliciting separate models which are eventually updated and combined through a specific weighting method. We call this modeling procedure Bayesian Synthesis. The methodology helps to alleviate concerns about the sizable gap between the foundational underpinnings of the Bayesian paradigm and the practice of Bayesian statistics. In experimental work we show that human modeling has predictive performance superior to that of many automatic modeling techniques, including AIC, BIC, Smoothing Splines, CART, Bagged CART, Bayes CART, BMA and LARS, and only slightly inferior to that of BART. We also show that Bayesian Synthesis further improves predictive performance. Additionally, we examine the predictive performance of a simple average across analysts, which we dub Convex Synthesis, and find that it also produces an improvement. Comment: Published at http://dx.doi.org/10.1214/10-AOAS444 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
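    The Convex Synthesis baseline, a convex combination of the analysts' predictions, is straightforward to write down. The function name and array layout below are hypothetical; with no weights supplied it reduces to the simple average across analysts that the abstract reports as already improving performance.

```python
import numpy as np

def convex_synthesis(predictions, weights=None):
    """Combine several analysts' predictions by a convex (weighted) average.
    `predictions` is an (n_analysts, n_points) array; `weights` must be
    nonnegative and sum to one. With weights=None, use the simple average."""
    predictions = np.asarray(predictions, dtype=float)
    if weights is None:
        weights = np.full(predictions.shape[0], 1.0 / predictions.shape[0])
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    return weights @ predictions

# Three analysts' predictions for four test points
preds = [[1.0, 2.0, 3.0, 4.0],
         [1.2, 1.8, 3.3, 4.1],
         [0.8, 2.2, 2.7, 3.9]]
combined = convex_synthesis(preds)   # simple average across analysts
```

    The benefit of averaging comes from the same mechanism as in model averaging: analysts who err in different directions partially cancel each other out.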

    Normal-Mixture-of-Inverse-Gamma Priors for Bayesian Regularization and Model Selection in Structured Additive Regression Models

    In regression models with many potential predictors, choosing an appropriate subset of covariates and their interactions at the same time as determining whether linear or more flexible functional forms are required is a challenging and important task. We propose a spike-and-slab prior structure in order to include or exclude single coefficients as well as blocks of coefficients associated with factor variables, random effects or basis expansions of smooth functions. Structured additive models with this prior structure are estimated with Markov chain Monte Carlo using a redundant multiplicative parameter expansion. We discuss shrinkage properties of the novel prior induced by the redundant parameterization, investigate its sensitivity to hyperparameter settings and compare performance of the proposed method in terms of model selection, sparsity recovery, and estimation error for Gaussian, binomial and Poisson responses on real and simulated data sets with that of component-wise boosting and other approaches.
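    The flavor of a normal-mixture-of-inverse-gamma spike-and-slab prior can be illustrated by drawing coefficients from it. All hyperparameter values below (inclusion probability, spike variance, inverse-gamma shape and scale) are illustrative assumptions, not the paper's settings, and the full MCMC machinery with the redundant parameter expansion is not shown.

```python
import numpy as np

def sample_spike_slab_prior(n, pi=0.5, v0=0.005, a=5.0, b=25.0, seed=0):
    """Draw n coefficients from a spike-and-slab prior: each beta_j is
    N(0, v_j * tau2_j) with v_j = v0 (spike: effectively excluded) or
    v_j = 1 (slab: included), and tau2_j ~ InverseGamma(a, b)."""
    rng = np.random.default_rng(seed)
    delta = rng.random(n) < pi                 # inclusion indicators
    tau2 = 1.0 / rng.gamma(a, 1.0 / b, n)      # inverse-gamma scale draws
    v = np.where(delta, 1.0, v0)               # spike or slab variance factor
    beta = rng.normal(0.0, np.sqrt(v * tau2))
    return beta, delta

beta, delta = sample_spike_slab_prior(10000)
```

    Coefficients drawn under the spike are concentrated near zero, while slab draws retain their full scale; posterior inference on the indicators is what performs the selection of single coefficients or whole blocks.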

    Ratings and rankings: Voodoo or Science?

    Composite indicators aggregate a set of variables using weights which are understood to reflect the variables' importance in the index. In this paper we propose to measure the importance of a given variable within existing composite indicators via Karl Pearson's 'correlation ratio'; we call this measure 'main effect'. Because socio-economic variables are heteroskedastic and correlated, (relative) nominal weights are hardly ever found to match (relative) main effects; we propose to summarize their discrepancy with a divergence measure. We further discuss to what extent the mapping from nominal weights to main effects can be inverted. This analysis is applied to five composite indicators, including the Human Development Index and two popular league tables of university performance. It is found that in many cases the declared importance of single indicators and their main effect are very different, and that the data correlation structure often prevents developers from obtaining the stated importance, even when modifying the nominal weights in the set of nonnegative numbers with unit sum. Comment: 28 pages, 7 figures.
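    A crude binned estimate of the correlation ratio makes the weights-versus-importance gap tangible. The estimator, bin count, and synthetic data below are illustrative assumptions, not the paper's procedure: two variables enter a toy index with equal nominal weights, yet the heteroskedastic, correlated input dominates the main effect.

```python
import numpy as np

def main_effect(x, y, n_bins=20):
    """Estimate Pearson's correlation ratio eta^2 = Var(E[y|x]) / Var(y)
    (the 'main effect' of x on the index y) by binning x at quantiles
    and averaging y within bins. A crude sketch."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
    cond_means = np.array([y[idx == k].mean() for k in range(n_bins)])
    counts = np.array([(idx == k).sum() for k in range(n_bins)])
    # between-bin variance of the conditional means, weighted by bin size
    var_cond = np.average((cond_means - y.mean()) ** 2, weights=counts)
    return var_cond / y.var()

# Equal nominal weights 0.5/0.5, but x2 is noisier and correlated with x1
rng = np.random.default_rng(0)
x1 = rng.normal(0, 1.0, 50000)
x2 = 0.6 * x1 + rng.normal(0, 3.0, 50000)
index = 0.5 * x1 + 0.5 * x2
s1, s2 = main_effect(x1, index), main_effect(x2, index)   # s2 far exceeds s1
```

    Despite identical nominal weights, the main effect of the high-variance, correlated variable is several times larger, which is the paper's central point about heteroskedastic, correlated inputs.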