Penalized Regression with Ordinal Predictors
Ordered categorical predictors are a common case in regression modeling. In contrast to the case of ordinal response variables, ordinal predictors have been largely neglected in the literature. In this article penalized regression techniques are proposed. Based on dummy coding, two types of penalization are explicitly developed: the first imposes a difference penalty, the second is a ridge-type refitting procedure. A Bayesian motivation as well as alternative ways of derivation are provided. Simulation studies and real-world data serve for illustration and to compare the approach to methods often seen in practice, namely linear regression on the group labels and pure dummy coding. The proposed regression techniques turn out to be highly competitive. On the basis of GLMs the concept is generalized to the case of non-normal outcomes by performing penalized likelihood estimation. The paper is a preprint of an article published in the International Statistical Review; please use the journal version for citation.
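As an illustration of the first penalty type, here is a minimal sketch of penalized least squares with a first-order difference penalty on the dummy coefficients of an ordered predictor. The function name, toy data, and penalty weight are illustrative, not taken from the paper:

```python
import numpy as np

def difference_penalty_fit(X, y, lam):
    """Penalized least squares with a first-order difference penalty
    on the dummy coefficients of an ordinal predictor.

    X   : (n, k) dummy matrix for the k ordered categories
    y   : (n,) response
    lam : weight on the squared differences of adjacent coefficients
    """
    n, k = X.shape
    # First-difference matrix: row j of D @ beta is beta_{j+1} - beta_j
    D = np.diff(np.eye(k), axis=0)
    return np.linalg.solve(X.T @ X + lam * D.T @ D, X.T @ y)

# Toy example: a roughly linear trend over 5 ordered categories
rng = np.random.default_rng(0)
levels = rng.integers(0, 5, size=200)
X = np.eye(5)[levels]                        # pure dummy coding
y = 0.5 * levels + rng.normal(0, 0.3, 200)
beta_hat = difference_penalty_fit(X, y, lam=10.0)
```

As `lam` grows, adjacent coefficients are pulled together, so the fit is shrunk toward a constant (no-effect) solution while still respecting the category ordering.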
Semiparametric GEE analysis in partially linear single-index models for longitudinal data
In this article, we study a partially linear single-index model for
longitudinal data under a general framework which includes both the sparse and
dense longitudinal data cases. A semiparametric estimation method based on a
combination of the local linear smoothing and generalized estimation equations
(GEE) is introduced to estimate the two parameter vectors as well as the
unknown link function. Under some mild conditions, we derive the asymptotic
properties of the proposed parametric and nonparametric estimators in different
scenarios, from which we find that the convergence rates and asymptotic
variances of the proposed estimators for sparse longitudinal data would be
substantially different from those for dense longitudinal data. We also discuss
the estimation of the covariance (or weight) matrices involved in the
semiparametric GEE method. Furthermore, we provide some numerical studies
including Monte Carlo simulation and an empirical application to illustrate our
methodology and theory.
Comment: Published at http://dx.doi.org/10.1214/15-AOS1320 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
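The GEE component can be illustrated in isolation. The sketch below assumes Gaussian responses, a purely linear predictor, and an exchangeable working correlation; the paper's single-index structure and local linear smoothing are omitted. It solves the estimating equation sum_i X_i' V_i^{-1} (y_i - X_i beta) = 0 with moment estimates of the working covariance:

```python
import numpy as np

def gee_gaussian(X_list, y_list, n_iter=20):
    """Minimal Gaussian GEE with an exchangeable working correlation.

    X_list[i] : (m_i, p) design matrix for subject i
    y_list[i] : (m_i,) repeated measurements for subject i
    """
    p = X_list[0].shape[1]
    beta, rho = np.zeros(p), 0.0
    for _ in range(n_iter):
        # Moment estimates of the residual variance and the
        # within-subject correlation from the current residuals
        num, den, sig2, n_res = 0.0, 0.0, 0.0, 0
        for X, y in zip(X_list, y_list):
            r = y - X @ beta
            sig2 += (r ** 2).sum(); n_res += len(r)
            for j in range(len(r)):
                for k in range(j + 1, len(r)):
                    num += r[j] * r[k]; den += 1
        sig2 /= n_res
        rho = (num / den) / sig2 if den > 0 else 0.0
        # Solve sum_i X_i' V_i^{-1} (y_i - X_i beta) = 0
        A, b = np.zeros((p, p)), np.zeros(p)
        for X, y in zip(X_list, y_list):
            m = len(y)
            V = sig2 * ((1 - rho) * np.eye(m) + rho * np.ones((m, m)))
            Vi = np.linalg.inv(V)
            A += X.T @ Vi @ X; b += X.T @ Vi @ y
        beta = np.linalg.solve(A, b)
    return beta, rho
```

With the identity working correlation (rho fixed at zero) this reduces to pooled least squares; the exchangeable structure is one common choice of the covariance (or weight) matrices discussed in the abstract.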
The noise of cluster mass reconstructions from a source redshift distribution
The parameter-free reconstruction of the surface-mass density of clusters of
galaxies is one of the principal applications of weak gravitational lensing.
From the observable ellipticities of images of background galaxies, the tidal
gravitational field (shear) of the mass distribution is estimated, and the
corresponding surface mass density is constructed. The noise of the resulting
mass map is investigated here, generalizing previous work which included mainly
the noise due to the intrinsic galaxy ellipticities. Whereas this dominates the
noise budget if the lens is very weak, other sources of noise become important,
or even dominant, for the medium-strong lensing regime close to the center of
clusters. In particular, shot noise due to a Poisson distribution of galaxy
images, and increased shot noise owing to the correlation of galaxies in
angular position and redshift, can yield significantly larger levels of noise
than that from the intrinsic ellipticities only. We estimate the contributions
from these various effects for two widely used smoothing operations, showing
that one of them effectively removes the Poisson and the correlation noises
related to angular positions of galaxies. Noise sources due to the spread in
redshift of galaxies are still present in the optimized estimator and are shown
to be relevant in many cases. We show how (even approximate) redshift
information can be profitably used to reduce the noise in the mass map. The
dependence of the various noise terms on the relevant parameters (lens
redshift, strength, smoothing length, redshift distribution of background
galaxies) is explicitly calculated, and simple estimates are provided.
Comment: 18 pages, A&A in press.
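As a toy illustration of the intrinsic-ellipticity contribution, the following Monte Carlo checks that shape noise in an aperture-averaged shear estimate scales as sigma_eps / sqrt(N). All numbers are assumed for illustration, not taken from the paper, and the other noise sources it analyzes (Poisson shot noise, galaxy correlations, redshift spread) are not modeled here:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_eps = 0.3   # per-component intrinsic ellipticity dispersion (assumed)
n_gal = 50        # galaxies inside the smoothing aperture (assumed)
n_maps = 20000    # Monte Carlo realizations

# With uniform weights, the shear estimate at a point is the mean
# ellipticity of the galaxies in the aperture, so intrinsic shape
# noise alone gives a scatter of sigma_eps / sqrt(n_gal).
eps = rng.normal(0.0, sigma_eps, size=(n_maps, n_gal))
shear_hat = eps.mean(axis=1)
empirical = shear_hat.std()
analytic = sigma_eps / np.sqrt(n_gal)
print(empirical, analytic)
```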
Recursive Monte Carlo filters: Algorithms and theoretical analysis
Recursive Monte Carlo filters, also called particle filters, are a powerful
tool to perform computations in general state space models. We discuss and
compare the accept--reject version with the more common sampling importance
resampling version of the algorithm. In particular, we show how auxiliary
variable methods and stratification can be used in the accept--reject version,
and we compare different resampling techniques. In a second part, we show laws
of large numbers and a central limit theorem for these Monte Carlo filters by
simple induction arguments that need only weak conditions. We also show that,
under stronger conditions, the required sample size is independent of the
length of the observed series.
Comment: Published at http://dx.doi.org/10.1214/009053605000000426 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
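A minimal bootstrap (sampling importance resampling) filter for a linear Gaussian AR(1) state space model can be sketched as follows. The model, parameter values, and multinomial resampling scheme are illustrative; the paper compares this variant with accept-reject and stratified alternatives:

```python
import numpy as np

def particle_filter(y, n_part=2000, phi=0.9, sig_x=1.0, sig_y=1.0, seed=0):
    """Bootstrap particle filter for the state space model
        x_t = phi * x_{t-1} + sig_x * v_t,   y_t = x_t + sig_y * w_t,
    with standard normal v_t, w_t. Returns filtered means E[x_t | y_1..t]."""
    rng = np.random.default_rng(seed)
    # Initialize from the stationary distribution of the AR(1) state
    x = rng.normal(0.0, sig_x / np.sqrt(1 - phi ** 2), n_part)
    means = []
    for yt in y:
        # Propagate particles through the state equation
        x = phi * x + sig_x * rng.normal(size=n_part)
        # Importance weights from the Gaussian observation density
        logw = -0.5 * ((yt - x) / sig_y) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(w @ x)
        # Multinomial resampling (stratified schemes reduce variance)
        x = rng.choice(x, size=n_part, p=w, replace=True)
    return np.array(means)
```

Because this model is linear Gaussian, the filtered means can be checked against the exact Kalman filter, which is a convenient way to see the Monte Carlo error shrink as the number of particles grows.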
Bayesian Synthesis: Combining subjective analyses, with an application to ozone data
Bayesian model averaging enables one to combine the disparate predictions of
a number of models in a coherent fashion, leading to superior predictive
performance. The improvement in performance arises from averaging models that
make different predictions. In this work, we tap into perhaps the biggest
driver of different predictions---different analysts---in order to gain the
full benefits of model averaging. In a standard implementation of our method,
several data analysts work independently on portions of a data set, eliciting
separate models which are eventually updated and combined through a specific
weighting method. We call this modeling procedure Bayesian Synthesis. The
methodology helps to alleviate concerns about the sizable gap between the
foundational underpinnings of the Bayesian paradigm and the practice of
Bayesian statistics. In experimental work we show that human modeling has
predictive performance superior to that of many automatic modeling techniques,
including AIC, BIC, Smoothing Splines, CART, Bagged CART, Bayes CART, BMA and
LARS, and only slightly inferior to that of BART. We also show that Bayesian
Synthesis further improves predictive performance. Additionally, we examine the
predictive performance of a simple average across analysts, which we dub Convex
Synthesis, and find that it also produces an improvement.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS444 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
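The two combination rules can be sketched abstractly. In the snippet below, the function names, the Gaussian predictive-likelihood weighting, and the toy data are illustrative stand-ins, not the paper's actual weighting method: Convex Synthesis is a plain average of the analysts' predictions, while the weighted variant scores each analyst on held-out data:

```python
import numpy as np

def convex_synthesis(preds):
    """Simple average across analysts ('Convex Synthesis').
    preds : (n_analysts, n_points) matrix of predictions."""
    return np.mean(preds, axis=0)

def weighted_synthesis(preds, y_holdout, preds_holdout, sigma=1.0):
    """Weight each analyst by Gaussian predictive likelihood on held-out
    data, a stand-in for the paper's specific weighting scheme."""
    logl = -0.5 * np.sum((preds_holdout - y_holdout) ** 2, axis=1) / sigma ** 2
    w = np.exp(logl - logl.max())
    w /= w.sum()
    return w @ preds

# Toy data: three analysts, the third far off target
preds = np.array([[1.0, 2.0, 3.0],
                  [1.1, 2.1, 3.1],
                  [5.0, 5.0, 5.0]])
y_hold = np.array([1.0, 2.0, 3.0])
avg = convex_synthesis(preds)
wavg = weighted_synthesis(preds, y_hold, preds)
```

Likelihood weighting downweights the poor analyst almost entirely, while the plain average is pulled toward it, which is the basic trade-off between the two syntheses.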
Normal-Mixture-of-Inverse-Gamma Priors for Bayesian Regularization and Model Selection in Structured Additive Regression Models
In regression models with many potential predictors, choosing an appropriate subset of covariates and their interactions at the same time as determining whether linear or more flexible functional forms are required is a challenging and important task. We propose a spike-and-slab prior structure in order to include or exclude single coefficients as well as blocks of coefficients associated
with factor variables, random effects or basis expansions
of smooth functions. Structured additive models with this prior structure are estimated with Markov chain Monte Carlo using a redundant multiplicative parameter expansion. We discuss shrinkage properties of the novel prior induced by the redundant parameterization, investigate its sensitivity to hyperparameter settings, and compare the performance of the proposed method in terms of model selection, sparsity recovery, and estimation error for Gaussian, binomial and Poisson responses on real and simulated data sets with that of component-wise boosting and other approaches.
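A draw from a normal-mixture-of-inverse-gamma spike-and-slab prior can be sketched as follows. The hyperparameter values are illustrative only, and the redundant multiplicative parameter expansion used for MCMC in the paper is not shown:

```python
import numpy as np

def nmig_prior_draws(n, w=0.5, v0=0.005, a=5.0, b=25.0, seed=0):
    """Draws from an NMIG spike-and-slab prior: an indicator chooses the
    slab (with probability w) or the spike (variance scaled by small v0),
    the scale psi^2 is inverse-gamma, and beta | scale is normal."""
    rng = np.random.default_rng(seed)
    delta = np.where(rng.random(n) < w, 1.0, v0)   # slab w.p. w, else spike
    psi2 = 1.0 / rng.gamma(a, 1.0 / b, n)          # psi^2 ~ InvGamma(a, b)
    beta = rng.normal(0.0, np.sqrt(delta * psi2))
    return beta, delta

beta, delta = nmig_prior_draws(20000)
```

Coefficients assigned to the spike are concentrated near zero (effective exclusion), while slab coefficients keep the full inverse-gamma scale; in the paper the same indicator is shared across blocks of coefficients to include or exclude whole factors or smooth terms.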
Ratings and rankings: Voodoo or Science?
Composite indicators aggregate a set of variables using weights which are
understood to reflect the variables' importance in the index. In this paper we
propose to measure the importance of a given variable within existing composite
indicators via Karl Pearson's `correlation ratio'; we call this measure `main
effect'. Because socio-economic variables are heteroskedastic and correlated,
(relative) nominal weights are hardly ever found to match (relative) main
effects; we propose to summarize their discrepancy with a divergence measure.
We further discuss to what extent the mapping from nominal weights to main
effects can be inverted. This analysis is applied to five composite indicators,
including the Human Development Index and two popular league tables of
university performance. It is found that in many cases the declared importance
of single indicators and their main effect are very different, and that the
data correlation structure often prevents developers from obtaining the stated
importance, even when modifying the nominal weights in the set of nonnegative
numbers with unit sum.
Comment: 28 pages, 7 figures.
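The main effect of a single variable can be estimated by binning, as in the sketch below. The variable names and the toy correlation structure are assumed for illustration; it shows how equal nominal weights can coexist with very unequal main effects when the variables are heteroskedastic and correlated:

```python
import numpy as np

def main_effect(x, y, n_bins=20):
    """Pearson's correlation ratio eta^2 = Var(E[y|x]) / Var(y),
    estimated by quantile-binning x: the 'main effect' of one
    variable within a composite indicator y."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    overall = y.mean()
    var_cond = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            var_cond += mask.mean() * (y[mask].mean() - overall) ** 2
    return var_cond / y.var()

# Two correlated variables with equal nominal weights 0.5 / 0.5
rng = np.random.default_rng(3)
x1 = rng.normal(0, 1, 50000)
x2 = 0.6 * x1 + rng.normal(0, 2, 50000)   # correlated, larger variance
index = 0.5 * x1 + 0.5 * x2
print(main_effect(x1, index), main_effect(x2, index))
```

Despite the equal declared weights, the higher-variance variable dominates the index, which is the discrepancy between nominal weights and main effects that the paper measures with a divergence.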