618 research outputs found
Bayesian Regularization for Graphical Models with Unequal Shrinkage
We consider a Bayesian framework for estimating a high-dimensional sparse
precision matrix, in which adaptive shrinkage and sparsity are induced by a
mixture of Laplace priors. Besides discussing our formulation from the Bayesian
standpoint, we investigate the MAP (maximum a posteriori) estimator from a
penalized likelihood perspective that gives rise to a new non-convex penalty
approximating the penalty. Optimal error rates for estimation
consistency in terms of various matrix norms along with selection consistency
for sparse structure recovery are shown for the unique MAP estimator under mild
conditions. For fast and efficient computation, an EM algorithm is proposed to
compute the MAP estimator of the precision matrix and (approximate) posterior
probabilities on the edges of the underlying sparse structure. Through
extensive simulation studies and a real application to a call center data, we
have demonstrated the fine performance of our method compared with existing
alternatives.Comment: To appear in Journal of the American Statistical Association (Theory
& Methods
A Penalty Approach to Differential Item Functioning in Rasch Models
A new diagnostic tool for the identification of differential item functioning (DIF) is proposed. Classical approaches to DIF allow to consider only few subpopulations like ethnic groups when investigating if the solution of items depends on the membership to a subpopulation. We propose an explicit model for differential item functioning that includes a set of variables, containing metric as well as categorical components, as potential candidates for inducing DIF. The ability to include a set of covariates entails that the model contains a large number of parameters. Regularized estimators, in particular penalized maximum likelihood estimators, are used
to solve the estimation problem and to identify the items that induce DIF. It is shown that the method is able to detect items with DIF. Simulations and two applications demonstrate the applicability of the method
Simultaneous Variable and Covariance Selection with the Multivariate Spike-and-Slab Lasso
We propose a Bayesian procedure for simultaneous variable and covariance
selection using continuous spike-and-slab priors in multivariate linear
regression models where q possibly correlated responses are regressed onto p
predictors. Rather than relying on a stochastic search through the
high-dimensional model space, we develop an ECM algorithm similar to the EMVS
procedure of Rockova & George (2014) targeting modal estimates of the matrix of
regression coefficients and residual precision matrix. Varying the scale of the
continuous spike densities facilitates dynamic posterior exploration and allows
us to filter out negligible regression coefficients and partial covariances
gradually. Our method is seen to substantially outperform regularization
competitors on simulated data. We demonstrate our method with a re-examination
of data from a recent observational study of the effect of playing high school
football on several later-life cognition, psychological, and socio-economic
outcomes
BAMarrayā¢: Java software for Bayesian analysis of variance for microarray data
BACKGROUND: DNA microarrays open up a new horizon for studying the genetic determinants of disease. The high throughput nature of these arrays creates an enormous wealth of information, but also poses a challenge to data analysis. Inferential problems become even more pronounced as experimental designs used to collect data become more complex. An important example is multigroup data collected over different experimental groups, such as data collected from distinct stages of a disease process. We have developed a method specifically addressing these issues termed Bayesian ANOVA for microarrays (BAM). The BAM approach uses a special inferential regularization known as spike-and-slab shrinkage that provides an optimal balance between total false detections and total false non-detections. This translates into more reproducible differential calls. Spike and slab shrinkage is a form of regularization achieved by using information across all genes and groups simultaneously. RESULTS: BAMarrayā¢ is a graphically oriented Java-based software package that implements the BAM method for detecting differentially expressing genes in multigroup microarray experiments (up to 256 experimental groups can be analyzed). Drop-down menus allow the user to easily select between different models and to choose various run options. BAMarrayā¢ can also be operated in a fully automated mode with preselected run options. Tuning parameters have been preset at theoretically optimal values freeing the user from such specifications. BAMarrayā¢ provides estimates for gene differential effects and automatically estimates data adaptive, optimal cutoff values for classifying genes into biological patterns of differential activity across experimental groups. A graphical suite is a core feature of the product and includes diagnostic plots for assessing model assumptions and interactive plots that enable tracking of prespecified gene lists to study such things as biological pathway perturbations. The user can zoom in and lasso genes of interest that can then be saved for downstream analyses. CONCLUSION: BAMarrayā¢ is user friendly platform independent software that effectively and efficiently implements the BAM methodology. Classifying patterns of differential activity is greatly facilitated by a data adaptive cutoff rule and a graphical suite. BAMarrayā¢ is licensed software freely available to academic institutions. More information can be found at
Lecture notes on ridge regression
The linear regression model cannot be fitted to high-dimensional data, as the
high-dimensionality brings about empirical non-identifiability. Penalized
regression overcomes this non-identifiability by augmentation of the loss
function by a penalty (i.e. a function of regression coefficients). The ridge
penalty is the sum of squared regression coefficients, giving rise to ridge
regression. Here many aspect of ridge regression are reviewed e.g. moments,
mean squared error, its equivalence to constrained estimation, and its relation
to Bayesian regression. Finally, its behaviour and use are illustrated in
simulation and on omics data. Subsequently, ridge regression is generalized to
allow for a more general penalty. The ridge penalization framework is then
translated to logistic regression and its properties are shown to carry over.
To contrast ridge penalized estimation, the final chapter introduces its lasso
counterpart
Functional Regression
Functional data analysis (FDA) involves the analysis of data whose ideal
units of observation are functions defined on some continuous domain, and the
observed data consist of a sample of functions taken from some population,
sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the
development of this field, which has accelerated in the past 10 years to become
one of the fastest growing areas of statistics, fueled by the growing number of
applications yielding this type of data. One unique characteristic of FDA is
the need to combine information both across and within functions, which Ramsay
and Silverman called replication and regularization, respectively. This article
will focus on functional regression, the area of FDA that has received the most
attention in applications and methodological development. First will be an
introduction to basis functions, key building blocks for regularization in
functional regression methods, followed by an overview of functional regression
methods, split into three types: [1] functional predictor regression
(scalar-on-function), [2] functional response regression (function-on-scalar)
and [3] function-on-function regression. For each, the role of replication and
regularization will be discussed and the methodological development described
in a roughly chronological manner, at times deviating from the historical
timeline to group together similar methods. The primary focus is on modeling
and methodology, highlighting the modeling structures that have been developed
and the various regularization approaches employed. At the end is a brief
discussion describing potential areas of future development in this field
- ā¦