196 research outputs found

Orthogonalized smoothing for rescaled spike and slab models

Author: Ishwaran Hemant
Papana Ariadni
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2008

Rescaled spike and slab models are a new Bayesian variable selection method for linear regression models. In high dimensional orthogonal settings such models have been shown to possess optimal model selection properties. We review background theory and discuss applications of rescaled spike and slab models to prediction problems involving orthogonal polynomials. We first consider global smoothing and discuss potential weaknesses. Some of these deficiencies are remedied by using local regression. The local regression approach relies on an intimate connection between local weighted regression and weighted generalized ridge regression. An important implication is that one can trace the effective degrees of freedom of a curve as a way to visualize and classify curvature. Several motivating examples are presented. Comment: Published at http://dx.doi.org/10.1214/074921708000000192 in the IMS Collections (http://www.imstat.org/publications/imscollections.htm) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Crossref

University of Miami: Scholarship Miami

Spike and slab variable selection: Frequentist and Bayesian strategies

Author: Ishwaran Hemant
Rao J. Sunil
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2005

Variable selection in the linear regression model takes many apparent faces from both frequentist and Bayesian standpoints. In this paper we introduce a variable selection method referred to as a rescaled spike and slab model. We study the importance of prior hierarchical specifications and draw connections to frequentist generalized ridge regression estimation. Specifically, we study the usefulness of continuous bimodal priors to model hypervariance parameters, and the effect scaling has on the posterior mean through its relationship to penalization. Several model selection strategies, some frequentist and some Bayesian in nature, are developed and studied theoretically. We demonstrate the importance of selective shrinkage for effective variable selection in terms of risk misclassification, and show this is achieved using the posterior from a rescaled spike and slab model. We also show how to verify a procedure's ability to reduce model uncertainty in finite samples using a specialized forward selection strategy. Using this tool, we illustrate the effectiveness of rescaled spike and slab models in reducing model uncertainty. Comment: Published at http://dx.doi.org/10.1214/009053604000001147 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

University of Miami: Scholarship Miami

Variable importance in binary regression trees and forests

Author: Ishwaran Hemant
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 15/11/2007

We characterize and study variable importance (VIMP) and pairwise variable associations in binary regression trees. A key component involves the node mean squared error for a quantity we refer to as a maximal subtree. The theory naturally extends from single trees to ensembles of trees and applies to methods like random forests. This is useful because, while importance values from random forests are used to screen variables (for example, to filter high throughput genomic data in bioinformatics), very little theory exists about their properties. Comment: Published at http://dx.doi.org/10.1214/07-EJS039 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

University of Miami: Scholarship Miami

Characterizing $L_2$ Boosting

Author: Ehrlinger John
Ishwaran Hemant
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/04/2012

We consider $L_2$ Boosting, a special case of Friedman's generic boosting algorithm applied to linear regression under $L_2$-loss. We study $L_2$ Boosting for an arbitrary regularization parameter and derive an exact closed form expression for the number of steps taken along a fixed coordinate direction. This relationship is used to describe $L_2$ Boosting's solution path, to describe new tools for studying its path, and to characterize some of the algorithm's unique properties, including active set cycling, a property where the algorithm spends lengthy periods of time cycling between the same coordinates when the regularization parameter is arbitrarily small. Our fixed descent analysis also reveals a repressible condition that limits the effectiveness of $L_2$ Boosting in correlated problems by preventing desirable variables from entering the solution path. As a simple remedy, a data augmentation method similar to that used for the elastic net is used to introduce $L_2$-penalization and is shown, in combination with decorrelation, to reverse the repressible condition and circumvent $L_2$ Boosting's deficiencies in correlated problems. In itself, this presents a new explanation for why the elastic net is successful in correlated problems and why methods like LAR and lasso can perform poorly in such settings. Comment: Published at http://dx.doi.org/10.1214/12-AOS997 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Crossref

University of Miami: Scholarship Miami

Tree Variable Selection for Paired Case-Control Studies with Application to Microbiome Data

Author: Hemant Ishwaran
Min Lu
Publication venue: 'Modern Language Association'
Publication date: 01/01/2021

When case-control studies involve paired samples, tree analyses based on traditional splitting rules are suboptimal as they ignore the paired nature of the data. Paired samples occur in microbiome studies when they are collected from different locations of the same individual or when they are collected from paired individuals with familial ties. Borrowing concepts from tree splitting, we propose a novel approach that accommodates the paired structure in the data for fast and effective nonparametric variable ranking. Importantly, this method allows detangling of different types of associations at play with structured correlated outcomes such as host genotype and environmental exposure effects. Variable importance measures are another technique for variable selection. We describe two types of measures useful for paired data analysis. The methodology is illustrated on the microbiota of paired samples from a case-control study of obesity.

University of Miami: Scholarship Miami

Humanities Commons

BAMarray™: Java software for Bayesian analysis of variance for microarray data

Author: Ishwaran Hemant
Kogalur Udaya B
Rao J Sunil
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: DNA microarrays open up a new horizon for studying the genetic determinants of disease. The high throughput nature of these arrays creates an enormous wealth of information, but also poses a challenge to data analysis. Inferential problems become even more pronounced as experimental designs used to collect data become more complex. An important example is multigroup data collected over different experimental groups, such as data collected from distinct stages of a disease process. We have developed a method specifically addressing these issues termed Bayesian ANOVA for microarrays (BAM). The BAM approach uses a special inferential regularization known as spike-and-slab shrinkage that provides an optimal balance between total false detections and total false non-detections. This translates into more reproducible differential calls. Spike and slab shrinkage is a form of regularization achieved by using information across all genes and groups simultaneously. RESULTS: BAMarray™ is a graphically oriented Java-based software package that implements the BAM method for detecting differentially expressing genes in multigroup microarray experiments (up to 256 experimental groups can be analyzed). Drop-down menus allow the user to easily select between different models and to choose various run options. BAMarray™ can also be operated in a fully automated mode with preselected run options. Tuning parameters have been preset at theoretically optimal values, freeing the user from such specifications. BAMarray™ provides estimates for gene differential effects and automatically estimates data adaptive, optimal cutoff values for classifying genes into biological patterns of differential activity across experimental groups. A graphical suite is a core feature of the product and includes diagnostic plots for assessing model assumptions and interactive plots that enable tracking of prespecified gene lists to study such things as biological pathway perturbations. The user can zoom in and lasso genes of interest that can then be saved for downstream analyses. CONCLUSION: BAMarray™ is user-friendly, platform-independent software that effectively and efficiently implements the BAM methodology. Classifying patterns of differential activity is greatly facilitated by a data adaptive cutoff rule and a graphical suite. BAMarray™ is licensed software freely available to academic institutions. More information can be found at

Springer - Publisher Connector

Columbia University Academic Commons

Directory of Open Access Journals

PubMed Central

University of Miami: Scholarship Miami

Random survival forests

Author: Blackstone Eugene H.
Ishwaran Hemant
Kogalur Udaya B.
Lauer Michael S.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2008

We introduce random survival forests, a random forests method for the analysis of right-censored survival data. New survival splitting rules for growing survival trees are introduced, as is a new missing data algorithm for imputing missing data. A conservation-of-events principle for survival forests is introduced and used to define ensemble mortality, a simple interpretable measure of mortality that can be used as a predicted outcome. Several illustrative examples are given, including a case study of the prognostic implications of body mass for individuals with coronary artery disease. Computations for all examples were implemented using the freely available R-software package, randomSurvivalForest. Comment: Published at http://dx.doi.org/10.1214/08-AOAS169 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Miami: Scholarship Miami

Satisfaction with web-based training in an integrated healthcare delivery network: do age, education, computer skills and attitudes matter?

Author: Andrew J Fishleder
Anil K Jain
AP Choules
Ashish Atreja
BP Kerfoot
C Urquhart
CM Harris
CP Friedman
DA Cook
DL Kirkpatrick
E Knebel
EA Nelson
G Singh
GA Debourgh
GS Letterie
Hemant Ishwaran
HS Chumley-Jones
J Davis
J Morrissey
JA Pereira
JC Anderson
JG Ruiz
JP Naidr
L Atack
L Breiman
L Howatson-Jones
LO Gostin
M Avital
M Hollander
MG Moore
Michel Avital
MJ Lewis
N Mehta
Neil B Mehta
PA Cohen
R Blair
R Ihaka
R Phipps
RA Kanten-McCoy
SG Lesh
TL Russell
TM Bishop
VR Curran
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Healthcare institutions spend enormous time and effort to train their workforce. Web-based training can potentially streamline this process. However, the deployment of web-based training in a large-scale setting with a diverse healthcare workforce has not been evaluated. The aim of this study was to evaluate the satisfaction of healthcare professionals with web-based training and to determine the predictors of such satisfaction including age, education status and computer proficiency. Methods: Observational, cross-sectional survey of healthcare professionals from six hospital systems in an integrated delivery network. We measured overall satisfaction with web-based training and response to survey items measuring Website Usability, Course Usefulness, Instructional Design Effectiveness, Computer Proficiency and Self-learning Attitude. Results: A total of 17,891 healthcare professionals completed the web-based training on HIPAA Privacy Rule; and of these, 13,537 completed the survey (response rate 75.6%). Overall course satisfaction was good (median, 4; scale, 1 to 5) with more than 75% of the respondents satisfied with the training (rating 4 or 5) and 65% preferring web-based training over traditional instructor-led training (rating 4 or 5). Multivariable ordinal regression revealed 3 key predictors of satisfaction with web-based training: Instructional Design Effectiveness, Website Usability and Course Usefulness. Demographic predictors such as gender, age and education did not have an effect on satisfaction. Conclusion: The study shows that web-based training, when tailored to learners' background, is perceived as a satisfactory mode of learning by an interdisciplinary group of healthcare professionals, irrespective of age, education level or prior computer experience. Future studies should aim to measure the long-term outcomes of web-based training.

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Miami: Scholarship Miami

UvA-DARE

International Migration, Integration and Social Cohesion online publications