GSplit LBI: Taming the Procedural Bias in Neuroimaging for Disease Prediction
In voxel-based neuroimage analysis, lesion features have been the main focus
in disease prediction because of their interpretability with respect to the
related diseases. However, we observe that another type of feature is
introduced during the preprocessing steps, which we call "Procedural Bias".
Moreover, such bias can be leveraged to improve classification accuracy.
Nevertheless, most existing models either under-fit by ignoring the procedural
bias or lose interpretability by failing to differentiate it from lesion
features. In this paper, a novel dual-task algorithm, GSplit LBI, is proposed
to resolve this problem. By introducing an augmented variable enforced to have
structural sparsity via a variable-splitting term, the estimators for
prediction and for selecting lesion features can be optimized separately while
mutually monitoring each other in an iterative scheme. Experiments are
conducted on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.
The advantage of the proposed model is verified by the improved stability of
the selected lesion features and by better classification results.
Comment: conditionally accepted by MICCAI, 201
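The variable-splitting idea can be sketched for a simple least-squares surrogate: a dense predictive estimator beta is coupled to a sparse selector gamma through a splitting term, and gamma is driven by a Linearized Bregman Iteration. This is a minimal sketch, not the paper's algorithm; the hyperparameters (nu, alpha, kappa) and the plain element-wise soft-threshold are illustrative assumptions, whereas the paper enforces structural sparsity over voxels.

```python
import numpy as np

def split_lbi(X, y, nu=1.0, alpha=0.01, kappa=10.0, n_iter=2000):
    """Split Linearized Bregman Iteration, least-squares sketch.

    beta is a dense predictive estimator; gamma is a sparse selector,
    coupled to beta through the splitting term ||beta - gamma||^2 / (2*nu).
    """
    n, p = X.shape
    beta, gamma, z = np.zeros(p), np.zeros(p), np.zeros(p)
    for _ in range(n_iter):
        # gradient of the coupled least-squares loss w.r.t. beta
        grad_beta = X.T @ (X @ beta - y) / n + (beta - gamma) / nu
        beta = beta - kappa * alpha * grad_beta
        # Bregman dual update for the sparse selector
        z = z - alpha * (gamma - beta) / nu
        gamma = kappa * np.sign(z) * np.maximum(np.abs(z) - 1.0, 0.0)
    return beta, gamma
```

Because the two estimators are updated by separate rules, beta can retain dense (bias-exploiting) components while gamma stays sparse and interpretable.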
Detecting multivariate interactions in spatial point patterns with Gibbs models and variable selection
We propose a method for detecting significant interactions in very large
multivariate spatial point patterns. This methodology develops high dimensional
data understanding in the point process setting. The method is based on
modelling the patterns using a flexible Gibbs point process model to directly
characterise point-to-point interactions at different spatial scales. By using
the Gibbs framework, significant interactions can be captured even at small
scales. Subsequently, the Gibbs point process is fitted using a
pseudo-likelihood approximation, and we select significant interactions
automatically using the group lasso penalty with this likelihood approximation.
Thus we estimate the multivariate interactions stably even in this setting. We
demonstrate the feasibility of the method with a simulation study and show its
power by applying it to a large and complex rainforest plant population data
set of 83 species.
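The selection step, a group lasso penalty applied to a likelihood approximation, can be sketched generically with proximal gradient descent. Here a plain logistic loss stands in for the Gibbs pseudo-likelihood, and the grouping of interaction coefficients (one block per species pair or spatial scale) is an assumption for illustration.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2 (block soft-thresholding)."""
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= t else (1.0 - t / norm) * v

def group_lasso_logistic(X, y, groups, lam=0.1, step=0.1, n_iter=500):
    """Proximal gradient descent for logistic loss + group lasso.

    groups: list of index arrays; each block of coefficients is
    selected or discarded as a whole, mimicking how an entire
    interaction function is kept or dropped.
    """
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (prob - y) / n
        w = w - step * grad
        for g in groups:
            w[g] = group_soft_threshold(w[g], step * lam)
    return w
```

With a sufficiently large penalty, entire groups are set exactly to zero, which is what makes the interaction selection automatic.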
Inverse Ising inference using all the data
We show that a method based on logistic regression, using all the data,
solves the inverse Ising problem far better than mean-field calculations
relying only on sample pairwise correlation functions, while still
computationally feasible for hundreds of nodes. The largest improvement in
reconstruction occurs for strong interactions. Using two examples, a diluted
Sherrington-Kirkpatrick model and a two-dimensional lattice, we also show that
interaction topologies can be recovered from few samples with good accuracy and
that the use of regularization is beneficial in this process, pushing
inference abilities further into low-temperature regimes.
Comment: 5 pages, 2 figures. Accepted version
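The node-wise scheme is simple to sketch: each spin is regressed on all the others, and the (halved, by the ±1 spin convention) logistic weights give one row of the coupling matrix, which is then symmetrized. The use of scikit-learn, an l1 penalty, and the regularization strength C are assumptions made for this illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def infer_couplings(samples, C=1.0):
    """Inverse Ising via node-wise regularized logistic regression.

    samples: (n_samples, n_spins) array of +/-1 spin values.
    Returns a symmetrized coupling matrix J with zero diagonal.
    """
    n, p = samples.shape
    J = np.zeros((p, p))
    for i in range(p):
        others = np.delete(np.arange(p), i)
        clf = LogisticRegression(penalty="l1", C=C, solver="liblinear")
        clf.fit(samples[:, others], samples[:, i] > 0)
        # P(s_i = 1 | s_others) = sigmoid(2 * (sum_j J_ij s_j + h_i)),
        # hence the factor 1/2 on the logistic weights
        J[i, others] = clf.coef_[0] / 2.0
    return (J + J.T) / 2.0
```

Unlike mean-field reconstruction from pairwise correlations alone, each regression uses every sampled configuration, which is where the accuracy gain comes from.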
P-values for high-dimensional regression
Assigning significance in high-dimensional regression is challenging. Most
computationally efficient selection algorithms cannot guard against inclusion
of noise variables. Asymptotically valid p-values are not available. An
exception is a recent proposal by Wasserman and Roeder (2008) which splits the
data into two parts. The number of variables is then reduced to a manageable
size using the first split, while classical variable selection techniques can
be applied to the remaining variables, using the data from the second split.
This yields asymptotic error control under minimal conditions. It involves,
however, a one-time random split of the data. Results are sensitive to this
arbitrary choice: it amounts to a 'p-value lottery' and makes it difficult to
reproduce results. Here, we show that inference across multiple random splits
can be aggregated, while keeping asymptotic control over the inclusion of noise
variables. We show that the resulting p-values can be used for control of both
family-wise error (FWER) and false discovery rate (FDR). In addition, the
proposed aggregation is shown to improve power while reducing the number of
falsely selected variables substantially.
Comment: 25 pages, 4 figures
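The multi-split procedure can be sketched as follows: on each random split, lasso screening on one half, Bonferroni-corrected OLS p-values on the other half (with p = 1 for unselected variables), then a per-variable quantile aggregation across splits. This sketch assumes centered X and y, uses scikit-learn's LassoCV as the screening step, and fixes a single quantile level gamma rather than searching over it adaptively.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LassoCV

def multi_split_pvalues(X, y, n_splits=20, gamma=0.5, seed=0):
    """Aggregated p-values from repeated sample splitting.

    X, y are assumed centered. Returns one p-value per variable.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    P = np.ones((n_splits, p))          # p = 1 for unselected variables
    for b in range(n_splits):
        idx = rng.permutation(n)
        half1, half2 = idx[: n // 2], idx[n // 2:]
        # screening step: reduce to a manageable set on the first half
        sel = np.flatnonzero(LassoCV(cv=5).fit(X[half1], y[half1]).coef_)
        if sel.size == 0 or sel.size >= half2.size:
            continue
        # classical OLS inference on the second half, screened variables only
        Xs = X[half2][:, sel]
        beta, _, _, _ = np.linalg.lstsq(Xs, y[half2], rcond=None)
        resid = y[half2] - Xs @ beta
        df = half2.size - sel.size
        sigma2 = resid @ resid / df
        se = np.sqrt(sigma2 * np.diag(np.linalg.pinv(Xs.T @ Xs)))
        praw = 2.0 * stats.t.sf(np.abs(beta / se), df)
        P[b, sel] = np.minimum(praw * sel.size, 1.0)   # Bonferroni over |S|
    # quantile aggregation across splits, scaled by 1/gamma
    return np.minimum(np.quantile(P, gamma, axis=0) / gamma, 1.0)
```

Aggregating over many splits removes the dependence on any single arbitrary split, which is the point of the 'p-value lottery' critique.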
Regularized regression method for genome-wide association studies
We use a novel penalized approach for genome-wide association studies that accounts for the linkage disequilibrium between adjacent markers. The method places a penalty on the difference of the genetic effects at adjacent single-nucleotide polymorphisms (SNPs) and combines it with the minimax concave penalty, which has been shown to be superior to the least absolute shrinkage and selection operator (LASSO) in terms of estimator bias and selection consistency. Our method is implemented using a coordinate descent algorithm. The values of the tuning parameters are determined by extended Bayesian information criteria. The leave-one-out method is used to compute p-values of the selected SNPs. Its applicability is illustrated on simulated data from Genetic Analysis Workshop 17, replication one. Our method selects three SNPs (C13S522, C13S523, and C13S524), whereas the LASSO method selects two (C13S522 and C13S523).
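The bias advantage of the minimax concave penalty is visible in its scalar thresholding operator, the building block of the coordinate descent update: small inputs are zeroed as with the lasso, but large inputs pass through unshrunk. The concavity parameter gamma = 3 is an illustrative default; the additional difference penalty on adjacent SNPs is omitted from this sketch.

```python
import numpy as np

def mcp_threshold(z, lam, gamma=3.0):
    """Scalar proximal operator of the minimax concave penalty (MCP).

    |z| <= lam          -> 0                      (like soft-threshold)
    lam < |z| <= g*lam  -> shrunk, but less than the lasso would
    |z| > g*lam         -> z unchanged            (no shrinkage bias)
    """
    az = np.abs(z)
    return np.where(
        az <= lam,
        0.0,
        np.where(
            az <= gamma * lam,
            np.sign(z) * (az - lam) / (1.0 - 1.0 / gamma),
            z,
        ),
    )
```

The operator is continuous at both breakpoints, which keeps the coordinate descent updates stable.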
High-Dimensional Inference with the generalized Hopfield Model: Principal Component Analysis and Corrections
We consider the problem of inferring the interactions between a set of N
binary variables from the knowledge of their frequencies and pairwise
correlations. The inference framework is based on the Hopfield model, a special
case of the Ising model where the interaction matrix is defined through a set
of patterns in the variable space, and is of rank much smaller than N. We show
that Maximum Likelihood inference is deeply related to Principal Component
Analysis when the amplitude of the pattern components, xi, is negligible
compared to N^1/2. Using techniques from statistical mechanics, we calculate
the corrections to the patterns to the first order in xi/N^1/2. We stress that
it is important to generalize the Hopfield model and include both attractive
and repulsive patterns, to correctly infer networks with sparse and strong
interactions. We present a simple geometrical criterion to decide how many
attractive and repulsive patterns should be considered as a function of the
sampling noise. We moreover discuss how many sampled configurations are
required for a good inference, as a function of the system size, N and of the
amplitude, xi. The inference approach is illustrated on synthetic and
biological data.
Comment: Physical Review E: Statistical, Nonlinear, and Soft Matter Physics
(2011), to appear
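The zeroth-order (PCA) step can be sketched directly: eigenvectors of the sample correlation matrix with the largest eigenvalues play the role of attractive patterns, and those with the smallest eigenvalues the repulsive ones. Unit pattern amplitudes are a simplifying assumption here; the paper derives eigenvalue-dependent amplitudes and first-order corrections in xi/N^1/2.

```python
import numpy as np

def hopfield_patterns(samples, k_attr=1, k_rep=1):
    """PCA-based sketch of generalized Hopfield pattern inference.

    Attractive patterns: top eigenvectors of the correlation matrix.
    Repulsive patterns: bottom eigenvectors (needed for networks with
    sparse, strong interactions). Amplitudes here are set to one.
    """
    C = np.corrcoef(samples, rowvar=False)
    w, V = np.linalg.eigh(C)           # eigenvalues in ascending order
    attractive = V[:, -k_attr:]        # largest eigenvalues
    repulsive = V[:, :k_rep]           # smallest eigenvalues
    # low-rank interaction matrix: attractive minus repulsive patterns
    J = attractive @ attractive.T - repulsive @ repulsive.T
    np.fill_diagonal(J, 0.0)
    return attractive, repulsive, J
```

The number of patterns to keep on each side would, per the abstract, be set by a geometrical criterion against the sampling noise; here it is a user-supplied parameter.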
Evolving Spatially Aggregated Features from Satellite Imagery for Regional Modeling
Satellite imagery and remote sensing provide explanatory variables at
relatively high resolutions for modeling geospatial phenomena, yet regional
summaries are often desirable for analysis and actionable insight. In this
paper, we propose a novel method of inducing spatial aggregations as a
component of the machine learning process, yielding regional model features
whose construction is driven by model prediction performance rather than prior
assumptions. Our results demonstrate that Genetic Programming is particularly
well suited to this type of feature construction because it can automatically
synthesize appropriate aggregations, as well as better incorporate them into
predictive models compared to other regression methods we tested. In our
experiments we consider a specific problem instance and real-world dataset
relevant to predicting snow properties in high-mountain Asia.
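The core idea, scoring candidate regional aggregations by predictive value rather than fixing one a priori, can be sketched with a toy selector over a few primitive operators. A full Genetic Programming system would instead evolve compositions of such operators inside the predictive model; the operator set and the squared-correlation fitness used here are illustrative assumptions.

```python
import numpy as np

# candidate pixel-to-region aggregation primitives
AGGREGATIONS = {"mean": np.mean, "max": np.max, "std": np.std}

def best_aggregation(pixel_values, targets):
    """Pick the aggregation whose regional feature best fits the target.

    pixel_values: list of 1-D arrays, the raster pixels in each region.
    targets: one regional response value per region.
    Fitness is the squared correlation with the target, standing in for
    the model-prediction performance a GP system would use.
    """
    best_name, best_score = None, -np.inf
    for name, fn in AGGREGATIONS.items():
        feats = np.array([fn(v) for v in pixel_values])
        score = np.corrcoef(feats, targets)[0, 1] ** 2
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

Even this toy version shows the mechanism: the data, not a prior assumption, decides how high-resolution pixels are summarized into regional features.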