44 research outputs found

    Component selection and smoothing in multivariate nonparametric regression

    We propose a new method for model selection and model fitting in multivariate nonparametric regression models, in the framework of smoothing spline ANOVA. The "COSSO" is a method of regularization with the penalty functional being the sum of component norms, instead of the squared norm employed in the traditional smoothing spline method. The COSSO provides a unified framework for several recent proposals for model selection in linear models and smoothing spline ANOVA models. Theoretical properties, such as the existence and the rate of convergence of the COSSO estimator, are studied. In the special case of a tensor product design with periodic functions, a detailed analysis reveals that the COSSO does model selection by applying a novel soft thresholding type operation to the function components. We give an equivalent formulation of the COSSO estimator which leads naturally to an iterative algorithm. We compare the COSSO with MARS, a popular method that builds functional ANOVA models, in simulations and real examples. The COSSO method can be extended to classification problems and we compare its performance with those of a number of machine learning algorithms on real datasets. The COSSO gives very competitive performance in these studies. Comment: Published at http://dx.doi.org/10.1214/009053606000000722 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
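
    To make the sum-of-norms penalty concrete, here is a minimal sketch of the group soft-thresholding operation such a penalty induces on fitted function components; the paper's exact operator for the periodic tensor-product case differs in detail, and the grid, components, and threshold below are invented for illustration:

        import numpy as np

        def group_soft_threshold(components, lam):
            # Shrink each component g_j by max(0, 1 - lam / ||g_j||);
            # components whose norm falls below lam are zeroed out.
            out = []
            for g in components:
                norm = np.linalg.norm(g)
                scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
                out.append(scale * g)
            return out

        # Toy example: three component functions on a grid; the weakest
        # component is removed entirely, the others are shrunk.
        x = np.linspace(0.0, 1.0, 100)
        comps = [np.sin(2 * np.pi * x), 0.3 * np.cos(2 * np.pi * x), 0.01 * x]
        print([round(float(np.linalg.norm(g)), 3)
               for g in group_soft_threshold(comps, lam=1.0)])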

    Prioritizing individual genetic variants after kernel machine testing using variable selection

    Kernel machine learning methods, such as the SNP-set kernel association test (SKAT), have been widely used to test associations between traits and genetic polymorphisms. In contrast to traditional single-SNP analysis methods, these methods are designed to examine the joint effect of a set of related SNPs (such as a group of SNPs within a gene or a pathway) and are able to identify sets of SNPs that are associated with the trait of interest. However, as with many multi-SNP testing approaches, kernel machine testing can draw conclusions only at the SNP-set level and does not directly indicate which SNP(s) in the identified set actually drive the association. A recently proposed procedure, KerNel Iterative Feature Extraction (KNIFE), provides a general framework for incorporating variable selection into kernel machine methods. In this article, we focus on quantitative traits and relatively common SNPs, adapt the KNIFE procedure to genetic association studies, and propose an approach to identify driver SNPs after SKAT has been applied to gene-set analysis. Our approach accommodates several kernels that are widely used in SNP analysis, such as the linear kernel and the Identity By State (IBS) kernel. It provides a practically useful way to prioritize SNPs and fills the gap between SNP-set analysis and biological functional studies. Both simulation studies and a real-data application demonstrate the proposed approach.
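
    For concreteness, a minimal sketch of the two kernels named in the abstract, computed from an n x p genotype matrix coded as minor-allele counts 0/1/2; the toy matrix is invented, and in practice the published SKAT/KNIFE implementations should be used:

        import numpy as np

        def linear_kernel(G):
            # G: n x p genotype matrix, entries 0/1/2 (minor-allele counts).
            return G @ G.T

        def ibs_kernel(G):
            # Identity-by-state similarity: number of alleles shared at each
            # SNP, averaged over SNPs; entries lie in [0, 1].
            p = G.shape[1]
            diff = np.abs(G[:, None, :] - G[None, :, :])  # n x n x p
            return (2.0 - diff).sum(axis=2) / (2.0 * p)

        # Invented 3-sample, 3-SNP genotype matrix.
        G = np.array([[0, 1, 2],
                      [0, 1, 1],
                      [2, 2, 0]], dtype=float)
        print(linear_kernel(G))
        print(ibs_kernel(G))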

    Sparse Additive Models

    We present a new class of methods for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive an algorithm for fitting the models that is practical and effective even when the number of covariates is larger than the sample size. SpAM is closely related to the COSSO model of Lin and Zhang (2006), but decouples smoothing and sparsity, enabling the use of arbitrary nonparametric smoothers. An analysis of the theoretical properties of SpAM is given. We also study a greedy estimator that is a nonparametric version of forward stepwise regression. Empirical results on synthetic and real data are presented, showing that SpAM can be effective in fitting sparse nonparametric models in high-dimensional data.
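
    A minimal sketch of the decoupling idea: backfit with an arbitrary smoother (here an assumed Gaussian-kernel smoother with an ad hoc bandwidth), then soft-threshold the norm of each fitted component. This is a schematic reading of the SpAM backfitting scheme, not the authors' implementation:

        import numpy as np

        def smooth(x, r, h=0.1):
            # Nadaraya-Watson smoother of residual r against covariate x.
            w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
            return (w @ r) / w.sum(axis=1)

        def spam_backfit(X, y, lam, iters=30):
            n, p = X.shape
            f = np.zeros((n, p))
            for _ in range(iters):
                for j in range(p):
                    r = y - f.sum(axis=1) + f[:, j]   # partial residual
                    g = smooth(X[:, j], r)
                    g -= g.mean()                     # center for identifiability
                    norm = np.sqrt(np.mean(g ** 2))
                    f[:, j] = (max(0.0, 1.0 - lam / norm) * g
                               if norm > 0 else 0.0)
            return f

        rng = np.random.default_rng(1)
        X = rng.uniform(-1.0, 1.0, size=(200, 5))
        y = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2
        y += 0.1 * rng.standard_normal(200)
        f = spam_backfit(X, y, lam=0.1)
        print(np.round(np.sqrt(np.mean(f ** 2, axis=0)), 3))  # component norms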

    Some Problems in Model Specification and Inference for Generalized Additive Models

    Regression models describing the dependence between a univariate response and a set of covariates play a fundamental role in statistics. In the last two decades, a tremendous effort has been made to develop flexible regression techniques such as generalized additive models (GAMs), which model the expected value of a response variable as a sum of smooth, unspecified functions of the predictors. Many nonparametric regression methodologies exist, including locally weighted regression and smoothing splines. Here the focus is on penalized regression spline methods, which can be viewed as a generalization of smoothing splines with a more flexible choice of bases and penalties. This thesis addresses three issues. First, the problem of model misspecification is treated by extending the instrumental variable approach to the GAM context. Second, we study the theoretical and empirical properties of confidence intervals for the smooth component functions of a GAM. Third, we consider the problem of variable selection within this flexible class of models. All results are supported by theoretical arguments and extensive simulation experiments, which shed light on the practical performance of the methods discussed in this thesis.
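
    To make the penalized regression spline idea concrete, a minimal sketch using a truncated power basis with a ridge penalty on the knot coefficients; the basis, knot placement, and smoothing parameter are illustrative choices, not those of the thesis:

        import numpy as np

        def trunc_power_basis(x, knots, degree=3):
            # Columns: 1, x, ..., x^degree, then (x - k)_+^degree per knot.
            cols = [x ** d for d in range(degree + 1)]
            cols += [np.maximum(x - k, 0.0) ** degree for k in knots]
            return np.column_stack(cols)

        def fit_penalized_spline(x, y, n_knots=20, degree=3, lam=1.0):
            knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
            B = trunc_power_basis(x, knots, degree)
            # Penalize only the truncated-power (knot) coefficients.
            D = np.zeros(B.shape[1])
            D[degree + 1:] = 1.0
            beta = np.linalg.solve(B.T @ B + lam * np.diag(D), B.T @ y)
            return B @ beta

        rng = np.random.default_rng(2)
        x = np.sort(rng.uniform(0.0, 1.0, 150))
        y = np.sin(4 * np.pi * x) + 0.2 * rng.standard_normal(150)
        fhat = fit_penalized_spline(x, y, lam=0.1)
        print(round(float(np.mean((fhat - np.sin(4 * np.pi * x)) ** 2)), 4))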

    Generalized Sobol sensitivity indices for dependent variables: numerical methods

    The hierarchically orthogonal functional decomposition of any measurable function f of a random vector X = (X_1, ..., X_p) consists in decomposing f(X) into a sum of functions of increasing dimension, each depending only on a subvector of X. Even when X_1, ..., X_p are dependent, this decomposition is unique if the components are hierarchically orthogonal, that is, two components are orthogonal whenever all the variables involved in one of the summands are a subset of the variables involved in the other. Setting Y = f(X), this decomposition leads to the definition of generalized sensitivity indices able to quantify the uncertainty of Y with respect to the dependent inputs X. In this paper, a numerical method is developed to identify the component functions of the decomposition using the hierarchical orthogonality property. Furthermore, the asymptotic properties of the component estimators are studied, and the generalized sensitivity indices of a toy model are estimated numerically. Lastly, the method is applied to a model arising from a real-world problem.
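
    For orientation, a minimal sketch of the classical pick-freeze estimator of first-order Sobol indices for independent uniform inputs; the paper's generalized indices extend the decomposition to dependent inputs, which this sketch does not handle:

        import numpy as np

        def first_order_sobol(f, p, n=100000, seed=0):
            # Pick-freeze estimator: S_i = Cov(f(A), f(C_i)) / Var(f(A)),
            # where C_i copies column i from A and the rest from B.
            rng = np.random.default_rng(seed)
            A = rng.uniform(size=(n, p))
            B = rng.uniform(size=(n, p))
            yA = f(A)
            S = np.empty(p)
            for i in range(p):
                Ci = B.copy()
                Ci[:, i] = A[:, i]          # "freeze" input i
                S[i] = np.cov(yA, f(Ci))[0, 1] / yA.var()
            return S

        # Ishigami-type toy model with inputs mapped to (-pi, pi).
        def toy(U):
            X = np.pi * (2.0 * U - 1.0)
            return (np.sin(X[:, 0]) + 7.0 * np.sin(X[:, 1]) ** 2
                    + 0.1 * X[:, 2] ** 4 * np.sin(X[:, 0]))

        print(np.round(first_order_sobol(toy, 3, n=200000), 3))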