263 research outputs found

Bayesian nonparametric tests via sliced inverse modeling

Authors: Jiang Bo; Liu Jun S.; Ye Chao
Publication date: 01/05/2015

We study the problem of independence and conditional independence tests between categorical covariates and a continuous response variable, which has an immediate application in genetics. Instead of estimating the conditional distribution of the response given values of covariates, we model the conditional distribution of covariates given the discretized response (aka "slices"). By assigning a prior probability to each possible discretization scheme, we can efficiently compute a Bayes factor (BF) statistic for the independence (or conditional independence) test using a dynamic programming algorithm. Asymptotic and finite-sample properties of the BF statistic, such as its power and null distribution, are studied, and a stepwise variable selection method based on the BF statistic is further developed. We compare the BF statistic with some existing classical methods and demonstrate its statistical power through extensive simulation studies. We apply the proposed method to a mouse genetics data set aiming to detect quantitative trait loci (QTLs) and obtain promising results. Comment: 32 pages, 7 figures
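As a rough illustration of the slicing idea described in this abstract (not the authors' Bayes factor statistic or their dynamic programming algorithm), the sketch below fixes a single equal-count discretization of the response and computes a classical likelihood-ratio (G) statistic for independence between a categorical covariate and the slice label. The function names `slice_labels` and `g_statistic`, the equal-count slicing rule, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter

def slice_labels(y, n_slices):
    """Assign each response value to one of n_slices equal-count slices by rank."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    labels = [0] * len(y)
    for rank, i in enumerate(order):
        labels[i] = rank * n_slices // len(y)
    return labels

def g_statistic(x, y, n_slices=3):
    """Likelihood-ratio statistic G = 2 * sum(O * log(O / E)) over the
    covariate-by-slice contingency table (hypothetical helper)."""
    s = slice_labels(y, n_slices)
    n = len(x)
    joint = Counter(zip(x, s))   # observed cell counts
    x_counts = Counter(x)        # covariate margins
    s_counts = Counter(s)        # slice margins
    g = 0.0
    for (xv, sv), obs in joint.items():
        expected = x_counts[xv] * s_counts[sv] / n
        g += 2.0 * obs * math.log(obs / expected)
    return g

# Toy data: the covariate fully determines the response level in the first
# case and is balanced across response levels in the second.
y_toy = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6]
x_dep = ["a"] * 6 + ["b"] * 6
x_ind = ["a", "b"] * 6
g_dep = g_statistic(x_dep, y_toy, n_slices=2)
g_ind = g_statistic(x_ind, y_toy, n_slices=2)
```

A single fixed slicing is the simplest possible choice; the abstract instead places a prior over all discretization schemes, which this sketch does not attempt.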

arXiv.org e-Print Archive

CiteSeerX

The Bright and Dark Side of Cooperation for Regional Innovation Performance

Authors: Andreas Meder; Tom Broekel

Studies analyzing the importance of intra- and inter-regional cooperation for regional innovation performance are mainly qualitative in nature and focus strongly on the positive effects that high levels of cooperation can yield. For the case of German labor market regions and the Electrics & Electronics industry, the paper provides a quantitative-empirical analysis that takes into account the possibility of negative effects related to regional lock-in, lock-out, and cooperation overload situations. Using conditional nonparametric frontier techniques and measures of cooperation behavior, we find positive as well as substantial negative effects of cooperation, with the latter being induced by excessive and unbalanced cooperation behavior. Keywords: regional innovation performance, cooperation, lock-out, lock-in, cooperation overload

Research Papers in Economics

Adaptive Basis Sampling for Smoothing Splines

Author: Zhang Nan
Publication date: 29/10/2015

Smoothing splines provide flexible nonparametric regression estimators. The penalized likelihood method is adopted when responses are from exponential families, and multivariate models are constructed with certain analysis of variance decompositions. However, the high computational cost of smoothing splines for large data sets has hindered their wide application. We develop a new method, named adaptive basis sampling, for efficient computation of smoothing splines in super-large samples. Generally, a smoothing spline for a regression problem with sample size n can be expressed as a linear combination of n basis functions, and its computational complexity is O(n³). We achieve a more scalable computation in the multivariate case by evaluating the smoothing spline using a smaller set of basis functions, obtained by an adaptive sampling scheme that uses values of the response variable. Our asymptotic analysis shows that smoothing splines computed via adaptive basis sampling converge to the true function at the same rate as full basis smoothing splines. We show by simulation studies that the proposed method outperforms a sampling method that does not use the values of the response variable, and we apply it to several real data examples.

Texas A&M Repository

New developments of dimension reduction

Author: Huo Lei
Publication venue: Scholars' Mine
Publication date: 01/01/2018

Variable selection has become more crucial than before, since high-dimensional data are frequently seen in many research areas. Many model-based variable selection methods have been developed. However, their performance might be poor when the model is mis-specified. Sufficient dimension reduction (SDR, Li 1991; Cook 1998) provides a general framework for model-free variable selection methods. In this thesis, we first propose a novel model-free variable selection method to deal with multi-population data by incorporating the grouping information. Theoretical properties of our proposed method are also presented. Simulation studies show that our new method significantly improves the selection performance compared with methods that ignore the grouping information. In the second part of this dissertation, we apply a partial SDR method to conduct conditional model-free variable (feature) screening for ultra-high dimensional data, when researchers have prior information regarding the importance of certain predictors based on experience or previous investigations. Compared to the state-of-the-art conditional screening method, conditional sure independence screening (CSIS; Barut, Fan and Verhasselt, 2016), our method performs much better for nonlinear models. The sure screening consistency property of our proposed method is also established.

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

Partition Models for Variable Selection and Interaction Detection

Author: Jiang Bo
Publication venue: Harvard University Botany Libraries
Publication date: 27/09/2013

Variable selection methods play important roles in modeling high-dimensional data and are key to data-driven scientific discoveries. In this thesis, we consider the problem of variable selection with interaction detection. Instead of building a predictive model of the response given combinations of predictors, we start by modeling the conditional distribution of predictors given partitions based on responses. We use this inverse modeling perspective as motivation to propose a stepwise procedure for effectively detecting interactions with few assumptions on parametric form. The proposed procedure is able to detect pairwise interactions among p predictors with a computational time of O(p) instead of O(p²) under moderate conditions. We establish consistency of the proposed procedure in variable selection under a diverging number of predictors and sample size. We demonstrate its excellent empirical performance in comparison with some existing methods through simulation studies as well as real data examples. Next, we combine the forward and inverse modeling perspectives under the Bayesian framework to detect pleiotropic and epistatic effects in expression quantitative trait loci (eQTL) studies. We augment the Bayesian partition model proposed by Zhang et al. (2010) to capture complex dependence structure among gene expression and genetic markers. In particular, we propose a sequential partition prior to model the asymmetric roles played by the response and the predictors, and we develop an efficient dynamic programming algorithm for sampling latent individual partitions. The augmented partition model significantly improves the power in detecting eQTLs compared to previous methods in both simulations and real data examples pertaining to yeast. Finally, we study the application of Bayesian partition models in the unsupervised learning of transcription factor (TF) families based on protein binding microarray (PBM). The problem of TF subclass identification can be viewed as the clustering of TFs with variable selection on their binding DNA sequences. Our model provides simultaneous identification of TF families and their shared sequence preferences, as well as DNA sequences bound preferentially by individual members of TF families. Our analysis may aid in deciphering cis regulatory codes and determinants of protein-DNA binding specificity.

Harvard University - DASH

A new sliced inverse regression method for multivariate response

Authors: Coudret Raphaël; Girard Stéphane; Saracco Jerome
Publication venue: Elsevier BV
Publication date: 01/01/2013

A semiparametric regression model of a q-dimensional multivariate response y on a p-dimensional covariate x is considered. A new approach is proposed based on sliced inverse regression (SIR) for estimating the effective dimension reduction (EDR) space without requiring a prespecified parametric model. The convergence at rate square root of n of the estimated EDR space is shown. The choice of the dimension of the EDR space is discussed. Moreover, a way to cluster components of y related to the same EDR space is provided. Thus, the proposed multivariate SIR method can be used properly on each cluster instead of blindly applying it to all components of y. The numerical performance of multivariate SIR is illustrated in a simulation study. Applications to a remote sensing dataset and to the Minneapolis elementary schools data are also provided. Although the proposed methodology relies on SIR, it opens the door for new regression approaches with a multivariate response.
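For readers unfamiliar with the building block named in this abstract, here is a minimal sketch of classical SIR for a univariate response. It is not the multivariate-response method the paper proposes. The function name `sir_directions`, the equal-count slicing, and the simulated single-index model are assumptions chosen for illustration; the code relies only on standard NumPy calls.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_directions=1):
    """Classical SIR sketch: leading eigenvectors of the weighted covariance
    of slice means of the standardized covariates (hypothetical helper)."""
    n, p = X.shape
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Inverse square root of the covariance via its eigen-decomposition.
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mean) @ inv_sqrt
    # Slice observations into near-equal-count groups by sorted response.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, v = np.linalg.eigh(M)
    top = v[:, ::-1][:, :n_directions]
    # Map back to the original covariate scale and normalize each column.
    B = inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)

# Simulated single-index model (an assumption made for this example).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
y = (X @ beta) ** 3 + 0.1 * rng.normal(size=2000)
b_hat = sir_directions(X, y)[:, 0]
alignment = abs(b_hat @ beta)  # close to 1 when the direction is recovered
```

The estimated direction is only identified up to sign, which is why the alignment is taken in absolute value.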

CiteSeerX

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-INSU

HAL Descartes

Oskar Bordeaux