Bayesian Approximate Kernel Regression with Variable Selection
Nonlinear kernel regression models are often used in statistics and machine
learning because they are more accurate than linear models. Variable selection
for kernel regression models is a challenge partly because, unlike the linear
regression setting, there is no clear concept of an effect size for regression
coefficients. In this paper, we propose a novel framework that provides an
effect size analog of each explanatory variable for Bayesian kernel regression
models when the kernel is shift-invariant --- for example, the Gaussian kernel.
We use function analytic properties of shift-invariant reproducing kernel
Hilbert spaces (RKHS) to define a linear vector space that: (i) captures
nonlinear structure, and (ii) can be projected onto the original explanatory
variables. The projection onto the original explanatory variables serves as an
analog of effect sizes. The specific function analytic property we use is that
shift-invariant kernel functions can be approximated via random Fourier bases.
Based on the random Fourier expansion we propose a computationally efficient
class of Bayesian approximate kernel regression (BAKR) models for both
nonlinear regression and binary classification for which one can compute an
analog of effect sizes. We illustrate the utility of BAKR by examining two
important problems in statistical genetics: genomic selection (i.e. phenotypic
prediction) and association mapping (i.e. inference of significant variants or
loci). State-of-the-art methods for genomic selection and association mapping
are based on kernel regression and linear models, respectively. BAKR is the
first method that is competitive in both settings.
Comment: 22 pages, 3 figures, 3 tables; theory added; new simulations
presented; references added
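The random Fourier expansion mentioned in the abstract can be illustrated with a minimal sketch. It is not the BAKR implementation and does not compute effect size analogs; it only shows how a Gaussian kernel can be approximated by an inner product of random cosine features. The function names, the bandwidth parameter `sigma`, the number of features, and the seed are assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sample_rff_params(dim, n_features, sigma=1.0, seed=0):
    """Draw random frequencies and phase offsets (hypothetical helper).

    Frequencies are drawn from N(0, 1 / sigma^2) per coordinate, which is
    the spectral density of the Gaussian kernel above; offsets are
    uniform on [0, 2*pi).
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
               for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    return weights, offsets


def rff_map(x, weights, offsets):
    """Map x to the random cosine feature vector z(x)."""
    scale = math.sqrt(2.0 / len(weights))
    return [
        scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
        for w, b in zip(weights, offsets)
    ]


def approx_kernel(x, y, weights, offsets):
    """Approximate k(x, y) by the inner product z(x) . z(y)."""
    zx = rff_map(x, weights, offsets)
    zy = rff_map(y, weights, offsets)
    return sum(a * b for a, b in zip(zx, zy))


if __name__ == "__main__":
    w, b = sample_rff_params(dim=3, n_features=5000)
    x, y = [0.1, -0.4, 0.7], [0.3, 0.2, -0.1]
    print("exact :", gaussian_kernel(x, y))
    print("approx:", approx_kernel(x, y, w, b))
```

Because the feature map is explicit, a regression on `rff_map(x, ...)` is linear in the random features, which is what makes a linear-model treatment of a nonlinear kernel possible. The approximation error shrinks as the number of features grows.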
Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks
We present a procedure for effective estimation of entropy and mutual
information from small-sample data, and apply it to the problem of inferring
high-dimensional gene association networks. Specifically, we develop a
James-Stein-type shrinkage estimator, resulting in a procedure that is highly
efficient statistically as well as computationally. Despite its simplicity, we
show that it outperforms eight other entropy estimation procedures across a
diverse range of sampling scenarios and data-generating models, even in cases
of severe undersampling. We illustrate the approach by analyzing E. coli gene
expression data and computing an entropy-based gene-association network from
these data. A computer program is available that implements the
proposed shrinkage estimator.
Comment: 18 pages, 3 figures, 1 table
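A shrinkage entropy estimate of the kind described above can be sketched as follows. The abstract does not give the formulas, so the specifics here are assumptions: cell frequencies are shrunk toward a uniform target, the shrinkage intensity is estimated from the data with a James-Stein-type formula, and the plug-in entropy of the shrunken frequencies is reported. Function names are hypothetical and this is not the authors' program.

```python
import math


def shrinkage_frequencies(counts):
    """Shrink maximum-likelihood cell frequencies toward the uniform target.

    Returns (frequencies, intensity). The intensity is assumed to be
    lambda = (1 - sum(f^2)) / ((n - 1) * sum((target - f)^2)),
    clipped to [0, 1], where f are the maximum-likelihood frequencies.
    """
    n = sum(counts)
    p = len(counts)
    target = 1.0 / p
    ml = [c / n for c in counts]
    denom = (n - 1) * sum((target - f) ** 2 for f in ml)
    if denom == 0:
        # Frequencies already equal the target (or n == 1): shrink fully.
        lam = 1.0
    else:
        lam = (1.0 - sum(f * f for f in ml)) / denom
        lam = min(1.0, max(0.0, lam))
    shrunk = [lam * target + (1.0 - lam) * f for f in ml]
    return shrunk, lam


def entropy(freqs):
    """Plug-in Shannon entropy in nats; zero cells contribute nothing."""
    return -sum(f * math.log(f) for f in freqs if f > 0)


if __name__ == "__main__":
    counts = [10, 1, 0, 0]  # made-up, heavily undersampled counts
    ml = [c / sum(counts) for c in counts]
    shrunk, lam = shrinkage_frequencies(counts)
    print("intensity        :", lam)
    print("ML entropy       :", entropy(ml))
    print("shrinkage entropy:", entropy(shrunk))
```

Shrinking toward the uniform distribution never lowers the plug-in entropy, which counteracts the downward bias of the maximum-likelihood estimate when many cells are empty.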
Nonparametric Bayes dynamic modeling of relational data
Symmetric binary matrices representing relations among entities are commonly
collected in many areas. Our focus is on dynamically evolving binary relational
matrices, with interest being in inference on the relationship structure and
prediction. We propose a nonparametric Bayesian dynamic model, which reduces
dimensionality in characterizing the binary matrix through a lower-dimensional
latent space representation, with the latent coordinates evolving in continuous
time via Gaussian processes. By using a logistic mapping function from the
probability matrix space to the latent relational space, we obtain a flexible
and computationally tractable formulation. Employing Pólya-Gamma data
augmentation, an efficient Gibbs sampler is developed for posterior
computation, with the dimension of the latent space automatically inferred. We
provide some theoretical results on flexibility of the model, and illustrate
performance via simulation experiments. We also consider an application to
co-movements in world financial markets.
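The generative side of such a model can be sketched by forward simulation. The abstract states only that latent coordinates evolve via Gaussian processes and that a logistic mapping links them to edge probabilities; the inner-product form of the score, the time-varying baseline, the squared-exponential covariance, and all names below are assumptions for illustration. The sketch performs no posterior inference and does not use Pólya-Gamma augmentation.

```python
import math
import random


def se_covariance(times, length_scale=1.0, jitter=1e-6):
    """Squared-exponential covariance matrix with a small diagonal jitter."""
    n = len(times)
    return [[math.exp(-0.5 * ((times[i] - times[j]) / length_scale) ** 2)
             + (jitter if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(a):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gp_path(chol, rng):
    """One zero-mean Gaussian process draw at the given time points."""
    z = [rng.gauss(0.0, 1.0) for _ in chol]
    return [sum(chol[i][k] * z[k] for k in range(i + 1))
            for i in range(len(chol))]


def logistic(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def simulate_edge_probabilities(n_nodes, n_dims, times, seed=0):
    """Symmetric edge-probability matrices, one per time point.

    Assumed score: baseline(t) + x_i(t) . x_j(t); the diagonal is left
    at zero because self-relations are not modeled in this sketch.
    """
    rng = random.Random(seed)
    chol = cholesky(se_covariance(times))
    baseline = sample_gp_path(chol, rng)
    coords = [[sample_gp_path(chol, rng) for _ in range(n_dims)]
              for _ in range(n_nodes)]
    probs = []
    for t in range(len(times)):
        mat = [[0.0] * n_nodes for _ in range(n_nodes)]
        for i in range(n_nodes):
            for j in range(i + 1, n_nodes):
                score = baseline[t] + sum(
                    coords[i][h][t] * coords[j][h][t] for h in range(n_dims))
                mat[i][j] = mat[j][i] = logistic(score)
        probs.append(mat)
    return probs


if __name__ == "__main__":
    for mat in simulate_edge_probabilities(4, 2, [0.0, 1.0, 2.0, 3.0]):
        print([round(p, 3) for p in mat[0]])
```

Because each coordinate path is smooth in time, the edge probabilities also change smoothly between neighboring time points.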
Implicit Copulas from Bayesian Regularized Regression Smoothers
We show how to extract the implicit copula of a response vector from a
Bayesian regularized regression smoother with Gaussian disturbances. The copula
can be used to compare smoothers that employ different shrinkage priors and
function bases. We illustrate with three popular choices of shrinkage priors
--- a pairwise prior, the horseshoe prior and a g prior augmented with a point
mass as employed for Bayesian variable selection --- and both univariate and
multivariate function bases. The implicit copulas are high-dimensional, have
flexible dependence structures that are far from that of a Gaussian copula, and
are unavailable in closed form. However, we show how they can be evaluated by
first constructing a Gaussian copula conditional on the regularization
parameters, and then integrating over these. Combined with non-parametric
margins the regularized smoothers can be used to model the distribution of
non-Gaussian univariate responses conditional on the covariates. Efficient
Markov chain Monte Carlo schemes for evaluating the copula are given for this
case. Using both simulated and real data, we show how such copula smoothing
models can improve the quality of resulting function estimates and predictive
distributions.
Scale effects on genomic modelling and prediction
In this thesis, a novel method for scale-corrected comparisons of LD structure in different genomic regions is proposed. Several aspects of scale-induced problems, from the precision of marker effect estimates to the accuracy of predictions for new individuals, are investigated. Furthermore, based on a comparison of the performance of different statistical methods, recommendations on the application of the examined methods are given.
A Dirichlet process mixture model for survival outcome data: assessing nationwide kidney transplant centers
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/110837/1/sim6438.pd