Bayesian Model Comparison in Genetic Association Analysis: Linear Mixed Modeling and SNP Set Testing
We consider the problems of hypothesis testing and model comparison under a
flexible Bayesian linear regression model whose formulation is closely
connected with the linear mixed effect model and the parametric models for SNP
set analysis in genetic association studies. We derive a class of analytic
approximate Bayes factors and illustrate their connections with a variety of
frequentist test statistics, including the Wald statistic and the variance
component score statistic. Taking advantage of Bayesian model averaging and
hierarchical modeling, we demonstrate some distinct advantages and
flexibilities in the approaches utilizing the derived Bayes factors in the
context of genetic association studies. We demonstrate our proposed methods
using real or simulated numerical examples in applications of single SNP
association testing, multi-locus fine-mapping, and SNP set association testing.
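To make the Wald-statistic connection concrete, the following is a minimal Python sketch of a Wakefield-style approximate Bayes factor computed from a single-SNP effect estimate and its standard error. The prior standard deviation prior_sd and the example numbers are illustrative assumptions, not values from the paper, whose derivation is more general.

import numpy as np

def wald_abf(beta_hat, se, prior_sd=0.2):
    # Approximate Bayes factor (alternative vs. null) built from the Wald
    # statistic z = beta_hat / se; prior_sd is an assumed prior standard
    # deviation on the effect size, not a value taken from the paper.
    V = se ** 2                # sampling variance of the estimate
    W = prior_sd ** 2          # prior variance on the effect size
    z = beta_hat / se          # Wald statistic
    return np.sqrt(V / (V + W)) * np.exp(0.5 * z ** 2 * W / (V + W))

print(wald_abf(beta_hat=0.15, se=0.05))   # illustrative single-SNP estimate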
Computational statistics using the Bayesian Inference Engine
This paper introduces the Bayesian Inference Engine (BIE), a general
parallel, optimised software package for parameter inference and model
selection. This package is motivated by the analysis needs of modern
astronomical surveys and the need to organise and reuse expensive derived data.
The BIE is the first platform for computational statistics designed explicitly
to enable Bayesian update and model comparison for astronomical problems.
Bayesian update is based on the representation of high-dimensional posterior
distributions using metric-ball-tree based kernel density estimation. Among its
algorithmic offerings, the BIE emphasises hybrid tempered MCMC schemes that
robustly sample multimodal posterior distributions in high-dimensional
parameter spaces. Moreover, the BIE implements a full persistence or
serialisation system that stores the full byte-level image of the running
inference and previously characterised posterior distributions for later use.
Two new algorithms to compute the marginal likelihood from the posterior
distribution, developed for and implemented in the BIE, enable model comparison
for complex models and data sets. Finally, the BIE was designed to be a
collaborative platform for applying Bayesian methodology to astronomy. It
includes an extensible, object-oriented framework that implements every aspect
of Bayesian inference. Because the BIE provides a variety of statistical
algorithms for all phases of the inference problem, a scientist may explore a
variety of approaches with a single model and data implementation.
Additional technical details and download details are available from
http://www.astro.umass.edu/bie. The BIE is distributed under the GNU GPL.
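As a rough illustration of the tempered MCMC idea the abstract emphasises (not code from the BIE itself), the Python sketch below runs a toy parallel-tempering sampler on a bimodal one-dimensional target; the temperature ladder, proposal step, and example posterior are arbitrary assumptions.

import numpy as np

def parallel_tempering(log_post, n_steps=5000, temps=(1.0, 2.0, 4.0, 8.0),
                       step=0.5, seed=0):
    # Toy parallel-tempering sampler for a 1-D multimodal target;
    # a generic illustration of tempered MCMC, not the BIE's implementation.
    rng = np.random.default_rng(seed)
    x = np.zeros(len(temps))           # one chain per temperature
    cold_samples = []
    for _ in range(n_steps):
        # Metropolis update within each tempered chain
        for i, T in enumerate(temps):
            prop = x[i] + step * rng.normal()
            if np.log(rng.uniform()) < (log_post(prop) - log_post(x[i])) / T:
                x[i] = prop
        # propose swapping states between adjacent temperatures
        i = rng.integers(len(temps) - 1)
        log_ratio = (log_post(x[i + 1]) - log_post(x[i])) * (1 / temps[i] - 1 / temps[i + 1])
        if np.log(rng.uniform()) < log_ratio:
            x[i], x[i + 1] = x[i + 1], x[i]
        cold_samples.append(x[0])      # keep only the untempered chain
    return np.array(cold_samples)

# bimodal toy posterior: mixture of two well-separated Gaussians
log_post = lambda t: np.logaddexp(-0.5 * (t + 5.0) ** 2, -0.5 * (t - 5.0) ** 2)
draws = parallel_tempering(log_post)
print(draws.mean(), draws.std())      # cold chain should visit both modes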
A comparison of the Benjamini-Hochberg procedure with some Bayesian rules for multiple testing
In the spirit of modeling inference for microarrays as multiple testing for
sparse mixtures, we present a similar approach to a simplified version of
quantitative trait loci (QTL) mapping. Unlike in the case of microarrays, where the
number of tests usually reaches tens of thousands, the number of tests
performed in scans for QTL usually does not exceed several hundred. However,
in typical cases, the sparsity of significant alternatives for QTL mapping
is in the same range as for microarrays. For methodological interest, as well
as some related applications, we also consider non-sparse mixtures. Using
simulations as well as theoretical observations we study false discovery rate
(FDR), power and misclassification probability for the Benjamini-Hochberg (BH)
procedure and its modifications, as well as for various parametric and
nonparametric Bayes and parametric empirical Bayes procedures. Our results
confirm the observation of Genovese and Wasserman (2002) that for small p the
misclassification error of BH is close to optimal in the sense of attaining the
Bayes oracle. This property is shared by some of the considered Bayes testing
rules, which in general perform better than BH for large or moderate p's.
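For reference, here is a minimal Python sketch of the Benjamini-Hochberg step-up procedure discussed above; the simulated sparse mixture (10 signals among 500 tests) is an illustrative assumption, not data from the paper.

import numpy as np
from scipy.stats import norm

def benjamini_hochberg(pvals, alpha=0.05):
    # Step-up BH procedure: reject the k smallest p-values, where k is the
    # largest index with p_(k) <= (k / m) * alpha. Returns a rejection mask.
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= (np.arange(1, m + 1) / m) * alpha
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject

# sparse mixture: a few shifted signals among many standard-normal nulls
rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(4.0, 1.0, 10), rng.normal(0.0, 1.0, 490)])
pvals = 2 * norm.sf(np.abs(z))        # two-sided p-values
print(benjamini_hochberg(pvals).sum(), "rejections at FDR level 0.05")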
Detecting Unspecified Structure in Low-Count Images
Unexpected structure in images of astronomical sources often presents itself
upon visual inspection of the image, but such apparent structure may either
correspond to true features in the source or be due to noise in the data. This
paper presents a method for testing whether inferred structure in an image with
Poisson noise represents a significant departure from a baseline (null) model
of the image. To infer image structure, we conduct a Bayesian analysis of a
full model that uses a multiscale component to allow flexible departures from
the posited null model. As a test statistic, we use a tail probability of the
posterior distribution under the full model. This choice of test statistic
allows us to estimate a computationally efficient upper bound on a p-value that
enables us to draw strong conclusions even when there are limited computational
resources that can be devoted to simulations under the null model. We
demonstrate the statistical performance of our method on simulated images.
Applying our method to an X-ray image of the quasar 0730+257, we find
significant evidence against the null model of a single point source and
uniform background, lending support to the claim of an X-ray jet.
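The bounding idea can be sketched generically: for a non-negative test statistic tau (here, a posterior tail probability under the full model), Markov's inequality gives Pr_null(tau >= tau_obs) <= E_null[tau] / tau_obs, and E_null[tau] can be estimated from a small number of null-model simulations. The Python snippet below illustrates only this bounding step with made-up numbers; it is not the paper's full multiscale analysis.

import numpy as np

def markov_pvalue_bound(tau_obs, tau_null_draws):
    # Upper bound on Pr_null(tau >= tau_obs) via Markov's inequality,
    # p <= E_null[tau] / tau_obs, with E_null[tau] estimated by the sample
    # mean of test statistics computed under null-model simulations.
    return min(1.0, float(np.mean(tau_null_draws)) / tau_obs)

# hypothetical observed tail probability and a handful of null simulations
print(markov_pvalue_bound(0.9, [0.02, 0.05, 0.01, 0.08, 0.03]))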
The Jeffreys-Lindley Paradox and Discovery Criteria in High Energy Physics
The Jeffreys-Lindley paradox displays how the use of a p-value (or number of
standard deviations z) in a frequentist hypothesis test can lead to an
inference that is radically different from that of a Bayesian hypothesis test
in the form advocated by Harold Jeffreys in the 1930s and common today. The
setting is the test of a well-specified null hypothesis (such as the Standard
Model of elementary particle physics, possibly with "nuisance parameters")
versus a composite alternative (such as the Standard Model plus a new force of
nature of unknown strength). The p-value, as well as the ratio of the
likelihood under the null hypothesis to the maximized likelihood under the
alternative, can strongly disfavor the null hypothesis, while the Bayesian
posterior probability for the null hypothesis can be arbitrarily large. The
academic statistics literature contains many impassioned comments on this
paradox, yet there is no consensus either on its relevance to scientific
communication or on its correct resolution. The paradox is quite relevant to
frontier research in high energy physics. This paper is an attempt to explain
the situation to both physicists and statisticians, in the hope that further
progress can be made.
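A standard textbook illustration of the paradox (not taken from this paper) tests a point null for a normal mean against a N(0, tau^2) prior under the alternative: fix the frequentist evidence at z = 3 and let the sample size grow, and the Bayes factor swings toward the null even though the two-sided p-value stays near 0.003. A short Python sketch, with the prior scale tau an arbitrary assumption:

import numpy as np

def bf01_normal(z, n, tau=1.0, sigma=1.0):
    # Bayes factor for H0: theta = 0 versus H1: theta ~ N(0, tau^2), given a
    # sample mean lying z standard errors from zero with n observations of
    # standard deviation sigma (standard normal-normal calculation).
    r = n * tau ** 2 / sigma ** 2          # prior-to-sampling variance ratio
    return np.sqrt(1 + r) * np.exp(-0.5 * z ** 2 * r / (1 + r))

for n in (10, 1_000, 100_000):
    print(n, bf01_normal(z=3.0, n=n))      # grows with n at fixed z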
False discovery rate regression: an application to neural synchrony detection in primary visual cortex
Many approaches for multiple testing begin with the assumption that all tests
in a given study should be combined into a global false-discovery-rate
analysis. But this may be inappropriate for many of today's large-scale
screening problems, where auxiliary information about each test is often
available, and where a combined analysis can lead to poorly calibrated error
rates within different subsets of the experiment. To address this issue, we
introduce an approach called false-discovery-rate regression that directly uses
this auxiliary information to inform the outcome of each test. The method can
be motivated by a two-groups model in which covariates are allowed to influence
the local false discovery rate, or equivalently, the posterior probability that
a given observation is a signal. This poses many subtle issues at the interface
between inference and computation, and we investigate several variations of the
overall approach. Simulation evidence suggests that: (1) when covariate effects
are present, FDR regression improves power for a fixed false-discovery rate;
and (2) when covariate effects are absent, the method is robust, in the sense
that it does not lead to inflated error rates. We apply the method to neural
recordings from primary visual cortex. The goal is to detect pairs of neurons
that exhibit fine-time-scale interactions, in the sense that they fire together
more often than expected due to chance. Our method detects roughly 50% more
synchronous pairs than a standard FDR-controlling analysis. The companion R
package FDRreg implements all methods described in the paper.
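To give a concrete sense of a two-groups model with covariate-dependent mixing weights, here is a toy Python sketch that fits pi(x) = logistic(a + b*x) by direct maximum likelihood with a fixed N(3, 1) alternative. The fixed alternative, the function name fit_fdr_regression, and the simulated data are assumptions for illustration only; the companion FDRreg package is the authors' actual implementation.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_fdr_regression(z, x, alt_mean=3.0):
    # Covariate-modulated two-groups model:
    #   z_i ~ (1 - pi(x_i)) * N(0, 1) + pi(x_i) * N(alt_mean, 1),
    # with pi(x) = logistic(a + b * x). The fixed alternative is an
    # illustrative assumption; the paper estimates the signal density flexibly.
    f0, f1 = norm.pdf(z), norm.pdf(z, loc=alt_mean)

    def neg_loglik(theta):
        a, b = theta
        pi = 1.0 / (1.0 + np.exp(-(a + b * x)))
        return -np.sum(np.log((1 - pi) * f0 + pi * f1))

    a, b = minimize(neg_loglik, x0=[0.0, 0.0]).x
    pi = 1.0 / (1.0 + np.exp(-(a + b * x)))
    local_fdr = (1 - pi) * f0 / ((1 - pi) * f0 + pi * f1)   # P(null | z, x)
    return a, b, local_fdr

# simulate tests whose prior signal probability rises with the covariate
rng = np.random.default_rng(2)
x = rng.normal(size=2000)
is_signal = rng.uniform(size=2000) < 1.0 / (1.0 + np.exp(-(-2.0 + 1.5 * x)))
z = np.where(is_signal, rng.normal(3.0, 1.0, 2000), rng.normal(0.0, 1.0, 2000))
a, b, lfdr = fit_fdr_regression(z, x)
print(round(a, 2), round(b, 2), int((lfdr < 0.2).sum()), "tests flagged at lfdr < 0.2")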