Statistical mechanics of transcription-factor binding site discovery using Hidden Markov Models
Hidden Markov Models (HMMs) are a commonly used tool for inference of
transcription factor (TF) binding sites from DNA sequence data. We exploit the
mathematical equivalence between HMMs for TF binding and the "inverse"
statistical mechanics of hard rods in a one-dimensional disordered potential to
investigate learning in HMMs. We derive analytic expressions for the Fisher
information, a commonly employed measure of confidence in learned parameters,
in the biologically relevant limit where the density of binding sites is low.
We then use techniques from statistical mechanics to derive a scaling principle
relating the specificity (binding energy) of a TF to the minimum amount of
training data necessary to learn it.
Comment: 25 pages, 2 figures, 1 table. V2: typos fixed and new references added.
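A minimal numerical sketch of the Fisher-information idea, assuming a toy two-component model (a length-k consensus site at density rho against uniform background) rather than the paper's full HMM; all parameter names and values here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model (not the paper's HMM): each length-k window is
# emitted by the "site" model with probability rho (site density), else
# by a uniform background.  We estimate the Fisher information of rho
# by Monte Carlo, using I(rho) = E[(d log p / d rho)^2].

k = 8
p_site = 0.9   # per-position consensus-match probability under the site model
rho = 0.01     # low site density, the biologically relevant regime

def log_lik(matches, rho):
    """Log-likelihood of a window with `matches` consensus matches."""
    lp_site = matches * np.log(p_site) + (k - matches) * np.log((1 - p_site) / 3)
    lp_bg = k * np.log(0.25)
    return np.logaddexp(np.log(rho) + lp_site, np.log1p(-rho) + lp_bg)

# Draw windows from the model, then estimate the expected squared score
# with a central finite difference.
n = 200_000
is_site = rng.random(n) < rho
matches = np.where(is_site,
                   rng.binomial(k, p_site, n),
                   rng.binomial(k, 0.25, n))

eps = 1e-5
score = (log_lik(matches, rho + eps) - log_lik(matches, rho - eps)) / (2 * eps)
fisher = np.mean(score ** 2)
print(f"Estimated Fisher information I(rho) ~ {fisher:.1f}")
print(f"Rough sample size to learn rho to 10%: {1 / (fisher * (0.1 * rho)**2):.0f} windows")
```

Via the Cramér-Rao bound, 1/(n I(rho)) limits the variance of any unbiased estimate of rho from n windows, which is the sense in which Fisher information sets a minimum amount of training data.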
The Effects of Halo Assembly Bias on Self-Calibration in Galaxy Cluster Surveys
Self-calibration techniques for analyzing galaxy cluster counts utilize the
abundance and the clustering amplitude of dark matter halos. These properties
simultaneously constrain cosmological parameters and the cluster
observable-mass relation. It was recently discovered that the clustering
amplitude of halos depends not only on the halo mass, but also on various
secondary variables, such as the halo formation time and the concentration;
these dependences are collectively termed assembly bias. Applying a modified
Fisher matrix formalism, we explore whether these secondary variables have a
significant impact on the study of dark energy properties using the
self-calibration technique in current (SDSS) and near-future (DES, SPT, and
LSST) cluster surveys. The impact of the secondary dependence is determined by
(1) the scatter in the observable-mass relation and (2) the correlation between
observable and secondary variables. We find that for optical surveys, the
secondary dependence does not significantly influence an SDSS-like survey;
however, it may affect a DES-like survey (given the high scatter currently
expected from optical clusters) and an LSST-like survey (even for low scatter
values and low correlations). For an SZ survey such as SPT, the impact of
secondary dependence is insignificant if the scatter is 20% or lower but can be
enhanced by the potential high scatter values introduced by a highly correlated
background. Accurate modeling of the assembly bias is necessary for cluster
self-calibration in the era of precision cosmology.
Comment: 13 pages, 5 figures; replaced to match published version.
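For readers unfamiliar with the machinery, a schematic Poisson Fisher-matrix forecast for binned cluster counts is sketched below. This is the standard (unmodified) formalism; the toy mass function and parameters are invented for illustration and do not include the paper's assembly-bias terms:

```python
import numpy as np

def counts(params, ln_mass_bins):
    """Expected cluster counts per bin for toy parameters (A, alpha):
    a power law with an exponential high-mass cutoff (illustrative only)."""
    A, alpha = params
    m = np.exp(ln_mass_bins)
    return A * m ** (-alpha) * np.exp(-m / 5.0)

def fisher_matrix(params, ln_mass_bins, eps=1e-4):
    """F_ij = sum_k (dN_k/dp_i)(dN_k/dp_j) / N_k  (Poisson errors)."""
    p = np.asarray(params, dtype=float)
    N = counts(p, ln_mass_bins)
    derivs = []
    for i in range(len(p)):
        dp = np.zeros_like(p)
        dp[i] = eps
        derivs.append((counts(p + dp, ln_mass_bins)
                       - counts(p - dp, ln_mass_bins)) / (2 * eps))
    D = np.array(derivs)                  # shape (n_params, n_bins)
    return D @ np.diag(1.0 / N) @ D.T

bins = np.linspace(np.log(1.0), np.log(10.0), 20)   # ln(M / 1e14 Msun), assumed binning
F = fisher_matrix([1e4, 1.8], bins)
cov = np.linalg.inv(F)
print("Forecast 1-sigma errors:", np.sqrt(np.diag(cov)))
```

Adding the clustering amplitude, the observable-mass scatter, and the secondary (assembly-bias) dependences enlarges the parameter vector, but the forecast machinery stays the same: invert the Fisher matrix and read marginalized errors off the diagonal of the covariance.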
Planck priors for dark energy surveys
Although cosmic microwave background (CMB) anisotropy data alone cannot
constrain simultaneously the spatial curvature and the equation of state of
dark energy, CMB data provide a valuable addition to other experimental
results. However, computing a full CMB power spectrum with a Boltzmann code is
quite slow; if we want to work with many dark energy and/or modified gravity
models, or to optimize experiments where many different configurations need to
be tested, it is useful to adopt a quicker and more efficient approach.
In this paper we consider the compression of the projected Planck CMB data
into four parameters, R (scaled distance to last scattering surface), l_a
(angular scale of sound horizon at last scattering), Omega_b h^2 (baryon
density fraction) and n_s (power-law index of the primordial matter power spectrum),
all of which can be computed quickly. We show that, although this compression
loses information compared to the full likelihood, such information loss
becomes negligible when more data is added. We also demonstrate that the method
can be used for scalar field dark energy independently of the parametrisation
of the equation of state, and discuss how this method should be used for other
kinds of dark energy models.
Comment: 8 pages, 3 figures, 4 tables.
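Two of the four compressed parameters, R and l_a, depend only on the background expansion and so are quick to compute. A sketch for flat LambdaCDM follows; the values z_* = 1090 and Omega_gamma h^2 = 2.469e-5 are standard but assumed here, and the fiducial cosmology is illustrative, not the paper's:

```python
import numpy as np
from scipy.integrate import quad

c = 299792.458                       # speed of light, km/s
h, Om, Obh2 = 0.70, 0.28, 0.0224     # assumed fiducial cosmology
Ogh2 = 2.469e-5                      # photons, T_CMB = 2.725 K
Orh2 = Ogh2 * (1 + 0.2271 * 3.046)   # photons + massless neutrinos
H0 = 100 * h
Or = Orh2 / h**2
OL = 1.0 - Om - Or                   # flatness
z_star = 1090.0                      # redshift of last scattering (assumed)

def H(z):
    return H0 * np.sqrt(Om * (1 + z)**3 + Or * (1 + z)**4 + OL)

# Comoving distance to last scattering (Mpc).
r_star = quad(lambda z: c / H(z), 0, z_star)[0]

# Sound horizon at last scattering (Mpc): c_s = c / sqrt(3 (1 + R_b)),
# with R_b = 3 rho_b / (4 rho_gamma) = (3 Obh2 / (4 Ogh2)) / (1 + z).
def cs(z):
    Rb = 0.75 * (Obh2 / Ogh2) / (1 + z)
    return c / np.sqrt(3 * (1 + Rb))

r_s = quad(lambda z: cs(z) / H(z), z_star, np.inf)[0]

R = np.sqrt(Om) * H0 * r_star / c    # scaled distance to last scattering
l_a = np.pi * r_star / r_s           # angular scale of the sound horizon
print(f"R = {R:.3f},  l_a = {l_a:.1f}")   # roughly R ~ 1.7, l_a ~ 300
```

Because no Boltzmann code is needed, a likelihood built on (R, l_a, Omega_b h^2, n_s) can be evaluated in milliseconds inside a dark energy parameter scan.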
Where do statistical models come from? Revisiting the problem of specification
R. A. Fisher founded modern statistical inference in 1922 and identified its
fundamental problems to be: specification, estimation and distribution. Since
then the problem of statistical model specification has received scant
attention in the statistics literature. The paper traces the history of
statistical model specification, focusing primarily on pioneers like Fisher,
Neyman, and more recently Lehmann and Cox, and attempts a synthesis of their
views in the context of the Probabilistic Reduction (PR) approach. As argued by
Lehmann [11], a major stumbling block for a general approach to statistical
model specification has been the delineation of the appropriate role for
substantive subject matter information. The PR approach demarcates the
interrelated but complementary roles of substantive and statistical information
summarized ab initio in the form of a structural and a statistical model,
respectively. In an attempt to preserve the integrity of both sources of
information, as well as to ensure the reliability of their fusing, a purely
probabilistic construal of statistical models is advocated. This probabilistic
construal is then used to shed light on a number of issues relating to
specification, including the role of preliminary data analysis, structural vs.
statistical models, model specification vs. model selection, statistical vs.
substantive adequacy and model validation.
Comment: Published at http://dx.doi.org/10.1214/074921706000000419 in the IMS
Lecture Notes--Monograph Series
(http://www.imstat.org/publications/lecnotes.htm) by the Institute of
Mathematical Statistics (http://www.imstat.org).
Deep Fishing: Gradient Features from Deep Nets
Convolutional Networks (ConvNets) have recently improved image recognition
performance thanks to end-to-end learning of deep feed-forward models from raw
pixels. Deep learning is a marked departure from the previous state of the art,
the Fisher Vector (FV), which relied on gradient-based encoding of local
hand-crafted features. In this paper, we discuss a novel connection between
these two approaches. First, we show that one can derive gradient
representations from ConvNets in a similar fashion to the FV. Second, we show
that this gradient representation actually corresponds to a structured matrix
that allows for efficient similarity computation. We experimentally study the
benefits of transferring this representation over the outputs of ConvNet
layers, and find consistent improvements on the Pascal VOC 2007 and 2012
datasets.
Comment: To appear at BMVC 2015.
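A toy illustration of the gradient-feature idea, assuming a linear softmax classifier on top of fixed deep activations rather than the paper's exact ConvNet construction; all dimensions, names, and the surrogate-label choice are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_classes = 128, 10
W = rng.normal(scale=0.1, size=(n_classes, d))   # stand-in "pretrained" weights

def gradient_feature(x):
    """Flattened, L2-normalized gradient of log p(y|x) w.r.t. W."""
    logits = W @ x
    p = np.exp(logits - logits.max())            # stable softmax
    p /= p.sum()
    y = p.argmax()                               # most likely class as surrogate label
    onehot = np.eye(n_classes)[y]
    grad = np.outer(onehot - p, x)               # d log p(y|x) / dW, shape (C, d)
    g = grad.ravel()
    return g / (np.linalg.norm(g) + 1e-12)

# The gradient is a rank-one matrix a x^T with a = onehot - p, so for the
# unnormalized gradients <G1, G2>_F = <a1, a2> * <x1, x2>: similarity can
# be computed without materializing the full C*d vectors, mirroring the
# structured-matrix observation in the abstract.
x1, x2 = rng.normal(size=d), rng.normal(size=d)
g1, g2 = gradient_feature(x1), gradient_feature(x2)
print("similarity:", g1 @ g2)
```

In the Fisher Vector analogy, the classifier plays the role of the generative model and the per-image gradient plays the role of the encoded descriptor; the sketch above only makes that correspondence concrete at toy scale.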