
    Statistical mechanics of transcription-factor binding site discovery using Hidden Markov Models

    Hidden Markov Models (HMMs) are a commonly used tool for inference of transcription factor (TF) binding sites from DNA sequence data. We exploit the mathematical equivalence between HMMs for TF binding and the "inverse" statistical mechanics of hard rods in a one-dimensional disordered potential to investigate learning in HMMs. We derive analytic expressions for the Fisher information, a commonly employed measure of confidence in learned parameters, in the biologically relevant limit where the density of binding sites is low. We then use techniques from statistical mechanics to derive a scaling principle relating the specificity (binding energy) of a TF to the minimum amount of training data necessary to learn it. Comment: 25 pages, 2 figures, 1 table; v2: typos fixed and new references added
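    To make the Fisher-information idea concrete, here is a minimal Python/NumPy sketch that is not the paper's HMM: it assumes a toy Bernoulli occupancy model in which each candidate site is bound with a Fermi-function probability of its binding energy E, and estimates the Fisher information of E numerically from the curvature of the log-likelihood over simulated data. The occupancy function, the chemical potential mu, and all numerical values are illustrative assumptions.

        import numpy as np

        # Toy occupancy model (illustrative, not the paper's HMM): a site with
        # binding energy E (in kT units) is bound with Fermi-function probability
        # at chemical potential mu.
        def occupancy(E, mu=-8.0):
            return 1.0 / (1.0 + np.exp(E - mu))

        # Log-likelihood of binary binding calls x (one per site) given E.
        def log_likelihood(E, x, mu=-8.0):
            p = occupancy(E, mu)
            return np.sum(x * np.log(p) + (1.0 - x) * np.log(1.0 - p))

        # Fisher information of E: the negative expected curvature of the
        # log-likelihood, estimated by finite differences over simulated data sets.
        def fisher_information(E, n_sites=10000, n_rep=200, h=1e-3, seed=0):
            rng = np.random.default_rng(seed)
            curvatures = []
            for _ in range(n_rep):
                x = (rng.random(n_sites) < occupancy(E)).astype(float)
                d2 = (log_likelihood(E + h, x) - 2.0 * log_likelihood(E, x)
                      + log_likelihood(E - h, x)) / h**2
                curvatures.append(-d2)
            return float(np.mean(curvatures))

        print(fisher_information(E=-5.0))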

    The Effects of Halo Assembly Bias on Self-Calibration in Galaxy Cluster Surveys

    Self-calibration techniques for analyzing galaxy cluster counts utilize the abundance and the clustering amplitude of dark matter halos. These properties simultaneously constrain cosmological parameters and the cluster observable-mass relation. It was recently discovered that the clustering amplitude of halos depends not only on the halo mass but also on various secondary variables, such as the halo formation time and the concentration; these dependences are collectively termed assembly bias. Applying a modified Fisher matrix formalism, we explore whether these secondary variables have a significant impact on the study of dark energy properties using the self-calibration technique in current (SDSS) and near-future (DES, SPT, and LSST) cluster surveys. The impact of the secondary dependence is determined by (1) the scatter in the observable-mass relation and (2) the correlation between the observable and the secondary variables. We find that for optical surveys, the secondary dependence does not significantly influence an SDSS-like survey; however, it may affect a DES-like survey (given the high scatter currently expected from optical clusters) and an LSST-like survey (even for low scatter values and low correlations). For an SZ survey such as SPT, the impact of the secondary dependence is insignificant if the scatter is 20% or lower, but it can be enhanced by the potentially high scatter values introduced by a highly correlated background. Accurate modeling of the assembly bias is necessary for cluster self-calibration in the era of precision cosmology. Comment: 13 pages, 5 figures; replaced to match published version
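    As a point of reference for the forecasting machinery, the following is a minimal, generic Fisher-matrix sketch in Python/NumPy. It is not the paper's cluster-count plus clustering calculation with assembly bias: the model function, the covariance, and the finite-difference step are placeholders supplied by the user.

        import numpy as np

        # Generic Fisher-matrix forecast: F = D^T C^{-1} D, where D holds the
        # derivatives of the observables (e.g. binned cluster counts) with respect
        # to the parameters, evaluated here by central finite differences.
        def fisher_matrix(model, theta0, cov, step=1e-4):
            theta0 = np.asarray(theta0, dtype=float)
            n_obs = len(model(theta0))
            n_par = len(theta0)
            deriv = np.zeros((n_obs, n_par))
            for i in range(n_par):
                dtheta = np.zeros(n_par)
                dtheta[i] = step
                deriv[:, i] = (model(theta0 + dtheta) - model(theta0 - dtheta)) / (2.0 * step)
            cinv = np.linalg.inv(cov)
            return deriv.T @ cinv @ deriv

        # Marginalized 1-sigma errors: square roots of the diagonal of F^{-1}.
        def marginalized_errors(F):
            return np.sqrt(np.diag(np.linalg.inv(F)))

        # Toy usage with two parameters and three observables.
        toy = lambda th: np.array([th[0] + th[1], 2.0 * th[0], th[1]**2 + 1.0])
        F = fisher_matrix(toy, [1.0, 0.5], np.eye(3) * 0.01)
        print(marginalized_errors(F))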

    Planck priors for dark energy surveys

    Although cosmic microwave background (CMB) anisotropy data alone cannot simultaneously constrain the spatial curvature and the equation of state of dark energy, CMB data provide a valuable addition to other experimental results. However, computing a full CMB power spectrum with a Boltzmann code is quite slow; when working with many dark energy and/or modified gravity models, or when optimizing experiments in which many different configurations need to be tested, it is therefore desirable to adopt a quicker and more efficient approach. In this paper we consider the compression of the projected Planck CMB data into four parameters, R (the scaled distance to the last-scattering surface), l_a (the angular scale of the sound horizon at last scattering), Omega_b h^2 (the baryon density fraction) and n_s (the power-law index of the primordial matter power spectrum), all of which can be computed quickly. We show that, although this compression loses information compared to the full likelihood, the information loss becomes negligible when more data are added. We also demonstrate that the method can be used for scalar-field dark energy independently of the parametrisation of the equation of state, and discuss how it should be used for other kinds of dark energy models. Comment: 8 pages, 3 figures, 4 tables
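    A compressed data vector of this kind is straightforward to use in a likelihood code. Below is a hedged Python/NumPy sketch of a compressed Gaussian likelihood in (R, l_a, Omega_b h^2, n_s); the fiducial values and the diagonal covariance are illustrative placeholders, not the Planck priors derived in the paper, which also carry parameter correlations.

        import numpy as np

        # Compressed CMB "prior": a Gaussian in (R, l_a, Omega_b h^2, n_s).
        # The fiducial vector and errors below are placeholders for illustration.
        fiducial = np.array([1.75, 302.0, 0.022, 0.96])
        cov = np.diag([0.02, 0.5, 0.0005, 0.01])**2
        cov_inv = np.linalg.inv(cov)

        def chi2_compressed(R, l_a, omega_b_h2, n_s):
            """Chi-square of a model's (R, l_a, Omega_b h^2, n_s) against the
            compressed data vector; add it to the chi-square of other probes."""
            d = np.array([R, l_a, omega_b_h2, n_s]) - fiducial
            return float(d @ cov_inv @ d)

        print(chi2_compressed(1.74, 301.5, 0.0222, 0.962))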

    Where do statistical models come from? Revisiting the problem of specification

    R. A. Fisher founded modern statistical inference in 1922 and identified its fundamental problems to be: specification, estimation and distribution. Since then the problem of statistical model specification has received scant attention in the statistics literature. The paper traces the history of statistical model specification, focusing primarily on pioneers like Fisher, Neyman, and more recently Lehmann and Cox, and attempts a synthesis of their views in the context of the Probabilistic Reduction (PR) approach. As argued by Lehmann [11], a major stumbling block for a general approach to statistical model specification has been the delineation of the appropriate role for substantive subject matter information. The PR approach demarcates the interrelated but complementary roles of substantive and statistical information, summarized ab initio in the form of a structural and a statistical model, respectively. In an attempt to preserve the integrity of both sources of information, as well as to ensure the reliability of their fusing, a purely probabilistic construal of statistical models is advocated. This probabilistic construal is then used to shed light on a number of issues relating to specification, including the role of preliminary data analysis, structural vs. statistical models, model specification vs. model selection, statistical vs. substantive adequacy, and model validation. Comment: Published at http://dx.doi.org/10.1214/074921706000000419 in the IMS Lecture Notes--Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Deep Fishing: Gradient Features from Deep Nets

    Convolutional Networks (ConvNets) have recently improved image recognition performance thanks to end-to-end learning of deep feed-forward models from raw pixels. Deep learning is a marked departure from the previous state of the art, the Fisher Vector (FV), which relied on gradient-based encoding of local hand-crafted features. In this paper, we discuss a novel connection between these two approaches. First, we show that one can derive gradient representations from ConvNets in a similar fashion to the FV. Second, we show that this gradient representation actually corresponds to a structured matrix that allows for efficient similarity computation. We experimentally study the benefits of transferring this representation over the outputs of ConvNet layers, and find consistent improvements on the Pascal VOC 2007 and 2012 datasets. Comment: To appear at BMVC 2015
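    The gradient-representation idea can be sketched in a few lines of PyTorch. The snippet below is an illustration in the spirit of the paper, not its exact construction: it uses a tiny untrained network as a stand-in for a pretrained ConvNet and takes, as the image descriptor, the L2-normalised gradient of a surrogate log-likelihood with respect to the final layer's weights. The network, the surrogate loss, and all sizes are assumptions made for the sketch.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class TinyConvNet(nn.Module):
            """Stand-in for a pretrained ConvNet (illustrative only)."""
            def __init__(self, n_classes=10):
                super().__init__()
                self.conv = nn.Conv2d(3, 16, 3, padding=1)
                self.fc = nn.Linear(16, n_classes)

            def forward(self, x):
                h = F.relu(self.conv(x))
                h = F.adaptive_avg_pool2d(h, 1).flatten(1)
                return self.fc(h)

        def gradient_feature(net, image):
            """Flattened, L2-normalised gradient of a surrogate log-likelihood
            w.r.t. the last layer's weights, used as an image descriptor."""
            net.zero_grad()
            log_probs = F.log_softmax(net(image.unsqueeze(0)), dim=1)
            # With no label available, use the entropy-style surrogate
            # -sum p * log p (an illustrative choice, not the paper's).
            loss = -(log_probs.exp() * log_probs).sum()
            loss.backward()
            g = net.fc.weight.grad.flatten()
            return g / (g.norm() + 1e-12)

        feat = gradient_feature(TinyConvNet(), torch.randn(3, 32, 32))
        print(feat.shape)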