54 research outputs found

    Parametric inference in the large data limit using maximally informative models

    Motivated by data-rich experiments in transcriptional regulation and sensory neuroscience, we consider the following general problem in statistical inference. When exposed to a high-dimensional signal S, a system of interest computes a representation R of that signal which is then observed through a noisy measurement M. From a large number of signals and measurements, we wish to infer the "filter" that maps S to R. However, the standard method for solving such problems, likelihood-based inference, requires perfect a priori knowledge of the "noise function" mapping R to M. In practice such noise functions are usually known only approximately, if at all, and using an incorrect noise function will typically bias the inferred filter. Here we show that, in the large data limit, this need for a pre-characterized noise function can be circumvented by searching for filters that instead maximize the mutual information I[M;R] between observed measurements and predicted representations. Moreover, if the correct filter lies within the space of filters being explored, maximizing mutual information becomes equivalent to simultaneously maximizing every dependence measure that satisfies the Data Processing Inequality. It is important to note that maximizing mutual information will typically leave a small number of directions in parameter space unconstrained. We term these directions "diffeomorphic modes" and present an equation that allows these modes to be derived systematically. The presence of diffeomorphic modes reflects a fundamental and nontrivial substructure within parameter space, one that is obscured by standard likelihood-based inference. Comment: To appear in Neural Computation.
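
The idea of scoring candidate filters by the mutual information I[M;R] can be illustrated with a small simulation. The sketch below is not the authors' method or code: it assumes a linear filter, a `tanh` noise function with additive Gaussian noise, and a simple rank-binned plug-in estimator of mutual information. All names (`rank_bins`, `mutual_information`, `apply_filter`, `theta_true`, and so on) are hypothetical. It compares the true filter with a wrong one, and also with a rescaled copy of the true filter, to show one direction in parameter space that the information score cannot constrain.

```python
import math
import random


def rank_bins(values, n_bins):
    """Assign each value to one of n_bins equal-occupancy bins by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // n
    return bins


def mutual_information(x, y, n_bins=8):
    """Plug-in estimate of I[X;Y] in bits from rank-binned samples."""
    n = len(x)
    bx, by = rank_bins(x, n_bins), rank_bins(y, n_bins)
    joint = {}
    for a, b in zip(bx, by):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    px = [bx.count(k) / n for k in range(n_bins)]
    py = [by.count(k) / n for k in range(n_bins)]
    return sum(
        (c / n) * math.log2((c / n) / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def apply_filter(theta, signals):
    """Linear filter (an assumption): R is the dot product of theta and S."""
    return [sum(t * s for t, s in zip(theta, sig)) for sig in signals]


rng = random.Random(0)
theta_true = [1.0, -0.5, 0.25]
signals = [[rng.gauss(0, 1) for _ in theta_true] for _ in range(4000)]

# Simulated measurements. The noise function (tanh plus Gaussian noise) is an
# assumption of this sketch and is never used by the scoring step below.
r_true = apply_filter(theta_true, signals)
measurements = [math.tanh(r) + 0.3 * rng.gauss(0, 1) for r in r_true]

theta_wrong = [0.25, 1.0, -0.5]
theta_scaled = [2.0 * t for t in theta_true]  # invertible rescaling of R

mi_true = mutual_information(measurements, apply_filter(theta_true, signals))
mi_wrong = mutual_information(measurements, apply_filter(theta_wrong, signals))
mi_scaled = mutual_information(measurements, apply_filter(theta_scaled, signals))

print(f"I[M;R] true filter:   {mi_true:.3f} bits")
print(f"I[M;R] wrong filter:  {mi_wrong:.3f} bits")
print(f"I[M;R] scaled filter: {mi_scaled:.3f} bits")
```

Because the estimator depends only on ranks, rescaling the filter by a positive constant leaves the score unchanged. This is a simple instance of a parameter direction that mutual information leaves unconstrained, in the spirit of the "diffeomorphic modes" described above.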

    Equitability, mutual information, and the maximal information coefficient

    Reshef et al. recently proposed a new statistical measure, the "maximal information coefficient" (MIC), for quantifying arbitrary dependencies between pairs of stochastic quantities. MIC is based on mutual information, a fundamental quantity in information theory that is widely understood to serve this need. MIC, however, is not an estimate of mutual information. Indeed, it was claimed that MIC possesses a desirable mathematical property called "equitability" that mutual information lacks. This was not proven; instead it was argued solely through the analysis of simulated data. Here we show that this claim, in fact, is incorrect. First, we offer a mathematical proof that no (non-trivial) dependence measure satisfies the definition of equitability proposed by Reshef et al. We then propose a self-consistent and more general definition of equitability that follows naturally from the Data Processing Inequality. Mutual information satisfies this new definition of equitability while MIC does not. Finally, we show that the simulation evidence offered by Reshef et al. was artifactual. We conclude that estimating mutual information is not only practical for many real-world applications, but also provides a natural solution to the problem of quantifying associations in large data sets.
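
The Data Processing Inequality mentioned above states that for a Markov chain X → Y → Z, I[X;Z] ≤ I[X;Y]. A minimal worked example, assuming a uniform input bit passed through two binary symmetric channels in series, is sketched below. The flip probabilities and all function names are illustrative choices, not taken from the paper.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a biased coin with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_information(flip):
    """I[input; output] in bits for a uniform bit sent through a binary
    symmetric channel that flips its input with probability `flip`."""
    return 1.0 - binary_entropy(flip)


# Markov chain X -> Y -> Z built from two channels in series (assumed values).
flip_xy = 0.1
flip_yz = 0.2
# X and Z differ when exactly one of the two channels flips the bit.
flip_xz = flip_xy * (1 - flip_yz) + (1 - flip_xy) * flip_yz

i_xy = bsc_information(flip_xy)
i_xz = bsc_information(flip_xz)

print(f"I[X;Y] = {i_xy:.3f} bits")
print(f"I[X;Z] = {i_xz:.3f} bits")
```

The second channel can only discard information about X, so the printed I[X;Z] does not exceed I[X;Y].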

    Rapid and deterministic estimation of probability densities using scale-free field theories

    The question of how best to estimate a continuous probability density from finite data is an intriguing open problem at the interface of statistics and physics. Previous work has argued that this problem can be addressed in a natural way using methods from statistical field theory. Here I describe new results that allow this field-theoretic approach to be rapidly and deterministically computed in low dimensions, making it practical for use in day-to-day data analysis. Importantly, this approach does not impose a privileged length scale for smoothness of the inferred probability density, but rather learns a natural length scale from the data due to the tradeoff between goodness-of-fit and an Occam factor. Open source software implementing this method in one and two dimensions is provided. Comment: 4 pages, 4 figures. Major revision in v3. The "Density Estimation using Field Theory" (DEFT) software package is available at https://github.com/jbkinney/13_def

    Modeling multi-particle complexes in stochastic chemical systems

    Large complexes of classical particles play central roles in biology, in polymer physics, and in other disciplines. However, physics currently lacks mathematical methods for describing such complexes in terms of component particles, interaction energies, and assembly rules. Here we describe a Fock space structure that addresses this need, as well as diagrammatic methods that facilitate the use of this formalism. These methods can dramatically simplify the equations governing both equilibrium and non-equilibrium stochastic chemical systems. A mathematical relationship between the set of all complexes and a list of rules for complex assembly is also identified.

    Precision measurement of cis-regulatory energetics in living cells

    Gene expression in all organisms is controlled by cooperative interactions between DNA-bound transcription factors (TFs). However, measuring TF-TF interactions that occur at individual cis-regulatory sequences remains difficult. Here we introduce a strategy for precisely measuring the Gibbs free energy of such interactions in living cells. Our strategy uses reporter assays performed on strategically designed cis-regulatory sequences, together with a biophysical modeling approach we call "expression manifolds". We applied this strategy in Escherichia coli to interactions between two paradigmatic TFs: CRP and RNA polymerase (RNAP). Doing so, we consistently obtain measurements precise to ~0.1 kcal/mol. Unexpectedly, CRP-RNAP interactions are seen to deviate in multiple ways from the prior literature. Moreover, the well-known RNAP binding motif is found to be a surprisingly unreliable predictor of RNAP-DNA binding energy. Our strategy is compatible with massively parallel reporter assays in both prokaryotes and eukaryotes, and should thus be highly scalable and broadly applicable.

    MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect

    Multiplex assays of variant effect (MAVEs) are a family of methods that includes deep mutational scanning experiments on proteins and massively parallel reporter assays on gene regulatory sequences. Despite their increasing popularity, a general strategy for inferring quantitative models of genotype-phenotype maps from MAVE data is lacking. Here we introduce MAVE-NN, a neural-network-based Python package that implements a broadly applicable information-theoretic framework for learning genotype-phenotype maps, including biophysically interpretable models, from MAVE datasets. We demonstrate MAVE-NN in multiple biological contexts, and highlight the ability of our approach to deconvolve mutational effects from otherwise confounding experimental nonlinearities and noise.

    Computational reconstitution of spine calcium transients from individual proteins

    We have built a stochastic model in the program MCell that simulates Ca^(2+) transients in spines from the principal molecular components believed to control Ca^(2+) entry and exit. Proteins, with their kinetic models, are located within two segments of dendrites containing 88 intact spines, centered in a fully reconstructed 6 × 6 × 5 μm^3 cube of hippocampal neuropil. Protein components include AMPA- and NMDA-type glutamate receptors, L- and R-type voltage-dependent Ca^(2+) channels, Na^+/Ca^(2+) exchangers, plasma membrane Ca^(2+) ATPases, smooth endoplasmic reticulum Ca^(2+) ATPases, immobile Ca^(2+) buffers, and calbindin. Kinetic models for each protein were taken from published studies of the isolated proteins in vitro. For simulation of electrical stimuli, the time course of voltage changes in the dendritic spine was generated with the desired stimulus in the program NEURON. Voltage-dependent parameters were then continuously re-adjusted during simulations in MCell to reproduce the effects of the stimulus. Nine parameters of the model were optimized within realistic experimental limits by a process that compared results of simulations to published data. We find that simulations in the optimized model reproduce the timing and amplitude of Ca^(2+) transients measured experimentally in intact neurons. Thus, we demonstrate that the characteristics of individual isolated proteins determined in vitro can accurately reproduce the dynamics of experimentally measured Ca^(2+) transients in spines. The model will provide a test bed for exploring the roles of additional proteins that regulate Ca^(2+) influx into spines and for studying the behavior of protein targets in the spine that are regulated by Ca^(2+) influx.

    An Index for 4 dimensional Super Conformal Theories

    We present a trace formula for an index over the spectrum of four dimensional superconformal field theories on $S^3 \times$ time. Our index receives contributions from states invariant under at least one supercharge and captures all information -- that may be obtained purely from group theory -- about protected short representations in 4 dimensional superconformal field theories. In the case of the $\mathcal{N}=4$ theory our index is a function of four continuous variables. We compute it at weak coupling using gauge theory and at strong coupling by summing over the spectrum of free massless particles in $AdS_5 \times S^5$ and find perfect agreement at large $N$ and small charges. Our index does not reproduce the entropy of supersymmetric black holes in $AdS_5$, but this is not a contradiction, as it differs qualitatively from the partition function over supersymmetric states of the $\mathcal{N}=4$ theory. We note that entropy for some small supersymmetric $AdS_5$ black holes may be reproduced via a D-brane counting involving giant gravitons. For big black holes we find a qualitative (but not exact) agreement with the naive counting of BPS states in the free Yang Mills theory. In this paper we also evaluate and study the partition function over the chiral ring in the $\mathcal{N}=4$ Yang Mills theory. Comment: harvmac 40+16 pages, v3: references and table of contents added, typos fixed

    Quantum Gravity and Inflation

    Using the Ashtekar-Sen variables of loop quantum gravity, a new class of exact solutions to the equations of quantum cosmology is found for gravity coupled to a scalar field, corresponding to inflating universes. The scalar field, which has an arbitrary potential, is treated as a time variable, reducing the Hamiltonian constraint to a time-dependent Schroedinger equation. When reduced to the homogeneous and isotropic case, this is solved exactly by a set of solutions that extend the Kodama state, taking into account the time dependence of the vacuum energy. Each quantum state corresponds to a classical solution of the Hamilton-Jacobi equation. The study of the latter shows evidence for an attractor, suggesting a universality in the phenomena of inflation. Finally, wavepackets can be constructed by superposing solutions with different ratios of kinetic to potential scalar field energy, resolving, at least in this case, the issue of normalizability of the Kodama state. Comment: 18 Pages, 2 Figures; major corrections to equations but prior results still hold, updated reference

    A direct-to-drive neural data acquisition system

    Driven by the increasing channel count of neural probes, much effort is being directed toward creating increasingly scalable electrophysiology data acquisition (DAQ) systems. However, all such systems still rely on personal computers for data storage, and thus are limited by the bandwidth and cost of the computers, especially as the scale of recording increases. Here we present a novel architecture in which a digital processor receives data from an analog-to-digital converter and writes that data directly to hard drives, without the need for a personal computer to serve as an intermediary in the DAQ process. This minimalist architecture may support exceptionally high data throughput, without incurring costs to support unnecessary hardware and overhead associated with personal computers, thus facilitating scaling of electrophysiological recording in the future. Funding: National Institutes of Health (U.S.) (Grant 1DP1NS087724); National Institutes of Health (U.S.) (Grant 1R01DA029639); National Institutes of Health (U.S.) (Grant 1R01NS067199); National Institutes of Health (U.S.) (Grant 2R44NS070453); National Institutes of Health (U.S.) (Grant R43MH101943); New York Stem Cell Foundation; Paul Allen Foundation; Massachusetts Institute of Technology. Media Laboratory; Google (Firm); United States. Defense Advanced Research Projects Agency (HR0011-14-2-0004); Hertz Foundation (Myhrvold Family Fellowship)