54 research outputs found
Parametric inference in the large data limit using maximally informative models
Motivated by data-rich experiments in transcriptional regulation and sensory
neuroscience, we consider the following general problem in statistical
inference. When exposed to a high-dimensional signal S, a system of interest
computes a representation R of that signal which is then observed through a
noisy measurement M. From a large number of signals and measurements, we wish
to infer the "filter" that maps S to R. However, the standard method for
solving such problems, likelihood-based inference, requires perfect a priori
knowledge of the "noise function" mapping R to M. In practice such noise
functions are usually known only approximately, if at all, and using an
incorrect noise function will typically bias the inferred filter. Here we show
that, in the large data limit, this need for a pre-characterized noise function
can be circumvented by searching for filters that instead maximize the mutual
information I[M;R] between observed measurements and predicted representations.
Moreover, if the correct filter lies within the space of filters being
explored, maximizing mutual information becomes equivalent to simultaneously
maximizing every dependence measure that satisfies the Data Processing
Inequality. It is important to note that maximizing mutual information will
typically leave a small number of directions in parameter space unconstrained.
We term these directions "diffeomorphic modes" and present an equation that
allows these modes to be derived systematically. The presence of diffeomorphic
modes reflects a fundamental and nontrivial substructure within parameter
space, one that is obscured by standard likelihood-based inference. Comment: To appear in Neural Computatio
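The idea in the abstract above can be illustrated with a toy sketch: scan candidate filters and keep the one whose predicted representation R shares the most mutual information with the measurements M, without ever telling the fit what the noise function is. Everything here is an assumption made for illustration and is not taken from the paper: the two-dimensional signal, the one-parameter family of linear filters indexed by an angle `theta`, the `tanh`-plus-Gaussian noise function, and the simple rank-binned plug-in estimator.

```python
import math
import random
from collections import Counter


def rank_bins(values, n_bins):
    """Assign each value to one of n_bins equal-population bins by rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y, n_bins=8):
    """Plug-in estimate of I[X;Y] in bits from rank-binned samples."""
    n = len(x)
    bx, by = rank_bins(x, n_bins), rank_bins(y, n_bins)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def simulate(theta_true, n, rng):
    """Draw signals S and measurements M from a hypothetical toy system."""
    signals, measurements = [], []
    for _ in range(n):
        s = (rng.gauss(0, 1), rng.gauss(0, 1))
        r = math.cos(theta_true) * s[0] + math.sin(theta_true) * s[1]
        # The "noise function" R -> M; the fit below never uses this line.
        measurements.append(math.tanh(2 * r) + rng.gauss(0, 0.3))
        signals.append(s)
    return signals, measurements


def apply_filter(theta, signals):
    """Candidate filter: project each signal onto the direction theta."""
    return [math.cos(theta) * s1 + math.sin(theta) * s2 for s1, s2 in signals]


def fit_filter(signals, measurements, n_angles=36):
    """Grid search over theta in [0, pi) for the filter maximizing I[M;R]."""
    angles = [k * math.pi / n_angles for k in range(n_angles)]
    return max(angles, key=lambda t: mutual_information(
        measurements, apply_filter(t, signals)))


rng = random.Random(0)
signals, measurements = simulate(theta_true=0.7, n=4000, rng=rng)
theta_hat = fit_filter(signals, measurements)
print(f"true theta = 0.70, inferred theta = {theta_hat:.2f}")
```

Because the estimator bins by rank, rescaling R by a positive constant leaves the estimate unchanged. In this toy setting that is one simple example of a direction in parameter space that mutual information cannot constrain.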
Equitability, mutual information, and the maximal information coefficient
Reshef et al. recently proposed a new statistical measure, the "maximal
information coefficient" (MIC), for quantifying arbitrary dependencies between
pairs of stochastic quantities. MIC is based on mutual information, a
fundamental quantity in information theory that is widely understood to serve
this need. MIC, however, is not an estimate of mutual information. Indeed, it
was claimed that MIC possesses a desirable mathematical property called
"equitability" that mutual information lacks. This was not proven; instead it
was argued solely through the analysis of simulated data. Here we show that
this claim, in fact, is incorrect. First we offer mathematical proof that no
(non-trivial) dependence measure satisfies the definition of equitability
proposed by Reshef et al. We then propose a self-consistent and more general
definition of equitability that follows naturally from the Data Processing
Inequality. Mutual information satisfies this new definition of equitability
while MIC does not. Finally, we show that the simulation evidence offered by
Reshef et al. was artifactual. We conclude that estimating mutual information
is not only practical for many real-world applications, but also provides a
natural solution to the problem of quantifying associations in large data sets.
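The Data Processing Inequality invoked above can be checked exactly on a small discrete example. The sketch below is an illustration only and does not reproduce the paper's analysis: it builds a chain X -> Y -> Z from two binary symmetric channels (the flip probabilities 0.1 and 0.2 are arbitrary choices) and computes mutual information directly from the joint distributions. It also shows that an invertible relabeling of Y leaves I[X;Y] unchanged.

```python
import math


def mutual_information(joint):
    """Exact I[A;B] in bits for a joint distribution given as {(a, b): prob}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


def push_through(joint, channel):
    """Pass the second variable of a joint through a channel {b: {c: prob}}."""
    out = {}
    for (a, b), p in joint.items():
        for c, q in channel[b].items():
            out[(a, c)] = out.get((a, c), 0.0) + p * q
    return out


def bsc(flip):
    """Binary symmetric channel that flips its input with probability flip."""
    return {0: {0: 1 - flip, 1: flip}, 1: {0: flip, 1: 1 - flip}}


# X is a fair coin, represented as a joint of X with an exact copy of itself.
p_xx = {(x, x): 0.5 for x in (0, 1)}
p_xy = push_through(p_xx, bsc(0.1))   # Y is a noisy copy of X
p_xz = push_through(p_xy, bsc(0.2))   # Z is a noisy copy of Y

# Invertible (deterministic, one-to-one) relabeling of Y.
relabel = {0: {"b": 1.0}, 1: {"a": 1.0}}
p_xy_relabeled = push_through(p_xy, relabel)

print("I[X;Y]          =", mutual_information(p_xy))
print("I[X;Z]          =", mutual_information(p_xz))
print("I[X;relabel(Y)] =", mutual_information(p_xy_relabeled))
```

Further processing of Y can only lose information about X, so I[X;Z] never exceeds I[X;Y], while a one-to-one relabeling loses nothing.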
Rapid and deterministic estimation of probability densities using scale-free field theories
The question of how best to estimate a continuous probability density from
finite data is an intriguing open problem at the interface of statistics and
physics. Previous work has argued that this problem can be addressed in a
natural way using methods from statistical field theory. Here I describe new
results that allow this field-theoretic approach to be rapidly and
deterministically computed in low dimensions, making it practical for use in
day-to-day data analysis. Importantly, this approach does not impose a
privileged length scale for smoothness of the inferred probability density, but
rather learns a natural length scale from the data due to the tradeoff between
goodness-of-fit and an Occam factor. Open source software implementing this
method in one and two dimensions is provided. Comment: 4 pages, 4 figures. Major revision in v3. The "Density Estimation
using Field Theory" (DEFT) software package is available at
https://github.com/jbkinney/13_def
Modeling multi-particle complexes in stochastic chemical systems
Large complexes of classical particles play central roles in biology, in polymer physics, and in other disciplines. However, physics currently lacks mathematical methods for describing such complexes in terms of component particles, interaction energies, and assembly rules. Here we describe a Fock space structure that addresses this need, as well as diagrammatic methods that facilitate the use of this formalism. These methods can dramatically simplify the equations governing both equilibrium and non-equilibrium stochastic chemical systems. A mathematical relationship between the set of all complexes and a list of rules for complex assembly is also identified.
Precision measurement of cis-regulatory energetics in living cells
Gene expression in all organisms is controlled by cooperative interactions between DNA-bound transcription factors (TFs). However, measuring TF-TF interactions that occur at individual cis-regulatory sequences remains difficult. Here we introduce a strategy for precisely measuring the Gibbs free energy of such interactions in living cells. Our strategy uses reporter assays performed on strategically designed cis-regulatory sequences, together with a biophysical modeling approach we call "expression manifolds". We applied this strategy in Escherichia coli to interactions between two paradigmatic TFs: CRP and RNA polymerase (RNAP). Doing so, we consistently obtain measurements precise to ~0.1 kcal/mol. Unexpectedly, CRP-RNAP interactions are seen to deviate in multiple ways from the prior literature. Moreover, the well-known RNAP binding motif is found to be a surprisingly unreliable predictor of RNAP-DNA binding energy. Our strategy is compatible with massively parallel reporter assays in both prokaryotes and eukaryotes, and should thus be highly scalable and broadly applicable.
MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect
Multiplex assays of variant effect (MAVEs) are a family of methods that includes deep mutational scanning experiments on proteins and massively parallel reporter assays on gene regulatory sequences. Despite their increasing popularity, a general strategy for inferring quantitative models of genotype-phenotype maps from MAVE data is lacking. Here we introduce MAVE-NN, a neural-network-based Python package that implements a broadly applicable information-theoretic framework for learning genotype-phenotype maps, including biophysically interpretable models, from MAVE datasets. We demonstrate MAVE-NN in multiple biological contexts, and highlight the ability of our approach to deconvolve mutational effects from otherwise confounding experimental nonlinearities and noise.
Computational reconstitution of spine calcium transients from individual proteins
We have built a stochastic model in the program MCell that simulates Ca^(2+) transients in spines from the principal molecular components believed to control Ca^(2+) entry and exit. Proteins, with their kinetic models, are located within two segments of dendrites containing 88 intact spines, centered in a fully reconstructed 6 × 6 × 5 μm^3 cube of hippocampal neuropil. Protein components include AMPA- and NMDA-type glutamate receptors, L- and R-type voltage-dependent Ca^(2+) channels, Na^+/Ca^(2+) exchangers, plasma membrane Ca^(2+) ATPases, smooth endoplasmic reticulum Ca^(2+) ATPases, immobile Ca^(2+) buffers, and calbindin. Kinetic models for each protein were taken from published studies of the isolated proteins in vitro. For simulation of electrical stimuli, the time course of voltage changes in the dendritic spine was generated with the desired stimulus in the program NEURON. Voltage-dependent parameters were then continuously re-adjusted during simulations in MCell to reproduce the effects of the stimulus. Nine parameters of the model were optimized within realistic experimental limits by a process that compared results of simulations to published data. We find that simulations in the optimized model reproduce the timing and amplitude of Ca^(2+) transients measured experimentally in intact neurons. Thus, we demonstrate that the characteristics of individual isolated proteins determined in vitro can accurately reproduce the dynamics of experimentally measured Ca^(2+) transients in spines. The model will provide a test bed for exploring the roles of additional proteins that regulate Ca^(2+) influx into spines and for studying the behavior of protein targets in the spine that are regulated by Ca^(2+) influx.
An Index for 4 dimensional Super Conformal Theories
We present a trace formula for an index over the spectrum of four dimensional
superconformal field theories on $S^3 \times$ time. Our index receives
contributions from states invariant under at least one supercharge and captures
all information -- that may be obtained purely from group theory -- about
protected short representations in 4 dimensional superconformal field theories.
In the case of the $\mathcal{N}=4$ theory our index is a function of four
continuous variables. We compute it at weak coupling using gauge theory and at
strong coupling by summing over the spectrum of free massless particles in
$AdS_5 \times S^5$ and find perfect agreement at large $N$ and small charges.
Our index does not reproduce the entropy of supersymmetric black holes in
$AdS_5$, but this is not a contradiction, as it differs qualitatively from the
partition function over supersymmetric states of the $\mathcal{N}=4$ theory. We
note that entropy for some small supersymmetric $AdS_5$ black holes may be
reproduced via a D-brane counting involving giant gravitons. For big black
holes we find a qualitative (but not exact) agreement with the naive counting
of BPS states in the free Yang Mills theory. In this paper we also evaluate and
study the partition function over the chiral ring in the $\mathcal{N}=4$ Yang
Mills theory. Comment: harvmac 40+16 pages, v3: references and table of contents added,
typos fixe
Quantum Gravity and Inflation
Using the Ashtekar-Sen variables of loop quantum gravity, a new class of
exact solutions to the equations of quantum cosmology is found for gravity
coupled to a scalar field, that corresponds to inflating universes. The scalar
field, which has an arbitrary potential, is treated as a time variable,
reducing the Hamiltonian constraint to a time-dependent Schroedinger equation.
When reduced to the homogeneous and isotropic case, this is solved exactly by a
set of solutions that extend the Kodama state, taking into account the time
dependence of the vacuum energy. Each quantum state corresponds to a classical
solution of the Hamilton-Jacobi equation. The study of the latter shows
evidence for an attractor, suggesting a universality in the phenomena of
inflation. Finally, wavepackets can be constructed by superposing solutions
with different ratios of kinetic to potential scalar field energy, resolving,
at least in this case, the issue of normalizability of the Kodama state. Comment: 18 Pages, 2 Figures; major corrections to equations but prior results
still hold, updated reference
A direct-to-drive neural data acquisition system
Driven by the increasing channel count of neural probes, considerable effort is being directed toward creating increasingly scalable electrophysiology data acquisition (DAQ) systems. However, all such systems still rely on personal computers for data storage, and thus are limited by the bandwidth and cost of the computers, especially as the scale of recording increases. Here we present a novel architecture in which a digital processor receives data from an analog-to-digital converter and writes that data directly to hard drives, without the need for a personal computer to serve as an intermediary in the DAQ process. This minimalist architecture may support exceptionally high data throughput, without incurring costs to support unnecessary hardware and overhead associated with personal computers, thus facilitating scaling of electrophysiological recording in the future. National Institutes of Health (U.S.) (Grant 1DP1NS087724); National Institutes of Health (U.S.) (Grant 1R01DA029639); National Institutes of Health (U.S.) (Grant 1R01NS067199); National Institutes of Health (U.S.) (Grant 2R44NS070453); National Institutes of Health (U.S.) (Grant R43MH101943); New York Stem Cell Foundation; Paul Allen Foundation; Massachusetts Institute of Technology. Media Laboratory; Google (Firm); United States. Defense Advanced Research Projects Agency (HR0011-14-2-0004); Hertz Foundation (Myhrvold Family Fellowship)