
    Parametric inference in the large data limit using maximally informative models

    Motivated by data-rich experiments in transcriptional regulation and sensory neuroscience, we consider the following general problem in statistical inference. When exposed to a high-dimensional signal S, a system of interest computes a representation R of that signal which is then observed through a noisy measurement M. From a large number of signals and measurements, we wish to infer the "filter" that maps S to R. However, the standard method for solving such problems, likelihood-based inference, requires perfect a priori knowledge of the "noise function" mapping R to M. In practice such noise functions are usually known only approximately, if at all, and using an incorrect noise function will typically bias the inferred filter. Here we show that, in the large data limit, this need for a pre-characterized noise function can be circumvented by searching for filters that instead maximize the mutual information I[M;R] between observed measurements and predicted representations. Moreover, if the correct filter lies within the space of filters being explored, maximizing mutual information becomes equivalent to simultaneously maximizing every dependence measure that satisfies the Data Processing Inequality. It is important to note that maximizing mutual information will typically leave a small number of directions in parameter space unconstrained. We term these directions "diffeomorphic modes" and present an equation that allows these modes to be derived systematically. The presence of diffeomorphic modes reflects a fundamental and nontrivial substructure within parameter space, one that is obscured by standard likelihood-based inference.
    Comment: To appear in Neural Computation.
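    For reference, the mutual information I[M;R] named in this abstract has the standard definition below. The density notation p(M,R), p(M), and p(R) is an assumption introduced here for illustration and does not appear in the abstract; for discrete variables the integral becomes a sum.

```latex
I[M;R] = \int dM \, dR \; p(M,R) \, \log \frac{p(M,R)}{p(M)\,p(R)}
```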

    Equitability, mutual information, and the maximal information coefficient

    Reshef et al. recently proposed a new statistical measure, the "maximal information coefficient" (MIC), for quantifying arbitrary dependencies between pairs of stochastic quantities. MIC is based on mutual information, a fundamental quantity in information theory that is widely understood to serve this need. MIC, however, is not an estimate of mutual information. Indeed, it was claimed that MIC possesses a desirable mathematical property called "equitability" that mutual information lacks. This was not proven; instead it was argued solely through the analysis of simulated data. Here we show that this claim, in fact, is incorrect. First, we offer a mathematical proof that no (non-trivial) dependence measure satisfies the definition of equitability proposed by Reshef et al. We then propose a self-consistent and more general definition of equitability that follows naturally from the Data Processing Inequality. Mutual information satisfies this new definition of equitability while MIC does not. Finally, we show that the simulation evidence offered by Reshef et al. was artifactual. We conclude that estimating mutual information is not only practical for many real-world applications, but also provides a natural solution to the problem of quantifying associations in large data sets.
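    As a rough illustration of the Data Processing Inequality that this abstract relies on, the sketch below computes a plug-in (empirical-frequency) mutual information for discrete samples and checks that coarsening one variable cannot increase it. This is not the estimator used by the authors; the function name `mutual_information` and the toy data are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Plug-in mutual information, in bits, of a list of discrete (x, y) pairs.

    Hypothetical helper for illustration: probabilities are replaced by
    empirical frequencies, which is only reasonable when every cell is
    well sampled.
    """
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # Sum over observed cells of p(x,y) * log2( p(x,y) / (p(x) p(y)) ).
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# Toy data (assumed): x is uniform on 0..7 and y = x mod 4.
fine = [(x, x % 4) for x in range(8)]
# Post-process y with a deterministic map; by the Data Processing
# Inequality this cannot increase the information about x.
coarse = [(x, y // 2) for x, y in fine]
# Two independent fair bits carry no information about each other.
independent = [(a, b) for a in range(2) for b in range(2)]

print(mutual_information(fine), mutual_information(coarse))  # prints 2.0 1.0
```

    Here y takes four equally likely values determined by x, giving 2 bits, and merging those values into two groups leaves 1 bit, consistent with the inequality.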

    The Social Media Decision: Why Some Terrorist Organizations Choose to Build and Utilize a Social Media Presence and Others Do Not

    Traditionally, radicalization was accomplished through isolation from society, in a small, personal, hands-on setting. But today, some groups have embraced social media platforms to reach and radicalize new supporters and recruits. This modern tool is an opportunity to reach more individuals in a manner that is less costly, easier, and less time-consuming. This has opened the process, allowing for both “direct recruiting”, targeting selected individuals, and “indirect recruiting”, loading material online and allowing it to spread to cause a form of self-radicalization in those who encounter it. This would seem beneficial for all rebel groups. If you successfully recruit even a few fighters or supporters through a low-cost process, it would be worthwhile to take advantage of that. Yet, we don’t see this occurring at the rate one would expect from such a low-cost, high-benefit approach. Certain groups, such as the Islamic State, have built an online presence and embraced social media, but others choose not to, or maintain a low presence. This presents a puzzle: which groups choose to create and use social media as a radicalization and recruitment tool and which do not? What factors or characteristics determine this decision?
    This research project investigates the usage of social media as a strategic tactic by terrorist groups. We see certain groups embrace the tactic, to varying degrees, and others do not. So what types of groups choose to utilize this tool? Is there some component or trait of groups that can explain and/or predict the choice to create a social media presence? And if they have a presence, are there components that explain and predict their level of social media activity?
    I propose that several factors play important roles in this decision: ideological identity, recruitment opportunities through alliance networks, and competition or outbidding behavior spurred by the existence of rival groups. Testable hypotheses are derived from these factors and are tested on a dataset of 25 organizations from 2006 to 2016, then through three case studies on individual groups. The analysis reveals statistically significant support for all three variables and their relationship to social media engagement.

    Rapid and deterministic estimation of probability densities using scale-free field theories

    The question of how best to estimate a continuous probability density from finite data is an intriguing open problem at the interface of statistics and physics. Previous work has argued that this problem can be addressed in a natural way using methods from statistical field theory. Here I describe new results that allow this field-theoretic approach to be rapidly and deterministically computed in low dimensions, making it practical for use in day-to-day data analysis. Importantly, this approach does not impose a privileged length scale for smoothness of the inferred probability density, but rather learns a natural length scale from the data due to the tradeoff between goodness-of-fit and an Occam factor. Open source software implementing this method in one and two dimensions is provided.
    Comment: 4 pages, 4 figures. Major revision in v3. The "Density Estimation using Field Theory" (DEFT) software package is available at https://github.com/jbkinney/13_def

    Modeling multi-particle complexes in stochastic chemical systems

    Large complexes of classical particles play central roles in biology, in polymer physics, and in other disciplines. However, physics currently lacks mathematical methods for describing such complexes in terms of component particles, interaction energies, and assembly rules. Here we describe a Fock space structure that addresses this need, as well as diagrammatic methods that facilitate the use of this formalism. These methods can dramatically simplify the equations governing both equilibrium and non-equilibrium stochastic chemical systems. A mathematical relationship between the set of all complexes and a list of rules for complex assembly is also identified.

    Biophysical models of cis-regulation as interpretable neural networks

    The adoption of deep learning techniques in genomics has been hindered by the difficulty of mechanistically interpreting the models that these techniques produce. In recent years, a variety of post-hoc attribution methods have been proposed for addressing this neural network interpretability problem in the context of gene regulation. Here we describe a complementary way of approaching this problem. Our strategy is based on the observation that two large classes of biophysical models of cis-regulatory mechanisms can be expressed as deep neural networks in which nodes and weights have explicit physiochemical interpretations. We also demonstrate how such biophysical networks can be rapidly inferred, using modern deep learning frameworks, from the data produced by certain types of massively parallel reporter assays (MPRAs). These results suggest a scalable strategy for using MPRAs to systematically characterize the biophysical basis of gene regulation in a wide range of biological contexts. They also highlight gene regulation as a promising venue for the development of scientifically interpretable approaches to deep learning.

    MPAthic: Quantitative Modeling of Sequence-Function Relationships for massively parallel assays

    Massively parallel assays (MPAs) are being rapidly adopted for studying a wide range of DNA, RNA, and protein sequence-function relationships. However, the software available for quantitatively modeling these relationships is severely limited. Here we describe MPAthic, a software package that enables the rapid inference of such models from a variety of MPA datasets. Using both simulated and previously published data, we show that the modeling capabilities of MPAthic greatly improve on those of existing software. In particular, only MPAthic can accurately quantify the strength of epistatic interactions. These capabilities address a major need in the analysis of MPA data.

    VolRoverN: Enhancing Surface and Volumetric Reconstruction for Realistic Dynamical Simulation of Cellular and Subcellular Function

    Establishing meaningful relationships between cellular structure and function requires accurate morphological reconstructions. In particular, there is an unmet need for high quality surface reconstructions to model subcellular and synaptic interactions among neurons and glia at nanometer resolution. We address this need with VolRoverN, a software package that produces accurate, efficient, and automated 3D surface reconstructions from stacked 2D contour tracings. While many techniques and tools have been developed in the past for 3D visualization of cellular structure, the reconstructions from VolRoverN meet specific quality criteria that are important for dynamical simulations. These criteria include manifoldness, water-tightness, lack of self- and object-object-intersections, and geometric accuracy. These enhanced surface reconstructions are readily extensible to any cell type and are used here on spiny dendrites with complex morphology and axons from mature rat hippocampal area CA1. Both spatially realistic surface reconstructions and reduced skeletonizations are produced and formatted by VolRoverN for easy input into analysis software packages for neurophysiological simulations at multiple spatial and temporal scales ranging from ion electro-diffusion to electrical cable models.

    Close-Packed Silicon Microelectrodes for Scalable Spatially Oversampled Neural Recording

    Objective: Neural recording electrodes are important tools for understanding neural codes and brain dynamics. Neural electrodes that are closely packed, such as in tetrodes, enable spatial oversampling of neural activity, which facilitates data analysis. Here we present the design and implementation of close-packed silicon microelectrodes to enable spatially oversampled recording of neural activity in a scalable fashion. Methods: Our probes are fabricated in a hybrid lithography process, resulting in a dense array of recording sites connected to submicron dimension wiring. Results: We demonstrate an implementation of a probe comprising 1000 electrode pads, each 9 × 9 μm, at a pitch of 11 μm. We introduce design automation and packaging methods that allow us to readily create a large variety of different designs. Significance: We perform neural recordings with such probes in the live mammalian brain that illustrate the spatial oversampling potential of closely packed electrode sites.
    Funding: Massachusetts Institute of Technology. Simons Center for the Social Brain; National Institutes of Health (U.S.) (NIH Director’s Pioneer Award DP1NS087724); National Institutes of Health (U.S.) (NIH Grant R01NS067199); National Institutes of Health (U.S.) (NIH Grant 2R44NS070453-03A1); National Institutes of Health (U.S.) (NIH Grant R01DA029639); National Science Foundation (U.S.) (Cognitive Rhythms Collaborative, NSF DMS 1042134); Institution of Engineering and Technology (IET) (Harvey Prize); New York Stem Cell Foundation; National Institutes of Health (U.S.) (NIH grant CBET 1053233); United States. Defense Advanced Research Projects Agency (DARPA Grant HR0011-14-2-0004); Paul G. Allen Family Foundation