54,186 research outputs found

    Causal Inference by Stochastic Complexity

    Full text link
    The algorithmic Markov condition states that the most likely causal direction between two random variables X and Y can be identified as that direction with the lowest Kolmogorov complexity. Due to the halting problem, however, this notion is not computable. We hence propose to do causal inference by stochastic complexity. That is, we propose to approximate Kolmogorov complexity via the Minimum Description Length (MDL) principle, using a score that is mini-max optimal with regard to the model class under consideration. This means that even in an adversarial setting, such as when the true distribution is not in this class, we still obtain the optimal encoding for the data relative to the class. We instantiate this framework, which we call CISC, for pairs of univariate discrete variables, using the class of multinomial distributions. Experiments show that CISC is highly accurate on synthetic, benchmark, as well as real-world data, outperforming the state of the art by a margin, and scales extremely well with regard to sample and domain sizes

    Uncertainty Reduction for Stochastic Processes on Complex Networks

    Full text link
    Many real-world systems are characterized by stochastic dynamical rules where a complex network of interactions among individual elements probabilistically determines their state. Even with full knowledge of the network structure and of the stochastic rules, the ability to predict system configurations is generally characterized by a large uncertainty. Selecting a fraction of the nodes and observing their state may help to reduce the uncertainty about the unobserved nodes. However, choosing these points of observation in an optimal way is a highly nontrivial task, depending on the nature of the stochastic process and on the structure of the underlying interaction pattern. In this paper, we introduce a computationally efficient algorithm to determine quasioptimal solutions to the problem. The method leverages network sparsity to reduce computational complexity from exponential to almost quadratic, thus allowing the straightforward application of the method to mid-to-large-size systems. Although the method is exact only for equilibrium stochastic processes defined on trees, it turns out to be effective also for out-of-equilibrium processes on sparse loopy networks.Comment: 5 pages, 2 figures + Supplemental Material. A python implementation of the algorithm is available at https://github.com/filrad/Maximum-Entropy-Samplin

    Parallel implementation of stochastic simulation for large-scale cellular processes

    Get PDF
    Experimental and theoretical studies have shown the importance of stochastic processes in genetic regulatory networks and cellular processes. Cellular networks and genetic circuits often involve small numbers of key proteins such as transcriptional factors and signaling proteins. In recent years stochastic models have been used successfully for studying noise in biological pathways, and stochastic modelling of biological systems has become a very important research field in computational biology. One of the challenge problems in this field is the reduction of the huge computing time in stochastic simulations. Based on the system of the mitogen-activated protein kinase cascade that is activated by epidermal growth factor, this work give a parallel implementation by using OpenMP and parallelism across the simulation. Special attention is paid to the independence of the generated random numbers in parallel computing, that is a key criterion for the success of stochastic simulations. Numerical results indicate that parallel computers can be used as an efficient tool for simulating the dynamics of large-scale genetic regulatory networks and cellular processes

    Exact ICL maximization in a non-stationary temporal extension of the stochastic block model for dynamic networks

    Full text link
    The stochastic block model (SBM) is a flexible probabilistic tool that can be used to model interactions between clusters of nodes in a network. However, it does not account for interactions of time varying intensity between clusters. The extension of the SBM developed in this paper addresses this shortcoming through a temporal partition: assuming interactions between nodes are recorded on fixed-length time intervals, the inference procedure associated with the model we propose allows to cluster simultaneously the nodes of the network and the time intervals. The number of clusters of nodes and of time intervals, as well as the memberships to clusters, are obtained by maximizing an exact integrated complete-data likelihood, relying on a greedy search approach. Experiments on simulated and real data are carried out in order to assess the proposed methodology

    Statistical and Computational Tradeoffs in Stochastic Composite Likelihood

    Get PDF
    Maximum likelihood estimators are often of limited practical use due to the intensive computation they require. We propose a family of alternative estimators that maximize a stochastic variation of the composite likelihood function. Each of the estimators resolve the computation-accuracy tradeoff differently, and taken together they span a continuous spectrum of computation-accuracy tradeoff resolutions. We prove the consistency of the estimators, provide formulas for their asymptotic variance, statistical robustness, and computational complexity. We discuss experimental results in the context of Boltzmann machines and conditional random fields. The theoretical and experimental studies demonstrate the effectiveness of the estimators when the computational resources are insufficient. They also demonstrate that in some cases reduced computational complexity is associated with robustness thereby increasing statistical accuracy.Comment: 30 pages, 97 figures, 2 author
    • …
    corecore