7,531 research outputs found

    Group invariance principles for causal generative models

    The postulate of independence of cause and mechanism (ICM) has recently led to several new causal discovery algorithms. The interpretation of independence and the way it is utilized, however, varies across these methods. Our aim in this paper is to propose a group theoretic framework for ICM to unify and generalize these approaches. In our setting, the cause-mechanism relationship is assessed by comparing it against a null hypothesis through the application of random generic group transformations. We show that the group theoretic view provides a very general tool to study the structure of data generating mechanisms with direct applications to machine learning. Comment: 16 pages, 6 figure

    Network Inference from Co-Occurrences

    The recovery of network structure from experimental data is a fundamental problem. Unfortunately, experimental data often do not directly reveal structure due to inherent limitations such as imprecision in timing or other observation mechanisms. We consider the problem of inferring network structure in the form of a directed graph from co-occurrence observations. Each observation arises from a transmission made over the network and indicates which vertices carry the transmission without explicitly conveying their order in the path. Without order information, there are exponentially many feasible graphs that agree with the observed data equally well. Yet the basic physical principles underlying most networks strongly suggest that not all feasible graphs are equally likely. In particular, vertices that co-occur in many observations are probably closely connected. Previous approaches to this problem are based on ad hoc heuristics. We model the experimental observations as independent realizations of a random walk on the underlying graph, subjected to a random permutation which accounts for the lack of order information. Treating the permutations as missing data, we derive an exact expectation-maximization (EM) algorithm for estimating the random walk parameters. For long transmission paths the exact E-step may be computationally intractable, so we also describe an efficient Monte Carlo EM (MCEM) algorithm and derive conditions which ensure convergence of the MCEM algorithm with high probability. Simulations and experiments with Internet measurements demonstrate the promise of this approach. Comment: Submitted to IEEE Transactions on Information Theory. An extended version is available as University of Wisconsin Technical Report ECE-06-
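
To make the "permutations as missing data" idea concrete, here is a minimal sketch of the posterior weighting that an exact E-step would need for a single observation. It is not the authors' algorithm: it assumes a first-order Markov chain with a known initial distribution and transition table, a uniform prior over orderings, and no repeated vertices in a path. The names `ordering_posterior`, `init`, and `trans`, and all the numbers, are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def ordering_posterior(vertices, init, trans):
    """Posterior over orderings of an unordered co-occurrence observation.

    Assumes a uniform prior over permutations, so the posterior of an
    ordering is proportional to its path probability under the chain.
    """
    scores = {}
    for order in permutations(sorted(vertices)):
        # Probability of starting at order[0] and following the path in sequence.
        p = init.get(order[0], 0.0)
        for a, b in zip(order, order[1:]):
            p *= trans.get(a, {}).get(b, 0.0)
        scores[order] = p
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("no ordering is feasible under the given parameters")
    return {order: s / total for order, s in scores.items()}


# Hypothetical parameters for a three-vertex network.
init = {"a": 0.8, "b": 0.1, "c": 0.1}
trans = {
    "a": {"b": 0.9, "c": 0.1},
    "b": {"c": 0.9, "a": 0.1},
    "c": {"a": 0.5, "b": 0.5},
}
posterior = ordering_posterior({"a", "b", "c"}, init, trans)
best = max(posterior, key=posterior.get)
```

The loop visits every permutation of the observed vertices, so its cost grows factorially with path length. That is consistent with the abstract's remark that the exact E-step can become intractable for long paths, which is where a Monte Carlo approximation would come in.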

    Symbolic Partial-Order Execution for Testing Multi-Threaded Programs

    We describe a technique for systematic testing of multi-threaded programs. We combine Quasi-Optimal Partial-Order Reduction, a state-of-the-art technique that tackles path explosion due to interleaving non-determinism, with symbolic execution to handle data non-determinism. Our technique iteratively and exhaustively finds all executions of the program. It represents program executions using partial orders and finds the next execution using an underlying unfolding semantics. We avoid the exploration of redundant program traces using cutoff events. We implemented our technique as an extension of KLEE and evaluated it on a set of large multi-threaded C programs. Our experiments found several previously undiscovered bugs and undefined behaviors in memcached and GNU sort, showing that the new method is capable of finding bugs in industrial-size benchmarks. Comment: Extended version of a paper presented at CAV'2

    netgwas: An R Package for Network-Based Genome-Wide Association Studies

    Graphical models are powerful tools for modeling and making statistical inferences regarding complex associations among variables in multivariate data. In this paper we introduce the R package netgwas, which is designed based on undirected graphical models to accomplish three important and interrelated goals in genetics: constructing linkage maps, reconstructing linkage disequilibrium (LD) networks from multi-loci genotype data, and detecting high-dimensional genotype-phenotype networks. The netgwas package deals with species with any chromosome copy number in a unified way, unlike other software. It implements recent improvements in both linkage map construction (Behrouzi and Wit, 2018) and the reconstruction of conditional independence networks for non-Gaussian continuous data, discrete data, and mixed discrete-and-continuous data (Behrouzi and Wit, 2017). Such datasets, for example genotype data and genotype-phenotype data, routinely occur in genetics and genomics. We demonstrate the value of our package functionality by applying it to various multivariate example datasets taken from the literature. We show, in particular, that our package allows a more realistic analysis of data, as it adjusts for the effect of all other variables while performing pairwise associations. This feature controls for spurious associations between variables that can arise from the classical multiple testing approach. This paper includes a brief overview of the statistical methods which have been implemented in the package. The main body of the paper explains how to use the package. The package uses a parallelization strategy on multi-core processors to speed up computations for large datasets. In addition, it contains several functions for simulation and visualization.
The netgwas package is freely available at https://cran.r-project.org/web/packages/netgwas Comment: 32 pages, 9 figures; due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract appearing here is slightly shorter than that in the PDF fil

    Nonparametric Estimation of Multi-View Latent Variable Models

    Spectral methods have greatly advanced the estimation of latent variable models, generating a sequence of novel and efficient algorithms with strong theoretical guarantees. However, current spectral algorithms are largely restricted to mixtures of discrete or Gaussian distributions. In this paper, we propose a kernel method for learning multi-view latent variable models, allowing each mixture component to be nonparametric. The key idea of the method is to embed the joint distribution of a multi-view latent variable into a reproducing kernel Hilbert space, and then the latent parameters are recovered using a robust tensor power method. We establish that the sample complexity for the proposed method is quadratic in the number of latent components and is a low order polynomial in the other relevant parameters. Thus, our non-parametric tensor approach to learning latent variable models enjoys good sample and computational efficiency. Moreover, the non-parametric tensor power method compares favorably to the EM algorithm and other existing spectral algorithms in our experiments.
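
For readers unfamiliar with the tensor power method mentioned in the abstract, the following is a minimal finite-dimensional sketch of the basic iteration u ← T(I, u, u) / ‖T(I, u, u)‖ on a symmetric third-order tensor with orthonormal components. It is not the paper's kernel-based or robust variant: there is no RKHS embedding, no multiple restarts, and no deflation. The helper names and the toy weights are hypothetical.

```python
import math
import random


def rank_one_sum(weights, vectors):
    """Build T = sum_i w_i * (v_i outer v_i outer v_i) as nested lists."""
    d = len(vectors[0])
    T = [[[0.0] * d for _ in range(d)] for _ in range(d)]
    for w, v in zip(weights, vectors):
        for i in range(d):
            for j in range(d):
                for k in range(d):
                    T[i][j][k] += w * v[i] * v[j] * v[k]
    return T


def contract(T, u):
    """Return the vector T(I, u, u): contract the last two modes with u."""
    d = len(u)
    return [
        sum(T[i][j][k] * u[j] * u[k] for j in range(d) for k in range(d))
        for i in range(d)
    ]


def normalize(x):
    norm = math.sqrt(sum(c * c for c in x))
    return [c / norm for c in x]


def tensor_power_iteration(T, iters=50, seed=0):
    """Run the plain power iteration from one random unit-norm start."""
    rng = random.Random(seed)
    u = normalize([rng.gauss(0.0, 1.0) for _ in range(len(T))])
    for _ in range(iters):
        u = normalize(contract(T, u))
    # Eigenvalue estimate T(u, u, u) at the final iterate.
    lam = sum(a * b for a, b in zip(u, contract(T, u)))
    return lam, u


# Toy tensor with orthonormal components (standard basis) and hypothetical weights.
weights = [3.0, 2.0, 1.0]
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
lam, u = tensor_power_iteration(rank_one_sum(weights, basis))
```

Which component the iteration lands on depends on the random start, so a full decomposition would typically repeat it from several starts and subtract each recovered component before looking for the next.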