
    Dirichlet Fragmentation Processes

    Tree structures are ubiquitous in data across many domains, and many datasets are naturally modelled by unobserved tree structures. In this paper, we first review the theory of random fragmentation processes [Bertoin, 2006] and a number of existing methods for modelling trees, including the popular nested Chinese restaurant process (nCRP). We then define a general class of probability distributions over trees, the Dirichlet fragmentation process (DFP), through a novel combination of the theory of Dirichlet processes and random fragmentation processes. The DFP admits a stick-breaking construction and relates to the nCRP in the same way the Dirichlet process relates to the Chinese restaurant process. Furthermore, we develop a novel hierarchical mixture model with the DFP and empirically compare the new model to similar models in machine learning. Experiments show the DFP mixture model to be convincingly better than existing state-of-the-art approaches for hierarchical clustering and density modelling.
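
    As background for the stick-breaking construction mentioned above, here is a minimal sketch of the standard Dirichlet process stick-breaking (GEM) weights; the DFP's own stick-breaking construction over trees is given in the paper, and the function name and concentration value below are illustrative, not taken from it.

    import numpy as np

    def dp_stick_breaking(alpha, num_sticks, rng=None):
        # Truncated stick-breaking weights for a Dirichlet process.
        # alpha: concentration parameter; num_sticks: truncation level.
        rng = np.random.default_rng() if rng is None else rng
        betas = rng.beta(1.0, alpha, size=num_sticks)                  # Beta(1, alpha) draws
        remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
        return betas * remaining                                       # pi_k = beta_k * prod_{j<k} (1 - beta_j)

    weights = dp_stick_breaking(alpha=1.0, num_sticks=100)             # sums to ~1 for large truncations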

    Lifted Variable Elimination: A Novel Operator and Completeness Results

    Various methods for lifted probabilistic inference have been proposed, but our understanding of these methods and the relationships between them is still limited, compared to their propositional counterparts. The only existing theoretical characterization of lifting is for weighted first-order model counting (WFOMC), which was shown to be complete domain-lifted for the class of 2-logvar models. This paper makes two contributions to lifted variable elimination (LVE). First, we introduce a novel inference operator called group inversion. Second, we prove that LVE augmented with this operator is complete in the same sense as WFOMC.
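
    For context, a 2-logvar model is one whose parametric factors mention at most two logical variables; a textbook-style example (our illustration, not from this paper) is the "friends and smokers" parfactor

        \phi\big(\mathrm{Smokes}(X),\ \mathrm{Friends}(X,Y),\ \mathrm{Smokes}(Y)\big), \qquad X, Y \in \mathcal{D},

    which uses exactly two logical variables X and Y ranging over a domain D of people.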

    Probabilistic Software Modeling

    Software engineering and the implementation of software have become challenging tasks, as many tools, frameworks, and languages must be orchestrated into one functioning piece. This complexity increases the need for testing and analysis methodologies that aid developers and engineers as the software grows and evolves. The amount of resources that companies budget for testing and analysis is limited, highlighting the importance of automation for economic software development. We propose Probabilistic Software Modeling, a new paradigm for software modeling that builds on the fact that software is an easy-to-monitor environment from which statistical models can be built. Probabilistic Software Modeling provides increased comprehension for engineers without changing the level of abstraction. The approach relies on the recursive decomposition principle of object-oriented programming to build hierarchies of probabilistic models that are fitted via observations collected at runtime of a software system. This leads to a network of models that mirrors the static structure of the software system while modeling its dynamic runtime behavior. The resulting models can be used in applications such as test-case generation, anomaly and outlier detection, probabilistic program simulation, or state predictions. Ideally, probabilistic software modeling allows the use of the entire spectrum of statistical modeling and inference for software, enabling in-depth analysis and generative procedures for software. Comment: 10 pages, 5 figures, accepted at ISSTA and ECOOP Doctoral Symposium 201
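
    The paper does not prescribe a specific implementation here; as a purely illustrative sketch of fitting statistical models to runtime observations of a program, one might record method return values and fit a simple per-method distribution (all names below are hypothetical):

    import numpy as np
    from collections import defaultdict

    observations = defaultdict(list)            # hypothetical store: method name -> observed return values

    def monitor(method_name):
        # Decorator that records a method's numeric return values at runtime.
        def wrap(fn):
            def inner(*args, **kwargs):
                result = fn(*args, **kwargs)
                observations[method_name].append(result)
                return result
            return inner
        return wrap

    @monitor("discount")
    def discount(price, rate):
        return price * (1.0 - rate)

    for p, r in [(10.0, 0.1), (20.0, 0.25), (15.0, 0.2)]:
        discount(p, r)

    # Fit a trivial Gaussian "model" of each monitored method's observed behaviour.
    models = {name: (np.mean(vals), np.std(vals)) for name, vals in observations.items()}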

    Recurrent Predictive State Policy Networks

    We introduce Recurrent Predictive State Policy (RPSP) networks, a recurrent architecture that brings insights from predictive state representations to reinforcement learning in partially observable environments. Predictive state policy networks consist of a recursive filter, which keeps track of a belief about the state of the environment, and a reactive policy that directly maps beliefs to actions, to maximize the cumulative reward. The recursive filter leverages predictive state representations (PSRs) (Rosencrantz and Gordon, 2004; Sun et al., 2016) by modeling the predictive state, a prediction of the distribution of future observations conditioned on history and future actions. This representation gives rise to a rich class of statistically consistent algorithms (Hefny et al., 2018) to initialize the recursive filter. Predictive state serves as an equivalent representation of a belief state. Therefore, the policy component of the RPSP-network can be purely reactive, simplifying training while still allowing optimal behaviour. Moreover, we use the PSR interpretation during training as well, by incorporating prediction error in the loss function. The entire network (recursive filter and reactive policy) is still differentiable and can be trained using gradient-based methods. We optimize our policy using a combination of policy gradient based on rewards (Williams, 1992) and gradient descent based on prediction error. We show the efficacy of RPSP-networks under partial observability on a set of robotic control tasks from OpenAI Gym. We empirically show that RPSP-networks perform well compared with memory-preserving networks such as GRUs, as well as finite-memory models, and are the overall best-performing method.
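
    A schematic way to write the combined training objective described above (the notation and the weighting are our assumption; the paper's exact loss may differ):

        L(\theta) \;=\; \underbrace{-\,\mathbb{E}\Big[\textstyle\sum_t \log \pi_\theta(a_t \mid b_t)\, \hat{A}_t\Big]}_{\text{policy-gradient term (Williams, 1992)}} \;+\; \alpha \underbrace{\textstyle\sum_t \big\| \hat{o}_{t+1}(b_t, a_t) - o_{t+1} \big\|^2}_{\text{predictive-state prediction error}},

    where b_t is the predictive state maintained by the recursive filter, \hat{A}_t is a reward-based advantage estimate, and \alpha is a trade-off weight.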

    Conditionally Independent Multiresolution Gaussian Processes

    The multiresolution Gaussian process (GP) has gained increasing attention as a viable approach towards improving the quality of approximations in GPs that scale well to large-scale data. Most of the current constructions assume full independence across resolutions. This assumption simplifies the inference, but it underestimates the uncertainties in transitioning from one resolution to another. This in turn results in models which are prone to overfitting, in the sense of excessive sensitivity to the chosen resolution, and predictions which are non-smooth at the boundaries. Our contribution is a new construction which instead assumes conditional independence among GPs across resolutions. We show that relaxing the full independence assumption enables robustness against overfitting, and that it delivers predictions that are smooth at the boundaries. Our new model is compared against the current state of the art on 2 synthetic and 9 real-world datasets. In most cases, our new conditionally independent construction performed favorably when compared against models based on the full independence assumption. In particular, it exhibits little to no sign of overfitting.
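
    A schematic way to state the modelling difference (our notation, not the paper's): with latent GP layers f^{(1)}, ..., f^{(L)} at increasingly fine resolutions, the two constructions factorize the joint prior as

        p\big(f^{(1)}, \dots, f^{(L)}\big) \;=\; \prod_{\ell=1}^{L} p\big(f^{(\ell)}\big) \quad \text{(full independence)}, \qquad p\big(f^{(1)}, \dots, f^{(L)}\big) \;=\; p\big(f^{(1)}\big) \prod_{\ell=2}^{L} p\big(f^{(\ell)} \mid f^{(\ell-1)}\big) \quad \text{(conditional independence across resolutions)},

    so in the second construction the uncertainty of a coarse layer propagates into the finer layers instead of being ignored.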

    YGGDRASIL - A Statistical Package for Learning Split Models

    There are two main objectives of this paper. The first is to present a statistical framework for models with context-specific independence structures, i.e., conditional independences holding only for specific values of the conditioning variables. This framework is constituted by the class of split models. Split models are an extension of graphical models for contingency tables and allow for more sophisticated modelling than graphical models. The treatment of split models includes estimation, representation, and a Markov property for reading off those independencies holding in a specific context. The second objective is to present a software package named YGGDRASIL, which is designed for statistical inference in split models, i.e., for learning such models on the basis of data. Comment: Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI2000).
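
    In the notation assumed here (not necessarily the paper's), a context-specific independence is a statement of the form

        X \perp\!\!\!\perp Y \mid Z, \; W = w_0,

    i.e., X and Y are conditionally independent given Z, but only in the context where W takes the particular value w_0; for other values of W the dependence may remain. Split models are designed to represent and learn exactly this kind of context-dependent structure, which an ordinary graphical model cannot express.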

    Exploiting Uniform Assignments in First-Order MPE

    The MPE (Most Probable Explanation) query plays an important role in probabilistic inference. MPE solution algorithms for probabilistic relational models essentially adapt existing belief assessment methods, replacing summation with maximization. But the rich structure and symmetries captured by relational models, together with the properties of the maximization operator, offer an opportunity for additional simplification with potentially significant computational ramifications. Specifically, these models often have groups of variables that define symmetric distributions over some population of formulas. The maximizing choice for different elements of such a group is the same. If we can realize this ahead of time, we can significantly reduce the size of the model by eliminating a large portion of its random variables. This paper defines the notion of uniformly assigned and partially uniformly assigned sets of variables, shows how one can recognize these sets efficiently, and shows how the model can be greatly simplified once they are recognized, with little computational effort. We demonstrate the effectiveness of these ideas empirically on a number of models. Comment: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012).
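
    The gain from recognizing a uniformly assigned group can be illustrated with a toy identity (our notation, not the paper's): if n interchangeable variables X_1, ..., X_n each appear only in identical copies of a factor \phi, then

        \max_{x_1, \dots, x_n} \prod_{i=1}^{n} \phi(x_i) \;=\; \Big( \max_{x} \phi(x) \Big)^{n},

    so every element of the group takes the same maximizing value and only one representative variable needs to be optimized explicitly.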

    Statistical Inference for the Population Landscape via Moment Adjusted Stochastic Gradients

    Modern statistical inference tasks often require iterative optimization methods to compute the solution. Convergence analysis from an optimization viewpoint only informs us how well the solution is approximated numerically, but overlooks the sampling nature of the data. In contrast, recognizing the randomness in the data, statisticians are keen to provide uncertainty quantification, or confidence, for the solution obtained using iterative optimization methods. This paper makes progress along this direction by introducing moment-adjusted stochastic gradient descent, a new stochastic optimization method for statistical inference. We establish non-asymptotic theory that characterizes the statistical distribution for certain iterative methods with optimization guarantees. On the statistical front, the theory allows for model mis-specification, with very mild conditions on the data. For optimization, the theory is flexible for both convex and non-convex cases. Remarkably, the moment-adjusting idea, motivated by "error standardization" in statistics, achieves a similar effect as acceleration in first-order optimization methods used to fit generalized linear models. We also demonstrate this acceleration effect in the non-convex setting through numerical experiments. Comment: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2019, to appear
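
    As a rough illustration of the error-standardization idea (the specific form below is our assumption; the paper's actual update may differ), a moment-adjusted step rescales the stochastic gradient g_t by an estimate of its second-moment matrix before stepping,

        \theta_{t+1} \;=\; \theta_t \;-\; \eta\, \widehat{V}_t^{-1/2} g_t, \qquad \widehat{V}_t \approx \operatorname{Cov}(g_t),

    in contrast to the plain SGD update \theta_{t+1} = \theta_t - \eta\, g_t.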

    Local Conditioning: Exact Message Passing for Cyclic Undirected Distributed Networks

    This paper addresses practical implementation of summing out, expanding, and reordering of messages in Local Conditioning (LC) for undirected networks. In particular, incoming messages conditioned on potentially different subsets of the receiving node's relevant set must be expanded to be conditioned on this relevant set, then reordered so that corresponding columns of the conditioned matrices can be fused through element-wise multiplication. An outgoing message is then reduced by summing out loop cutset nodes that are upstream of the outgoing edge. The emphasis on implementation is the primary contribution over the theoretical justification of LC given in Fay et al. Nevertheless, the complexity of Local Conditioning in grid networks is still no better than that of Clustering. Comment: This work was presented at the Future Technologies Conference (FTC), Vancouver, Canada, November 201
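
    A minimal numpy sketch of the expand / reorder / fuse / sum-out steps described above, under an assumed representation in which a message has one leading axis for the receiving node's states and one trailing axis per conditioning (cutset) variable; the function and variable names are ours, not the paper's.

    import numpy as np

    def expand(msg, msg_vars, relevant_set, card):
        # Expand a message conditioned on msg_vars so it is conditioned on the full
        # relevant_set, then reorder its conditioning axes into canonical order.
        out, have = msg, list(msg_vars)
        for v in relevant_set:
            if v not in have:
                out = np.repeat(np.expand_dims(out, axis=-1), card[v], axis=-1)
                have.append(v)
        order = [0] + [1 + have.index(v) for v in relevant_set]
        return np.transpose(out, order)

    card = {"c1": 2, "c2": 2}
    m_a = np.random.rand(2, 2)          # message conditioned on cutset node c1
    m_b = np.random.rand(2, 2)          # message conditioned on cutset node c2
    relevant = ["c1", "c2"]

    # Fuse the expanded, reordered messages element-wise, then sum out the
    # upstream cutset node c1 to form the outgoing message (still conditioned on c2).
    fused = expand(m_a, ["c1"], relevant, card) * expand(m_b, ["c2"], relevant, card)
    outgoing = fused.sum(axis=1)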

    A Bayesian Model for Generative Transition-based Dependency Parsing

    We propose a simple, scalable, fully generative model for transition-based dependency parsing with high accuracy. The model, parameterized by Hierarchical Pitman-Yor Processes, overcomes the limitations of previous generative models by allowing fast and accurate inference. We propose an efficient decoding algorithm based on particle filtering that can adapt the beam size to the uncertainty in the model while jointly predicting POS tags and parse trees. The UAS of the parser is on par with that of a greedy discriminative baseline. As a language model, it obtains better perplexity than an n-gram model by performing semi-supervised learning over a large unlabelled corpus. We show that the model is able to generate locally and syntactically coherent sentences, opening the door to further applications in language generation. Comment: Depling 201
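
    A toy sketch of how particle filtering can make a beam adapt to model uncertainty, in the spirit of the decoding algorithm above (the names and the resampling scheme are our illustration, not the paper's algorithm):

    import numpy as np

    def resample_beam(hypotheses, log_weights, num_particles, rng=None):
        # Resample partial parses in proportion to their model probabilities.
        # Peaked weights leave few distinct survivors (a small effective beam);
        # flat weights keep many alternatives alive (a large effective beam).
        rng = np.random.default_rng() if rng is None else rng
        w = np.exp(log_weights - np.max(log_weights))
        w = w / w.sum()
        idx = rng.choice(len(hypotheses), size=num_particles, p=w)
        return [hypotheses[i] for i in idx]

    survivors = resample_beam(["shift", "left-arc", "right-arc", "shift;shift"],
                              np.log([0.7, 0.1, 0.1, 0.1]), num_particles=8)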