
    Random sum-product networks: A simple and effective approach to probabilistic deep learning

    Sum-product networks (SPNs) are expressive probabilistic models with a rich set of exact and efficient inference routines. However, in order to guarantee exact inference, they require specific structural constraints, which complicate learning SPNs from data. As a result, most SPN structure learners proposed so far are tedious to tune, do not scale easily, and are not easily integrated with deep learning frameworks. In this paper, we follow a simple "deep learning" approach: we generate unspecialized random structures, scalable to millions of parameters, and subsequently apply GPU-based optimization. Somewhat surprisingly, our models often perform on par with state-of-the-art SPN structure learners and deep neural networks on a diverse range of generative and discriminative scenarios. At the same time, our models yield well-calibrated uncertainties, and stand out among most deep generative and discriminative models in being robust to missing features and able to detect anomalies.
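    The recipe above (generate a random structure, then simply optimize its parameters) can be illustrated with a minimal sketch. The tiny one-sum-node structure, shapes, and names below are toy assumptions, not the authors' implementation, which stacks many such regions and trains the parameters on a GPU:

        # Minimal sketch of a random SPN: one sum node mixing K product
        # nodes, each a product of Bernoulli leaves over disjoint scopes.
        # All names and sizes here are toy assumptions for illustration.
        import numpy as np
        from scipy.special import logsumexp

        rng = np.random.default_rng(0)
        n, d, k = 500, 6, 4                          # samples, variables, components
        X = rng.integers(0, 2, size=(n, d))          # toy binary data

        leaf_p = rng.uniform(0.2, 0.8, size=(k, d))  # Bernoulli leaf parameters
        log_w = np.log(np.full(k, 1.0 / k))          # sum-node mixture weights

        def spn_loglik(x, leaf_p, log_w):
            # product nodes: leaves have disjoint scopes, so log-probs add
            comp = (x[:, None, :] * np.log(leaf_p)
                    + (1 - x[:, None, :]) * np.log1p(-leaf_p)).sum(axis=2)
            # sum node: weighted mixture of the product nodes, in log space
            return logsumexp(comp + log_w, axis=1)

        print(spn_loglik(X, leaf_p, log_w).mean())   # average log-likelihood

    In the paper's setting, leaf_p and log_w would be free parameters updated by gradient-based optimization rather than fixed random draws.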

    Learning Logistic Circuits

    This paper proposes a new classification model called logistic circuits. On the MNIST and Fashion datasets, our learning algorithm outperforms neural networks that have an order of magnitude more parameters. Yet, logistic circuits have a distinct origin in symbolic AI, forming a discriminative counterpart to probabilistic-logical circuits such as ACs, SPNs, and PSDDs. We show that parameter learning for logistic circuits is convex optimization, and that a simple local search algorithm can induce strong model structures from data.
    Comment: Published in the Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19).
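    The convexity claim can be made concrete with a small sketch: once the circuit structure is fixed, each input induces a feature vector, and learning the parameters is ordinary logistic regression over those features. The pairwise conjunctions below are a hypothetical stand-in for the real circuit "flow" features:

        # Hedged sketch: logistic-circuit parameter learning as logistic
        # regression over features computed from a fixed structure. Simple
        # conjunctions of input bits stand in for circuit wire flows here.
        import numpy as np

        rng = np.random.default_rng(1)
        X = rng.integers(0, 2, size=(300, 8))
        y = ((X[:, 0] & (1 - X[:, 1])) | X[:, 2]).astype(float)  # toy labels

        # stand-in features: bias, literals, and pairwise conjunctions
        pairs = [X[:, i] * X[:, j] for i in range(8) for j in range(i + 1, 8)]
        Phi = np.column_stack([np.ones(len(X)), X] + pairs).astype(float)

        w = np.zeros(Phi.shape[1])
        for _ in range(2000):                        # descend the convex loss
            p = 1.0 / (1.0 + np.exp(-Phi @ w))       # predicted probabilities
            w -= 0.5 * Phi.T @ (p - y) / len(y)      # cross-entropy gradient

        print(((Phi @ w > 0) == (y > 0.5)).mean())   # training accuracy

    Because the loss is convex in w, any reasonable optimizer reaches the global optimum; the structure search described in the paper is the non-convex part.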

    Directly Learning Tractable Models for Sequential Inference and Decision Making

    Probabilistic graphical models such as Bayesian networks and Markov networks provide a general framework to represent multivariate distributions while exploiting conditional independence. Over the years, many approaches have been proposed to learn the structure of those networks. However, even if the resulting network is small, inference may be intractable (e.g., exponential in the size of the network), and practitioners must often resort to approximate inference techniques. Recent work has focused on the development of alternative graphical models such as arithmetic circuits (ACs) and sum-product networks (SPNs), for which inference is guaranteed to be tractable (e.g., linear in the size of the network for SPNs and ACs). This means that the networks learned from data can be used directly for inference without any further approximation. So far, previous work has focused on learning models that contain only random variables and that assume a fixed number of variables and fixed-length data. In this thesis, I present two new probabilistic graphical models: Dynamic Sum-Product Networks (DynamicSPNs) and Decision Sum-Product-Max Networks (DecisionSPMNs), where the former is suitable for problems with sequence data of varying length and the latter for problems with random, decision, and utility variables. Similar to SPNs and ACs, DynamicSPNs and DecisionSPMNs can be learned directly from data with guaranteed tractable exact inference and decision making in the resulting models. I also present a new online Bayesian discriminative learning algorithm for Selective Sum-Product Networks (SSPNs), a special class of SPNs with no latent variables. This algorithm achieves tractability through a novel idea of mode matching: after processing each training instance, it chooses a tractable distribution that matches the mode of the exact posterior. The approach lends itself naturally to distributed learning, since the data can be divided into subsets from which partial posteriors are computed on different machines and combined into a single posterior.
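    To make the sequence-modeling idea concrete: a DynamicSPN applies the same template network once per time step, so exact inference stays linear in the sequence length, whatever that length is. The sketch below uses a generic HMM-style forward pass as a stand-in for the template (the real models use SPN templates connected through interface nodes); all parameters are toy assumptions:

        # Hedged sketch of "template unrolled to the sequence length":
        # one template application per step, exact inference linear in length.
        import numpy as np
        from scipy.special import logsumexp

        def sequence_loglik(obs, log_pi, log_T, log_E):
            # obs: symbol indices of any length
            alpha = log_pi + log_E[:, obs[0]]
            for o in obs[1:]:
                alpha = logsumexp(alpha[:, None] + log_T, axis=0) + log_E[:, o]
            return logsumexp(alpha)

        pi = np.array([0.6, 0.4])                    # initial distribution
        T = np.array([[0.7, 0.3], [0.2, 0.8]])       # transitions
        E = np.array([[0.9, 0.1], [0.3, 0.7]])       # emissions
        for seq in ([0, 1, 1], [1, 0, 0, 1, 1]):     # varying lengths
            print(sequence_loglik(seq, np.log(pi), np.log(T), np.log(E)))

    The same per-step recursion is what makes varying-length data unproblematic: the model is defined by the template, not by any particular unrolled length.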

    Score Function Features for Discriminative Learning: Matrix and Tensor Framework

    Feature learning forms the cornerstone for tackling challenging learning problems in domains such as speech, computer vision, and natural language processing. In this paper, we consider a novel class of matrix- and tensor-valued features, which can be pre-trained using unlabeled samples. We present efficient algorithms for extracting discriminative information, given these pre-trained features and labeled samples for any related task. Our class of features is based on higher-order score functions, which capture local variations in the probability density function of the input. We establish a theoretical framework to characterize the nature of discriminative information that can be extracted from score-function features when used in conjunction with labeled samples. We employ efficient spectral decomposition algorithms (on matrices and tensors) for extracting discriminative components. The advantage of employing tensor-valued features is that we can extract richer discriminative information in the form of overcomplete representations. Thus, we present a novel framework for employing generative models of the input for discriminative learning.
    Comment: 29 pages.
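    The matrix case of this construction can be reproduced numerically in a few lines. For x drawn from a standard Gaussian, the second-order score function is S2(x) = x x^T - I, and by a Stein-type identity the cross-moment E[y * S2(x)] estimates the expected Hessian of E[y|x], so its spectral decomposition exposes discriminative directions. The single-index labels below are an assumption for illustration:

        # Hedged sketch: second-order score features for x ~ N(0, I) are
        # S2(x) = x x^T - I; the cross-moment E[y * S2(x)] estimates the
        # expected Hessian of E[y|x], and its top eigenvector recovers the
        # discriminative direction u of the (assumed) toy labels below.
        import numpy as np

        rng = np.random.default_rng(2)
        n, d = 20000, 5
        X = rng.standard_normal((n, d))
        u = np.array([1.0, -1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
        y = (X @ u) ** 2 + 0.1 * rng.standard_normal(n)    # depends on u.x only

        S2 = np.einsum('ni,nj->nij', X, X) - np.eye(d)     # score features
        M = np.einsum('n,nij->ij', y, S2) / n              # ~ 2 u u^T here
        evals, evecs = np.linalg.eigh(M)
        top = evecs[:, np.argmax(np.abs(evals))]
        print(abs(top @ u))                                # close to 1

    Higher-order (tensor) score functions extend the same recipe beyond a single matrix, which is where the overcomplete representations mentioned above come from.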