9 research outputs found

    Visualizing and Understanding Sum-Product Networks

    Sum-Product Networks (SPNs) are recently introduced deep, tractable probabilistic models in which several kinds of inference queries can be answered exactly in tractable time. Up to now, they have largely been used as black-box density estimators, assessed only by comparing their likelihood scores. In this paper we explore and exploit the inner representations learned by SPNs. We do this with a threefold aim: first, to gain a better understanding of the inner workings of SPNs; second, to devise additional ways to evaluate one SPN model and compare it against other probabilistic models, providing diagnostic tools for practitioners; and last, to empirically evaluate how good and meaningful the extracted representations are, as in a classic Representation Learning framework. To do so, we revisit their interpretation as deep neural networks and propose to exploit several visualization techniques on their node activations and network outputs under different types of inference queries. To investigate these models as feature extractors, we plug SPNs, learned in a greedy unsupervised fashion on image datasets, into supervised classification tasks. We extract several embedding types from node activations by filtering nodes by their type, by their associated feature-abstraction level, and by their scope. In a thorough empirical comparison we show them to be competitive against embeddings generated by popular feature extractors such as Restricted Boltzmann Machines. Finally, we investigate embeddings generated from random probabilistic marginal queries as a means to compare other tractable probabilistic models on common ground, extending our experiments to Mixtures of Trees.
    Comment: Machine Learning Journal paper (First Online), 24 pages.
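    The embedding-extraction idea in this abstract can be illustrated with a toy example. The sketch below is not the authors' code: it builds a two-component SPN by hand, evaluates it bottom-up, and collects sum-node activations as a feature vector, mirroring the "filter nodes by their type" embedding the abstract describes. All class and function names are hypothetical.

```python
import numpy as np

# Minimal SPN over two binary variables X0, X1: leaves are Bernoulli
# indicators, products combine disjoint scopes, sums mix children
# over the same scope.

class Leaf:
    def __init__(self, var, p):
        self.var, self.p = var, p          # P(X_var = 1) = p
    def value(self, x):
        return self.p if x[self.var] == 1 else 1.0 - self.p

class Product:
    def __init__(self, children):
        self.children = children
    def value(self, x):
        return float(np.prod([c.value(x) for c in self.children]))

class Sum:
    def __init__(self, children, weights):
        self.children, self.weights = children, weights
    def value(self, x):
        return float(sum(w * c.value(x)
                         for w, c in zip(self.weights, self.children)))

def embed(root, x):
    """Evaluate the SPN on x and record every sum-node activation.
    Filtering by node type (here: sum nodes only) is one of the
    embedding types described in the abstract."""
    acts = []
    def ev(node):
        v = node.value(x)  # re-evaluates subtrees; fine for a sketch
        if isinstance(node, Sum):
            acts.append(v)
        for c in getattr(node, "children", []):
            ev(c)
        return v
    ev(root)
    return np.array(acts)

# Two mixture components over {X0, X1}.
spn = Sum(
    [Product([Leaf(0, 0.9), Leaf(1, 0.2)]),
     Product([Leaf(0, 0.1), Leaf(1, 0.8)])],
    [0.6, 0.4],
)
x = np.array([1, 0])
print(embed(spn, x))  # sum-node activations -> feature vector for a classifier
```

    In the paper's setting, the collected activations (optionally filtered by abstraction level or scope as well) would be fed to a standard supervised classifier in place of raw pixels.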

    Fairness in Machine Learning with Tractable Models

    Machine Learning techniques have become pervasive across a range of applications and are now widely used in areas as disparate as recidivism prediction, consumer credit-risk analysis, and insurance pricing. This prevalence has raised concerns about the potential for learned algorithms to become biased against certain groups. Many definitions of fairness have been proposed in the literature, but the fundamental task of reasoning about probabilistic events is a challenging one, owing to the intractability of inference. The focus of this paper is taking steps towards the application of tractable models to fairness. Tractable probabilistic models have emerged that guarantee conditional marginals can be computed in time linear in the size of the model. In particular, we show that sum-product networks (SPNs) enable an effective technique for determining the statistical relationships between protected attributes and other training variables. If a subset of these training variables is found by the SPN to be independent of the protected attribute, then they can be considered 'safe' variables, from which we can train a classification model without concern that the resulting classifier will produce disparate outcomes for different demographic groups. Our initial experiments on the 'German Credit' data set indicate that this preprocessing technique significantly reduces disparate treatment of male and female credit applicants, with only a small reduction in classification accuracy compared to the state of the art. We also motivate the concept of "fairness through percentile equivalence", a new definition predicated on the notion that individuals at the same percentile of their respective distributions should be treated equivalently; this prevents unfair penalisation of individuals who lie at the extremities of their respective distributions.
    Comment: In AAAI Workshop: Statistical Relational Artificial Intelligence (StarAI), 2020. (This is the extended version.)
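    The 'safe variable' filtering step can be sketched in a few lines. The code below uses empirical mutual information as a stand-in for the SPN-based independence test the abstract describes (the paper derives these relationships from a learned SPN instead); the threshold, data, and function names are all illustrative assumptions.

```python
import numpy as np

def mutual_information(x, a, bins=8):
    """Empirical mutual information I(X; A) between a continuous feature
    and a discrete protected attribute -- a stand-in here for the
    SPN-based independence test described in the abstract."""
    x_binned = np.digitize(x, np.histogram_bin_edges(x, bins=bins)[1:-1])
    joint = np.zeros((bins, int(a.max()) + 1))
    for xi, ai in zip(x_binned, a.astype(int)):
        joint[xi, ai] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    pa = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ pa)[nz])).sum())

def safe_variables(X, a, threshold=0.01):
    """Indices of features whose estimated dependence on the protected
    attribute `a` is below `threshold`; only these would be passed to
    the downstream classifier."""
    return [j for j in range(X.shape[1])
            if mutual_information(X[:, j], a) < threshold]

rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=2000)             # protected attribute
X = np.column_stack([rng.normal(size=2000),   # independent of `a`
                     rng.normal(size=2000) + 1.5 * a])  # correlated with `a`
print(safe_variables(X, a))  # expected: [0]
```

    The design choice mirrors the abstract: rather than constraining the classifier itself, dependence on the protected attribute is removed from the feature set before training.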

    ๊นŠ์€ ์‹ ๊ฒฝ๋ง ๊ธฐ๋ฐ˜ ์ผ์ƒ ํ–‰๋™์— ๋Œ€ํ•œ ํ‰์ƒ ํ•™์Šต: ๋“€์–ผ ๋ฉ”๋ชจ๋ฆฌ ์•„ํ‚คํ…์ณ์™€ ์ ์ง„์  ๋ชจ๋ฉ˜ํŠธ ๋งค์นญ

    Doctoral dissertation, Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2018. Advisor: Byoung-Tak Zhang.

    Learning from human behaviors in the real world is imperative for building human-aware intelligent systems. We attempt to train a personalized context recognizer continuously on a wearable device by rapidly adapting deep neural networks to sensor data streams of user behaviors. However, training deep neural networks from a data stream is challenging because learning new data through a neural network often causes loss of previously acquired information, a phenomenon referred to as catastrophic forgetting. The catastrophic forgetting problem has been studied for nearly three decades but remains unsolved, because the mechanism of deep learning is still not sufficiently understood.

    We introduce two methods to deal with catastrophic forgetting in deep neural networks. The first is motivated by complementary learning systems (CLS) theory, which contends that effective lifelong learning from a data stream requires complementary systems, realized in the human brain by the neocortex and the hippocampus. We propose a dual memory architecture (DMA) that trains two learning structures: one gradually acquires structured knowledge representations, while the other rapidly learns the specifics of individual experiences. Online learning is achieved by new techniques such as weight transfer for new deep modules and hypernetworks for fast adaptation. The second method is the incremental moment matching (IMM) algorithm. IMM incrementally matches the moments of the posterior distributions of neural networks trained on the previous and the current task, respectively. To smooth the search space of the posterior parameters, the IMM procedure is complemented by transfer learning techniques, including weight transfer, an L2 penalty between the old and the new parameters, and a variant of dropout using the old parameters.

    To provide insight into why the two proposed lifelong learning methods succeed, we also present two online learning methods for sum-product networks, a kind of deep probabilistic graphical model. We discuss which online learning approaches are valid in probabilistic models and explain how they can be extended to lifelong learning algorithms for deep neural networks.

    We evaluate the proposed DMA and IMM on two types of datasets: artificial benchmarks devised for evaluating lifelong learning performance, and a lifelog dataset collected with Google Glass over 46 days. The experimental results show that our methods outperform comparative models in various experimental settings and that our approaches to overcoming catastrophic forgetting are valuable and promising.

    Table of contents:
    1 Introduction
      1.1 Wearable Devices and Lifelog Dataset
      1.2 Lifelong Learning and Catastrophic Forgetting
      1.3 Approach and Contribution
      1.4 Organization of the Dissertation
    2 Related Works
      2.1 Lifelong Learning
      2.2 Application-driven Lifelong Learning
      2.3 Classical Approach for Preventing Catastrophic Forgetting
      2.4 Learning Parameter Distribution for Preventing Catastrophic Forgetting
        2.4.1 Sequential Bayesian
        2.4.2 Approach to Simulating Parameter Distribution
      2.5 Learning Data Distribution for Preventing Catastrophic Forgetting
    3 Preliminary Study: Online Learning of Sum-Product Networks
      3.1 Introduction
      3.2 Sum-Product Networks
        3.2.1 Representation of Sum-Product Networks
        3.2.2 Structure Learning of Sum-Product Networks
      3.3 Online Incremental Structure Learning of Sum-Product Networks
        3.3.1 Methods
        3.3.2 Experiments
      3.4 Non-Parametric Bayesian Sum-Product Networks
        3.4.1 Model 1: A Prior Distribution for SPN Trees
        3.4.2 Model 2: A Prior Distribution for a Class of dag-SPNs
      3.5 Discussion
        3.5.1 History of Online Learning of Sum-Product Networks
        3.5.2 Toward Lifelong Learning of Deep Neural Networks
      3.6 Summary
    4 Structure Learning for Lifelong Learning: Dual Memory Architecture
      4.1 Introduction
      4.2 Complementary Learning Systems Theory
      4.3 Dual Memory Architectures
      4.4 Online Learning of Multiplicative-Gaussian Hypernetworks
        4.4.1 Multiplicative-Gaussian Hypernetworks
        4.4.2 Evolutionary Structure Learning
        4.4.3 Online Learning on Incremental Features
      4.5 Experiments
        4.5.1 Non-stationary Image Data Stream
        4.5.2 Lifelog Dataset
      4.6 Discussion
        4.6.1 Parameter-Decomposability in Deep Learning
        4.6.2 Online Bayesian Optimization
      4.7 Summary
    5 Sequential Bayesian for Lifelong Learning: Incremental Moment Matching
      5.1 Introduction
      5.2 Incremental Moment Matching
        5.2.1 Mean-based Incremental Moment Matching (mean-IMM)
        5.2.2 Mode-based Incremental Moment Matching (mode-IMM)
      5.3 Transfer Techniques for Incremental Moment Matching
        5.3.1 Weight-Transfer
        5.3.2 L2-transfer
        5.3.3 Drop-transfer
        5.3.4 IMM Procedure
      5.4 Experimental Results
        5.4.1 Disjoint MNIST Experiment
        5.4.2 Shuffled MNIST Experiment
        5.4.3 ImageNet to CUB Dataset
        5.4.4 Lifelog Dataset
      5.5 Discussion
        5.5.1 A Shift of Optimal Hyperparameter via Space Smoothing
        5.5.2 Bayesian Approach on Lifelong Learning
        5.5.3 Balancing the Information of an Old and a New Task
      5.6 Summary
    6 Concluding Remarks
      6.1 Summary of Methods and Contributions
      6.2 Suggestions for Future Research
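    The IMM update summarized in this abstract admits a compact sketch. The snippet below is a minimal NumPy illustration, not the dissertation's implementation: mean-IMM averages per-task weights directly, while mode-IMM weights them by a diagonal Fisher-information approximation of each posterior. The toy weight vectors and Fisher estimates are hypothetical.

```python
import numpy as np

def mean_imm(thetas, alphas):
    """mean-IMM: average the posterior means of the per-task networks,
    theta* = sum_k alpha_k * theta_k."""
    return sum(a * th for a, th in zip(alphas, thetas))

def mode_imm(thetas, fishers, alphas, eps=1e-8):
    """mode-IMM: precision-weighted average, using a diagonal Fisher
    information F_k as a Laplace approximation of each task posterior:
    theta* = (sum_k alpha_k F_k theta_k) / (sum_k alpha_k F_k)."""
    num = sum(a * f * th for a, f, th in zip(alphas, fishers, thetas))
    den = sum(a * f for a, f in zip(alphas, fishers))
    return num / (den + eps)

# Hypothetical flattened weight vectors from task 1 and task 2, with
# per-parameter Fisher estimates (e.g. averaged squared gradients).
theta1, theta2 = np.array([0.5, -1.0]), np.array([0.7, 0.0])
f1, f2 = np.array([4.0, 0.1]), np.array([1.0, 2.0])
print(mean_imm([theta1, theta2], [0.5, 0.5]))      # [ 0.6  -0.5 ]
print(mode_imm([theta1, theta2], [f1, f2], [0.5, 0.5]))
```

    In the mode-IMM output, each merged parameter leans toward whichever task estimated it with higher Fisher information, which is why the transfer techniques the abstract lists (weight transfer, L2-transfer, drop-transfer) matter: they keep the two posteriors close enough for this averaging to land in a good region.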