Visualizing and Understanding Sum-Product Networks
Sum-Product Networks (SPNs) are recently introduced deep tractable
probabilistic models in which several kinds of inference queries can be
answered exactly and in tractable time. Up to now, they have been largely
used as black-box density estimators, assessed only by comparing their
likelihood scores. In this paper we explore and exploit the inner
representations learned by SPNs. We do this with a threefold aim: first we want
to get a better understanding of the inner workings of SPNs; secondly, we seek
additional ways to evaluate one SPN model and compare it against other
probabilistic models, providing diagnostic tools to practitioners; lastly, we
want to empirically evaluate how good and meaningful the extracted
representations are, as in a classic Representation Learning framework. In
order to do so we revise their interpretation as deep neural networks and we
propose to exploit several visualization techniques on their node activations
and network outputs under different types of inference queries. To investigate
these models as feature extractors, we plug some SPNs, learned in a greedy
unsupervised fashion on image datasets, in supervised classification learning
tasks. We extract several embedding types from node activations by filtering
nodes by their type, by their associated feature abstraction level and by their
scope. In a thorough empirical comparison we show them to be competitive
with those generated by popular feature extractors such as Restricted
Boltzmann Machines. Finally, we investigate embeddings generated from random
probabilistic marginal queries as a means to compare other tractable
probabilistic models on a common ground, extending our experiments to Mixtures
of Trees.
Comment: Machine Learning Journal paper (First Online), 24 pages
Fairness in Machine Learning with Tractable Models
Machine Learning techniques have become pervasive across a range of different
applications, and are now widely used in areas as disparate as recidivism
prediction, consumer credit-risk analysis and insurance pricing. The prevalence
of machine learning techniques has raised concerns about the potential for
learned algorithms to become biased against certain groups. Many definitions
have been proposed in the literature, but the fundamental task of reasoning
about probabilistic events is a challenging one, owing to the intractability of
inference.
The focus of this paper is taking steps towards the application of tractable
models to fairness. Tractable probabilistic models have emerged that guarantee
that conditional marginals can be computed in time linear in the size of the
model. In particular, we show that sum-product networks (SPNs) enable an
effective technique for determining the statistical relationships between
protected attributes and other training variables. If a subset of these
training variables is found by the SPN to be independent of the protected
attribute, then they can be considered `safe' variables, from which we can
train a classification model without concern that the resulting classifier
will produce disparate outcomes for different demographic groups.
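The `safe'-variable idea can be illustrated with a simple empirical check. This is a hedged sketch, not the paper's SPN-based procedure: it estimates the mutual information between a protected attribute and a candidate feature from samples, where a value near zero flags the feature as a candidate safe variable.

```python
import numpy as np
from collections import Counter

def mutual_information(a, x):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, px = Counter(a), Counter(x)
    pax = Counter(zip(a, x))
    mi = 0.0
    for (va, vx), c in pax.items():
        p_joint = c / n
        # log( p(a,x) / (p(a) p(x)) ), written with counts to avoid extra divisions
        mi += p_joint * np.log(p_joint * n * n / (pa[va] * px[vx]))
    return mi

a = [0, 0, 1, 1] * 50            # illustrative protected attribute
x_safe   = [0, 1, 0, 1] * 50     # independent of a: MI is zero
x_unsafe = a[:]                  # fully determined by a: MI is log 2
```

An SPN makes the analogous test exact rather than sample-based, since the required marginals are computable in time linear in the model size.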
Our initial experiments on the `German Credit' data set indicate that this
processing technique significantly reduces disparate treatment of male and
female credit applicants, with a small reduction in classification accuracy
compared to the state of the art. We also motivate the concept of "fairness
through percentile equivalence", a new definition predicated on the notion that
individuals at the same percentile of their respective distributions should be
treated equivalently; this prevents unfair penalisation of individuals who lie
at the extremities of their respective distributions.
Comment: In AAAI Workshop: Statistical Relational Artificial Intelligence
(StarAI), 2020. (This is the extended version.)
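The percentile-equivalence notion can be sketched as a within-group rank transform. This is a minimal illustration under assumed data (the group names and scores are made up, and the paper may implement the mapping differently): each raw score is replaced by its empirical percentile within its own group, so two individuals at the same percentile of their respective distributions receive the same feature value.

```python
import numpy as np

def percentile_transform(scores):
    """Map each score to its empirical midpoint percentile within its group."""
    scores = np.asarray(scores, dtype=float)
    ranks = scores.argsort().argsort()       # rank 0 .. n-1 within the group
    return (ranks + 0.5) / len(scores)       # midpoint percentiles in (0, 1)

group_a = [300, 500, 700, 900]               # e.g. raw credit scores, group A
group_b = [200, 400, 600, 800]               # raw scores, group B

pa = percentile_transform(group_a)
pb = percentile_transform(group_b)
# The top scorer of each group now receives the same transformed value,
# even though the raw score distributions differ between groups.
```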
Deep Neural Network-Based Lifelong Learning of Daily Activities: Dual Memory Architecture and Incremental Moment Matching
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2018. Advisor: Byoung-Tak Zhang.
Learning from human behaviors in the real world is imperative for building human-aware intelligent systems.
We attempt to train a personalized context recognizer continuously in a wearable device by rapidly adapting deep neural networks from sensor data streams of user behaviors.
However, training deep neural networks on a data stream is challenging because learning new data often destroys previously acquired information, a phenomenon referred to as catastrophic forgetting.
This problem has been studied for nearly three decades but remains unsolved, partly because the mechanisms of deep learning are not yet well understood.
We introduce two methods to deal with the catastrophic forgetting problem in deep neural networks. The first method is motivated by complementary learning systems (CLS) theory, which contends that effective learning from a lifetime of data streams requires complementary systems, analogous to the neocortex and hippocampus in the human brain.
We propose a dual memory architecture (DMA), which trains two learning structures: one gradually acquires structured knowledge representations, and the other rapidly learns the specifics of individual experiences.
The ability of online learning is achieved by new techniques, such as weight transfer for the new deep module and hypernetworks for fast adaptation.
The second method is the incremental moment matching (IMM) algorithm.
IMM incrementally matches the moments of the posterior distributions of neural networks trained on the previous and the current task, respectively.
To smooth the search space of the posterior parameters, the IMM procedure is complemented by various transfer learning techniques, including weight transfer, an L2-norm penalty between the old and new parameters, and a variant of dropout using the old parameters.
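The moment-matching step can be sketched on raw parameter vectors. This is a hedged illustration under a diagonal-Gaussian assumption, with made-up values: mean-IMM averages the per-task posterior means, while mode-IMM weights them by their posterior precisions, approximating the mode of the merged Gaussian posterior.

```python
import numpy as np

def mean_imm(theta1, theta2, alpha=0.5):
    """Average the posterior means of two task-specific networks."""
    return alpha * theta1 + (1.0 - alpha) * theta2

def mode_imm(theta1, theta2, prec1, prec2, alpha=0.5):
    """Precision-weighted merge: the mode of the product of two Gaussians."""
    w1, w2 = alpha * prec1, (1.0 - alpha) * prec2
    return (w1 * theta1 + w2 * theta2) / (w1 + w2)

theta1 = np.array([1.0, 0.0])   # weights after task 1
theta2 = np.array([0.0, 1.0])   # weights after task 2
prec1  = np.array([4.0, 1.0])   # task 1 is confident about weight 0
prec2  = np.array([1.0, 4.0])   # task 2 is confident about weight 1

merged_mean = mean_imm(theta1, theta2)
merged_mode = mode_imm(theta1, theta2, prec1, prec2)
```

Mode-IMM keeps each weight close to the task that is confident about it, which is why smoothing the space between the two solutions (via the transfer techniques above) matters: the merge is only meaningful if the interpolated region has low loss.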
To provide insight into the success of the two proposed lifelong learning methods, we also introduce two online learning methods for sum-product networks, a kind of deep probabilistic graphical model.
We discuss online learning approaches that are valid in probabilistic models and explain how these approaches can be extended to lifelong learning algorithms for deep neural networks.
We evaluate the proposed DMA and IMM on two types of datasets: various artificial benchmarks devised for evaluating lifelong learning performance, and a lifelog dataset collected with Google Glass over 46 days.
The experimental results show that our methods outperform comparative models in various experimental settings and that our attempts at overcoming catastrophic forgetting are valuable and promising.

1 Introduction
1.1 Wearable Devices and Lifelog Dataset
1.2 Lifelong Learning and Catastrophic Forgetting
1.3 Approach and Contribution
1.4 Organization of the Dissertation
2 Related Works
2.1 Lifelong Learning
2.2 Application-driven Lifelong Learning
2.3 Classical Approaches for Preventing Catastrophic Forgetting
2.4 Learning Parameter Distribution for Preventing Catastrophic Forgetting
2.4.1 Sequential Bayesian
2.4.2 Approaches to Simulating Parameter Distribution
2.5 Learning Data Distribution for Preventing Catastrophic Forgetting
3 Preliminary Study: Online Learning of Sum-Product Networks
3.1 Introduction
3.2 Sum-Product Networks
3.2.1 Representation of Sum-Product Networks
3.2.2 Structure Learning of Sum-Product Networks
3.3 Online Incremental Structure Learning of Sum-Product Networks
3.3.1 Methods
3.3.2 Experiments
3.4 Non-Parametric Bayesian Sum-Product Networks
3.4.1 Model 1: A Prior Distribution for SPN Trees
3.4.2 Model 2: A Prior Distribution for a Class of dag-SPNs
3.5 Discussion
3.5.1 History of Online Learning of Sum-Product Networks
3.5.2 Toward Lifelong Learning of Deep Neural Networks
3.6 Summary
4 Structure Learning for Lifelong Learning: Dual Memory Architecture
4.1 Introduction
4.2 Complementary Learning Systems Theory
4.3 Dual Memory Architectures
4.4 Online Learning of Multiplicative-Gaussian Hypernetworks
4.4.1 Multiplicative-Gaussian Hypernetworks
4.4.2 Evolutionary Structure Learning
4.4.3 Online Learning on Incremental Features
4.5 Experiments
4.5.1 Non-stationary Image Data Stream
4.5.2 Lifelog Dataset
4.6 Discussion
4.6.1 Parameter-Decomposability in Deep Learning
4.6.2 Online Bayesian Optimization
4.7 Summary
5 Sequential Bayesian for Lifelong Learning: Incremental Moment Matching
5.1 Introduction
5.2 Incremental Moment Matching
5.2.1 Mean-based Incremental Moment Matching (mean-IMM)
5.2.2 Mode-based Incremental Moment Matching (mode-IMM)
5.3 Transfer Techniques for Incremental Moment Matching
5.3.1 Weight-Transfer
5.3.2 L2-transfer
5.3.3 Drop-transfer
5.3.4 IMM Procedure
5.4 Experimental Results
5.4.1 Disjoint MNIST Experiment
5.4.2 Shuffled MNIST Experiment
5.4.3 ImageNet to CUB Dataset
5.4.4 Lifelog Dataset
5.5 Discussion
5.5.1 A Shift of Optimal Hyperparameter via Space Smoothing
5.5.2 Bayesian Approach on Lifelong Learning
5.5.3 Balancing the Information of an Old and a New Task
5.6 Summary
6 Concluding Remarks
6.1 Summary of Methods and Contributions
6.2 Suggestions for Future Research
Abstract (in Korean)