Visualizing and Understanding Sum-Product Networks
Sum-Product Networks (SPNs) are recently introduced deep tractable
probabilistic models that can answer several kinds of inference queries
exactly and in tractable time. Up to now, they have been largely
used as black-box density estimators, assessed only by comparing their
likelihood scores. In this paper we explore and exploit the inner
representations learned by SPNs. We do this with a threefold aim: first we want
to get a better understanding of the inner workings of SPNs; secondly, we seek
additional ways to evaluate one SPN model and compare it against other
probabilistic models, providing diagnostic tools to practitioners; lastly, we
want to empirically evaluate how good and meaningful the extracted
representations are, as in a classic Representation Learning framework. To
do so, we revisit their interpretation as deep neural networks and we
propose to exploit several visualization techniques on their node activations
and network outputs under different types of inference queries. To investigate
these models as feature extractors, we plug some SPNs, learned in a greedy
unsupervised fashion on image datasets, into supervised classification learning
tasks. We extract several embedding types from node activations by filtering
nodes by their type, by their associated feature abstraction level and by their
scope. In a thorough empirical comparison we show them to be competitive
with those generated by popular feature extractors such as Restricted Boltzmann
Machines. Finally, we investigate embeddings generated from random
probabilistic marginal queries as a means to compare other tractable
probabilistic models on a common ground, extending our experiments to Mixtures
of Trees.
Comment: Machine Learning Journal paper (First Online), 24 pages
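To make "exact inference" and "embeddings from node activations" concrete, here is a minimal sketch of a tiny SPN evaluated bottom-up. It assumes Bernoulli leaves and a tree-shaped network; the names `Node`, `evaluate`, `embed` and `leaf` are hypothetical and are not the paper's code. A marginal query is answered by setting the leaves of unobserved variables to 1, and filtering the recorded activations by node type mirrors one of the embedding types described above.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    kind: str                                            # "leaf", "sum" or "product"
    children: List["Node"] = field(default_factory=list)
    weights: List[float] = field(default_factory=list)   # sum nodes only; assumed to sum to 1
    var: Optional[int] = None                            # leaf nodes only: variable index
    p_one: float = 0.5                                   # leaf nodes only: P(X_var = 1), assumed Bernoulli


def evaluate(node: Node, evidence: Dict[int, int], acts: Dict[int, Tuple[str, float]]) -> float:
    """Bottom-up pass; variables missing from `evidence` are marginalized out."""
    key = id(node)
    if key in acts:                                      # reuse values of shared sub-networks
        return acts[key][1]
    if node.kind == "leaf":
        x = evidence.get(node.var)
        # An unobserved leaf sums over both states of its variable, which gives 1.
        val = 1.0 if x is None else (node.p_one if x == 1 else 1.0 - node.p_one)
    elif node.kind == "product":
        val = math.prod(evaluate(c, evidence, acts) for c in node.children)
    else:                                                # sum node: weighted mixture of children
        val = sum(w * evaluate(c, evidence, acts) for w, c in zip(node.weights, node.children))
    acts[key] = (node.kind, val)                         # record the activation in post-order
    return val


def embed(root: Node, evidence: Dict[int, int], kind: str) -> List[float]:
    """Collect the activations of all nodes of one type as a feature vector."""
    acts: Dict[int, Tuple[str, float]] = {}
    evaluate(root, evidence, acts)
    return [val for k, val in acts.values() if k == kind]


def leaf(var: int, p_one: float) -> Node:
    return Node("leaf", var=var, p_one=p_one)


# Hypothetical example: mixture of two fully factorized components over binary X0, X1.
root = Node("sum", weights=[0.6, 0.4], children=[
    Node("product", children=[leaf(0, 0.8), leaf(1, 0.3)]),
    Node("product", children=[leaf(0, 0.1), leaf(1, 0.9)]),
])
```

With this network, `evaluate(root, {0: 1, 1: 1}, {})` should give the joint probability of about 0.18, `evaluate(root, {0: 1}, {})` the marginal P(X0 = 1) of about 0.52, and `embed(root, {0: 1, 1: 1}, "product")` the two product-node activations (about 0.24 and 0.09). Each query costs one pass over the network, which is what makes these queries tractable.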
Practical recommendations for gradient-based training of deep architectures
Learning algorithms related to artificial neural networks and in particular
for Deep Learning may seem to involve many bells and whistles, called
hyper-parameters. This chapter is meant as a practical guide with
recommendations for some of the most commonly used hyper-parameters, in
particular in the context of learning algorithms based on back-propagated
gradient and gradient-based optimization. It also discusses how to deal with
the fact that more interesting results can be obtained when allowing one to
adjust many hyper-parameters. Overall, it describes elements of the practice
used to successfully and efficiently train and debug large-scale and often deep
multi-layer neural networks. It closes with open questions about the training
difficulties observed with deeper architectures.
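As a toy illustration of the kind of hyper-parameters and debugging practice the chapter addresses, the sketch below trains a one-feature linear model with minibatch SGD (learning rate, batch size and number of epochs are the hyper-parameters) and checks the analytic gradient against a central finite difference. The model, data and function names (`loss_and_grad`, `grad_check`, `sgd`) are hypothetical choices for illustration, not the chapter's own recipe.

```python
import random
from typing import List, Tuple

Batch = List[Tuple[float, float]]


def loss_and_grad(w: float, b: float, batch: Batch) -> Tuple[float, float, float]:
    """Mean squared error of y ~ w*x + b on a batch, plus its analytic gradient."""
    n = len(batch)
    loss = gw = gb = 0.0
    for x, y in batch:
        err = w * x + b - y
        loss += err * err / n
        gw += 2 * err * x / n
        gb += 2 * err / n
    return loss, gw, gb


def grad_check(w: float, b: float, batch: Batch, eps: float = 1e-6) -> float:
    """Absolute gap between the analytic dL/dw and a central finite difference."""
    _, gw, _ = loss_and_grad(w, b, batch)
    loss_plus, _, _ = loss_and_grad(w + eps, b, batch)
    loss_minus, _, _ = loss_and_grad(w - eps, b, batch)
    return abs(gw - (loss_plus - loss_minus) / (2 * eps))


def sgd(data: Batch, lr: float = 0.1, batch_size: int = 8,
        epochs: int = 300, seed: int = 0) -> Tuple[float, float]:
    """Plain minibatch SGD; lr, batch_size and epochs are the hyper-parameters."""
    rng = random.Random(seed)
    data = list(data)
    w = b = 0.0
    for _ in range(epochs):
        rng.shuffle(data)                     # new minibatch order every epoch
        for i in range(0, len(data), batch_size):
            _, gw, gb = loss_and_grad(w, b, data[i:i + batch_size])
            w -= lr * gw
            b -= lr * gb
    return w, b


# Hypothetical noiseless data generated from y = 3x + 1 on [0, 1).
data = [(i / 50, 3 * (i / 50) + 1) for i in range(50)]
```

Running `grad_check(0.5, 0.0, data[:8])` before training should return a value near zero, and `sgd(data)` should recover weights close to `(3, 1)`. Raising `lr` well beyond the curvature of the loss would make the same loop diverge, which is one reason the learning rate is usually the first hyper-parameter to tune.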
The Limitations of Optimization from Samples
In this paper we consider the following question: can we optimize objective
functions from the training data we use to learn them? We formalize this
question through a novel framework we call optimization from samples (OPS). In
OPS, we are given sampled values of a function drawn from some distribution and
the objective is to optimize the function under some constraint.
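The OPS setting can be sketched as follows: the algorithm never queries the function on sets of its own choosing and only sees pairs (S, f(S)) with S drawn from a distribution. This is a minimal sketch assuming a uniform distribution over sets of a fixed size and a cardinality constraint; `draw_samples`, `best_sampled_set` and the additive example function are hypothetical, and returning the best observed set is a naive baseline, not an algorithm from the paper.

```python
import random
from typing import Callable, FrozenSet, List, Tuple

Sample = Tuple[FrozenSet[int], float]


def draw_samples(f: Callable[[FrozenSet[int]], float], n: int, size: int,
                 num_samples: int, seed: int = 0) -> List[Sample]:
    """Draw sets uniformly among `size`-element subsets of {0..n-1} and record f(S)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        S = frozenset(rng.sample(range(n), size))
        samples.append((S, f(S)))
    return samples


def best_sampled_set(samples: List[Sample], k: int) -> FrozenSet[int]:
    """Naive baseline: return the highest-valued observed set with |S| <= k."""
    feasible = [(S, v) for S, v in samples if len(S) <= k]
    return max(feasible, key=lambda sv: sv[1])[0]


# Hypothetical additive objective over 10 elements.
weights = [5, 1, 1, 1, 1, 1, 1, 1, 1, 4]


def additive_value(S: FrozenSet[int]) -> float:
    return sum(weights[i] for i in S)
```

For example, `best_sampled_set(draw_samples(additive_value, 10, 3, 200), 3)` returns a 3-element set whose value is the largest among the samples; whether such sample-only strategies can approximate the true optimum is exactly the question the abstract formalizes.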
While there are interesting classes of functions that can be optimized from
samples, our main result is an impossibility. We show that there are classes of
functions which are statistically learnable and optimizable, but for which no
reasonable approximation for optimization from samples is achievable. In
particular, our main result shows that there is no constant factor
approximation for maximizing coverage functions under a cardinality constraint
using polynomially-many samples drawn from any distribution.
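For contrast, a coverage function assigns to a collection of sets the size of their union, and with value-oracle access (evaluating f on any set the algorithm chooses) the classical greedy algorithm achieves a (1 - 1/e) approximation under a cardinality constraint. The sketch below shows that oracle-based greedy; it is not the paper's method, and the names `coverage` and `greedy_max_coverage` and the example sets are hypothetical. The impossibility result above says that replacing these adaptive queries with polynomially many samples loses any constant-factor guarantee.

```python
from typing import Dict, List, Set


def coverage(sets: Dict[int, Set[int]], chosen: List[int]) -> int:
    """f(S): number of distinct elements covered by the chosen sets."""
    covered: Set[int] = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def greedy_max_coverage(sets: Dict[int, Set[int]], k: int) -> List[int]:
    """Pick k sets, each time adding the one with the largest marginal coverage."""
    chosen: List[int] = []
    for _ in range(min(k, len(sets))):
        candidates = [i for i in sets if i not in chosen]
        # Each comparison is a value-oracle query f(chosen + [i]).
        best = max(candidates, key=lambda i: coverage(sets, chosen + [i]))
        chosen.append(best)
    return chosen


# Hypothetical instance: four sets over the elements 1..7.
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6, 7}, 3: {1, 7}}
```

On this instance, `greedy_max_coverage(sets, 2)` first takes set 2 (four elements) and then set 0, covering all seven elements.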
We also show tight approximation guarantees for maximization under a
cardinality constraint of several interesting classes of functions including
unit-demand, additive, and general monotone submodular functions, as well as a
constant factor approximation for monotone submodular functions with bounded
curvature.
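Curvature measures how far a monotone submodular function is from being additive. A common notion is total curvature, c = 1 - min_j [f(N) - f(N \ {j})] / f({j}), which is 0 for additive functions and at most 1; we assume this standard definition is the one intended by "bounded curvature". The sketch below computes it by brute force on a small ground set; `total_curvature` and both example functions are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable


def total_curvature(f: Callable[[FrozenSet[int]], float], ground: Iterable[int]) -> float:
    """c = 1 - min_j [f(N) - f(N minus {j})] / f({j}), over j with f({j}) > 0."""
    N = frozenset(ground)
    f_N = f(N)
    ratios = [(f_N - f(N - {j})) / f(frozenset({j}))
              for j in N if f(frozenset({j})) > 0]
    return 1.0 - min(ratios)


# Hypothetical additive function: every marginal gain equals the singleton value.
weights = {0: 2, 1: 3, 2: 5}


def additive(S: FrozenSet[int]) -> float:
    return sum(weights[i] for i in S)


# Hypothetical coverage function with a fully redundant set (set 1).
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6, 7}, 3: {1, 7}}


def cov(S: FrozenSet[int]) -> float:
    return len(set().union(*(sets[i] for i in S)))
```

Here `total_curvature(additive, weights)` is 0.0, while `total_curvature(cov, sets)` is 1.0 because every element of set 1 is already covered by the other sets; bounded curvature rules out this kind of complete redundancy.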