Representation Learning: A Review and New Perspectives
The success of machine learning algorithms generally depends on data
representation, and we hypothesize that this is because different
representations can entangle and hide, to varying degrees, the different explanatory
factors of variation behind the data. Although specific domain knowledge can be
used to help design representations, learning with generic priors can also be
used, and the quest for AI is motivating the design of more powerful
representation-learning algorithms implementing such priors. This paper reviews
recent work in the area of unsupervised feature learning and deep learning,
covering advances in probabilistic models, auto-encoders, manifold learning,
and deep networks. This motivates longer-term unanswered questions about the
appropriate objectives for learning good representations, for computing
representations (i.e., inference), and the geometrical connections between
representation learning, density estimation, and manifold learning.
Block-local learning with probabilistic latent representations
The ubiquitous backpropagation algorithm requires sequential updates across
blocks of a network, introducing a locking problem. Moreover, backpropagation
relies on the transpose of weight matrices to calculate updates, introducing a
weight transport problem across blocks. Both these issues prevent efficient
parallelisation and horizontal scaling of models across devices. We propose a
new method that introduces a twin network that propagates information backwards
from the targets to the input to provide auxiliary local losses. Forward and
backward propagation can work in parallel and with different sets of weights,
addressing the problems of weight transport and locking. Our approach derives
from a statistical interpretation of end-to-end training which treats
activations of network layers as parameters of probability distributions. The
resulting learning framework uses these parameters locally to assess the
matching between forward and backward information. Error backpropagation is
then performed locally within each block, leading to 'block-local' learning.
Several previously proposed alternatives to error backpropagation emerge as
special cases of our model. We present results on various tasks and
architectures, including transformers, demonstrating state-of-the-art
performance using block-local learning. These results provide a new principled
framework to train very large networks in a distributed setting and can also be
applied in neuromorphic systems.
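A minimal sketch of the block-local idea described above, not the paper's exact algorithm: the forward network is split into blocks, a separate twin network carries target information backwards with its own weights, and each block is trained only on a local loss, so no gradient and no weight transpose ever crosses a block boundary. All module names, sizes, and the squared-error losses are illustrative assumptions.

```python
import torch
import torch.nn as nn

d_in, d_hid, d_out = 32, 64, 10

# Forward path, split into two blocks that never exchange gradients.
block1 = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU())
block2 = nn.Linear(d_hid, d_out)

# Twin network: carries target information backwards with its own weights,
# so the transpose of the forward weights is never needed (no weight transport).
twin = nn.Linear(d_out, d_hid)

params = list(block1.parameters()) + list(block2.parameters()) + list(twin.parameters())
opt = torch.optim.SGD(params, lr=1e-2)

x = torch.randn(8, d_in)                                   # toy inputs
y = torch.nn.functional.one_hot(
    torch.randint(0, d_out, (8,)), d_out).float()          # toy targets

h1 = block1(x)               # forward through block 1
out = block2(h1.detach())    # detach: no gradient crosses the block boundary
t1 = twin(y)                 # backward pass of the twin: local target for block 1

# Purely local losses: block 2 matches the label, block 1 matches the twin's
# target, and the twin learns to match the forward activation.
loss = ((out - y) ** 2).mean() \
     + ((h1 - t1.detach()) ** 2).mean() \
     + ((t1 - h1.detach()) ** 2).mean()
opt.zero_grad()
loss.backward()              # each gradient stays within its own block
opt.step()
```

Because each loss depends only on activations available at its own block, the blocks and the twin can in principle be updated in parallel on separate devices, which is the point of removing the locking and weight-transport constraints.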
Learning Credal Sum-Product Networks
Probabilistic representations, such as Bayesian and Markov networks, are
fundamental to much of statistical machine learning. Thus, learning
probabilistic representations directly from data is a deep challenge, the main
computational bottleneck being intractable inference. Tractable
learning is a powerful new paradigm that attempts to learn distributions that
support efficient probabilistic querying. By leveraging local structure,
representations such as sum-product networks (SPNs) can capture high tree-width
models with many hidden layers, essentially a deep architecture, while still
admitting a range of probabilistic queries to be computable in time polynomial
in the network size. While this progress is impressive, many data sources
are incomplete, and in the presence of missing data, structure learning methods
nonetheless commit to a single distribution without characterizing the loss in
confidence. In recent work, credal sum-product networks, an imprecise extension
of sum-product networks, were proposed to capture this robustness angle. In
this work, we are interested in how such representations can be learnt and thus
study how the computational machinery underlying tractable learning and
inference can be generalized to imprecise probabilities.
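As a hedged illustration of the tractability claim above, the toy network below (hand-built, not learned, and not the paper's algorithm) answers both full-evidence and marginal queries in a single bottom-up pass, and a crude credal relaxation replaces one sum node's weights with an interval to obtain lower and upper probabilities.

```python
# A tiny SPN over two binary variables X1, X2: a sum node mixing two product
# nodes over disjoint scopes. Any query costs one linear pass over the network.

def leaf(var, value, evidence):
    """Indicator leaf: 1 if consistent with evidence, or if var is marginalized."""
    if evidence[var] is None:          # None = marginalize this variable out
        return 1.0
    return 1.0 if evidence[var] == value else 0.0

def spn(evidence, w=(0.6, 0.4)):
    """A mixture of two fully factorized components."""
    p1 = leaf(0, 1, evidence) * leaf(1, 1, evidence)   # component: X1=1, X2=1
    p2 = leaf(0, 0, evidence) * leaf(1, 0, evidence)   # component: X1=0, X2=0
    return w[0] * p1 + w[1] * p2

print(spn({0: 1, 1: 1}))        # full evidence: P(X1=1, X2=1) = 0.6
print(spn({0: 1, 1: None}))     # marginal: P(X1=1) = 0.6, same single pass

# Credal sum node: the first weight lies in the interval [0.5, 0.7]. Lower and
# upper probabilities come from optimizing over that interval; for a single
# sum node the extrema sit at the interval endpoints.
vals = [spn({0: 1, 1: 1}, w=(a, 1 - a)) for a in (0.5, 0.7)]
print(min(vals), max(vals))     # bounds on P(X1=1, X2=1)
```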
Learning probabilistic neural representations with randomly connected circuits
The brain represents and reasons probabilistically about complex stimuli and motor actions using a noisy, spike-based neural code. A key building block for such neural computations, as well as the basis for supervised and unsupervised learning, is the ability to estimate the surprise or likelihood of incoming high-dimensional neural activity patterns. Despite progress in statistical modeling of neural responses and deep learning, current approaches either do not scale to large neural populations or cannot be implemented using biologically realistic mechanisms. Inspired by the sparse and random connectivity of real neuronal circuits, we present a model for neural codes that accurately estimates the likelihood of individual spiking patterns and has a straightforward, scalable, efficient, learnable, and realistic neural implementation. This model’s performance on simultaneously recorded spiking activity of >100 neurons in the monkey visual and prefrontal cortices is comparable with or better than that of state-of-the-art models. Importantly, the model can be learned from a small number of samples, using a local learning rule that utilizes noise intrinsic to neural circuits. Slower, structural changes in random connectivity, consistent with rewiring and pruning processes, further improve the efficiency and sparseness of the resulting neural representations. Our results merge insights from neuroanatomy, machine learning, and theoretical neuroscience to suggest random sparse connectivity as a key design principle for neuronal computation.
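A rough sketch of the idea, under loud assumptions: fixed sparse random connectivity defines threshold features of the population activity, a learned weight vector over those features scores the (unnormalized) log-likelihood of a spike pattern, and the weights are adjusted with a local, contrastive update. The sampler, sizes, thresholds, and learning rate below are placeholders, not the published model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_proj, sparsity = 20, 100, 0.2

# Fixed, sparse, random connectivity: the structural part of the model.
A = (rng.random((n_proj, n_neurons)) < sparsity).astype(float)
theta = 0.5 * A.sum(axis=1)        # per-projection firing thresholds (placeholder)

def features(x):
    """Binary features: does each sparse random sum cross its threshold?"""
    return (A @ x > theta).astype(float)

def log_likelihood(x, w):
    """Unnormalized log-likelihood of a spike pattern x."""
    return w @ features(x)

# Local, contrastive learning-rule sketch: raise weights on features active
# for data patterns, lower them on features active for model samples.
w = np.zeros(n_proj)
data = (rng.random((500, n_neurons)) < 0.1).astype(float)   # toy spike patterns
for x in data:
    sample = (rng.random(n_neurons) < 0.1).astype(float)    # crude stand-in sampler
    w += 0.01 * (features(x) - features(sample))

print(log_likelihood(data[0], w))
```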
Variational Deep Semantic Hashing for Text Documents
As the amount of textual data has been rapidly increasing over the past
decade, efficient similarity search methods have become a crucial component of
large-scale information retrieval systems. A popular strategy is to represent
original data samples by compact binary codes through hashing. A spectrum of
machine learning methods has been utilized, but they often lack the
expressiveness and modeling flexibility needed to learn effective representations. The
recent advances of deep learning in a wide range of applications have demonstrated its
capability to learn robust and powerful feature representations for complex
data. In particular, deep generative models naturally combine the expressiveness
of probabilistic generative models with the high capacity of deep neural
networks, which is very suitable for text modeling. However, little work has
leveraged the recent progress in deep learning for text hashing.
In this paper, we propose a series of novel deep document generative models
for text hashing. The first proposed model is unsupervised while the second one
is supervised by utilizing document labels/tags for hashing. The third model
further considers document-specific factors that affect the generation of
words. The probabilistic generative formulation of the proposed models provides
a principled framework for model extension, uncertainty estimation, simulation,
and interpretability. Based on variational inference and reparameterization,
the proposed models can be interpreted as encoder-decoder deep neural networks
and thus they are capable of learning complex nonlinear distributed
representations of the original documents. We conduct a comprehensive set of
experiments on four public testbeds. The experimental results have demonstrated
the effectiveness of the proposed supervised learning models for text hashing.
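A minimal sketch of the unsupervised variant as described, under simplifying assumptions: a VAE-style encoder-decoder over bag-of-words document vectors trained with the reparameterization trick, with binary codes obtained by thresholding the latent means at their per-dimension medians. The dimensions, the multinomial reconstruction term, and the thresholding rule are illustrative choices, not necessarily the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, latent = 1000, 32
enc = nn.Linear(vocab, 2 * latent)      # hypothetical encoder: mean and log-variance
dec = nn.Linear(latent, vocab)          # decoder: logits over the vocabulary
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

docs = torch.rand(16, vocab)            # toy stand-in for TF-IDF document vectors

mu, logvar = enc(docs).chunk(2, dim=-1)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick

# ELBO: multinomial reconstruction term plus KL to a unit Gaussian prior.
recon = -(docs * F.log_softmax(dec(z), dim=-1)).sum(-1).mean()
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
loss = recon + kl
opt.zero_grad()
loss.backward()
opt.step()

# Hashing: threshold each latent dimension at its median over the corpus.
with torch.no_grad():
    mu, _ = enc(docs).chunk(2, dim=-1)
    codes = (mu > mu.median(dim=0).values).to(torch.uint8)
print(codes[0])                          # a 32-bit binary code for document 0
```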
Visualizing and Understanding Sum-Product Networks
Sum-Product Networks (SPNs) are recently introduced deep tractable
probabilistic models by which several kinds of inference queries can be
answered exactly in tractable time. Up to now, they have largely been
used as black-box density estimators, assessed only by comparing their
likelihood scores. In this paper we explore and exploit the inner
representations learned by SPNs. We do this with a threefold aim: first we want
to get a better understanding of the inner workings of SPNs; secondly, we seek
additional ways to evaluate one SPN model and compare it against other
probabilistic models, providing diagnostic tools to practitioners; lastly, we
want to empirically evaluate how good and meaningful the extracted
representations are, as in a classic Representation Learning framework. In
order to do so, we revisit their interpretation as deep neural networks and we
propose to exploit several visualization techniques on their node activations
and network outputs under different types of inference queries. To investigate
these models as feature extractors, we plug some SPNs, learned in a greedy
unsupervised fashion on image datasets, in supervised classification learning
tasks. We extract several embedding types from node activations by filtering
nodes by their type, by their associated feature abstraction level and by their
scope. In a thorough empirical comparison we show them to be competitive
with those generated by popular feature extractors such as Restricted Boltzmann
Machines. Finally, we investigate embeddings generated from random
probabilistic marginal queries as a means of comparing other tractable
probabilistic models on a common ground, extending our experiments to Mixtures
of Trees.
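A hedged sketch of the embedding-extraction idea above: evaluate a tiny hand-built SPN bottom-up in log-space and read off the activations of a chosen subset of nodes (here, the sum nodes) as a per-sample feature vector for a downstream classifier. The structure and node selection are illustrative, not learned as in the paper.

```python
import math

class Leaf:
    def __init__(self, var, value): self.var, self.value = var, value
    def eval(self, x):                     # log-indicator leaf
        return 0.0 if x[self.var] == self.value else -math.inf

class Product:
    def __init__(self, children): self.children = children
    def eval(self, x):                     # product = sum in log-space
        return sum(c.eval(x) for c in self.children)

class Sum:
    def __init__(self, children, weights): self.children, self.weights = children, weights
    def eval(self, x):                     # weighted log-sum-exp over children
        terms = [math.log(w) + c.eval(x) for c, w in zip(self.children, self.weights)]
        m = max(terms)
        return m + math.log(sum(math.exp(t - m) for t in terms)) if m > -math.inf else -math.inf

# Tiny SPN over two binary variables, with one sum node per variable scope.
s1 = Sum([Leaf(0, 1), Leaf(0, 0)], [0.7, 0.3])
s2 = Sum([Leaf(1, 1), Leaf(1, 0)], [0.4, 0.6])
root = Sum([Product([s1, s2])], [1.0])

def embed(x, nodes=(s1, s2, root)):
    """Filter nodes by type/scope and read off their log-activations."""
    return [n.eval(x) for n in nodes]

print(embed({0: 1, 1: 0}))   # sum-node activations as a feature vector
```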