GrAMME: Semi-Supervised Learning using Multi-layered Graph Attention Models
Modern data analysis pipelines are becoming increasingly complex due to the
presence of multi-view information sources. While graphs are effective in
modeling complex relationships, in many scenarios a single graph is rarely
sufficient to succinctly represent all interactions, and hence multi-layered
graphs have become popular. Though this leads to richer representations,
extending solutions from the single-graph case is not straightforward.
Consequently, there is a strong need for novel solutions to solve classical
problems, such as node classification, in the multi-layered case. In this
paper, we consider the problem of semi-supervised learning with multi-layered
graphs. Though deep network embeddings, e.g. DeepWalk, are widely adopted for
community discovery, we argue that feature learning with random node
attributes, using graph neural networks, can be more effective. To this end, we
propose to use attention models for effective feature learning, and develop two
novel architectures, GrAMME-SG and GrAMME-Fusion, that exploit the inter-layer
dependencies for building multi-layered graph embeddings. Using empirical
studies on several benchmark datasets, we evaluate the proposed approaches and
demonstrate significant performance improvements in comparison to
state-of-the-art network embedding strategies. The results also show that using
simple random features is an effective choice, even in cases where explicit
node attributes are not available.
Multi-Layered Gradient Boosting Decision Trees
Multi-layered representation is believed to be the key ingredient of deep
neural networks especially in cognitive tasks like computer vision. While
non-differentiable models such as gradient boosting decision trees (GBDTs) are
the dominant methods for modeling discrete or tabular data, it is hard to
endow them with such representation learning ability. In this work, we propose
the multi-layered GBDT forest (mGBDTs), with an explicit emphasis on exploring
the ability to learn hierarchical representations by stacking several layers of
regression GBDTs as its building block. The model can be jointly trained by a
variant of target propagation across layers, without the need for
back-propagation or differentiability. Experiments and visualizations
confirmed the effectiveness of the model in terms of performance and
representation learning ability.
Nested LSTMs
We propose Nested LSTMs (NLSTM), a novel RNN architecture with multiple
levels of memory. Nested LSTMs add depth to LSTMs via nesting as opposed to
stacking. The value of a memory cell in an NLSTM is computed by an LSTM cell,
which has its own inner memory cell. Specifically, instead of computing the
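value of the (outer) memory cell as $c^{outer}_t = f_t \odot c_{t-1} + i_t \odot g_t$, NLSTM memory cells use the concatenation $(f_t \odot c_{t-1}, i_t \odot g_t)$ as input to an inner LSTM (or NLSTM) memory cell, and set
$c^{outer}_t = h^{inner}_t$. Nested LSTMs outperform both stacked and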
single-layer LSTMs with similar numbers of parameters in our experiments on
various character-level language modeling tasks, and the inner memories of an
LSTM learn longer term dependencies compared with the higher-level units of a
stacked LSTM. Comment: Accepted at ACML 201
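The nesting described in this abstract can be illustrated with a minimal scalar sketch. It assumes standard LSTM gate equations with sigmoid gates and a tanh candidate for both the outer and inner cells; the helper names (`gates`, `lstm_step`, `nlstm_step`, `make_params`) and all parameter values are hypothetical and are not taken from the paper. The outer cell passes the forget-gated previous memory to the inner LSTM as its hidden state and the input-gated candidate as its input, then takes the inner hidden output as the new outer memory.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gates(x, h, p):
    """Return (input gate, forget gate, output gate, candidate) for scalar x and h."""
    i = sigmoid(p["wi"] * x + p["ui"] * h + p["bi"])
    f = sigmoid(p["wf"] * x + p["uf"] * h + p["bf"])
    o = sigmoid(p["wo"] * x + p["uo"] * h + p["bo"])
    g = math.tanh(p["wg"] * x + p["ug"] * h + p["bg"])
    return i, f, o, g

def lstm_step(x, h, c, p):
    """Ordinary LSTM step: the memory is updated additively as f*c + i*g."""
    i, f, o, g = gates(x, h, p)
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new

def nlstm_step(x, h, c_outer, c_inner, p_outer, p_inner):
    """Nested step: the additive update is replaced by an inner LSTM."""
    i, f, o, g = gates(x, h, p_outer)
    inner_h_prev = f * c_outer   # forget-gated memory acts as the inner hidden state
    inner_x = i * g              # input-gated candidate acts as the inner input
    h_inner, c_inner_new = lstm_step(inner_x, inner_h_prev, c_inner, p_inner)
    c_outer_new = h_inner        # outer memory is the inner hidden output
    h_new = o * math.tanh(c_outer_new)
    return h_new, c_outer_new, c_inner_new

def make_params(w, u, b):
    """Hypothetical helper: share one (w, u, b) triple across all four gates."""
    return {k + s: v for s in "ifog" for k, v in (("w", w), ("u", u), ("b", b))}

# Run a short input sequence through one nested cell (illustrative values only).
p_outer = make_params(0.5, -0.3, 0.1)
p_inner = make_params(0.8, 0.2, 0.0)
h, c_outer, c_inner = 0.0, 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c_outer, c_inner = nlstm_step(x, h, c_outer, c_inner, p_outer, p_inner)
```

A vector-valued version would replace the scalar products with matrix products and elementwise multiplication; the scalar form is used here only to keep the data flow between the outer and inner cells easy to follow.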
Modular Representation of Layered Neural Networks
Layered neural networks have greatly improved the performance of various
applications including image processing, speech recognition, natural language
processing, and bioinformatics. However, it is still difficult to discover or
interpret knowledge from the inference provided by a layered neural network,
since its internal representation has many nonlinear and complex parameters
embedded in hierarchical layers. Therefore, it becomes important to establish a
new methodology by which layered neural networks can be understood.
In this paper, we propose a new method for extracting a global and simplified
structure from a layered neural network. Based on network analysis, the
proposed method detects communities or clusters of units with similar
connection patterns. We show its effectiveness by applying it to three use
cases. (1) Network decomposition: it can decompose a trained neural network
into multiple small independent networks thus dividing the problem and reducing
the computation time. (2) Training assessment: the appropriateness of a trained
result with a given hyperparameter or randomly chosen initial parameters can be
evaluated by using a modularity index. (3) Data analysis: on practical data,
it reveals the community structure in the input, hidden, and output layers,
which serves as a clue for discovering knowledge from a trained neural network.
Probabilistic Discriminative Learning with Layered Graphical Models
Probabilistic graphical models are traditionally known for their successes in
generative modeling. In this work, we advocate layered graphical models (LGMs)
for probabilistic discriminative learning. To this end, we design LGMs in close
analogy to neural networks (NNs), that is, they have deep hierarchical
structures and convolutional or local connections between layers. Equipped with
tensorized truncated variational inference, our LGMs can be efficiently trained
via backpropagation on mainstream deep learning frameworks such as PyTorch. To
deal with continuous valued inputs, we use a simple yet effective soft-clamping
strategy for efficient inference. Through extensive experiments on image
classification over MNIST and FashionMNIST datasets, we demonstrate that LGMs
are capable of achieving competitive results comparable to NNs of similar
architectures, while preserving transparent probabilistic modeling.
Deep Echo State Network (DeepESN): A Brief Survey
The study of deep recurrent neural networks (RNNs) and, in particular, of
deep Reservoir Computing (RC) is gaining increasing research attention in
the neural networks community. The recently introduced Deep Echo State Network
(DeepESN) model opened the way to an extremely efficient approach for designing
deep neural networks for temporal data. At the same time, the study of DeepESNs
has shed light on the intrinsic properties of state dynamics developed
by hierarchical compositions of recurrent layers, i.e. on the bias of depth in
RNN architectural design. In this paper, we summarize the advancements in the
development, analysis and applications of DeepESNs.
Convolutional Neural Networks Analyzed via Convolutional Sparse Coding
Convolutional neural networks (CNN) have led to many state-of-the-art results
spanning various fields. However, a clear and profound theoretical
understanding of the forward pass, the core algorithm of CNN, is still lacking.
In parallel, within the wide field of sparse approximation, Convolutional
Sparse Coding (CSC) has gained increasing attention in recent years. A
theoretical study of this model was recently conducted, establishing it as a
reliable and stable alternative to the commonly practiced patch-based
processing. Herein, we propose a novel multi-layer model, ML-CSC, in which
signals are assumed to emerge from a cascade of CSC layers. This is shown to be
tightly connected to CNN, so much so that the forward pass of the CNN is in
fact the thresholding pursuit serving the ML-CSC model. This connection brings
a fresh view to CNN, as we are able to attribute to this architecture
theoretical claims such as uniqueness of the representations throughout the
network, and their stable estimation, all guaranteed under simple local
sparsity conditions. Lastly, identifying the weaknesses in the above pursuit
scheme, we propose an alternative to the forward pass, which is connected to
deconvolutional, recurrent and residual networks, and has better theoretical
guarantees.
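The claimed link between the CNN forward pass and a thresholding pursuit can be illustrated with a small sketch. It assumes that the pursuit is a layered non-negative soft thresholding, that dense matrices stand in for convolutional dictionaries, and that each bias equals the negated threshold; the function names and the toy numbers are hypothetical and are not taken from the paper. Under those assumptions, thresholding the correlations with each dictionary is the same computation as applying a ReLU to an affine map.

```python
def matvec_t(D, x):
    """Compute D^T x for a matrix D given as a list of rows."""
    cols = len(D[0])
    return [sum(D[r][c] * x[r] for r in range(len(D))) for c in range(cols)]

def soft_nonneg_threshold(v, beta):
    """Non-negative soft thresholding: max(v - beta, 0), elementwise."""
    return [max(a - b, 0.0) for a, b in zip(v, beta)]

def layered_thresholding(x, dictionaries, thresholds):
    """Estimate the representations layer by layer, each from the previous one."""
    gamma = x
    for D, beta in zip(dictionaries, thresholds):
        gamma = soft_nonneg_threshold(matvec_t(D, gamma), beta)
    return gamma

def relu(v):
    return [max(0.0, a) for a in v]

def forward_pass(x, weights, biases):
    """Plain feed-forward pass: ReLU(W^T a + b) at every layer."""
    a = x
    for W, b in zip(weights, biases):
        z = matvec_t(W, a)
        a = relu([zi + bi for zi, bi in zip(z, b)])
    return a

# Toy two-layer example (illustrative values only).
D1 = [[1.0, 0.0, 1.0],
      [0.0, 1.0, -1.0]]          # 2 x 3 dictionary
D2 = [[1.0, -1.0],
      [1.0, 0.0],
      [0.0, 1.0]]                # 3 x 2 dictionary
betas = [[0.5, 0.5, 0.5], [1.0, 0.0]]
biases = [[-t for t in beta] for beta in betas]  # bias = negated threshold

x = [1.0, 2.0]
pursuit = layered_thresholding(x, [D1, D2], betas)
network = forward_pass(x, [D1, D2], biases)
```

With the biases chosen this way, `pursuit` and `network` hold the same values, which is the sense in which the two procedures coincide in this sketch.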
Knowledge Discovery from Layered Neural Networks based on Non-negative Task Decomposition
Interpretability has become an important issue in the machine learning field,
along with the success of layered neural networks in various practical tasks.
Since a trained layered neural network consists of a complex nonlinear
relationship between a large number of parameters, we have so far failed to
understand how such networks achieve input-output mappings with a given data set. In this paper,
we propose the non-negative task decomposition method, which applies
non-negative matrix factorization to a trained layered neural network. This
enables us to decompose the inference mechanism of a trained layered neural
network into multiple principal tasks of input-output mapping, and reveal the
roles of hidden units in terms of their contribution to each principal task.
Limiting Network Size within Finite Bounds for Optimization
The largest theoretical contribution to neural networks comes from the VC
dimension, which characterizes the sample complexity of a classification model
from a probabilistic view and is widely used to study generalization error. So
far in the literature, the VC dimension has only been used to approximate
generalization error bounds for different neural network architectures. It has
not yet been used, implicitly or explicitly, to fix the network size, which is
important because a wrong configuration can lead to high computational effort
in training and to overfitting. There is therefore a need to bound the number
of units so that a task can be computed with only a sufficient number of
parameters. For binary classification tasks, shallow networks are used because
they have the universal approximation property, and it is enough to size the
hidden layer width for such networks. The paper gives a theoretical
justification of the required attribute size and the corresponding hidden layer
dimension for a given sample set that yields optimal binary classification
results with minimum training complexity in a single-layered feed-forward
network framework. The paper also proves the existence of bounds on the width
of the hidden layer and its range, subject to certain conditions. The findings
are analyzed experimentally on three different datasets using MATLAB 2018b.
Soft-Deep Boltzmann Machines
We present a layered Boltzmann machine (BM) that can better exploit the
advantages of a distributed representation. It is widely believed that deep BMs
(DBMs) have far greater representational power than their shallow counterparts,
restricted Boltzmann machines (RBMs). However, this expectation of the
supremacy of DBMs over RBMs has never been validated in a theoretical
fashion. In this paper, we provide both theoretical and empirical evidence
that the representational power of DBMs can actually be rather limited in
taking advantage of distributed representations. We propose an approximate
measure for the representational power of a BM with regard to the efficiency of a
distributed representation. With this measure, we show the surprising fact that
DBMs can make inefficient use of distributed representations. Based on these
observations, we propose an alternative BM architecture, which we dub soft-deep
BMs (sDBMs). We show that sDBMs can more efficiently exploit the distributed
representations in terms of the measure. Experiments demonstrate that sDBMs
outperform several state-of-the-art models, including DBMs, in generative tasks
on binarized MNIST and Caltech-101 silhouettes. Comment: Major revision after bug fixe