You are AllSet: A Multiset Function Framework for Hypergraph Neural Networks
Hypergraphs are used to model higher-order interactions amongst agents and
there exist many practically relevant instances of hypergraph datasets. To
enable efficient processing of hypergraph-structured data, several hypergraph
neural network platforms have been proposed for learning hypergraph properties
and structure, with a special focus on node classification. However, almost all
existing methods use heuristic propagation rules and offer suboptimal
performance on many datasets. We propose AllSet, a new hypergraph neural
network paradigm that represents a highly general framework for (hyper)graph
neural networks and for the first time implements hypergraph neural network
layers as compositions of two multiset functions that can be efficiently
learned for each task and each dataset. Furthermore, AllSet draws on new
connections between hypergraph neural networks and recent advances in deep
learning of multiset functions. In particular, the proposed architecture
utilizes Deep Sets and Set Transformer architectures that allow for significant
modeling flexibility and offer high expressive power. To evaluate the
performance of AllSet, we conduct the most extensive experiments to date
involving ten known benchmarking datasets and three newly curated datasets that
represent significant challenges for hypergraph node classification. The
results demonstrate that AllSet has the unique ability to consistently either
match or outperform all other hypergraph neural networks across the tested
datasets. Our implementation and datasets will be released upon acceptance.
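The two-multiset-function composition described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: AllSet learns both multiset functions (via Deep Sets or Set Transformer modules), whereas here both are sum-then-transform stand-ins with identity "MLPs", and all names are illustrative.

```python
# Sketch of an AllSet-style layer: compose two permutation-invariant multiset
# functions, one aggregating node features into hyperedges, one aggregating
# hyperedge features back into nodes. Assumed, simplified interface.

def deep_sets(multiset, transform):
    """Permutation-invariant multiset function: transform(sum of elements)."""
    total = [sum(xs) for xs in zip(*multiset)]
    return transform(total)

def allset_layer(node_feats, hyperedges, f_v2e, f_e2v):
    # Step 1: each hyperedge aggregates the multiset of its members' features.
    edge_feats = [deep_sets([node_feats[v] for v in e], f_v2e)
                  for e in hyperedges]
    # Step 2: each node aggregates the multiset of its incident hyperedges.
    new_feats = []
    for v in range(len(node_feats)):
        incident = [edge_feats[i] for i, e in enumerate(hyperedges) if v in e]
        new_feats.append(deep_sets(incident, f_e2v) if incident
                         else node_feats[v])
    return new_feats

# Toy usage: 4 nodes with 2-d features, 2 hyperedges, identity "MLPs".
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
H = [{0, 1, 2}, {2, 3}]
out = allset_layer(X, H, lambda z: z, lambda z: z)
```

Replacing the two lambdas with learned networks recovers the general recipe; with sum aggregation this degenerates to a Deep Sets-style hypergraph layer.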
Saliency-based Sequential Image Attention with Multiset Prediction
Humans process visual scenes selectively and sequentially using attention.
Central to models of human visual attention is the saliency map. We propose a
hierarchical visual architecture that operates on a saliency map and uses a
novel attention mechanism to sequentially focus on salient regions and take
additional glimpses within those regions. The architecture is motivated by
human visual attention, and is used for multi-label image classification on a
novel multiset task, demonstrating that it achieves high precision and recall
while localizing objects with its attention. Unlike conventional multi-label
image classification models, the model supports multiset prediction due to a
reinforcement-learning based training process that allows for arbitrary label
permutation and multiple instances per label.
Comment: To appear in Advances in Neural Information Processing Systems 30
(NIPS 2017).
How Powerful are Graph Neural Networks?
Graph Neural Networks (GNNs) are an effective framework for representation
learning of graphs. GNNs follow a neighborhood aggregation scheme, where the
representation vector of a node is computed by recursively aggregating and
transforming representation vectors of its neighboring nodes. Many GNN variants
have been proposed and have achieved state-of-the-art results on both node and
graph classification tasks. However, despite GNNs revolutionizing graph
representation learning, there is limited understanding of their
representational properties and limitations. Here, we present a theoretical
framework for analyzing the expressive power of GNNs to capture different graph
structures. Our results characterize the discriminative power of popular GNN
variants, such as Graph Convolutional Networks and GraphSAGE, and show that
they cannot learn to distinguish certain simple graph structures. We then
develop a simple architecture that is provably the most expressive among the
class of GNNs and is as powerful as the Weisfeiler-Lehman graph isomorphism
test. We empirically validate our theoretical findings on a number of graph
classification benchmarks, and demonstrate that our model achieves
state-of-the-art performance.
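The provably most expressive architecture in this class is GIN, whose node update sums neighbor features and applies an injective MLP. A minimal sketch (the MLP here is an identity stand-in, and the adjacency-list representation is an assumption for illustration):

```python
def gin_update(h, adj, eps, mlp):
    # GIN node update: h'_v = MLP((1 + eps) * h_v + sum over neighbors u of h_u)
    # Sum aggregation (unlike mean/max) keeps the multiset of neighbor
    # features distinguishable, matching the Weisfeiler-Lehman test.
    new_h = []
    for v in range(len(h)):
        agg = [(1 + eps) * x for x in h[v]]
        for u in adj[v]:
            agg = [a + x for a, x in zip(agg, h[u])]
        new_h.append(mlp(agg))
    return new_h

# Path graph 0-1-2 with scalar features; identity "MLP", eps = 0.
h = [[1.0], [2.0], [3.0]]
adj = [[1], [0, 2], [1]]
out = gin_update(h, adj, 0.0, lambda z: z)
```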
Provably Powerful Graph Networks
Recently, the Weisfeiler-Lehman (WL) graph isomorphism test was used to
measure the expressive power of graph neural networks (GNN). It was shown that
the popular message passing GNN cannot distinguish between graphs that are
indistinguishable by the 1-WL test (Morris et al. 2018; Xu et al. 2019).
Unfortunately, many simple instances of graphs are indistinguishable by the
1-WL test.
In search for more expressive graph learning models we build upon the recent
k-order invariant and equivariant graph neural networks (Maron et al. 2019a,b)
and present two results:
First, we show that such k-order networks can distinguish between
non-isomorphic graphs as well as the k-WL tests, which are provably stronger
than the 1-WL test for k>2. This makes these models strictly stronger than
message passing models. Unfortunately, the higher expressiveness of these
models comes with a computational cost of processing high order tensors.
Second, setting our goal at building a provably stronger, simple, and scalable
model, we show that a reduced 2-order network containing just the scaled
identity operator, augmented with a single quadratic operation (matrix
multiplication), has provable 3-WL expressive power. Put differently, we
suggest a simple model that interleaves standard Multilayer-Perceptron (MLP)
layers applied to the feature dimension with matrix multiplication. We validate
this
model by presenting state-of-the-art results on popular graph classification
and regression tasks. To the best of our knowledge, this is the first practical
invariant/equivariant model with guaranteed 3-WL expressiveness, strictly
stronger than message passing models.
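The interleaving described above can be sketched for a single-channel case. This is an illustrative simplification, not the paper's architecture: real blocks operate on n x n x d tensors with multi-channel MLPs, while here each "MLP" acts entrywise on one channel.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def ppgn_block(A, mlp1, mlp2):
    # Apply MLPs entrywise along the (here single-channel) feature dimension,
    # then multiply the two resulting n x n matrices: the single quadratic
    # operation that lifts expressive power beyond message passing.
    M1 = [[mlp1(x) for x in row] for row in A]
    M2 = [[mlp2(x) for x in row] for row in A]
    return matmul(M1, M2)

# With identity "MLPs" on a triangle's adjacency matrix, the block computes
# A @ A, i.e. counts of length-2 walks between node pairs.
A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
out = ppgn_block(A, lambda x: x, lambda x: x)
```

The matrix product is what a purely entrywise (message-passing-like) layer cannot express, which is where the extra distinguishing power comes from.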
Loss Functions for Multiset Prediction
We study the problem of multiset prediction. The goal of multiset prediction
is to train a predictor that maps an input to a multiset consisting of multiple
items. Unlike existing problems in supervised learning, such as classification,
ranking and sequence generation, there is no known order among items in a
target multiset, and each item in the multiset may appear more than once,
making this problem extremely challenging. In this paper, we propose a novel
multiset loss function by viewing this problem from the perspective of
sequential decision making. The proposed multiset loss function is empirically
evaluated on two families of datasets, one synthetic and the other real, with
varying levels of difficulty, against various baseline loss functions including
reinforcement learning, sequence, and aggregated distribution matching loss
functions. The experiments reveal the effectiveness of the proposed loss
function over the others.
Comment: NIPS 201
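The sequential-decision view can be made concrete as follows. This is a sketch under stated assumptions, not the paper's exact formulation: the oracle spreads probability uniformly over the items still remaining in the target multiset, the per-step loss is the KL divergence from the oracle to the predictor, and the teacher-forced removal order used here is one arbitrary choice.

```python
import math
from collections import Counter

def multiset_loss(step_dists, targets):
    """Sum over steps of KL(oracle || predictor), where at each step the
    oracle is uniform over the items still remaining in `targets`.
    `step_dists` is one dict of class probabilities per step (assumed names)."""
    remaining = Counter(targets)
    total = 0.0
    for p, removed in zip(step_dists, targets):
        n = sum(remaining.values())
        for c, cnt in remaining.items():
            q = cnt / n                       # oracle mass on class c
            total += q * math.log(q / p[c])   # KL contribution of class c
        remaining[removed] -= 1               # teacher-forced removal
        if remaining[removed] == 0:
            del remaining[removed]
    return total

# A predictor that matches the oracle at every step incurs zero loss.
dists = [{'a': 2/3, 'b': 1/3}, {'a': 0.5, 'b': 0.5}, {'a': 0.0, 'b': 1.0}]
loss = multiset_loss(dists, ['a', 'a', 'b'])
```

Because the oracle is defined on the *remaining* multiset, duplicate items and the lack of a canonical order are both handled naturally, which is the property the paper's loss is built around.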
Dynamic Cell Structure via Recursive-Recurrent Neural Networks
In a recurrent setting, conventional approaches to neural architecture search
find and fix a general model for all data samples and time steps. We propose a
novel algorithm that can dynamically search for the structure of cells in a
recurrent neural network model. Based on a combination of recurrent and
recursive neural networks, our algorithm is able to construct customized cell
structures for each data sample and time step, allowing for a more efficient
architecture search than existing models. Experiments on three common datasets
show that the algorithm discovers high-performance cell architectures and
achieves better prediction accuracy compared to the GRU structure for language
modelling and sentiment analysis.
Towards combinatorial clustering: preliminary research survey
The paper describes clustering problems from the combinatorial viewpoint. A
brief systemic survey is presented including the following: (i) basic
clustering problems (e.g., classification, clustering, sorting, clustering with
an order over clusters), (ii) basic approaches to assessment of objects and
object proximities (i.e., scales, comparison, aggregation issues), (iii) basic
approaches to evaluation of local quality characteristics for clusters and
total quality characteristics for clustering solutions, (iv) clustering as
multicriteria optimization problem, (v) generalized modular clustering
framework, (vi) basic clustering models/methods (e.g., hierarchical clustering,
k-means clustering, minimum spanning tree based clustering, clustering as
assignment, detection of clique/quasi-clique based clustering, correlation
clustering, network communities based clustering). Special attention is
paid to the formulation of clustering as multicriteria optimization models.
Combinatorial optimization models are used as auxiliary problems (e.g.,
assignment, partitioning, knapsack problem, multiple choice problem,
morphological clique problem, searching for consensus/median for structures).
Numerical examples illustrate problem formulations, solving methods, and
applications. The material can be used as follows: (a) a research survey, (b) a
foundation for designing the structure/architecture of composite modular
clustering software, (c) a bibliography reference collection, and (d) a
tutorial.
Comment: 102 pages, 66 figures, 67 tables.
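Of the basic methods the survey enumerates, minimum-spanning-tree clustering is easy to state exactly: build the MST of the complete graph over the points, cut its k - 1 heaviest edges, and take the surviving connected components as clusters. A minimal self-contained sketch (Kruskal with path-compressed union-find; all names are illustrative):

```python
def mst_clusters(points, k):
    """Partition `points` into k clusters by cutting the k - 1 heaviest
    edges of their Euclidean minimum spanning tree."""
    n = len(points)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))

    def find(parent, x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Kruskal: collect MST edges in increasing weight order.
    parent = list(range(n))
    mst = []
    for w, i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
            mst.append((w, i, j))

    # Keep only the n - k lightest MST edges (i.e. cut the k - 1 heaviest),
    # then label each point by its component representative.
    parent = list(range(n))
    for w, i, j in sorted(mst)[: n - k]:
        parent[find(parent, i)] = find(parent, j)
    return [find(parent, v) for v in range(n)]

# Two well-separated pairs of points should form two clusters.
labels = mst_clusters([(0, 0), (0, 1), (5, 5), (5, 6)], k=2)
```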
Principal Neighbourhood Aggregation for Graph Nets
Graph Neural Networks (GNNs) have been shown to be effective models for
different predictive tasks on graph-structured data. Recent work on their
expressive power has focused on isomorphism tasks and countable feature spaces.
We extend this theoretical framework to include continuous features - which
occur regularly in real-world input domains and within the hidden layers of
GNNs - and we demonstrate the requirement for multiple aggregation functions in
this context. Accordingly, we propose Principal Neighbourhood Aggregation
(PNA), a novel architecture combining multiple aggregators with degree-scalers
(which generalize the sum aggregator). Finally, we compare the capacity of
different models to capture and exploit the graph structure via a novel
benchmark containing multiple tasks taken from classical graph theory,
alongside existing benchmarks from real-world domains, all of which demonstrate
the strength of our model. With this work, we hope to steer some of the GNN
research towards new aggregation methods which we believe are essential in the
search for powerful and robust models.
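The aggregator-plus-scaler combination can be sketched for scalar features. This is a simplified illustration of the idea, not the paper's layer: `delta` stands in for the (assumed precomputed) average of log(degree + 1) over the training set, and the downstream mixing MLP is omitted.

```python
import math

def pna_aggregate(neighbor_feats, delta):
    """PNA-style aggregation sketch: combine several aggregators
    (mean, max, min, std), each rescaled by logarithmic degree-scalers."""
    d = len(neighbor_feats)
    mean = sum(neighbor_feats) / d
    std = (sum((x - mean) ** 2 for x in neighbor_feats) / d) ** 0.5
    aggregators = [mean, max(neighbor_feats), min(neighbor_feats), std]
    s = math.log(d + 1) / delta
    scalers = [1.0, s, 1.0 / s]  # identity, amplification, attenuation
    # Concatenate every aggregator-scaler combination (4 x 3 = 12 values);
    # a downstream MLP (omitted here) would mix them.
    return [a * c for a in aggregators for c in scalers]

out = pna_aggregate([1.0, 2.0, 3.0], delta=math.log(3.0))
```

Using several aggregators at once is exactly the requirement the paper derives for continuous feature spaces: no single aggregator can discriminate all neighborhood multisets there.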
BourGAN: Generative Networks with Metric Embeddings
This paper addresses mode collapse in generative adversarial networks
(GANs). We view modes as a geometric structure of the data distribution in a metric
space. Under this geometric lens, we embed subsamples of the dataset from an
arbitrary metric space into the l2 space, while preserving their pairwise
distance distribution. Not only does this metric embedding determine the
dimensionality of the latent space automatically, it also enables us to
construct a mixture of Gaussians to draw latent space random vectors. We use
the Gaussian mixture model in tandem with a simple augmentation of the
objective function to train GANs. Every major step of our method is supported
by theoretical analysis, and our experiments on real and synthetic data confirm
that the generator is able to produce samples spreading over most of the modes
while avoiding unwanted samples, outperforming several recent GAN variants on a
number of metrics and offering new features.
Comment: Neural Information Processing Systems, 201
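The latent-sampling half of the recipe can be sketched directly. This is an assumption-laden illustration: the metric-embedding step that produces the l2 points (the Bourgain-style, pairwise-distance-preserving embedding of data subsamples) is taken as done upstream, and the equal-weight, shared-variance mixture below is a simplification.

```python
import random

def gmm_latent_sampler(embedded, sigma):
    """Return a sampler for a mixture of Gaussians with one component per
    embedded data subsample (equal weights, isotropic variance sigma**2).
    Latent vectors drawn this way cover the embedded modes by construction."""
    def draw():
        center = random.choice(embedded)  # pick a mixture component uniformly
        return [c + random.gauss(0.0, sigma) for c in center]
    return draw

# Two embedded "modes"; samples land near one of the two centers.
draw = gmm_latent_sampler([[0.0, 0.0], [10.0, 10.0]], sigma=0.1)
z = draw()
```

Feeding such z into the generator replaces the usual single Gaussian prior, which is how the method discourages collapsing onto a subset of modes.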
Equivariant Entity-Relationship Networks
The relational model is a ubiquitous representation of big data, in part due
to its extensive use in databases. In this paper, we propose the Equivariant
Entity-Relationship Network (EERN), which is a Multilayer Perceptron
equivariant to the symmetry transformations of the Entity-Relationship model.
To this end, we identify the most expressive family of linear maps that are
exactly equivariant to entity relationship symmetries, and further show that
they subsume recently introduced equivariant maps for sets, exchangeable
tensors, and graphs. The proposed feed-forward layer has linear complexity in
the data and can be used for both inductive and transductive reasoning about
relational databases, including database embedding, and the prediction of
missing records. This provides a principled theoretical foundation for the
application of deep learning to one of the most abundant forms of data.
Empirically, EERN outperforms different variants of coupled matrix tensor
factorization in both synthetic and real-data experiments.
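The entity-relationship-equivariant maps themselves are involved, but the simplest member of the family they subsume, the equivariant linear layer for sets (Deep Sets), is two lines. A sketch for scalar set elements, with `a` and `b` as the layer's two free parameters:

```python
def equivariant_set_layer(x, a, b):
    # Deep Sets equivariant linear layer: y_i = a * x_i + b * sum_j x_j.
    # Permuting the input permutes the output identically, so the layer
    # commutes with the symmetry group of the set.
    total = sum(x)
    return [a * xi + b * total for xi in x]

y = equivariant_set_layer([1.0, 2.0, 3.0], a=2.0, b=0.5)
```

The exchangeable-tensor and graph cases extend this pattern with more parameter-tied sums, and the EERN layers generalize all of them to arbitrary entity-relationship schemas.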