Relational Graph Attention Networks
We investigate Relational Graph Attention Networks, a class of models that
extends non-relational graph attention mechanisms to incorporate relational
information, opening up these methods to a wider variety of problems. A
thorough evaluation of these models is performed, and comparisons are made
against established benchmarks. To provide a meaningful comparison, we retrain
Relational Graph Convolutional Networks, the spectral counterpart of Relational
Graph Attention Networks, and evaluate them under the same conditions. We find
that Relational Graph Attention Networks perform worse than anticipated,
although some configurations are marginally beneficial for modelling molecular
properties. We provide insights into why this may be, and suggest both
modifications to evaluation strategies and directions for future work.
Comment: 10 pages + 8 pages of appendices. Layer implementation available at
https://github.com/Babylonpartners/rgat
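For rough intuition only, here is a minimal sketch of a single relational graph attention layer in plain NumPy, assuming the common per-relation formulation (per-relation projection matrices and attention vectors; the shapes are assumptions). This is not the authors' implementation; for that, see the repository linked above.

import numpy as np

def rgat_layer(X, adj_per_rel, W_per_rel, a_per_rel):
    """X: (N, F) node features. For each relation r: adj_per_rel[r] is an
    (N, N) 0/1 adjacency, W_per_rel[r] an (F, D) projection, and
    a_per_rel[r] a (2*D,) attention vector (assumed shapes)."""
    out = np.zeros((X.shape[0], W_per_rel[0].shape[1]))
    for A, W, a in zip(adj_per_rel, W_per_rel, a_per_rel):
        H = X @ W                                 # per-relation projection
        D = H.shape[1]
        # attention logits e_ij = LeakyReLU(a . [h_i || h_j]), masked to edges
        e = (H @ a[:D])[:, None] + (H @ a[D:])[None, :]
        e = np.where(e > 0, e, 0.2 * e)           # LeakyReLU
        e = np.where(A > 0, e, -np.inf)           # keep only existing edges
        alpha = np.exp(e - e.max(axis=1, keepdims=True))
        # rows with no edges under this relation become all-zero via nan_to_num
        alpha = np.nan_to_num(alpha / alpha.sum(axis=1, keepdims=True))
        out += alpha @ H                          # sum contributions over relations
    return out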
Cross-Graph Learning of Multi-Relational Associations
Cross-graph Relational Learning (CGRL) refers to the problem of predicting
the strengths or labels of multi-relational tuples of heterogeneous object
types, through joint inference over multiple graphs which specify the
internal connections within each type of object. CGRL is an open challenge in
machine learning due to the daunting number of possible tuples to deal with
when the numbers of nodes in the graphs are large, and because labeled
training instances are typically extremely sparse. Existing methods such as
tensor factorization or tensor-kernel machines do not work well, because of
the lack of a convex formulation for the optimization of CGRL models, the poor
scalability of the algorithms in handling combinatorial numbers of tuples,
and/or the non-transductive nature of the learning methods which limits their
ability to leverage unlabeled data in training. This paper proposes a novel
framework which formulates CGRL as a convex optimization problem, enables
transductive learning using both labeled and unlabeled tuples, and offers a
scalable algorithm that guarantees the optimal solution and enjoys a linear
time complexity with respect to the sizes of input graphs. In our experiments
with a subset of DBLP publication records and an Enzyme multi-source dataset,
the proposed method successfully scaled to the large cross-graph inference
problem, and significantly outperformed other representative approaches.
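For intuition only, a hedged sketch of why joint inference over two graphs can stay tractable: propagation can be applied on each side of the tuple-score matrix, so the Kronecker product graph over all tuples is never materialised. The update rule and the alpha parameter here are illustrative assumptions, not the paper's algorithm.

import numpy as np

def cross_graph_propagate(S1, S2, Y, alpha=0.5, iters=50):
    """S1: (N1, N1) and S2: (N2, N2) row-normalised adjacencies;
    Y: (N1, N2) observed tuple labels (0 where unlabelled).
    F[i, j] scores the tuple (node i of graph 1, node j of graph 2)."""
    F = Y.copy()
    for _ in range(iters):
        # one smoothing step per side keeps the cost linear in the edges of
        # each graph, never touching the (N1*N2)-node product graph
        F = alpha * (S1 @ F @ S2.T) + (1 - alpha) * Y
    return F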
Transductive Classification Methods for Mixed Graphs
In this paper we provide a principled approach to solve a transductive
classification problem involving a similar graph (edges tend to connect nodes
with the same labels) and a dissimilar graph (edges tend to connect nodes with
opposing labels). Most existing methods, e.g., Information Regularization (IR)
and the Weighted-vote Relational Neighbor classifier (WvRN), assume that the
given graph is only a similar graph. We extend the IR and WvRN
methods to deal with mixed graphs. We evaluate the proposed extensions on
several benchmark datasets as well as two real world datasets and demonstrate
the usefulness of our ideas.
Comment: 8 pages, 2 tables, 2 figures. KDD Workshop MLG'11, San Diego, CA, US
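An illustrative sketch of the mixed-graph idea (our assumption of how a WvRN-style neighbourhood vote extends, not the paper's exact extension): dissimilar edges enter the vote with negative weight, pushing connected nodes toward opposite labels.

import numpy as np

def mixed_graph_propagate(W_sim, W_dis, y, labelled, iters=100):
    """W_sim, W_dis: (N, N) non-negative edge weights; y: (N,) labels in
    {-1, +1} for labelled nodes, 0 otherwise; labelled: boolean mask."""
    W = W_sim - W_dis                     # dissimilar edges get negative weight
    D = np.abs(W).sum(axis=1) + 1e-12
    f = y.astype(float).copy()
    for _ in range(iters):
        f = (W @ f) / D                   # signed weighted vote of neighbours
        f[labelled] = y[labelled]         # clamp the observed labels
    return np.sign(f)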
Edge-labeling Graph Neural Network for Few-shot Learning
In this paper, we propose a novel edge-labeling graph neural network (EGNN),
which adapts a deep neural network on the edge-labeling graph, for few-shot
learning. The previous graph neural network (GNN) approaches in few-shot
learning have been based on the node-labeling framework, which implicitly
models the intra-cluster similarity and the inter-cluster dissimilarity. In
contrast, the proposed EGNN learns to predict edge-labels rather than
node-labels on the graph, which enables an explicit clustering to evolve by
iteratively updating the edge-labels with direct exploitation of both
intra-cluster similarity and inter-cluster dissimilarity. It is also well
suited to handling varying numbers of classes without retraining, and can be
easily extended to perform transductive inference. The parameters of the EGNN
are learned by episodic training with an edge-labeling loss to obtain a model
that generalizes well to unseen low-data problems. On both supervised and
semi-supervised few-shot image classification tasks with two benchmark
datasets, the proposed EGNN significantly improves performance over existing
GNNs.
Comment: accepted to CVPR 201
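A minimal sketch of the alternating node/edge update that the edge-labeling framework suggests; the similarity kernel and the aggregation below are simplifying assumptions, not the paper's architecture.

import numpy as np

def egnn_step(X, E):
    """X: (N, F) node features; E: (N, N) soft edge labels in (0, 1],
    where 1 means 'same class'. Returns updated (X, E)."""
    A = E / (E.sum(axis=1, keepdims=True) + 1e-12)
    X = A @ X                                # aggregate same-cluster evidence
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    E = 1.0 / (1.0 + d2)                     # re-estimate edge labels from
    return X, E                              # pairwise feature distances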
Learning to learn via Self-Critique
In few-shot learning, a machine learning system learns from a small set of
labelled examples relating to a specific task, such that it can generalize to
new examples of the same task. Given the limited availability of labelled
examples in such tasks, we wish to make use of all the information we can.
Usually a model learns task-specific information from a small training-set
(support-set) to predict on an unlabelled validation set (target-set). The
target-set contains additional task-specific information which is not utilized
by existing few-shot learning methods. Making use of the target-set examples
via transductive learning requires approaches beyond the current methods; at
inference time, the target-set contains only unlabelled input data-points, and
so discriminative learning cannot be used. In this paper, we propose a
framework called Self-Critique and Adapt (SCA), which learns to learn a
label-free loss function, parameterized as a neural network. A base-model
learns on a support-set using existing methods (e.g. stochastic gradient
descent combined with the cross-entropy loss), and is then updated for the
incoming target-task using the learnt loss function. This label-free loss
function is itself optimized such that the learnt model achieves higher
generalization performance. Experiments demonstrate that SCA offers
substantially reduced error-rates compared to baselines which only adapt on the
support-set, and results in state-of-the-art benchmark performance on
Mini-ImageNet and Caltech-UCSD Birds 200.
Comment: Accepted in NeurIPS 201
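A hedged PyTorch sketch of the self-critique step: a small critic network maps the base model's unlabelled target-set predictions to a scalar label-free loss, which drives one extra adaptation step. The toy model sizes and the single-step update are assumptions; SCA's actual networks and training loop differ.

import torch
import torch.nn as nn

base = nn.Linear(64, 5)                      # toy base-model head (assumed sizes)
critic = nn.Sequential(nn.Linear(5, 32), nn.ReLU(), nn.Linear(32, 1))

def adapt_on_target(target_x, lr=0.01):
    """One self-critique step on an unlabelled target-set batch."""
    logits = base(target_x)                  # no labels available here
    loss = critic(logits.softmax(-1)).mean() # learned label-free loss
    grads = torch.autograd.grad(loss, tuple(base.parameters()),
                                create_graph=True)
    # one SGD step driven by the critic; during meta-training the critic is
    # itself optimised so that this step improves target-set generalization
    return [p - lr * g for p, g in zip(base.parameters(), grads)]

updated_params = adapt_on_target(torch.randn(16, 64))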
Graph Based Classification Methods Using Inaccurate External Classifier Information
In this paper we consider the problem of collectively classifying entities
where relational information is available across the entities. In practice,
an inaccurate class distribution for each entity is often available from
another (external) classifier. For example, this distribution could come from a
classifier built using content features or a simple dictionary. Given the
relational and inaccurate external classifier information, we consider two
graph based settings in which the problem of collective classification can be
solved. In the first setting, the class distribution is used to fix the labels
of a subset of nodes, and the labels of the remaining nodes are obtained as in
a transductive setting. In the other setting, the class distributions of all nodes
are used to define the fitting function part of a graph regularized objective
function. We define a generalized objective function that handles both the
settings. Methods reported in the literature, such as the harmonic Gaussian
field and local-global consistency (LGC) methods, can be seen as special
cases. We extend the
LGC and weighted vote relational neighbor classification (WvRN) methods to
support usage of external classifier information. We also propose an efficient
least squares regularization (LSR) based method and relate it to information
regularization methods. All the methods are evaluated on several benchmark and
real world datasets. Considering speed, robustness and accuracy together,
experimental results indicate that the LSR and WvRN-extension methods perform
better than other methods.
Comment: 12 pages
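For concreteness, a sketch of a least-squares-regularised objective in the spirit of the LSR method described above: fit each node's label distribution to the external classifier's output while smoothing over the graph. The exact objective and normalisation in the paper may differ.

import numpy as np

def lsr(W, Y, lam=1.0):
    """Solve min_F ||F - Y||^2 + lam * tr(F' L F) in closed form.
    W: (N, N) symmetric non-negative edge weights; Y: (N, C) external
    class distributions for every node (possibly inaccurate)."""
    L = np.diag(W.sum(axis=1)) - W          # unnormalised graph Laplacian
    F = np.linalg.solve(np.eye(len(W)) + lam * L, Y)
    F = np.clip(F, 0.0, None) + 1e-12       # smoothed scores may dip below 0
    return F / F.sum(axis=1, keepdims=True) # renormalise rows to distributions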
Lifted Convex Quadratic Programming
Symmetry is the essential element of lifted inference, which has recently
demonstrated the possibility of performing very efficient inference in
highly-connected but symmetric probabilistic models. This raises the
question of whether the same holds for optimisation problems in general. Here
we show that for a large class of optimisation methods this is actually the
case. More
precisely, we introduce the concept of fractional symmetries of convex
quadratic programs (QPs), which lie at the heart of many machine learning
approaches, and exploit it to lift, i.e., to compress QPs. These lifted QPs can
then be tackled with the usual optimization toolbox (off-the-shelf solvers,
cutting plane algorithms, stochastic gradients, etc.). If the original QP
exhibits symmetry, then the lifted one will generally be more compact, and
hence its optimization is likely to be more efficient.
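A small worked sketch of lifting by symmetry, under stated assumptions: if an orbit partition of the variables is already known (here it is given as input, not computed), restricting the QP to orbit-constant solutions yields an equivalent, smaller convex QP.

import numpy as np

def lift_qp(Q, c, orbits):
    """Q: (n, n) PSD matrix and c: (n,) vector of min 0.5 x'Qx + c'x;
    orbits: list of index lists partitioning {0..n-1} under the symmetry."""
    n, k = len(c), len(orbits)
    B = np.zeros((n, k))
    for j, orb in enumerate(orbits):
        B[orb, j] = 1.0                     # x = B z: one value per orbit
    return B.T @ Q @ B, B.T @ c, B          # lifted QP in k << n variables

# toy QP symmetric under swapping variables 0 and 1
Q = np.array([[2., 1., 1.], [1., 2., 1.], [1., 1., 2.]])
c = np.array([-1., -1., -4.])
Qz, cz, B = lift_qp(Q, c, [[0, 1], [2]])
x = B @ np.linalg.solve(Qz, -cz)            # expands to the full optimum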
Information Extraction from Scientific Literature for Method Recommendation
As a research community grows, more and more papers are published each year.
As a result, there is increasing demand for improved methods for finding
relevant papers, automatically understanding their key ideas, and recommending
potential methods for a target problem. Despite advances in search engines, it
is still hard to identify new technologies matching a researcher's needs.
Due to the large variety of domains and extremely limited annotated resources,
there has been relatively little work on leveraging natural language processing
in scientific recommendation. In this proposal, we aim at making scientific
recommendations by extracting scientific terms from a large collection of
scientific papers and organizing the terms into a knowledge graph. In
preliminary work, we trained a scientific term extractor using a small amount
of annotated data and obtained state-of-the-art performance by leveraging a
large amount of unannotated papers through multiple semi-supervised
approaches. We propose to construct a knowledge graph in a way that makes
minimal use of hand annotated data, using only the extracted terms,
unsupervised relational signals such as co-occurrence, and structural external
resources such as Wikipedia. Latent relations between scientific terms can be
learned from the graph. Recommendations will be made through graph inference
for both observed and unobserved relational pairs.
Comment: Thesis Proposal. arXiv admin note: text overlap with arXiv:1708.0607
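As a toy example of the unsupervised co-occurrence signal mentioned above, edges between extracted terms can be accumulated per paper; the term extraction itself is assumed to have happened upstream.

from collections import Counter
from itertools import combinations

def cooccurrence_graph(papers_terms):
    """papers_terms: list of sets of extracted terms, one set per paper.
    Returns edge weights counting how often two terms are co-mentioned."""
    edges = Counter()
    for terms in papers_terms:
        for a, b in combinations(sorted(terms), 2):
            edges[(a, b)] += 1
    return edges

g = cooccurrence_graph([{"graph attention", "few-shot learning"},
                        {"graph attention", "auto-encoder"}])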
Graph Attention Auto-Encoders
Auto-encoders have emerged as a successful framework for unsupervised
learning. However, conventional auto-encoders are incapable of utilizing
explicit relations in structured data. To take advantage of relations in
graph-structured data, several graph auto-encoders have recently been proposed,
but they neglect to reconstruct either the graph structure or node attributes.
In this paper, we present the graph attention auto-encoder (GATE), a neural
network architecture for unsupervised representation learning on
graph-structured data. Our architecture is able to reconstruct graph-structured
inputs, including both node attributes and the graph structure, through stacked
encoder/decoder layers equipped with self-attention mechanisms. In the encoder,
by considering node attributes as initial node representations, each layer
generates new representations of nodes by attending over their neighbors'
representations. In the decoder, we attempt to reverse the encoding process to
reconstruct node attributes. Moreover, node representations are regularized to
reconstruct the graph structure. Our proposed architecture does not need to
know the graph structure upfront, and thus it can be applied to inductive
learning. Our experiments demonstrate competitive performance on several node
classification benchmark datasets for transductive and inductive tasks, even
exceeding the performance of supervised learning baselines in most cases.
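A minimal sketch of the encode/decode pattern described above, with dot-product attention as a simplifying assumption (the paper's attention mechanism may differ): the encoder attends over neighbours, the decoder mirrors it for attribute reconstruction, and an inner product of representations reconstructs edges.

import numpy as np

def attend(X, A, W):
    """One attention layer over the graph A with projection W."""
    H = X @ W
    e = np.where(A > 0, H @ H.T, -np.inf)    # dot-product logits on edges
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = np.nan_to_num(alpha / alpha.sum(axis=1, keepdims=True))
    return alpha @ H

def gate_forward(X, A, W_enc, W_dec):
    Z = attend(X, A, W_enc)                  # encoder layer
    X_hat = attend(Z, A, W_dec)              # decoder mirrors the encoder
    A_hat = 1 / (1 + np.exp(-(Z @ Z.T)))     # edge reconstruction from Z
    return X_hat, A_hat, Z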
A Simple Exponential Family Framework for Zero-Shot Learning
We present a simple generative framework for learning to predict previously
unseen classes, based on estimating class-attribute-gated class-conditional
distributions. We model each class-conditional distribution as an exponential
family distribution and the parameters of the distribution of each seen/unseen
class are defined as functions of the respective observed class attributes.
These functions can be learned using only the seen class data and can be used
to predict the parameters of the class-conditional distribution of each unseen
class. Unlike most existing methods for zero-shot learning that represent
classes as fixed embeddings in some vector space, our generative model
naturally represents each class as a probability distribution. It is simple to
implement and also allows leveraging additional unlabeled data from unseen
classes to improve the estimates of their class-conditional distributions using
transductive/semi-supervised learning. Moreover, it extends seamlessly to
few-shot learning by easily updating these distributions when provided with a
small number of additional labelled examples from unseen classes. Through a
comprehensive set of experiments on several benchmark data sets, we demonstrate
the efficacy of our framework.
Comment: Accepted in ECML-PKDD 2017, 16 pages. Code and data are available:
https://github.com/vkverma01/Zero-Shot
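A hedged sketch of the class-attribute gating with Gaussian class-conditionals (one member of the exponential family): each class mean is a learned function of its attribute vector, here a least-squares linear map fit on seen classes; the paper's parameterisation is more general.

import numpy as np

def fit_attribute_to_mean(attrs_seen, means_seen):
    """attrs_seen: (K, A) class attributes; means_seen: (K, F) empirical
    class means from seen-class data. Returns an (A, F) linear map."""
    W, *_ = np.linalg.lstsq(attrs_seen, means_seen, rcond=None)
    return W

def predict_unseen(x, attrs_unseen, W, var=1.0):
    """Assign x (F,) to the unseen class with the highest Gaussian density,
    all classes sharing isotropic variance (a simplifying assumption)."""
    mus = attrs_unseen @ W                   # predicted unseen-class means
    scores = -((x[None, :] - mus) ** 2).sum(axis=1) / (2 * var)
    return int(np.argmax(scores))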