Bilinear Graph Neural Network with Neighbor Interactions
Graph Neural Network (GNN) is a powerful model to learn representations and
make predictions on graph data. Existing efforts on GNN have largely defined
the graph convolution as a weighted sum of the features of the connected nodes
to form the representation of the target node. However, the weighted-sum
operation assumes that neighbor nodes are independent of each other and
ignores the possible interactions between them. When such interactions exist,
for example when the co-occurrence of two neighbor nodes is a strong signal of
the target node's characteristics, existing GNN models may fail to capture
that signal. In this work, we argue for the importance of modeling the interactions
between neighbor nodes in GNN. We propose a new graph convolution operator,
which augments the weighted sum with pairwise interactions of the
representations of neighbor nodes. We term this framework the Bilinear Graph
Neural Network (BGNN), which improves GNN representation ability with bilinear
interactions between neighbor nodes. In particular, we specify two BGNN models
named BGCN and BGAT, based on the well-known GCN and GAT, respectively.
Empirical results on three public benchmarks of semi-supervised node
classification verify the effectiveness of BGNN -- BGCN (BGAT) outperforms GCN
(GAT) by 1.6% (1.5%) in classification accuracy. Codes are available at:
https://github.com/zhuhm1996/bgnn.

Comment: Accepted by IJCAI 2020. SOLE copyright holder is IJCAI (International
Joint Conferences on Artificial Intelligence), all rights reserved.
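The bilinear aggregation described above can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: the mixing weight `alpha` and the mean-over-pairs normalization are assumptions. The key point is that the sum of elementwise products over all neighbor pairs can be computed in linear time via the identity sum_{i<j} h_i * h_j = ((sum_i h_i)^2 - sum_i h_i^2) / 2.

```python
import numpy as np

def bgnn_layer(H, A, alpha=0.5):
    """Sketch of a bilinear graph convolution layer.

    H: (n, d) node features; A: (n, n) binary adjacency (with self-loops).
    Combines the usual weighted-sum (mean) aggregation with a bilinear
    term that averages elementwise products over all pairs of neighbors.
    `alpha` is an assumed mixing weight between the two terms.
    """
    deg = A.sum(axis=1, keepdims=True)
    sum_term = (A @ H) / deg                  # mean of neighbor features
    s = A @ H                                 # sum of neighbor features
    sq = A @ (H * H)                          # sum of squared features
    pairs = deg * (deg - 1) / 2               # number of neighbor pairs
    bilinear = 0.5 * (s * s - sq) / np.maximum(pairs, 1)  # mean over pairs
    return (1 - alpha) * sum_term + alpha * bilinear
```

The pair-sum identity is what keeps the bilinear term as cheap as the weighted sum, avoiding an explicit loop over all neighbor pairs.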
How to Retrain Recommender System? A Sequential Meta-Learning Method
Practical recommender systems need to be periodically retrained to refresh the
model with new interaction data. To pursue high model fidelity, it is usually
desirable to retrain the model on both historical and new data, since it can
account for both long-term and short-term user preference. However, a full
model retraining could be very time-consuming and memory-costly, especially
when the scale of historical data is large. In this work, we study the model
retraining mechanism for recommender systems, a topic of high practical value
that has been relatively little explored in the research community.
Our first belief is that retraining the model on historical data is
unnecessary, since the model has been trained on it before. However,
training on the new data alone may easily cause overfitting and forgetting
issues, since the new data is of a smaller scale and contains less information
on long-term user preference. To address this dilemma, we propose a new
training method, aiming to abandon the historical data during retraining
through learning to transfer the past training experience. Specifically, we
design a neural network-based transfer component, which transforms the old
model to a new model that is tailored for future recommendations. To learn the
transfer component well, we optimize the "future performance" -- i.e., the
recommendation accuracy evaluated in the next time period. Our Sequential
Meta-Learning (SML) method offers a general training paradigm that is applicable
to any differentiable model. We demonstrate SML on matrix factorization and
conduct experiments on two real-world datasets. Empirical results show that SML
not only achieves significant speed-up, but also outperforms the full model
retraining in recommendation accuracy, validating the effectiveness of our
proposals. We release our codes at: https://github.com/zyang1580/SML.

Comment: Appears in SIGIR 2020.
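The sequential paradigm above can be illustrated with a toy sketch. This is a hypothetical simplification of SML, not the paper's method: the transfer component here is a two-parameter linear map (the paper uses a neural network), the model is a linear regressor standing in for matrix factorization, and the finite-difference update of the transfer parameters stands in for backpropagating through the "future performance" objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def transfer(old_w, theta):
    """Hypothetical linear transfer component: maps the old model's
    weights to a new model (SML uses a neural network here)."""
    return theta[0] * old_w + theta[1]

def future_loss(w, data):
    """Squared error on the next period's data (a stand-in for
    recommendation accuracy in the next time period)."""
    x, y = data
    return float(np.mean((x @ w - y) ** 2))

# Sequential training over time periods: at period t, fit the model on
# the new data only, then train the transfer component so that the
# transferred model performs well on period t+1 ("future performance"),
# without revisiting historical data.
theta = np.array([1.0, 0.0])
w = rng.normal(size=3)
periods = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
for t in range(len(periods) - 1):
    x, y = periods[t]
    w = w - 0.1 * (2 / len(y)) * x.T @ (x @ w - y)   # one step on new data
    for i in range(2):                                # finite-diff grad of theta
        e = np.zeros(2); e[i] = 1e-4
        g = (future_loss(transfer(w, theta + e), periods[t + 1])
             - future_loss(transfer(w, theta - e), periods[t + 1])) / 2e-4
        theta[i] -= 0.01 * g
    w = transfer(w, theta)
```

The point of the sketch is the training signal: the transfer parameters are optimized against the *next* period's loss, which is what lets the method abandon historical data during retraining.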
Explainable Sparse Knowledge Graph Completion via High-order Graph Reasoning Network
Knowledge Graphs (KGs) are becoming increasingly essential infrastructures in
many applications while suffering from incompleteness issues. The KG completion
task (KGC) automatically predicts missing facts based on an incomplete KG.
However, existing methods perform unsatisfactorily in real-world scenarios. On
the one hand, their performance will dramatically degrade along with the
increasing sparsity of KGs. On the other hand, the inference procedure for
prediction is an untrustworthy black box.
This paper proposes HoGRN, a novel explainable model for sparse KGC that
composes high-order reasoning into a graph convolutional network. It can
not only improve the generalization ability to mitigate the information
insufficiency issue but also provide interpretability while maintaining the
model's effectiveness and efficiency. There are two main components that are
seamlessly integrated for joint optimization. First, the high-order reasoning
component learns high-quality relation representations by capturing endogenous
correlation among relations. This can reflect logical rules to justify a
broader range of missing facts. Second, the entity updating component leverages a
weight-free Graph Convolutional Network (GCN) to efficiently model KG
structures with interpretability. Unlike conventional methods, we conduct
entity aggregation and design composition-based attention in the relational
space without additional parameters. The lightweight design makes HoGRN better
suited to sparse settings. For evaluation, we have conducted extensive
experiments: the results of HoGRN on several sparse KGs show impressive
improvements (a 9% MRR gain on average). Further ablation and case studies
demonstrate the effectiveness of the main components. Our codes will be
released upon acceptance.

Comment: The manuscript is under review.
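A weight-free entity update of the kind described above can be sketched as follows. This is a hypothetical illustration, not HoGRN's exact layer: the composition operator (elementwise product of neighbor and relation embeddings) and the parameter-free attention score are assumptions chosen to show how aggregation can work without learned projection matrices.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def weight_free_update(entity, triples, ent_emb, rel_emb):
    """Hypothetical weight-free entity update: compose each incoming
    neighbor with its relation (elementwise product, one common
    composition) and attend with scores computed from the embeddings
    themselves -- no additional learned parameters.

    triples: list of (head, relation, tail) index triples.
    ent_emb: (n, d) entity embeddings; rel_emb: (m, d) relation embeddings.
    """
    msgs, scores = [], []
    for (h, r, t) in triples:
        if t != entity:
            continue
        m = ent_emb[h] * rel_emb[r]           # composition in relational space
        msgs.append(m)
        scores.append(float(m @ ent_emb[t]))  # attention without parameters
    if not msgs:
        return ent_emb[entity]                # no incoming edges: keep as-is
    att = softmax(np.array(scores))
    return att @ np.stack(msgs)
```

Because both the composition and the attention reuse the existing embeddings, the layer adds no parameters, which is the property that makes such designs attractive in sparse settings.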
Context-Aware Visual Policy Network for Fine-Grained Image Captioning
With the maturity of visual detection techniques, we are more ambitious in
describing visual content with open-vocabulary, fine-grained and free-form
language, i.e., the task of image captioning. In particular, we are interested
in generating longer, richer and more fine-grained sentences and paragraphs as
image descriptions. Image captioning can be translated to the task of
sequential language prediction given visual content, where the output sequence
forms natural language description with plausible grammar. However, existing
image captioning methods focus only on the language policy and not the visual
policy, and thus fail to capture the visual context that is crucial for compositional
reasoning such as object relationships (e.g., "man riding horse") and visual
comparisons (e.g., "small(er) cat"). This issue is especially severe when
generating longer sequences such as a paragraph. To fill the gap, we propose a
Context-Aware Visual Policy network (CAVP) for fine-grained image-to-language
generation: image sentence captioning and image paragraph captioning. During
captioning, CAVP explicitly considers the previous visual attentions as
context, and decides whether the context is used for the current word/sentence
generation given the current visual attention. Compared against traditional
visual attention mechanism that only fixes a single visual region at each step,
CAVP can attend to complex visual compositions over time. The whole image
captioning model -- CAVP and its subsequent language policy network -- can be
efficiently optimized end-to-end by using an actor-critic policy gradient
method. We have demonstrated the effectiveness of CAVP by state-of-the-art
performances on MS-COCO and Stanford captioning datasets, using various metrics
and sensible visualizations of qualitative visual context.

Comment: Accepted to IEEE Transactions on Pattern Analysis and Machine
Intelligence (T-PAMI). Extended version of "Context-Aware Visual Policy
Network for Sequence-Level Image Captioning", ACM MM 2018 (arXiv:1808.05864).
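The context-gating step described above can be sketched briefly. This is a hypothetical simplification, not CAVP's architecture: the scalar sigmoid gate over the concatenated current attention and mean of previous attentions is an assumption, standing in for the policy's learned decision of whether the visual context is used at the current step.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def context_aware_step(curr_att, prev_atts, w_gate):
    """Hypothetical context gate: given the current visual attention and
    the stack of previous attentions (the visual context), compute a
    scalar gate deciding how much context to mix into the current step.

    curr_att: (d,) current attended visual feature.
    prev_atts: (t, d) previous attended features (may be empty).
    w_gate: (2d,) assumed gating weights.
    """
    context = (prev_atts.mean(axis=0) if len(prev_atts)
               else np.zeros_like(curr_att))
    gate = sigmoid(float(w_gate @ np.concatenate([curr_att, context])))
    return gate * context + (1 - gate) * curr_att
```

The contrast with a conventional attention mechanism is that the output at each step can depend on an accumulated visual composition rather than a single fixated region.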
Promoting Generalization for Exact Solvers via Adversarial Instance Augmentation
Machine learning has been successfully applied to improve the efficiency of
Mixed-Integer Linear Programming (MILP) solvers. However, the learning-based
solvers often suffer from severe performance degradation on unseen MILP
instances -- especially on large-scale instances from a perturbed environment
-- due to the limited diversity of training distributions. To tackle this
problem, we propose Adversarial Instance Augmentation for branch-and-bound
(B&B) solvers (AdaSolver), a novel approach that promotes data diversity for
learning-based branching modules and does not require knowing the problem type
to generate new instances. We use the bipartite graph
representations for MILP instances and obtain various perturbed instances to
regularize the solver by augmenting the graph structures with a learned
augmentation policy. The major technical contribution of AdaSolver is that we
formulate the non-differentiable instance augmentation as a contextual bandit
problem and adversarially train the learning-based solver and augmentation
policy, enabling efficient gradient-based training of the augmentation policy.
To the best of our knowledge, AdaSolver is the first general and effective
framework for understanding and improving the generalization of both
imitation-learning-based (IL-based) and reinforcement-learning-based (RL-based)
B&B solvers. Extensive experiments demonstrate that by producing various
augmented instances, AdaSolver leads to a remarkable efficiency improvement
across various distributions.
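The contextual-bandit formulation above can be sketched minimally. This is a hypothetical illustration, not AdaSolver's implementation: the linear action scores, epsilon-greedy exploration, and scalar reward (the drop in solver performance, making training adversarial) are all assumptions used to show the shape of the bandit loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def bandit_augment(context, q, n_actions, eps=0.2):
    """Hypothetical contextual-bandit augmentation policy: given an
    instance's context features, pick a graph perturbation (action)
    epsilon-greedily from linear per-action scores.

    context: (d,) instance features; q: (d, n_actions) score weights.
    """
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(context @ q))

def update_q(q, context, action, reward, lr=0.1):
    """Linear bandit update: push the chosen action's predicted score
    toward the observed reward (higher reward = larger solver
    degradation, hence the adversarial training signal)."""
    q[:, action] += lr * (reward - context @ q[:, action]) * context
    return q
```

Because the augmentation choice is treated as a bandit action rather than differentiated through the instance generator, the non-differentiability of instance augmentation stops being an obstacle to gradient-based training of the policy.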