Implementing graph neural networks with TensorFlow-Keras
Graph neural networks are a versatile machine learning architecture that has
received a lot of attention recently. In this technical report, we present an
implementation of convolution and pooling layers for TensorFlow-Keras models,
which allows a seamless and flexible integration into standard Keras layers to
set up graph models in a functional way. This implies the usage of mini-batches
as the first tensor dimension, which can be realized via TensorFlow's new
RaggedTensor class, which is well suited to graphs. We developed the Keras Graph
Convolutional Neural Network Python package kgcnn based on TensorFlow-Keras
that provides a set of Keras layers for graph networks which focus on a
transparent tensor structure passed between layers and an ease-of-use mindset.
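The point about mini-batches as the first tensor dimension can be illustrated with a small sketch. Graphs in one batch usually have different node counts, so a dense array does not fit; a ragged layout stores one flat list of values plus the row boundaries, which is the same idea behind the `values` and `row_splits` attributes of `tf.RaggedTensor`. The helper names below (`to_ragged`, `get_row`) are hypothetical and are not part of the kgcnn API; the sketch uses plain Python so that it runs without TensorFlow.

```python
# Sketch only: a batch of graphs with different node counts, stored as
# flat values plus row boundaries. Helper names are hypothetical.

def to_ragged(rows):
    """Flatten variable-length rows into (values, row_splits)."""
    values, row_splits = [], [0]
    for row in rows:
        values.extend(row)
        row_splits.append(len(values))
    return values, row_splits


def get_row(values, row_splits, i):
    """Recover row i (one graph) from the flat representation."""
    return values[row_splits[i]:row_splits[i + 1]]


# Two graphs: the first has 3 nodes, the second has 2 nodes.
# Every node carries a 2-dimensional feature vector.
node_features = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [0.2, 0.8]],
]

values, row_splits = to_ragged(node_features)
second_graph = get_row(values, row_splits, 1)
```

The batch index stays the first dimension (`get_row(..., i)` selects graph `i`), while the number of nodes per graph is free to vary.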
Knowledge is Power: Understanding Causality Makes Legal Judgment Prediction Models More Generalizable and Robust
Legal Judgment Prediction (LJP), aiming to predict a judgment based on fact
descriptions, serves as legal assistance to mitigate the great work burden of
limited legal practitioners. Most existing methods apply various large-scale
pre-trained language models (PLMs) finetuned in LJP tasks to obtain consistent
improvements. However, we find that the state-of-the-art (SOTA)
model makes judgment predictions according to wrong (or non-causal)
information, which not only weakens the model's generalization capability but
also results in severe social problems like discrimination. Here, we analyze
the causal mechanism misleading the LJP model to learn the spurious
correlations, and then propose a framework to guide the model to learn the
underlying causality knowledge in the legal texts. Specifically, we first
perform open information extraction (OIE) to refine the text having a high
proportion of causal information, according to which we generate a new set of
data. Then, we design a model learning the weights of the refined data and the
raw data for LJP model training. The extensive experimental results show that
our model is more generalizable and robust than the baselines and achieves a
new SOTA performance on two commonly used legal-specific datasets.
A Hierarchical N-Gram Framework for Zero-Shot Link Prediction
Due to the incompleteness of knowledge graphs (KGs), zero-shot link
prediction (ZSLP) which aims to predict unobserved relations in KGs has
attracted recent interest from researchers. A common solution is to use textual
features of relations (e.g., surface name or textual descriptions) as auxiliary
information to bridge the gap between seen and unseen relations. Current
approaches learn an embedding for each word token in the text. These methods
lack robustness as they suffer from the out-of-vocabulary (OOV) problem.
Meanwhile, models built on character n-grams have the capability of generating
expressive representations for OOV words. Thus, in this paper, we propose a
Hierarchical N-Gram framework for Zero-Shot Link Prediction (HNZSLP), which
considers the dependencies among character n-grams of the relation surface name
for ZSLP. Our approach works by first constructing a hierarchical n-gram graph
on the surface name to model the organizational structure of n-grams that leads
to the surface name. A GramTransformer, based on the Transformer, is then
presented to model the hierarchical n-gram graph to construct the relation
embedding for ZSLP. Experimental results show the proposed HNZSLP achieved
state-of-the-art performance on two ZSLP datasets. Comment: under review
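To make the idea of a hierarchical n-gram graph over a relation surface name more concrete, here is a minimal sketch. It is an assumption about one plausible construction, not the authors' exact procedure: every character n-gram up to a chosen length becomes a node, and each n-gram is linked to the two (n-1)-grams it contains (its prefix and its suffix). The function names and the example surface name are hypothetical.

```python
# Sketch only: one plausible way to build a hierarchy over character
# n-grams. The construction used in the paper may differ.

def char_ngrams(name, max_n):
    """Return all character n-grams of `name` for n = 1..max_n."""
    grams = set()
    for n in range(1, max_n + 1):
        for i in range(len(name) - n + 1):
            grams.add(name[i:i + n])
    return grams


def hierarchy_edges(name, max_n):
    """Link each n-gram (n >= 2) to the prefix and suffix (n-1)-grams it contains."""
    edges = set()
    for gram in char_ngrams(name, max_n):
        if len(gram) >= 2:
            edges.add((gram[:-1], gram))  # prefix (n-1)-gram -> n-gram
            edges.add((gram[1:], gram))   # suffix (n-1)-gram -> n-gram
    return edges


nodes = char_ngrams("born", 3)
edges = hierarchy_edges("born", 3)
```

Because the nodes are character n-grams rather than word tokens, an unseen relation name still maps onto known nodes, which is the property that addresses the OOV problem described above.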
Investigating Pretrained Language Models for Graph-to-Text Generation
Graph-to-text generation aims to generate fluent texts from graph-based data.
In this paper, we investigate two recently proposed pretrained language models
(PLMs) and analyze the impact of different task-adaptive pretraining strategies
for PLMs in graph-to-text generation. We present a study across three graph
domains: meaning representations, Wikipedia knowledge graphs (KGs) and
scientific KGs. We show that the PLMs BART and T5 achieve new state-of-the-art
results and that task-adaptive pretraining strategies improve their performance
even further. In particular, we report new state-of-the-art BLEU scores of
49.72 on LDC2017T10, 59.70 on WebNLG, and 25.66 on AGENDA datasets - a relative
improvement of 31.8%, 4.5%, and 42.4%, respectively. In an extensive analysis,
we identify possible reasons for the PLMs' success on graph-to-text tasks. We
find evidence that their knowledge about true facts helps them perform well
even when the input graph representation is reduced to a simple bag of node and
edge labels. Comment: Our code and pretrained model checkpoints are available at
https://github.com/UKPLab/plms-graph2tex
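The contrast between a structured graph input and "a simple bag of node and edge labels" can be sketched as follows. A text-to-text model needs the graph as a string, so the triples are flattened; the marker tokens `<H>`, `<R>`, `<T>` and both function names are assumptions made for illustration, and the example triples are made up rather than taken from any of the datasets.

```python
# Sketch only: two ways of turning a small knowledge graph into a string.
# Marker tokens and function names are illustrative assumptions.

def linearize(triples):
    """Keep the structure: each triple is written as head, relation, tail."""
    parts = []
    for head, relation, tail in triples:
        parts.append(f"<H> {head} <R> {relation} <T> {tail}")
    return " ".join(parts)


def bag_of_labels(triples):
    """Discard the structure: keep only the labels, sorted to remove triple order."""
    labels = []
    for triple in triples:
        labels.extend(triple)
    return " | ".join(sorted(labels))


# Made-up example graph with two triples sharing the node "London".
triples = [
    ("Alan Turing", "birthPlace", "London"),
    ("London", "country", "United Kingdom"),
]

structured_input = linearize(triples)
unstructured_input = bag_of_labels(triples)
```

In the second form, nothing indicates which label is a head, a relation, or a tail, so any correct output has to rely on what the model already knows about the entities.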
Graph Neural Networks for Natural Language Processing: A Survey
Deep learning has become the dominant approach in coping with various tasks
in Natural Language Processing (NLP). Although text inputs are typically
represented as a sequence of tokens, there is a rich variety of NLP problems
that can be best expressed with a graph structure. As a result, there is a surge
of interest in developing new deep learning techniques on graphs for a large
number of NLP tasks. In this survey, we present a comprehensive overview on Graph
Neural Networks (GNNs) for Natural Language Processing. We propose a new
taxonomy of GNNs for NLP, which systematically organizes existing research of
GNNs for NLP along three axes: graph construction, graph representation
learning, and graph based encoder-decoder models. We further introduce a large
number of NLP applications that are exploiting the power of GNNs and summarize
the corresponding benchmark datasets, evaluation metrics, and open-source codes.
Finally, we discuss various outstanding challenges for making the full use of
GNNs for NLP as well as future research directions. To the best of our
knowledge, this is the first comprehensive overview of Graph Neural Networks for
Natural Language Processing. Comment: 127 pages
Deep Graph Representation Learning and its Application on Graph Clustering
Graphs like social networks, molecular graphs, and traffic networks are everywhere in the real world. Deep Graph Representation Learning (DGL) is essential for most graph applications, such as Graph Classification, Link Prediction, and Community Detection. DGL has made significant progress in recent years because of the development of Graph Neural Networks (GNNs). However, there are still several crucial challenges that the field faces, including in (semi-)supervised DGL, self-supervised DGL, and DGL-based graph clustering. In this thesis, I proposed three models to address the problems in these three aspects respectively.
GNNs have been widely used in DGL problems. However, GNNs suffer from over-smoothing due to their repeated local aggregation and over-squashing due to the exponential growth in computation paths with increased model depth, which confines their expressive power. To solve this problem, a Hierarchical Structure Graph Transformer called HighFormer is proposed to leverage local and relatively global structure information. I use GNNs to learn the initial graph node representation based on the local structure information. At the same time, a structural attention module is used to learn the relatively global structural similarity. Then, the improved attention matrix was obtained by adding the relatively global structure similarity matrix to the traditional attention matrix. Finally, the graph representation was learned by the improved attention matrix.
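The step of adding a structural similarity matrix to the attention matrix can be sketched in a few lines. The abstract does not say whether the addition happens before or after normalisation; the sketch below assumes it is applied to the raw attention scores before the row-wise softmax, and the function names and numbers are hypothetical rather than taken from HighFormer.

```python
import math

# Sketch only: attention scores biased by a structural similarity matrix.
# Adding the bias before the softmax is an assumption, not the thesis' stated design.

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def structure_biased_attention(scores, structure):
    """Add the structure matrix to the raw scores, then normalise each row."""
    biased = [
        [s + b for s, b in zip(score_row, struct_row)]
        for score_row, struct_row in zip(scores, structure)
    ]
    return [softmax(row) for row in biased]


# Three nodes with identical content-based scores, so any difference in the
# resulting attention weights comes from the structural term alone.
scores = [[0.0, 0.0, 0.0] for _ in range(3)]
structure = [
    [1.0, 2.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

attention = structure_biased_attention(scores, structure)
```

In this toy case, node 0 attends most strongly to node 1 because that pair has the highest structural similarity, even though the content scores are tied.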
Graph contrastive learning (GCL) has recently become the most powerful method in self-supervised graph representation learning (SGL), of which graph augmentation is a critical component for generating different views of input graphs. Most existing GCL methods perform stochastic data augmentation schemes, for example, randomly dropping edges or masking node features. However, uniform transformations without carefully designed augmentation techniques may drastically change the underlying semantics of graphs or graph nodes. I argue that the graph augmentation schemes should preserve the intrinsic semantics of graphs. Besides, existing GCL methods neglect the semantic information, which may introduce false-negative samples. Therefore, a novel GCL method with semantic invariance graph augmentation termed SemiGCL is proposed by designing a semantic invariance graph augmentation (SemiAug) and a semantic-based graph contrastive (SGC) scheme.
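For reference, the stochastic augmentations that this paragraph criticises (randomly dropping edges and masking node features) look roughly like the sketch below. It illustrates the uniform baseline scheme only, not SemiAug; the function names, probabilities, and toy graph are hypothetical.

```python
import random

# Sketch only: the uniform random augmentations used by many GCL methods.
# This is the baseline the text argues against, not the proposed SemiAug.

def drop_edges(edges, drop_prob, rng):
    """Remove each edge independently with probability drop_prob."""
    return [edge for edge in edges if rng.random() >= drop_prob]


def mask_features(features, mask_prob, rng):
    """Zero out each feature dimension (for all nodes) with probability mask_prob."""
    dim = len(features[0])
    keep = [rng.random() >= mask_prob for _ in range(dim)]
    return [[x if k else 0.0 for x, k in zip(row, keep)] for row in features]


rng = random.Random(0)  # fixed seed so the two views are reproducible

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 1.0, 1.0]]

# Two randomly augmented views of the same graph, as used in contrastive training.
view_a = (drop_edges(edges, 0.3, rng), mask_features(features, 0.3, rng))
view_b = (drop_edges(edges, 0.3, rng), mask_features(features, 0.3, rng))
```

Every edge and feature dimension is treated alike here, so an edge that carries the graph's meaning is as likely to be removed as an irrelevant one, which is the weakness the paragraph points out.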
Deep graph clustering (DGC), which aims to divide the graph nodes into different clusters, is challenging for graph analysis. DGC usually consists of an encoding neural network and a clustering method. Although DGC has made remarkable progress with the development of deep learning, I observed two drawbacks to the existing methods:
1) Existing methods usually overlook learning the global structural information in the node encoding process. Consequently, the discriminative capability of representations will be limited. 2) Most existing methods leverage traditional clustering methods such as K-means and spectral clustering. However, these clustering methods cannot be trained simultaneously with the DGL methods, leading to sub-optimal clustering performance. To address these issues, I propose a novel self-supervised DGC method termed Structural Semantic Contrastive Deep Graph Clustering (SECRET). To get a more discriminative representation, I design a structure contrastive scheme (SCS) by contrasting the aggregation of first-order neighbors with a graph diffusion. A consistent loss was also proposed to keep the structure of different views consistent. To jointly optimize the DGL and clustering method, I proposed a novel Self-supervised Deep-learning-based Clustering (SDC) model.
Graph-based Approaches to Text Generation
Deep Learning advances have enabled more fluent and flexible text generation. However, while these neural generative approaches were initially successful in tasks such as machine translation, they face problems – such as unfaithfulness to the source, repetition and incoherence – when applied to generation tasks where the input is structured data, such as graphs. Generating text from graph-based data, including Abstract Meaning Representation (AMR) or Knowledge Graphs (KG), is a challenging task due to the inherent difficulty of properly encoding the input graph while maintaining its original semantic structure. Previous work requires linearizing the input graph, which makes it complicated to properly capture the graph structure since the linearized representation weakens structural information by diluting the explicit connectivity, particularly when the graph structure is complex.
This thesis makes an attempt to tackle these issues focusing on two major challenges: first, the creation and improvement of neural text generation systems that can better operate when consuming graph-based input data. Second, we examine text-to-text pretrained language models for graph-to-text generation, including multilingual generation, and present possible methods to adapt these models pretrained on natural language to graph-structured data.
In the first part of this thesis, we investigate how to directly exploit graph structures for text generation. We develop novel graph-to-text methods with the capability of incorporating the input graph structure into the learned representations, enhancing the quality of the generated text. For AMR-to-text generation, we present a dual encoder, which incorporates different graph neural network methods, to capture complementary perspectives of the AMR graph. Next, we propose a new KG-to-text framework that learns richer contextualized node embeddings, combining global and local node contexts. We thus introduce a parameter-efficient mechanism for inserting the node connections into the Transformer architecture operating with shortest path lengths between nodes, showing strong performance while using considerably fewer parameters.
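The mechanism based on shortest path lengths between nodes needs a pairwise distance table as its input. A minimal sketch of computing that table with breadth-first search is shown below; treating the graph as undirected and marking unreachable pairs with `None` are assumptions made here, and how the thesis then feeds these lengths into the Transformer is not reproduced.

```python
from collections import deque

# Sketch only: all-pairs shortest path lengths via breadth-first search.
# Undirected edges and None for unreachable pairs are assumptions.

def shortest_path_lengths(num_nodes, edges):
    """Return dist[s][t], the number of edges on a shortest path from s to t."""
    adjacency = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    dist = [[None] * num_nodes for _ in range(num_nodes)]
    for source in range(num_nodes):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[source][v] is None:  # first visit gives the shortest length
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


# A path graph 0 - 1 - 2 - 3 plus an isolated node 4.
dist = shortest_path_lengths(5, [(0, 1), (1, 2), (2, 3)])
```

A table like `dist` could then be used to index a small set of learned relation embeddings, one per path length, which is one way such a mechanism can stay parameter-efficient.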
The second part of this thesis focuses on pretrained language models for text generation from graph-based input data. We first examine how encoder-decoder text-to-text pretrained language models perform in various graph-to-text tasks and propose different task-adaptive pretraining strategies for improving their downstream performance. We then propose a novel structure-aware adapter method that injects the input graph structure directly into pretrained models, without updating their parameters and reducing their reliance on specific representations of the graph structure. Finally, we investigate multilingual text generation from AMR structures, developing approaches that can operate in languages beyond English.