Graph Transformer for Graph-to-Sequence Learning
The dominant graph-to-sequence transduction models employ graph neural
networks for graph representation learning, where the structural information is
reflected by the receptive field of neurons. Unlike graph neural networks,
which restrict information exchange to the immediate neighborhood, we propose a
new model, known as Graph Transformer, that uses explicit relation encoding and
allows direct communication between two distant nodes. It provides a more
efficient way for global graph structure modeling. Experiments on the
applications of text generation from Abstract Meaning Representation (AMR) and
syntax-based neural machine translation show the superiority of our proposed
model. Specifically, our model achieves 27.4 BLEU on LDC2015E86 and 29.7 BLEU
on LDC2017T10 for AMR-to-text generation, outperforming the state-of-the-art
results by up to 2.2 points. On the syntax-based translation tasks, our model
establishes new single-model state-of-the-art BLEU scores, 21.3 for
English-to-German and 14.1 for English-to-Czech, improving over the existing
best results, including ensembles, by over 1 BLEU.
Comment: accepted by AAAI202
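The explicit relation encoding described above can be illustrated with a minimal sketch: the attention score between a query node and each key node incorporates a learned relation embedding for that pair, so two distant nodes can exchange information in a single step instead of through multi-hop message passing. The function name and the plain-list vector representation are illustrative assumptions, not the paper's implementation.

```python
import math

def relation_aware_attention(q, keys, rels):
    """Score each key against the query with an explicit relation term.

    q: query vector; keys: one key vector per node; rels: one relation
    embedding per (query, key) pair. Adding the relation embedding to the
    key lets the score depend on graph structure, not just content.
    """
    d = len(q)
    scores = [
        sum(qi * (ki + ri) for qi, ki, ri in zip(q, k, r)) / math.sqrt(d)
        for k, r in zip(keys, rels)
    ]
    # numerically stable softmax over the scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

With zero relation embeddings this reduces to ordinary scaled dot-product attention; nonzero relation vectors bias attention toward structurally related nodes.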
TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning with Structure-Trajectory Prompted Reconstruction for Person Re-Identification
Person re-identification (re-ID) via 3D skeleton data is an emerging topic
with prominent advantages. Existing methods usually design skeleton descriptors
with raw body joints or perform skeleton sequence representation learning.
However, they typically cannot concurrently model different body-component
relations, and rarely explore useful semantics from fine-grained
representations of body joints. In this paper, we propose a generic
Transformer-based Skeleton Graph prototype contrastive learning (TranSG)
approach with structure-trajectory prompted reconstruction to fully capture
skeletal relations and valuable spatial-temporal semantics from skeleton graphs
for person re-ID. Specifically, we first devise the Skeleton Graph Transformer
(SGT) to simultaneously learn body and motion relations within skeleton graphs,
so as to aggregate key correlative node features into graph representations.
Then, we propose the Graph Prototype Contrastive learning (GPC) to mine the
most typical graph features (graph prototypes) of each identity, and contrast
the inherent similarity between graph representations and different prototypes
from both skeleton and sequence levels to learn discriminative graph
representations. Last, a graph Structure-Trajectory Prompted Reconstruction
(STPR) mechanism is proposed to exploit the spatial and temporal contexts of
graph nodes to prompt skeleton graph reconstruction, which facilitates
capturing more valuable patterns and graph semantics for person re-ID.
Empirical evaluations demonstrate that TranSG significantly outperforms
existing state-of-the-art methods. We further show its generality under
different graph modeling, RGB-estimated skeletons, and unsupervised scenarios.
Comment: Accepted by CVPR 2023. Codes are available at
https://github.com/Kali-Hac/TranSG. Supplemental material is included in the
conference proceeding
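The graph prototype contrastive idea in the TranSG abstract can be sketched as an InfoNCE-style loss: a graph representation is pulled toward the prototype of its own identity and pushed away from the prototypes of other identities. The function and the cosine-similarity/temperature choices below are a simplified stand-in for the paper's GPC objective, not its exact formulation.

```python
import math

def prototype_contrastive_loss(rep, prototypes, target, tau=0.1):
    """Contrast a graph representation against per-identity prototypes.

    rep: graph representation vector; prototypes: one prototype vector
    per identity; target: index of rep's own identity. Returns the
    negative log-probability of the target under a softmax over
    temperature-scaled cosine similarities.
    """
    def cos(a, b):
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return sum(x * y for x, y in zip(a, b)) / (na * nb)

    logits = [cos(rep, p) / tau for p in prototypes]
    # log-sum-exp for a stable softmax denominator
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]
```

Minimizing this loss over many samples makes representations of the same identity cluster around their prototype, which is the discriminative effect the abstract describes.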
NAR-Former V2: Rethinking Transformer for Universal Neural Network Representation Learning
As more deep learning models are being applied in real-world applications,
there is a growing need for modeling and learning the representations of neural
networks themselves. An efficient representation can be used to predict target
attributes of networks without the need for actual training and deployment
procedures, facilitating efficient network deployment and design. Recently,
inspired by the success of Transformer, some Transformer-based representation
learning frameworks have been proposed and achieved promising performance in
handling cell-structured models. However, graph neural network (GNN) based
approaches still dominate the field of representation learning for entire
networks. In this paper, we revisit the Transformer and compare it with GNNs to
analyse their different architecture characteristics. We then propose a
modified Transformer-based universal neural network representation learning
model NAR-Former V2. It can learn efficient representations from both
cell-structured networks and entire networks. Specifically, we first take the
network as a graph and design a straightforward tokenizer to encode the network
into a sequence. Then, we incorporate the inductive representation learning
capability of GNN into Transformer, enabling Transformer to generalize better
when encountering unseen architectures. Additionally, we introduce a series of
simple yet effective modifications to enhance the Transformer's ability to
learn representations from graph structures. Our proposed method surpasses
the GNN-based method NNLP by a significant margin in latency estimation on the
NNLQP dataset. Furthermore, regarding accuracy prediction on the NASBench101
and NASBench201 datasets, our method achieves highly comparable performance to
other state-of-the-art methods.Comment: 9 pages, 2 figures, 6 tables. Code is available at
https://github.com/yuny220/NAR-Former-V
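The "take the network as a graph and design a straightforward tokenizer to encode the network into a sequence" step can be sketched as follows. This encoding scheme (one operator token per node followed by edge tokens) is a hypothetical illustration of graph-to-sequence tokenization, not NAR-Former V2's actual tokenizer.

```python
def graph_to_sequence(ops, edges):
    """Flatten an architecture graph into a token sequence.

    ops: operator name for each node, indexed 0..n-1; edges: (src, dst)
    pairs. Emits one token per node, then one token per edge in sorted
    order so the same graph always yields the same sequence.
    """
    tokens = [f"op:{op}" for op in ops]
    tokens += [f"edge:{u}->{v}" for u, v in sorted(edges)]
    return tokens
```

A Transformer can then consume this sequence directly, while the edge tokens preserve the connectivity a GNN would otherwise carry.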
The prediction of the quality of results in Logic Synthesis using Transformer and Graph Neural Networks
In the logic synthesis stage, structure transformations in the synthesis tool
need to be combined into optimization sequences and act on the circuit to meet
the specified circuit area and delay. However, logic synthesis optimization
sequences are time-consuming to run, and predicting the quality of the results
(QoR) against the synthesis optimization sequence for a circuit can help
engineers find a better optimization sequence faster. In this work, we propose
a deep learning method to predict the QoR of unseen circuit-optimization
sequence pairs. Specifically, the structure transformations are translated
into vectors by embedding methods, and a Transformer, an advanced natural
language processing (NLP) model, is used to extract the features of the
optimization sequences. In addition, to enable the prediction process of the
model to generalize from circuit to circuit, each circuit is represented as an
adjacency matrix and a feature matrix. Graph neural networks (GNNs) are used
to extract the structural features of the
circuits. For this problem, the Transformer and three typical GNNs are used.
Furthermore, the Transformer and GNNs are adopted as a joint learning policy
for the QoR prediction of the unseen circuit-optimization sequences. The
methods resulting from the combination of Transformer and GNNs are benchmarked.
The experimental results show that the joint learning of Transformer and
GraphSage gives the best results. The Mean Absolute Error (MAE) of the
predicted result is 0.412
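The adjacency-matrix-plus-feature-matrix encoding that feeds the GNN side of this model can be sketched with one GraphSAGE-style mean-aggregation step, since the abstract reports GraphSage as the best-performing GNN. This single-scalar-feature version is a toy illustration of mean aggregation, not the paper's network.

```python
def graphsage_mean_layer(adj, feats):
    """One GraphSAGE-style mean-aggregation step.

    adj: n x n adjacency matrix (0/1 entries); feats: one scalar feature
    per node, for brevity. Each node's new feature is the mean of its own
    feature and its neighbors' features.
    """
    n = len(adj)
    out = []
    for i in range(n):
        nbrs = [feats[j] for j in range(n) if adj[i][j]]
        pool = [feats[i]] + nbrs
        out.append(sum(pool) / len(pool))
    return out
```

Stacking such layers gives each node a receptive field over the circuit structure; the resulting graph features are then combined with the Transformer's sequence features for the joint QoR prediction.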
UniMAP: Universal SMILES-Graph Representation Learning
Molecular representation learning is fundamental for many drug related
applications. Most existing molecular pre-training models are limited in using
single molecular modality, either SMILES or graph representation. To
effectively leverage both modalities, we argue that it is critical to capture
the fine-grained 'semantics' between SMILES and graph, because subtle
sequence/graph differences may lead to contrary molecular properties. In this
paper, we propose a universal SMILES-graph representation learning model, namely
UniMAP. Firstly, an embedding layer is employed to obtain the token and
node/edge representation in SMILES and graph, respectively. A multi-layer
Transformer is then utilized to conduct deep cross-modality fusion. Specifically,
four kinds of pre-training tasks are designed for UniMAP, including Multi-Level
Cross-Modality Masking (CMM), SMILES-Graph Matching (SGM), Fragment-Level
Alignment (FLA), and Domain Knowledge Learning (DKL). In this way, both global
(i.e. SGM and DKL) and local (i.e. CMM and FLA) alignments are integrated to
achieve comprehensive cross-modality fusion. We evaluate UniMAP on various
downstream tasks, i.e., molecular property prediction, drug-target affinity
prediction, and drug-drug interaction prediction. Experimental results show
that UniMAP outperforms current state-of-the-art pre-training methods. We also
visualize the
learned representations to demonstrate the effect of multi-modality
integration
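The cross-modality masking idea behind UniMAP's CMM task can be sketched as masking the same positions in both views, so the model must recover a masked SMILES token from the graph (or vice versa). The function below assumes pre-aligned token/node lists and a `[MASK]` placeholder; both are simplifying assumptions, not UniMAP's actual masking scheme.

```python
import random

def cross_modality_mask(smiles_tokens, graph_nodes, ratio=0.3, seed=0):
    """Mask a shared fraction of positions in both modalities.

    smiles_tokens and graph_nodes are assumed aligned position-by-position
    (a simplification). Returns the masked sequence, the masked node list,
    and the sorted masked indices.
    """
    rng = random.Random(seed)
    k = max(1, int(len(smiles_tokens) * ratio))
    idx = set(rng.sample(range(len(smiles_tokens)), k))
    masked_seq = [t if i not in idx else "[MASK]" for i, t in enumerate(smiles_tokens)]
    masked_graph = [n if i not in idx else "[MASK]" for i, n in enumerate(graph_nodes)]
    return masked_seq, masked_graph, sorted(idx)
```

Because the masked positions coincide across views, a reconstruction objective forces information to flow between the SMILES and graph branches, which is the point of the cross-modality fusion.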
Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers
Given an input video, its associated audio, and a brief caption, the
audio-visual scene-aware dialog (AVSD) task requires an agent to engage in a
question-answer dialog with a human about the audio-visual content. This task
thus poses a challenging multi-modal representation learning and reasoning
scenario, advancements into which could influence several human-machine
interaction applications. To solve this task, we introduce a
semantics-controlled multi-modal shuffled Transformer reasoning framework,
consisting of a sequence of Transformer modules, each taking a modality as
input and producing representations conditioned on the input question. Our
proposed Transformer variant uses a shuffling scheme on their multi-head
outputs, demonstrating better regularization. To encode fine-grained visual
information, we present a novel dynamic scene graph representation learning
pipeline that consists of an intra-frame reasoning layer producing
spatio-semantic graph representations for every frame, and an inter-frame
aggregation module capturing temporal cues. Our entire pipeline is trained
end-to-end. We present experiments on the benchmark AVSD dataset, both on
answer generation and selection tasks. Our results demonstrate state-of-the-art
performance on all evaluation metrics.
Comment: Accepted at AAAI 202