Neural Attention: Enhancing QKV Calculation in Self-Attention Mechanism with Neural Networks
In the realm of deep learning, the self-attention mechanism has proven
pivotal across a wide range of tasks, including natural language processing
and computer vision. Despite this success across diverse applications, the
traditional self-attention mechanism computes the query, key, and value (QKV)
through linear transformations, which may not always be the optimal choice.
This paper investigates a novel methodology for QKV computation: a
specially designed neural network structure performs the calculation. Utilizing a
modified Marian model, we conducted experiments on the IWSLT 2017
German-English translation task dataset and juxtaposed our method with the
conventional approach. The experimental results unveil a significant
enhancement in BLEU scores with our method. Furthermore, our approach also
proved superior when training the RoBERTa model on the WikiText-103
dataset, yielding a notable reduction in model perplexity compared to the
original model. These experimental outcomes not only validate the
efficacy of our method but also reveal the immense potential in optimizing the
self-attention mechanism through neural network-based QKV computation, paving
the way for future research and practical applications. The source code and
implementation details for our proposed method can be accessed at
https://github.com/ocislyjrti/NeuralAttention.
Comment: Updated the formulas in Section 3.2 "Detailed Methodology" and
revised Section 2 "Background" for clarity and accuracy.
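The core idea, replacing the linear QKV projections with small neural networks, can be sketched as follows. This is a minimal NumPy illustration with hypothetical MLP shapes, not the paper's actual architecture or its Marian/RoBERTa setups:

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    # Two-layer MLP with ReLU; a stand-in for the paper's QKV network
    # (the exact architecture in the paper may differ).
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def neural_qkv_attention(x, params):
    # x: (seq_len, d_model). Q, K, V are each produced by their own small
    # MLP instead of the usual single linear projection.
    q = mlp(x, *params["q"])
    k = mlp(x, *params["k"])
    v = mlp(x, *params["v"])
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)          # (seq_len, seq_len)
    return softmax(scores, axis=-1) @ v      # (seq_len, d_k)

rng = np.random.default_rng(0)
d, h = 8, 16
make = lambda: (rng.normal(size=(d, h)) * 0.1, np.zeros(h),
                rng.normal(size=(h, d)) * 0.1, np.zeros(d))
params = {"q": make(), "k": make(), "v": make()}
out = neural_qkv_attention(rng.normal(size=(5, d)), params)
print(out.shape)  # (5, 8)
```

Swapping the MLPs back to single matrix multiplications recovers ordinary linear QKV, which makes the comparison in the paper a drop-in change.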
Graph Deep Learning: Methods and Applications
The past few years have seen the growing prevalence of deep neural networks in various application domains, including image processing, computer vision, speech recognition, machine translation, self-driving cars, game playing, social networks, bioinformatics, and healthcare. Due to these broad applications and strong performance, deep learning, a subfield of machine learning and artificial intelligence, is changing everyone's life.

Graph learning has been another hot field in the machine learning and data mining communities, learning knowledge from graph-structured data. Examples of graph learning range from social network analysis, such as community detection and link prediction, to relational machine learning, such as knowledge graph completion and recommender systems, to multi-graph tasks such as graph classification and graph generation.

An emerging field, graph deep learning, aims at applying deep learning to graphs. To deal with graph-structured data, graph neural networks (GNNs) were invented in recent years; they directly take graphs as input and output graph/node representations. Although GNNs have shown superior performance to traditional methods in tasks such as semi-supervised node classification, there still exists a wide range of other important graph learning problems where either GNNs' applicability has not been explored or GNNs achieve less satisfying performance.

In this dissertation, we dive deeper into the field of graph deep learning. By developing new algorithms, architectures, and theories, we push graph neural networks' boundaries to a much wider range of graph learning problems.
The problems we have explored include: 1) graph classification; 2) medical ontology embedding; 3) link prediction; 4) recommender systems; 5) graph generation; and 6) graph structure optimization.

We first focus on two graph representation learning problems: graph classification and medical ontology embedding. For graph classification, we develop a novel deep GNN architecture which aggregates node features through a novel SortPooling layer that replaces the simple summing used in previous works. We demonstrate its state-of-the-art graph classification performance on benchmark datasets. For medical ontology embedding, we propose a novel hierarchical attention propagation model, which uses an attention mechanism to learn embeddings of medical concepts from hierarchically structured medical ontologies such as ICD-9 and CCS. We validate the learned embeddings on sequential procedure/diagnosis prediction tasks with real patient data.

Then we investigate GNNs' potential for predicting relations, specifically link prediction and recommender systems. For link prediction, we first develop a theory unifying various traditional link prediction heuristics, and then design a framework to automatically learn suitable heuristics from a given network based on GNNs. Our model shows unprecedentedly strong link prediction performance, significantly outperforming all traditional methods. For recommender systems, we propose a novel graph-based matrix completion model, which uses a GNN to learn graph structure features from the bipartite graph formed by user and item interactions. Our model not only outperforms various matrix completion baselines, but also demonstrates excellent transfer learning ability: a model trained on MovieLens can be directly used to predict Douban movie ratings with high performance.

Finally, we explore GNNs' applicability to graph generation and graph structure optimization.
We focus on a specific type of graph which usually carries computations on it, namely directed acyclic graphs (DAGs). We develop a variational autoencoder (VAE) for DAGs and prove that it can injectively map computations into a latent space. This injectivity allows us to perform optimization in the continuous latent space instead of the original discrete structure space. We then apply our VAE to two types of DAGs, neural network architectures and Bayesian networks. Experiments show that our model not only generates novel and valid DAGs, but also finds high-quality neural architectures and Bayesian networks by performing Bayesian optimization in its latent space.
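For readers unfamiliar with GNNs, a single message-passing layer of the kind these models build on can be sketched in a few lines. This is a generic mean-aggregation layer, not any specific architecture from the dissertation:

```python
import numpy as np

def gnn_layer(adj, x, w):
    # One round of message passing: each node averages its neighbours'
    # features (plus its own via a self-loop), then applies a shared
    # linear map followed by ReLU.
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)    # row-normalise
    return np.maximum((a_hat / deg) @ x @ w, 0.0)

# Toy 4-node path graph 0-1-2-3 with one-hot node features
adj = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
x = np.eye(4)
w = np.ones((4, 2)) * 0.5
h = gnn_layer(adj, x, w)
print(h.shape)  # (4, 2)
```

Stacking such layers lets information propagate over longer paths; the architectures above replace the mean with learned aggregations such as SortPooling or attention.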
Graph Neural Network with Local Frame for Molecular Potential Energy Surface
Modeling molecular potential energy surface is of pivotal importance in
science. Graph Neural Networks have shown great success in this field. However,
their message passing schemes need special designs to capture geometric
information and fulfill symmetry requirement like rotation equivariance,
leading to complicated architectures. To avoid these designs, we introduce a
novel local frame method to molecule representation learning and analyze its
expressivity. Projected onto a frame, equivariant features like 3D coordinates
are converted to invariant features, so that we can capture geometric
information with these projections and decouple the symmetry requirement from
GNN design. Theoretically, we prove that given non-degenerate frames, even
ordinary GNNs can encode molecules injectively and reach maximum expressivity
with coordinate projection and frame-frame projection. In experiments, our
model uses a simple ordinary GNN architecture yet achieves state-of-the-art
accuracy. The simpler architecture also leads to higher scalability: our model
takes only about 30% of the inference time and 10% of the GPU memory of the
most efficient baselines.
Comment: Learning on Graphs (LoG) 202
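The projection idea can be illustrated directly: build an orthonormal frame from a molecule's geometry, project relative coordinates onto it, and the result is unchanged by rotations. This is a simplified Gram-Schmidt sketch that ignores the degenerate-frame case the paper analyzes:

```python
import numpy as np

def build_frame(pos):
    # Orthonormal frame from the first two relative position vectors via
    # Gram-Schmidt (degenerate/collinear geometries are not handled here).
    v1 = pos[1] - pos[0]
    v2 = pos[2] - pos[0]
    e1 = v1 / np.linalg.norm(v1)
    u2 = v2 - (v2 @ e1) * e1
    e2 = u2 / np.linalg.norm(u2)
    e3 = np.cross(e1, e2)
    return np.stack([e1, e2, e3])     # rows are the frame axes

def invariant_coords(pos):
    # Project relative coordinates onto the frame: equivariant 3D
    # positions become rotation-invariant scalar features.
    frame = build_frame(pos)
    rel = pos - pos.mean(axis=0)
    return rel @ frame.T

rng = np.random.default_rng(0)
pos = rng.normal(size=(5, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:              # ensure a proper rotation
    q[:, 0] *= -1
rotated = pos @ q.T
print(np.allclose(invariant_coords(pos), invariant_coords(rotated)))  # True
```

Because the frame rotates with the molecule, the projected features feed into an ordinary GNN with no equivariance machinery, which is what enables the simpler architecture.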
PyTorch Geometric High Order: A Unified Library for High Order Graph Neural Network
We introduce PyTorch Geometric High Order (PyGHO), a library for High Order
Graph Neural Networks (HOGNNs) that extends PyTorch Geometric (PyG). Unlike
ordinary Message Passing Neural Networks (MPNNs) that exchange messages between
nodes, HOGNNs, encompassing subgraph GNNs and k-WL GNNs, encode node tuples, a
method previously lacking a standardized framework and often requiring complex
coding. PyGHO's main objective is to provide a unified and user-friendly
interface for various HOGNNs. It accomplishes this through streamlined data
structures for node tuples, comprehensive data processing utilities, and a
flexible suite of operators for high-order GNN methodologies. In this work, we
present a detailed, in-depth account of PyGHO and compare HOGNNs implemented
with PyGHO against their official implementations on real-world tasks. PyGHO achieves up to
acceleration and reduces the code needed for implementation by an order
of magnitude. Our library is available at
\url{https://github.com/GraphPKU/PygHO}
Identifying Patch Correctness in Test-Based Program Repair
Test-based automatic program repair has attracted a lot of attention in
recent years. However, the test suites in practice are often too weak to
guarantee correctness and existing approaches often generate a large number of
incorrect patches.
To reduce the number of incorrect patches generated, we propose a novel
approach that heuristically determines the correctness of the generated
patches. The core idea is to exploit the behavior similarity of test case
executions. The passing tests on original and patched programs are likely to
behave similarly while the failing tests on original and patched programs are
likely to behave differently. Also, if two tests exhibit similar runtime
behavior, the two tests are likely to have the same test results. Based on
these observations, we generate new test inputs to enhance the test suites and
use their behavior similarity to determine patch correctness.
Our approach is evaluated on a dataset consisting of 139 patches generated
from existing program repair systems including jGenProg, Nopol, jKali, ACS and
HDRepair. Our approach successfully prevented 56.3% of the incorrect patches
from being generated, without blocking any correct patches.
Comment: ICSE 201
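The behavior-similarity heuristic can be sketched as follows, using Jaccard similarity over executed-statement sets as a stand-in for the richer runtime behavior the approach actually compares; the function names and threshold here are hypothetical:

```python
def trace_similarity(trace_a, trace_b):
    # Jaccard similarity over executed-statement sets; the real approach
    # compares richer runtime behaviour, this is only an illustrative proxy.
    a, b = set(trace_a), set(trace_b)
    return len(a & b) / len(a | b) if a | b else 1.0

def classify_patch(orig_traces, patched_traces, passing, failing, threshold=0.8):
    # Heuristic from the abstract: passing tests should behave similarly
    # on the original and patched programs, failing tests differently.
    ok_pass = all(trace_similarity(orig_traces[t], patched_traces[t]) >= threshold
                  for t in passing)
    ok_fail = all(trace_similarity(orig_traces[t], patched_traces[t]) < threshold
                  for t in failing)
    return ok_pass and ok_fail

# Toy traces: statement IDs executed by each test before/after the patch.
orig = {"t1": [1, 2, 3], "t2": [1, 4, 5]}
patched = {"t1": [1, 2, 3], "t2": [1, 6, 7]}
print(classify_patch(orig, patched, passing=["t1"], failing=["t2"]))  # True
```

A patch that changes the behavior of a passing test, or leaves a failing test's behavior untouched, is flagged as likely incorrect.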
Detecting Floating-Point Errors via Atomic Conditions
This paper tackles the important, difficult problem of detecting program inputs that trigger large floating-point errors in numerical code. It introduces a novel, principled dynamic analysis that leverages the mathematically rigorously analyzed condition numbers for atomic numerical operations, which we call atomic conditions, to effectively guide the search for large floating-point errors. Compared with existing approaches, our work based on atomic conditions has several distinctive benefits: (1) it does not rely on high-precision implementations to act as approximate oracles, which are difficult to obtain in general and computationally costly; and (2) atomic conditions provide accurate, modular search guidance. These benefits in combination lead to a highly effective approach that detects more significant errors in real-world code (e.g., widely used numerical library functions) and achieves speedups of several orders of magnitude over the state of the art, thus making error analysis significantly more practical. We expect the methodology and principles behind our approach to benefit other floating-point program analysis tasks such as debugging, repair, and synthesis. To facilitate the reproduction of our work, we have made our implementation, evaluation data, and results publicly available on GitHub at https://github.com/FP-Analysis/atomic-condition.
ISSN: 2475-142
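The guiding quantity is the classical condition number of each atomic operation, |x · f'(x) / f(x)|. For f = sin, for instance, it blows up near nonzero multiples of pi, flagging inputs likely to trigger large floating-point errors. This is a sketch of the idea, not of the tool's actual implementation:

```python
import math

def atomic_condition_sin(x):
    # Condition number |x * f'(x) / f(x)| for f = sin; large values mean
    # a small relative input error is greatly amplified in the output.
    return abs(x * math.cos(x) / math.sin(x))

# Far from roots of sin, the operation is well-conditioned; near a nonzero
# root (e.g. x close to pi) the condition number explodes, so the search
# steers toward such inputs.
print(atomic_condition_sin(0.5) < 1.0)        # True: well-conditioned
print(atomic_condition_sin(3.14159) > 1e4)    # True: ill-conditioned near pi
```

Because each atomic operation carries its own closed-form condition number, the guidance composes modularly along an execution without any high-precision oracle.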
Facilitating Graph Neural Networks with Random Walk on Simplicial Complexes
Node-level random walk has been widely used to improve Graph Neural Networks.
However, there is limited attention to random walk on edges and, more
generally, on k-simplices. This paper systematically analyzes how random walk
on different orders of simplicial complexes (SC) facilitates GNNs in their
theoretical expressivity. First, on 0-simplices, or the node level, we
establish a connection between existing positional encoding (PE) and structure
encoding (SE) methods through the bridge of random walk. Second, on
1-simplices, or the edge level, we bridge edge-level random walk and Hodge
1-Laplacians and design corresponding edge PEs. In the spatial domain, we
directly make use of edge-level random walk to construct EdgeRWSE. Based on
the spectral analysis of Hodge 1-Laplacians, we propose Hodge1Lap, a
permutation-equivariant and expressive edge-level positional encoding. Third,
we generalize our theory to random walk on higher-order simplices and propose
a general principle to design PEs on simplices based on random walk and Hodge
Laplacians.
Inter-level random walk is also introduced to unify a wide range of simplicial
networks. Extensive experiments verify the effectiveness of our random
walk-based methods.
Comment: Accepted by NeurIPS 202
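At the node level, the random-walk structure encoding this line of work builds on records each node's k-step return probabilities; a minimal sketch follows (the paper's edge-level and higher-order variants replace the node transition matrix with Hodge Laplacians):

```python
import numpy as np

def rwse(adj, num_steps):
    # Node-level random-walk structure encoding: for each node, the
    # probability of returning to itself after k steps, k = 1..num_steps,
    # i.e. the diagonals of powers of the transition matrix D^-1 A.
    p = adj / adj.sum(axis=1, keepdims=True)
    enc, pk = [], np.eye(adj.shape[0])
    for _ in range(num_steps):
        pk = pk @ p
        enc.append(np.diag(pk))
    return np.stack(enc, axis=1)           # (num_nodes, num_steps)

# Triangle graph: 1-step return probability is 0, 2-step is 1/2.
adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
enc = rwse(adj, 2)
print(enc)
```

These per-node vectors are concatenated to the input features, giving an ordinary GNN structural information it cannot derive by message passing alone.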
Rethinking Knowledge Graph Evaluation Under the Open-World Assumption
Most knowledge graphs (KGs) are incomplete, which motivates one important
research topic on automatically complementing knowledge graphs. However,
evaluation of knowledge graph completion (KGC) models often ignores the
incompleteness -- facts in the test set are ranked against all unknown triplets
which may contain a large number of missing facts not included in the KG yet.
Treating all unknown triplets as false is called the closed-world assumption.
This closed-world assumption might negatively affect the fairness and
consistency of the evaluation metrics. In this paper, we study KGC evaluation
under a more realistic setting, namely the open-world assumption, where unknown
triplets are considered to include many missing facts not included in the
training or test sets. For the currently most used metrics such as mean
reciprocal rank (MRR) and Hits@K, we point out that their behavior may be
unexpected under the open-world assumption. Specifically, even with not many
missing facts, their values show only a logarithmic trend with respect to the
true strength of the model, and thus the metric increase could be
insignificant in terms of reflecting the true model improvement. Further,
considering the variance, we show that this degradation in the reported
numbers may result in incorrect comparisons between different models, where
stronger models may report lower metric numbers. We validate the phenomenon
both theoretically and
experimentally. Finally, we suggest possible causes and solutions for this
problem. Our code and data are available at
https://github.com/GraphPKU/Open-World-KG.
Comment: Accepted at NeurIPS 202
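The metrics under discussion are computed from the rank of each true test triplet among all candidates; the paper's point is that, under the closed-world assumption, missing true facts scored above the target silently inflate these ranks. A minimal sketch:

```python
def mrr_and_hits(ranks, k):
    # ranks[i]: rank of the i-th true test triplet among all candidate
    # triplets (1 = best). MRR averages reciprocal ranks; Hits@K is the
    # fraction of triplets ranked within the top K.
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(r <= k for r in ranks) / len(ranks)
    return mrr, hits

# Toy ranks for four test triplets: MRR = (1 + 1/2 + 1/4 + 1/10) / 4,
# Hits@3 = 2/4.
mrr, hits3 = mrr_and_hits([1, 2, 4, 10], k=3)
print(mrr, hits3)
```

If an unscored missing fact outranks the target, every affected rank shifts down by one, which is exactly the distortion the open-world analysis quantifies.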