Implicit Graph Neural Diffusion Networks: Convergence, Generalization, and Over-Smoothing
Implicit Graph Neural Networks (GNNs) have recently achieved significant
success in addressing graph learning problems. However, poorly designed implicit
GNN layers may have limited adaptability to learn graph metrics, experience
over-smoothing issues, or exhibit suboptimal convergence and generalization
properties, potentially hindering their practical performance. To tackle these
issues, we introduce a geometric framework for designing implicit graph
diffusion layers based on a parameterized graph Laplacian operator. Our
framework allows learning the metrics of vertex and edge spaces, as well as the
graph diffusion strength from data. We show how implicit GNN layers can be
viewed as the fixed-point equation of a Dirichlet energy minimization problem
and give conditions under which they may suffer from over-smoothing during
training (OST) and inference (OSI). We further propose a new implicit GNN model,
DIGNN, to avoid OST and OSI. We establish that with an appropriately chosen
hyperparameter greater than the largest eigenvalue of the parameterized graph
Laplacian, DIGNN guarantees a unique equilibrium, quick convergence, and strong
generalization bounds. Our models demonstrate better performance than most
implicit and explicit GNN baselines on benchmark datasets for both node and
graph classification tasks.
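To make the fixed-point view concrete: minimizing a Dirichlet-type energy
||Z - X||_F^2 + lam * tr(Z^T L Z) over node representations Z gives the
optimality condition (I + lam * L) Z = X, which an implicit layer can solve by
iteration. The NumPy sketch below illustrates only this generic construction;
the toy graph, the fixed lam, and the plain normalized Laplacian are
assumptions for illustration, not the paper's learned parameterization or the
DIGNN step size.

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def implicit_diffusion_layer(X, L, lam=0.25, tol=1e-6, max_iter=500):
    """Equilibrium of the energy ||Z - X||_F^2 + lam * tr(Z^T L Z), i.e. the
    solution of (I + lam * L) Z = X, found by iterating Z <- X - lam * L @ Z.
    The iteration converges when lam < 1 / lambda_max(L)."""
    Z = X.copy()
    for _ in range(max_iter):
        Z_next = X - lam * (L @ Z)
        if np.linalg.norm(Z_next - Z) < tol:
            break
        Z = Z_next
    return Z

# Toy 4-node path graph with 2-dimensional node features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 2))
Z_star = implicit_diffusion_layer(X, normalized_laplacian(A), lam=0.4)
print(Z_star)  # smoothed features: the unique equilibrium of the layer
```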
SyNDock: N Rigid Protein Docking via Learnable Group Synchronization
The regulation of various cellular processes heavily relies on the protein
complexes within a living cell, necessitating a comprehensive understanding of
their three-dimensional structures to elucidate the underlying mechanisms.
While neural docking techniques have exhibited promising outcomes in binary
protein docking, the application of advanced neural architectures to multimeric
protein docking remains uncertain. This study introduces SyNDock, an automated
framework that swiftly assembles precise multimeric complexes within seconds,
showcasing performance that can match or potentially surpass recent advanced
approaches. SyNDock possesses several appealing advantages not present
in previous approaches. Firstly, SyNDock formulates multimeric protein docking
as a problem of learning global transformations to holistically depict the
placement of chain units of a complex, enabling a learning-centric solution.
Secondly, SyNDock proposes a trainable two-step SE(3) algorithm, involving
initial pairwise transformation and confidence estimation, followed by global
transformation synchronization. This enables effective learning for assembling
the complex in a globally consistent manner. Lastly, extensive experiments
conducted on our proposed benchmark dataset demonstrate that SyNDock
outperforms existing docking software in crucial performance metrics, including
accuracy and runtime. For instance, it achieves a 4.5% improvement in
performance and a remarkable millionfold acceleration.
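A classical, non-learned way to realize the global transformation
synchronization step is spectral rotation synchronization: stack the pairwise
relative rotations into a block matrix and read the global rotations off its
top three eigenvectors. The NumPy sketch below shows this generic baseline
under uniform confidence weights; it is not SyNDock's trainable SE(3) module,
and the function names and four-chain toy example are assumptions.

```python
import numpy as np

def project_to_SO3(B):
    """Nearest rotation matrix to B in the Frobenius norm (via SVD)."""
    U, _, Vt = np.linalg.svd(B)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    return U @ D @ Vt

def synchronize_rotations(R_pair, weights=None):
    """Spectral rotation synchronization.
    R_pair[i][j] is a (possibly noisy) estimate of R_i @ R_j.T; returns global
    rotations R_i up to one common rotation. weights can encode per-pair confidence."""
    N = len(R_pair)
    if weights is None:
        weights = np.ones((N, N))
    M = np.zeros((3 * N, 3 * N))
    for i in range(N):
        for j in range(N):
            block = np.eye(3) if i == j else weights[i, j] * R_pair[i][j]
            M[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
    # Top-3 eigenvectors of the block matrix span the stacked global rotations.
    _, vecs = np.linalg.eigh(M)
    U = vecs[:, -3:]
    if np.linalg.det(U[:3, :]) < 0:
        U[:, 0] = -U[:, 0]  # resolve the reflection ambiguity of the eigenbasis
    return [project_to_SO3(U[3 * i:3 * i + 3]) for i in range(N)]

# Toy check: 4 chains with random ground-truth rotations and exact pairwise inputs.
rng = np.random.default_rng(0)
R_true = [project_to_SO3(rng.normal(size=(3, 3))) for _ in range(4)]
R_pair = [[R_true[i] @ R_true[j].T for j in range(4)] for i in range(4)]
R_est = synchronize_rotations(R_pair)
# R_est recovers R_true up to a single global rotation (the synchronization gauge).
```

Translations can then be recovered from the synchronized rotations by a linear
least-squares step; SyNDock, by contrast, makes the pairwise estimation and the
confidence weighting trainable.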
Construction of a cross-species cell landscape at single-cell level.
Individual cells are the basic units of life. Despite extensive efforts to characterize the cellular heterogeneity of different organisms, cross-species comparisons of landscape dynamics have not been achieved. Here, we applied single-cell RNA sequencing (scRNA-seq) to map organism-level cell landscapes at multiple life stages for mice, zebrafish and Drosophila. By integrating the comprehensive dataset of >2.6 million single cells, we constructed a cross-species cell landscape and identified signatures and common pathways that changed throughout the life span. We identified structural inflammation and mitochondrial dysfunction as the most common hallmarks of organism aging, and found that pharmacological activation of mitochondrial metabolism alleviated aging phenotypes in mice. The cross-species cell landscape, together with other published datasets, is stored in an integrated online portal, Cell Landscape. Our work provides a valuable resource for studying lineage development, maturation, and aging.
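For readers who want a concrete picture of what integrating millions of cells across species involves, the following minimal scanpy sketch assumes per-species count matrices whose genes have already been mapped to one-to-one orthologs; the file names, parameter values, and the choice of Harmony for batch correction are placeholders, not the authors' actual pipeline.

```python
import scanpy as sc
import anndata as ad

# Hypothetical per-species AnnData files, genes pre-mapped to shared orthologs.
adatas = {
    "mouse": sc.read_h5ad("mouse_orthologs.h5ad"),
    "zebrafish": sc.read_h5ad("zebrafish_orthologs.h5ad"),
    "drosophila": sc.read_h5ad("drosophila_orthologs.h5ad"),
}
adata = ad.concat(adatas, label="species", join="inner")  # keep shared genes only

# Standard preprocessing followed by a batch-aware embedding.
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key="species")
sc.pp.pca(adata, n_comps=50)
sc.external.pp.harmony_integrate(adata, key="species")  # remove species/batch effects
sc.pp.neighbors(adata, use_rep="X_pca_harmony")
sc.tl.leiden(adata)   # shared clusters across species
sc.tl.umap(adata)     # 2-D landscape for visualization
```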
What has been Enhanced in my Knowledge-Enhanced Language Model?
A number of knowledge integration (KI) methods have recently been proposed to incorporate external knowledge into pretrained language models (LMs). Even though knowledge-enhanced LMs (KELMs) outperform base LMs on knowledge-intensive tasks, the inner workings of these KI methods are not well understood. For instance, it is unclear which knowledge is effectively integrated into KELMs and which is not, and whether such integration leads to catastrophic forgetting of already learned knowledge. We show that existing model interpretation methods such as linear probes and prompts have key limitations in answering these questions. We then revisit KI from an information-theoretic view and propose a new, theoretically sound probe model called Graph Convolution Simulator (GCS) for KI interpretation. GCS is conceptually quite simple: it uses graph attention on the corresponding knowledge graph for interpretation. We conduct various experiments to verify that GCS provides reasonable interpretation results for two well-known KELMs: ERNIE and K-Adapter. Our experiments reveal that only a small amount of knowledge is successfully integrated into these models, and that simply increasing the size of the KI corpus may not lead to better KELMs.
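Since GCS interprets knowledge integration through graph attention over the knowledge graph, the sketch below illustrates that generic ingredient: GAT-style attention coefficients computed on knowledge-graph edges, whose magnitudes can be read as a per-edge interpretation signal. The names, shapes, and toy graph are assumptions for illustration, not the released GCS model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def graph_attention_coefficients(H, edges, W, a):
    """Single-head GAT-style attention: for each target node i, a softmax over
    its incoming edges (j -> i) of LeakyReLU(a^T [W h_i ; W h_j]).
    Returns {(j, i): alpha}; larger alpha means edge j -> i contributes more."""
    WH = H @ W.T
    scores = {}
    for j, i in edges:                          # edge from entity j to entity i
        s = a @ np.concatenate([WH[i], WH[j]])
        scores[(j, i)] = max(s, 0.2 * s)        # LeakyReLU with slope 0.2
    alphas = {}
    for i in {t for _, t in edges}:             # normalize per target node
        nbrs = [e for e in edges if e[1] == i]
        probs = softmax(np.array([scores[e] for e in nbrs]))
        alphas.update(dict(zip(nbrs, probs)))
    return alphas

# Toy knowledge graph: 4 entities with random 8-dimensional representations.
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))
edges = [(1, 0), (2, 0), (3, 0), (0, 2)]
W = rng.normal(size=(8, 8))
a = rng.normal(size=16)
print(graph_attention_coefficients(H, edges, W, a))
```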
Recent Advances in Reliable Deep Graph Learning: Adversarial Attack, Inherent Noise, and Distribution Shift
Deep graph learning (DGL) has achieved remarkable progress in both business
and scientific areas ranging from finance and e-commerce to drug and advanced
material discovery. Despite the progress, applying DGL to real-world
applications faces a series of reliability threats including adversarial
attacks, inherent noise, and distribution shift. This survey aims to provide a
comprehensive review of recent advances for improving the reliability of DGL
algorithms against the above threats. In contrast to prior related surveys,
which mainly focus on adversarial attacks and defenses, our survey covers
additional reliability-related aspects of DGL, namely inherent noise and
distribution shift. Additionally, we discuss the relationships among these
aspects and highlight important issues to be explored in future research.