A Comprehensive Survey on Graph Neural Networks
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications, where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on the existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this article, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art GNNs into four categories, namely, recurrent GNNs, convolutional GNNs, graph autoencoders, and spatial-temporal GNNs. We further discuss the applications of GNNs across various domains and summarize the open-source codes, benchmark data sets, and model evaluation of GNNs. Finally, we propose potential research directions in this rapidly growing field.
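As a concrete illustration of the convolutional-GNN category the survey describes, the following is a minimal NumPy sketch of one graph-convolutional layer using the widely used normalized-adjacency propagation rule H' = relu(D^-1/2 (A + I) D^-1/2 H W). The function name, toy graph, and dimensions are illustrative, not taken from the survey:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution: normalize the adjacency, aggregate
    neighbor features (including the node itself), then transform."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)                   # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)  # ReLU

# Toy graph: 3 nodes in a path (0-1-2), 2 input features, 2 output features.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
W = np.eye(2)
H_out = gcn_layer(A, H, W)
print(H_out.shape)  # (3, 2)
```

Stacking several such layers lets each node's representation depend on progressively larger neighborhoods, which is the basic mechanism shared by the convolutional GNNs in the survey's taxonomy.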
RS2G: Data-Driven Scene-Graph Extraction and Embedding for Robust Autonomous Perception and Scenario Understanding
Human drivers naturally reason about interactions between road users to
understand and safely navigate through traffic. Thus, developing autonomous
vehicles necessitates the ability to mimic such knowledge and model
interactions between road users to understand and navigate unpredictable,
dynamic environments. However, since real-world scenarios often differ from
training datasets, effectively modeling the behavior of various road users in
an environment remains a significant research challenge. This reality
necessitates models that generalize to a broad range of domains and explicitly
model interactions between road users and the environment to improve scenario
understanding. Graph learning methods address this problem by modeling
interactions using graph representations of scenarios. However, existing
methods cannot effectively transfer knowledge gained from the training domain
to real-world scenarios. This constraint is caused by the domain-specific rules
used for graph extraction that can vary in effectiveness across domains,
limiting generalization ability. To address these limitations, we propose
RoadScene2Graph (RS2G): a data-driven graph extraction and modeling approach
that learns to extract the best graph representation of a road scene for
solving autonomous scene understanding tasks. We show that RS2G enables better
performance at subjective risk assessment than rule-based graph extraction
methods and deep-learning-based models. RS2G also improves generalization and
Sim2Real transfer learning, which denotes the ability to transfer knowledge
gained from simulation datasets to unseen real-world scenarios. We also present
ablation studies showing how RS2G produces a more useful graph representation
for downstream classifiers. Finally, we show how RS2G can identify the relative
importance of rule-based graph edges and enable intelligent graph sparsity
tuning.
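The data-driven graph extraction the abstract describes can be sketched, under assumptions, as learning a soft adjacency matrix from object embeddings via pairwise attention, instead of applying fixed domain rules. This is an illustrative sketch, not RS2G's actual architecture; all names, weights, and dimensions are hypothetical:

```python
import numpy as np

def soft_adjacency(X, Wq, Wk):
    """Learn edge weights between road users from their embeddings.
    Each row of the result is a softmax distribution over possible edges."""
    Q, K = X @ Wq, X @ Wk
    logits = Q @ K.T / np.sqrt(K.shape[1])   # scaled pairwise scores
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))     # 4 detected road users, 8-dim embeddings
Wq = rng.normal(size=(8, 8))    # learned projection weights (random here)
Wk = rng.normal(size=(8, 8))
A = soft_adjacency(X, Wq, Wk)
print(A.shape)  # (4, 4)
```

Because the edge weights are produced by learned projections rather than hand-written rules, such an extractor can in principle adapt to a new domain by fine-tuning, which is the generalization property the abstract emphasizes.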
Structured Sequence Modeling with Graph Convolutional Recurrent Networks
This paper introduces Graph Convolutional Recurrent Network (GCRN), a deep
learning model able to predict structured sequences of data. Precisely, GCRN is
a generalization of classical recurrent neural networks (RNN) to data
structured by an arbitrary graph. Such structured sequences can represent
series of frames in videos, spatio-temporal measurements on a network of
sensors, or random walks on a vocabulary graph for natural language modeling.
The proposed model combines convolutional neural networks (CNN) on graphs to
identify spatial structures and RNN to find dynamic patterns. We study two
possible architectures of GCRN, and apply the models to two practical problems:
predicting moving MNIST data, and modeling natural language with the Penn
Treebank dataset. Experiments show that exploiting simultaneously graph spatial
and dynamic information about data can improve both precision and learning
speed.
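The combination of graph convolution and recurrence described above can be sketched as follows, assuming the simple formulation in which both the input and the hidden state are graph-convolved over a normalized adjacency before the recurrent update. Names and sizes are illustrative, not GCRN's exact architecture:

```python
import numpy as np

def norm_adj(A):
    """Symmetrically normalized adjacency with self-loops."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

def gcrn_step(A_norm, x_t, h_prev, Wx, Wh):
    """One recurrent step: graph-convolve input and hidden state,
    then apply the usual tanh recurrence."""
    return np.tanh(A_norm @ x_t @ Wx + A_norm @ h_prev @ Wh)

rng = np.random.default_rng(1)
A = np.array([[0., 1.],
              [1., 0.]])            # 2-node graph
A_norm = norm_adj(A)
Wx = rng.normal(size=(3, 4)) * 0.1  # 3 input features -> 4 hidden units
Wh = rng.normal(size=(4, 4)) * 0.1
h = np.zeros((2, 4))
for t in range(5):                  # unroll over a 5-step sequence
    x_t = rng.normal(size=(2, 3))   # per-node features at time t
    h = gcrn_step(A_norm, x_t, h, Wx, Wh)
print(h.shape)  # (2, 4)
```

The graph convolution captures the spatial structure at each time step, while the recurrence carries dynamic patterns across steps, mirroring the CNN-on-graphs plus RNN decomposition in the abstract.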
TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification
Audiovisual data are everywhere in this digital age, which raises higher
requirements for the deep learning models developed on them. Handling the
information in multi-modal data well is the key to a better audiovisual model.
We observe that these audiovisual data naturally have temporal attributes, such
as the time information for each frame in a video. More concretely, such data
are inherently multi-modal, comprising both audio and visual cues that proceed
in strict chronological order. This indicates that temporal information is
important in multi-modal acoustic event modeling, both intra- and inter-modal.
However, existing methods process each modality's features independently and
simply fuse them together, neglecting the mining of temporal relations and thus
leading to sub-optimal performance. With this
motivation, we propose a Temporal Multi-modal graph learning method for
Acoustic event Classification, called TMac, by modeling such temporal
information via graph learning techniques. In particular, we construct a
temporal graph for each acoustic event, dividing its audio data and video data
into multiple segments. Each segment can be considered as a node, and the
temporal relationships between nodes can be considered as timestamps on their
edges. In this way, we can smoothly capture the dynamic information both within
and across modalities. Several experiments demonstrate that TMac outperforms
other state-of-the-art models. Our code is available at
https://github.com/MGitHubL/TMac.
Comment: This work has been accepted by ACM MM 2023 for publication.
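The temporal-graph construction the abstract describes can be sketched as follows: audio and video streams are split into aligned segments, each segment becomes a node, and edges between segments carry timestamps, connecting consecutive segments within a modality and co-occurring segments across modalities. This is a hedged sketch of the data structure only; the function and naming are illustrative, not TMac's actual API:

```python
def build_temporal_graph(n_segments, seg_duration=1.0):
    """Build a toy temporal multi-modal graph: one audio node and one
    video node per segment, with timestamped intra- and inter-modal edges."""
    nodes = []
    for i in range(n_segments):
        nodes.append(("audio", i))
        nodes.append(("video", i))
    edges = []
    for i in range(n_segments):
        t = i * seg_duration
        # inter-modal edge: audio and video segments at the same time step
        edges.append((("audio", i), ("video", i), t))
        if i + 1 < n_segments:
            # intra-modal edges: consecutive segments within each modality
            edges.append((("audio", i), ("audio", i + 1), t))
            edges.append((("video", i), ("video", i + 1), t))
    return nodes, edges

nodes, edges = build_temporal_graph(3)
print(len(nodes), len(edges))  # 6 nodes, 7 edges
```

A graph learning model applied to such a structure can then propagate information along the timestamped edges, capturing the intra- and inter-modal dynamics the abstract argues are neglected by independent per-modality processing.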
- …