209 research outputs found
VCD: Visual Causality Discovery for Cross-Modal Question Reasoning
Existing visual question reasoning methods usually fail to explicitly
discover the inherent causal mechanism and ignore jointly modeling cross-modal
event temporality and causality. In this paper, we propose a visual question
reasoning framework named Cross-Modal Question Reasoning (CMQR), to discover
temporal causal structure and mitigate visual spurious correlation by causal
intervention. To explicitly discover visual causal structure, the Visual
Causality Discovery (VCD) architecture is proposed to find question-critical
scenes temporally and disentangle visual spurious correlations via an
attention-based front-door causal intervention module named the Local-Global
Causal Attention Module (LGCAM). To align the fine-grained interactions between
linguistic semantics and spatial-temporal representations, we build an
Interactive Visual-Linguistic Transformer (IVLT) that builds the multi-modal
co-occurrence interactions between visual and linguistic content. Extensive
experiments on four datasets demonstrate the superiority of CMQR for
discovering visual causal structures and achieving robust question reasoning.
Comment: 12 pages, 6 figures. arXiv admin note: substantial text overlap with arXiv:2207.1264
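The local-global attention idea behind a module such as LGCAM can be illustrated with generic scaled dot-product attention, where local features act as queries and global features as keys and values. This is a minimal sketch of the general mechanism only; the function name, shapes, and fusion details are assumptions, not the paper's actual module:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_global_attention(F_local, F_global):
    """Fuse local (query) and global (key/value) features with
    scaled dot-product attention. Both inputs have shape (T, d)."""
    d = F_local.shape[-1]
    scores = F_local @ F_global.T / np.sqrt(d)  # (T, T) affinities
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ F_global                   # (T, d) fused features

rng = np.random.default_rng(0)
fused = local_global_attention(rng.normal(size=(4, 8)),
                               rng.normal(size=(4, 8)))
print(fused.shape)  # (4, 8)
```

The output keeps the local feature resolution while mixing in globally attended context, which is the general shape of such local-global fusion schemes.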
Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering
Existing visual question answering methods tend to capture the cross-modal
spurious correlations and fail to discover the true causal mechanism that
facilitates reasoning truthfully based on the dominant visual evidence and the
question intention. Additionally, existing methods usually ignore
cross-modal event-level understanding, which requires jointly modeling event
temporality, causality, and dynamics. In this work, we focus on event-level
visual question answering from a new perspective, i.e., cross-modal causal
relational reasoning, by introducing causal intervention methods to discover
the true causal structures for visual and linguistic modalities. Specifically,
we propose a novel event-level visual question answering framework named
Cross-Modal Causal RelatIonal Reasoning (CMCIR), to achieve robust
causality-aware visual-linguistic question answering. To discover cross-modal
causal structures, the Causality-aware Visual-Linguistic Reasoning (CVLR)
module is proposed to collaboratively disentangle the visual and linguistic
spurious correlations via front-door and back-door causal interventions. To
model the fine-grained interactions between linguistic semantics and
spatial-temporal representations, we build a Spatial-Temporal Transformer (STT)
that creates multi-modal co-occurrence interactions between visual and
linguistic content. To adaptively fuse the causality-aware visual and linguistic
features, we introduce a Visual-Linguistic Feature Fusion (VLFF) module that
leverages the hierarchical linguistic semantic relations as the guidance to
learn the global semantic-aware visual-linguistic representations adaptively.
Extensive experiments on four event-level datasets demonstrate the superiority
of our CMCIR in discovering visual-linguistic causal structures and achieving
robust event-level visual question answering.
Comment: 17 pages, 9 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. The datasets, code and models are available at https://github.com/YangLiu9208/CMCI
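The back-door intervention mentioned in this abstract rests on the classic adjustment formula P(y | do(x)) = Σ_z P(y | x, z) P(z), where z is an observed confounder. The toy probabilities below are illustrative assumptions, not values from the paper:

```python
import numpy as np

# P(z): prior over two confounder states; P(y=1 | x, z): outcome table.
p_z = np.array([0.7, 0.3])
p_y_given_xz = np.array([[0.2, 0.8],   # P(y=1 | x=0, z)
                         [0.5, 0.9]])  # P(y=1 | x=1, z)

def backdoor_adjust(x):
    """P(y=1 | do(x)) = sum_z P(y=1 | x, z) * P(z)."""
    return float(p_y_given_xz[x] @ p_z)

print(backdoor_adjust(0))  # 0.2*0.7 + 0.8*0.3 = 0.38
print(backdoor_adjust(1))  # 0.5*0.7 + 0.9*0.3 = 0.62
```

Averaging over the confounder rather than conditioning on the observed joint distribution is what removes the spurious correlation the abstract refers to.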
A Survey on Temporal Knowledge Graph Completion: Taxonomy, Progress, and Prospects
Temporal characteristics are prominently evident in a substantial volume of
knowledge, which underscores the pivotal role of Temporal Knowledge Graphs
(TKGs) in both academia and industry. However, TKGs often suffer from
incompleteness for three main reasons: the continuous emergence of new
knowledge, the weakness of the algorithm for extracting structured information
from unstructured data, and the lack of information in the source dataset.
Thus, the task of Temporal Knowledge Graph Completion (TKGC) has attracted
increasing attention, aiming to predict missing items based on the available
information. In this paper, we provide a comprehensive review of TKGC methods
and their details. Specifically, this paper mainly consists of three
components, namely: 1) Background, which covers the preliminaries of TKGC
methods, the loss functions required for training, and the datasets and
evaluation protocols; 2) Interpolation, which estimates missing elements or
sets of elements from the relevant available information, and further
categorizes related TKGC methods by how they process temporal information;
3) Extrapolation, which typically focuses on continuous TKGs and predicts
future events, and classifies all extrapolation methods by the algorithms they
utilize. We further pinpoint the challenges and discuss future research
directions of TKGC.
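Many TKGC methods score a quadruple (head, relation, tail, timestamp) with a translational distance extended by a time embedding. The sketch below is a generic TTransE-style score for illustration, not any particular surveyed method; the embedding dimension and names are assumptions:

```python
import numpy as np

def ttranse_score(h, r, t, tau):
    """Generic translational TKGC score: h + r + tau should land near t,
    so a smaller distance (higher score) means a more plausible quadruple."""
    return -np.linalg.norm(h + r + tau - t)

rng = np.random.default_rng(0)
d = 16
h, r, tau = (rng.normal(size=d) for _ in range(3))
t_true = h + r + tau + 0.01 * rng.normal(size=d)  # near-perfect fit
t_rand = rng.normal(size=d)                       # unrelated entity
assert ttranse_score(h, r, t_true, tau) > ttranse_score(h, r, t_rand, tau)
```

Interpolation methods would rank candidate tails by such a score for a historical timestamp; extrapolation methods instead condition on past event sequences to score future timestamps.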
Learning Gaussian Mixture Representations for Tensor Time Series Forecasting
Tensor time series (TTS) data, a generalization of one-dimensional time
series on a high-dimensional space, is ubiquitous in real-world scenarios,
especially in monitoring systems involving multi-source spatio-temporal data
(e.g., transportation demands and air pollutants). Compared to modeling
one-dimensional or multivariate time series, which has received much attention
and achieved tremendous progress in recent years, tensor time series has
received far less effort. Properly coping with tensor time series is a much
more challenging task, due to its high-dimensional and complex inner structure. In
this paper, we develop a novel TTS forecasting framework, which seeks to
individually model each heterogeneity component implied in the time, location,
and source variables. We name this framework GMRL, short for
Gaussian Mixture Representation Learning. Experiment results on two real-world
TTS datasets verify the superiority of our approach compared with the
state-of-the-art baselines.
Comment: 9 pages, 5 figures, published to IJCAI 202
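The core operation in Gaussian-mixture representation learning is assigning each sample a soft responsibility over mixture components. The sketch below is a plain spherical-Gaussian E-step for illustration, assuming unit variance and uniform weights; it is not the GMRL model itself:

```python
import numpy as np

def gmm_responsibilities(X, means, var=1.0, weights=None):
    """E-step of a spherical Gaussian mixture: posterior probability of
    each of the k components for each of the n samples.
    X: (n, d), means: (k, d) -> responsibilities: (n, k)."""
    k = means.shape[0]
    if weights is None:
        weights = np.full(k, 1.0 / k)
    # Squared distance from every sample to every component mean: (n, k)
    sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    log_p = np.log(weights) - 0.5 * sq / var
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

X = np.array([[0.0, 0.0], [5.0, 5.0]])
means = np.array([[0.0, 0.0], [5.0, 5.0]])
resp = gmm_responsibilities(X, means)
print(resp.argmax(axis=1))  # [0 1]
```

For tensor time series, each slice along the time, location, or source mode could be embedded and assigned responsibilities this way, letting each mixture component capture one heterogeneity pattern.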
A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision
Deep learning has the potential to revolutionize sports performance, with
applications ranging from perception and comprehension to decision. This paper
presents a comprehensive survey of deep learning in sports performance,
focusing on three main aspects: algorithms, datasets and virtual environments,
and challenges. Firstly, we discuss the hierarchical structure of deep learning
algorithms in sports performance which includes perception, comprehension and
decision while comparing their strengths and weaknesses. Secondly, we list
widely used existing datasets in sports and highlight their characteristics and
limitations. Finally, we summarize current challenges and point out future
trends of deep learning in sports. Our survey provides valuable reference
material for researchers interested in deep learning in sports applications.
A Comprehensive Survey on Deep Graph Representation Learning
Graph representation learning aims to effectively encode high-dimensional
sparse graph-structured data into low-dimensional dense vectors, which is a
fundamental task that has been widely studied in a range of fields, including
machine learning and data mining. Classic graph embedding methods follow the
basic idea that the embedding vectors of interconnected nodes in the graph can
still maintain a relatively close distance, thereby preserving the structural
information between the nodes in the graph. However, this is sub-optimal
because: (i) traditional methods have limited model capacity, which limits
learning performance; (ii) existing techniques typically rely on unsupervised
learning strategies and fail to couple with the latest learning paradigms; and
(iii) representation learning and downstream tasks depend on each other and
should be jointly enhanced. With the remarkable success of deep learning, deep
graph representation learning has shown great potential and advantages over
shallow (traditional) methods, and a large number of deep graph representation
learning techniques, especially graph neural networks, have been proposed in
the past decade. In this survey, we conduct a comprehensive
survey on current deep graph representation learning algorithms by proposing a
new taxonomy of existing state-of-the-art literature. Specifically, we
systematically summarize the essential components of graph representation
learning and categorize existing approaches by the ways of graph neural network
architectures and the most recent advanced learning paradigms. Moreover, this
survey also provides the practical and promising applications of deep graph
representation learning. Last but not least, we state new perspectives and
suggest challenging directions that deserve further investigation in the
future.
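The graph neural networks highlighted in this survey build on a common layer form: normalized neighborhood aggregation followed by a learned transform, as in the standard GCN update H' = ReLU(D^{-1/2}(A+I)D^{-1/2} H W). A minimal numpy sketch of one such layer (toy graph and random weights, purely illustrative):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution: H' = ReLU(D^-1/2 (A + I) D^-1/2 @ H @ W)."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(1))   # symmetric degree normalisation
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)     # ReLU nonlinearity

# Toy graph: 3 nodes in a path 0-1-2, 4-d input features, 2-d output.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 2))
print(gcn_layer(A, H, W).shape)  # (3, 2)
```

Stacking such layers lets each node's embedding absorb information from progressively larger neighborhoods, which is the mechanism behind preserving structural information in low-dimensional dense vectors.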
Inductive biases in deep learning models for weather prediction
Deep learning has recently gained immense popularity in the Earth sciences as
it enables us to formulate purely data-driven models of complex Earth system
processes. Deep learning-based weather prediction (DLWP) models have made
significant progress in the last few years, achieving forecast skills
comparable to established numerical weather prediction (NWP) models with
comparatively lower computational costs. In order to train accurate, reliable,
and tractable DLWP models with several millions of parameters, the model design
needs to incorporate suitable inductive biases that encode structural
assumptions about the data and modelled processes. When chosen appropriately,
these biases enable faster learning and better generalisation to unseen data.
Although inductive biases play a crucial role in successful DLWP models, they
are often not stated explicitly and how they contribute to model performance
remains unclear. Here, we review and analyse the inductive biases of six
state-of-the-art DLWP models, involving a deeper look at five key design
elements: input data, forecasting objective, loss components, layered design of
the deep learning architectures, and optimisation methods. We show how the
design choices made in each of the five design elements relate to structural
assumptions. Given recent developments in the broader DL community, we
anticipate that the future of DLWP will likely see a wider use of foundation
models -- large models pre-trained on big databases with self-supervised
learning -- combined with explicit physics-informed inductive biases that allow
the models to provide competitive forecasts even at the more challenging
subseasonal-to-seasonal scales.
Graph Information Bottleneck for Remote Sensing Segmentation
Remote sensing segmentation has a wide range of applications in environmental
protection, urban change detection, and other fields. Despite the success of deep
learning-based remote sensing segmentation methods (e.g., CNN and Transformer),
they are not flexible enough to model irregular objects. In addition, existing
graph contrastive learning methods usually adopt the way of maximizing mutual
information to keep the node representations consistent between different graph
views, which may cause the model to learn task-independent redundant
information. To tackle the above problems, this paper treats images as graph
structures and introduces a simple contrastive vision GNN (SC-ViG) architecture
for remote sensing segmentation. Specifically, we construct a node-masked and
edge-masked graph view to obtain an optimal graph structure representation,
which can adaptively learn whether to mask nodes and edges. Furthermore, this
paper innovatively introduces information bottleneck theory into graph
contrastive learning to maximize task-related information while minimizing
task-independent redundant information. Finally, we replace the convolutional
module in UNet with the SC-ViG module to complete the segmentation and
classification tasks of remote sensing images. Extensive experiments on
publicly available real datasets demonstrate that our method outperforms
state-of-the-art remote sensing image segmentation methods.
Comment: 13 pages, 6 figures
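The mutual-information-maximization objective that this abstract contrasts with its information-bottleneck approach is commonly instantiated as an InfoNCE loss between two graph views. The sketch below shows that generic loss, not SC-ViG's actual objective; the temperature and shapes are assumptions:

```python
import numpy as np

def info_nce(z1, z2, temperature=0.5):
    """InfoNCE loss between two views: matching rows of z1 and z2 are
    positive pairs, all other rows serve as negatives. z1, z2: (n, d)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature            # (n, n) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))          # lower = views better aligned

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z + 0.01 * rng.normal(size=(8, 16)))
shuffled = info_nce(z, rng.normal(size=(8, 16)))
assert aligned < shuffled  # consistent views yield a lower loss
```

Minimizing this loss keeps node representations consistent across masked views; the information-bottleneck idea adds a penalty so the views do not also preserve task-independent redundant information.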
Traffic Prediction using Artificial Intelligence: Review of Recent Advances and Emerging Opportunities
Traffic prediction plays a crucial role in alleviating traffic congestion
which represents a critical problem globally, resulting in negative
consequences such as lost hours of additional travel time and increased fuel
consumption. Integrating emerging technologies into transportation systems
provides opportunities for improving traffic prediction significantly and
brings about new research problems. In order to lay the foundation for
understanding the open research challenges in traffic prediction, this survey
aims to provide a comprehensive overview of traffic prediction methodologies.
Specifically, we focus on the recent advances and emerging research
opportunities in Artificial Intelligence (AI)-based traffic prediction methods,
due to their recent success and potential in traffic prediction, with an
emphasis on multivariate traffic time series modeling. We first provide a list
and explanation of the various data types and resources used in the literature.
Next, the essential data preprocessing methods within the traffic prediction
context are categorized, and the prediction methods and applications are
subsequently summarized. Lastly, we present primary research challenges in
traffic prediction and discuss some directions for future research.
Comment: Published in Transportation Research Part C: Emerging Technologies (TR_C), Volume 145, 202
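A preprocessing step shared by nearly all the multivariate traffic-prediction methods this survey covers is slicing the sensor series into lookback/horizon windows. A minimal sketch of that common step, with toy data and assumed window sizes:

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Turn a multivariate series (T, n_sensors) into supervised pairs:
    X: (samples, lookback, n_sensors), y: (samples, horizon, n_sensors)."""
    X, y = [], []
    for i in range(len(series) - lookback - horizon + 1):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback:i + lookback + horizon])
    return np.stack(X), np.stack(y)

# Toy traffic data: 24 timesteps from 3 sensors.
series = np.arange(24 * 3, dtype=float).reshape(24, 3)
X, y = make_windows(series, lookback=12, horizon=3)
print(X.shape, y.shape)  # (10, 12, 3) (10, 3, 3)
```

A model then learns the mapping from each lookback window to its horizon window; the choice of lookback and horizon is itself one of the preprocessing decisions the survey categorizes.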