Deep Adaptive Feature Embedding with Local Sample Distributions for Person Re-identification
Person re-identification (re-id) aims to match pedestrians observed by
disjoint camera views. It attracts increasing attention in computer vision due
to its importance to surveillance systems. To combat the major challenge of
cross-view visual variations, deep embedding approaches are proposed by
learning a compact feature space from images such that the Euclidean distances
correspond to their cross-view similarity metric. However, the global Euclidean
distance cannot faithfully characterize the ideal similarity in a complex
visual feature space because features of pedestrian images exhibit unknown
distributions due to large variations in poses, illumination and occlusion.
Moreover, intra-personal training samples within a local range can robustly
guide deep embedding against uncontrolled variations, but a global Euclidean
distance cannot capture them. In this paper, we study the problem of
person re-id by proposing a novel sampling strategy that mines suitable positives
(i.e., intra-class samples) within a local range to improve the deep embedding in the
context of large intra-class variations. Our method is capable of learning a
deep similarity metric adaptive to local sample structure by minimizing each
sample's local distances while propagating through the relationship between
samples to attain the whole intra-class minimization. To this end, a novel
objective function is proposed to jointly optimize similarity metric learning,
local positive mining and robust deep embedding. This yields local
discriminations by selecting local-ranged positive samples, and the learned
features are robust to dramatic intra-class variations. Experiments on
benchmarks show state-of-the-art results achieved by our method. Comment: Published in Pattern Recognition
What-and-Where to Match: Deep Spatially Multiplicative Integration Networks for Person Re-identification
Matching pedestrians across disjoint camera views, known as person
re-identification (re-id), is a challenging problem that is of importance to
visual recognition and surveillance. Most existing methods exploit local
regions within spatial manipulation to perform matching in local
correspondence. However, they essentially extract fixed representations
from pre-divided regions for each image and then perform matching based on the
extracted representations. Models in this pipeline cannot capture the finer
local patterns that are crucial for distinguishing positive pairs from negative
ones, and they underperform as a result. In this paper, we
propose a novel deep multiplicative integration gating function, which answers
the question of what-and-where to match for effective person re-id. To
address what to match, our deep network emphasizes common local patterns
by learning joint representations in a multiplicative way. The network
comprises two Convolutional Neural Networks (CNNs) to extract convolutional
activations, and generates relevant descriptors for pedestrian matching. This
leads to flexible representations for pair-wise images. To address
where to match, we combat the spatial misalignment by performing
spatially recurrent pooling via a four-directional recurrent neural network to
impose spatial dependency over all positions with respect to the entire image.
The proposed network is designed to be end-to-end trainable to characterize
local pairwise feature interactions in a spatially aligned manner. To
demonstrate the superiority of our method, extensive experiments are conducted
over three benchmark data sets: VIPeR, CUHK03 and Market-1501. Comment: Published at Pattern Recognition, Elsevier
From Continuous Dynamics to Graph Neural Networks: Neural Diffusion and Beyond
Graph neural networks (GNNs) have demonstrated significant promise in
modelling relational data and have been widely applied in various fields of
interest. The key mechanism behind GNNs is the so-called message passing where
information is being iteratively aggregated to central nodes from their
neighbourhood. Such a scheme has been found to be intrinsically linked to a
physical process known as heat diffusion, where the propagation of GNNs
naturally corresponds to the evolution of heat density. Drawing an analogy
between message passing and heat dynamics allows one to fundamentally understand the
power and pitfalls of GNNs and consequently informs better model design.
Recently, a plethora of works has emerged that propose GNNs inspired by
the continuous dynamics formulation, in an attempt to mitigate the known
limitations of GNNs, such as oversmoothing and oversquashing. In this survey,
we provide the first systematic and comprehensive review of studies that
leverage the continuous perspective of GNNs. To this end, we introduce
foundational ingredients for adapting continuous dynamics to GNNs, along with a
general framework for the design of graph neural dynamics. We then review and
categorize existing works based on their driving mechanisms and underlying
dynamics. We also summarize how the limitations of classic GNNs can be
addressed under the continuous framework. We conclude by identifying multiple
open research directions.
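The link between message passing and heat diffusion described in this abstract can be illustrated with a minimal sketch. It assumes the simplest setting: an unweighted, undirected graph, a scalar feature per node, the combinatorial Laplacian L = D - A, and an explicit Euler discretisation of dx/dt = -Lx. The function name `heat_step`, the step size `tau`, and the toy path graph are illustrative choices, not taken from the survey.

```python
def heat_step(adj, x, tau):
    """One explicit Euler step of dx/dt = -(D - A) x on a graph.

    adj is a symmetric adjacency matrix (list of lists), x holds one
    scalar feature per node, tau is the step size (an assumed value).
    """
    n = len(adj)
    new_x = []
    for i in range(n):
        # Aggregate the differences to neighbours: the "message" at node i.
        flow = sum(adj[i][j] * (x[j] - x[i]) for j in range(n))
        new_x.append(x[i] + tau * flow)
    return new_x


# Toy path graph 0-1-2-3 with all "heat" initially at node 0.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
x = [1.0, 0.0, 0.0, 0.0]
for _ in range(200):
    x = heat_step(adj, x, tau=0.2)

# After many steps the features are nearly identical across nodes.
spread = max(x) - min(x)
```

Each step is a neighbourhood aggregation, and stacking many steps drives all node features toward their common mean, which is one way to picture the oversmoothing limitation the abstract mentions.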
Bregman Graph Neural Network
Much recent research on graph neural networks (GNNs) has focused on
formulating GNN architectures as an optimization problem with the smoothness
assumption. However, in node classification tasks, the smoothing effect induced
by GNNs tends to assimilate representations and over-homogenize labels of
connected nodes, leading to adverse effects such as over-smoothing and
misclassification. In this paper, we propose a novel bilevel optimization
framework for GNNs inspired by the notion of Bregman distance. We demonstrate
that the GNN layer proposed accordingly can effectively mitigate the
over-smoothing issue by introducing a mechanism reminiscent of the "skip
connection". We validate our theoretical results through comprehensive
empirical studies in which Bregman-enhanced GNNs outperform their original
counterparts in both homophilic and heterophilic graphs. Furthermore, our
experiments also show that Bregman GNNs can produce more robust learning
accuracy even when the number of layers is high, suggesting the effectiveness
of the proposed method in alleviating the over-smoothing issue.
FedGST: Federated Graph Spatio-Temporal Framework for Brain Functional Disease Prediction
Currently, most medical institutions face the challenge of training a unified model using fragmented and isolated data to address disease prediction problems. Although federated learning has become the recognized paradigm for privacy-preserving model training, how to integrate federated learning with fMRI temporal characteristics to enhance predictive performance remains an open question for functional disease prediction. To address this challenging task, we propose a novel Federated Graph Spatio-Temporal (FedGST) framework for brain functional disease prediction. Specifically, anchor sampling is used to process variable-length time series data on local clients. Then dynamic functional connectivity graphs are generated via sliding windows and Pearson correlation coefficients. Next, we propose an InceptionTime model to extract temporal information from the dynamic functional connectivity graphs on the local clients. Finally, the hidden activation variables are sent to a global server. We propose a UniteGCN model on the global server to receive and process the hidden activation variables from clients. Then, the global server returns gradient information to clients for backpropagation and model parameter updating. Client models aggregate model parameters on the local server and distribute them to clients for the next round of training. We demonstrate that FedGST outperforms other federated learning methods and baselines on ABIDE-1 and ADHD200 datasets.
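The step that builds dynamic functional connectivity graphs from sliding windows and Pearson correlation coefficients can be sketched as follows. This is only an illustration under stated assumptions, not the FedGST implementation: the function names `pearson` and `dynamic_connectivity`, the window length, the stride, and the toy region signals are all hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumed convention for constant signals
    return cov / math.sqrt(var_a * var_b)


def dynamic_connectivity(series, window, stride):
    """Return one correlation matrix per sliding window.

    series is a list of per-region time series of equal length;
    window and stride are assumed hyperparameters.
    """
    length = len(series[0])
    graphs = []
    for start in range(0, length - window + 1, stride):
        segments = [s[start:start + window] for s in series]
        graphs.append([[pearson(p, q) for q in segments] for p in segments])
    return graphs


# Toy signals for three hypothetical brain regions.
roi = [
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    [0.0, 2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 4.0, 3.0, 2.0, 1.0, 0.0],
]
graphs = dynamic_connectivity(roi, window=4, stride=2)
```

Each matrix in `graphs` can be read as a weighted adjacency matrix for one time window, so the sequence of matrices forms the dynamic graph from which a temporal model would extract features.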
Exposition on over-squashing problem on GNNs: Current Methods, Benchmarks and Challenges
Graph-based message-passing neural networks (MPNNs) have achieved remarkable
success in both node and graph-level learning tasks. However, several
identified problems, including over-smoothing (OSM), limited expressive power,
and over-squashing (OSQ), still limit the performance of MPNNs. In particular,
OSQ is the most recently identified problem, where MPNNs gradually lose their
learning accuracy when long-range dependencies between graph nodes are
required. In this work, we provide an exposition on the OSQ problem by
summarizing different formulations of OSQ from current literature, as well as
the three different categories of approaches for addressing the OSQ problem. In
addition, we also discuss the alignment between OSQ and expressive power and
the trade-off between OSQ and OSM. Furthermore, we summarize the empirical
methods leveraged from existing works to verify the efficiency of OSQ
mitigation approaches, with illustrations of their computational complexities.
Lastly, we list some open questions that are of interest for further
exploration of the OSQ problem, along with potential directions, to the best of
our knowledge.