Scaling Up, Scaling Deep: Blockwise Graph Contrastive Learning
Oversmoothing is a common phenomenon in graph neural networks (GNNs), in
which an increase in the network depth leads to a deterioration in their
performance. Graph contrastive learning (GCL) is emerging as a promising way of
leveraging vast unlabeled graph data. As a marriage between GNNs and
contrastive learning, it remains unclear whether GCL inherits the same
oversmoothing defect from GNNs. This work undertakes a fundamental analysis of
GCL from the perspective of oversmoothing. We first demonstrate
empirically that increasing network depth in GCL also leads to oversmoothing in
the deep representations and, surprisingly, in the shallow ones. We refer to
this phenomenon in GCL as 'long-range starvation', wherein lower layers in deep
networks suffer from degradation due to the lack of sufficient guidance from
supervision (e.g., loss computation). Based on our findings, we present BlockGCL,
a remarkably simple yet effective blockwise training framework that prevents
oversmoothing in GCL. Without bells and whistles, BlockGCL consistently
improves robustness and stability for well-established GCL methods with
increasing numbers of layers on real-world graph benchmarks. We believe our
work will provide insights for future improvements of scalable and deep GCL
frameworks.
Comment: Preprint; Code is available at
https://github.com/EdisonLeeeee/BlockGC
Rethinking and Simplifying Bootstrapped Graph Latents
Graph contrastive learning (GCL) has emerged as a representative paradigm in
graph self-supervised learning, where negative samples are commonly regarded as
the key to preventing model collapse and producing distinguishable
representations. Recent studies have shown that GCL without negative samples
can achieve state-of-the-art performance as well as scalability improvement,
with bootstrapped graph latent (BGRL) as a prominent step forward. However,
BGRL relies on a complex architecture to maintain the ability to scatter
representations, and the underlying mechanisms enabling the success remain
largely unexplored. In this paper, we introduce an instance-level decorrelation
perspective to tackle the aforementioned issue and leverage it as a springboard
to reveal the potential unnecessary model complexity within BGRL. Based on our
findings, we present SGCL, a simple yet effective GCL framework that utilizes
the outputs from two consecutive iterations as positive pairs, eliminating the
negative samples. SGCL only requires a single graph augmentation and a single
graph encoder without additional parameters. Extensive experiments conducted on
various graph benchmarks demonstrate that SGCL can achieve competitive
performance with fewer parameters, lower time and space costs, and significant
convergence speedup.
Comment: Accepted by WSDM 202
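The core of SGCL's negative-free objective can be sketched in a few lines: treat each node's embeddings from two consecutive training iterations as the positive pair and maximize their cosine similarity. The following is an illustrative numpy sketch under that reading, not the paper's implementation; the function names are ours.

```python
import numpy as np

def l2_normalize(z, eps=1e-12):
    """Row-normalize embeddings to unit length."""
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)

def consecutive_pair_loss(z_curr, z_prev):
    """Negative mean cosine similarity between each node's embedding at
    iteration t and its embedding at iteration t-1 (the positive pair).
    No negative samples are involved anywhere."""
    zc, zp = l2_normalize(z_curr), l2_normalize(z_prev)
    return -np.mean(np.sum(zc * zp, axis=1))
```

Minimizing this loss pulls each node toward its own previous-iteration representation; the decorrelation perspective in the paper explains why this alone need not collapse.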
Multi-modality cardiac image computing: a survey
Multi-modality cardiac imaging plays a key role in the management of patients with cardiovascular diseases. It allows a combination of complementary anatomical, morphological and functional information, increases diagnosis accuracy, and improves the efficacy of cardiovascular interventions and clinical outcomes. Fully-automated processing and quantitative analysis of multi-modality cardiac images could have a direct impact on clinical research and evidence-based patient management. However, these require overcoming significant challenges including inter-modality misalignment and finding optimal methods to integrate information from different modalities.
This paper aims to provide a comprehensive review of multi-modality imaging in cardiology, the computing methods, the validation strategies, the related clinical workflows and future perspectives. For the computing methodologies, we focus on three tasks, i.e., registration, fusion and segmentation, which generally involve multi-modality imaging data, either combining information from different modalities or transferring information across modalities. The review highlights that multi-modality cardiac imaging data has the potential for wide applicability in the clinic, such as trans-aortic valve implantation guidance, myocardial viability assessment, and catheter ablation therapy and its patient selection. Nevertheless, many challenges remain unsolved, such as missing modality, modality selection, combination of imaging and non-imaging data, and uniform analysis and representation of different modalities. There is also work to do in defining how the well-developed techniques fit in clinical workflows and how much additional and relevant information they introduce. These problems are likely to remain an active field of research, with the above questions to be answered in the future.
DeepMerge: Deep-Learning-Based Region-Merging for Image Segmentation
Image segmentation aims to partition an image according to the objects in the
scene and is a fundamental step in analysing very high spatial-resolution (VHR)
remote sensing imagery. Current methods struggle to effectively consider land
objects with diverse shapes and sizes. Additionally, the determination of
segmentation scale parameters frequently adheres to a static and empirical
doctrine, posing limitations on the segmentation of large-scale remote sensing
images and yielding algorithms with limited interpretability. To address the
above challenges, we propose a deep-learning-based region merging method dubbed
DeepMerge to handle the segmentation of complete objects in large VHR images by
integrating deep learning and a region adjacency graph (RAG). This is the first
method to use deep learning to learn the similarity between, and merge, similar
adjacent super-pixels in a RAG. We propose a modified binary tree sampling method
to generate shift-scale data serving as inputs for transformer-based deep
learning networks, a shift-scale attention with 3-dimensional relative position
embedding to learn features across scales, and an embedding to fuse learned
features with hand-crafted features. DeepMerge can achieve high segmentation
accuracy in a supervised manner from large-scale remotely sensed images and
provides an interpretable optimal scale parameter, which is validated using a
remote sensing image of 0.55 m resolution covering an area of 5,660 km^2. The
experimental results show that DeepMerge achieves the highest F value (0.9550)
and the lowest total error TE (0.0895), correctly segmenting objects of
different sizes and outperforming all competing segmentation methods.
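The region-merging step on the RAG can be illustrated with a plain union-find sketch. In DeepMerge the pairwise similarity is learned by the transformer-based network described above; here it is an arbitrary callable supplied by the caller, and the greedy most-similar-first order is our assumption.

```python
def merge_regions(edges, similarity, threshold=0.5):
    """Greedy region merging on a region adjacency graph (RAG):
    adjacent regions whose similarity exceeds `threshold` are merged.
    `edges` is a list of (i, j) adjacent-region pairs; `similarity`
    maps a pair to a score in [0, 1]."""
    parent = {}
    for i, j in edges:
        parent.setdefault(i, i)
        parent.setdefault(j, j)

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Process the most similar adjacent pairs first.
    for i, j in sorted(edges, key=lambda e: -similarity(*e)):
        if similarity(i, j) >= threshold and find(i) != find(j):
            parent[find(j)] = find(i)
    # Map every region to the label of its merged component.
    return {x: find(x) for x in parent}
```

Regions sharing a returned label belong to one merged object; varying `threshold` plays the role of the scale parameter the paper makes interpretable.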
Femtosecond visualization of oxygen vacancies in metal oxides.
Oxygen vacancies often determine the electronic structure of metal oxides, but existing techniques cannot distinguish the oxygen-vacancy sites in the crystal structure. We report here that time-resolved optical spectroscopy can solve this challenge and determine the spatial locations of oxygen vacancies. Using tungsten oxides as examples, we identified the true oxygen-vacancy sites in WO2.9 and WO2.72, typical derivatives of WO3, and determined their fingerprint optoelectronic features. We find that a metastable band with a three-stage evolution dynamics of the excited states is present in WO2.9 but absent in WO2.72. By comparison with model band-structure calculations, this enables determination of the most closely neighbored oxygen-vacancy pairs in the crystal structure of WO2.72, for which two oxygen vacancies are ortho-positioned to a single W atom as the sole configuration among all O–W bonds. These findings verify the existence of preference rules for oxygen vacancies in metal oxides.
Random Style Transfer based Domain Generalization Networks Integrating Shape and Spatial Information
Deep learning (DL)-based models have demonstrated good performance in medical
image segmentation. However, the models trained on a known dataset often fail
when performed on an unseen dataset collected from different centers, vendors
and disease populations. In this work, we present a random style transfer
network to tackle the domain generalization problem for multi-vendor and center
cardiac image segmentation. Style transfer is used to generate training data
with a wider distribution/heterogeneity, namely domain augmentation. As the
target domain could be unknown, we randomly generate a modality vector for the
target modality in the style transfer stage, to simulate the domain shift for
unknown domains. The model can be trained in a semi-supervised manner by
simultaneously optimizing a supervised segmentation and an unsupervised style
translation objective. Besides, the framework incorporates the spatial
information and shape prior of the target by introducing two regularization
terms. We evaluated the proposed framework on 40 subjects from the M&Ms
challenge 2020, and obtained promising performance in the segmentation for data
from unknown vendors and centers.
Comment: 11 page
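The domain-augmentation idea, generating training images whose appearance mimics a randomly sampled unknown domain, can be approximated at the level of first-order intensity statistics. This stand-in is far weaker than the paper's learned style-transfer network conditioned on a random modality vector; it only illustrates the principle, and the sampled ranges are our assumptions.

```python
import numpy as np

def random_style_augment(image, rng=None):
    """Re-normalize an image's intensity statistics to a randomly
    sampled target mean/std, simulating the appearance of an unseen
    vendor/center (a statistics-only proxy for learned style transfer)."""
    if rng is None:
        rng = np.random.default_rng()
    mu, sigma = image.mean(), image.std() + 1e-8
    target_mu = rng.uniform(0.3, 0.7)      # hypothetical target-domain mean
    target_sigma = rng.uniform(0.1, 0.3)   # hypothetical target-domain std
    return (image - mu) / sigma * target_sigma + target_mu
```

Training on many such randomized views widens the intensity distribution the segmentation model sees, which is the "domain augmentation" effect the abstract describes.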
What's Behind the Mask: Understanding Masked Graph Modeling for Graph Autoencoders
Recent years have witnessed the emergence of a promising self-supervised
learning strategy, referred to as masked autoencoding. However, there is a lack
of theoretical understanding of how masking matters on graph autoencoders
(GAEs). In this work, we present masked graph autoencoder (MaskGAE), a
self-supervised learning framework for graph-structured data. Different from
standard GAEs, MaskGAE adopts masked graph modeling (MGM) as a principled
pretext task - masking a portion of edges and attempting to reconstruct the
missing part with partially visible, unmasked graph structure. To understand
whether MGM can help GAEs learn better representations, we provide both
theoretical and empirical evidence to comprehensively justify the benefits of
this pretext task. Theoretically, we establish close connections between GAEs
and contrastive learning, showing that MGM significantly improves the
self-supervised learning scheme of GAEs. Empirically, we conduct extensive
experiments on a variety of graph benchmarks, demonstrating the superiority of
MaskGAE over several state-of-the-art methods on both link prediction and node
classification tasks.
Comment: KDD 2023 research track. Code available at
https://github.com/EdisonLeeeee/MaskGA
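The MGM pretext task boils down to a simple edge split: hide a portion of edges as reconstruction targets and feed only the remainder to the encoder. A minimal sketch of that split (function name ours, not from the paper's code):

```python
import random

def mask_edges(edges, mask_ratio=0.5, seed=None):
    """Split an edge list for masked graph modeling: `mask_ratio` of
    the edges become reconstruction targets, the rest stay visible to
    the encoder as the partially observed graph structure."""
    rng = random.Random(seed)
    shuffled = edges[:]          # leave the caller's list untouched
    rng.shuffle(shuffled)
    k = int(len(shuffled) * mask_ratio)
    masked, visible = shuffled[:k], shuffled[k:]
    return visible, masked
```

The decoder is then trained to predict the `masked` edges from node representations computed on `visible` ones, which is the pretext task analyzed in the paper.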
Cross-modality multi-atlas segmentation via deep registration and label fusion
Multi-atlas segmentation (MAS) is a promising framework for medical image segmentation. Generally, MAS methods register multiple atlases, i.e., medical images with corresponding labels, to a target image; the transformed atlas labels can then be combined to generate the target segmentation via label fusion schemes. Many conventional MAS methods employ atlases from the same modality as the target image. However, the number of atlases with the same modality may be limited or even zero in many clinical applications. Besides, conventional MAS methods suffer from the computational burden of registration or label fusion procedures. In this work, we design a novel cross-modality MAS framework, which uses available atlases from one modality to segment a target image from another modality. To boost the computational efficiency of the framework, both the image registration and label fusion are achieved by well-designed deep neural networks. For the atlas-to-target image registration, we propose a bi-directional registration network (BiRegNet), which can efficiently align images from different modalities. For the label fusion, we design a similarity estimation network (SimNet), which estimates the fusion weight of each atlas by measuring its similarity to the target image. SimNet can learn multi-scale information for similarity estimation to improve the performance of label fusion. The proposed framework was evaluated on left ventricle and liver segmentation tasks using the MM-WHS and CHAOS datasets, respectively. Results have shown that the framework is effective for cross-modality MAS in both registration and label fusion. Code is available at https://github.com/NanYoMy/cmmas.
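The label-fusion step can be sketched as similarity-weighted voting over warped atlas labels; in this sketch the weights are passed in directly, standing in for SimNet's learned estimates.

```python
import numpy as np

def fuse_labels(atlas_labels, weights):
    """Weighted label fusion: each warped atlas votes for its label at
    every pixel, weighted by its similarity to the target; the fused
    label is the argmax over accumulated votes.
    `atlas_labels`: (n_atlases, H, W) integer maps; `weights`: (n_atlases,)."""
    labels = np.unique(atlas_labels)
    votes = np.zeros((len(labels),) + atlas_labels.shape[1:])
    for k, lab in enumerate(labels):
        # Accumulate each atlas's weight wherever it voted for `lab`.
        votes[k] = np.tensordot(weights, (atlas_labels == lab).astype(float), axes=1)
    return labels[np.argmax(votes, axis=0)]
```

With uniform weights this reduces to plain majority voting; SimNet's contribution is precisely making the weights target-dependent.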
Right ventricular segmentation from short- and long-axis MRIs via information transition
Right ventricular (RV) segmentation from magnetic resonance imaging (MRI) is a crucial step for cardiac morphology and function analysis. However, automatic RV segmentation from MRI is still challenging, mainly due to the heterogeneous intensity, the complex variable shapes, and the unclear RV boundary. Moreover, current methods for RV segmentation tend to suffer from performance degradation at the basal and apical slices of MRI. In this work, we propose an automatic RV segmentation framework, where the information from long-axis (LA) views is utilized to assist the segmentation of short-axis (SA) views via information transition. Specifically, we employ the transformed segmentation from LA views as prior information to extract the ROI from SA views for better segmentation. The information transition aims to remove the surrounding ambiguous regions in the SA views. We tested our model on a public dataset with 360 multi-center, multi-vendor and multi-disease subjects that consist of both LA and SA MRIs. Our experimental results show that including LA views can effectively improve the accuracy of the SA segmentation. Our model is publicly available at https://github.com/NanYoMy/MMs-2.
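The information-transition step, using the LA-derived prior to restrict the SA view to a region of interest, can be read as a bounding-box crop around the warped prior mask. A minimal numpy sketch under that reading (names and the margin parameter are ours):

```python
import numpy as np

def crop_roi(sa_slice, prior_mask, margin=2):
    """Crop a short-axis slice to the bounding box of a prior mask
    (e.g., the LA-view segmentation warped into SA space) plus a margin,
    discarding the ambiguous surroundings before SA segmentation."""
    ys, xs = np.nonzero(prior_mask)
    y0 = max(ys.min() - margin, 0)
    y1 = min(ys.max() + margin + 1, sa_slice.shape[0])
    x0 = max(xs.min() - margin, 0)
    x1 = min(xs.max() + margin + 1, sa_slice.shape[1])
    # Return the offset so the predicted mask can be pasted back.
    return sa_slice[y0:y1, x0:x1], (y0, x0)
```

The SA segmentation network then runs only inside this crop, which is how the surrounding ambiguous regions are removed.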
A building change detection framework with patch-pairing single-temporal supervised learning and metric guided attention mechanism
Building change detection (CD) aims to detect changes in buildings from bi-temporal pairwise images obtained at different times. Typically, a deep learning-based building CD algorithm requires bi-temporal samples with significant building changes for training. However, obtaining such bi-temporal samples is challenging because building changes have a low probability of occurrence. Fortunately, it is relatively simple to obtain single-temporal samples that include a substantial number of buildings. By using these single-temporal building samples, pseudo bi-temporal building change samples can be generated, which effectively addresses the problem of limited bi-temporal building change samples. In view of this, this study proposes a metric-guided single-temporal supervised learning framework that uses single-temporal building samples for building CD. In the proposed framework, patch-pairing single-temporal supervised learning (PPSL) adopts a patch-pairing method to construct pseudo bi-temporal building change samples, while equipping the network to effectively suppress the negative impact of geometric offset and radiation difference in real samples. To further suppress the impact of radiation difference and enhance the effectiveness of our framework, a metric-guided spatial attention module (MGSAM) is designed to minimize the intra-class feature differences between temporal samples and augment the spatial context modeling ability. The proposed method is verified by experiments on different datasets, and the results demonstrate that it outperforms existing methods and achieves superior performance.
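The single-temporal supervision idea can be illustrated by synthesizing a pseudo bi-temporal pair from one image: one copy keeps a building, the other erases it, and the erased footprint becomes the change label. PPSL pastes paired patches rather than filling with flat background; this sketch only shows the principle, and all names are ours.

```python
import numpy as np

def make_pseudo_pair(image, building_mask, background_value=0):
    """Build a pseudo bi-temporal change sample from one single-temporal
    image: time-1 keeps the buildings; time-2 erases them (a flat
    background here, a paired patch in PPSL). The change label is
    exactly the erased building footprint."""
    t1 = image.copy()
    t2 = image.copy()
    t2[building_mask > 0] = background_value
    change_label = (building_mask > 0).astype(np.uint8)
    return t1, t2, change_label
```

Training a CD network on many such synthesized pairs sidesteps the scarcity of real bi-temporal change samples noted in the abstract.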