Progressive Scale-aware Network for Remote Sensing Image Change Captioning
Remote sensing (RS) images contain numerous objects of different scales,
which poses significant challenges for the RS image change captioning (RSICC)
task to identify visual changes of interest in complex scenes and describe them
via language. However, current methods still fall short in sufficiently
extracting and utilizing multi-scale information. In this paper,
we propose a progressive scale-aware network (PSNet) to address the problem.
PSNet is a pure Transformer-based model. To sufficiently extract multi-scale
visual features, multiple progressive difference perception (PDP) layers are
stacked to progressively exploit the differencing features of bitemporal
features. To sufficiently utilize the extracted multi-scale features for
captioning, we propose a scale-aware reinforcement (SR) module and combine it
with the Transformer decoding layer to progressively utilize the features from
different PDP layers. Experiments show that the PDP layer and SR module are
effective and our PSNet outperforms previous methods. Our code is publicly
available at https://github.com/Chen-Yang-Liu/PSNet.
Comment: IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium
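As a rough illustration of the progressive differencing idea described above, the following PyTorch sketch stacks layers that refine an explicit difference feature between the two temporal token sets. The class name PDPLayerSketch, the dimensions, and the residual refinement scheme are assumptions for illustration, not the paper's exact PDP layer.

```python
import torch
import torch.nn as nn

class PDPLayerSketch(nn.Module):
    """One progressive difference-perception step (assumed design, not the paper's exact layer)."""
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.diff_proj = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feat_t1, feat_t2, prev_diff=None):
        # feat_t1, feat_t2: (B, N, C) token features of the two acquisition dates
        diff = self.diff_proj(feat_t2 - feat_t1)
        if prev_diff is not None:
            diff = diff + prev_diff            # progressive refinement across layers
        attn_out, _ = self.self_attn(diff, diff, diff)
        return self.norm(diff + attn_out)

# Stacking several layers yields one difference feature per layer; a caption
# decoder (e.g. with a scale-aware fusion step) could then consume all of them.
layers = nn.ModuleList([PDPLayerSketch() for _ in range(3)])
t1, t2 = torch.randn(2, 196, 256), torch.randn(2, 196, 256)
diff, multi_level = None, []
for layer in layers:
    diff = layer(t1, t2, diff)
    multi_level.append(diff)
```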
Implicit Ray-Transformers for Multi-view Remote Sensing Image Segmentation
The mainstream CNN-based remote sensing (RS) image semantic segmentation
approaches typically rely on massive labeled training data. Such a paradigm
struggles with the problem of RS multi-view scene segmentation with limited
labeled views due to the lack of considering 3D information within the scene.
In this paper, we propose ''Implicit Ray-Transformer (IRT)'' based on Implicit
Neural Representation (INR), for RS scene semantic segmentation with sparse
labels (such as 4-6 labels per 100 images). We explore a new way of introducing
multi-view 3D structure priors to the task for accurate and view-consistent
semantic segmentation. The proposed method includes a two-stage learning
process. In the first stage, we optimize a neural field to encode the color and
3D structure of the remote sensing scene based on multi-view images. In the
second stage, we design a Ray Transformer to leverage the relations between the
neural field 3D features and 2D texture features for learning better semantic
representations. Different from previous methods that only consider 3D prior or
2D features, we incorporate additional 2D texture information and 3D prior by
broadcasting CNN features to different point features along the sampled ray. To
verify the effectiveness of the proposed method, we construct a challenging
dataset containing six synthetic sub-datasets collected from the Carla platform
and three real sub-datasets from Google Maps. Experiments show that the
proposed method outperforms the CNN-based methods and the state-of-the-art
INR-based segmentation methods in both quantitative and qualitative metrics.
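The second stage described above can be illustrated with a minimal sketch: per-ray 3D point features from the neural field are concatenated with a broadcast 2D CNN texture feature and aggregated by a small Transformer into one per-ray prediction. The module name, feature dimensions, and mean-pooling aggregation are assumptions, not the paper's exact Ray Transformer.

```python
import torch
import torch.nn as nn

class RayTransformerSketch(nn.Module):
    """Fuses neural-field point features sampled along a ray with a broadcast
    2D CNN texture feature, then predicts one semantic label per ray."""
    def __init__(self, point_dim=64, cnn_dim=64, dim=128, num_heads=4, num_classes=13):
        super().__init__()
        self.proj = nn.Linear(point_dim + cnn_dim, dim)
        layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, point_feats, cnn_feat):
        # point_feats: (R, S, point_dim)  S samples per ray from the neural field
        # cnn_feat:    (R, cnn_dim)       2D texture feature of the ray's pixel
        broadcast = cnn_feat.unsqueeze(1).expand(-1, point_feats.size(1), -1)
        tokens = self.proj(torch.cat([point_feats, broadcast], dim=-1))
        tokens = self.encoder(tokens)
        return self.head(tokens.mean(dim=1))   # (R, num_classes) per-ray logits

logits = RayTransformerSketch()(torch.randn(8, 32, 64), torch.randn(8, 64))
```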
CARE: A Large Scale CT Image Dataset and Clinically Applicable Benchmark Model for Rectal Cancer Segmentation
Rectal cancer segmentation of CT images plays a crucial role in timely
clinical diagnosis, radiotherapy treatment, and follow-up. Although current
segmentation methods have shown promise in delineating cancerous tissues, they
still encounter challenges in achieving high segmentation precision. These
obstacles arise from the intricate anatomical structures of the rectum and the
difficulties in performing differential diagnosis of rectal cancer.
Additionally, a major obstacle is the lack of a large-scale, finely annotated
CT image dataset for rectal cancer segmentation. To address these issues, this
work introduces CARE, a novel large-scale rectal cancer CT image dataset with
pixel-level annotations for both normal and cancerous rectum, which serves as a
valuable resource for algorithm research and clinical application development.
Moreover, we propose a novel medical cancer lesion segmentation benchmark model
named U-SAM. The model is specifically designed to tackle the challenges posed
by the intricate anatomical structures of abdominal organs by incorporating
prompt information. U-SAM contains three key components: promptable information
(e.g., points) to aid in target area localization, a convolution module for
capturing low-level lesion details, and skip-connections to preserve and
recover spatial information during the encoding-decoding process. To evaluate
the effectiveness of U-SAM, we systematically compare its performance with
several popular segmentation methods on the CARE dataset. The generalization of
the model is further verified on the WORD dataset. Extensive experiments
demonstrate that the proposed U-SAM outperforms state-of-the-art methods on
these two datasets. These experiments can serve as the baseline for future
research and clinical application development.
Comment: 8 pages
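The three components described above (point prompts for localization, a convolution module for low-level detail, and skip connections) can be sketched as a toy prompt-conditioned U-shaped segmenter. Every name and dimension below is an assumption for illustration, not the authors' U-SAM implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class USAMSketch(nn.Module):
    """Toy prompt-conditioned U-shaped segmenter: a point prompt is embedded and
    added at the bottleneck, and a skip connection preserves low-level detail."""
    def __init__(self, in_ch=1, base=16):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU())
        self.prompt_mlp = nn.Linear(2, base * 2)   # encodes a normalized (x, y) click
        self.dec = nn.Sequential(nn.Conv2d(base * 3, base, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(base, 1, 1)

    def forward(self, img, points):
        # img: (B, 1, H, W) CT slice; points: (B, 2) normalized click coordinates
        s1 = self.enc1(img)                        # low-level lesion details
        s2 = self.enc2(s1)
        s2 = s2 + self.prompt_mlp(points)[:, :, None, None]   # prompt guides localization
        up = F.interpolate(s2, size=s1.shape[-2:], mode="bilinear", align_corners=False)
        fused = self.dec(torch.cat([up, s1], dim=1))           # skip connection
        return self.out(fused)                     # (B, 1, H, W) lesion logits

mask_logits = USAMSketch()(torch.randn(2, 1, 64, 64), torch.rand(2, 2))
```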
FormalGeo: An Extensible Formalized Framework for Olympiad Geometric Problem Solving
This is the first paper in a series presenting work we have carried out over the
past three years. In this paper, we have constructed a consistent formal plane
geometry system. This will serve as a crucial bridge between IMO-level plane
geometry challenges and readable AI automated reasoning. Within this formal
framework, we have been able to seamlessly integrate modern AI models with our
formal system. AI is now capable of providing deductive reasoning solutions to
IMO-level plane geometry problems, just like handling other natural languages,
and these proofs are readable, traceable, and verifiable. We propose the
geometry formalization theory (GFT) to guide the development of the geometry
formal system. Based on the GFT, we have established the FormalGeo, which
consists of 88 geometric predicates and 196 theorems. It can represent,
validate, and solve IMO-level geometry problems. We have also crafted the FGPS
(formal geometry problem solver) in Python. It serves as both an interactive
assistant for verifying problem-solving processes and an automated problem
solver. We have annotated the formalgeo7k and formalgeo-imo datasets. The former
contains 6,981 geometry problems (expanded to 133,818 through data augmentation),
while the latter includes 18 challenging IMO-level geometry problems (expanded
to 2,627 and continuously increasing). All annotated problems include
detailed formal language descriptions and solutions. Implementation of the
formal system and experiments validate the correctness and utility of the GFT.
The backward depth-first search method yields only a 2.42% problem-solving
failure rate, and deep learning techniques can be incorporated to achieve an
even lower one. The source code of FGPS and the datasets are available at
https://github.com/BitSecret/FGPS.
Comment: 44 pages
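A hedged sketch of the backward depth-first search idea mentioned above: starting from the goal, find theorems whose conclusion matches it and recursively prove their premises. The tuple encoding of theorems and the toy SAS example are assumptions for illustration, not FGPS's actual data structures or theorem library.

```python
def backward_dfs(goal, known_facts, theorems, depth=0, max_depth=10):
    """Return a proof trace (list of theorem names) or None if no proof is found."""
    if goal in known_facts:
        return []
    if depth >= max_depth:
        return None
    for name, premises, conclusion in theorems:
        if conclusion == goal:
            trace = []
            for premise in premises:              # prove every premise recursively
                sub = backward_dfs(premise, known_facts, theorems, depth + 1, max_depth)
                if sub is None:
                    break
                trace.extend(sub)
            else:
                return trace + [name]             # all premises proved
    return None

# Toy example: derive a congruence goal from SAS-style facts.
theorems = [("SAS", ["eq_side_ab", "eq_angle_a", "eq_side_ac"], "congruent")]
facts = {"eq_side_ab", "eq_angle_a", "eq_side_ac"}
print(backward_dfs("congruent", facts, theorems))  # -> ['SAS']
```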
RSMamba: Remote Sensing Image Classification with State Space Model
Remote sensing image classification forms the foundation of various
understanding tasks, serving a crucial function in remote sensing image
interpretation. The recent advancements of Convolutional Neural Networks (CNNs)
and Transformers have markedly enhanced classification accuracy. Nonetheless,
remote sensing scene classification remains a significant challenge, especially
given the complexity and diversity of remote sensing scenarios and the
variability of spatiotemporal resolutions. The capacity for whole-image
understanding can provide more precise semantic cues for scene discrimination.
In this paper, we introduce RSMamba, a novel architecture for remote sensing
image classification. RSMamba is based on the State Space Model (SSM) and
incorporates an efficient, hardware-aware design known as Mamba. It
integrates the advantages of both a global receptive field and linear modeling
complexity. To overcome the limitation of the vanilla Mamba, which can only
model causal sequences and is not adaptable to two-dimensional image data, we
propose a dynamic multi-path activation mechanism to augment Mamba's capacity
to model non-causal data. Notably, RSMamba maintains the inherent modeling
mechanism of the vanilla Mamba, yet exhibits superior performance across
multiple remote sensing image classification datasets. This indicates that
RSMamba holds significant potential to function as the backbone of future
visual foundation models. The code will be available at
https://github.com/KyanChen/RSMamba.
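A rough sketch of the multi-path scanning idea behind the dynamic multi-path activation mechanism: run a causal sequence model over several token orderings (forward, reverse, shuffled) and fuse the outputs with learned gates. A GRU stands in for the Mamba/SSM block here, and the gating scheme is an assumption rather than the paper's exact mechanism.

```python
import torch
import torch.nn as nn

class MultiPathScanSketch(nn.Module):
    """Scans flattened image tokens along several orderings with a causal
    sequence model (GRU stand-in) and fuses the paths with learned gates."""
    def __init__(self, dim=192, num_paths=3):
        super().__init__()
        self.mixers = nn.ModuleList([nn.GRU(dim, dim, batch_first=True) for _ in range(num_paths)])
        self.gate = nn.Linear(dim, num_paths)

    def forward(self, tokens):
        # tokens: (B, N, C) flattened image patches; 2D data has no natural causal order
        B, N, C = tokens.shape
        orders = [torch.arange(N), torch.arange(N).flip(0), torch.randperm(N)]
        outs = []
        for mixer, order in zip(self.mixers, orders):
            inv = torch.argsort(order)
            out, _ = mixer(tokens[:, order])       # scan along this path
            outs.append(out[:, inv])               # restore the original token order
        stacked = torch.stack(outs, dim=-1)                              # (B, N, C, P)
        weights = torch.softmax(self.gate(tokens), dim=-1).unsqueeze(2)  # (B, N, 1, P)
        return (stacked * weights).sum(-1)         # gated fusion of the paths

fused = MultiPathScanSketch()(torch.randn(2, 196, 192))
```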