93 research outputs found
Projected Spatiotemporal Dynamics of Drought under Global Warming in Central Asia
Drought, one of the most common natural disasters and among those with the greatest impact on human social life, has been extremely challenging to assess and predict accurately. With global warming, accurate drought prediction and assessment have become even more important. In this study, based on climate model data provided by the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP), we used the Palmer Drought Severity Index (PDSI) to analyze and project drought characteristics and their trends in Central Asia under two global warming scenarios, 1.5 °C and 2.0 °C. The results showed a marked decline in the PDSI in Central Asia under global warming, indicating that drought conditions would worsen further under both scenarios. Under the 1.5 °C warming scenario, the PDSI in Central Asia first decreased and then increased, with the turning point around 2080, whereas under the 2.0 °C scenario the PDSI declined continuously after 2025. Under the two warming scenarios, the spatial distribution of dry and wet areas in Central Asia is projected to change significantly in the future. In the 1.5 °C warming scenario, the frequency of drought and the proportion of arid areas in Central Asia were significantly higher than under the 2.0 °C scenario. Using the Thornthwaite (TH) formula to calculate the PDSI overestimated drought; the Penman–Monteith (PM) formula is therefore recommended for calculating the index.
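The TH-vs-PM comparison in this abstract hinges on how potential evapotranspiration (PET) is estimated before it enters the PDSI water balance. A minimal sketch of the Thornthwaite side (the formula itself is standard; the function name and example temperatures below are illustrative, not taken from the study, and day-length corrections are omitted):

```python
import numpy as np

def thornthwaite_pet(monthly_temp_c):
    """Monthly potential evapotranspiration (mm) via the Thornthwaite formula.

    monthly_temp_c: 12 mean monthly air temperatures in deg C.
    Sub-zero months contribute no PET; at least one month must be above 0 C.
    Day-length and month-length correction factors are omitted for brevity.
    """
    t = np.maximum(np.asarray(monthly_temp_c, dtype=float), 0.0)
    heat_index = np.sum((t / 5.0) ** 1.514)             # annual heat index I
    a = (6.75e-7 * heat_index**3 - 7.71e-5 * heat_index**2
         + 1.792e-2 * heat_index + 0.49239)             # empirical exponent
    return 16.0 * (10.0 * t / heat_index) ** a          # mm per 30-day month
```

Because the formula depends only on temperature, it tends to inflate PET in warm, windless, or humid conditions relative to Penman–Monteith, which is one common source of the drought overestimation noted above.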
This Looks Like Those: Illuminating Prototypical Concepts Using Multiple Visualizations
We present ProtoConcepts, a method for interpretable image classification
combining deep learning and case-based reasoning using prototypical parts.
Existing work in prototype-based image classification uses a ``this looks like
that'' reasoning process, which dissects a test image by finding prototypical
parts and combining evidence from these prototypes to make a final
classification. However, all of the existing prototypical part-based image
classifiers provide only one-to-one comparisons, where a single training image
patch serves as a prototype to compare with a part of our test image. With
these single-image comparisons, it can often be difficult to identify the
underlying concept being compared (e.g., ``is it comparing the color or the
shape?''). Our proposed method modifies the architecture of prototype-based
networks to instead learn prototypical concepts which are visualized using
multiple image patches. Having multiple visualizations of the same prototype
allows us to more easily identify the concept captured by that prototype (e.g.,
``the test image and the related training patches are all the same shade of
blue''), and allows our model to create richer, more interpretable visual
explanations. Our experiments show that our ``this looks like those'' reasoning
process can be applied as a modification to a wide range of existing
prototypical image classification networks while achieving comparable accuracy
on benchmark datasets.
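A rough sketch of the ``this looks like those'' retrieval step (the function name, shapes, and the cosine metric are illustrative assumptions, not the paper's implementation): a learned prototype vector is visualized with its k nearest training patches rather than a single one:

```python
import numpy as np

def top_k_patches(prototype, patch_feats, k=3):
    """Indices of the k training patches most similar to a prototype.

    prototype: (d,) learned concept vector.
    patch_feats: (n, d) pooled features of candidate training patches.
    Cosine similarity stands in for whatever metric the network uses.
    """
    p = prototype / np.linalg.norm(prototype)
    f = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    sims = f @ p                          # cosine similarity per patch
    return np.argsort(sims)[::-1][:k]     # best-matching patches first
```

Showing several retrieved patches side by side is what lets a user judge whether the shared attribute is, say, color or shape.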
Transforming the Interactive Segmentation for Medical Imaging
The goal of this paper is to interactively refine the automatic segmentation
of challenging structures that fall behind human performance, either due to
the scarcity of available annotations or the difficult nature of the problem
itself, for example, segmenting cancers or small organs. Specifically, we
propose a novel Transformer-based architecture for Interactive Segmentation
(TIS), which treats the refinement task as a procedure that groups pixels
whose features are similar to those of the clicks given by the end users. Our
proposed architecture is composed of Transformer Decoder variants, which
naturally perform feature comparison through attention mechanisms. In contrast
to existing approaches, our proposed TIS is not limited to binary
segmentation, and allows the user to edit masks for an arbitrary number of
categories. To
validate the proposed approach, we conduct extensive experiments on three
challenging datasets and demonstrate superior performance over the existing
state-of-the-art methods. The project page is: https://wtliu7.github.io/tis/. Comment: Accepted to MICCAI 202
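A toy sketch of the click-grouping idea (the shapes and the single softmax-attention assignment are assumptions for illustration; the actual TIS decoder is a trained Transformer):

```python
import numpy as np

def assign_pixels_to_clicks(pixel_feats, click_feats, temperature=1.0):
    """Assign each pixel to the user click with the most similar feature.

    pixel_feats: (n_pixels, d) per-pixel features from a backbone.
    click_feats: (k, d) features sampled at the k user clicks (here, one
                 click per category, an assumption for this toy setup).
    Returns (n_pixels,) category indices: the argmax of a softmax
    attention between pixel and click features.
    """
    logits = pixel_feats @ click_feats.T / temperature   # (n_pixels, k)
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)              # row-wise softmax
    return attn.argmax(axis=1)
```

Because each click contributes one column of the attention, adding a click for a new category simply widens the matrix, which is why the formulation is not restricted to binary masks.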
Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models
When trained at a sufficient scale, self-supervised learning has exhibited a
notable ability to solve a wide range of visual or language understanding
tasks. In this paper, we investigate simple, yet effective approaches for
adapting the pre-trained foundation models to the downstream task of interest,
namely, open-vocabulary semantic segmentation. To this end, we make the
following contributions: (i) we introduce Fusioner, a lightweight,
transformer-based fusion module that pairs frozen visual representations
with language concepts using only a handful of image segmentation data. As a
consequence, the model gains the capability of zero-shot transfer to segment
novel categories; (ii) without loss of generality, we experiment on a broad
range of self-supervised models that have been pre-trained with different
schemes, e.g. visual-only models (MoCo v3, DINO), language-only models (BERT),
visual-language model (CLIP), and show that the proposed fusion approach is
effective for any pair of visual and language models, even those pre-trained
on a corpus of uni-modal data; (iii) we conduct thorough ablation studies to
analyze the critical components in our proposed Fusioner; when evaluated on
standard benchmarks, e.g. PASCAL-5i and COCO-20i, it surpasses existing
state-of-the-art models by a large margin, despite only being trained on frozen
visual and language features; (iv) to measure the model's robustness in
learning visual-language correspondence, we further evaluate on a synthetic
dataset, named Mosaic-4, where images are constructed by mosaicking samples
from FSS-1000. Fusioner demonstrates superior performance over previous
models. Comment: BMVC 2022 Oral
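A highly simplified sketch of the fusion idea, assuming one frozen class-name embedding, frozen patch features, and a single learned layer standing in for the transformer fusion module (all names and shapes are hypothetical):

```python
import numpy as np

def fuse_and_score(visual_feats, text_emb, w_fuse):
    """Score each frozen visual patch against a frozen class-name embedding.

    visual_feats: (n_patches, d) features from a frozen visual backbone.
    text_emb: (d,) embedding of a class name from a frozen language model.
    w_fuse: (2*d, d) weights of a tiny learned fusion layer -- the only
            trainable part, standing in for the transformer fusion module.
    Returns per-patch scores; thresholding them yields a segmentation mask.
    """
    text = np.broadcast_to(text_emb, visual_feats.shape)  # pair text with every patch
    fused = np.tanh(np.concatenate([visual_feats, text], axis=1) @ w_fuse)
    return fused @ text_emb                               # similarity per patch
```

Since only the small fusion weights are trained, swapping in a new class name at test time costs nothing extra, which is the zero-shot transfer property described above.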
Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation
The goal of the audio-visual segmentation (AVS) task is to segment the
sounding objects in video frames using audio cues. However, current
fusion-based methods have performance limitations due to the small receptive
field of convolutions and the inadequate fusion of audio-visual features.
To overcome these issues, we propose a novel Audio-aware query-enhanced
Transformer (AuTR) to tackle the task. Unlike existing
methods, our approach introduces a multimodal transformer architecture that
enables deep fusion and aggregation of audio-visual features. Furthermore, we
devise an audio-aware query-enhanced transformer decoder that explicitly helps
the model focus on the segmentation of the pinpointed sounding objects based on
audio signals, while disregarding silent yet salient objects. Experimental
results show that our method outperforms previous methods and demonstrates
better generalization ability in multi-sound and open-set scenarios. Comment: arXiv admin note: text overlap with arXiv:2305.1101
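A toy sketch of the audio-aware query idea (the projection `w_audio` and all shapes are hypothetical, not the paper's architecture): adding a projected audio embedding to the decoder queries biases them toward sounding objects:

```python
import numpy as np

def audio_enhanced_queries(queries, audio_emb, w_audio):
    """Condition learnable object queries on an audio embedding.

    queries: (q, d) learnable decoder queries.
    audio_emb: (a,) pooled audio feature for the clip.
    w_audio: (a, d) hypothetical projection from audio to query space.
    Adding the projected audio to every query steers the decoder toward
    sounding objects; silent-but-salient objects receive no such boost.
    """
    return queries + audio_emb @ w_audio  # (d,) shift broadcast over q queries
```

The enhanced queries would then cross-attend over the visual features in the decoder, so the audio signal shapes which objects the predicted masks cover.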