FSS-2019-nCov: A deep learning architecture for semi-supervised few-shot segmentation of COVID-19 infection
The newly discovered coronavirus (COVID-19) pneumonia poses major challenges to research in terms of diagnosis and disease quantification. Deep-learning (DL) techniques allow extremely precise image segmentation, yet they require large volumes of manually labeled data for supervised training. Few-Shot Learning (FSL) paradigms tackle this issue by learning a novel category from a small number of annotated instances. We present an innovative semi-supervised few-shot segmentation (FSS) approach for efficient segmentation of 2019-nCov infection (FSS-2019-nCov) from only a small number of annotated lung CT scans. The key challenge of this study is to provide accurate segmentation of COVID-19 infection from a limited number of annotated instances. For that purpose, we propose a novel dual-path deep-learning architecture for FSS. Each path contains an encoder–decoder (E-D) architecture that extracts high-level information while maintaining the channel information of COVID-19 CT slices. The E-D architecture consists of three main modules: a feature encoder module, a context enrichment (CE) module, and a feature decoder module. We utilize the pre-trained ResNet34 as an encoder backbone for feature extraction. The CE module is composed of a newly proposed Smoothed Atrous Convolution (SAC) block and a Multi-scale Pyramid Pooling (MPP) block. The conditioner path takes pairs of CT images and their labels as input and produces a relevant knowledge representation that is transferred to the segmentation path, where it is used to segment new images. To enable effective collaboration between both paths, we propose an adaptive recombination and recalibration (RR) module that permits intensive knowledge exchange between paths with a trivial increase in computational complexity. The model is extended to multi-class labeling for various types of lung infections.
This contribution addresses the scarcity of COVID-19 CT scans. It also provides a general framework for lung disease diagnosis in limited-data situations.
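The abstract names a Smoothed Atrous Convolution block but does not define it. As background only, the sketch below shows a plain atrous (dilated) convolution in one dimension, which is the standard operation such blocks build on. The function name `atrous_conv1d` is hypothetical, and the smoothing step of the SAC block is not reproduced here because the abstract gives no details about it.

```python
def atrous_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D dilated correlation (illustrative, not the paper's SAC block).

    Kernel taps are applied to samples spaced `dilation` apart, so the
    receptive field grows without adding kernel weights.
    """
    span = (len(kernel) - 1) * dilation  # distance covered by the dilated kernel
    out = []
    for start in range(len(signal) - span):
        out.append(sum(k * signal[start + i * dilation]
                       for i, k in enumerate(kernel)))
    return out


# Same 3-tap kernel, two dilation rates.
print(atrous_conv1d([1, 2, 3, 4, 5], [1, 0, -1], dilation=1))
print(atrous_conv1d([1, 2, 3, 4, 5], [1, 0, -1], dilation=2))
```

With `dilation=2` the three taps cover five input samples instead of three, which is why dilated kernels are commonly used to gather wider context at no extra parameter cost.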
Cascade-DETR: Delving into High-Quality Universal Object Detection
Object localization in general environments is a fundamental part of vision
systems. While dominating on the COCO benchmark, recent Transformer-based
detection methods are not competitive in diverse domains. Moreover, these
methods still struggle to estimate object bounding boxes very accurately in
complex environments.
We introduce Cascade-DETR for high-quality universal object detection. We
jointly tackle the generalization to diverse domains and localization accuracy
by proposing the Cascade Attention layer, which explicitly integrates
object-centric information into the detection decoder by limiting the attention
to the previous box prediction. To further enhance accuracy, we also revisit
the scoring of queries. Instead of relying on classification scores, we predict
the expected IoU of the query, leading to substantially better-calibrated
confidences. Lastly, we introduce a universal object detection benchmark,
UDB10, that contains 10 datasets from diverse domains. While also advancing the
state-of-the-art on COCO, Cascade-DETR substantially improves DETR-based
detectors on all datasets in UDB10, even by over 10 mAP in some cases. The
improvements under stringent quality requirements are even more pronounced. Our
code and models will be released at https://github.com/SysCV/cascade-detr.
Comment: Accepted in ICCV 2023.
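Two quantities from the abstract can be made concrete with a small sketch: the IoU between two boxes (the value a query would be trained to predict as its score) and a spatial mask that restricts attention to a previously predicted box. This is not the Cascade-DETR implementation. The function names, the `(x1, y1, x2, y2)` box format, and the cell-centre rule for the mask are all assumptions made for illustration.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def box_attention_mask(box, height, width):
    """Boolean grid: True where a feature-map cell centre lies inside `box`.

    `box` is expressed in feature-map cell units (an assumption of this sketch).
    """
    x1, y1, x2, y2 = box
    return [[(x1 <= col + 0.5 <= x2) and (y1 <= row + 0.5 <= y2)
             for col in range(width)]
            for row in range(height)]


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
for row in box_attention_mask((1, 1, 3, 3), 4, 4):
    print(row)
```

In an attention layer, cells marked False would typically be excluded from the softmax, so that each query only attends to features inside its previous box estimate.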
Heteroskedastic Geospatial Tracking with Distributed Camera Networks
Visual object tracking has seen significant progress in recent years.
However, the vast majority of this work focuses on tracking objects within the
image plane of a single camera and ignores the uncertainty associated with
predicted object locations. In this work, we focus on the geospatial object
tracking problem using data from a distributed camera network. The goal is to
predict an object's track in geospatial coordinates along with uncertainty over
the object's location while respecting communication constraints that prohibit
centralizing raw image data. We present a novel single-object geospatial
tracking data set that includes high-accuracy ground truth object locations and
video data from a network of four cameras. We present a modeling framework for
addressing this task, including a novel backbone model, and explore how
uncertainty calibration and fine-tuning through a differentiable tracker affect
performance.
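One common way to train a model that outputs a location together with input-dependent (heteroskedastic) uncertainty is to minimise a Gaussian negative log-likelihood. The sketch below assumes a diagonal Gaussian over 2D coordinates. The abstract does not state which likelihood the authors use, so both the function name `gaussian_nll` and the distributional choice are assumptions.

```python
import math


def gaussian_nll(pred_xy, pred_var, true_xy):
    """Negative log-likelihood of `true_xy` under a diagonal Gaussian.

    `pred_xy` is the predicted mean and `pred_var` the predicted per-axis
    variance (must be positive). Illustrative only.
    """
    nll = 0.0
    for mu, var, y in zip(pred_xy, pred_var, true_xy):
        nll += 0.5 * (math.log(2.0 * math.pi * var) + (y - mu) ** 2 / var)
    return nll


# A 3-unit error on each axis, scored under a confident and a cautious prediction.
confident = gaussian_nll((0.0, 0.0), (1.0, 1.0), (3.0, 3.0))
cautious = gaussian_nll((0.0, 0.0), (9.0, 9.0), (3.0, 3.0))
print(confident, cautious)
```

The cautious prediction receives the lower loss here because its variance matches the size of the error, which is the behaviour that rewards calibrated uncertainty.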
Residual Spatial Fusion Network for RGB-Thermal Semantic Segmentation
Semantic segmentation plays an important role in widespread applications such
as autonomous driving and robotic sensing. Traditional methods mostly use RGB
images which are heavily affected by lighting conditions, e.g., darkness. Recent
studies show thermal images are robust to the night scenario as a compensating
modality for segmentation. However, existing works either simply fuse
RGB-Thermal (RGB-T) images or adopt the encoder with the same structure for
both the RGB stream and the thermal stream, which neglects the modality
difference in segmentation under varying lighting conditions. Therefore, this
work proposes a Residual Spatial Fusion Network (RSFNet) for RGB-T semantic
segmentation. Specifically, we employ an asymmetric encoder to learn the
compensating features of the RGB and the thermal images. To effectively fuse
the dual-modality features, we generate the pseudo-labels by saliency detection
to supervise the feature learning, and develop the Residual Spatial Fusion
(RSF) module with structural re-parameterization to learn more promising
features by spatially fusing the cross-modality features. RSF employs a
hierarchical feature fusion to aggregate multi-level features, and applies the
spatial weights with the residual connection to adaptively control the
multi-spectral feature fusion by the confidence gate. Extensive experiments
were carried out on two benchmarks, i.e., the MFNet database and the PST900 database.
The results show that our method achieves state-of-the-art segmentation
performance with a good balance between accuracy and speed.
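The abstract describes spatial weights combined with a residual connection and a confidence gate, but gives no formula. One plausible reading, shown here purely as a sketch, is a per-pixel gate in [0, 1] that scales the thermal feature before it is added to the RGB feature. The function names and the exact form `fused = rgb + sigmoid(logit) * thermal` are assumptions, not the RSF module itself.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_residual_fusion(rgb, thermal, gate_logits):
    """Per-pixel fusion: fused = rgb + sigmoid(gate_logit) * thermal.

    All three inputs are 2-D lists of the same shape (single channel,
    for simplicity). Illustrative only.
    """
    return [[r + sigmoid(g) * t for r, t, g in zip(rgb_row, th_row, g_row)]
            for rgb_row, th_row, g_row in zip(rgb, thermal, gate_logits)]


rgb = [[1.0, 1.0]]
thermal = [[2.0, 2.0]]
gate_logits = [[0.0, -50.0]]  # half-open gate, then an almost closed gate
print(gated_residual_fusion(rgb, thermal, gate_logits))
```

With the residual form, a closed gate leaves the RGB feature unchanged, so the thermal stream can only add information where the gate judges it reliable.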