Efficient Semantic Segmentation on Edge Devices
Semantic segmentation is the computer vision task of assigning each pixel of
an image to a class. The task should be performed with both accuracy and
efficiency. Most existing deep fully convolutional networks (FCNs) demand
heavy computation and are power hungry, making them unsuitable for real-time
applications on portable devices. This project
analyzes current semantic segmentation models to explore the feasibility of
applying these models for emergency response during catastrophic events. We
compare the performance of real-time semantic segmentation models with their
non-real-time counterparts on aerial images captured under adverse conditions.
Furthermore, we train several models on the FloodNet dataset,
containing UAV images captured after Hurricane Harvey, and benchmark their
execution on special classes such as flooded buildings vs. non-flooded
buildings or flooded roads vs. non-flooded roads. In this project, we developed
a real-time UNet-based model and deployed it on an NVIDIA Jetson AGX Xavier
module.
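Benchmarking on "special classes" of this kind comes down to computing intersection-over-union separately for each label. A minimal numpy sketch (the class IDs below are hypothetical, not FloodNet's actual label scheme):

```python
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """Compute intersection-over-union for each class ID.

    pred, gt: integer label maps of the same shape.
    Returns a list of IoU values (NaN where a class is absent
    from both prediction and ground truth).
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious.append(inter / union if union > 0 else float("nan"))
    return ious

# Hypothetical 3-class example: 0 = background,
# 1 = non-flooded building, 2 = flooded building.
gt   = np.array([[0, 1, 1], [2, 2, 0]])
pred = np.array([[0, 1, 2], [2, 2, 0]])
ious = per_class_iou(pred, gt, num_classes=3)
```

Reporting the mean of these values over the flooded/non-flooded classes gives the per-category mIoU figures that such benchmarks typically quote.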
Efficient Hybrid Transformer: Learning Global-local Context for Urban Scene Segmentation
Semantic segmentation of fine-resolution urban scene images plays a vital
role in extensive practical applications, such as land cover mapping, urban
change detection, environmental protection and economic assessment. Driven by
rapid developments in deep learning technologies, the convolutional neural
network (CNN) has dominated the semantic segmentation task for many years.
Convolutional neural networks adopt hierarchical feature representation,
demonstrating strong local information extraction. However, the local property
of the convolution layer limits the network from capturing global context that
is crucial for precise segmentation. Recently, the Transformer has become a hot
topic in the computer vision domain. The Transformer demonstrates great
capability in global information modelling, boosting many vision tasks such as
image classification, object detection and, especially, semantic
segmentation. In this
paper, we propose an efficient hybrid Transformer (EHT) for real-time urban
scene segmentation. The EHT adopts a hybrid structure with a CNN-based
encoder and a transformer-based decoder, learning global-local context with
lower computation. Extensive experiments demonstrate that our EHT has faster
inference speed with competitive accuracy compared with state-of-the-art
lightweight models. Specifically, the proposed EHT achieves a 66.9% mIoU on the
UAVid test set and outperforms other benchmark networks significantly. The code
will be available soon.
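The global-context property that the abstract attributes to the Transformer can be illustrated with a single self-attention head over a flattened CNN feature map: every output position attends to every input position, unlike a convolution's fixed local window. A minimal numpy sketch (shapes and projection matrices are illustrative, not EHT's actual design):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_self_attention(feat, wq, wk, wv):
    """Single-head self-attention over a flattened feature map.

    feat: (H*W, C) tokens from a CNN feature map; wq/wk/wv: (C, d)
    projection matrices. Every output token is a weighted mixture
    of ALL input tokens -- the global context a convolution lacks.
    """
    q, k, v = feat @ wq, feat @ wk, feat @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return attn @ v

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))   # a 4x4 feature map, 8 channels
wq = rng.standard_normal((8, 8))
wk = rng.standard_normal((8, 8))
wv = rng.standard_normal((8, 8))
out = global_self_attention(tokens, wq, wk, wv)
```

A hybrid design like the one described pairs this kind of attention block with cheap convolutional features, trading a little accuracy for much lower computation than a pure Transformer.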
Expediting Building Footprint Segmentation from High-Resolution Remote Sensing Images via Progressive Lenient Supervision
The efficacy of building footprint segmentation from remotely sensed images
has been hindered by limited model transfer effectiveness. Many existing building
segmentation methods were developed upon the encoder-decoder architecture of
U-Net, in which the encoder is fine-tuned from newly developed backbone
networks that are pre-trained on ImageNet. However, the heavy computational
burden of the existing decoder designs hampers the successful transfer of these
modern encoder networks to remote sensing tasks. Even the widely-adopted deep
supervision strategy fails to mitigate these challenges due to its invalid loss
in hybrid regions where foreground and background pixels are intermixed. In
this paper, we conduct a comprehensive evaluation of existing decoder network
designs for building footprint segmentation and propose an efficient framework
denoted as BFSeg to enhance learning efficiency and effectiveness.
Specifically, we propose a densely connected coarse-to-fine feature fusion
decoder network that enables easy and fast feature fusion across scales.
Moreover, considering the invalidity of hybrid regions in the down-sampled
ground truth during the deep supervision process, we present a lenient deep
supervision and distillation strategy that enables the network to learn proper
knowledge from deep supervision. Building upon these advancements, we have
developed a new family of building segmentation networks, which consistently
surpass prior works with outstanding performance and efficiency across a wide
range of newly developed encoder networks. The code will be released on
https://github.com/HaonanGuo/BFSeg-Efficient-Building-Footprint-Segmentation-Framework.
Comment: 13 pages, 8 figures. Submitted to IEEE Transactions on Neural Networks
and Learning Systems.
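The invalid-loss problem in hybrid regions can be illustrated by how ground truth is down-sampled for deep supervision: a block that mixes foreground and background pixels has no single correct label. A simplified numpy sketch that marks such hybrid blocks with an ignore value so they contribute no loss (the paper's actual lenient supervision and distillation strategy is more involved):

```python
import numpy as np

def lenient_downsample_mask(gt, factor, ignore_value=255):
    """Down-sample a binary ground-truth map block-wise, but mark
    'hybrid' blocks (mixing foreground and background pixels) with
    an ignore value so a masked loss skips them.

    gt: (H, W) array of {0, 1}; H and W divisible by factor.
    """
    h, w = gt.shape
    blocks = gt.reshape(h // factor, factor, w // factor, factor)
    frac_fg = blocks.mean(axis=(1, 3))      # foreground fraction per block
    out = np.full(frac_fg.shape, ignore_value, dtype=np.int64)
    out[frac_fg == 0.0] = 0                 # pure background blocks
    out[frac_fg == 1.0] = 1                 # pure foreground blocks
    return out

gt = np.array([[1, 1, 0, 0],
               [1, 1, 0, 1],
               [0, 0, 1, 1],
               [0, 0, 1, 1]])
ds = lenient_downsample_mask(gt, factor=2)
```

A deep-supervision loss computed against `ds` with the ignore value masked out only penalizes pixels whose down-sampled label is well defined, which is the essence of the lenient strategy described above.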
A Comprehensive Review on Computer Vision Analysis of Aerial Data
With the emergence of new technologies in airborne platforms and imaging
sensors, aerial data analysis is becoming increasingly popular, capitalizing on
its advantages over ground-based data. This paper presents a comprehensive review of
the computer vision tasks within the domain of aerial data analysis. While
addressing fundamental aspects such as object detection and tracking, the
primary focus is on pivotal tasks like change detection, object segmentation,
and scene-level analysis. The paper provides a comparison of the various
hyperparameters employed across diverse architectures and tasks. A substantial
section is dedicated to an in-depth discussion on libraries, their
categorization, and their relevance to different domain expertise. The paper
encompasses aerial datasets, the architectural nuances adopted, and the
evaluation metrics associated with all the tasks in aerial data analysis.
Applications of computer vision tasks in aerial data across different domains
are explored, with case studies providing further insights. The paper
thoroughly examines the challenges inherent in aerial data analysis, offering
practical solutions. Additionally, unresolved issues of significance are
identified, paving the way for future research directions in the field of
aerial data analysis.
Comment: 112 pages.
Long-Range Correlation Supervision for Land-Cover Classification from Remote Sensing Images
Long-range dependency modeling has been widely considered in modern deep
learning based semantic segmentation methods, especially those designed for
large-size remote sensing images, to compensate for the intrinsic locality of
standard convolutions. However, in previous studies, the long-range dependency,
modeled with an attention mechanism or transformer model, has been based on
unsupervised learning, instead of explicit supervision from the objective
ground truth. In this paper, we propose a novel supervised long-range
correlation method for land-cover classification, called the supervised
long-range correlation network (SLCNet), which is shown to be superior to the
currently used unsupervised strategies. In SLCNet, pixels sharing the same
category are considered highly correlated and those having different categories
are less relevant, which can be easily supervised by the category consistency
information available in the ground truth semantic segmentation map. Under such
supervision, the recalibrated features are more consistent for pixels of the
same category and more discriminative for pixels of other categories,
regardless of their proximity. To complement the detailed information lacking
in the global long-range correlation, we introduce an auxiliary adaptive
receptive field feature extraction module, parallel to the long-range
correlation module in the encoder, to capture finely detailed feature
representations for multi-size objects in multi-scale remote sensing images. In
addition, we apply multi-scale side-output supervision and a hybrid loss
function as local and global constraints to further boost the segmentation
accuracy. Experiments were conducted on three remote sensing datasets. Compared
with the advanced segmentation methods from the computer vision, medicine, and
remote sensing communities, the SLCNet achieved a state-of-the-art performance
on all the datasets.
Comment: 14 pages, 11 figures.
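The category-consistency supervision described above amounts to building a pairwise target from the ground-truth label map (1 where two pixels share a class, 0 otherwise) and supervising the predicted affinity matrix against it. A simplified numpy sketch (the binary cross-entropy form is illustrative; SLCNet's exact formulation may differ):

```python
import numpy as np

def correlation_target(labels):
    """Pairwise correlation target from a label map: entry (i, j)
    is 1 when pixels i and j share a category, else 0."""
    flat = labels.reshape(-1)
    return (flat[:, None] == flat[None, :]).astype(np.float32)

def correlation_loss(affinity, labels, eps=1e-7):
    """Binary cross-entropy between a predicted affinity matrix
    (values in (0, 1)) and the label-derived correlation target."""
    t = correlation_target(labels)
    a = np.clip(affinity, eps, 1 - eps)
    return float(-(t * np.log(a) + (1 - t) * np.log(1 - a)).mean())

labels = np.array([[0, 0], [1, 0]])   # tiny 2x2 label map
target = correlation_target(labels)
```

Note the target depends only on category membership, not on spatial proximity, which is exactly why the recalibrated features become consistent for same-class pixels regardless of distance.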
Generative Adversarial Networks based Skin Lesion Segmentation
Skin cancer is a serious condition that requires accurate identification and
treatment. One way to assist clinicians in this task is by using computer-aided
diagnosis (CAD) tools that can automatically segment skin lesions from
dermoscopic images. To this end, a new adversarial learning-based framework
called EGAN has been developed. This framework uses an unsupervised generative
network to generate accurate lesion masks. It consists of a generator module
with a top-down squeeze excitation-based compound scaled path and an asymmetric
lateral connection-based bottom-up path, and a discriminator module that
distinguishes between original and synthetic masks. Additionally, a
morphology-based smoothing loss is implemented to encourage the network to
create smooth semantic boundaries of lesions. The framework is evaluated on the
International Skin Imaging Collaboration (ISIC) Lesion Dataset 2018 and
outperforms the current state-of-the-art skin lesion segmentation approaches
with a Dice coefficient, Jaccard index, and accuracy of 90.1%, 83.6%, and
94.5%, respectively. This represents a 2% increase in Dice coefficient, a 1%
increase in Jaccard index, and a 1% increase in accuracy over the previous
state of the art.
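A morphology-based smoothing penalty can be sketched as the mass removed by a morphological opening: thin spurs and ragged boundary pixels survive in the raw mask but not in its opening. A simplified, non-differentiable numpy illustration (EGAN's actual loss is a differentiable training objective, so this only conveys the idea):

```python
import numpy as np

def _erode(m):
    # 3x3 erosion: minimum over each pixel's neighborhood
    # (border padded with foreground so edges are not over-eroded).
    p = np.pad(m, 1, constant_values=1)
    shifts = [p[i:i + m.shape[0], j:j + m.shape[1]]
              for i in range(3) for j in range(3)]
    return np.min(shifts, axis=0)

def _dilate(m):
    # 3x3 dilation: maximum over each pixel's neighborhood.
    p = np.pad(m, 1, constant_values=0)
    shifts = [p[i:i + m.shape[0], j:j + m.shape[1]]
              for i in range(3) for j in range(3)]
    return np.max(shifts, axis=0)

def smoothing_loss(mask):
    """Fraction of pixels removed by a morphological opening
    (erosion then dilation); non-zero for ragged boundaries."""
    opened = _dilate(_erode(mask))
    return float(np.mean(mask.astype(float) - opened.astype(float)))

block = np.zeros((7, 7), dtype=int)
block[2:5, 2:5] = 1          # clean square lesion: loss is zero
spur = block.copy()
spur[3, 5] = 1               # one-pixel spur: penalized by opening
```

Because opening is anti-extensive (it never adds pixels), the penalty is always non-negative and vanishes exactly when the mask equals its own opening, i.e. when its boundary is smooth at the structuring-element scale.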