8 research outputs found
Submeter-level Land Cover Mapping of Japan
Deep learning has shown promising performance in submeter-level mapping
tasks; however, the annotation cost of submeter-level imagery remains a
challenge, especially when applied on a large scale. In this paper, we present
the first submeter-level land cover mapping of Japan with eight classes, at a
relatively low annotation cost. We introduce a human-in-the-loop deep learning
framework leveraging OpenEarthMap, a recently introduced benchmark dataset for
global submeter-level land cover mapping, with a U-Net model that achieves
national-scale mapping with a small amount of additional labeled data. By
adding a small amount of labeled data from areas or regions where a U-Net model
trained on OpenEarthMap clearly failed and retraining the model, an overall
accuracy of 80% was achieved, which is a nearly 16 percentage point
improvement after retraining. Using aerial imagery provided by the Geospatial
Information Authority of Japan, we create land cover classification maps of
eight classes for the entire country of Japan. Our framework, with its low
annotation cost and high-accuracy mapping results, demonstrates the potential
to contribute to the automatic updating of national-scale land cover mapping
using submeter-level optical remote sensing data. The mapping results will be
made publicly available. Comment: 16 pages, 10 figures
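The human-in-the-loop idea above (score the model's output per region, relabel where it clearly failed, retrain) can be sketched with a toy overall-accuracy metric. This is an illustrative sketch only: the 0.5 failure threshold, the tile sizes, and the selection rule are invented for the example and are not the paper's actual protocol.

```python
import numpy as np

def overall_accuracy(pred, ref):
    """Fraction of pixels whose predicted class matches the reference."""
    pred, ref = np.asarray(pred), np.asarray(ref)
    return float((pred == ref).mean())

def select_failed_tiles(tile_preds, tile_refs, threshold=0.5):
    """Indices of tiles where the model 'clearly failed' (accuracy below a
    threshold). These are the tiles a human annotator would relabel before
    the model is retrained."""
    return [i for i, (p, r) in enumerate(zip(tile_preds, tile_refs))
            if overall_accuracy(p, r) < threshold]

# Toy example: three 4x4 tiles with 8 possible land cover classes (0..7).
rng = np.random.default_rng(0)
refs = [rng.integers(0, 8, (4, 4)) for _ in range(3)]
preds = [refs[0].copy(),                   # perfect tile
         rng.integers(0, 8, (4, 4)),       # likely-failed tile
         refs[2].copy()]
preds[2][0, 0] = (preds[2][0, 0] + 1) % 8  # one wrong pixel

failed = select_failed_tiles(preds, refs)
```

Only the flagged tiles would go to annotators, which is what keeps the additional labeling cost small relative to labeling the whole country.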
Evolutionary NAS with Gene Expression Programming of Cellular Encoding
The renaissance of neural architecture search (NAS) has seen classical
methods such as genetic algorithms (GA) and genetic programming (GP) being
exploited for convolutional neural network (CNN) architectures. While recent
work has achieved promising performance on visual perception tasks, the direct
encoding schemes of both GA and GP suffer from limited functional complexity and
do not scale well to large architectures such as CNNs. To address this, we
present a new generative encoding scheme -- symbolic linear generative encoding
(SLGE) -- a simple yet powerful scheme that embeds local graph transformations
in chromosomes of a linear fixed-length string to develop CNN architectures of
varying shapes and sizes via the evolutionary process of gene expression
programming. In experiments, the effectiveness of SLGE is shown in discovering
architectures that improve the performance of the state-of-the-art handcrafted
CNN architectures on CIFAR-10 and CIFAR-100 image classification tasks, and
achieves a competitive classification error rate against existing NAS methods
while using fewer GPU resources. Comment: Accepted at IEEE SSCI 2020 (7 pages,
3 figures)
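The central idea of a linear fixed-length genotype that decodes into an architecture can be illustrated with a toy example. The symbol set, decoding rules, and mutation operator below are invented for this sketch and are not the actual SLGE grammar; they only show why a fixed-length string keeps variation operators trivial while the genotype/phenotype separation allows complex phenotypes.

```python
import random

# Hypothetical symbol table for the sketch (not the paper's alphabet).
SYMBOLS = {
    "C": "conv3x3",     # convolution
    "P": "maxpool2x2",  # pooling
    "S": "shortcut",    # identity skip connection
    "B": "batchnorm",
}

def random_chromosome(length=8, rng=None):
    """Fixed-length string of symbols; every individual has the same length,
    which keeps crossover and mutation trivial to implement."""
    rng = rng or random.Random(0)
    return "".join(rng.choice(list(SYMBOLS)) for _ in range(length))

def decode(chromosome):
    """Map genotype (string) to phenotype (ordered list of layer ops). The
    separation of the two spaces lets variation act on strings while fitness
    is evaluated on architectures."""
    return [SYMBOLS[s] for s in chromosome]

def point_mutate(chromosome, pos, symbol):
    """Replace one symbol; length is preserved, so the encoding stays valid."""
    return chromosome[:pos] + symbol + chromosome[pos + 1:]

chrom = random_chromosome()
arch = decode(chrom)
mutated = point_mutate(chrom, 0, "S")
```

Any mutation or crossover of valid strings yields another valid string, so no repair step is needed after variation.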
Land-cover change detection using paired OpenStreetMap data and optical high-resolution imagery via object-guided Transformer
Optical high-resolution imagery and OpenStreetMap (OSM) data are two
important data sources for land-cover change detection. Previous studies
involving these two data sources focus on utilizing the information in OSM data
to aid change detection on multi-temporal optical high-resolution images. This
paper pioneers the direct detection of land-cover changes utilizing paired OSM
data and optical imagery, thereby broadening the horizons of change detection
tasks to encompass more dynamic earth observations. To this end, we propose an
object-guided Transformer (ObjFormer) architecture by naturally combining the
prevalent object-based image analysis (OBIA) technique with the advanced vision
Transformer architecture. The introduction of OBIA can significantly reduce the
computational overhead and memory burden in the self-attention module.
Specifically, the proposed ObjFormer has a hierarchical pseudo-siamese encoder
consisting of object-guided self-attention modules that extract representative
features of different levels from OSM data and optical images; a decoder
consisting of object-guided cross-attention modules can progressively recover
the land-cover changes from the extracted heterogeneous features. In addition
to the basic supervised binary change detection task, this paper introduces a new
semi-supervised semantic change detection task that does not require any
manually annotated land-cover labels of optical images to train semantic change
detectors. Two lightweight semantic decoders are added to ObjFormer to
accomplish this task efficiently. A converse cross-entropy loss is designed to
fully utilize the negative samples, contributing to a substantial performance
improvement on this task. The first large-scale benchmark dataset
containing 1,287 map-image pairs (1024 × 1024 pixels for each sample)
covering 40 regions on six continents ... (see the manuscript for the full
abstract)
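The computational argument for object-guided attention (attend over a handful of object tokens instead of all pixels) can be sketched as follows. This is a simplified illustration, not the ObjFormer implementation: it omits learned projections, multi-head attention, and the hierarchical pseudo-siamese design, but it shows how pooling N pixels into K objects (K much smaller than N) shrinks the attention matrix from N×N to K×K.

```python
import numpy as np

def object_guided_self_attention(feats, objects):
    """Toy object-guided self-attention: pool pixel features into one token
    per object (segment), attend over the object tokens, then broadcast the
    attended tokens back to the member pixels."""
    ids = np.unique(objects)
    # One token per object: mean of the member pixels' features.
    tokens = np.stack([feats[objects == i].mean(axis=0) for i in ids])
    # Plain scaled dot-product self-attention over the K object tokens
    # (no learned query/key/value projections in this sketch).
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    out_tokens = attn @ tokens
    # Broadcast each object's updated token back to its pixels.
    out = np.empty_like(feats)
    for row, i in enumerate(ids):
        out[objects == i] = out_tokens[row]
    return out, attn

# 6 pixels with 4-dim features, grouped by OBIA-style segmentation into 2 objects.
feats = np.arange(24, dtype=float).reshape(6, 4)
objects = np.array([0, 0, 0, 1, 1, 1])
out, attn = object_guided_self_attention(feats, objects)
```

Pixels in the same object receive identical updated features here; in the real architecture the pooled tokens would instead modulate per-pixel features through learned layers.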
A Survey on African Computer Vision Datasets, Topics and Researchers
Computer vision encompasses a range of tasks such as object detection,
semantic segmentation, and 3D reconstruction. Despite its relevance to African
communities, research in this field within Africa represents only 0.06% of
top-tier publications over the past decade. This study undertakes a thorough
analysis of 63,000 Scopus-indexed computer vision publications from Africa,
spanning from 2012 to 2022. The aim is to provide a survey of African computer
vision topics, datasets and researchers. A key aspect of our study is the
identification and categorization of African Computer Vision datasets using
large language models that automatically parse abstracts of these publications.
We also provide a compilation of unofficial African Computer Vision datasets
distributed through challenges or data hosting platforms, and provide a full
taxonomy of dataset categories. Our survey also pinpoints computer vision
topic trends specific to different African regions, indicating their unique
focus areas. Additionally, we carried out an extensive survey to capture the
views of African researchers on the current state of computer vision research
in the continent and the structural barriers they believe need urgent
attention. In conclusion, this study catalogs and categorizes Computer Vision
datasets and topics contributed or initiated by African institutions and
identifies barriers to publishing in top-tier Computer Vision venues. This
survey underscores the importance of encouraging African researchers and
institutions in advancing computer vision research in the continent. It also
stresses the need for research topics to be more aligned with the needs of
African communities. Comment: Under Review, Community Work of Ro'ya Grassroots,
https://ro-ya-cv4africa.github.io/homepage/. Journal extension of our
conference paper; arXiv admin note: text overlap with arXiv:2305.0677
Automated Deep Neural Networks with Gene Expression Programming of Cellular Encoding: Towards Applications in Remote Sensing Image Understanding
Deep neural networks (DNNs) such as convolutional neural networks (CNNs) have enabled remarkable progress in the application of machine learning and artificial intelligence. Research scientists are gearing up to adopt DNN methods for their respective domain problems. Automated neural architecture search (NAS), also known as automated DNN (AutoDNN), aims to automate the architecture search of neural networks so that researchers can adopt DNN methods with ease, with little or no expertise in deep learning. As a metaheuristic approach, automated NAS requires a representation scheme to encode the candidate solutions (architectures). The direct encodings of genetic algorithms and genetic programming have been widely employed in automated NAS methods. Though easy to implement, direct encodings cannot be easily modularized, and the lack of a distinctive separation between genotype and phenotype spaces limits their functional complexity. It may therefore be difficult for direct encodings to evolve modules (building blocks) with shortcut and multi-branch connections, which can improve training and enhance network performance in image understanding tasks. This work presents a novel generative encoding, called symbolic linear generative encoding (SLGE), that combines the complementary strengths of gene expression programming (GEP) and cellular encoding (CE) for automatic architecture search of deep neural networks for image understanding. In particular, SLGE evolves modularized CNNs with shortcut and multi-branch modularity properties (similar to those commonly adopted by human experts) for remote sensing (RS) image understanding tasks such as scene classification and semantic segmentation. GEP is known for its simplicity of implementation and multi-gene chromosomes with flexible genetic modification, whereas CE has the ability to produce modular artificial neural networks (ANNs).
Both GEP and CE are well-established evolutionary computation methods that have seen extensive development and theoretical study. A large part of this previous work involves architecture search of ANNs at a small scale; this work therefore opens the possibility of CNN architecture development for image understanding tasks, particularly in the field of RS. We adopt two automated NAS search strategies, random search with early stopping and an evolutionary algorithm, to automatically evolve modularized CNN architectures for classification of RS imagery scenes and semantic segmentation of aerial/satellite imagery, respectively. Two types of multi-class image scene classification tasks were performed, single-label and multi-label scene classification, using four different remotely sensed imagery datasets, to validate the expressiveness and tractability of the SLGE representation space. Moreover, we constructed two separate SLGE representation spaces: a normal cell and an atrous spatial pyramid pooling (ASPP) cell. Then, using an evolutionary algorithm with genetic operators such as uniform mutation, two-point crossover, and gene crossover, we jointly search for a normal cell and an ASPP cell as a pair of cells to build a modularized encoder-decoder CNN architecture for solving the RS image semantic segmentation problem. Three RS semantic segmentation benchmarks were used to verify the performance of the SLGE architecture representation. In doing so, we also validated the effectiveness and robustness of the proposed SLGE architecture representation. The results position the SLGE architecture representation among the best of the state-of-the-art systems.
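Of the two search strategies mentioned, random search with early stopping is the simpler, and its loop can be sketched in a few lines. The fitness function below is a stand-in invented for the example (in the actual system it would be a trained network's validation performance), and the symbol alphabet, budget, and patience values are likewise illustrative.

```python
import random

def toy_fitness(chromosome):
    """Stand-in fitness: reward 'C' (conv) and 'S' (shortcut) symbols. A real
    NAS run would decode the chromosome and train/evaluate the network."""
    return sum(c in "CS" for c in chromosome) / len(chromosome)

def random_search(symbols="CPSB", length=8, budget=50, patience=10, seed=1):
    """Sample fixed-length chromosomes uniformly at random; stop early when
    `patience` consecutive samples fail to improve on the best so far."""
    rng = random.Random(seed)
    best, best_fit, stale = None, -1.0, 0
    for _ in range(budget):
        cand = "".join(rng.choice(symbols) for _ in range(length))
        fit = toy_fitness(cand)
        if fit > best_fit:
            best, best_fit, stale = cand, fit, 0
        else:
            stale += 1
            if stale >= patience:  # early stopping: no recent improvement
                break
    return best, best_fit

best, best_fit = random_search()
```

Random search with early stopping is a common NAS baseline precisely because each evaluation is expensive; the patience counter caps wasted evaluations once improvement stalls.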
Submeter-level land cover mapping of Japan
Deep learning has shown promising performance in submeter-level mapping tasks; however, its annotation cost remains a challenge, especially when applied on a large scale. In this paper, we introduce the first submeter-level land cover mapping of Japan, employing eight classes. We present a human-in-the-loop framework that achieves national-scale mapping with a small amount of additional labeled data together with OpenEarthMap, a recently introduced benchmark dataset for global submeter-level land cover mapping. Using aerial imagery provided by the Geospatial Information Authority of Japan, we create land cover classification maps for the entire country of Japan and evaluate their accuracy. By adding a small amount of labeled data to areas where a U-Net model trained on OpenEarthMap clearly failed and retraining the model, an overall accuracy of 80% was achieved, which is a nearly 16 percentage point improvement after retraining. Our framework, with its low-cost and high-accuracy mapping results, demonstrates the potential to contribute to the automatic updating of land cover maps using submeter-level optical remote sensing data. The mapping results will be made publicly available.
Exploring Multi-Stage GAN with Self-Attention for Speech Enhancement
Multi-stage or multi-generator generative adversarial networks (GANs) have recently been demonstrated to be effective for speech enhancement. Existing multi-generator GANs for speech enhancement use only convolutional layers for synthesising clean speech signals. This reliance on the convolution operation may mask the temporal dependencies within the signal sequence. This study explores self-attention as a way to address the temporal dependency issue in multi-generator speech enhancement GANs and improve their enhancement performance. We empirically study the effect of integrating a self-attention mechanism into the convolutional layers of the multiple generators in multi-stage or multi-generator speech enhancement GANs, specifically the ISEGAN and DSEGAN networks. The experimental results show that introducing a self-attention mechanism into ISEGAN and DSEGAN leads to improvements in their speech enhancement quality and intelligibility across the objective evaluation metrics. Furthermore, we observe that adding self-attention to the ISEGAN's generators not only improves its enhancement performance but also bridges the performance gap between the ISEGAN and the DSEGAN with a smaller model footprint. Overall, our findings highlight the potential of self-attention for improving the enhancement performance of multi-generator speech enhancement GANs.
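The temporal-dependency argument (self-attention lets every frame weight every other frame, whereas a convolution sees only a fixed local window) can be illustrated with a minimal self-attention sketch. This is not the ISEGAN/DSEGAN integration: it is a single non-causal attention layer over a toy feature sequence, with random matrices standing in for learned projections.

```python
import numpy as np

def self_attention_1d(x, seed=0):
    """Minimal scaled dot-product self-attention over a sequence of shape
    (T, d). Each output frame is a softmax-weighted mixture of value vectors
    from ALL T frames, so the receptive field spans the whole sequence,
    unlike a convolution's local kernel."""
    rng = np.random.default_rng(seed)
    T, d = x.shape
    # Random stand-ins for the learned query/key/value projections.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # rows sum to 1
    return attn @ V, attn

# A toy "speech" feature sequence: 10 frames, 4 features each.
x = np.sin(np.arange(40, dtype=float)).reshape(10, 4)
y, attn = self_attention_1d(x)
```

Because every entry of the T×T attention matrix is strictly positive, each enhanced frame receives some contribution from every other frame, which is exactly the long-range coupling a stack of small convolutions struggles to provide.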