8 research outputs found

    Submeter-level Land Cover Mapping of Japan

    Deep learning has shown promising performance in submeter-level mapping tasks; however, the annotation cost of submeter-level imagery remains a challenge, especially when applied on a large scale. In this paper, we present the first submeter-level land cover mapping of Japan with eight classes, at a relatively low annotation cost. We introduce a human-in-the-loop deep learning framework that leverages OpenEarthMap, a recently introduced benchmark dataset for global submeter-level land cover mapping, together with a U-Net model, to achieve national-scale mapping with a small amount of additional labeled data. By adding a small amount of labeled data for areas or regions where a U-Net model trained on OpenEarthMap clearly failed and retraining the model, an overall accuracy of 80% was achieved, a nearly 16 percentage point improvement after retraining. Using aerial imagery provided by the Geospatial Information Authority of Japan, we create eight-class land cover classification maps for the entire country of Japan. Our framework, with its low annotation cost and high-accuracy mapping results, demonstrates the potential to contribute to the automatic updating of national-scale land cover mapping using submeter-level optical remote sensing data. The mapping results will be made publicly available. Comment: 16 pages, 10 figures
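
    As a rough illustration of the selection step such a human-in-the-loop framework implies, the sketch below scores each mapped tile against a small set of spot-checked reference labels and queues the worst tiles for manual annotation before retraining. The tile sizes, the accuracy-based ranking, and all names are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def overall_accuracy(pred: np.ndarray, ref: np.ndarray) -> float:
    """Fraction of pixels whose predicted class matches the reference."""
    return float((pred == ref).mean())

def select_tiles_for_annotation(preds, refs, budget: int):
    """Indices of the `budget` tiles where the current model does worst."""
    scores = [overall_accuracy(p, r) for p, r in zip(preds, refs)]
    return np.argsort(scores)[:budget]

# Toy data: 5 tiles of 4x4 predictions vs. spot-checked reference labels,
# with 8 land cover classes as in the paper.
rng = np.random.default_rng(0)
preds = [rng.integers(0, 8, size=(4, 4)) for _ in range(5)]
refs = [rng.integers(0, 8, size=(4, 4)) for _ in range(5)]
print("tiles to label and feed back into retraining:",
      select_tiles_for_annotation(preds, refs, budget=2))
```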

    Evolutionary NAS with Gene Expression Programming of Cellular Encoding

    The renaissance of neural architecture search (NAS) has seen classical methods such as genetic algorithms (GA) and genetic programming (GP) being exploited for convolutional neural network (CNN) architectures. While recent works have achieved promising performance on visual perception tasks, the direct encoding scheme of both GA and GP suffers from limited functional complexity and does not scale well to large architectures such as CNNs. To address this, we present a new generative encoding scheme, symbolic linear generative encoding (SLGE), a simple yet powerful scheme which embeds local graph transformations in chromosomes of linear fixed-length strings to develop CNN architectures of varying shapes and sizes via the evolutionary process of gene expression programming. In experiments, the effectiveness of SLGE is shown in discovering architectures that improve the performance of state-of-the-art handcrafted CNN architectures on the CIFAR-10 and CIFAR-100 image classification tasks, and it achieves a competitive classification error rate with existing NAS methods using fewer GPU resources. Comment: Accepted at IEEE SSCI 2020 (7 pages, 3 figures)
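
    For intuition about how a linear fixed-length chromosome can develop a graph, the toy below applies local graph rewrites in the spirit of cellular encoding; the symbol set ('S' for a sequential division, 'P' for a parallel one, 'E' to end a cell's development) and the rewrite rules are invented for illustration and are not the actual SLGE scheme.

```python
from collections import deque

def develop(chromosome: str):
    """Grow a small DAG by reading a fixed-length gene string left to right."""
    edges = set()            # (parent, child) connections
    frontier = deque([0])    # cells still developing; cell 0 is the zygote
    next_id = 1
    for gene in chromosome:
        if not frontier:     # remaining genes are non-coding, as in GEP
            break
        cell = frontier.popleft()
        if gene == "S":      # sequential division: child takes cell's outputs
            new = next_id
            next_id += 1
            out = {(u, v) for (u, v) in edges if u == cell}
            edges -= out
            edges |= {(new, v) for (_, v) in out}
            edges.add((cell, new))
            frontier.extend([cell, new])
        elif gene == "P":    # parallel division: copy shares inputs/outputs
            new = next_id
            next_id += 1
            edges |= {(u, new) for (u, v) in edges if v == cell}
            edges |= {(new, v) for (u, v) in edges if u == cell}
            frontier.extend([cell, new])
        # "E" ends development of this cell
    return sorted(edges)

print(develop("SPSEE"))  # fixed-length string -> variable-shaped graph
```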

    Land-cover change detection using paired OpenStreetMap data and optical high-resolution imagery via object-guided Transformer

    Optical high-resolution imagery and OpenStreetMap (OSM) data are two important data sources for land-cover change detection. Previous studies involving these two data sources focus on utilizing the information in OSM data to aid change detection on multi-temporal optical high-resolution images. This paper pioneers the direct detection of land-cover changes from paired OSM data and optical imagery, thereby broadening the horizons of change detection tasks to encompass more dynamic earth observations. To this end, we propose an object-guided Transformer (ObjFormer) architecture that naturally combines the prevalent object-based image analysis (OBIA) technique with the advanced vision Transformer architecture. The introduction of OBIA significantly reduces the computational overhead and memory burden of the self-attention module. Specifically, the proposed ObjFormer has a hierarchical pseudo-siamese encoder consisting of object-guided self-attention modules that extract multi-level representative features from OSM data and optical images, and a decoder consisting of object-guided cross-attention modules that progressively recovers land-cover changes from the extracted heterogeneous features. In addition to the basic supervised binary change detection task, this paper introduces a new semi-supervised semantic change detection task that does not require any manually annotated land-cover labels of optical images to train semantic change detectors. Two lightweight semantic decoders are added to ObjFormer to accomplish this task efficiently. A converse cross-entropy loss is designed to fully utilize the negative samples, contributing to a large performance improvement on this task. The first large-scale benchmark dataset contains 1,287 map-image pairs (1024×1024 pixels each) covering 40 regions on six continents ... (see the manuscript for the full abstract)
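
    The computational saving behind object-guided self-attention can be sketched as follows: pool pixel features into one token per OBIA object, attend over the K object tokens instead of the H×W pixels, then scatter the result back. This is a hedged sketch of the idea, not the authors' implementation; the mean pooling and all shapes are assumptions.

```python
import torch
import torch.nn as nn

def object_guided_attention(feats: torch.Tensor, seg: torch.Tensor,
                            attn: nn.MultiheadAttention) -> torch.Tensor:
    """feats: (HW, C) pixel features; seg: (HW,) object ids in [0, K)."""
    K, C = int(seg.max()) + 1, feats.shape[1]
    # Mean-pool pixel features into one token per OBIA object.
    tokens = torch.zeros(K, C).index_add_(0, seg, feats)
    counts = torch.zeros(K).index_add_(0, seg, torch.ones(seg.numel()))
    tokens = tokens / counts.clamp(min=1).unsqueeze(1)
    # Self-attention over K object tokens instead of HW pixel tokens.
    out, _ = attn(tokens.unsqueeze(1), tokens.unsqueeze(1), tokens.unsqueeze(1))
    return out.squeeze(1)[seg]  # scatter object features back to pixels

attn = nn.MultiheadAttention(embed_dim=16, num_heads=4)
feats = torch.randn(64, 16)          # an 8x8 feature map, flattened
seg = torch.randint(0, 5, (64,))     # 5 objects from an OBIA segmentation
print(object_guided_attention(feats, seg, attn).shape)  # torch.Size([64, 16])
```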

    A Survey on African Computer Vision Datasets, Topics and Researchers

    Computer vision encompasses a range of tasks such as object detection, semantic segmentation, and 3D reconstruction. Despite its relevance to African communities, research in this field within Africa represents only 0.06% of top-tier publications over the past decade. This study undertakes a thorough analysis of 63,000 Scopus-indexed computer vision publications from Africa, spanning 2012 to 2022. The aim is to provide a survey of African computer vision topics, datasets, and researchers. A key aspect of our study is the identification and categorization of African computer vision datasets using large language models that automatically parse the abstracts of these publications. We also provide a compilation of unofficial African computer vision datasets distributed through challenges or data hosting platforms, along with a full taxonomy of dataset categories. Our survey also pinpoints computer vision topic trends specific to different African regions, indicating their unique focus areas. Additionally, we carried out an extensive survey to capture the views of African researchers on the current state of computer vision research on the continent and the structural barriers they believe need urgent attention. In conclusion, this study catalogs and categorizes computer vision datasets and topics contributed or initiated by African institutions and identifies barriers to publishing in top-tier computer vision venues. This survey underscores the importance of encouraging African researchers and institutions to advance computer vision research on the continent. It also stresses the need for research topics to be more aligned with the needs of African communities. Comment: Under review, community work of Ro'ya Grassroots, https://ro-ya-cv4africa.github.io/homepage/. Journal extension of our conference paper; arXiv admin note: text overlap with arXiv:2305.0677
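
    The LLM-based abstract parsing might look something like the hypothetical sketch below, which asks a chat-completion client whether an abstract introduces a new dataset and which category it belongs to. The prompt wording, the label set, and the `complete` callable are all illustrative, not taken from the survey.

```python
import json

LABELS = ["agriculture", "health", "remote sensing", "faces", "documents", "other"]

def classify_abstract(abstract: str, complete) -> dict:
    """Ask an LLM whether `abstract` introduces a dataset, and its category."""
    prompt = (
        "Does the following abstract introduce a new computer vision dataset? "
        f"Answer as JSON with keys 'new_dataset' (bool) and 'category' "
        f"(one of {LABELS}).\n\nAbstract: {abstract}"
    )
    return json.loads(complete(prompt))

# Stub client so the plumbing can be tested without any API.
fake_llm = lambda _prompt: '{"new_dataset": true, "category": "remote sensing"}'
print(classify_abstract("We release a satellite imagery benchmark ...", fake_llm))
```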

    Automated Deep Neural Networks with Gene Expression Programming of Cellular Encoding -Towards the Applications in Remote Sensing Image Understanding-

    Deep neural networks (DNNs) such as convolutional neural networks (CNNs) have enabled remarkable progress in the application of machine learning and artificial intelligence, and research scientists are gearing up to adopt DNN methods for their respective domain problems. Automated neural architecture search (NAS), also known as automated DNN (AutoDNN), aims to automate the architecture search of neural networks so that researchers can adopt DNN methods with ease, with little or no expertise in deep learning. As a metaheuristic approach, automated NAS requires a representation scheme to encode the candidate solutions (architectures). Direct encodings from genetic algorithms and genetic programming have been widely employed in automated NAS methods. Though easy to implement, direct encoding cannot be easily modularized, and its lack of a distinctive separation between genotype and phenotype spaces limits its functional complexity. It may therefore be difficult for direct encodings to evolve modules (building blocks) with shortcut and multi-branch connections, which can improve training and enhance network performance in image understanding tasks.

    This work presents a novel generative encoding, called symbolic linear generative encoding (SLGE), that combines the complementary strengths of gene expression programming (GEP) and cellular encoding (CE) for automatic architecture search of deep neural networks for image understanding; in particular, it evolves modularized CNNs with shortcut and multi-branch modularity properties (similar to those commonly adopted by human experts) for remote sensing (RS) image understanding tasks such as scene classification and semantic segmentation. GEP is known for its simplicity of implementation and multi-gene chromosomes with flexible genetic modification, whereas CE has the ability to produce modular artificial neural networks (ANNs). Both GEP and CE are well-established evolutionary computation methods that have seen considerable development and theoretical study. A large part of that previous work involves small-scale architecture search for ANNs, so this work opens the possibility of CNN architecture development for image understanding tasks, particularly in the field of RS.

    We adopt two automated NAS search strategies, random search with early stopping and an evolutionary algorithm, to automatically evolve modularized CNN architectures for the classification of RS imagery scenes and the semantic segmentation of aerial/satellite imagery, respectively. Two types of multi-class image scene classification tasks were performed, single-label and multi-label scene classification, using four different remotely sensed imagery datasets, to validate the expressiveness and tractability of the SLGE representation space. Moreover, we constructed two separate SLGE representation spaces, a normal cell and an atrous spatial pyramid pooling (ASPP) cell. Then, using an evolutionary algorithm with genetic operators such as uniform mutation, two-point crossover, and gene crossover, we jointly search for a normal cell and an ASPP cell as a pair of cells to build a modularized encoder-decoder CNN architecture for solving the RS image semantic segmentation problem. Three RS semantic segmentation benchmarks were used to verify the performance of the SLGE architecture representation, which also validated the effectiveness and robustness of the proposed representation. The results position the SLGE architecture representation among the best of the state-of-the-art systems.
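
    Toy versions of two of the genetic operators named above, acting on fixed-length string chromosomes like those SLGE uses, might look as follows; the gene alphabet and mutation rate are assumptions, not the thesis code.

```python
import random

ALPHABET = "SPE"  # an illustrative gene symbol set

def uniform_mutation(chrom: str, rate: float = 0.1) -> str:
    """Independently resample each gene with probability `rate`."""
    return "".join(random.choice(ALPHABET) if random.random() < rate else g
                   for g in chrom)

def two_point_crossover(a: str, b: str):
    """Swap the segment between two random cut points of equal-length parents."""
    i, j = sorted(random.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

random.seed(1)
p1, p2 = "SSPPEESSPE", "PEPESPSEPS"
print(uniform_mutation(p1))
print(two_point_crossover(p1, p2))
```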

    Submeter-level land cover mapping of Japan

    Deep learning has shown promising performance in submeter-level mapping tasks; however, its annotation cost remains a challenge, especially when applied on a large scale. In this paper, we introduce the first submeter-level land cover mapping of Japan, employing eight classes. We present a human-in-the-loop framework that, together with OpenEarthMap, a recently introduced benchmark dataset for global submeter-level land cover mapping, achieves national-scale mapping with a small amount of additional labeled data. Using aerial imagery provided by the Geospatial Information Authority of Japan, we create land cover classification maps for the entire country of Japan and evaluate their accuracy. By adding a small amount of labeled data for areas where a U-Net model trained on OpenEarthMap clearly failed and retraining the model, an overall accuracy of 80% was achieved, a nearly 16 percentage point improvement after retraining. Our framework, with its low cost and high-accuracy mapping results, demonstrates the potential to contribute to the automatic updating of land cover maps using submeter-level optical remote sensing data. The mapping results will be made publicly available.

    Exploring Multi-Stage GAN with Self-Attention for Speech Enhancement

    Multi-stage or multi-generator generative adversarial networks (GANs) have recently been demonstrated to be effective for speech enhancement. Existing multi-generator GANs for speech enhancement use only convolutional layers for synthesising clean speech signals, and this reliance on the convolution operation may mask the temporal dependencies within the signal sequence. This study explores self-attention as a way to address the temporal dependency issue in multi-generator speech enhancement GANs and improve their enhancement performance. We empirically study the effect of integrating a self-attention mechanism into the convolutional layers of the multiple generators in multi-stage or multi-generator speech enhancement GANs, specifically the ISEGAN and DSEGAN networks. The experimental results show that introducing a self-attention mechanism into ISEGAN and DSEGAN improves their speech enhancement quality and intelligibility across the objective evaluation metrics. Furthermore, we observe that adding self-attention to ISEGAN's generators not only improves its enhancement performance but also bridges the performance gap between ISEGAN and DSEGAN with a smaller model footprint. Overall, our findings highlight the potential of self-attention in improving the enhancement performance of multi-generator speech enhancement GANs.
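
    A minimal sketch of the core idea, inserting a SAGAN-style self-attention layer between the 1-D convolutional stages of a generator so that distant time steps can interact directly, is given below; the layer sizes and placement are assumptions rather than the actual ISEGAN/DSEGAN configuration.

```python
import torch
import torch.nn as nn

class SelfAttention1d(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.q = nn.Conv1d(channels, channels // 8, 1)
        self.k = nn.Conv1d(channels, channels // 8, 1)
        self.v = nn.Conv1d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # starts as the identity

    def forward(self, x):                  # x: (batch, channels, time)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)  # (B, T, T)
        out = v @ attn.transpose(1, 2)     # weighted sum over time steps
        return self.gamma * out + x        # residual: safe to insert anywhere

gen_block = nn.Sequential(
    nn.Conv1d(1, 32, kernel_size=31, stride=2, padding=15),
    nn.PReLU(),
    SelfAttention1d(32),                   # attention after a conv stage
)
print(gen_block(torch.randn(2, 1, 1024)).shape)  # torch.Size([2, 32, 512])
```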