Joint & Progressive Learning from High-Dimensional Data for Multi-Label Classification
Despite the fact that nonlinear subspace learning techniques (e.g., manifold learning) have been successfully applied to data representation, there is still room for improvement in explainability (explicit mapping), generalization (out-of-sample extension), and cost-effectiveness (linearization). To this end, a novel linearized subspace learning technique is developed in a joint and progressive way, called the joint and progressive learning strategy (J-Play), with its application to multi-label classification. J-Play learns high-level, semantically meaningful feature representations from high-dimensional data by 1) jointly performing multiple subspace learning and classification to find a latent subspace where samples are expected to be better classified; 2) progressively learning multi-coupled projections to linearly approximate the optimal mapping bridging the original space and the most discriminative subspace; 3) locally embedding manifold structure in each learnable latent subspace. Extensive experiments demonstrate the superiority and effectiveness of the proposed method in comparison with previous state-of-the-art methods.
Comment: accepted at ECCV 2018
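To make the three ingredients concrete, below is a minimal NumPy sketch of the J-Play idea: a chain of coupled linear projections trained jointly with a linear classifier, plus a graph-Laplacian manifold term in each latent subspace. The plain gradient-descent solver and all names are illustrative assumptions standing in for the paper's alternating optimization, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, c = 200, 50, 5                       # samples, input dim, classes
dims = [d, 30, 20, 10]                     # progressively smaller subspaces
X = rng.standard_normal((d, n))            # columns are samples
Y = np.eye(c)[rng.integers(0, c, n)].T     # one-hot labels, shape (c, n)

def knn_laplacian(X, k=5):
    """Unnormalized graph Laplacian of a symmetrized k-NN graph."""
    D2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(0)
    W = np.zeros_like(D2)
    for i, js in enumerate(np.argsort(D2, axis=1)[:, 1:k + 1]):
        W[i, js] = 1.0
    W = np.maximum(W, W.T)
    return np.diag(W.sum(1)) - W

L_graph = knn_laplacian(X)
Ps = [0.01 * rng.standard_normal((dims[i + 1], dims[i]))
      for i in range(len(dims) - 1)]       # the coupled projections
Wc = 0.01 * rng.standard_normal((c, dims[-1]))  # joint linear classifier
lr, lam = 1e-3, 1e-2

for it in range(300):
    Zs = [X]                               # progressive latent subspaces
    for P in Ps:
        Zs.append(P @ Zs[-1])
    E = Wc @ Zs[-1] - Y                    # classification residual
    G = Wc.T @ E + lam * Zs[-1] @ L_graph  # grad wrt the deepest subspace
    for k in reversed(range(len(Ps))):
        gP = G @ Zs[k].T                   # grad wrt projection P_k
        if k > 0:                          # add this layer's manifold term
            G = Ps[k].T @ G + lam * Zs[k] @ L_graph
        Ps[k] -= lr * gP
    Wc -= lr * E @ Zs[-1].T
```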
Understanding Dark Scenes by Contrasting Multi-Modal Observations
Understanding dark scenes based on multi-modal image data is challenging, as
both the visible and auxiliary modalities provide limited semantic information
for the task. Previous methods focus on fusing the two modalities but neglect
the correlations among semantic classes when minimizing losses to align pixels
with labels, resulting in inaccurate class predictions. To address these
issues, we introduce a supervised multi-modal contrastive learning approach to
increase the semantic discriminability of the learned multi-modal feature
spaces by jointly performing cross-modal and intra-modal contrast under the
supervision of the class correlations. The cross-modal contrast encourages same-class embeddings from the two modalities to move closer and pushes different-class ones apart. The intra-modal contrast likewise pulls same-class embeddings within each modality together and pushes different-class ones apart. We
validate our approach on a variety of tasks that cover diverse light conditions
and image modalities. Experiments show that our approach can effectively
enhance dark scene understanding based on multi-modal images with limited
semantics by shaping semantic-discriminative feature spaces. Comparisons with
previous methods demonstrate our state-of-the-art performance. Code and
pretrained models are available at https://github.com/palmdong/SMMCL
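As an illustration, here is a hedged PyTorch sketch of the joint cross-modal and intra-modal contrast, written SupCon-style over the concatenated embeddings of both modalities, so that same-class pairs within and across modalities are both treated as positives. It is a sketch of the idea, not the released SMMCL code.

```python
import torch
import torch.nn.functional as F

def supervised_multimodal_contrast(z_vis, z_aux, labels, tau=0.1):
    """Joint cross-modal + intra-modal supervised contrast over the
    concatenated embeddings of the two modalities. z_vis, z_aux: (N, D)
    pixel embeddings; labels: (N,) class indices shared by both views."""
    z = torch.cat([F.normalize(z_vis, dim=1),
                   F.normalize(z_aux, dim=1)], dim=0)    # (2N, D)
    y = torch.cat([labels, labels], dim=0)               # (2N,)
    sim = z @ z.t() / tau                                # scaled cosine sim
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float('-inf'))            # drop self-pairs
    # positives: any other embedding, either modality, with the same label
    pos = (y[:, None] == y[None, :]) & ~eye
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(~pos, 0.0)           # keep positives only
    return -(log_prob.sum(1) / pos.sum(1).clamp(min=1)).mean()

# toy usage with random features standing in for network outputs
z_v, z_a = torch.randn(32, 64), torch.randn(32, 64)
y = torch.randint(0, 4, (32,))
print(supervised_multimodal_contrast(z_v, z_a, y))
```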
Multi-temporal Sentinel-1 and -2 Data Fusion for Optical Image Simulation
In this paper, we present optical image simulation from synthetic aperture radar (SAR) data using deep-learning-based methods. Two models, i.e., optical image simulation directly from SAR data and from multi-temporal SAR-optical data, are proposed to test the feasibility. The deep-learning-based methods we chose to realize the models are a convolutional neural network (CNN) with a residual architecture and a conditional generative adversarial network (cGAN). We validate our models using Sentinel-1 and -2 datasets. The experiments demonstrate that the model with multi-temporal SAR-optical data can successfully simulate the optical image, whereas the model with SAR data alone as input fails. The optical image simulation results indicate the possibility of SAR-optical information blending for subsequent applications such as large-scale cloud removal and optical data temporal super-resolution. We also investigate the sensitivity of the proposed models to the training samples and discuss possible future directions.
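A minimal PyTorch sketch of the training signal for the multi-temporal model follows, under assumed channel counts (two SAR plus three earlier-optical input channels) and a pix2pix-style conditional discriminator; it is not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)            # residual connection

class Generator(nn.Module):
    """Small residual CNN mapping stacked SAR-optical channels to an
    optical image (an illustrative stand-in for the paper's model)."""
    def __init__(self, in_ch=5, out_ch=3, ch=64, blocks=4):
        super().__init__()
        self.head = nn.Conv2d(in_ch, ch, 3, padding=1)
        self.body = nn.Sequential(*[ResBlock(ch) for _ in range(blocks)])
        self.tail = nn.Conv2d(ch, out_ch, 3, padding=1)
    def forward(self, x):
        return self.tail(self.body(self.head(x)))

G = Generator()                            # 2 SAR + 3 past-optical channels
D = nn.Sequential(                         # PatchGAN-style conditional critic
    nn.Conv2d(5 + 3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1))
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

x = torch.randn(2, 5, 64, 64)              # SAR + earlier optical input
y = torch.randn(2, 3, 64, 64)              # target Sentinel-2 image
fake = G(x)
# generator step: fool D while staying close to the target in L1;
# a full training loop would also take a symmetric discriminator step
d_fake = D(torch.cat([x, fake], dim=1))
loss_G = bce(d_fake, torch.ones_like(d_fake)) + 100 * l1(fake, y)
loss_G.backward()
```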
Graph Regularized Coupled Spectral Unmixing for Change Detection
This paper presents a methodology of coupled spectral unmixing for multitemporal hyperspectral data analysis. Coupled spectral unmixing simultaneously extracts the sets of spectral signatures of endmembers and the respective abundance maps from multiple spectral images with differences in observation conditions and sensor characteristics. The problem is formulated in the framework of coupled nonnegative matrix factorization. A graph regularization that reflects the spectral correlation between the two images on abundance fractions is introduced into the optimization of coupled spectral unmixing to account for temporal changes of the Earth's surface. An alternating optimization algorithm based on the method of Lagrange multipliers is investigated to guarantee stable convergence. The proposed method was applied to dual-temporal Hyperion images taken over the Fukushima Daiichi nuclear power plant. Experimental results showed that the proposed method can extract essential information on the Earth's surface in a data-driven manner, beyond the modality of multitemporal data.
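A compact NumPy sketch of the coupled-unmixing idea is given below, assuming a shared endmember matrix, standard multiplicative NMF updates, and a simplified per-pixel cross-image similarity weight as a stand-in for the paper's graph regularizer; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
bands, pixels, m = 100, 400, 4            # spectral bands, pixels, endmembers
X1 = rng.random((bands, pixels))          # image at time 1 (bands x pixels)
X2 = rng.random((bands, pixels))          # image at time 2
E = rng.random((bands, m))                # shared endmember signatures
A1 = rng.random((m, pixels))              # abundances, time 1
A2 = rng.random((m, pixels))              # abundances, time 2

# edge weights: large where the two dates look spectrally similar,
# so unchanged pixels are pulled toward consistent abundances
s = np.exp(-((X1 - X2) ** 2).mean(0))     # (pixels,)
lam, eps = 0.5, 1e-9

for it in range(200):
    # multiplicative updates keep all factors nonnegative
    E *= (X1 @ A1.T + X2 @ A2.T) / (E @ (A1 @ A1.T + A2 @ A2.T) + eps)
    A1 *= (E.T @ X1 + lam * A2 * s) / (E.T @ E @ A1 + lam * A1 * s + eps)
    A2 *= (E.T @ X2 + lam * A1 * s) / (E.T @ E @ A2 + lam * A2 * s + eps)
    A1 /= A1.sum(0, keepdims=True) + eps  # sum-to-one abundance constraint,
    A2 /= A2.sum(0, keepdims=True) + eps  # projected after each update
```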
SyntheWorld: A Large-Scale Synthetic Dataset for Land Cover Mapping and Building Change Detection
Synthetic datasets, recognized for their cost-effectiveness, play a pivotal role in advancing computer vision tasks and techniques. However, when it comes to remote sensing image processing, the creation of synthetic datasets becomes challenging due to the demand for larger-scale and more diverse 3D models. This complexity is compounded by the difficulties associated with real remote sensing datasets, including limited data acquisition and high annotation costs, which amplify the need for high-quality synthetic alternatives. To address this, we present SyntheWorld, a synthetic dataset unparalleled in quality, diversity, and scale. It includes 40,000 images with submeter-level pixel resolution and fine-grained land cover annotations in eight categories, and it also provides 40,000 bitemporal image pairs with building change annotations for the building change detection task. We conduct experiments on multiple benchmark remote sensing datasets to verify the effectiveness of SyntheWorld and to investigate the conditions under which our synthetic data yield advantages. We will release SyntheWorld to facilitate remote sensing image processing research.
Comment: Accepted by WACV 2024
Submeter-level Land Cover Mapping of Japan
Deep learning has shown promising performance in submeter-level mapping
tasks; however, the annotation cost of submeter-level imagery remains a
challenge, especially when applied on a large scale. In this paper, we present
the first submeter-level land cover mapping of Japan with eight classes, at a
relatively low annotation cost. We introduce a human-in-the-loop deep learning
framework leveraging OpenEarthMap, a recently introduced benchmark dataset for
global submeter-level land cover mapping, with a U-Net model that achieves
national-scale mapping from a small amount of additional labeled data. By labeling areas where a U-Net model trained on OpenEarthMap clearly failed and retraining the model, an overall accuracy of 80% was achieved, a nearly 16-percentage-point improvement over the initial model. Using aerial imagery provided by the Geospatial Information Authority of Japan, we create land cover classification maps of eight classes for the entire country. Our framework, with its low annotation cost and high-accuracy mapping results, demonstrates the potential to contribute to the automatic updating of national-scale land cover maps using submeter-level optical remote sensing data. The mapping results will be made publicly available.
Comment: 16 pages, 10 figures
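The control flow of such a human-in-the-loop round can be sketched as below; every callable is a placeholder for the real component (U-Net training, inference, manual review, manual annotation), not the authors' released code.

```python
from typing import Callable, List

def human_in_the_loop(train: Callable[[List], object],
                      predict: Callable[[object, object], object],
                      failed: Callable[[object, object], bool],
                      annotate: Callable[[object], object],
                      base_data: List, tiles: List, rounds: int = 2):
    """Retrain whenever human review flags clearly failed tiles."""
    data = list(base_data)                 # start from OpenEarthMap labels
    model = train(data)                    # initial U-Net
    for _ in range(rounds):
        flagged = [t for t in tiles if failed(t, predict(model, t))]
        data += [annotate(t) for t in flagged]   # small extra label budget
        model = train(data)                # retrain with the added labels
    return model
```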
Learning Mutual Modulation for Self-Supervised Cross-Modal Super-Resolution
Self-supervised cross-modal super-resolution (SR) can overcome the difficulty
of acquiring paired training data, but is challenging because only
low-resolution (LR) source and high-resolution (HR) guide images from different
modalities are available. Existing methods utilize pseudo or weak supervision
in LR space and thus deliver results that are blurry or not faithful to the
source modality. To address this issue, we present a mutual modulation SR (MMSR) model, which tackles the task via a mutual modulation strategy comprising a source-to-guide modulation and a guide-to-source modulation. In these modulations, we develop cross-domain adaptive filters to fully exploit cross-modal spatial dependency, inducing the source to emulate the resolution of the guide and the guide to mimic the modality characteristics of the source. Moreover, we adopt a cycle-consistency constraint to train MMSR in a fully self-supervised manner. Experiments on various tasks demonstrate the state-of-the-art performance of our MMSR.
Comment: ECCV 2022
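One plausible reading of the modulation step is a spatially adaptive filter predicted from the other modality, sketched below in PyTorch; the module name and kernel parameterization are assumptions for illustration, not the MMSR implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveModulation(nn.Module):
    """Predicts a per-pixel k x k filter from the conditioning features
    and applies it to the target features (one shared filter per pixel)."""
    def __init__(self, ch, k=3):
        super().__init__()
        self.k = k
        self.to_kernel = nn.Conv2d(ch, k * k, 3, padding=1)
    def forward(self, target, condition):
        b, c, h, w = target.shape
        kern = F.softmax(self.to_kernel(condition), dim=1)  # (B, k*k, H, W)
        patches = F.unfold(target, self.k, padding=self.k // 2)
        patches = patches.view(b, c, self.k * self.k, h, w)
        return (patches * kern.unsqueeze(1)).sum(2)         # filtered target

ch = 32
src_to_guide = AdaptiveModulation(ch)  # guide filtered under source guidance
guide_to_src = AdaptiveModulation(ch)  # source filtered under guide guidance

f_src = torch.randn(1, ch, 64, 64)     # upsampled LR source features
f_gui = torch.randn(1, ch, 64, 64)     # HR guide features
f_gui_mod = src_to_guide(f_gui, f_src) # guide mimics source modality
f_src_mod = guide_to_src(f_src, f_gui) # source emulates guide resolution
```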
A Simple Attempt for 3D Occupancy Estimation in Autonomous Driving
The task of estimating 3D occupancy from surrounding-view images is an
exciting development in the field of autonomous driving, following the success
of Bird's Eye View (BEV) perception. This task provides crucial 3D attributes
of the driving environment, enhancing the overall understanding and perception
of the surrounding space. In this work, we present a simple attempt at 3D occupancy estimation: a CNN-based framework designed to reveal several key factors for the task, such as network design, optimization, and evaluation. In addition, we explore the relationship between 3D occupancy estimation and related tasks, such as monocular depth estimation, stereo matching, and BEV perception (3D object detection and map segmentation), which could advance the study of 3D occupancy estimation. For evaluation, we propose a simple sampling strategy to define the metric for occupancy evaluation, which is flexible for current public datasets. Moreover, we establish a new benchmark in terms of the depth estimation metric, where we compare our proposed method with monocular depth estimation methods on the DDAD and nuScenes datasets and achieve competitive performance. The relevant code will be available at https://github.com/GANWANSHUI/SimpleOccupancy
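As a hedged illustration of how a depth metric can be derived from an occupancy field (an assumed protocol, not necessarily the paper's exact sampling strategy), one can march camera rays through the predicted volume and take the first above-threshold sample as the depth, which can then be scored against depth-estimation baselines:

```python
import torch

def depth_from_occupancy(occ_fn, origins, dirs, near=0.5, far=50.0,
                         n_samples=128, thresh=0.5):
    """occ_fn maps (M, 3) points to (M,) occupancy probabilities;
    origins, dirs are (R, 3) ray origins and unit directions."""
    t = torch.linspace(near, far, n_samples)                  # (S,)
    pts = origins[:, None, :] + t[None, :, None] * dirs[:, None, :]
    occ = occ_fn(pts.reshape(-1, 3)).reshape(len(origins), n_samples)
    hit = occ > thresh                                        # (R, S)
    first = hit.float().argmax(dim=1)                         # first hit index
    first[~hit.any(dim=1)] = n_samples - 1                    # miss: max range
    return t[first]                                           # per-ray depth

# toy field: everything beyond z = 10 m is occupied (a flat wall)
occ_fn = lambda p: (p[:, 2] > 10.0).float()
origins = torch.zeros(4, 3)
dirs = torch.tensor([[0.0, 0.0, 1.0]]).repeat(4, 1)
print(depth_from_occupancy(occ_fn, origins, dirs))            # about 10 m
```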