Joint & Progressive Learning from High-Dimensional Data for Multi-Label Classification
Despite the fact that nonlinear subspace learning techniques (e.g., manifold learning) have been successfully applied to data representation, there is still room for improvement in explainability (explicit mapping), generalization (out-of-sample extension), and cost-effectiveness (linearization). To this end, a novel linearized subspace learning technique is developed in a joint and progressive way, called the joint and progressive learning strategy (J-Play), with its application to multi-label classification. J-Play learns high-level, semantically meaningful feature representations from high-dimensional data by 1) jointly performing multiple subspace learning and classification to find a latent subspace where samples are expected to be better classified; 2) progressively learning multi-coupled projections to linearly approximate the optimal mapping bridging the original space and the most discriminative subspace; 3) locally embedding manifold structure in each learnable latent subspace. Extensive experiments demonstrate the superiority and effectiveness of the proposed method in comparison with previous state-of-the-art methods.
Comment: accepted at ECCV 2018
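To make the three ingredients concrete, below is a minimal NumPy sketch of the J-Play idea: a chain of coupled linear projections trained jointly with a linear classifier, plus a graph-Laplacian manifold term in each latent subspace. The plain gradient-descent solver and all names are illustrative assumptions standing in for the paper's alternating optimization, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, c = 200, 50, 5                       # samples, input dim, classes
dims = [d, 30, 20, 10]                     # progressively smaller subspaces
X = rng.standard_normal((d, n))            # columns are samples
Y = np.eye(c)[rng.integers(0, c, n)].T     # one-hot labels, shape (c, n)

def knn_laplacian(X, k=5):
    """Unnormalized graph Laplacian of a symmetrized k-NN graph."""
    D2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(0)
    W = np.zeros_like(D2)
    for i, js in enumerate(np.argsort(D2, axis=1)[:, 1:k + 1]):
        W[i, js] = 1.0
    W = np.maximum(W, W.T)
    return np.diag(W.sum(1)) - W

L_graph = knn_laplacian(X)
Ps = [0.01 * rng.standard_normal((dims[i + 1], dims[i]))
      for i in range(len(dims) - 1)]       # the coupled projections
Wc = 0.01 * rng.standard_normal((c, dims[-1]))  # joint linear classifier
lr, lam = 1e-3, 1e-2

for it in range(300):
    Zs = [X]                               # progressive latent subspaces
    for P in Ps:
        Zs.append(P @ Zs[-1])
    E = Wc @ Zs[-1] - Y                    # classification residual
    G = Wc.T @ E + lam * Zs[-1] @ L_graph  # grad wrt the deepest subspace
    for k in reversed(range(len(Ps))):
        gP = G @ Zs[k].T                   # grad wrt projection P_k
        if k > 0:                          # add this layer's manifold term
            G = Ps[k].T @ G + lam * Zs[k] @ L_graph
        Ps[k] -= lr * gP
    Wc -= lr * E @ Zs[-1].T
```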
Understanding Dark Scenes by Contrasting Multi-Modal Observations
Understanding dark scenes based on multi-modal image data is challenging, as
both the visible and auxiliary modalities provide limited semantic information
for the task. Previous methods focus on fusing the two modalities but neglect
the correlations among semantic classes when minimizing losses to align pixels
with labels, resulting in inaccurate class predictions. To address these
issues, we introduce a supervised multi-modal contrastive learning approach to
increase the semantic discriminability of the learned multi-modal feature
spaces by jointly performing cross-modal and intra-modal contrast under the
supervision of the class correlations. The cross-modal contrast encourages same-class embeddings from the two modalities to move closer and pushes different-class ones apart. The intra-modal contrast likewise pulls same-class embeddings within each modality together and pushes different-class ones apart. We
validate our approach on a variety of tasks that cover diverse light conditions
and image modalities. Experiments show that our approach can effectively
enhance dark scene understanding based on multi-modal images with limited
semantics by shaping semantic-discriminative feature spaces. Comparisons with
previous methods demonstrate our state-of-the-art performance. Code and
pretrained models are available at https://github.com/palmdong/SMMCL
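As an illustration, here is a hedged PyTorch sketch of the joint cross-modal and intra-modal contrast, written SupCon-style over the concatenated embeddings of both modalities, so that same-class pairs within and across modalities are both treated as positives. It is a sketch of the idea, not the released SMMCL code.

```python
import torch
import torch.nn.functional as F

def supervised_multimodal_contrast(z_vis, z_aux, labels, tau=0.1):
    """Joint cross-modal + intra-modal supervised contrast over the
    concatenated embeddings of the two modalities. z_vis, z_aux: (N, D)
    pixel embeddings; labels: (N,) class indices shared by both views."""
    z = torch.cat([F.normalize(z_vis, dim=1),
                   F.normalize(z_aux, dim=1)], dim=0)    # (2N, D)
    y = torch.cat([labels, labels], dim=0)               # (2N,)
    sim = z @ z.t() / tau                                # scaled cosine sim
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float('-inf'))            # drop self-pairs
    # positives: any other embedding, either modality, with the same label
    pos = (y[:, None] == y[None, :]) & ~eye
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(~pos, 0.0)           # keep positives only
    return -(log_prob.sum(1) / pos.sum(1).clamp(min=1)).mean()

# toy usage with random features standing in for network outputs
z_v, z_a = torch.randn(32, 64), torch.randn(32, 64)
y = torch.randint(0, 4, (32,))
print(supervised_multimodal_contrast(z_v, z_a, y))
```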
Multi-temporal Sentinel-1 and -2 Data Fusion for Optical Image Simulation
In this paper, we present optical image simulation from synthetic aperture radar (SAR) data using deep-learning-based methods. Two models, i.e., optical image simulation directly from SAR data and from multi-temporal SAR-optical data, are proposed to test the feasibility. The deep-learning-based methods we chose to realize the models are a convolutional neural network (CNN) with a residual architecture and a conditional generative adversarial network (cGAN). We validate our models using Sentinel-1 and -2 datasets. The experiments demonstrate that the model with multi-temporal SAR-optical data can successfully simulate the optical image, whereas the model with SAR data alone as input fails. The optical image simulation results indicate the possibility of SAR-optical information blending for subsequent applications such as large-scale cloud removal and optical data temporal super-resolution. We also investigate the sensitivity of the proposed models to the training samples and discuss possible future directions.
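A minimal PyTorch sketch of the training signal for the multi-temporal model follows, under assumed channel counts (two SAR plus three earlier-optical input channels) and a pix2pix-style conditional discriminator; it is not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)            # residual connection

class Generator(nn.Module):
    """Small residual CNN mapping stacked SAR-optical channels to an
    optical image (an illustrative stand-in for the paper's model)."""
    def __init__(self, in_ch=5, out_ch=3, ch=64, blocks=4):
        super().__init__()
        self.head = nn.Conv2d(in_ch, ch, 3, padding=1)
        self.body = nn.Sequential(*[ResBlock(ch) for _ in range(blocks)])
        self.tail = nn.Conv2d(ch, out_ch, 3, padding=1)
    def forward(self, x):
        return self.tail(self.body(self.head(x)))

G = Generator()                            # 2 SAR + 3 past-optical channels
D = nn.Sequential(                         # PatchGAN-style conditional critic
    nn.Conv2d(5 + 3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1))
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

x = torch.randn(2, 5, 64, 64)              # SAR + earlier optical input
y = torch.randn(2, 3, 64, 64)              # target Sentinel-2 image
fake = G(x)
# generator step: fool D while staying close to the target in L1;
# a full training loop would also take a symmetric discriminator step
d_fake = D(torch.cat([x, fake], dim=1))
loss_G = bce(d_fake, torch.ones_like(d_fake)) + 100 * l1(fake, y)
loss_G.backward()
```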
Graph Regularized Coupled Spectral Unmixing for Change Detection
This paper presents a methodology of coupled spectral unmixing for multitemporal hyperspectral data analysis. Coupled spectral unmixing simultaneously extracts the sets of spectral signatures of endmembers and the respective abundance maps from multiple spectral images with differences in observation conditions and sensor characteristics. The problem is formulated in the framework of coupled nonnegative matrix factorization. A graph regularization that reflects the spectral correlation between the two images on abundance fractions is introduced into the optimization of coupled spectral unmixing to account for temporal changes of the Earth's surface. An alternating optimization algorithm based on the method of Lagrange multipliers is investigated to guarantee stable convergence. The proposed method was applied to dual-temporal Hyperion images taken over the Fukushima Daiichi nuclear power plant. Experimental results showed that the proposed method can extract essential information on the Earth's surface in a data-driven manner, beyond the modality of multitemporal data.
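A compact NumPy sketch of the coupled-unmixing idea is given below, assuming a shared endmember matrix, standard multiplicative NMF updates, and a simplified per-pixel cross-image similarity weight as a stand-in for the paper's graph regularizer; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
bands, pixels, m = 100, 400, 4            # spectral bands, pixels, endmembers
X1 = rng.random((bands, pixels))          # image at time 1 (bands x pixels)
X2 = rng.random((bands, pixels))          # image at time 2
E = rng.random((bands, m))                # shared endmember signatures
A1 = rng.random((m, pixels))              # abundances, time 1
A2 = rng.random((m, pixels))              # abundances, time 2

# edge weights: large where the two dates look spectrally similar,
# so unchanged pixels are pulled toward consistent abundances
s = np.exp(-((X1 - X2) ** 2).mean(0))     # (pixels,)
lam, eps = 0.5, 1e-9

for it in range(200):
    # multiplicative updates keep all factors nonnegative
    E *= (X1 @ A1.T + X2 @ A2.T) / (E @ (A1 @ A1.T + A2 @ A2.T) + eps)
    A1 *= (E.T @ X1 + lam * A2 * s) / (E.T @ E @ A1 + lam * A1 * s + eps)
    A2 *= (E.T @ X2 + lam * A1 * s) / (E.T @ E @ A2 + lam * A2 * s + eps)
    A1 /= A1.sum(0, keepdims=True) + eps  # sum-to-one abundance constraint,
    A2 /= A2.sum(0, keepdims=True) + eps  # projected after each update
```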
SyntheWorld: A Large-Scale Synthetic Dataset for Land Cover Mapping and Building Change Detection
Synthetic datasets, recognized for their cost-effectiveness, play a pivotal role in advancing computer vision tasks and techniques. However, when it comes to remote sensing image processing, the creation of synthetic datasets becomes challenging due to the demand for larger-scale and more diverse 3D models. This complexity is compounded by the difficulties associated with real remote sensing datasets, including limited data acquisition and high annotation costs, which amplify the need for high-quality synthetic alternatives. To address this, we present SyntheWorld, a synthetic dataset unparalleled in quality, diversity, and scale. It includes 40,000 images with submeter-level pixel resolution and fine-grained land cover annotations in eight categories, and it also provides 40,000 bitemporal image pairs with building change annotations for the building change detection task. We conduct experiments on multiple benchmark remote sensing datasets to verify the effectiveness of SyntheWorld and to investigate the conditions under which our synthetic data yield advantages. We will release SyntheWorld to facilitate remote sensing image processing research.
Comment: Accepted by WACV 2024
Submeter-level Land Cover Mapping of Japan
Deep learning has shown promising performance in submeter-level mapping
tasks; however, the annotation cost of submeter-level imagery remains a
challenge, especially when applied on a large scale. In this paper, we present
the first submeter-level land cover mapping of Japan with eight classes, at a
relatively low annotation cost. We introduce a human-in-the-loop deep learning
framework leveraging OpenEarthMap, a recently introduced benchmark dataset for
global submeter-level land cover mapping, with a U-Net model that achieves
national-scale mapping from a small amount of additional labeled data. By labeling areas where a U-Net model trained on OpenEarthMap clearly failed and retraining the model, an overall accuracy of 80% was achieved, a nearly 16-percentage-point improvement over the initial model. Using aerial imagery provided by the Geospatial Information Authority of Japan, we create land cover classification maps of eight classes for the entire country. Our framework, with its low annotation cost and high-accuracy mapping results, demonstrates the potential to contribute to the automatic updating of national-scale land cover maps using submeter-level optical remote sensing data. The mapping results will be made publicly available.
Comment: 16 pages, 10 figures
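The control flow of such a human-in-the-loop round can be sketched as below; every callable is a placeholder for the real component (U-Net training, inference, manual review, manual annotation), not the authors' released code.

```python
from typing import Callable, List

def human_in_the_loop(train: Callable[[List], object],
                      predict: Callable[[object, object], object],
                      failed: Callable[[object, object], bool],
                      annotate: Callable[[object], object],
                      base_data: List, tiles: List, rounds: int = 2):
    """Retrain whenever human review flags clearly failed tiles."""
    data = list(base_data)                 # start from OpenEarthMap labels
    model = train(data)                    # initial U-Net
    for _ in range(rounds):
        flagged = [t for t in tiles if failed(t, predict(model, t))]
        data += [annotate(t) for t in flagged]   # small extra label budget
        model = train(data)                # retrain with the added labels
    return model
```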
Learning Mutual Modulation for Self-Supervised Cross-Modal Super-Resolution
Self-supervised cross-modal super-resolution (SR) can overcome the difficulty
of acquiring paired training data, but is challenging because only
low-resolution (LR) source and high-resolution (HR) guide images from different
modalities are available. Existing methods utilize pseudo or weak supervision
in LR space and thus deliver results that are blurry or not faithful to the
source modality. To address this issue, we present a mutual modulation SR (MMSR) model, which tackles the task via a mutual modulation strategy comprising a source-to-guide modulation and a guide-to-source modulation. In these modulations, we develop cross-domain adaptive filters to fully exploit cross-modal spatial dependency, inducing the source to emulate the resolution of the guide and the guide to mimic the modality characteristics of the source. Moreover, we adopt a cycle-consistency constraint to train MMSR in a fully self-supervised manner. Experiments on various tasks demonstrate the state-of-the-art performance of our MMSR.
Comment: ECCV 2022
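One plausible reading of the modulation step is a spatially adaptive filter predicted from the other modality, sketched below in PyTorch; the module name and kernel parameterization are assumptions for illustration, not the MMSR implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveModulation(nn.Module):
    """Predicts a per-pixel k x k filter from the conditioning features
    and applies it to the target features (one shared filter per pixel)."""
    def __init__(self, ch, k=3):
        super().__init__()
        self.k = k
        self.to_kernel = nn.Conv2d(ch, k * k, 3, padding=1)
    def forward(self, target, condition):
        b, c, h, w = target.shape
        kern = F.softmax(self.to_kernel(condition), dim=1)  # (B, k*k, H, W)
        patches = F.unfold(target, self.k, padding=self.k // 2)
        patches = patches.view(b, c, self.k * self.k, h, w)
        return (patches * kern.unsqueeze(1)).sum(2)         # filtered target

ch = 32
src_to_guide = AdaptiveModulation(ch)  # guide filtered under source guidance
guide_to_src = AdaptiveModulation(ch)  # source filtered under guide guidance

f_src = torch.randn(1, ch, 64, 64)     # upsampled LR source features
f_gui = torch.randn(1, ch, 64, 64)     # HR guide features
f_gui_mod = src_to_guide(f_gui, f_src) # guide mimics source modality
f_src_mod = guide_to_src(f_src, f_gui) # source emulates guide resolution
```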
A Simple Attempt for 3D Occupancy Estimation in Autonomous Driving
The task of estimating 3D occupancy from surrounding-view images is an
exciting development in the field of autonomous driving, following the success
of Bird's Eye View (BEV) perception. This task provides crucial 3D attributes
of the driving environment, enhancing the overall understanding and perception
of the surrounding space. In this work, we present a simple attempt at 3D occupancy estimation: a CNN-based framework designed to reveal several key factors for the task, such as network design, optimization, and evaluation. In addition, we explore the relationship between 3D occupancy estimation and related tasks, such as monocular depth estimation, stereo matching, and BEV perception (3D object detection and map segmentation), which could advance the study of 3D occupancy estimation. For evaluation, we propose a simple sampling strategy to define the metric for occupancy evaluation, which is flexible for current public datasets. Moreover, we establish a new benchmark in terms of the depth estimation metric, where we compare our proposed method with monocular depth estimation methods on the DDAD and nuScenes datasets and achieve competitive performance. The relevant code will be available at https://github.com/GANWANSHUI/SimpleOccupancy
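As a hedged illustration of how a depth metric can be derived from an occupancy field (an assumed protocol, not necessarily the paper's exact sampling strategy), one can march camera rays through the predicted volume and take the first above-threshold sample as the depth, which can then be scored against depth-estimation baselines:

```python
import torch

def depth_from_occupancy(occ_fn, origins, dirs, near=0.5, far=50.0,
                         n_samples=128, thresh=0.5):
    """occ_fn maps (M, 3) points to (M,) occupancy probabilities;
    origins, dirs are (R, 3) ray origins and unit directions."""
    t = torch.linspace(near, far, n_samples)                  # (S,)
    pts = origins[:, None, :] + t[None, :, None] * dirs[:, None, :]
    occ = occ_fn(pts.reshape(-1, 3)).reshape(len(origins), n_samples)
    hit = occ > thresh                                        # (R, S)
    first = hit.float().argmax(dim=1)                         # first hit index
    first[~hit.any(dim=1)] = n_samples - 1                    # miss: max range
    return t[first]                                           # per-ray depth

# toy field: everything beyond z = 10 m is occupied (a flat wall)
occ_fn = lambda p: (p[:, 2] > 10.0).float()
origins = torch.zeros(4, 3)
dirs = torch.tensor([[0.0, 0.0, 1.0]]).repeat(4, 1)
print(depth_from_occupancy(occ_fn, origins, dirs))            # about 10 m
```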