Deep Plug-and-Play Prior for Hyperspectral Image Restoration
Deep-learning-based hyperspectral image (HSI) restoration methods have gained
great popularity for their remarkable performance but often demand expensive
network retraining whenever the specifics of the task change. In this paper, we
propose to restore HSIs in a unified approach with an effective plug-and-play
method, which can jointly retain the flexibility of optimization-based methods
and utilize the powerful representation capability of deep neural networks.
Specifically, we first develop a new deep HSI denoiser leveraging gated
recurrent convolution units, short- and long-term skip connections, and an
augmented noise level map to better exploit the abundant spatio-spectral
information within HSIs. This denoiser therefore achieves state-of-the-art
performance on HSI denoising under both Gaussian and complex noise settings.
Then, the proposed denoiser is inserted into the plug-and-play framework as a
powerful implicit HSI prior to tackle various HSI restoration tasks. Through
extensive experiments on HSI super-resolution, compressed sensing, and
inpainting, we demonstrate that our approach often achieves superior
performance, which is competitive with or even better than the state-of-the-art
on each task, via a single model without any task-specific training.
Comment: code at https://github.com/Zeqiang-Lai/DPHSI
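To make the plug-and-play recipe concrete, here is a minimal PyTorch-style sketch of a half-quadratic-splitting loop with a pretrained denoiser acting as the implicit prior. The operator names (A, At), the denoiser signature, and all hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def pnp_hqs_restore(y, A, At, denoiser, num_iters=30, step=0.5, rho=0.1, sigma=0.05):
    """Plug-and-play half-quadratic splitting (illustrative sketch).

    y        : observed measurements
    A, At    : task-specific forward operator and its adjoint (assumed callables)
    denoiser : pretrained deep denoiser, assumed callable as denoiser(x, noise_level)
    """
    x = At(y)          # initialize from the adjoint (back-projection)
    z = x.clone()
    for _ in range(num_iters):
        # Data step: one gradient step on 0.5*||A(x)-y||^2 + 0.5*rho*||x-z||^2
        x = x - step * (At(A(x) - y) + rho * (x - z))
        # Prior step: the deep denoiser serves as the implicit regularizer
        z = denoiser(x, sigma)
    return z
```

In this scheme, each restoration task only swaps the forward operator A and its adjoint At (e.g., blurring plus downsampling for super-resolution, a sampling mask for inpainting), while the denoiser stays fixed, which is what makes the approach task-agnostic.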
Single-shot Phase Retrieval from a Fractional Fourier Transform Perspective
The realm of classical phase retrieval concerns itself with the arduous task
of recovering a signal from its Fourier magnitude measurements, which are
fraught with inherent ambiguities. A single-exposure intensity measurement is
commonly deemed insufficient for the reconstruction of the primal signal, given
that the absent phase component is imperative for the inverse transformation.
In this work, we present a novel single-shot phase retrieval paradigm from a
fractional Fourier transform (FrFT) perspective, which involves integrating the
FrFT-based physical measurement model within a self-supervised reconstruction
scheme. Specifically, the proposed FrFT-based measurement model addresses the
aliasing artifacts problem in the numerical calculation of Fresnel diffraction,
featuring adaptability to both short-distance and long-distance propagation
scenarios. Moreover, the intensity measurement in the FrFT domain proves highly
effective in alleviating the ambiguities of phase retrieval and relaxing the
previous conditions on oversampled or multiple measurements in the Fourier
domain. Furthermore, the proposed self-supervised reconstruction approach
harnesses the fast discrete algorithm of FrFT alongside untrained neural
network priors, thereby attaining preeminent results. Through numerical
simulations, we demonstrate that both amplitude and phase objects can be
effectively retrieved from a single-shot intensity measurement using the
proposed approach, providing a promising technique for support-free coherent
diffraction imaging.
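As a rough illustration of the self-supervised reconstruction scheme, the sketch below fits an untrained network prior (in the spirit of deep image prior) so that the simulated FrFT intensity matches the single-shot measurement. The frft operator, the network net, and its fixed input z are assumed placeholders; the paper's fast discrete FrFT algorithm is not reproduced here.

```python
import torch
import torch.nn.functional as F

def fit_untrained_prior(meas, frft, net, z, steps=2000, lr=1e-3):
    """Fit an untrained CNN so |FrFT(net(z))|^2 matches the measured intensity.

    meas : measured single-shot intensity in the FrFT domain (real tensor)
    frft : differentiable fractional Fourier transform, frft(x) -> complex tensor
    net  : untrained network prior mapping a fixed input z to a complex field
    """
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        field = net(z)                    # complex-valued object estimate
        pred = frft(field).abs() ** 2     # forward intensity measurement model
        loss = F.mse_loss(pred, meas)     # self-supervised fidelity loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net(z).detach()                # recovered amplitude and phase
```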
Instance Segmentation in the Dark
Existing instance segmentation techniques are primarily tailored for
high-visibility inputs, but their performance significantly deteriorates in
extremely low-light environments. In this work, we take a deep look at instance
segmentation in the dark and introduce several techniques that substantially
boost the low-light inference accuracy. The proposed method is motivated by the
observation that noise in low-light images introduces high-frequency
disturbances to the feature maps of neural networks, thereby significantly
degrading performance. To suppress this "feature noise", we propose a novel
learning method that relies on an adaptive weighted downsampling layer, a
smooth-oriented convolutional block, and disturbance suppression learning.
These components effectively reduce feature noise during downsampling and
convolution operations, enabling the model to learn disturbance-invariant
features. Furthermore, we discover that high-bit-depth RAW images can better
preserve richer scene information in low-light conditions compared to typical
camera sRGB outputs, thus supporting the use of RAW-input algorithms. Our
analysis indicates that high bit-depth can be critical for low-light instance
segmentation. To mitigate the scarcity of annotated RAW datasets, we leverage a
low-light RAW synthetic pipeline to generate realistic low-light data. In
addition, to facilitate further research in this direction, we capture a
real-world low-light instance segmentation dataset comprising over two thousand
paired low/normal-light images with instance-level pixel-wise annotations.
Remarkably, without any image preprocessing, we achieve satisfactory
performance on instance segmentation in very low light (4% AP higher than
state-of-the-art competitors), while opening new opportunities for future
research.
Comment: Accepted by International Journal of Computer Vision (IJCV) 202
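The abstract does not spell out the exact formulation of disturbance suppression learning; one plausible reading is a feature-consistency penalty between clean and noisy views of the same image, sketched below with assumed names.

```python
import torch.nn.functional as F

def disturbance_suppression_loss(backbone, clean, noisy, weight=1.0):
    """One possible reading of disturbance suppression learning (assumed names):
    penalize the distance between feature maps of clean and low-light/noisy
    versions of the same image, pushing the model toward
    disturbance-invariant features."""
    f_clean = backbone(clean).detach()   # reference features, no gradient
    f_noisy = backbone(noisy)            # noisy-branch features to be aligned
    return weight * F.mse_loss(f_noisy, f_clean)
```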
Agent-Aware Training for Agent-Agnostic Action Advising in Deep Reinforcement Learning
Action advising endeavors to leverage supplementary guidance from expert
teachers to alleviate the issue of sampling inefficiency in Deep Reinforcement
Learning (DRL). Previous agent-specific action advising methods are hindered by
imperfections in the agent itself, while agent-agnostic approaches exhibit
limited adaptability to the learning agent. In this study, we propose a novel
framework called Agent-Aware trAining yet Agent-Agnostic Action Advising (A7)
to strike a balance between the two. The underlying concept of A7 revolves
around utilizing the similarity of state features as an indicator for
soliciting advice. However, unlike prior methodologies, the measurement of
state feature similarity is performed by neither the error-prone learning agent
nor the agent-agnostic advisor. Instead, we employ a proxy model to extract
state features that are both discriminative (adaptive to the agent) and
generally applicable (robust to agent noise). Furthermore, we utilize behavior
cloning to train a model for reusing advice and introduce an intrinsic reward
for the advised samples to incentivize the utilization of expert guidance.
Experiments are conducted on GridWorld, LunarLander, and six prominent Atari
games. The results demonstrate that A7 significantly
accelerates the learning process and surpasses existing methods (both
agent-specific and agent-agnostic) by a substantial margin. Our code will be
made publicly available.
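Below is a minimal sketch of the advice-solicitation step, under the assumption that a fixed proxy model embeds states and that advice is requested only for states unlike any previously advised one; all names and the cosine-similarity threshold are illustrative, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def should_ask_advice(proxy, state, advised_states, threshold=0.9):
    """Decide whether to solicit teacher advice for the current state.

    proxy          : fixed proxy model mapping states to feature vectors
    advised_states : tensor of states for which advice was already given
    """
    if advised_states is None or len(advised_states) == 0:
        return True                                        # no prior advice: ask
    with torch.no_grad():
        f = F.normalize(proxy(state.unsqueeze(0)), dim=-1)  # (1, d)
        g = F.normalize(proxy(advised_states), dim=-1)      # (n, d)
        max_sim = (g @ f.t()).max().item()                  # cosine similarity
    return max_sim < threshold    # ask only in unfamiliar states
```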
Spatially Varying Nanophotonic Neural Networks
The explosive growth of computation and energy cost of artificial
intelligence has spurred strong interests in new computing modalities as
potential alternatives to conventional electronic processors. Photonic
processors, which execute operations using photons instead of electrons, have
promised to enable optical neural networks with ultra-low latency and power
consumption. However, existing optical neural networks, limited by the
underlying network designs, have achieved image recognition accuracy much lower
than state-of-the-art electronic neural networks. In this work, we close this
gap by introducing a large-kernel spatially-varying convolutional neural
network learned via low-dimensional reparameterization techniques. We
experimentally instantiate the network with a flat meta-optical system that
encompasses an array of nanophotonic structures designed to induce
angle-dependent responses. Combined with an extremely lightweight electronic
backend with approximately 2K parameters, we demonstrate that a nanophotonic
neural network reaches 73.80% blind-test classification accuracy on the
CIFAR-10 dataset and that, for the first time, an optical neural network
outperforms the first modern digital neural network, AlexNet (72.64%, 57M
parameters), bringing optical neural networks into the modern deep learning
era.
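The low-dimensional reparameterization can be illustrated, in spirit, by expressing each large kernel as a mixture of a small shared basis. The sketch below covers only this kernel reparameterization, not the spatial variation or the meta-optical implementation, and all shapes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankLargeKernelConv(nn.Module):
    """Large-kernel convolution with a low-dimensional reparameterization:
    each K x K kernel is a learned mixture of a small shared basis, keeping
    the trainable parameter count low (details assumed, not the paper's)."""
    def __init__(self, in_ch, out_ch, k=31, rank=8):
        super().__init__()
        self.basis = nn.Parameter(torch.randn(rank, k, k) * 0.01)       # shared basis
        self.coef = nn.Parameter(torch.randn(out_ch, in_ch, rank) * 0.01)
        self.pad = k // 2

    def forward(self, x):
        # kernel[o, i] = sum_r coef[o, i, r] * basis[r]
        kernel = torch.einsum('oir,rkl->oikl', self.coef, self.basis)
        return F.conv2d(x, kernel, padding=self.pad)
```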
Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes
Multi-frame depth estimation generally achieves high accuracy by relying on
multi-view geometric consistency. When applied in dynamic scenes, e.g.,
autonomous driving, this consistency is usually violated in the dynamic areas,
leading to corrupted estimations. Many multi-frame methods handle dynamic areas
by identifying them with explicit masks and compensating the multi-view cues
with monocular cues represented as local monocular depth or features. The
improvements are limited due to the uncontrolled quality of the masks and the
underutilized benefits of the fusion of the two types of cues. In this paper,
we propose a novel method to learn to fuse the multi-view and monocular cues
encoded as volumes without needing the heuristically crafted masks. As unveiled
in our analyses, the multi-view cues capture more accurate geometric
information in static areas, and the monocular cues capture more useful
contexts in dynamic areas. To let the geometric perception learned from
multi-view cues in static areas propagate to the monocular representation in
dynamic areas and let monocular cues enhance the representation of multi-view
cost volume, we propose a cross-cue fusion (CCF) module, which includes the
cross-cue attention (CCA) to encode the spatially non-local relative
intra-relations from each source to enhance the representation of the other.
Experiments on real-world datasets demonstrate the strong effectiveness and
generalization ability of the proposed method.
Comment: Accepted by CVPR 2023. Code and models are available at: https://github.com/ruili3/dynamic-multiframe-dept
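A minimal sketch of what such a cross-cue attention block could look like: features of one cue attend over the other cue's features, and the result is added back, so each representation is enhanced by non-local relations from the other source. The projections and shapes are assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class CrossCueAttention(nn.Module):
    """One cue (e.g., monocular features) attends to the other (e.g.,
    multi-view cost-volume features) to enhance its representation."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Conv2d(dim, dim, 1)   # query from the cue being enhanced
        self.k = nn.Conv2d(dim, dim, 1)   # key from the other cue
        self.v = nn.Conv2d(dim, dim, 1)   # value from the other cue

    def forward(self, target, source):
        b, c, h, w = target.shape
        q = self.q(target).flatten(2).transpose(1, 2)    # (b, hw, c)
        k = self.k(source).flatten(2)                    # (b, c, hw)
        v = self.v(source).flatten(2).transpose(1, 2)    # (b, hw, c)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)   # non-local relations
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return target + out   # enhance the target cue with the other cue
```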