Plug-and-Play Regulators for Image-Text Matching
Exploiting fine-grained correspondence and visual-semantic alignments has
shown great potential in image-text matching. Generally, recent approaches
first employ a cross-modal attention unit to capture latent region-word
interactions, and then integrate all the alignments to obtain the final
similarity. However, most of them adopt one-time forward association or
aggregation strategies with complex architectures or additional information,
while ignoring the regulation ability of network feedback. In this paper, we
develop two simple but quite effective regulators which efficiently encode the
message output to automatically contextualize and aggregate cross-modal
representations. Specifically, we propose (i) a Recurrent Correspondence
Regulator (RCR) which facilitates the cross-modal attention unit progressively
with adaptive attention factors to capture more flexible correspondence, and
(ii) a Recurrent Aggregation Regulator (RAR) which adjusts the aggregation
weights repeatedly to increasingly emphasize important alignments and dilute
unimportant ones. Moreover, RCR and RAR are plug-and-play: both can be
incorporated into many frameworks based on
cross-modal interaction to obtain significant benefits, and their cooperation
achieves further improvements. Extensive experiments on MSCOCO and Flickr30K
datasets validate that they can bring an impressive and consistent R@1 gain on
multiple models, confirming the general effectiveness and generalization
ability of the proposed methods. Code and pre-trained models are available at:
https://github.com/Paranioar/RCAR.
Comment: 13 pages, 9 figures, Accepted by TIP202
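The R@1 figure mentioned in this abstract is Recall@1: the fraction of queries whose top-ranked retrieval is a correct match. A minimal sketch of the metric follows, assuming a square similarity matrix in which item i is the only correct match for query i. That is a simplification, since MSCOCO and Flickr30K pair each image with several captions. The function name `recall_at_k` is hypothetical and is not taken from the RCAR repository.

```python
def recall_at_k(sim, k=1):
    """Fraction of queries whose ground-truth item ranks in the top k.

    Assumes sim[i][j] is the similarity of query i to item j, and that
    item i is the single correct match for query i (illustrative assumption).
    """
    hits = 0
    for i, row in enumerate(sim):
        # Zero-based rank of the correct item: how many items score strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(sim)


# Toy 3x3 similarity matrix: query 1 ranks a wrong item (index 2) first.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]
r1 = recall_at_k(sim, k=1)
r2 = recall_at_k(sim, k=2)
```

In this toy case two of the three queries retrieve their match first, and all three retrieve it within the top two.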
Deep Plug-and-Play Prior for Hyperspectral Image Restoration
Deep-learning-based hyperspectral image (HSI) restoration methods have gained
great popularity for their remarkable performance but often demand expensive
network retraining whenever the specifics of the task change. In this paper, we
propose to restore HSIs in a unified approach with an effective plug-and-play
method, which can jointly retain the flexibility of optimization-based methods
and utilize the powerful representation capability of deep neural networks.
Specifically, we first develop a new deep HSI denoiser leveraging gated
recurrent convolution units, short- and long-term skip connections, and an
augmented noise level map to better exploit the abundant spatio-spectral
information within HSIs. This design yields state-of-the-art
performance on HSI denoising under both Gaussian and complex noise settings.
Then, the proposed denoiser is inserted into the plug-and-play framework as a
powerful implicit HSI prior to tackle various HSI restoration tasks. Through
extensive experiments on HSI super-resolution, compressed sensing, and
inpainting, we demonstrate that a single model, without any task-specific
training, often achieves performance competitive with or better than the
state of the art on each task.
Comment: code at https://github.com/Zeqiang-Lai/DPHSI
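The plug-and-play idea in this abstract, where a denoiser is dropped into an iterative solver as an implicit prior, can be illustrated with a toy example. The sketch below is not the authors' method. It assumes a proximal-gradient style iteration (the abstract does not name the solver), works on a 1-D signal rather than a hyperspectral image, and uses a 3-tap moving average in place of the deep denoiser. The names `moving_average` and `pnp_inpaint` are hypothetical.

```python
def moving_average(x):
    """Stand-in denoiser: 3-tap moving average with replicated edges."""
    n = len(x)
    return [
        (x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0
        for i in range(n)
    ]


def pnp_inpaint(y, mask, denoiser, n_iters=200):
    """Plug-and-play proximal-gradient sketch for inpainting.

    y:    observed samples (values at missing positions are ignored)
    mask: 1 where a sample was observed, 0 where it is missing
    """
    x = [yi if m else 0.0 for yi, m in zip(y, mask)]
    for _ in range(n_iters):
        # Data-fidelity step: unit-step gradient descent on
        # 0.5 * ||mask * (x - y)||^2, which resets observed positions to y.
        z = [xi - m * (xi - yi) for xi, yi, m in zip(x, y, mask)]
        # Prior step: the denoiser stands in for the proximal operator
        # of an explicit regularizer.
        x = denoiser(z)
    return x


# Linear ramp with two adjacent samples missing.
truth = [float(i) for i in range(10)]
mask = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
observed = [t * m for t, m in zip(truth, mask)]
restored = pnp_inpaint(observed, mask, moving_average)
```

Only the data-fidelity step depends on the task, here the mask. This is why one denoiser can be reused across restoration problems without retraining: switching to another task would mean changing only that step.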
Deep Learning for Single Image Super-Resolution: A Brief Review
Single image super-resolution (SISR) is a notoriously challenging ill-posed
problem, which aims to obtain a high-resolution (HR) output from one of its
low-resolution (LR) versions. Powerful deep learning algorithms have recently
been applied to the SISR problem and have achieved state-of-the-art
performance. In this survey, we review representative deep learning-based SISR
methods, and group them into two categories according to their major
contributions to two essential aspects of SISR: the exploration of efficient
neural network architectures for SISR, and the development of effective
optimization objectives for deep SISR learning. For each category, a baseline
is first established and several critical limitations of the baseline are
summarized. Then representative works on overcoming these limitations are
presented based on their original contents as well as our critical
understandings and analyses, and relevant comparisons are conducted from a
variety of perspectives. Finally, we conclude this review with some vital
current challenges and future trends in SISR leveraging deep learning
algorithms.
Comment: Accepted by IEEE Transactions on Multimedia (TMM
A Plug-and-Play Image Registration Network
Deformable image registration (DIR) is an active research topic in biomedical
imaging. There is a growing interest in developing DIR methods based on deep
learning (DL). A traditional DL approach to DIR is based on training a
convolutional neural network (CNN) to estimate the registration field between
two input images. While conceptually simple, this approach comes with a
limitation that it exclusively relies on a pre-trained CNN without explicitly
enforcing fidelity between the registered image and the reference. We present
the plug-and-play image registration network (PIRATE) as a new DIR method that
addresses this issue by integrating an explicit data-fidelity penalty and a CNN
prior. PIRATE pre-trains a CNN denoiser on the registration field and "plugs"
it into an iterative method as a regularizer. We additionally present PIRATE+,
which fine-tunes the CNN prior in PIRATE using deep equilibrium models (DEQ).
PIRATE+ interprets the fixed-point iteration of PIRATE as a network with
effectively infinite layers and then trains the resulting network end-to-end,
enabling it to learn more task-specific information and boosting its
performance. Our numerical results on the OASIS and CANDI datasets show that our
methods achieve state-of-the-art performance on DIR