Inverting Adversarially Robust Networks for Image Synthesis
Recent research in adversarially robust classifiers suggests their
representations tend to be aligned with human perception, which makes them
attractive for image synthesis and restoration applications. Despite favorable
empirical results on a few downstream tasks, their advantages are limited to
slow and sensitive optimization-based techniques. Moreover, their use on
generative models remains unexplored. This work proposes the use of robust
representations as a perceptual primitive for feature inversion models, and
shows their benefits over standard non-robust image features. We
empirically show that adopting robust representations as an image prior
significantly improves the reconstruction accuracy of CNN-based feature
inversion models. Furthermore, it allows reconstructing images at multiple
scales out-of-the-box. Following these findings, we propose an
encoding-decoding network based on robust representations and show its
advantages for applications such as anomaly detection, style transfer and image
denoising.
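As a rough illustration of the feature-inversion setup described above, here is a minimal PyTorch sketch (hypothetical names and shapes, not the authors' implementation): a small convolutional decoder is trained to reconstruct images from the features of a frozen, adversarially robust encoder.

    import torch
    import torch.nn as nn

    class Decoder(nn.Module):
        """Illustrative inversion decoder: upsamples (B, 512, 14, 14)
        features back to (B, 3, 224, 224) images (assumed shapes)."""
        def __init__(self, in_channels=512):
            super().__init__()
            layers, ch = [], in_channels
            for _ in range(4):  # four 2x upsampling stages: 14 -> 224
                layers += [
                    nn.Upsample(scale_factor=2, mode="nearest"),
                    nn.Conv2d(ch, ch // 2, kernel_size=3, padding=1),
                    nn.ReLU(inplace=True),
                ]
                ch //= 2
            layers += [nn.Conv2d(ch, 3, kernel_size=3, padding=1), nn.Sigmoid()]
            self.net = nn.Sequential(*layers)

        def forward(self, feats):
            return self.net(feats)

    def train_step(robust_encoder, decoder, optimizer, images):
        """One reconstruction step; robust_encoder is a hypothetical
        frozen, adversarially robust feature extractor."""
        with torch.no_grad():               # the encoder stays frozen
            feats = robust_encoder(images)  # e.g. (B, 512, 14, 14)
        recon = decoder(feats)
        loss = nn.functional.mse_loss(recon, images)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

The choice of a simple pixel-space MSE is an assumption for brevity; the key point is that only the decoder receives gradients, so reconstruction quality reflects what the robust features preserve.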
Making Vision Transformers Truly Shift-Equivariant
For computer vision, Vision Transformers (ViTs) have become one of the go-to
deep net architectures. Despite being inspired by Convolutional Neural Networks
(CNNs), ViTs' output remains sensitive to small spatial shifts in the input,
i.e., it is not shift-invariant. To address this shortcoming, we introduce novel
data-adaptive designs for each of the modules in ViTs, such as tokenization,
self-attention, patch merging, and positional encoding. With our proposed
modules, we achieve true shift-equivariance on four well-established ViTs,
namely, Swin, SwinV2, CvT, and MViTv2. Empirically, we evaluate the proposed
adaptive models on image classification and semantic segmentation tasks. These
models achieve competitive performance across three different datasets while
maintaining 100% shift consistency.
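To make the reported 100% shift consistency concrete, here is an illustrative sketch (a hypothetical helper, not the authors' evaluation protocol) that checks whether a classifier's prediction survives a random circular shift of the input:

    import torch

    def shift_consistency(model, images, max_shift=8):
        """Fraction of samples whose predicted class is unchanged under
        a random circular shift. A truly shift-invariant classifier
        scores 1.0. Illustrative metric; max_shift is an assumption."""
        model.eval()
        with torch.no_grad():
            dx = int(torch.randint(-max_shift, max_shift + 1, (1,)))
            dy = int(torch.randint(-max_shift, max_shift + 1, (1,)))
            shifted = torch.roll(images, shifts=(dy, dx), dims=(-2, -1))
            pred = model(images).argmax(dim=-1)
            pred_shifted = model(shifted).argmax(dim=-1)
        return (pred == pred_shifted).float().mean().item()

Circular shifts (torch.roll) are used here because they avoid boundary artifacts; averaging over many random shifts would give a more stable estimate.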
Local fluctuations in quantum critical metals
We show that spatially local, yet low-energy, fluctuations can play an
essential role in the physics of strongly correlated electron systems tuned to
a quantum critical point. A detailed microscopic analysis of the Kondo lattice
model is carried out within an extended dynamical mean-field approach. The
correlation functions for the lattice model are calculated through a
self-consistent Bose-Fermi Kondo problem, in which a local moment is coupled
both to a fermionic bath and to a bosonic bath (a fluctuating magnetic field).
A renormalization-group treatment of this impurity problem--perturbative in
$\epsilon = 1 - \gamma$, where $\gamma$ is an exponent characterizing the spectrum
of the bosonic bath--shows that competition between the two couplings can drive
the local-moment fluctuations critical. As a result, two distinct types of
quantum critical point emerge in the Kondo lattice, one being of the usual
spin-density-wave type, the other "locally critical." Near the locally
critical point, the dynamical spin susceptibility exhibits $\omega/T$ scaling
with a fractional exponent. While the spin-density-wave critical point is
Gaussian, the locally critical point is an interacting fixed point at which
long-wavelength and spatially local critical modes coexist. A Ginzburg-Landau
description for the locally critical point is discussed. It is argued that
these results are robust, that local criticality provides a natural description
of the quantum critical behavior seen in a number of heavy-fermion metals, and
that this picture may also be relevant to other strongly correlated metals.
Comment: 20 pages, 12 figures; typos in figure 3 and in the main text corrected, version as published
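For illustration only, the $\omega/T$ scaling mentioned above can be written schematically as a generic scaling ansatz (an assumed form, not the paper's derived expression):

    % Schematic omega/T scaling of the dynamical spin susceptibility
    % near the locally critical point; \alpha is a fractional exponent
    % and \Phi a universal scaling function (illustrative ansatz):
    \chi(\omega, T) \sim \frac{1}{T^{\alpha}}\,
        \Phi\!\left(\frac{\omega}{T}\right), \qquad 0 < \alpha < 1 .

The fractional exponent $\alpha$ is what distinguishes this behavior from the Gaussian spin-density-wave case.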
Text-Guided Neural Image Inpainting
The image inpainting task requires filling the corrupted image with content
coherent with the context. This research field has achieved promising progress
by using neural image inpainting methods. Nevertheless, there is still a
critical challenge in inferring the missing content from only the context pixels.
The goal of this paper is to fill the semantic information in corrupted images
according to the provided descriptive text. Unique from existing text-guided
image generation works, the inpainting models are required to compare the
semantic content of the given text and the remaining part of the image, then
identify the semantic content that should fill the missing part. To
fulfill such a task, we propose a novel inpainting model named Text-Guided Dual
Attention Inpainting Network (TDANet). Firstly, a dual multimodal attention
mechanism is designed to extract the explicit semantic information about the
corrupted regions, which is done by comparing the descriptive text and
complementary image areas through reciprocal attention. Secondly, an image-text
matching loss is applied to maximize the semantic similarity of the generated
image and the text. Experiments are conducted on two open datasets. Results
show that the proposed TDANet model achieves a new state of the art on both
quantitative and qualitative measures. Analysis of the results suggests that the
generated images are consistent with the guidance text, and that varied results
can be generated by providing different descriptions. Code is available at
https://github.com/idealwhite/TDANet
Comment: ACM MM'2020 (Oral). 9 pages, 4 tables, 7 figures
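To make the image-text matching loss concrete, here is a minimal PyTorch sketch (an assumed symmetric contrastive formulation over hypothetical image and text embeddings, with an arbitrary temperature; TDANet's exact loss may differ):

    import torch
    import torch.nn.functional as F

    def image_text_matching_loss(img_emb, txt_emb, temperature=0.07):
        """Illustrative matching loss: pull each generated image's
        embedding toward its paired text embedding and away from the
        other texts in the batch. img_emb, txt_emb: (B, D) tensors
        from hypothetical image/text encoders."""
        img_emb = F.normalize(img_emb, dim=-1)          # unit vectors
        txt_emb = F.normalize(txt_emb, dim=-1)
        logits = img_emb @ txt_emb.t() / temperature    # cosine sims
        targets = torch.arange(img_emb.size(0), device=img_emb.device)
        # Matched pairs sit on the diagonal of the similarity matrix.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

Maximizing the diagonal similarities is one standard way to realize the stated goal of maximizing semantic similarity between the generated image and its guidance text.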