1,722 research outputs found

    Inverting Adversarially Robust Networks for Image Synthesis

    Recent research in adversarially robust classifiers suggests their representations tend to be aligned with human perception, which makes them attractive for image synthesis and restoration applications. Despite favorable empirical results on a few downstream tasks, their advantages are limited to slow and sensitive optimization-based techniques. Moreover, their use in generative models remains unexplored. This work proposes using robust representations as a perceptual primitive for feature inversion models and shows their benefits with respect to standard non-robust image features. We empirically show that adopting robust representations as an image prior significantly improves the reconstruction accuracy of CNN-based feature inversion models. Furthermore, it allows reconstructing images at multiple scales out of the box. Following these findings, we propose an encoding-decoding network based on robust representations and show its advantages for applications such as anomaly detection, style transfer, and image denoising.
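    To make the feature-inversion setup concrete, here is a minimal sketch: a frozen encoder provides features and a small convolutional decoder is trained to reconstruct the input image from them. The ResNet-50 backbone, decoder layout, and loss are illustrative assumptions, not the paper's architecture; an adversarially trained checkpoint would be substituted for the encoder weights in practice.

```python
# Minimal feature-inversion sketch (PyTorch): train a decoder to reconstruct
# images from the features of a frozen encoder. The "robust" weights are an
# assumption here -- load an adversarially trained checkpoint if available.
import torch
import torch.nn as nn
from torchvision.models import resnet50

encoder = resnet50(weights=None)          # substitute robust weights (assumed)
encoder.fc = nn.Identity()                # expose the 2048-d penultimate features
encoder.eval()
for p in encoder.parameters():
    p.requires_grad_(False)               # encoder stays frozen

decoder = nn.Sequential(                  # illustrative decoder: 2048 -> 3x224x224
    nn.Linear(2048, 512 * 7 * 7),
    nn.Unflatten(1, (512, 7, 7)),
    nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
)

opt = torch.optim.Adam(decoder.parameters(), lr=1e-4)
images = torch.rand(4, 3, 224, 224)       # stand-in batch; use a real dataset
feats = encoder(images)                   # frozen robust features
recon = decoder(feats)                    # reconstruction from features
loss = nn.functional.mse_loss(recon, images)
loss.backward()
opt.step()
```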

    Making Vision Transformers Truly Shift-Equivariant

    For computer vision, Vision Transformers (ViTs) have become one of the go-to deep net architectures. Despite being inspired by Convolutional Neural Networks (CNNs), ViTs' output remains sensitive to small spatial shifts in the input, i.e., they are not shift-invariant. To address this shortcoming, we introduce novel data-adaptive designs for each of the modules in ViTs, such as tokenization, self-attention, patch merging, and positional encoding. With our proposed modules, we achieve true shift-equivariance on four well-established ViTs, namely Swin, SwinV2, CvT, and MViTv2. Empirically, we evaluate the proposed adaptive models on image classification and semantic segmentation tasks. These models achieve competitive performance across three different datasets while maintaining 100% shift consistency.
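    As an illustration of the shift-consistency metric mentioned above, the sketch below circularly shifts an input and checks whether a classifier's argmax prediction changes; a 100%-consistent model never changes its prediction. The model here is a plain torchvision ViT used only as a stand-in, not the adapted architectures from the paper.

```python
# Sketch of a shift-consistency check: fraction of random circular shifts
# under which the model's predicted class stays the same.
import torch
from torchvision.models import vit_b_16

model = vit_b_16(weights=None).eval()     # placeholder; use a trained checkpoint

@torch.no_grad()
def shift_consistency(model, images, max_shift=16, trials=8):
    """Average fraction of shifted inputs whose argmax prediction matches the original."""
    base = model(images).argmax(dim=1)
    agree = 0.0
    for _ in range(trials):
        dx, dy = torch.randint(-max_shift, max_shift + 1, (2,)).tolist()
        shifted = torch.roll(images, shifts=(dy, dx), dims=(2, 3))  # circular shift H, W
        agree += (model(shifted).argmax(dim=1) == base).float().mean().item()
    return agree / trials

images = torch.rand(2, 3, 224, 224)       # stand-in batch
print(f"shift consistency: {shift_consistency(model, images):.2%}")
```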

    Local fluctuations in quantum critical metals

    We show that spatially local, yet low-energy, fluctuations can play an essential role in the physics of strongly correlated electron systems tuned to a quantum critical point. A detailed microscopic analysis of the Kondo lattice model is carried out within an extended dynamical mean-field approach. The correlation functions for the lattice model are calculated through a self-consistent Bose-Fermi Kondo problem, in which a local moment is coupled both to a fermionic bath and to a bosonic bath (a fluctuating magnetic field). A renormalization-group treatment of this impurity problem, perturbative in $\epsilon = 1 - \gamma$, where $\gamma$ is an exponent characterizing the spectrum of the bosonic bath, shows that competition between the two couplings can drive the local-moment fluctuations critical. As a result, two distinct types of quantum critical point emerge in the Kondo lattice, one being of the usual spin-density-wave type, the other "locally critical." Near the locally critical point, the dynamical spin susceptibility exhibits $\omega/T$ scaling with a fractional exponent. While the spin-density-wave critical point is Gaussian, the locally critical point is an interacting fixed point at which long-wavelength and spatially local critical modes coexist. A Ginzburg-Landau description for the locally critical point is discussed. It is argued that these results are robust, that local criticality provides a natural description of the quantum critical behavior seen in a number of heavy-fermion metals, and that this picture may also be relevant to other strongly correlated metals.
    Comment: 20 pages, 12 figures; typos in figure 3 and in the main text corrected; version as published.
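    For concreteness, a schematic form of the Bose-Fermi Kondo impurity problem and of the scaling ansatz referred to above is written below. The couplings $J_K$ and $g$, the bath-spectrum convention, and the scaling function $\Phi$ are generic notation consistent with the abstract, not the paper's exact equations.

```latex
% Schematic Bose-Fermi Kondo impurity Hamiltonian: a local moment S coupled to
% a fermionic bath (c) with strength J_K and to a bosonic bath (\phi) with strength g.
\[
  \mathcal{H}_{\mathrm{BFK}} =
      J_K\,\mathbf{S}\cdot\mathbf{s}_c
    + \sum_{k,\sigma}\epsilon_k\,c^{\dagger}_{k\sigma}c_{k\sigma}
    + g\sum_{p}\mathbf{S}\cdot\bigl(\boldsymbol{\phi}_{p}+\boldsymbol{\phi}^{\dagger}_{-p}\bigr)
    + \sum_{p}\omega_{p}\,\boldsymbol{\phi}^{\dagger}_{p}\cdot\boldsymbol{\phi}_{p}.
\]
% The bosonic-bath spectrum is characterized by the exponent \gamma, so the
% renormalization-group expansion parameter is \epsilon = 1 - \gamma:
\[
  \sum_{p}\delta(\omega-\omega_{p}) \propto |\omega|^{\gamma},
  \qquad \epsilon \equiv 1-\gamma .
\]
% Near the locally critical point, the local dynamical spin susceptibility obeys
% \omega/T scaling with a fractional exponent \alpha (generic scaling ansatz):
\[
  \chi_{\mathrm{loc}}(\omega,T) \sim T^{-\alpha}\,\Phi\!\left(\frac{\omega}{T}\right),
  \qquad 0<\alpha<1 .
\]
```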

    Text-Guided Neural Image Inpainting

    The image inpainting task requires filling the corrupted image with content coherent with the context. This research field has achieved promising progress by using neural image inpainting methods. Nevertheless, guessing the missing content from the context pixels alone remains a critical challenge. The goal of this paper is to fill in the semantic information of corrupted images according to a provided descriptive text. Unlike existing text-guided image generation works, the inpainting model is required to compare the semantic content of the given text with the remaining part of the image, and then determine the semantic content that should be filled into the missing part. To fulfill this task, we propose a novel inpainting model named Text-Guided Dual Attention Inpainting Network (TDANet). First, a dual multimodal attention mechanism is designed to extract explicit semantic information about the corrupted regions by comparing the descriptive text and the complementary image areas through reciprocal attention. Second, an image-text matching loss is applied to maximize the semantic similarity of the generated image and the text. Experiments are conducted on two open datasets. Results show that the proposed TDANet model reaches a new state of the art on both quantitative and qualitative measures. Result analysis suggests that the generated images are consistent with the guidance text, enabling the generation of diverse results by providing different descriptions. Code is available at https://github.com/idealwhite/TDANet
    Comment: ACM MM'2020 (Oral). 9 pages, 4 tables, 7 figures.
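    To make the image-text matching idea concrete, below is a minimal sketch of such a loss: image and text embeddings are projected into a shared space and the cosine similarity of matching pairs is maximized. The encoders, projection dimensions, and class name are placeholders, not TDANet's actual modules.

```python
# Minimal image-text matching loss sketch: pull embeddings of a matching
# (generated image, text) pair together in a shared space. Dimensions and
# the MatchingLoss class are illustrative, not the actual TDANet components.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MatchingLoss(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, shared_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, shared_dim)   # projects image features
        self.txt_proj = nn.Linear(txt_dim, shared_dim)   # projects text features

    def forward(self, img_feats, txt_feats):
        z_img = F.normalize(self.img_proj(img_feats), dim=-1)
        z_txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
        # Maximize cosine similarity of matching pairs -> minimize (1 - cos).
        return (1.0 - (z_img * z_txt).sum(dim=-1)).mean()

loss_fn = MatchingLoss()
img_feats = torch.rand(4, 2048)   # stand-in features of generated images
txt_feats = torch.rand(4, 768)    # stand-in features of guidance texts
loss = loss_fn(img_feats, txt_feats)
loss.backward()
```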