
    StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners

    We investigate the potential of learning visual representations using synthetic images generated by text-to-image models. This is a natural question in light of the excellent performance of such models in generating high-quality images. We specifically consider Stable Diffusion, one of the leading open-source text-to-image models. We show that (1) when the generative model is configured with a proper classifier-free guidance scale, training self-supervised methods on synthetic images can match or beat the real-image counterpart; (2) by treating the multiple images generated from the same text prompt as positives for each other, we develop a multi-positive contrastive learning method, which we call StableRep. With solely synthetic images, the representations learned by StableRep surpass those learned by SimCLR and CLIP using the same set of text prompts and the corresponding real images, on large-scale datasets. When we further add language supervision, StableRep trained with 20M synthetic images achieves better accuracy than CLIP trained with 50M real images. Comment: code is available at: https://github.com/google-research/syn-rep-lear
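
    A minimal PyTorch sketch of the multi-positive contrastive idea described above: images generated from the same prompt share an id and serve as positives for one another. The function name, temperature, and batching convention are assumptions for illustration, not the released StableRep implementation.

```python
# Hedged sketch of a multi-positive contrastive loss in the spirit of StableRep:
# samples with the same prompt id are treated as positives for each other.
import torch
import torch.nn.functional as F

def multi_positive_contrastive_loss(embeddings: torch.Tensor,
                                    prompt_ids: torch.Tensor,
                                    temperature: float = 0.1) -> torch.Tensor:
    """embeddings: (N, D) image features; prompt_ids: (N,) integers, where
    equal ids mean the images were generated from the same text prompt."""
    z = F.normalize(embeddings, dim=-1)
    sim = z @ z.t() / temperature                       # (N, N) scaled cosine similarities
    n = sim.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(self_mask, -1e9)              # exclude self-similarity

    # Target distribution: uniform over the other samples sharing the prompt.
    positives = (prompt_ids.unsqueeze(0) == prompt_ids.unsqueeze(1)) & ~self_mask
    target = positives.float()
    target = target / target.sum(dim=1, keepdim=True).clamp(min=1.0)

    # Cross-entropy between the target distribution and the softmax over similarities.
    log_prob = F.log_softmax(sim, dim=1)
    return -(target * log_prob).sum(dim=1).mean()
```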

    BLT: Bidirectional Layout Transformer for Controllable Layout Generation

    Creating visual layouts is a critical step in graphic design. Automatic generation of such layouts is essential for scalable and diverse visual designs. To advance conditional layout generation, we introduce BLT, a bidirectional layout transformer. BLT differs from previous layout transformers in adopting a non-autoregressive transformer. In training, BLT learns to predict masked attributes by attending to surrounding attributes in both directions. During inference, BLT first generates a draft layout from the input and then iteratively refines it into a high-quality layout by masking out low-confidence attributes. The masks generated in both training and inference are controlled by a new hierarchical sampling policy. We verify the proposed model on six benchmarks of diverse design tasks. Experimental results demonstrate two benefits over state-of-the-art layout transformer models. First, our model empowers layout transformers to fulfill controllable layout generation. Second, it achieves up to a 10x speedup over the layout transformer baseline when generating a layout at inference time. Code is released at https://shawnkx.github.io/blt. Comment: ECCV 2022
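
    A hedged sketch of the non-autoregressive decoding loop described above: start from a fully masked layout, predict all attributes in parallel, and repeatedly re-mask the lowest-confidence attributes, one attribute group at a time in the spirit of a hierarchical sampling policy. The model interface, group layout, and fixed schedule are assumptions, not the released BLT code.

```python
# Hedged sketch of BLT-style iterative refinement over a flattened layout sequence.
import torch

@torch.no_grad()
def iterative_refine(model, tokens: torch.Tensor, mask_id: int,
                     attr_group: torch.Tensor, groups=(0, 1, 2),
                     steps_per_group: int = 4) -> torch.Tensor:
    """tokens: (B, L) layout sequence containing mask_id at unknown positions.
    attr_group: (L,) group index per position (e.g. 0=category, 1=size, 2=position),
    mimicking a hierarchical policy that refines one attribute group at a time."""
    for g in groups:
        group_pos = attr_group == g                       # (L,) bool
        for step in range(steps_per_group):
            logits = model(tokens)                        # (B, L, V) parallel prediction
            conf, pred = logits.softmax(-1).max(-1)       # per-position confidence and token
            unknown = tokens == mask_id
            tokens = torch.where(unknown, pred, tokens)   # fill in the current draft

            # Shrink the number of re-masked positions in this group every step.
            n_group = int(group_pos.sum())
            n_remask = int(n_group * (1 - (step + 1) / steps_per_group))
            if n_remask == 0:
                continue
            conf = conf.masked_fill(~group_pos.unsqueeze(0), float("inf"))
            lowest = conf.topk(n_remask, dim=-1, largest=False).indices
            tokens = tokens.scatter(1, lowest, mask_id)   # re-mask least confident tokens
    return tokens
```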

    Learning Disentangled Prompts for Compositional Image Synthesis

    We study domain-adaptive image synthesis, the problem of teaching pretrained image generative models a new style or concept from as few as one image so that they can synthesize novel images, in order to better understand compositional image synthesis. We present a framework that leverages a pretrained class-conditional generative model and visual prompt tuning. Specifically, we propose a novel source-class-distilled visual prompt that learns disentangled semantic (e.g., class) and domain (e.g., style) prompts from a few images. The learned domain prompt is then used to synthesize images of any class in the style of the target domain. We conduct studies on various target domains with the number of images ranging from one to a few to many, and present qualitative results that demonstrate the compositional generalization of our method. Moreover, we show that our method can help improve zero-shot domain adaptation classification accuracy. Comment: tech report
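
    A minimal sketch of how a learned domain prompt could be composed with arbitrary class embeddings on top of a frozen class-conditional generator, as the abstract describes. The module interface, token layout, and prompt length are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch: only the domain (style) prompt is learned; the class prompt is
# swapped freely at synthesis time, and the pretrained generator stays frozen.
import torch
import torch.nn as nn

class DisentangledPromptSynthesis(nn.Module):
    def __init__(self, generator: nn.Module, embed_dim: int, n_domain_tokens: int = 4):
        super().__init__()
        self.generator = generator.eval()
        for p in self.generator.parameters():            # keep the pretrained generator frozen
            p.requires_grad_(False)
        # Only the domain prompt is trained from the few target-domain images.
        self.domain_prompt = nn.Parameter(torch.randn(n_domain_tokens, embed_dim) * 0.02)

    def forward(self, class_embedding: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
        """class_embedding: (B, 1, D) semantic prompt for any source class;
        noise: (B, Z) latent. Concatenating the two prompts yields images of the
        chosen class rendered in the learned target-domain style."""
        b = class_embedding.size(0)
        domain = self.domain_prompt.unsqueeze(0).expand(b, -1, -1)
        prompt = torch.cat([class_embedding, domain], dim=1)   # (B, 1 + n_domain, D)
        return self.generator(noise, prompt)                    # assumed generator interface
```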

    VQ3D: Learning a 3D-Aware Generative Model on ImageNet

    Recent work has shown the possibility of training generative models of 3D content from 2D image collections on small datasets corresponding to a single object class, such as human faces, animal faces, or cars. However, these models struggle on larger, more complex datasets. To model diverse and unconstrained image collections such as ImageNet, we present VQ3D, which introduces a NeRF-based decoder into a two-stage vector-quantized autoencoder. Stage 1 allows an input image to be reconstructed and the camera position around it to be changed, and Stage 2 allows new 3D scenes to be generated. VQ3D is capable of generating and reconstructing 3D-aware images from the 1000-class ImageNet dataset of 1.2 million training images. We achieve an ImageNet generation FID score of 16.8, compared to 69.8 for the next best baseline method. Comment: 15 pages. For visual results, please visit the project webpage at http://kylesargent.github.io/vq3
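
    A hedged scaffold of the two-stage design described above: Stage 1 is a vector-quantized autoencoder whose decoder renders the code grid NeRF-style from a queried camera, and Stage 2 fits a prior over the discrete codes so that new scenes can be sampled. All module interfaces below are assumptions, not the VQ3D implementation.

```python
# Hedged scaffold of a VQ3D-style two-stage pipeline with assumed module interfaces.
import torch
import torch.nn as nn

class Stage1(nn.Module):
    def __init__(self, encoder: nn.Module, quantizer: nn.Module, nerf_decoder: nn.Module):
        super().__init__()
        self.encoder, self.quantizer, self.nerf_decoder = encoder, quantizer, nerf_decoder

    def forward(self, image: torch.Tensor, camera: torch.Tensor):
        feats = self.encoder(image)                       # (B, C, H, W) feature grid
        codes, indices, vq_loss = self.quantizer(feats)   # discrete latent codes
        # Rendering from a different camera than the input enables novel views.
        rendered = self.nerf_decoder(codes, camera)
        return rendered, indices, vq_loss

def train_stage2_step(prior: nn.Module, indices: torch.Tensor) -> torch.Tensor:
    """Stage 2: model the distribution of Stage-1 code indices (e.g. with an
    autoregressive transformer) so that new 3D-aware scenes can be sampled."""
    logits = prior(indices[:, :-1])                       # predict the next code token
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), indices[:, 1:].reshape(-1))
```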

    Palette: Image-to-Image Diffusion Models

    This paper develops a unified framework for image-to-image translation based on conditional diffusion models and evaluates it on four challenging image-to-image translation tasks, namely colorization, inpainting, uncropping, and JPEG restoration. Our simple implementation of image-to-image diffusion models outperforms strong GAN and regression baselines on all tasks, without task-specific hyper-parameter tuning, architecture customization, auxiliary losses, or other sophisticated new techniques. We uncover the impact of an L2 vs. L1 loss in the denoising diffusion objective on sample diversity, and demonstrate the importance of self-attention in the neural architecture through empirical studies. Importantly, we advocate a unified evaluation protocol based on ImageNet, with human evaluation and sample quality scores (FID, Inception Score, Classification Accuracy of a pre-trained ResNet-50, and Perceptual Distance against original images). We expect this standardized evaluation protocol to play a role in advancing image-to-image translation research. Finally, we show that a generalist, multi-task diffusion model performs as well as or better than task-specific specialist counterparts. Check out https://diffusion-palette.github.io for an overview of the results.
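
    A minimal sketch of the conditional denoising objective behind an image-to-image diffusion model of this kind: the denoiser sees the source image as conditioning and predicts the noise added to the target, with the L1 vs. L2 choice highlighted in the abstract exposed as a flag. The model signature and noise-schedule handling are assumptions for illustration, not the Palette code.

```python
# Hedged sketch of a conditional denoising training step for image-to-image diffusion.
import torch
import torch.nn.functional as F

def denoising_loss(model, source: torch.Tensor, target: torch.Tensor,
                   alphas_cumprod: torch.Tensor, use_l1: bool = False) -> torch.Tensor:
    """source, target: (B, C, H, W) paired images; alphas_cumprod: (T,) noise schedule."""
    b = target.size(0)
    t = torch.randint(0, alphas_cumprod.numel(), (b,), device=target.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(target)
    noisy = a_bar.sqrt() * target + (1 - a_bar).sqrt() * noise   # forward process q(x_t | x_0)

    # Condition the denoiser on the source image by channel-wise concatenation.
    pred_noise = model(torch.cat([source, noisy], dim=1), t)
    # L1 vs. L2 in the denoising objective trades off sample diversity.
    return F.l1_loss(pred_noise, noise) if use_l1 else F.mse_loss(pred_noise, noise)
```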

    Zoom-to-Inpaint: Image Inpainting with High-Frequency Details

    Although deep learning has enabled a huge leap forward in image inpainting, current methods are often unable to synthesize realistic high-frequency details. In this paper, we propose applying super-resolution to coarsely reconstructed outputs, refining them at high resolution, and then downscaling the output to the original resolution. By introducing high-resolution images to the refinement network, our framework is able to reconstruct finer details that are usually smoothed out due to spectral bias, i.e., the tendency of neural networks to reconstruct low frequencies better than high frequencies. To assist in training the refinement network on large upscaled holes, we propose a progressive learning technique in which the size of the missing regions increases as training progresses. Our zoom-in, refine, and zoom-out strategy, combined with high-resolution supervision and progressive learning, constitutes a framework-agnostic approach for enhancing high-frequency details that can be applied to any CNN-based inpainting method. We provide qualitative and quantitative evaluations along with an ablation analysis to show the effectiveness of our approach. This seemingly simple yet powerful approach outperforms state-of-the-art inpainting methods.
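
    A hedged sketch of the zoom-in, refine, and zoom-out strategy: the coarse result is super-resolved, refined at high resolution where high-frequency detail can be synthesized, then downscaled and composited back into the known pixels. The three network interfaces are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of a zoom-in, refine, zoom-out inpainting pipeline.
import torch
import torch.nn.functional as F

@torch.no_grad()
def zoom_to_inpaint(coarse_net, sr_net, refine_net,
                    image: torch.Tensor, mask: torch.Tensor, scale: int = 2) -> torch.Tensor:
    """image: (B, C, H, W); mask: (B, 1, H, W) with 1 marking missing pixels."""
    coarse = coarse_net(image * (1 - mask), mask)          # coarse reconstruction
    zoomed = sr_net(coarse)                                # zoom in (super-resolution)
    up_mask = F.interpolate(mask, scale_factor=scale, mode="nearest")
    refined = refine_net(zoomed, up_mask)                  # refine fine details at high res
    out = F.interpolate(refined, size=image.shape[-2:],    # zoom back out
                        mode="bilinear", align_corners=False)
    return image * (1 - mask) + out * mask                 # keep known pixels untouched
```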

    RNA-binding protein CUGBP1 regulates insulin secretion via activation of phosphodiesterase 3B in mice

    Aims/hypothesis: CUG-binding protein 1 (CUGBP1) is a multifunctional RNA-binding protein that regulates RNA processing at several stages, including translation, deadenylation and alternative splicing, as well as RNA stability. Recent studies indicate that CUGBP1 may play a role in metabolic disorders. Our objective was to examine its role in endocrine pancreas function through gain- and loss-of-function experiments and to further decipher the underlying molecular mechanisms. Methods: A mouse model in which type 2 diabetes was induced by a high-fat diet (HFD; 60% energy from fat) and mice on a standard chow diet (10% energy from fat) were compared. Pancreas-specific CUGBP1 overexpression and knockdown mice were generated. Different lengths of the phosphodiesterase subtype 3B (PDE3B) 3′ untranslated region (UTR) were cloned for luciferase reporter analysis. Purified CUGBP1 protein was used for gel shift experiments. Results: CUGBP1 is present in rodent islets and in beta cell lines; it is overexpressed in the islets of diabetic mice. Compared with control mice, the plasma insulin level after a glucose load was significantly lower and glucose clearance was greatly delayed in mice with pancreas-specific CUGBP1 overexpression; the opposite results were obtained upon pancreas-specific CUGBP1 knockdown. Glucose- and glucagon-like peptide-1 (GLP-1)-stimulated insulin secretion was significantly attenuated in mouse islets upon CUGBP1 overexpression. This was associated with a strong decrease in intracellular cAMP levels, pointing to a potential role for cAMP PDEs. CUGBP1 overexpression had no effect on the mRNA levels of the PDE1A, 1C, 2A, 3A, 4A, 4B, 4D, 7A and 8B subtypes, but resulted in increased PDE3B expression. CUGBP1 was found to bind directly to a specific ATTTGTT sequence in the 3′ UTR of PDE3B and to stabilise PDE3B mRNA. In the presence of the PDE3 inhibitor cilostamide, glucose- and GLP-1-stimulated insulin secretion was no longer reduced by CUGBP1 overexpression. Like CUGBP1, PDE3B was overexpressed in the islets of diabetic mice. Conclusions/interpretation: We conclude that CUGBP1 is a critical regulator of insulin secretion via its activation of PDE3B. Repressing this protein might provide a potential strategy for treating type 2 diabetes.