259 research outputs found

    Multi-Concept Customization of Text-to-Image Diffusion

    While generative models produce high-quality images of concepts learned from a large-scale database, a user often wishes to synthesize instantiations of their own concepts (for example, their family, pets, or items). Can we teach a model to quickly acquire a new concept, given a few examples? Furthermore, can we compose multiple new concepts together? We propose Custom Diffusion, an efficient method for augmenting existing text-to-image models. We find that only optimizing a few parameters in the text-to-image conditioning mechanism is sufficiently powerful to represent new concepts while enabling fast tuning (~6 minutes). Additionally, we can jointly train for multiple concepts or combine multiple fine-tuned models into one via closed-form constrained optimization. Our fine-tuned model generates variations of multiple new concepts and seamlessly composes them with existing concepts in novel settings. Our method outperforms or performs on par with several baselines and concurrent works in both qualitative and quantitative evaluations while being memory and computationally efficient. Comment: Updated v2 with results on the new CustomConcept101 dataset https://www.cs.cmu.edu/~custom-diffusion/dataset.html Project webpage: https://www.cs.cmu.edu/~custom-diffusio

    Conservation and Expression Patterns Divergence of Ascorbic Acid d-mannose/l-galactose Pathway Genes in Brassica rapa

    Ascorbic acid (AsA) participates in diverse biological processes, is regulated by multiple factors and is a potent antioxidant and cellular reductant. The D-mannose/L-galactose pathway is a major plant AsA biosynthetic pathway that is highly connected within biosynthetic networks, and generally conserved across plants. Previous work has shown that, although most genes of this pathway are expressed under standard growth conditions in Brassica rapa, some paralogs of these genes are not. We hypothesize that regulatory evolution in duplicate AsA pathway genes has occurred as an adaptation to environmental stressors, and that gene retention has been influenced by polyploidization events in Brassicas. To test these hypotheses, we explored the conservation of these genes in Brassicas and the divergence of their expression patterns in B. rapa. Similar retention and a high degree of gene sequence similarity were identified in B. rapa (A genome), Brassica oleracea (C genome) and Brassica napus (AC genome). However, the number of genes that encode the same type of enzyme varied among the three plant species. With the exception of GMP, which has nine genes, there were one to four genes that encoded each of the other enzymes. Moreover, we found that divergence in expression patterns is widespread among these genes. i) VTC2 and VTC5 are paralogous genes, but only VTC5 is influenced by FLC. ii) Under light treatment, PMI1 co-regulates the AsA pool size with other D-Man/L-Gal pathway genes, whereas PMI2 is regulated only by darkness. iii) Under NaCl, Cu2+, MeJA and wounding stresses, most of the paralogs exhibit different expression patterns. Additionally, GME and GPP are the key regulatory enzymes that limit AsA biosynthesis in response to these treatments. In conclusion, our data suggest that the conserved and divergent expression patterns of D-Man/L-Gal pathway genes not only avoid AsA biosynthesis network instability but also allow B. rapa to better adapt to complex environments.

    Drag on a partially immersed sphere at the capillary scale

    We study the drag on a centimetric sphere in a uniform flow in the presence of a free surface as a function of submergence depth. Through direct force measurements in a custom benchtop recirculating flume, we demonstrate that the drag can significantly exceed the corresponding drag in a single-phase flow and achieves a peak at submergence depths just prior to complete immersion. The additional drag in the partially immersed state is rationalized by considering hydrostatic effects associated with the asymmetric surface height profile induced by the obstacle in the flow direction, which persists for flow speeds below the minimum capillary-gravity wave speed. At these scales, the sphere's wettability plays a pronounced role in determining the maximum possible drag and results in hysteretic behaviors near touchdown and complete immersion. The influence of flow speed, sphere size, and surface tension on the drag characteristics is additionally explored through a combination of experiments and numerical simulations. Comment: 9 figure

    Ablating Concepts in Text-to-Image Diffusion Models

    Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability. However, these models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos. Furthermore, they have been found to replicate the style of various living artists or memorize exact training samples. How can we remove such copyrighted concepts or images without retraining the model from scratch? To achieve this goal, we propose an efficient method of ablating concepts in the pretrained model, i.e., preventing the generation of a target concept. Our algorithm learns to match the image distribution for a target style, instance, or text prompt we wish to ablate to the distribution corresponding to an anchor concept. This prevents the model from generating target concepts given its text condition. Extensive experiments show that our method can successfully prevent the generation of the ablated concept while preserving closely related concepts in the model. Comment: ICCV 2023. Project website: https://www.cs.cmu.edu/~concept-ablation

    Scaling up GANs for Text-to-Image Synthesis

    The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture to design generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL-E 2, auto-regressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that naïvely increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only 0.13 seconds to synthesize a 512px image. Second, it can synthesize high-resolution images, for example, 16-megapixel images in 3.66 seconds. Finally, GigaGAN supports various latent space editing applications such as latent interpolation, style mixing, and vector arithmetic operations. Comment: CVPR 2023. Project webpage at https://mingukkang.github.io/GigaGAN