348 research outputs found

    Deep Visual Unsupervised Domain Adaptation for Classification Tasks: A Survey


    StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models

    Content and style (C-S) disentanglement is a fundamental problem and critical challenge of style transfer. Existing approaches based on explicit definitions (e.g., the Gram matrix) or implicit learning (e.g., GANs) are neither interpretable nor easy to control, resulting in entangled representations and less satisfying results. In this paper, we propose a new C-S disentangled framework for style transfer that does not rely on these previous assumptions. The key insight is to explicitly extract the content information and implicitly learn the complementary style information, yielding interpretable and controllable C-S disentanglement and style transfer. A simple yet effective CLIP-based style disentanglement loss, coordinated with a style reconstruction prior, is introduced to disentangle C-S in the CLIP image space. By further leveraging the powerful style-removal and generative abilities of diffusion models, our framework achieves results superior to the state of the art, together with flexible C-S disentanglement and trade-off control. Our work provides new insights into C-S disentanglement in style transfer and demonstrates the potential of diffusion models for learning well-disentangled C-S characteristics. Comment: Accepted by ICCV 2023
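    As a rough illustration of the kind of objective this abstract describes, the sketch below shows one plausible CLIP-space style disentanglement loss: it aligns the CLIP image-space direction from the content image to the stylized output with the direction from a style-removed reference to the style image. This is an assumed formulation, not the paper's exact loss; the `style_removed` input and the function names are hypothetical placeholders.

```python
# A minimal sketch (an assumed formulation, NOT the paper's exact loss) of a
# CLIP-space style disentanglement objective. `style_removed` is a
# hypothetical placeholder for a style image with its style stripped.
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)

def clip_direction(src, dst):
    """Normalized CLIP-space direction from src to dst (preprocessed batches)."""
    d = clip_model.encode_image(dst) - clip_model.encode_image(src)
    return d / d.norm(dim=-1, keepdim=True)

def style_disentanglement_loss(content, stylized, style, style_removed):
    """1 - cosine similarity between the output and reference style directions."""
    d_out = clip_direction(content, stylized)     # style injected by the model
    d_ref = clip_direction(style_removed, style)  # pure style direction
    return (1.0 - F.cosine_similarity(d_out, d_ref, dim=-1)).mean()
```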

    Evaluation of Generative Models for Predicting Microstructure Geometries in Laser Powder Bed Fusion Additive Manufacturing

    In-situ process monitoring for metals additive manufacturing is paramount to the successful build of an object for application in extreme or high-stress environments. In selective laser melting additive manufacturing, the way the laser melts the metal powder during the build dictates the internal microstructure of the object once the metal cools and solidifies. The difficulty is that obtaining a sufficient variety of data to quantify the internal microstructures and evaluate their physical properties is problematic, as the laser passes at high speed over powder grains at the micrometer scale; imaging the process in-situ is complex and cost-prohibitive. However, generative models can provide new, artificially generated data. Generative adversarial networks synthesize computationally derived data through a process in which a generator network learns the underlying features corresponding to the different laser process parameters, and a discriminator network evaluates, and thereby improves, those artificial renderings. While this technique was effective at delivering high-quality images, conditioning the network showed improved capability at creating these new images. Using multiple evaluation metrics, it has been shown that generative models can be used to create new data for various laser process parameter combinations, thereby allowing a more comprehensive evaluation of ideal laser conditions for any particular build.
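    The conditional-GAN setup this abstract alludes to can be sketched as follows: a minimal, assumed PyTorch implementation in which both networks receive the laser process parameters (e.g., normalized power and scan speed) as a condition vector. The layer sizes and the two-value condition are illustrative choices, not the paper's architecture.

```python
# A minimal sketch, assuming PyTorch, of a conditional GAN whose generator and
# discriminator are both conditioned on laser process parameters. Sizes and
# the 2-value condition vector are illustrative, not the paper's architecture.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, cond_dim=2, img_pixels=64 * 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + cond_dim, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, img_pixels), nn.Tanh(),  # grayscale micrograph in [-1, 1]
        )

    def forward(self, z, cond):
        # Concatenate noise with the normalized process parameters.
        return self.net(torch.cat([z, cond], dim=1))

class Discriminator(nn.Module):
    def __init__(self, cond_dim=2, img_pixels=64 * 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_pixels + cond_dim, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1),  # real/fake logit
        )

    def forward(self, img, cond):
        # The discriminator also sees the condition, so it judges whether an
        # image is plausible *for those specific laser parameters*.
        return self.net(torch.cat([img, cond], dim=1))
```

    Conditioning both networks is what lets a trained generator be queried at laser parameter combinations that were never imaged, which is the data-augmentation role the abstract describes.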

    Multimodal Adversarial Learning

    Deep Convolutional Neural Networks (DCNNs) have proven to be an exceptional tool for object recognition, generative modelling, and multi-modal learning in various computer vision applications. However, recent findings have shown that such state-of-the-art models can be easily deceived by inserting slight, imperceptible perturbations at key pixels of the input. A good target detection system can accurately identify targets by localizing their coordinates on the input image of interest, ideally by labeling each pixel in the image as either background or a potential target pixel. However, prior research confirms that such state-of-the-art target models are susceptible to adversarial attacks. In the case of generative models, facial sketches drawn by artists, mostly used by law enforcement agencies, depend on the artist's ability to clearly replicate all the key facial features that aid in capturing the true identity of a subject. Recent works have attempted to synthesize these sketches into plausible visual images to improve visual recognition and identification. Synthesizing photo-realistic images from sketches, however, proves to be an even more challenging task, especially for sensitive applications such as suspect identification. The incorporation of hybrid discriminators, which perform attribute classification over multiple target attributes, together with a quality-guided encoder that minimizes the perceptual dissimilarity between the latent-space embeddings of the synthesized and real images at different layers of the network, has been shown to be a powerful tool toward better multi-modal learning. In general, our overall approach aimed at improving target detection systems and the visual appeal of synthesized images while incorporating multiple attribute assignments into the generator without compromising the identity of the synthesized image. We synthesized sketches using the XDoG filter for the CelebA, Multi-Modal, and CelebA-HQ datasets, and from an auxiliary generator trained on sketches from the CUHK, IIT-D, and FERET datasets. Overall, our results for the different model applications compare favorably with the current state of the art.
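    The XDoG filter used above to synthesize training sketches is well documented in the literature; the sketch below is a minimal NumPy/SciPy rendering of the standard extended difference-of-Gaussians formulation (Winnemöller et al.), in its DoG-sharpening parameterization. The parameter defaults are illustrative, not the values used in this work.

```python
# A minimal sketch of the standard XDoG (extended difference-of-Gaussians)
# filter; defaults are illustrative, not the values used in this work.
import numpy as np
from scipy.ndimage import gaussian_filter

def xdog(gray, sigma=0.8, k=1.6, p=20.0, epsilon=0.5, phi=10.0):
    """gray: 2-D float array scaled to [0, 1]; returns a sketch in [0, 1]."""
    g1 = gaussian_filter(gray, sigma)       # fine-scale blur
    g2 = gaussian_filter(gray, sigma * k)   # coarse-scale blur
    sharpened = (1.0 + p) * g1 - p * g2     # edge-emphasized image
    # Soft threshold: bright, flat regions map to white; darker regions and
    # edges fall off through a tanh ramp whose steepness is set by phi.
    out = np.where(sharpened >= epsilon,
                   1.0,
                   1.0 + np.tanh(phi * (sharpened - epsilon)))
    return np.clip(out, 0.0, 1.0)
```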