10 research outputs found

    Integrative Feature and Cost Aggregation with Transformers for Dense Correspondence

    We present a novel architecture for dense correspondence. Current state-of-the-art methods are Transformer-based approaches that focus on either feature descriptors or cost volume aggregation. However, they generally aggregate one or the other but not both, even though joint aggregation would let each benefit from information that the other lacks, i.e., the structural or semantic information of an image, or pixel-wise matching similarity. In this work, we propose a novel Transformer-based network that interleaves both forms of aggregation in a way that exploits their complementary information. Specifically, we design a self-attention layer that leverages the descriptor to disambiguate the noisy cost volume and that also utilizes the cost volume to aggregate features in a manner that promotes accurate matching. A subsequent cross-attention layer performs further aggregation conditioned on the descriptors of both images and aided by the aggregated outputs of earlier layers. We further boost the performance with hierarchical processing, in which coarser-level aggregations guide those at finer levels. We evaluate the effectiveness of the proposed method on dense matching tasks and achieve state-of-the-art performance on all the major benchmarks. Extensive ablation studies are also provided to validate our design choices.
    Comment: v2 includes supplementary material, while v1 does not.
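
To make the term "cost volume" in the abstract above concrete, here is a minimal sketch of how pixel-wise matching similarity is commonly computed from feature descriptors. It assumes cosine similarity between per-pixel descriptors and a plain argmax readout; neither detail is stated in the abstract, and the function names `cosine_cost_volume` and `best_matches` are hypothetical. It does not implement the paper's attention-based aggregation.

```python
import math


def cosine_cost_volume(src, tgt, eps=1e-8):
    """Return cost[i][j], the cosine similarity between source pixel i
    and target pixel j. `src` and `tgt` are lists of descriptor vectors,
    one per pixel (an assumed, flattened layout)."""

    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec)) + eps
        return [x / norm for x in vec]

    src_n = [normalize(v) for v in src]
    tgt_n = [normalize(v) for v in tgt]
    return [[sum(a * b for a, b in zip(u, w)) for w in tgt_n] for u in src_n]


def best_matches(cost):
    """For each source pixel, return the index of the most similar target pixel."""
    return [max(range(len(row)), key=row.__getitem__) for row in cost]


# Toy example: three 2-D descriptors per image.
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt = [[0.0, 2.0], [3.0, 3.0], [5.0, 0.0]]
cost = cosine_cost_volume(src, tgt)
matches = best_matches(cost)
```

A raw volume like this is noisy in practice, which is the motivation the abstract gives for aggregating it jointly with the descriptors.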

    Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation

    This paper presents a novel cost aggregation network, called Volumetric Aggregation with Transformers (VAT), for few-shot segmentation. The use of transformers can benefit correlation map aggregation through self-attention over a global receptive field. However, the tokenization of a correlation map for transformer processing can be detrimental, because the discontinuity at token boundaries reduces the local context available near the token edges and decreases inductive bias. To address this problem, we propose a 4D Convolutional Swin Transformer, where a high-dimensional Swin Transformer is preceded by a series of small-kernel convolutions that impart local context to all pixels and introduce convolutional inductive bias. We additionally boost aggregation performance by applying transformers within a pyramidal structure, where aggregation at a coarser level guides aggregation at a finer level. Noise in the transformer output is then filtered in the subsequent decoder with the help of the query's appearance embedding. With this model, a new state of the art is set for all the standard benchmarks in few-shot segmentation. It is shown that VAT attains state-of-the-art performance for semantic correspondence as well, where cost aggregation also plays a central role.
    Comment: Code and trained models are available at https://seokju-cho.github.io/VAT/ . This is the ECCV'22 camera-ready version, which is revised from arXiv:2112.1168

    MIDMs: Matching Interleaved Diffusion Models for Exemplar-based Image Translation

    We present a novel method for exemplar-based image translation, called matching interleaved diffusion models (MIDMs). Most existing methods for this task were formulated as a GAN-based matching-then-generation framework. However, in this framework, matching errors induced by the difficulty of semantic matching across domains, e.g., sketch and photo, can easily be propagated to the generation step, which in turn leads to degenerate results. Motivated by the recent success of diffusion models in overcoming the shortcomings of GANs, we incorporate diffusion models to overcome these limitations. Specifically, we formulate a diffusion-based matching-and-generation framework that interleaves cross-domain matching and diffusion steps in the latent space by iteratively feeding the intermediate warp into the noising process and denoising it to generate a translated image. In addition, to improve the reliability of the diffusion process, we design a confidence-aware process using cycle-consistency to consider only confident regions during translation. Experimental results show that our MIDMs generate more plausible images than state-of-the-art methods.

    DäRF: Boosting Radiance Fields from Sparse Inputs with Monocular Depth Adaptation

    Neural radiance fields (NeRF) show powerful performance in novel view synthesis and 3D geometry reconstruction, but they suffer from critical performance degradation when the number of known viewpoints is drastically reduced. Existing works attempt to overcome this problem by employing external priors, but their success is limited to certain types of scenes or datasets. Employing monocular depth estimation (MDE) networks, pretrained on large-scale RGB-D datasets and with powerful generalization capability, would be a key to solving this problem; however, using MDE in conjunction with NeRF comes with a new set of challenges due to various ambiguity problems exhibited by monocular depths. In this light, we propose a novel framework, dubbed DäRF, that achieves robust NeRF reconstruction with a handful of real-world images by combining the strengths of NeRF and monocular depth estimation through online complementary training. Our framework imposes the MDE network's powerful geometry prior on the NeRF representation at both seen and unseen viewpoints to enhance its robustness and coherence. In addition, we overcome the ambiguity problems of monocular depths through patch-wise scale-shift fitting and geometry distillation, which adapts the MDE network to produce depths aligned accurately with NeRF geometry. Experiments show our framework achieves state-of-the-art results both quantitatively and qualitatively, demonstrating consistent and reliable performance in both indoor and outdoor real-world datasets. Project page is available at https://ku-cvlab.github.io/DaRF/.
    Comment: Project Page: https://ku-cvlab.github.io/DaRF
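
The "patch-wise scale-shift fitting" mentioned in the abstract above can be illustrated with a small sketch. It assumes an ordinary least-squares fit of one scale and one shift per patch, mapping monocular depths onto reference depths (for example, depths rendered by NeRF). The abstract does not give the exact formulation, so this is a generic illustration, not the authors' method; `fit_scale_shift` and `align_patch` are hypothetical names.

```python
def fit_scale_shift(mono, target):
    """Least-squares (s, t) minimizing sum((s * m + t - d) ** 2) over one patch.

    `mono` holds monocular depth values and `target` the reference depths
    for the same pixels; both are flat lists of floats (an assumed layout).
    """
    n = len(mono)
    if n != len(target) or n < 2:
        raise ValueError("need two equally sized patches with at least 2 pixels")
    mean_m = sum(mono) / n
    mean_d = sum(target) / n
    var_m = sum((m - mean_m) ** 2 for m in mono)
    if var_m == 0.0:
        # A constant patch carries no scale information; only a shift is identifiable.
        return 0.0, mean_d
    cov = sum((m - mean_m) * (d - mean_d) for m, d in zip(mono, target))
    scale = cov / var_m
    shift = mean_d - scale * mean_m
    return scale, shift


def align_patch(mono, target):
    """Apply the fitted scale and shift to the monocular patch."""
    scale, shift = fit_scale_shift(mono, target)
    return [scale * m + shift for m in mono]


# Toy patch: the reference depths are exactly 2 * mono + 0.5.
mono_patch = [1.0, 2.0, 3.0, 4.0]
ref_patch = [2.5, 4.5, 6.5, 8.5]
s, t = fit_scale_shift(mono_patch, ref_patch)
aligned = align_patch(mono_patch, ref_patch)
```

Fitting per patch rather than per image lets the scale and shift vary across the scene, which is one plausible way to cope with the depth ambiguities the abstract describes.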

    LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data

    Existing techniques for image-to-image translation have commonly suffered from two critical problems: heavy reliance on per-sample domain annotation and/or an inability to handle multiple attributes per image. Recent truly-unsupervised methods adopt clustering approaches to easily provide per-sample one-hot domain labels. However, they cannot account for the real-world setting in which one sample may have multiple attributes. In addition, the semantics of the clusters are not easily coupled to human understanding. To overcome these problems, we present a LANguage-driven Image-to-image Translation model, dubbed LANIT. We leverage easy-to-obtain candidate attributes given in text for a dataset: the similarity between images and attributes indicates per-sample domain labels. This formulation naturally enables multi-hot labels, so that users can specify the target domain with a set of attributes in language. To account for the case in which the initial prompts are inaccurate, we also present prompt learning. We further present a domain regularization loss that enforces that translated images be mapped to the corresponding domain. Experiments on several standard benchmarks demonstrate that LANIT achieves comparable or superior performance to existing models.
    Comment: Accepted to CVPR 2023. Project Page: https://ku-cvlab.github.io/LANIT

    DiffFace: Diffusion-based Face Swapping with Facial Guidance

    In this paper, we propose a diffusion-based face swapping framework for the first time, called DiffFace, composed of training an ID conditional DDPM, sampling with facial guidance, and target-preserving blending. Specifically, in the training process, the ID conditional DDPM is trained to generate face images with the desired identity. In the sampling process, we use off-the-shelf facial expert models to make the model transfer the source identity while faithfully preserving target attributes. During this process, to preserve the background of the target image and obtain the desired face swapping result, we additionally propose a target-preserving blending strategy. It helps our model to keep the attributes of the target face from noise while transferring the source facial identity. In addition, without any re-training, our model can flexibly apply additional facial guidance and adaptively control the ID-attributes trade-off to achieve the desired results. To the best of our knowledge, this is the first approach that applies the diffusion model to the face swapping task. Compared with previous GAN-based approaches, by taking advantage of the diffusion model for the face swapping task, DiffFace offers benefits such as training stability, high fidelity, diversity of the samples, and controllability. Extensive experiments show that our DiffFace is comparable or superior to the state-of-the-art methods on several standard face swapping benchmarks.
    Comment: Project Page: https://hxngiee.github.io/DiffFac

    Piezo-Transmissive Structure Using a Multi-layered Heterogeneous Film for Optical Transmittance Modulation

    As the damage caused by the recent climate crisis increases, efforts are being made worldwide to develop low-power and high-efficiency technologies that reduce the pollution caused by energy production. Among them, mechano-responsive optical transmittance modulation technology is being actively researched, as it can be applied to various fields for reducing energy consumption, such as low-power sensors and smart windows. The piezo-transmittance structure, which is one of the optical transmittance modulation structures, has fewer constraints on the installation environment; therefore, many applications have been proposed. However, it is still challenging to fabricate a piezo-transmittance structure with large-area production, high throughput, and good tunability because of complex curing and dissolution processes. Herein, we present an efficient fabrication method for a multi-layered piezo-transmittance structure using a large-area abrasive mold and a thermal imprinting process. The piezo-transmittance performance (e.g., sensitivity and relative change of transmittance) shows temperature/humidity-independent characteristics and can be designed by tuning design parameters such as the number of layers, abrasive grade, and film material. Also, the surrogate model of the performance obtained from the Monte Carlo simulation and prediction model can offer tunability for various applications. Finally, we demonstrated two energy-efficient applications: the smart window integrated with a hydraulic pump showed high thermal efficiency in indoor environment control, and the telemetry system was demonstrated to measure pressure remotely.