234 research outputs found

    CLIPtone: Unsupervised Learning for Text-based Image Tone Adjustment

    Full text link
    Recent image tone adjustment (or enhancement) approaches have predominantly adopted supervised learning for learning human-centric perceptual assessment. However, these approaches are constrained by intrinsic challenges of supervised learning. Primarily, the requirement for expertly-curated or retouched images escalates the data acquisition expenses. Moreover, their coverage of target style is confined to stylistic variants inferred from the training data. To surmount the above challenges, we propose an unsupervised learning-based approach for text-based image tone adjustment method, CLIPtone, that extends an existing image enhancement method to accommodate natural language descriptions. Specifically, we design a hyper-network to adaptively modulate the pretrained parameters of the backbone model based on text description. To assess whether the adjusted image aligns with the text description without ground truth image, we utilize CLIP, which is trained on a vast set of language-image pairs and thus encompasses knowledge of human perception. The major advantages of our approach are three fold: (i) minimal data collection expenses, (ii) support for a range of adjustments, and (iii) the ability to handle novel text descriptions unseen in training. Our approach's efficacy is demonstrated through comprehensive experiments, including a user study

    Task-Oriented Diffusion Model Compression

    Full text link
    As recent advancements in large-scale Text-to-Image (T2I) diffusion models have yielded remarkable high-quality image generation, diverse downstream Image-to-Image (I2I) applications have emerged. Despite the impressive results achieved by these I2I models, their practical utility is hampered by their large model size and the computational burden of the iterative denoising process. In this paper, we explore the compression potential of these I2I models in a task-oriented manner and introduce a novel method for reducing both model size and the number of timesteps. Through extensive experiments, we observe key insights and use our empirical knowledge to develop practical solutions that aim for near-optimal results with minimal exploration costs. We validate the effectiveness of our method by applying it to InstructPix2Pix for image editing and StableSR for image restoration. Our approach achieves satisfactory output quality with 39.2% and 56.4% reduction in model footprint and 81.4% and 68.7% decrease in latency to InstructPix2Pix and StableSR, respectively

    Discriminative Indexing for Probabilistic Image Patch Priors

    Get PDF
    Abstract. Newly emerged probabilistic image patch priors, such as Expected Patch Log-Likelihood (EPLL), have shown excellent performance on image restoration tasks, especially deconvolution, due to its rich expressiveness. However, its applicability is limited by the heavy computation involved in the associated optimization process. Inspired by the recent advances on using regression trees to index priors defined on a Conditional Random Field, we propose a novel discriminative indexing approach on patch-based priors to expedite the optimization process. Specifically, we propose an efficient tree indexing structure for EPLL, and overcome its training tractability challenges in high-dimensional spaces by utilizing special structures of the prior. Experimental results show that our approach accelerates state-of-the-art EPLL-based deconvolution methods by up to 40 times, with very little quality compromise.

    Neural Spectro-polarimetric Fields

    Full text link
    Modeling the spatial radiance distribution of light rays in a scene has been extensively explored for applications, including view synthesis. Spectrum and polarization, the wave properties of light, are often neglected due to their integration into three RGB spectral bands and their non-perceptibility to human vision. Despite this, these properties encompass substantial material and geometric information about a scene. In this work, we propose to model spectro-polarimetric fields, the spatial Stokes-vector distribution of any light ray at an arbitrary wavelength. We present Neural Spectro-polarimetric Fields (NeSpoF), a neural representation that models the physically-valid Stokes vector at given continuous variables of position, direction, and wavelength. NeSpoF manages inherently noisy raw measurements, showcases memory efficiency, and preserves physically vital signals, factors that are crucial for representing the high-dimensional signal of a spectro-polarimetric field. To validate NeSpoF, we introduce the first multi-view hyperspectral-polarimetric image dataset, comprised of both synthetic and real-world scenes. These were captured using our compact hyperspectral-polarimetric imaging system, which has been calibrated for robustness against system imperfections. We demonstrate the capabilities of NeSpoF on diverse scenes

    UGPNet: Universal Generative Prior for Image Restoration

    Full text link
    Recent image restoration methods can be broadly categorized into two classes: (1) regression methods that recover the rough structure of the original image without synthesizing high-frequency details and (2) generative methods that synthesize perceptually-realistic high-frequency details even though the resulting image deviates from the original structure of the input. While both directions have been extensively studied in isolation, merging their benefits with a single framework has been rarely studied. In this paper, we propose UGPNet, a universal image restoration framework that can effectively achieve the benefits of both approaches by simply adopting a pair of an existing regression model and a generative model. UGPNet first restores the image structure of a degraded input using a regression model and synthesizes a perceptually-realistic image with a generative model on top of the regressed output. UGPNet then combines the regressed output and the synthesized output, resulting in a final result that faithfully reconstructs the structure of the original image in addition to perceptually-realistic textures. Our extensive experiments on deblurring, denoising, and super-resolution demonstrate that UGPNet can successfully exploit both regression and generative methods for high-fidelity image restoration.Comment: Accepted to WACV 202

    ExBluRF: Efficient Radiance Fields for Extreme Motion Blurred Images

    Full text link
    We present ExBluRF, a novel view synthesis method for extreme motion blurred images based on efficient radiance fields optimization. Our approach consists of two main components: 6-DOF camera trajectory-based motion blur formulation and voxel-based radiance fields. From extremely blurred images, we optimize the sharp radiance fields by jointly estimating the camera trajectories that generate the blurry images. In training, multiple rays along the camera trajectory are accumulated to reconstruct single blurry color, which is equivalent to the physical motion blur operation. We minimize the photo-consistency loss on blurred image space and obtain the sharp radiance fields with camera trajectories that explain the blur of all images. The joint optimization on the blurred image space demands painfully increasing computation and resources proportional to the blur size. Our method solves this problem by replacing the MLP-based framework to low-dimensional 6-DOF camera poses and voxel-based radiance fields. Compared with the existing works, our approach restores much sharper 3D scenes from challenging motion blurred views with the order of 10 times less training time and GPU memory consumption.Comment: https://github.com/taekkii/ExBluRF/tree/mai

    Video deblurring for hand-held cameras using patch-based synthesis

    Get PDF

    Digestive neural networks:A novel defense strategy against inference attacks in federated learning

    Get PDF
    Federated Learning (FL) is an efficient and secure machine learning technique designed for decentralized computing systems such as fog and edge computing. Its learning process employs frequent communications as the participating local devices send updates, either gradients or parameters of their models, to a central server that aggregates them and redistributes new weights to the devices. In FL, private data does not leave the individual local devices, and thus, rendered as a robust solution in terms of privacy preservation. However, the recently introduced membership inference attacks pose a critical threat to the impeccability of FL mechanisms. By eavesdropping only on the updates transferring to the center server, these attacks can recover the private data of a local device. A prevalent solution against such attacks is the differential privacy scheme that augments a sufficient amount of noise to each update to hinder the recovering process. However, it suffers from a significant sacrifice in the classification accuracy of the FL. To effectively alleviate the problem, this paper proposes a Digestive Neural Network (DNN), an independent neural network attached to the FL. The private data owned by each device will pass through the DNN and then train the FL. The DNN modifies the input data, which results in distorting updates, in a way to maximize the classification accuracy of FL while the accuracy of inference attacks is minimized. Our simulation result shows that the proposed DNN shows significant performance on both gradient sharing- and weight sharing-based FL mechanisms. For the gradient sharing, the DNN achieved higher classification accuracy by 16.17% while 9% lower attack accuracy than the existing differential privacy schemes. For the weight sharing FL scheme, the DNN achieved at most 46.68% lower attack success rate with 3% higher classification accuracy
    corecore