41 research outputs found

    Taming Reversible Halftoning via Predictive Luminance

    Traditional halftoning usually drops colors when dithering images with binary dots, which makes it difficult to recover the original color information. We proposed a novel halftoning technique that converts a color image into a binary halftone with full restorability to its original version. Our novel base halftoning technique consists of two convolutional neural networks (CNNs) to produce the reversible halftone patterns, and a noise incentive block (NIB) to mitigate the flatness degradation issue of CNNs. Furthermore, to tackle the conflict between blue-noise quality and restoration accuracy in our novel base method, we proposed a predictor-embedded approach to offload predictable information from the network, which in our case is the luminance information that can be recovered from the halftone pattern. Such an approach gives the network more flexibility to produce halftones with better blue-noise quality without compromising the restoration quality. Detailed studies on the multiple-stage training method and loss weightings have been conducted. We have compared our predictor-embedded method and our novel base method in terms of spectrum analysis of the halftone, halftone accuracy, restoration accuracy, and data embedding studies. Our entropy evaluation shows that our halftone contains less encoded information than that of our novel base method. The experiments show that our predictor-embedded method gains more flexibility to improve the blue-noise quality of halftones and maintains a comparable restoration quality with a higher tolerance for disturbances.
    Comment: to be published in IEEE Transactions on Visualization and Computer Graphics
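
    To illustrate the opening claim that traditional halftoning drops color, here is a minimal sketch of classic Floyd-Steinberg error diffusion applied to the luminance of an RGB image. This is not the paper's CNN-based method; the function names and the Rec. 601 luminance weights are assumptions chosen for illustration.

```python
def luminance(rgb):
    """Rec. 601 luma of an (r, g, b) triple with channels in [0, 1] (assumed weighting)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def floyd_steinberg(gray):
    """Binarize a grayscale image (list of rows of floats in [0, 1]) by error diffusion."""
    h, w = len(gray), len(gray[0])
    work = [row[:] for row in gray]  # working copy; the input is left untouched
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = work[y][x]
            new = 1 if old >= 0.5 else 0
            out[y][x] = new
            err = old - new
            # Push the quantization error onto unvisited neighbours (7, 3, 5, 1 sixteenths).
            if x + 1 < w:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    work[y + 1][x + 1] += err * 1 / 16
    return out


def halftone_from_rgb(image):
    """Hypothetical helper: reduce RGB to luminance, then dither to binary dots."""
    return floyd_steinberg([[luminance(px) for px in row] for row in image])
```

    Because only luminance reaches the dithering step, a flat red image and a flat gray image of equal luminance produce the same dot pattern, so the color cannot be recovered from the halftone alone.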

    Sketch Video Synthesis

    Understanding semantic intricacies and high-level concepts is essential in image sketch generation, and this challenge becomes even more formidable when applied to the domain of videos. To address this, we propose a novel optimization-based framework for sketching videos represented by frame-wise Bézier curves. In detail, we first propose a cross-frame stroke initialization approach to warm up the location and the width of each curve. Then, we optimize the locations of these curves by utilizing a semantic loss based on CLIP features and a newly designed consistency loss using the self-decomposed 2D atlas network. Built upon these design elements, the resulting sketch video showcases impressive visual abstraction and temporal coherence. Furthermore, by transforming a video into SVG lines through the sketching process, our method unlocks applications in sketch-based video editing and video doodling, enabled through video composition, as exemplified in the teaser.
    Comment: Webpage: https://sketchvideo.github.io/ Github: https://github.com/yudianzheng/SketchVide

    VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

    Text-to-video generation aims to produce a video based on a given prompt. Recently, several commercial video models have been able to generate plausible videos with minimal noise, excellent details, and high aesthetic scores. However, these models rely on large-scale, well-filtered, high-quality videos that are not accessible to the community. Many existing research works, which train models using the low-quality WebVid-10M dataset, struggle to generate high-quality videos because the models are optimized to fit WebVid-10M. In this work, we explore the training scheme of video models extended from Stable Diffusion and investigate the feasibility of leveraging low-quality videos and synthesized high-quality images to obtain a high-quality video model. We first analyze the connection between the spatial and temporal modules of video models and the distribution shift to low-quality videos. We observe that full training of all modules results in a stronger coupling between spatial and temporal modules than only training temporal modules. Based on this stronger coupling, we shift the distribution to higher quality without motion degradation by finetuning spatial modules with high-quality images, resulting in a generic high-quality video model. Evaluations are conducted to demonstrate the superiority of the proposed method, particularly in picture quality, motion, and concept composition.
    Comment: Homepage: https://ailab-cvc.github.io/videocrafter; Github: https://github.com/AILab-CVC/VideoCrafte

    FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling

    With the availability of large-scale video datasets and advances in diffusion models, text-driven video generation has achieved substantial progress. However, existing video generation models are typically trained on a limited number of frames, resulting in the inability to generate high-fidelity long videos during inference. Furthermore, these models only support single-text conditions, whereas real-life scenarios often require multi-text conditions as the video content changes over time. To tackle these challenges, this study explores the potential of extending the text-driven capability to generate longer videos conditioned on multiple texts. 1) We first analyze the impact of initial noise in video diffusion models. Then, building upon this observation, we propose FreeNoise, a tuning-free and time-efficient paradigm to enhance the generative capabilities of pretrained video diffusion models while preserving content consistency. Specifically, instead of initializing noises for all frames, we reschedule a sequence of noises for long-range correlation and perform temporal attention over them through a window-based function. 2) Additionally, we design a novel motion injection method to support the generation of videos conditioned on multiple text prompts. Extensive experiments validate the superiority of our paradigm in extending the generative capabilities of video diffusion models. It is noteworthy that, compared with the previous best-performing method, which incurred about 255% extra time cost, our method incurs only a negligible extra time cost of approximately 17%. Generated video samples are available at our website: http://haonanqiu.com/projects/FreeNoise.html.
    Comment: Project Page: http://haonanqiu.com/projects/FreeNoise.html Code Repo: https://github.com/arthur-qiu/LongerCrafte
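
    The abstract does not spell out how the noise sequence is rescheduled. As a rough sketch of one possible reading, the snippet below samples noise for a short window of frames once and fills the remaining frames with shuffled copies of that window, so distant frames share the same noise values. The function name, the shuffling strategy, and the use of scalar noise per frame are all assumptions for illustration, not the authors' implementation.

```python
import random


def reschedule_noise(base_noise, total_frames, seed=0):
    """Extend a window of per-frame noise to total_frames by appending shuffled copies.

    base_noise: list with one noise sample per frame of the original window.
    Returns a list of length total_frames whose first len(base_noise) entries
    are base_noise itself (hypothetical scheme, see lead-in).
    """
    rng = random.Random(seed)
    schedule = list(base_noise)
    while len(schedule) < total_frames:
        block = list(base_noise)
        rng.shuffle(block)  # reuse the same noise values in a new frame order
        schedule.extend(block)
    return schedule[:total_frames]
```

    For example, with a 4-frame window and `total_frames=10`, frames 0 to 3 keep the original noise, frames 4 to 7 are a permutation of it, and frames 8 to 9 take the first two entries of another permutation.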

    The 2-Aminoethoxydiphenyl Borate Analog DPB161 Blocks Store-operated Ca2+ Entry in Acutely Dissociated Rat Submandibular Cells

    Cellular Ca2+ signals play a critical role in cell physiology and pathology. In most non-excitable cells, store-operated Ca2+ entry (SOCE) is an important mechanism by which intracellular Ca2+ signaling is regulated. However, few drugs can selectively modulate SOCE. 2-Aminoethoxydiphenyl borate (2-APB) and its analogs (DPB162 and DPB163) have been reported to inhibit SOCE. Here, we examined the effects of another 2-APB analog, DPB161, on SOCE in acutely isolated rat submandibular cells. Both patch-clamp recordings and Ca2+ imaging showed that upon removal of extracellular Ca2+ ([Ca2+]o = 0), rat submandibular cells were unable to maintain ACh-induced Ca2+ oscillations, but restoration of [Ca2+]o to refill Ca2+ stores enabled recovery of these Ca2+ oscillations. However, addition of 50 μM DPB161 together with [Ca2+]o to the extracellular solution prevented the refilling of Ca2+ stores. Fura-2 Ca2+ imaging showed that DPB161 inhibited SOCE in a concentration-dependent manner. After depleting Ca2+ stores by thapsigargin treatment, bath perfusion of 1 mM Ca2+ induced [Ca2+]i elevation in a manner that was prevented by DPB161. Collectively, these results show that the 2-APB analog DPB161 blocks SOCE in rat submandibular cells, suggesting that this compound can be developed as a pharmacological tool for the study of SOCE function and as a new therapeutic agent for treating SOCE-associated disorders.

    StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

    Text-to-video (T2V) models have shown remarkable capabilities in generating diverse videos. However, they struggle to produce user-desired stylized videos due to (i) text's inherent clumsiness in expressing specific styles and (ii) the generally degraded style fidelity. To address these challenges, we introduce StyleCrafter, a generic method that enhances pre-trained T2V models with a style control adapter, enabling video generation in any style by providing a reference image. Considering the scarcity of stylized video datasets, we propose to first train a style control adapter using style-rich image datasets, then transfer the learned stylization ability to video generation through a tailor-made finetuning paradigm. To promote content-style disentanglement, we remove style descriptions from the text prompt and extract style information solely from the reference image using a decoupling learning strategy. Additionally, we design a scale-adaptive fusion module to balance the influences of text-based content features and image-based style features, which helps generalization across various text and style combinations. StyleCrafter efficiently generates high-quality stylized videos that align with the content of the texts and resemble the style of the reference images. Experiments demonstrate that our approach is more flexible and efficient than existing competitors.
    Comment: Project page: https://gongyeliu.github.io/StyleCrafter.github.io/ ; GitHub repository: https://github.com/GongyeLiu/StyleCrafte

    VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

    We present VideoReTalking, a new system to edit the faces of a real-world talking head video according to input audio, producing a high-quality, lip-synced output video even with a different emotion. Our system disentangles this objective into three sequential tasks: (1) face video generation with a canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for improving photo-realism. Given a talking-head video, we first modify the expression of each frame according to the same expression template using the expression editing network, resulting in a video with the canonical expression. This video, together with the given audio, is then fed into the lip-sync network to generate a lip-synced video. Finally, we improve the photo-realism of the synthesized faces through an identity-aware face enhancement network and post-processing. We use learning-based approaches for all three steps, and all our modules can be run in a sequential pipeline without any user intervention. Furthermore, our system is a generic approach that does not need to be retrained for a specific person. Evaluations on two widely used datasets and in-the-wild examples demonstrate the superiority of our framework over other state-of-the-art methods in terms of lip-sync accuracy and visual quality.
    Comment: Accepted by SIGGRAPH Asia 2022 Conference Proceedings. Project page: https://vinthony.github.io/video-retalking