Taming Reversible Halftoning via Predictive Luminance
Traditional halftoning usually drops colors when dithering images with binary
dots, which makes it difficult to recover the original color information. We
propose a novel halftoning technique that converts a color image into a binary
halftone that is fully restorable to its original color version. Our novel base
halftoning technique consists of two convolutional neural networks (CNNs) to
produce the reversible halftone patterns, and a noise incentive block (NIB) to
mitigate the flatness degradation issue of CNNs. Furthermore, to tackle the
conflicts between the blue-noise quality and restoration accuracy in our novel
base method, we propose a predictor-embedded approach to offload predictable
information from the network, which in our case is the luminance information
that can be inferred from the halftone pattern. Such an approach allows the network to
gain more flexibility to produce halftones with better blue-noise quality
without compromising the restoration quality. We conducted detailed studies on
the multi-stage training method and the loss weightings, and compared our
predictor-embedded method with our novel base method in terms of halftone
spectrum analysis, halftone accuracy, restoration accuracy, and data embedding.
Our entropy evaluation shows that our halftone contains less encoded
information than that of our novel base method. The experiments show that our
predictor-embedded method gains more flexibility to improve the blue-noise
quality of the halftones while maintaining comparable restoration quality with
a higher tolerance for disturbances.
Comment: to be published in IEEE Transactions on Visualization and Computer Graphics
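
To make the predictor-embedded idea above concrete, here is a minimal PyTorch sketch: an encoder turns a color image plus a noise map into a soft halftone, a small predictor estimates the luminance from that halftone, and a decoder restores the color image. All module names, layer sizes, and losses are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only; module names and losses are placeholders, not the paper's code.
import torch
import torch.nn as nn

class ConvBlock(nn.Sequential):
    def __init__(self, cin, cout):
        super().__init__(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class HalftoneNet(nn.Module):
    """Color image + noise map -> soft binary halftone."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(ConvBlock(4, 32), ConvBlock(32, 32),
                                  nn.Conv2d(32, 1, 3, padding=1))
    def forward(self, rgb, noise):
        # Noise incentive: a noise map is concatenated so the CNN avoids flat outputs.
        return torch.sigmoid(self.body(torch.cat([rgb, noise], dim=1)))

class LumPredictor(nn.Module):
    """Offloads the predictable luminance channel from the halftone."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(ConvBlock(1, 16), nn.Conv2d(16, 1, 3, padding=1))
    def forward(self, halftone):
        return torch.sigmoid(self.body(halftone))

class RestoreNet(nn.Module):
    """Halftone + predicted luminance -> restored color image."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(ConvBlock(2, 32), ConvBlock(32, 32),
                                  nn.Conv2d(32, 3, 3, padding=1))
    def forward(self, halftone, lum):
        return torch.sigmoid(self.body(torch.cat([halftone, lum], dim=1)))

encode, predict, decode = HalftoneNet(), LumPredictor(), RestoreNet()
rgb = torch.rand(1, 3, 64, 64)                    # toy color image
noise = torch.randn(1, 1, 64, 64)                 # noise incentive input

halftone = encode(rgb, noise)
lum_hat = predict(halftone)                       # predicted luminance
restored = decode(halftone, lum_hat)
lum_gt = (rgb * torch.tensor([0.299, 0.587, 0.114]).view(1, 3, 1, 1)).sum(1, keepdim=True)
loss = nn.functional.mse_loss(restored, rgb) + nn.functional.mse_loss(lum_hat, lum_gt)
loss.backward()                                   # one joint step; blue-noise terms omitted
```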
Sketch Video Synthesis
Understanding semantic intricacies and high-level concepts is essential in
image sketch generation, and this challenge becomes even more formidable when
applied to the domain of videos. To address this, we propose a novel
optimization-based framework for sketching videos represented by frame-wise
Bézier curves. In detail, we first propose a cross-frame stroke initialization
approach to warm up the location and the width of each curve. Then, we optimize
the locations of these curves by utilizing a semantic loss based on CLIP
features and a newly designed consistency loss using the self-decomposed 2D
atlas network. Built upon these design elements, the resulting sketch video
showcases impressive visual abstraction and temporal coherence. Furthermore, by
transforming a video into SVG lines through the sketching process, our method
unlocks applications in sketch-based video editing and video doodling, enabled
through video composition, as exemplified in the teaser.
Comment: Webpage: https://sketchvideo.github.io/ Github:
https://github.com/yudianzheng/SketchVide
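
The optimization structure above can be sketched roughly as follows: frame-wise cubic Bézier control points, initialized identically across frames, are optimized against a semantic term and a cross-frame consistency term. The differentiable rasterizer, CLIP features, and atlas network of the paper are replaced by simple stand-ins here, so this only illustrates the parameterization and the loop.

```python
# Rough sketch; the semantic and consistency terms are stand-ins, not the paper's losses.
import torch

def cubic_bezier(ctrl, n=32):
    # ctrl: (..., 4, 2) control points -> (..., n, 2) sampled curve points
    t = torch.linspace(0, 1, n, device=ctrl.device).view(*([1] * (ctrl.dim() - 2)), n, 1)
    p0, p1, p2, p3 = (p.unsqueeze(-2) for p in ctrl.unbind(-2))
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

frames, strokes = 8, 16
# Cross-frame initialization: every frame starts from the same warmed-up strokes.
init = torch.rand(1, strokes, 4, 2).repeat(frames, 1, 1, 1)
ctrl = init.clone().requires_grad_(True)
opt = torch.optim.Adam([ctrl], lr=1e-2)

target = torch.rand(frames, strokes, 32, 2)          # stand-in for CLIP-feature guidance

for step in range(200):
    pts = cubic_bezier(ctrl)                          # (frames, strokes, 32, 2)
    semantic = (pts - target).pow(2).mean()           # placeholder for the CLIP semantic loss
    consistency = (pts[1:] - pts[:-1]).pow(2).mean()  # placeholder for the atlas consistency loss
    loss = semantic + 0.1 * consistency
    opt.zero_grad(); loss.backward(); opt.step()
```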
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Text-to-video generation aims to produce a video based on a given prompt.
Recently, several commercial video models have been able to generate plausible
videos with minimal noise, excellent details, and high aesthetic scores.
However, these models rely on large-scale, well-filtered, high-quality videos
that are not accessible to the community. Many existing research works, which
train models using the low-quality WebVid-10M dataset, struggle to generate
high-quality videos because the models are optimized to fit WebVid-10M. In this
work, we explore the training scheme of video models extended from Stable
Diffusion and investigate the feasibility of leveraging low-quality videos and
synthesized high-quality images to obtain a high-quality video model. We first
analyze the connection between the spatial and temporal modules of video models
and the distribution shift to low-quality videos. We observe that fully
training all modules results in a stronger coupling between the spatial and
temporal modules than training only the temporal modules. Based on this stronger coupling,
we shift the distribution to higher quality without motion degradation by
finetuning spatial modules with high-quality images, resulting in a generic
high-quality video model. Evaluations are conducted to demonstrate the
superiority of the proposed method, particularly in picture quality, motion,
and concept composition.
Comment: Homepage: https://ailab-cvc.github.io/videocrafter; Github:
https://github.com/AILab-CVC/VideoCrafte
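
A minimal sketch of the finetuning recipe described above, assuming a toy video UNet with separate spatial and temporal blocks: the temporal modules are frozen and only the spatial modules are updated on high-quality images treated as single-frame videos. The layers and the loss are placeholders rather than VideoCrafter2's actual architecture.

```python
# Toy layers and loss; only the freeze-temporal / train-spatial recipe is the point here.
import torch
import torch.nn as nn

class SpatialBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.conv = nn.Conv2d(c, c, 3, padding=1)
    def forward(self, x):                       # x: (B*T, C, H, W)
        return self.conv(x)

class TemporalBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.conv = nn.Conv1d(c, c, 3, padding=1)
    def forward(self, x, b, t):                 # mix information along the frame axis
        bt, c, h, w = x.shape
        y = x.reshape(b, t, c, h * w).permute(0, 3, 2, 1).reshape(b * h * w, c, t)
        y = self.conv(y)
        return y.reshape(b, h * w, c, t).permute(0, 3, 2, 1).reshape(bt, c, h, w)

class ToyVideoUNet(nn.Module):
    def __init__(self, c=8):
        super().__init__()
        self.spatial, self.temporal = SpatialBlock(c), TemporalBlock(c)
    def forward(self, video):                   # video: (B, T, C, H, W)
        b, t, c, h, w = video.shape
        x = self.spatial(video.reshape(b * t, c, h, w))
        x = self.temporal(x, b, t)
        return x.reshape(b, t, c, h, w)

model = ToyVideoUNet()
for p in model.temporal.parameters():           # freeze temporal modules to keep motion intact
    p.requires_grad_(False)
opt = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=1e-4)

# High-quality images are fed as single-frame "videos" (T=1) to shift the distribution.
images = torch.rand(4, 1, 8, 32, 32)
pred = model(images)
loss = nn.functional.mse_loss(pred, images)     # placeholder for the actual diffusion loss
opt.zero_grad(); loss.backward(); opt.step()
```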
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling
With the availability of large-scale video datasets and the advances of
diffusion models, text-driven video generation has achieved substantial
progress. However, existing video generation models are typically trained on a
limited number of frames, resulting in the inability to generate high-fidelity
long videos during inference. Furthermore, these models only support
single-text conditions, whereas real-life scenarios often require multi-text
conditions as the video content changes over time. To tackle these challenges,
this study explores the potential of extending the text-driven capability to
generate longer videos conditioned on multiple texts. 1) We first analyze the
impact of initial noise in video diffusion models. Building upon this
observation, we propose FreeNoise, a tuning-free and time-efficient
paradigm to enhance the generative capabilities of pretrained video diffusion
models while preserving content consistency. Specifically, instead of
initializing noises for all frames, we reschedule a sequence of noises for
long-range correlation and perform temporal attention over them via a
window-based function. 2) Additionally, we design a novel motion injection method to support
the generation of videos conditioned on multiple text prompts. Extensive
experiments validate the superiority of our paradigm in extending the
generative capabilities of video diffusion models. Notably, whereas the
previous best-performing method incurred about 255% extra time cost, our
method incurs only a negligible time cost of approximately 17%. Generated
video samples are available at our website:
http://haonanqiu.com/projects/FreeNoise.html.
Comment: Project Page: http://haonanqiu.com/projects/FreeNoise.html Code Repo:
https://github.com/arthur-qiu/LongerCrafte
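
A hedged sketch of the two ingredients named above: noise rescheduling, in which the per-frame noises of the trained clip length are reused (locally shuffled) so that distant frames stay correlated, and window-based fusion of temporal attention outputs over overlapping frame windows. Shapes, the shuffling rule, and the dummy attention function are assumptions, not the released FreeNoise code.

```python
# Assumed shapes and a dummy attention function; only the rescheduling/fusion idea is shown.
import torch

def reschedule_noise(base_window, total_frames):
    # base_window: (F, C, H, W) independent noises for the trained clip length F
    f = base_window.shape[0]
    chunks = [base_window]
    while sum(c.shape[0] for c in chunks) < total_frames:
        chunks.append(base_window[torch.randperm(f)])  # reuse the same noises in shuffled order
    return torch.cat(chunks, dim=0)[:total_frames]

def windowed_temporal_attention(frames, attn_fn, window=16, stride=8):
    # Run attn_fn on sliding windows and average the overlapping outputs.
    t = frames.shape[0]
    out = torch.zeros_like(frames)
    count = torch.zeros(t, *(1,) * (frames.dim() - 1))
    for start in range(0, max(t - window, 0) + 1, stride):
        sl = slice(start, start + window)
        out[sl] += attn_fn(frames[sl])
        count[sl] += 1
    return out / count.clamp(min=1)

base = torch.randn(16, 4, 32, 32)                  # noise for the trained 16-frame window
long_noise = reschedule_noise(base, total_frames=64)
dummy_attn = lambda x: x                           # stand-in for the model's temporal attention
fused = windowed_temporal_attention(long_noise, dummy_attn)
print(long_noise.shape, fused.shape)               # torch.Size([64, 4, 32, 32]) for both
```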
The 2-Aminoethoxydiphenyl Borate Analog DPB161 Blocks Store-Operated Ca2+ Entry in Acutely Dissociated Rat Submandibular Cells
Cellular Ca2+ signals play a critical role in cell physiology and pathology. In
most non-excitable cells, store-operated Ca2+ entry (SOCE) is an important
mechanism by which intracellular Ca2+ signaling is regulated. However, few drugs
can selectively modulate SOCE. 2-Aminoethoxydiphenyl borate (2-APB) and its
analogs (DPB162 and DPB163) have been reported to inhibit SOCE. Here, we
examined the effects of another 2-APB analog, DPB161, on SOCE in acutely
isolated rat submandibular cells. Both patch-clamp recordings and Ca2+ imaging
showed that upon removal of extracellular Ca2+ ([Ca2+]o = 0), rat submandibular
cells were unable to maintain ACh-induced Ca2+ oscillations, but restoration of
[Ca2+]o to refill the Ca2+ stores enabled recovery of these oscillations.
However, addition of 50 μM DPB161 together with Ca2+ to the extracellular
solution prevented the refilling of the Ca2+ stores. Fura-2 Ca2+ imaging showed
that DPB161 inhibited SOCE in a concentration-dependent manner. After depleting
Ca2+ stores by thapsigargin treatment, bath perfusion of 1 mM Ca2+ induced an
elevation of [Ca2+]i that was prevented by DPB161. Collectively, these results
show that the 2-APB analog DPB161 blocks SOCE in rat submandibular cells,
suggesting that this compound can be developed as a pharmacological tool for
the study of SOCE function and as a new therapeutic agent for treating
SOCE-associated disorders.
StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter
Text-to-video (T2V) models have shown remarkable capabilities in generating
diverse videos. However, they struggle to produce user-desired stylized videos
due to (i) text's inherent clumsiness in expressing specific styles and (ii)
the generally degraded style fidelity. To address these challenges, we
introduce StyleCrafter, a generic method that enhances pre-trained T2V models
with a style control adapter, enabling video generation in any style by
providing a reference image. Considering the scarcity of stylized video
datasets, we propose to first train a style control adapter using style-rich
image datasets, then transfer the learned stylization ability to video
generation through a tailor-made finetuning paradigm. To promote content-style
disentanglement, we remove style descriptions from the text prompt and extract
style information solely from the reference image using a decoupling learning
strategy. Additionally, we design a scale-adaptive fusion module to balance the
influences of text-based content features and image-based style features, which
helps generalization across various text and style combinations. StyleCrafter
efficiently generates high-quality stylized videos that align with the content
of the texts and resemble the style of the reference images. Experiments
demonstrate that our approach is more flexible and efficient than existing
competitors.
Comment: Project page: https://gongyeliu.github.io/StyleCrafter.github.io/ ;
GitHub repository: https://github.com/GongyeLiu/StyleCrafte
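
The scale-adaptive fusion step could be sketched roughly as below: a small network predicts a blending scale from the content and style features themselves, and the style contribution is weighted by that scale. Layer choices and dimensions are assumptions, not StyleCrafter's released code.

```python
# Assumed layer choices; only the adaptive content/style blending idea is shown.
import torch
import torch.nn as nn

class ScaleAdaptiveFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.to_scale = nn.Sequential(nn.Linear(2 * dim, dim), nn.SiLU(), nn.Linear(dim, 1))
    def forward(self, content_feat, style_feat):
        # content_feat, style_feat: (B, N, D) outputs of text / style cross-attention
        pooled = torch.cat([content_feat.mean(dim=1), style_feat.mean(dim=1)], dim=-1)
        scale = torch.sigmoid(self.to_scale(pooled)).unsqueeze(1)   # (B, 1, 1), input-dependent
        return content_feat + scale * style_feat

fusion = ScaleAdaptiveFusion(dim=64)
content = torch.randn(2, 77, 64)        # e.g. features attended from the text prompt
style = torch.randn(2, 77, 64)          # e.g. features attended from the reference image
print(fusion(content, style).shape)     # torch.Size([2, 77, 64])
```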
VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
We present VideoReTalking, a new system to edit the faces of a real-world
talking head video according to input audio, producing a high-quality,
lip-synced output video even with a different emotion. Our system disentangles
this objective into three sequential tasks: (1) face video generation with a
canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for
improving photo-realism. Given a talking-head video, we first modify the
expression of each frame according to the same expression template using the
expression editing network, resulting in a video with the canonical expression.
This video, together with the given audio, is then fed into the lip-sync
network to generate a lip-synced video. Finally, we improve the photo-realism
of the synthesized faces through an identity-aware face enhancement network and
post-processing. We use learning-based approaches for all three steps, and all
modules can be run in a sequential pipeline without any user intervention.
Furthermore, our system is a generic approach that does not need to be
retrained for a specific person. Evaluations on two widely-used datasets
and in-the-wild examples demonstrate the superiority of our framework over
other state-of-the-art methods in terms of lip-sync accuracy and visual
quality.
Comment: Accepted by SIGGRAPH Asia 2022 Conference Proceedings. Project page:
https://vinthony.github.io/video-retalking
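
A minimal sketch of the three-stage pipeline described above, with placeholder networks standing in for the expression-editing, lip-sync, and face-enhancement models; the real system uses pretrained models for each stage and conditions the lip-sync stage on encoded audio rather than the toy feature maps used here.

```python
# Placeholder networks; only the sequential three-stage structure is illustrated.
import torch
import torch.nn as nn

class PlaceholderNet(nn.Module):
    def __init__(self, cin, cout):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, 3, padding=1)
    def forward(self, x):
        return self.conv(x)

expression_editor = PlaceholderNet(3, 3)   # stage 1: re-render frames with a canonical expression
lip_sync_net = PlaceholderNet(4, 3)        # stage 2: audio-driven mouth synthesis
face_enhancer = PlaceholderNet(3, 3)       # stage 3: identity-aware enhancement

def retalk(frames, audio_feats):
    # frames: (T, 3, H, W) talking-head video; audio_feats: (T, 1, H, W) toy audio maps
    canonical = expression_editor(frames)                               # canonicalize expressions
    synced = lip_sync_net(torch.cat([canonical, audio_feats], dim=1))   # condition on audio
    return face_enhancer(synced)                                        # restore photo-realism

video = torch.rand(8, 3, 64, 64)
audio = torch.rand(8, 1, 64, 64)
print(retalk(video, audio).shape)          # torch.Size([8, 3, 64, 64])
```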