Taming Reversible Halftoning via Predictive Luminance
Traditional halftoning usually drops colors when dithering images with binary
dots, which makes it difficult to recover the original color information. We
proposed a novel halftoning technique that converts a color image into a binary
halftone with full restorability to its original version. Our novel base
halftoning technique consists of two convolutional neural networks (CNNs) to
produce the reversible halftone patterns, and a noise incentive block (NIB) to
mitigate the flatness degradation issue of CNNs. Furthermore, to tackle the
conflicts between the blue-noise quality and restoration accuracy in our novel
base method, we proposed a predictor-embedded approach to offload predictable
information from the network, which in our case is the luminance information
that can be predicted from the halftone pattern. Such an approach allows the network to
gain more flexibility to produce halftones with better blue-noise quality
without compromising the restoration quality. Detailed studies on the
multiple-stage training method and loss weightings have been conducted. We have
compared our predictor-embedded method and our novel method regarding spectrum
analysis on halftone, halftone accuracy, restoration accuracy, and the data
embedding studies. Our entropy evaluation shows that our halftone contains less
encoding information than that of our novel base method. The experiments show our
predictor-embedded method gains more flexibility to improve the blue-noise
quality of halftones and maintains a comparable restoration quality with a
higher tolerance for disturbances.
Comment: to be published in IEEE Transactions on Visualization and Computer
Graphics
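For readers unfamiliar with the baseline the abstract contrasts against, the following is a minimal sketch of traditional halftoning: the color image is collapsed to luminance and then dithered to binary dots with classic Floyd-Steinberg error diffusion. It is not the paper's CNN-based method. The function names and the Rec. 601 luma weights are assumptions chosen for illustration. The sketch shows why color is lost: only one luminance channel reaches the dithering step.

```python
import numpy as np

def luminance(rgb):
    """Collapse an (H, W, 3) RGB image in [0, 1] to one channel.
    Assumption: Rec. 601 luma weights; the color information is discarded here."""
    return np.asarray(rgb, dtype=np.float64) @ np.array([0.299, 0.587, 0.114])

def floyd_steinberg(gray):
    """Dither a grayscale image (floats in [0, 1]) to a binary halftone (0 or 1)."""
    img = np.asarray(gray, dtype=np.float64).copy()
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = 1.0 if old >= 0.5 else 0.0
            out[y, x] = int(new)
            err = old - new
            # Diffuse the quantization error to unvisited neighbors.
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return out

def rgb_to_halftone(rgb):
    """Traditional pipeline: drop color, then dither."""
    return floyd_steinberg(luminance(rgb))
```

Two different colors with the same luminance produce the same halftone under this pipeline, which is the restorability problem the abstract sets out to address.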
Sketch Video Synthesis
Understanding semantic intricacies and high-level concepts is essential in
image sketch generation, and this challenge becomes even more formidable when
applied to the domain of videos. To address this, we propose a novel
optimization-based framework for sketching videos represented by the frame-wise
Bézier curve. In detail, we first propose a cross-frame stroke initialization
approach to warm up the location and the width of each curve. Then, we optimize
the locations of these curves by utilizing a semantic loss based on CLIP
features and a newly designed consistency loss using the self-decomposed 2D
atlas network. Built upon these design elements, the resulting sketch video
showcases impressive visual abstraction and temporal coherence. Furthermore, by
transforming a video into SVG lines through the sketching process, our method
unlocks applications in sketch-based video editing and video doodling, enabled
through video composition, as exemplified in the teaser.
Comment: Webpage: https://sketchvideo.github.io/ Github:
https://github.com/yudianzheng/SketchVide
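As a rough illustration of the stroke representation mentioned above, the sketch below samples points along a single Bézier curve and stores it with a width. The abstract does not state the curve degree, so the cubic form, the `Stroke` class, and its field names are assumptions made for this example only.

```python
from dataclasses import dataclass
import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=50):
    """Sample n points on a cubic Bezier curve defined by four 2D control points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    t = np.linspace(0.0, 1.0, n)[:, None]  # column vector so it broadcasts over x, y
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)

@dataclass
class Stroke:
    """Hypothetical container: one curve of one frame, with a location and a width."""
    control_points: np.ndarray  # shape (4, 2)
    width: float

    def sample(self, n=50):
        return cubic_bezier(*self.control_points, n=n)

# A sketch video would then be a list of frames, each a list of Stroke objects.
stroke = Stroke(np.array([[0, 0], [1, 2], [3, 2], [4, 0]], dtype=float), width=1.5)
points = stroke.sample(n=5)
```

Because the control points and width are plain numbers, they can be optimized directly or exported as SVG path data, which matches the editing use cases the abstract describes.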
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling
With the availability of large-scale video datasets and the advances of
diffusion models, text-driven video generation has achieved substantial
progress. However, existing video generation models are typically trained on a
limited number of frames, resulting in the inability to generate high-fidelity
long videos during inference. Furthermore, these models only support
single-text conditions, whereas real-life scenarios often require multi-text
conditions as the video content changes over time. To tackle these challenges,
this study explores the potential of extending the text-driven capability to
generate longer videos conditioned on multiple texts. 1) We first analyze the
impact of initial noise in video diffusion models. Then building upon the
observation of noise, we propose FreeNoise, a tuning-free and time-efficient
paradigm to enhance the generative capabilities of pretrained video diffusion
models while preserving content consistency. Specifically, instead of
initializing noises for all frames, we reschedule a sequence of noises for
long-range correlation and perform temporal attention over them using a window-based
function. 2) Additionally, we design a novel motion injection method to support
the generation of videos conditioned on multiple text prompts. Extensive
experiments validate the superiority of our paradigm in extending the
generative capabilities of video diffusion models. It is noteworthy that
compared with the previous best-performing method which brought about 255%
extra time cost, our method incurs only negligible time cost of approximately
17%. Generated video samples are available at our website:
http://haonanqiu.com/projects/FreeNoise.html.
Comment: Project Page: http://haonanqiu.com/projects/FreeNoise.html Code Repo:
https://github.com/arthur-qiu/LongerCrafte
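One possible reading of "rescheduling a sequence of noises" is sketched below: rather than drawing fresh noise for every frame of a long video, a fixed set of base noise frames is reused in reshuffled order, so distant frames share the same underlying noise. The abstract does not give the exact schedule, so the shuffling scheme, the function name `reschedule_noise`, and its parameters are assumptions rather than the authors' implementation.

```python
import numpy as np

def reschedule_noise(base_frames, total_frames, frame_shape, seed=0):
    """Build noise for total_frames frames from only base_frames independent draws.

    Assumption: each extra chunk is a random permutation of the base frames.
    """
    rng = np.random.default_rng(seed)
    base = rng.standard_normal((base_frames, *frame_shape))
    chunks = [base]
    count = base_frames
    while count < total_frames:
        perm = rng.permutation(base_frames)  # reorder, but reuse the same noise frames
        chunks.append(base[perm])
        count += base_frames
    return np.concatenate(chunks, axis=0)[:total_frames]

# Example: a model trained on 16 frames, extended to 64 frames of 4x32x32 latents.
noise = reschedule_noise(base_frames=16, total_frames=64, frame_shape=(4, 32, 32))
```

No retraining is involved in a scheme like this, which is consistent with the tuning-free claim; the window-based temporal attention mentioned in the abstract is a separate step not shown here.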
The 2-Aminoethoxydiphenyl Borate Analog DPB161 Blocks Store-operated Ca2+ Entry In Acutely Dissociated Rat Submandibular Cells
Cellular Ca2+ signals play a critical role in cell physiology and pathology. In most non-excitable cells, store-operated Ca2+ entry (SOCE) is an important mechanism by which intracellular Ca2+ signaling is regulated. However, few drugs can selectively modulate SOCE. 2-Aminoethoxydiphenyl borate (2-APB) and its analogs (DPB162 and DPB163) have been reported to inhibit SOCE. Here, we examined the effects of another 2-APB analog, DPB161, on SOCE in acutely isolated rat submandibular cells. Both patch-clamp recordings and Ca2+ imaging showed that upon removal of extracellular Ca2+ ([Ca2+]o = 0), rat submandibular cells were unable to maintain ACh-induced Ca2+ oscillations, but restoration of [Ca2+]o to refill Ca2+ stores enabled recovery of these Ca2+ oscillations. However, addition of 50 μM DPB161 with [Ca2+]o to the extracellular solution prevented the refilling of Ca2+ stores. Fura-2 Ca2+ imaging showed that DPB161 inhibited SOCE in a concentration-dependent manner. After depleting Ca2+ stores by thapsigargin treatment, bath perfusion of 1 mM Ca2+ induced [Ca2+]i elevation in a manner that was prevented by DPB161. Collectively, these results show that the 2-APB analog DPB161 blocks SOCE in rat submandibular cells, suggesting that this compound can be developed as a pharmacological tool for the study of SOCE function and as a new therapeutic agent for treating SOCE-associated disorders.
VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
We present VideoReTalking, a new system to edit the faces of a real-world
talking head video according to input audio, producing a high-quality and
lip-syncing output video even with a different emotion. Our system disentangles
this objective into three sequential tasks: (1) face video generation with a
canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for
improving photo-realism. Given a talking-head video, we first modify the
expression of each frame according to the same expression template using the
expression editing network, resulting in a video with the canonical expression.
This video, together with the given audio, is then fed into the lip-sync
network to generate a lip-syncing video. Finally, we improve the photo-realism
of the synthesized faces through an identity-aware face enhancement network and
post-processing. We use learning-based approaches for all three steps and all
our modules can be tackled in a sequential pipeline without any user
intervention. Furthermore, our system is a generic approach that does not need
to be retrained to a specific person. Evaluations on two widely-used datasets
and in-the-wild examples demonstrate the superiority of our framework over
other state-of-the-art methods in terms of lip-sync accuracy and visual
quality.
Comment: Accepted by SIGGRAPH Asia 2022 Conference Proceedings. Project page:
https://vinthony.github.io/video-retalking
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Animating a still image offers an engaging visual experience. Traditional
image animation techniques mainly focus on animating natural scenes with
stochastic dynamics (e.g. clouds and fluid) or domain-specific motions (e.g.
human hair or body motions), and thus limit their applicability to more
general visual content. To overcome this limitation, we explore the synthesis
of dynamic content for open-domain images, converting them into animated
videos. The key idea is to utilize the motion prior of text-to-video diffusion
models by incorporating the image into the generative process as guidance.
Given an image, we first project it into a text-aligned rich context
representation space using a query transformer, which facilitates the video
model to digest the image content in a compatible fashion. However, some visual
details still struggle to be preserved in the resultant videos. To supplement
with more precise image information, we further feed the full image to the
diffusion model by concatenating it with the initial noises. Experimental
results show that our proposed method can produce visually convincing and more
logical & natural motions, as well as higher conformity to the input image.
Comparative evaluation demonstrates the notable superiority of our approach
over existing competitors.
Comment: Project page: https://doubiiu.github.io/projects/DynamiCrafte
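The step of "concatenating [the image] with the initial noises" can be pictured with the shape-level sketch below. The abstract does not say along which axis the concatenation happens or how the single image is matched to multiple frames. Here it is assumed that the image (or its latent) is repeated for every frame and stacked along the channel axis; the function name and the array layout (frames, channels, height, width) are illustrative only.

```python
import numpy as np

def concat_image_with_noise(image, num_frames, seed=0):
    """Stack per-frame noise and a repeated conditioning image along channels.

    image: array of shape (C, H, W).
    Returns an array of shape (num_frames, 2 * C, H, W).
    """
    rng = np.random.default_rng(seed)
    c, h, w = image.shape
    noise = rng.standard_normal((num_frames, c, h, w))
    # Assumption: the same image conditions every frame.
    tiled = np.broadcast_to(image, (num_frames, c, h, w))
    return np.concatenate([noise, tiled], axis=1)

# Example: a 4-channel 32x32 latent conditioning a 16-frame clip.
latent = np.zeros((4, 32, 32))
model_input = concat_image_with_noise(latent, num_frames=16)
```

Under this assumption the denoising network sees the full-resolution image content at every frame, which complements the coarser context representation produced by the query transformer.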
TaleCrafter: Interactive Story Visualization with Multiple Characters
Accurate story visualization requires several necessary elements, such as
identity consistency across frames, the alignment between plain text and visual
content, and a reasonable layout of objects in images. Most previous works
endeavor to meet these requirements by fitting a text-to-image (T2I) model on a
set of videos in the same style and with the same characters, e.g., the
FlintstonesSV dataset. However, the learned T2I models typically struggle to
adapt to new characters, scenes, and styles, and often lack the flexibility to
revise the layout of the synthesized images. This paper proposes a system for
generic interactive story visualization, capable of handling multiple novel
characters and supporting the editing of layout and local structure. It is
developed by leveraging the prior knowledge of large language and T2I models,
trained on massive corpora. The system comprises four interconnected
components: story-to-prompt generation (S2P), text-to-layout generation (T2L),
controllable text-to-image generation (C-T2I), and image-to-video animation
(I2V). First, the S2P module converts concise story information into detailed
prompts required for subsequent stages. Next, T2L generates diverse and
reasonable layouts based on the prompts, offering users the ability to adjust
and refine the layout to their preference. The core component, C-T2I, enables
the creation of images guided by layouts, sketches, and actor-specific
identifiers to maintain consistency and detail across visualizations. Finally,
I2V enriches the visualization process by animating the generated images.
Extensive experiments and a user study are conducted to validate the
effectiveness and flexibility of interactive editing of the proposed system.
Comment: Github repository: https://github.com/VideoCrafter/TaleCrafte