105 research outputs found
Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model
Diffusion-based image generation methods have recently been credited with remarkable text-to-image capabilities, yet they still struggle to accurately generate multilingual scene text images. To tackle this problem, we propose Diff-Text, a training-free scene text generation framework for any language. Given text in any language together with a textual description of a scene, our model outputs a photo-realistic image. The model leverages rendered sketch images as priors, thereby unlocking the latent multilingual generation ability of pre-trained Stable Diffusion. Based on our observation of how the cross-attention map influences object placement in generated images, we introduce a localized attention constraint in the cross-attention layer to address the unreasonable positioning of scene text. Additionally, we introduce contrastive image-level prompts to further refine the position of the textual region and achieve more accurate scene text generation. Experiments demonstrate that our method outperforms existing methods in both text recognition accuracy and the naturalness of foreground-background blending.
Comment: Accepted to AAAI 2024. Code: https://github.com/ecnuljzhang/brush-your-tex
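The localized attention constraint lends itself to a compact illustration. Below is a minimal sketch, assuming softmax-normalized cross-attention maps of shape (batch, heads, pixels, tokens); the function name, the boost factor, and the reweight-then-renormalize scheme are illustrative assumptions, not the paper's released code.

```python
import torch

def localized_attention_constraint(attn, region_mask, text_token_idx, boost=2.0):
    """Reweight cross-attention so the prompt tokens describing the scene text
    attend preferentially to the sketched text region (hypothetical scheme).

    attn:           (B, heads, HW, T) softmax cross-attention probabilities
    region_mask:    (HW,) float tensor, 1.0 inside the target text region
    text_token_idx: column indices of the prompt tokens describing the text
    """
    n_tokens = attn.shape[-1]
    tok = torch.zeros(n_tokens)
    tok[text_token_idx] = 1.0
    # Boost only (region pixel, text token) pairs, then renormalize each row.
    weight = 1.0 + (boost - 1.0) * region_mask.view(1, 1, -1, 1) * tok.view(1, 1, 1, -1)
    attn = attn * weight
    return attn / attn.sum(dim=-1, keepdim=True)

# Toy usage: 8x8 latent grid, 4 heads, 10 prompt tokens, hypothetical region.
attn = torch.softmax(torch.randn(1, 4, 64, 10), dim=-1)
mask = torch.zeros(64); mask[20:28] = 1.0
out = localized_attention_constraint(attn, mask, [3, 4])
```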
Long-Term Rhythmic Video Soundtracker
We consider the problem of generating musical soundtracks in sync with rhythmic visual cues. Most existing works rely on pre-defined music representations, which limits their generative flexibility and complexity. Other methods that directly generate video-conditioned waveforms suffer from limited scenarios, short lengths, and unstable generation quality. To this end, we present Long-Term Rhythmic Video Soundtracker (LORIS), a novel framework to synthesize long-term conditional waveforms. Specifically, our framework consists of a latent conditional diffusion probabilistic model that performs waveform synthesis. Furthermore, a series of context-aware conditioning encoders is proposed to take temporal information into account for long-term generation. Notably, we extend our model's applicability from dances to multiple sports scenarios such as floor exercise and figure skating. To enable comprehensive evaluation, we establish a benchmark for rhythmic video soundtracks that includes a pre-processed dataset, improved evaluation metrics, and robust generative baselines. Extensive experiments show that our model generates long-term soundtracks with state-of-the-art musical quality and rhythmic correspondence. Code is available at https://github.com/OpenGVLab/LORIS.
Comment: ICML202
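As a rough illustration of the ingredients named above (a latent conditional diffusion model plus conditioning on visual context), here is a toy epsilon-prediction training step. The module, shapes, and the crude timestep embedding are assumptions for exposition, not the LORIS implementation.

```python
import torch
import torch.nn as nn

class ConditionalDenoiser(nn.Module):
    """Toy denoiser over audio latents, conditioned on pooled visual features."""
    def __init__(self, latent_dim=64, cond_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, z_t, t, visual_ctx):
        # visual_ctx: rhythm features from a (hypothetical) video encoder
        t = t.float().unsqueeze(-1) / 1000.0        # crude timestep embedding
        return self.net(torch.cat([z_t, visual_ctx, t], dim=-1))

def diffusion_loss(model, z0, visual_ctx, alphas_cumprod):
    t = torch.randint(0, len(alphas_cumprod), (z0.size(0),))
    a = alphas_cumprod[t].unsqueeze(-1)
    eps = torch.randn_like(z0)
    z_t = a.sqrt() * z0 + (1 - a).sqrt() * eps      # forward noising
    return ((model(z_t, t, visual_ctx) - eps) ** 2).mean()

# Toy usage with random latents, pooled visual features, and a toy schedule.
model = ConditionalDenoiser()
ac = torch.linspace(0.999, 0.01, 1000)
loss = diffusion_loss(model, torch.randn(4, 64), torch.randn(4, 128), ac)
```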
Recombinant porcine rotavirus VP4 and VP4-LTB expressed in Lactobacillus casei induced mucosal and systemic antibody responses in mice
Background: Porcine rotavirus infection is a significant cause of morbidity and mortality in the swine industry, necessitating the development of effective vaccines. Immune responses associated with protection are primarily mucosal in nature, and induction of mucosal immunity is important for preventing porcine rotavirus infection.
Results: Lactobacillus casei expressing the major protective antigen VP4 of porcine rotavirus (pPG612.1-VP4) or a VP4-LTB fusion protein (LTB: heat-labile toxin B subunit from Escherichia coli) (pPG612.1-VP4-LTB) was used to immunize mice orally. Expression of recombinant pPG612.1-VP4 and pPG612.1-VP4-LTB was confirmed by SDS-PAGE and Western blot analysis, and surface-displayed expression on L. casei was verified by immunofluorescence. Mice orally immunized with recombinant protein-expressing L. casei produced high levels of serum immunoglobulin G (IgG) and mucosal IgA. IgA titers from mice immunized with pPG612.1-VP4-LTB were higher than titers from pPG612.1-VP4-immunized mice. The induced antibodies demonstrated neutralizing effects on rotavirus infection.
Conclusion: These results demonstrate that VP4 administered in the context of an L. casei expression system is an effective method for stimulating mucosal immunity, and that LTB served to further stimulate mucosal immunity, suggesting that this strategy can be adapted for use in pigs.
Online near-infrared analysis coupled with MWPLS and SiPLS models for the multi-ingredient and multi-phase extraction of licorice (Gancao)
Additional file 1. Table S1: sampling intervals in the different extraction phases. Table S2: HPLC results for the different indicators. Table S3: evaluation parameters of the PLS and SiPLS models.
Diff-Font: Diffusion Model for Robust One-Shot Font Generation
Font generation is a difficult and time-consuming task, especially for languages that use ideograms with complicated structures and large character sets, such as Chinese. To address this, few-shot and even one-shot font generation have attracted a lot of attention. However, most existing font generation methods still suffer from (i) the large cross-font gap, (ii) subtle cross-font variations, and (iii) incorrect generation of complicated characters. In this paper, we propose a novel one-shot font generation method based on a diffusion model, named Diff-Font, which can be stably trained on large datasets. The proposed model aims to generate an entire font library given only one sample as the reference. Specifically, a large stroke-wise dataset is constructed, and a stroke-wise diffusion model is proposed to preserve the structure and completeness of each generated character. To the best of our knowledge, Diff-Font is the first work to develop diffusion models for the font generation task. The well-trained Diff-Font is not only robust to the cross-font gap and font variation, but also achieves promising performance on the generation of difficult characters. Compared to previous font generation methods, our model reaches state-of-the-art performance both qualitatively and quantitatively.
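The conditioning described above (character identity, stroke attributes, and a style vector from a single reference glyph) can be sketched as follows. Every module name and dimension here is a hypothetical stand-in, not Diff-Font's actual architecture.

```python
import torch
import torch.nn as nn

class FontCondition(nn.Module):
    """Fuse content, stroke, and one-shot style signals into a condition vector."""
    def __init__(self, n_chars, stroke_dim=32, style_dim=128, out_dim=256):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, 128)      # content identity
        self.stroke_proj = nn.Linear(stroke_dim, 64)    # stroke attributes
        self.style_enc = nn.Sequential(                 # style from one reference glyph
            nn.Flatten(), nn.Linear(64 * 64, style_dim), nn.SiLU()
        )
        self.fuse = nn.Linear(128 + 64 + style_dim, out_dim)

    def forward(self, char_id, strokes, ref_glyph):
        c = self.char_emb(char_id)                      # (B, 128)
        s = self.stroke_proj(strokes)                   # (B, 64)
        y = self.style_enc(ref_glyph)                   # (B, style_dim)
        return self.fuse(torch.cat([c, s, y], dim=-1))  # condition for a diffusion UNet

# Toy usage: one character id, random stroke vector, one 64x64 reference glyph.
cond = FontCondition(n_chars=6763)(
    torch.tensor([42]), torch.randn(1, 32), torch.randn(1, 1, 64, 64))
```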
Vlogger: Make Your Dream A Vlog
In this work, we present Vlogger, a generic AI system for generating minute-level video blogs (i.e., vlogs) from user descriptions. Unlike short videos of a few seconds, a vlog often contains a complex storyline with diversified scenes, which is challenging for most existing video generation approaches. To break through this bottleneck, Vlogger smartly leverages a Large Language Model (LLM) as a Director and decomposes the long video generation task into four key stages, invoking various foundation models to play the critical roles of vlog professionals: (1) Script, (2) Actor, (3) ShowMaker, and (4) Voicer. With this design mimicking human production roles, Vlogger can generate vlogs through the explainable cooperation of top-down planning and bottom-up shooting. Moreover, we introduce a novel video diffusion model, ShowMaker, which serves as the videographer in Vlogger, generating the video snippet for each shooting scene. By attentively incorporating Script and Actor as textual and visual prompts, it effectively enhances spatio-temporal coherence within each snippet. Besides, we design a concise mixed training paradigm for ShowMaker, boosting its capacity for both T2V generation and prediction. Finally, extensive experiments show that our method achieves state-of-the-art performance on zero-shot T2V generation and prediction tasks. More importantly, Vlogger can generate vlogs of over 5 minutes from open-world descriptions without losing video coherence in script and actor. The code and model are available at https://github.com/zhuangshaobin/Vlogger.
Comment: 16 pages, 8 figures, 11 table
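The four-stage pipeline reads naturally as an orchestration loop. A minimal sketch follows, with each foundation model stubbed as a callable; the function names and data flow are assumptions based only on the stage descriptions above, not Vlogger's code.

```python
from dataclasses import dataclass

@dataclass
class Shot:
    description: str   # per-shot scene text from the script
    actor_ref: object  # reference image keeping the actor consistent

def make_vlog(llm, actor_model, showmaker, voicer, story: str):
    # (1) Script: the LLM, acting as Director, splits the story into shots.
    shots = llm(f"Decompose into one shot per line: {story}").splitlines()
    # (2) Actor: generate a reference portrait reused across all shots.
    actor = actor_model(shots)
    # (3) ShowMaker: render one video snippet per shot, prompted by text + actor.
    clips = [showmaker(Shot(s, actor)) for s in shots]
    # (4) Voicer: synthesize narration for the full script.
    audio = voicer(" ".join(shots))
    return clips, audio
```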
Modeling Multi-wavelength Pulse Profiles of Millisecond Pulsar PSR B1821-24
PSR B1821-24 is a solitary millisecond pulsar (MSP) that radiates multi-wavelength pulsed photons. It has complex radio, X-ray, and γ-ray pulse profiles with distinct peak phase separations that challenge traditional caustic emission models. Using the single-pole annular gap model with a suitable magnetic inclination angle (α) and viewing angle (ζ), we managed to reproduce its pulse profiles in all three wavebands. We find that the middle radio peak originates from the core gap region at high altitudes, while the other two radio peaks originate from the annular gap region at relatively low altitudes. The two peaks in both the X-ray and γ-ray wavebands fundamentally originate from the annular gap region, while γ-ray emission generated in the core gap region contributes somewhat to the first γ-ray peak. Precisely reproducing the multi-wavelength pulse profiles of PSR B1821-24 enables us to understand the emission regions of distinct wavebands and to test pulsar emission models.
Comment: Accepted for publication in Ap
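For orientation, gap models of this kind rest on two textbook geometric scales, reproduced below; the numerical example assumes a generic ~3 ms spin period and is illustrative only.

```latex
% Characteristic scales in polar/annular gap models (standard relations):
% light-cylinder radius and polar-cap (open-field-line) half-angle.
R_{\rm LC} = \frac{c\,P}{2\pi}, \qquad
\theta_{\rm pc} \simeq \sqrt{\frac{R_\ast}{R_{\rm LC}}}
% For P ~ 3 ms and R_* ~ 10 km: R_LC ~ 143 km, so theta_pc ~ 15 deg,
% i.e. a far wider open-field-line region than in normal pulsars, which is
% why emission altitude maps strongly onto observed peak phase.
```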
Latte: Latent Diffusion Transformer for Video Generation
We propose a novel Latent Diffusion Transformer, namely Latte, for video generation. Latte first extracts spatio-temporal tokens from input videos and then adopts a series of Transformer blocks to model the video distribution in the latent space. To model the substantial number of tokens extracted from videos, four efficient variants are introduced from the perspective of decomposing the spatial and temporal dimensions of the input. To improve the quality of generated videos, we determine best practices for Latte through rigorous experimental analysis, covering video clip patch embedding, model variants, timestep-class information injection, temporal positional embedding, and learning strategies. Our comprehensive evaluation demonstrates that Latte achieves state-of-the-art performance across four standard video generation datasets, i.e., FaceForensics, SkyTimelapse, UCF101, and Taichi-HD. In addition, we extend Latte to the text-to-video (T2V) generation task, where it achieves results comparable to recent T2V models. We strongly believe that Latte provides valuable insights for future research on incorporating Transformers into diffusion models for video generation.
Comment: Project page: https://maxin-cn.github.io/latte_projec
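One natural reading of the spatial/temporal decomposition is a block that alternates attention over patches within a frame and over frames at a fixed patch position. The sketch below is one such variant under assumed shapes; it is not Latte's released code.

```python
import torch
import torch.nn as nn

class SpatioTemporalBlock(nn.Module):
    """Alternate spatial and temporal self-attention over a (T, N) token grid."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):                       # x: (B, T, N, D) video tokens
        B, T, N, D = x.shape
        s = x.reshape(B * T, N, D)              # spatial: patches within each frame
        n = self.norm1(s)
        s = s + self.spatial(n, n, n)[0]
        t = s.reshape(B, T, N, D).permute(0, 2, 1, 3).reshape(B * N, T, D)
        n = self.norm2(t)
        t = t + self.temporal(n, n, n)[0]       # temporal: frames at a fixed patch
        return t.reshape(B, N, T, D).permute(0, 2, 1, 3)

# Toy usage: 2 videos, 8 frames, 16 patches per frame.
y = SpatioTemporalBlock()(torch.randn(2, 8, 16, 256))
```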
SinSR: Diffusion-Based Image Super-Resolution in a Single Step
While super-resolution (SR) methods based on diffusion models exhibit promising results, their practical application is hindered by the substantial number of required inference steps. Recent methods utilize degraded images in the initial state, thereby shortening the Markov chain. Nevertheless, these solutions either rely on a precise formulation of the degradation process or still necessitate a relatively lengthy generation path (e.g., 15 iterations). To enhance inference speed, we propose a simple yet effective method for achieving single-step SR generation, named SinSR. Specifically, we first derive a deterministic sampling process from the most recent state-of-the-art (SOTA) method for accelerating diffusion-based SR. This allows the mapping between the input random noise and the generated high-resolution image to be obtained in a reduced and acceptable number of inference steps during training. We show that this deterministic mapping can be distilled into a student model that performs SR within only one inference step. Additionally, we propose a novel consistency-preserving loss that simultaneously leverages the ground-truth image during distillation, ensuring that the performance of the student model is not bounded solely by the feature manifold of the teacher model and yielding further performance improvement. Extensive experiments on synthetic and real-world datasets demonstrate that the proposed method achieves comparable or even superior performance to both previous SOTA methods and the teacher model in just one sampling step, resulting in a remarkable speedup of up to ×10 at inference. Our code will be released at https://github.com/wyf0912/SinS
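The distillation idea above can be sketched as a single training step: the student matches the teacher's deterministic noise-to-image mapping in one forward pass, while an auxiliary term leverages the ground truth. The paper's consistency-preserving loss is more elaborate; the plain L2-to-ground-truth term, the weighting, and `teacher_sample` below are simplifying assumptions.

```python
import torch

def distill_step(student, teacher_sample, lr_img, hr_gt, lam=0.5):
    """One hypothetical distillation step for single-step SR."""
    noise = torch.randn_like(hr_gt)
    with torch.no_grad():
        hr_teacher = teacher_sample(noise, lr_img)      # multi-step deterministic sampler
    hr_student = student(noise, lr_img)                 # single forward pass
    distill = ((hr_student - hr_teacher) ** 2).mean()   # match the teacher's mapping
    consistency = ((hr_student - hr_gt) ** 2).mean()    # avoid capping at teacher quality
    return distill + lam * consistency
```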
Rapid Discrimination of Chlorpheniramine Maleate and Assessment of Its Surface Content Uniformity in a Pharmaceutical Formulation by NIR-CI Coupled with Statistical Measurement
This study demonstrates that near-infrared chemical imaging (NIR-CI) is a rapid and nondestructive technique for discriminating chlorpheniramine maleate (CPM) and assessing its surface content uniformity (SCU) in a pharmaceutical formulation. The characteristic-wavenumber method was used to discriminate the CPM distribution on the tablet surface. To assess the surface content uniformity of CPM, binary imaging and statistical measurement were proposed. Furthermore, high-performance liquid chromatography (HPLC) was used as a reference method for accurately determining the volume content of CPM in the samples, and was also performed to assess the volume content uniformity (VCU) of CPM in whole and partial regions of the tablets. The NIR-CI results showed that the spatial distribution of CPM on the tablet surface was heterogeneous. Comparing the content uniformity of CPM determined by NIR-CI and HPLC demonstrated that a high degree of VCU does not imply a high degree of SCU. These results indicate that the HPLC method is not suitable for testing SCU, as verified by NIR-CI. This study proves the feasibility of NIR-CI for rapid discrimination of CPM and assessment of its SCU, which is helpful for the quality control of commercial CPM tablets.
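The binary-image-plus-statistics assessment can be made concrete with a short sketch: threshold the characteristic-wavenumber absorbance map, then score uniformity as the relative standard deviation of per-block CPM coverage. The threshold and block size are illustrative choices, not the paper's settings.

```python
import numpy as np

def surface_uniformity(absorbance_map, threshold, block=16):
    """Binary image + block statistics for surface content uniformity."""
    binary = absorbance_map > threshold               # True where CPM is detected
    h, w = binary.shape
    crop = binary[: h - h % block, : w - w % block]   # trim to whole blocks
    tiles = crop.reshape(h // block, block, w // block, block)
    coverage = tiles.mean(axis=(1, 3))                # CPM fraction per sub-block
    rsd = coverage.std() / coverage.mean() * 100      # lower RSD = more uniform
    return coverage, rsd

# Toy usage on a synthetic 128x128 absorbance map.
cov, rsd = surface_uniformity(np.random.rand(128, 128), threshold=0.5)
```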