105 research outputs found

    Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model

    Recently, diffusion-based image generation methods have been credited with remarkable text-to-image generation capabilities, yet they still struggle to accurately generate multilingual scene text images. To tackle this problem, we propose Diff-Text, a training-free scene text generation framework for any language. Our model outputs a photo-realistic image given text in any language along with a textual description of a scene. The model leverages rendered sketch images as priors, thereby unlocking the latent multilingual generation ability of pre-trained Stable Diffusion. Based on the observed influence of the cross-attention map on object placement in generated images, we propose a localized attention constraint in the cross-attention layer to address the unreasonable positioning of scene text. Additionally, we introduce contrastive image-level prompts to further refine the position of the textual region and achieve more accurate scene text generation. Experiments demonstrate that our method outperforms existing methods in both text recognition accuracy and the naturalness of foreground-background blending.
    Comment: Accepted to AAAI 2024. Code: https://github.com/ecnuljzhang/brush-your-tex
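
    The localized attention constraint lends itself to a small sketch. Below is a minimal, hypothetical PyTorch illustration of the general idea only: image-patch queries lying outside the rendered-sketch text region are biased away from attending to the glyph tokens of the prompt. The function name, the additive-penalty mechanism, and all arguments (glyph_idx, region_mask, penalty) are our own illustrative choices, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def localized_cross_attention(q, k, v, glyph_idx, region_mask, penalty=-1e4):
    """Hypothetical localized cross-attention constraint (illustrative only).

    q:           (B, N_img, d) image-patch queries
    k, v:        (B, N_txt, d) prompt keys/values
    glyph_idx:   indices of the prompt tokens carrying the scene text
    region_mask: (B, N_img) bool, True inside the rendered-sketch region
    """
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (B, N_img, N_txt)
    bias = torch.zeros_like(scores)
    outside = (~region_mask).float().unsqueeze(-1)          # (B, N_img, 1)
    # Penalize attention from off-region pixels to glyph tokens, so text is
    # placed only where the sketch prior says it should appear.
    bias[:, :, glyph_idx] = outside * penalty
    attn = F.softmax(scores + bias, dim=-1)
    return attn @ v
```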

    Long-Term Rhythmic Video Soundtracker

    We consider the problem of generating musical soundtracks in sync with rhythmic visual cues. Most existing works rely on pre-defined music representations, which limits their generative flexibility and complexity. Other methods that directly generate video-conditioned waveforms suffer from limited scenarios, short lengths, and unstable generation quality. To this end, we present Long-Term Rhythmic Video Soundtracker (LORIS), a novel framework to synthesize long-term conditional waveforms. Specifically, our framework consists of a latent conditional diffusion probabilistic model that performs waveform synthesis. Furthermore, a series of context-aware conditioning encoders is proposed to take temporal information into account for long-term generation. Notably, we extend our model's applicability from dances to multiple sports scenarios such as floor exercise and figure skating. To perform comprehensive evaluations, we establish a benchmark for rhythmic video soundtracking, including a pre-processed dataset, improved evaluation metrics, and robust generative baselines. Extensive experiments show that our model generates long-term soundtracks with state-of-the-art musical quality and rhythmic correspondence. Code is available at https://github.com/OpenGVLab/LORIS.
    Comment: ICML202
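
    To make the conditional synthesis loop concrete, here is a generic deterministic reverse-diffusion step on an audio latent, with the context features passed straight to the noise predictor. This is a textbook DDIM update under assumed placeholder names (eps_model, cond, alphas_cumprod), not LORIS's actual sampler.

```python
import torch

@torch.no_grad()
def ddim_step(eps_model, z_t, t, t_prev, cond, alphas_cumprod):
    """One deterministic (eta = 0) reverse step on the audio latent z_t,
    conditioned on `cond` (e.g., pooled visual-rhythm features from the
    context-aware encoders). Generic DDIM, not the paper's exact sampler."""
    a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
    eps = eps_model(z_t, t, cond)                           # predicted noise
    z0 = (z_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()        # clean-latent estimate
    return a_prev.sqrt() * z0 + (1 - a_prev).sqrt() * eps   # latent at t_prev
```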

    Recombinant porcine rotavirus VP4 and VP4-LTB expressed in Lactobacillus casei induced mucosal and systemic antibody responses in mice

    Background: Porcine rotavirus infection is a significant cause of morbidity and mortality in the swine industry, necessitating the development of effective vaccines for the prevention of infection. Immune responses associated with protection are primarily mucosal in nature, and induction of mucosal immunity is important for preventing porcine rotavirus infection.
    Results: Lactobacillus casei expressing the major protective antigen VP4 of porcine rotavirus (pPG612.1-VP4) or a VP4-LTB fusion protein (pPG612.1-VP4-LTB; LTB is the heat-labile toxin B subunit from Escherichia coli) was used to immunize mice orally. Expression of recombinant pPG612.1-VP4 and pPG612.1-VP4-LTB was confirmed by SDS-PAGE and Western blot analysis, and surface-displayed expression on L. casei was verified by immunofluorescence. Mice orally immunized with recombinant protein-expressing L. casei produced high levels of serum immunoglobulin G (IgG) and mucosal IgA. The IgA titers from mice immunized with pPG612.1-VP4-LTB were higher than those from pPG612.1-VP4-immunized mice. The induced antibodies demonstrated neutralizing effects on rotavirus infection.
    Conclusion: These results demonstrate that VP4 administered in the context of an L. casei expression system is an effective method for stimulating mucosal immunity and that LTB served to further stimulate mucosal immunity, suggesting that this strategy can be adapted for use in pigs.

    Online near-infrared analysis coupled with MWPLS and SiPLS models for the multi-ingredient and multi-phase extraction of licorice (Gancao)

    Additional file 1. Table S1. The sampling intervals in different extraction phases. Table S2. The HPLC results of different indicators. Table S3. The evaluation parameters of PLS and SiPLS models

    Diff-Font: Diffusion Model for Robust One-Shot Font Generation

    Font generation is a difficult and time-consuming task, especially for languages that use ideograms with complicated structures and large character sets, such as Chinese. To solve this problem, few-shot font generation and even one-shot font generation have attracted a lot of attention. However, most existing font generation methods still suffer from (i) the large cross-font gap; (ii) subtle cross-font variation; and (iii) incorrect generation of complicated characters. In this paper, we propose a novel one-shot font generation method based on a diffusion model, named Diff-Font, which can be stably trained on large datasets. The proposed model aims to generate an entire font library given only one sample as the reference. Specifically, a large stroke-wise dataset is constructed, and a stroke-wise diffusion model is proposed to preserve the structure and completeness of each generated character. To the best of our knowledge, Diff-Font is the first work to apply diffusion models to the font generation task. The well-trained Diff-Font is not only robust to font gaps and font variation but also achieves promising performance on difficult character generation. Compared to previous font generation methods, our model reaches state-of-the-art performance both qualitatively and quantitatively.
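
    As a sketch of the kind of conditioning a stroke-wise one-shot model needs, the module below fuses three signals: character identity, a multi-hot stroke-composition vector, and a style embedding from the single reference glyph. The class name, layer sizes, and fusion-by-summation are our illustrative assumptions, not Diff-Font's actual architecture.

```python
import torch
import torch.nn as nn

class FontCondition(nn.Module):
    """Illustrative conditioning module for one-shot, stroke-aware font
    generation. All dimensions are made up for the sketch."""
    def __init__(self, n_chars, n_stroke_types, d=256):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, d)          # character identity
        self.stroke_proj = nn.Linear(n_stroke_types, d)   # multi-hot stroke vector
        self.style_enc = nn.Sequential(                   # tiny reference-glyph encoder
            nn.Conv2d(1, 32, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(32, d, 3, 2, 1),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def forward(self, char_id, strokes, ref_glyph):
        # char_id: (B,) long; strokes: (B, n_stroke_types) float;
        # ref_glyph: (B, 1, H, W) the single reference image
        return (self.char_emb(char_id)
                + self.stroke_proj(strokes)
                + self.style_enc(ref_glyph))
```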

    Vlogger: Make Your Dream A Vlog

    In this work, we present Vlogger, a generic AI system for generating a minute-level video blog (i.e., vlog) from user descriptions. Unlike short videos lasting only a few seconds, a vlog often contains a complex storyline with diversified scenes, which is challenging for most existing video generation approaches. To break through this bottleneck, Vlogger leverages a Large Language Model (LLM) as Director and decomposes the long video generation task into four key stages, where various foundation models play the critical roles of vlog professionals: (1) Script, (2) Actor, (3) ShowMaker, and (4) Voicer; a schematic of this pipeline is sketched below. With this design, which mimics a human production team, Vlogger can generate vlogs through explainable cooperation of top-down planning and bottom-up shooting. Moreover, we introduce a novel video diffusion model, ShowMaker, which serves as the videographer in Vlogger, generating the video snippet for each shooting scene. By attentively incorporating Script and Actor as textual and visual prompts, it effectively enhances spatial-temporal coherence within each snippet. Besides, we design a concise mixed training paradigm for ShowMaker, boosting its capacity for both T2V generation and prediction. Finally, extensive experiments show that our method achieves state-of-the-art performance on zero-shot T2V generation and prediction tasks. More importantly, Vlogger can generate vlogs of over five minutes from open-world descriptions without losing coherence of script and actor across the video. The code and model are available at https://github.com/zhuangshaobin/Vlogger.
    Comment: 16 pages, 8 figures, 11 tables
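
    The four-stage decomposition can be summarized in a short pipeline sketch. Every callable below stands in for a foundation model, and all names (Shot, make_vlog, mux) are our own placeholders rather than the repository's API.

```python
from dataclasses import dataclass

@dataclass
class Shot:                      # one entry of the Director's shot list
    actor: str
    actor_description: str
    scene_text: str
    narration: str

def make_vlog(description, llm, t2i, showmaker, tts, mux):
    """Schematic of the top-down-plan / bottom-up-shoot decomposition;
    every callable is a placeholder for a foundation model."""
    shots = llm(description)                                           # (1) Script: LLM as Director
    actors = {s.actor: t2i(s.actor_description) for s in shots}        # (2) Actor: reference images
    clips = [showmaker(s.scene_text, actors[s.actor]) for s in shots]  # (3) ShowMaker: scene snippets
    audio = [tts(s.narration) for s in shots]                          # (4) Voicer: narration
    return mux(clips, audio)                                           # assemble the final vlog
```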

    Modeling Multi-wavelength Pulse Profiles of Millisecond Pulsar PSR B1821-24

    PSR B1821-24 is a solitary millisecond pulsar (MSP) that radiates multi-wavelength pulsed photons. It has complex radio, X-ray, and γ-ray pulse profiles with distinct peak phase separations that challenge traditional caustic emission models. Using the single-pole annular gap model with a suitable magnetic inclination angle (α = 40°) and viewing angle (ζ = 75°), we managed to reproduce its pulse profiles in all three wavebands. We find that the middle radio peak originates from the core gap region at high altitudes, while the other two radio peaks originate from the annular gap region at relatively low altitudes. The two peaks in both the X-ray and γ-ray wavebands fundamentally originate from the annular gap region, while γ-ray emission generated in the core gap region contributes somewhat to the first γ-ray peak. Precisely reproducing the multi-wavelength pulse profiles of PSR B1821-24 enables us to locate the emission regions of the distinct wavebands and to validate pulsar emission models.
    Comment: Accepted for publication in Ap
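
    For orientation, the quoted angles can be read through the standard rotating-dipole viewing geometry; this is a textbook relation, not the annular gap model itself:

```latex
% With magnetic inclination \alpha and viewing angle \zeta, the angle
% \theta between the magnetic axis and the line of sight at rotation
% phase \Phi satisfies
\[
  \cos\theta(\Phi) = \cos\alpha\,\cos\zeta + \sin\alpha\,\sin\zeta\,\cos\Phi .
\]
% For \alpha = 40^\circ and \zeta = 75^\circ the sight line sweeps
% \theta \in [\zeta - \alpha,\; \zeta + \alpha] = [35^\circ, 115^\circ]
% over one rotation, crossing emission regions at very different angles.
```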

    Latte: Latent Diffusion Transformer for Video Generation

    We propose a novel Latent Diffusion Transformer, named Latte, for video generation. Latte first extracts spatio-temporal tokens from input videos and then adopts a series of Transformer blocks to model the video distribution in the latent space. To handle the substantial number of tokens extracted from videos, we introduce four efficient variants that decompose the spatial and temporal dimensions of the input. To improve the quality of generated videos, we determine best practices for Latte through rigorous experimental analysis, covering video-clip patch embedding, model variants, timestep/class information injection, temporal positional embedding, and learning strategies. Our comprehensive evaluation demonstrates that Latte achieves state-of-the-art performance across four standard video generation datasets, i.e., FaceForensics, SkyTimelapse, UCF101, and Taichi-HD. In addition, we extend Latte to the text-to-video (T2V) generation task, where it achieves results comparable to recent T2V models. We strongly believe that Latte provides valuable insights for future research on incorporating Transformers into diffusion models for video generation.
    Comment: Project page: https://maxin-cn.github.io/latte_projec
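
    One common way to decompose the spatial and temporal dimensions, in the spirit of the variants described above, is to alternate attention within each frame and across frames at each spatial location. The block below is an illustrative PyTorch sketch with made-up dimensions, not Latte's released code.

```python
import torch
import torch.nn as nn

class SpatioTemporalBlock(nn.Module):
    """Factorized attention sketch: spatial attention within each frame,
    then temporal attention at each spatial location (pre-norm residuals)."""
    def __init__(self, d=512, heads=8):
        super().__init__()
        self.spatial = nn.MultiheadAttention(d, heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(d, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(d), nn.LayerNorm(d)

    def forward(self, x):
        # x: (B, T, N, d) -- T frames, N spatial tokens per frame
        B, T, N, d = x.shape
        s = x.reshape(B * T, N, d)                        # attend within each frame
        h = self.norm1(s)
        s = s + self.spatial(h, h, h)[0]
        t = s.reshape(B, T, N, d).permute(0, 2, 1, 3).reshape(B * N, T, d)
        h = self.norm2(t)
        t = t + self.temporal(h, h, h)[0]                 # attend across frames
        return t.reshape(B, N, T, d).permute(0, 2, 1, 3)  # back to (B, T, N, d)
```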

    SinSR: Diffusion-Based Image Super-Resolution in a Single Step

    While super-resolution (SR) methods based on diffusion models exhibit promising results, their practical application is hindered by the substantial number of inference steps required. Recent methods utilize degraded images in the initial state, thereby shortening the Markov chain. Nevertheless, these solutions either rely on a precise formulation of the degradation process or still require a relatively long generation path (e.g., 15 iterations). To enhance inference speed, we propose a simple yet effective method for single-step SR generation, named SinSR. Specifically, we first derive a deterministic sampling process from the most recent state-of-the-art (SOTA) method for accelerating diffusion-based SR. This allows the mapping between the input random noise and the generated high-resolution image to be obtained in a reduced and acceptable number of inference steps during training. We show that this deterministic mapping can be distilled into a student model that performs SR in only one inference step. Additionally, we propose a novel consistency-preserving loss that simultaneously leverages the ground-truth image during distillation, ensuring that the student model is not bounded solely by the feature manifold of the teacher and yielding further performance gains. Extensive experiments on synthetic and real-world datasets demonstrate that the proposed method achieves comparable or even superior performance to both previous SOTA methods and the teacher model in just one sampling step, resulting in a remarkable speedup of up to ×10 for inference. Our code will be released at https://github.com/wyf0912/SinS
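
    The training objective can be sketched as two terms: one distills the teacher's deterministic noise-to-image mapping into a single student step, and a consistency-preserving term also anchors the student to the ground truth so it is not bounded by the teacher's manifold. All names and the plain MSE losses below are our simplifications, not SinSR's exact formulation.

```python
import torch
import torch.nn.functional as F

def one_step_distill_loss(student, teacher_map, x_lr, noise, x_gt, w=1.0):
    """Sketch of a two-term distillation objective (our simplification).

    student:     one-step SR network, called as student(x_lr, noise)
    teacher_map: frozen deterministic multi-step SR mapping of the teacher
    """
    x_student = student(x_lr, noise)             # single-step SR prediction
    with torch.no_grad():
        x_teacher = teacher_map(x_lr, noise)     # deterministic teacher output
    distill = F.mse_loss(x_student, x_teacher)   # match the teacher's mapping
    consistency = F.mse_loss(x_student, x_gt)    # anchor to the ground truth
    return distill + w * consistency
```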

    Rapid Discrimination of Chlorpheniramine Maleate and Assessment of Its Surface Content Uniformity in a Pharmaceutical Formulation by NIR-CI Coupled with Statistical Measurement

    This study demonstrates that near-infrared chemical imaging (NIR-CI) is a rapid and nondestructive technique for discriminating chlorpheniramine maleate (CPM) and assessing its surface content uniformity (SCU) in a pharmaceutical formulation. The characteristic-wavenumber method was used to discriminate the CPM distribution on the tablet surface. To assess the surface content uniformity of CPM, binary imaging and statistical measurement were proposed. Furthermore, high-performance liquid chromatography (HPLC) was used as the reference method for accurately determining the volume content of CPM in the samples, and was also performed to assess the volume content uniformity (VCU) of CPM in whole and partial regions of the tablets. The NIR-CI results showed that the spatial distribution of CPM on the tablet surface was heterogeneous. Comparing the content uniformity of CPM determined by NIR-CI and HPLC demonstrated that a high degree of VCU does not imply a high degree of SCU. These results indicate that the HPLC method is not suitable for testing SCU, as verified by NIR-CI. This study proves the feasibility of NIR-CI for rapid discrimination of CPM and assessment of its SCU, which is helpful for the quality control of commercial CPM tablets.
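
    The binary-image statistics can be illustrated with a short sketch: integrate a characteristic CPM band of the hyperspectral cube, threshold the normalized score map into a binary distribution map, and summarize uniformity with surface coverage and a relative standard deviation. The band limits, threshold, and function name are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def surface_uniformity(cube, wavenumbers, band=(1666.0, 1680.0), thresh=0.5):
    """Binary-image uniformity statistics for one tablet surface (sketch).

    cube:        (H, W, L) hyperspectral image with L spectral channels
    wavenumbers: (L,) ascending wavenumber axis
    band:        assumed CPM-characteristic band limits (cm^-1)
    """
    lo, hi = np.searchsorted(wavenumbers, band)
    score = cube[:, :, lo:hi].mean(axis=2)               # per-pixel band intensity
    score = (score - score.min()) / (np.ptp(score) + 1e-12)
    binary = score > thresh                              # CPM-rich pixels
    coverage = binary.mean()                             # fraction of surface covered
    rsd = score.std() / (score.mean() + 1e-12)           # relative std. deviation
    return binary, coverage, rsd
```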