163 research outputs found

    FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling

    With the availability of large-scale video datasets and the advances of diffusion models, text-driven video generation has achieved substantial progress. However, existing video generation models are typically trained on a limited number of frames, resulting in the inability to generate high-fidelity long videos during inference. Furthermore, these models only support single-text conditions, whereas real-life scenarios often require multi-text conditions as the video content changes over time. To tackle these challenges, this study explores the potential of extending the text-driven capability to generate longer videos conditioned on multiple texts. 1) We first analyze the impact of initial noise in video diffusion models. Then, building upon this observation, we propose FreeNoise, a tuning-free and time-efficient paradigm to enhance the generative capabilities of pretrained video diffusion models while preserving content consistency. Specifically, instead of initializing noise independently for all frames, we reschedule a sequence of noises for long-range correlation and perform temporal attention over them using a window-based function. 2) Additionally, we design a novel motion injection method to support the generation of videos conditioned on multiple text prompts. Extensive experiments validate the superiority of our paradigm in extending the generative capabilities of video diffusion models. It is noteworthy that, compared with the previous best-performing method, which brought about 255% extra time cost, our method incurs only a negligible extra time cost of approximately 17%. Generated video samples are available at our website: http://haonanqiu.com/projects/FreeNoise.html. Comment: Project Page: http://haonanqiu.com/projects/FreeNoise.html Code Repo: https://github.com/arthur-qiu/LongerCrafte
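The abstract above does not spell out the rescheduling rule, so the following is only a minimal sketch under an assumption: noise is sampled once for a short base clip, and longer sequences reuse those same noise frames in shuffled order so that distant frames stay correlated. The function name `reschedule_noise` and its parameters are hypothetical and are not taken from the FreeNoise code.

```python
import random

def reschedule_noise(base_frames, total_frames, dim, seed=0):
    """Build noise for `total_frames` frames by reusing `base_frames` noise vectors.

    Assumed illustration: the first chunk keeps the original order, and every
    later chunk is a shuffled copy of the same base noise frames.
    """
    rng = random.Random(seed)
    # Independent Gaussian noise only for the short base clip.
    base = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(base_frames)]

    order = list(range(base_frames))
    schedule = []  # schedule[t] = index of the base noise frame used at time t
    while len(schedule) < total_frames:
        schedule.extend(order)
        order = order[:]      # copy so earlier chunks are not affected
        rng.shuffle(order)    # later chunks reuse the same noise, reordered
    schedule = schedule[:total_frames]

    return [base[i] for i in schedule], schedule

noise, schedule = reschedule_noise(base_frames=4, total_frames=10, dim=3)
print(schedule)
```

Every frame beyond the base clip shares its noise with some earlier frame, which is one simple way to obtain long-range correlation without sampling fresh noise for the whole sequence.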

    MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval

    With the success of large-scale visual-language pretraining models and the wide application of image-text retrieval in industry, reducing model size and streamlining deployment on terminal devices have become urgently necessary. The mainstream model structures for image-text retrieval are single-stream and dual-stream, both aiming to close the semantic gap between visual and textual modalities. Dual-stream models excel at offline indexing and fast inference, while single-stream models achieve more accurate cross-modal alignment by employing adequate feature fusion. We propose a multi-teacher cross-modality alignment distillation (MCAD) technique to integrate the advantages of single-stream and dual-stream models. By incorporating the fused single-stream features into the image and text features of the dual-stream model, we formulate new modified teacher features and logits. Then, we conduct both logit and feature distillation to boost the capability of the student dual-stream model, achieving high retrieval performance without increasing inference complexity. Extensive experiments demonstrate the remarkable performance and high efficiency of MCAD on image-text retrieval tasks. Furthermore, we implement a mobile CLIP model on Snapdragon chips with only 93M running memory and 30ms search latency, without apparent performance degradation relative to the original large CLIP.
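The combination of logit and feature distillation mentioned above can be sketched as follows. The abstract does not state the loss functions, so the temperature-scaled KL divergence for logits and the mean squared error for features are assumptions chosen because they are common in distillation; all function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def logit_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over softened similarity logits (assumed loss)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def feature_distillation_loss(teacher_feat, student_feat):
    """Mean squared error between teacher and student embeddings (assumed loss)."""
    return sum((t - s) ** 2 for t, s in zip(teacher_feat, student_feat)) / len(teacher_feat)

# Made-up image-to-text similarity logits and embeddings, for illustration only.
teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.5, 1.5, 1.0]
total = (logit_distillation_loss(teacher_logits, student_logits)
         + feature_distillation_loss([0.2, 0.8], [0.1, 0.6]))
print(total)
```

Both terms only affect training, which is consistent with the abstract's claim that the student's inference cost does not increase.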

    Pretreatment technology of lignocellulose

    Lignocellulose is the most abundant renewable biomass resource in nature. Pretreatment of lignocellulose can improve the accessibility of cellulose raw materials to cellulase, reduce the ineffective adsorption of cellulase, lower the crystallinity, and increase the yield of reducing sugar. This paper summarizes several practical pretreatment technologies for lignocellulose, covering the methods, principles, advantages, and disadvantages of each, and then discusses the development prospects of lignocellulose pretreatment methods.

    Enhancing the 3D printing fidelity of vat photopolymerization with machine learning-driven boundary prediction

    Like many pixel-based additive manufacturing (AM) techniques, digital light processing (DLP) based vat photopolymerization faces the challenge that the square pixel based processing strategy can lead to zigzag edges especially when feature sizes come close to single-pixel levels. Introducing greyscale pixels has been a strategy to smoothen such edges, but it is a challenging task to understand which of the many permutations of projected pixels would give the optimal 3D printing performance. To address this challenge, a novel data acquisition strategy based on machine learning (ML) principles is proposed, and a training routine is implemented to reproduce the smallest shape of an intended 3D printed object. Through this approach, a chessboard patterning strategy is developed along with an automated data refining and augmentation workflow, demonstrating its efficiency and effectiveness by reducing the deviation by around 30%.

    Association of sleep duration and sleep quality with the risk of metabolic syndrome in adults: a systematic review and meta-analysis

    Introduction: The association between sleep duration and metabolic syndrome (MetS) remains controversial, and few studies have considered the effects of sleep quality. We performed a meta-analysis to clarify the relationship of sleep duration and sleep quality with the risk of MetS. Material and methods: We conducted a systematic and comprehensive literature search of electronic databases from inception to 17 February 2022. The effect sizes of covariates from each study were pooled using a random- or fixed-effects model, and a restricted cubic spline random-effects meta-analysis was performed to examine the dose-response relationship between sleep duration and MetS. Results: A total of 62 studies were included in this meta-analysis. Compared to normal sleep duration, short sleep duration [odds ratio (OR) = 1.14, 95% confidence interval (CI): 1.10–1.19] and long sleep duration (OR = 1.15, 95% CI: 1.09–1.23) were associated with an increased risk of MetS. The restricted cubic spline analysis indicated that sleep durations of 8.5 h (OR = 0.95, 95% CI: 0.92–0.97) and 11 h (OR = 1.58, 95% CI: 1.31–1.91) were significantly associated with the risk of MetS. The pooled results showed that poor sleep quality (OR = 1.46, 95% CI: 1.03–2.06) and sleep complaints had significant positive associations with MetS. Conclusion: Our results demonstrated that short sleep duration increased the risk of developing MetS. Long sleep duration was also associated with MetS, especially at 11 h. A sleep duration of 8.5 h can be considered the recommended duration with respect to MetS risk. Poor sleep quality and sleep complaints were also associated with MetS.
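To illustrate what pooling effect sizes with a fixed- or random-effects model involves, here is a minimal sketch of inverse-variance pooling of log odds ratios, with the DerSimonian-Laird estimator assumed for the between-study variance (the abstract does not name the estimator used). The three study-level odds ratios are invented for illustration and are not data from the meta-analysis.

```python
import math

def log_or_and_se(or_value, ci_low, ci_high):
    """Convert an odds ratio with a 95% CI into (log OR, standard error)."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    return math.log(or_value), se

def pool_fixed(effects, ses):
    """Fixed-effect inverse-variance pooling of log odds ratios."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

def pool_random(effects, ses):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimate."""
    weights = [1.0 / se ** 2 for se in ses]
    fixed, _ = pool_fixed(effects, ses)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(w * y for w, y in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2

# Hypothetical study-level results: (OR, lower 95% CI bound, upper 95% CI bound).
studies = [(1.10, 0.95, 1.27), (1.25, 1.05, 1.49), (1.05, 0.90, 1.22)]
effects, ses = zip(*(log_or_and_se(*s) for s in studies))
pooled, se, tau2 = pool_random(effects, ses)
print(math.exp(pooled), math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
```

When the estimated between-study variance is zero, the random-effects result coincides with the fixed-effect one, which is why analyses often report a choice between the two models.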

    PyPose: A Library for Robot Learning with Physics-based Optimization

    Deep learning has had remarkable success in robotic perception, but its data-centric nature suffers when it comes to generalizing to ever-changing environments. By contrast, physics-based optimization generalizes better, but it does not perform as well in complicated tasks due to the lack of high-level semantic information and the reliance on manual parametric tuning. To take advantage of these two complementary worlds, we present PyPose: a robotics-oriented, PyTorch-based library that combines deep perceptual models with physics-based optimization techniques. Our design goal for PyPose is to make it user-friendly, efficient, and interpretable with a tidy and well-organized architecture. Using an imperative style interface, it can be easily integrated into real-world robotic applications. Besides, it supports parallel computing of any order gradients of Lie groups and Lie algebras and 2nd-order optimizers, such as trust region methods. Experiments show that PyPose achieves 3-20× speedup in computation compared to state-of-the-art libraries. To boost future research, we provide concrete examples across several fields of robotics, including SLAM, inertial navigation, planning, and control.

    PyPose v0.6: The Imperative Programming Interface for Robotics

    PyPose is an open-source library for robot learning. It combines a learning-based approach with physics-based optimization, which enables seamless end-to-end robot learning. It has been used in many tasks due to its meticulously designed application programming interface (API) and efficient implementation. Since its initial launch in early 2022, PyPose has seen significant enhancements, incorporating a wide variety of new features into its platform. To satisfy the growing demand for understanding and using the library and to flatten the learning curve for new users, we present the fundamental design principle of the imperative programming interface and showcase the flexible usage of diverse functionalities and modules using an extremely simple Dubins car example. We also demonstrate that PyPose can easily be used to navigate a real quadruped robot with a few lines of code.
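For readers unfamiliar with the Dubins car mentioned above, the following is a minimal kinematic sketch in plain Python. It deliberately does not use PyPose and does not reproduce the paper's example; the function names, the forward-Euler integration, and the control values are assumptions made for illustration.

```python
import math

def dubins_step(state, v, omega, dt):
    """Advance a Dubins car one forward-Euler step.

    state is (x, y, theta); v is forward speed and omega is turn rate.
    """
    x, y, theta = state
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)

def rollout(state, controls, dt):
    """Apply a sequence of (v, omega) controls and return the visited states."""
    trajectory = [state]
    for v, omega in controls:
        state = dubins_step(state, v, omega, dt)
        trajectory.append(state)
    return trajectory

# Drive straight for 10 steps, then turn left at a constant rate for 10 steps.
controls = [(1.0, 0.0)] * 10 + [(1.0, 0.5)] * 10
path = rollout((0.0, 0.0, 0.0), controls, dt=0.1)
print(path[-1])
```

The state update is written as a pure function of the current state and control, which is the property an imperative, differentiable interface would need in order to optimize over the control sequence.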