99 research outputs found

    AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation

    AI illustrator aims to automatically design visually appealing images for books to provoke rich thoughts and emotions. To achieve this goal, we propose a framework for translating raw descriptions with complex semantics into semantically corresponding images. The main challenge lies in the complexity of the semantics of raw descriptions, which may be hard to visualize (e.g., "gloomy" or "Asian"). Existing methods usually struggle to handle such descriptions. To address this issue, we propose a Prompt-based Cross-Modal Generation Framework (PCM-Frame) that leverages two powerful pre-trained models, CLIP and StyleGAN. Our framework consists of two components: a prompt-based projection module from Text Embeddings to Image Embeddings, and an adapted image generation module built on StyleGAN, which takes Image Embeddings as inputs and is trained with combined semantic consistency losses. To bridge the gap between realistic images and illustration designs, we further adopt a stylization model as post-processing in our framework for better visual effects. Benefiting from the pre-trained models, our method can handle complex descriptions and does not require external paired data for training. Furthermore, we have built a benchmark that consists of 200 raw descriptions. A user study demonstrates the superiority of our method over competing methods on complicated texts. We release our code at https://github.com/researchmm/AI_Illustrator

    Solving Diffusion ODEs with Optimal Boundary Conditions for Better Image Super-Resolution

    Diffusion models, as a kind of powerful generative model, have given impressive results on image super-resolution (SR) tasks. However, due to the randomness introduced in the reverse process of diffusion models, the performance of diffusion-based SR models fluctuates from one sampling run to the next, especially for samplers with few resampled steps. This inherent randomness of diffusion models results in ineffectiveness and instability, making it challenging for users to guarantee the quality of SR results. Our work instead takes this randomness as an opportunity: fully analyzing and leveraging it leads to an effective plug-and-play sampling method that has the potential to benefit a series of diffusion-based SR methods. In more detail, we propose to steadily sample high-quality SR images from pretrained diffusion-based SR models by solving diffusion ordinary differential equations (diffusion ODEs) with optimal boundary conditions (BCs), and we analyze the relationship between the choice of BCs and the corresponding SR results. Our analysis shows how to obtain an approximately optimal BC via an efficient exploration of the whole space. The SR results sampled by the proposed method with fewer steps are of higher quality than those sampled with randomness by current methods from the same pretrained diffusion-based SR model, which means that our sampling method "boosts" current diffusion-based SR models without any additional training.

    MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

    We propose the first joint audio-video generation framework that brings engaging watching and listening experiences simultaneously, towards high-quality realistic videos. To generate joint audio-video pairs, we propose a novel Multi-Modal Diffusion model (i.e., MM-Diffusion) with two coupled denoising autoencoders. In contrast to existing single-modal diffusion models, MM-Diffusion consists of a sequential multi-modal U-Net for a joint denoising process by design. Two subnets for audio and video learn to gradually generate aligned audio-video pairs from Gaussian noise. To ensure semantic consistency across modalities, we propose a novel random-shift based attention block bridging the two subnets, which enables efficient cross-modal alignment and thus lets each modality reinforce the fidelity of the other. Extensive experiments show superior results in unconditional audio-video generation and in zero-shot conditional tasks (e.g., video-to-audio). In particular, we achieve the best FVD and FAD on the Landscape and AIST++ dancing datasets. Turing tests with 10k votes further demonstrate dominant preferences for our model. The code and pre-trained models can be downloaded at https://github.com/researchmm/MM-Diffusion. Comment: Accepted by CVPR 202

    Optimization of photovoltaic panel deployment in centralized photovoltaic power plant under multiple factors

    Solar energy is one of the main renewable energy sources and has developed rapidly in many countries. However, photovoltaic (PV) output power differs under various meteorological and geographical conditions. Therefore, this paper presents an optimization method for the deployment of PV panels in a centralized PV power plant considering multiple factors. Firstly, the whole planning area is divided into a number of sub-areas of a given size, and the fuzzy C-means algorithm is used for terrain clustering according to the geographical characteristics of the sub-areas. Secondly, a correlation analysis between each meteorological factor and PV output power is carried out separately to select the main factors affecting PV output power, and the expected annual PV output power under the joint action of the main meteorological factors in each terrain is then calculated by a long short-term memory algorithm based on a dual-stage attention mechanism. Finally, according to the expected annual PV output of each terrain and subject to constraints such as cost and area, the deployment of PV panels is optimized to maximize the annual PV output of the whole PV power plant and minimize the construction cost. The results of case studies show that the proposed methods effectively improve the expected PV output power of the PV power plant and reduce the construction cost.
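
    The terrain-clustering step mentioned in this abstract can be illustrated with a minimal, generic fuzzy C-means sketch. It is not the authors' implementation: the function name, its parameters, and the two-dimensional sample features (imagined here as slope and elevation values per sub-area) are assumptions made only for illustration.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Generic fuzzy C-means on a list of equal-length float tuples.

    Returns (centers, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Centre update: mean of all points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))
        # Membership update: closer centres receive higher membership.
        new_u = []
        for p in points:
            dist = [max(1e-12, sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5)
                    for c in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            total = sum(inv)
            new_u.append([v / total for v in inv])
        shift = max(abs(a - b)
                    for old, new in zip(u, new_u)
                    for a, b in zip(old, new))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Hypothetical sub-area features (slope, elevation); not data from the paper.
sub_areas = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
             (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, memberships = fuzzy_c_means(sub_areas, n_clusters=2)
terrain = [row.index(max(row)) for row in memberships]
```

    Unlike hard clustering, each sub-area keeps a graded membership in every terrain class; taking the largest membership, as in the last line, is one simple way to assign a single class.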

    Association between prothrombin time-international normalized ratio and prognosis of post-cardiac arrest patients: A retrospective cohort study

    Background: Cardiac arrest (CA) can activate blood coagulation. This study aimed to explore the potential prognostic value of prothrombin time–international normalized ratio (INR) in post-CA patients. Methods: The clinical data of eligible subjects diagnosed with CA were extracted from the MIMIC-IV database as the training cohort. Restricted cubic spline (RCS), Kaplan–Meier (K-M) survival curve, and Cox regression analyses were conducted to elucidate the association between the INR and all-cause mortality of post-CA patients. Subgroup analysis, propensity score matching (PSM), and inverse probability of treatment weighting (IPTW) were also conducted to improve stability and reliability. Data of the validation cohort were collected from the eICU database, and logistic regression analyses were performed to verify the findings of the training cohort. Results: A total of 1,324 subjects were included in the training cohort. A linear correlation existed between INR and the risk of all-cause death of post-CA patients, as shown in the RCS analysis, with a hazard ratio (HR) >1 when INR exceeded 1.2. The K-M survival curve preliminarily indicated that subjects with INR ≥ 1.2 had a lower survival rate and shorter survival time, and a high level of INR was independently associated with 30-day, 90-day, 1-year, and in-hospital mortality, with multivariate-adjusted HRs of 1.44 (1.20, 1.73), 1.46 (1.23, 1.74), 1.44 (1.23, 1.69), and 1.37 (1.14, 1.64), respectively. These findings were consistent and robust across the subgroup analysis, the PSM and IPTW analyses, and the validation cohort. Conclusions: We systematically and comprehensively demonstrated that elevated INR was associated with increased short- and long-term all-cause mortality of post-CA patients. Therefore, elevated INR may be a promising biomarker with prognostic significance.

    Real-time Monitoring for the Next Core-Collapse Supernova in JUNO

    A core-collapse supernova (CCSN) is one of the most energetic astrophysical events in the Universe. The early and prompt detection of neutrinos before (pre-SN) and during the SN burst is a unique opportunity to realize the multi-messenger observation of CCSN events. In this work, we describe the monitoring concept and present the sensitivity of the system to pre-SN and SN neutrinos at the Jiangmen Underground Neutrino Observatory (JUNO), which is a 20 kton liquid scintillator detector under construction in South China. The real-time monitoring system is designed with both prompt monitors on the electronic board and online monitors at the data acquisition stage, in order to ensure both the alert speed and the alert coverage of progenitor stars. Assuming a false alert rate of 1 per year, this monitoring system can be sensitive to pre-SN neutrinos up to a distance of about 1.6 (0.9) kpc and to SN neutrinos up to about 370 (360) kpc for a progenitor mass of 30 M⊙ in the case of normal (inverted) mass ordering. The pointing ability for the CCSN is evaluated using the accumulated event anisotropy of the inverse beta decay interactions from pre-SN or SN neutrinos, which, along with the early alert, can play an important role in follow-up multi-messenger observations of the next Galactic or nearby extragalactic CCSN. Comment: 24 pages, 9 figures

    SoccerNet 2023 Challenges Results

    Peer reviewed. The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, focusing on retrieving all timestamps related to global actions in soccer, (2) ball action spotting, focusing on retrieving all timestamps related to changes of state of the soccer ball, and (3) dense video captioning, focusing on describing the broadcast with natural language and anchored timestamps. The second theme, field understanding, relates to the single task of (4) camera calibration, focusing on retrieving the intrinsic and extrinsic camera parameters from images. The third and last theme, player understanding, is composed of three low-level tasks related to extracting information about the players: (5) re-identification, focusing on retrieving the same players across multiple views, (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams, and (7) jersey number recognition, focusing on recognizing the jersey numbers of players from tracklets. Compared to the previous editions of the SoccerNet challenges, tasks (2), (3), and (7) are novel and include new annotations and data, task (4) was enhanced with more data and annotations, and task (6) now focuses on end-to-end approaches. More information on the tasks, challenges, and leaderboards is available at https://www.soccer-net.org. Baselines and development kits can be found at https://github.com/SoccerNet

    A Filtering-Based Stochastic Gradient Estimation Method for Multivariate Pseudo-Linear Systems Using the Partial Coupling Concept

    This paper considers ways to improve parameter identification for multivariate equation-error systems subject to random interference and parameter coupling. To avoid the impact of colored noise on parameter identification precision, an appropriate filter is used to process the autoregressive moving average noise. Then, the filtered system is transformed into a number of sub-identification models according to the system output dimensions. Based on negative gradient search, a new multivariate filtering algorithm employing a partial coupling approach is proposed, and a conventional gradient algorithm is derived for comparison. With the partial coupling approach and the data filtering method, parameter identification for multivariate equation-error systems achieves high estimation accuracy and efficient computation. Two simulations are performed to demonstrate the proposed method’s effectiveness.
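
    For orientation, the negative-gradient-search idea behind such algorithms can be sketched with the textbook stochastic gradient (SG) recursion for a single-output linear regression model. This is only a sketch under simplifying assumptions: it includes neither the data filtering nor the partial coupling proposed in the paper, and the function names and simulated data are hypothetical.

```python
import random


def stochastic_gradient_identify(phis, ys, theta0=None):
    """Estimate theta in y(t) = phi(t)^T theta + v(t) with the SG recursion:

        r(t)     = r(t-1) + ||phi(t)||^2
        theta(t) = theta(t-1) + phi(t) / r(t) * (y(t) - phi(t)^T theta(t-1))
    """
    dim = len(phis[0])
    theta = list(theta0) if theta0 is not None else [0.0] * dim
    r = 1.0  # r(0) = 1 keeps the first step size finite
    for phi, y in zip(phis, ys):
        r += sum(p * p for p in phi)
        # Innovation: measured output minus output predicted by the
        # current estimate.
        innovation = y - sum(p * th for p, th in zip(phi, theta))
        theta = [th + p * innovation / r for th, p in zip(theta, phi)]
    return theta


def simulate(true_theta, n_samples, noise_std=0.0, seed=0):
    """Generate hypothetical regression data with Gaussian regressors."""
    rng = random.Random(seed)
    phis, ys = [], []
    for _ in range(n_samples):
        phi = [rng.gauss(0.0, 1.0) for _ in true_theta]
        noise = rng.gauss(0.0, noise_std)
        ys.append(sum(p * th for p, th in zip(phi, true_theta)) + noise)
        phis.append(phi)
    return phis, ys


true_theta = [1.5, -0.7]  # hypothetical parameters, not from the paper
phis, ys = simulate(true_theta, n_samples=2000)
estimate = stochastic_gradient_identify(phis, ys)
```

    The step size 1/r(t) shrinks as data accumulate, so the plain SG recursion is computationally cheap but tends to converge slowly.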