AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation
AI illustrator aims to automatically design visually appealing images for
books to provoke rich thoughts and emotions. To achieve this goal, we propose a
framework for translating raw descriptions with complex semantics into
semantically corresponding images. The main challenge lies in the complexity of
the semantics of raw descriptions, which may be hard to visualize (e.g.,
"gloomy" or "Asian"). Existing methods usually struggle to handle such
descriptions. To address this issue, we propose a Prompt-based
Cross-Modal Generation Framework (PCM-Frame) to leverage two powerful
pre-trained models, including CLIP and StyleGAN. Our framework consists of two
components: a projection module from Text Embeddings to Image Embeddings based
on prompts, and an adapted image generation module built on StyleGAN which
takes Image Embeddings as inputs and is trained by combined semantic
consistency losses. To bridge the gap between realistic images and illustration
designs, we further adopt a stylization model as post-processing in our
framework for better visual effects. Benefiting from the pre-trained models,
our method can handle complex descriptions and does not require external paired
data for training. Furthermore, we have built a benchmark that consists of 200
raw descriptions. A user study demonstrates the superiority of our method over
competing methods on complicated texts. We release our code at
https://github.com/researchmm/AI_Illustrator
Solving Diffusion ODEs with Optimal Boundary Conditions for Better Image Super-Resolution
Diffusion models, as a kind of powerful generative model, have given
impressive results on image super-resolution (SR) tasks. However, due to the
randomness introduced in the reverse process of diffusion models, the
performance of diffusion-based SR models fluctuates from one sampling run to
the next, especially for samplers with few resampled steps. This inherent
randomness of diffusion models results in ineffectiveness and instability,
making it challenging for users to guarantee the quality of SR results.
However, our work takes this randomness as an opportunity: fully analyzing and
leveraging it leads to the construction of an effective plug-and-play sampling
method that has the potential to benefit a range of diffusion-based SR
methods. In more detail, we propose to steadily sample high-quality SR images
from pretrained diffusion-based SR models by solving diffusion ordinary
differential equations (diffusion ODEs) with optimal boundary conditions (BCs)
and analyze the relationship between the choice of BCs and the
corresponding SR results. Our analysis shows the route to obtain an
approximately optimal BC via an efficient exploration in the whole space. The
quality of SR results sampled by the proposed method with fewer steps
exceeds that of results sampled by current methods with randomness
from the same pretrained diffusion-based SR model, which means that our
sampling method "boosts" current diffusion-based SR models without any
additional training.
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
We propose the first joint audio-video generation framework that brings
engaging watching and listening experiences simultaneously, towards
high-quality realistic videos. To generate joint audio-video pairs, we propose
a novel Multi-Modal Diffusion model (i.e., MM-Diffusion), with two coupled
denoising autoencoders. In contrast to existing single-modal diffusion models,
MM-Diffusion consists of a sequential multi-modal U-Net for a joint denoising
process by design. Two subnets for audio and video learn to gradually generate
aligned audio-video pairs from Gaussian noise. To ensure semantic consistency
across modalities, we propose a novel random-shift based attention block
bridging over the two subnets, which enables efficient cross-modal alignment,
and thus reinforces the audio-video fidelity for each other. Extensive
experiments show superior results in unconditional audio-video generation, and
zero-shot conditional tasks (e.g., video-to-audio). In particular, we achieve
the best FVD and FAD on Landscape and AIST++ dancing datasets. Turing tests of
10k votes further demonstrate dominant preferences for our model. The code and
pre-trained models can be downloaded at
https://github.com/researchmm/MM-Diffusion.
Comment: Accepted by CVPR 202
Optimization of photovoltaic panel deployment in centralized photovoltaic power plant under multiple factors
Solar energy is one of the main renewable energy sources and has developed rapidly in many countries. However, photovoltaic (PV) output power differs under various meteorological and geographical conditions. Therefore, this paper presents an optimization method for the deployment of PV panels in a centralized PV power plant considering multiple factors. Firstly, the whole planning area is divided into a number of sub-areas according to a given area, and the fuzzy C-means algorithm is used for terrain clustering according to the geographical characteristics of the sub-areas. Secondly, correlation analysis between each meteorological factor and PV output power is carried out separately to select the main factors affecting PV output power, and the expected annual PV output power under the joint action of several main meteorological factors in each terrain is then calculated by a long short-term memory algorithm based on a dual-stage attention mechanism. Finally, according to the expected annual PV output of each terrain and subject to constraints including cost and area, the deployment of PV panels is optimized to maximize the annual PV output of the whole PV power plant and minimize the construction cost. The results of case studies show that the proposed methods effectively improve the expected PV output power of the PV power plant and reduce the construction cost.
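The terrain-clustering step named in the abstract above can be illustrated with a minimal fuzzy C-means sketch. This is a generic textbook version and not the authors' implementation; the function name `fuzzy_c_means`, the fuzzifier `m = 2`, the fixed iteration count, and the two-feature sub-area vectors (slope, elevation) are all assumptions made for illustration.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Generic fuzzy C-means: returns (centres, memberships).

    points: list of equal-length numeric tuples (hypothetical terrain features)
    c: number of clusters; m: fuzzifier (> 1)
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centres = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by membership ** m.
        centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centres.append(
                [sum(w[i] * points[i][k] for i in range(n)) / w_sum
                 for k in range(dim)]
            )
        # Membership update from the distance ratios to all centres.
        for i in range(n):
            d = [
                max(1e-12,
                    sum((points[i][k] - centres[j][k]) ** 2
                        for k in range(dim)) ** 0.5)
                for j in range(c)
            ]
            for j in range(c):
                u[i][j] = 1.0 / sum(
                    (d[j] / d[l]) ** (2.0 / (m - 1.0)) for l in range(c)
                )
    return centres, u


# Hypothetical sub-areas described by (slope, elevation) in arbitrary units.
sub_areas = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
             (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centres, memberships = fuzzy_c_means(sub_areas, c=2)
terrain_labels = [row.index(max(row)) for row in memberships]
```

Each sub-area receives a degree of membership in every terrain class; taking the largest membership, as in `terrain_labels`, gives a hard assignment when one is needed.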
Association between prothrombin time-international normalized ratio and prognosis of post-cardiac arrest patients: A retrospective cohort study
Background: Cardiac arrest (CA) can activate blood coagulation. This study aimed to explore the potential prognostic value of prothrombin time–international normalized ratio (INR) in post-CA patients. Methods: The clinical data of eligible subjects diagnosed with CA were extracted from the MIMIC-IV database as the training cohort. Restricted cubic spline (RCS), Kaplan–Meier (K-M) survival curve, and Cox regression analyses were conducted to elucidate the association between the INR and all-cause mortality of post-CA patients. Subgroup analysis, propensity score matching (PSM), and inverse probability of treatment weighting (IPTW) were also conducted to improve stability and reliability. Data of the validation cohort were collected from the eICU database, and logistic-regression analyses were performed to verify the findings of the training cohort. Results: A total of 1,324 subjects were included in the training cohort. A linear correlation existed between INR and the risk of all-cause death of post-CA patients, as shown in RCS analysis, with a hazard ratio (HR) >1 when INR exceeded 1.2. The K-M survival curve preliminarily indicated that subjects with INR ≥ 1.2 had a lower survival rate and shorter survival time, and a high INR was independently associated with 30-day, 90-day, 1-year, and in-hospital mortality, with multivariate-adjusted HRs of 1.44 (1.20, 1.73), 1.46 (1.23, 1.74), 1.44 (1.23, 1.69), and 1.37 (1.14, 1.64), respectively. These findings were consistent and robust across the subgroup analysis, PSM and IPTW analyses, and validation cohort. Conclusions: We systematically and comprehensively demonstrated that elevated INR was associated with increased short- and long-term all-cause mortality of post-CA patients. Therefore, elevated INR may be a promising biomarker with prognostic significance.
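The Kaplan–Meier comparison mentioned in the abstract above can be sketched with the standard product-limit estimator. The function name `kaplan_meier` and the follow-up data below are made up for illustration; they are not taken from MIMIC-IV or eICU, and only the INR cut-off of 1.2 comes from the abstract.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Product-limit estimate: [(time, survival probability)] at event times.

    durations: follow-up time of each subject
    events: 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # each subject leaves the risk set at their time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk subjects surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical cohort: (INR, follow-up days, death observed).
cohort = [
    (1.0, 30, 0), (1.1, 25, 1), (1.1, 30, 0), (1.0, 30, 0),
    (1.3, 5, 1), (1.6, 12, 1), (1.4, 30, 0), (2.0, 3, 1),
]
high = [(days, dead) for inr, days, dead in cohort if inr >= 1.2]
low = [(days, dead) for inr, days, dead in cohort if inr < 1.2]
curve_high = kaplan_meier([d for d, _ in high], [e for _, e in high])
curve_low = kaplan_meier([d for d, _ in low], [e for _, e in low])
```

Plotting `curve_high` against `curve_low` as step functions gives the kind of two-group survival comparison the abstract describes; the adjusted hazard ratios reported there additionally require a Cox regression, which this sketch does not cover.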
Real-time Monitoring for the Next Core-Collapse Supernova in JUNO
A core-collapse supernova (CCSN) is one of the most energetic astrophysical
events in the Universe. The early and prompt detection of neutrinos before
(pre-SN) and during the SN burst is a unique opportunity to realize the
multi-messenger observation of the CCSN events. In this work, we describe the
monitoring concept and present the sensitivity of the system to the pre-SN and
SN neutrinos at the Jiangmen Underground Neutrino Observatory (JUNO), which is
a 20 kton liquid scintillator detector under construction in South China. The
real-time monitoring system is designed with both the prompt monitors on the
electronic board and online monitors at the data acquisition stage, in order to
ensure both the alert speed and alert coverage of progenitor stars. By assuming
a false alert rate of 1 per year, this monitoring system can be sensitive to
the pre-SN neutrinos up to the distance of about 1.6 (0.9) kpc and SN neutrinos
up to about 370 (360) kpc for a progenitor mass of 30 solar masses in the case
of normal (inverted) mass ordering. The pointing ability of the CCSN is
evaluated by using the accumulated event anisotropy of the inverse beta decay
interactions from pre-SN or SN neutrinos, which, along with the early alert,
can play important roles for the follow-up multi-messenger observations of the
next Galactic or nearby extragalactic CCSN.
Comment: 24 pages, 9 figures
SoccerNet 2023 Challenges Results
Peer reviewed. The SoccerNet 2023 challenges were the third annual video understanding
challenges organized by the SoccerNet team. For this third edition, the
challenges were composed of seven vision-based tasks split into three main
themes. The first theme, broadcast video understanding, is composed of three
high-level tasks related to describing events occurring in the video
broadcasts: (1) action spotting, focusing on retrieving all timestamps related
to global actions in soccer, (2) ball action spotting, focusing on retrieving
all timestamps related to the soccer ball change of state, and (3) dense video
captioning, focusing on describing the broadcast with natural language and
anchored timestamps. The second theme, field understanding, relates to the
single task of (4) camera calibration, focusing on retrieving the intrinsic and
extrinsic camera parameters from images. The third and last theme, player
understanding, is composed of three low-level tasks related to extracting
information about the players: (5) re-identification, focusing on retrieving
the same players across multiple views, (6) multiple object tracking, focusing
on tracking players and the ball through unedited video streams, and (7) jersey
number recognition, focusing on recognizing the jersey number of players from
tracklets. Compared to the previous editions of the SoccerNet challenges, tasks
(2), (3), and (7) are novel, including new annotations and data, task (4) was enhanced
with more data and annotations, and task (6) now focuses on end-to-end
approaches. More information on the tasks, challenges, and leaderboards is
available at https://www.soccer-net.org. Baselines and development kits can be
found at https://github.com/SoccerNet.
A Filtering-Based Stochastic Gradient Estimation Method for Multivariate Pseudo-Linear Systems Using the Partial Coupling Concept
Solutions for improving parameter identification for multivariate equation-error systems under random interference and parameter coupling are considered in this paper. To avoid the impact of colored noise on parameter identification precision, an appropriate filter is used to process the autoregressive moving average noise. Then, the filtered system is transformed into a number of sub-identification models based on the system output dimensions. Based on negative gradient search, a new multivariate filtering algorithm employing a partial coupling approach is proposed, and a conventional gradient algorithm is derived for comparison. With the partial coupling approach and the data filtering method, parameter identification for multivariate equation-error systems achieves high estimation accuracy and efficient computation. Two simulations are performed to demonstrate the proposed method’s effectiveness.
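The negative gradient search underlying the abstract above can be illustrated with a basic stochastic gradient identification sketch for a scalar linear regression model y(t) = phi(t)^T theta + v(t). This is a plain textbook recursion, not the paper's filtering-based partially coupled algorithm; the names `sg_identify` and `simulate`, the two-parameter model, and the uniformly distributed regressors are assumptions for illustration.

```python
import random


def sg_identify(phis, ys, n_params):
    """Basic stochastic gradient estimate of theta in y = phi^T theta + v."""
    theta = [0.0] * n_params
    r = 1.0  # normalising term, r(0) = 1
    for phi, y in zip(phis, ys):
        r += sum(p * p for p in phi)  # r(t) = r(t-1) + ||phi(t)||^2
        innovation = y - sum(p * th for p, th in zip(phi, theta))
        # Step along the negative gradient of the squared prediction error.
        theta = [th + p * innovation / r for th, p in zip(theta, phi)]
    return theta


def simulate(theta_true, steps, noise_std=0.0, seed=0):
    """Generate hypothetical regressors and outputs for the sketch."""
    rng = random.Random(seed)
    phis, ys = [], []
    for _ in range(steps):
        phi = [rng.uniform(-1.0, 1.0) for _ in theta_true]
        y = sum(p * th for p, th in zip(phi, theta_true))
        y += rng.gauss(0.0, noise_std)
        phis.append(phi)
        ys.append(y)
    return phis, ys


theta_true = [0.8, -0.5]  # hypothetical parameters
phis, ys = simulate(theta_true, steps=2000, noise_std=0.1)
theta_hat = sg_identify(phis, ys, n_params=len(theta_true))
```

The growing normaliser `r` shrinks the step size over time, which keeps the recursion cheap and stable but slow to converge; this is the usual motivation for refinements such as the filtering and coupling ideas the abstract describes.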