59 research outputs found
Learning the Degradation Distribution for Blind Image Super-Resolution
Synthetic high-resolution (HR) \& low-resolution (LR) pairs are widely used
in existing super-resolution (SR) methods. To avoid the domain gap between
synthetic and test images, most previous methods try to adaptively learn the
synthesizing (degrading) process via a deterministic model. However, some
degradations in real scenarios are stochastic and cannot be determined by the
content of the image. These deterministic models may fail to capture such random
factors and content-independent parts of degradations, which limits the
performance of downstream SR models. In this paper, we propose a
probabilistic degradation model (PDM), which treats the degradation
as a random variable and learns its distribution by modeling the
mapping from a prior random variable to the degradation. Compared
with previous deterministic degradation models, PDM could model more diverse
degradations and generate HR-LR pairs that may better cover the various
degradations of test images, and thus prevent the SR model from over-fitting to
specific ones. Extensive experiments have demonstrated that our degradation
model can help the SR model achieve better performance on different datasets.
The source codes are released at \url{git@github.com:greatlog/UnpairedSR.git}.Comment: Accepted to CVPR 2022
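A minimal PyTorch sketch may help make the abstract concrete: the degradation is treated as a random variable by mapping samples of a Gaussian prior to a stochastic blur kernel and a content-independent noise map, which are then used to synthesize an LR image from an HR one. This is an illustrative reconstruction under assumptions, not the released code; all module names, layer sizes, and the training that matches synthesized LR images to real ones are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilisticDegradation(nn.Module):
    """Sketch: maps Gaussian prior samples to a stochastic blur kernel and noise map."""

    def __init__(self, z_dim=64, kernel_size=21, scale=4):
        super().__init__()
        self.z_dim, self.kernel_size, self.scale = z_dim, kernel_size, scale
        # prior z -> unnormalized blur kernel
        self.kernel_net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, kernel_size * kernel_size),
        )
        # spatial prior z -> content-independent image noise
        self.noise_net = nn.Sequential(
            nn.Conv2d(z_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, hr):  # hr: (b, 3, h, w)
        b, c, h, w = hr.shape
        # sample the prior random variables: a fresh degradation per call
        z_k = torch.randn(b, self.z_dim, device=hr.device)
        z_n = torch.randn(b, self.z_dim, h // self.scale, w // self.scale, device=hr.device)
        # stochastic blur kernel, normalized to sum to one
        kernel = F.softmax(self.kernel_net(z_k), dim=-1)
        kernel = kernel.view(b, 1, self.kernel_size, self.kernel_size)
        # apply each image's kernel to all of its channels (grouped-conv trick)
        blurred = F.conv2d(
            hr.view(1, b * c, h, w),
            kernel.repeat_interleave(c, dim=0),
            padding=self.kernel_size // 2,
            groups=b * c,
        ).view(b, c, h, w)
        # downsample, then add stochastic, content-independent noise
        lr = F.interpolate(blurred, scale_factor=1 / self.scale, mode="bicubic")
        return lr + self.noise_net(z_n)
```

Because the kernel and noise are resampled on every forward pass, the same HR image yields many different LR counterparts, which is the point of modeling the degradation distribution rather than a single deterministic mapping.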
End-to-end Alternating Optimization for Real-World Blind Super Resolution
Blind Super-Resolution (SR) usually involves two sub-problems: 1) estimating
the degradation of the given low-resolution (LR) image; 2) super-resolving the
LR image to its high-resolution (HR) counterpart. Both problems are ill-posed
due to the information loss in the degrading process. Most previous methods try
to solve the two problems independently, but often fall into a dilemma: a good
super-resolved HR result requires an accurate degradation estimation, which,
however, is difficult to obtain without the help of the original HR
information. To address this issue, instead of considering these two problems
independently, we adopt an alternating optimization algorithm, which can
estimate the degradation and restore the SR image in a single model.
Specifically, we design two convolutional neural modules, namely
\textit{Restorer} and \textit{Estimator}. \textit{Restorer} restores the SR
image based on the estimated degradation, and \textit{Estimator} estimates the
degradation with the help of the restored SR image. We alternate these two
modules repeatedly and unfold this process to form an end-to-end trainable
network. In this way, both \textit{Restorer} and \textit{Estimator} can benefit
from each other's intermediate results, which makes each sub-problem easier.
Moreover, because \textit{Restorer} and \textit{Estimator} are optimized in an
end-to-end manner, they become more tolerant of each other's estimation
deviations and cooperate better to achieve more robust
and accurate final results. Extensive experiments on both synthetic datasets
and real-world images show that the proposed method can largely outperform
state-of-the-art methods and produce more visually favorable results. The codes
are released at \url{https://github.com/greatlog/RealDAN.git}.Comment: Extension of our previous NeurIPS paper. Accepted to IJCV
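A rough, hypothetical PyTorch sketch of the unfolded alternating scheme may clarify the structure: an Estimator predicts a degradation code from the LR image and the current SR guess, a Restorer super-resolves the LR image conditioned on that code, and the two are iterated a fixed number of times inside one end-to-end trainable network. The module bodies and dimensions below are placeholders, not the released RealDAN architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Estimator(nn.Module):
    """Predicts a degradation code from the LR input and the current SR guess."""
    def __init__(self, deg_dim=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, deg_dim, 3, padding=1),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, lr, sr):
        sr_down = F.interpolate(sr, size=lr.shape[-2:], mode="bicubic")
        return self.net(torch.cat([lr, sr_down], dim=1)).flatten(1)  # (b, deg_dim)

class Restorer(nn.Module):
    """Super-resolves the LR image conditioned on the estimated degradation."""
    def __init__(self, deg_dim=10, scale=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + deg_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr, deg):
        # broadcast the degradation code over the spatial grid and concatenate
        deg_map = deg[:, :, None, None].expand(-1, -1, *lr.shape[-2:])
        return self.net(torch.cat([lr, deg_map], dim=1))

class AlternatingSR(nn.Module):
    """Unfolds a fixed number of alternating steps into one trainable network."""
    def __init__(self, steps=4, scale=4):
        super().__init__()
        self.steps, self.scale = steps, scale
        self.estimator = Estimator()
        self.restorer = Restorer(scale=scale)

    def forward(self, lr):
        sr = F.interpolate(lr, scale_factor=self.scale, mode="bicubic")  # initial guess
        for _ in range(self.steps):
            deg = self.estimator(lr, sr)  # estimate degradation from LR + current SR
            sr = self.restorer(lr, deg)   # restore SR from LR + current degradation
        return sr, deg
```

Training the whole unrolled loop with a single reconstruction loss is what lets each module learn to tolerate the other's intermediate errors.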
Intelligent control for predicting and mitigating major disruptions in magnetic confinement fusion
Magnetic confinement fusion is regarded as one of the most promising paths to a virtually unlimited, environmentally friendly energy source, naturally contributing to a green economy and low-carbon development. Nevertheless, major disruptions of high-temperature plasmas, a serious threat to fusion devices, still stand in the way of practical access to fusion energy. Although individual techniques have been shown to be feasible for the control, mitigation, and prediction of disruptions, complicated experimental environments make it difficult to decide on specific control strategies. The traditional approach, designing a series of independent controllers in a nested structure, cannot meet the needs of real-time control of complex plasmas, since it requires extensive engineering expertise and elaborate evaluation of system states across multiple plasma parameters. Artificial intelligence (AI) offers a potential route to resolving this issue. To simplify the control system, this work puts forward a new idea for designing controllers via AI: intelligent controllers are envisioned to replace the traditional nested structure. The successful development of intelligent control is expected to predict and mitigate major disruptions effectively, enhance fusion performance, and thereby improve the accessibility of sustainable fusion energy.
Generative Multimodal Models are In-Context Learners
The human ability to easily solve multimodal tasks in context (i.e., with
only a few demonstrations or simple instructions) is what current multimodal
systems have largely struggled to imitate. In this work, we demonstrate that
the task-agnostic in-context learning capabilities of large multimodal models
can be significantly enhanced by effective scaling-up. We introduce Emu2, a
generative multimodal model with 37 billion parameters, trained on large-scale
multimodal sequences with a unified autoregressive objective. Emu2 exhibits
strong multimodal in-context learning abilities, even showing emergent capability on tasks
that require on-the-fly reasoning, such as visual prompting and object-grounded
generation. The model sets a new record on multiple multimodal understanding
tasks in few-shot settings. When instruction-tuned to follow specific
instructions, Emu2 further achieves new state-of-the-art on challenging tasks
such as question answering benchmarks for large multimodal models and
open-ended subject-driven generation. These achievements demonstrate that Emu2
can serve as a base model and general-purpose interface for a wide range of
multimodal tasks. Code and models are publicly available to facilitate future
research.Comment: Accepted to CVPR 2024. Project page:
https://baaivision.github.io/emu
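To make "in-context" concrete, here is a small, hypothetical sketch of how a few-shot multimodal prompt can be assembled as one interleaved image-text sequence. The file paths are placeholders and the `model.generate` call at the end is an assumed interface, not Emu2's actual API; see the project page for the real usage.

```python
from PIL import Image

def build_icl_prompt(demos, query_image_path, instruction):
    """Interleave (image, answer) demonstrations and a query image into one sequence."""
    sequence = [instruction]
    for image_path, answer in demos:
        sequence += [Image.open(image_path), answer]  # one demonstration pair
    sequence += [Image.open(query_image_path)]        # query image, answer left open
    return sequence

# Example: two counting demonstrations, then a new image (placeholder file names).
prompt = build_icl_prompt(
    demos=[("cats.jpg", "There are 3 cats."),
           ("dogs.jpg", "There are 2 dogs.")],
    query_image_path="birds.jpg",
    instruction="Count the animals in each image.",
)
# A generative multimodal model would autoregressively continue the sequence:
# answer = model.generate(prompt)   # assumed interface, for illustration only
```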
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
A diffusion probabilistic model (DPM), which constructs a forward diffusion
process by gradually adding noise to data points and learns the reverse
denoising process to generate new samples, has been shown to handle complex
data distributions. Despite recent success in image synthesis, applying DPMs
to video generation is still challenging due to high-dimensional data spaces.
Previous methods usually adopt a standard diffusion process, where frames in
the same video clip are corrupted with independent noise, ignoring the content
redundancy and temporal correlation. This work presents a decomposed diffusion
process by resolving the per-frame noise into a base noise that is shared
among all frames and a residual noise that varies along the time axis. The
denoising pipeline employs two jointly-learned networks to match the noise
decomposition accordingly. Experiments on various datasets confirm that our
approach, termed VideoFusion, surpasses both GAN-based and diffusion-based
alternatives in high-quality video generation. We further show that our
decomposed formulation can benefit from pre-trained image diffusion models and
well support text-conditioned video creation.Comment: Accepted to CVPR 2023
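The noise decomposition described above can be written down directly. The short sketch below is an assumption-labeled illustration, not the authors' code: per-frame noise is sampled as a shared base component plus a per-frame residual, so each frame's noise is still marginally standard Gaussian while being correlated along time.

```python
import torch

def decomposed_noise(num_frames, shape, lam=0.5, device="cpu"):
    """Sample eps_i = sqrt(lam) * base + sqrt(1 - lam) * residual_i for each frame i.

    `lam` controls how much noise is shared across frames; since base and residuals
    are independent N(0, I) draws, each eps_i remains standard Gaussian marginally.
    """
    base = torch.randn(1, *shape, device=device)                # shared along time
    residual = torch.randn(num_frames, *shape, device=device)   # varies along time
    return lam ** 0.5 * base + (1 - lam) ** 0.5 * residual

# Usage inside a standard DDPM-style forward process q(x_t | x_0):
frames = torch.randn(16, 3, 64, 64)      # a toy 16-frame clip standing in for x_0
alpha_bar_t = torch.tensor(0.7)          # cumulative noise schedule value at step t
eps = decomposed_noise(16, (3, 64, 64), lam=0.5)
x_t = alpha_bar_t.sqrt() * frames + (1 - alpha_bar_t).sqrt() * eps
```

The two jointly learned networks mentioned in the abstract would then predict the base and residual components separately during denoising.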
- …