59 research outputs found
Learning the Degradation Distribution for Blind Image Super-Resolution
Synthetic high-resolution (HR) \& low-resolution (LR) pairs are widely used
in existing super-resolution (SR) methods. To avoid the domain gap between
synthetic and test images, most previous methods try to adaptively learn the
synthesizing (degrading) process via a deterministic model. However, some
degradations in real scenarios are stochastic and cannot be determined by the
content of the image. These deterministic models may fail to capture such random
factors and content-independent parts of degradations, which limits the
performance of downstream SR models. In this paper, we propose a
probabilistic degradation model (PDM), which treats the degradation
as a random variable and learns its distribution by modeling the
mapping from a prior random variable to the degradation. Compared
with previous deterministic degradation models, PDM could model more diverse
degradations and generate HR-LR pairs that may better cover the various
degradations of test images, and thus prevent the SR model from over-fitting to
specific ones. Extensive experiments have demonstrated that our degradation
model can help the SR model achieve better performance on different datasets.
The source codes are released at \url{git@github.com:greatlog/UnpairedSR.git}.Comment: Accepted to CVPR 2022
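A minimal PyTorch sketch may help make the abstract concrete: the degradation is treated as a random variable by mapping samples of a Gaussian prior to a stochastic blur kernel and a content-independent noise map, which are then used to synthesize an LR image from an HR one. This is an illustrative reconstruction under assumptions, not the released code; all module names, layer sizes, and the training that matches synthesized LR images to real ones are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilisticDegradation(nn.Module):
    """Sketch: maps Gaussian prior samples to a stochastic blur kernel and noise map."""

    def __init__(self, z_dim=64, kernel_size=21, scale=4):
        super().__init__()
        self.z_dim, self.kernel_size, self.scale = z_dim, kernel_size, scale
        # prior z -> unnormalized blur kernel
        self.kernel_net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, kernel_size * kernel_size),
        )
        # spatial prior z -> content-independent image noise
        self.noise_net = nn.Sequential(
            nn.Conv2d(z_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, hr):  # hr: (b, 3, h, w)
        b, c, h, w = hr.shape
        # sample the prior random variables: a fresh degradation per call
        z_k = torch.randn(b, self.z_dim, device=hr.device)
        z_n = torch.randn(b, self.z_dim, h // self.scale, w // self.scale, device=hr.device)
        # stochastic blur kernel, normalized to sum to one
        kernel = F.softmax(self.kernel_net(z_k), dim=-1)
        kernel = kernel.view(b, 1, self.kernel_size, self.kernel_size)
        # apply each image's kernel to all of its channels (grouped-conv trick)
        blurred = F.conv2d(
            hr.view(1, b * c, h, w),
            kernel.repeat_interleave(c, dim=0),
            padding=self.kernel_size // 2,
            groups=b * c,
        ).view(b, c, h, w)
        # downsample, then add stochastic, content-independent noise
        lr = F.interpolate(blurred, scale_factor=1 / self.scale, mode="bicubic")
        return lr + self.noise_net(z_n)
```

Because the kernel and noise are resampled on every forward pass, the same HR image yields many different LR counterparts, which is the point of modeling the degradation distribution rather than a single deterministic mapping.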
End-to-end Alternating Optimization for Real-World Blind Super Resolution
Blind Super-Resolution (SR) usually involves two sub-problems: 1) estimating
the degradation of the given low-resolution (LR) image; 2) super-resolving the
LR image to its high-resolution (HR) counterpart. Both problems are ill-posed
due to the information loss in the degrading process. Most previous methods try
to solve the two problems independently, but often fall into a dilemma: a good
super-resolved HR result requires an accurate degradation estimation, which,
however, is difficult to obtain without the help of the original HR
information. To address this issue, instead of considering these two problems
independently, we adopt an alternating optimization algorithm, which can
estimate the degradation and restore the SR image in a single model.
Specifically, we design two convolutional neural modules, namely
\textit{Restorer} and \textit{Estimator}. \textit{Restorer} restores the SR
image based on the estimated degradation, and \textit{Estimator} estimates the
degradation with the help of the restored SR image. We alternate these two
modules repeatedly and unfold this process to form an end-to-end trainable
network. In this way, both \textit{Restorer} and \textit{Estimator} can benefit
from each other's intermediate results, which makes each sub-problem easier.
Moreover, because \textit{Restorer} and \textit{Estimator} are optimized in an
end-to-end manner, they become more tolerant of each other's estimation
deviations and cooperate better to achieve more robust
and accurate final results. Extensive experiments on both synthetic datasets
and real-world images show that the proposed method can largely outperform
state-of-the-art methods and produce more visually favorable results. The codes
are released at \url{https://github.com/greatlog/RealDAN.git}.Comment: Extension of our previous NeurIPS paper. Accepted to IJCV
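A rough, hypothetical PyTorch sketch of the unfolded alternating scheme may clarify the structure: an Estimator predicts a degradation code from the LR image and the current SR guess, a Restorer super-resolves the LR image conditioned on that code, and the two are iterated a fixed number of times inside one end-to-end trainable network. The module bodies and dimensions below are placeholders, not the released RealDAN architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Estimator(nn.Module):
    """Predicts a degradation code from the LR input and the current SR guess."""
    def __init__(self, deg_dim=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, deg_dim, 3, padding=1),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, lr, sr):
        sr_down = F.interpolate(sr, size=lr.shape[-2:], mode="bicubic")
        return self.net(torch.cat([lr, sr_down], dim=1)).flatten(1)  # (b, deg_dim)

class Restorer(nn.Module):
    """Super-resolves the LR image conditioned on the estimated degradation."""
    def __init__(self, deg_dim=10, scale=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + deg_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr, deg):
        # broadcast the degradation code over the spatial grid and concatenate
        deg_map = deg[:, :, None, None].expand(-1, -1, *lr.shape[-2:])
        return self.net(torch.cat([lr, deg_map], dim=1))

class AlternatingSR(nn.Module):
    """Unfolds a fixed number of alternating steps into one trainable network."""
    def __init__(self, steps=4, scale=4):
        super().__init__()
        self.steps, self.scale = steps, scale
        self.estimator = Estimator()
        self.restorer = Restorer(scale=scale)

    def forward(self, lr):
        sr = F.interpolate(lr, scale_factor=self.scale, mode="bicubic")  # initial guess
        for _ in range(self.steps):
            deg = self.estimator(lr, sr)  # estimate degradation from LR + current SR
            sr = self.restorer(lr, deg)   # restore SR from LR + current degradation
        return sr, deg
```

Training the whole unrolled loop with a single reconstruction loss is what lets each module learn to tolerate the other's intermediate errors.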
Intelligent control for predicting and mitigating major disruptions in magnetic confinement fusion
Magnetic confinement fusion is regarded as one of the most promising paths to a virtually unlimited, environmentally friendly energy source, naturally contributing to a green economy and low-carbon development. Nevertheless, major disruptions of high-temperature plasmas, a serious threat to fusion devices, still stand in the way of practical access to fusion energy. Although individual techniques have been shown to be feasible for the control, mitigation, and prediction of disruptions, complicated experimental environments make it difficult to decide on specific control strategies. The traditional approach, designing a series of independent controllers in a nested structure, cannot meet the needs of real-time control of complex plasmas, since it requires extensive engineering expertise and elaborate evaluation of system states across multiple plasma parameters. Artificial intelligence (AI) offers a potential route to resolving this issue. To simplify the control system, this work puts forward a new idea for designing controllers via AI: intelligent controllers are envisioned to replace the traditional nested structure. The successful development of intelligent control is expected to predict and mitigate major disruptions effectively, enhance fusion performance, and thereby improve the accessibility of sustainable fusion energy.
Generative Multimodal Models are In-Context Learners
The human ability to easily solve multimodal tasks in context (i.e., with
only a few demonstrations or simple instructions) is what current multimodal
systems have largely struggled to imitate. In this work, we demonstrate that
the task-agnostic in-context learning capabilities of large multimodal models
can be significantly enhanced by effective scaling-up. We introduce Emu2, a
generative multimodal model with 37 billion parameters, trained on large-scale
multimodal sequences with a unified autoregressive objective. Emu2 exhibits
strong multimodal in-context learning abilities, even showing emergent capability on tasks
that require on-the-fly reasoning, such as visual prompting and object-grounded
generation. The model sets a new record on multiple multimodal understanding
tasks in few-shot settings. When instruction-tuned to follow specific
instructions, Emu2 further achieves new state-of-the-art on challenging tasks
such as question answering benchmarks for large multimodal models and
open-ended subject-driven generation. These achievements demonstrate that Emu2
can serve as a base model and general-purpose interface for a wide range of
multimodal tasks. Code and models are publicly available to facilitate future
research.Comment: Accepted to CVPR 2024. Project page:
https://baaivision.github.io/emu
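To make "in-context" concrete, here is a small, hypothetical sketch of how a few-shot multimodal prompt can be assembled as one interleaved image-text sequence. The file paths are placeholders and the `model.generate` call at the end is an assumed interface, not Emu2's actual API; see the project page for the real usage.

```python
from PIL import Image

def build_icl_prompt(demos, query_image_path, instruction):
    """Interleave (image, answer) demonstrations and a query image into one sequence."""
    sequence = [instruction]
    for image_path, answer in demos:
        sequence += [Image.open(image_path), answer]  # one demonstration pair
    sequence += [Image.open(query_image_path)]        # query image, answer left open
    return sequence

# Example: two counting demonstrations, then a new image (placeholder file names).
prompt = build_icl_prompt(
    demos=[("cats.jpg", "There are 3 cats."),
           ("dogs.jpg", "There are 2 dogs.")],
    query_image_path="birds.jpg",
    instruction="Count the animals in each image.",
)
# A generative multimodal model would autoregressively continue the sequence:
# answer = model.generate(prompt)   # assumed interface, for illustration only
```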
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
A diffusion probabilistic model (DPM), which constructs a forward diffusion
process by gradually adding noise to data points and learns the reverse
denoising process to generate new samples, has been shown to handle complex
data distributions. Despite recent success in image synthesis, applying DPMs
to video generation is still challenging due to high-dimensional data spaces.
Previous methods usually adopt a standard diffusion process, where frames in
the same video clip are corrupted with independent noise, ignoring the content
redundancy and temporal correlation. This work presents a decomposed diffusion
process by resolving the per-frame noise into a base noise that is shared
among all frames and a residual noise that varies along the time axis. The
denoising pipeline employs two jointly-learned networks to match the noise
decomposition accordingly. Experiments on various datasets confirm that our
approach, termed VideoFusion, surpasses both GAN-based and diffusion-based
alternatives in high-quality video generation. We further show that our
decomposed formulation can benefit from pre-trained image diffusion models and
well support text-conditioned video creation.Comment: Accepted to CVPR 2023
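The noise decomposition described above can be written down directly. The short sketch below is an assumption-labeled illustration, not the authors' code: per-frame noise is sampled as a shared base component plus a per-frame residual, so each frame's noise is still marginally standard Gaussian while being correlated along time.

```python
import torch

def decomposed_noise(num_frames, shape, lam=0.5, device="cpu"):
    """Sample eps_i = sqrt(lam) * base + sqrt(1 - lam) * residual_i for each frame i.

    `lam` controls how much noise is shared across frames; since base and residuals
    are independent N(0, I) draws, each eps_i remains standard Gaussian marginally.
    """
    base = torch.randn(1, *shape, device=device)                # shared along time
    residual = torch.randn(num_frames, *shape, device=device)   # varies along time
    return lam ** 0.5 * base + (1 - lam) ** 0.5 * residual

# Usage inside a standard DDPM-style forward process q(x_t | x_0):
frames = torch.randn(16, 3, 64, 64)      # a toy 16-frame clip standing in for x_0
alpha_bar_t = torch.tensor(0.7)          # cumulative noise schedule value at step t
eps = decomposed_noise(16, (3, 64, 64), lam=0.5)
x_t = alpha_bar_t.sqrt() * frames + (1 - alpha_bar_t).sqrt() * eps
```

The two jointly learned networks mentioned in the abstract would then predict the base and residual components separately during denoising.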
- …