Search CORE

1,157 research outputs found

MotionEditor: Editing Video Motion via Content-Aware Diffusion

Authors: Cheng Zhi-Qi, Dai Qi, Han Xintong, Hu Han, Jiang Yu-Gang, Tu Shuyuan, Wu Zuxuan
Publication date: 30/11/2023

Existing diffusion-based video editing models have made impressive advances in editing the attributes of a source video over time, but they struggle to manipulate motion while preserving the original protagonist's appearance and background. To address this, we propose MotionEditor, a diffusion model for video motion editing. MotionEditor incorporates a novel content-aware motion adapter into ControlNet to capture temporal motion correspondence. Although ControlNet enables direct generation from skeleton poses, it has difficulty modifying the source motion in the inverted noise, because the noise (source) and the condition (reference) provide contradictory signals. Our adapter complements ControlNet by incorporating source content so that adapted control signals transfer seamlessly. Further, we build a two-branch architecture (a reconstruction branch and an editing branch) with a high-fidelity attention injection mechanism that facilitates interaction between the branches. This mechanism lets the editing branch query the keys and values of the reconstruction branch in a decoupled manner, so that the editing branch retains the original background and protagonist appearance. We also propose a skeleton alignment algorithm to address discrepancies in pose size and position. Experiments demonstrate the promising motion editing ability of MotionEditor, both qualitatively and quantitatively.
Comment: 18 pages, 15 figures. Project page at https://francis-rings.github.io/MotionEditor

arXiv.org e-Print Archive
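The abstract names a skeleton alignment algorithm for pose size and position discrepancies but does not describe it. The sketch below only illustrates the general idea under a stated assumption: alignment is taken to be a uniform scale that matches the reference pose's bounding-box height to the source pose's, followed by a translation that moves the reference's bottom-center (roughly the feet) onto the source's. The function names and the anchor choice are hypothetical and are not MotionEditor's actual method.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y grows downward


def bounding_box(points: List[Point]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) of a set of keypoints."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def align_skeleton(reference: List[Point], source: List[Point]) -> List[Point]:
    """Hypothetical alignment: rescale the reference pose to the source pose's
    height and move its bottom-center anchor onto the source's anchor."""
    rx0, ry0, rx1, ry1 = bounding_box(reference)
    sx0, sy0, sx1, sy1 = bounding_box(source)
    ref_height = ry1 - ry0
    if ref_height == 0:
        raise ValueError("reference skeleton has zero height")
    # One scale factor for both axes keeps the reference body proportions.
    scale = (sy1 - sy0) / ref_height
    ref_anchor = ((rx0 + rx1) / 2, ry1)  # bottom-center of the reference box
    src_anchor = ((sx0 + sx1) / 2, sy1)  # bottom-center of the source box
    return [
        (src_anchor[0] + (x - ref_anchor[0]) * scale,
         src_anchor[1] + (y - ref_anchor[1]) * scale)
        for x, y in reference
    ]
```

For example, a reference pose spanning 10 units in height is scaled by 2 onto a source pose spanning 20 units, and its feet land where the source's feet are. A per-axis scale or a per-limb correction would be other reasonable assumptions; the uniform version is shown only because it is the simplest.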

Solving Einstein equations using deep learning

Authors: Li Chen-Qi, Li Zhi-Han, Pang Long-Gang
Publication date: 13/09/2023

The Einstein field equations are notoriously challenging to solve because of their complex mathematical form, and few analytical solutions are available in the absence of highly symmetric systems or idealized matter distributions. However, accurate solutions are crucial, particularly for systems with strong gravitational fields such as black holes or neutron stars. In this work, inspired by physics-informed neural networks (PINNs), we use neural networks and automatic differentiation to solve the Einstein field equations numerically. Using these techniques, we successfully obtain the Schwarzschild metric and the charged Schwarzschild metric given the energy-momentum tensor of matter. This method could offer a new approach to solving the space-time coupled Einstein field equations and become an integral part of numerical relativity.
Comment: 18 pages, 4 figures

arXiv.org e-Print Archive
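The abstract does not give the network or loss in detail. As a rough illustration of the physics-informed idea (minimize the residual of the field equations at collocation points plus a boundary-condition term), here is a sketch that swaps the neural network for a linear basis in 1/r and automatic differentiation for hand-written derivatives, so it runs with the Python standard library alone. It assumes geometrized Gaussian units (G = c = 1), the static ansatz ds² = -f(r) dt² + f(r)⁻¹ dr² + r² dΩ², under which the tt component of the field equations reduces to r f'(r) + f(r) - 1 = 8π r² T^t_t, and the field of a point charge Q, for which 8π r² T^t_t = -Q²/r². The collocation grid, boundary weight, and function names are hypothetical. The exact minimizer is f(r) = 1 - 2M/r + Q²/r², the charged (Reissner-Nordström) form, which reduces to Schwarzschild when Q = 0.

```python
from typing import List


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        acc = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / m[i][i]
    return x


def fit_metric_function(charge: float, r_boundary: float, f_boundary: float,
                        n_terms: int = 3, n_points: int = 50) -> List[float]:
    """Fit f(r) = 1 + sum_k theta_k * r**(-k), k = 1..n_terms, by minimizing the
    squared residual of r f' + f - 1 + Q**2 / r**2 on collocation points plus a
    weighted boundary term (f(r_boundary) - f_boundary)**2. Returns theta."""
    rows: List[List[float]] = []
    rhs: List[float] = []
    for i in range(n_points):
        r = 2.0 + 0.5 * i  # hypothetical collocation grid, r in [2, 26.5]
        # d/dr r**(-k) = -k * r**(-k-1), so each term adds (1 - k) * r**(-k)
        # to r f' + f - 1; the target value is -Q**2 / r**2.
        rows.append([(1 - k) * r ** (-k) for k in range(1, n_terms + 1)])
        rhs.append(-charge ** 2 / r ** 2)
    weight = 10.0  # hypothetical weight of the boundary-condition loss
    rows.append([weight * r_boundary ** (-k) for k in range(1, n_terms + 1)])
    rhs.append(weight * (f_boundary - 1.0))
    # The loss is quadratic in theta, so its minimizer solves the normal equations.
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(n_terms)]
           for i in range(n_terms)]
    atb = [sum(row[i] * y for row, y in zip(rows, rhs)) for i in range(n_terms)]
    return solve_linear(ata, atb)
```

With M = 1, Q = 0.5 and the boundary value f(10) = 1 - 2/10 + 0.25/100 = 0.8025, the fit should recover theta ≈ (-2, 0.25, 0), that is -2M and Q². Note that the residual alone leaves the 1/r coefficient free (its (1 - k) factor vanishes), which is why the boundary term is needed to fix the mass.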

Implicit Temporal Modeling with Learnable Alignment for Video Recognition

Authors: Cheng Zhi-Qi, Dai Qi, Hu Han, Jiang Yu-Gang, Tu Shuyuan, Wu Zuxuan
Publication date: 15/08/2023

Contrastive language-image pretraining (CLIP) has demonstrated remarkable success in various image tasks. However, how to extend CLIP with effective temporal modeling remains an open and crucial problem. Existing factorized or joint spatial-temporal modeling trades off efficiency against performance. While modeling temporal information within a straight-through tube is widely adopted in the literature, we find that simple frame alignment already captures the essential information without temporal attention. To this end, we propose a novel Implicit Learnable Alignment (ILA) method, which minimizes the temporal modeling effort while achieving very high performance. Specifically, for a frame pair, an interactive point is predicted in each frame, serving as a region rich in mutual information. By enhancing the features around the interactive point, the two frames are implicitly aligned. The aligned features are then pooled into a single token, which is used in the subsequent spatial self-attention. Our method eliminates the costly or insufficient temporal self-attention in video. Extensive experiments on benchmarks demonstrate the superiority and generality of our module. In particular, the proposed ILA achieves a top-1 accuracy of 88.7% on Kinetics-400 with far fewer FLOPs than Swin-L and ViViT-H. Code is released at https://github.com/Francis-Rings/ILA.
Comment: ICCV 2023 oral. 14 pages, 7 figures.

arXiv.org e-Print Archive
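The ILA abstract describes three steps for a frame pair: predict an interactive point in each frame, enhance the features around that point, and pool the aligned features into a single token. The toy sketch below walks through those steps on nested-list feature maps. It is not the released ILA code: the learned point predictor is replaced by a hypothetical heuristic (the location most similar to the other frame's mean feature), "enhancing" is assumed to be a Gaussian weight 1 + alpha * exp(-d² / 2σ²) around the point, and pooling is assumed to be a plain average over both frames. All names and parameters are illustrative.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # indexed [row][col][channel]


def mean_feature(fm: FeatureMap) -> List[float]:
    """Average-pool a feature map into a single channel vector (one 'token')."""
    cells = [feat for row in fm for feat in row]
    return [sum(channel) / len(cells) for channel in zip(*cells)]


def interactive_point(fm: FeatureMap, other: FeatureMap) -> Tuple[int, int]:
    """Hypothetical stand-in for the learned predictor: the location in `fm`
    whose feature has the largest dot product with `other`'s mean feature."""
    ref = mean_feature(other)
    best, best_score = (0, 0), -math.inf
    for i, row in enumerate(fm):
        for j, feat in enumerate(row):
            score = sum(a * b for a, b in zip(feat, ref))
            if score > best_score:
                best, best_score = (i, j), score
    return best


def enhance(fm: FeatureMap, point: Tuple[int, int],
            sigma: float = 1.0, alpha: float = 1.0) -> FeatureMap:
    """Scale each feature by 1 + alpha * Gaussian(distance to `point`)."""
    pi, pj = point
    return [
        [[(1.0 + alpha * math.exp(-((i - pi) ** 2 + (j - pj) ** 2) / (2 * sigma ** 2))) * x
          for x in feat]
         for j, feat in enumerate(row)]
        for i, row in enumerate(fm)
    ]


def align_and_pool(fm_a: FeatureMap, fm_b: FeatureMap):
    """Enhance each frame around its interactive point, then pool both into one token."""
    pa = interactive_point(fm_a, fm_b)
    pb = interactive_point(fm_b, fm_a)
    token_a = mean_feature(enhance(fm_a, pa))
    token_b = mean_feature(enhance(fm_b, pb))
    return pa, pb, [(x + y) / 2 for x, y in zip(token_a, token_b)]
```

The point of the exercise is the cost profile: each frame pair contributes one pooled token, so no attention over the time axis is computed, which is the saving the abstract attributes to ILA.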