MotionEditor: Editing Video Motion via Content-Aware Diffusion
Existing diffusion-based video editing models have made remarkable advances in
editing the attributes of a source video over time but struggle to manipulate the
motion information while preserving the original protagonist's appearance and
background. To address this, we propose MotionEditor, a diffusion model for
video motion editing. MotionEditor incorporates a novel content-aware motion
adapter into ControlNet to capture temporal motion correspondence. While
ControlNet enables direct generation based on skeleton poses, it encounters
challenges when modifying the source motion in the inverted noise due to
contradictory signals between the noise (source) and the condition (reference).
Our adapter complements ControlNet by involving source content to transfer
adapted control signals seamlessly. Further, we build a two-branch
architecture (a reconstruction branch and an editing branch) with a
high-fidelity attention injection mechanism facilitating branch interaction.
This mechanism enables the editing branch to query the key and value from the
reconstruction branch in a decoupled manner, so that the editing branch retains
the original background and protagonist appearance. We also propose a skeleton
alignment algorithm to address the discrepancies in pose size and position.
Experiments demonstrate the promising motion editing ability of MotionEditor,
both qualitatively and quantitatively.
Comment: 18 pages, 15 figures. Project page at
https://francis-rings.github.io/MotionEditor
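A minimal NumPy sketch of the attention injection described above, under our own assumptions: queries are projected from editing-branch tokens while keys and values are projected from reconstruction-branch tokens, so every editing-branch output is a convex combination of reconstruction-branch values. The names `inject`, `w_q`, `w_k` and `w_v` are hypothetical, and the sketch does not model the paper's decoupling scheme.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention for token matrices of shape (n, d)."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def inject(h_edit, h_rec, w_q, w_k, w_v):
    """Hypothetical injection: editing-branch queries attend to reconstruction-branch keys/values.

    Assumption: w_q, w_k, w_v are shared (d, d) projection matrices.
    """
    q = h_edit @ w_q  # queries come from the editing branch
    k = h_rec @ w_k   # keys come from the reconstruction branch
    v = h_rec @ w_v   # values come from the reconstruction branch
    return attention(q, k, v)


rng = np.random.default_rng(0)
n, d = 6, 4
h_edit, h_rec = rng.normal(size=(n, d)), rng.normal(size=(n, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out = inject(h_edit, h_rec, w_q, w_k, w_v)  # shape (6, 4)
```

Because the softmax weights sum to one, feeding identical reconstruction tokens makes every output row equal to that token's value projection, which is one way to see how content from the reconstruction branch is carried into the editing branch.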
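The skeleton alignment step can be illustrated with a simple size-and-position normalization. This is a sketch under an assumption of ours, not the paper's algorithm: the hypothetical `align_skeleton` rescales 2-D reference keypoints uniformly by the ratio of bounding-box heights and then translates them so that the box centers coincide with the source protagonist's.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bbox(points: List[Point]) -> Tuple[float, float, float, float]:
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of 2-D keypoints."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def align_skeleton(reference: List[Point], source: List[Point]) -> List[Point]:
    """Hypothetical alignment of reference keypoints to the source's size and position.

    Assumption: one uniform scale from the ratio of bounding-box heights,
    followed by a translation that aligns the bounding-box centers.
    """
    rx0, ry0, rx1, ry1 = bbox(reference)
    sx0, sy0, sx1, sy1 = bbox(source)
    if ry1 == ry0:
        raise ValueError("reference skeleton has zero height")
    scale = (sy1 - sy0) / (ry1 - ry0)  # uniform scale keeps limb proportions
    rcx, rcy = (rx0 + rx1) / 2, (ry0 + ry1) / 2
    scx, scy = (sx0 + sx1) / 2, (sy0 + sy1) / 2
    # Scale about the reference center, then move that center onto the source center.
    return [(scx + scale * (x - rcx), scy + scale * (y - rcy)) for x, y in reference]


reference = [(0.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
source = [(10.0, 10.0), (10.0, 14.0), (12.0, 12.0)]
aligned = align_skeleton(reference, source)  # → [(10.0, 10.0), (10.0, 14.0), (12.0, 12.0)]
```

For a pose sequence, one would presumably compute the scale and offset once (for example from the first frame) and reuse them for every frame so the reference motion itself is not distorted; that choice is also an assumption.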
Solving Einstein equations using deep learning
The Einstein field equations are notoriously challenging to solve because of their
complex mathematical form; few analytical solutions exist outside highly
symmetric systems or idealized matter distributions. However, accurate
solutions are crucial, particularly in systems with strong gravitational
fields such as black holes or neutron stars. In this work, we use neural
networks and automatic differentiation to solve the Einstein field equations
numerically, inspired by physics-informed neural networks (PINNs).
Using these techniques, we obtain the Schwarzschild metric
and the charged Schwarzschild metric given the energy-momentum tensor of
matter. This method could open a new route to solving the space-time-coupled
Einstein field equations and become an integral part of numerical relativity.
Comment: 18 pages, 4 figures
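To make the physics-informed loss concrete, consider a static, spherically symmetric metric ds^2 = -f(r) dt^2 + dr^2/f(r) + r^2 dOmega^2 in geometrized units. Its tt field equation reduces to r f'(r) + f(r) - 1 = -Q^2/r^2 for a point charge Q (Q = 0 in vacuum), which f = 1 - 2M/r (Schwarzschild) and f = 1 - 2M/r + Q^2/r^2 (charged Schwarzschild, i.e. Reissner-Nordström) satisfy exactly. The sketch is ours, not the paper's code: closed-form candidates stand in for the neural network, a hand-rolled dual-number class (hypothetical `Dual`) stands in for a framework's automatic differentiation, and the loss is only evaluated, not minimized.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """Forward-mode dual number val + eps * der with eps**2 = 0 (hypothetical helper)."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return Dual(other) - self

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

    def __truediv__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val / other.val,
                    (self.der * other.val - self.val * other.der) / other.val ** 2)

    def __rtruediv__(self, other):
        return Dual(other) / self


def derivative(f, r):
    """Return (f(r), f'(r)) by seeding the dual part with 1."""
    out = f(Dual(r, 1.0))
    return out.val, out.der


def residual(f, r, charge=0.0):
    """Residual r f' + f - 1 + Q^2/r^2 of the assumed tt equation (zero for an exact solution)."""
    val, der = derivative(f, r)
    return r * der + val - 1.0 + charge ** 2 / r ** 2


def physics_loss(f, radii, charge=0.0):
    """PINN-style mean squared residual over collocation radii."""
    return sum(residual(f, r, charge) ** 2 for r in radii) / len(radii)


M, Q = 1.0, 0.5
radii = [3.0, 5.0, 10.0, 20.0]  # collocation points outside the horizon


def schwarzschild(r):
    return 1 - 2 * M / r


def reissner_nordstrom(r):
    return 1 - 2 * M / r + Q ** 2 / (r * r)


def wrong(r):
    return 1 - 2 * M / r + 0.1  # constant offset breaks the field equation


print(physics_loss(schwarzschild, radii), physics_loss(reissner_nordstrom, radii, charge=Q))
```

The exact metrics give a loss at round-off level, while the offset candidate leaves a residual of 0.1 at every radius. A real PINN would replace the candidate with a trainable network and add boundary terms, since the residual alone does not fix the integration constant M.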
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
Contrastive language-image pretraining (CLIP) has demonstrated remarkable
success in various image tasks. However, how to extend CLIP with effective
temporal modeling is still an open and crucial problem. Existing factorized or
joint spatial-temporal modeling trades off efficiency against
performance. While modeling temporal information within a straight-through tube
is widely adopted in the literature, we find that simple frame alignment already
captures the essential information without temporal attention. To this end, in this paper,
we propose a novel Implicit Learnable Alignment (ILA) method, which minimizes
the temporal modeling effort while achieving high performance.
Specifically, for a frame pair, an interactive point is predicted in each
frame, serving as a region rich in mutual information. By enhancing the features
around the interactive point, two frames are implicitly aligned. The aligned
features are then pooled into a single token, which is leveraged in the
subsequent spatial self-attention. Our method eliminates the costly or
insufficient temporal self-attention in video. Extensive experiments on
benchmarks demonstrate the superiority and generality of our module.
Particularly, the proposed ILA achieves a top-1 accuracy of 88.7% on
Kinetics-400 with far fewer FLOPs than Swin-L and ViViT-H. Code is
released at https://github.com/Francis-Rings/ILA.
Comment: ICCV 2023 oral. 14 pages, 7 figures.
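A rough NumPy sketch of the align-then-pool idea for one frame pair. The point predictor here is a parameter-free stand-in of our own (the location in one frame most similar, by cosine similarity, to the other frame's mean feature), whereas ILA learns it; the Gaussian enhancement `f * (1 + w)` and the weighted pooling are likewise assumptions, and all function names are hypothetical.

```python
import numpy as np


def interactive_point(feat, other):
    """Stand-in predictor: location in `feat` (H, W, C) most similar to `other`'s mean feature."""
    ref = other.reshape(-1, other.shape[-1]).mean(axis=0)
    sim = feat @ ref / (np.linalg.norm(feat, axis=-1) * np.linalg.norm(ref) + 1e-8)
    y, x = np.unravel_index(np.argmax(sim), sim.shape)
    return int(y), int(x)


def gaussian_map(shape, center, sigma):
    """Gaussian weights of shape (H, W) peaking at 1.0 on `center`."""
    ys, xs = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), indexing="ij")
    d2 = (ys - center[0]) ** 2 + (xs - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


def align_and_pool(feat_a, feat_b, sigma=0.5):
    """Enhance each frame around its own interactive point, then pool the pair into one C-dim token."""
    pa = interactive_point(feat_a, feat_b)
    pb = interactive_point(feat_b, feat_a)
    tokens = []
    for feat, p in ((feat_a, pa), (feat_b, pb)):
        w = gaussian_map(feat.shape[:2], p, sigma)[..., None]  # (H, W, 1)
        enhanced = feat * (1.0 + w)  # boost features near the interactive point
        tokens.append((enhanced * w).sum(axis=(0, 1)) / w.sum())  # weighted pooling
    return (pa, pb), np.mean(tokens, axis=0)


# Toy pair: a shared feature e2 appears at different positions in the two frames.
e1, e2, e3 = np.eye(3)
a = np.tile(e1, (4, 4, 1))
a[1, 2] = e2
b = np.tile(e3, (4, 4, 1))
b[3, 0] = e2
points, token = align_and_pool(a, b)  # points == ((1, 2), (3, 0))
```

In the toy input each frame's point lands on the shared feature e2, and the pooled token is dominated by that shared component, which is the kind of implicit alignment the abstract describes.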