An Anomaly-Based Method for Identifying Signals of Spring and Autumn Low-Temperature Events in the Yangtze River Valley, China
June 2015 QIAN ET AL. Vol. 54 1216-123
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
Contrastive language-image pretraining (CLIP) has demonstrated remarkable
success in various image tasks. However, how to extend CLIP with effective
temporal modeling is still an open and crucial problem. Existing factorized or
joint spatial-temporal modeling trades off efficiency against
performance. While modeling temporal information within a straight-through tube
is widely adopted in the literature, we find that simple frame alignment already
captures the essential information without temporal attention. To this end, in this paper,
we propose a novel Implicit Learnable Alignment (ILA) method, which minimizes
the temporal modeling effort while achieving high performance.
Specifically, for a frame pair, an interactive point is predicted in each
frame, serving as a mutual information rich region. By enhancing the features
around the interactive point, two frames are implicitly aligned. The aligned
features are then pooled into a single token, which is leveraged in the
subsequent spatial self-attention. Our method eliminates the need for costly or
insufficient temporal self-attention in video. Extensive experiments on
benchmarks demonstrate the superiority and generality of our module.
Particularly, the proposed ILA achieves a top-1 accuracy of 88.7% on
Kinetics-400 with far fewer FLOPs than Swin-L and ViViT-H. Code is
released at https://github.com/Francis-Rings/ILA.
Comment: ICCV 2023 oral. 14 pages, 7 figures. Code released at
https://github.com/Francis-Rings/IL
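The alignment-and-pooling step described in the abstract above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Gaussian mask, the `sigma` parameter, the function names, and the assumption that the interactive points are already given (rather than predicted by a learned module) are all hypothetical choices made for illustration.

```python
import numpy as np

def enhance_around_point(feat, point, sigma=1.0):
    """Re-weight an (H, W, C) feature map with a Gaussian mask centred on
    `point` = (row, col). The Gaussian mask is an illustrative assumption."""
    h, w, _ = feat.shape
    rows, cols = np.mgrid[0:h, 0:w]
    dist2 = (rows - point[0]) ** 2 + (cols - point[1]) ** 2
    mask = np.exp(-dist2 / (2.0 * sigma ** 2))
    return feat * mask[..., None]

def aligned_token(feat_a, feat_b, point_a, point_b, sigma=1.0):
    """Enhance both frames around their (hypothetical, pre-computed)
    interactive points and pool the result into a single C-dim token."""
    enhanced = (enhance_around_point(feat_a, point_a, sigma)
                + enhance_around_point(feat_b, point_b, sigma))
    return enhanced.mean(axis=(0, 1))  # average over all spatial positions

# Usage: two 4x4 feature maps with 8 channels each.
frame_a = np.ones((4, 4, 8))
frame_b = np.ones((4, 4, 8))
token = aligned_token(frame_a, frame_b, point_a=(1, 1), point_b=(2, 2))
```

In this sketch the resulting token would be appended to the spatial tokens of a frame before self-attention; how the paper actually integrates it is not specified in the abstract.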
New results on the dynamics of critical collapse
We study the dynamics of critical collapse of a spherically symmetric scalar
field. Approximate analytic expressions for the metric functions and matter
field in the large-radius region are obtained. It is found that because of the
boundary conditions at the center, in the central region, the equation of
motion for the scalar field is reduced to the flat-spacetime form. On the other
hand, due to the connection to its neighbouring region where gravity plays an
important role, the scalar field in the central region feels the gravitational
effects indirectly.
Comment: 6 pages, 5 figure
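For orientation, the "flat-spacetime form" mentioned in the abstract above can be written out under two assumptions that the abstract does not state: that the scalar field $\phi(t, r)$ is massless, and that $t$ and $r$ are standard time and radial coordinates. In that case the spherically symmetric wave equation reads:

```latex
\frac{\partial^2 \phi}{\partial t^2}
  = \frac{1}{r^2}\,\frac{\partial}{\partial r}\!\left( r^2\,\frac{\partial \phi}{\partial r} \right)
  = \frac{\partial^2 \phi}{\partial r^2} + \frac{2}{r}\,\frac{\partial \phi}{\partial r}
```

The coordinates and field content actually used in the paper may differ.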
MotionEditor: Editing Video Motion via Content-Aware Diffusion
Existing diffusion-based video editing models have made impressive advances in
editing attributes of a source video over time but struggle to manipulate the
motion information while preserving the original protagonist's appearance and
background. To address this, we propose MotionEditor, a diffusion model for
video motion editing. MotionEditor incorporates a novel content-aware motion
adapter into ControlNet to capture temporal motion correspondence. While
ControlNet enables direct generation based on skeleton poses, it encounters
challenges when modifying the source motion in the inverted noise due to
contradictory signals between the noise (source) and the condition (reference).
Our adapter complements ControlNet by involving source content to transfer
adapted control signals seamlessly. Further, we build a two-branch
architecture (a reconstruction branch and an editing branch) with a
high-fidelity attention injection mechanism facilitating branch interaction.
This mechanism enables the editing branch to query the key and value from the
reconstruction branch in a decoupled manner, making the editing branch retain
the original background and protagonist appearance. We also propose a skeleton
alignment algorithm to address the discrepancies in pose size and position.
Experiments demonstrate the promising motion editing ability of MotionEditor,
both qualitatively and quantitatively.
Comment: 18 pages, 15 figures. Project page at
https://francis-rings.github.io/MotionEditor
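The attention injection idea in the abstract above, where the editing branch queries keys and values from the reconstruction branch, can be sketched as plain scaled dot-product cross-attention. This is a simplified illustration under assumptions: the function names and tensor shapes are hypothetical, and the "decoupled" handling the abstract mentions is not modeled here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def injected_attention(q_edit, k_recon, v_recon):
    """Scaled dot-product attention in which queries come from the editing
    branch while keys and values come from the reconstruction branch.

    q_edit:  (N_edit, d)   queries from the editing branch
    k_recon: (N_recon, d)  keys from the reconstruction branch
    v_recon: (N_recon, d_v) values from the reconstruction branch
    """
    d = q_edit.shape[-1]
    scores = q_edit @ k_recon.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v_recon            # (N_edit, d_v)

# Usage with random features (hypothetical sizes).
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))
k = rng.normal(size=(5, 4))
v = rng.normal(size=(5, 6))
out = injected_attention(q, k, v)
```

Because every output row is a convex combination of reconstruction-branch values, the editing branch can only draw appearance information that already exists in the source, which is one plausible reading of how the background and protagonist appearance are retained.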