283 research outputs found
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
Contrastive language-image pretraining (CLIP) has demonstrated remarkable
success in various image tasks. However, how to extend CLIP with effective
temporal modeling is still an open and crucial problem. Existing factorized or
joint spatial-temporal modeling trades off between efficiency and performance.
While modeling temporal information within a straight-through tube is widely
adopted in the literature, we find that simple frame alignment already captures
the essential temporal cues without temporal attention. To this end, in this
paper, we propose a novel Implicit Learnable Alignment (ILA) method, which
minimizes the temporal modeling effort while achieving remarkably high
performance. Specifically, for each frame pair, an interactive point is
predicted in each frame, marking a region rich in mutual information. By
enhancing the features around the interactive point, the two frames are
implicitly aligned. The aligned features are then pooled into a single token,
which is leveraged in the subsequent spatial self-attention. Our method makes
it possible to eliminate the costly, and often insufficient, temporal
self-attention in video.
benchmarks demonstrate the superiority and generality of our module.
Particularly, the proposed ILA achieves a top-1 accuracy of 88.7% on
Kinetics-400 with much fewer FLOPs compared with Swin-L and ViViT-H. Code is
released at https://github.com/Francis-Rings/ILA
Comment: ICCV 2023 oral. 14 pages, 7 figures.
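The alignment idea described above can be sketched minimally. This toy numpy sketch is not the paper's implementation: the function name is invented, and the similarity-based point selection stands in for ILA's learned predictor. It only illustrates the flow: pick an interactive point for a frame pair, up-weight features around it, and pool them into a single alignment token.

```python
import numpy as np

def implicit_align(frame_a, frame_b, sigma=1.5):
    """Toy sketch: find a mutually informative position for a frame pair,
    enhance features around it with a Gaussian weight, and pool the
    enhanced map into one alignment token per frame.

    frame_a, frame_b: (H, W, D) per-frame feature maps.
    Returns one (D,) token for each frame.
    """
    H, W, D = frame_a.shape
    # Cross-frame similarity at spatially corresponding positions
    # (a stand-in for the paper's learned interactive-point predictor).
    sim = np.einsum('hwd,hwd->hw', frame_a, frame_b)
    iy, ix = np.unravel_index(np.argmax(sim), sim.shape)  # interactive point
    ys, xs = np.mgrid[0:H, 0:W]
    # Gaussian enhancement centered on the interactive point.
    w = np.exp(-((ys - iy) ** 2 + (xs - ix) ** 2) / (2 * sigma ** 2))
    w = w / w.sum()
    token_a = np.einsum('hw,hwd->d', w, frame_a)  # pooled alignment token
    token_b = np.einsum('hw,hwd->d', w, frame_b)
    return token_a, token_b
```

The pooled tokens would then be appended to the spatial self-attention of each frame, replacing temporal attention entirely.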
MotionEditor: Editing Video Motion via Content-Aware Diffusion
Existing diffusion-based video editing models have made impressive advances in
editing the attributes of a source video over time, but they struggle to
manipulate the motion information while preserving the original protagonist's
appearance and
background. To address this, we propose MotionEditor, a diffusion model for
video motion editing. MotionEditor incorporates a novel content-aware motion
adapter into ControlNet to capture temporal motion correspondence. While
ControlNet enables direct generation based on skeleton poses, it encounters
challenges when modifying the source motion in the inverted noise due to
contradictory signals between the noise (source) and the condition (reference).
Our adapter complements ControlNet by incorporating source content so that the
adapted control signals transfer seamlessly. Further, we build a two-branch
architecture (a reconstruction branch and an editing branch) with a
high-fidelity attention injection mechanism facilitating branch interaction.
This mechanism enables the editing branch to query the key and value from the
reconstruction branch in a decoupled manner, making the editing branch retain
the original background and protagonist appearance. We also propose a skeleton
alignment algorithm to address the discrepancies in pose size and position.
Experiments demonstrate the promising motion editing ability of MotionEditor,
both qualitatively and quantitatively.
Comment: 18 pages, 15 figures. Project page at
https://francis-rings.github.io/MotionEditor
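The decoupled attention injection described above can be sketched as a single-head cross-branch attention in numpy. This is an illustrative stand-in, not the paper's implementation: the function name and weight matrices are hypothetical. It shows only the query/key-value split, where the editing branch supplies the queries while keys and values come from the reconstruction branch, letting edited frames borrow the source background and appearance.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def injected_attention(edit_tokens, recon_tokens, Wq, Wk, Wv):
    """Toy single-head cross-branch attention:
    queries from the editing branch, keys/values from the
    reconstruction branch.

    edit_tokens: (N, D) editing-branch features.
    recon_tokens: (M, D) reconstruction-branch features.
    """
    q = edit_tokens @ Wq    # queries from the editing branch
    k = recon_tokens @ Wk   # keys from the reconstruction branch
    v = recon_tokens @ Wv   # values from the reconstruction branch
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v         # edited tokens attend to source content
```

Because the values come entirely from the reconstruction branch, whatever the queries select is re-expressed in terms of source appearance, which is the intuition behind retaining the original background and protagonist.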
SVFormer: Semi-supervised Video Transformer for Action Recognition
Semi-supervised action recognition is a challenging but critical task due to
the high cost of video annotations. Existing approaches mainly use
convolutional neural networks, yet current revolutionary vision transformer
models have been less explored. In this paper, we investigate the use of
transformer models under the SSL setting for action recognition. To this end,
we introduce SVFormer, which adopts a steady pseudo-labeling framework (i.e.,
EMA-Teacher) to cope with unlabeled video samples. While a wide range of data
augmentations have been shown effective for semi-supervised image
classification, they generally produce limited results for video recognition.
We therefore introduce a novel augmentation strategy, Tube TokenMix, tailored
for video data, in which two video clips are mixed via a mask whose masked
tokens are consistent over the temporal axis. In addition, we propose a temporal warping
augmentation to cover the complex temporal variation in videos, which stretches
selected frames to various temporal durations in the clip. Extensive
experiments on three datasets, Kinetics-400, UCF-101, and HMDB-51, verify the
advantage of SVFormer. In particular, SVFormer outperforms the state-of-the-art
by 31.5% with fewer training epochs under the 1% labeling rate of Kinetics-400.
Our method can hopefully serve as a strong benchmark and encourage future
research on semi-supervised action recognition with Transformer networks.
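The Tube TokenMix idea above can be sketched directly from its description: sample one spatial token mask and hold it fixed across the temporal axis, so each token position is drawn from the same source clip in every frame. A minimal numpy sketch, with an invented function name and token-level (rather than pixel-level) clips assumed:

```python
import numpy as np

def tube_token_mix(clip_a, clip_b, mix_ratio=0.5, seed=0):
    """Mix two tokenized video clips with a spatial token mask held
    fixed over the temporal axis (a "tube" mask).

    clip_a, clip_b: arrays of shape (T, N, D) -- frames x tokens x dim.
    Returns the mixed clip and the boolean mask used.
    """
    T, N, D = clip_a.shape
    rng = np.random.default_rng(seed)
    # One spatial mask, reused for all T frames: each token position
    # comes from the same source video at every timestep.
    n_from_b = int(round(mix_ratio * N))
    mask = np.zeros(N, dtype=bool)
    mask[rng.choice(N, size=n_from_b, replace=False)] = True
    mixed = np.where(mask[None, :, None], clip_b, clip_a)
    return mixed, mask
```

The temporal consistency of the mask is the point: a mask resampled per frame would scramble motion cues, while a tube mask keeps each token's temporal trajectory intact.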
ResFormer: Scaling ViTs with Multi-Resolution Training
Vision Transformers (ViTs) have achieved overwhelming success, yet they
suffer from poor resolution scalability, i.e., performance drops drastically
when they are presented with input resolutions unseen during training. We
introduce ResFormer, a framework built upon the idea of multi-resolution
training for improved performance on a wide spectrum of, mostly unseen,
testing resolutions. In particular, ResFormer operates on
replicated images of different resolutions and enforces a scale consistency
loss to engage interactive information across different scales. More
importantly, to alternate among varying resolutions effectively, especially
novel ones in testing, we propose a global-local positional embedding strategy
that changes smoothly conditioned on input sizes. We conduct extensive
experiments for image classification on ImageNet. The results provide strong
quantitative evidence that ResFormer has promising scaling abilities towards a
wide range of resolutions. For instance, ResFormer-B-MR achieves a Top-1
accuracy of 75.86% and 81.72% when evaluated on relatively low and high
resolutions respectively (i.e., 96 and 640), which are 48% and 7.49% better
than DeiT-B. We also demonstrate that ResFormer is flexible and can be
easily extended to semantic segmentation, object detection and video action
recognition. Code is available at https://github.com/ruitian12/resformer.
Comment: CVPR 202
VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models
Diffusion models have achieved significant success in image and video
generation. This motivates a growing interest in video editing tasks, where
videos are edited according to provided text descriptions. However, most
existing approaches only focus on video editing for short clips and rely on
time-consuming tuning or inference. We are the first to propose Video
Instruction Diffusion (VIDiff), a unified foundation model designed for a wide
range of video tasks. These tasks encompass both understanding tasks (such as
language-guided video object segmentation) and generative tasks (video editing
and enhancement). Our model can edit and translate a video into the desired
result within seconds, based on user instructions. Moreover, we design an iterative
auto-regressive method to ensure consistency in editing and enhancing long
videos. We provide convincing generative results for diverse input videos and
written instructions, both qualitatively and quantitatively. More examples can
be found at our website https://ChenHsing.github.io/VIDiff
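The iterative auto-regressive scheme for long videos can be sketched as a chunked loop in which the tail of each edited chunk conditions the next one. Everything below is an illustrative skeleton: the function names are invented and `edit_fn` is a placeholder for the per-chunk diffusion edit, which the paper does not expose as an API.

```python
def edit_long_video(frames, edit_fn, chunk=8, overlap=2):
    """Toy auto-regressive long-video editing loop: process the video
    in chunks, passing the last `overlap` edited frames of each chunk
    as conditioning for the next chunk to keep the edit consistent.

    frames: list of frames (any per-frame representation).
    edit_fn(piece, cond): placeholder for the per-chunk edit; `cond`
    is None for the first chunk.
    """
    edited, cond = [], None
    for start in range(0, len(frames), chunk):
        piece = frames[start:start + chunk]
        out = edit_fn(piece, cond)
        edited.extend(out)
        cond = out[-overlap:]  # edited tail conditions the next chunk
    return edited
```

Because each chunk sees the previously edited frames rather than the raw source, drift between chunks is suppressed and the edit stays coherent across the full video.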
Surgical Incision Induces Anxiety-Like Behavior and Amygdala Sensitization: Effects of Morphine and Gabapentin
The role of the affective dimension in postoperative pain is still poorly understood. The present study investigated the development of anxiety-like behavior and amygdala sensitization in incisional pain. Using the hind-paw incision model in rats, we showed that surgical incision induced anxiety-like behavior, as determined by the elevated plus-maze and open-field tests. Intraperitoneal (IP) morphine administration reversed mechanical allodynia and anxiety-like behavior in a dose-dependent manner. Gabapentin also partially reduced incision-evoked mechanical allodynia and anxiety-like behavior in a dose-dependent manner. After incision, the expression of phosphorylated cAMP response element (CRE)-binding protein (p-CREB) was transiently upregulated in the central and basolateral nuclei of the bilateral amygdala. This upregulation of p-CREB was inhibited by morphine and gabapentin. The present study suggests that surgical incision can induce anxiety and amygdala sensitization, both of which are inhibited by morphine and gabapentin. Thus, treating surgery-induced affective disturbances with morphine and gabapentin may be an important adjunct therapy in postoperative pain management.
d-wave Superconductivity, Pseudogap, and the Phase Diagram of the t-t'-J Model at Finite Temperature
Recently, a robust d-wave superconductivity has been unveiled in the ground
state of the 2D t-t'-J model -- with both nearest-neighbor (t) and
next-nearest-neighbor (t') hoppings -- through density matrix renormalization
group calculations. In this study, we exploit the state-of-the-art thermal
tensor network approach to accurately simulate the finite-temperature electron
states of the t-t'-J model on cylinders. Our analysis suggests that in the
dome-like superconducting phase, the d-wave pairing susceptibility exhibits a
divergent behavior below the onset temperature T_c, which reaches its highest
value near the optimal doping. Above T_c yet below a higher crossover
temperature T*, the magnetic susceptibility is suppressed, and the Fermi
surface also exhibits a node-antinode structure, resembling the pseudogap
behaviors observed in cuprates. Our unbiased and accurate thermal tensor
network calculations obtain the phase diagram of the t-t'-J model, shedding
light on the d-wave superconducting and pseudogap phases in the enigmatic
cuprate phase diagram.
Comment: 7+5 pages, 4+8 figures
Adjuvant treatment for patients with incidentally resected limited disease small cell lung cancer-a retrospective study.
Background
With the exception of very early-stage small cell lung cancer (SCLC), surgery is not typically recommended for this disease; however, incidental resection still occurs. After incidental resection, adjuvant salvage therapy is widely offered, but the evidence supporting its use is limited. This study aimed to explore proper adjuvant therapy for these incidentally resected SCLC cases.
Methods
Patients incidentally diagnosed with SCLC after surgery at the Shanghai Pulmonary Hospital in China from January 2005 to December 2014 were included in this study. The primary outcome was overall survival. Patients were classified into different groups according to the type of adjuvant therapy they received and stratified by their pathological lymph node status. Survival was analyzed using Kaplan-Meier and Cox regression analyses.
Results
A total of 161 patients were included in this study. The overall 5-year survival rate was 36.5%. For pathological N0 (pN0) cases (n=70), multivariable analysis revealed that adjuvant chemotherapy (ad-chemo) was associated with a reduced risk of death [hazard ratio (HR): 0.373; 95% confidence interval (CI): 0.141-0.985, P=0.047] compared to omission of adjuvant therapy. For pathological N1 or N2 (pN1/2) cases (n=91), taking the no-adjuvant-therapy cases as the reference, multivariable analysis showed that ad-chemo was not associated with a lower risk of death (HR: 0.869; 95% CI: 0.459-1.645, P=0.666), whereas adjuvant chemo-radiotherapy (ad-CRT) was associated with a lower risk of death (HR: 0.279; 95% CI: 0.102-0.761, P=0.013).
Conclusions
Patients who incidentally undergo surgical resection and are diagnosed with limited disease SCLC after resection should be offered adjuvant therapy as a salvage treatment. For incidentally resected pN0 cases, ad-chemo should be considered, and for pN1/2 cases, ad-CRT should be offered.