
    Implicit Temporal Modeling with Learnable Alignment for Video Recognition

    Contrastive language-image pretraining (CLIP) has demonstrated remarkable success on various image tasks. However, how to extend CLIP with effective temporal modeling is still an open and crucial problem. Existing factorized or joint spatial-temporal modeling trades off between efficiency and performance. While modeling temporal information within a straight-through tube is widely adopted in the literature, we find that simple frame alignment already provides enough essence without temporal attention. To this end, in this paper we propose a novel Implicit Learnable Alignment (ILA) method, which minimizes the temporal modeling effort while achieving remarkably high performance. Specifically, for a frame pair, an interactive point is predicted in each frame, serving as a region rich in mutual information. By enhancing the features around the interactive point, the two frames are implicitly aligned. The aligned features are then pooled into a single token, which is leveraged in the subsequent spatial self-attention. Our method allows eliminating the costly and often insufficient temporal self-attention in video. Extensive experiments on benchmarks demonstrate the superiority and generality of our module. In particular, the proposed ILA achieves a top-1 accuracy of 88.7% on Kinetics-400 with far fewer FLOPs compared with Swin-L and ViViT-H. Code is released at https://github.com/Francis-Rings/ILA.
    Comment: ICCV 2023 oral. 14 pages, 7 figures. Code released at https://github.com/Francis-Rings/ILA
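
    The alignment step described above can be pictured with a small sketch: for a pair of frame features, a soft "interactive point" is predicted, the features around it are enhanced, and the aligned pair is pooled into a single token. The module below is an illustrative assumption (hypothetical names and shapes), not the released ILA implementation; see the repository above for the real code.

```python
import torch
import torch.nn as nn


class ImplicitAlignSketch(nn.Module):
    """Toy pairwise implicit alignment: predict a soft per-patch weight map
    for a frame pair, enhance features around the high-weight region, and
    pool the aligned pair into one token for later spatial self-attention."""

    def __init__(self, dim: int):
        super().__init__()
        self.point_head = nn.Linear(2 * dim, 1)  # scores the "interactive point"
        self.proj = nn.Linear(dim, dim)

    def forward(self, f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
        # f1, f2: (B, N, C) patch tokens of two neighboring frames.
        pair = torch.cat([f1, f2], dim=-1)         # (B, N, 2C)
        w = self.point_head(pair).softmax(dim=1)   # (B, N, 1) soft interactive point
        aligned1 = f1 * (1.0 + w)                  # enhance the region around the point
        aligned2 = f2 * (1.0 + w)
        # Pool the aligned pair into a single alignment token.
        return self.proj((aligned1 + aligned2).mean(dim=1))  # (B, C)


# Usage: two frames' patch embeddings -> one alignment token per pair.
token = ImplicitAlignSketch(768)(torch.randn(2, 196, 768), torch.randn(2, 196, 768))
print(token.shape)  # torch.Size([2, 768])
```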

    MotionEditor: Editing Video Motion via Content-Aware Diffusion

    Existing diffusion-based video editing models have made impressive advances in editing the attributes of a source video over time, but they struggle to manipulate motion information while preserving the original protagonist's appearance and background. To address this, we propose MotionEditor, a diffusion model for video motion editing. MotionEditor incorporates a novel content-aware motion adapter into ControlNet to capture temporal motion correspondence. While ControlNet enables direct generation based on skeleton poses, it encounters challenges when modifying the source motion in the inverted noise due to contradictory signals between the noise (source) and the condition (reference). Our adapter complements ControlNet by involving source content to transfer adapted control signals seamlessly. Furthermore, we build a two-branch architecture (a reconstruction branch and an editing branch) with a high-fidelity attention injection mechanism that facilitates branch interaction. This mechanism enables the editing branch to query the key and value from the reconstruction branch in a decoupled manner, so the editing branch retains the original background and protagonist appearance. We also propose a skeleton alignment algorithm to address discrepancies in pose size and position. Experiments demonstrate the promising motion editing ability of MotionEditor, both qualitatively and quantitatively.
    Comment: 18 pages, 15 figures. Project page at https://francis-rings.github.io/MotionEditor
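
    The attention injection mechanism can be illustrated with a toy cross-branch attention layer in which the editing branch forms the queries while the reconstruction branch supplies keys and values. This is a hedged sketch with assumed shapes, not MotionEditor's actual module.

```python
import torch
import torch.nn as nn


class CrossBranchAttentionSketch(nn.Module):
    """Toy attention injection: the editing branch attends to keys/values taken
    from the reconstruction branch, so background and appearance information
    from the source can be carried into the edited result."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, edit_feat: torch.Tensor, recon_feat: torch.Tensor) -> torch.Tensor:
        # edit_feat, recon_feat: (B, N, C) token features of the two branches.
        out, _ = self.attn(query=edit_feat, key=recon_feat, value=recon_feat)
        return edit_feat + out  # residual update of the editing branch


x_edit, x_recon = torch.randn(1, 64, 320), torch.randn(1, 64, 320)
print(CrossBranchAttentionSketch(320)(x_edit, x_recon).shape)  # torch.Size([1, 64, 320])
```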

    SVFormer: Semi-supervised Video Transformer for Action Recognition

    Semi-supervised action recognition is a challenging but critical task due to the high cost of video annotations. Existing approaches mainly use convolutional neural networks, while the recent, revolutionary vision transformer models have been less explored. In this paper, we investigate the use of transformer models under the SSL setting for action recognition. To this end, we introduce SVFormer, which adopts a steady pseudo-labeling framework (i.e., EMA-Teacher) to cope with unlabeled video samples. While a wide range of data augmentations have been shown effective for semi-supervised image classification, they generally produce limited results for video recognition. We therefore introduce a novel augmentation strategy, Tube TokenMix, tailored for video data, where video clips are mixed via a mask with consistent masked tokens over the temporal axis. In addition, we propose a temporal warping augmentation to cover the complex temporal variation in videos, which stretches selected frames to various temporal durations in the clip. Extensive experiments on three datasets (Kinetics-400, UCF-101, and HMDB-51) verify the advantage of SVFormer. In particular, SVFormer outperforms the state-of-the-art by 31.5% with fewer training epochs under the 1% labeling rate of Kinetics-400. Our method can hopefully serve as a strong benchmark and encourage future research on semi-supervised action recognition with Transformer networks.
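
    Tube TokenMix mixes the tokens of two clips with a spatial mask that is shared across all frames, so the mixed regions form temporally consistent "tubes". The function below is a minimal sketch under an assumed (B, T, N, C) token layout; it is not the SVFormer implementation.

```python
import torch


def tube_token_mix_sketch(clip_a: torch.Tensor, clip_b: torch.Tensor, mix_ratio: float = 0.5):
    """Mix two token sequences (B, T, N, C) with a spatial token mask that is
    identical for every frame, i.e. a temporal tube of swapped tokens."""
    B, T, N, C = clip_a.shape
    n_mix = int(N * mix_ratio)
    # Choose the same spatial token indices for every frame of each sample.
    idx = torch.rand(B, N).argsort(dim=1)[:, :n_mix]              # (B, n_mix)
    mask = torch.zeros(B, N, dtype=torch.bool).scatter_(1, idx, True)
    mask = mask[:, None, :, None]                                 # broadcast over time and channels
    mixed = torch.where(mask, clip_b, clip_a)
    lam = 1.0 - n_mix / N                                         # label weight for clip_a
    return mixed, lam


a, b = torch.randn(2, 8, 196, 768), torch.randn(2, 8, 196, 768)
mixed, lam = tube_token_mix_sketch(a, b, mix_ratio=0.3)
print(mixed.shape, lam)  # torch.Size([2, 8, 196, 768]) ~0.70
```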

    ResFormer: Scaling ViTs with Multi-Resolution Training

    Vision Transformers (ViTs) have achieved overwhelming success, yet they suffer from vulnerable resolution scalability, i.e., the performance drops drastically when presented with input resolutions that are unseen during training. We introduce ResFormer, a framework built upon the seminal idea of multi-resolution training for improved performance on a wide spectrum of, mostly unseen, testing resolutions. In particular, ResFormer operates on replicated images of different resolutions and enforces a scale consistency loss to engage interactive information across different scales. More importantly, to alternate among varying resolutions effectively, especially novel ones at test time, we propose a global-local positional embedding strategy that changes smoothly conditioned on input sizes. We conduct extensive experiments on image classification on ImageNet. The results provide strong quantitative evidence that ResFormer scales well to a wide range of resolutions. For instance, ResFormer-B-MR achieves a Top-1 accuracy of 75.86% and 81.72% when evaluated on relatively low and high resolutions (i.e., 96 and 640), which are 48% and 7.49% better than DeiT-B, respectively. We also demonstrate that ResFormer is flexible and can be easily extended to semantic segmentation, object detection, and video action recognition. Code is available at https://github.com/ruitian12/resformer.
    Comment: CVPR 2023
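
    The multi-resolution training idea can be sketched as a single training step that replicates a batch at several resolutions, sums the classification losses, and adds a consistency term pulling the predictions at different scales together. The resolutions, the KL-based consistency term, and the assumption that the model accepts arbitrary input sizes are illustrative choices, not ResFormer's exact recipe.

```python
import torch
import torch.nn.functional as F


def multi_res_step_sketch(model, images, labels, sizes=(96, 224, 384)):
    """One toy multi-resolution step: per-scale cross-entropy plus a scale
    consistency loss that matches smaller-scale predictions to the largest."""
    logits, loss = [], 0.0
    for s in sizes:
        x = F.interpolate(images, size=(s, s), mode="bilinear", align_corners=False)
        out = model(x)                                  # assumes a size-agnostic model
        logits.append(out)
        loss = loss + F.cross_entropy(out, labels)
    target = logits[-1].detach().softmax(dim=-1)        # highest resolution as anchor
    for out in logits[:-1]:
        loss = loss + F.kl_div(out.log_softmax(dim=-1), target, reduction="batchmean")
    return loss


# Usage: loss = multi_res_step_sketch(vit, images, labels); loss.backward()
```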

    VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models

    Diffusion models have achieved significant success in image and video generation. This motivates a growing interest in video editing tasks, where videos are edited according to provided text descriptions. However, most existing approaches only focus on video editing for short clips and rely on time-consuming tuning or inference. We are the first to propose Video Instruction Diffusion (VIDiff), a unified foundation model designed for a wide range of video tasks. These tasks encompass both understanding tasks (such as language-guided video object segmentation) and generative tasks (video editing and enhancement). Our model can edit and translate videos into the desired results within seconds based on user instructions. Moreover, we design an iterative auto-regressive method to ensure consistency in editing and enhancing long videos. We provide convincing generative results for diverse input videos and written instructions, both qualitatively and quantitatively. More examples can be found at our website https://ChenHsing.github.io/VIDiff
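
    The iterative auto-regressive treatment of long videos can be pictured as editing overlapping chunks in sequence, with each chunk conditioned on the previously edited frames. The function below is a schematic sketch around a hypothetical edit_fn, not VIDiff's actual pipeline.

```python
def edit_long_video_sketch(frames, edit_fn, chunk=16, overlap=4):
    """Edit a long list of frames chunk by chunk, carrying the last edited
    frames forward as conditioning so consecutive chunks stay consistent.

    edit_fn: hypothetical callable (chunk_frames, context_frames) -> edited frames
    """
    edited, context = [], []
    step = chunk - overlap
    for start in range(0, len(frames), step):
        piece = frames[start:start + chunk]
        out = edit_fn(piece, context)                         # condition on earlier output
        edited.extend(out if start == 0 else out[overlap:])   # drop the re-edited overlap
        context = out[-overlap:]                              # context for the next chunk
    return edited


# Usage with a dummy editor that leaves frames unchanged:
print(len(edit_long_video_sketch(list(range(40)), lambda p, c: p)))  # 40
```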

    Surgical Incision Induces Anxiety-Like Behavior and Amygdala Sensitization: Effects of Morphine and Gabapentin

    The role of the affective dimension in postoperative pain is still poorly understood. The present study investigated the development of anxiety-like behavior and amygdala sensitization in incisional pain. Using the hind-paw incision model in rats, we showed that surgical incision induced anxiety-like behavior as determined by elevated plus-maze and open-field tests. Intraperitoneal (IP) morphine administration reversed mechanical allodynia and anxiety-like behavior in a dose-dependent manner. Gabapentin also partially reduced incision-evoked mechanical allodynia and anxiety-like behavior in a dose-dependent manner. After incision, the expression of phosphorylated cAMP response element-binding protein (p-CREB) was transiently upregulated in the central and basolateral nuclei of the bilateral amygdala. This upregulation of p-CREB was inhibited by morphine and gabapentin. The present study suggests that surgical incision can induce anxiety and amygdala sensitization that are inhibited by morphine and gabapentin. Thus, treating surgery-induced affective disturbances with morphine and gabapentin may be an important adjunct in postoperative pain management.

    d-wave Superconductivity, Pseudogap, and the Phase Diagram of the t-t'-J Model at Finite Temperature

    Recently, a robust d-wave superconductivity has been unveiled in the ground state of the 2D t-t'-J model -- with both nearest-neighbor (t) and next-nearest-neighbor (t') hoppings -- through density matrix renormalization group calculations. In this study, we exploit the state-of-the-art thermal tensor network approach to accurately simulate the finite-temperature electron states of the t-t'-J model on cylinders with widths up to W=6. Our analysis suggests that in the dome-like superconducting phase, the d-wave pairing susceptibility exhibits a divergent behavior, χ_SC ∝ 1/T^α, below the onset temperature T_c^*. Near optimal doping, T_c^* reaches its highest value of about 0.05t (≡ 0.15J). Above T_c^* yet below a higher crossover temperature T^*, the magnetic susceptibility is suppressed and the Fermi surface exhibits a node-antinode structure, resembling the pseudogap behaviors observed in cuprates. Our unbiased and accurate thermal tensor network calculations yield the phase diagram of the t-t'-J model with t'/t > 0, shedding light on the d-wave superconducting and pseudogap phases in the enigmatic cuprate phase diagram.
    Comment: 7+5 pages, 4+8 figures
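
    The quoted divergence χ_SC ∝ 1/T^α is a straight line in log-log coordinates, so the exponent can be extracted with a linear fit. The snippet below does this on synthetic data purely as an illustration; it is not the thermal tensor network calculation.

```python
import numpy as np

# Synthetic susceptibility data following chi = A / T**alpha (illustration only).
T = np.linspace(0.02, 0.2, 20)
alpha_true, A = 1.3, 0.1
chi = A / T**alpha_true

# In log space the power law is linear: log chi = log A - alpha * log T.
slope, intercept = np.polyfit(np.log(T), np.log(chi), 1)
print(f"fitted alpha = {-slope:.2f}, A = {np.exp(intercept):.3f}")  # ~1.30, ~0.100
```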

    Adjuvant treatment for patients with incidentally resected limited disease small cell lung cancer: a retrospective study.

    Background: With the exception of very early-stage small cell lung cancer (SCLC), surgery is not typically recommended for this disease; however, incidental resection still occurs. After incidental resection, adjuvant salvage therapy is widely offered, but the evidence supporting its use is limited. This study aimed to explore appropriate adjuvant therapy for these incidentally resected SCLC cases. Methods: Patients incidentally diagnosed with SCLC after surgery at the Shanghai Pulmonary Hospital in China from January 2005 to December 2014 were included in this study. The primary outcome was overall survival. Patients were classified into different groups according to the type of adjuvant therapy they received and stratified by their pathological lymph node status. Survival was analyzed using Kaplan-Meier and Cox regression analyses. Results: A total of 161 patients were included in this study. The overall 5-year survival rate was 36.5%. For pathological N0 (pN0) cases (n=70), multivariable analysis revealed that adjuvant chemotherapy (ad-chemo) was associated with a reduced risk of death [hazard ratio (HR): 0.373; 95% confidence interval (CI): 0.141-0.985, P=0.047] compared with omission of adjuvant therapy. For pathological N1 or N2 (pN1/2) cases (n=91), with cases receiving no adjuvant therapy as the reference, multivariable analysis showed that ad-chemo was not associated with a lower risk of death (HR: 0.869; 95% CI: 0.459-1.645, P=0.666), whereas adjuvant chemo-radiotherapy (ad-CRT) was (HR: 0.279; 95% CI: 0.102-0.761, P=0.013). Conclusions: Patients who incidentally undergo surgical resection and are diagnosed with limited disease SCLC afterwards should be offered adjuvant therapy as a salvage treatment: ad-chemo should be considered for pN0 cases, and ad-CRT for pN1/2 cases.
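
    The survival analysis described above (Kaplan-Meier estimation and Cox regression reporting hazard ratios) can be outlined with the lifelines library. The data frame below is entirely synthetic and the column names are placeholders; it only illustrates the shape of such an analysis, not the study's data or code.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter, KaplanMeierFitter

rng = np.random.default_rng(0)
n = 160  # roughly the size of the reported cohort
df = pd.DataFrame({
    "months":   np.round(rng.exponential(36, n) + 1, 1),  # follow-up time
    "death":    rng.integers(0, 2, n),                    # 1 = died, 0 = censored
    "ad_chemo": rng.integers(0, 2, n),                    # adjuvant chemotherapy
    "ad_crt":   rng.integers(0, 2, n),                    # adjuvant chemo-radiotherapy
    "pn1_2":    rng.integers(0, 2, n),                    # pN1/2 vs. pN0
})

# Kaplan-Meier estimate of overall survival.
km = KaplanMeierFitter().fit(df["months"], df["death"])
print(km.median_survival_time_)

# Cox model: hazard ratios (with 95% CIs) for the treatment and nodal-status indicators.
cph = CoxPHFitter()
cph.fit(df, duration_col="months", event_col="death")
cph.print_summary()
```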