Fast Fourier Inception Networks for Occluded Video Prediction
Video prediction is a pixel-level task that generates future frames from historical frames. Videos often contain continuous complex motions, such as object overlap and scene occlusion, which pose great challenges to this task. Previous works either fail to capture long-term temporal dynamics well or do not handle occlusion masks. To address these issues, we develop fully convolutional Fast Fourier Inception Networks for video prediction, termed FFINet, which include two primary components, i.e., an occlusion inpainter and a spatiotemporal translator. The former adopts fast Fourier convolutions to enlarge the receptive field, so that missing (occluded) areas with complex geometric structures are filled by the inpainter. The latter employs stacked Fourier transform inception modules to learn the temporal evolution by group convolutions and the spatial movement by channel-wise Fourier convolutions, capturing both local and global spatiotemporal features. This encourages generating more realistic, high-quality future frames. To optimize the model, a recovery loss is added to the objective, i.e., minimizing the mean squared error between the ground-truth frame and the recovered frame. Both quantitative and qualitative experimental results on five benchmarks, including Moving MNIST, TaxiBJ, Human3.6M, Caltech Pedestrian, and KTH, demonstrate the superiority of the proposed approach. Our code is available on GitHub.
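For intuition, here is a minimal PyTorch sketch of a Fourier-domain convolution of the kind FFINet builds on: a 1x1 convolution applied to the frequency representation gives every output position a frame-wide receptive field in a single layer. The layer sizes and class name are illustrative assumptions, not the paper's exact module.

```python
# Sketch of a fast-Fourier-convolution-style spatial mixer (assumed design).
import torch
import torch.nn as nn

class SpectralConv2d(nn.Module):
    """Applies a 1x1 convolution in the Fourier domain, giving each output
    pixel a receptive field over the whole frame in one layer."""
    def __init__(self, channels: int):
        super().__init__()
        # Real and imaginary parts are stacked along the channel axis.
        self.weight = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        freq = torch.fft.rfft2(x, norm="ortho")          # (B, C, H, W//2+1), complex
        freq = torch.cat([freq.real, freq.imag], dim=1)  # (B, 2C, H, W//2+1)
        freq = self.weight(freq)
        real, imag = freq.chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")

x = torch.randn(1, 16, 64, 64)
print(SpectralConv2d(16)(x).shape)  # torch.Size([1, 16, 64, 64])
```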
Primary Culture of Adult Rat Heart Myocytes
Cultured primary adult rodent heart cells are an important model system for cardiovascular research. Nevertheless, establishing robust, viable cultures of adult myocytes can be a technically challenging, rate-limiting step for many researchers. Here we describe a protocol to obtain a high yield of adult rat heart myocytes that remain viable in culture for several days. The heart is isolated and perfused with collagenase and protease under low Ca2+ conditions to recover single myocytes. Ca2+-tolerant cells are obtained by stepwise increases in extracellular Ca2+ concentration over three subsequent wash steps. Cells are filtered, resuspended in culture medium, and plated on laminin-coated slips. Cultured myocytes obtained using this protocol are viable for up to four days and are suitable for most experiments, including electrophysiology, biochemistry, imaging, and molecular biology.
Residual Spatial Fusion Network for RGB-Thermal Semantic Segmentation
Semantic segmentation plays an important role in widespread applications such
as autonomous driving and robotic sensing. Traditional methods mostly use RGB images, which are heavily affected by lighting conditions, e.g., darkness. Recent studies show that thermal images are robust to night scenarios, making them a compensating modality for segmentation. However, existing works either simply fuse
RGB-Thermal (RGB-T) images or adopt the encoder with the same structure for
both the RGB stream and the thermal stream, which neglects the modality
difference in segmentation under varying lighting conditions. Therefore, this
work proposes a Residual Spatial Fusion Network (RSFNet) for RGB-T semantic
segmentation. Specifically, we employ an asymmetric encoder to learn the
compensating features of the RGB and the thermal images. To effectively fuse
the dual-modality features, we generate the pseudo-labels by saliency detection
to supervise the feature learning, and develop the Residual Spatial Fusion
(RSF) module with structural re-parameterization to learn more discriminative features by spatially fusing the cross-modality features. RSF employs hierarchical feature fusion to aggregate multi-level features, and applies spatial weights with a residual connection to adaptively control multi-spectral feature fusion via a confidence gate. Extensive experiments were carried out on two benchmarks, i.e., the MFNet and PST900 databases. The results show the state-of-the-art segmentation performance of our method, which achieves a good balance between accuracy and speed.
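As a rough illustration of the confidence-gated fusion described above, the following PyTorch sketch predicts a per-pixel weight from the concatenated RGB and thermal features and mixes the two modalities with a residual connection. The hierarchical fusion and re-parameterization of the paper's RSF module are omitted, and all names are assumptions.

```python
# Sketch of confidence-gated spatial fusion for RGB-T features (assumed layout).
import torch
import torch.nn as nn

class SpatialFusionGate(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Predict a per-pixel confidence map from the concatenated modalities.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([rgb, thermal], dim=1))  # (B, 1, H, W) in [0, 1]
        fused = w * rgb + (1.0 - w) * thermal            # spatially weighted mix
        return fused + rgb                               # residual connection

rgb = torch.randn(2, 64, 32, 32)
thermal = torch.randn(2, 64, 32, 32)
print(SpatialFusionGate(64)(rgb, thermal).shape)  # torch.Size([2, 64, 32, 32])
```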
Fully Transformer-Equipped Architecture for End-to-End Referring Video Object Segmentation
Referring Video Object Segmentation (RVOS) requires segmenting the object in a video referred to by a natural language query. Existing methods mainly rely on sophisticated pipelines to tackle this cross-modal task, and do not explicitly model the object-level spatial context, which plays an important role in locating the referred object. Therefore, we propose an end-to-end RVOS framework built entirely upon transformers, termed the Fully Transformer-Equipped Architecture (FTEA), which treats the RVOS task as a mask sequence learning problem and regards all objects in the video as candidate objects. Given a video clip with a text query, the visual-textual features are produced by the encoder, and the corresponding pixel-level and word-level features are aligned in terms of semantic similarity. To capture the object-level spatial context, we develop the Stacked Transformer, which individually characterizes the visual appearance of each candidate object and whose feature maps are directly decoded into an ordered binary mask sequence. Finally, the model
finds the best matching between the mask sequences and the text query. In addition, to diversify the generated masks for candidate objects, we impose a diversity loss on the model to capture a more accurate mask of the referred object. Empirical studies show the superiority of the proposed method on three benchmarks, e.g., FTEA achieves 45.1% and 38.7% mAP on A2D Sentences (3782 videos) and J-HMDB Sentences (928 videos), respectively, and 56.6% J&F on Ref-YouTube-VOS (3975 videos and 7451 objects). In particular, compared to the best competing method, it gains 2.1% and 3.2% in P@0.5 on the former two benchmarks, respectively, and 2.9% in J&F on the latter.
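The diversity loss can be illustrated with a small sketch that penalizes pairwise overlap between candidate masks; the cosine-similarity form below is an assumed instantiation, not necessarily the loss used in the paper.

```python
# Sketch of a diversity loss pushing candidate object masks apart (assumed form).
import torch
import torch.nn.functional as F

def diversity_loss(masks: torch.Tensor) -> torch.Tensor:
    """masks: (N, H*W) soft masks for N candidate objects in one frame."""
    m = F.normalize(masks, dim=1)                     # unit-norm flattened masks
    sim = m @ m.t()                                   # (N, N) pairwise cosine similarity
    off_diag = sim - torch.diag_embed(sim.diagonal()) # zero out self-similarity
    return off_diag.abs().mean()                      # penalize candidate overlap

masks = torch.rand(5, 64 * 64)
print(diversity_loss(masks))  # scalar tensor
```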
Triple-View Knowledge Distillation for Semi-Supervised Semantic Segmentation
To alleviate expensive human labeling, semi-supervised semantic segmentation employs a few labeled images and an abundance of unlabeled images to predict a pixel-level label map of the same size. Previous methods often adopt co-training with two convolutional networks of the same architecture but different initializations, which fails to capture sufficiently diverse features. This motivates us to use tri-training and develop a triple-view encoder that utilizes encoders with different architectures to derive diverse features, and to exploit knowledge distillation to learn the complementary semantics among these encoders. Moreover, existing methods simply
concatenate the features from both encoder and decoder, resulting in redundant
features that require large memory cost. This inspires us to devise a
dual-frequency decoder that selects those important features by projecting the
features from the spatial domain to the frequency domain, where the
dual-frequency channel attention mechanism is introduced to model the feature
importance. Therefore, we propose a Triple-view Knowledge Distillation
framework, termed TriKD, for semi-supervised semantic segmentation, including
the triple-view encoder and the dual-frequency decoder. Extensive experiments
were conducted on two benchmarks, i.e., Pascal VOC 2012 and Cityscapes; the results verify the superiority of the proposed method, which achieves a good tradeoff between precision and inference speed.
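To make the dual-frequency idea concrete, here is a hedged PyTorch sketch in which channel attention is driven by two spectral statistics per channel (the DC component and the mean spectral magnitude). This is one plausible reading, not the paper's exact dual-frequency module.

```python
# Sketch of frequency-driven channel attention (assumed instantiation).
import torch
import torch.nn as nn

class DualFrequencyChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Two frequency statistics per channel drive the channel gate.
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        amp = torch.fft.rfft2(x, norm="ortho").abs()    # (B, C, H, W//2+1)
        low = amp[:, :, 0, 0]                           # DC (low-frequency) term
        high = amp.flatten(2).mean(dim=2)               # overall spectral energy
        gate = self.mlp(torch.cat([low, high], dim=1))  # (B, C) channel weights
        return x * gate[:, :, None, None]

x = torch.randn(2, 32, 16, 16)
print(DualFrequencyChannelAttention(32)(x).shape)  # torch.Size([2, 32, 16, 16])
```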
Mechanics of design and model development of CVC-plus roll curve
A mathematical model of the CVC-plus work roll curve is built. The ratio of the initial shifting value to the target crown is determined, and a mathematical model relating the coefficients A2, A3, A4, and A5 to this ratio is established. According to the theoretical analysis, the distance between the maximum or minimum point of the high-order equivalent crown of a work roll with the CVC-plus curve and the rolling central point is a fixed multiple of the roll barrel length. In general, the initial shifting value of the CVC-plus roll curve is not equal to that of the third-order CVC roll curve. The coefficient A1 can also be obtained by optimizing the target function to minimize the axial force.
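For reference, CVC-plus work-roll contours are commonly written as a fifth-order polynomial of the axial coordinate, which is consistent with the coefficients A1 through A5 named above; the exact form used in the paper may differ.

```latex
% Assumed general form of the CVC-plus roll contour; x is the axial
% coordinate along the roll barrel and A_0,...,A_5 are design coefficients.
y(x) = A_0 + A_1 x + A_2 x^2 + A_3 x^3 + A_4 x^4 + A_5 x^5
```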
Pair-wise Layer Attention with Spatial Masking for Video Prediction
Video prediction generates future frames from historical frames and has exhibited great potential in many applications, e.g., meteorological prediction and autonomous driving. Previous works often decode only the final high-level semantic features into future frames, losing texture details and degrading prediction quality. Motivated by this, we develop a Pair-wise Layer Attention (PLA) module that enhances the layer-wise semantic dependency of the feature maps derived from the U-shape structure in the Translator by coupling low-level visual cues with high-level features. Hence, the texture details of predicted frames are enriched. Moreover, most existing methods capture the spatiotemporal dynamics with the Translator but fail to sufficiently utilize the spatial features of the Encoder. This inspires us to design a Spatial Masking (SM) module that masks part of the encoder features during pretraining, which increases the visibility of the remaining feature pixels to the Decoder. To this end, we present a
Pair-wise Layer Attention with Spatial Masking (PLA-SM) framework for video
prediction to capture the spatiotemporal dynamics, which reflect the motion
trend. Extensive experiments and rigorous ablation studies on five benchmarks
demonstrate the advantages of the proposed approach. The code is available on GitHub.
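A minimal sketch of the spatial-masking idea follows, assuming a simple random per-pixel mask shared across channels; the actual masking strategy and ratio in PLA-SM may differ.

```python
# Sketch of spatial masking of encoder features during pretraining (assumed).
import torch

def spatial_mask(features: torch.Tensor, mask_ratio: float = 0.5) -> torch.Tensor:
    """features: (B, C, H, W). Zeros out mask_ratio of the spatial positions,
    shared across channels, so the decoder sees only the surviving pixels."""
    b, _, h, w = features.shape
    keep = (torch.rand(b, 1, h, w, device=features.device) > mask_ratio).float()
    return features * keep

feats = torch.randn(2, 64, 16, 16)
masked = spatial_mask(feats, mask_ratio=0.75)
print((masked == 0).float().mean())  # roughly 0.75 of entries zeroed
```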
Semiparametric regression analysis for alternating recurrent event data
Glass formation and properties of Ge-Ga-Te-ZnI2 far infrared chalcohalide glasses
In order to develop novel far-infrared window materials, a series of Ge-Ga-Te-ZnI2 chalcohalide glasses were prepared by the traditional melt-quenching method, and their glass-forming region was determined. Measurements including X-ray diffraction (XRD), differential thermal analysis (DTA), UV-Vis-NIR absorption spectroscopy, and infrared optical transmission spectroscopy were carried out. The allowed indirect-transition optical band gap was calculated according to the classical Tauc equation. The results show that with the addition of ZnI2, the glass-forming ability and thermal stability improve gradually. As the ZnI2 content increases from 5 to 20 at.%, the short-wavelength absorption cut-off edge blue-shifts continuously, and the indirect optical band gaps range from 0.596 to 0.626 eV in these glasses. These GeTe4.3-GaTe3-ZnI2 glasses show wide optical transmission, and their infrared cut-off wavelengths are larger than 25 μm, which implies that Ge-Ga-Te-ZnI2 chalcogenide glasses have potential for far-IR optical window applications.
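For reference, the classical Tauc relation for allowed indirect transitions mentioned above is:

```latex
% alpha: absorption coefficient, h\nu: photon energy,
% E_g: optical band gap, B: material-dependent constant.
(\alpha h\nu)^{1/2} = B\,(h\nu - E_g)
```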
Adversarial Attacks on Video Object Segmentation with Hard Region Discovery
Video object segmentation (VOS) has been applied to various computer vision tasks, such as video editing, autonomous driving, and human-robot interaction. However, methods based on deep neural networks are vulnerable to adversarial examples, i.e., inputs corrupted by almost human-imperceptible perturbations with which an adversary (i.e., attacker) fools the segmentation model into making incorrect pixel-level predictions. This raises security concerns in safety-critical tasks, because small perturbations to the input video carry potential attack risks. Though adversarial examples have been extensively studied for classification, they are rarely studied in video object segmentation. Existing related methods in computer vision either require prior knowledge of object categories or cannot be directly applied due to their task-specific designs, and they fail to consider pixel-wise region attacks. Hence, this work develops an object-agnostic adversary that attacks VOS by perturbing the first frame via hard region discovery.
Particularly, the gradients from the segmentation model are exploited to
discover the easily confused region, in which it is difficult to identify the
pixel-wise objects from the background in a frame. This provides a hardness map
that helps to generate perturbations with a stronger adversarial power for
attacking the first frame. Empirical studies on three benchmarks indicate that
our attacker significantly degrades the performance of several state-of-the-art
video object segmentation models.
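As a rough sketch of gradient-guided first-frame attacking, the example below scales an FGSM-style perturbation by a per-pixel hardness map derived from the gradient magnitude; the loss, normalization, and names are illustrative assumptions rather than the paper's exact attack.

```python
# Sketch of a hard-region-weighted first-frame attack (assumed formulation).
import torch
import torch.nn.functional as F

def hard_region_attack(model, frame, gt_mask, epsilon=8 / 255):
    """frame: (1, 3, H, W) first frame in [0, 1]; gt_mask: (1, H, W) long labels."""
    frame = frame.clone().requires_grad_(True)
    logits = model(frame)                    # (1, K, H, W) per-pixel class logits
    loss = F.cross_entropy(logits, gt_mask)
    loss.backward()
    grad = frame.grad
    # Hardness map: pixels with large gradients are easily confused regions.
    hardness = grad.abs().mean(dim=1, keepdim=True)
    hardness = hardness / (hardness.max() + 1e-8)
    # FGSM-style step, amplified in hard regions and clipped to valid range.
    adv = frame + epsilon * hardness * grad.sign()
    return adv.clamp(0.0, 1.0).detach()
```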