268 research outputs found

Semantic Adversarial Network with Multi-scale Pyramid Attention for Video Classification

Author: Deng Cheng
Li Chao
Tao Dapeng
Wang Hao
Xie De
Publication date: 05/03/2019

Two-stream architectures have shown strong performance in video classification tasks. The key idea is to learn spatio-temporal features by fusing convolutional networks spatially and temporally. However, such architectures have several problems. First, they rely on optical flow to model temporal information, which is often expensive to compute and store. Second, they have limited ability to capture details and local context information in video data. Third, they lack explicit semantic guidance, which greatly decreases classification performance. In this paper, we propose a new two-stream-based deep framework for video classification that discovers spatial and temporal information from RGB frames only. Moreover, a multi-scale pyramid attention (MPA) layer and a semantic adversarial learning (SAL) module are introduced and integrated into our framework. The MPA enables the network to capture global and local features to generate a comprehensive representation of a video, and the SAL makes this representation gradually approximate the real video semantics in an adversarial manner. Experimental results on two public benchmarks demonstrate that our proposed method achieves state-of-the-art results on standard video datasets.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

PBFormer: Capturing Complex Scene Text Shape with Polynomial Band Transformer

Author: Chen Dapeng
Li Cheng
Liu Ruijin
Lu Ning
Peng Wei
Yuan Zejian
Publication date: 28/08/2023

We present PBFormer, an efficient yet powerful scene text detector that unifies the transformer with a novel text shape representation, Polynomial Band (PB). The representation uses four polynomial curves to fit a text's top, bottom, left, and right sides, and can capture a text with a complex shape by varying the polynomial coefficients. PB has appealing features compared with conventional representations: 1) It can model different curvatures with a fixed number of parameters, while polygon-points-based methods need to use a different number of points. 2) It can distinguish adjacent or overlapping texts because they have apparently different curve coefficients, while segmentation-based or points-based methods suffer from adhesive spatial positions. PBFormer combines the PB with the transformer, which can directly generate smooth text contours sampled from predicted curves without interpolation. A parameter-free cross-scale pixel attention (CPA) module is employed to highlight the feature map of a suitable scale while suppressing the other feature maps. This simple operation helps detect small-scale texts and is compatible with the one-stage DETR framework, where no NMS post-processing is needed. Furthermore, PBFormer is trained with a shape-contained loss, which not only enforces the piecewise alignment between the ground truth and the predicted curves but also makes the curves' positions and shapes consistent with each other. Without bells and whistles such as text pre-training, our method is superior to the previous state-of-the-art text detectors on the arbitrary-shaped text datasets. Comment: 9 pages, 8 figures, accepted by ACM MM 202

arXiv.org e-Print Archive

Viia-hand: a Reach-and-grasp Restoration System Integrating Voice interaction, Computer vision and Auditory feedback for Blind Amputees

Author: Cheng Ming
Dai Jinghui
Jiang Li
Peng Chunhao
Yang Dapeng
Zhao Deyu
Publication date: 13/08/2023

Visual feedback plays a crucial role when amputees perform grasping tasks under prosthesis control. However, for blind and visually impaired (BVI) amputees, the loss of both visual and grasping abilities makes the "easy" reach-and-grasp task a considerable challenge. In this paper, we propose a novel multi-sensory prosthesis system that helps BVI amputees with sensing, navigation, and grasp operations. It combines modules for voice interaction, environmental perception, grasp guidance, collaborative control, and auditory/tactile feedback. In particular, the voice interaction module receives user instructions and invokes other functional modules accordingly. The environmental perception and grasp guidance module obtains environmental information through computer vision and feeds it back to the user through auditory feedback modules (voice prompts and spatial sound sources) and tactile feedback modules (vibration stimulation). The prosthesis collaborative control module obtains the context information of the grasp guidance process and, in conjunction with the user's control intention, controls the grasp gestures and wrist angles of the prosthesis to achieve a stable grasp of various objects. This paper details a prototype design (named viia-hand) and presents its preliminary experimental verification on healthy subjects completing specific reach-and-grasp tasks. Our results showed that, with the help of our new design, the subjects were able to achieve a precise reach and reliable grasp of the target objects in a relatively cluttered environment. Additionally, the system is extremely user-friendly, as users can quickly adapt to it with minimal training.

arXiv.org e-Print Archive

Free-Form Composition Networks for Egocentric Action Recognition

Author: Cheng Qinghua
Ding Liang
Ling Haibin
Tao Dapeng
Wang Haoran
Yu Baosheng
Zhan Yibing
Publication date: 12/07/2023

Egocentric action recognition is gaining significant attention in the field of human action recognition. In this paper, we address the data scarcity issue in egocentric action recognition from a compositional generalization perspective. To tackle this problem, we propose a free-form composition network (FFCN) that can simultaneously learn disentangled verb, preposition, and noun representations, and then use them to compose new samples in the feature space for rare classes of action videos. First, we use a graph to capture the spatial-temporal relations among different hand/object instances in each action video. We then decompose each action into a set of verb and preposition spatial-temporal representations using the edge features in the graph. The temporal decomposition extracts verb and preposition representations from different video frames, while the spatial decomposition adaptively learns verb and preposition representations from action-related instances in each frame. With these spatial-temporal representations of verbs and prepositions, we can compose new samples for those rare classes in a free-form manner, which is not restricted to a rigid form of a verb and a noun. The proposed FFCN can directly generate new training data samples for rare classes and hence significantly improves action recognition performance. We evaluated our method on three popular egocentric action recognition datasets, Something-Something V2, H2O, and EPIC-KITCHENS-100, and the experimental results demonstrate the effectiveness of the proposed method for handling data scarcity problems, including long-tailed and few-shot egocentric action recognition.

arXiv.org e-Print Archive

Ultrafast Spin-To-Charge Conversion at the Surface of Topological Insulator Thin Films

Author: Battiato Marco
Boothroyd Chris B.
Chen Mengji
Cheng Liang
Chia Elbert E. M.
Lam Yeng Ming
Song Justin C. W.
Wang Xinbo
Wang Yi
Wu Yang
Yang Hyunsoo
Zhao Daming
Zhu Dapeng
Zhu Jian-Xin
Publication venue: 'Wiley'
Publication date: 01/01/2018

Strong spin-orbit coupling, resulting in the formation of spin-momentum-locked surface states, endows topological insulators with superior spin-to-charge conversion characteristics, though the dynamics that govern it have remained elusive. Here, we present an all-optical method that enables unprecedented tracking of the ultrafast dynamics of spin-to-charge conversion in a prototypical topological insulator Bi2Se3/ferromagnetic Co heterostructure, down to the sub-picosecond timescale. Compared to pure Bi2Se3 or Co, we observe a giant terahertz emission in the heterostructure that originates from spin-to-charge conversion, in which the topological surface states play a crucial role. We identify a 0.12-picosecond timescale that sets a technological speed limit of spin-to-charge conversion processes in topological insulators. In addition, we show that the spin-to-charge conversion efficiency is temperature independent in Bi2Se3, as expected from the nature of the surface states, paving the way for designing next-generation high-speed opto-spintronic devices based on topological insulators at room temperature. Comment: 19 pages, 4 figure

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Size effect on the adsorption and dissociation of CO2 on Co nanoclusters

Author: Cao Dapeng
Cheng Daojian
Fisher Adrian
Johnston Roy
Yu Haiyan
Publication venue: 'Elsevier BV'
Publication date: 01/02/2017

Crossref

University of Birmingham Research Portal

Oxygen-vacancy effect on structural, magnetic, and ferroelectric properties in multiferroic YMnO3 single crystals

Author: Chen Dapeng
Cheng Zhenxiang
Dou S. X.
Du Yi
Lin Zhi W
Wang Xiaolin
Xu Bo
Zhu Jian G
Publication venue: 'Sociological Research Online'
Publication date: 01/01/2012

We have investigated the structural, magnetic, and ferroelectric properties of magnetically frustrated multiferroic YMnO3 single crystals. The ferroelectric domain structures of YMnO3 samples were studied by piezoresponse force microscopy. Instead of the domain vortex structure found in stoichiometric crystals, YMnO3-δ exhibits a random domain configuration with straight domain walls. In magnetic measurements, the YMnO3-δ crystal shows typical antiferromagnetic behavior with a higher Néel temperature and lower magnetization compared to the stoichiometric sample. The ordered oxygen vacancies dominate multiferroicity through tailoring the domain wall structure. © 2012 American Institute of Physics. [doi:10.1063/1.3676000

Crossref

OPUS - University of Technology Sydney

Research Online