Search CORE

8 research outputs found

Fine-grained Image Classification via Combining Vision and Language

Author: He Xiangteng
Peng Yuxin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/05/2017

Fine-grained image classification, which aims to recognize hundreds of sub-categories belonging to the same basic-level category, is a challenging task because of large intra-class variance and small inter-class variance. Most existing fine-grained image classification methods learn part detection models to obtain semantic parts for better classification accuracy. Despite achieving promising results, these methods have two main limitations: (1) not all the parts obtained through the part detection models are beneficial and indispensable for classification, and (2) fine-grained image classification requires more detailed visual descriptions than part locations or attribute annotations can provide. To address these two limitations, this paper proposes a two-stream model combining vision and language (CVL) for learning latent semantic representations. The vision stream learns deep representations from the original visual information via a deep convolutional neural network. The language stream uses natural language descriptions, which can point out the discriminative parts or characteristics of each image, and provides a flexible and compact way of encoding the salient visual aspects for distinguishing sub-categories. Since the two streams are complementary, combining them further improves classification accuracy. Compared with 12 state-of-the-art methods on the widely used CUB-200-2011 dataset for fine-grained image classification, the experimental results demonstrate that our CVL approach achieves the best performance.
Comment: 9 pages, to appear in CVPR 201

arXiv.org e-Print Archive

Crossref

Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model

Author: He Xiangteng
Liu Jing
Peng Yuxin
Wang Peng
Wu Peng
Zhang Yanning
Publication date: 24/07/2023

Video anomaly detection (VAD) has received increasing attention because of its potential applications. Its currently dominant tasks focus on detecting anomalies online at the frame level, which can be roughly interpreted as binary or multiple event classification. However, such a setup, which builds relationships between complicated anomalous events and single labels, e.g., "vandalism", is superficial, since single labels are insufficient to characterize anomalous events. In reality, users tend to search for a specific video rather than a series of approximate videos. Therefore, retrieving anomalous events using detailed descriptions is practical and valuable, but little research focuses on this. In this context, we propose a novel task called Video Anomaly Retrieval (VAR), which aims to pragmatically retrieve relevant anomalous videos across modalities, e.g., language descriptions and synchronous audio. Unlike current video retrieval, where videos are assumed to be temporally well-trimmed and short, VAR is devised to retrieve long untrimmed videos that may be only partially relevant to the given query. To achieve this, we present two large-scale VAR benchmarks, UCFCrime-AR and XDViolence-AR, constructed on top of prevalent anomaly datasets. We also design a model called Anomaly-Led Alignment Network (ALAN) for VAR. In ALAN, we propose anomaly-led sampling to focus on key segments in long untrimmed videos. Then, we introduce an efficient pretext task to enhance semantic associations between fine-grained video-text representations. In addition, we leverage two complementary alignments to further match cross-modal content. Experimental results on the two benchmarks reveal the challenges of the VAR task and demonstrate the advantages of our tailored method.
Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibl

arXiv.org e-Print Archive

Object-Part Attention Model for Fine-Grained Image Classification

Author: Junjie Zhao
Xiangteng He
Yuxin Peng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Source Characterization of Some Collapse Earthquakes due to Mining Activities in Shandong and Beijing, North China

Author: Abers
Alvizuri
Chai
Chen
D'Amico
David A. Yuen
Dreger
Dreger
Dreger
Fletcher
Ford
Ford
Ge
Han
Hardebeck
He
Herrmann
Herrmann
Hu
Huang
Hudson
Jia
Jia
Jost
Kang
Knopoff
Li
Li
Li
Li
Lin
Lin
Lü
Meng
Pechmann
Pechmann
Qi
Qiao
Ross
Shuofan Wang
Solomon
Tan
Tong
Wan
Wang
Wang
Wei
Wu
Xiangteng Wang
Xu
Yang
Yang
Yang
Yao
Yibing Dong
Zhang
Zhang
Zhang
Zhao
Zheng
Zheng
Zheng
Zhiwei Li
Zhu
Zhu
Zhu
Šílený
Publication venue: 'Seismological Society of America (SSA)'

Crossref