Appearance-and-Relation Networks for Video Classification
Spatiotemporal feature learning in videos is a fundamental problem in
computer vision. This paper presents a new architecture, termed the
Appearance-and-Relation Network (ARTNet), to learn video representations in an
end-to-end manner. ARTNets are constructed by stacking multiple generic
building blocks, called SMART, whose goal is to simultaneously model
appearance and relation from RGB input in a separate and explicit manner.
Specifically, SMART blocks decouple the spatiotemporal learning module into an
appearance branch for spatial modeling and a relation branch for temporal
modeling. The appearance branch is implemented based on the linear combination
of pixels or filter responses in each frame, while the relation branch is
designed based on the multiplicative interactions between pixels or filter
responses across multiple frames. We perform experiments on three action
recognition benchmarks: Kinetics, UCF101, and HMDB51, demonstrating that SMART
blocks obtain an evident improvement over 3D convolutions for spatiotemporal
feature learning. Under the same training setting, ARTNets achieve superior
performance on these three datasets to the existing state-of-the-art methods.Comment: CVPR18 camera-ready version. Code & models available at
https://github.com/wanglimin/ARTNe
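The decoupling described above can be illustrated with a minimal numpy sketch. This is a toy simplification under stated assumptions: the real SMART block operates on convolutional feature maps inside a deep network, whereas here each frame is a flat feature vector, the weights are random, and the fusion-by-concatenation step is hypothetical. Only the core idea is kept: the appearance branch is a linear combination of per-frame responses, while the relation branch is built from multiplicative interactions between responses of adjacent frames.

```python
import numpy as np

def smart_block_sketch(clip, w_app, w_rel):
    """Toy sketch of a SMART-style block (hypothetical simplification).

    clip:  (T, D) array -- T frames, each a D-dim vector of filter responses.
    w_app: (D, K) weights for the appearance branch (per-frame linear combination).
    w_rel: (D, K) weights applied to multiplicative cross-frame interactions.
    """
    # Appearance branch: spatial modeling via a linear combination of
    # filter responses within each frame independently.
    appearance = clip @ w_app                 # (T, K)

    # Relation branch: temporal modeling via multiplicative interactions
    # between filter responses of consecutive frames.
    interactions = clip[:-1] * clip[1:]       # (T-1, D) elementwise products
    relation = interactions @ w_rel           # (T-1, K)

    # Hypothetical fusion: temporal average pooling, then concatenation.
    return np.concatenate([appearance.mean(axis=0), relation.mean(axis=0)])

rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 16))           # 8 frames, 16 responses each
out = smart_block_sketch(clip,
                         rng.standard_normal((16, 4)),
                         rng.standard_normal((16, 4)))
print(out.shape)  # (8,): 4 appearance features + 4 relation features
```

The multiplicative interaction is the key design choice: products of responses across frames capture correlations (motion-like relations) that a purely additive per-frame branch cannot express.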
Re-ID done right: towards good practices for person re-identification
Training a deep architecture using a ranking loss has become standard for the
person re-identification task. Increasingly, these deep architectures include
additional components that leverage part detections, attribute predictions,
pose estimators and other auxiliary information, in order to more effectively
localize and align discriminative image regions. In this paper we adopt a
different approach and carefully design each component of a simple deep
architecture and, critically, the strategy for training it effectively for
person re-identification. We extensively evaluate each design choice, leading
to a list of good practices for person re-identification. By following these
practices, our approach outperforms the state of the art, including more
complex methods with auxiliary components, by large margins on four benchmark
datasets. We also provide a qualitative analysis of our trained representation
which indicates that, while compact, it is able to capture information from
localized and discriminative regions, in a manner akin to an implicit attention
mechanism.
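The "ranking loss" that the abstract describes as standard for re-identification is typically a triplet margin loss: embeddings of the same person should be closer than embeddings of different people by at least a margin. The sketch below shows the generic form; the margin value and Euclidean distance are illustrative assumptions, not details taken from this paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet ranking loss for a single (a, p, n) triplet.

    anchor/positive: embeddings of the same identity.
    negative:        embedding of a different identity.
    margin:          illustrative value; papers tune this per dataset.
    """
    d_pos = np.linalg.norm(anchor - positive)  # same-identity distance
    d_neg = np.linalg.norm(anchor - negative)  # cross-identity distance
    # Zero loss once the positive is closer than the negative by >= margin.
    return max(0.0, d_pos - d_neg + margin)

a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])   # near the anchor: triplet already satisfied
n = np.array([-1.0, 0.0])  # far from the anchor
easy = triplet_loss(a, p, n)   # 0.0 -- no gradient signal
hard = triplet_loss(a, n, p)   # > 0 -- violating triplet incurs loss
print(easy, hard)
```

During training the loss is averaged over mined triplets in a batch; the paper's point is that with careful training practices this simple objective alone can outperform architectures with auxiliary part or pose components.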