Search CORE

18,694 research outputs found

Action tube extraction based 3D-CNN for RGB-D action recognition

Author: Morros Rubió Josep Ramon
Vilaplana Besler Verónica
Xu Zhengyu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

In this paper we propose a novel action tube extractor for RGB-D action recognition in trimmed videos. The action tube extractor takes as input a video and outputs an action tube. The method consists of two parts: spatial tube extraction and temporal sampling. The first part is built upon MobileNet-SSD and its role is to define the spatial region where the action takes place. The second part is based on the structural similarity index (SSIM) and is designed to remove frames without obvious motion from the primary action tube. The final extracted action tube has two benefits: 1) a higher ratio of ROI (subjects of action) to background; 2) most frames contain obvious motion change. We propose to use a two-stream (RGB and Depth) I3D architecture as our 3D-CNN model. Our approach outperforms the state-of-the-art methods on the OA and NTU RGB-D datasets. © 2018 IEEE.Peer ReviewedPostprint (published version

Crossref

UPCommons. Portal del coneixement obert de la UPC

What Can I Do Around Here? Deep Functional Scene Understanding for Cognitive Robots

Author: Aloimonos Yiannis
Fermuller Cornelia
Yang Yezhou
Ye Chengxi
Publication venue
Publication date: 02/02/2016
Field of study

For robots that have the capability to interact with the physical environment through their end effectors, understanding the surrounding scenes is not merely a task of image classification or object recognition. To perform actual tasks, it is critical for the robot to have a functional understanding of the visual scene. Here, we address the problem of localizing and recognition of functional areas from an arbitrary indoor scene, formulated as a two-stage deep learning based detection pipeline. A new scene functionality testing-bed, which is complied from two publicly available indoor scene datasets, is used for evaluation. Our method is evaluated quantitatively on the new dataset, demonstrating the ability to perform efficient recognition of functional areas from arbitrary indoor scenes. We also demonstrate that our detection model can be generalized onto novel indoor scenes by cross validating it with the images from two different datasets

arXiv.org e-Print Archive

Crossref