Search CORE

23,139 research outputs found

Going Deeper into First-Person Activity Recognition

Author: Fan Haoqi
Kitani Kris M.
Ma Minghuang
Publication venue
Publication date: 12/05/2016
Field of study

We bring together ideas from recent work on feature design for egocentric action recognition under one framework by exploring the use of deep convolutional neural networks (CNN). Recent work has shown that features such as hand appearance, object attributes, local hand motion and camera ego-motion are important for characterizing first-person actions. To integrate these ideas under one framework, we propose a twin stream network architecture, where one stream analyzes appearance information and the other stream analyzes motion information. Our appearance stream encodes prior knowledge of the egocentric paradigm by explicitly training the network to segment hands and localize objects. By visualizing certain neuron activation of our network, we show that our proposed architecture naturally learns features that capture object attributes and hand-object configurations. Our extensive experiments on benchmark egocentric action datasets show that our deep architecture enables recognition rates that significantly outperform state-of-the-art techniques -- an average

6.6\%

increase in accuracy over all datasets. Furthermore, by learning to recognize objects, actions and activities jointly, the performance of individual recognition tasks also increase by

30\%

(actions) and

14\%

(objects). We also include the results of extensive ablative analysis to highlight the importance of network design decisions.

arXiv.org e-Print Archive

Crossref

Drive Video Analysis for the Detection of Traffic Near-Miss Incidents

Author: Kataoka Hirokatsu
Matsui Yasuhiro
Oikawa Shoko
Satoh Yutaka
Suzuki Teppei
Publication venue
Publication date: 06/04/2018
Field of study

Because of their recent introduction, self-driving cars and advanced driver assistance system (ADAS) equipped vehicles have had little opportunity to learn, the dangerous traffic (including near-miss incident) scenarios that provide normal drivers with strong motivation to drive safely. Accordingly, as a means of providing learning depth, this paper presents a novel traffic database that contains information on a large number of traffic near-miss incidents that were obtained by mounting driving recorders in more than 100 taxis over the course of a decade. The study makes the following two main contributions: (i) In order to assist automated systems in detecting near-miss incidents based on database instances, we created a large-scale traffic near-miss incident database (NIDB) that consists of video clip of dangerous events captured by monocular driving recorders. (ii) To illustrate the applicability of NIDB traffic near-miss incidents, we provide two primary database-related improvements: parameter fine-tuning using various near-miss scenes from NIDB, and foreground/background separation into motion representation. Then, using our new database in conjunction with a monocular driving recorder, we developed a near-miss recognition method that provides automated systems with a performance level that is comparable to a human-level understanding of near-miss incidents (64.5% vs. 68.4% at near-miss recognition, 61.3% vs. 78.7% at near-miss detection).Comment: Accepted to ICRA 201

arXiv.org e-Print Archive

Crossref

Scipedia

Large-scale Continuous Gesture Recognition Using Convolutional Neural Networks

Author: Gao Zhimin
Li Wanqing
Liu Song
Ogunbona Philip
Wang Pichao
Zhang Yuyao
Publication venue
Publication date: 01/01/2016
Field of study

This paper addresses the problem of continuous gesture recognition from sequences of depth maps using convolutional neutral networks (ConvNets). The proposed method first segments individual gestures from a depth sequence based on quantity of movement (QOM). For each segmented gesture, an Improved Depth Motion Map (IDMM), which converts the depth sequence into one image, is constructed and fed to a ConvNet for recognition. The IDMM effectively encodes both spatial and temporal information and allows the fine-tuning with existing ConvNet models for classification without introducing millions of parameters to learn. The proposed method is evaluated on the Large-scale Continuous Gesture Recognition of the ChaLearn Looking at People (LAP) challenge 2016. It achieved the performance of 0.2655 (Mean Jaccard Index) and ranked

3^{rd}

place in this challenge

arXiv.org e-Print Archive

Crossref

Research Online