8 research outputs found
Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems
Predicting the future location of vehicles is essential for safety-critical
applications such as advanced driver assistance systems (ADAS) and autonomous
driving. This paper introduces a novel approach to simultaneously predict both
the location and scale of target vehicles in the first-person (egocentric) view
of an ego-vehicle. We present a multi-stream recurrent neural network (RNN)
encoder-decoder model that separately captures both object location and scale
and pixel-level observations for future vehicle localization. We show that
incorporating dense optical flow improves prediction results significantly
since it captures information about motion as well as appearance change. We
also find that explicitly modeling future motion of the ego-vehicle improves
the prediction accuracy, which could be especially beneficial in intelligent
and automated vehicles that have motion planning capability. To evaluate the
performance of our approach, we present a new dataset of first-person videos
collected from a variety of scenarios at road intersections, which are
particularly challenging moments for prediction because vehicle trajectories
are diverse and dynamic. Comment: To appear at ICRA 201
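The multi-stream encoder-decoder idea can be sketched as follows. This is a minimal illustrative toy, not the paper's architecture: it assumes vanilla tanh RNN cells, a 4-d bounding-box stream plus a pooled optical-flow feature stream, simple concatenation-based fusion, and residual box-offset decoding, none of which is specified in the abstract.

```python
import numpy as np

# Hypothetical sketch of a multi-stream RNN encoder-decoder for future
# bounding-box prediction. All layer names and sizes are illustrative.
rng = np.random.default_rng(0)

def rnn_cell(x, h, Wx, Wh, b):
    """One vanilla (tanh) RNN step."""
    return np.tanh(x @ Wx + h @ Wh + b)

class MultiStreamEncoderDecoder:
    def __init__(self, box_dim=4, flow_dim=8, hidden=16):
        init = lambda *s: rng.standard_normal(s) * 0.1
        # Separate encoder weights per input stream.
        self.box_enc = (init(box_dim, hidden), init(hidden, hidden), np.zeros(hidden))
        self.flow_enc = (init(flow_dim, hidden), init(hidden, hidden), np.zeros(hidden))
        # The decoder runs on the fused hidden state and emits future boxes.
        self.dec = (init(box_dim, hidden), init(hidden, hidden), np.zeros(hidden))
        self.fuse = init(2 * hidden, hidden)
        self.out = init(hidden, box_dim)
        self.hidden = hidden

    def predict(self, boxes, flows, horizon=5):
        h_box = h_flow = np.zeros(self.hidden)
        for b, f in zip(boxes, flows):          # encode each observed frame
            h_box = rnn_cell(b, h_box, *self.box_enc)
            h_flow = rnn_cell(f, h_flow, *self.flow_enc)
        h = np.tanh(np.concatenate([h_box, h_flow]) @ self.fuse)  # fuse streams
        x, preds = boxes[-1], []
        for _ in range(horizon):                # roll the decoder forward
            h = rnn_cell(x, h, *self.dec)
            x = x + h @ self.out                # predict a residual box offset
            preds.append(x)
        return np.stack(preds)

model = MultiStreamEncoderDecoder()
past_boxes = rng.standard_normal((10, 4))   # 10 observed [cx, cy, w, h] boxes
past_flows = rng.standard_normal((10, 8))   # pooled optical-flow features
future = model.predict(past_boxes, past_flows, horizon=5)
print(future.shape)  # (5, 4): five future boxes
```

The key design point the abstract describes is keeping the location/scale stream and the pixel-level (optical-flow) stream in separate encoders before fusion, rather than concatenating raw inputs.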
Understanding First-Person and Third-Person Videos in Computer Vision
Due to advancements in technology and social media, a large amount of visual information is being created. Much active research in Computer Vision considers visual information generated by either first-person (egocentric) or third-person (exocentric) cameras. Video data generated by YouTubers, surveillance cameras, and drones is referred to as third-person or exocentric video data, whereas first-person or egocentric data is generated by devices such as GoPro cameras and Google Glass. The exocentric view captures wide, global views, whereas the egocentric view captures the activities an actor is involved in with respect to objects. These two perspectives seem to be independent yet related. In Computer Vision, the two perspectives have been studied independently across domains such as Activity Recognition, Object Detection, Action Recognition, and Summarization, but their relationship and comparison are less discussed in the literature. This paper tries to bridge this gap by presenting a systematic study of first-person and third-person videos. Further, we implemented an algorithm to classify videos as first-person or third-person, achieving a validation accuracy of 88.4% and an F1-score of 86.10% on the Charades dataset.
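For reference, the two evaluation metrics quoted above can be computed as follows for a binary first-person vs. third-person classifier. The label convention (1 = first-person) and the toy labels are assumptions for illustration only.

```python
# Minimal sketch of validation accuracy and F1-score for a binary
# first/third-person video classifier. Labels are synthetic examples.

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    tp = sum(t == p == positive for t, p in zip(y_true, y_pred))
    fp = sum(p == positive and t != positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

y_true = [1, 1, 0, 0, 1, 0]   # 1 = first-person (egocentric), assumed convention
y_pred = [1, 0, 0, 0, 1, 1]
print(accuracy(y_true, y_pred))  # 4 of 6 correct -> 0.666...
print(f1_score(y_true, y_pred))  # precision 2/3, recall 2/3 -> 0.666...
```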
Unsupervised Traffic Accident Detection in First-Person Videos
Recognizing abnormal events such as traffic violations and accidents in
natural driving scenes is essential for successful autonomous driving and
advanced driver assistance systems. However, most work on video anomaly
detection suffers from two crucial drawbacks. First, they assume cameras are
fixed and videos have static backgrounds, which is reasonable for surveillance
applications but not for vehicle-mounted cameras. Second, they pose the problem
as one-class classification, relying on arduously hand-labeled training
datasets that limit recognition to anomaly categories that have been explicitly
trained. This paper proposes an unsupervised approach for traffic accident
detection in first-person (dashboard-mounted camera) videos. Our major novelty
is to detect anomalies by predicting the future locations of traffic
participants and then monitoring the prediction accuracy and consistency
metrics with three different strategies. We evaluate our approach using a new
dataset of diverse traffic accidents, AnAn Accident Detection (A3D), as well as
another publicly-available dataset. Experimental results show that our approach
outperforms the state of the art. Comment: Accepted to IROS 201
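The core monitoring idea can be illustrated with a toy example. This is a hedged sketch, not the paper's method: it assumes an anomaly score built from the disagreement (standard deviation) among predictions of the same future frame made at different past time steps, with synthetic stand-in predictions.

```python
import numpy as np

# Toy illustration of consistency-based anomaly scoring: predictions for the
# same future frame, made at several earlier time steps, should agree when
# motion is normal; high disagreement suggests an anomaly. The numbers below
# are synthetic stand-ins for a trajectory predictor's outputs.

def consistency_score(preds_for_frame):
    """Anomaly score = mean per-coordinate std. dev. across overlapping
    predictions of one frame (an assumed, simplified metric)."""
    preds = np.asarray(preds_for_frame)  # shape (num_predictions, 2) for (x, y)
    return float(preds.std(axis=0).mean())

# Three past time steps each predicted the target's (x, y) at frame t.
normal = [[10.0, 5.0], [10.2, 5.1], [9.9, 4.9]]    # consistent -> low score
abnormal = [[10.0, 5.0], [14.0, 2.0], [6.0, 9.0]]  # inconsistent -> high score

print(consistency_score(normal) < consistency_score(abnormal))  # True
```

A threshold on such a score (its choice is not shown here) would flag frames where the predictor's outputs stop agreeing with each other.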
Collection and Analysis of Driving Videos Based on Traffic Participants
Autonomous vehicle (AV) prototypes have been deployed in increasingly varied environments in recent years. An AV must be able to reliably detect and predict the future motion of traffic participants to maintain safe operation based on data collected from high-quality onboard sensors. Sensors such as cameras and LiDAR generate high-bandwidth data that requires substantial computational and memory resources. To address these AV challenges, this thesis investigates three related problems: 1) What will the observed traffic participants do? 2) Is an anomalous traffic event likely to happen in the near future? and 3) How should we collect fleet-wide high-bandwidth data based on 1) and 2) over the long term?
The first problem is addressed with future traffic trajectory and pedestrian behavior prediction. We propose a future object localization (FOL) method for trajectory prediction in first-person videos (FPV). FOL encodes heterogeneous observations including bounding boxes, optical flow features, and ego camera motions with multi-stream recurrent neural networks (RNN) to predict future trajectories. Because FOL does not consider multi-modal future trajectories, its accuracy suffers from accumulated RNN prediction error. We then introduce BiTraP, a goal-conditioned bidirectional multi-modal trajectory prediction method. BiTraP estimates multi-modal trajectories and uses a novel bi-directional decoder and loss to improve longer-term trajectory prediction accuracy. We show that different choices of non-parametric versus parametric target models directly influence the predicted multi-modal trajectory distributions. Experiments with two FPV and six bird's-eye view (BEV) datasets show the effectiveness of our methods compared to the state of the art. We define pedestrian behavior prediction as a combination of action and intent. We hypothesize that current and future actions are strong intent priors and propose a multi-task learning RNN encoder-decoder network to detect and predict future pedestrian actions and street crossing intent. Experimental results show that each task helps the other, so that together they achieve state-of-the-art performance on published datasets.
To identify likely traffic anomaly events, we introduce an unsupervised video anomaly detection (VAD) method based on trajectories. We predict locations of traffic participants over a near-term future horizon and monitor accuracy and consistency of these predictions as evidence of an anomaly. Inconsistent predictions tend to indicate an anomaly has happened or is about to occur. A supervised video action recognition method can then be applied to classify detected anomalies. We introduce a spatial-temporal area under curve (STAUC) metric as a supplement to the existing area under curve (AUC) evaluation and show it captures how well a model detects temporal and spatial locations of anomalous events. Experimental results show the proposed method and consistency-based anomaly score are more robust to moving cameras than image generation based methods; our method achieves state-of-the-art performance over AUC and STAUC metrics.
VAD and action recognition support event-of-interest (EOI) distinction from normal driving data. We introduce a Smart Black Box (SBB), an intelligent event data recorder, to prioritize EOI data in long-term driving. The SBB compresses high-bandwidth data based on EOI potential and on-board storage limits. The SBB is designed to prioritize newer and anomalous driving data and discard older and normal data. An optimal compression factor is selected based on the trade-off between data value and storage cost. Experiments in a traffic simulator and with real-world datasets show the efficiency and effectiveness of using a SBB to collect high-quality videos over long-term driving.

PhD, Robotics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/168035/1/brianyao_1.pd
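The compression-factor selection described above can be sketched as a simple value-versus-storage trade-off. This is an illustrative toy: the value-decay model (score / sqrt(f)) and storage-cost model (cost / f) are invented for the example and do not reproduce the thesis's actual SBB objective.

```python
# Toy sketch of the SBB's compression-factor trade-off. Both model functions
# below are assumptions made for illustration only.

def net_worth(event_score, raw_size_mb, factor, cost_per_mb=0.01):
    value = event_score / factor ** 0.5          # retained value decays with compression
    cost = cost_per_mb * raw_size_mb / factor    # storage cost shrinks with compression
    return value - cost

def best_compression_factor(event_score, raw_size_mb, factors=(1, 2, 4, 8, 16, 32)):
    """Pick the factor maximizing (retained data value - storage cost)."""
    return max(factors, key=lambda f: net_worth(event_score, raw_size_mb, f))

# Anomalous (high-value) data is kept at light compression; normal data is
# compressed aggressively -- mirroring the SBB's prioritization of anomalous
# driving data over normal data.
print(best_compression_factor(event_score=1.0, raw_size_mb=100))   # 4
print(best_compression_factor(event_score=0.2, raw_size_mb=100))   # 32
```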