4,764 research outputs found

Pedestrian Navigation using Artificial Neural Networks and Classical Filtering Techniques

Author: Ellis David J.
Publication venue: AFIT Scholar
Publication date: 01/03/2020

The objective of this thesis is to explore the improvements achieved by combining classical filtering methods with Artificial Neural Networks (ANNs) for pedestrian navigation. ANNs have improved dramatically in their ability to approximate various functions, and neural network solutions have surpassed many classical navigation techniques. However, research that uses ANNs to solve problems appears to focus solely on what neural networks can do by themselves. Combining ANNs with classical filtering methods has the potential to bring the beneficial aspects of both techniques together and increase accuracy in many different applications. Pedestrian navigation is used as a medium to explore this process using a localization and a Pedestrian Dead Reckoning (PDR) approach. Pedestrian navigation is dominated by Global Positioning System (GPS) based methods, but urban and indoor environments pose difficulties for GPS navigation. A novel urban data set is created for testing various localization and PDR based pedestrian navigation solutions. Cell phone data, including images and accelerometer, gyroscope, and magnetometer measurements, is collected to train the ANNs. The ANN methods are explored first, with the aim of achieving a low root mean square error (RMSE) between the predicted and original trajectories. After the localization and PDR solutions are analyzed, they are combined in an extended Kalman filter (EKF) to achieve a 20% reduction in the RMSE. This combines the best localization result of 35 m with an underperforming PDR solution with a 171 m RMSE to create an EKF solution of 28 m of a one hour test collect
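The RMSE figures quoted in the abstract above can be related with a short calculation. The following is a minimal sketch, assuming the RMSE is taken over 2-D Euclidean position errors (the abstract does not state the exact definition); the function names and the toy trajectories are hypothetical and are not taken from the thesis.

```python
import math

def rmse(predicted, truth):
    """Root mean square of the Euclidean errors between two 2-D trajectories.

    Assumption: both trajectories are equal-length lists of (x, y) positions.
    """
    if len(predicted) != len(truth) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [(px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(predicted, truth)]
    return math.sqrt(sum(squared) / len(squared))

def relative_reduction(baseline, improved):
    """Fractional error reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline

# Toy trajectories (hypothetical data): position errors are 3, 4 and 0.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
predicted = [(0.0, 3.0), (1.0, 4.0), (2.0, 0.0)]
example_rmse = rmse(predicted, truth)  # sqrt((9 + 16 + 0) / 3)

# Figures from the abstract: 35 m (best localization) versus 28 m (EKF).
reduction = relative_reduction(35.0, 28.0)  # (35 - 28) / 35, i.e. about 20%
```

The reported 20% reduction is measured against the 35 m localization result rather than the 171 m PDR result, which is what the second calculation reflects.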

AFIT Scholar (Air Force Institute of Technology)

VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation

Authors: Bian Weikang; Cheung Ka Chun; Dai Jifeng; Huang Zhaoyang; Li Dasong; Li Hongsheng; Qin Hongwei; See Simon; Shi Xiaoyu; Zhang Manyuan
Publication venue
Publication date: 20/08/2023

We introduce VideoFlow, a novel optical flow estimation framework for videos. In contrast to previous methods that learn to estimate optical flow from two frames, VideoFlow concurrently estimates bi-directional optical flows for multiple frames that are available in videos by sufficiently exploiting temporal cues. We first propose a TRi-frame Optical Flow (TROF) module that estimates bi-directional optical flows for the center frame in a three-frame manner. The information of the frame triplet is iteratively fused onto the center frame. To extend TROF for handling more frames, we further propose a MOtion Propagation (MOP) module that bridges multiple TROFs and propagates motion features between adjacent TROFs. With the iterative flow estimation refinement, the information fused in individual TROFs can be propagated into the whole sequence via MOP. By effectively exploiting video information, VideoFlow presents extraordinary performance, ranking 1st on all public benchmarks. On the Sintel benchmark, VideoFlow achieves 1.649 and 0.991 average end-point-error (AEPE) on the final and clean passes, a 15.1% and 7.6% error reduction from the best-published results (1.943 and 1.073 from FlowFormer++). On the KITTI-2015 benchmark, VideoFlow achieves an F1-all error of 3.65%, a 19.2% error reduction from the best-published result (4.52% from FlowFormer++). Code is released at https://github.com/XiaoyuShi97/VideoFlow
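The error-reduction percentages in the abstract above follow directly from the quoted benchmark numbers. The sketch below recomputes them and shows one common way to compute an average end-point error; it assumes AEPE is the mean Euclidean distance between predicted and ground-truth flow vectors, and the function names and toy flow values are hypothetical rather than taken from the VideoFlow code.

```python
import math

def average_endpoint_error(pred_flow, true_flow):
    """Mean Euclidean distance between predicted and true (u, v) flow vectors."""
    if len(pred_flow) != len(true_flow) or not pred_flow:
        raise ValueError("flow fields must be non-empty and of equal length")
    errors = [math.hypot(pu - tu, pv - tv)
              for (pu, pv), (tu, tv) in zip(pred_flow, true_flow)]
    return sum(errors) / len(errors)

def error_reduction_percent(previous, new):
    """Percentage by which `new` lowers the error relative to `previous`."""
    return 100.0 * (previous - new) / previous

# Toy flow field (hypothetical data): per-pixel errors are 0 and 5.
pred = [(1.0, 0.0), (0.0, 2.0)]
true = [(1.0, 0.0), (3.0, -2.0)]
aepe = average_endpoint_error(pred, true)

# Benchmark figures quoted in the abstract (FlowFormer++ versus VideoFlow).
sintel_final = error_reduction_percent(1.943, 1.649)
sintel_clean = error_reduction_percent(1.073, 0.991)
kitti_f1_all = error_reduction_percent(4.52, 3.65)
```

Rounded to one decimal place, the three reductions come out to 15.1, 7.6 and 19.2, matching the percentages stated in the abstract.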

arXiv.org e-Print Archive

Image sequence restoration by median filtering

Author: Jackson Shawn R.
Publication venue: RIT Scholar Works
Publication date: 01/01/2004

Median filters are non-linear filters that fit in the generic category of order-statistic filters. Median filters are widely used for reducing random defects, commonly characterized as impulse or salt-and-pepper noise, in a single image. Motion estimation is the process of estimating the displacement vector between like pixels in the current frame and the reference frame. When dealing with a motion sequence, the motion vectors are the key to operating on corresponding pixels in several frames. This work explores the use of various motion estimation algorithms in combination with various median filter algorithms to provide noise suppression. The results are compared using two sets of metrics: performance-based and objective image quality-based. These results are used to determine the best motion estimation / median filter combination for image sequence restoration. The primary goals of this work are to implement a motion estimation and median filter algorithm in hardware and to develop and benchmark a flexible software alternative for the restoration process. Two median filter algorithms are unique to this work. The first is a modification of a single-frame adaptive median filter that applies motion compensation and temporal concepts. The other is an adaptive extension of the multi-level (ML3D) filter, called the adaptive multi-level (AML3D) filter. The extension provides adaptable filter window sizes for the multiple filter sets that comprise the ML3D filter. The adaptive median filter is capable of filtering an image in 26.88 seconds per frame and results in a PSNR improvement of 5.452 dB. The AML3D is capable of filtering an image in 14.73 seconds per frame and results in a PSNR improvement of 6.273 dB. The AML3D is a suitable alternative to the other median filters
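To make the terms in the abstract above concrete, here is a minimal sketch of a plain single-frame 3x3 median filter and a PSNR measurement. It is not the adaptive, motion-compensated, or ML3D/AML3D filters described in the thesis; the border handling (shrinking the window at image edges), the function names, and the toy image are all assumptions made for illustration.

```python
import math
from statistics import median

def median_filter_3x3(image):
    """Replace each pixel with the median of its 3x3 neighbourhood.

    Assumption: at the borders the window shrinks to the pixels that exist.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            window = [image[ny][nx]
                      for ny in range(max(0, y - 1), min(h, y + 2))
                      for nx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = int(median(window))
    return out

def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    diffs = [(r - t) ** 2
             for ref_row, test_row in zip(reference, test)
             for r, t in zip(ref_row, test_row)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

# Toy 5x5 flat image (hypothetical data) with one "salt" and one "pepper" pixel.
clean = [[100] * 5 for _ in range(5)]
noisy = [row[:] for row in clean]
noisy[2][2] = 255
noisy[1][3] = 0

restored = median_filter_3x3(noisy)
psnr_noisy = psnr(clean, noisy)
psnr_restored = psnr(clean, restored)
```

On this toy image every window contains at most two outliers, so the median removes both impulses and the restored frame equals the clean one. Real sequences are far less forgiving, which is where the motion-compensated and adaptive variants described in the abstract come in.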

RIT Scholar Works

Collection and Analysis of Driving Videos Based on Traffic Participants

Author: Yao Yu
Publication venue
Publication date: 01/01/2021

Autonomous vehicle (AV) prototypes have been deployed in increasingly varied environments in recent years. An AV must be able to reliably detect and predict the future motion of traffic participants to maintain safe operation based on data collected from high-quality onboard sensors. Sensors such as camera and LiDAR generate high-bandwidth data that requires substantial computational and memory resources. To address these AV challenges, this thesis investigates three related problems: 1) What will the observed traffic participants do? 2) Is an anomalous traffic event likely to happen in the near future? and 3) How should we collect fleet-wide high-bandwidth data based on 1) and 2) over the long term? The first problem is addressed with future traffic trajectory and pedestrian behavior prediction. We propose a future object localization (FOL) method for trajectory prediction in first person videos (FPV). FOL encodes heterogeneous observations including bounding boxes, optical flow features and ego camera motions with multi-stream recurrent neural networks (RNN) to predict future trajectories. Because FOL does not consider multi-modal future trajectories, its accuracy suffers from accumulated RNN prediction error. We then introduce BiTraP, a goal-conditioned bidirectional multi-modal trajectory prediction method. BiTraP estimates multi-modal trajectories and uses a novel bi-directional decoder and loss to improve longer-term trajectory prediction accuracy. We show that different choices of non-parametric versus parametric target models directly influence predicted multi-modal trajectory distributions. Experiments with two FPV and six bird's-eye view (BEV) datasets show the effectiveness of our methods compared to the state of the art. We define pedestrian behavior prediction as a combination of action and intent.
We hypothesize that current and future actions are strong intent priors and propose a multi-task learning RNN encoder-decoder network to detect and predict future pedestrian actions and street crossing intent. Experimental results show that one task helps the other so they together achieve state-of-the-art performance on published datasets. To identify likely traffic anomaly events, we introduce an unsupervised video anomaly detection (VAD) method based on trajectories. We predict locations of traffic participants over a near-term future horizon and monitor accuracy and consistency of these predictions as evidence of an anomaly. Inconsistent predictions tend to indicate an anomaly has happened or is about to occur. A supervised video action recognition method can then be applied to classify detected anomalies. We introduce a spatial-temporal area under curve (STAUC) metric as a supplement to the existing area under curve (AUC) evaluation and show it captures how well a model detects temporal and spatial locations of anomalous events. Experimental results show the proposed method and consistency-based anomaly score are more robust to moving cameras than image generation based methods; our method achieves state-of-the-art performance over AUC and STAUC metrics. VAD and action recognition support event-of-interest (EOI) distinction from normal driving data. We introduce a Smart Black Box (SBB), an intelligent event data recorder, to prioritize EOI data in long-term driving. The SBB compresses high-bandwidth data based on EOI potential and on-board storage limits. The SBB is designed to prioritize newer and anomalous driving data and discard older and normal data. An optimal compression factor is selected based on the trade-off between data value and storage cost. 
Experiments in a traffic simulator and with real-world datasets show the efficiency and effectiveness of using an SBB to collect high-quality videos over long-term driving.
PHD, Robotics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/168035/1/brianyao_1.pd
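The consistency idea behind the anomaly detection described in this abstract (several predictions for the same future moment that disagree with each other suggest an anomaly) can be illustrated with a small sketch. The scoring rule below, the average per-axis standard deviation of the predicted positions, is an assumption chosen for illustration; the abstract does not give the thesis's actual formula, and the function name and sample values are hypothetical.

```python
from statistics import pstdev

def consistency_anomaly_score(predictions):
    """Spread of several (x, y) predictions made for the same future frame.

    Assumption: each prediction was produced at a different earlier frame, and
    a larger spread means the predictions are less consistent.
    """
    if len(predictions) < 2:
        raise ValueError("need at least two predictions to measure consistency")
    xs = [p[0] for p in predictions]
    ys = [p[1] for p in predictions]
    return (pstdev(xs) + pstdev(ys)) / 2.0

# Hypothetical predictions of one traffic participant's position at frame t,
# made from frames t-3, t-2 and t-1.
normal_case = [(50.0, 20.0), (50.5, 20.0), (50.0, 20.5)]
anomalous_case = [(50.0, 20.0), (62.0, 31.0), (41.0, 12.0)]

normal_score = consistency_anomaly_score(normal_case)
anomalous_score = consistency_anomaly_score(anomalous_case)
```

In this toy case the scattered predictions give a much larger score than the tightly clustered ones; a threshold on such a score would be one simple way to flag frames for the supervised action recognition step that the abstract mentions.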

Deep Blue Documents at the University of Michigan