
    Scalable methods for single and multi camera trajectory forecasting

    Predicting the future trajectory of objects in video is a critical task within computer vision with numerous application domains. For example, reliable anticipation of pedestrian trajectory is imperative for the operation of intelligent vehicles and can significantly enhance the functionality of advanced driver assistance systems. Trajectory forecasting can also enable more accurate tracking of objects in video, particularly if the objects are not always visible, such as during occlusion or when entering a blind spot in a non-overlapping multi-camera network. However, due to the considerable human labour required to manually annotate data amenable to trajectory forecasting, the scale and variety of existing datasets used to study the problem are limited. In this thesis, we propose a set of strategies for pedestrian trajectory forecasting. We address the lack of training data by introducing a scalable machine annotation scheme that enables models to be trained using a large Single-Camera Trajectory Forecasting (SCTF) dataset without human annotation. Using newly collected datasets annotated with our proposed methods, we develop two models for SCTF. The first model, Dynamic Trajectory Predictor (DTP), forecasts pedestrian trajectory from on board a moving vehicle up to one second into the future. DTP is trained using both human- and machine-annotated data and anticipates dynamic motion that linear models do not capture. Our second model, Spatio-Temporal Encoder-Decoder (STED), predicts full object bounding boxes in addition to trajectory. STED combines visual and temporal features to model both object motion and ego-motion. In addition to our SCTF contributions, we also introduce a new task: Multi-Camera Trajectory Forecasting (MCTF), in which the future trajectory of an object is predicted in a network of cameras. Prior works consider forecasting trajectories in a single camera view; ours is the first to consider the challenging scenario of forecasting across multiple non-overlapping camera views, which has wide applicability in tasks such as re-identification and multi-target multi-camera tracking. To facilitate research in this new area, we collect a unique dataset of multi-camera pedestrian trajectories from a network of 15 synchronized cameras. We also develop a semi-automated annotation method to accurately label this large dataset, which contains 600 hours of video footage. We introduce an MCTF framework that simultaneously uses all estimated relative object locations from several camera viewpoints and predicts the object's future location in all possible camera viewpoints. Our framework follows a Which-When-Where approach: it predicts in which camera(s) the object will appear, and when and where within those camera views it will appear. Experimental results demonstrate the effectiveness of our MCTF model, which outperforms existing SCTF approaches adapted to the MCTF framework.
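    The abstract gives no implementation detail, but the encoder-decoder forecasters it describes follow a common recipe: encode a history of observed boxes, then roll a decoder forward over the prediction horizon. Below is a minimal PyTorch sketch in that spirit; the class name, layer sizes, horizon lengths, and residual decoding step are illustrative assumptions, not the authors' DTP or STED architectures.

```python
# Minimal sketch (assumed names and sizes, not the authors' models):
# a GRU encoder-decoder that maps a history of bounding boxes
# (cx, cy, w, h) to future boxes via residual updates.
import torch
import torch.nn as nn

class BoxForecaster(nn.Module):
    def __init__(self, hidden=128, future=15):
        super().__init__()
        self.future = future
        self.encoder = nn.GRU(input_size=4, hidden_size=hidden, batch_first=True)
        self.decoder = nn.GRUCell(input_size=4, hidden_size=hidden)
        self.head = nn.Linear(hidden, 4)

    def forward(self, boxes):                # boxes: (B, T_past, 4)
        _, h = self.encoder(boxes)           # h: (1, B, hidden)
        h = h.squeeze(0)
        step = boxes[:, -1]                  # start from the last observed box
        preds = []
        for _ in range(self.future):
            h = self.decoder(step, h)
            step = step + self.head(h)       # predict a box offset per step
            preds.append(step)
        return torch.stack(preds, dim=1)     # (B, future, 4)

model = BoxForecaster()
history = torch.randn(2, 8, 4)               # 8 observed frames, batch of 2
print(model(history).shape)                  # torch.Size([2, 15, 4])
```

    A real forecaster along the lines of STED would additionally condition on visual and ego-motion features; this sketch uses box coordinates alone to keep the interface visible.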

    Hybrid Focal Stereo Networks for Pattern Analysis in Homogeneous Scenes

    In this paper we address the problem of multiple-camera calibration in the presence of a homogeneous scene, without the possibility of employing calibration-object-based methods. The proposed solution exploits salient features present in a larger field of view, but instead of employing active vision we replace the cameras with stereo rigs featuring a long-focal analysis camera and a short-focal registration camera. We are thus able to propose an accurate solution that does not require intrinsic variation models, as in the case of zooming cameras. Moreover, the availability of the two views simultaneously in each rig allows for pose re-estimation between rigs as often as necessary. The algorithm has been successfully validated in an indoor setting, as well as on a difficult scene featuring a highly dense pilgrim crowd in Makkah.
    Comment: 13 pages, 6 figures, submitted to Machine Vision and Applications
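    The paper's pipeline is not reproduced here, but the rig-to-rig pose re-estimation it relies on can be illustrated with standard epipolar geometry. The OpenCV sketch below recovers the relative pose between two registration views from matched salient features; the function name, ORB settings, and the assumption of known, fixed intrinsics K are ours, not the paper's.

```python
# Hedged sketch (our names, not the paper's pipeline): relative pose
# between two wide-FOV registration cameras from matched features,
# assuming a known shared intrinsic matrix K (3x3 numpy array) and
# grayscale uint8 input images.
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(d1, d2)
    pts1 = np.float32([k1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([k2[m.trainIdx].pt for m in matches])
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t   # rotation and unit-scale translation between the views
```

    Because the registration cameras have fixed focal lengths, a single calibrated intrinsic matrix per camera suffices, which is the advantage the abstract claims over zooming-camera models.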

    Towards automated visual surveillance using gait for identity recognition and tracking across multiple non-intersecting cameras

    Although personal privacy has become a major concern, surveillance technology is now becoming ubiquitous in modern society. This is mainly due to the increasing number of crimes as well as the essential need to provide secure and safer environments. Recent research studies have now confirmed the possibility of recognizing people by the way they walk, i.e. their gait. The aim of this research study is to investigate the use of gait for people detection as well as identification across different cameras. We present a new approach for people tracking and identification between different non-intersecting, un-calibrated, stationary cameras based on gait analysis. A vision-based markerless extraction method is deployed to derive gait kinematics as well as anthropometric measurements in order to produce a gait signature. The novelty of our approach is motivated by recent research in biometrics and forensic analysis using gait. The experimental results affirmed the robustness of our approach in detecting walking people, as well as its capability to extract gait features across different camera viewpoints, achieving an identity recognition rate of 73.6% over 2,270 video sequences. Furthermore, the results confirmed the potential of the proposed method for identity tracking in real surveillance systems, recognizing walking individuals across different views with an average recognition rate of 92.5% for cross-camera matching between two different non-overlapping views.
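    The abstract describes matching walking individuals across views by comparing gait signatures. As a hedged illustration of that final matching step only (the upstream markerless extraction of kinematic and anthropometric features is assumed to have been done already), here is a minimal NumPy sketch of nearest-neighbour identification by cosine similarity; all names and dimensions are hypothetical.

```python
# Hedged sketch (hypothetical names and dimensions): cross-camera
# identity matching by cosine similarity over gait signatures.
import numpy as np

def match_identity(probe, gallery):
    """probe: (D,) signature from one camera view;
    gallery: (N, D) signatures enrolled from another view.
    Returns the index of the most similar gallery identity."""
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    p = probe / np.linalg.norm(probe)
    return int(np.argmax(g @ p))

gallery = np.random.rand(100, 64)                 # 100 identities, 64-D signatures
probe = gallery[42] + 0.05 * np.random.rand(64)   # noisy re-observation of id 42
print(match_identity(probe, gallery))             # expected: 42
```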