    Large scale joint semantic re-localisation and scene understanding via globally unique instance coordinate regression

    In this work we present a novel approach to joint semantic localisation and scene understanding. Our work is motivated by the need for localisation algorithms which not only predict the 6-DoF camera pose but also simultaneously recognise surrounding objects and estimate 3D geometry. Such capabilities are crucial for computer-vision-guided systems which interact with the environment: autonomous driving, augmented reality and robotics. In particular, we propose a two-step procedure. In the first step we train a convolutional neural network to jointly predict per-pixel globally unique instance labels and the corresponding local coordinates for each instance of a static object (e.g. a building). In the second step we obtain scene coordinates by combining object center coordinates and local coordinates, and use them to perform 6-DoF camera pose estimation. We evaluate our approach on real-world (CamVid-360) and artificial (SceneCity) autonomous driving datasets. We obtain smaller mean distance and angular errors than state-of-the-art 6-DoF pose estimation algorithms based on direct pose regression and pose estimation from scene coordinates on all datasets. Our contributions include: (i) a novel formulation of scene coordinate regression as two separate tasks of object instance recognition and local coordinate regression, and a demonstration that our proposed solution allows us to predict accurate 3D geometry of static objects and estimate the 6-DoF camera pose on (ii) maps larger by several orders of magnitude than previously attempted by scene coordinate regression methods, as well as on (iii) lightweight, approximate 3D maps built from 3D primitives such as building-aligned cuboids.
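
    A minimal sketch of the second step described above, assuming a pinhole camera with intrinsics K (the paper's panoramic setting would need a different projection model). The function name pose_from_predictions and the instance_centers lookup are hypothetical placeholders, not the authors' interface:

```python
# Sketch: compose scene coordinates from instance labels + local offsets,
# then recover the 6-DoF pose with PnP + RANSAC via OpenCV.
import numpy as np
import cv2

def pose_from_predictions(labels, local_coords, instance_centers, K):
    """labels: (H, W) int array of globally unique instance ids (0 = none).
    local_coords: (H, W, 3) per-pixel offsets w.r.t. each instance center.
    instance_centers: dict id -> (3,) world-space center from the map.
    K: (3, 3) pinhole camera intrinsics."""
    ys, xs = np.nonzero(labels > 0)          # pixels on static objects
    pts_3d, pts_2d = [], []
    for y, x in zip(ys, xs):
        center = instance_centers.get(int(labels[y, x]))
        if center is None:
            continue
        # scene coordinate = object center + predicted local coordinate
        pts_3d.append(center + local_coords[y, x])
        pts_2d.append((x, y))
    pts_3d = np.asarray(pts_3d, dtype=np.float64)
    pts_2d = np.asarray(pts_2d, dtype=np.float64)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d, pts_2d, K, None, reprojectionError=2.0)
    return rvec, tvec                        # camera rotation and translation
```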

    Non-Causal Tracking by Deblatting

    Tracking by Deblatting solves an inverse problem of deblurring and image matting to track motion-blurred objects. We propose non-causal Tracking by Deblatting, which estimates continuous, complete and accurate object trajectories. Energy minimization by dynamic programming is used to detect abrupt changes of motion, called bounces. High-order polynomials are fitted to segments, the parts of the trajectory separated by bounces. The output is a continuous trajectory function which assigns a location to every real-valued timestamp from zero to the number of frames. Additionally, we show that the trajectory function enables precise physical calculations, such as object radius, gravity or sub-frame object velocity. Velocity estimates are compared to high-speed camera and radar measurements. Results show high performance of the proposed method in terms of Trajectory-IoU, recall and velocity estimation. Published at GCPR 2019 (oral presentation, Best Paper Honorable Mention Award).
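
    A minimal sketch of the trajectory representation, not the authors' code: polynomials are fitted per bounce-free segment and sub-frame velocity is read off the analytic derivative. Bounce detection (energy minimization by dynamic programming) is assumed to have already produced the bounces list:

```python
# Sketch: piecewise-polynomial trajectory over segments separated by bounces.
import numpy as np

def fit_trajectory(times, positions, bounces, degree=3):
    """times: (N,) frame timestamps; positions: (N, 2) object locations;
    bounces: sorted timestamps of abrupt motion changes."""
    edges = [times[0], *bounces, times[-1]]
    segments = []
    for t0, t1 in zip(edges[:-1], edges[1:]):
        mask = (times >= t0) & (times <= t1)
        # one polynomial per coordinate over each bounce-free segment
        px = np.polynomial.Polynomial.fit(times[mask], positions[mask, 0], degree)
        py = np.polynomial.Polynomial.fit(times[mask], positions[mask, 1], degree)
        segments.append((t0, t1, px, py))
    return segments

def velocity(segments, t):
    """Continuous velocity at any real-valued timestamp t."""
    for t0, t1, px, py in segments:
        if t0 <= t <= t1:
            return np.array([px.deriv()(t), py.deriv()(t)])
    raise ValueError("t outside the trajectory")
```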

    Efficient Large-Scale Semantic Visual Localization in 2D Maps

    With the emergence of autonomous navigation systems, image-based localization is one of the essential tasks to be tackled. However, most current algorithms struggle to scale to city-size environments, mainly because of the need to collect large (semi-)annotated datasets for CNN training and to create test-environment databases of images, keypoint-level features or image embeddings. This data acquisition is not only expensive and time-consuming but may also raise privacy concerns. In this work, we propose a novel framework for semantic visual localization in city-scale environments which alleviates this problem by using freely available 2D maps such as OpenStreetMap. Our method requires neither images nor image-map pairs for training or test-environment database collection. Instead, a robust embedding is learned from the depth and building instance label information of a particular location in the 2D map. At test time, this embedding is extracted from panoramic building instance label and depth images and used to retrieve the closest match in the database. We evaluate our localization framework on two large-scale datasets covering the cities of Cambridge and San Francisco, with a total of 500 km of drivable roads and approximately 110k unique locations. To the best of our knowledge, this is the first large-scale semantic localization method which works on par with approaches that require images at training time or for test-environment database creation.
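
    A schematic sketch of the retrieval step, with embed_map_location and embed_query standing in for the learned embedding networks (hypothetical names; the paper does not expose such an API):

```python
# Sketch: nearest-neighbour retrieval over map-derived location embeddings.
import numpy as np

def build_database(map_locations, embed_map_location):
    """map_locations: per-location (depth, instance-label) renderings
    derived from the 2D map. Returns an (N, D) matrix of unit embeddings."""
    db = np.stack([embed_map_location(loc) for loc in map_locations])
    return db / np.linalg.norm(db, axis=1, keepdims=True)

def localize(query, db, embed_query):
    """Return the index of the database location whose embedding is the
    closest (by cosine similarity) to the query's embedding."""
    q = embed_query(query)                   # from panoramic labels + depth
    q = q / np.linalg.norm(q)
    return int(np.argmax(db @ q))
```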

    Online adaptive hidden Markov model for multi-tracker fusion

    In this paper, we propose a novel method for visual object tracking called HMMTxD. The method fuses observations from complementary out-of-the-box trackers and a detector by utilizing a hidden Markov model whose latent states correspond to a binary vector expressing the failure of the individual trackers. The Markov model is trained in an unsupervised way, relying on an online learned detector to provide a source of tracker-independent information for a modified Baum-Welch algorithm that updates the model w.r.t. the partially annotated data. We show the effectiveness of the proposed method on combinations of two and three tracking algorithms. The performance of HMMTxD is evaluated on two standard benchmarks (CVPR2013 and VOT) and on a rich collection of 77 publicly available sequences. HMMTxD outperforms the state-of-the-art, often significantly, on all datasets in almost all criteria.
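
    A toy sketch of the fusion idea, not the HMMTxD implementation: the latent state is a binary per-tracker correctness vector, a forward step updates the state posterior, and the output comes from the tracker most likely to be correct. The transition and observation models are assumed to be learned online:

```python
# Toy HMM fusion: latent state = binary vector of per-tracker correctness.
import itertools
import numpy as np

N_TRACKERS = 3
STATES = list(itertools.product([0, 1], repeat=N_TRACKERS))  # 2^N states

def forward_step(belief, transition, obs_lik):
    """belief: (2^N,) state posterior from the previous frame.
    transition: (2^N, 2^N) matrix, transition[i, j] = P(state j | state i).
    obs_lik: (2^N,) likelihood of the current observations per state."""
    pred = transition.T @ belief     # predict
    post = pred * obs_lik            # update with observations
    return post / post.sum()         # normalize

def fuse(belief, boxes):
    """Output the box of the tracker with the highest marginal P(correct)."""
    p_ok = np.zeros(N_TRACKERS)
    for s, state in enumerate(STATES):
        for i, ok in enumerate(state):
            p_ok[i] += ok * belief[s]
    return boxes[int(np.argmax(p_ok))]
```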

    Robust scale-adaptive mean-shift for tracking

    Mean-Shift tracking is a popular algorithm for object tracking since it is easy to implement, fast and robust. In this paper, we address the problem of scale adaptation of the Hellinger-distance-based Mean-Shift tracker. We start from a theoretical derivation of scale estimation in the Mean-Shift framework. To make the scale estimation robust and suitable for tracking, we introduce regularization terms that counter two major problems: (i) scale expansion caused by background clutter and (ii) scale implosion on self-similar objects. To further robustify the scale estimate, it is validated by a forward-backward consistency check. The proposed Mean-Shift tracker with scale selection is compared with recent state-of-the-art algorithms on a dataset of 48 public color sequences, on which it achieves excellent results.
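
    A simplified sketch of the two safeguards, with illustrative constants rather than the paper's exact formulation: the per-frame scale estimate is regularized toward the previous scale, and an update is accepted only if a forward-backward check passes:

```python
# Sketch: regularized scale update plus forward-backward validation.
import numpy as np

def update_scale(prev_scale, raw_scale, lam=0.3, bounds=(0.7, 1.3)):
    """Dampen the raw per-frame scale estimate to resist expansion on
    background clutter and implosion on self-similar objects."""
    ratio = np.clip(raw_scale / prev_scale, *bounds)  # cap per-frame change
    return prev_scale * (1.0 - lam + lam * ratio)     # pull ratio toward 1

def fb_consistent(track_fwd, track_bwd, start, thresh=2.0):
    """Track forward then backward; accept the update only if the
    backward track returns close to the starting position."""
    end = track_fwd(start)
    back = track_bwd(end)
    return np.linalg.norm(np.asarray(back) - np.asarray(start)) < thresh
```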

    A system for real-time detection and tracking of vehicles from a single car-mounted camera

    A novel system for the detection and tracking of vehicles from a single car-mounted camera is presented. At the core of the system are two high-performance vision algorithms, the WaldBoost detector [1] and the TLD tracker [2], scheduled so that real-time performance is achieved. The vehicle monitoring system is evaluated on a new dataset collected on Italian motorways, which is provided with approximate ground truth (GT0) obtained from laser scans. For a wide range of distances, the recall and precision of detection for cars are excellent. Statistics for trucks are also reported. The dataset with the ground truth is made public.
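
    A schematic of the scheduling idea, with detector and tracker as placeholders for WaldBoost and TLD (the interval and method names are illustrative): the expensive detector runs only on every k-th frame while the cheap tracker bridges the gaps:

```python
# Sketch: interleave a sparse detector with a per-frame tracker.
def monitor(frames, detector, tracker, detect_every=5):
    tracks = []
    for i, frame in enumerate(frames):
        if i % detect_every == 0:
            detections = detector.detect(frame)         # expensive, sparse
            tracks = tracker.reinit(frame, detections)  # re-anchor tracks
        else:
            tracks = tracker.update(frame)              # cheap, every frame
        yield tracks
```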

    A Novel Performance Evaluation Methodology for Single-Target Trackers

    This paper addresses the problem of single-target tracker performance evaluation. We consider the performance measures, the dataset and the evaluation system to be the most important components of tracker evaluation and propose requirements for each of them. These requirements are the basis of a new evaluation methodology that aims at a simple and easily interpretable tracker comparison. The ranking-based methodology addresses tracker equivalence in terms of statistical significance and practical differences. A fully annotated dataset with per-frame annotations of several visual attributes is introduced. The diversity of its visual properties is maximized in a novel way by clustering a large number of videos according to their visual attributes, making it the most carefully constructed and annotated dataset to date. A multi-platform evaluation system allowing easy integration of third-party trackers is presented as well. The proposed evaluation methodology was tested in the VOT2014 challenge on the new dataset and 38 trackers, making it the largest benchmark to date. Most of the tested trackers are indeed state-of-the-art, since they outperform the standard baselines, resulting in a highly challenging benchmark. An exhaustive analysis of the dataset from the perspective of tracking difficulty is carried out. To facilitate tracker comparison, a new performance visualization technique is proposed.
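
    A condensed sketch of ranking-based comparison under stated assumptions: trackers are ranked per sequence by a performance measure and the ranks averaged, with a Wilcoxon test as one possible equivalence check (the challenge's actual tests and alpha may differ):

```python
# Sketch: per-sequence ranking plus a statistical equivalence test.
import numpy as np
from scipy.stats import wilcoxon

def average_ranks(scores):
    """scores: (n_trackers, n_sequences), higher is better.
    Returns the mean rank of each tracker (1 = best)."""
    ranks = np.argsort(np.argsort(-scores, axis=0), axis=0) + 1
    return ranks.mean(axis=1)

def equivalent(scores_a, scores_b, alpha=0.05):
    """Two trackers are treated as equivalent when their per-sequence
    scores are not significantly different."""
    _, p = wilcoxon(scores_a, scores_b)
    return p >= alpha
```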

    MS3D: mean-shift object tracking boosted by joint back projection of color and depth

    In this paper, we present the MS3D tracker, which extends the mean-shift tracking algorithm in several ways when RGB-D data is available. We fuse the color and depth distributions efficiently in the mean-shift tracking scheme. In addition, to improve the robustness of the description of the tracked object, we further process the pixels in the rectangular region of interest (ROI) returned by mean-shift. We apply depth distribution analysis to the pixels of the ROI in order to separate background pixels from pixels belonging to the tracked object (i.e. the target region). We then use the color histogram of the target region and its surroundings to create a discriminative color model capable of distinguishing the object from the background. The proposed algorithm is evaluated on the RGB-D tracking dataset proposed by [1]. It ranked first and runs in real time, showing both accuracy and robustness on challenging sequences with background clutter, occlusion, scale variation and shape deformation.
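
    A minimal sketch of joint color/depth back projection for the mean-shift weights; the bin counts, hue range (OpenCV's 0-180) and 10 m depth cap are illustrative assumptions, not the paper's parameters:

```python
# Sketch: joint (hue, depth) histogram model and its back projection.
import numpy as np

def joint_histogram(hue, depth, mask, bins=(16, 8)):
    """Normalized joint histogram over the masked (target) pixels."""
    h, _, _ = np.histogram2d(hue[mask], depth[mask], bins=bins,
                             range=[[0, 180], [0, 10]])
    return h / max(h.sum(), 1e-9)

def back_project(hue, depth, model, bins=(16, 8)):
    """Per-pixel weight = model probability of the pixel's (hue, depth) bin."""
    hi = np.clip((hue / 180.0 * bins[0]).astype(int), 0, bins[0] - 1)
    di = np.clip((depth / 10.0 * bins[1]).astype(int), 0, bins[1] - 1)
    return model[hi, di]
```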