    Object-Aware Tracking and Mapping

    Reasoning about geometric properties of digital cameras and optical physics has enabled researchers to build methods that localise cameras in 3D space from a video stream, while – often simultaneously – constructing a model of the environment. Related techniques have evolved substantially since the 1980s, leading to increasingly accurate estimations. Traditionally, however, the quality of results is strongly affected by the presence of moving objects, incomplete data, or difficult surfaces – i.e. surfaces that are not Lambertian or lack texture. One insight of this work is that these problems can be addressed by going beyond geometrical and optical constraints, in favour of object-level and semantic constraints. Incorporating specific types of prior knowledge in the inference process, such as motion or shape priors, leads to approaches with distinct advantages and disadvantages. After introducing relevant concepts in Chapter 1 and Chapter 2, methods for building object-centric maps in dynamic environments using motion priors are investigated in Chapter 5. Chapter 6 addresses the same problem as Chapter 5, but presents an approach which relies on semantic priors rather than motion cues. To fully exploit semantic information, Chapter 7 discusses the conditioning of shape representations on prior knowledge and the practical application to monocular, object-aware reconstruction systems.

    Detection-aware multi-object tracking evaluation

    Master Universitario en Deep Learning for Audio and Video Signal Processing. Multi-Object Tracking (MOT) is a hot topic in the computer vision field. It is a complex task that requires a detector, to identify objects, and a tracker, to follow them. It is useful for self-driving, surveillance and robot vision, among others, where research teams and companies are trying to improve their models. To determine which model performs better, models are scored using tracking metrics. In this thesis we experiment with detection-aware MOT metrics using correlation matrices. By analyzing the results, we find that tracking metrics suffer from certain issues that prevent them from correctly reflecting tracking performance. The performance of the detector is relevant when scoring tracking models. The problem observed is that tracking metrics weigh the elements that evaluate detection performance differently. Thus, improving one aspect of the detector that has a high weight in the MOT metric will significantly improve the tracker's score, without necessarily indicating the amount of effort made by the tracker. That is, trackers are not evaluated in a balanced way. To solve this issue with tracker scoring, we present a new multi-object tracking metric, based on the effort made by the tracker given a certain set of detections. This effort is calculated from the improvement of the bounding boxes over the ones given by the detector and the precision in keeping track of the objects in a sequence. The metric has been tested on two widely employed datasets and shows its reliability when scoring tracking models. Also, it does not incur the problem presented above.
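The weighting problem described in this abstract can be illustrated with MOTA from the CLEAR MOT metrics, defined as 1 - (FN + FP + IDSW) / GT. This is only a sketch: the abstract does not name MOTA, so treating it as the metric under discussion is an assumption, and every count below is made up for illustration. The thesis's proposed metric is not reproduced here.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT.

    FN and FP are detection errors; IDSW counts identity switches,
    the only term that reflects association quality directly.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


# Hypothetical counts for one sequence with 1000 ground-truth boxes.
baseline = mota(false_negatives=300, false_positives=100, id_switches=50, num_gt=1000)

# Same tracker, but a better detector misses 100 fewer objects.
better_detector = mota(false_negatives=200, false_positives=100, id_switches=50, num_gt=1000)

# Same detector, but a better tracker halves the identity switches.
better_tracker = mota(false_negatives=300, false_positives=100, id_switches=25, num_gt=1000)

print(f"baseline        {baseline:.3f}")
print(f"better detector {better_detector:.3f}")
print(f"better tracker  {better_tracker:.3f}")
```

With these made-up counts, removing 100 missed detections raises the score by 0.10, while halving the identity switches raises it by only 0.025. The detector change moves the score four times as much as the tracker change.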

    Uncertainty-aware Unsupervised Multi-Object Tracking

    Without manually annotated identities, unsupervised multi-object trackers struggle to learn reliable feature embeddings. This makes the similarity-based inter-frame association stage error-prone as well, giving rise to an uncertainty problem. The uncertainty accumulated frame by frame prevents trackers from learning a feature embedding that is consistent against time variation. To avoid this uncertainty problem, recent self-supervised techniques have been adopted, but they fail to capture temporal relations, so the inter-frame uncertainty still exists. In fact, this paper argues that although the uncertainty problem is inevitable, it is possible to leverage the uncertainty itself to improve the learned consistency in turn. Specifically, an uncertainty-based metric is developed to verify and rectify the risky associations. The resulting accurate pseudo-tracklets boost the learning of feature consistency, and accurate tracklets can incorporate temporal information into spatial transformation. This paper proposes a tracklet-guided augmentation strategy to simulate tracklets' motion, which adopts a hierarchical uncertainty-based sampling mechanism for hard sample mining. The ultimate unsupervised MOT framework, namely U2MOT, is proven effective on the MOT-Challenges and VisDrone-MOT benchmarks. U2MOT achieves SOTA performance among the published supervised and unsupervised trackers. Comment: Accepted by International Conference on Computer Vision (ICCV) 202

    Learning Background-Aware Correlation Filters for Visual Tracking

    Correlation Filters (CFs) have recently demonstrated excellent performance in terms of rapidly tracking objects under challenging photometric and geometric variations. The strength of the approach comes from its ability to efficiently learn - "on the fly" - how the object is changing over time. A fundamental drawback to CFs, however, is that the background of the object is not modelled over time, which can lead to suboptimal results. In this paper we propose a Background-Aware CF that can model how both the foreground and background of the object vary over time. Our approach, like conventional CFs, is extremely computationally efficient - and extensive experiments over multiple tracking benchmarks demonstrate the superior accuracy and real-time performance of our method compared to the state-of-the-art trackers, including those based on a deep learning paradigm.

    Instance Flow Based Online Multiple Object Tracking

    We present a method to perform online Multiple Object Tracking (MOT) of known object categories in monocular video data. Current Tracking-by-Detection MOT approaches build on top of 2D bounding box detections. In contrast, we exploit state-of-the-art instance-aware semantic segmentation techniques to compute 2D shape representations of target objects in each frame. We predict the position and shape of segmented instances in subsequent frames by exploiting optical flow cues. We define an affinity matrix between instances of subsequent frames which reflects locality and visual similarity. The instance association is solved by applying the Hungarian method. We evaluate different configurations of our algorithm using the MOT 2D 2015 train dataset. The evaluation shows that our tracking approach is able to track objects with high relative motions. In addition, we provide results of our approach on the MOT 2D 2015 test set for comparison with previous works. We achieve a MOTA score of 32.1.
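The association step in this abstract (an affinity matrix between instances of consecutive frames, followed by a one-to-one assignment) can be sketched as follows. Several things here are assumptions and not the paper's method: the affinity is plain mask IoU (locality only, with no visual-similarity term and no optical-flow prediction), the masks are toy rectangles, the names `mask_iou`, `affinity_matrix`, `associate` and `min_affinity` are hypothetical, and the assignment is found by brute force over permutations in place of the Hungarian method. Brute force reaches the same optimum but is only practical for a handful of instances.

```python
from itertools import permutations


def mask_iou(a, b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def affinity_matrix(prev_masks, curr_masks):
    """Rows index previous-frame instances, columns index current-frame instances."""
    return [[mask_iou(p, c) for c in curr_masks] for p in prev_masks]


def associate(affinity, min_affinity=0.1):
    """Return (prev_idx, curr_idx) pairs that maximise the total affinity.

    Exhaustive stand-in for the Hungarian method. Pairs whose affinity is
    below min_affinity (a hypothetical threshold) are left unmatched.
    """
    n_prev = len(affinity)
    n_curr = len(affinity[0]) if n_prev else 0
    if n_prev == 0 or n_curr == 0:
        return []
    # Work with the shorter side as rows so every row gets a distinct column.
    transposed = n_prev > n_curr
    if transposed:
        affinity = [list(col) for col in zip(*affinity)]
    rows, cols = len(affinity), len(affinity[0])
    best = max(
        permutations(range(cols), rows),
        key=lambda perm: sum(affinity[r][c] for r, c in enumerate(perm)),
    )
    pairs = [(r, c) for r, c in enumerate(best) if affinity[r][c] >= min_affinity]
    if transposed:
        pairs = sorted((c, r) for r, c in pairs)
    return pairs


def rect(r0, c0, r1, c1):
    """Toy rectangular mask covering rows r0..r1-1 and columns c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


# Two instances in the previous frame; in the current frame both have moved
# one pixel to the right, appear in a different order, and a third is new.
prev_masks = [rect(0, 0, 4, 4), rect(10, 10, 14, 14)]
curr_masks = [rect(10, 11, 14, 15), rect(0, 1, 4, 5), rect(20, 20, 22, 22)]

matches = associate(affinity_matrix(prev_masks, curr_masks))
print(matches)
```

In this toy case each previous instance is paired with its shifted copy, and the new third instance stays unmatched. For realistic instance counts, a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` would be the usual replacement for the permutation search.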