25 research outputs found

    How do Cross-View and Cross-Modal Alignment Affect Representations in Contrastive Learning?

    Various state-of-the-art self-supervised visual representation learning approaches take advantage of data from multiple sensors by aligning the feature representations across views and/or modalities. In this work, we investigate how aligning representations affects the visual features obtained from cross-view and cross-modal contrastive learning on images and point clouds. On five real-world datasets and on five tasks, we train and evaluate 108 models based on four pretraining variations. We find that cross-modal representation alignment discards complementary visual information, such as color and texture, and instead emphasizes redundant depth cues. The depth cues obtained from pretraining improve downstream depth prediction performance. Overall, cross-modal alignment also leads to more robust encoders than pretraining by cross-view alignment, especially on depth prediction, instance segmentation, and object detection.

    Human Motion Trajectory Prediction: A Survey

    With growing numbers of intelligent autonomous systems in human environments, the ability of such systems to perceive, understand and anticipate human behavior becomes increasingly important. Specifically, predicting future positions of dynamic agents and planning considering such predictions are key tasks for self-driving vehicles, service robots and advanced surveillance systems. This paper provides a survey of human motion trajectory prediction. We review, analyze and structure a large selection of work from different communities and propose a taxonomy that categorizes existing methods based on the motion modeling approach and level of contextual information used. We provide an overview of the existing datasets and performance metrics. We discuss limitations of the state of the art and outline directions for further research. Comment: Submitted to the International Journal of Robotics Research (IJRR), 37 pages.

    Active Pedestrian Safety by Automatic Braking and Evasive Steering


    A Bayesian, exemplar-based approach to hierarchical shape matching

    This paper presents a novel probabilistic approach to hierarchical, exemplar-based shape matching. No feature correspondence is needed among exemplars, just a suitable pairwise similarity measure. The approach uses a template tree to efficiently represent and match the variety of shape exemplars. The tree is generated offline by a bottom-up clustering approach using stochastic optimization. Online matching involves a simultaneous coarse-to-fine approach over the template tree and over the transformation parameters. The main contribution of this paper is a Bayesian model to estimate the a posteriori probability of the object class, after a certain match at a node of the tree. This model takes into account object scale and saliency and allows for a principled setting of the matching thresholds such that unpromising paths in the tree traversal process are eliminated early on. The proposed approach was tested in a variety of application domains. Here, results are presented on one of the more challenging domains: real-time pedestrian detection from a moving vehicle. A significant speed-up is obtained when comparing the proposed probabilistic matching approach with a manually tuned nonprobabilistic variant, both utilizing the same template tree structure. Index terms: hierarchical shape matching, chamfer distance, Bayesian models.

    Vision-based 3-D tracking of humans in action

    The ability to recognize humans and their activities by vision is essential for future machines to interact intelligently and effortlessly with a human-inhabited environment. Some of the more promising applications are discussed. A prototype vision system is presented for the tracking of whole-body movement using multiple cameras. 3-D body pose is recovered at each time instant based on occluding contours. The pose-recovery problem is formulated as a search problem and entails finding the pose parameters of a graphical human model whose synthesized appearance is most similar to the actual appearance of the real human in the multi-view images. Hermite deformable contours are proposed as a tool for the 2-D contour tracking problem. The main contribution of this dissertation is that it demonstrates for the first time a set of techniques that allow accurate vision-based 3-D tracking of arbitrary whole-body movement without the use of markers.

    Integrated Pedestrian Classification and Orientation Estimation

    This paper presents a novel approach to single-frame pedestrian classification and orientation estimation. Unlike previous work, which addressed classification and orientation separately with different models, our method involves a probabilistic framework to approach both in a unified fashion. We address both problems in terms of a set of view-related models which couple discriminative expert classifiers with sample-dependent priors, facilitating easy integration of other cues (e.g. motion, shape) in a Bayesian fashion. This mixture-of-experts formulation approximates the probability density of pedestrian orientation and scales up to the use of multiple cameras. Experiments on large real-world data show a significant performance improvement in both pedestrian classification and orientation estimation of up to 50%, compared to the state of the art, using identical data and evaluation techniques.