
    Online learning and fusion of orientation appearance models for robust rigid object tracking

    We present a robust framework for learning and fusing different modalities for rigid object tracking. Our method fuses data obtained from a standard visual camera and dense depth maps obtained by low-cost consumer depth cameras such as the Kinect. To combine these two completely different modalities, we propose to use features that do not depend on the data representation: angles. More specifically, our method combines image gradient orientations as extracted from intensity images with the directions of surface normals computed from dense depth fields provided by the Kinect. To incorporate these features in a learning framework, we use a robust kernel based on the Euler representation of angles. This kernel enables us to cope with gross measurement errors and missing data, as well as typical problems in visual tracking such as illumination changes and occlusions. Additionally, the employed kernel can be efficiently implemented online. Finally, we propose to capture the correlations between the obtained orientation appearance models using a fusion approach motivated by the original Active Appearance Model (AAM). Thus the proposed learning and fusion framework is robust, exact, computationally efficient and does not require off-line training. By combining the proposed models with a particle filter, the proposed tracking framework achieved robust performance in very difficult tracking scenarios, including extreme pose variations.
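
    The abstract does not spell out the kernel, but the core idea of the Euler representation can be illustrated with a short sketch: each angle φ is mapped to the unit vector (cos φ, sin φ), i.e. e^{iφ}, and two orientation fields are compared via the mean of cos(Δφ), so any single corrupted or missing measurement contributes only a bounded amount. The array shapes and function names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def euler_features(angles):
    """Map angles (radians) to unit vectors (cos phi, sin phi), i.e. the Euler representation."""
    return np.stack([np.cos(angles), np.sin(angles)], axis=-1)

def angular_correlation(angles_a, angles_b):
    """Mean of cos(phi_a - phi_b): each pixel contributes a value in [-1, 1],
    so gross errors or missing data at a few pixels cannot dominate the score."""
    za, zb = euler_features(angles_a), euler_features(angles_b)
    return float(np.mean(np.sum(za * zb, axis=-1)))  # = mean cos(phi_a - phi_b)

# Hypothetical orientation fields: image gradient orientations and surface normal
# directions, both expressed as per-pixel angles on a 64x64 grid.
rng = np.random.default_rng(0)
grad_orient = rng.uniform(-np.pi, np.pi, size=(64, 64))
normal_dirs = grad_orient + rng.normal(0.0, 0.1, size=(64, 64))
print(angular_correlation(grad_orient, normal_dirs))   # close to 1 for well-aligned fields
```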

    Ground Vehicle Navigation with Depth Camera and Tracking Camera

    The aim of this research is to provide autonomous navigation of a four-wheel vehicle using commercial, off-the-shelf depth and tracking cameras. Some sensitive operations require navigation accuracy within a few inches in indoor or outdoor scenarios where GPS signals are not available. A combination of the Visual Odometry (VO), Distance-Depth (D-D), and Object Detection data from the cameras can be used for accurate navigation and object avoidance. The Intel RealSense D435i, a depth camera, generates depth measurements and the relative position vector of an object. The Intel RealSense T265, a tracking camera, generates its own coordinate system and grabs coordinate goals. Both cameras can generate Simultaneous Localization and Mapping (SLAM) data, and they share their data to provide a more robust capability. By combining the Intel cameras with a Pixhawk autopilot, it was demonstrated that the vehicle can follow a desired path and avoid objects along that path.
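
    The abstract gives no implementation details; the sketch below, assuming the pyrealsense2 Python bindings, shows one plausible way to read T265 pose estimates and a forward D435i depth reading in the same loop. The device serial numbers, the probed pixel, and the 0.5 m obstacle threshold are placeholders, and the actual Pixhawk interface is omitted.

```python
import pyrealsense2 as rs

# One pipeline per camera; the serial numbers below are placeholders for the two devices.
pose_pipe, depth_pipe = rs.pipeline(), rs.pipeline()

pose_cfg = rs.config()
pose_cfg.enable_device("T265_SERIAL")          # placeholder serial number
pose_cfg.enable_stream(rs.stream.pose)
pose_pipe.start(pose_cfg)

depth_cfg = rs.config()
depth_cfg.enable_device("D435I_SERIAL")        # placeholder serial number
depth_cfg.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
depth_pipe.start(depth_cfg)

try:
    while True:
        pose_frame = pose_pipe.wait_for_frames().get_pose_frame()
        depth_frame = depth_pipe.wait_for_frames().get_depth_frame()
        if not pose_frame or not depth_frame:
            continue
        pose = pose_frame.get_pose_data()              # position + orientation from the T265 VO
        ahead_m = depth_frame.get_distance(320, 240)   # distance at the image centre from the D435i
        # Fusion/avoidance logic (e.g., sending waypoints to the Pixhawk) would go here;
        # as a trivial placeholder, report an obstacle closer than 0.5 m.
        if 0.0 < ahead_m < 0.5:
            print("obstacle at %.2f m; vehicle x=%.2f" % (ahead_m, pose.translation.x))
finally:
    pose_pipe.stop()
    depth_pipe.stop()
```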

    Online learning and fusion of orientation appearance models for robust rigid object tracking

    We introduce a robust framework for learning and fusing orientation appearance models based on both texture and depth information for rigid object tracking. Our framework fuses data obtained from a standard visual camera and dense depth maps obtained by low-cost consumer depth cameras such as the Kinect. To combine these two completely different modalities, we propose to use features that do not depend on the data representation: angles. More specifically, our framework combines image gradient orientations as extracted from intensity images with the directions of surface normals computed from dense depth fields. We propose to capture the correlations between the obtained orientation appearance models using a fusion approach motivated by the original Active Appearance Models (AAMs). To incorporate these features in a learning framework, we use a robust kernel based on the Euler representation of angles, which does not require off-line training and can be efficiently implemented online. The robustness of learning from orientation appearance models is demonstrated both theoretically and experimentally in this work. This kernel enables us to cope with gross measurement errors and missing data, as well as other typical problems such as illumination changes and occlusions. By combining the proposed models with a particle filter, the proposed framework was used for performing 2D plus 3D rigid object tracking, achieving robust performance in very difficult tracking scenarios, including extreme pose variations. © 2014 Elsevier B.V. All rights reserved.
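
    As a rough, assumption-laden illustration of the AAM-motivated fusion described above (capturing correlations between the two orientation appearance models), the sketch below concatenates Euler-mapped gradient-orientation and surface-normal features per frame and extracts a joint linear subspace with a batch PCA; the paper's actual method is learned online, which this sketch does not attempt.

```python
import numpy as np

def euler_map(angles):
    """Flatten an angle field into concatenated (cos, sin) features."""
    return np.concatenate([np.cos(angles).ravel(), np.sin(angles).ravel()])

def joint_subspace(grad_orient_frames, normal_dir_frames, n_components=5):
    """Stack per-frame features from both modalities and return the top PCA basis."""
    X = np.stack([np.concatenate([euler_map(g), euler_map(n)])
                  for g, n in zip(grad_orient_frames, normal_dir_frames)])
    X = X - X.mean(axis=0)                      # centre the joint feature matrix
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:n_components]                    # rows span the fused appearance subspace

# Hypothetical data: 20 frames of 32x32 orientation fields per modality.
rng = np.random.default_rng(1)
grads = rng.uniform(-np.pi, np.pi, size=(20, 32, 32))
normals = grads + rng.normal(0.0, 0.2, size=(20, 32, 32))
basis = joint_subspace(list(grads), list(normals))
print(basis.shape)   # (5, 4096)
```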

    Deep Attention Models for Human Tracking Using RGBD

    Visual tracking performance has long been limited by the lack of better appearance models. These models fail either where they tend to change rapidly, as in motion-based tracking, or where accurate information about the object may not be available, as in color camouflage (where background and foreground colors are similar). This paper proposes a robust, adaptive appearance model which works accurately in situations of color camouflage, even in the presence of complex natural objects. The proposed model includes depth as an additional feature in a hierarchical modular neural framework for online object tracking. The model adapts to the confusing appearance by identifying the stable property of depth between the target and the surrounding object(s). The depth complements the existing RGB features in scenarios where RGB features fail to adapt and hence become unstable over a long duration of time. The parameters of the model are learned efficiently in the deep network, which consists of three modules: (1) the spatial attention layer, which discards the majority of the background by selecting a region containing the object of interest; (2) the appearance attention layer, which extracts appearance and spatial information about the tracked object; and (3) the state estimation layer, which enables the framework to predict future object appearance and location. Three different models were trained and tested to analyze the effect of depth along with RGB information. Also, a model is proposed to utilize only depth as a standalone input for tracking purposes. The proposed models were also evaluated in real time using the Kinect V2 and showed very promising results. The results of our proposed network structures and their comparison with the state-of-the-art RGB tracking model demonstrate that adding depth significantly improves tracking accuracy in more challenging environments (i.e., cluttered and camouflaged environments). Furthermore, the results of the depth-based models show that depth data can provide enough information for accurate tracking, even without RGB information.
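
    The abstract names the three modules but not their exact architecture; the PyTorch-style sketch below shows one assumed layout of that decomposition (spatial attention crop, appearance feature extraction, recurrent state estimation) for a 4-channel RGB-D input. Layer sizes, the crop helper, and the box parameterization are illustrative assumptions rather than the authors' network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def spatial_crop(rgbd, box, size=64):
    """Hypothetical spatial-attention helper: crop a window around the box centre and resize."""
    cx, cy = int(box[0]), int(box[1])
    half = size // 2
    patch = rgbd[:, :, max(cy - half, 0):cy + half, max(cx - half, 0):cx + half]
    return F.interpolate(patch, size=(size, size), mode="bilinear", align_corners=False)

class RGBDTrackerSketch(nn.Module):
    """Illustrative three-module layout: attention crop -> appearance CNN -> LSTM state."""
    def __init__(self, hidden=128):
        super().__init__()
        self.appearance = nn.Sequential(              # appearance attention layer (assumed CNN)
            nn.Conv2d(4, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.state = nn.LSTMCell(32 * 4 * 4, hidden)  # state estimation layer
        self.head = nn.Linear(hidden, 4)              # predicted box (cx, cy, w, h)

    def forward(self, rgbd, prev_box, hc=None):
        crop = spatial_crop(rgbd, prev_box)           # spatial attention: keep region of interest
        feat = self.appearance(crop)
        h, c = self.state(feat, hc)
        return self.head(h), (h, c)

# Usage with a dummy 4-channel (RGB + depth) frame.
tracker = RGBDTrackerSketch()
frame = torch.rand(1, 4, 240, 320)
box, state = tracker(frame, prev_box=(160, 120, 50, 50))
```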

    Eye gaze position before, during and after percept switching of bistable visual stimuli

    A bistable visual stimulus, such as the Necker Cube or Rubin's Vase, can be perceived in two different ways which compete against each other and alternate spontaneously. Percept switch rates have been recorded in past psychophysical experiments, but few experiments have measured percept switches while tracking eye movements in human participants. In our study, we use the EyeLink II system to track eye gaze position during spontaneous percept switches of a bistable, structure-from-motion (SFM) cylinder that can be perceived to be rotating clockwise (CW) or counterclockwise (CCW). Participants reported the perceived direction of rotation of the SFM cylinder using key presses. Reliability of participants' reports was ensured by including unambiguous rotations, generated by assigning depth using binocular disparity. Gaze positions were measured 50–2000 ms before and after key presses. Our pilot data show that during ambiguous cylinder presentation, gaze positions for CW reports clustered to the left half of the cylinder and gaze positions for CCW reports clustered to the right half of the cylinder between 1000 ms before and 1500 ms after key presses, but no such correlation was found beyond that timeframe. These results suggest that percept switches can be correlated with prior gaze positions for ambiguous stimuli. Our results further suggest that the mechanism underlying percept initiation may be influenced by the visual hemifield where the ambiguous stimulus is located.

    Dynamics of Attention in Depth: Evidence from Multi-Element Tracking

    The allocation of attention in depth is examined using a multi-element tracking paradigm. Observers are required to track a predefined subset of two to eight elements in displays containing up to sixteen identical moving elements. We first show that depth cues, such as binocular disparity and occlusion through T-junctions, improve performance in a multi-element tracking task in the case where element boundaries are allowed to intersect in the depiction of motion in a single fronto-parallel plane. We also show that the allocation of attention across two perceptually distinguishable planar surfaces, either fronto-parallel or receding at a slanting angle and defined by coplanar elements, is easier than the allocation of attention within a single surface. The same result was not found when attention was required to be deployed across items of two color populations rather than of a single color. Our results suggest that, when surface information does not suffice to distinguish between targets and distractors that are embedded in these surfaces, division of attention across two surfaces aids in tracking moving targets. National Science Foundation (IRI-94-01659); Office of Naval Research (N00014-95-1-0409, N00014-95-1-0657).

    Keyframe-based monocular SLAM: design, survey, and future directions

    Extensive research in the field of monocular SLAM over the past fifteen years has yielded workable systems that have found their way into various applications in robotics and augmented reality. Although filter-based monocular SLAM systems were common at one time, the more efficient keyframe-based solutions are becoming the de facto methodology for building a monocular SLAM system. The objective of this paper is threefold: first, the paper serves as a guideline for people seeking to design their own monocular SLAM system according to specific environmental constraints. Second, it presents a survey that covers the various keyframe-based monocular SLAM systems in the literature, detailing the components of their implementation and critically assessing the specific strategies adopted in each proposed solution. Third, the paper provides insight into the direction of future research in this field, to address the major limitations still facing monocular SLAM; namely, the issues of illumination changes, initialization, highly dynamic motion, poorly textured scenes, repetitive textures, map maintenance, and failure recovery.
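
    As a rough orientation aid (not tied to any particular system covered by the survey), the sketch below outlines the canonical keyframe-based loop the paper examines: track each frame against the local map, promote a frame to a keyframe when tracking weakens or parallax suffices, refine the local map, and attempt relocalization on failure. Every component here is a hypothetical stub, and the thresholds are arbitrary.

```python
def run_keyframe_slam(frames, tracker, mapper, relocalizer,
                      min_inliers=50, min_parallax_deg=1.0):
    """Skeleton of a keyframe-based monocular SLAM loop; all callables are hypothetical stubs."""
    keyframes, map_points = [], []
    for frame in frames:
        pose, inliers, parallax = tracker.track(frame, keyframes, map_points)
        if pose is None:                          # tracking failure -> attempt relocalization
            pose = relocalizer.relocalize(frame, keyframes)
            if pose is None:
                continue                          # skip frames until relocalized
        # Keyframe decision: weak tracking or enough parallax for new triangulations.
        if inliers < min_inliers or parallax > min_parallax_deg:
            keyframes.append((frame, pose))
            if len(keyframes) > 1:
                map_points += mapper.triangulate(keyframes[-2:])
            mapper.local_bundle_adjustment(keyframes, map_points)   # local map refinement
            mapper.cull_redundant(keyframes, map_points)            # map maintenance
    return keyframes, map_points
```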

    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
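
    To make the event representation described above concrete, the sketch below stores events as (t, x, y, polarity) records and sums a short time window into a signed event frame, one of the simplest ways to feed event data to a frame-based algorithm. The 346×260 resolution and the window length are arbitrary assumptions.

```python
import numpy as np

# Each event: timestamp (seconds), pixel coordinates, and polarity (+1/-1 brightness change).
event_dtype = np.dtype([("t", np.float64), ("x", np.uint16), ("y", np.uint16), ("p", np.int8)])

def accumulate_events(events, width, height, t_start, t_end):
    """Sum event polarities per pixel over [t_start, t_end) into a signed 2D frame."""
    frame = np.zeros((height, width), dtype=np.int32)
    window = events[(events["t"] >= t_start) & (events["t"] < t_end)]
    np.add.at(frame, (window["y"], window["x"]), window["p"])
    return frame

# Synthetic example: 10,000 random events on an assumed 346x260 sensor over 10 ms.
rng = np.random.default_rng(42)
events = np.zeros(10_000, dtype=event_dtype)
events["t"] = np.sort(rng.uniform(0.0, 0.010, events.size))
events["x"] = rng.integers(0, 346, events.size)
events["y"] = rng.integers(0, 260, events.size)
events["p"] = rng.choice([-1, 1], events.size)
print(accumulate_events(events, 346, 260, 0.0, 0.005).sum())
```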

    Neural Models of Motion Integration, Segmentation, and Probabilistic Decision-Making

    How do brain mechanisms carry out motion integration and segmentation processes that compute unambiguous global motion percepts from ambiguous local motion signals? Consider, for example, a deer running at variable speeds behind forest cover. The forest cover is an occluder that creates apertures through which fragments of the deer's motion signals are intermittently experienced. The brain coherently groups these fragments into a trackable percept of the deer and its trajectory. Form and motion processes are needed to accomplish this using feedforward and feedback interactions both within and across cortical processing streams. All the cortical areas V1, V2, MT, and MST are involved in these interactions. Figure-ground processes in the form stream through V2, such as the separation of occluding boundaries of the forest cover from the boundaries of the deer, select the motion signals which determine global object motion percepts in the motion stream through MT. Sparse, but unambiguous, feature tracking signals are amplified before they propagate across position and are integrated with far more numerous ambiguous motion signals. Figure-ground and integration processes together determine the global percept. A neural model predicts the processing stages that embody these form and motion interactions. Model concepts and data are summarized about motion grouping across apertures in response to a wide variety of displays, and probabilistic decision making in parietal cortex in response to random dot displays. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624).