5,298 research outputs found

    Data association and occlusion handling for vision-based people tracking by mobile robots

    Get PDF
    This paper presents an approach for tracking multiple persons on a mobile robot with a combination of colour and thermal vision sensors, using several new techniques. First, an adaptive colour model is incorporated into the measurement model of the tracker. Second, a new approach for detecting occlusions is introduced, using a machine learning classifier for pairwise comparison of persons (classifying which one is in front of the other). Third, explicit occlusion handling is incorporated into the tracker. The paper presents a comprehensive, quantitative evaluation of the whole system and its different components using several real world data sets

    Automatic Image Registration in Infrared-Visible Videos using Polygon Vertices

    Full text link
    In this paper, an automatic method is proposed to perform image registration in visible and infrared pair of video sequences for multiple targets. In multimodal image analysis like image fusion systems, color and IR sensors are placed close to each other and capture a same scene simultaneously, but the videos are not properly aligned by default because of different fields of view, image capturing information, working principle and other camera specifications. Because the scenes are usually not planar, alignment needs to be performed continuously by extracting relevant common information. In this paper, we approximate the shape of the targets by polygons and use affine transformation for aligning the two video sequences. After background subtraction, keypoints on the contour of the foreground blobs are detected using DCE (Discrete Curve Evolution)technique. These keypoints are then described by the local shape at each point of the obtained polygon. The keypoints are matched based on the convexity of polygon's vertices and Euclidean distance between them. Only good matches for each local shape polygon in a frame, are kept. To achieve a global affine transformation that maximises the overlapping of infrared and visible foreground pixels, the matched keypoints of each local shape polygon are stored temporally in a buffer for a few number of frames. The matrix is evaluated at each frame using the temporal buffer and the best matrix is selected, based on an overlapping ratio criterion. Our experimental results demonstrate that this method can provide highly accurate registered images and that we outperform a previous related method

    Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking

    Full text link
    The most common paradigm for vision-based multi-object tracking is tracking-by-detection, due to the availability of reliable detectors for several important object categories such as cars and pedestrians. However, future mobile systems will need a capability to cope with rich human-made environments, in which obtaining detectors for every possible object category would be infeasible. In this paper, we propose a model-free multi-object tracking approach that uses a category-agnostic image segmentation method to track objects. We present an efficient segmentation mask-based tracker which associates pixel-precise masks reported by the segmentation. Our approach can utilize semantic information whenever it is available for classifying objects at the track level, while retaining the capability to track generic unknown objects in the absence of such information. We demonstrate experimentally that our approach achieves performance comparable to state-of-the-art tracking-by-detection methods for popular object categories such as cars and pedestrians. Additionally, we show that the proposed method can discover and robustly track a large variety of other objects.Comment: ICRA'18 submissio

    Robust pedestrian detection and tracking in crowded scenes

    Get PDF
    In this paper, a robust computer vision approach to detecting and tracking pedestrians in unconstrained crowded scenes is presented. Pedestrian detection is performed via a 3D clustering process within a region-growing framework. The clustering process avoids using hard thresholds by using bio-metrically inspired constraints and a number of plan view statistics. Pedestrian tracking is achieved by formulating the track matching process as a weighted bipartite graph and using a Weighted Maximum Cardinality Matching scheme. The approach is evaluated using both indoor and outdoor sequences, captured using a variety of different camera placements and orientations, that feature significant challenges in terms of the number of pedestrians present, their interactions and scene lighting conditions. The evaluation is performed against a manually generated groundtruth for all sequences. Results point to the extremely accurate performance of the proposed approach in all cases

    Hybrid Focal Stereo Networks for Pattern Analysis in Homogeneous Scenes

    Full text link
    In this paper we address the problem of multiple camera calibration in the presence of a homogeneous scene, and without the possibility of employing calibration object based methods. The proposed solution exploits salient features present in a larger field of view, but instead of employing active vision we replace the cameras with stereo rigs featuring a long focal analysis camera, as well as a short focal registration camera. Thus, we are able to propose an accurate solution which does not require intrinsic variation models as in the case of zooming cameras. Moreover, the availability of the two views simultaneously in each rig allows for pose re-estimation between rigs as often as necessary. The algorithm has been successfully validated in an indoor setting, as well as on a difficult scene featuring a highly dense pilgrim crowd in Makkah.Comment: 13 pages, 6 figures, submitted to Machine Vision and Application

    Human mobility monitoring in very low resolution visual sensor network

    Get PDF
    This paper proposes an automated system for monitoring mobility patterns using a network of very low resolution visual sensors (30 30 pixels). The use of very low resolution sensors reduces privacy concern, cost, computation requirement and power consumption. The core of our proposed system is a robust people tracker that uses low resolution videos provided by the visual sensor network. The distributed processing architecture of our tracking system allows all image processing tasks to be done on the digital signal controller in each visual sensor. In this paper, we experimentally show that reliable tracking of people is possible using very low resolution imagery. We also compare the performance of our tracker against a state-of-the-art tracking method and show that our method outperforms. Moreover, the mobility statistics of tracks such as total distance traveled and average speed derived from trajectories are compared with those derived from ground truth given by Ultra-Wide Band sensors. The results of this comparison show that the trajectories from our system are accurate enough to obtain useful mobility statistics

    A multi-projector CAVE system with commodity hardware and gesture-based interaction

    Get PDF
    Spatially-immersive systems such as CAVEs provide users with surrounding worlds by projecting 3D models on multiple screens around the viewer. Compared to alternative immersive systems such as HMDs, CAVE systems are a powerful tool for collaborative inspection of virtual environments due to better use of peripheral vision, less sensitivity to tracking errors, and higher communication possibilities among users. Unfortunately, traditional CAVE setups require sophisticated equipment including stereo-ready projectors and tracking systems with high acquisition and maintenance costs. In this paper we present the design and construction of a passive-stereo, four-wall CAVE system based on commodity hardware. Our system works with any mix of a wide range of projector models that can be replaced independently at any time, and achieves high resolution and brightness at a minimum cost. The key ingredients of our CAVE are a self-calibration approach that guarantees continuity across the screen, as well as a gesture-based interaction approach based on a clever combination of skeletal data from multiple Kinect sensors.Preprin
    corecore