
    Observation-switching linear dynamic systems for tracking humans through unexpected partial occlusions by scene objects

    This paper focuses on the problem of tracking people through occlusions by scene objects. Rather than relying on models of the scene to predict when occlusions will occur, as other researchers have done, this paper proposes a linear dynamic system that switches between two alternative position measurements in order to handle occlusions as they occur. The filter automatically switches from a foot-based measure of position (assuming z = 0) to a head-based position measure (given the person's height) when an occlusion of the person's lower body occurs. No knowledge of the scene or its occluding objects is used. Unlike similar research [2, 14], the approach does not assume a fixed height for people and so is able to track humans through occlusions even when they change height during the occlusion. The approach is evaluated on three furnished scenes containing tables, chairs, desks, and partitions. Occlusions range from occlusions of the legs, through occlusions while seated, to near-total occlusions where only the person's head is visible. Results show that the approach provides a significant reduction in false-positive tracks in a multi-camera environment, and more than halves the number of lost tracks in single monocular camera views.
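
    For concreteness, the switching mechanism might look something like the minimal sketch below: a constant-velocity Kalman filter over ground-plane position whose measurement model switches between a foot-based and a head-derived observation. The state layout, noise values, and the `foot_visible` flag are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class SwitchingTracker:
    """Constant-velocity Kalman filter over ground-plane position (x, y).
    Switches between a foot-based and a head-derived position measurement
    depending on whether the person's lower body is occluded."""

    def __init__(self, dt=1.0):
        # State: [x, y, vx, vy]
        self.x = np.zeros(4)
        self.P = np.eye(4) * 10.0
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * 0.01       # process noise (hypothetical value)
        self.R_foot = np.eye(2) * 0.05  # foot measurement: more reliable
        self.R_head = np.eye(2) * 0.50  # head-derived: noisier (uses height)

    def step(self, z, foot_visible):
        # Predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Switch measurement noise according to which observation is used
        R = self.R_foot if foot_visible else self.R_head
        # Update
        S = self.H @ self.P @ self.H.T + R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

    Here `z` is assumed to already be a ground-plane position: projected from the foot image point (z = 0 on the ground) when the feet are visible, or back-projected from the head point using the tracked height estimate during an occlusion.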

    Tracking-Reconstruction or Reconstruction-Tracking?

    We developed two methods for tracking multiple objects using several camera views. The methods use the Multiple Hypothesis Tracking (MHT) framework to solve both the across-view data association problem (i.e., finding object correspondences across several views) and the across-time data association problem (i.e., the assignment of current object measurements to previously established object tracks). The "tracking-reconstruction method" establishes two-dimensional (2D) object tracks for each view and then reconstructs their three-dimensional (3D) motion trajectories. The "reconstruction-tracking method" assembles 2D object measurements from all views, reconstructs 3D object positions, and then matches these 3D positions to previously established 3D object tracks to compute 3D motion trajectories. For both methods, we propose techniques for pruning the number of association hypotheses and for gathering track fragments. We tested and compared the performance of our methods on thermal infrared video of bats using several performance measures. Our analysis of video sequences with different densities of flying bats reveals that the reconstruction-tracking method produces fewer track fragments than the tracking-reconstruction method but creates more false-positive 3D tracks.
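
    The order of operations in the reconstruction-tracking method can be illustrated with a toy sketch: triangulate 2D measurements across views into 3D points, then associate each point with an existing 3D track. Greedy nearest-neighbour gating stands in here for the full MHT hypothesis tree, the input pairs are assumed to be already associated across views (itself part of the problem MHT solves), and the gate threshold is arbitrary.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.
    P1, P2: 3x4 projection matrices; x1, x2: 2D pixel measurements."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def reconstruction_tracking(P1, P2, pairs, tracks, gate=0.5):
    """Reconstruct 3D points from across-view measurement pairs, then
    greedily assign each to the nearest existing 3D track (a simplified
    stand-in for MHT's across-time association)."""
    for x1, x2 in pairs:
        X = triangulate(P1, P2, x1, x2)
        if tracks:
            dists = [np.linalg.norm(X - t[-1]) for t in tracks]
            j = int(np.argmin(dists))
            if dists[j] < gate:          # gating: plausible continuation
                tracks[j].append(X)
                continue
        tracks.append([X])               # otherwise start a new track
    return tracks
```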

    A method for vehicle count in the presence of multiple-vehicle occlusions in traffic images

    This paper proposes a novel method for accurately counting the number of vehicles involved in multiple-vehicle occlusions, based on the resolvability of each occluded vehicle, as seen in a monocular traffic image sequence. Assuming that the occluded vehicles are segmented from the road background by a previously proposed vehicle segmentation method and that a deformable model is geometrically fitted onto the occluded vehicles, the proposed method first deduces the number of vertices per individual vehicle from the camera configuration. Second, a contour description model is utilized to describe the direction of the contour segments with respect to the vanishing points, from which the individual contour descriptions and the vehicle count are determined. Third, it assigns a resolvability index to each occluded vehicle based on a resolvability model, from which each occluded vehicle model is resolved and the vehicle dimensions are measured. The proposed method has been tested on 267 sets of real-world monocular traffic images containing 3074 vehicles with multiple-vehicle occlusions and is found to be 100% accurate in calculating the vehicle count, compared with human inspection. By comparing the estimated dimensions of the resolved generalized deformable model of each vehicle with the actual dimensions published by the manufacturers, the root-mean-square errors for width, length, and height estimation are found to be 48, 279, and 76 mm, respectively. © 2007 IEEE.
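
    As a rough illustration of the contour-description step, the sketch below labels each contour segment by the vanishing point its direction best aligns with. The segment representation and angular tolerance are assumptions, and the paper's resolvability model is not reproduced.

```python
import numpy as np

def label_contour_segments(segments, vanishing_points, tol_deg=10.0):
    """Label each contour segment by the vanishing point its direction
    best aligns with (or 'other' if none is within tolerance).
    segments: list of (p0, p1) endpoint pairs in image coordinates.
    vanishing_points: list of 2D vanishing points, one per scene axis."""
    labels = []
    for p0, p1 in segments:
        d = np.asarray(p1, float) - np.asarray(p0, float)
        d /= np.linalg.norm(d)
        mid = (np.asarray(p0, float) + np.asarray(p1, float)) / 2.0
        best, best_ang = "other", tol_deg
        for k, vp in enumerate(vanishing_points):
            v = np.asarray(vp, float) - mid   # direction toward this VP
            v /= np.linalg.norm(v)
            ang = np.degrees(np.arccos(np.clip(abs(d @ v), 0.0, 1.0)))
            if ang < best_ang:
                best, best_ang = k, ang
        labels.append(best)
    return labels
```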

    Conditional Random Fields for Multi-Camera Object Detection

    We formulate a model for multi-class object detection in a multi-camera environment. To our knowledge, this is the first time this problem has been addressed taking different object classes into account simultaneously. Given several images of the scene taken from different angles, our system estimates the ground-plane locations of the objects from the output of several object detectors applied at each viewpoint. We cast the problem as an energy minimization modeled with a Conditional Random Field (CRF). Instead of predicting the presence of an object at each image location independently, we simultaneously predict the labeling of the entire scene. Our CRF is able to take into account occlusions between objects and contextual constraints among them. We propose an effective iterative strategy that renders the underlying optimization problem tractable, and we learn the parameters of the model with the max-margin paradigm. We evaluate the performance of our model on several challenging multi-camera pedestrian detection datasets, namely PETS 2009 and the EPFL Terrace sequence. We also introduce a new dataset in which multiple classes of objects appear simultaneously in the scene. It is here that we show that our method effectively handles occlusions in the multi-class case.
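
    To make the formulation concrete, the sketch below minimizes a CRF-style energy E(y) = sum_i unary(i, y_i) + sum_(i,j) pairwise(y_i, y_j) over ground-plane cells with iterated conditional modes (ICM), a much simpler stand-in for the paper's iterative strategy and max-margin learning; the cost tables and neighborhood structure are invented for illustration.

```python
import numpy as np

def icm_labeling(unary, pairwise, neighbors, iters=10):
    """Minimize E(y) = sum_i unary[i, y_i] + sum_(i,j) pairwise[y_i, y_j]
    over ground-plane cells with iterated conditional modes (ICM).
    unary: (n_cells, n_labels) detector-based costs (label 0 = empty).
    pairwise: (n_labels, n_labels) cost table, e.g. penalizing
    implausible occlusion configurations between neighboring cells.
    neighbors: list of neighbor-index lists, one per cell."""
    n, _ = unary.shape
    y = unary.argmin(axis=1)                 # independent initialization
    for _ in range(iters):
        changed = False
        for i in range(n):
            costs = unary[i].copy()
            for j in neighbors[i]:           # add neighbor interactions
                costs += pairwise[:, y[j]]
            new = int(costs.argmin())
            if new != y[i]:
                y[i], changed = new, True
        if not changed:                      # local minimum reached
            break
    return y
```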

    Multi-view dynamic scene modeling

    Modeling dynamic scenes/events from multiple fixed-location vision sensors, such as video camcorders, infrared cameras, and Time-of-Flight sensors, is of broad interest to the computer vision community, with many applications including 3D TV, virtual reality, medical surgery, markerless motion capture, video games, and security surveillance. However, most existing multi-view systems are set up in a strictly controlled indoor environment, with fixed lighting conditions and simple background views. Many challenges prevent the technology from moving to an outdoor natural environment, including varying sunlight, shadows, reflections, background motion, and visual occlusion. In this thesis, I address different aspects of overcoming all of the aforementioned difficulties, so as to reduce human preparation and manipulation and to make a robust outdoor system as automatic as possible. In particular, the main novel technical contributions of this thesis are as follows: a generic heterogeneous sensor fusion framework for robust 3D shape estimation; a way to automatically recover the 3D shapes of static occluders from dynamic object silhouette cues, which explicitly models the static visual occlusion events along the viewing rays; a system to model the shapes of multiple dynamic objects and track their identities simultaneously, which explicitly models the inter-occlusion events between dynamic objects; and a scheme to recover an object's dense 3D motion flow over time, without assuming any prior knowledge of the underlying structure of the dynamic object being modeled, which helps to enforce temporal consistency of natural motions and to initialize more advanced shape learning and motion analysis. A unified automatic calibration algorithm for the heterogeneous network of conventional cameras/camcorders and new Time-of-Flight sensors is also proposed.
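
    The silhouette-cue reasoning builds on shape-from-silhouette; below is a minimal voxel-carving sketch of that underlying idea, assuming known projection matrices and a hypothetical grid, with none of the thesis's occluder or identity modeling.

```python
import numpy as np

def voxel_carve(silhouettes, projections, bounds, res=64):
    """Keep the voxels that project inside every view's silhouette.
    silhouettes: list of binary HxW masks; projections: matching 3x4
    camera matrices; bounds: ((xmin, xmax), (ymin, ymax), (zmin, zmax))."""
    axes = [np.linspace(lo, hi, res) for lo, hi in bounds]
    X, Y, Z = np.meshgrid(*axes, indexing="ij")
    pts = np.stack([X, Y, Z, np.ones_like(X)], axis=-1).reshape(-1, 4)
    occupied = np.ones(len(pts), dtype=bool)
    for mask, P in zip(silhouettes, projections):
        uvw = pts @ P.T                          # project into this view
        u = (uvw[:, 0] / uvw[:, 2]).round().astype(int)
        v = (uvw[:, 1] / uvw[:, 2]).round().astype(int)
        h, w = mask.shape
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(pts), dtype=bool)
        hit[inside] = mask[v[inside], u[inside]] > 0
        occupied &= hit                          # carve away voxels that miss
    return occupied.reshape(res, res, res)
```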

    Map-Based Localization for Unmanned Aerial Vehicle Navigation

    Unmanned Aerial Vehicles (UAVs) require precise pose estimation when navigating in indoor and GNSS-denied or GNSS-degraded outdoor environments. The possibility of crashing in these environments is high, as spaces are confined and contain many moving obstacles. There are many solutions for localization in GNSS-denied environments, using many different technologies. Common solutions involve setting up or using existing infrastructure, such as beacons, Wi-Fi, or surveyed targets. These solutions were avoided because the cost should be proportional to the number of users, not the coverage area. Heavy and expensive sensors, for example a high-end IMU, were also avoided. Given these requirements, a camera-based localization solution was selected for sensor pose estimation. Several camera-based localization approaches were investigated. Map-based localization methods were shown to be the most efficient because they close loops using a pre-existing map, so the amount of data, and the time spent collecting it, are reduced, as there is no need to re-observe the same areas multiple times. This dissertation proposes a solution to the task of fully localizing a monocular camera onboard a UAV with respect to a known environment (i.e., it is assumed that a 3D model of the environment is available) for the purpose of UAV navigation in structured environments. Incremental map-based localization involves tracking a map through an image sequence. When the map is a 3D model, this task is referred to as model-based tracking. A by-product of the tracker is the relative 3D pose (position and orientation) between the camera and the object being tracked. State-of-the-art solutions advocate that tracking geometry is more robust than tracking image texture, because edges are more invariant to changes in object appearance and lighting. However, model-based trackers have been limited to tracking small, simple objects in small environments. An assessment was performed of tracking larger, more complex building models in larger environments. A state-of-the-art model-based tracker called ViSP (Visual Servoing Platform) was applied to tracking outdoor and indoor buildings using a UAV's low-cost camera. The assessment revealed weaknesses at large scales. Specifically, ViSP failed when tracking was lost and needed to be manually re-initialized. Failure occurred when there was a lack of model features in the camera's field of view, and because of rapid camera motion. Experiments revealed that ViSP achieved positional accuracies similar to single-point positioning solutions obtained from single-frequency (L1) GPS observations, with standard deviations around 10 metres. These errors were considered large, given that the geometric accuracy of the 3D model used in the experiments was 10 to 40 cm. The first contribution of this dissertation proposes to increase the performance of the localization system by combining ViSP with map-building incremental localization, also referred to as simultaneous localization and mapping (SLAM). Experimental results in both indoor and outdoor environments show that sub-metre positional accuracies were achieved, while the number of tracking losses throughout the image sequence was reduced. It is shown that by integrating model-based tracking with SLAM, not only does SLAM improve model tracking performance, but the model-based tracker also alleviates the computational expense of SLAM's loop-closing procedure, improving runtime performance.
Experiments also revealed that ViSP was unable to handle occlusions when a complete 3D building model was used, resulting in large errors in its pose estimates. The second contribution of this dissertation is a novel map-based incremental localization algorithm that improves tracking performance and increases the accuracy of ViSP's pose estimates. The novelty of this algorithm is an efficient matching process that identifies corresponding linear features between the UAV's RGB image data and a large, complex, untextured 3D model. The proposed model-based tracker improved positional accuracies from 10 m (obtained with ViSP) to 46 cm in outdoor environments, and from an unattainable result with ViSP to 2 cm positional accuracies in large indoor environments. The main disadvantage of any incremental algorithm is that it requires the camera pose of the first frame; initialization is often a manual process. The third contribution of this dissertation is a map-based absolute localization algorithm that automatically estimates the camera pose when no prior pose information is available. The method benefits from vertical line matching to accomplish a registration procedure of the reference model views with a set of initial input images via geometric hashing. Results demonstrate that sub-metre positional accuracies were achieved, and a proposed enhancement of conventional geometric hashing produced more correct matches: 75% of the correct matches were identified, compared to 11%. Further, the number of incorrect matches was reduced by 80%.
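
Once features have been matched between the image and the 3D model, pose recovery reduces to a standard model-to-image registration problem. The sketch below shows that final step using point correspondences and OpenCV's solvePnP; the intrinsics, the correspondences, and the use of points rather than the dissertation's line features are all simplifying assumptions.

```python
import numpy as np
import cv2

def pose_from_model_matches(model_pts, image_pts, K, dist=None):
    """Recover camera pose from matched 3D model points and their 2D
    image observations (the step that follows feature matching).
    model_pts: (N, 3) points in the 3D building model's frame.
    image_pts: (N, 2) corresponding pixel locations.
    K: 3x3 camera intrinsic matrix."""
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(model_pts, dtype=np.float64),
        np.asarray(image_pts, dtype=np.float64),
        K, dist, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        raise RuntimeError("PnP failed: too few or degenerate matches")
    R, _ = cv2.Rodrigues(rvec)       # rotation, model frame -> camera frame
    cam_pos = -R.T @ tvec            # camera position in the model frame
    return R, tvec, cam_pos.ravel()
```

In practice a RANSAC variant (cv2.solvePnPRansac) would be preferable to reject the incorrect matches that the dissertation's enhanced geometric hashing works to minimize.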

    Vision-based traffic monitoring system with hierarchical camera auto-calibration

    In recent decades, traffic, owing to growth in its volume and the consequent increase in demand for transport infrastructure, has become a major problem in cities almost everywhere in the world. It is a social, economic, and environmental phenomenon that involves society as a whole, making it an important target for improvement. Along these lines, and to guarantee safe, fluid, and sustainable mobility, it is important to analyse the behaviour and interaction of vehicles and pedestrians in different scenarios. Until now, this task has been carried out in a limited way by operators in traffic control centres. However, advances in technology suggest an evolution in methodology towards automatic monitoring and control systems. This work falls within the framework of Intelligent Transportation Systems (ITS), specifically the monitoring of critical areas of traffic infrastructure, such as roundabouts or intersections, for the detection and prediction of incidents (accidents, dangerous manoeuvres, congestion, etc.). A computer vision approach is proposed, with the aim of designing a sensing system composed of a single camera, capable of robustly measuring parameters of pedestrians and vehicles that can feed a future incident detection or traffic control system. The general problem of computer vision in this type of application, and the emphasis of the proposed solution, is the adaptability of the algorithm to any external condition: changes in illumination or weather, instabilities due to wind or vibration, occlusions, etc. are compensated. Furthermore, operation is independent of camera position, with the possibility of using variable pan-tilt-zoom models to increase the versatility of the system. One of the contributions of this thesis is the extraction and use of vanishing points (from structured elements of the scene) to calibrate the camera without prior knowledge. This calibration provides an approximate size for the objects of interest, improving the performance of the subsequent stages of the algorithm. To segment the image, moving objects are extracted by modeling the background with a Gaussian mixture model (GMM) and shadow detection methods. For tracking the segmented objects, the traditional idea of treating each object as a single unit is discarded; instead, features are extracted and their evolution analysed to finally obtain an optimal grouping capable of resolving occlusions. The system has been tested under real traffic conditions with no prior knowledge of the scene, with quite satisfactory results that demonstrate the feasibility of the method.
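
    A minimal example of the GMM background-modeling step with shadow suppression, using OpenCV's MOG2 implementation; the parameter values and post-processing are illustrative, not the thesis's tuned pipeline.

```python
import cv2

def segment_moving_objects(video_path):
    """Extract moving-object masks with a Gaussian-mixture background
    model. MOG2 marks detected shadows with the value 127, which the
    threshold below discards, keeping only true foreground (255)."""
    cap = cv2.VideoCapture(video_path)
    bg = cv2.createBackgroundSubtractorMOG2(
        history=500, varThreshold=16, detectShadows=True)
    masks = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        fg = bg.apply(frame)                  # 255 = foreground, 127 = shadow
        fg = cv2.threshold(fg, 200, 255, cv2.THRESH_BINARY)[1]
        fg = cv2.morphologyEx(                # remove speckle noise
            fg, cv2.MORPH_OPEN,
            cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)))
        masks.append(fg)
    cap.release()
    return masks
```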