
    A flexible algorithm for detecting challenging moving objects in real-time within IR video sequences

    Detecting moving objects in infrared video sequences in real time can be particularly challenging because of the characteristics of the objects, such as their size, contrast, velocity and trajectory. Many proposed algorithms achieve good performance, but only in the presence of specific kinds of objects, or by neglecting the computational time, which makes them unsuitable for real-time applications. To obtain more flexibility in different situations, we developed an algorithm capable of successfully dealing with small and large objects, slow and fast objects, even when subjected to unusual movements, and poorly contrasted objects. The algorithm can also handle the simultaneous presence of multiple objects within the scene and work in real time even on inexpensive hardware. The implemented strategy is based on a fast but accurate background estimation and rejection, performed pixel by pixel and updated frame by frame, which is robust to possible background intensity changes and to noise. A control routine prevents the estimation from being biased by the transit of moving objects, while two noise-adaptive thresholding stages respectively drive the estimation control and allow moving objects to be extracted after background removal, leading to the desired detection map. At each step, attention has been paid to developing computationally light solutions to meet the real-time requirement. The algorithm has been tested on a database of infrared video sequences, obtaining promising results against different kinds of challenging moving objects and outperforming other commonly adopted solutions. Its effectiveness in terms of detection performance, flexibility and computational time makes the algorithm particularly suitable for real-time applications such as intrusion monitoring, activity control and detection of approaching objects, which are fundamental tasks in the emerging research area of the Smart City.
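    A minimal sketch of the per-pixel background estimation-and-rejection scheme described above, assuming a simple running-average background model and noise-scaled gates; the class name, parameters and update rule are illustrative choices, not the authors' exact formulation.

```python
import numpy as np

class IRMovingObjectDetector:
    """Illustrative per-pixel background estimation with noise-adaptive thresholds."""

    def __init__(self, alpha=0.05, k_control=3.0, k_detect=5.0):
        self.alpha = alpha          # background update rate
        self.k_control = k_control  # noise-scaled gate for the update-control routine
        self.k_detect = k_detect    # noise-scaled gate for the final detection map
        self.background = None
        self.noise_sigma = None

    def process(self, frame):
        frame = frame.astype(np.float32)
        if self.background is None:
            self.background = frame.copy()
            self.noise_sigma = np.full_like(frame, frame.std() + 1e-6)
            return np.zeros(frame.shape, dtype=bool)

        residual = frame - self.background
        abs_res = np.abs(residual)

        # Control routine: pixels whose residual exceeds the noise-scaled gate are
        # likely covered by a moving object, so they are excluded from the update
        # to avoid biasing the background estimate.
        stable = abs_res < self.k_control * self.noise_sigma
        self.background[stable] += self.alpha * residual[stable]
        self.noise_sigma[stable] = (
            (1 - self.alpha) * self.noise_sigma[stable] + self.alpha * abs_res[stable]
        )

        # Background rejection followed by the second noise-adaptive threshold.
        return abs_res > self.k_detect * self.noise_sigma
```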

    Selected topics in video coding and computer vision

    Video applications ranging from multimedia communication to computer vision have been extensively studied in the past decades. However, the emergence of new applications continues to raise questions that are only partially answered by existing techniques. This thesis studies three selected topics related to video: intra prediction in block-based video coding, pedestrian detection and tracking in infrared imagery, and multi-view video alignment.
    In the state-of-the-art video coding standard H.264/AVC, intra prediction is defined on a hierarchical quad-tree based block partitioning structure, which fails to exploit the geometric constraint of edges. We propose a geometry-adaptive block partitioning structure and a new intra prediction algorithm named geometry-adaptive intra prediction (GAIP). A new texture prediction algorithm named geometry-adaptive intra displacement prediction (GAIDP) is also developed by extending the original intra displacement prediction (IDP) algorithm with the geometry-adaptive block partitions. Simulations on various test sequences demonstrate that the intra coding performance of H.264/AVC can be significantly improved by incorporating the proposed geometry-adaptive algorithms.
    In recent years, owing to the decreasing cost of thermal sensors, pedestrian detection and tracking in infrared imagery has become a topic of interest for night vision and all-weather surveillance applications. We propose a novel approach for detecting and tracking pedestrians in infrared imagery based on a layered representation of infrared images. Pedestrians are detected from the foreground layer by a Principal Component Analysis (PCA) based scheme using the appearance cue. To facilitate the task of pedestrian tracking, we formulate the problem of shot segmentation and present a graph matching-based tracking algorithm. Simulations with both the OSU Infrared Image Database and the WVU Infrared Video Database are reported to demonstrate the accuracy and robustness of our algorithms.
    Multi-view video alignment is a process that facilitates the fusion of non-synchronized multi-view video sequences for various applications, including automatic video-based surveillance and video metrology. In this thesis, we propose an accurate multi-view video alignment algorithm that iteratively aligns two sequences in space and time. To achieve an accurate sub-frame temporal alignment, we generalize the existing phase-correlation algorithm to the 3D case. We also present a novel method to obtain the ground truth of the temporal alignment by using supplementary audio signals sampled at a much higher rate. The accuracy of our algorithm is verified by simulations using real-world sequences.
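    The temporal-alignment step builds on phase correlation. The 1-D sketch below illustrates the basic shift estimator that the thesis generalizes to the 3D (space-time) case; it is a hedged illustration, not the thesis code.

```python
import numpy as np

def phase_correlation_shift(a, b):
    """Estimate the circular shift d such that b is approximately np.roll(a, d)."""
    A = np.fft.fft(a)
    B = np.fft.fft(b)
    cross_power = B * np.conj(A)
    cross_power /= np.abs(cross_power) + 1e-12      # keep phase, discard magnitude
    response = np.fft.ifft(cross_power).real
    d = int(np.argmax(response))                    # peak location encodes the shift
    return d - len(a) if d > len(a) // 2 else d     # wrap to a signed offset

# Example: b lags a by 5 samples, so the estimated shift is 5.
rng = np.random.default_rng(0)
a = rng.standard_normal(256)
b = np.roll(a, 5)
print(phase_correlation_shift(a, b))                # -> 5
```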

    Contributions to the use of markers for Autonomous Navigation and Augmented Reality

    Square planar markers are widely used tools for localization and tracking due to their low cost and high performance. Many applications in Robotics, Unmanned Vehicles and Augmented Reality employ these markers for camera pose estimation with high accuracy. Nevertheless, marker-based systems are affected by several factors that limit their performance. First, the marker detection process is a time-consuming task that grows more demanding as the image size increases. As a consequence, current high-resolution cameras have weakened the processing efficiency of traditional marker systems. Second, marker detection is affected by the presence of noise, blurring and occlusion. Camera movement produces image blur, generated even by small movements. Furthermore, the marker may be partially or completely occluded in the image, so that it is no longer detected. This thesis deals with the above limitations, proposing novel methodologies and strategies for successful marker detection that improve both the efficiency and the robustness of these systems. First, a novel multi-scale approach has been developed to speed up the marker detection process. The method takes advantage of the different resolutions at which the image is represented to predict at runtime the optimal scale for detection and identification, together with a corner upsampling strategy necessary for accurate pose estimation. Second, we introduce a new marker design, the Fractal Marker, which, using a novel keypoint-based method, achieves detection even under severe occlusion while allowing detection over a wider range of distances than traditional markers. Finally, we propose a new marker detection strategy based on Discriminative Correlation Filters (DCF), in which the marker and its corners, represented in the frequency domain, yield more robust and faster detections than state-of-the-art methods, even under extreme blur conditions.
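    To make the last idea concrete, below is a hedged sketch of a single discriminative correlation filter in the MOSSE style, trained in the Fourier domain to respond with a sharp peak at a marker corner; it illustrates the DCF principle only and is not the thesis implementation.

```python
import numpy as np

def gaussian_response(shape, center, sigma=2.0):
    """Desired output: a small Gaussian peak at the known corner location."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((ys - center[0]) ** 2 + (xs - center[1]) ** 2) / (2 * sigma ** 2))

def train_dcf(template, target, lam=1e-2):
    """Closed-form ridge solution for the (conjugate) filter in the Fourier domain."""
    T = np.fft.fft2(template)
    G = np.fft.fft2(target)
    return (G * np.conj(T)) / (T * np.conj(T) + lam)

def detect(image, filter_conj):
    """Correlate filter and image; the response peak gives the detected location."""
    response = np.fft.ifft2(np.fft.fft2(image) * filter_conj).real
    return np.unravel_index(np.argmax(response), response.shape)

# Usage sketch: learn the filter from one marker patch, then locate the corner.
patch = np.random.rand(64, 64)                             # placeholder marker template
filt = train_dcf(patch, gaussian_response(patch.shape, center=(32, 32)))
peak_row, peak_col = detect(patch, filt)                   # ~ (32, 32) on the training patch
```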

    Face tracking and pose estimation with automatic three-dimensional model construction

    A method for robustly tracking and estimating the face pose of a person using stereo vision is presented. The method is invariant to identity and does not require previous training. A face model is automatically initialised and constructed online: a fixed point distribution is superposed over the face when it is frontal to the cameras, and several appropriate points close to those locations are chosen for tracking. Using the stereo correspondence of the cameras, the three-dimensional (3D) coordinates of these points are extracted and the 3D model is created. The 2D projections of the model points are tracked separately on the left and right images using SMAT. RANSAC and POSIT are used for 3D pose estimation. Head rotations of up to ±45° are correctly estimated. The approach runs in real time. The method is intended to serve as the basis of a driver monitoring system and has been tested on sequences recorded in a moving car. Ministerio de Educación y Ciencia; Comunidad de Madrid.
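    The pose-estimation step combines RANSAC with POSIT on tracked 2D/3D point correspondences. As a hedged stand-in, the sketch below uses OpenCV's solvePnPRansac, which bundles robust outlier rejection with pose estimation; the correspondences are synthetic placeholders, and this is not the paper's implementation.

```python
import numpy as np
import cv2

# Synthetic 3D facial feature points (head-centred frame) and a known camera pose,
# used here only to generate consistent 2D projections for the illustration.
model_points_3d = np.random.default_rng(0).uniform(-0.1, 0.1, (30, 3)).astype(np.float32)
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)                   # assume an undistorted image
true_rvec = np.array([0.0, 0.3, 0.0])       # roughly 17 degrees of yaw
true_tvec = np.array([0.0, 0.0, 0.6])       # 60 cm in front of the camera
image_points_2d, _ = cv2.projectPoints(model_points_3d, true_rvec, true_tvec,
                                       camera_matrix, dist_coeffs)

# Robust head-pose estimation from the tracked 2D/3D correspondences.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    model_points_3d, image_points_2d, camera_matrix, dist_coeffs,
    reprojectionError=3.0)
if ok:
    rotation_matrix, _ = cv2.Rodrigues(rvec)   # head orientation, e.g. yaw up to ±45°
```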

    Vision-Based 2D and 3D Human Activity Recognition


    Adaptive detection and tracking using multimodal information

    This thesis describes work on fusing data from multiple sources of information and focuses on two main areas: adaptive detection and adaptive object tracking in automated vision scenarios. The work on adaptive object detection explores a new paradigm in dynamic parameter selection, in which detection thresholds are chosen to maximise agreement between pairs of sources. Object tracking, a technique complementary to object detection, is also explored in a multi-source context, and an efficient framework for robust tracking, termed the Spatiogram Bank tracker, is proposed as a means to overcome the difficulties of traditional histogram tracking. In addition to a theoretical analysis of the proposed methods, specific example applications are given for both the detection and the tracking aspects, using thermal infrared and visible-spectrum video data as well as other multi-modal information sources.
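    A minimal sketch of the threshold-selection idea described above, assuming that agreement between two sources is scored by the Jaccard overlap of their binary detection masks; the scoring choice and the simple grid search are illustrative assumptions, not the thesis method.

```python
import numpy as np

def jaccard(mask_a, mask_b):
    """Overlap score between two binary detection masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0

def select_thresholds(thermal, visible, candidates=np.linspace(0.1, 0.9, 17)):
    """Grid-search the per-source threshold pair whose foreground masks agree the most."""
    best = (0.0, None, None)
    for t_thermal in candidates:
        mask_t = thermal > t_thermal
        for t_visible in candidates:
            mask_v = visible > t_visible
            score = jaccard(mask_t, mask_v)
            if score > best[0]:
                best = (score, t_thermal, t_visible)
    return best  # (agreement, thermal threshold, visible threshold)
```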

    Spatial Pyramid Context-Aware Moving Object Detection and Tracking for Full Motion Video and Wide Aerial Motion Imagery

    A robust and fast automatic moving object detection and tracking system is essential to characterize target objects and extract spatial and temporal information for different functionalities, including video surveillance, urban traffic monitoring and navigation, and robotics. In this dissertation, I present a collaborative Spatial Pyramid Context-aware moving object detection and Tracking (SPCT) system. The proposed visual tracker is composed of one master tracker, which usually relies on visual object features, and two auxiliary trackers based on object temporal motion information that are called dynamically to assist the master tracker. SPCT utilizes image spatial context at different levels to make the video tracking system resistant to occlusion and background noise and to improve target localization accuracy and robustness. We use a pre-selected set of seven complementary feature channels, including RGB color, intensity and a spatial pyramid of HoG, to encode object color, shape and spatial layout information. We exploit the integral histogram as a building block to meet the demands of real-time performance. A novel fast algorithm is presented to accurately evaluate spatially weighted local histograms in constant time complexity using an extension of the integral histogram method. Different techniques are explored to efficiently compute the integral histogram on GPU architectures and are applied to fast spatio-temporal median computations and 3D face reconstruction texturing. We propose a multi-component framework based on the semantic fusion of motion information with a projected building footprint map to significantly reduce the false alarm rate in urban scenes with many tall structures. Experiments on the extensive VOTC2016 benchmark dataset and on aerial video confirm that combining complementary tracking cues in an intelligent fusion framework enables persistent tracking for Full Motion Video and Wide Aerial Motion Imagery. (PhD dissertation, 162 pages.)
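    As an illustration of the integral-histogram building block mentioned above, the hedged sketch below accumulates per-bin counts in one pass and then returns the histogram of any axis-aligned rectangle from four lookups, which is what makes dense local-histogram evaluation fast; the bin count and data layout are illustrative choices, not the dissertation's implementation.

```python
import numpy as np

def build_integral_histogram(image, n_bins=16):
    """image: 2D uint8 array -> cumulative per-bin counts of shape (H+1, W+1, n_bins)."""
    bins = (image.astype(np.int32) * n_bins) // 256        # quantize intensities to bins
    one_hot = np.eye(n_bins, dtype=np.int64)[bins]         # (H, W, n_bins) indicator volume
    integral = np.zeros((image.shape[0] + 1, image.shape[1] + 1, n_bins), dtype=np.int64)
    integral[1:, 1:] = one_hot.cumsum(axis=0).cumsum(axis=1)
    return integral

def region_histogram(integral, top, left, bottom, right):
    """Histogram of image[top:bottom, left:right] from four corner lookups."""
    return (integral[bottom, right] - integral[top, right]
            - integral[bottom, left] + integral[top, left])

# Usage sketch: one pass to build, then constant-time queries for any window.
frame = (np.random.rand(120, 160) * 255).astype(np.uint8)  # placeholder frame
ih = build_integral_histogram(frame)
hist = region_histogram(ih, 10, 20, 50, 80)                # histogram of a 40x60 window
```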