96 research outputs found


    Human motion analysis is one of the most active research topics in Computer Vision and is receiving increasing attention from both the industrial and scientific communities. This growing interest is motivated by the increasing number of promising applications, ranging from surveillance, human–computer interaction and virtual reality to healthcare, sports, computer games and video conferencing. The aim of this thesis is to give an overview of the various tasks involved in visual motion analysis of the human body and to present the related issues and possible solutions. In this thesis, visual motion analysis is categorized into three major areas related to the interpretation of human motion: tracking of human motion using a virtual pan-tilt-zoom (vPTZ) camera, recognition of human motions, and segmentation of human behaviors. In the field of human motion tracking, a virtual environment for PTZ cameras (vPTZ) is presented to overcome the mechanical limitations of PTZ cameras. The vPTZ is built on equirectangular images acquired by 360° cameras and allows not only the development of pedestrian tracking algorithms but also the comparison of their performance. On the basis of this virtual environment, three novel pedestrian tracking algorithms for 360° cameras were developed, two of which adopt a tracking-by-detection approach while the third adopts a Bayesian approach. The action recognition problem is addressed by an algorithm that represents actions in terms of multinomial distributions of frequent sequential patterns of different lengths. Frequent sequential patterns are series of data descriptors that occur many times in the data. The proposed method learns a codebook of frequent sequential patterns by means of an apriori-like algorithm. An action is then represented with a Bag-of-Frequent-Sequential-Patterns approach. In the last part of this thesis, a methodology is presented to semi-automatically annotate behavioral data given a small set of manually annotated data. The resulting methodology is not only effective in the semi-automated annotation task but can also be used in the presence of abnormal behaviors, as demonstrated empirically by testing the system on data collected from children affected by neuro-developmental disorders.

    Deep learning-based anomalous object detection system powered by microcontroller for PTZ cameras

    Automatic video surveillance systems are usually designed to detect anomalous objects that are present in a scene or behaving dangerously. To perform adequately, they must incorporate models able to achieve accurate pattern recognition in an image, and deep learning neural networks excel at this task. However, an exhaustive scan of the full image yields many image blocks or windows to analyze, which can make the system very slow when implemented on low-cost devices. This paper presents a system that attempts to detect abnormal moving objects within an area covered by a PTZ camera while it is panning. The choice of the image block to analyze is based on a mixture distribution composed of two components: a uniform probability distribution, which represents a blind random selection, and a mixture of Gaussian probability distributions. The Gaussian distributions represent windows in the image where anomalous objects were previously detected, and they steer the next window to analyze toward those windows of interest. The system is implemented on a Raspberry Pi microcontroller-based board, which enables the design and implementation of a low-cost monitoring system that is able to perform image processing. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
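
The window selection described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `sample_window_center`, the `uniform_weight` and `sigma` parameters, the equal weighting of past detections, and the clipping to the image bounds are all assumptions made for illustration.

```python
import random


def sample_window_center(width, height, past_detections,
                         uniform_weight=0.5, sigma=20.0, rng=None):
    """Draw the centre (x, y) of the next window to analyse.

    past_detections: list of (x, y) centres of windows where anomalous
    objects were found earlier (hypothetical representation).
    """
    rng = rng or random.Random()
    # Uniform component: blind random selection over the whole image.
    # It is also the fallback when nothing has been detected yet.
    if not past_detections or rng.random() < uniform_weight:
        return rng.uniform(0, width), rng.uniform(0, height)
    # Gaussian-mixture component: pick one earlier detection (equal
    # weights assumed) and sample around it with an isotropic Gaussian.
    cx, cy = rng.choice(past_detections)
    x = min(max(rng.gauss(cx, sigma), 0.0), width)
    y = min(max(rng.gauss(cy, sigma), 0.0), height)
    return x, y
```

A larger `uniform_weight` favours exploration of the whole frame, while a smaller one concentrates the analysis near earlier detections.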

    Long Range Automated Persistent Surveillance

    This dissertation addresses long range automated persistent surveillance with focus on three topics: sensor planning, size preserving tracking, and high magnification imaging. Sufficient overlapped field of view should be reserved so that camera handoff can be executed successfully before the object of interest becomes unidentifiable or untraceable. We design a sensor planning algorithm that not only maximizes coverage but also ensures uniform and sufficient overlap between the cameras' fields of view for an optimal handoff success rate. This algorithm works for environments with multiple dynamic targets using different types of cameras. Significantly improved handoff success rates are illustrated via experiments using floor plans of various scales. Size preserving tracking automatically adjusts the camera's zoom for a consistent view of the object of interest. Target scale estimation is carried out based on the paraperspective projection model, which compensates for the center offset and considers system latency and tracking errors. A computationally efficient foreground segmentation strategy, 3D affine shapes, is proposed. The 3D affine shapes feature direct and real-time implementation and improved flexibility in accommodating the target's 3D motion, including off-plane rotations. The effectiveness of the scale estimation and foreground segmentation algorithms is validated via both offline and real-time tracking of pedestrians at various resolution levels. Face image quality assessment and enhancement compensate for the degradation in face recognition rates caused by high system magnifications and long observation distances. A class of adaptive sharpness measures is proposed to evaluate and predict this degradation. A wavelet based enhancement algorithm with automated frame selection is developed and proves effective, yielding a considerably higher face recognition rate for severely blurred long range face images.

    A dataset of annotated omnidirectional videos for distancing applications

    Omnidirectional (or 360°) cameras are acquisition devices that, in the next few years, could have a big impact on video surveillance applications, research, and industry, as they can record a spherical view of a whole environment from every perspective. This paper presents two new contributions to the research community: the CVIP360 dataset, an annotated dataset of 360° videos for distancing applications, and a new method to estimate the distances of objects in a scene from a single 360° image. The CVIP360 dataset includes 16 videos acquired outdoors and indoors, annotated with the pedestrians in the scene (bounding boxes) and with the distances to the camera of some points in the 3D world, obtained by using markers at fixed and known intervals. The proposed distance estimation algorithm is based on geometric facts regarding the acquisition process of the omnidirectional device, and is uncalibrated in practice: the only required parameter is the camera height. The proposed algorithm was tested on the CVIP360 dataset, and empirical results demonstrate that the estimation error is negligible for distancing applications.
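
To illustrate how a distance can follow from the camera height alone, here is a minimal geometric sketch. It is not necessarily the paper's algorithm. It assumes an upright camera over flat ground and an equirectangular image whose rows map linearly to elevation, with the horizon on the middle row. The function and parameter names are hypothetical.

```python
import math


def ground_distance(row, image_height, camera_height_m):
    """Horizontal distance to a ground point seen at a given image row.

    Assumes rows span elevation +90 deg (row 0) to -90 deg (last row),
    so the horizon lies at image_height / 2.
    """
    # Angle below the horizon at which the ground point is observed.
    depression = (row / image_height - 0.5) * math.pi
    if depression <= 0:
        raise ValueError("row must lie below the horizon")
    # Right triangle: camera height opposite the angle, distance adjacent.
    return camera_height_m / math.tan(depression)
```

Under these assumptions, a point seen 45° below the horizon (three quarters of the way down the image) lies at a distance equal to the camera height.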

    Development of artificial neural network-based object detection algorithms for low-cost hardware devices

    The human brain is the most complex, powerful and versatile learning machine ever known. Consequently, many scientists of various disciplines are fascinated by its structures and information processing methods. Due to the quality and quantity of the information extracted from the sense of sight, images are one of the main information channels used by humans. However, the massive amount of video footage generated nowadays makes it difficult to process those data fast enough manually. Thus, computer vision systems represent a fundamental tool in the extraction of information from digital images, as well as a major challenge for scientists and engineers. This thesis' primary objective is automatic foreground object detection and classification through digital image analysis, using artificial neural network-based techniques, specifically designed and optimised to be deployed in low-cost hardware devices. This objective will be complemented by developing individuals' movement estimation methods by using unsupervised learning and artificial neural network-based models. The cited objectives have been addressed through research work presented in four publications supporting this thesis. The first one was published in the “ICAE” journal in 2018 and consists of a neural network-based movement detection system for Pan-Tilt-Zoom (PTZ) cameras deployed in a Raspberry Pi board. The second one was published in the “WCCI” conference in 2018 and consists of a deep learning-based automatic video surveillance system for PTZ cameras deployed in low-cost hardware. The third one was published in the “ICAE” journal in 2020 and consists of an anomalous foreground object detection and classification system for panoramic cameras, based on deep learning and supported by low-cost hardware. Finally, the fourth work was published in the “WCCI” conference in 2020 and consisted of an individuals' position estimation algorithm based on a novel neural network model for environments with forbidden regions, named “Forbidden Regions Growing Neural Gas”.

    Development of an Active Vision System for the Remote Identification of Multiple Targets

    This thesis introduces a centralized active vision system for the remote identification of multiple targets in applications where the targets may outnumber the active system resources. Design and implementation details of a modular active vision system are presented, from which a prototype has been constructed. The system employs two different, yet complementary, camera technologies. Omnidirectional cameras are used to detect and track targets at a low resolution, while perspective cameras mounted to pan-tilt stages are used to acquire high resolution images suitable for identification. Five greedy-based scheduling policies have been developed and implemented to manage the active system resources in an attempt to achieve optimal target-to-camera assignments. System performance has been evaluated using both simulated and real-world experiments under different target and system configurations for all five scheduling policies. Parameters affecting performance that were considered include target entry conditions, congestion levels, target-to-camera speeds, target trajectories, and number of active cameras. An overall trend in the relative performance of the scheduling algorithms was observed. The Least System Reconfiguration and Future Least System Reconfiguration scheduling policies performed the best for the majority of conditions investigated, while the Load Sharing and First Come First Serve policies performed the poorest. The performance of the Earliest Deadline First policy was seen to be highly dependent on target predictability.
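
As a rough illustration of a greedy target-to-camera assignment, the sketch below implements a generic Earliest Deadline First rule. The abstract does not describe how the thesis implements its policies, so this is only a textbook-style sketch. The function name, the `targets` mapping from target name to deadline, and the reading of a deadline as the time a target is expected to leave the monitored area are assumptions.

```python
def earliest_deadline_first(targets, cameras):
    """Assign free cameras to the targets whose deadlines come first.

    targets: dict mapping a target name to its deadline (for example the
             predicted time at which it leaves the monitored area).
    cameras: list of identifiers of currently free active cameras.
    Returns a dict mapping camera identifier to target name.
    """
    # Order targets from the most urgent to the least urgent.
    ordered = sorted(targets, key=lambda name: targets[name])
    # zip stops at the shorter sequence, so when targets outnumber
    # cameras the least urgent targets are left waiting.
    return dict(zip(cameras, ordered))


assignment = earliest_deadline_first(
    {"a": 5.0, "b": 2.0, "c": 9.0}, ["cam0", "cam1"]
)
# assignment == {"cam0": "b", "cam1": "a"}
```

Because the rule relies on predicted deadlines, its quality depends on how predictable the targets are, which matches the observation reported in the abstract.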

    MadEye: Boosting Live Video Analytics Accuracy with Adaptive Camera Configurations

    Camera orientations (i.e., rotation and zoom) govern the content that a camera captures in a given scene, which in turn heavily influences the accuracy of live video analytics pipelines. However, existing analytics approaches leave this crucial adaptation knob untouched, instead opting to only alter the way that captured images from fixed orientations are encoded, streamed, and analyzed. We present MadEye, a camera-server system that automatically and continually adapts orientations to maximize accuracy for the workload and resource constraints at hand. To realize this using commodity pan-tilt-zoom (PTZ) cameras, MadEye embeds (1) a search algorithm that rapidly explores the massive space of orientations to identify a fruitful subset at each time, and (2) a novel knowledge distillation strategy to efficiently (with only camera resources) select the ones that maximize workload accuracy. Experiments on diverse workloads show that MadEye boosts accuracy by 2.9-25.7% for the same resource usage, or achieves the same accuracy with 2-3.7x lower resource costs. Comment: 19 pages, 16 figures

    3D Modelling for Improved Visual Traffic Analytics

    Advanced Traffic Management Systems utilize diverse types of sensor networks with the goal of improving mobility and safety of transportation systems. These systems require information about the state of the traffic configuration, including volume, vehicle speed, density, and incidents, which are useful in applications such as urban planning, collision avoidance systems, and emergency vehicle notification systems, to name a few. Sensing technologies are an important part of Advanced Traffic Management Systems that enable the estimation of the traffic state. Inductive Loop Detectors are often used to sense vehicles on highway roads. Although this technology has proven to be effective, it has limitations. Their installation and replacement cost is high and causes traffic disruptions, and their sensing modality provides very limited information about the vehicles being sensed. No vehicle appearance information is available. Traffic camera networks are also used in advanced traffic monitoring centers where the cameras are controlled by a remote operator. The amount of visual information provided by such cameras can be overwhelmingly large, which may cause the operators to miss important traffic events happening in the field. This dissertation focuses on visual traffic surveillance for Advanced Traffic Management Systems. The focus is on the research and development of computer vision algorithms that contribute to the automation of highway traffic analytics systems that require estimates of traffic volume and density. This dissertation makes three contributions: The first contribution is an integrated vision surveillance system called 3DTown, where cameras installed at a university campus together with algorithms are used to produce vehicle and pedestrian detections to augment a 3D model of the university with dynamic information from the scene. 
A second major contribution is a technique for extracting road lines from highway images, which are then used to estimate the tilt angle and the focal length of the camera. This technique is useful when the operator changes the camera pose. The third major contribution is a method to automatically extract the active road lanes and model the vehicles in 3D to improve the vehicle count estimation by separating 2D segments of imaged vehicles that have been merged due to occlusions.

    Efficient resource allocation for automotive active vision systems

    Individual mobility on roads has a noticeable impact upon people's lives, including traffic accidents resulting in severe or even lethal injuries. Therefore the main goal when operating a vehicle is to participate safely in road traffic while minimising the adverse effects on our environment. This goal is pursued by road safety measures ranging from safety-oriented road design to driver assistance systems. The latter require exteroceptive sensors to acquire information about the vehicle's current environment. In this thesis an efficient resource allocation for automotive vision systems is proposed. The notion of allocating resources implies the presence of processes that observe the whole environment and that are able to efficiently direct attentive processes. Directing attention constitutes a decision making process dependent upon the environment it operates in, the goal it pursues, and the sensor resources and computational resources it allocates. The sensor resources considered in this thesis are a subset of the multi-modal sensor system on a test vehicle provided by Audi AG, which is also used to evaluate our proposed resource allocation system. This thesis presents an original contribution in three respects. First, a system architecture is proposed that is designed to efficiently allocate both high-resolution sensor resources and computationally expensive processes based upon low-resolution sensor data. Second, a novel method to estimate 3-D range motion, efficient scan-patterns for spin image based classifiers, and an evaluation of track-to-track fusion algorithms constitute contributions in the field of data processing methods. Third, a Pareto efficient multi-objective resource allocation method is formalised, implemented, and evaluated using road traffic test sequences.
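
The notion of Pareto efficiency mentioned in this abstract can be illustrated with a minimal sketch. It shows only the generic non-dominated filter, not the thesis's allocation method. It assumes each candidate allocation is scored by a tuple of objectives that are all to be minimised. The function names and the example scores are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical (sensor cost, compute cost) scores of four allocations.
front = pareto_front([(1, 5), (2, 2), (3, 3), (5, 1)])
# front == [(1, 5), (2, 2), (5, 1)]; (3, 3) is dominated by (2, 2)
```

Any allocation left on the front can only improve one objective by worsening another, which is the trade-off a multi-objective allocator has to resolve.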