
    Vanishing point detection for visual surveillance systems in railway platform environments

    © 2018 Elsevier B.V. Visual surveillance is of paramount importance in public spaces and especially in train and metro platforms which are particularly susceptible to many types of crime from petty theft to terrorist activity. Image resolution of visual surveillance systems is limited by a trade-off between several requirements such as sensor and lens cost, transmission bandwidth and storage space. When image quality cannot be improved using high-resolution sensors, high-end lenses or IR illumination, the visual surveillance system may need to increase the resolving power of the images by software to provide accurate outputs such as, in our case, vanishing points (VPs). Despite having numerous applications in camera calibration, 3D reconstruction and threat detection, a general method for VP detection has remained elusive. Rather than attempting the infeasible task of VP detection in general scenes, this paper presents a novel method that is fine-tuned to work for railway station environments and is shown to outperform the state-of-the-art for that particular case. In this paper, we propose a three-stage approach to accurately detect the main lines and vanishing points in low-resolution images acquired by visual surveillance systems in indoor and outdoor railway platform environments. First, several frames are used to increase the resolving power through a multi-frame image enhancer. Second, an adaptive edge detection is performed and a novel line clustering algorithm is then applied to determine the parameters of the lines that converge at VPs; this is based on statistics of the detected lines and heuristics about the type of scene. Finally, vanishing points are computed via a voting system to optimize detection in an attempt to omit spurious lines. The proposed approach is very robust since it is not affected by ever-changing illumination and weather conditions of the scene, and it is immune to vibrations. 
Accurate and reliable vanishing point detection provides valuable information that can aid camera calibration, automatic scene understanding, scene segmentation, semantic classification or augmented reality in platform environments.
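The final voting stage can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the line clustering has already produced candidate lines, treats every pairwise intersection as a candidate vanishing point, and lets each line vote for candidates it passes close to. The function names and the pixel tolerance `tol` are hypothetical choices made for this illustration.

```python
from itertools import combinations
import math


def line_through(p, q):
    """Homogeneous line (a, b, c), with a*x + b*y + c = 0, through points p and q."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def intersection(l1, l2):
    """Intersection of two homogeneous lines, or None if they are (nearly) parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    w = a1 * b2 - a2 * b1
    if abs(w) < 1e-9:
        return None
    return ((b1 * c2 - b2 * c1) / w, (c1 * a2 - c2 * a1) / w)


def distance(line, point):
    """Perpendicular distance from a point to a homogeneous line."""
    a, b, c = line
    return abs(a * point[0] + b * point[1] + c) / math.hypot(a, b)


def vote_vanishing_point(lines, tol=2.0):
    """Return (candidate, votes) for the intersection supported by the most lines.

    tol is an assumed distance threshold in pixels; spurious lines that pass
    far from the winning candidate simply contribute no vote to it.
    """
    best, best_votes = None, 0
    for l1, l2 in combinations(lines, 2):
        cand = intersection(l1, l2)
        if cand is None:
            continue
        votes = sum(1 for l in lines if distance(l, cand) <= tol)
        if votes > best_votes:
            best, best_votes = cand, votes
    return best, best_votes
```

Testing every pair costs roughly cubic time in the number of lines, which is acceptable for the few dozen lines a platform scene typically yields but would need sampling for larger sets.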

    An investigation into common challenges of 3D scene understanding in visual surveillance

    Nowadays, video surveillance systems are ubiquitous. Most installations simply consist of CCTV cameras connected to a central control room and rely on human operators to interpret what they see on the screen in order to, for example, detect a crime (either during or after an event). Some modern computer vision systems aim to automate the process, at least to some degree, and various algorithms have been somewhat successful in certain limited areas. However, such systems remain inefficient in general circumstances and present real challenges yet to be solved. These challenges include the ability to recognise and ultimately predict and prevent abnormal behaviour or even reliably recognise objects, for example in order to detect left luggage or suspicious objects. This thesis first aims to study the state-of-the-art and identify the major challenges and possible requirements of future automated and semi-automated CCTV technology in the field. This thesis presents the application of a suite of 2D and highly novel 3D methodologies that go some way to overcome current limitations. The methods presented here are based on the analysis of object features directly extracted from the geometry of the scene and start with a consideration of mainly existing techniques, such as the use of lines, vanishing points (VPs) and planes, applied to real scenes. Then, an investigation is presented into the use of richer 2.5D/3D surface normal data. In all cases the aim is to combine both 2D and 3D data to obtain a better understanding of the scene, aimed ultimately at capturing what is happening within the scene in order to be able to move towards automated scene analysis. Although this thesis focuses on the widespread application of video surveillance, an example case of the railway station environment is used to represent typical real-world challenges, where the principles can be readily extended elsewhere, such as to airports, motorways, households, shopping malls etc. 
The context of this research work, together with an overall presentation of existing methods used in video surveillance and their challenges, is described in chapter 1. Common computer vision techniques such as VP detection, camera calibration, 3D reconstruction, segmentation etc. can be applied in an effort to extract meaning for video surveillance applications. According to the literature, these methods have been well researched, and their use will be assessed in the context of current surveillance requirements in chapter 2. While existing techniques can perform well in some contexts, such as an architectural environment composed of simple geometrical elements, their robustness and performance in feature extraction and object recognition tasks are not sufficient to solve the key challenges encountered in a general video surveillance context. This is largely due to issues such as variable lighting, weather conditions and shadows, and to the general complexity of the real-world environment. Chapter 3 presents the research and contribution on those topics (methods to extract optimal features for a specific CCTV application) as well as their strengths and weaknesses, to highlight that the proposed algorithm obtains better results than most due to its specific design. The comparison of current surveillance systems and methods from the literature has shown that 2D data are, however, used almost constantly in many applications. Indeed, industrial systems as well as the research community have been intensively improving 2D feature extraction methods ever since image analysis and scene understanding became of interest. The constant progress on these methods throughout the years makes 2D feature extraction almost effortless nowadays, thanks to a large variety of techniques. Moreover, even if 2D data do not allow solving all challenges in video surveillance or other applications, they are still used as starting stages towards scene understanding and image analysis. 
Chapter 4 will then explore 2D feature extraction via vanishing point detection and segmentation methods. A combination of the most common techniques and a novel approach will then be proposed to extract vanishing points from video surveillance environments. Moreover, segmentation techniques will be explored with the aim of determining how they can be used to complement vanishing point detection and lead towards 3D data extraction and analysis. In spite of the contribution above, 2D data is insufficient for all but the simplest applications aimed at obtaining an understanding of a scene, where the aim is a robust detection of, say, left luggage or abnormal behaviour without significant a priori information about the scene geometry. Therefore, more information is required in order to be able to design a more automated and intelligent algorithm to obtain richer information from the scene geometry and so a better understanding of what is happening within. This can be overcome by the use of 3D data (in addition to 2D data), which allows object “classification” and, from this, the inference of a map of functionality describing feasible and unfeasible object functionality in a given environment. Chapter 5 presents how 3D data can be beneficial for this task and the various solutions investigated to recover 3D data, as well as some preliminary work towards plane extraction. It is apparent that VPs and planes give useful information about a scene’s perspective and can assist in 3D data recovery within a scene. However, neither VPs nor plane detection techniques alone allow the recovery of more complex generic object shapes (for example, those composed of spheres, cylinders etc.), and any simple model will suffer in the presence of non-Manhattan features, e.g. those introduced by the presence of an escalator. For this reason, a novel photometric stereo-based surface normal retrieval methodology is introduced to capture the 3D geometry of the whole scene or part of it. 
Chapter 6 describes how photometric stereo allows recovery of 3D information in order to obtain a better understanding of a scene, as well as partially overcoming some current surveillance challenges, such as difficulty in resolving fine detail, particularly at large standoff distances, and in isolating and recognising more complex objects in real scenes. Here items of interest may be obscured by complex environmental factors that are subject to rapid change, making, for example, the detection of suspicious objects and behaviour highly problematic. Here innovative use is made of an untapped latent capability offered within modern surveillance environments to introduce a form of environmental structuring to good advantage in order to achieve a richer form of data acquisition. This chapter also goes on to explore the novel application of photometric stereo in such diverse applications, shows how our algorithm can be incorporated into an existing surveillance system, and considers a typical real commercial application. One of the most important aspects of this research work is its application. Indeed, while most of the research literature has been based on relatively simple structured environments, the approach here has been designed to be applied to real surveillance environments, such as railway stations, airports, waiting rooms, etc., where surveillance cameras may be fixed or, in the future, form part of a mobile, free-roaming robotic surveillance device that must continually reinterpret its changing environment. So, as mentioned previously, while the main focus has been to apply this algorithm to railway station environments, the work has been approached in a way that allows adaptation to many other applications, such as autonomous robotics, and in motorway, shopping centre, street and home environments. All of these applications require a better understanding of the scene for security or safety purposes. 
Finally, chapter 7 presents an overall conclusion and outlines what will be achieved in the future.
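As background for the surface normal retrieval mentioned above, the classic three-light Lambertian form of photometric stereo can be sketched as follows. This is the textbook formulation and not the thesis's novel methodology: it assumes a Lambertian surface, three known unit light directions, no shadows, and per-pixel intensity I_k = albedo * dot(L_k, n). All function names are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, v):
    """Solve m * g = v for g by Cramer's rule (adequate for a single 3x3 system)."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("light directions are (nearly) coplanar")
    solution = []
    for col in range(3):
        replaced = [[v[r] if c == col else m[r][c] for c in range(3)]
                    for r in range(3)]
        solution.append(det3(replaced) / d)
    return solution


def normal_and_albedo(lights, intensities):
    """Recover the unit surface normal and albedo at one pixel.

    lights: three unit light-direction vectors (rows of L).
    intensities: the pixel's brightness under each light.
    Solves L * g = I, where g = albedo * n under the Lambertian assumption.
    """
    g = solve3(lights, intensities)
    albedo = math.sqrt(sum(x * x for x in g))
    if albedo == 0.0:
        return (0.0, 0.0, 0.0), 0.0
    return tuple(x / albedo for x in g), albedo
```

With more than three lights the same system is usually solved per pixel in the least-squares sense, which also gives some robustness to noise.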

    Revisión de algoritmos, métodos y técnicas para la detección de UAVs y UAS en aplicaciones de audio, radiofrecuencia y video

    Unmanned Aerial Vehicles (UAVs), also known as drones, have evolved exponentially in recent times, due in large part to advances in the technologies that underpin them. This has resulted in increasingly affordable and better-equipped devices, which has led to their application in new fields such as agriculture, transport, monitoring, and aerial photography. However, drones have also been used in terrorist acts, privacy violations, and espionage, and have caused involuntary accidents in high-risk zones such as airports. In response to these events, multiple technologies have been introduced to control and monitor the airspace in order to ensure protection in risk areas. This paper is a review of the state of the art of the techniques, methods, and algorithms used in video, radiofrequency, and audio-based applications to detect UAVs and Unmanned Aircraft Systems (UAS). This study can serve as a starting point to develop future drone detection systems with the most convenient technologies that meet certain requirements of optimal scalability, portability, reliability, and availability.

    Autonomous Quadrotor Navigation by Detecting Vanishing Points in Indoor Environments

    Toward the ambitious long-term goal of a fleet of cooperating Flexible Autonomous Machines operating in an uncertain Environment (FAME), this thesis addresses various perception and control problems in autonomous aerial robotics. The objective of this thesis is to motivate the use of perspective cues in single images for the planning and control of quadrotors in indoor environments. In addition to providing empirical evidence for the abundance of such cues in indoor environments, the usefulness of these perspective cues is demonstrated by designing a control algorithm for navigating a quadrotor in indoor corridors. An Extended Kalman Filter (EKF), implemented on top of the vision algorithm, serves to improve the robustness of the algorithm to changing illumination. In this thesis, vanishing points are the perspective cues used to control and navigate a quadrotor in an indoor corridor. Indoor corridors are an abundant source of parallel lines. As a consequence of perspective projection, parallel lines in the real world that are not parallel to the image plane of the camera intersect at a point in the image. This point is called the vanishing point of the image. The vanishing point is sensitive to the lateral motion of the camera and hence of the quadrotor. By tracking the position of the vanishing point in every image frame, the quadrotor can navigate along the center of the corridor. Experiments are conducted using the Augmented Reality (AR) Drone 2.0. The drone is equipped with the following components: (1) a 720p forward-facing camera for vanishing point detection, (2) a 240p downward-facing camera, (3) an Inertial Measurement Unit (IMU) for attitude control, (4) an ultrasonic sensor for estimating altitude, and (5) an on-board 1 GHz processor for processing low-level commands. The reliability of the vision algorithm is demonstrated by flying the drone in indoor corridors. Dissertation/Thesis: Masters Thesis, Electrical Engineering 201
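The idea of steering from the tracked vanishing point can be sketched as below. This is an illustrative simplification and not the thesis's controller: a plain scalar Kalman filter stands in for the EKF, the corridor is assumed straight, and the noise values, the gain, the sign convention and all names are hypothetical.

```python
class ScalarKalman:
    """One-dimensional Kalman filter with a constant-state model.

    Used here only to smooth the vanishing point's x-coordinate; q and r are
    assumed process and measurement noise variances, not values from the thesis.
    """

    def __init__(self, x0, p0=1.0, q=0.01, r=4.0):
        self.x, self.p, self.q, self.r = x0, p0, q, r

    def update(self, z):
        self.p += self.q                  # predict: state unchanged, uncertainty grows
        k = self.p / (self.p + self.r)    # Kalman gain
        self.x += k * (z - self.x)        # correct with the new measurement z
        self.p *= (1.0 - k)
        return self.x


def lateral_command(vp_x, image_width, gain=0.5):
    """Proportional lateral command in [-1, 1].

    The command is zero when the vanishing point sits at the image centre and
    positive when it lies to the right; how this maps to a roll or yaw input
    depends on the platform and is left open here.
    """
    half = image_width / 2.0
    offset = (vp_x - half) / half
    return max(-1.0, min(1.0, gain * offset))
```

In use, each frame's detected vanishing point x-coordinate would be passed through `update` and the smoothed value fed to `lateral_command`.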

    Assessment of Automated Crowd Behaviour Analysis Based on Optical Flow

    In visual surveillance, camera streams are often used to keep an eye on dense crowds. The examination of this data is mostly done manually by observers. When analysing multiple cameras, some assistance is desirable. Computer vision methods can be used to assist observers in detecting crowd behaviours. Methods based on optical flow are particularly interesting since they can examine high-density crowds with clutter and (partial) occlusion without increasing computing costs. Not many methods can detect specific behaviour of dense crowds without the need for a learning stage. One promising method by Solmaz et al. uses the Jacobian stability of the optical flow field in the scene to detect five behaviour patterns, viz. blocking, bottlenecks, fountainheads, rings and lanes. The method is implemented and a demo program is written, with which experiments are performed on several datasets. The detection of three out of the five behaviour patterns turns out to be promising; for the other two, improvements are proposed.
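The stability analysis underlying such a method can be illustrated with the standard classification of a critical point of a 2D flow field from the trace and determinant of its 2x2 Jacobian. This is textbook linear stability theory only; the mapping from stability types to the five crowd behaviours, and the way the Jacobian is estimated from optical flow, are specific to Solmaz et al. and are not reproduced here. The function name and the tolerance `eps` are hypothetical.

```python
def classify_critical_point(jacobian, eps=1e-9):
    """Classify a critical point from its 2x2 Jacobian [[a, b], [c, d]].

    The eigenvalues are roots of s^2 - tr*s + det = 0, so their signs and
    whether they are real or complex follow from tr, det and the discriminant.
    """
    (a, b), (c, d) = jacobian
    tr = a + d
    det = a * d - b * c
    disc = tr * tr - 4.0 * det
    if abs(det) < eps:
        return "degenerate"        # at least one zero eigenvalue
    if det < 0:
        return "saddle"            # real eigenvalues of opposite sign
    if abs(tr) < eps:
        return "center"            # purely imaginary eigenvalues
    kind = "node" if disc >= 0 else "spiral"
    return ("stable " if tr < 0 else "unstable ") + kind
```

For instance, a flow converging on a point from all sides gives a Jacobian such as [[-1, 0], [0, -1]], which this function labels a stable node.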

    VIDEO FOREGROUND LOCALIZATION FROM TRADITIONAL METHODS TO DEEP LEARNING

    These days, detection of Visual Attention Regions (VAR), such as moving objects, has become an integral part of many computer vision applications, viz. pattern recognition, object detection and classification, video surveillance, autonomous driving, human-machine interaction (HMI), and so forth. Moving object identification using bounding boxes has matured to the level of localizing the objects along their rigid borders, a process called foreground localization (FGL). Over the decades, many image segmentation methodologies have been well studied, devised, and extended to suit video FGL. Despite that, video foreground (FG) segmentation remains an intriguing yet appealing problem owing to its ill-posed nature and myriad applications. Maintaining spatial and temporal coherence, particularly at object boundaries, remains challenging and computationally burdensome. It gets even harder when the background is dynamic, as with swaying tree branches or a shimmering body of water, when there are illumination variations or shadows cast by the moving objects, or when the video sequences have jittery frames caused by vibrating or unstable camera mounts on a surveillance post or moving robot. At the same time, in the analysis of traffic flow or human activity, the performance of an intelligent system depends substantially on how robustly it localizes the VAR, i.e., the FG. To this end, the natural question arises: what is the best way to deal with these challenges? Thus, the goal of this thesis is to investigate plausible real-time performant implementations, from traditional approaches to modern-day deep learning (DL) models, for FGL that can be applied to many video content-aware applications (VCAA). It focuses mainly on improving existing methodologies by harnessing multimodal spatial and temporal cues for a delineated FGL. 
The first part of the dissertation is dedicated to enhancing conventional sample-based and Gaussian mixture model (GMM)-based video FGL using a probability mass function (PMF), temporal median filtering, the fusion of CIEDE2000 color similarity, color distortion, and illumination measures, and the selection of an appropriate adaptive threshold to extract the FG pixels. Subjective and objective evaluations are carried out to show the improvements over a number of similar conventional methods. The second part of the thesis focuses on exploiting and improving deep convolutional neural networks (DCNN) for the same problem. Consequently, three models akin to an encoder-decoder (EnDec) network are implemented with various innovative strategies to improve the quality of the FG segmentation. The strategies include, but are not limited to, double encoding with slow decoding feature learning, multi-view receptive field feature fusion, and incorporating spatiotemporal cues through long short-term memory (LSTM) units in both the subsampling and upsampling subnetworks. Experimental studies are carried out thoroughly on all conditions, from baselines to challenging video sequences, to prove the effectiveness of the proposed DCNNs. The analysis demonstrates their architectural efficiency over other methods, while quantitative and qualitative experiments show the competitive performance of the proposed models compared to the state-of-the-art.
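Of the conventional ingredients listed above, temporal median filtering is the simplest to illustrate. The sketch below is not the dissertation's pipeline: it assumes small grayscale frames stored as nested lists, builds the background as the per-pixel median over a frame buffer, and uses a fixed threshold as a stand-in for the adaptive one. Function names and the threshold value are hypothetical.

```python
from statistics import median


def median_background(frames):
    """Per-pixel temporal median over a list of equally sized grayscale frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(width)]
            for y in range(height)]


def foreground_mask(frame, background, threshold=25):
    """Mark pixels whose absolute difference from the background exceeds threshold.

    A fixed threshold is an assumption made for brevity; the text describes
    picking an adaptive threshold instead.
    """
    return [[abs(p - b) > threshold for p, b in zip(row, bg_row)]
            for row, bg_row in zip(frame, background)]
```

Because the median ignores values that appear in fewer than half of the buffered frames, an object that passes through briefly does not contaminate the background estimate.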