
    An investigation into common challenges of 3D scene understanding in visual surveillance

    Nowadays, video surveillance systems are ubiquitous. Most installations simply consist of CCTV cameras connected to a central control room and rely on human operators to interpret what they see on screen in order to, for example, detect a crime (either during or after an event). Some modern computer vision systems aim to automate the process, at least to some degree, and various algorithms have been somewhat successful in certain limited areas. However, such systems remain inefficient in general circumstances and present real challenges yet to be solved. These challenges include the ability to recognise, and ultimately predict and prevent, abnormal behaviour, or even to reliably recognise objects, for example in order to detect left luggage or suspicious items. This thesis first aims to study the state of the art and to identify the major challenges and possible requirements of future automated and semi-automated CCTV technology in the field. It then presents the application of a suite of 2D and highly novel 3D methodologies that go some way towards overcoming current limitations.

    The methods presented here are based on the analysis of object features extracted directly from the geometry of the scene. They start with a consideration of mainly existing techniques, such as the use of lines, vanishing points (VPs) and planes, applied to real scenes, and move on to an investigation into the use of richer 2.5D/3D surface normal data. In all cases the aim is to combine 2D and 3D data to obtain a better understanding of the scene, ultimately capturing what is happening within it so as to move towards automated scene analysis. Although this thesis focuses on the widespread application of video surveillance, the railway station environment is used as an example case representing typical real-world challenges; the principles extend readily to other settings, such as airports, motorways, households and shopping malls. The context of this research, together with an overall presentation of existing methods used in video surveillance and their challenges, is described in chapter 1.

    Common computer vision techniques, such as VP detection, camera calibration, 3D reconstruction and segmentation, can be applied in an effort to extract meaning for video surveillance applications. According to the literature, these methods have been well researched, and chapter 2 assesses their use in the context of current surveillance requirements. While existing techniques can perform well in some contexts, such as an architectural environment composed of simple geometrical elements, their robustness and performance in feature extraction and object recognition tasks are not sufficient to solve the key challenges of the general video surveillance context. This is largely due to issues such as variable lighting, weather conditions and shadows, and to the general complexity of real-world environments. Chapter 3 presents the research contribution on these topics, namely methods to extract optimal features for a specific CCTV application, together with their strengths and weaknesses, showing that the proposed algorithm obtains better results than most due to its specific design.

    The comparison of current surveillance systems and methods from the literature shows that 2D data are nevertheless used almost universally. Indeed, industrial systems and the research community alike have been intensively improving 2D feature extraction methods for as long as image analysis and scene understanding have been of interest, and this steady progress means that a large variety of techniques now make 2D feature extraction almost effortless. Moreover, even where 2D data cannot solve all the challenges of video surveillance or other applications, they still serve as the starting stage towards scene understanding and image analysis. Chapter 4 therefore explores 2D feature extraction via vanishing point detection and segmentation methods. A combination of the most common techniques and a novel approach is proposed to extract vanishing points from video surveillance environments, and segmentation techniques are explored with the aim of determining how they can complement vanishing point detection and lead towards 3D data extraction and analysis.

    In spite of the contribution above, 2D data are insufficient for all but the simplest applications aimed at understanding a scene, where the goal is the robust detection of, say, left luggage or abnormal behaviour without significant a priori information about the scene geometry. More information is therefore required in order to design a more automated and intelligent algorithm that obtains richer information from the scene geometry, and so a better understanding of what is happening within it. This can be achieved by the use of 3D data (in addition to 2D data), opening the opportunity for object “classification” and, from this, for inferring a map of functionality describing feasible and unfeasible object functions in a given environment. Chapter 5 presents how 3D data can be beneficial for this task, the various solutions investigated to recover 3D data, and some preliminary work towards plane extraction.

    It is apparent that VPs and planes give useful information about a scene's perspective and can assist in 3D data recovery within a scene. However, neither VP nor plane detection alone allows the recovery of more complex generic object shapes (for example, shapes composed of spheres, cylinders and so on), and any simple model will suffer in the presence of non-Manhattan features, such as those introduced by an escalator. For this reason, a novel photometric stereo-based surface normal retrieval methodology is introduced to capture the 3D geometry of the whole scene or part of it. Chapter 6 describes how photometric stereo allows the recovery of 3D information in order to obtain a better understanding of a scene, while also partially overcoming some current surveillance challenges, such as the difficulty of resolving fine detail, particularly at large standoff distances, and of isolating and recognising more complex objects in real scenes, where items of interest may be obscured by complex, rapidly changing environmental factors that make the detection of suspicious objects and behaviour highly problematic. Here, innovative use is made of an untapped latent capability offered within modern surveillance environments: a form of environmental structuring is introduced to achieve a richer form of data acquisition. This chapter also explores the novel application of photometric stereo across such diverse settings, shows how our algorithm can be incorporated into an existing surveillance system, and considers a typical real commercial application.

    One of the most important aspects of this research is its application. While most of the research literature is based on relatively simple structured environments, the approach here is designed for real surveillance environments, such as railway stations, airports and waiting rooms, where surveillance cameras may be fixed or may, in the future, form part of a free-roaming mobile robotic surveillance device that must continually reinterpret its changing environment. So, while the main focus has been to apply the algorithm to railway station environments, the work has been approached in a way that allows adaptation to many other applications, such as autonomous robotics, and to motorway, shopping centre, street and home environments, all of which require a better understanding of the scene for security or safety purposes. Finally, chapter 7 presents a global conclusion and an outlook on future work.
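    The photometric stereo stage admits a short illustration. Below is a minimal sketch of classical Lambertian photometric stereo, the family of techniques the chapter builds on, assuming k >= 3 grayscale images of a static scene under known, distant light directions; it is an illustrative sketch, not the thesis's actual implementation.

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """Recover per-pixel surface normals and albedo from k >= 3 grayscale
    images of a static scene lit from known, distant directions.

    images:     (k, h, w) array of intensities
    light_dirs: (k, 3) array of unit light directions
    """
    k, h, w = images.shape
    I = images.reshape(k, -1)                           # stack pixels: (k, h*w)
    # Lambertian model: I = L @ g, where g = albedo * normal for each pixel.
    g, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)  # least squares, (3, h*w)
    albedo = np.linalg.norm(g, axis=0)
    normals = g / np.maximum(albedo, 1e-8)              # unit normals
    return normals.T.reshape(h, w, 3), albedo.reshape(h, w)
```

    In a surveillance setting, controllable fixed luminaires could play the role of the calibrated light sources, which is the kind of environmental structuring the thesis exploits.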

    Robotic Cameraman for Augmented Reality based Broadcast and Demonstration

    In recent years, a number of large enterprises have gradually begun to use various augmented reality technologies to markedly improve how audiences view their products. Among these, the creation of an immersive virtual interactive scene through projection has received extensive attention; this technique is referred to as projection SAR, short for projection spatial augmented reality. However, because existing projection-SAR systems are immobile and have a limited working range, they are difficult to adopt in everyday settings. This thesis therefore proposes a technically feasible optimisation scheme so that projection SAR can be practically applied to AR broadcasting and demonstrations.

    Building on the three main techniques required by state-of-the-art projection SAR applications, this thesis presents a novel mobile projection SAR cameraman for AR broadcasting and demonstration. Firstly, by combining a CNN scene-parsing model with multiple contour extractors, the proposed contour extraction pipeline can detect the optimal contour information even in non-HD or blurred images; this algorithm reduces the dependency on high-quality visual sensors and addresses the low contour extraction accuracy in motion-blurred images. Secondly, a plane-based visual mapping algorithm is introduced to overcome the difficulty of visual mapping in low-texture scenarios. Finally, a complete process for designing the projection SAR cameraman robot is introduced, solving three main problems in mobile projection-SAR applications: (i) a new method for marking contours on the projection model is proposed to replace the model rendering process; by combining contour features and geometric features, users can easily identify objects on a colourless model; (ii) a camera initial pose estimation method is developed based on visual tracking algorithms, which registers the robot's start pose to the whole scene in Unity3D; (iii) a novel data transmission approach is introduced to establish a link between the external robot and the robot in the Unity3D simulation workspace, allowing the robotic cameraman to simulate its trajectory in Unity3D and project the correct virtual content.

    Our proposed mobile projection SAR system makes substantial contributions to the academic value and practicality of the existing projection SAR technique. First, it solves the problem of limited working range: when running in a large indoor scene, the system can follow the user and project dynamic interactive virtual content automatically, without requiring additional visual sensors. Second, it creates a more immersive experience for the audience, since it allows the user more body gestures and richer virtual-real interactive play. Lastly, a mobile system requires no up-front infrastructure, is cheaper, and offers the public an innovative option for indoor broadcasting and exhibitions.
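    As a rough sketch of the first contribution's underlying idea, the snippet below restricts a classical contour extractor to a region proposed by a scene-parsing model. The semantic_mask input stands in for the CNN output, and the blur and Canny thresholds are illustrative placeholders rather than the thesis's tuned pipeline.

```python
import cv2

def masked_contour(image_bgr, semantic_mask):
    """Extract the dominant contour inside a binary mask (uint8, 0 or 255)
    predicted by a scene-parsing CNN; returns None if nothing is found."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)        # tolerate motion blur
    edges = cv2.Canny(gray, 50, 150)
    edges = cv2.bitwise_and(edges, semantic_mask)   # keep edges inside the mask
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea) if contours else None
```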

    Automated Semantic Content Extraction from Images

    In this study, an automatic semantic segmentation and object recognition methodology is implemented which bridges the semantic gap between low-level features of image content and high-level conceptual meaning. Semantically understanding an image is essential in modeling autonomous robots, targeting customers in marketing, or reverse engineering of building information modeling in the construction industry. To achieve an understanding of a room from a single image, we propose a new object recognition framework with four major components: segmentation, scene detection, conceptual cueing and object recognition. The new segmentation methodology developed in this research extends Felzenszwalb's cost function to include new surface index and depth features, as well as color, texture and normal features, to overcome the issues of occlusion and shadowing commonly found in images. Adding depth allows new features to be captured for the object recognition stage, achieving high accuracy compared with the current state of the art. The goal was to develop an approach that captures and labels perceptually important regions, which often reflect a global representation and understanding of the image. We developed a system that uses contextual and common-sense information to improve object recognition and scene detection, and fuses information from scene and objects to reduce the level of uncertainty. In addition to improving segmentation, scene detection and object recognition, this study can serve applications that require physical parsing of the image into objects, surfaces and their relations, including robotics, social networking, intelligence and anti-terrorism efforts, criminal investigations and security, marketing, and building information modeling in the construction industry. In this dissertation, a structural framework (ontology) is also developed that generates text descriptions based on an understanding of the objects, structures and attributes of an image.
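    As a hedged illustration of the kind of extension described, the sketch below folds colour, depth and surface-normal differences into the pairwise dissimilarity that drives a Felzenszwalb-style graph merge; the weights are placeholders, not the dissertation's actual cost function.

```python
import numpy as np

def edge_weight(p, q, rgb, depth, normals, w_c=1.0, w_d=2.0, w_n=0.5):
    """Dissimilarity between neighbouring pixels p and q, given as (row, col)
    tuples, combining colour, depth (metres) and surface-normal cues."""
    dc = np.linalg.norm(rgb[p].astype(float) - rgb[q].astype(float)) / 255.0
    dd = abs(float(depth[p]) - float(depth[q]))
    dn = 1.0 - float(np.dot(normals[p], normals[q]))  # 0 when normals agree
    return w_c * dc + w_d * dd + w_n * dn
```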

    3D-vision based detection, localization, and sizing of broccoli heads in the field

    This paper describes a 3D vision system for robotic harvesting of broccoli using low-cost RGB-D sensors, developed and evaluated using sensor data collected under real-world field conditions in both the UK and Spain. The presented method addresses the tasks of detecting mature broccoli heads in the field and providing their 3D locations relative to the vehicle. The paper evaluates different 3D features, machine learning, and temporal filtering methods for the detection of broccoli heads. Our experiments show that a combination of Viewpoint Feature Histograms, a Support Vector Machine classifier, and a temporal filter to track the detected heads results in a system that detects broccoli heads with high precision. We also show that the temporal filtering can be used to generate a 3D map of broccoli head positions in the field. Additionally, we present methods for automatically estimating the size of the broccoli heads, to determine when a head is ready for harvest. All of the methods were evaluated using ground-truth data from both the UK and Spain, which we also make available to the research community for subsequent algorithm development and result comparison. Cross-validation of the system trained on the UK dataset against the Spanish dataset, and vice versa, indicated good generalization capabilities of the system, confirming the strong potential of low-cost 3D imaging for commercial broccoli harvesting.
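    A minimal sketch of how such a detection stage could look in code: an SVM classifies precomputed 308-bin Viewpoint Feature Histogram descriptors (computing the VFH itself, e.g. with PCL, is assumed to happen upstream), and per-candidate confidences are averaged over recent frames as a simple temporal filter. The training data and threshold below are placeholders, not the paper's trained system.

```python
from collections import deque

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
train_vfh = rng.random((100, 308))        # placeholder VFH descriptors
train_labels = rng.integers(0, 2, 100)    # placeholder head / not-head labels

clf = SVC(kernel="rbf", probability=True)
clf.fit(train_vfh, train_labels)

history = deque(maxlen=5)                 # confidences for one tracked candidate

def is_broccoli_head(vfh):
    """Smooth the SVM confidence over the last few frames before deciding."""
    score = clf.predict_proba(vfh.reshape(1, -1))[0, 1]
    history.append(score)
    return float(np.mean(history)) > 0.6
```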

    Use of Consumer-grade Depth Cameras in Mobile Robot Navigation

    Simultaneous Localization And Mapping (SLAM) stands as one of the core techniques used by robots for autonomous navigation. Cameras combining Red-Green-Blue (RGB) color information with depth (D) information are called RGB-D cameras or depth cameras, and can provide rich information for indoor mobile robot navigation. Microsoft's Kinect device, a representative low-cost RGB-D camera product, has attracted tremendous attention from researchers in recent years for its relatively high-quality depth measurement. By analyzing the multi-data stream of both color and depth, better 3D plane detectors and local shape registration techniques can be designed to improve the quality of mobile robot navigation.

    In the first part of this work, models of the Kinect's cameras and projector are established, which can be applied to the calibration and characterization of the Kinect device. Experiments show both variable depth resolution and the Kinect's own optical noise in the calculation of depth values. Based on these models and this characterization, the project implements an optimized 3D matching system for SLAM, from the processing of RGB-D data to the design of further algorithms. The developed system includes the following parts: (1) raw data preprocessing and de-noising, improving the quality of integrated environment depth maps; (2) detection and fitting of 3D planar surfaces with RANSAC algorithms, together with applications and illustrative examples of multi-scale, multi-plane detection algorithms designed for common indoor environments. The proposed approach is validated on scene and object reconstruction. RGB-D feature matching under uncertainty and noise at large data scales forms the basis of future applications in mobile robot navigation. Experimental results show that the system's performance improvement is valid and feasible.
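    The RANSAC plane-fitting step in part (2) is standard enough to sketch in plain numpy; the thresholds below are illustrative, and a real system would use an optimised implementation. Further planes can be found by removing each plane's inliers and repeating.

```python
import numpy as np

def ransac_plane(points, iters=200, thresh=0.02, seed=0):
    """Fit the dominant plane to an (n, 3) point cloud.
    Returns a unit normal n, offset d (with n.x + d = 0) and an inlier mask."""
    rng = np.random.default_rng(seed)
    best_inliers, best_plane = None, None
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:                            # degenerate (collinear) sample
            continue
        n /= norm
        d = -np.dot(n, p0)
        inliers = np.abs(points @ n + d) < thresh  # point-to-plane distance test
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (n, d)
    return best_plane[0], best_plane[1], best_inliers
```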

    Detection and Modelling of Stairs with an RGB-D Sensor for Personal Assistance

    The ability to move effectively through the environment comes naturally to most people, but it is not easy under some circumstances, as is the case for people with visual impairments or when we move through particularly complex or unfamiliar environments. Our long-term goal is to create a portable augmented assistance system to help those facing such circumstances, aided by cameras integrated into the assistant. In this work we have focused on the detection module, leaving the remaining modules, such as the interface between detection and user, for future work. A personal guidance system must keep its user away from hazards, but it should also be able to recognise certain features of the environment in order to interact with them. In this work we address the detection of one of the most common structures a person may have to use in daily life: stairs. Finding stairs is doubly beneficial, since it not only helps avoid potential falls but also indicates to the user the possibility of reaching another floor of the building. To achieve this we use an RGB-D sensor, worn on the subject's chest, which captures simultaneous and synchronised colour and depth information of the scene. The algorithm takes advantage of the depth data to find the floor and thus orient the scene as it appears to the user. A segmentation and classification stage follows, from which we obtain the segments corresponding to "floor", "walls" and "horizontal planes", plus a residual class whose members are all considered "obstacles". The stair detection algorithm then determines whether the horizontal planes are steps forming a staircase and orders them hierarchically. When a staircase has been found, the modelling algorithm provides all the information useful to the user: how the staircase is positioned relative to them, how many steps are visible, and their approximate dimensions. In short, this work presents a new algorithm for aiding human navigation in indoor environments, whose main contribution is a stair detection and modelling algorithm that determines all the information most relevant to the subject. Experiments have been carried out on video recordings in different environments, achieving good results in both accuracy and response time. Our results have also been compared with those reported in other publications, showing not only efficiency matching the state of the art but also a number of improvements. In particular, our algorithm is the first capable of obtaining the dimensions of stairs even with obstacles partially blocking the view, such as people going up or down. This work resulted in a publication accepted at the Second Workshop on Assistive Computer Vision and Robotics at ECCV, presented on 12 September 2014 in Zurich, Switzerland.
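    The step-ordering idea admits a simple illustration: once the floor has been found and the scene aligned with it, candidate horizontal planes are accepted as steps when their successive heights above the floor form a plausible ladder. The rise limits below are assumptions, not the thesis's calibrated values.

```python
import numpy as np

def order_steps(plane_heights, min_rise=0.10, max_rise=0.25):
    """Order candidate horizontal planes by height above the floor (metres)
    and keep those whose successive rises look like stair steps."""
    heights = np.sort(np.asarray(plane_heights, dtype=float))
    if heights.size == 0:
        return []
    steps = [heights[0]]
    for h in heights[1:]:
        if min_rise <= h - steps[-1] <= max_rise:  # plausible single-step rise
            steps.append(h)
    return steps  # bottom-to-top; len(steps) is the visible step count
```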