339 research outputs found

    Automatic visual detection of human behavior: a review from 2000 to 2014

    Due to advances in information technology (e.g., digital video cameras, ubiquitous sensors), the automatic detection of human behaviors from video is a very recent research topic. In this paper, we perform a systematic and up-to-date literature review on this topic, from 2000 to 2014, covering a selection of 193 papers retrieved from six major scientific publishers. The selected papers were classified into three main subjects: detection techniques, datasets and applications. The detection techniques were divided into four categories (initialization, tracking, pose estimation and recognition). The list of datasets includes eight examples (e.g., Hollywood action). Finally, several application areas were identified, including human detection, abnormal activity detection, action recognition, player modeling and pedestrian detection. Our analysis provides a road map to guide future research for designing automatic visual human behavior detection systems. This work is funded by the Portuguese Foundation for Science and Technology (FCT - Fundação para a Ciência e a Tecnologia) under research Grant SFRH/BD/84939/2012.

    An investigation into common challenges of 3D scene understanding in visual surveillance

    Nowadays, video surveillance systems are ubiquitous. Most installations simply consist of CCTV cameras connected to a central control room and rely on human operators to interpret what they see on the screen in order to, for example, detect a crime (either during or after an event). Some modern computer vision systems aim to automate the process, at least to some degree, and various algorithms have been somewhat successful in certain limited areas. However, such systems remain inefficient in general circumstances and present real challenges yet to be solved. These challenges include the ability to recognise and ultimately predict and prevent abnormal behaviour, or even to reliably recognise objects, for example in order to detect left luggage or suspicious objects. This thesis first aims to study the state of the art and identify the major challenges and possible requirements of future automated and semi-automated CCTV technology in the field. This thesis presents the application of a suite of 2D and highly novel 3D methodologies that go some way to overcome current limitations. The methods presented here are based on the analysis of object features directly extracted from the geometry of the scene and start with a consideration of mainly existing techniques, such as the use of lines, vanishing points (VPs) and planes, applied to real scenes. Then, an investigation is presented into the use of richer 2.5D/3D surface normal data. In all cases the aim is to combine both 2D and 3D data to obtain a better understanding of the scene, aimed ultimately at capturing what is happening within the scene in order to be able to move towards automated scene analysis. Although this thesis focuses on the widespread application of video surveillance, an example case of the railway station environment is used to represent typical real-world challenges, where the principles can be readily extended elsewhere, such as to airports, motorways, households, shopping malls, etc.
The context of this research work, together with an overall presentation of existing methods used in video surveillance and their challenges, is described in chapter 1. Common computer vision techniques such as VP detection, camera calibration, 3D reconstruction, segmentation, etc., can be applied in an effort to extract meaning for video surveillance applications. According to the literature, these methods have been well researched and their use will be assessed in the context of current surveillance requirements in chapter 2. While existing techniques can perform well in some contexts, such as an architectural environment composed of simple geometrical elements, their robustness and performance in feature extraction and object recognition tasks are not sufficient to solve the key challenges encountered in a general video surveillance context. This is largely due to issues such as variable lighting, weather conditions, shadows and the general complexity of the real-world environment. Chapter 3 presents the research and contribution on those topics (methods to extract optimal features for a specific CCTV application) as well as their strengths and weaknesses, to highlight that the proposed algorithm obtains better results than most due to its specific design. The comparison of current surveillance systems and methods from the literature has shown that 2D data are nevertheless used almost constantly for many applications. Indeed, industrial systems as well as the research community have been intensively improving 2D feature extraction methods for as long as image analysis and scene understanding have been of interest. The constant progress on 2D feature extraction methods throughout the years makes it almost effortless nowadays, thanks to a large variety of techniques. Moreover, even if 2D data do not allow solving all challenges in video surveillance or other applications, they are still used as starting stages towards scene understanding and image analysis.
Chapter 4 will then explore 2D feature extraction via vanishing point detection and segmentation methods. A combination of the most common techniques and a novel approach will then be proposed to extract vanishing points from video surveillance environments. Moreover, segmentation techniques will be explored with the aim of determining how they can be used to complement vanishing point detection and lead towards 3D data extraction and analysis. In spite of the contribution above, without significant a priori information about the scene geometry, 2D data is insufficient for all but the simplest applications aimed at obtaining an understanding of a scene, where the aim is a robust detection of, say, left luggage or abnormal behaviour. Therefore, more information is required in order to be able to design a more automated and intelligent algorithm to obtain richer information from the scene geometry and so a better understanding of what is happening within. This can be overcome by the use of 3D data (in addition to 2D data), which allows object “classification” and, from this, the inference of a map of functionality describing feasible and unfeasible object functionality in a given environment. Chapter 5 presents how 3D data can be beneficial for this task and the various solutions investigated to recover 3D data, as well as some preliminary work towards plane extraction. It is apparent that VPs and planes give useful information about a scene’s perspective and can assist in 3D data recovery within a scene. However, neither VPs nor plane detection techniques alone allow the recovery of more complex generic object shapes (for example, those composed of spheres, cylinders, etc.), and any simple model will suffer in the presence of non-Manhattan features, e.g. those introduced by the presence of an escalator. For this reason, a novel photometric stereo-based surface normal retrieval methodology is introduced to capture the 3D geometry of the whole scene or part of it.
Chapter 6 describes how photometric stereo allows recovery of 3D information in order to obtain a better understanding of a scene, as well as partially overcoming some current surveillance challenges, such as difficulty in resolving fine detail, particularly at large standoff distances, and in isolating and recognising more complex objects in real scenes. Here items of interest may be obscured by complex environmental factors that are subject to rapid change, making, for example, the detection of suspicious objects and behaviour highly problematic. Here innovative use is made of an untapped latent capability offered within modern surveillance environments to introduce a form of environmental structuring to good advantage in order to achieve a richer form of data acquisition. This chapter also goes on to explore the novel application of photometric stereo in such diverse applications, shows how our algorithm can be incorporated into an existing surveillance system, and considers a typical real commercial application. One of the most important aspects of this research work is its application. Indeed, while most of the research literature has been based on relatively simple structured environments, the approach here has been designed to be applied to real surveillance environments, such as railway stations, airports, waiting rooms, etc., where surveillance cameras may be fixed or may in the future form part of a mobile, free-roaming robotic surveillance device that must continually reinterpret its changing environment. So, as mentioned previously, while the main focus has been to apply this algorithm to railway station environments, the work has been approached in a way that allows adaptation to many other applications, such as autonomous robotics, and to motorway, shopping centre, street and home environments. All of these applications require a better understanding of the scene for security or safety purposes.
Finally, chapter 7 presents a global conclusion and outlines what will be achieved in the future.
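
The vanishing point (VP) detection mentioned in this abstract rests on a simple projective fact: image lines that are parallel in the scene (rails, platform edges) meet at a common image point. The sketch below is only an illustration of that fact in Python, not the thesis's method; the function names and the pixel coordinates are hypothetical.

```python
def line_through(p, q):
    """Homogeneous line (a, b, c) through two image points, with ax + by + c = 0."""
    (x1, y1), (x2, y2) = p, q
    # Cross product of the homogeneous points (x1, y1, 1) and (x2, y2, 1).
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def intersect(l1, l2, eps=1e-12):
    """Intersection of two homogeneous lines, or None if they are parallel in the image."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    # Cross product of the two line vectors gives the homogeneous intersection point.
    x = b1 * c2 - b2 * c1
    y = c1 * a2 - c2 * a1
    w = a1 * b2 - a2 * b1
    if abs(w) < eps:
        return None  # the lines meet at infinity: no finite vanishing point
    return (x / w, y / w)


# Two hypothetical rail edges, each given by two pixel coordinates.
left_rail = line_through((0, 0), (4, 8))
right_rail = line_through((10, 0), (6, 8))
vanishing_point = intersect(left_rail, right_rail)
print(vanishing_point)
```

With real edge detections, many noisy line pairs would be intersected and the candidates clustered or voted on; that step is omitted here.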

    Autonomous 3D mapping and surveillance of mines with MAVs

    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, for the degree of Master of Science. 12 July 2017. The mapping of mines, both operational and abandoned, is a long, difficult and occasionally dangerous task, especially in the latter case. Recent developments in active and passive consumer-grade sensors, as well as quadcopter drones, present the opportunity to automate these challenging tasks, providing cost and safety benefits. The goal of this research is to develop an autonomous vision-based mapping system that employs quadrotor drones to explore and map sections of mine tunnels. The system is equipped with inexpensive structured-light depth cameras in place of traditional laser scanners, making the quadrotor setup more viable to produce in bulk. A modified version of Microsoft's Kinect Fusion algorithm is used to construct 3D point clouds in real time as the agents traverse the scene. Finally, the generated and merged point clouds from the system are compared with those produced by current Lidar scanners.
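
The abstract does not say how the reconstructed point clouds are compared with the Lidar scans. One common choice, shown here purely as an assumption, is the mean distance from each reconstructed point to its nearest reference point. The function name and the sample points are hypothetical, and the brute-force search is only suitable for small clouds.

```python
import math


def mean_nearest_distance(cloud, reference):
    """Average distance from each point in `cloud` to its closest point in `reference`."""
    if not cloud or not reference:
        raise ValueError("both point clouds must be non-empty")
    total = 0.0
    for p in cloud:
        # Brute-force nearest neighbour; a k-d tree would replace this for large clouds.
        total += min(math.dist(p, q) for q in reference)
    return total / len(cloud)


# Hypothetical points in metres: a depth-camera reconstruction and a Lidar reference.
reconstruction = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
lidar_reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
error_m = mean_nearest_distance(reconstruction, lidar_reference)
print(error_m)
```

The measure is asymmetric, so swapping the two arguments can give a different value; both clouds must also already be registered in the same coordinate frame.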

    New media and impressionism

    This master’s thesis is framed in the areas of New Media Art (NMA) and Human Computer Interaction (HCI). In particular, it focuses on the study of New Media Art pieces that share a set of characteristics (the most important one being that they are composed of atomic elements), might be explicitly interactive, and are usually exhibited in public settings or have been designed to be consumed by a large simultaneous audience. The content of the thesis can be divided into four main parts: 1- The review of a certain set of NMA pieces, their characteristics, and some similarities between them and the impressionist movement that emerged in the second half of the 19th century, along with some visual perception principles of Gestalt psychology. 2- A selection and an adaptation of pre-existing theoretical frameworks for modelling interaction in public settings. These theoretical frameworks comprise a set of tools for describing, analysing, and designing New Media Art pieces. 3- The presentation of a set of selected artworks authored or coauthored by the author of this thesis. A description of their characteristics and technology will be presented. 4- The introduction of two tools for artistic production, which were instrumental for the construction of some of the artworks presented here: Sendero (an LED lighting system), and N.IMP (a tool for real-time visual content generation).

    Design and implementation of a visual sensor-based depth estimation module in a system-on-chip architecture

    In this thesis, an image-based depth estimation module is designed and implemented on a SoC (system-on-chip) computing architecture. It uses a visual sensor, in this case a stereoscopic camera, and a semi-global block matching (SGBM) technique to compute the disparity in the scene. After the method is selected and implemented, it is adapted to a SoC architecture, since it is meant to be embedded in a drone (unmanned aerial vehicle). An initial calibration and rectification procedure is necessary to ensure that all captured images are consistently aligned between the left and right cameras. Master's in Electronic Engineering.
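
For a calibrated and rectified stereo pair, a disparity d (in pixels) converts to depth through the standard pinhole relation Z = f * B / d, where f is the focal length in pixels and B is the baseline between the cameras. The Python sketch below illustrates only that conversion; it is not the module described in the abstract, and the function name and camera parameters are assumed values.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres for one pixel of a rectified stereo pair (Z = f * B / d)."""
    if disparity_px <= 0:
        return None  # no valid match, or a point at infinity
    return focal_px * baseline_m / disparity_px


# Assumed camera: 800 px focal length and a 12.5 cm baseline.
FOCAL_PX = 800.0
BASELINE_M = 0.125

depth_m = depth_from_disparity(50.0, FOCAL_PX, BASELINE_M)
print(depth_m)
```

Because depth is inversely proportional to disparity, a one-pixel matching error costs far more accuracy for distant points than for near ones. Matchers that store disparity in fixed point (OpenCV's StereoSGBM, for instance, scales values by 16) need the scale removed before this conversion.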

    A framework for cardio-pulmonary resuscitation (CPR) scene retrieval from medical simulation videos based on object and activity detection.

    In this thesis, we propose a framework to detect and retrieve CPR activity scenes from medical simulation videos. Medical simulation is a modern training method for medical students, in which an emergency patient condition is simulated on human-like mannequins and the students respond to it. These simulation sessions are recorded by the physician for later debriefing. With the increasing number of simulation videos, automatic detection and retrieval of specific scenes has become necessary. The proposed framework for CPR scene retrieval would eliminate the conventional approach of using shot detection and frame segmentation techniques. Firstly, our work explores the application of Histogram of Oriented Gradients in three dimensions (HOG3D) to retrieve the scenes containing CPR activity. Secondly, we investigate the use of Local Binary Patterns in Three Orthogonal Planes (LBP-TOP), which is the three-dimensional extension of the popular Local Binary Patterns. This technique is a robust feature that can detect specific activities from scenes containing multiple actors and activities. Thirdly, we propose an improvement to the above-mentioned methods by a combination of HOG3D and LBP-TOP. We use decision-level fusion techniques to combine the features. We show experimentally that the proposed techniques and their combination outperform the existing system for CPR scene retrieval. Finally, we devise a method to detect and retrieve the scenes containing the breathing bag activity from the medical simulation videos. The proposed framework is tested and validated using eight medical simulation videos and the results are presented.
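
Decision-level fusion combines the outputs of separately trained classifiers instead of concatenating their features. The abstract does not state which fusion rule is used, so the Python sketch below assumes a weighted sum of two per-scene confidence scores. The function names, weight and threshold are hypothetical.

```python
def fuse_scores(hog3d_score, lbptop_score, hog3d_weight=0.5):
    """Weighted-sum fusion of two classifier confidences, each expected in [0, 1]."""
    return hog3d_weight * hog3d_score + (1.0 - hog3d_weight) * lbptop_score


def is_cpr_scene(hog3d_score, lbptop_score, threshold=0.5, hog3d_weight=0.5):
    """Label a scene as CPR when the fused confidence reaches the threshold."""
    return fuse_scores(hog3d_score, lbptop_score, hog3d_weight) >= threshold


# Hypothetical confidences from a HOG3D-based and an LBP-TOP-based classifier.
fused = fuse_scores(0.75, 0.25)
print(fused, is_cpr_scene(0.75, 0.25), is_cpr_scene(0.25, 0.5))
```

Other decision-level rules (majority vote, maximum, product) fit the same interface; in practice the weight and threshold would be tuned on held-out videos.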

    Ambiente de realidade virtual para visitas imersivas e interação (Virtual reality environment for immersive visits and interaction)

    Master's in Computer and Telematics Engineering. As a solution for immersive virtual museum visits, we propose an extension to the platform previously developed for configuring immersive virtual environments (pSIVE), keeping all of its functionality for creating virtual environments and associating content (PDF, videos, text), while also allowing gesture-based interaction and navigation. To this end, we propose one-to-one navigation using skeleton tracking with a Kinect calibrated to the real-world space in which the user is located, as well as gesture-based interaction methods. To validate the proposed navigation and interaction methods, a comparative study was carried out between gesture-based and button-based interaction and navigation. With the results of that study in mind, we developed new interaction methods with selection via gaze direction. The application was tested in a real scenario, as part of an art installation at the museum of the city of Aveiro, where visitors could navigate a virtual room of the museum and manipulate objects to create their own exhibition.

    Videos in Context for Telecommunication and Spatial Browsing

    The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium of presence and remote collaboration. However, capturing a visual representation of locations to be used in VEs is usually a tedious process that requires either manual modelling of environments or the employment of specific hardware. Capturing environment dynamics is not straightforward either, and it is usually performed through specific tracking hardware. Similarly, browsing large unstructured video collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. On a spectrum between 3D VEs and 2D images, panoramas lie in between: they are as accessible as 2D images while preserving the surrounding representation of 3D virtual environments. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools, as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render and stream data coming from heterogeneous cameras, with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and whether this improves quality of communication. Second, the research asks whether videos in panoramic context can be used to convey spatial and temporal information of a remote place and the dynamics within, and whether this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether there is an impact of display type on reasoning about events within videos in panoramic context.
These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first experiment, on telecommunication, compared our videos-in-context interface with fully panoramic video and conventional webcam video conferencing in an object placement scenario. The second experiment investigated the impact of videos in panoramic context on quality of spatio-temporal thinking during localization tasks. To support the experiment, a novel interface to video collections in panoramic context was developed and compared with common video-browsing tools. The final experimental study investigated the impact of display type on reasoning about events. The study explored three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution to spatio-temporal exploration of remote locations. Our approach presents a richer visual representation in terms of space and time than standard tools, showing that providing panoramic contexts to video collections makes spatio-temporal tasks easier. As such, videos in context are a suitable alternative to more difficult, and often more expensive, solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism and remote assistance.