14 research outputs found

    3D Reconstruction with Uncalibrated Cameras Using the Six-Line Conic Variety

    We present new algorithms for the recovery of Euclidean structure from a projective calibration of a set of cameras with square pixels but otherwise arbitrarily varying intrinsic and extrinsic parameters. Our results, based on a novel geometric approach, include a closed-form solution for the case of three cameras and two known vanishing points, and an efficient one-dimensional search algorithm for the case of four cameras and one known vanishing point. In addition, an algorithm for the reliable automatic detection of vanishing points in images is presented. These techniques fit into a 3D reconstruction scheme oriented towards urban scenes. The satisfactory performance of the techniques is demonstrated with tests on synthetic and real data.
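The machinery above rests on the fact that the images of parallel scene lines meet at a vanishing point. As a minimal illustration (not the paper's algorithm; the function names are mine), that point can be computed with homogeneous cross products:

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def vanishing_point(seg1, seg2):
    """Intersect two image lines, each given as a pair of endpoints.

    For the images of two parallel 3D lines, the intersection is the
    vanishing point; returns inhomogeneous coordinates (x, y)."""
    v = np.cross(line_through(*seg1), line_through(*seg2))
    return v[:2] / v[2]
```

For example, the horizontal line y = 1 and the vertical line x = 2 intersect at (2, 1); segments sampled from the images of parallel scene edges are used the same way.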

    A Mixture of Manhattan Frames: Beyond the Manhattan World

    Objects and structures within man-made environments typically exhibit a high degree of organization in the form of orthogonal and parallel planes. Traditional approaches to scene representation exploit this phenomenon via the somewhat restrictive assumption that every plane is perpendicular to one of the axes of a single coordinate system. Known as the Manhattan-World model, this assumption is widely used in computer vision and robotics. The complexity of many real-world scenes, however, necessitates a more flexible model. We propose a novel probabilistic model that describes the world as a mixture of Manhattan frames: each frame defines a different orthogonal coordinate system. This results in a more expressive model that still exploits the orthogonality constraints. We propose an adaptive Markov-Chain Monte-Carlo sampling algorithm with Metropolis-Hastings split/merge moves that utilizes the geometry of the unit sphere. We demonstrate the versatility of our Mixture-of-Manhattan-Frames model by describing complex scenes using depth images of indoor scenes as well as aerial-LiDAR measurements of an urban center. Additionally, we show that the model lends itself to focal-length calibration of depth cameras and to plane segmentation.
    United States. Office of Naval Research. Multidisciplinary University Research Initiative (Award N00014-11-1-0688); United States. Defense Advanced Research Projects Agency (Award FA8650-11-1-7154); Technion, Israel Institute of Technology (MIT Postdoctoral Fellowship Program)
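To make the frame idea concrete: a Manhattan frame is an orthogonal coordinate system given by a rotation, and each observed surface normal can be associated with one of its six signed axes. The sketch below illustrates only that assignment step, not the paper's adaptive MCMC inference; the names and the hard argmax rule are my simplifications:

```python
import numpy as np

def assign_to_frame(normals, R):
    """Assign each unit surface normal to the closest of the six signed
    axes of a Manhattan frame given by rotation R (columns = frame axes).

    Returns an index in 0..5 per normal: axis k with sign s maps to
    2*k + (1 if s < 0 else 0)."""
    dots = normals @ R                       # (N, 3) projections onto axes
    axis = np.abs(dots).argmax(axis=1)       # closest axis, ignoring sign
    sign = np.sign(dots[np.arange(len(normals)), axis])
    return 2 * axis + (sign < 0).astype(int)
```

In a mixture of Manhattan frames, each normal would additionally carry a latent frame label, with the rotations and labels inferred jointly.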

    Merging static and dynamic visual media along an event timeline

    Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1998. Includes bibliographical references (p. 63-65). Kyratso Karahalios.

    Visual Human Tracking and Group Activity Analysis: A Video Mining System for Retail Marketing

    Thesis (PhD), Indiana University, Computer Sciences, 2007. In this thesis we present a system for automatic human tracking and activity recognition from video sequences. The problem of automatically analysing visual information in order to derive descriptors of high-level human activities has intrigued the computer vision community for decades and is considered to be largely unsolved. Part of this interest derives from the vast range of applications in which such a solution could be useful. We attempt to find efficient formulations of these tasks as applied to extracting customer behavior information in a retail marketing context. Based on these formulations, we present a system that visually tracks customers in a retail store and performs a number of activity analysis tasks based on the output from the tracker. In tracking, we introduce new techniques for pedestrian detection, initialization of the body model, and a formulation of temporal tracking as a global trans-dimensional optimization problem. Initial human detection is addressed by a novel method for head detection, which incorporates knowledge of the camera projection model. The initialization of the human body model is addressed by newly developed shape and appearance descriptors. Temporal tracking of customer trajectories is performed by a human body tracking system designed as a Bayesian jump-diffusion filter. This approach demonstrates the ability to overcome model dimensionality ambiguities as people leave and enter the scene. Following the tracking, we develop a two-stage group activity formulation based on ideas from swarming research. For modeling purposes, all moving actors in the scene are viewed as simplistic agents in a swarm. This allows us to define a set of inter-agent interactions, which combine to yield a distance metric used for subsequent swarm clustering. In this way, in the first stage the shoppers that belong to the same group are identified by deterministically clustering bodies to detect short-term events, and in the second stage the events are post-processed to form clusters of group activities with fuzzy memberships. Quantitative analysis of the tracking subsystem shows an improvement over state-of-the-art methods when used under similar conditions. Finally, based on the output from the tracker, the activity recognition procedure achieves over 80% correct shopper group detection, as validated against human-generated ground-truth results.
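The abstract does not spell out the swarm-derived distance metric, so the sketch below substitutes a plain mean Euclidean trajectory distance and single-linkage merging to illustrate only the first, deterministic clustering stage; all names and thresholds are hypothetical:

```python
import numpy as np

def group_shoppers(tracks, thresh=1.0):
    """Deterministically cluster trajectories whose mean pairwise distance
    falls below `thresh` (a stand-in for the thesis's swarm-derived metric).

    tracks: (N, T, 2) array of N trajectories over T frames.
    Returns one cluster label per trajectory (single linkage, union-find)."""
    n = len(tracks)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # mean per-frame Euclidean distance between the two trajectories
            d = np.linalg.norm(tracks[i] - tracks[j], axis=1).mean()
            if d < thresh:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]
```

The second stage described in the abstract would then post-process these short-term groupings into fuzzy-membership activity clusters.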

    Exploiting motion-related visual cues for interpreting the content of video sequences

    The interpretation of video sequence content is one of the main research areas in computer vision. To enrich the information obtained from visual cues that belong to a single image, one can draw on cues arising from motion between images. This motion may be caused by a change in the orientation or position of the acquisition system, by the displacement of objects in the scene, and by many other factors. I focused on two phenomena arising from motion in video sequences. First, the motion caused by the camera, and how it can be interpreted through a combination of the apparent motion between images and the displacement of vanishing points across those images. Second, the detection and classification of the occlusion phenomenon, caused by motion in a complex scene, by means of a geometric model in the spatio-temporal volume. These two works are presented through two articles submitted for publication in scientific journals.

    Optoelectronic and photogrammetric measuring systems

    This dissertation deals with the analysis and design of optoelectronic and photogrammetric measuring systems. It presents specific designs of optoelectronic contactless area meters for flat objects, or for the planar projection of 3D objects, including an analysis of the attainable measurement accuracy. The next part, devoted to stereophotogrammetry, covers the principles of reconstructing the spatial coordinates of imaged objects, methods of automatic camera calibration, and procedures for matching points across images, together with an analysis of the attainable accuracy in determining the monitored parameters. Finally, a test program implementing the described routines is introduced; it enables the practical application of a stereophotogrammetric system for acquiring the spatial coordinates of three-dimensional objects.
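A core step in any stereophotogrammetric pipeline of this kind is recovering spatial coordinates from matched image points in two calibrated views. A standard linear (DLT) triangulation sketch, not the dissertation's specific implementation, might look like:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two calibrated views.

    P1, P2: 3x4 camera projection matrices; x1, x2: matching pixels (x, y).
    Each view contributes two linear constraints on the homogeneous point X;
    the least-squares solution of A X = 0 is the last right-singular vector."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]   # back to inhomogeneous coordinates
```

With noisy matched points the same construction gives a least-squares estimate, which is why matching accuracy directly bounds the attainable 3D accuracy analysed in the dissertation.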

    An investigation into common challenges of 3D scene understanding in visual surveillance

    Nowadays, video surveillance systems are ubiquitous. Most installations simply consist of CCTV cameras connected to a central control room and rely on human operators to interpret what they see on the screen in order to, for example, detect a crime (either during or after an event). Some modern computer vision systems aim to automate the process, at least to some degree, and various algorithms have been somewhat successful in certain limited areas. However, such systems remain inefficient in general circumstances and present real challenges yet to be solved. These challenges include the ability to recognise and ultimately predict and prevent abnormal behaviour, or even to reliably recognise objects, for example in order to detect left luggage or suspicious items. This thesis first aims to study the state of the art and identify the major challenges and possible requirements of future automated and semi-automated CCTV technology in the field. It then presents the application of a suite of 2D and highly novel 3D methodologies that go some way towards overcoming current limitations. The methods presented here are based on the analysis of object features extracted directly from the geometry of the scene, and start with a consideration of mainly existing techniques, such as the use of lines, vanishing points (VPs) and planes, applied to real scenes. An investigation is then presented into the use of richer 2.5D/3D surface normal data. In all cases the aim is to combine both 2D and 3D data to obtain a better understanding of the scene, directed ultimately at capturing what is happening within it in order to move towards automated scene analysis. Although this thesis focuses on the widespread application of video surveillance, the railway station environment is used as an example case representing typical real-world challenges; the principles extend readily elsewhere, such as to airports, motorways, households, shopping malls, etc.
The context of this research work, together with an overall presentation of existing methods used in video surveillance and their challenges, is described in chapter 1. Common computer vision techniques such as VP detection, camera calibration, 3D reconstruction and segmentation can be applied in an effort to extract meaning in video surveillance applications. According to the literature, these methods have been well researched, and their use is assessed in the context of current surveillance requirements in chapter 2. While existing techniques can perform well in some contexts, such as an architectural environment composed of simple geometrical elements, their robustness and performance in feature extraction and object recognition tasks are not sufficient to solve the key challenges encountered in the general video surveillance context. This is largely due to issues such as variable lighting, weather conditions and shadows, and in general the complexity of the real-world environment. Chapter 3 presents the research and contribution on those topics, namely methods to extract optimal features for a specific CCTV application, together with their strengths and weaknesses, highlighting that the proposed algorithm obtains better results than most due to its specific design. The comparison of current surveillance systems and methods from the literature shows that 2D data are nevertheless used almost universally. Indeed, both industrial systems and the research community have been intensively improving 2D feature extraction methods for as long as image analysis and scene understanding have been of interest. This constant progress makes 2D feature extraction almost effortless nowadays, thanks to a large variety of techniques. Moreover, even if 2D data do not allow all the challenges in video surveillance or other applications to be solved, they still serve as a starting stage towards scene understanding and image analysis.
Chapter 4 then explores 2D feature extraction via vanishing point detection and segmentation methods. A combination of the most common techniques and a novel approach is proposed to extract vanishing points from video surveillance environments. Moreover, segmentation techniques are explored with the aim of determining how they can complement vanishing point detection and lead towards 3D data extraction and analysis. In spite of the contribution above, 2D data are insufficient for all but the simplest applications aimed at understanding a scene, where the goal is robust detection of, say, left luggage or abnormal behaviour without significant a priori information about the scene geometry. Therefore, more information is required to design a more automated and intelligent algorithm that obtains richer information from the scene geometry and hence a better understanding of what is happening within it. This can be achieved by the use of 3D data (in addition to 2D data), offering the opportunity for object "classification" and, from this, the inference of a map of functionality describing feasible and unfeasible object functionality in a given environment. Chapter 5 presents how 3D data can be beneficial for this task, the various solutions investigated to recover 3D data, and some preliminary work towards plane extraction. It is apparent that VPs and planes give useful information about a scene's perspective and can assist in 3D data recovery within a scene. However, neither VPs nor plane detection techniques alone allow the recovery of more complex generic object shapes (for example, those composed of spheres, cylinders, etc.), and any simple model will suffer in the presence of non-Manhattan features, e.g. those introduced by the presence of an escalator. For this reason, a novel photometric stereo-based surface normal retrieval methodology is introduced to capture the 3D geometry of the whole scene or part of it.
Chapter 6 describes how photometric stereo allows the recovery of 3D information in order to obtain a better understanding of a scene, while also partially overcoming some current surveillance challenges, such as the difficulty of resolving fine detail, particularly at large standoff distances, and of isolating and recognising more complex objects in real scenes, where items of interest may be obscured by complex environmental factors subject to rapid change, making, for example, the detection of suspicious objects and behaviour highly problematic. Innovative use is made of an untapped latent capability offered within modern surveillance environments to introduce a form of environmental structuring to good advantage, in order to achieve a richer form of data acquisition. This chapter also explores the novel application of photometric stereo in such diverse settings, shows how the algorithm can be incorporated into an existing surveillance system, and considers a typical real commercial application. One of the most important aspects of this research work is its application. Indeed, while most of the research literature has been based on relatively simple structured environments, the approach here has been designed for real surveillance environments, such as railway stations, airports, waiting rooms, etc., where surveillance cameras may be fixed or may in the future form part of a mobile, free-roaming robotic surveillance device that must continually reinterpret its changing environment. So, as mentioned previously, while the main focus has been to apply the algorithm to railway station environments, the work has been approached in a way that allows adaptation to many other applications, such as autonomous robotics, and to motorway, shopping centre, street and home environments. All of these applications require a better understanding of the scene for security or safety purposes.
Finally, chapter 7 presents a global conclusion and outlines future work.
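The photometric stereo approach of chapter 6 builds on the classic Lambertian formulation: with three or more images under known distant light directions, a per-pixel normal and albedo follow from a small least-squares problem. A minimal sketch of that classic method (not the thesis's exact pipeline; names are mine):

```python
import numpy as np

def photometric_stereo(I, L):
    """Classic Lambertian photometric stereo.

    I: (k, N) stacked intensities for N pixels under k >= 3 lights.
    L: (k, 3) unit light directions.
    The Lambertian model is I = L @ (albedo * n) per pixel; solving it
    in the least-squares sense gives scaled normals G = albedo * n."""
    G, *_ = np.linalg.lstsq(L, I, rcond=None)      # (3, N) scaled normals
    albedo = np.linalg.norm(G, axis=0)             # per-pixel reflectance
    normals = G / np.maximum(albedo, 1e-12)        # unit surface normals
    return normals, albedo
```

In the surveillance setting described above, the "environmental structuring" would supply the varying known illumination, with the recovered normal field then feeding scene understanding.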

    Automatic determination of vanishing points in monoscopic images: application to historical heritage

    A monoscopic image is a projection of a three-dimensional scene onto a plane, performed through a viewpoint, and can be expressed as a perspective. However, while the third dimension is apparently lost during this projection, it can be recovered if the information stored in the image is treated appropriately using photogrammetric techniques or dimensional perspective analysis. Historically, perspective was studied by artists seeking to represent three-dimensional scenes more faithfully on flat materials (paintings or prints, among others), but nowadays studies focus on the three-dimensional representation of objects from their two-dimensional images. The best-known perspective effect is that lines that are parallel in space converge in the image at a common point known as a vanishing point. Knowledge of these points constrains certain elements of an image, so that a qualitative and/or quantitative analysis of the image can be carried out. Qualitatively, vanishing points can be used to group common lines in adjacent images that are to be merged; quantitatively, they are used for the self-calibration of the camera, dimensional analysis of an object, or three-dimensional reconstruction. This doctoral thesis develops a methodology for the automatic detection of vanishing points in monoscopic images, covering the following stages: 1. Detection of the image edges using operators based on the first derivative of the image function, both directional and non-directional. 2. Extraction of the straight lines defined by the groupings of edge pixels, using the gradient direction at each edge pixel. 3. Determination of the type of perspective that defines the photographic image; to this end, the extracted lines are classified, using the general equation of the line, into five partitions: horizontal, vertical, center, right and left. 4. Calculation of the position of the vanishing points according to the type of perspective selected: central, nearly central, or two vanishing points. Once all these stages are completed, methodological conclusions are drawn from a set of test images as well as from images of historical heritage examples.
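Stage 3 of the methodology classifies extracted lines into five partitions. The abstract does not give the exact classification rules, so the sketch below is one plausible reading: orientation thresholds for the horizontal and vertical classes, distance of the line from the image center for the "center" class, and slope sign for right versus left. All names and thresholds here are guesses, not the thesis's values:

```python
import math

def classify_line(p, q, img_center, ang_tol=10.0, c_tol=20.0):
    """Rough five-way line partition: horizontal / vertical / center /
    right / left (hypothetical reading of the thesis's stage 3).

    p, q: segment endpoints (x, y); angles are in degrees."""
    (x1, y1), (x2, y2) = p, q
    theta = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
    if theta < ang_tol or theta > 180.0 - ang_tol:
        return "horizontal"
    if abs(theta - 90.0) < ang_tol:
        return "vertical"
    # distance from the image center to the infinite line a*x + b*y + c = 0
    a, b = y2 - y1, x1 - x2
    c = -(a * x1 + b * y1)
    d = abs(a * img_center[0] + b * img_center[1] + c) / math.hypot(a, b)
    if d < c_tol:
        return "center"
    return "right" if theta < 90.0 else "left"
```

Counting the populated partitions then suggests the perspective type (central, nearly central, or two vanishing points) used in stage 4.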