11 research outputs found

    Multi-View Reconstruction in-between Known Environments

    Get PDF
    Abstract—We present a novel multi-view 3D reconstruction algorithm which unifies the advantages of several recent reconstruction approaches. Based on a known environment causing occlusions and on the cameras ' pixel grid discretization, an irregular partitioning of the reconstruction space is chosen. Reconstruction artifacts are rejected by using plausibility checks based on additional information about the objects to be reconstructed. The binary occupancy decision is solely performed in reconstruction space instead of fusing back-projected silhouettes in image space. Hierarchical data structures are used to reconstruct the objects progressively focusing on boundary regions. Thus, the algorithm can be stopped at any time with a certain conservative level of detail. Most parts of the algorithm may be processed in parallel using GPU programming techniques. The main application domain is the surveillance of real environments like in human/robot coexistence and cooperation scenarios

    3D object reconstruction using computer vision : reconstruction and characterization applications for external human anatomical structures

    Get PDF
    Tese de doutoramento. Engenharia Informática. Faculdade de Engenharia. Universidade do Porto. 201

    Stochastic optimization and interactive machine learning for human motion analysis

    Get PDF
    The analysis of human motion from visual data is a central issue in the computer vision research community as it enables a wide range of applications and it still remains a challenging problem when dealing with unconstrained scenarios and general conditions. Human motion analysis is used in the entertainment industry for movies or videogame production, in medical applications for rehabilitation or biomechanical studies. It is also used for human computer interaction in any kind of environment, and moreover, it is used for big data analysis from social networks such as Youtube or Flickr, to mention some of its use cases. In this thesis we have studied human motion analysis techniques with a focus on its application for smart room environments. That is, we have studied methods that will support the analysis of people behavior in the room, allowing interaction with computers in a natural manner and in general, methods that introduce computers in human activity environments to enable new kind of services but in an unobstrusive mode. The thesis is structured in two parts, where we study the problem of 3D pose estimation from multiple views and the recognition of gestures using range sensors. First, we propose a generic framework for hierarchically layered particle filtering (HPF) specially suited for motion capture tasks. Human motion capture problem generally involve tracking or optimization of high-dimensional state vectors where also one have to deal with multi-modal pdfs. HPF allow to overcome the problem by means of multiple passes through substate space variables. Then, based on the HPF framework, we propose a method to estimate the anthropometry of the subject, which at the end allows to obtain a human body model adjusted to the subject. Moreover, we introduce a new weighting function strategy for approximate partitioning of observations and a method that employs body part detections to improve particle propagation and weight evaluation, both integrated within the HPF framework. The second part of this thesis is centered in the detection of gestures, and we have focused the problem of reducing annotation and training efforts required to train a specific gesture. In order to reduce the efforts required to train a gesture detector, we propose a solution based on online random forests that allows training in real-time, while receiving new data in sequence. The main aspect that makes the solution effective is the method we propose to collect the hard negatives examples while training the forests. The method uses the detector trained up to the current frame to test on that frame, and then collects samples based on the response of the detector such that they will be more relevant for training. In this manner, training is more effective in terms of the number of annotated frames required.L'anàlisi del moviment humà a partir de dades visuals és un tema central en la recerca en visió per computador, per una banda perquè habilita un ampli espectre d'aplicacions i per altra perquè encara és un problema no resolt quan és aplicat en escenaris no controlats. L'analisi del moviment humà s'utilitza a l'indústria de l'entreteniment per la producció de pel·lícules i videojocs, en aplicacions mèdiques per rehabilitació o per estudis bio-mecànics. També s'utilitza en el camp de la interacció amb computadors o també per l'analisi de grans volums de dades de xarxes socials com Youtube o Flickr, per mencionar alguns exemples. En aquesta tesi s'han estudiat tècniques per l'anàlisi de moviment humà enfocant la seva aplicació en entorns de sales intel·ligents. És a dir, s'ha enfocat a mètodes que puguin permetre l'anàlisi del comportament de les persones a la sala, que permetin la interacció amb els dispositius d'una manera natural i, en general, mètodes que incorporin les computadores en espais on hi ha activitat de persones, per habilitar nous serveis de manera que no interfereixin en la activitat. A la primera part, es proposa un marc genèric per l'ús de filtres de partícules jeràrquics (HPF) especialment adequat per tasques de captura de moviment humà. La captura de moviment humà generalment implica seguiment i optimització de vectors d'estat de molt alta dimensió on a la vegada també s'han de tractar pdf's multi-modals. Els HPF permeten tractar aquest problema mitjançant multiples passades en subdivisions del vector d'estat. Basant-nos en el marc dels HPF, es proposa un mètode per estimar l'antropometria del subjecte, que a la vegada permet obtenir un model acurat del subjecte. També proposem dos nous mètodes per la captura de moviment humà. Per una banda, el APO es basa en una nova estratègia per les funcions de cost basada en la partició de les observacions. Per altra, el DD-HPF utilitza deteccions de parts del cos per millorar la propagació de partícules i l'avaluació de pesos. Ambdós mètodes són integrats dins el marc dels HPF. La segona part de la tesi es centra en la detecció de gestos, i s'ha enfocat en el problema de reduir els esforços d'anotació i entrenament requerits per entrenar un detector per un gest concret. Per tal de reduir els esforços requerits per entrenar un detector de gestos, proposem una solució basada en online random forests que permet l'entrenament en temps real, mentre es reben noves dades sequencialment. El principal aspecte que fa la solució efectiva és el mètode que proposem per obtenir mostres negatives rellevants, mentre s'entrenen els arbres de decisió. El mètode utilitza el detector entrenat fins al moment per recollir mostres basades en la resposta del detector, de manera que siguin més rellevants per l'entrenament. D'aquesta manera l'entrenament és més efectiu pel que fa al nombre de mostres anotades que es requereixen

    The robot's vista space : a computational 3D scene analysis

    Get PDF
    Swadzba A. The robot's vista space : a computational 3D scene analysis. Bielefeld (Germany): Bielefeld University; 2011.The space that can be explored quickly from a fixed view point without locomotion is known as the vista space. In indoor environments single rooms and room parts follow this definition. The vista space plays an important role in situations with agent-agent interaction as it is the directly surrounding environment in which the interaction takes place. A collaborative interaction of the partners in and with the environment requires that both partners know where they are, what spatial structures they are talking about, and what scene elements they are going to manipulate. This thesis focuses on the analysis of a robot's vista space. Mechanisms for extracting relevant spatial information are developed which enable the robot to recognize in which place it is, to detect the scene elements the human partner is talking about, and to segment scene structures the human is changing. These abilities are addressed by the proposed holistic, aligned, and articulated modeling approach. For a smooth human-robot interaction, the computed models should be aligned to the partner's representations. Therefore, the design of the computational models is based on the combination of psychological results from studies on human scene perception with basic physical properties of the perceived scene and the perception itself. The holistic modeling realizes a categorization of room percepts based on the observed 3D spatial layout. Room layouts have room type specific features and fMRI studies have shown that some of the human brain areas being active in scene recognition are sensitive to the 3D geometry of a room. With the aligned modeling, the robot is able to extract the hierarchical scene representation underlying a scene description given by a human tutor. Furthermore, it is able to ground the inferred scene elements in its own visual perception of the scene. This modeling follows the assumption that cognition and language schematize the world in the same way. This is visible in the fact that a scene depiction mainly consists of relations between an object and its supporting structure or between objects located on the same supporting structure. Last, the articulated modeling equips the robot with a methodology for articulated scene part extraction and fast background learning under short and disturbed observation conditions typical for human-robot interaction scenarios. Articulated scene parts are detected model-less by observing scene changes caused by their manipulation. Change detection and background learning are closely coupled because change is defined phenomenologically as variation of structure. This means that change detection involves a comparison of currently visible structures with a representation in memory. In range sensing this comparison can be nicely implement as subtraction of these two representations. The three modeling approaches enable the robot to enrich its visual perceptions of the surrounding environment, the vista space, with semantic information about meaningful spatial structures useful for further interaction with the environment and the human partner

    Multi-view dynamic scene modeling

    Get PDF
    Modeling dynamic scenes/events from multiple fixed-location vision sensors, such as video camcorders, infrared cameras, Time-of-Flight sensors etc, is of broad interest in computer vision society, with many applications including 3D TV, virtual reality, medical surgery, markerless motion capture, video games, and security surveillance. However, most of the existing multi-view systems are set up in a strictly controlled indoor environment, with fixed lighting conditions and simple background views. Many challenges are preventing the technology to an outdoor natural environment. These include varying sunlight, shadows, reflections, background motion and visual occlusion. In this thesis, I address different aspects to overcome all of the aforementioned difficulties, so as to reduce human preparation and manipulation, and to make a robust outdoor system as automatic as possible. In particular, the main novel technical contributions of this thesis are as follows: a generic heterogeneous sensor fusion framework for robust 3D shape estimation together; a way to automatically recover 3D shapes of static occluder from dynamic object silhouette cues, which explicitly models the static visual occluding event along the viewing rays; a system to model multiple dynamic objects shapes and track their identities simultaneously, which explicitly models the inter-occluding event between dynamic objects; a scheme to recover an object's dense 3D motion flow over time, without assuming any prior knowledge of the underlying structure of the dynamic object being modeled, which helps to enforce temporal consistency of natural motions and initializes more advanced shape learning and motion analysis. A unified automatic calibration algorithm for the heterogeneous network of conventional cameras/camcorders and new Time-of-Flight sensors is also proposed

    On-line control of active camera networks

    Get PDF
    Large networks of cameras have been increasingly employed to capture dynamic events for tasks such as surveillance and training. When using active (pan-tilt-zoom) cameras to capture events distributed throughout a large area, human control becomes impractical and unreliable. This has led to the development of automated approaches for on-line camera control. I introduce a new approach that consists of a stochastic performance metric and a constrained optimization method. The metric quantifies the uncertainty in the state of multiple points on each target. It uses state-space methods with stochastic models of the target dynamics and camera measurements. It can account for static and dynamic occlusions, accommodate requirements specific to the algorithm used to process the images, and incorporate other factors that can affect its results. The optimization explores the space of camera configurations over time under constraints associated with the cameras, the predicted target trajectories, and the image processing algorithm. While an exhaustive exploration of this parameter space is intractable, through careful complexity analysis and application domain observations I have identified appropriate alternatives for reducing the space. Specifically, I reduce the spatial dimension of the search by dividing the optimization problem into subproblems, and then optimizing each subproblem independently. I reduce the temporal dimension of the search by using empirically-based heuristics inside each subproblem. The result is a tractable optimization that explores an appropriate subspace of the parameters, while attempting to minimize the risk of excluding the global optimum. The approach can be applied to conventional surveillance tasks (e.g., tracking or face recognition), as well as tasks employing more complex computer vision methods (e.g., markerless motion capture or 3D reconstruction). I present the results of experimental simulations of two such scenarios, using controlled and natural (unconstrained) target motions, employing simulated and real target tracks, in realistic scenes, and with realistic camera networks

    Human Pose Estimation with Supervoxels

    Get PDF
    This thesis investigates how segmentation as a preprocessing step can reduce both the search space as well as complexity of human pose estimation in the context of smart environments. A 3D reconstruction is computed with a voxel carving algorithm. Based on a superpixel algorithm, these voxels are segmented into supervoxels that are then applied to pictorial structures in 3D to efficiently estimate the human pose. Both static and dynamic gesture recognition applications were developed

    Sensor Signal and Information Processing II

    Get PDF
    In the current age of information explosion, newly invented technological sensors and software are now tightly integrated with our everyday lives. Many sensor processing algorithms have incorporated some forms of computational intelligence as part of their core framework in problem solving. These algorithms have the capacity to generalize and discover knowledge for themselves and learn new information whenever unseen data are captured. The primary aim of sensor processing is to develop techniques to interpret, understand, and act on information contained in the data. The interest of this book is in developing intelligent signal processing in order to pave the way for smart sensors. This involves mathematical advancement of nonlinear signal processing theory and its applications that extend far beyond traditional techniques. It bridges the boundary between theory and application, developing novel theoretically inspired methodologies targeting both longstanding and emergent signal processing applications. The topic ranges from phishing detection to integration of terrestrial laser scanning, and from fault diagnosis to bio-inspiring filtering. The book will appeal to established practitioners, along with researchers and students in the emerging field of smart sensors processing

    3D occlusion inference from silhouette cues

    No full text
    We consider the problem of detecting and accounting for the presence of occluders in a 3D scene based on silhouette cues in video streams obtained from multiple, calibrated views. While well studied and robust in controlled environments, silhouette-based reconstruction of dynamic objects fails in general environments where uncontrolled occlusions are commonplace, due to inherent silhouette corruption by occluders. We show that occluders in the interaction space of dynamic objects can be detected and their 3D shape fully recovered as a byproduct of shape-from-silhouette analysis. We provide a Bayesian sensor fusion formulation to process all occlusion cues occurring in a multi-view sequence. Results show that the shape of static occluders can be robustly recovered from pure dynamic object motion, and that this information can be used for online self-correction and consolidation of dynamic object shape reconstruction. 1
    corecore