102 research outputs found

    3D human pose estimation from depth maps using a deep combination of poses

    Full text link
    Many real-world applications require the estimation of human body joints for higher-level tasks such as human behaviour understanding. In recent years, depth sensors have become a popular way to obtain three-dimensional information. The depth maps generated by these sensors provide information that can be employed to disambiguate the poses observed in two-dimensional images. This work addresses the problem of 3D human pose estimation from depth maps employing a Deep Learning approach. We propose a model, named Deep Depth Pose (DDP), which receives a depth map containing a person and a set of predefined 3D prototype poses and returns the 3D positions of the person's body joints. In particular, DDP is defined as a ConvNet that computes the specific weights needed to linearly combine the prototypes for the given input. We have thoroughly evaluated DDP on the challenging ITOP and UBC3V datasets, which respectively depict realistic and synthetic samples, defining a new state of the art on them. Comment: Accepted for publication in the Journal of Visual Communication and Image Representation.
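    The core of DDP, as described above, is a ConvNet that outputs one weight per prototype pose and forms the final 3D pose as their linear combination. The sketch below illustrates that idea only; it is not the authors' implementation, and the backbone layers, the number of prototypes, and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DepthToPoseByPrototypes(nn.Module):
    """Hypothetical DDP-style model: depth map -> prototype mixing weights -> 3D pose."""

    def __init__(self, prototypes):            # prototypes: (K, J, 3) tensor of 3D poses
        super().__init__()
        self.register_buffer("prototypes", prototypes)
        k = prototypes.shape[0]
        self.backbone = nn.Sequential(          # stand-in for the paper's ConvNet
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, k),                   # one combination weight per prototype
        )

    def forward(self, depth):                   # depth: (B, 1, H, W)
        w = self.backbone(depth)                # (B, K) weights for this input
        # Linear combination of the prototype poses -> (B, J, 3) joint positions.
        return torch.einsum("bk,kjc->bjc", w, self.prototypes)
```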

    Single Cone Mirror Omni-Directional Stereo

    Get PDF
    Omni-directional view and stereo information for scene points are both crucial in many computer vision applications. In some demanding applications, such as autonomous robots, we need to acquire both in real time without sacrificing too much image resolution. This work describes a novel method that meets these stringent demands with a relatively simple setup and off-the-shelf equipment. Only one simple reflective surface and two regular (perspective) camera views are needed. First we describe the novel stereo method; then we discuss some variations in practical implementation and their respective tradeoffs.
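    The abstract does not detail the cone-mirror geometry, but the underlying stereo step is triangulating a scene point from two viewing rays, one per camera view. Below is a hedged, generic sketch of that step only; constructing the rays from the reflective surface is assumed to have been done elsewhere, and the function name is illustrative.

```python
import numpy as np

def triangulate_rays(o1, d1, o2, d2):
    """Midpoint of the shortest segment between two 3D rays o + t * d."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    w0 = o1 - o2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b                # close to 0 when the rays are parallel
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    return 0.5 * ((o1 + s * d1) + (o2 + t * d2))
```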

    RGB-D Multicamera Object Detection and Tracking Implemented through Deep Learning

    Get PDF
    In this thesis we present the development of a multi-object detection and tracking system for low-light environments, implemented using an RGB-D multicamera system and a deep learning framework. To aid understanding of how the system works, the relevant hardware and software components are presented, such as RGB-D sensor cameras and multi-object detection and tracking techniques. In addition, a brief introduction to the main concepts of neural networks is presented.
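    The abstract does not describe the tracking algorithm itself; as a purely illustrative sketch, the snippet below shows a generic tracking-by-detection association step (greedy IoU matching) of the kind such systems commonly build on. The detector, the multicamera fusion, and the low-light handling are all out of scope here and assumed.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(track_boxes, detections, thr=0.3):
    """Greedily match existing track boxes to new detections by IoU overlap."""
    matches, used = [], set()
    for ti, tbox in enumerate(track_boxes):
        best, best_iou = None, thr
        for di, dbox in enumerate(detections):
            if di in used:
                continue
            score = iou(tbox, dbox)
            if score > best_iou:
                best, best_iou = di, score
        if best is not None:
            used.add(best)
            matches.append((ti, best))   # unmatched detections would spawn new tracks
    return matches
```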

    Image-based rendering and synthesis

    Get PDF
    Multiview imaging (MVI) is currently an active focus of research, as it has a wide range of applications and opens up research in other topics and applications, including virtual view synthesis for three-dimensional (3D) television (3DTV) and entertainment. However, multiview systems require a large amount of storage and are difficult to construct. Image-based rendering (IBR) is the concept that allows 3D scenes and objects to be visualized in a realistic way without full 3D model reconstruction. Using images as the primary substrate, IBR has many potential applications, including video games, virtual travel and others. The technique creates new views of scenes, reconstructed from a collection of densely sampled images or videos. IBR approaches fall into different classes: one assumes known 3D models and lighting conditions and renders views using conventional graphics techniques; another is light-field or lumigraph rendering, which depends on dense sampling, with no or very little geometry, to render new views without recovering exact 3D models.
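    As a toy illustration of the light-field idea mentioned above (not taken from the article), a novel view can be approximated, given densely sampled source views and no geometry, by blending the nearest captured views weighted by viewpoint proximity. The function and its parameters are assumptions for illustration.

```python
import numpy as np

def blend_nearest_views(target_pos, view_positions, view_images, k=2):
    """view_positions: (N, 3) camera centres; view_images: list of HxWx3 uint8 arrays."""
    dists = np.linalg.norm(view_positions - target_pos, axis=1)
    nearest = np.argsort(dists)[:k]                  # indices of the k closest views
    weights = 1.0 / (dists[nearest] + 1e-6)          # closer views contribute more
    weights /= weights.sum()
    blended = sum(w * view_images[i].astype(np.float32)
                  for w, i in zip(weights, nearest))
    return blended.astype(np.uint8)
```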

    A Multicamera System for Gesture Tracking With Three Dimensional Hand Pose Estimation

    Get PDF
    The goal of any visual tracking system is to successfully detect and then follow an object of interest through a sequence of images. The difficulty of tracking an object depends on its dynamics, motion and characteristics, as well as on the environment. For example, tracking an articulated, self-occluding object such as a signing hand has proven to be a very difficult problem. The focus of this work is on tracking and pose estimation, with applications to hand gesture interpretation. An approach that attempts to integrate the simplicity of a region tracker with single-hand 3D pose estimation methods is presented. Additionally, this work delves into the pose estimation problem. This is accomplished by analyzing hand templates composed of their morphological skeleton and addressing the skeleton's inherent instability. Ligature points along the skeleton are flagged in order to determine their effect on skeletal instabilities. Tested on real data, the analysis finds that flagging ligature points proportionally increases the match strength of high-similarity image-template pairs by about 6%. The effectiveness of this approach is further demonstrated in a real-time multicamera hand tracking system that tracks hand gestures through three-dimensional space and estimates the three-dimensional pose of the hand.
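    The morphological-skeleton step mentioned above can be sketched with scikit-image as below. The ligature-point flagging and the template-match scoring are specific to the thesis and are not reproduced; only generic branch-point candidates are marked, and all names are illustrative.

```python
import numpy as np
from scipy.ndimage import convolve
from skimage.morphology import skeletonize

def hand_skeleton(mask):
    """mask: boolean HxW array of the segmented hand; returns skeleton and pixel degree."""
    skeleton = skeletonize(mask).astype(np.uint8)
    # Count 8-neighbours of every skeleton pixel; degree >= 3 marks branch points,
    # candidate locations for the skeletal instabilities discussed above.
    neighbours = convolve(skeleton, np.ones((3, 3), dtype=np.uint8), mode="constant")
    degree = (neighbours - skeleton) * skeleton
    return skeleton.astype(bool), degree
```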

    VINS-mono Optimized: A Monocular Visual-inertial State Estimator with Improved Initialization

    Get PDF
    State estimation is one of the key areas in robotics. It touches a variety of practical applications such as aerial vehicle navigation, autonomous driving, augmented reality, and virtual reality. A monocular visual-inertial system (VINS) is one of the popular approaches to state estimation. By properly fusing a monocular camera and an IMU, the system is capable of providing the position and orientation of a vehicle and recovering the scale. One of the challenges for a monocular VINS is estimator initialization, due to the lack of direct distance measurements. Building on the monocular VINS work of the Hong Kong University of Science and Technology, a checkerboard pattern is introduced to improve the original initialization process. The checkerboard parameters are used along with the calculated 3D coordinates to replace the original initialization process, leading to higher accuracy. The results demonstrate lower cross-track error and final drift compared with the original approach.
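    The checkerboard-based initialization described above can be read as recovering metric scale from the board's known geometry. The sketch below is one hedged interpretation, not the thesis code: a PnP solve against the board yields metric camera positions, and comparing the metric baseline with the up-to-scale visual estimate gives a scale factor. The board size, square length, and all names are assumptions.

```python
import cv2
import numpy as np

def metric_scale_from_checkerboard(img0, img1, K, board_size=(6, 9), square=0.025,
                                   t_visual_norm=1.0):
    """board_size: inner corners (cols, rows); square: edge length in metres;
    t_visual_norm: length of the up-to-scale translation estimated visually."""
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2) * square

    centres = []
    for img in (img0, img1):
        ok, corners = cv2.findChessboardCorners(img, board_size)
        if not ok:
            return None                        # board not visible in this frame
        ok, rvec, tvec = cv2.solvePnP(objp, corners, K, None)
        R, _ = cv2.Rodrigues(rvec)
        centres.append(-R.T @ tvec)            # camera centre in board frame (metres)

    metric_baseline = np.linalg.norm(centres[1] - centres[0])
    return metric_baseline / t_visual_norm     # scale for the up-to-scale trajectory
```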

    Multicamera System for Automatic Positioning of Objects in Game Sports

    Get PDF
    The aim is to deliver a multicamera system capable of extracting 3D position data for a ball during a sporting event, through the analysis and testing of computer vision techniques (camera calibration and 3D reconstruction).
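    For illustration only (not from the dissertation), the core reconstruction step such a system needs is triangulating the ball's image positions from two calibrated cameras into a 3D point; OpenCV's triangulatePoints can do this once the projection matrices are known from calibration. The function and parameter names are assumptions.

```python
import cv2
import numpy as np

def ball_position_3d(P1, P2, uv1, uv2):
    """P1, P2: 3x4 projection matrices; uv1, uv2: the ball's pixel coordinates (x, y)."""
    pts1 = np.asarray(uv1, dtype=np.float64).reshape(2, 1)
    pts2 = np.asarray(uv2, dtype=np.float64).reshape(2, 1)
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)   # homogeneous 4x1 result
    return (X_h[:3] / X_h[3]).ravel()                 # 3D point in the calibration frame
```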