
    Ray Tracing Gems

    This book is a must-have for anyone serious about rendering in real time. With the announcement of new ray tracing APIs and the hardware to support them, developers can now create real-time applications with ray tracing as a core component. As ray tracing on the GPU becomes faster, it will play a more central role in real-time rendering. Ray Tracing Gems provides key building blocks for developers of games, architectural applications, visualizations, and more. Experts in rendering share their knowledge by explaining everything from nitty-gritty techniques that will improve any ray tracer to mastery of the new capabilities of current and future hardware.
    What you'll learn:
    - The latest ray tracing techniques for developing real-time applications in multiple domains
    - Guidance, advice, and best practices for rendering applications with Microsoft DirectX Raytracing (DXR)
    - How to implement high-performance graphics for interactive visualizations, games, simulations, and more
    Who this book is for:
    - Developers looking to leverage the latest APIs and GPU technology for real-time rendering and ray tracing
    - Students looking to learn about best practices in these areas
    - Enthusiasts who want to understand and experiment with their new GPUs
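
    The core primitive behind all of this is tracing a ray against geometry. As a hedged illustration only, here is a minimal sketch in Python (not the book's DXR/HLSL material) of ray-sphere intersection; the function name and scene values are made up for the example.

        # Ray-sphere intersection: the innermost operation of any ray tracer.
        # Illustrative Python only; DXR exposes this via HLSL shaders and
        # hardware acceleration structures, not an API like this.
        import numpy as np

        def intersect_sphere(origin, direction, center, radius):
            """Distance t along the ray to the nearest hit, or None on a miss."""
            oc = origin - center
            # Solve |origin + t*direction - center|^2 = radius^2 for t
            # (a quadratic with a == 1 since direction is normalized).
            b = 2.0 * np.dot(direction, oc)
            c = np.dot(oc, oc) - radius * radius
            disc = b * b - 4.0 * c
            if disc < 0.0:
                return None  # the ray misses the sphere entirely
            t = (-b - np.sqrt(disc)) / 2.0
            return t if t > 0.0 else None

        # One ray from the origin down +z toward a unit sphere 5 units away.
        t = intersect_sphere(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                             np.array([0.0, 0.0, 5.0]), 1.0)
        print(t)  # ~4.0: distance to the sphere's front surface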

    Accurate dense depth from light field technology for object segmentation and 3D computer vision


    Parallel Tracking and Mapping for Manipulation Applications with Golem Krang

    This work implements a simultaneous localization and mapping (SLAM) system and an image semantic segmentation method on a mobile manipulator. The SLAM component works toward navigating among obstacles in unknown environments, and the object detection method will be integrated for future manipulation tasks such as grasping. The work will be demonstrated on a real robotic hardware system in the lab.
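
    The abstract leaves the SLAM internals unspecified, so as a hedged sketch of one common building block in such navigation systems, here is a minimal log-odds occupancy grid update in Python; the function, grid layout, and tuning constants are all assumptions, not the Golem Krang implementation.

        # Log-odds occupancy grid update from one range measurement.
        # All constants are hypothetical tuning values.
        import numpy as np

        L_OCC, L_FREE = 0.85, -0.4  # log-odds increments for hit/free cells

        def update_grid(grid, robot_xy, hit_xy, resolution=0.05):
            """Mark cells along the beam as free and the endpoint as occupied."""
            r = (np.asarray(robot_xy) / resolution).astype(int)
            h = (np.asarray(hit_xy) / resolution).astype(int)
            n = int(np.max(np.abs(h - r))) or 1
            for i in range(n):               # walk the beam toward the hit
                c = r + (h - r) * i // n
                grid[c[1], c[0]] += L_FREE   # cells before the hit: free
            grid[h[1], h[0]] += L_OCC        # beam endpoint: occupied
            return grid

        grid = np.zeros((100, 100))                # 5 m x 5 m at 5 cm resolution
        update_grid(grid, (1.0, 1.0), (3.0, 1.0))  # one simulated range beam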

    3-D Scene Reconstruction from Aerial Imagery

    3-D scene reconstructions derived from Structure from Motion (SfM) and Multi-View Stereo (MVS) techniques were analyzed to determine the reconnaissance flight characteristics best suited to target reconstruction. In support of this goal, a preliminary study of a simple 3-D geometric object facilitated the analysis of convergence angles and the number of camera frames within a controlled environment. Reconstruction accuracy measurements revealed that at least 3 camera frames and a 6° convergence angle were required to achieve results faithful to the original structure. The central investigative effort assessed the applicability of certain airborne reconnaissance flight profiles to reconstructing ground targets. The data sets included images collected within a synthetic 3-D urban environment along circular, linear, and s-curve aerial flight profiles equipped with agile and non-agile sensors. S-curve and dynamically controlled linear flight paths provided superior results; however, with sufficient data conditioning and the combination of orthogonal flight paths, all flight paths produced quality reconstructions under a wide variety of operational considerations.
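
    To make the role of the convergence angle concrete, here is a minimal Python sketch, assuming an idealized noise-free midpoint triangulation rather than the study's actual SfM/MVS pipeline, of recovering a 3-D point from two camera rays and measuring the angle between them.

        # Midpoint triangulation of a 3D point from two viewing rays, plus
        # the convergence angle between them. Idealized and noise-free.
        import numpy as np

        def triangulate_midpoint(c1, d1, c2, d2):
            """Midpoint of the shortest segment between the two rays."""
            d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
            # Minimize |(c1 + s*d1) - (c2 + t*d2)| over the ray parameters s, t.
            A = np.array([[d1 @ d1, -d1 @ d2],
                          [d1 @ d2, -d2 @ d2]])
            b = np.array([(c2 - c1) @ d1, (c2 - c1) @ d2])
            s, t = np.linalg.solve(A, b)
            return 0.5 * ((c1 + s * d1) + (c2 + t * d2))

        c1 = np.array([0.0, 0.0, 0.0])       # first camera center
        c2 = np.array([1.0, 0.0, 0.0])       # second camera, 1 m baseline
        p = np.array([0.5, 0.0, 10.0])       # ground-truth scene point
        d1, d2 = p - c1, p - c2              # ideal viewing rays
        cos_a = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))
        print(np.degrees(np.arccos(cos_a)))          # ~5.7 deg convergence angle
        print(triangulate_midpoint(c1, d1, c2, d2))  # recovers p exactly

    Shrinking the baseline shrinks the convergence angle and makes the solve increasingly ill-conditioned, which is the practical reason a minimum angle shows up in the accuracy measurements.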

    Single-chip CMOS tracking image sensor for a complex target


    Markerless facial motion capture: deep learning approaches on RGBD data

    Facial expressions are a series of fast, complex, and interconnected movements that cause an array of deformations, such as stretching, compressing, and folding of the skin. Identifying expressions is a natural process in human vision, but the diversity of faces makes it challenging for computer vision. Research in markerless facial motion capture using a single Red Green Blue (RGB) camera has gained popularity due to the wide availability of such data, for example from mobile phones. The motivation behind this work is that much of the existing research attempts to infer 3-Dimensional (3D) data from 2-Dimensional (2D) images; in motion capture, for instance, multiple 2D cameras are calibrated to allow some depth prediction. By contrast, Red Green Blue Depth (RGBD) sensors provide ground-truth depth data, which can lead to a better understanding of the human face and how expressions are visualised.

    The aim of this thesis is to investigate and develop novel methods of markerless facial motion capture, with a focus on using RGBD data to provide 3D information. The contributions are: a tool to aid in the annotation of 3D facial landmarks; a novel neural network that demonstrates the ability to predict 2D and 3D landmarks by merging RGBD data; a working application that demonstrates a complex deep learning network on portable handheld devices; a review of existing methods for denoising fine detail in depth maps using neural networks; and a network for the complete analysis of facial landmarks and expressions in 3D.

    The 3D annotator was developed to overcome the limitations of existing 3D modelling software, which made feature identification difficult. The technique of predicting 2D and 3D landmarks with auxiliary information allowed high-accuracy 3D landmarking without the need for full model generation, and outperformed other recent landmarking techniques. The networks running on handheld devices show, as a proof of concept, that even without much optimisation a complex task can be performed in near real time. Denoising Time of Flight (ToF) depth maps proved far more complex than traditional RGB denoising, and we reviewed and applied an array of techniques to the task. The full facial analysis showed that training neural networks on a wide range of related tasks, as auxiliary information, allows a deeper understanding of the overall task.

    Research in facial processing is vast, but many new problems and challenges remain. While RGB cameras are widely used, high-accuracy, cost-effective depth sensing devices are becoming available. These new devices allow a better understanding of facial features and expressions. By using and merging RGB and depth data, facial landmarking and expression intensity recognition can be improved.
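
    As a hedged sketch of the general idea of merging RGB and depth streams for joint 2D and 3D landmark prediction (not the thesis's actual architecture), the following PyTorch model fuses two small convolutional branches before a shared regression head; the landmark count and layer sizes are assumptions.

        # Two small conv branches (RGB and depth) fused before one regression
        # head that predicts 2D and 3D landmark coordinates jointly.
        # Layer sizes and the 68-landmark count are assumptions.
        import torch
        import torch.nn as nn

        N_LANDMARKS = 68

        class RGBDLandmarkNet(nn.Module):
            def __init__(self):
                super().__init__()
                def branch(in_ch):
                    return nn.Sequential(
                        nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(8), nn.Flatten())
                self.rgb, self.depth = branch(3), branch(1)
                # Fused features regress x,y (2D) plus x,y,z (3D) per landmark.
                self.head = nn.Linear(2 * 32 * 8 * 8, N_LANDMARKS * 5)

            def forward(self, rgb, depth):
                f = torch.cat([self.rgb(rgb), self.depth(depth)], dim=1)
                out = self.head(f).view(-1, N_LANDMARKS, 5)
                return out[..., :2], out[..., 2:]  # 2D and 3D predictions

        net = RGBDLandmarkNet()
        lm2d, lm3d = net(torch.randn(1, 3, 128, 128), torch.randn(1, 1, 128, 128))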

    Video foreground extraction for mobile camera platforms

    Foreground object detection is a fundamental task in computer vision with many applications in areas such as object tracking, event identification, and behaviour analysis. Most conventional foreground object detection methods work only in stable illumination environments using fixed cameras. In real-world applications, however, the algorithm often needs to operate under challenging conditions: drastic lighting changes, complex object shapes, moving cameras, low frame capture rates, and low-resolution images. This thesis presents four novel approaches for foreground object detection on real-world datasets using cameras deployed on moving vehicles.

    The first problem addresses passenger detection and tracking for public transport buses, investigating changing illumination conditions and low frame capture rates. Our approach integrates a stable SIFT (Scale Invariant Feature Transform) background seat modelling method with a human shape model in a weighted Bayesian framework to detect passengers. To track multiple targets, we employ the Reversible Jump Markov Chain Monte Carlo tracking algorithm. Using an SVM classifier, appearance transformation models capture changes in the appearance of foreground objects across two consecutive frames under low frame rate conditions.

    In the second problem, we present a system for pedestrian detection in scenes captured by a mobile bus surveillance system. It integrates scene localization, foreground-background separation, and pedestrian detection modules into a unified detection framework. The scene localization module performs a two-stage clustering of the video data: in the first stage, SIFT homography is applied to cluster frames by structural similarity, and the second stage further clusters these aligned frames by consistency in illumination. This produces clusters of images grouped by viewpoint and lighting. A kernel density estimation (KDE) technique over colour and gradient is then used to construct background models for each image cluster, which are in turn used to detect candidate foreground pixels. Finally, pedestrians are detected using a hierarchical template matching approach.

    In addition, we present three direct pedestrian detection methods that extend HOG (Histogram of Oriented Gradients) techniques (Dalal and Triggs, 2005) and provide a comparative evaluation of these approaches. The three approaches are: (a) a new histogram feature formed by the weighted sum of both the gradient magnitude and the filter responses from a set of elongated Gaussian filters (Leung and Malik, 2001) corresponding to the quantised orientation, which we refer to as the Histogram of Oriented Gradient Banks (HOGB) approach; (b) the codebook-based HOG feature with a branch-and-bound (efficient subwindow search) algorithm (Lampert et al., 2008); and (c) the codebook-based HOGB approach.

    In the third problem, a unified framework that combines 3D and 2D background modelling is proposed to detect scene changes using a camera mounted on a moving vehicle. The 3D scene is first reconstructed from a set of videos taken at different times, and the 3D background model identifies inconsistent scene structures as foreground objects. In the 2D approach, foreground objects are detected using a spatio-temporal MRF algorithm. Finally, the 3D and 2D results are combined using morphological operations.

    The significance of this research is that it provides basic frameworks for automatic large-scale mobile surveillance applications and facilitates many higher-level applications such as object tracking and behaviour analysis.
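
    To make the KDE background modelling step concrete, here is a minimal Python sketch, simplified to colour only (the thesis also uses gradients) and with assumed bandwidth and threshold values: each pixel keeps a kernel density estimate over recent frames, and pixels with low density under that model become foreground candidates.

        # Per-pixel KDE over recent frames; low density marks foreground.
        # Bandwidth and threshold are illustrative assumptions, and only
        # colour is modelled here (the thesis also uses gradients).
        import numpy as np

        def kde_foreground(frames, current, bandwidth=15.0, threshold=1e-6):
            """frames: (N,H,W,3) background samples; current: (H,W,3) frame."""
            diff = frames.astype(np.float32) - current.astype(np.float32)
            # Gaussian kernel per sample and channel, averaged over N samples.
            k = np.exp(-0.5 * (diff / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))
            density = np.mean(np.prod(k, axis=-1), axis=0)  # (H,W) density map
            return density < threshold                      # True == foreground

        bg = np.random.randint(100, 110, (10, 48, 64, 3))   # synthetic background
        frame = bg[0].copy()
        frame[20:30, 20:30] = 255                           # inject an "object"
        mask = kde_foreground(bg, frame)                    # True inside the patch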

    3D Scene Reconstruction with Micro-Aerial Vehicles and Mobile Devices

    Scene reconstruction is the process of building an accurate geometric model of one's environment from sensor data. We explore the problem of real-time, large-scale 3D scene reconstruction in indoor environments using small laser range-finders and low-cost RGB-D (color plus depth) cameras. We focus on computationally constrained platforms such as micro-aerial vehicles (MAVs) and mobile devices. These platforms present a set of fundamental challenges: estimating the state and trajectory of the device as it moves within its environment, and utilizing lightweight, dynamic data structures to hold the representation of the reconstructed scene. The system needs to be computationally and memory-efficient, so that it can run in real time onboard the platform.

    In this work, we present three scene reconstruction systems. The first system uses a laser range-finder and operates onboard a quadrotor MAV. We address the issues of autonomous control, state estimation, path planning, and teleoperation, and we propose the multi-volume occupancy grid (MVOG), a novel data structure for building 3D maps from laser data that provides a compact, probabilistic scene representation. The second system uses an RGB-D camera to recover the 6-DoF trajectory of the platform by aligning sparse features observed in the current RGB-D image against a model of previously seen features. We discuss our work on camera calibration and the depth measurement model, and we apply the system onboard an MAV to produce occupancy-based 3D maps, which we utilize for path planning.

    Finally, we present our contributions to a scene reconstruction system for mobile devices with built-in depth sensing and motion-tracking capabilities. We demonstrate reconstructing and rendering a global mesh on the fly, using only the mobile device's CPU, in very large (300 square meter) scenes, at a resolution of 2-3 cm. To achieve this, we divide the scene into spatial volumes indexed by a hash map. Each volume contains the truncated signed distance function for that area of space, as well as the mesh segment derived from the distance function. This approach allows us to focus computational and memory resources only on areas of the scene that are currently observed, and to leverage parallelization techniques for multi-core processing.
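
    The spatially hashed volume representation can be sketched compactly. The following minimal Python sketch, with assumed block size and truncation distance and none of the meshing described above, shows volumes allocated lazily in a hash map and a truncated signed distance folded into a voxel per observation.

        # Scene split into 8^3-voxel blocks, allocated lazily in a hash map;
        # each observation folds a truncated signed distance into one voxel.
        import numpy as np

        BLOCK, VOXEL, TRUNC = 8, 0.04, 0.12  # assumed: 4 cm voxels, 12 cm truncation

        volumes = {}  # (bx, by, bz) -> {'tsdf': ..., 'weight': ...}

        def integrate_point(p, signed_dist):
            """Fold one surface observation into the block containing point p."""
            key = tuple((p // (BLOCK * VOXEL)).astype(int))
            vol = volumes.setdefault(key, {
                'tsdf': np.ones((BLOCK,) * 3, np.float32),
                'weight': np.zeros((BLOCK,) * 3, np.float32)})
            v = tuple((p / VOXEL).astype(int) % BLOCK)     # voxel within block
            sdf = np.clip(signed_dist / TRUNC, -1.0, 1.0)  # truncate the distance
            w = vol['weight'][v]
            vol['tsdf'][v] = (vol['tsdf'][v] * w + sdf) / (w + 1.0)  # running mean
            vol['weight'][v] = w + 1.0

        integrate_point(np.array([1.03, 0.52, 2.41]), 0.02)
        print(len(volumes))  # 1: only the observed block was ever allocated

    Because blocks outside the sensor's view are never touched, memory grows with the observed surface rather than with the bounding volume of the scene, which is what makes 300 square meter reconstructions feasible on a mobile CPU.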

    Visual Servoing in Robotics

    Visual servoing is a well-known approach to guiding robots using visual information. Image processing, robotics, and control theory are combined in order to control the motion of a robot depending on the visual information extracted from the images captured by one or several cameras. On the vision side, ongoing research addresses a number of issues, such as the use of different types of image features (or different types of cameras, such as RGBD cameras), image processing at high velocity, and convergence properties. As shown in this book, the use of new control schemes allows the system to behave more robustly, efficiently, or compliantly, and with fewer delays. Related issues such as optimal and robust approaches, direct control, path tracking, and sensor fusion are also addressed. Additionally, visual servoing systems are currently being applied in a number of different domains. This book considers various aspects of visual servoing systems, such as the design of new strategies for their application to parallel robots, mobile manipulators, and teleoperation, and the application of this type of control system in new areas.
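
    The classic control law underlying image-based visual servoing can be stated compactly: the camera velocity is v = -λ L⁺ e, with e the feature error and L the interaction matrix of the point features. Below is a minimal Python sketch, assuming known point depths (for example from an RGBD camera) and illustrative feature values; it is not drawn from any specific chapter of the book.

        # Image-based visual servoing: v = -lambda * pinv(L) @ e for point features.
        import numpy as np

        def interaction_matrix(points, Z):
            """Stack the standard 2x6 interaction matrix of each image point."""
            rows = []
            for (x, y), z in zip(points, Z):
                rows += [[-1/z, 0, x/z, x*y, -(1 + x*x), y],
                         [0, -1/z, y/z, 1 + y*y, -x*y, -x]]
            return np.array(rows)

        def ibvs_velocity(s, s_star, Z, lam=0.5):
            e = (np.asarray(s) - np.asarray(s_star)).ravel()  # feature error
            L = interaction_matrix(s, Z)
            return -lam * np.linalg.pinv(L) @ e  # (vx, vy, vz, wx, wy, wz)

        s      = [(0.1, 0.1), (-0.1, 0.1), (0.1, -0.1), (-0.1, -0.1)]  # current
        s_star = [(0.2, 0.2), (-0.2, 0.2), (0.2, -0.2), (-0.2, -0.2)]  # desired
        print(ibvs_velocity(s, s_star, Z=[1.0] * 4))  # dominated by a positive vz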

    Parametric face alignment : generative and discriminative approaches

    Doctoral thesis in Electrical and Computer Engineering, presented to the Faculty of Sciences and Technology of the University of Coimbra.

    This thesis addresses the matching of deformable human face models to 2D images. Two different approaches are detailed: generative and discriminative methods. Generative or holistic methods model the appearance/texture of all image pixels describing the face by synthesizing the expected appearance (they build synthetic versions of the target face). Discriminative or patch-based methods model the local correlations between pixel values; such approaches use an ensemble of local feature detectors all connected by a shape regularization model. Typically, generative approaches can achieve higher fitting accuracy, but discriminative methods perform much better on unseen images.

    The Active Appearance Models (AAMs) are probably the most widely used generative technique. AAMs match parametric models of shape and appearance to new images by solving a nonlinear optimization that minimizes the difference between a synthetic template and the real appearance. The first part of this thesis describes the 2.5D AAM, an extension of the original 2D AAM that deals with a full perspective projection model. The 2.5D AAM uses a 3D Point Distribution Model (PDM) and a 2D appearance model whose control points are defined by a perspective projection of the PDM. Two model fitting algorithms and their computationally efficient approximations are proposed: the Simultaneous Forwards Additive (SFA) and the Normalization Forwards Additive (NFA). Robust solutions for the SFA and NFA are also proposed in order to take into account self-occlusion and/or partial occlusion of the face. Extensive results are shown, covering fitting convergence, fitting performance on unseen data, robustness to occlusion, tracking performance, and pose estimation.

    The second main part of this thesis concerns discriminative methods such as the Constrained Local Models (CLM) or the Active Shape Models (ASM), where an ensemble of local feature detectors is constrained to lie within the subspace spanned by a PDM. Fitting such a model to an image typically involves two steps: (1) a local search using a detector, obtaining response maps for each landmark, and (2) a global optimization that finds the shape parameters that jointly maximize all the detection responses. This work proposes the Discriminative Bayesian Active Shape Models (DBASM), a new global optimization strategy using a Bayesian approach, where the posterior distribution of the shape parameters is inferred in a maximum a posteriori (MAP) sense by means of a Linear Dynamical System (LDS). The DBASM approach models the covariance of the latent variables, i.e. it uses 2nd-order statistics of the shape (and pose) parameters. Later, the Bayesian Active Shape Models (BASM) are presented; BASM is an extension of the DBASM formulation where the prior distribution is explicitly modeled by means of recursive Bayesian estimation. Extensive results are presented, evaluating the DBASM and BASM global optimization strategies, local face part detectors, and tracking performance on several standard datasets. Qualitative results taken from the challenging Labeled Faces in the Wild (LFW) dataset are also shown.

    Finally, the last part of this thesis addresses identity and facial expression recognition. Face geometry is extracted from input images using the AAM, and low-dimensional manifolds are then derived using Laplacian Eigenmaps (LE), resulting in two types of manifolds, one representing identity and the other person-specific facial expression. The identity and facial expression recognition system uses a two-stage approach: first, a Support Vector Machine (SVM) is used to establish identity across expression changes; the second stage then deals with person-specific expression recognition using a network of Hidden Markov Models (HMMs). Results are shown for people exhibiting the six basic expressions (happiness, sadness, anger, fear, surprise, and disgust) plus the neutral emotion.
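
    Common to both the generative and discriminative families above is the Point Distribution Model. As a hedged sketch with synthetic data (real models are trained on annotated face datasets), the following Python snippet learns a PCA shape subspace and projects a noisy observed shape back onto it, which is the shape regularization step at the heart of ASM/CLM-style fitting.

        # Learn a PCA shape subspace from training shapes, then project a
        # noisy landmark observation onto it. Data here is synthetic.
        import numpy as np

        rng = np.random.default_rng(0)
        train = rng.normal(size=(200, 2 * 68))   # 200 shapes of 68 (x, y) points

        mean = train.mean(axis=0)
        _, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
        basis = Vt[:12].T                        # keep the top 12 shape modes

        def fit_pdm(observed):
            """Least-squares shape parameters, then the regularized shape."""
            b = basis.T @ (observed - mean)      # project onto the subspace
            return mean + basis @ b              # nearest plausible shape

        noisy = train[0] + rng.normal(scale=0.1, size=train[0].shape)
        regularized = fit_pdm(noisy)             # off-subspace noise removed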