Model-Based High-Dimensional Pose Estimation with Application to Hand Tracking
This thesis presents novel techniques for computer-vision-based, full-DOF estimation of human hand motion. Our main contributions are: a robust skin color estimation approach; a novel resolution-independent and memory-efficient representation of hand pose silhouettes, which allows us to compute area-based similarity measures in near-constant time; a set of new segmentation-based similarity measures; a new class of similarity measures that work for nearly arbitrary input modalities; a novel edge-based similarity measure that avoids any problematic thresholding or discretization and can be computed very efficiently in Fourier space; a template hierarchy that minimizes the number of similarity computations needed to find the most likely observed hand pose; and finally, a novel image space search method, which we naturally combine with our hierarchy. Consequently, matching can be formulated efficiently as a simultaneous template tree traversal and function maximization.
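Evaluating a similarity measure "in Fourier space" typically relies on the correlation theorem. Correlating two signals at every shift equals an inverse transform of one spectrum times the conjugate of the other, which costs O(n log n) instead of O(n²). The sketch below illustrates that general idea and is not the thesis's actual edge measure. It scores all circular shifts of a 1-D edge-strength profile against a template. Continuous edge strengths are used, so no threshold is applied. The helper names are hypothetical, and inputs are assumed to have equal, power-of-two length.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def ifft(X):
    """Inverse FFT via conjugation: ifft(X) = conj(fft(conj(X))) / n."""
    n = len(X)
    y = fft([v.conjugate() for v in X])
    return [v.conjugate() / n for v in y]


def circular_correlation(template, signal):
    """Return c[s] = sum_i template[i] * signal[(i + s) % n] for every shift s.

    Computed as IFFT(conj(FFT(template)) * FFT(signal)) for real inputs.
    """
    T = fft(template)
    S = fft(signal)
    prod = [t.conjugate() * s for t, s in zip(T, S)]
    return [v.real for v in ifft(prod)]
```

For example, a template whose edge peak sits at index 3 and a signal with the same peak at index 6 give their highest score at shift 3.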
A Novel Inpainting Framework for Virtual View Synthesis
Multi-view imaging has stimulated significant research to enhance the user experience of free-viewpoint video, allowing interactive navigation between views and the freedom to select a desired view to watch. This usually involves transmitting both textural and depth information captured from different viewpoints to the receiver, to enable the synthesis of an arbitrary view. When these virtual views are rendered, perceptual holes can appear where regions that were hidden by a closer object in the original view become visible in the virtual view. To provide a high-quality experience, these holes must be filled in a visually plausible way, in a process known as inpainting. This is challenging because the missing information is generally unknown and the hole regions can be large. Recently, depth-based inpainting techniques have been proposed to address this challenge. While these generally perform better than methods without depth assistance, they are not very robust and can produce perceptual artefacts.
This thesis presents a new inpainting framework that innovatively exploits depth and textural self-similarity characteristics to construct subjectively enhanced virtual viewpoints. The framework makes three significant contributions to the field: i) the exploitation of view information to jointly inpaint textural and depth hole regions; ii) the introduction of the novel concept of self-similarity characterisation which is combined with relevant depth information; and iii) an advanced self-similarity characterising scheme that automatically determines key spatial transform parameters for effective and flexible inpainting.
The presented inpainting framework has been critically analysed and shown to provide superior performance, both perceptually and numerically, compared to existing techniques, especially in producing fewer visual artefacts. It provides a flexible, robust framework for developing new inpainting strategies for the next generation of interactive multi-view technologies.
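Combining textural self-similarity with depth can be illustrated with exemplar-based patch matching. Take a patch on the hole boundary and search the known part of the image for the patch that best matches its known pixels, scoring both texture and depth differences. This is a generic sketch under that assumption, not the thesis's self-similarity characterisation. The function name and the `depth_weight` parameter are hypothetical.

```python
def best_source_patch(texture, depth, hole, target, half=1, depth_weight=1.0):
    """Return the centre (row, col) of the fully known patch that best matches
    the known pixels of the patch centred at `target`.

    texture, depth: 2-D lists of numbers; hole: 2-D list of bools (True = missing).
    The target patch must lie entirely inside the image.
    """
    rows, cols = len(texture), len(texture[0])
    tr, tc = target
    offsets = [(di, dj) for di in range(-half, half + 1) for dj in range(-half, half + 1)]
    # Only pixels already known in the target patch take part in the comparison.
    known = [(di, dj) for di, dj in offsets if not hole[tr + di][tc + dj]]
    best, best_cost = None, float("inf")
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            # Candidate source patches must not contain any hole pixel.
            if any(hole[r + di][c + dj] for di, dj in offsets):
                continue
            cost = 0.0
            for di, dj in known:
                dt = texture[r + di][c + dj] - texture[tr + di][tc + dj]
                dd = depth[r + di][c + dj] - depth[tr + di][tc + dj]
                cost += dt * dt + depth_weight * dd * dd
            if cost < best_cost:
                best, best_cost = (r, c), cost
    return best
```

Raising `depth_weight` makes the search prefer patches at the same depth as the target's known pixels. For disocclusion holes, this usually means background patches.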
Calibration-free Pedestrian Partial Pose Estimation Using a High-mounted Kinect
The application of human behavior analysis has developed rapidly over recent decades, from entertainment systems to professional applications such as Human-Robot Interaction (HRI), Advanced Driver Assistance Systems (ADAS), and Pedestrian Protection Systems (PPS). This thesis addresses the problem of recognizing pedestrians and estimating their body orientation in 3D, on the premise that knowing a person's orientation helps in analyzing and predicting their behavior. A new method is proposed in which a pedestrian detection module and an orientation estimation module are integrated sequentially. For pedestrian detection, a cascade classifier is designed to draw a bounding box around each detected pedestrian. Regions are then extracted from the 3D point cloud and passed to a discrete orientation classifier that estimates the orientation of the pedestrian's torso. This classification is based on a coarse, rasterized depth image simulating a top-view virtual camera placed directly above the detected pedestrian. It uses a support vector machine classifier trained to distinguish 10 orientations (30-degree increments).
To test the performance of the approach, a new benchmark database containing 764 point-cloud sets for body-orientation classification was captured. For this benchmark, a Microsoft Kinect recorded the point clouds of 30 participants, and a marker-based motion capture system (Vicon) provided the ground truth for their orientation.
Finally, we demonstrated the improvements brought by our system: it detected pedestrians with an accuracy of 95.29% and estimated body orientation (to within a 30-degree interval) with an accuracy of 88.88%. We hope these results can provide a new foundation for future research.
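The top-view rasterization can be sketched as follows. The sketch assumes the extracted points have already been transformed into a frame where y is height above the ground and x and z span the ground plane, centred on the detected pedestrian. Each cell keeps the highest point that falls into it, which is what a virtual camera looking straight down would see first. The cell size and grid size are hypothetical parameters, and the SVM stage is not reproduced.

```python
import math


def top_view_image(points, cell=0.1, size=8):
    """Rasterize (x, y, z) points into a size x size top-view height map.

    Assumes x and z span the ground plane (centred on the pedestrian) and y is
    height. Each cell stores the maximum height of the points that fall into it;
    empty cells stay 0.0 and points outside the grid are ignored.
    """
    img = [[0.0] * size for _ in range(size)]
    half = size * cell / 2.0
    for x, y, z in points:
        col = int(math.floor((x + half) / cell))
        row = int(math.floor((z + half) / cell))
        if 0 <= row < size and 0 <= col < size:
            img[row][col] = max(img[row][col], y)
    return img
```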
Human extremity detection and its applications in action detection and recognition
It has been shown that the locations of internal body joints are sufficient visual cues to characterize human motion. In this dissertation I propose that the locations of human extremities, including the head, hands, and feet, provide a powerful approximation of internal body motion. I propose detecting precise extremities from contours obtained by image segmentation or contour tracking. Junctions of the contours' medial axis are selected as stars. Contour points at a local maximum of distance to the various stars are chosen as candidate extremities. All candidates are filtered by cues including proximity to other candidates, visibility to stars, and robustness to noise-smoothing parameters. I present applications that use precise extremities for fast human action detection and recognition. Environment-specific features are built from precise extremities and fed into a block-based Hidden Markov Model to decode the fence-climbing action from continuous videos. Precise extremities are grouped into stable contacts if the same extremity does not move for a certain duration. These stable contacts are used to decompose a long continuous video into shorter pieces. Each piece is associated with certain motion features to form primitive motion units. In this way the sequence is abstracted into more meaningful segments, and a search strategy is used to detect the fence-climbing action. Moreover, I propose the histogram of extremities as a general posture descriptor and test it in a Hidden Markov Model-based framework for action recognition. I further propose detecting probable extremities from raw images without any segmentation. Modeling an extremity as an image patch instead of a single point on the contour helps overcome segmentation difficulties and increases detection robustness. I represent the extremity patches with Histograms of Oriented Gradients, and detection is achieved by window-based image scanning.
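The candidate-extremity step can be sketched for a single star. Compute the distance from each contour point to the star, then keep the points that are local maxima along the closed contour. This assumes the star (a medial-axis junction) is already known. It omits the proximity, visibility, and smoothing-robustness filters. The function name is hypothetical.

```python
import math


def candidate_extremities(contour, star):
    """Indices of contour points whose distance to `star` is a local maximum.

    `contour` is a closed list of (x, y) points, so neighbours wrap around.
    On a plateau of equal distances only the first point is reported.
    """
    d = [math.dist(p, star) for p in contour]
    n = len(d)
    return [i for i in range(n) if d[i] > d[i - 1] and d[i] >= d[(i + 1) % n]]
```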
To reduce the computational load, I adopt the integral histogram technique without sacrificing accuracy. The result is a probability map in which each pixel denotes the probability that the patch belongs to a specific class of extremities. From the probable extremity map, I propose the histogram of probable extremities as another general posture descriptor. It is tested on several data sets, and the results are compared with those of precise extremities to show the superiority of probable extremities.
Electrical and Computer Engineering
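The integral histogram extends the integral image to one accumulator per bin. The histogram of any axis-aligned window can then be read off with four lookups per bin, independent of the window size. The sketch below assumes each pixel has already been quantized to a single orientation bin. It counts pixels rather than weighting votes by gradient magnitude, as a full HOG descriptor would. The names are hypothetical.

```python
def integral_histogram(bins_img, num_bins):
    """H[b][r][c] = number of pixels with bin b in rows < r and cols < c."""
    rows, cols = len(bins_img), len(bins_img[0])
    H = [[[0] * (cols + 1) for _ in range(rows + 1)] for _ in range(num_bins)]
    for b in range(num_bins):
        Hb = H[b]
        for r in range(rows):
            for c in range(cols):
                # Standard integral-image recurrence, one plane per bin.
                Hb[r + 1][c + 1] = (Hb[r][c + 1] + Hb[r + 1][c] - Hb[r][c]
                                    + (1 if bins_img[r][c] == b else 0))
    return H


def region_histogram(H, top, left, bottom, right):
    """Histogram of the window rows [top, bottom) x cols [left, right) in O(num_bins)."""
    return [Hb[bottom][right] - Hb[top][right] - Hb[bottom][left] + Hb[top][left]
            for Hb in H]
```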
Image based human body rendering via regression & MRF energy minimization
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
A machine learning method for synthesising human images is explored, to create new images without relying on 3D modelling. Machine learning allows new images to be created by prediction from existing training images. In the present study, image synthesis is performed at two levels: contour and pixel. At the first level, a class of learning-based methods is formulated to predict the object contours of the synthetic image from the training images. At the second level, pixels are synthesised within those contours. The methods rely on robust object descriptions, dynamic learning models applied after appropriate motion segmentation, and machine learning-based frameworks.
Image-based human image synthesis using machine learning has recently gained considerable attention in computer graphics, and it draws on image and motion analysis techniques from computer vision. The problem lies in estimating image-based object configurations (i.e., segmentation and contour outlines). Building on the results of these analysis methods, the research adopts a machine learning approach in which human images are synthesised by generating contours and pixels learned from training images.
First, the thesis shows how an accurate silhouette is extracted using a background subtraction method developed for accuracy and efficiency. A traditional support vector machine approach is used to avoid ambiguities within the regression process. Images can be represented as a class of accurate and efficient vectors, both for single images and for sequences. Second, the framework is explored using support vector regression (SVR) to obtain convergent vector results for contour allocation. The changing relationship between the synthetic image and the training image is expressed as a vector and represented as functions. Finally, pixel synthesis is performed using belief propagation.
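Background subtraction for silhouette extraction can be illustrated with a per-pixel median background and a fixed difference threshold. This is a generic textbook scheme, not the method developed in the thesis. The threshold of 25 grey levels is a hypothetical choice.

```python
from statistics import median


def background_model(frames):
    """Per-pixel median over a list of equally sized grayscale frames (lists of rows)."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median(f[r][c] for f in frames) for c in range(cols)] for r in range(rows)]


def silhouette(frame, background, threshold=25):
    """Foreground mask: True where the frame differs from the background by more than threshold."""
    return [[abs(p - b) > threshold for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]
```

The median makes the background robust to a foreground object that passes through a pixel in only a minority of the frames.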
This thesis proposes a novel image-based rendering method for colour image synthesis that uses SVR and belief propagation to generalise, predicting contour and colour information from input colour images. The methods rely on appropriately defined, robust input colour images and optimise the input contour images within a sparse SVR framework. First, the thesis shows how contours can be predicted effectively and efficiently from a small number of input contour images. In addition, the thesis exploits the sparsity of SVR for efficiency and uses SVR to estimate the regression function. The image-based rendering method employed in this study enables contour synthesis from a small number of input source images, avoiding complex models and geometric information. Second, the method for colouring human body contours is extended to 8-connected pixel neighbourhoods, and a link distance field is constructed via belief propagation. The link distance, which acts as the message in propagation, is computed with an improved version of the lower-envelope method used in fast distance transforms. Finally, the methodology is tested on human facial and human body clothing information. The accuracy of the test results for the human body model confirms the efficiency of the proposed method.
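In its standard form, the lower-envelope method is the linear-time algorithm of Felzenszwalb and Huttenlocher for the squared-Euclidean distance transform of a sampled function, d(q) = min over p of (q - p)^2 + f(p). The same computation appears as the message update in belief propagation with quadratic pairwise costs. The sketch below implements only that standard 1-D version. The thesis's improvement to it is not described in enough detail to reproduce. Use large finite values rather than infinity for "no data" samples, to avoid inf - inf.

```python
def distance_transform_1d(f):
    """d[q] = min_p ((q - p)**2 + f[p]) in O(n) via the lower envelope of parabolas."""
    n = len(f)
    INF = float("inf")
    v = [0] * n          # positions of the parabolas forming the lower envelope
    z = [0.0] * (n + 1)  # boundaries between consecutive envelope parabolas
    k = 0
    z[0], z[1] = -INF, INF
    for q in range(1, n):
        # Intersection of the parabola rooted at q with the rightmost envelope parabola.
        s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        while s <= z[k]:
            k -= 1  # that parabola is hidden; drop it and retry
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = INF
    d = [0] * n
    k = 0
    for q in range(n):
        while z[k + 1] < q:
            k += 1
        d[q] = (q - v[k]) ** 2 + f[v[k]]
    return d
```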
High-performance hardware accelerators for image processing in space applications
Mars is a hard place to reach. While there have been many notable successes in getting probes to the Red Planet, the historical record is full of bad news. The success rate for actually landing on the Martian surface is even worse, roughly 30%. This low success rate is mainly attributable to the characteristics of the Martian environment. Strong winds frequently blow in the Martian atmosphere, often pushing the lander off its target descent trajectory. Moreover, the Martian surface is far from ideal for a safe landing. It is pitted with many closely spaced craters and large boulders, and marked by huge mountains and hills (e.g., Olympus Mons is 648 km in diameter and 27 km tall). For these reasons, a mission failure caused by landing in a large crater, on a big boulder, or on steeply sloping terrain is highly probable.
In recent years, space agencies have increased their research efforts to improve the success rate of space missions, including those to Mars. In particular, the two hottest research topics are active debris removal and guided landing on Mars.
The former aims at finding new methods to remove space debris using unmanned spacecraft. These must be able to autonomously detect a piece of debris, analyse it to extract its characteristics in terms of mass, speed, and size, and eventually rendezvous with it. To perform these tasks, the spacecraft must have advanced vision capabilities. It must be able to take pictures and process them with very complex image processing algorithms to detect, track, and analyse the debris.
The latter aims at increasing landing-point precision on Mars (i.e., shrinking the landing ellipse). Future space missions will increasingly adopt video-based navigation systems to assist the entry, descent, and landing (EDL) phase of space modules (e.g., spacecraft), improving the precision of automatic EDL navigation. For instance, recent exploration missions such as Spirit, Opportunity, and Curiosity used an EDL procedure that follows a fixed, precomputed descent trajectory to reach a target landing point. This approach guarantees a landing-point precision of, at best, 20 km. Given the characteristics of the Martian terrain, the probability of mission failure therefore remains high.
A very challenging problem is to design an autonomous guided EDL system that further reduces the landing ellipse and avoids landing in dangerous areas of the Martian surface (e.g., large craters or boulders) that could lead to mission failure. Autonomous behaviour is mandatory because manual control from Earth is not feasible over the Earth-Mars distance. This distance varies with the planets' orbital positions, from about 56 million km at its closest to roughly 400 million km at its farthest. Even in the best case, a signal travelling at the speed of light needs about 3.1 minutes each way. A control loop through Earth would therefore take more than 6 minutes per round trip, far too slow to react to hazards detected during the EDL phase.
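The best-case communication delay follows from dividing the Earth-Mars distance by the speed of light. The sketch below uses the 56 and 100 million km figures quoted for the Earth-Mars distance. The function name is hypothetical.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # exact value, fixed by the SI definition of the metre


def one_way_delay_minutes(distance_km):
    """Light-travel time in minutes for a signal crossing distance_km."""
    return distance_km / SPEED_OF_LIGHT_KM_S / 60.0


for d in (56e6, 100e6):
    print(f"{d / 1e6:.0f} million km: {one_way_delay_minutes(d):.1f} min one way")
```

For 56 million km this gives about 3.1 minutes one way. A round trip (images to Earth, commands back) therefore takes over 6 minutes even in the best case.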
In both applications, algorithms must adapt themselves to environmental conditions. Since the harsh conditions on Mars (and in space in general) are difficult to predict at design time, these algorithms must be able to tune their internal parameters automatically according to the current conditions.
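Otsu's threshold is a classic example of a parameter tuned from the current data instead of being fixed at design time. It picks the grey level that maximizes the between-class variance of the image histogram. It is shown here only to illustrate the idea of self-adaptability. The thesis does not state that it uses Otsu's method, and the function name is hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return t maximizing between-class variance; class 0 is pixels <= t.

    `pixels` is an iterable of ints in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]           # pixel count of class 0 (values <= t)
        sum0 += t * hist[t]     # intensity sum of class 0
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

Because the threshold is recomputed for every frame, a change in illumination shifts it automatically instead of requiring a new design-time constant.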
Moreover, real-time performance is another key factor. Since software implementations of these computationally intensive tasks cannot reach the required performance, the algorithms must be accelerated in hardware.
For these reasons, this thesis presents my research on advanced image processing algorithms for space applications and the associated hardware accelerators. My research has focused both on the algorithms and on their hardware implementations. Concerning the first aspect, I mainly focused on integrating self-adaptability features into existing algorithms. Concerning the second, I studied and validated a methodology to efficiently develop, verify, and validate hardware components that accelerate video-based applications. This approach allowed me to develop and test high-performance hardware accelerators that significantly outperform current state-of-the-art implementations.
The thesis is organized in four main chapters.
Chapter 2 starts with a brief history of digital image processing. The main content of this chapter is a description of space missions in which digital image processing plays a key role. Particular attention goes to the missions on which my research has a substantial impact, and for these the chapter analyses and evaluates the state-of-the-art approaches and algorithms in depth.
Chapter 3 analyses and compares the two technologies used to implement high-performance hardware accelerators: Application Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs). This comparison explains why space agencies choose FPGAs over ASICs for high-performance hardware accelerators in space missions. They do so even though FPGAs are more sensitive to Single Event Upsets (SEUs), i.e., transient errors induced in hardware components by alpha particles and solar radiation in space. The chapter also describes in detail the three available space-grade FPGA technologies (one-time programmable, flash-based, and SRAM-based). It then covers the main SEU mitigation techniques, which are mandatory for using space-grade FPGAs in actual missions.
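Triple modular redundancy (TMR) is a standard SEU mitigation technique for SRAM-based FPGAs. The logic is triplicated, and a majority voter masks an upset in any single copy. The bit-level voting rule is modelled below in Python purely for illustration. In a real design it is written in a hardware description language. Whether the thesis uses exactly this scheme is an assumption.

```python
def majority_vote(a, b, c):
    """Bitwise 2-out-of-3 majority of three redundant copies of an integer word.

    Each output bit is 1 when at least two of the three input bits are 1, so a
    flipped bit in any single copy is masked.
    """
    return (a & b) | (a & c) | (b & c)
```

Upsets in different bits of different copies are also masked, because every bit position is voted independently.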
Chapter 4 describes one of the main contributions of my research: a library of high-performance hardware accelerators for image processing in space applications. The basic idea behind this library is to offer designers a set of validated hardware components that significantly speed up the basic operations commonly used in an image processing chain. These components can be used directly as elementary building blocks to create a complex image processing system easily, without spending time on debugging and validation. The library groups the proposed hardware accelerators into IP-core families. Components in the same family provide the same functionality and share the same input/output interface. This harmonized I/O interface makes it possible to substitute components of the same family inside a complex image processing system without modifying the system's communication infrastructure. Besides analysing the internal architecture of the proposed components, this chapter presents the methodology used to develop, verify, and validate them. This methodology uses different programming and hardware description languages to support the designer from algorithm modelling through hardware implementation and validation.
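A software analogy can illustrate the benefit of a harmonized I/O interface. If every core in a family exposes the same interface, the surrounding chain can swap one core for another without change. The class and method names below are hypothetical and stand in for hardware streaming interfaces. They do not describe the library's actual IP cores.

```python
from typing import List, Protocol

Image = List[List[int]]


class FilterCore(Protocol):
    """Common interface shared by every core in a hypothetical 'filter' family."""

    def process(self, image: Image) -> Image: ...


class IdentityCore:
    """Pass-through core: returns a copy of its input."""

    def process(self, image: Image) -> Image:
        return [row[:] for row in image]


class Box3Core:
    """3x3 integer mean filter with edge replication at the borders."""

    def process(self, image: Image) -> Image:
        rows, cols = len(image), len(image[0])
        out = []
        for r in range(rows):
            out_row = []
            for c in range(cols):
                acc = 0
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr = min(max(r + dr, 0), rows - 1)
                        cc = min(max(c + dc, 0), cols - 1)
                        acc += image[rr][cc]
                out_row.append(acc // 9)
            out.append(out_row)
        return out


def run_chain(image: Image, stages: List[FilterCore]) -> Image:
    """The 'system' relies only on the shared interface, so stages are interchangeable."""
    for stage in stages:
        image = stage.process(image)
    return image
```

Replacing `Box3Core()` with `IdentityCore()` in the stage list requires no change to `run_chain`. This mirrors swapping cores of one family without touching the communication infrastructure.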
Chapter 5 presents the proposed complex image processing systems. In particular, it uses a set of real case studies, tied to the most recent needs of space agencies, to show how the hardware accelerator components can be assembled into a complex image processing system. In addition to the accelerators from the library, the systems described embed innovative ad hoc hardware components and software routines that provide high-performance, self-adaptable image processing. To demonstrate the benefits of the proposed methodology, each case study concludes with a comparison against current state-of-the-art implementations. The comparison highlights the gains in performance and in self-adaptability to environmental conditions.
Automatic Food Intake Assessment Using Camera Phones
Obesity is becoming an epidemic in most developed countries. The fundamental cause of obesity and overweight is an energy imbalance between calories consumed and calories expended, so monitoring everyday food intake is essential for obesity prevention and management. Existing dietary assessment methods usually require manual recording and recall of food types and portions. The accuracy of the results relies largely on uncertain factors such as the user's memory, food knowledge, and portion estimation, and is therefore often compromised. Accurate and convenient dietary assessment methods are still lacking and are needed by both the general population and the research community.
In this thesis, an automatic food intake assessment method using the cameras and inertial measurement units (IMUs) of smartphones was developed to help people foster a healthy lifestyle. With this method, users capture images or videos of the meal with their smartphone before and after eating. The smartphone then recognizes the food items, calculates the volume of food consumed, and reports the results to the user. The technical objective is to explore the feasibility of image-based food recognition and image-based volume estimation.
This thesis comprises five publications that address four specific goals: (1) to develop a prototype system from existing methods in order to review them, identify their drawbacks, and explore the feasibility of developing novel methods; (2) based on the prototype system, to investigate new food classification methods that raise recognition accuracy to a level suitable for field applications; (3) to design indexing methods for large-scale image databases to facilitate the development of new food image recognition and retrieval algorithms; (4) to develop novel, convenient, and accurate food volume estimation methods using only smartphones with cameras and IMUs.
A prototype system was implemented to review existing methods. Image feature detectors and descriptors were developed, and a nearest-neighbor classifier was implemented to classify food items. A credit card marker method was introduced for metric-scale 3D reconstruction and volume calculation.
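A credit card marker fixes the metric scale of an otherwise up-to-scale 3D reconstruction. The ratio between the card's known width and its width in model units gives millimetres per unit, and volumes scale with the cube of that factor. The sketch below assumes the card width and the food volume have already been measured in model units. It uses the ISO/IEC 7810 ID-1 card width of 85.60 mm, and the helper names are hypothetical. Following the before/after-meal workflow, consumption is taken as the difference between the two volumes.

```python
CARD_WIDTH_MM = 85.60  # width of an ISO/IEC 7810 ID-1 card (standard credit card size)


def metric_scale(card_width_model_units):
    """Millimetres per model unit, from the card's width measured in the reconstruction."""
    return CARD_WIDTH_MM / card_width_model_units


def metric_volume_ml(volume_model_units3, scale_mm_per_unit):
    """Convert a model-space volume to millilitres (1 mL = 1000 mm^3); volume scales with s^3."""
    return volume_model_units3 * scale_mm_per_unit ** 3 / 1000.0


def consumed_ml(before_ml, after_ml):
    """Volume eaten; negative differences (measurement noise) are clamped to zero."""
    return max(before_ml - after_ml, 0.0)
```

For example, if the card spans 0.856 model units, each unit is 100 mm, so a food volume of 0.25 cubic units corresponds to 250 mL.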
To increase recognition accuracy, novel multi-view food recognition algorithms were developed to recognize regularly shaped food items. To further increase accuracy and make the algorithm applicable to arbitrary food items, new food features and new classifiers were designed. The efficiency of the algorithm was increased by developing a novel image indexing method for large-scale image databases. Finally, volume calculation was improved by reducing reliance on the marker and introducing IMUs. Sensor fusion techniques combining camera and IMU measurements were explored to infer the metric scale of the 3D model and to reduce sensor noise.
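A least-squares fit is one simple way to infer metric scale from camera and IMU data. Suppose v_i are displacements from the visual reconstruction (arbitrary units) and a_i are the corresponding displacements integrated from the IMU (metres). The scale s that minimizes the sum of ||s v_i - a_i||^2 is s = sum(v_i . a_i) / sum(v_i . v_i). This minimal sketch assumes time-aligned displacements expressed in the same frame. Practical systems usually rely on filtering to cope with IMU drift, and the thesis's actual fusion method is not specified here. The function name is hypothetical.

```python
def estimate_scale(visual_disp, imu_disp):
    """Least-squares s minimizing sum ||s * v_i - a_i||^2 over paired 3-D displacements."""
    num = sum(vx * ax + vy * ay + vz * az
              for (vx, vy, vz), (ax, ay, az) in zip(visual_disp, imu_disp))
    den = sum(vx * vx + vy * vy + vz * vz for vx, vy, vz in visual_disp)
    if den == 0:
        raise ValueError("visual displacements are all zero; scale is unobservable")
    return num / den
```

Because every displacement pair contributes to the sums, noise in individual IMU readings partly averages out.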
Fish4Knowledge: Collecting and Analyzing Massive Coral Reef Fish Video Data
This book gives a start-to-finish overview of the whole Fish4Knowledge project, in 18 short chapters, each describing one aspect of the project. The Fish4Knowledge project explored the possibilities of big video data, in this case from undersea video. Recording and analyzing 90 thousand hours of video from ten camera locations, the project gives a 3-year view of fish abundance in several tropical coral reefs off the coast of Taiwan. The research system built a remote recording network, over 100 Tb of storage, supercomputer processing, video target detection and
Digital Surface Modelling in Developing Countries Using Spaceborne SAR Techniques
Topographic databases at the national level, in the form of Digital Surface Models (DSMs), are required for a large number of applications, spurred on by the increased use of Geographic Information Systems (GIS). Ground-based (surveying, GPS, etc.) and traditional airborne approaches to generating topographic information are proving time-consuming and costly for developing countries. Countries in the tropical zone face the additional problem of cloud cover, which can delay the acquisition of optical imagery for almost 75% of the year. The Caribbean is one such affected region. It needs national digital topographic information for its GIS database development, 3D visualization of landscapes, and digital ortho-rectification of satellite imagery.
Synthetic Aperture Radar (SAR), with its cloud-penetrating and day/night imaging capabilities, is emerging as a possible remote sensing tool for cloud-affected territories. There has been success with airborne single-pass dual-antenna systems (e.g., STAR 3i) and the Shuttle Radar Topography Mission (SRTM). However, the use of these systems in the Caribbean is restricted, and their datasets will not be generally available. The launch of imaging radar satellites such as ERS-1, ERS-2, Radarsat-1, and more recently Envisat has provided additional opportunities for generating medium-accuracy, low-cost topographic information for developing countries using radargrammetry (StereoSAR) and interferometric SAR (InSAR).
The primary aim of this research was to develop, from scratch, a prototype StereoSAR system based on automatic stereo matching and space intersection algorithms. The system generates medium-accuracy, low-cost DSMs under various influencing parameters without recourse to any ground control points. The result was to be a software package that performs this process on a personal computer. The DSMs generated from Radarsat-1 and Envisat SAR imagery were compared with a reference surface from airborne InSAR, and conclusions about the quality of the StereoSAR DSMs are presented. Work required to further improve the StereoSAR system is also suggested.
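Automatic stereo matching in radargrammetry is commonly area-based, and a widely used similarity score is normalized cross-correlation (NCC). NCC is insensitive to brightness and contrast differences between the two images. The sketch below searches for the disparity along a single row. It assumes the image pair has been resampled so that conjugate points lie on the same row. The thesis's actual matcher and the SAR space-intersection step are not reproduced, and the names are hypothetical.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length windows, in [-1, 1]."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat window carries no texture to match
    return sum(x * y for x, y in zip(da, db)) / denom


def match_along_row(left_row, right_row, x, half, max_disp):
    """Disparity d in [-max_disp, max_disp] maximizing NCC between the
    window centred at x in left_row and the window centred at x + d in right_row."""
    ref = left_row[x - half:x + half + 1]
    best_d, best_score = None, -2.0
    for d in range(-max_disp, max_disp + 1):
        xr = x + d
        if xr - half < 0 or xr + half >= len(right_row):
            continue  # candidate window would fall outside the image
        score = ncc(ref, right_row[xr - half:xr + half + 1])
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score
```

The resulting disparity (parallax) is what the space-intersection step would convert into height using the SAR imaging geometry.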