Robust and real-time hand detection and tracking in monocular video

Author: Spruyt Vincent
Publication venue: Ghent University. Faculty of Engineering and Architecture
Publication date: 01/01/2015

In recent years, personal computing devices such as laptops, tablets and smartphones have become ubiquitous. Moreover, intelligent sensors are being integrated into many consumer devices such as eyeglasses, wristwatches and smart televisions. With the advent of touchscreen technology, a new human-computer interaction (HCI) paradigm arose that allows users to interface with their device in an intuitive manner. Using simple gestures, such as swipe or pinch movements, a touchscreen can be used to directly interact with a virtual environment. Nevertheless, touchscreens still form a physical barrier between the virtual interface and the real world. An increasingly popular field of research that tries to overcome this limitation is video-based gesture recognition, hand detection and hand tracking. Gesture-based interaction allows the user to interact directly with the computer in a natural manner by exploring a virtual reality using nothing but their own body language. In this dissertation, we investigate how robust hand detection and tracking can be accomplished under real-time constraints. In the context of human-computer interaction, real-time is defined as both low latency and low complexity, such that a complete video frame can be processed before the next one becomes available. Furthermore, for practical applications, the algorithms should be robust to illumination changes, camera motion, and cluttered backgrounds in the scene. Finally, the system should be able to initialize automatically, and to detect and recover from tracking failure. We study a wide variety of existing algorithms, and propose significant improvements and novel methods to build a complete detection and tracking system that meets these requirements. Hand detection, hand tracking and hand segmentation are related yet technically different challenges.
Whereas detection deals with finding an object in a static image, tracking considers temporal information and is used to track the position of an object over time, throughout a video sequence. Hand segmentation is the task of estimating the hand contour, thereby separating the object from its background. Detection of hands in individual video frames allows us to automatically initialize our tracking algorithm, and to detect and recover from tracking failure. Human hands are highly articulated objects, consisting of finger parts that are connected by joints. As a result, the appearance of a hand can vary greatly, depending on the assumed hand pose. Traditional detection algorithms often assume that the appearance of the object of interest can be described using a rigid model and therefore cannot be used to robustly detect human hands. Therefore, we developed an algorithm that detects hands by exploiting their articulated nature. Instead of resorting to a template-based approach, we probabilistically model the spatial relations between different hand parts and the centroid of the hand. Detecting hand parts, such as fingertips, is much easier than detecting a complete hand. Based on our model of the spatial configuration of hand parts, the detected parts can be used to obtain an estimate of the complete hand's position. To comply with the real-time constraints, we developed techniques to speed up the process by efficiently discarding unimportant information in the image. Experimental results show that our method is competitive with the state-of-the-art in object detection while reducing computational complexity by a factor of 1,000. Furthermore, we showed that our algorithm can also be used to detect other articulated objects such as persons or animals and is therefore not restricted to the task of hand detection. Once a hand has been detected, a tracking algorithm can be used to continuously track its position in time.
We developed a probabilistic tracking method that can cope with uncertainty caused by image noise, incorrect detections, changing illumination, and camera motion. Furthermore, our tracking system automatically determines the number of hands in the scene, and can cope with hands entering or leaving the video canvas. We introduced several novel techniques that greatly increase tracking robustness, and that can also be applied in domains other than hand tracking. To achieve real-time processing, we investigated several techniques to reduce the search space of the problem, and deliberately employ methods that are easily parallelized on modern hardware. Experimental results indicate that our methods outperform the state-of-the-art in hand tracking, while providing a much lower computational complexity. One of the methods used by our probabilistic tracking algorithm is optical flow estimation. Optical flow is defined as a 2D vector field describing the apparent velocities of objects in a 3D scene, projected onto the image plane. Optical flow is known to be used by many insects and birds to visually track objects and to estimate their ego-motion. However, most optical flow estimation methods described in the literature are either too slow to be used in real-time applications, or are not robust to illumination changes and fast motion. We therefore developed an optical flow algorithm that can cope with large displacements, and that is illumination independent. Furthermore, we introduce a regularization technique that ensures a smooth flow field. This regularization scheme effectively reduces the number of noisy and incorrect flow-vector estimates, while maintaining the ability to handle motion discontinuities caused by object boundaries in the scene. The above methods are combined into a hand tracking framework that can be used for interactive applications in unconstrained environments.
To demonstrate the possibilities of gesture-based human-computer interaction, we developed a new type of computer display. This display is completely transparent, allowing multiple users to perform collaborative tasks while maintaining eye contact. Furthermore, our display produces an image that seems to float in thin air, such that users can touch the virtual image with their hands. This floating imaging display has been showcased at several national and international events and trade shows. The research that is described in this dissertation has been evaluated thoroughly by comparing detection and tracking results with those obtained by state-of-the-art algorithms. These comparisons show that the proposed methods outperform most algorithms in terms of accuracy, while achieving a much lower computational complexity, resulting in a real-time implementation. Results are discussed in depth at the end of each chapter. This research further resulted in an international journal publication; a second journal paper that has been submitted and is under review at the time of writing this dissertation; nine international conference publications; a national conference publication; a commercial license agreement concerning the research results; two hardware prototypes of a new type of computer display; and a software demonstrator.
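As a rough illustration of the optical flow definition given in the abstract above, the following Python sketch estimates a single translational flow vector for a synthetic image pair with the classic Lucas-Kanade least-squares formulation. It is not the dissertation's algorithm, which targets large displacements and illumination changes; the names `lucas_kanade_translation` and `pattern`, and the test image itself, are assumptions made only for this example.

```python
import numpy as np

def lucas_kanade_translation(frame0, frame1):
    """Estimate one (u, v) flow vector for the whole window by least squares.

    Assumes small motion and constant brightness: Ix*u + Iy*v + It = 0.
    """
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    iy, ix = np.gradient(frame0)
    it = frame1 - frame0
    # One linear equation per pixel, two unknowns (u, v).
    a = np.stack([ix.ravel(), iy.ravel()], axis=1)
    flow, *_ = np.linalg.lstsq(a, -it.ravel(), rcond=None)
    return flow  # array([u, v]) in pixels per frame

def pattern(x, y):
    # Hypothetical smooth test pattern, chosen only for illustration.
    return np.sin(0.2 * x) + np.cos(0.15 * y)

xs, ys = np.meshgrid(np.arange(64.0), np.arange(64.0))
true_flow = np.array([0.3, -0.2])
frame0 = pattern(xs, ys)
# Shift the pattern by the true flow to create the second frame.
frame1 = pattern(xs - true_flow[0], ys - true_flow[1])
estimated_flow = lucas_kanade_translation(frame0, frame1)
```

For this sub-pixel shift the estimate should land close to `true_flow`. A single global vector is the simplest possible case; dense methods estimate one vector per pixel and need a regularizer, which is where smoothness terms such as the one mentioned in the abstract come in.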

Ghent University Academic Bibliography

Detail Enhancing Denoising of Digitized 3D Models from a Mobile Scanning System

Author: Grinstead Bradley I
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/05/2008

The acquisition process of digitizing a large-scale environment produces an enormous amount of raw geometry data. This data is corrupted by system noise, which leads to 3D surfaces that are not smooth and details that are distorted. Any scanning system has noise associated with the scanning hardware, both digital quantization errors and measurement inaccuracies, but a mobile scanning system has additional system noise introduced by the pose estimation of the hardware during data acquisition. The combined system noise generates data that is not handled well by existing noise reduction and smoothing techniques. This research is focused on enhancing the 3D models acquired by mobile scanning systems used to digitize large-scale environments. These digitization systems combine a variety of sensors – including laser range scanners, video cameras, and pose estimation hardware – on a mobile platform for the quick acquisition of 3D models of real world environments. The data acquired by such systems are extremely noisy, often with significant details being on the same order of magnitude as the system noise. By utilizing a unique 3D signal analysis tool, a denoising algorithm was developed that identifies regions of detail and enhances their geometry, while removing the effects of noise on the overall model. The developed algorithm can be useful for a variety of digitized 3D models, not just those involving mobile scanning systems. The challenges faced in this study were the automatic processing needs of the enhancement algorithm, and the need to fill a gap in the area of 3D model analysis in order to reduce the effect of system noise on the 3D models. In this context, our main contributions are the automation and integration of a data enhancement method not well known to the computer vision community, and the development of a novel 3D signal decomposition and analysis tool.
The new technologies featured in this document are intuitive extensions of existing methods to new dimensionality and applications. The totality of the research has been applied towards detail enhancing denoising of scanned data from a mobile range scanning system, and results from both synthetic and real models are presented.

University of Tennessee, Knoxville: Trace

Object Tracking

Publication venue: IntechOpen
Publication date: 20/04/2021

Object tracking consists of estimating the trajectory of moving objects in a sequence of images. Automating object tracking by computer is a difficult task. Changes in the many parameters that represent the features and motion of the objects, as well as temporary partial or full occlusion of the tracked objects, have to be considered. This monograph presents the development of object tracking algorithms, methods and systems. Both the state of the art in object tracking methods and new trends in research are described in this book. Fourteen chapters are split into two sections. Section 1 presents new theoretical ideas, whereas Section 2 presents real-life applications. Despite the variety of topics contained in this monograph, it constitutes a consistent body of knowledge in the field of computer object tracking. The intention of the editor was to follow the very quick progress in the development of methods as well as the extension of the application.

Directory of Open Access Books (DOAB)

Acceleration Methods for MRI

Author: Muckley Matthew J.
Publication date: 01/01/2016

Acceleration methods are a critical area of research for MRI. Two of the most important acceleration techniques involve parallel imaging and compressed sensing. These advanced signal processing techniques have the potential to drastically reduce scan times and provide radiologists with new information for diagnosing disease. However, many of these new techniques require solving difficult optimization problems, which motivates the development of more advanced algorithms to solve them. In addition, acceleration methods have not reached maturity in some applications, which motivates the development of new models tailored to these applications. This dissertation makes advances in three different areas of acceleration. The first is the development of a new algorithm (called B1-Based, Adaptive Restart, Iterative Soft Thresholding Algorithm, or BARISTA) that solves a parallel MRI optimization problem with compressed sensing assumptions. BARISTA is shown to be 2-3 times faster and more robust to parameter selection than current state-of-the-art variable splitting methods. The second contribution is the extension of BARISTA ideas to non-Cartesian trajectories, which also leads to a 2-3 times acceleration over previous methods. The third contribution is the development of a new model for functional MRI that enables an acceleration of effective temporal resolution by a factor of 3-4 in functional MRI scans. Several variations of the new model are proposed, with an ROC curve analysis showing that a combined low-rank/sparsity model gives the best performance in identifying the resting-state motor network. PhD, Biomedical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120841/1/mmuckley_1.pd
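The algorithm name in the abstract above builds on iterative soft thresholding. As a minimal sketch of that building block only, the Python code below runs plain ISTA on a generic sparse least-squares problem. It is not BARISTA: the B1-based and adaptive-restart components named in the abstract are omitted, and the names `soft_threshold` and `ista` as well as the toy problem are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (element-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(a, b, lam, n_iter=200):
    """Minimize 0.5 * ||a @ x - b||^2 + lam * ||x||_1 with plain ISTA."""
    # Lipschitz constant of the gradient: largest singular value squared.
    lipschitz = np.linalg.norm(a, 2) ** 2
    x = np.zeros(a.shape[1])
    for _ in range(n_iter):
        grad = a.T @ (a @ x - b)  # gradient of the quadratic data term
        x = soft_threshold(x - grad / lipschitz, lam / lipschitz)
    return x

# Toy problem: with an identity system matrix the minimizer is simply the
# soft-thresholded measurement vector.
b = np.array([3.0, -0.5, 1.0])
x_hat = ista(np.eye(3), b, lam=1.0)
```

In an MRI setting the matrix `a` would be replaced by an encoding operator that is applied implicitly rather than stored; that substitution is left out to keep the sketch self-contained.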

Deep Blue Documents at the University of Michigan

Feature based estimation of myocardial motion from tagged MR images

Author: Becciu A.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2010

In the past few years we witnessed an increase in mortality due to cancer relative to mortality due to cardiovascular diseases. For 2008, the Netherlands Statistics Agency reported that 33,900 people died of cancer against 33,100 deaths due to cardiovascular diseases, making cancer the number one cause of death in the Netherlands [33]. Even if the rate of people affected by heart diseases is continually rising, they "simply don’t die of it", according to the research director Prof. Mat Daemen of research institute CARIM of the University of Maastricht [50]. The reason for this is early diagnosis, and the treatment of people with identified risk factors for diseases like ischemic heart disease, hypertrophic cardiomyopathy, thoracic aortic disease, pericardial (sac around the heart) disease, cardiac tumors, pulmonary artery disease, valvular disease, and congenital heart disease before and after surgical repair. Cardiac imaging plays a crucial role in early diagnosis, since it allows the accurate investigation of a large amount of imaging data in a small amount of time. Moreover, cardiac imaging reduces costs of inpatient care, as has been shown in recent studies [77]. With this in mind, in this work we have provided several tools intended to help the investigation of cardiac motion. In chapters 2 and 3 we have explored a novel variational optic flow methodology based on multi-scale feature points to extract cardiac motion from tagged MR images. Compared to constant brightness methods, this new approach exhibits several advantages. Although the intensity of critical points is also influenced by fading, critical points do retain their characteristic even in the presence of intensity changes, such as in MR imaging. In an experiment in section 5.4 we have applied this optic flow approach directly on tagged MR images. A visual inspection confirmed that the extracted motion fields realistically depicted the cardiac wall motion.
The method also exploits the advantages of the multiscale framework. Because sparse velocity formulas 2.9, 3.7, 6.21, and 7.5 provide a number of equations equal to the number of unknowns, the method does not suffer from the aperture problem in retrieving velocities associated with the critical points. In chapters 2 and 3 we have moreover introduced a smoothness component of the optic flow equation described by means of covariant derivatives. This is a novelty in the optic flow literature. Many variational optic flow methods present a smoothness component that penalizes deviations from global assumptions such as isotropic or anisotropic smoothness. In the proposed smoothness term, deviations from a predefined motion model are penalized. Moreover, the proposed optic flow equation has been decomposed into rotation-free and divergence-free components. This decomposition allows independent tuning of the two components during the vector field reconstruction. The experiments and the table of errors provided in 3.8 showed that the combination of the smoothness term, influenced by a predefined motion model, and the Helmholtz decomposition in the optic flow equation reduces the average angular error substantially (20%-25%) with respect to a similar technique that employs only standard derivatives in the smoothness term. In section 5.3 we extracted the motion field of a phantom with known ground truth and compared the performance of this optic flow method with the performance of other optic flow methods well known in the literature, such as the Horn and Schunck [76] approach, the Lucas and Kanade [111] technique and the tuple image multi-scale optic flow constraint equation of Van Assen et al. [163]. Tests showed that the proposed optic flow methodology provides the smallest average angular error (AAE = 3.84 degrees) and L2 norm = 0.1.
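The average angular error quoted above can be made concrete with a short Python sketch. It uses the definition that is common in optical flow evaluation, the mean angle between the space-time vectors (u, v, 1) of the estimated and ground-truth fields; that the thesis uses exactly this definition is an assumption, and the function name `average_angular_error` is chosen here for illustration.

```python
import numpy as np

def average_angular_error(u_est, v_est, u_true, v_true):
    """Mean angle, in degrees, between (u, v, 1) vectors of two flow fields."""
    dot = u_est * u_true + v_est * v_true + 1.0
    norm_est = np.sqrt(u_est**2 + v_est**2 + 1.0)
    norm_true = np.sqrt(u_true**2 + v_true**2 + 1.0)
    # Clip to guard against rounding slightly outside [-1, 1].
    cos_angle = np.clip(dot / (norm_est * norm_true), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)).mean())

zeros = np.zeros((4, 4))
ones = np.ones((4, 4))
# Estimated flow (1, 0) everywhere against a zero ground-truth field.
aae = average_angular_error(ones, zeros, zeros, zeros)
```

In this toy case the vectors (1, 0, 1) and (0, 0, 1) are 45 degrees apart, so `aae` evaluates to 45 degrees. Appending the constant 1 keeps the measure defined where the flow is zero.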
In this work we also employed the Helmholtz decomposition to study cardiac behavior, since the vector field decomposition allows cardiac contraction and cardiac rotation to be investigated independently. In chapter 4 we carried out an analysis of cardiac motion of ten volunteers and one patient where we estimated the kinetic energy for the different components. This decomposition is useful since it makes it possible to visualize and quantify the contribution of each vector field component to the heartbeat. Local measurements of the kinetic energy have also been used to detect areas of the cardiac walls with little movement. Experiments on a patient and a comparison between a late enhancement cardiac image and an illustration of the cardiac kinetic energy on a bull’s eye plot illustrated that a correspondence exists between an infarcted area and an area with very small kinetic energy. With the aim of extending the proposed optic flow equation to a 3D approach in the future, in chapter 6 we investigated the 3D winding number approach as a tool to locate critical points in volume images. We simplified the mathematics involved with respect to a previous work [150] and we provided several examples and applications such as cardiac motion estimation from 3-dimensional tagged images, and follicle and neuronal cell counting. Finally, in chapter 7 we continued our investigation on volume tagged MR images, by retrieving the cardiac motion field using a simple 3-dimensional version of the proposed optic flow equation based on standard derivatives. We showed that the retrieved motion fields display the contracting and rotating behavior of the cardiac muscle. We moreover extracted the through-plane component, which provides a realistic illustration of the vector field and is missed by 2-dimensional approaches.
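For readers unfamiliar with the Helmholtz decomposition mentioned in this abstract, its standard 2D form can be written as follows. The symbols \phi and \psi and the unit-density energies are assumptions for illustration; the abstract does not give the thesis's own notation, and on a bounded domain a harmonic remainder can appear that is omitted here.

```latex
\mathbf{v} = \nabla\phi + \nabla^{\perp}\psi,
\qquad
\nabla^{\perp}\psi = \left(-\frac{\partial\psi}{\partial y},\ \frac{\partial\psi}{\partial x}\right),
\qquad
\nabla\times(\nabla\phi) = 0,
\qquad
\nabla\cdot(\nabla^{\perp}\psi) = 0 .

E_{\phi} = \frac{1}{2}\int_{\Omega} \lVert\nabla\phi\rVert^{2}\,dx\,dy,
\qquad
E_{\psi} = \frac{1}{2}\int_{\Omega} \lVert\nabla^{\perp}\psi\rVert^{2}\,dx\,dy .
```

The rotation-free part \nabla\phi captures contraction and expansion, while the divergence-free part \nabla^{\perp}\psi captures rotation, which is why an energy can be attributed to each component separately.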

Repository TU/e

Pure OAI Repository

3D Shape Understanding and Generation

Author: Gadelha Matheus
Publication venue: ScholarWorks@UMass Amherst
Publication date: 20/10/2021

In recent years, Machine Learning techniques have revolutionized solutions to longstanding image-based problems, like image classification, generation, semantic segmentation, object detection and many others. However, if we want to be able to build agents that can successfully interact with the real world, those techniques need to be capable of reasoning about the world as it truly is: a tridimensional space. There are two main challenges while handling 3D information in machine learning models. First, it is not clear what is the best 3D representation. For images, convolutional neural networks (CNNs) operating on raster images yield the best results in virtually all image-based benchmarks. For 3D data, the best combination of model and representation is still an open question. Second, 3D data is not available on the same scale as images – taking pictures is a common procedure in our daily lives, whereas capturing 3D content is an activity usually restricted to specialized professionals. This thesis is focused on addressing both of these issues. Which model and representation should we use for generating and recognizing 3D data? What are efficient ways of learning 3D representations from a few examples? Is it possible to leverage image data to build models capable of reasoning about the world in 3D? Our research findings show that it is possible to build models that efficiently generate 3D shapes as irregularly structured representations. Those models require significantly less memory while generating higher quality shapes than the ones based on voxels and multi-view representations. We start by developing techniques to generate shapes represented as point clouds. This class of models leads to high quality reconstructions and better unsupervised feature learning. 
However, since point clouds are not amenable to editing and human manipulation, we also present models capable of generating shapes as sets of shape handles -- simpler primitives that summarize complex 3D shapes and were specifically designed for high-level tasks and user interaction. Despite their effectiveness, those approaches require some form of 3D supervision, which is scarce. We present multiple alternatives to this problem. First, we investigate how approximate convex decomposition techniques can be used as self-supervision to improve recognition models when only a limited number of labels are available. Second, we study how neural network architectures induce shape priors that can be used in multiple reconstruction tasks -- using both volumetric and manifold representations. In this regime, reconstruction is performed from a single example -- either a sparse point cloud or multiple silhouettes. Finally, we demonstrate how to train generative models of 3D shapes without using any 3D supervision by combining differentiable rendering techniques and Generative Adversarial Networks

ScholarWorks@UMass Amherst

3D Motion Analysis via Energy Minimization

Author: Wedel Andreas
Publication venue: Universitäts- und Landesbibliothek Bonn

This work deals with 3D motion analysis from stereo image sequences for driver assistance systems. It consists of two parts: the estimation of motion from the image data and the segmentation of moving objects in the input images. The content can be summarized with the technical term machine visual kinesthesia, the sensation or perception and cognition of motion. In the first three chapters, the importance of motion information is discussed for driver assistance systems, for machine vision in general, and for the estimation of ego motion. The next two chapters elaborate on motion perception, analyzing the apparent movement of pixels in image sequences for both monocular and binocular camera setups. Then, the obtained motion information is used to segment moving objects in the input video. Thus, one can clearly identify the thread from analyzing the input images to describing the input images by means of stationary and moving objects. Finally, I present possibilities for future applications based on the contents of this thesis. Previous work in each case is presented in the respective chapters. Although the overarching issue of motion estimation from image sequences is related to practice, there is nothing as practical as a good theory (Kurt Lewin). Several problems in computer vision are formulated as intricate energy minimization problems. In this thesis, motion analysis in image sequences is thoroughly investigated, showing that splitting an original complex problem into simplified sub-problems yields improved accuracy, increased robustness, and a clear and accessible approach to state-of-the-art motion estimation techniques. In Chapter 4, optical flow is considered. Optical flow is commonly estimated by minimizing a combined energy consisting of a data term and a smoothness term. These two parts are decoupled, yielding a novel and iterative approach to optical flow.
The derived Refinement Optical Flow framework is a clear and straightforward approach to computing the apparent image motion vector field. Furthermore, it currently yields the most accurate motion estimation techniques in the literature. Much as this is an engineering approach of fine-tuning precision to the last detail, it helps to gain better insight into the problem of motion estimation. This profoundly contributes to state-of-the-art research in motion analysis, in particular facilitating the use of motion estimation in a wide range of applications. In Chapter 5, scene flow is rethought. Scene flow stands for the three-dimensional motion vector field for every image pixel, computed from a stereo image sequence. Again, decoupling the commonly coupled estimation of three-dimensional position and three-dimensional motion yields an approach to scene flow estimation with more accurate results and a considerably lower computational load. It results in a dense scene flow field and enables additional applications based on the dense three-dimensional motion vector field, which are to be investigated in the future. One such application is the segmentation of moving objects in an image sequence. Detecting moving objects within the scene is one of the most important features to extract in image sequences from a dynamic environment. This is presented in Chapter 6. Scene flow and the segmentation of independently moving objects are only first steps towards machine visual kinesthesia. Throughout this work, I present possible future work to improve the estimation of optical flow and scene flow. Chapter 7 additionally presents an outlook on future research for driver assistance applications. But there is much more to the full understanding of the three-dimensional dynamic scene. This work is meant to inspire the reader to think outside the box and contribute to the vision of building perceiving machines.
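The "combined energy, consisting of a data term and a smoothness term" in this abstract can be illustrated with the classic Horn-Schunck functional. This is offered as a textbook example under stated assumptions, not as the functional used in the thesis, whose exact terms and decoupling scheme are not given in the abstract; the weight \alpha and the domain \Omega are notation chosen here.

```latex
E(u, v) =
\underbrace{\int_{\Omega} \left(I_x u + I_y v + I_t\right)^{2} dx\,dy}_{\text{data term}}
\;+\;
\alpha^{2}
\underbrace{\int_{\Omega} \left(\lVert\nabla u\rVert^{2} + \lVert\nabla v\rVert^{2}\right) dx\,dy}_{\text{smoothness term}}
```

Here I_x, I_y and I_t are the spatial and temporal image derivatives and (u, v) is the flow field. The data term enforces brightness constancy, the smoothness term penalizes spatial variation of the flow, and \alpha balances the two.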

bonndoc – Der Publikationsserver der Universität Bonn

Sparse Modeling for Image and Vision Processing

Author: Francis Bach, Jean Ponce, Julien Mairal
Publication date: 01/01/2014

In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection, that is, automatically selecting a simple model among a large collection of them. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. Subsequently, the corresponding tools have been widely adopted by several scientific communities such as neuroscience, bioinformatics, or computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts. Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Visio

arXiv.org e-Print Archive

CiteSeerX

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server