71 research outputs found

    Robust and real-time hand detection and tracking in monocular video

    Get PDF
    In recent years, personal computing devices such as laptops, tablets and smartphones have become ubiquitous. Moreover, intelligent sensors are being integrated into many consumer devices such as eyeglasses, wristwatches and smart televisions. With the advent of touchscreen technology, a new human-computer interaction (HCI) paradigm arose that allows users to interface with their device in an intuitive manner. Using simple gestures, such as swipe or pinch movements, a touchscreen can be used to directly interact with a virtual environment. Nevertheless, touchscreens still form a physical barrier between the virtual interface and the real world. An increasingly popular field of research that tries to overcome this limitation, is video based gesture recognition, hand detection and hand tracking. Gesture based interaction allows the user to directly interact with the computer in a natural manner by exploring a virtual reality using nothing but his own body language. In this dissertation, we investigate how robust hand detection and tracking can be accomplished under real-time constraints. In the context of human-computer interaction, real-time is defined as both low latency and low complexity, such that a complete video frame can be processed before the next one becomes available. Furthermore, for practical applications, the algorithms should be robust to illumination changes, camera motion, and cluttered backgrounds in the scene. Finally, the system should be able to initialize automatically, and to detect and recover from tracking failure. We study a wide variety of existing algorithms, and propose significant improvements and novel methods to build a complete detection and tracking system that meets these requirements. Hand detection, hand tracking and hand segmentation are related yet technically different challenges. Whereas detection deals with finding an object in a static image, tracking considers temporal information and is used to track the position of an object over time, throughout a video sequence. Hand segmentation is the task of estimating the hand contour, thereby separating the object from its background. Detection of hands in individual video frames allows us to automatically initialize our tracking algorithm, and to detect and recover from tracking failure. Human hands are highly articulated objects, consisting of finger parts that are connected with joints. As a result, the appearance of a hand can vary greatly, depending on the assumed hand pose. Traditional detection algorithms often assume that the appearance of the object of interest can be described using a rigid model and therefore can not be used to robustly detect human hands. Therefore, we developed an algorithm that detects hands by exploiting their articulated nature. Instead of resorting to a template based approach, we probabilistically model the spatial relations between different hand parts, and the centroid of the hand. Detecting hand parts, such as fingertips, is much easier than detecting a complete hand. Based on our model of the spatial configuration of hand parts, the detected parts can be used to obtain an estimate of the complete hand's position. To comply with the real-time constraints, we developed techniques to speed-up the process by efficiently discarding unimportant information in the image. Experimental results show that our method is competitive with the state-of-the-art in object detection while providing a reduction in computational complexity with a factor 1 000. Furthermore, we showed that our algorithm can also be used to detect other articulated objects such as persons or animals and is therefore not restricted to the task of hand detection. Once a hand has been detected, a tracking algorithm can be used to continuously track its position in time. We developed a probabilistic tracking method that can cope with uncertainty caused by image noise, incorrect detections, changing illumination, and camera motion. Furthermore, our tracking system automatically determines the number of hands in the scene, and can cope with hands entering or leaving the video canvas. We introduced several novel techniques that greatly increase tracking robustness, and that can also be applied in other domains than hand tracking. To achieve real-time processing, we investigated several techniques to reduce the search space of the problem, and deliberately employ methods that are easily parallelized on modern hardware. Experimental results indicate that our methods outperform the state-of-the-art in hand tracking, while providing a much lower computational complexity. One of the methods used by our probabilistic tracking algorithm, is optical flow estimation. Optical flow is defined as a 2D vector field describing the apparent velocities of objects in a 3D scene, projected onto the image plane. Optical flow is known to be used by many insects and birds to visually track objects and to estimate their ego-motion. However, most optical flow estimation methods described in literature are either too slow to be used in real-time applications, or are not robust to illumination changes and fast motion. We therefore developed an optical flow algorithm that can cope with large displacements, and that is illumination independent. Furthermore, we introduce a regularization technique that ensures a smooth flow-field. This regularization scheme effectively reduces the number of noisy and incorrect flow-vector estimates, while maintaining the ability to handle motion discontinuities caused by object boundaries in the scene. The above methods are combined into a hand tracking framework which can be used for interactive applications in unconstrained environments. To demonstrate the possibilities of gesture based human-computer interaction, we developed a new type of computer display. This display is completely transparent, allowing multiple users to perform collaborative tasks while maintaining eye contact. Furthermore, our display produces an image that seems to float in thin air, such that users can touch the virtual image with their hands. This floating imaging display has been showcased on several national and international events and tradeshows. The research that is described in this dissertation has been evaluated thoroughly by comparing detection and tracking results with those obtained by state-of-the-art algorithms. These comparisons show that the proposed methods outperform most algorithms in terms of accuracy, while achieving a much lower computational complexity, resulting in a real-time implementation. Results are discussed in depth at the end of each chapter. This research further resulted in an international journal publication; a second journal paper that has been submitted and is under review at the time of writing this dissertation; nine international conference publications; a national conference publication; a commercial license agreement concerning the research results; two hardware prototypes of a new type of computer display; and a software demonstrator

    Image-based human pose estimation

    Get PDF

    Visual Tracking in Robotic Minimally Invasive Surgery

    Get PDF
    Intra-operative imaging and robotics are some of the technologies driving forward better and more effective minimally invasive surgical procedures. To advance surgical practice and capabilities further, one of the key requirements for computationally enhanced interventions is to know how instruments and tissues move during the operation. While endoscopic video captures motion, the complex appearance dynamic effects of surgical scenes are challenging for computer vision algorithms to handle with robustness. Tackling both tissue and instrument motion estimation, this thesis proposes a combined non-rigid surface deformation estimation method to track tissue surfaces robustly and in conditions with poor illumination. For instrument tracking, a keypoint based 2D tracker that relies on the Generalized Hough Transform is developed to initialize a 3D tracker in order to robustly track surgical instruments through long sequences that contain complex motions. To handle appearance changes and occlusion a patch-based adaptive weighting with segmentation and scale tracking framework is developed. It takes a tracking-by-detection approach and a segmentation model is used to assigns weights to template patches in order to suppress back- ground information. The performance of the method is thoroughly evaluated showing that without any offline-training, the tracker works well even in complex environments. Finally, the thesis proposes a novel 2D articulated instrument pose estimation framework, which includes detection-regression fully convolutional network and a multiple instrument parsing component. The framework achieves compelling performance and illustrates interesting properties includ- ing transfer between different instrument types and between ex vivo and in vivo data. In summary, the thesis advances the state-of-the art in visual tracking for surgical applications for both tissue and instrument motion estimation. It contributes to developing the technological capability of full surgical scene understanding from endoscopic video

    Irish Machine Vision and Image Processing Conference Proceedings 2017

    Get PDF

    Integrating Vision and Physical Interaction for Discovery, Segmentation and Grasping of Unknown Objects

    Get PDF
    In dieser Arbeit werden Verfahren der Bildverarbeitung und die Fähigkeit humanoider Roboter, mit ihrer Umgebung physisch zu interagieren, in engem Zusammenspiel eingesetzt, um unbekannte Objekte zu identifizieren, sie vom Hintergrund und anderen Objekten zu trennen, und letztendlich zu greifen. Im Verlauf dieser interaktiven Exploration werden außerdem Eigenschaften des Objektes wie etwa sein Aussehen und seine Form ermittelt

    Texture and Colour in Image Analysis

    Get PDF
    Research in colour and texture has experienced major changes in the last few years. This book presents some recent advances in the field, specifically in the theory and applications of colour texture analysis. This volume also features benchmarks, comparative evaluations and reviews

    Recognition, Analysis, and Assessments of Human Skills using Wearable Sensors

    Get PDF
    One of the biggest social issues in mature societies such as Europe and Japan is the aging population and declining birth rate. These societies have a serious problem with the retirement of the expert workers, doctors, and engineers etc. Especially in the sectors that require long time to make experts in fields like medicine and industry; the retirement and injuries of the experts, is a serious problem. The technology to support the training and assessment of skilled workers (like doctors, manufacturing workers) is strongly required for the society. Although there are some solutions for this problem, most of them are video-based which violates the privacy of the subjects. Furthermore, they are not easy to deploy due to the need for large training data. This thesis provides a novel framework to recognize, analyze, and assess human skills with minimum customization cost. The presented framework tackles this problem in two different domains, industrial setup and medical operations of catheter-based cardiovascular interventions (CBCVI). In particular, the contributions of this thesis are four-fold. First, it proposes an easy-to-deploy framework for human activity recognition based on zero-shot learning approach, which is based on learning basic actions and objects. The model recognizes unseen activities by combinations of basic actions learned in a preliminary way and involved objects. Therefore, it is completely configurable by the user and can be used to detect completely new activities. Second, a novel gaze-estimation model for attention driven object detection task is presented. The key features of the model are: (i) usage of the deformable convolutional layers to better incorporate spatial dependencies of different shapes of objects and backgrounds, (ii) formulation of the gaze-estimation problem in two different way, as a classification as well as a regression problem. We combine both formulations using a joint loss that incorporates both the cross-entropy as well as the mean-squared error in order to train our model. This enhanced the accuracy of the model from 6.8 by using only the cross-entropy loss to 6.4 for the joint loss. The third contribution of this thesis targets the area of quantification of quality of i actions using wearable sensor. To address the variety of scenarios, we have targeted two possibilities: a) both expert and novice data is available , b) only expert data is available, a quite common case in safety critical scenarios. Both of the developed methods from these scenarios are deep learning based. In the first one, we use autoencoders with OneClass SVM, and in the second one we use the Siamese Networks. These methods allow us to encode the expert’s expertise and to learn the differences between novice and expert workers. This enables quantification of the performance of the novice in comparison to the expert worker. The fourth contribution, explicitly targets medical practitioners and provides a methodology for novel gaze-based temporal spatial analysis of CBCVI data. The developed methodology allows continuous registration and analysis of gaze data for analysis of the visual X-ray image processing (XRIP) strategies of expert operators in live-cases scenarios and may assist in transferring experts’ reading skills to novices

    An original framework for understanding human actions and body language by using deep neural networks

    Get PDF
    The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. While the processing of body movements play a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements; both are essential tasks in many computer vision applications, including event recognition, and video surveillance. In this Ph.D. thesis, an original framework for understanding Actions and body language is presented. The framework is composed of three main modules: in the first one, a Long Short Term Memory Recurrent Neural Networks (LSTM-RNNs) based method for the Recognition of Sign Language and Semaphoric Hand Gestures is proposed; the second module presents a solution based on 2D skeleton and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition by using 3D skeleton and Deep Neural Networks (DNNs) is provided. The performances of RNN-LSTMs are explored in depth, due to their ability to model the long term contextual information of temporal sequences, making them suitable for analysing body movements. All the modules were tested by using challenging datasets, well known in the state of the art, showing remarkable results compared to the current literature methods

    Pattern Recognition

    Get PDF
    A wealth of advanced pattern recognition algorithms are emerging from the interdiscipline between technologies of effective visual features and the human-brain cognition process. Effective visual features are made possible through the rapid developments in appropriate sensor equipments, novel filter designs, and viable information processing architectures. While the understanding of human-brain cognition process broadens the way in which the computer can perform pattern recognition tasks. The present book is intended to collect representative researches around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters coved in this book disclose recent advances and new ideas in promoting the techniques, technology and applications of pattern recognition

    Deep Learning-Based Action Recognition

    Get PDF
    The classification of human action or behavior patterns is very important for analyzing situations in the field and maintaining social safety. This book focuses on recent research findings on recognizing human action patterns. Technology for the recognition of human action pattern includes the processing technology of human behavior data for learning, technology of expressing feature values ​​of images, technology of extracting spatiotemporal information of images, technology of recognizing human posture, and technology of gesture recognition. Research on these technologies has recently been conducted using general deep learning network modeling of artificial intelligence technology, and excellent research results have been included in this edition
    • …
    corecore