
    Visual Hand Tracking on Depth Image Using 2-D Matched Filter

    Hand detection has been a central focus of human-machine interaction in recent research. To track the hand accurately, traditional methods mostly rely on machine learning and other available libraries, which require substantial computational resources for data collection and processing. This paper presents a method of hand detection and tracking on depth images that can be applied conveniently and manageably in practice without heavy data analysis. The method is based on the two-dimensional matched filter from image processing, which precisely locates the hand position with a few simple routines, and it is demonstrated in cooperation with a Delta robot. Compared with other approaches, the method is easy to understand and time-saving, especially for detecting and tracking a single specific gesture. It is also straightforward to program and runs on various platforms such as MATLAB and Python. Experiments show that the method performs fast hand tracking, improves accuracy when a proper hand template is selected, and can be used directly in human-machine interaction applications. To evaluate the gesture tracking performance, a recorded depth-image video is used to test the theoretical design, and a Delta parallel robot follows the moving hand using the proposed algorithm, demonstrating its feasibility in practice.
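    As a rough illustration of the matched-filter idea described above, the sketch below locates a hand template in a depth frame with normalised cross-correlation via OpenCV's matchTemplate. The array names and the use of OpenCV are assumptions; the paper's own MATLAB/Python code is not reproduced here.

```python
# Minimal sketch, assuming a single-channel depth frame and a pre-cropped hand
# template as float arrays; not the paper's actual implementation.
import numpy as np
import cv2

def locate_hand(depth_frame: np.ndarray, hand_template: np.ndarray):
    """Return the (x, y) centre of the best template match and its score."""
    # Normalised cross-correlation is a standard matched-filter formulation that
    # tolerates global offsets in the depth values.
    response = cv2.matchTemplate(depth_frame.astype(np.float32),
                                 hand_template.astype(np.float32),
                                 cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(response)
    h, w = hand_template.shape
    centre = (max_loc[0] + w // 2, max_loc[1] + h // 2)
    return centre, max_val  # a low max_val can be used to reject uncertain frames
```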

    Robust and real-time hand detection and tracking in monocular video

    In recent years, personal computing devices such as laptops, tablets and smartphones have become ubiquitous. Moreover, intelligent sensors are being integrated into many consumer devices such as eyeglasses, wristwatches and smart televisions. With the advent of touchscreen technology, a new human-computer interaction (HCI) paradigm arose that allows users to interface with their device in an intuitive manner. Using simple gestures, such as swipe or pinch movements, a touchscreen can be used to directly interact with a virtual environment. Nevertheless, touchscreens still form a physical barrier between the virtual interface and the real world. An increasingly popular field of research that tries to overcome this limitation is video based gesture recognition, hand detection and hand tracking. Gesture based interaction allows the user to directly interact with the computer in a natural manner by exploring a virtual reality using nothing but their own body language. In this dissertation, we investigate how robust hand detection and tracking can be accomplished under real-time constraints. In the context of human-computer interaction, real-time is defined as both low latency and low complexity, such that a complete video frame can be processed before the next one becomes available. Furthermore, for practical applications, the algorithms should be robust to illumination changes, camera motion, and cluttered backgrounds in the scene. Finally, the system should be able to initialize automatically, and to detect and recover from tracking failure. We study a wide variety of existing algorithms, and propose significant improvements and novel methods to build a complete detection and tracking system that meets these requirements. Hand detection, hand tracking and hand segmentation are related yet technically different challenges. Whereas detection deals with finding an object in a static image, tracking considers temporal information and is used to track the position of an object over time, throughout a video sequence. Hand segmentation is the task of estimating the hand contour, thereby separating the object from its background. Detection of hands in individual video frames allows us to automatically initialize our tracking algorithm, and to detect and recover from tracking failure. Human hands are highly articulated objects, consisting of finger parts that are connected with joints. As a result, the appearance of a hand can vary greatly, depending on the assumed hand pose. Traditional detection algorithms often assume that the appearance of the object of interest can be described using a rigid model and therefore cannot be used to robustly detect human hands. Therefore, we developed an algorithm that detects hands by exploiting their articulated nature. Instead of resorting to a template based approach, we probabilistically model the spatial relations between different hand parts, and the centroid of the hand. Detecting hand parts, such as fingertips, is much easier than detecting a complete hand. Based on our model of the spatial configuration of hand parts, the detected parts can be used to obtain an estimate of the complete hand's position. To comply with the real-time constraints, we developed techniques to speed up the process by efficiently discarding unimportant information in the image. Experimental results show that our method is competitive with the state-of-the-art in object detection while reducing computational complexity by a factor of 1,000.
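    The part-to-centroid voting described above can be pictured with the toy sketch below: each detected hand part casts a vote for the hand centre through a learned mean offset, and the votes are averaged. The part names, offsets and equal weighting are illustrative assumptions, not the dissertation's actual model.

```python
# Toy sketch of part-based voting for a hand centre, assuming detections and
# learned part-to-centroid offsets are given; confidence weighting is omitted.
import numpy as np

def estimate_hand_centre(detections, offsets):
    """detections/offsets: dicts mapping part name -> (x, y) position / mean offset."""
    votes = [np.add(detections[p], offsets[p]) for p in detections if p in offsets]
    return np.mean(votes, axis=0) if votes else None

centre = estimate_hand_centre(
    {"index_tip": (120.0, 80.0), "thumb_tip": (95.0, 110.0)},   # hypothetical detections
    {"index_tip": (0.0, 45.0), "thumb_tip": (20.0, 10.0)},      # hypothetical learned offsets
)
```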
    Furthermore, we showed that our algorithm can also be used to detect other articulated objects such as persons or animals and is therefore not restricted to the task of hand detection. Once a hand has been detected, a tracking algorithm can be used to continuously track its position in time. We developed a probabilistic tracking method that can cope with uncertainty caused by image noise, incorrect detections, changing illumination, and camera motion. Furthermore, our tracking system automatically determines the number of hands in the scene, and can cope with hands entering or leaving the video canvas. We introduced several novel techniques that greatly increase tracking robustness, and that can also be applied in domains other than hand tracking. To achieve real-time processing, we investigated several techniques to reduce the search space of the problem, and deliberately employ methods that are easily parallelized on modern hardware. Experimental results indicate that our methods outperform the state-of-the-art in hand tracking, while providing a much lower computational complexity. One of the methods used by our probabilistic tracking algorithm is optical flow estimation. Optical flow is defined as a 2D vector field describing the apparent velocities of objects in a 3D scene, projected onto the image plane. Optical flow is known to be used by many insects and birds to visually track objects and to estimate their ego-motion. However, most optical flow estimation methods described in the literature are either too slow to be used in real-time applications, or are not robust to illumination changes and fast motion. We therefore developed an optical flow algorithm that can cope with large displacements, and that is illumination independent. Furthermore, we introduce a regularization technique that ensures a smooth flow-field. This regularization scheme effectively reduces the number of noisy and incorrect flow-vector estimates, while maintaining the ability to handle motion discontinuities caused by object boundaries in the scene. The above methods are combined into a hand tracking framework which can be used for interactive applications in unconstrained environments. To demonstrate the possibilities of gesture based human-computer interaction, we developed a new type of computer display. This display is completely transparent, allowing multiple users to perform collaborative tasks while maintaining eye contact. Furthermore, our display produces an image that seems to float in thin air, such that users can touch the virtual image with their hands. This floating imaging display has been showcased at several national and international events and tradeshows. The research that is described in this dissertation has been evaluated thoroughly by comparing detection and tracking results with those obtained by state-of-the-art algorithms. These comparisons show that the proposed methods outperform most algorithms in terms of accuracy, while achieving a much lower computational complexity, resulting in a real-time implementation. Results are discussed in depth at the end of each chapter. This research further resulted in an international journal publication; a second journal paper that has been submitted and is under review at the time of writing this dissertation; nine international conference publications; a national conference publication; a commercial license agreement concerning the research results; two hardware prototypes of a new type of computer display; and a software demonstrator.
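    As a loose illustration of how optical flow can drive such a tracker, the sketch below propagates a tracked position between frames using OpenCV's Farneback flow; the dissertation's own illumination-independent, large-displacement estimator is not reproduced here, and the window size is an arbitrary choice.

```python
# Sketch only: dense Farneback flow stands in for the dissertation's custom
# optical-flow method; prev_gray/next_gray are consecutive grayscale frames.
import cv2

def propagate_position(prev_gray, next_gray, position, half_window=10):
    """Shift an (x, y) position by the mean flow inside a small window around it."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    x, y = int(position[0]), int(position[1])
    patch = flow[max(0, y - half_window):y + half_window,
                 max(0, x - half_window):x + half_window]
    dx, dy = patch.reshape(-1, 2).mean(axis=0)   # average flow vector in the patch
    return position[0] + float(dx), position[1] + float(dy)
```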

    Research on Object Tracking Technology for Orderless and Blurred Movement under Complex Scenes

    University of Technology Sydney, Faculty of Engineering and Information Technology. Visual tracking is widely used in anomaly behaviour detection, self-driving and virtual reality. Recent research has reported that classic methods, including Tracking-Learning-Detection, the Particle Filter and mean shift, have been surpassed by deep learning in accuracy and by correlation filtering in speed. However, correlation filtering can be affected by boundary effects. Conventional correlation filtering fixes the size of its detection window; when that window captures only part of the target because of large and sudden scale variations, the filter fails to locate the tracked target. When the target shakes violently, motion blur and orderless movement appear along with it. Conventional correlation filtering then locks onto the target's previous position, the target moves out of its sight, and the filter drifts or fails to track. The topic of this thesis is therefore tracking single objects under complex scenes with motion blur, orderless motion and scale variations. The main research innovations are as follows. (1) An approach for handling orderless movements is designed within a generative-discriminative tracking model. To address uncertain orderless movements, a coarse-to-fine tracking framework is adopted, and a spatio-temporal correlation is learned for detection in subsequent frames. Experiments on public databases with orderless-motion attributes validate the robustness of the proposed approach. (2) A template matching method is proposed for tracking objects with motion blur. An effective target motion model provides supplementary appearance features, and a robust similarity measure addresses the outliers caused by motion blur. Our approach outperforms other approaches on a public benchmark database with motion blur. (3) An ensemble framework is designed to tackle scale variations. The scale of a target is estimated based on Gaussian Particle Filtering, and a high-confidence strategy validates the reliability of tracking results. Our approach, with either hand-crafted or CNN features, outperforms correlation-filtering and deep-learning methods on databases with scale variations. To sum up, this thesis addresses boundary effects, model drift, fixed search windows and easily interfered hand-crafted features. Different trackers are proposed for tracking single objects with orderless movements, motion blur and scale variations. As future work, our methods can be extended with neural networks to further improve single-object tracking models.
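    As a rough sketch of the Gaussian-particle scale estimation mentioned in item (3), the snippet below samples candidate scales around the previous estimate, scores each with the tracker's appearance response, and returns the weighted mean; the response function, particle count and spread are assumptions.

```python
# Illustrative sketch of scale estimation with Gaussian particle sampling;
# `appearance_score` is a hypothetical stand-in for the tracker's response.
import numpy as np

def estimate_scale(prev_scale, appearance_score, n_particles=50, sigma=0.05):
    """Sample scales around prev_scale and return their response-weighted mean."""
    candidates = np.random.normal(prev_scale, sigma, size=n_particles)
    candidates = np.clip(candidates, 0.5 * prev_scale, 2.0 * prev_scale)
    weights = np.maximum([appearance_score(s) for s in candidates], 1e-12)
    weights = weights / weights.sum()
    # A high-confidence check could keep prev_scale when the best response is weak.
    return float(np.sum(weights * candidates))
```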

    Facial Feature Tracking and Occlusion Recovery in American Sign Language

    Facial features play an important role in expressing grammatical information in signed languages, including American Sign Language (ASL). Gestures such as raising or furrowing the eyebrows are key indicators of constructions such as yes-no questions. Periodic head movements (nods and shakes) are also an essential part of the expression of syntactic information, such as negation (associated with a side-to-side headshake). Therefore, identification of these facial gestures is essential to sign language recognition. One problem with detection of such grammatical indicators is occlusion recovery. If the signer's hand blocks his/her eyebrows during production of a sign, it becomes difficult to track the eyebrows. We have developed a system to detect such grammatical markers in ASL that recovers promptly from occlusion. Our system detects and tracks evolving templates of facial features, which are based on an anthropometric face model, and interprets the geometric relationships of these templates to identify grammatical markers. It was tested on a variety of ASL sentences signed by various Deaf native signers and detected facial gestures used to express grammatical information, such as raised and furrowed eyebrows as well as headshakes. National Science Foundation (IIS-0329009, IIS-0093367, IIS-9912573, EIA-0202067, EIA-9809340).
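    The geometric interpretation of tracked templates can be illustrated with a toy measure like the one below, which normalises the brow-to-eye distance by the inter-ocular distance so that an eyebrow raise can be compared against a signer's neutral baseline; the landmark names and the baseline comparison are assumptions, not the system's actual rule.

```python
# Hypothetical illustration of reading a grammatical cue from tracked templates.
import numpy as np

def eyebrow_raise_ratio(brow_centre, eye_centre, left_eye, right_eye):
    """Vertical brow-to-eye distance divided by inter-ocular distance (scale-free)."""
    inter_ocular = np.linalg.norm(np.subtract(right_eye, left_eye))
    return (eye_centre[1] - brow_centre[1]) / inter_ocular   # image y grows downward

# A raise/furrow decision would compare this ratio to the signer's neutral baseline,
# e.g. flagging "raised" when it exceeds the baseline by some margin.
```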

    Real-Time RGB-D based Template Matching Pedestrian Detection

    Pedestrian detection is one of the most popular topics in computer vision and robotics. Considering the challenges of multiple pedestrian detection, we present a real-time depth-based template matching people detector. In this paper, we propose different approaches for training the depth-based template. We train multiple templates to handle the various upper-body orientations of pedestrians and the different levels of detail in the depth map of pedestrians at various distances from the camera. We also take into account the degree of reliability of different regions of the sliding window by proposing a weighted template approach. Furthermore, we combine the depth detector with an appearance based detector as a verifier, exploiting appearance cues to deal with the limitations of depth data. We evaluate our method on the challenging ETH dataset sequence and show that it outperforms state-of-the-art approaches. Comment: published in ICRA 201
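    The weighted-template idea can be sketched as a per-pixel reliability mask applied when a sliding-window depth patch is scored against the upper-body template, as below; the error measure, the handling of missing depth and the variable names are assumptions.

```python
# Sketch of weighted depth-template scoring (lower is better); not the paper's code.
import numpy as np

def weighted_template_score(depth_patch, template, weights):
    """Weighted mean absolute depth difference, ignoring invalid (zero) depth pixels."""
    valid = (depth_patch > 0) & (template > 0)      # zero depth often marks missing data
    w = weights * valid
    diff = np.abs(depth_patch - template)
    return float(np.sum(w * diff) / max(np.sum(w), 1e-6))
```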

    Real-time 3D Tracking of Articulated Tools for Robotic Surgery

    In robotic surgery, tool tracking is important for providing safe tool-tissue interaction and facilitating surgical skills assessment. Despite recent advances in tool tracking, existing approaches face major difficulties in real-time tracking of articulated tools; most algorithms are tailored for offline processing with pre-recorded videos. In this paper, we propose a real-time 3D tracking method for articulated tools in robotic surgery. The proposed method is based on the CAD model of the tools as well as robot kinematics to generate online part-based templates for efficient 2D matching and 3D pose estimation. A robust verification approach is incorporated to reject outliers in 2D detections, which is then followed by fusing inliers with robot kinematic readings for 3D pose estimation of the tool. The proposed method has been validated with phantom data, as well as ex vivo and in vivo experiments. The results clearly demonstrate the performance advantage of the proposed method when compared to the state-of-the-art. Comment: This paper was presented at the MICCAI 2016 conference, and a DOI is linked to the publisher's version.
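    The verification step described above can be pictured with the toy filter below, which keeps only those 2D part detections lying close to where the robot kinematics predict each part should appear, before fusion for 3D pose estimation; the pixel threshold and data layout are assumptions.

```python
# Illustrative outlier rejection against kinematic predictions; not the authors' code.
import numpy as np

def reject_outliers(detections_2d, predicted_2d, max_pixel_error=20.0):
    """Keep detections within max_pixel_error of the kinematically predicted location."""
    inliers = {}
    for part, det in detections_2d.items():
        pred = predicted_2d.get(part)
        if pred is not None and np.linalg.norm(np.subtract(det, pred)) <= max_pixel_error:
            inliers[part] = det
    return inliers
```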