
    THE USE OF CONTEXTUAL CLUES IN REDUCING FALSE POSITIVES IN AN EFFICIENT VISION-BASED HEAD GESTURE RECOGNITION SYSTEM

    This thesis explores the use of head gesture recognition as an intuitive interface for computer interaction. It presents a novel vision-based head gesture recognition system that uses contextual clues to reduce false positives, applied as a computer interface for answering dialog boxes. This work seeks to validate similar research, but focuses on more efficient techniques running on everyday hardware. A survey of image processing techniques for recognizing and tracking facial features is presented, along with a comparison of several methods for tracking and identifying gestures over time. The design describes a reusable head gesture recognition system built on lightweight algorithms to minimize resource utilization. The research consists of a comparison between the base gesture recognition system and an optimized system that uses contextual clues to reduce false positives. The results confirm that simple contextual clues can lead to a significant reduction in false positives: the system achieves an overall accuracy of 96% when contextual clues are used. In addition, results from a usability study show that head gesture recognition is considered an intuitive interface and is preferable to conventional input for answering dialog boxes. By providing the detailed design and architecture of a head gesture recognition system using efficient techniques and simple hardware, this thesis demonstrates the feasibility of implementing head gesture recognition as an intuitive form of interaction using preexisting infrastructure, and provides evidence that such a system is desirable.
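    As an illustration of the kind of contextual gating the thesis describes, the sketch below suppresses gesture reports unless a dialog box is actually awaiting an answer. It is a minimal sketch in Python: the `dialog_is_open` flag and the toy nod/shake detector are hypothetical stand-ins, not the thesis's implementation.

```python
# Toy contextual gating for a head-gesture recognizer (hypothetical).
# Context: only interpret head motion while a dialog box is open;
# ignoring motion outside that context suppresses false positives.
from collections import deque

WINDOW = 15  # number of recent face-centre samples to keep

def detect_gesture(xs, ys, threshold=20):
    """Toy detector: large vertical oscillation -> nod, horizontal -> shake."""
    if max(ys) - min(ys) > threshold:
        return "nod"
    if max(xs) - min(xs) > threshold:
        return "shake"
    return None

def gated_recognizer(face_centres, dialog_is_open):
    """Report a gesture only while a dialog box awaits an answer."""
    if not dialog_is_open or len(face_centres) < WINDOW:
        return None
    xs = [x for x, _ in face_centres]
    ys = [y for _, y in face_centres]
    return detect_gesture(xs, ys)

# Per-frame usage: push detected face centres into a bounded history.
face_centres = deque(maxlen=WINDOW)
```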

    Computer vision based traffic monitoring system for multi-track freeways

    Nowadays, development is synonymous with the construction of infrastructure. Such road infrastructure needs constant attention in terms of traffic monitoring, as even a single incident on a major artery can disrupt everyday life. Humans cannot be expected to monitor these massive infrastructures 24/7, and computer vision is increasingly being used to develop automated strategies to notify human observers of impending slowdowns and traffic bottlenecks. However, given the extreme costs associated with current state-of-the-art networked computer vision monitoring systems, there is room for innovative systems that are standalone and efficient in analyzing traffic flow and tracking vehicles for speed detection. In this article, a traffic monitoring system is proposed that counts vehicles and tracks their speeds in real time on multi-track freeways in Australia. The proposed algorithm uses a Gaussian mixture model for foreground detection, and is capable of tracking vehicle trajectories and extracting useful traffic information for vehicle counting. This stationary surveillance system uses a fixed overhead camera to monitor traffic.
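    For readers unfamiliar with GMM-based foreground detection, the sketch below shows the standard OpenCV building blocks for this kind of pipeline. It is a minimal sketch, not the article's implementation: the file name, thresholds and blob-size cutoff are illustrative assumptions, and the tracking/speed-estimation stage is only indicated in comments.

```python
# Minimal GMM (MOG2) foreground detection for vehicle counting.
import cv2

cap = cv2.VideoCapture("freeway.mp4")  # hypothetical input clip
subtractor = cv2.createBackgroundSubtractorMOG2(
    history=500, varThreshold=16, detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)  # per-pixel GMM foreground/background
    # MOG2 marks shadows as 127; keep only confident foreground, then
    # clean up speckle noise before extracting vehicle blobs.
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
    mask = cv2.morphologyEx(
        mask, cv2.MORPH_OPEN,
        cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5)))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    vehicles = [c for c in contours if cv2.contourArea(c) > 400]
    # A tracker would associate these blobs across frames to build
    # trajectories, count vehicles per lane and estimate speeds.
cap.release()
```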

    Advanced information processing of MEMS motion sensors for gesture interaction


    3D hand tracking.

    The hand is often considered one of the most natural and intuitive interaction modalities for human-to-human interaction. In human-computer interaction (HCI), proper 3D hand tracking is the first step in developing a more intuitive HCI system that can be used in applications such as gesture recognition, virtual object manipulation and gaming. However, accurate 3D hand tracking remains a challenging problem due to the hand's deformation, appearance similarity, high inter-finger occlusion and complex articulated motion. Further, 3D hand tracking is also interesting from a theoretical point of view, as it deals with three major areas of computer vision: segmentation (of the hand), detection (of hand parts), and tracking (of the hand). This thesis proposes a region-based skin color detection technique, and model-based and appearance-based 3D hand tracking techniques, to bring human-computer interaction applications one step closer. All techniques are briefly described below.
    Skin color provides a powerful cue for complex computer vision applications. Although skin color detection has been an active research area for decades, the mainstream technology is based on individual pixels. This thesis presents a new region-based technique for skin color detection which outperforms the current state-of-the-art pixel-based skin color detection technique on the popular Compaq dataset (Jones & Rehg 2002). The proposed technique achieves a 91.17% true positive rate with a 13.12% false positive rate on the Compaq dataset, tested over approximately 14,000 web images.
    Hand tracking is not a trivial task, as it requires tracking the 27 degrees of freedom of the hand. Hand deformation, self-occlusion, appearance similarity and irregular motion are major problems that make 3D hand tracking very challenging. This thesis proposes a model-based 3D hand tracking technique, which is improved by using a proposed depth-foreground-background feature, a palm deformation module and a context cue. However, the major problem with model-based techniques is that they are computationally expensive. This can be overcome by discriminative techniques, as described below.
    Discriminative techniques (for example, random forests) are good for hand part detection; however, they fail due to sensor noise and high inter-finger occlusion. Additionally, these techniques have difficulty modelling kinematic or temporal constraints. Although model-based descriptive (for example, Markov Random Field) or generative (for example, Hidden Markov Model) techniques utilize kinematic and temporal constraints well, they are computationally expensive and hardly recover from tracking failure. This thesis presents a unified framework for 3D hand tracking, using the best of both methodologies, which outperforms the current state-of-the-art 3D hand tracking techniques. The proposed 3D hand tracking techniques in this thesis can be used to extract accurate hand movement features and enable complex human-machine interaction such as gaming and virtual object manipulation.
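    For context, the sketch below shows the pixel-based style of skin detection that the thesis improves upon: each pixel is classified independently by chrominance thresholds. It is a minimal baseline sketch; the Cr/Cb bounds are common rule-of-thumb values, not the thesis's learned model, and the proposed region-based technique is not reproduced here.

```python
# Pixel-based skin detection baseline in YCrCb space (illustrative bounds).
import cv2
import numpy as np

def skin_mask(bgr_image):
    """Classify each pixel as skin/non-skin by chrominance thresholds."""
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    lower = np.array([0, 133, 77], dtype=np.uint8)     # Y, Cr, Cb minima
    upper = np.array([255, 173, 127], dtype=np.uint8)  # Y, Cr, Cb maxima
    mask = cv2.inRange(ycrcb, lower, upper)
    # A region-based method would go further, enforcing spatial
    # coherence instead of deciding each pixel in isolation.
    return cv2.medianBlur(mask, 5)
```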

    A review of computer vision-based approaches for physical rehabilitation and assessment

    The computer vision community has extensively researched the area of human motion analysis, which primarily focuses on pose estimation, activity recognition, pose or gesture recognition, and so on. However, for many applications, such as monitoring the functional rehabilitation of patients with musculoskeletal or physical impairments, the requirement is to comparatively evaluate human motion. In this survey, we capture important literature on vision-based monitoring and physical rehabilitation that focuses on the comparative evaluation of human motion over the past two decades, and discuss the state of current research in this area. Unlike other reviews in this area, which are written from a clinical perspective, this article presents the research from a computer vision application perspective. We propose our own taxonomy of computer vision-based rehabilitation and assessment research, whose categories are further divided into sub-categories to capture the novelties of each work. The review discusses the challenges of this domain, given the wide range of human motion abnormalities and the difficulty of automatically assessing those abnormalities. Finally, suggestions on future directions of research are offered.

    Hand features extractor using hand contour – a case study

    Hand gesture recognition is an important topic in natural user interfaces (NUI). Hand feature extraction is the first step in hand gesture recognition. This work proposes a novel real-time method for hand feature extraction. In our framework we use three cameras, and the hand region is extracted with a background subtraction method. Features such as the arm angle and finger positions are calculated using Y variations in the vertical contour image. Wrist detection is achieved by finding the largest distance between a baseline and the hand contour, yielding the main features for hand gesture recognition. Experiments on our own dataset of about 1,800 images show that our method performs well and is highly efficient.
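    The sketch below illustrates generic contour-based hand feature extraction with OpenCV. It is a minimal sketch under stated assumptions: the paper derives its features from Y variations of the vertical contour, whereas here convexity defects serve as a stand-in for locating between-finger points from the hand contour.

```python
# Generic contour-based hand features via convexity defects.
import cv2

def hand_contour_features(binary_hand_mask):
    """Return the hand contour and candidate between-finger points."""
    contours, _ = cv2.findContours(binary_hand_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None, []
    hand = max(contours, key=cv2.contourArea)  # largest blob = hand
    hull = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull)
    valleys = []
    if defects is not None:
        for _start, _end, far, depth in defects[:, 0]:
            # depth is fixed-point (pixels * 256); keep deep defects only.
            if depth > 10000:
                valleys.append(tuple(hand[far][0]))
    return hand, valleys
```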

    Robust and real-time hand detection and tracking in monocular video

    In recent years, personal computing devices such as laptops, tablets and smartphones have become ubiquitous. Moreover, intelligent sensors are being integrated into many consumer devices such as eyeglasses, wristwatches and smart televisions. With the advent of touchscreen technology, a new human-computer interaction (HCI) paradigm arose that allows users to interface with their device in an intuitive manner. Using simple gestures, such as swipe or pinch movements, a touchscreen can be used to directly interact with a virtual environment. Nevertheless, touchscreens still form a physical barrier between the virtual interface and the real world. An increasingly popular field of research that tries to overcome this limitation is video-based gesture recognition, hand detection and hand tracking. Gesture-based interaction allows the user to directly interact with the computer in a natural manner, exploring a virtual reality using nothing but their own body language.
    In this dissertation, we investigate how robust hand detection and tracking can be accomplished under real-time constraints. In the context of human-computer interaction, real-time is defined as both low latency and low complexity, such that a complete video frame can be processed before the next one becomes available. Furthermore, for practical applications, the algorithms should be robust to illumination changes, camera motion, and cluttered backgrounds in the scene. Finally, the system should be able to initialize automatically, and to detect and recover from tracking failure. We study a wide variety of existing algorithms, and propose significant improvements and novel methods to build a complete detection and tracking system that meets these requirements.
    Hand detection, hand tracking and hand segmentation are related yet technically different challenges. Whereas detection deals with finding an object in a static image, tracking considers temporal information and is used to track the position of an object over time, throughout a video sequence. Hand segmentation is the task of estimating the hand contour, thereby separating the object from its background. Detection of hands in individual video frames allows us to automatically initialize our tracking algorithm, and to detect and recover from tracking failure. Human hands are highly articulated objects, consisting of finger parts that are connected by joints. As a result, the appearance of a hand can vary greatly, depending on the assumed hand pose. Traditional detection algorithms often assume that the appearance of the object of interest can be described using a rigid model and therefore cannot be used to robustly detect human hands. We therefore developed an algorithm that detects hands by exploiting their articulated nature. Instead of resorting to a template-based approach, we probabilistically model the spatial relations between different hand parts and the centroid of the hand. Detecting hand parts, such as fingertips, is much easier than detecting a complete hand. Based on our model of the spatial configuration of hand parts, the detected parts can be used to obtain an estimate of the complete hand's position. To comply with the real-time constraints, we developed techniques to speed up the process by efficiently discarding unimportant information in the image. Experimental results show that our method is competitive with the state-of-the-art in object detection while reducing computational complexity by a factor of 1,000.
    Furthermore, we showed that our algorithm can also be used to detect other articulated objects, such as persons or animals, and is therefore not restricted to the task of hand detection.
    Once a hand has been detected, a tracking algorithm can be used to continuously track its position over time. We developed a probabilistic tracking method that can cope with uncertainty caused by image noise, incorrect detections, changing illumination, and camera motion. Furthermore, our tracking system automatically determines the number of hands in the scene, and can cope with hands entering or leaving the video canvas. We introduced several novel techniques that greatly increase tracking robustness and that can also be applied in domains other than hand tracking. To achieve real-time processing, we investigated several techniques to reduce the search space of the problem, and deliberately employed methods that are easily parallelized on modern hardware. Experimental results indicate that our methods outperform the state-of-the-art in hand tracking, while providing a much lower computational complexity.
    One of the methods used by our probabilistic tracking algorithm is optical flow estimation. Optical flow is defined as a 2D vector field describing the apparent velocities of objects in a 3D scene, projected onto the image plane. Optical flow is known to be used by many insects and birds to visually track objects and to estimate their ego-motion. However, most optical flow estimation methods described in the literature are either too slow to be used in real-time applications, or not robust to illumination changes and fast motion. We therefore developed an optical flow algorithm that can cope with large displacements and that is illumination independent. Furthermore, we introduce a regularization technique that ensures a smooth flow field. This regularization scheme effectively reduces the number of noisy and incorrect flow-vector estimates, while maintaining the ability to handle motion discontinuities caused by object boundaries in the scene.
    The above methods are combined into a hand tracking framework which can be used for interactive applications in unconstrained environments. To demonstrate the possibilities of gesture-based human-computer interaction, we developed a new type of computer display. This display is completely transparent, allowing multiple users to perform collaborative tasks while maintaining eye contact. Furthermore, our display produces an image that seems to float in thin air, such that users can touch the virtual image with their hands. This floating-image display has been showcased at several national and international events and trade shows.
    The research described in this dissertation has been evaluated thoroughly by comparing detection and tracking results with those obtained by state-of-the-art algorithms. These comparisons show that the proposed methods outperform most algorithms in terms of accuracy, while achieving a much lower computational complexity, resulting in a real-time implementation. Results are discussed in depth at the end of each chapter. This research further resulted in an international journal publication; a second journal paper that has been submitted and is under review at the time of writing this dissertation; nine international conference publications; a national conference publication; a commercial license agreement concerning the research results; two hardware prototypes of a new type of computer display; and a software demonstrator.
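    As a concrete reference point for the tracking and optical-flow discussion above, the sketch below uses OpenCV's pyramidal Lucas-Kanade method to propagate feature points between frames. It is a minimal sketch only: the dissertation's own illumination-independent, regularized flow algorithm is not reproduced here, and the parameter values are conventional defaults, not the author's.

```python
# Sparse optical flow with pyramidal Lucas-Kanade (conventional settings).
import cv2

lk_params = dict(winSize=(21, 21), maxLevel=3,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT,
                           30, 0.01))

def track_points(prev_gray, next_gray, points):
    """Propagate float32 Nx1x2 feature points to the next grayscale frame."""
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, points, None, **lk_params)
    good = status.ravel() == 1  # keep only successfully tracked points
    return next_pts[good], points[good]

# Initial features, e.g. corners inside a detected hand region:
#   points = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50,
#                                    qualityLevel=0.01, minDistance=5)
```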

    A Low Cost and Computationally Efficient Approach for Occlusion Handling in Video Surveillance Systems

    In the development of intelligent video surveillance systems for vehicle tracking, occlusions are one of the major challenges. It becomes difficult to retain features during occlusion, especially in the case of complete occlusion. In this paper, a target vehicle tracking algorithm for Smart Video Surveillance (SVS) is proposed to track an unidentified target vehicle even in the presence of occlusions. The paper proposes a computationally efficient approach for handling occlusions, named Kalman Filter Assisted Occlusion Handling (KFAOH). The algorithm works in two periods, namely a tracking period, when no occlusion is observed, and a detection period, when occlusion occurs, reflecting its hybrid nature. A Kanade-Lucas-Tomasi (KLT) feature tracker governs the operation of the algorithm during the tracking period, whereas a Cascaded Object Detector (COD) of weak classifiers, specially trained on a large database of cars, governs operation during the detection period with the assistance of a Kalman Filter (KF). The algorithm's tracking efficiency has been tested on six tracking scenarios of increasing complexity in real time. Performance evaluation under different noise variances and illumination levels shows that the tracking algorithm is robust to high noise and low illumination. All tests were conducted on the MATLAB platform. The validity and practicality of the algorithm are also verified by success plots and precision plots for the test cases.
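    The sketch below illustrates the core idea of Kalman-filter-assisted occlusion handling with a constant-velocity state [x, y, vx, vy]: while the tracker reports a position, the filter is corrected with it; during occlusion, the filter coasts on predictions until the detector reacquires the vehicle. It is a minimal sketch in Python/OpenCV rather than the paper's MATLAB implementation, and the noise covariances are illustrative assumptions.

```python
# Kalman filter with a constant-velocity model for coasting through occlusion.
import cv2
import numpy as np

kf = cv2.KalmanFilter(4, 2)  # state [x, y, vx, vy], measurement [x, y]
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2      # assumed
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1  # assumed

def step(measured_xy):
    """One filter cycle; pass None while the vehicle is occluded."""
    predicted = kf.predict()
    if measured_xy is not None:  # tracker (e.g. KLT) sees the vehicle
        kf.correct(np.array(measured_xy, np.float32).reshape(2, 1))
    return predicted[:2].ravel()  # estimated (x, y), even under occlusion
```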

    Development and Evaluation of Facial Gesture Recognition and Head Tracking for Assistive Technologies

    Get PDF
    Globally, the World Health Organisation estimates that there are about 1 billion people living with disabilities, and the UK alone has about 10 million people with neurological disabilities in particular. In extreme cases, individuals with disabilities such as Motor Neuron Disease (MND), Cerebral Palsy (CP) and Multiple Sclerosis (MS) may only be able to perform limited head movement, move their eyes or make facial gestures. The aim of this research is to investigate low-cost and reliable assistive devices using automatic gesture recognition systems that will enable the most severely disabled users to access electronic assistive technologies and communication devices, thus enabling them to communicate with friends and relatives.
    The research presented in this thesis is concerned with the detection of head movements, eye movements, and facial gestures through the analysis of video and depth images. The proposed system, using web cameras or an RGB-D sensor coupled with computer vision and pattern recognition techniques, has to be able to detect the movement of the user and calibrate it to facilitate communication. The system also provides the user with the ability to choose the sensor to be used, i.e. the web camera or the RGB-D sensor, and the interaction or switching mechanism, i.e. eye blink or eyebrow movement. This ability of the system to let users select according to their needs makes things easier for them, as they do not have to learn to operate a new system as their condition changes. This research aims to explore in particular the use of depth data for head-movement-based assistive devices and the usability of different gesture modalities as switching mechanisms.
    The proposed framework consists of a facial feature detection module, a head tracking module and a gesture recognition module. Techniques such as Haar cascades and skin detection were used to detect facial features such as the face, eyes and nose. The depth data from the RGB-D sensor was used to segment the area nearest to the sensor. Both the head tracking module and the gesture recognition module rely on the facial feature detection module, as it provides data such as the location of the facial features. The head tracking module uses the facial feature data to calculate the centroid of the face, the distance to the sensor, and the location of the eyes and the nose, to detect head motion and translate it into pointer movement. The gesture detection module uses features such as the location of the eyes, the location of the pupil and the size of the pupil, and calculates the inter-ocular distance, to detect a blink or eyebrow movement and perform a click action.
    The research resulted in the creation of four assistive devices based on combinations of the sensors (web camera and RGB-D sensor) and facial gestures (blink and eyebrow movement): Webcam-Blink, Webcam-Eyebrows, Kinect-Blink and Kinect-Eyebrows. Another outcome of this research has been the creation of an evaluation framework based on Fitts' Law, with a modified multi-directional task including a central location, and a dataset consisting of both colour images and depth data of people performing head movements towards different directions and performing gestures such as eye blinks, eyebrow movements and mouth movements. The devices have been tested with healthy participants.
    From the observed data, it was found that both Kinect-based devices have a lower Movement Time and a higher Index of Performance and Effective Throughput than the web-camera-based devices, showing that the introduction of depth data has had a positive impact on the head tracking algorithm. The usability assessment survey suggests that there is a significant difference in the eye fatigue experienced by participants: the blink gesture was less tiring to the eyes than the eyebrow movement gesture. The analysis of the gestures also showed that the Index of Difficulty has a large effect on the error rates of gesture detection, and that the smaller the Index of Difficulty, the higher the error rate.
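    The Fitts' Law quantities cited above (Index of Difficulty, Movement Time, Effective Throughput) are commonly computed as in the sketch below, which follows the usual ISO 9241-9 style conventions; the thesis's modified multi-directional task may define them slightly differently, and the trial data shown are hypothetical.

```python
# Common Fitts' Law metrics (ISO 9241-9 style conventions).
import math
import statistics

def index_of_difficulty(distance, width):
    """Shannon formulation: ID = log2(D / W + 1), in bits."""
    return math.log2(distance / width + 1)

def effective_throughput(trials):
    """Mean of per-trial ID_e / MT, in bits per second.

    Each trial is (distance_px, endpoint_sd_px, movement_time_s); the
    effective width is W_e = 4.133 * SD of endpoints along the task axis.
    """
    rates = []
    for distance, endpoint_sd, movement_time in trials:
        w_e = 4.133 * endpoint_sd
        rates.append(index_of_difficulty(distance, w_e) / movement_time)
    return statistics.mean(rates)

# Three hypothetical pointing trials:
print(effective_throughput([(400, 12.0, 1.8), (300, 9.5, 1.5), (500, 15.0, 2.2)]))
```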