455,068 research outputs found

    Robust and real-time hand detection and tracking in monocular video

    Get PDF
    In recent years, personal computing devices such as laptops, tablets and smartphones have become ubiquitous. Moreover, intelligent sensors are being integrated into many consumer devices such as eyeglasses, wristwatches and smart televisions. With the advent of touchscreen technology, a new human-computer interaction (HCI) paradigm arose that allows users to interface with their device in an intuitive manner. Using simple gestures, such as swipe or pinch movements, a touchscreen can be used to directly interact with a virtual environment. Nevertheless, touchscreens still form a physical barrier between the virtual interface and the real world. An increasingly popular field of research that tries to overcome this limitation, is video based gesture recognition, hand detection and hand tracking. Gesture based interaction allows the user to directly interact with the computer in a natural manner by exploring a virtual reality using nothing but his own body language. In this dissertation, we investigate how robust hand detection and tracking can be accomplished under real-time constraints. In the context of human-computer interaction, real-time is defined as both low latency and low complexity, such that a complete video frame can be processed before the next one becomes available. Furthermore, for practical applications, the algorithms should be robust to illumination changes, camera motion, and cluttered backgrounds in the scene. Finally, the system should be able to initialize automatically, and to detect and recover from tracking failure. We study a wide variety of existing algorithms, and propose significant improvements and novel methods to build a complete detection and tracking system that meets these requirements. Hand detection, hand tracking and hand segmentation are related yet technically different challenges. Whereas detection deals with finding an object in a static image, tracking considers temporal information and is used to track the position of an object over time, throughout a video sequence. Hand segmentation is the task of estimating the hand contour, thereby separating the object from its background. Detection of hands in individual video frames allows us to automatically initialize our tracking algorithm, and to detect and recover from tracking failure. Human hands are highly articulated objects, consisting of finger parts that are connected with joints. As a result, the appearance of a hand can vary greatly, depending on the assumed hand pose. Traditional detection algorithms often assume that the appearance of the object of interest can be described using a rigid model and therefore can not be used to robustly detect human hands. Therefore, we developed an algorithm that detects hands by exploiting their articulated nature. Instead of resorting to a template based approach, we probabilistically model the spatial relations between different hand parts, and the centroid of the hand. Detecting hand parts, such as fingertips, is much easier than detecting a complete hand. Based on our model of the spatial configuration of hand parts, the detected parts can be used to obtain an estimate of the complete hand's position. To comply with the real-time constraints, we developed techniques to speed-up the process by efficiently discarding unimportant information in the image. Experimental results show that our method is competitive with the state-of-the-art in object detection while providing a reduction in computational complexity with a factor 1 000. Furthermore, we showed that our algorithm can also be used to detect other articulated objects such as persons or animals and is therefore not restricted to the task of hand detection. Once a hand has been detected, a tracking algorithm can be used to continuously track its position in time. We developed a probabilistic tracking method that can cope with uncertainty caused by image noise, incorrect detections, changing illumination, and camera motion. Furthermore, our tracking system automatically determines the number of hands in the scene, and can cope with hands entering or leaving the video canvas. We introduced several novel techniques that greatly increase tracking robustness, and that can also be applied in other domains than hand tracking. To achieve real-time processing, we investigated several techniques to reduce the search space of the problem, and deliberately employ methods that are easily parallelized on modern hardware. Experimental results indicate that our methods outperform the state-of-the-art in hand tracking, while providing a much lower computational complexity. One of the methods used by our probabilistic tracking algorithm, is optical flow estimation. Optical flow is defined as a 2D vector field describing the apparent velocities of objects in a 3D scene, projected onto the image plane. Optical flow is known to be used by many insects and birds to visually track objects and to estimate their ego-motion. However, most optical flow estimation methods described in literature are either too slow to be used in real-time applications, or are not robust to illumination changes and fast motion. We therefore developed an optical flow algorithm that can cope with large displacements, and that is illumination independent. Furthermore, we introduce a regularization technique that ensures a smooth flow-field. This regularization scheme effectively reduces the number of noisy and incorrect flow-vector estimates, while maintaining the ability to handle motion discontinuities caused by object boundaries in the scene. The above methods are combined into a hand tracking framework which can be used for interactive applications in unconstrained environments. To demonstrate the possibilities of gesture based human-computer interaction, we developed a new type of computer display. This display is completely transparent, allowing multiple users to perform collaborative tasks while maintaining eye contact. Furthermore, our display produces an image that seems to float in thin air, such that users can touch the virtual image with their hands. This floating imaging display has been showcased on several national and international events and tradeshows. The research that is described in this dissertation has been evaluated thoroughly by comparing detection and tracking results with those obtained by state-of-the-art algorithms. These comparisons show that the proposed methods outperform most algorithms in terms of accuracy, while achieving a much lower computational complexity, resulting in a real-time implementation. Results are discussed in depth at the end of each chapter. This research further resulted in an international journal publication; a second journal paper that has been submitted and is under review at the time of writing this dissertation; nine international conference publications; a national conference publication; a commercial license agreement concerning the research results; two hardware prototypes of a new type of computer display; and a software demonstrator

    COMPARISON OF POTENTIAL ROAD ACCIDENT DETECTION ALGORITHMS FOR MODERN MACHINE VISION SYSTEM

    Get PDF
    Nowadays the robotics is relevant development industry. Robots are becoming more sophisticated, and this requires more sophisticated technologies. One of them is robot vision. This is needed for robots which communicate with the environment using vision instead of a batch of sensors. These data are utilized to analyze the situation at hand and develop a real-time action plan for the given scenario. This article explores the most suitable algorithm for detecting potential road accidents, specifically focusing on the scenario of turning left across one or more oncoming lanes. The selection of the optimal algorithm is based on a comparative analysis of evaluation and testing results, including metrics such as maximum frames per second for video processing during detection using robot’s hardware. The study categorises potential accidents into two classes: danger and not-danger. The Yolov7 and Detectron2 algorithms are compared, and the article aims to create simple models with the potential for future refinement. Also, this article provides conclusions and recommendations regarding the practical implementation of the proposed models and algorithm

    A General Framework for Complex Network Applications

    Full text link
    Complex network theory has been applied to solving practical problems from different domains. In this paper, we present a general framework for complex network applications. The keys of a successful application are a thorough understanding of the real system and a correct mapping of complex network theory to practical problems in the system. Despite of certain limitations discussed in this paper, complex network theory provides a foundation on which to develop powerful tools in analyzing and optimizing large interconnected systems.Comment: 8 page

    Assessing the quality of steady-state visual-evoked potentials for moving humans using a mobile electroencephalogram headset.

    Get PDF
    Recent advances in mobile electroencephalogram (EEG) systems, featuring non-prep dry electrodes and wireless telemetry, have enabled and promoted the applications of mobile brain-computer interfaces (BCIs) in our daily life. Since the brain may behave differently while people are actively situated in ecologically-valid environments versus highly-controlled laboratory environments, it remains unclear how well the current laboratory-oriented BCI demonstrations can be translated into operational BCIs for users with naturalistic movements. Understanding inherent links between natural human behaviors and brain activities is the key to ensuring the applicability and stability of mobile BCIs. This study aims to assess the quality of steady-state visual-evoked potentials (SSVEPs), which is one of promising channels for functioning BCI systems, recorded using a mobile EEG system under challenging recording conditions, e.g., walking. To systematically explore the effects of walking locomotion on the SSVEPs, this study instructed subjects to stand or walk on a treadmill running at speeds of 1, 2, and 3 mile (s) per hour (MPH) while concurrently perceiving visual flickers (11 and 12 Hz). Empirical results of this study showed that the SSVEP amplitude tended to deteriorate when subjects switched from standing to walking. Such SSVEP suppression could be attributed to the walking locomotion, leading to distinctly deteriorated SSVEP detectability from standing (84.87 ± 13.55%) to walking (1 MPH: 83.03 ± 13.24%, 2 MPH: 79.47 ± 13.53%, and 3 MPH: 75.26 ± 17.89%). These findings not only demonstrated the applicability and limitations of SSVEPs recorded from freely behaving humans in realistic environments, but also provide useful methods and techniques for boosting the translation of the BCI technology from laboratory demonstrations to practical applications

    End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models

    Full text link
    Speech activity detection (SAD) plays an important role in current speech processing systems, including automatic speech recognition (ASR). SAD is particularly difficult in environments with acoustic noise. A practical solution is to incorporate visual information, increasing the robustness of the SAD approach. An audiovisual system has the advantage of being robust to different speech modes (e.g., whisper speech) or background noise. Recent advances in audiovisual speech processing using deep learning have opened opportunities to capture in a principled way the temporal relationships between acoustic and visual features. This study explores this idea proposing a \emph{bimodal recurrent neural network} (BRNN) framework for SAD. The approach models the temporal dynamic of the sequential audiovisual data, improving the accuracy and robustness of the proposed SAD system. Instead of estimating hand-crafted features, the study investigates an end-to-end training approach, where acoustic and visual features are directly learned from the raw data during training. The experimental evaluation considers a large audiovisual corpus with over 60.8 hours of recordings, collected from 105 speakers. The results demonstrate that the proposed framework leads to absolute improvements up to 1.2% under practical scenarios over a VAD baseline using only audio implemented with deep neural network (DNN). The proposed approach achieves 92.7% F1-score when it is evaluated using the sensors from a portable tablet under noisy acoustic environment, which is only 1.0% lower than the performance obtained under ideal conditions (e.g., clean speech obtained with a high definition camera and a close-talking microphone).Comment: Submitted to Speech Communicatio

    Mutation testing on an object-oriented framework: An experience report

    Get PDF
    This is the preprint version of the article - Copyright @ 2011 ElsevierContext The increasing presence of Object-Oriented (OO) programs in industrial systems is progressively drawing the attention of mutation researchers toward this paradigm. However, while the number of research contributions in this topic is plentiful, the number of empirical results is still marginal and mostly provided by researchers rather than practitioners. Objective This article reports our experience using mutation testing to measure the effectiveness of an automated test data generator from a user perspective. Method In our study, we applied both traditional and class-level mutation operators to FaMa, an open source Java framework currently being used for research and commercial purposes. We also compared and contrasted our results with the data obtained from some motivating faults found in the literature and two real tools for the analysis of feature models, FaMa and SPLOT. Results Our results are summarized in a number of lessons learned supporting previous isolated results as well as new findings that hopefully will motivate further research in the field. Conclusion We conclude that mutation testing is an effective and affordable technique to measure the effectiveness of test mechanisms in OO systems. We found, however, several practical limitations in current tool support that should be addressed to facilitate the work of testers. We also missed specific techniques and tools to apply mutation testing at the system level.This work has been partially supported by the European Commission (FEDER) and Spanish Government under CICYT Project SETI (TIN2009-07366) and the Andalusian Government Projects ISABEL (TIC-2533) and THEOS (TIC-5906)
    corecore