156 research outputs found

    Real-Time Head Gesture Recognition on Head-Mounted Displays using Cascaded Hidden Markov Models

    Head gesture is a natural means of face-to-face communication between people, but the recognition of head gestures in the context of virtual reality, and the use of head gestures as an interface for interacting with virtual avatars and environments, have rarely been investigated. In this study, we present an approach for real-time head gesture recognition on head-mounted displays using Cascaded Hidden Markov Models. We conducted two experiments to evaluate the proposed approach. In Experiment 1, we trained the Cascaded Hidden Markov Models and assessed offline classification performance on collected head motion data. In Experiment 2, we characterized the real-time performance of the approach by estimating the latency to recognize a head gesture from recorded real-time classification data. Our results show that the proposed approach is effective in recognizing head gestures, and the method can be integrated into a virtual reality system as a head gesture interface for interacting with virtual worlds.
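As a loose sketch of likelihood-based gesture classification of this kind (plain per-gesture HMMs, not the paper's cascaded architecture): specify or train one Gaussian HMM per gesture, then label a motion sequence with the model that scores it highest under the forward algorithm. The two hand-specified models and the 1-D head-pitch signal below are invented for illustration.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, means, var):
    """Log-likelihood of a 1-D observation sequence under a Gaussian HMM,
    computed with the forward algorithm in log space."""
    # Log emission probability of each observation under each state's Gaussian.
    log_b = -0.5 * (np.log(2 * np.pi * var) +
                    (obs[:, None] - means[None, :]) ** 2 / var)
    log_alpha = np.log(pi) + log_b[0]
    for t in range(1, len(obs)):
        m = log_alpha.max()  # log-sum-exp trick for numerical stability
        log_alpha = np.log(np.exp(log_alpha - m) @ A) + m + log_b[t]
    m = log_alpha.max()
    return m + np.log(np.exp(log_alpha - m).sum())

def classify(obs, models):
    """Pick the gesture whose HMM assigns the highest likelihood."""
    scores = {name: forward_log_likelihood(obs, *params)
              for name, params in models.items()}
    return max(scores, key=scores.get)

# Two toy gesture models over a 1-D head-pitch signal:
# "nod" oscillates between low/high pitch, "idle" stays near zero.
models = {
    "nod":  (np.array([0.5, 0.5]),
             np.array([[0.2, 0.8], [0.8, 0.2]]),
             np.array([-1.0, 1.0]), 0.1),
    "idle": (np.array([0.5, 0.5]),
             np.array([[0.9, 0.1], [0.1, 0.9]]),
             np.array([0.0, 0.0]), 0.1),
}
seq = np.array([-1.0, 1.0, -0.9, 1.1, -1.0])  # alternating pitch: a nod
print(classify(seq, models))  # → nod
```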

    Towards Naturalistic Interfaces of Virtual Reality Systems

    Interaction plays a key role in achieving realistic experience in virtual reality (VR). Its realization depends on interpreting the intents of human motions to give inputs to VR systems. Thus, understanding human motion from the computational perspective is essential to the design of naturalistic interfaces for VR. This dissertation studied three types of human motions, including locomotion (walking), head motion and hand motion in the context of VR. For locomotion, the dissertation presented a machine learning approach for developing a mechanical repositioning technique based on a 1-D treadmill for interacting with a unique new large-scale projective display, called the Wide-Field Immersive Stereoscopic Environment (WISE). The usability of the proposed approach was assessed through a novel user study that asked participants to pursue a rolling ball at variable speed in a virtual scene. In addition, the dissertation studied the role of stereopsis in avoiding virtual obstacles while walking by asking participants to step over obstacles and gaps under both stereoscopic and non-stereoscopic viewing conditions in VR experiments. In terms of head motion, the dissertation presented a head gesture interface for interaction in VR that recognizes real-time head gestures on head-mounted displays (HMDs) using Cascaded Hidden Markov Models. Two experiments were conducted to evaluate the proposed approach. The first assessed its offline classification performance while the second estimated the latency of the algorithm to recognize head gestures. The dissertation also conducted a user study that investigated the effects of visual and control latency on teleoperation of a quadcopter using head motion tracked by a head-mounted display. As part of the study, a method for objectively estimating the end-to-end latency in HMDs was presented. 
For hand motion, the dissertation presented an approach that recognizes dynamic hand gestures to implement a hand gesture interface for VR based on a static head gesture recognition algorithm. The proposed algorithm was evaluated offline in terms of its classification performance. A user study was conducted to compare the performance and usability of the head gesture interface, the hand gesture interface and a conventional gamepad interface for answering Yes/No questions in VR. Overall, the dissertation makes two main contributions towards more naturalistic interaction in VR systems. Firstly, the interaction techniques presented in the dissertation can be directly integrated into existing VR systems, offering end users of VR technology more choices for interaction. Secondly, the results of the user studies of the presented VR interfaces serve as guidelines for VR researchers and engineers designing future VR systems.

    Vision systems with the human in the loop

    The emerging cognitive vision paradigm deals with vision systems that apply machine learning and automatic reasoning in order to learn from what they perceive. Cognitive vision systems can rate the relevance and consistency of newly acquired knowledge, can adapt to their environment, and thus exhibit high robustness. This contribution presents vision systems that aim at flexibility and robustness. One is tailored for content-based image retrieval; the others are cognitive vision systems that constitute prototypes of visual active memories, which evaluate, gather, and integrate contextual knowledge for visual analysis. All three systems are designed to interact with human users. After discussing adaptive content-based image retrieval and object and action recognition in an office environment, we raise the issue of assessing cognitive systems. Experiences from psychologically evaluated human-machine interactions are reported, and the promising potential of psychologically based usability experiments is stressed.

    Sign Language Recognition

    This chapter covers the key aspects of sign-language recognition (SLR), starting with a brief introduction to the motivations and requirements, followed by a précis of sign linguistics and its impact on the field. The types of data available and their relative merits are explored, allowing examination of the features which can be extracted. Classifying the manual aspects of sign (similar to gestures) is then discussed from tracking and non-tracking viewpoints, before summarising some of the approaches to the non-manual aspects of sign languages. Methods for combining the sign classification results into full SLR are given, showing the progression towards speech recognition techniques and the further adaptations required for the sign-specific case. Finally, the current frontiers are discussed and recent research presented. This covers the task of continuous sign recognition, work towards true signer independence, how to effectively combine the different modalities of sign, making use of current linguistic research, and adapting to larger, noisier data sets.

    Context-based Visual Feedback Recognition

    PhD thesis. During face-to-face conversation, people use visual feedback (e.g., head and eye gestures) to communicate relevant information and to synchronize rhythm between participants. When recognizing visual feedback, people often rely on more than their visual perception. For instance, knowledge about the current topic and from previous utterances helps guide the recognition of nonverbal cues. The goal of this thesis is to augment computer interfaces with the ability to perceive visual feedback gestures and to enable the exploitation of contextual information from the current interaction state to improve visual feedback recognition. We introduce the concept of visual feedback anticipation, where contextual knowledge from an interactive system (e.g. the last spoken utterance from the robot, or system events from the GUI interface) is analyzed online to anticipate visual feedback from a human participant and improve visual feedback recognition. Our multi-modal framework for context-based visual feedback recognition was successfully tested on conversational and non-embodied interfaces for head and eye gesture recognition. We also introduce the Frame-based Hidden-state Conditional Random Field (FHCRF) model, a new discriminative model for visual gesture recognition which can model the sub-structure of a gesture sequence, learn the dynamics between gesture labels, and be directly applied to label unsegmented sequences. The FHCRF model outperforms previous approaches (i.e. HMM, SVM and CRF) for visual gesture recognition and can efficiently learn relevant contextual information necessary for visual feedback anticipation. A real-time visual feedback recognition library for interactive interfaces (called Watson) was developed to recognize head gaze, head gestures, and eye gaze using the images from a monocular or stereo camera and the context information from the interactive system.
Watson was downloaded by more than 70 researchers around the world and was successfully used by MERL, USC, NTT, the MIT Media Lab and many other research groups.

    Multi-View Face Recognition From Single RGBD Models of the Faces

    This work takes important steps towards solving the following problem of current interest: Assuming that each individual in a population can be modeled by a single frontal RGBD face image, is it possible to carry out face recognition for such a population using multiple 2D images captured from arbitrary viewpoints? Although the general problem as stated above is extremely challenging, it encompasses subproblems that can be addressed today. The subproblems addressed in this work relate to: (1) generating a large set of viewpoint-dependent face images from a single RGBD frontal image for each individual; (2) using hierarchical approaches based on view-partitioned subspaces to represent the training data; and (3) based on these hierarchical approaches, using a weighted voting algorithm to integrate the evidence collected from multiple images of the same face as recorded from different viewpoints. We evaluate our methods on three datasets: a dataset of 10 people that we created and two publicly available datasets which include a total of 48 people. In addition to providing important insights into the nature of this problem, our results show that we are able to successfully recognize faces with accuracies of 95% or higher, outperforming existing state-of-the-art face recognition approaches based on deep convolutional neural networks.
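The weighted voting step in (3), integrating evidence from multiple viewpoints, can be sketched as follows; the per-view identity scores and viewpoint weights are invented placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical per-view classifier outputs: rows = captured views,
# columns = candidate identities (scores in [0, 1]).
view_scores = np.array([
    [0.7, 0.2, 0.1],   # near-frontal view
    [0.4, 0.5, 0.1],   # strong profile view
    [0.6, 0.3, 0.1],   # three-quarter view
])
# Assumed weights reflecting how reliable each viewpoint is taken to be.
view_weights = np.array([1.0, 0.5, 0.8])

# Weighted vote: each view contributes its scores scaled by its weight.
combined = view_weights @ view_scores
identity = int(np.argmax(combined))
print(identity)  # → 0
```

The frontal view dominates here because its weight is highest; with a different weighting the profile view's preference for identity 1 could win instead.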

    Py-Feat: Python Facial Expression Analysis Toolbox

    Studying facial expressions is a notoriously difficult endeavor. Recent advances in the field of affective computing have yielded impressive progress in automatically detecting facial expressions from pictures and videos. However, much of this work has yet to be widely disseminated in social science domains such as psychology. Current state-of-the-art models require considerable domain expertise that is not traditionally incorporated into social science training programs. Furthermore, there is a notable absence of user-friendly and open-source software that provides a comprehensive set of tools and functions to support facial expression research. In this paper, we introduce Py-Feat, an open-source Python toolbox that provides support for detecting, preprocessing, analyzing, and visualizing facial expression data. Py-Feat makes it easy for domain experts to disseminate and benchmark computer vision models, and for end users to quickly process, analyze, and visualize facial expression data. We hope this platform will facilitate increased use of facial expression data in human behavior research.

    Robust Hand Motion Capture and Physics-Based Control for Grasping in Real Time

    Hand motion capture technologies are being explored due to high demand in fields such as video games, virtual reality, sign language recognition, human-computer interaction, and robotics. However, existing systems suffer from several limitations: they are high-cost (expensive capture devices), intrusive (additional worn sensors or complex configurations), and restrictive (limited motion varieties and restricted capture space). This dissertation focuses on algorithms and applications for a hand motion capture system that is low-cost, non-intrusive, low-restriction, high-accuracy, and robust. More specifically, we develop a real-time and fully automatic hand tracking system using a low-cost depth camera. We first introduce an efficient shape-indexed cascaded pose regressor that directly estimates 3D hand poses from depth images. A unique property of our hand pose regressor is that it utilizes a low-dimensional parametric hand geometric model to learn 3D shape-indexed features robust to variations in hand shape, viewpoint and hand pose. We further introduce a hybrid tracking scheme that effectively complements our hand pose regressor with model-based hand tracking. In addition, we develop a rapid 3D hand shape modeling method that uses a small number of depth images to accurately construct a subject-specific skinned mesh model for hand tracking. This step not only automates the whole tracking system but also improves the robustness and accuracy of model-based tracking and hand pose regression. We also propose a physically realistic human grasping synthesis method that is capable of grasping a wide variety of objects. Given an object to be grasped, our method computes the required controls (e.g. forces and torques) that advance the simulation to achieve realistic grasping. Our method combines the power of data-driven synthesis and physics-based grasping control.
We first introduce a data-driven method to synthesize a realistic grasping motion from large sets of prerecorded grasping motion data, and then transform the synthesized kinematic motion into a physically realistic one using our online physics-based motion control method. In addition, we provide a performance interface which allows the user to act out a grasp in front of a depth camera to control a virtual object.
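The cascaded pose regression pattern named above (a sequence of stage regressors, each predicting an additive pose update from features indexed by the current estimate) can be sketched generically. The 2-D "pose", the residual features and the fixed half-step regressors below are toy stand-ins for the paper's shape-indexed depth features and learned stages.

```python
import numpy as np

def cascaded_regression(features_fn, x0, regressors):
    """Generic cascaded regression: each stage predicts a pose update
    from features computed at the current estimate."""
    x = x0.copy()
    for W in regressors:
        phi = features_fn(x)      # features indexed by the current pose
        x = x + W @ phi           # additive update predicted by this stage
    return x

# Toy setup: the "features" are simply the residual to a target pose
# (a stand-in for depth-image features; purely illustrative).
target = np.array([1.0, -0.5])
features_fn = lambda x: target - x
# Each stage regressor recovers half of the remaining residual,
# so the error shrinks geometrically across the cascade.
regressors = [0.5 * np.eye(2)] * 4

pose = cascaded_regression(features_fn, np.zeros(2), regressors)
print(np.round(pose, 3))
```

After four half-step stages, the estimate has closed all but 1/16 of the initial gap to the target, illustrating why a short cascade of weak stage regressors can converge quickly.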

    Video-based Pedestrian Intention Recognition and Path Prediction for Advanced Driver Assistance Systems

    Advanced driver assistance systems (ADAS) play a very important role in future vehicles in increasing safety for the driver, the passengers, and vulnerable road users such as pedestrians and cyclists. Systems of this kind attempt, within limits, to avoid collisions in dangerous situations involving an inattentive driver and pedestrian by triggering an automatic emergency brake. Due to the high variability of pedestrian motion patterns, existing systems are designed conservatively, drastically reducing possible false-trigger rates by restricting operation to manageable environments, e.g. scenarios in which pedestrians suddenly stop and thereby de-escalate the situation. To overcome this problem, reliable pedestrian intention recognition and path prediction are of great value. This thesis describes the complete pipeline of a stereo-video-based system for pedestrian intention estimation and path prediction, which is subsequently used in the triggering decision for an automatic emergency brake. In the first of three main components, a real-time method is proposed that localizes pedestrians' heads and estimates their pose in low-resolution images of complex and highly dynamic inner-city scenarios. Single-frame estimates are derived from the probability outputs of eight trained head-pose-specific detectors applied to the image region of a pedestrian candidate. Further robustness in head localization is achieved by incorporating stereo depth information. In addition, head positions and poses are smoothed over time by means of a particle filter.
For pedestrian intention estimation, the use of a robust and powerful machine learning approach is investigated in different scenarios. Given a time series of observations, this approach is able to model the inner sub-structure of a particular intention class and additionally to capture the extrinsic dynamics between different intention classes. The method integrates meaningful features extracted from pedestrian dynamics as well as context information in the form of the human head pose. Finally, a path prediction method is presented that steers the prediction steps of a multiple-motion-model filter over a time horizon of roughly one second by incorporating the estimated pedestrian intentions. By helping the filter choose the appropriate motion model, the resulting path prediction error can be reduced significantly. A wide variety of scenarios is covered, including pedestrians crossing laterally or stopping, as well as persons who initially walk along the sidewalk but then suddenly turn towards the road.
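One minimal reading of intention-guided multiple-model path prediction: blend a constant-velocity model with a decelerating (stopping) model according to an estimated stopping probability, so the intention estimate steers which motion model dominates. All models and parameters below are illustrative assumptions, not the filter configuration used in the thesis.

```python
import numpy as np

def predict_path(pos, vel, p_stop, horizon=10, dt=0.1, decel=0.9):
    """Blend two motion-model rollouts by the estimated intention:
    constant velocity (keeps walking) vs. deceleration (stops)."""
    walk, stop = [], []
    p_w = pos.copy()              # constant-velocity rollout
    p_s, v_s = pos.copy(), vel.copy()  # decelerating rollout
    for _ in range(horizon):
        p_w = p_w + vel * dt          # constant-velocity model
        v_s = v_s * decel             # stopping model: velocity decays
        p_s = p_s + v_s * dt
        walk.append(p_w.copy())
        stop.append(p_s.copy())
    walk, stop = np.array(walk), np.array(stop)
    # The intention probability gates the mix of the two rollouts.
    return (1 - p_stop) * walk + p_stop * stop

# Pedestrian at the origin walking at 1.5 m/s, estimated 80% likely to stop.
path = predict_path(np.zeros(2), np.array([1.5, 0.0]), p_stop=0.8)
print(np.round(path[-1], 2))
```

With a high stopping probability, the blended one-second endpoint falls well short of the constant-velocity endpoint, which is the effect the intention estimate is meant to produce.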