
    Adaptive models for the recognition of human gesture

    Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2000. Includes bibliographical references (leaves 135-140). By Andrew David Wilson.

    Tomorrow's ubiquitous computing environments will go beyond the keyboard, mouse, and monitor paradigm of interaction and will require the automatic interpretation of human motion using a variety of sensors, including video cameras. I present several techniques for human motion recognition that are inspired by observations on human gesture, the class of communicative human movement. Typically, gesture recognition systems are unable to handle systematic variation in the input signal, and so are too brittle to be applied successfully in many real-world situations. To address this problem, I present modeling and recognition techniques that adapt gesture models to the situation at hand. A number of systems and frameworks that use adaptive gesture models are presented. First, the parametric hidden Markov model (PHMM) addresses the representation and recognition of gesture families, extracting how a gesture is executed. Second, strong temporal models drawn from natural gesture theory are exploited to segment two kinds of natural gestures from video sequences. Third, a realtime computer vision system learns gesture models online from time-varying context. Fourth, a realtime computer vision system employs hybrid Bayesian networks to unify and extend the previous approaches, as well as to point the way for future work.
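
    As a rough illustration of the PHMM idea, the sketch below builds a small left-to-right HMM whose Gaussian output means depend linearly on a scalar gesture parameter theta; recognition then reports not just whether the gesture family matched, but the theta that best explains how it was executed. The linear dependence, the toy model parameters, and the grid-search estimation of theta are all illustrative assumptions, not details quoted from the thesis.

```python
import numpy as np

# Toy PHMM-style model: a 3-state left-to-right HMM whose Gaussian
# output means vary linearly with a gesture parameter theta
# (e.g. "how far" the hand moved). Wbar, B, SIGMA are invented.
N_STATES = 3
Wbar = np.array([0.0, 1.0, 2.0])   # baseline mean per state
B = np.array([0.5, 1.0, 1.5])      # sensitivity of each mean to theta
SIGMA = 0.3                        # shared output standard deviation

A = np.array([[0.5, 0.5, 0.0],     # stay or advance one state
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
with np.errstate(divide="ignore"): # log(0) -> -inf is intended here
    logA = np.log(A)

def log_likelihood(obs, theta):
    """Forward algorithm with theta-dependent output means."""
    means = Wbar + B * theta
    def emit(x):
        return -0.5 * ((x - means) / SIGMA) ** 2 - np.log(SIGMA * np.sqrt(2 * np.pi))
    alpha = np.full(N_STATES, -np.inf)
    alpha[0] = emit(obs[0])[0]     # sequences start in the first state
    for x in obs[1:]:
        alpha = emit(x) + np.array(
            [np.logaddexp.reduce(alpha + logA[:, j]) for j in range(N_STATES)])
    return np.logaddexp.reduce(alpha)

# Recognition-time estimate of "how": grid search for the theta that
# best explains the observed sequence.
obs = np.array([0.4, 0.9, 1.7, 2.4, 2.9])
thetas = np.linspace(-1.0, 2.0, 61)
best = max(thetas, key=lambda t: log_likelihood(obs, t))
print(f"estimated theta = {best:.2f}")
```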

    Continuous Realtime Gesture Following and Recognition


    Deep Visual Instruments: Realtime Continuous, Meaningful Human Control over Deep Neural Networks for Creative Expression

    In this thesis, we investigate Deep Learning models as an artistic medium for new modes of performative, creative expression. We call these Deep Visual Instruments: realtime interactive generative systems that exploit and leverage the capabilities of state-of-the-art Deep Neural Networks (DNN), while allowing Meaningful Human Control, in a Realtime Continuous manner. We characterise Meaningful Human Control in terms of intent, predictability, and accountability; and Realtime Continuous Control in terms of its capacity for performative interaction with immediate feedback, enhancing goal-less exploration. The DNN capability we seek to exploit and leverage in this manner is the ability to learn hierarchical representations that model highly complex, real-world data such as images. Thinking of DNNs as tools that extract useful information from massive amounts of Big Data, we investigate ways in which we can navigate and explore what useful information a DNN has learnt, and how we can meaningfully use such a model in the production of artistic and creative works, in a performative, expressive manner. We present five studies that approach this from different but complementary angles. These include: a collaborative, generative sketching application using MCTS and discriminative CNNs; a system to gesturally conduct the realtime generation of text in different styles using an ensemble of LSTM RNNs; a performative tool that allows for the manipulation of hyperparameters in realtime while a Convolutional VAE trains on a live camera feed; live video feed processing software that allows for digital puppetry and augmented drawing; and a method that allows for long-form storytelling within a generative model's latent space with meaningful control over the narrative. We frame our research with the realtime, performative expression provided by musical instruments as a metaphor, in which we think of these systems as not used by a user, but played by a performer.
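
    One of the five studies, gesturally conducting an ensemble of style-specific LSTM RNNs, suggests a simple blending pattern: mix the next-token distributions of several models with weights driven in realtime by the performer. The sketch below shows only that mixing step; the stand-in model outputs, style names, and weights are fabricated for illustration and are not taken from the thesis.

```python
import numpy as np

# Illustrative mixing step for an ensemble of style-specific language
# models: each model emits a distribution over the next token, and a
# performer's gesture sets the blend weights in realtime.
VOCAB = list("abcdefgh")

def fake_model_predict(style_seed, context):
    """Stand-in for one LSTM's next-token distribution."""
    rng = np.random.default_rng(hash((style_seed, context)) % (2**32))
    logits = rng.normal(size=len(VOCAB))
    p = np.exp(logits - logits.max())
    return p / p.sum()

def blended_sample(context, weights, rng):
    """Sample the next token from a weighted mixture of model outputs."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # gesture-derived, renormalised
    mix = sum(w * fake_model_predict(s, context)
              for s, w in zip(("style_a", "style_b", "style_c"), weights))
    return rng.choice(VOCAB, p=mix)

rng = np.random.default_rng(0)
context = "the"
for _ in range(8):                             # generate a few tokens
    context += blended_sample(context, weights=[0.7, 0.2, 0.1], rng=rng)
print(context)
```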

    A Survey of Applications and Human Motion Recognition with Microsoft Kinect

    Microsoft Kinect, a low-cost motion sensing device, enables users to interact with computers or game consoles naturally through gestures and spoken commands, without any other peripheral equipment. As such, it has attracted intense interest in research and development on the Kinect technology. In this paper, we present a comprehensive survey of Kinect applications and of the latest research and development on motion recognition using data captured by the Kinect sensor. On the applications front, we review the applications of the Kinect technology in a variety of areas, including healthcare, education and performing arts, robotics, sign language recognition, retail services, workplace safety training, and 3D reconstruction. On the technology front, we provide an overview of the main features of both versions of the Kinect sensor together with the depth sensing technologies they use, and review the literature on human motion recognition techniques used in Kinect applications. We provide a classification of motion recognition techniques to highlight the different approaches used in human motion recognition. Furthermore, we compile a list of publicly available Kinect datasets. These datasets are valuable resources for researchers investigating better methods for human motion recognition and lower-level computer vision tasks such as segmentation, object detection, and human pose estimation.
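
    To make one of the commonly surveyed technique families concrete, template matching on skeleton joint trajectories, here is a minimal dynamic-time-warping (DTW) nearest-neighbour classifier over joint-position sequences. The toy gestures and labels are invented for illustration and do not come from the paper or from any Kinect dataset.

```python
import numpy as np

def dtw_distance(a, b):
    """DTW distance between two sequences of joint-position frames,
    each of shape (T, D) where D = 3 * number_of_joints."""
    Ta, Tb = len(a), len(b)
    D = np.full((Ta + 1, Tb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[Ta, Tb]

def classify(query, templates):
    """Nearest-neighbour gesture classification by DTW distance."""
    return min(templates, key=lambda kv: dtw_distance(query, kv[1]))[0]

# Toy templates: two single-joint "gestures" (rise vs. fall), 10 frames each.
t = np.linspace(0, 1, 10)[:, None]
templates = [("raise_hand", np.hstack([t, t, t])),
             ("lower_hand", np.hstack([t, 1 - t, t]))]
query = np.hstack([t, 0.9 - t, t]) + 0.02   # noisy falling gesture
print(classify(query, templates))            # -> "lower_hand"
```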

    Computer vision based traffic monitoring system for multi-track freeways

    Development is increasingly synonymous with the construction of infrastructure, and road infrastructure needs constant attention in terms of traffic monitoring, since even a single incident on a major artery can disrupt daily life. Humans cannot be expected to monitor these massive infrastructures 24/7, so computer vision is increasingly being used to develop automated strategies that notify human observers of impending slowdowns and traffic bottlenecks. However, because of the extreme costs associated with current state-of-the-art networked computer vision monitoring systems, there is room for innovative systems that are standalone and efficient at analyzing traffic flow and tracking vehicles for speed detection. In this article, a traffic monitoring system is proposed that counts vehicles and tracks their speeds in realtime for multi-track freeways in Australia. The proposed algorithm uses a Gaussian mixture model for foreground detection, tracks vehicle trajectories, and extracts traffic information for vehicle counting. This stationary surveillance system uses a fixed-position overhead camera to monitor traffic.
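
    Gaussian-mixture background subtraction of the kind described is available off the shelf in OpenCV. The sketch below shows the general pattern (a MOG2 background subtractor plus contour-based blob detection) rather than the authors' specific pipeline; the video file name, thresholds, and area cutoff are placeholders.

```python
import cv2

# Generic GMM-based foreground detection for a fixed overhead camera.
# "freeway.mp4" and the numeric thresholds are placeholders, not
# values from the paper.
cap = cv2.VideoCapture("freeway.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                varThreshold=16,
                                                detectShadows=True)
MIN_VEHICLE_AREA = 800   # pixels; depends on camera height and zoom

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)
    # Drop shadow pixels (value 127) and noise, then clean up the mask.
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                            cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    blobs = [c for c in contours if cv2.contourArea(c) > MIN_VEHICLE_AREA]
    # Each blob is a candidate vehicle; matching centroids across frames
    # would turn these detections into counts and speed estimates.
    for c in blobs:
        x, y, w, h = cv2.boundingRect(c)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("traffic", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```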

    FML: Face Model Learning from Videos

    Monocular image-based 3D reconstruction of faces is a long-standing problem in computer vision. Since image data is a 2D projection of a 3D face, the resulting depth ambiguity makes the problem ill-posed. Most existing methods rely on data-driven priors that are built from limited 3D face scans. In contrast, we propose multi-frame video-based self-supervised training of a deep network that (i) learns a face identity model both in shape and appearance while (ii) jointly learning to reconstruct 3D faces. Our face model is learned using only corpora of in-the-wild video clips collected from the Internet. This virtually endless source of training data enables learning of a highly general 3D face model. In order to achieve this, we propose a novel multi-frame consistency loss that ensures consistent shape and appearance across multiple frames of a subject's face, thus minimizing depth ambiguity. At test time we can use an arbitrary number of frames, so that we can perform both monocular and multi-frame reconstruction.

    Comment: CVPR 2019 (Oral). Video: https://www.youtube.com/watch?v=SG2BwxCw0lQ, Project Page: https://gvv.mpi-inf.mpg.de/projects/FML19
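
    The multi-frame consistency idea can be sketched abstractly: identity parameters predicted from different frames of the same clip should agree. One simple way to write such a penalty, not necessarily the paper's exact formulation, is the mean squared deviation of per-frame identity codes from their clip-level mean.

```python
import numpy as np

def multi_frame_consistency(identity_codes):
    """Penalise disagreement between per-frame identity predictions.

    identity_codes: array of shape (F, K) holding the identity code
    predicted from each of F frames of one subject's clip. Returns the
    mean squared deviation from the shared (mean) code. This is one
    simple consistency penalty, not necessarily the FML paper's loss.
    """
    codes = np.asarray(identity_codes, dtype=float)
    shared = codes.mean(axis=0, keepdims=True)   # clip-level identity
    return float(np.mean((codes - shared) ** 2))

# Toy check: near-identical predictions incur almost no penalty,
# inconsistent ones are penalised.
rng = np.random.default_rng(0)
consistent = np.tile(rng.normal(size=(1, 8)), (4, 1)) + 0.01 * rng.normal(size=(4, 8))
inconsistent = rng.normal(size=(4, 8))
print(multi_frame_consistency(consistent))    # close to zero
print(multi_frame_consistency(inconsistent))  # much larger
```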

    Unsupervised adaptation for acceleration-based activity recognition: robustness to sensor displacement and rotation

    A common assumption in activity recognition is that the system remains unchanged between its design and its later operation. However, many factors affect the data distribution between two different experimental sessions. One of these factors is a potential change in the sensor location (e.g. due to replacement or slippage), which affects classification performance. Assuming that changes in sensor placement mainly result in shifts in the feature distributions, we propose an unsupervised adaptive classifier that calibrates itself using an online version of expectation-maximisation. Tests using three activity recognition scenarios show that the proposed adaptive algorithm is robust against shifts in the feature space due to sensor displacement and rotation. Moreover, since the method estimates the change in the feature distribution, it can also be used to roughly evaluate the reliability of the system during online operation.
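
    A minimal version of the adaptation step might look like the following: a Gaussian classifier soft-assigns each incoming sample to a class (E-step), then nudges the class means toward it (M-step), so the means follow a displacement-induced shift. The learning rate, equal priors, and shared spherical variance are illustrative simplifications, not details from the paper.

```python
import numpy as np

class OnlineEMAdapter:
    """Self-calibrating Gaussian classifier: soft-assign each incoming
    sample to a class (E-step), then nudge the class means toward it
    (M-step). A minimal sketch; the paper's method may differ in detail."""

    def __init__(self, means, var=1.0, lr=0.05):
        self.means = np.asarray(means, dtype=float)  # (C, D) trained class means
        self.var = var                               # shared spherical variance
        self.lr = lr                                 # adaptation rate

    def step(self, x):
        # E-step: responsibilities under equal class priors.
        d2 = ((self.means - x) ** 2).sum(axis=1)
        w = np.exp(-0.5 * d2 / self.var)
        w /= w.sum()
        # Online M-step: move each mean toward x in proportion to w.
        self.means += self.lr * w[:, None] * (x - self.means)
        return int(np.argmax(w))                     # predicted class

# Simulate a sensor-displacement shift: every feature offset by +0.8.
rng = np.random.default_rng(1)
adapter = OnlineEMAdapter(means=[[0.0, 0.0], [2.0, 2.0]])
true_means = np.array([[0.0, 0.0], [2.0, 2.0]]) + 0.8
for _ in range(300):
    c = rng.integers(2)
    x = true_means[c] + 0.3 * rng.normal(size=2)
    adapter.step(x)
print(adapter.means)   # class means have drifted toward the shifted classes
```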

    Motion Modeling for Expressive Interaction

    While human-human and human-object interactions involve very rich, complex, and nuanced gestures, gestures as captured for human-computer interaction remain relatively simplistic. Our approach is to study variation in motion input as a way of understanding expression and expressivity in human-computer interaction, and to propose computational solutions for capturing and using these expressive variations. The paper sketches design guidelines for modeling systems that adapt to motion variations. We illustrate them through two case studies: the first model estimates temporal and geometrical motion variations, while the second tracks variations of motion dynamics. These case studies are illustrated in two applications.
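
    Tracking temporal and geometric variation against a gesture template is commonly posed as state-space filtering. The sketch below uses a simple particle filter whose state carries a phase (temporal alignment), a speed, and a scale (amplitude), which is one generic realisation of the idea rather than the specific models in the paper.

```python
import numpy as np

# Particle filter tracking two variations of a 1-D gesture against a
# template: phase/speed (temporal) and scale (geometric amplitude).
# A generic sketch of the idea, not the paper's specific models.
rng = np.random.default_rng(0)

def template(phase):
    return np.sin(2 * np.pi * np.clip(phase, 0.0, 1.0))

N = 500
phase = np.zeros(N)                     # progression through the template
speed = rng.normal(1.0, 0.2, N) / 50.0  # per-step phase increment
scale = rng.normal(1.0, 0.3, N)         # amplitude variation
OBS_STD = 0.1

def pf_step(obs):
    global phase, speed, scale
    # Propagate with small diffusion.
    phase = phase + speed + rng.normal(0, 0.002, N)
    speed = speed + rng.normal(0, 0.0005, N)
    scale = scale + rng.normal(0, 0.01, N)
    # Weight particles by agreement with the observed sample.
    pred = scale * template(phase)
    w = np.exp(-0.5 * ((obs - pred) / OBS_STD) ** 2) + 1e-12
    w /= w.sum()
    # Resample to concentrate particles on plausible variations.
    idx = rng.choice(N, size=N, p=w)
    phase, speed, scale = phase[idx], speed[idx], scale[idx]
    return phase.mean(), scale.mean()

# Feed a gesture performed slower (60 steps vs. the template's ~50)
# and 30% larger than the template.
for t in range(60):
    obs = 1.3 * np.sin(2 * np.pi * min(t / 60.0, 1.0)) + rng.normal(0, 0.05)
    est_phase, est_scale = pf_step(obs)
print(f"estimated phase {est_phase:.2f}, scale {est_scale:.2f}")
```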