    Multi-modal on-body sensing of human activities

    The increasing use and integration of state-of-the-art information technology in everyday working life aims to increase work efficiency. Because human-computer interaction methods are often unwieldy, this progress does not always result in higher efficiency, particularly for mobile workers. Contextual computing based on activity recognition attempts to compensate for this interaction deficit. This work investigates the applicability of wearable, on-body sensing techniques to human activity recognition. More precisely, we are interested in spotting and recognizing so-called manipulative hand gestures. In particular, the thesis examines whether the widely used motion-sensing approach can be enhanced through additional information sources. The set of gestures a person usually performs at a specific place is limited, particularly in the production and maintenance scenarios considered here. Consequently, this thesis investigates whether knowledge of the user's hand location provides essential cues for the activity recognition process. In addition, because manipulative hand gestures act on objects, they typically start the moment the user's hand reaches a specific place, e.g. a specific part of a machine, and most likely stop the moment the hand leaves that position again. Hence, this thesis investigates whether hand location can help solve the spotting problem. Moreover, as user independence is still a major challenge in activity recognition, this thesis investigates location context as a possible key component of a user-independent recognition system. We test a Kalman filter based method that blends absolute position readings with orientation readings derived from inertial measurements. A filter structure is proposed that allows up-sampling of slow absolute position readings and thus introduces higher dynamics into the position estimates.
In this way, the position measurement series captures wrist motions in addition to the wrist position. We propose location-based gesture spotting and recognition approaches. Various methods for modeling the location classes used in the spotting and recognition stages, as well as different location distance measures, are proposed and evaluated. In addition, a sensing approach that is relatively novel in human activity recognition is studied, with the aim of compensating for drawbacks of a purely motion-based approach. To this end, we develop a wearable hardware architecture for measuring lower-arm muscular activity. The sensing hardware, based on force-sensing resistors, is designed to have a high dynamic range. In contrast to earlier attempts, the proposed design makes hardware calibration unnecessary. Finally, we propose a modular, multi-modal recognition system that is modular with respect to sensors, algorithms, and gesture classes. This means that adding or removing a sensor modality or an algorithm has little impact on the rest of the recognition system. The sensors and algorithms used for spotting and recognition can be selected and fine-tuned separately for each activity. New activities can be added without affecting the recognition rates of the other activities.
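
    The abstract does not give the filter equations. As a hedged illustration of the general idea only (up-sampling a slow absolute position stream with fast inertial data), the sketch below implements a per-axis, multi-rate Kalman filter in Python. It assumes a constant-velocity state model with gravity-compensated acceleration (obtained via the orientation estimate) as the control input; the class name, noise values, and sensor rates are hypothetical, and this is not the filter structure proposed in the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class MultiRateKalman1D:
    """Hypothetical per-axis Kalman filter with state x = [position, velocity].

    predict() runs at the fast inertial rate; update() runs only when a slow
    absolute position reading arrives.
    """
    q: float = 0.5   # variance of unmodelled acceleration (assumed value)
    r: float = 0.01  # variance of absolute position readings (assumed value)
    x: list = field(default_factory=lambda: [0.0, 0.0])
    P: list = field(default_factory=lambda: [[1.0, 0.0], [0.0, 1.0]])

    def predict(self, accel: float, dt: float) -> None:
        p, v = self.x
        # x' = F x + B a with F = [[1, dt], [0, 1]] and B = [dt^2 / 2, dt]
        self.x = [p + v * dt + 0.5 * accel * dt * dt, v + accel * dt]
        (a, b), (c, d) = self.P
        # P' = F P F^T + q * G G^T with G = B
        g0, g1 = 0.5 * dt * dt, dt
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q * g0 * g0,
             b + dt * d + self.q * g0 * g1],
            [c + dt * d + self.q * g1 * g0,
             d + self.q * g1 * g1],
        ]

    def update(self, z: float) -> None:
        # Measurement model H = [1, 0]: only the position is observed.
        (a, b), (c, d) = self.P
        s = a + self.r             # innovation variance
        k0, k1 = a / s, c / s      # Kalman gain K = P H^T / s
        y = z - self.x[0]          # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P = (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


# Usage: 100 Hz inertial predictions, absolute position only every 10th step.
kf = MultiRateKalman1D()
dt, accel = 0.01, 1.0
track = []
for step in range(1, 101):
    kf.predict(accel, dt)
    t = step * dt
    if step % 10 == 0:               # slow absolute reading at 10 Hz
        kf.update(0.5 * accel * t * t)
    track.append(kf.x[0])            # one position estimate per fast step
```

    Between two absolute readings the estimate follows the integrated inertial signal at the full rate; each slow reading then corrects the state, which bounds integration drift.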

    Gesture and sign language recognition with deep learning

    Integrated Framework Design for Intelligent Human Machine Interaction

    Human-computer interaction, sometimes referred to as Man-Machine Interaction, is a concept that emerged together with computers, or more generally with machines. The ways in which humans interact with computers have come a long way, and new designs and technologies appear every day. However, computer systems and complex machines are often only technically successful: users frequently find them confusing, so such systems are often not used efficiently. Building sophisticated machines and robots is therefore not the only goal; more effort should go into making these machines simpler for all kinds of users and generic enough to accommodate different types of environments. This motivates the design of intelligent human-computer interaction modules. In this work, we aim to implement a generic framework (referred to as the CIMF framework) that allows the user to control the synchronized, coordinated cooperative work that a set of robots can perform. So far, three robots are involved: two manipulators and one mobile robot. The framework should be generic enough to be hardware independent and to allow easy integration of new entities and modules. We also aim to implement the building blocks of an intelligent manufacturing cell that communicates with the framework via advanced human-computer interaction techniques. Three types of interaction are addressed: interface-based, audio-based, and visual-based.

    Deep Learning-Based Action Recognition

    The classification of human action or behavior patterns is very important for analyzing situations in the field and maintaining social safety. This book focuses on recent research findings on recognizing human action patterns. Technologies for recognizing human action patterns include processing human behavior data for learning, representing image feature values, extracting spatiotemporal information from images, recognizing human posture, and recognizing gestures. Research on these technologies has recently been conducted using general deep learning network models, and excellent results from this research are included in this edition.

    Towards gestural understanding for intelligent robots

    Fritsch JN. Towards gestural understanding for intelligent robots. Bielefeld: Universität Bielefeld; 2012.
    A strong driving force of scientific progress in the technical sciences is the quest for systems that assist humans in their daily life and make their life easier and more enjoyable. Nowadays, smartphones are probably the most typical instances of such systems. Another class of systems that is getting increasing attention is intelligent robots. Instead of offering a smartphone touch screen to select actions, these systems are intended to offer a more natural human-machine interface to their users. Of the wide range of actions performed by humans, gestures performed with the hands play a very important role, especially when humans interact with their direct surroundings, e.g. pointing to an object or manipulating it. Consequently, a robot has to understand such gestures to offer an intuitive interface. Gestural understanding is, therefore, a key capability on the way to intelligent robots. This book deals with vision-based approaches for gestural understanding. Over the past two decades, this has been an intensive field of research, which has resulted in a variety of algorithms to analyze human hand motions. Following a categorization of different gesture types and a review of other sensing techniques, the design of vision systems that achieve hand gesture understanding for intelligent robots is analyzed. For each of the individual algorithmic steps (hand detection, hand tracking, and trajectory-based gesture recognition), a separate chapter introduces common techniques and algorithms and provides example methods. The resulting recognition algorithms consider gestures in isolation and are often not sufficient for interacting with a robot, which can only understand such gestures by incorporating context, e.g. which object was pointed at or manipulated.
Going beyond purely trajectory-based gesture recognition by incorporating context is an important prerequisite for gesture understanding and is addressed explicitly in a separate chapter of this book. Two types of context, user-provided context and situational context, are introduced, and existing approaches to incorporating context for gestural understanding are reviewed. Example approaches for both context types provide deeper algorithmic insight into this field of research. An overview of recent robots capable of gesture recognition and understanding summarizes the human-robot interaction quality realized so far. The approaches for gesture understanding covered in this book are manually designed, whereas humans learn to recognize gestures automatically while growing up. The last chapter, which completes this book, highlights promising research that analyzes developmental learning in children in order to mimic this capability in technical systems, as this research direction may be highly influential for creating future gesture understanding systems.
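
    As one hedged illustration of situational context, a pointing gesture can be resolved against known object positions by choosing the object that lies closest in angle to the pointing ray. The sketch below assumes the hand position and pointing direction are already provided by the detection and tracking stages; the function names, the toy scene, and the 15° threshold are hypothetical and not taken from the book.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def angle_between(u: Vec3, v: Vec3) -> float:
    """Angle in radians between two non-zero 3-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp against rounding error before calling acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def resolve_pointing_target(hand: Vec3, direction: Vec3,
                            objects: Sequence[Tuple[str, Vec3]],
                            max_angle_deg: float = 15.0) -> Optional[str]:
    """Return the object closest in angle to the pointing ray, or None.

    The object list plays the role of situational context: the hand
    trajectory alone cannot tell which object was pointed at.
    """
    best_name: Optional[str] = None
    best_angle = math.radians(max_angle_deg)
    for name, pos in objects:
        offset = (pos[0] - hand[0], pos[1] - hand[1], pos[2] - hand[2])
        if offset == (0.0, 0.0, 0.0):
            continue  # object coincides with the hand; angle is undefined
        angle = angle_between(direction, offset)
        if angle <= best_angle:
            best_name, best_angle = name, angle
    return best_name


scene = [("cup", (1.0, 0.1, 0.0)), ("book", (0.0, 1.0, 0.0))]
print(resolve_pointing_target((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), scene))  # cup
print(resolve_pointing_target((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), scene))  # None
```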

    Dynamic motion coupling of body movement for input control

    Touchless gestures are used for input when touch is unsuitable or unavailable, such as when interacting with displays that are remote, large, or public, or when touch is prohibited for hygienic reasons. Traditionally, user input is mapped to system output spatially or semantically; however, for touchless gestures these interaction principles suffer from several disadvantages, including limited memorability, fatigue, and ill-defined mappings. This thesis investigates motion correlation as the third interaction principle for touchless gestures, which maps user input to system output based on spatiotemporal matching of reproducible motion. We demonstrate the versatility of motion correlation by using movement as the primary sensing principle, relaxing the restrictions on how a user provides input. Using TraceMatch, a novel computer-vision-based system, we investigate input performance with different parts of the body to show how users can provide effective input, and how they can switch input modes spontaneously in realistic application scenarios. Secondly, spontaneous spatial coupling shows how motion correlation can bootstrap spatial input, allowing any body movement, or movement of tangible objects, to be appropriated for ad hoc touchless pointing on a per-interaction basis. We operationalise the concept in MatchPoint and demonstrate its unique capabilities through an exploration of the design space with application examples. Thirdly, we explore how users synchronise with moving targets in the context of motion correlation, revealing how simple harmonic motion leads to better synchronisation. Using the insights gained, we explore the robustness of algorithms used for motion correlation, showing how it is possible to successfully detect a user's intent to interact whilst suppressing accidental activations from common spatial and semantic gestures.
Finally, we look across our work to distil guidelines for interface design, along with further considerations of how motion correlation can be used, both in general and for touchless gestures.
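
    Motion correlation is commonly implemented by correlating the user's movement with each moving target's trajectory over a time window. The sketch below is a minimal version of that idea using per-axis Pearson correlation; the 0.8 threshold, the rule of taking the weaker of the two axis scores, and the function names are assumptions for illustration, not the TraceMatch or MatchPoint implementation.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either signal does not move."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def match_target(user_x: Sequence[float], user_y: Sequence[float],
                 targets: Dict[str, Tuple[Sequence[float], Sequence[float]]],
                 threshold: float = 0.8) -> Optional[str]:
    """Return the target whose motion best matches the user's, or None.

    Both axes must correlate, so the weaker per-axis score is used.
    """
    best, best_score = None, threshold
    for name, (tx, ty) in targets.items():
        score = min(pearson(user_x, tx), pearson(user_y, ty))
        if score >= best_score:
            best, best_score = name, score
    return best


# One-second window sampled at 30 Hz; the two targets orbit in anti-phase.
ts = [i / 30 for i in range(30)]
targets = {
    "A": ([math.cos(2 * math.pi * t) for t in ts],
          [math.sin(2 * math.pi * t) for t in ts]),
    "B": ([math.cos(2 * math.pi * t + math.pi) for t in ts],
          [math.sin(2 * math.pi * t + math.pi) for t in ts]),
}
# Small hand movement, offset and scaled, in sync with target A.
hand_x = [5.0 + 0.2 * x for x in targets["A"][0]]
hand_y = [3.0 + 0.2 * y for y in targets["A"][1]]
print(match_target(hand_x, hand_y, targets))          # A
print(match_target([5.0] * 30, [3.0] * 30, targets))  # None: no movement
```

    Because Pearson correlation ignores offset and positive scaling, a small movement of any body part can still select a large on-screen target, while a stationary user produces no match, which is one simple way to suppress accidental activations.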

    Human Action Recognition Using Deep Probabilistic Graphical Models

    Building intelligent systems that are capable of learning or extracting high-level representations from high-dimensional sensory data lies at the core of solving many AI-related tasks. Human action recognition is an important computer vision topic that deals with high-dimensional data. Its applications include robotics, video surveillance, human-computer interaction, user interface design, and multimedia video retrieval, amongst others. A number of approaches have been proposed to extract representative features from high-dimensional temporal data, most commonly hard-wired geometric or bio-inspired shape context features. This thesis first demonstrates some ad hoc hand-crafted rules for effectively encoding motion features, and then develops a more generic approach that incorporates structured feature learning and reasoning, i.e. deep probabilistic graphical models. The hierarchical dynamic framework first extracts high-level features and then uses the learned representation to estimate emission probabilities for inferring action sequences. We show that better action recognition can be achieved by replacing Gaussian mixture models with Deep Neural Networks that contain many layers of features to predict probability distributions over the states of Markov models. The framework can easily be extended to include an ergodic state so that actions are segmented and recognised simultaneously. The first part of the thesis focuses on the analysis and application of hand-crafted features for human action representation and classification. We show that the hard-coded concept of the correlogram can incorporate correlations between time-domain sequences, and we further investigate multi-modal inputs, e.g. depth sensor input and its unique traits for action recognition. The second part of this thesis focuses on combining probabilistic graphical models with Deep Neural Networks (both Deep Belief Networks and Deep 3D Convolutional Neural Networks) for structured sequence prediction.
The proposed Deep Dynamic Neural Network provides a general framework for structured 2D data representation and classification. This motivates further investigation into applying various graphical models to time-variant video sequences.
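
    A common way to use a network in place of Gaussian mixture emissions, in hybrid network/Markov-model systems, is to divide the per-frame state posteriors by the state priors to obtain scaled likelihoods and then decode with the Viterbi algorithm. The sketch below shows only that decoding step in log space. The posteriors are assumed to come from some trained network, and the function name and toy numbers are hypothetical rather than the thesis's Deep Dynamic Neural Network.

```python
import math
from typing import List, Sequence


def hybrid_viterbi(posteriors: Sequence[Sequence[float]],
                   state_priors: Sequence[float],
                   log_trans: Sequence[Sequence[float]],
                   log_init: Sequence[float]) -> List[int]:
    """Most likely state sequence when emissions come from a classifier.

    posteriors[t][s] approximates p(s | o_t) (assumed to come from a trained
    network). Dividing by the prior p(s) gives p(o_t | s) / p(o_t); the
    p(o_t) factor is shared by all states in a frame, so it does not change
    the best path.
    """
    n = len(state_priors)

    def emit(t: int, s: int) -> float:
        return math.log(posteriors[t][s]) - math.log(state_priors[s])

    delta = [log_init[s] + emit(0, s) for s in range(n)]
    backptr: List[List[int]] = []
    for t in range(1, len(posteriors)):
        new_delta, ptrs = [], []
        for s in range(n):
            # Best predecessor of state s at time t.
            prev = max(range(n), key=lambda r: delta[r] + log_trans[r][s])
            ptrs.append(prev)
            new_delta.append(delta[prev] + log_trans[prev][s] + emit(t, s))
        delta = new_delta
        backptr.append(ptrs)

    # Backtrack from the best final state.
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptrs in reversed(backptr):
        state = ptrs[state]
        path.append(state)
    return path[::-1]


log = math.log
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.2, 0.8], [0.1, 0.9]]
priors = [0.5, 0.5]
trans = [[log(0.8), log(0.2)], [log(0.2), log(0.8)]]
init = [log(0.5), log(0.5)]
print(hybrid_viterbi(posteriors, priors, trans, init))  # [0, 0, 1, 1, 1]
```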

    Advances in Human-Robot Interaction

    Rapid advances in the field of robotics have made it possible to use robots not just in industrial automation but also in entertainment, rehabilitation, and home service. Since robots will likely affect many aspects of human existence, fundamental questions of human-robot interaction must be formulated and, if at all possible, resolved. Some of these questions are addressed in this collection of papers by leading HRI researchers.

    Visual Representation Learning with Minimal Supervision

    Computer vision aims to give computers the human ability to understand and interpret their visual surroundings. An essential element of comprehending the environment is extracting relevant information from complex visual data so that the desired task can be solved. For instance, to distinguish cats from dogs, the feature 'body shape' is more relevant than 'eye color' or the 'number of legs'. In traditional computer vision, it is conventional to develop handcrafted functions that extract specific low-level features, such as edges, from visual data. However, solving a particular task satisfactorily requires a combination of several features. Thus, traditional computer vision has the disadvantage that whenever a new task is addressed, a developer needs to manually specify all the features the computer should look for. For that reason, recent work has primarily focused on developing new algorithms that teach the computer to autonomously detect relevant, task-specific features. Deep learning has been particularly successful in this respect: artificial neural networks automatically learn to extract informative features directly from visual data. Most deep learning strategies require a dataset with annotations that indicate the solution to the desired task. The main bottleneck is that creating such a dataset is very tedious and time-consuming, since every sample needs to be annotated manually. This thesis presents new techniques that attempt to keep the amount of human supervision to a minimum while still reaching satisfactory performance on various visual understanding tasks. In particular, this thesis focuses on self-supervised learning algorithms that train a neural network on a surrogate task where no human supervision is required. We create an artificial supervisory signal by breaking the order of visual patterns and asking the network to recover the original structure.
Besides demonstrating the abilities of our model on common computer vision tasks such as action recognition, we additionally apply our model to biomedical scenarios. Many research projects in medicine involve extensive manual processes that lengthen the time needed to develop successful treatments. Taking the example of analyzing the motor function of neurologically impaired patients, we show that our self-supervised method can help to automate tedious, visually based processes in medical research. In order to perform a detailed analysis of motor behavior and thus provide a suitable treatment, it is important to discover and identify the negatively affected movements. Therefore, we propose a magnification tool that can detect and enhance subtle changes in motor function, including motor behavior differences across individuals. In this way, our automatic diagnostic system not only analyzes apparent behavior but also facilitates the perception and discovery of impaired movements. Learning a feature representation without requiring annotations significantly reduces human supervision. However, using annotated datasets generally leads to better performance than self-supervised learning methods. Hence, we additionally examine semi-supervised approaches that efficiently combine a few annotated samples with large unlabeled datasets. Consequently, semi-supervised learning represents a good trade-off between annotation time and accuracy.
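
    The data side of such a surrogate task can be sketched without any network: shuffle an ordered set of patches or frames with one of a fixed set of permutations and use the permutation index as a free label. This is a hedged sketch of a generic permutation-recovery pretext task, not the thesis's exact formulation; the function names, the number of permutations, and the string placeholders standing in for frames are hypothetical, and the network that predicts the label is omitted.

```python
import itertools
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Perm = Tuple[int, ...]


def make_permutation_set(n_items: int, n_perms: int, seed: int = 0) -> List[Perm]:
    """Choose a fixed subset of permutations; each index is one class label.

    Enumerating all permutations is only practical for small n_items.
    """
    all_perms = list(itertools.permutations(range(n_items)))
    return random.Random(seed).sample(all_perms, n_perms)


def make_surrogate_sample(items: Sequence[T], perms: Sequence[Perm],
                          rng: random.Random) -> Tuple[List[T], int]:
    """Shuffle items with a randomly chosen permutation; return (input, label)."""
    label = rng.randrange(len(perms))
    return [items[i] for i in perms[label]], label


def restore_order(shuffled: Sequence[T], perm: Perm) -> List[T]:
    """Undo the shuffle: what predicting the label implicitly recovers."""
    original: List[T] = [None] * len(shuffled)  # type: ignore[list-item]
    for position, source_index in enumerate(perm):
        original[source_index] = shuffled[position]
    return original


frames = ["f0", "f1", "f2", "f3"]  # stand-ins for image patches or video frames
perms = make_permutation_set(len(frames), n_perms=10)
shuffled, label = make_surrogate_sample(frames, perms, random.Random(1))
assert restore_order(shuffled, perms[label]) == frames
```

    Using a fixed subset of permutations turns order recovery into an ordinary classification problem, so the same training code as for a supervised classifier can be reused.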