522 research outputs found

    An original framework for understanding human actions and body language by using deep neural networks

    The evolution of both Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, which people often use to communicate information non-verbally. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. The processing of body movements, meanwhile, plays a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements; both are essential tasks in many computer vision applications, including event recognition and video surveillance. In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first, a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) is proposed for the recognition of sign language and semaphoric hand gestures; the second module presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, the last module provides a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs). The performance of LSTM-RNNs is explored in depth, due to their ability to model the long-term contextual information of temporal sequences, which makes them suitable for analysing body movements. All the modules were tested on challenging datasets that are well known in the state of the art, showing remarkable results compared to current literature methods.
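
    As a rough illustration of the second module's architecture, the PyTorch sketch below shows a two-branch stacked LSTM over 2D skeleton sequences, with one branch over joint positions and one over frame-to-frame motion. The layer sizes, the position/velocity split, and the late-fusion classifier are assumptions made for illustration, not the thesis's actual configuration.

```python
import torch
import torch.nn as nn

class TwoBranchStackedLSTM(nn.Module):
    """Illustrative two-branch stacked LSTM over 2D skeleton sequences.

    One branch models raw joint positions, the other their frame-to-frame
    motion (velocities); the final hidden states are fused before
    classification. All sizes are placeholder assumptions.
    """
    def __init__(self, num_joints=18, hidden=128, num_classes=60):
        super().__init__()
        in_dim = num_joints * 2  # (x, y) per joint
        self.pos_branch = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.vel_branch = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, skel):              # skel: (batch, time, joints*2)
        vel = skel[:, 1:] - skel[:, :-1]  # first-order motion stream
        _, (h_pos, _) = self.pos_branch(skel)
        _, (h_vel, _) = self.vel_branch(vel)
        fused = torch.cat([h_pos[-1], h_vel[-1]], dim=1)  # late fusion
        return self.classifier(fused)

model = TwoBranchStackedLSTM()
logits = model(torch.randn(4, 32, 36))    # 4 clips, 32 frames, 18 joints
```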

    Sentient Matter: Towards Affective Human-Architecture Interaction

    Interactive design has been embedded into every aspect of our lives. Ranging from handheld devices to architecturally scaled environments, these designs have not only shifted the way we facilitate interaction with other people, but they also actively reconfigure themselves in response to human stimuli. Following in the wake of interactive experimentation, sentient matter, the idea that matter embodies the capacity to perceive and respond to stimuli, attempts to engage a challenging arena that few architects and architectural researchers have ventured into: the creation and simulation of emotive types of interaction between the architectural environment and its inhabitants. This ambition is made possible by the collaboration of multiple disciplines. Cybernetics, specifically the legacy of Pask's conversation theory, inspires this thesis with the question of why emotion is needed to facilitate human-architecture communication; emotion appraisal theory (P. Desmet) within psychology supports the feasibility of an architectural environment eliciting emotional changes in its participants, as well as the possibility of generating a next-step response by observing the participants' emotive behaviors; and movement notation systems, especially Laban Movement Analysis (a movement rating scale system), help us understand how emotions can be identified by the motion elements that signify emotive behavior. Through the process of decomposing movement into several qualitative and quantitative factors such as velocity, openness, and smoothness, emotions embodied in motion can be detected and even manipulated by altering those movement factors. Moreover, with the employment of a Kinect sensor, live performance can be analyzed in real time. Based on the above research, and inspired by the kinetic sculptures of Margolin, the final product of this thesis is a prototype that translates human movements expressive of emotion into continuous surface transformations, thus making evident how such emotive states might be transcoded into an architectural form. In this process, four typical emotive architectural expressions (joy, anger, excitement, and sadness) are researched. This thesis also documents three virtual scenarios in order to examine the effect of this interactive system. Different contexts, kinetic types, and behavioral strategies are presented so that we may explore their potential applications. Sentient matter outlines a framework of syntheses, built upon the convergence of embedded computation (intelligence) and its physical counterpart (kinetics). Throughout this process, it treats people's participation as material that fuels the generation of legible emotional behaviors within an architectural environment. Consequently, there is potential for an architectural learning capacity coupled with an evolving data library of human behavioral knowledge. This opens doors for futuristic designs where the paradigm shifts from "What is that building?" to "What is that building doing?"
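
    To make the movement decomposition concrete, here is a small NumPy sketch that derives per-sequence velocity, openness, and smoothness scores from a Kinect joint track. The specific formulas (mean joint speed, mean hand-to-torso distance, negative mean jerk) and the joint indices are illustrative assumptions, not the thesis's definitions.

```python
import numpy as np

def movement_factors(joints, fps=30):
    """Illustrative Laban-style factors from a (T, J, 3) joint track.

    joints: T frames of J Kinect joints in metres. The indices below
    (0 = spine base, 7 = left hand, 11 = right hand) assume the
    Kinect v2 skeleton layout.
    """
    dt = 1.0 / fps
    vel = np.diff(joints, axis=0) / dt                 # (T-1, J, 3)
    speed = np.linalg.norm(vel, axis=2).mean()         # mean joint speed

    torso, l_hand, r_hand = joints[:, 0], joints[:, 7], joints[:, 11]
    openness = 0.5 * (np.linalg.norm(l_hand - torso, axis=1)
                      + np.linalg.norm(r_hand - torso, axis=1)).mean()

    jerk = np.diff(joints, n=3, axis=0) / dt**3        # third derivative
    smoothness = -np.linalg.norm(jerk, axis=2).mean()  # higher = smoother

    return {"velocity": speed, "openness": openness, "smoothness": smoothness}
```

    Streaming the same computation over a short sliding window would give the real-time behaviour the thesis describes for live performance.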

    Continuous sign recognition of Brazilian Sign Language in a healthcare setting

    Communication is the basis of human society. The majority of people communicate using spoken language in oral or written form. However, sign language is the primary mode of communication for deaf people. In general, understanding spoken information is a major challenge for the deaf and hard of hearing, and access to basic information and essential services is correspondingly difficult for these individuals. For example, without translation support, carrying out simple tasks in a healthcare center, such as asking for guidance or consulting with a doctor, can be hopelessly difficult. Computer-based sign language recognition technologies offer an alternative to mitigate the communication barrier faced by the deaf and hard of hearing. Despite much effort, research in this field is still in its infancy, and automatic recognition of continuous signing remains a major challenge. This paper presents an ongoing research project designed to recognize continuous signing of Brazilian Sign Language (Libras) in healthcare settings. Health emergency situations and dialogues inspire the vocabulary of the signs and sentences we use to contribute to the field.

    Optical Methods in Sensing and Imaging for Medical and Biological Applications

    The recent advances in optical sources and detectors have opened up new opportunities for sensing and imaging techniques that can be successfully used in biomedical and healthcare applications. This book, entitled ‘Optical Methods in Sensing and Imaging for Medical and Biological Applications’, focuses on various aspects of the research and development related to these areas. The book will be a valuable source of information for anyone interested in this subject, presenting the recent advances in optical methods and novel techniques, as well as their applications in the fields of biomedicine and healthcare.

    The Dollar General: Continuous Custom Gesture Recognition Techniques At Everyday Low Prices

    Humans use gestures to emphasize ideas and disseminate information. Their importance is apparent in how we continuously augment social interactions with motion, gesticulating in harmony with nearly every utterance to ensure observers understand what we wish to communicate, and their relevance has not escaped the HCI community's attention. For almost as long as computers have been able to sample human motion at the user interface boundary, software systems have been made to understand gestures as command metaphors. Customization, in particular, has great potential to improve user experience, whereby users map specific gestures to specific software functions. However, custom gesture recognition remains a challenging problem, especially when training data is limited, input is continuous, and designers who wish to use customization in their software are limited by mathematical attainment, machine learning experience, domain knowledge, or a combination thereof. Data collection, filtering, segmentation, pattern matching, synthesis, and rejection analysis are all non-trivial problems a gesture recognition system must solve. To address these issues, we introduce The Dollar General (TDG), a complete pipeline composed of several novel continuous custom gesture recognition techniques. Specifically, TDG comprises an automatic low-pass filter tuner that we use to improve signal quality, a segmenter for identifying gesture candidates in a continuous input stream, a classifier for discriminating gesture candidates from non-gesture motions, and a synthetic data generation module we use to train the classifier. Our system achieves high recognition accuracy with as little as one or two training samples per gesture class, is largely input-device agnostic, and does not require advanced mathematical knowledge to understand and implement. In this dissertation, we motivate the importance of gestures and customization, describe each pipeline component in detail, and introduce strategies for data collection and prototype selection.
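
    A compact sketch of what such a pipeline can look like is given below: an exponentially weighted low-pass filter and a dynamic-time-warping nearest-template classifier with a rejection threshold, standing in for TDG's filter tuner, segmenter, and classifier. The smoothing constant, the DTW matcher, and the threshold are generic assumptions, not TDG's actual components (TDG, for instance, tunes its filter automatically and trains on synthetic data).

```python
import numpy as np

def low_pass(points, alpha=0.3):
    """Exponential smoothing; TDG would tune this constant automatically."""
    out = [np.asarray(points[0], dtype=float)]
    for p in points[1:]:
        out.append(alpha * np.asarray(p, dtype=float) + (1 - alpha) * out[-1])
    return np.asarray(out)

def dtw_distance(a, b):
    """Plain O(len(a) * len(b)) dynamic time warping between trajectories."""
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[-1, -1]

def classify(window, templates, reject_at=2.0):
    """Return the best-matching gesture label, or None for non-gesture motion.

    templates maps labels to (T, D) trajectories built from the one or two
    recorded samples per class; reject_at is an illustrative threshold.
    """
    label, best = None, np.inf
    for name, tmpl in templates.items():
        d = dtw_distance(window, tmpl) / len(window)  # length-normalised
        if d < best:
            label, best = name, d
    return label if best < reject_at else None
```

    In a continuous setting, sliding windows over the filtered stream would be fed to classify, with the rejection branch discarding the non-gesture motions between candidates.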

    Multisensory learning in adaptive interactive systems

    The main purpose of my work is to investigate multisensory perceptual learning and sensory integration in the design and development of adaptive user interfaces for educational purposes. To this aim, starting from a renewed understanding, drawn from neuroscience and cognitive science, of multisensory perceptual learning and sensory integration, I developed a theoretical computational model for designing multimodal learning technologies that takes these results into account. The main theoretical foundations of my research are multisensory perceptual learning theories and research on sensory processing and integration, embodied cognition theories, computational models of non-verbal and emotion communication in full-body movement, and human-computer interaction models. Finally, the computational model was applied in two case studies, based on the two EU ICT-H2020 projects "weDRAW" and "TELMI", on which I worked during my PhD.

    Computational Multimedia for Video Self Modeling

    Video self modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of oneself. This is the idea behind the psychological theory of self-efficacy: you can learn or model to perform certain tasks because you see yourself doing it, which provides the most ideal form of behavior modeling. The effectiveness of VSM has been demonstrated for many different types of disabilities and behavioral problems, ranging from stuttering, inappropriate social behaviors, autism, and selective mutism to sports training. However, there is an inherent difficulty associated with the production of VSM material. Prolonged and persistent video recording is required to capture the rare, if not nonexistent, snippets that can be strung together to form novel video sequences of the target skill. To solve this problem, in this dissertation, we use computational multimedia techniques to facilitate the creation of synthetic visual content for self-modeling that can be used by a learner and his/her therapist with a minimum amount of training data. There are three major technical contributions in my research. First, I developed an Adaptive Video Re-sampling algorithm to synthesize realistic lip-synchronized video with minimal motion jitter. Second, to denoise and complete the depth maps captured by structured-light sensing systems, I introduced a layer-based probabilistic model to account for various types of uncertainty in the depth measurement. Third, I developed a simple and robust bundle-adjustment-based framework for calibrating a network of multiple wide-baseline RGB and depth cameras.
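
    As background for the third contribution, the toy sketch below sets up the core of a bundle-adjustment calibration: a reprojection residual over per-camera 6-DoF poses, minimized with SciPy's least_squares. The known intrinsics, the two-camera synthetic observation set, and the rotation-vector parameterization are assumptions made for illustration; the dissertation's framework additionally handles wide-baseline depth cameras.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(params, cams, obs):
    """Reprojection residuals for a small camera network.

    params packs a 6-DoF pose (rotation vector + translation) per camera;
    obs is a list of (camera_index, observed_xy, world_point) tuples.
    Intrinsic matrices in cams are assumed known.
    """
    errs = []
    for cam_idx, xy, X in obs:
        rvec = params[cam_idx * 6 : cam_idx * 6 + 3]
        t = params[cam_idx * 6 + 3 : cam_idx * 6 + 6]
        Xc = Rotation.from_rotvec(rvec).apply(X) + t   # world -> camera
        uv = cams[cam_idx] @ (Xc / Xc[2])              # pinhole projection
        errs.extend(uv[:2] - xy)
    return np.asarray(errs)

# Toy example: two cameras with identical intrinsics observe four points.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
pts = [np.array([x, y, 2.0]) for x in (-0.2, 0.2) for y in (-0.2, 0.2)]
obs = [(c, (K @ (p / p[2]))[:2] + 0.5 * np.random.randn(2), p)
       for c in (0, 1) for p in pts]
fit = least_squares(residuals, np.zeros(12), args=([K, K], obs))
```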

    Data analytics for image visual complexity and Kinect-based videos of rehabilitation exercises

    With the recent advances in computer vision and pattern recognition, methods from these fields are successfully applied to solve problems in various domains, including health care and the social sciences. In this thesis, two such problems, from different domains, are discussed. First, an application of computer vision and broader pattern recognition in physical therapy is presented. Home-based physical therapy is an essential part of the recovery process, in which the patient is prescribed specific exercises in order to improve symptoms and the daily functioning of the body. However, poor adherence to the prescribed exercises is a common problem. In our work, we explore methods for improving the home-based physical therapy experience. We begin by proposing DyAd, a dynamic difficulty adjustment system that captures the trajectory of the hand movement, evaluates the user's performance quantitatively, and adjusts the difficulty level for the next trial of the exercise based on the performance measurements. Next, we introduce ExerciseCheck, a remote monitoring and evaluation platform for home-based physical therapy. ExerciseCheck is capable of capturing exercise information, evaluating performance, providing therapeutic feedback to the patient and the therapist, checking the progress of the user over the course of the physical therapy, and supporting the patient throughout this period. In our experiments, patients with Parkinson's disease tested our system at a clinic and in their homes during their physical therapy period. Our results suggest that ExerciseCheck is a user-friendly application that can assist patients by providing motivation and guidance to ensure correct execution of the required exercises. As the second application, within the computer vision paradigm, we focus on visual complexity, an image attribute that humans can subjectively evaluate based on the level of detail in the image. Visual complexity has been studied in psychophysics, cognitive science, and, more recently, computer vision, for the purposes of product design, web design, advertising, etc. We first introduce a diverse visual complexity dataset that comprises seven image categories. We collect ground-truth scores by comparing pairwise relationships of images and then convert the pairwise scores to absolute scores using mathematical methods. Furthermore, we propose a method to measure visual complexity that uses unsupervised information extraction from intermediate convolutional layers of deep neural networks. We derive an activation energy metric that combines convolutional layer activations to quantify visual complexity. The high correlations between ground-truth labels and computed energy scores in our experiments show the superiority of our method compared to previous work. Finally, as an example of the relationship between visual complexity and other image attributes, we demonstrate that, within the context of a category, visually more complex images are more memorable to human observers.
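
    A minimal sketch of an activation-energy-style score is shown below, assuming a pretrained VGG-16 from a recent torchvision as the feature extractor. The choice of layers and the mean-absolute-activation energy, combined by simple averaging, are illustrative assumptions, not the thesis's exact metric.

```python
import torch
from torchvision.models import vgg16, VGG16_Weights

def activation_energy(image, layer_ids=(3, 8, 15, 22)):
    """Unsupervised complexity score from intermediate conv activations.

    image: (3, H, W) tensor, normalised with the ImageNet statistics the
    pretrained weights expect. layer_ids index selected ReLU outputs in
    vgg16().features; both the layer choice and the mean-|activation|
    energy are assumptions for illustration.
    """
    model = vgg16(weights=VGG16_Weights.DEFAULT).features.eval()
    energies, x = [], image.unsqueeze(0)
    with torch.no_grad():
        for i, layer in enumerate(model):
            x = layer(x)
            if i in layer_ids:
                energies.append(x.abs().mean().item())  # per-layer energy
    return sum(energies) / len(energies)                # combined score
```

    Correlating such scores against the pairwise-derived ground truth would mirror the evaluation the abstract describes.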

    Enhanced Living Environments

    This open access book was prepared as the final publication of the COST Action IC1303 “Algorithms, Architectures and Platforms for Enhanced Living Environments (AAPELE)”. The concept of Enhanced Living Environments (ELE) refers to the area of Ambient Assisted Living (AAL) that is more closely related to Information and Communication Technologies (ICT). Effective ELE solutions require appropriate ICT algorithms, architectures, platforms, and systems, in view of the advance of science and technology in this area and the development of new and innovative solutions that can improve the quality of life of people in their homes and reduce the financial burden on the budgets of healthcare providers. The aim of this book is to serve as a state-of-the-art reference, discussing progress made, as well as prompting future directions on theories, practices, standards, and strategies related to the ELE area. The book contains 12 chapters and can serve as a valuable reference for undergraduate students, post-graduate students, educators, faculty members, researchers, engineers, medical doctors, healthcare organizations, insurance companies, and research strategists working in this area.