
    Human behavior understanding for worker-centered intelligent manufacturing

    “In a worker-centered intelligent manufacturing system, sensing and understanding the worker’s behavior are the primary tasks, which are essential for automatic performance evaluation and optimization, intelligent training and assistance, and human-robot collaboration. In this study, a worker-centered training and assistance system featuring self-awareness and active guidance is proposed for intelligent manufacturing. To understand hand behavior, a method is proposed for complex hand gesture recognition from depth images captured by Microsoft Kinect, using Convolutional Neural Networks (CNN) with multiview augmentation and inference fusion. To sense and understand the worker more comprehensively, a multi-modal approach is proposed for worker activity recognition using Inertial Measurement Unit (IMU) signals obtained from a Myo armband and videos from a visual camera. To automatically learn the importance of different sensors, a novel attention-based approach is proposed for human activity recognition using multiple IMU sensors worn at different body locations. To deploy the developed algorithms on the factory floor, a real-time assembly operation recognition system is proposed that uses fog computing and transfer learning. The proposed worker-centered training and assistance system has been validated, demonstrating its feasibility and strong potential for frontline workers in the manufacturing industry.
Our approaches have been evaluated as follows: 1) the multi-view approach outperforms the state of the art on two public benchmark datasets; 2) the multi-modal approach achieves an accuracy of 97% on a worker activity dataset with 6 activities and achieves the best performance on a public dataset; 3) the attention-based method outperforms state-of-the-art methods on five publicly available datasets; and 4) the transfer learning model achieves a real-time recognition accuracy of 95% on a dataset of 10 worker operations.”--Abstract, page iv
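The abstract does not say how predictions for the multiview-augmented depth images are fused at inference time. A minimal sketch, assuming the common choice of averaging per-view class probabilities (the number of views, the classes, and the score values are hypothetical), could look like this:

```python
from typing import List, Sequence


def fuse_views(view_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors predicted for each augmented view.

    Assumption: each inner sequence is one view's softmax output over the
    same ordered set of gesture classes.
    """
    if not view_probs:
        raise ValueError("need at least one view")
    n_classes = len(view_probs[0])
    if any(len(p) != n_classes for p in view_probs):
        raise ValueError("all views must score the same classes")
    n_views = len(view_probs)
    return [sum(p[c] for p in view_probs) / n_views for c in range(n_classes)]


def predict_gesture(view_probs: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = fuse_views(view_probs)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical CNN outputs for three augmented views of one depth frame.
views = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
]
print(predict_gesture(views))  # 0
```

Averaging is only one option; max-pooling or a learned weighting over views would fit inside `fuse_views` without changing the caller.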

    Employing Environmental Data and Machine Learning to Improve Mobile Health Receptivity

    Behavioral intervention strategies can be enhanced by recognizing human activities with eHealth technologies. A thorough literature review indicates that activity spotting and related insights may be used to detect daily routines and thereby infer receptivity to mobile notifications, similar to just-in-time support. To this end, this work develops a machine learning model to analyze the motivation of digital mental health users who answer self-assessment questions in their everyday lives through an intelligent mobile application. A uniform and extensible sequence prediction model combining environmental data with everyday activities was created and validated as a proof of concept through an experiment. We find that the reported receptivity is not sequentially predictable on its own: the mean error and standard deviation are only slightly below those of a chance baseline. Nevertheless, predicting the upcoming activity covers about 39% of the day (up to 58% in the best case) and can be linked to individual users' intervention preferences to indirectly find an opportune moment of receptivity. We therefore introduce an application that incorporates the influence of sensor data on activities and intervention thresholds and also allows users to specify preferred events on a weekly basis. Combining these approaches opens promising avenues for innovative behavioral assessments. Identifying and segmenting the appropriate set of activities is key. Consequently, deliberate and thoughtful design lays the foundation for further development within research projects, for example by extending the activity weighting process or introducing model reinforcement.
    Funding: BMBF, 13GW0157A, Joint project: Self-administered Psycho-TherApy-SystemS (SELFPASS) - Subproject: Data Analytics and Prescription for SELFPASS. TU Berlin, open-access funds - 201
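The abstract does not detail its sequence prediction model. As a hedged illustration of the underlying idea of predicting the upcoming activity from a daily routine, here is a first-order Markov (transition-count) predictor. It is a deliberately simple stand-in, not the authors' model, and the activity labels are hypothetical:

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Optional


class NextActivityPredictor:
    """First-order Markov stand-in: predict the most frequent successor."""

    def __init__(self) -> None:
        # transitions[a][b] counts how often activity b directly followed a.
        self.transitions: DefaultDict[str, Counter] = defaultdict(Counter)

    def fit(self, days: Iterable[Iterable[str]]) -> "NextActivityPredictor":
        for day in days:
            seq = list(day)
            for prev, nxt in zip(seq, seq[1:]):
                self.transitions[prev][nxt] += 1
        return self

    def predict(self, current: str) -> Optional[str]:
        """Most frequent successor of `current`, or None if never observed."""
        counts = self.transitions.get(current)  # .get avoids creating entries
        if not counts:
            return None
        return counts.most_common(1)[0][0]


# Hypothetical logged routines, one list of activity labels per day.
days = [
    ["sleep", "breakfast", "commute", "work"],
    ["sleep", "breakfast", "work"],
    ["sleep", "breakfast", "commute", "work"],
]
model = NextActivityPredictor().fit(days)
print(model.predict("breakfast"))  # commute
```

A predicted activity could then be matched against a user's intervention preferences to choose a notification moment; a richer model would also condition on the environmental data the abstract mentions.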

    InMyFace: Inertial and Mechanomyography-Based Sensor Fusion for Wearable Facial Activity Recognition

    Recognizing facial activity is a well-understood (but non-trivial) computer vision problem. However, reliable solutions require a camera with a good view of the face, which is often unavailable in wearable settings. Furthermore, in wearable applications, where systems accompany users throughout their daily activities, a permanently running camera can be problematic for privacy (and legal) reasons. This work presents an alternative solution based on the fusion of wearable inertial sensors, planar pressure sensors, and acoustic mechanomyography (muscle sounds). The sensors were placed unobtrusively in a sports cap to monitor facial muscle activities related to facial expressions. We present our integrated wearable sensor system, describe data fusion and analysis methods, and evaluate the system in an experiment with thirteen subjects from different cultural backgrounds (eight countries) and both sexes (six women and seven men). In a one-model-per-user scheme with late fusion, the system yielded an average F1 score of 85.00% when all sensing modalities were combined. With cross-user validation and a one-model-for-all-users scheme, an F1 score of 79.00% was obtained for all thirteen participants. Moreover, with a hybrid fusion (cross-user) approach and six classes, an average F1 score of 82.00% was obtained for eight users. The results are competitive with state-of-the-art non-camera-based solutions for a cross-user study. In addition, our unique set of participants demonstrates the inclusiveness and generalizability of the approach.
    Comment: Submitted to Information Fusion, Elsevier
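Late fusion combines the outputs of per-modality classifiers rather than their raw signals. A minimal sketch, assuming each modality's classifier emits class probabilities and that fusion is an optionally weighted average, is shown below together with a macro-averaged F1 score. The modality keys and numbers are hypothetical, and the paper's "average F1" may be averaged differently (for example, across users):

```python
from typing import Dict, List, Optional, Sequence


def late_fusion(modality_probs: Dict[str, Sequence[float]],
                weights: Optional[Dict[str, float]] = None) -> int:
    """Weighted average of per-modality class probabilities; returns the argmax class."""
    names = list(modality_probs)
    n_classes = len(modality_probs[names[0]])
    w = weights if weights is not None else {m: 1.0 for m in names}
    total = sum(w[m] for m in names)
    fused: List[float] = [
        sum(w[m] * modality_probs[m][c] for m in names) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=fused.__getitem__)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical per-modality outputs for one window over three expression classes.
window = {
    "imu": [0.5, 0.3, 0.2],
    "pressure": [0.2, 0.6, 0.2],
    "mmg": [0.3, 0.5, 0.2],
}
print(late_fusion(window))  # 1
print(round(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]), 4))  # 0.7333
```

Because fusion happens on class scores, a modality can be dropped or re-weighted without retraining the others, which is one reason late fusion suits a one-model-per-user setup.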

    Vision Based Activity Recognition Using Machine Learning and Deep Learning Architecture

    Human activity recognition (HAR), with wide application in fields such as video surveillance, sports, human interaction, and elderly care, has had a great influence on raising people's standard of living. With the constant development of new architectures and models and the increasing computational capability of computing systems, the adoption of machine learning and deep learning for activity recognition has yielded substantial performance improvements in recent years. My research goal in this thesis is to design and compare machine learning and deep learning models for activity recognition from videos collected from different media in the field of sports. HAR aims to automatically recognize the actions a human performs from data collected from different sources. Based on the literature review, most data used for analysis is either time-series data collected through different sensors or video data collected through cameras. First, our research analyzes and compares different machine learning and deep learning architectures on sensor-based data collected from the accelerometer of a smartphone placed at different positions on the human body. Without any hand-crafted feature extraction, we found that deep learning architectures outperform most machine learning architectures, and that using multiple sensors yields higher accuracy than using a single sensor. Second, because collecting real-time sensor data is not feasible in every field, such as sports, we study activity recognition using video datasets. For this, we applied transfer learning to two state-of-the-art deep learning architectures pretrained on large annotated datasets, performing activity recognition on three publicly available sports-related datasets.
Extending the study to different activities performed within a single sport, and to avoid the current trend of using special cameras and expensive setups around the court for data collection, we developed our own video dataset from broadcast coverage of basketball games. Detailed analyses and experiments based on criteria such as the range of shots taken and scoring activities are presented for 8 different activities, using a state-of-the-art deep learning architecture for video classification.

    SALSA: A Novel Dataset for Multimodal Group Behavior Analysis

    Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is rewarding due to the wealth of information available at the group level (mining social networks) and the individual level (recognizing native behavioral and personality traits). However, analyzing social scenes involving FCGs is also highly challenging, because crowdedness and extreme occlusions make it difficult to extract behavioral cues such as target locations, speaking activity, and head/body pose. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural indoor environment for over 60 minutes, under poster presentation and cocktail party contexts that present difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations, and interfering sound sources; (2) to alleviate these problems, we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising a microphone, an accelerometer, and Bluetooth and infrared sensors. In addition to raw data, we also provide annotations of individuals' personality as well as their position, head and body orientation, and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa.
    Comment: 14 pages, 11 figures

    Activities recognition and worker profiling in the intelligent office environment using a fuzzy finite state machine

    Analysis of office workers’ daily working activities in an intelligent office environment can be used to optimize energy consumption as well as office workers’ comfort. To achieve this, it is essential to recognise office workers’ activities, including short breaks, meetings and non-computer activities, so that an optimal control strategy can be implemented. In this paper, fuzzy finite state machines are used to model an office worker’s behaviour. The model takes sensory data collected from the environment as input and is developed from a set of predefined fuzzy states. Experimental results are presented to illustrate the effectiveness of this approach: the activity models of different individual workers, as inferred from the sensory devices, can be distinguished. However, further investigation is required to create a more complete model.
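The abstract does not give the paper's states, membership functions, or transition rules. The sketch below shows one common way a fuzzy finite state machine can be run: each state holds a membership degree, and the next degrees are computed by max-min composition over transition possibilities and rule firing strengths. The three states, the sensor inputs (desk occupancy in [0, 1] and keystrokes per minute), and all numeric values are hypothetical:

```python
from typing import Dict

States = Dict[str, float]

STATES = ("absent", "computer", "non_computer")

# Hypothetical transition possibilities TRANSITIONS[from][to] in [0, 1].
TRANSITIONS = {
    "absent": {"absent": 1.0, "computer": 0.8, "non_computer": 0.8},
    "computer": {"absent": 0.7, "computer": 1.0, "non_computer": 0.9},
    "non_computer": {"absent": 0.7, "computer": 0.9, "non_computer": 1.0},
}


def ramp(x: float, lo: float, hi: float) -> float:
    """Piecewise-linear membership rising from 0 at `lo` to 1 at `hi`."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def rule_strengths(occupancy: float, keys_per_min: float) -> States:
    """Fuzzy rules with AND = min and NOT = 1 - x; thresholds are hypothetical."""
    typing_level = ramp(keys_per_min, 5.0, 30.0)
    return {
        "absent": 1.0 - occupancy,
        "computer": min(occupancy, typing_level),
        "non_computer": min(occupancy, 1.0 - typing_level),
    }


def step(mu: States, occupancy: float, keys_per_min: float) -> States:
    """Max-min composition: mu'(j) = max_i min(mu(i), T(i, j), r(j))."""
    r = rule_strengths(occupancy, keys_per_min)
    return {
        j: max(min(mu[i], TRANSITIONS[i][j], r[j]) for i in STATES)
        for j in STATES
    }


def most_likely(mu: States) -> str:
    return max(mu, key=mu.get)


mu = {"absent": 1.0, "computer": 0.0, "non_computer": 0.0}
mu = step(mu, occupancy=0.9, keys_per_min=40)  # worker sits down and types
print(most_likely(mu))  # computer
mu = step(mu, occupancy=0.9, keys_per_min=0)   # still at the desk, not typing
print(most_likely(mu))  # non_computer
```

Unlike a crisp state machine, several states can hold non-zero degrees at once, so ambiguous periods are represented rather than forced into a single label.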

    Unified Contrastive Fusion Transformer for Multimodal Human Action Recognition

    Various types of sensors have been considered for developing human action recognition (HAR) models. Robust HAR performance can be achieved by fusing multimodal data acquired by different sensors. In this paper, we introduce a new multimodal fusion architecture, referred to as Unified Contrastive Fusion Transformer (UCFFormer), designed to integrate data with diverse distributions to enhance HAR performance. Based on the embedding features extracted from each modality, UCFFormer employs the Unified Transformer to capture the inter-dependency among embeddings in both the time and modality domains. We present the Factorized Time-Modality Attention to perform self-attention efficiently for the Unified Transformer. UCFFormer also incorporates contrastive learning to reduce the discrepancy in feature distributions across modalities, thus generating semantically aligned features for information fusion. Performance evaluation on two popular datasets, UTD-MHAD and NTU RGB+D, demonstrates that UCFFormer achieves state-of-the-art performance, outperforming competing methods by considerable margins.
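The abstract says UCFFormer uses contrastive learning to align feature distributions across modalities but does not give the loss. A minimal sketch, assuming a symmetric InfoNCE-style loss over cosine similarities between time-aligned embeddings from two modalities (the temperature, the modality names, and the toy embeddings are hypothetical), shows the idea: matching pairs are pulled together while the other pairs in the batch act as negatives:

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def _cross_entropy_diagonal(logits: List[List[float]]) -> float:
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(z_a: Sequence[Vector], z_b: Sequence[Vector],
             temperature: float = 0.1) -> float:
    """Symmetric InfoNCE: sample i in modality A should match sample i in modality B."""
    n = len(z_a)
    if n == 0 or n != len(z_b):
        raise ValueError("need equally sized, non-empty batches")
    logits = [[cosine(z_a[i], z_b[j]) / temperature for j in range(n)]
              for i in range(n)]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diagonal(logits) + _cross_entropy_diagonal(transposed))


# Hypothetical embeddings for a batch of two samples from two modalities.
imu = [[1.0, 0.0], [0.0, 1.0]]
skeleton_aligned = [[0.9, 0.1], [0.1, 0.9]]
skeleton_shuffled = [[0.1, 0.9], [0.9, 0.1]]
print(info_nce(imu, skeleton_aligned) < info_nce(imu, skeleton_shuffled))  # True
```

Minimizing this loss during training pushes each modality's encoder toward a shared embedding space, which is the kind of alignment the abstract describes before fusion.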

    Heterogeneous multi-modal sensor fusion with hybrid attention for exercise recognition

    Exercise adherence is a key component of digital behaviour change interventions for the self-management of musculoskeletal pain. Automated monitoring of exercise adherence requires sensors that can capture patients performing exercises and Machine Learning (ML) algorithms that can recognise exercises. In contrast to ambulatory activities, which are recognisable from wrist accelerometer data, exercises require multiple sensor modalities because of the complexity of the movements and the settings involved. Exercise Recognition (ExR) poses many challenges to ML researchers due to the heterogeneity of the sensor modalities (e.g. image/video streams, wearables, pressure mats). We recently published MEx, a benchmark dataset for ExR, to promote the study of new and transferable human activity recognition (HAR) methods to improve ExR, and benchmarked state-of-the-art ML algorithms on 4 modalities. The results highlighted the need for fusion methods that unite the individual strengths of modalities. In this paper we explore fusion methods with a focus on attention and propose mHAF, a novel multi-modal hybrid attention fusion architecture for ExR. We achieve the best performance of 96.24% (F1-measure) with a modality combination of a pressure mat, a depth camera and an accelerometer on the thigh. mHAF significantly outperforms multiple baselines, and the contribution of each model component is verified with an ablation study. The benefits of attention fusion are clearly demonstrated by visualising attention weights, showing how mHAF learns feature importance and modality combinations suited to different exercise classes. We highlight the importance of improving deployability and minimising obtrusiveness by exploring the best-performing 2- and 3-modality combinations.
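The abstract does not specify mHAF's hybrid attention. As a hedged illustration of the general mechanism such architectures build on, the sketch below scores each modality's feature vector against a query vector, turns the scores into softmax weights (the kind of attention weights the abstract visualises), and returns their weighted sum. The query, feature values, and modality keys are hypothetical; in a trained model the query would be learned:

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # shift for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features: Dict[str, Sequence[float]],
                   query: Sequence[float]) -> Tuple[List[float], Dict[str, float]]:
    """Scaled dot-product attention over modalities.

    Returns the fused feature vector and the per-modality attention weights.
    """
    names = list(features)
    dim = len(query)
    if any(len(features[n]) != dim for n in names):
        raise ValueError("all modality features must match the query dimension")
    scale = math.sqrt(dim)
    scores = [sum(q * f for q, f in zip(query, features[n])) / scale for n in names]
    weights = softmax(scores)
    fused = [sum(w * features[n][d] for w, n in zip(weights, names))
             for d in range(dim)]
    return fused, dict(zip(names, weights))


# Hypothetical per-modality embeddings, already projected to a shared size.
modalities = {
    "pressure_mat": [1.0, 0.0],
    "depth_camera": [0.0, 1.0],
    "thigh_accelerometer": [0.5, 0.5],
}
fused, weights = attention_fuse(modalities, query=[2.0, 0.0])
print(max(weights, key=weights.get))  # pressure_mat
```

Inspecting `weights` per exercise class is one way to see which modality combinations a model relies on, in the spirit of the attention-weight visualisations the abstract reports.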