
    EmbraceNet for Activity: A Deep Multimodal Fusion Architecture for Activity Recognition

    Human activity recognition using multiple sensors has been a challenging but promising task over recent decades. In this paper, we propose a deep multimodal fusion model for activity recognition based on the recently proposed feature fusion architecture named EmbraceNet. Our model processes the data from each sensor independently, combines the features with the EmbraceNet architecture, and post-processes the fused feature to predict the activity. In addition, we propose additional processing steps to boost the performance of our model. We submit the results obtained from our proposed model to the SHL recognition challenge under the team name "Yonsei-MCML." Comment: Accepted at HASCA at ACM UbiComp/ISWC 2019; won 2nd place in the SHL Recognition Challenge 2019.
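
    The abstract gives no implementation details, but a minimal PyTorch sketch of an EmbraceNet-style fusion layer might look as follows; the EmbraceFusion class name, the docking sizes and the uniform modality probabilities are illustrative assumptions, not code taken from the paper.

        import torch
        import torch.nn as nn

        class EmbraceFusion(nn.Module):
            """EmbraceNet-style fusion: dock each modality to a common size, then
            stochastically pick, per fused feature index, which modality contributes."""

            def __init__(self, input_dims, embrace_dim=256):
                super().__init__()
                # one docking layer per modality maps its features to a common size
                self.docking = nn.ModuleList([nn.Linear(d, embrace_dim) for d in input_dims])
                self.embrace_dim = embrace_dim

            def forward(self, features, modality_probs=None):
                # features: list of tensors, one per modality, each (batch, input_dims[i])
                docked = torch.stack(
                    [torch.relu(dock(f)) for dock, f in zip(self.docking, features)], dim=1
                )  # (batch, n_modalities, embrace_dim)
                batch, n_mod, _ = docked.shape
                if modality_probs is None:  # equal chance for every modality
                    modality_probs = torch.full((batch, n_mod), 1.0 / n_mod, device=docked.device)
                # for every fused feature index, sample which modality supplies it
                choice = torch.multinomial(modality_probs, self.embrace_dim, replacement=True)
                mask = torch.zeros_like(docked).scatter_(1, choice.unsqueeze(1), 1.0)
                return (docked * mask).sum(dim=1)  # (batch, embrace_dim)

        # toy usage: fuse accelerometer and gyroscope feature vectors
        fusion = EmbraceFusion(input_dims=[64, 48])
        fused = fusion([torch.randn(8, 64), torch.randn(8, 48)])  # -> shape (8, 256)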

    Transportation mode recognition fusing wearable motion, sound and vision sensors

    We present the first work that investigates the potential of improving the performance of transportation mode recognition by fusing multimodal data from wearable sensors: motion, sound and vision. We first train three independent deep neural network (DNN) classifiers, one for each of the three sensor types. We then propose two schemes that fuse the classification results from the three mono-modal classifiers. The first scheme makes an ensemble decision with fixed rules including Sum, Product, Majority Voting, and Borda Count. The second scheme is an adaptive fuser built as another classifier (including Naive Bayes, Decision Tree, Random Forest and Neural Network) that learns enhanced predictions by combining the outputs of the three mono-modal classifiers. We verify the advantage of the proposed method on the state-of-the-art Sussex-Huawei Locomotion and Transportation (SHL) dataset, recognizing the eight transportation activities: Still, Walk, Run, Bike, Bus, Car, Train and Subway. We achieve F1 scores of 79.4%, 82.1% and 72.8% with the mono-modal motion, sound and vision classifiers, respectively. The F1 score is remarkably improved to 94.5% and 95.5% by the two data fusion schemes, respectively. The recognition performance can be further improved with a post-processing scheme that exploits the temporal continuity of transportation. When assessing generalization of the model to unseen data, we show that while performance is, as expected, reduced for each individual classifier, the benefits of fusion are retained, with performance improved by 15 percentage points. Besides the actual performance increase, this work, most importantly, opens up the possibility of dynamically fusing modalities to achieve distinct power-performance trade-offs at run time.
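
    As an illustration of the first fusion scheme, the fixed rules named in the abstract (Sum, Product, Majority Voting and Borda Count) can be sketched in a few lines of Python; the function name and the toy probability vectors are invented for this example and are not the authors' code.

        import numpy as np

        def fuse_fixed_rules(prob_list):
            """Combine per-classifier class-probability vectors with fixed fusion rules."""
            probs = np.vstack(prob_list)                  # (n_classifiers, n_classes)
            votes = probs.argmax(axis=1)                  # each classifier's hard decision
            sum_rule = probs.sum(axis=0).argmax()         # highest summed probability
            product_rule = probs.prod(axis=0).argmax()    # highest product of probabilities
            majority = np.bincount(votes, minlength=probs.shape[1]).argmax()
            # Borda count: each classifier ranks the classes; higher rank earns more points
            borda = probs.argsort(axis=1).argsort(axis=1).sum(axis=0).argmax()
            return {"sum": sum_rule, "product": product_rule,
                    "majority": majority, "borda": borda}

        # toy example: motion, sound and vision posteriors over the 8 SHL classes
        motion = np.array([0.05, 0.05, 0.05, 0.05, 0.50, 0.10, 0.10, 0.10])
        sound  = np.array([0.10, 0.05, 0.05, 0.05, 0.40, 0.20, 0.10, 0.05])
        vision = np.array([0.05, 0.05, 0.05, 0.05, 0.10, 0.50, 0.10, 0.10])
        print(fuse_fixed_rules([motion, sound, vision]))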

    Summary of the Sussex-Huawei Locomotion-Transportation Recognition Challenge 2020

    In this paper we summarize the contributions of participants to the third Sussex-Huawei Locomotion-Transportation (SHL) Recognition Challenge organized at the HASCA Workshop of UbiComp/ISWC 2020. The goal of this machine learning/data science challenge is to recognize eight locomotion and transportation activities (Still, Walk, Run, Bike, Bus, Car, Train, Subway) from the inertial sensor data of a smartphone in a user-independent manner with an unknown target phone position. The training data of a “train” user is available from smartphones placed at four body positions (Hand, Torso, Bag and Hips). The testing data originates from “test” users with a smartphone placed at one, but unknown, body position. We introduce the dataset used in the challenge and the protocol of the competition. We present a meta-analysis of the contributions from 15 submissions, their approaches, the software tools used, computational cost and the achieved results. Overall, one submission achieved an F1 score above 80%, three scored between 70% and 80%, seven between 50% and 70%, and four below 50%, with a maximum latency of 5 seconds.

    Summary of SHL Challenge 2023: Recognizing Locomotion and Transportation Mode from GPS and Motion Sensors

    In this paper we summarize the contributions of participants to the fifth Sussex-Huawei Locomotion-Transportation (SHL) Recognition Challenge organized at the HASCA Workshop of UbiComp/ISWC 2023. The goal of this machine learning/data science challenge is to recognize eight locomotion and transportation activities (Still, Walk, Run, Bike, Bus, Car, Train, Subway) from the motion (accelerometer, gyroscope, magnetometer) and GPS (GPS location, GPS reception) sensor data of a smartphone in a user-independent manner. The training data of a “train” user is available from smartphones placed at four body positions (Hand, Torso, Bag and Hips). The testing data originates from “test” users with a smartphone placed at one, but unknown, body position. We introduce the dataset used in the challenge and the protocol of the competition. We present a meta-analysis of the contributions from 15 submissions, their approaches, the software tools used, computational cost and the achieved results. The challenge evaluates recognition performance by comparing predicted to ground-truth labels every 10 milliseconds, but puts no constraints on the maximum decision window length. Overall, five submissions achieved F1 scores above 90%, three between 80% and 90%, two between 70% and 80%, three between 50% and 70%, and two below 50%. While this year's task faces the technical challenges of sensor unavailability, irregular sampling, and sensor diversity, the overall performance based on GPS and motion sensors is better than in previous years (e.g. the best performances reported in SHL 2020, 2021 and 2023 are 88.5%, 75.4% and 96.0%, respectively). This is possibly due to the complementarity between the GPS and motion sensors and also to the removal of constraints on the decision window length. Finally, we present a baseline implementation to help understand the contribution of each sensor modality to the recognition task.
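
    To make the evaluation protocol concrete, the sketch below expands window-level predictions onto a 10 ms label grid and scores them with macro F1; the function name, the 5-second toy windows and the label values are illustrative assumptions, not the official challenge scoring code.

        import numpy as np
        from sklearn.metrics import f1_score

        def frame_level_f1(window_preds, window_len_s, frame_labels, frame_len_s=0.01):
            """Repeat each window-level prediction onto the 10 ms frame grid and score it."""
            frames_per_window = int(round(window_len_s / frame_len_s))
            pred_frames = np.repeat(window_preds, frames_per_window)
            n = min(len(pred_frames), len(frame_labels))  # align the two label streams
            return f1_score(frame_labels[:n], pred_frames[:n], average="macro")

        # toy usage: three 5-second windows scored against a 10 ms ground-truth stream
        preds = np.array([4, 4, 5])                # e.g. Bus, Bus, Car
        truth = np.repeat([4, 4, 5], 500)          # 3 x 5 s of 10 ms frames
        print(frame_level_f1(preds, window_len_s=5.0, frame_labels=truth))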

    Summary of the Sussex-Huawei Locomotion-Transportation Recognition Challenge

    In this paper we summarize the contributions of participants to the Sussex-Huawei Transportation-Locomotion (SHL) Recognition Challenge organized at the HASCA Workshop of UbiComp 2018. The SHL challenge is a machine learning and data science competition which aims to recognize eight transportation activities (Still, Walk, Run, Bike, Bus, Car, Train, Subway) from the inertial and pressure sensor data of a smartphone. We introduce the dataset used in the challenge and the protocol for the competition. We present a meta-analysis of the contributions from 19 submissions, their approaches, the software tools used, computational cost and the achieved results. Overall, two entries achieved F1 scores above 90%, eight scored between 80% and 90%, and nine between 50% and 80%.

    The University of Sussex-Huawei locomotion and transportation dataset for multimodal analytics with mobile devices

    Scientific advances build on reproducible research, which needs publicly available benchmark datasets. The computer vision and speech recognition communities have led the way in establishing benchmark datasets. Far fewer datasets are available in mobile computing, especially for rich locomotion and transportation analytics. This paper presents a highly versatile and precisely annotated large-scale dataset of smartphone sensor data for multimodal locomotion and transportation analytics of mobile users. The dataset comprises 7 months of measurements, collected from all sensors of 4 smartphones carried at typical body locations, including the images of a body-worn camera, while 3 participants used 8 different modes of transportation in the southeast of the United Kingdom, including in London. In total, 28 context labels were annotated, including transportation mode, participant’s posture, inside/outside location, road conditions, traffic conditions, presence in tunnels, social interactions, and having meals. The total amount of collected data exceeds 950 GB of sensor data, which corresponds to 2812 hours of labelled data and 17562 km of traveled distance. We present how we set up the data collection, including the equipment used and the experimental protocol. We discuss the dataset, including the data curation process and the analysis of the annotations and of the sensor data. We discuss the challenges encountered and present the lessons learned and some of the best practices we developed to ensure high-quality data collection and annotation. We discuss the potential applications which can be developed using this large-scale dataset. In particular, we present how a machine-learning system can use this dataset to automatically recognize modes of transportation. Many other research questions related to transportation analytics, activity recognition, radio signal propagation and mobility modelling can be addressed through this dataset. The full dataset is being made available to the community, and a thorough preview is already published.

    Deep CNN-BiLSTM Model for Transportation Mode Detection Using Smartphone Accelerometer and Magnetometer

    Transportation mode detection from smartphone data is investigated as a relevant problem in the context of multi-modal transportation systems. Neural networks are chosen as a timely and viable solution. The goal of this paper is to solve this problem with a model combining a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory (BiLSTM) network that processes only accelerometer and magnetometer data. The performance in terms of accuracy and F1 score on the Sussex-Huawei Locomotion-Transportation (SHL) challenge 2018 dataset is comparable to methods that require processing a wider range of sensors. The uniqueness of our work is its lightweight architecture, which requires fewer computational resources for training and consequently yields a shorter inference time.
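
    A minimal PyTorch sketch of such a CNN-BiLSTM pipeline is shown below; the layer sizes, sampling rate and window length are illustrative guesses rather than the configuration reported in the paper.

        import torch
        import torch.nn as nn

        class CnnBiLstm(nn.Module):
            """CNN front end over 6-channel accelerometer + magnetometer windows,
            followed by a bidirectional LSTM and a classification head."""

            def __init__(self, n_channels=6, n_classes=8):
                super().__init__()
                self.cnn = nn.Sequential(
                    nn.Conv1d(n_channels, 64, kernel_size=5, padding=2), nn.ReLU(),
                    nn.MaxPool1d(2),
                    nn.Conv1d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
                    nn.MaxPool1d(2),
                )
                self.bilstm = nn.LSTM(input_size=128, hidden_size=64,
                                      batch_first=True, bidirectional=True)
                self.head = nn.Linear(2 * 64, n_classes)

            def forward(self, x):                    # x: (batch, channels, time)
                feats = self.cnn(x).transpose(1, 2)  # (batch, time/4, 128)
                out, _ = self.bilstm(feats)
                return self.head(out[:, -1, :])      # class logits from the last step

        # toy usage: a batch of 5-second windows sampled at 100 Hz
        model = CnnBiLstm()
        logits = model(torch.randn(4, 6, 500))       # -> (4, 8)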

    Enabling Reproducible Research in Sensor-Based Transportation Mode Recognition With the Sussex-Huawei Dataset

    Transportation and locomotion mode recognition from multimodal smartphone sensors is useful to provide just-in-time, context-aware assistance. However, the field is currently held back by the lack of standardized datasets, recognition tasks and evaluation criteria. Currently, recognition methods are often tested on ad-hoc datasets acquired for one-off recognition problems and with differing choices of sensors. This prevents a systematic comparative evaluation of methods within and across research groups. Our goal is to address these issues by: i) introducing a publicly available, large-scale dataset for transportation and locomotion mode recognition from multimodal smartphone sensors; ii) suggesting twelve reference recognition scenarios, which are a superset of the tasks we identified in related work; iii) suggesting relevant combinations of sensors to use based on energy considerations among accelerometer, gyroscope, magnetometer and GPS sensors; iv) defining precise evaluation criteria, including training and testing sets, evaluation measures, and user-independent and sensor-placement-independent evaluations. Based on this, we report a systematic study of the relevance of statistical and frequency features, based on information-theoretic criteria, to inform recognition systems. We then systematically report the reference performance obtained on all the identified recognition scenarios using a machine-learning recognition pipeline. The extent of this analysis and the clear definition of the recognition tasks enable future researchers to evaluate their own methods in a comparable manner, thus contributing to further advances in the field. The dataset and the code are available online.
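
    As an example of the kind of statistical and frequency features such a pipeline computes, the sketch below extracts a handful of them from one sensor window with NumPy; the exact feature set, window length and sampling rate used in the paper differ, and the names here are illustrative.

        import numpy as np

        def window_features(signal, fs=100.0):
            """A few common statistical and frequency features for one sensor window."""
            feats = {
                "mean": signal.mean(),
                "std": signal.std(),
                "min": signal.min(),
                "max": signal.max(),
                "energy": np.mean(signal ** 2),
            }
            # frequency-domain features from the power spectrum of the detrended window
            psd = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
            freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
            psd_norm = psd / psd.sum() if psd.sum() > 0 else psd
            feats["dominant_freq"] = freqs[psd.argmax()]
            feats["spectral_entropy"] = -np.sum(psd_norm * np.log2(psd_norm + 1e-12))
            return feats

        # toy usage: a 5-second accelerometer-magnitude window at 100 Hz
        t = np.arange(0, 5, 1 / 100.0)
        window = 9.81 + 0.5 * np.sin(2 * np.pi * 2.0 * t)   # walking-like 2 Hz bounce
        print(window_features(window))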

    SFusion: Self-attention based N-to-One Multimodal Fusion Block

    People perceive the world with different senses, such as sight, hearing, smell, and touch. Processing and fusing information from multiple modalities enables Artificial Intelligence to understand the world around us more easily. However, when modalities are missing, the number of available modalities differs across situations, which leads to an N-to-One fusion problem. To solve this problem, we propose a self-attention based fusion block called SFusion. Different from preset formulations or convolution-based methods, the proposed block automatically learns to fuse available modalities without synthesizing or zero-padding missing ones. Specifically, the feature representations extracted by the upstream processing model are projected as tokens and fed into a self-attention module to generate latent multimodal correlations. Then, a modal attention mechanism is introduced to build a shared representation, which can be applied by the downstream decision model. The proposed SFusion can be easily integrated into existing multimodal analysis networks. In this work, we apply SFusion to different backbone networks for human activity recognition and brain tumor segmentation tasks. Extensive experimental results show that the SFusion block achieves better performance than the competing fusion strategies. Our code is available at https://github.com/scut-cszcl/SFusion. Comment: This paper has been accepted by MICCAI 2023.
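
    A minimal PyTorch sketch of the idea, assuming each available modality has already been projected to a common token dimension, is given below; the class name, dimensions and pooling details are illustrative and are not taken from the SFusion repository.

        import torch
        import torch.nn as nn

        class SelfAttentionFusion(nn.Module):
            """N-to-one fusion in the spirit of SFusion: treat each available modality's
            feature vector as a token, run self-attention over the tokens, then pool
            them with learned modal attention weights."""

            def __init__(self, dim=128, n_heads=4):
                super().__init__()
                self.self_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
                self.score = nn.Linear(dim, 1)    # modal attention: one score per token

            def forward(self, tokens):            # tokens: (batch, n_available_modalities, dim)
                attended, _ = self.self_attn(tokens, tokens, tokens)
                weights = torch.softmax(self.score(attended), dim=1)  # (batch, n_mod, 1)
                return (weights * attended).sum(dim=1)                # (batch, dim)

        # the same block handles any number of available modalities (N-to-one)
        fusion = SelfAttentionFusion()
        print(fusion(torch.randn(4, 2, 128)).shape)   # two modalities  -> (4, 128)
        print(fusion(torch.randn(4, 4, 128)).shape)   # four modalities -> (4, 128)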

    Sampling Strategies for Tackling Imbalanced Data in Human Activity Recognition

    Human activity recognition (HAR) using wearable sensors is a topic that is being actively researched in machine learning. Smart, sensor-embedded devices, such as smartphones, fitness trackers, or smart watches that collect detailed data on movement, are widely available now. HAR may be applied in areas such as healthcare, physiotherapy, and fitness to assist users of these smart devices in their daily lives. However, one of the main challenges facing HAR, particularly when it is used in supervised learning, is how balanced data may be obtained for algorithm optimisation and testing. Because users engage in some activities more than others, e.g. walking more than running, HAR datasets are typically imbalanced. The lack of representation of minority classes in a dataset therefore hinders the ability of HAR classifiers to sufficiently capture new instances of those activities. Inspired by the concept of data fusion, this thesis introduces three new hybrid sampling methods. The diversity of the synthesised samples is enhanced by combining the output of separate sampling methods into three hybrid approaches. The advantage of a hybrid method is that it provides diverse synthetic data from different sampling approaches, which increases the size of the training data and improves the generalisation of the learned activity recognition model. The first strategy (DBM) combines the synthetic minority oversampling technique (SMOTE) with Random_SMOTE, both of which are built around the k-nearest neighbours algorithm. The second technique, called the noise detection-based method (NDBM), combines Tomek links (SMOTE_Tomeklinks) and the modified synthetic minority oversampling technique (MSMOTE). The third approach, titled the cluster-based method (CBM), combines cluster-based synthetic oversampling (CBSO) and the proximity weighted synthetic oversampling technique (ProWSyn). The performance of the proposed hybrid methods is compared with existing methods using accelerometer data from three commonly used benchmark datasets. The results show that the DBM, NDBM and CBM can significantly reduce the impact of class imbalance and enhance the F1 scores of a multilayer perceptron (MLP) by as much as 9% to 20% compared with their constituent sampling methods. Also, the Friedman statistical significance test was conducted to compare the effects of the different sampling methods. The test results confirm that the CBM is more effective than the other sampling approaches. This thesis also introduces a method based on the Wasserstein generative adversarial network (WGAN) for generating different types of human activity data. The WGAN is more stable to train than a standard generative adversarial network (GAN) because it uses a stable metric, the Wasserstein distance, to compare the similarity between the real and generated data distributions. WGAN is a deep learning approach and, in contrast to the six existing sampling methods referred to previously, it can operate on raw sensor data, as convolutional and recurrent layers can act as feature extractors. WGAN is used to generate raw sensor data to overcome the limitation of traditional machine learning-based sampling methods, which can only operate on extracted features. The synthetic data produced by WGAN is then used to oversample the imbalanced training data.
    This thesis demonstrates that this approach significantly enhances the learning ability of a convolutional neural network (CNN) by as much as 5% to 6% on imbalanced human activity datasets. This thesis concludes that the proposed sampling methods based on traditional machine learning are efficient when human activity training data is imbalanced and small. These methods are less complex to implement, and they require less human activity training data and fewer computational resources to produce synthetic data than the WGAN approach. The proposed WGAN method is effective at producing raw sensor data when a large quantity of human activity training data is available. Additionally, it is time-consuming to optimise the hyperparameters of the WGAN architecture, which significantly impacts the performance of the method.
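
    To illustrate the hybrid-sampling idea, the sketch below pools the synthetic minority samples produced by two different oversamplers and appends them to the training set. Borderline-SMOTE stands in for Random_SMOTE and the other variants used in the thesis because it ships with the imbalanced-learn package; all names and toy data are illustrative.

        import numpy as np
        from imblearn.over_sampling import SMOTE, BorderlineSMOTE

        def hybrid_oversample(X, y, samplers=(SMOTE(random_state=0),
                                               BorderlineSMOTE(random_state=0))):
            """Pool the synthetic samples from several oversamplers into one training set."""
            synth_X, synth_y = [], []
            for sampler in samplers:
                X_res, y_res = sampler.fit_resample(X, y)
                # imbalanced-learn appends generated samples after the original rows
                synth_X.append(X_res[len(X):])
                synth_y.append(y_res[len(y):])
            return np.vstack([X] + synth_X), np.concatenate([y] + synth_y)

        # toy usage: 100 majority windows vs 10 minority windows, 12 features each
        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(0, 1, (100, 12)), rng.normal(1.5, 1, (10, 12))])
        y = np.array([0] * 100 + [1] * 10)
        X_aug, y_aug = hybrid_oversample(X, y)
        print(X_aug.shape, np.bincount(y_aug))         # augmented set and per-class counts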