research

The University of Sussex-Huawei locomotion and transportation dataset for multimodal analytics with mobile devices

Abstract

Scientific advances build on reproducible research which need publicly available benchmark datasets. The computer vision and speech recognition communities have led the way in establishing benchmark datasets. There are much less datasets available in mobile computing, especially for rich locomotion and transportation analytics. This paper presents a highly versatile and precisely annotated large-scale dataset of smartphone sensor data for multimodal locomotion and transportation analytics of mobile users. The dataset comprises 7 months of measurements, collected from all sensors of 4 smartphones carried at typical body locations, including the images of a body-worn camera, while 3 participants used 8 different modes of transportation in the southeast of the United Kingdom, including in London. In total 28 context labels were annotated, including transportation mode, participant’s posture, inside/outside location, road conditions, traffic conditions, presence in tunnels, social interactions, and having meals. The total amount of collected data exceed 950 GB of sensor data, which corresponds to 2812 hours of labelled data and 17562 km of traveled distance. We present how we set up the data collection, including the equipment used and the experimental protocol. We discuss the dataset, including the data curation process, the analysis of the annotations and of the sensor data. We discuss the challenges encountered and present the lessons learned and some of the best practices we developed to ensure high quality data collection and annotation. We discuss the potential applications which can be developed using this large-scale dataset. In particular, we present how a machine-learning system can use this dataset to automatically recognize modes of transportations. Many other research questions related to transportation analytics, activity recognition, radio signal propagation and mobility modelling can be adressed through this dataset. The full dataset is being made available to the community, and a thorough preview is already publishe

    Similar works