7 research outputs found

    On robustness of cloud speech APIs: An early characterization

    Get PDF
    The robustness and consistency of sensory inference models under changing environmental conditions and hardware are a crucial requirement for generalizing recent innovative work, particularly in the field of deep learning, from the lab to the real world. We measure the extent to which current cloud speech recognition models are robust to background noise, and show that hardware variability remains an obstacle to the real-world applicability of state-of-the-art speech recognition models.
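A noise-robustness probe like the one described above can be approximated by mixing background noise into clean speech at controlled signal-to-noise ratios, then comparing recognition accuracy on the clean and degraded versions. A minimal sketch, with hypothetical names and plain-Python lists standing in for audio sample arrays:

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Mix noise into speech at a target SNR (in dB).

    Hypothetical helper: loop/trim the noise clip to the speech
    length, then scale it so the speech-to-noise power ratio
    equals the requested SNR.
    """
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]

speech = [math.sin(0.1 * i) for i in range(1000)]  # toy "speech" signal
noise = [0.5, -0.5] * 100                          # toy noise clip
noisy = mix_at_snr(speech, noise, snr_db=10)

# Verify the achieved SNR from the injected residual
residual = [y - s for y, s in zip(noisy, speech)]
snr = 10 * math.log10(
    (sum(s * s for s in speech) / len(speech))
    / (sum(r * r for r in residual) / len(residual)))
print(round(snr, 1))  # 10.0
```

Degraded clips produced this way at several SNRs can then be sent through a cloud API to chart accuracy as a function of noise level.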

    Mic2Mic: Using Cycle-Consistent Generative Adversarial Networks to Overcome Microphone Variability in Speech Systems

    Full text link
    Mobile and embedded devices are increasingly using microphones and audio-based computational models to infer user context. A major challenge in building systems that combine audio models with commodity microphones is to guarantee their accuracy and robustness in the real world. Besides many environmental dynamics, a primary factor that impacts the robustness of audio models is microphone variability. In this work, we propose Mic2Mic -- a machine-learned system component -- which resides in the inference pipeline of audio models and, in real time, reduces the variability in audio data caused by microphone-specific factors. Two key considerations for the design of Mic2Mic were: a) to decouple the problem of microphone variability from the audio task, and b) to put a minimal burden on end-users to provide training data. With these in mind, we apply the principles of cycle-consistent generative adversarial networks (CycleGANs) to learn Mic2Mic using unlabeled and unpaired data collected from different microphones. Our experiments show that Mic2Mic can recover between 66% and 89% of the accuracy lost due to microphone variability for two common audio tasks. Comment: Published at ACM IPSN 2019
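The cycle-consistency idea that lets Mic2Mic learn from unpaired recordings can be illustrated in miniature: two generators G (microphone A to microphone B) and F (B to A) are trained so that F(G(x)) ≈ x and G(F(y)) ≈ y. A toy sketch in which G and F are hypothetical closed-form maps rather than neural networks:

```python
def cycle_consistency_loss(xs, ys, G, F):
    """Mean L1 reconstruction error in both translation directions."""
    fwd = sum(abs(F(G(x)) - x) for x in xs) / len(xs)  # A -> B -> A
    bwd = sum(abs(G(F(y)) - y) for y in ys) / len(ys)  # B -> A -> B
    return fwd + bwd

G = lambda x: 2.0 * x + 1.0    # pretend "mic A -> mic B" transfer
F = lambda y: (y - 1.0) / 2.0  # its exact inverse, "mic B -> mic A"
xs = [0.0, 0.5, 1.0]           # unpaired samples from mic A
ys = [1.0, 2.0, 3.0]           # unpaired samples from mic B
loss = cycle_consistency_loss(xs, ys, G, F)
print(loss)  # 0.0 for exact inverses
```

In an actual CycleGAN this loss is minimized jointly with adversarial losses over unpaired batches; here the exact-inverse pair simply makes the loss vanish, showing what the objective rewards.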

    Dos and Don'ts in Mobile Phone Sensing Middleware: Learning from a Large-Scale Experiment

    Get PDF
    Mobile phone sensing contributes to changing the way we approach science: massive amounts of data are being contributed across places and time, paving the way for advanced analyses of numerous phenomena at an unprecedented scale. Still, despite extensive research work on enabling resource-efficient mobile phone sensing with a very large crowd, key challenges remain. One challenge is the introduction of a new heterogeneity dimension in the traditional middleware research landscape: the middleware must deal with the heterogeneity of the contributing crowd in addition to the system's technical heterogeneities. In order to tackle these two heterogeneity dimensions together, we have been conducting a large-scale empirical study in cooperation with the city of Paris. Our experiment revolves around the public release of a mobile app for urban pollution monitoring that builds upon a dedicated mobile crowd-sensing middleware. In this paper, we report on the empirical analysis of the resulting mobile phone sensing efficiency from both technical and social perspectives, in the face of a large and highly heterogeneous population of participants. We concentrate on the data originating from the 20 most popular phone models of our user base, which represent contributions from over 2,000 users with 23 million observations collected over 10 months. Following our analysis, we introduce a few recommendations to overcome technical and crowd heterogeneities in the implementation of mobile phone sensing applications and supporting middleware.

    Deep Learning for Sensor-based Human Activity Recognition: Overview, Challenges and Opportunities

    Full text link
    The vast proliferation of sensor devices and the Internet of Things enables sensor-based activity recognition applications. However, substantial challenges can affect the performance of recognition systems in practical scenarios. Recently, as deep learning has demonstrated its effectiveness in many areas, many deep learning methods have been investigated to address these challenges. In this study, we present a survey of state-of-the-art deep learning methods for sensor-based human activity recognition. We first introduce the multi-modality of the sensory data and provide information on public datasets that can be used for evaluation in different challenge tasks. We then propose a new taxonomy that structures the deep methods by the challenges they address. Challenges and challenge-related deep methods are summarized and analyzed to form an overview of the current research progress. At the end of this work, we discuss the open issues and provide some insights into future directions.
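Sensor-based HAR pipelines like those surveyed above typically segment continuous sensor streams into fixed-length, overlapping windows before feeding them to a deep model. A minimal sketch of this standard preprocessing step (the window length and overlap below are illustrative choices, not values from the survey):

```python
def sliding_windows(samples, length, step):
    """Segment a 1-D sample stream into fixed-length windows.

    `step < length` produces overlapping windows; a common HAR
    setup uses 50% overlap (step = length // 2).
    """
    return [samples[i:i + length]
            for i in range(0, len(samples) - length + 1, step)]

accel_x = list(range(10))  # toy accelerometer axis readings
wins = sliding_windows(accel_x, length=4, step=2)  # 50% overlap
print(wins)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Each window (stacked across modalities such as accelerometer and gyroscope axes) then becomes one training example for the recognition model.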

    Design and evaluation in the large of health apps for the general population with case studies in mindfulness, neurological and psychological assessment

    Get PDF
    Nowadays, a large number of health apps, i.e. apps aimed at helping people learn about, track, and improve health conditions and behaviors, are available on online app stores, such as Apple's App Store and Google Play, as well as on social networks, such as Facebook. However, very few of these apps have been created by healthcare experts or scientifically evaluated, posing the risk that they might be ineffective or even detrimental to people. In this thesis, we have explored how HCI design and evaluation methods can be applied to propose effective health apps that target the general population. In particular, we focused on three domains, i.e. (i) mindfulness, (ii) psychological assessment, and (iii) neurological assessment, and proposed an app for each domain. Then, to improve their design or assess their efficacy in different contexts of use, we carried out quantitative and qualitative short- and long-term studies. To this purpose, we employed traditional HCI user-centered design methods, i.e. lab and in situ studies, as well as a recently proposed method, research in the large, to potentially recruit a large number of users. In the thesis, for each proposed health app, we discuss the results of the evaluations we carried out, as well as the strengths and limitations of the particular study methodology employed. Finally, we outline possible future work.