On robustness of cloud speech APIs: An early characterization
The robustness and consistency of sensory inference models under changing environmental conditions and hardware are a crucial requirement for generalizing recent work, particularly in deep learning, from the lab to the real world. We measure the extent to which current cloud speech recognition models are robust to background noise, and show that hardware variability is still a problem for the real-world applicability of state-of-the-art speech recognition models.
Mic2Mic: Using Cycle-Consistent Generative Adversarial Networks to Overcome Microphone Variability in Speech Systems
Mobile and embedded devices are increasingly using microphones and
audio-based computational models to infer user context. A major challenge in
building systems that combine audio models with commodity microphones is to
guarantee their accuracy and robustness in the real world. Besides many
environmental dynamics, a primary factor that impacts the robustness of audio
models is microphone variability. In this work, we propose Mic2Mic -- a
machine-learned system component -- which resides in the inference pipeline of
audio models and, in real time, reduces the variability in audio data caused by
microphone-specific factors. Two key considerations for the design of Mic2Mic
were: a) to decouple the problem of microphone variability from the audio task,
and b) to put a minimal burden on end-users to provide training data. With these
in mind, we apply the principles of cycle-consistent generative adversarial
networks (CycleGANs) to learn Mic2Mic using unlabeled and unpaired data
collected from different microphones. Our experiments show that Mic2Mic can
recover between 66% and 89% of the accuracy lost due to microphone variability
for two common audio tasks. Comment: Published at ACM IPSN 201
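The abstract does not define how "accuracy recovered" is computed. A minimal sketch of one plausible reading follows, assuming it means the share of the accuracy drop (matched microphone versus mismatched microphone) that is won back after translating the audio. The function name, its parameters, and the example numbers are hypothetical and are not taken from the paper.

```python
def recovered_fraction(acc_matched, acc_mismatched, acc_translated):
    """Share of the accuracy drop that is won back after translation.

    acc_matched:    accuracy when train and test microphones match (assumed baseline)
    acc_mismatched: accuracy when the test microphone differs
    acc_translated: accuracy on mismatched audio after translation
    All names here are illustrative assumptions, not the paper's notation.
    """
    lost = acc_matched - acc_mismatched
    if lost <= 0:
        raise ValueError("no accuracy was lost to the microphone mismatch")
    return (acc_translated - acc_mismatched) / lost


# Hypothetical numbers: a 20-point drop, of which 15 points are regained.
fraction = recovered_fraction(0.90, 0.70, 0.85)
print(f"recovered {fraction:.0%} of the lost accuracy")
```

Under this reading, a value of 1.0 would mean the mismatch is fully compensated, and 0.0 would mean translation did not help.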
Dos and Don'ts in Mobile Phone Sensing Middleware: Learning from a Large-Scale Experiment
Mobile phone sensing is changing the way we approach science: massive amounts of data are being contributed across places and time, paving the way for advanced analyses of numerous phenomena at an unprecedented scale. Still, despite extensive research on enabling resource-efficient mobile phone sensing with a very large crowd, key challenges remain. One challenge is the introduction of a new heterogeneity dimension in the traditional middleware research landscape: the middleware must deal with the heterogeneity of the contributing crowd in addition to the system's technical heterogeneities. To tackle these two heterogeneity dimensions together, we have been conducting a large-scale empirical study in cooperation with the city of Paris. Our experiment revolves around the public release of a mobile app for urban pollution monitoring that builds upon a dedicated mobile crowd-sensing middleware. In this paper, we report on the empirical analysis of the resulting mobile phone sensing efficiency from both technical and social perspectives, in the face of a large and highly heterogeneous population of participants. We concentrate on the data originating from the 20 most popular phone models of our user base, which represent contributions from over 2,000 users with 23 million observations collected over 10 months. Following our analysis, we introduce a few recommendations to overcome technical and crowd heterogeneities in the implementation of mobile phone sensing applications and supporting middleware.
Deep Learning for Sensor-based Human Activity Recognition: Overview, Challenges and Opportunities
The vast proliferation of sensor devices and the Internet of Things enables
applications of sensor-based activity recognition. However, there exist
substantial challenges that could influence the performance of the recognition
system in practical scenarios. Recently, as deep learning has demonstrated its
effectiveness in many areas, plenty of deep methods have been investigated to
address the challenges in activity recognition. In this study, we present a
survey of the state-of-the-art deep learning methods for sensor-based human
activity recognition. We first introduce the multi-modality of the sensory data
and provide information for public datasets that can be used for evaluation in
different challenge tasks. We then propose a new taxonomy to structure the deep
methods by challenges. Challenges and challenge-related deep methods are
summarized and analyzed to form an overview of the current research progress.
At the end of this work, we discuss the open issues and provide some insights
for future directions.
Design and evaluation in the large of health apps for the general population with case studies in mindfulness, neurological and psychological assessment
Nowadays, a large number of health apps, i.e. apps aimed at helping people learn about, track, and improve health conditions and behaviors, are available on online app stores, such as Apple's App Store and Google Play, as well as on social networks, such as Facebook. However, very few of these apps have been created by healthcare experts or have been scientifically evaluated, posing the risk that they might be ineffective or even detrimental to people. In this thesis, we have explored how HCI design and evaluation methods can be applied to propose effective health apps that target the general population. In particular, we focused on three domains, i.e. (i) mindfulness, (ii) psychological assessment, and (iii) neurological assessment, and proposed an app for each domain. Then, to improve their design or assess their efficacy in different contexts of use, we carried out quantitative and qualitative short- and long-term studies. To this end, we employed traditional HCI user-centered design methods, i.e. lab and in situ studies, as well as a recently proposed method, i.e. research in the large, to possibly recruit a large number of users. In the thesis, for each proposed health app, we discuss the results of the evaluations we carried out and point out the strengths and limitations of the particular study methodology employed. Finally, we outline possible future work.