1,002 research outputs found
Audio Content Analysis for Unobtrusive Event Detection in Smart Homes
Institute of Engineering Sciences
Environmental sound signals are multi-source, heterogeneous, and varying in time. Many systems have been proposed to process such signals for event detection in ambient assisted living applications. Typically, these systems use feature extraction, selection, and classification. However, despite major advances, several important questions remain unanswered, especially in real-world settings. This paper contributes to the body of knowledge in the field by addressing the following problems for ambient sounds recorded in various real-world kitchen environments: 1) which features and which classifiers are most suitable in the presence of background noise? 2) what is the effect of signal duration on recognition accuracy? 3) how do the signal-to-noise ratio and the distance between the microphone and the audio source affect the recognition accuracy in an environment in which the system was not trained? We show that for systems that use traditional classifiers, it is beneficial to combine gammatone frequency cepstral coefficients and discrete wavelet transform coefficients and to use a gradient boosting classifier. For systems based on deep learning, we consider 1D and 2D Convolutional Neural Networks (CNNs) using mel-spectrogram energies and mel-spectrogram images as inputs, respectively, and show that the 2D CNN outperforms the 1D CNN. We obtained competitive classification results for two such systems. The first, which uses a gradient boosting classifier, achieved an F1-score of 90.2% and a recognition accuracy of 91.7%. The second, which uses a 2D CNN with mel-spectrogram images, achieved an F1-score of 92.7% and a recognition accuracy of 96%.
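As an editorial illustration of the feature-fusion approach summarized above, the sketch below combines discrete wavelet transform statistics with gammatone cepstral coefficients and trains a gradient boosting classifier. It is a minimal sketch on placeholder data, not the authors' implementation; the `extract_gfcc` stub stands in for a real gammatone front end (e.g., from the spafe package), and all data and dimensions are illustrative assumptions.

```python
import numpy as np
import pywt  # PyWavelets, for the discrete wavelet transform
from sklearn.ensemble import GradientBoostingClassifier

def dwt_features(signal, wavelet="db4", level=4):
    """Summary statistics of DWT coefficients at each decomposition level."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([f(c) for c in coeffs for f in (np.mean, np.std)])

def extract_gfcc(signal, sr):
    """Placeholder stub: substitute a real gammatone cepstral front end
    here (e.g., the gfcc function from the spafe package)."""
    return np.random.randn(13)

def combined_features(signal, sr=16000):
    # Fuse the two feature families into one vector per clip.
    return np.concatenate([extract_gfcc(signal, sr), dwt_features(signal)])

# Illustrative data: 100 one-second clips across 4 kitchen event classes.
signals = [np.random.randn(16000) for _ in range(100)]
labels = np.random.randint(0, 4, 100)
X = np.stack([combined_features(s) for s in signals])
clf = GradientBoostingClassifier(n_estimators=200).fit(X, labels)
```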
Vocal imitations and the identification of sound events
It is commonly observed that a speaker vocally imitates a sound that she or he intends to communicate to an interlocutor. We report on an experiment that examined the assumption that vocal imitations can effectively communicate a referent sound, and that they do so by conveying the features necessary for the identification of the referent sound event. Subjects were required to sort a set of vocal imitations of everyday sounds. The resulting clusters corresponded in most cases to the categories of the referent sound events, indicating that the imitations enabled the listeners to recover what was imitated. Furthermore, a binary decision tree analysis showed that a few characteristic acoustic features predicted the clusters. These features also predicted the classification of the referent sounds, but did not generalize to the categorization of other sounds. This showed that, for the speaker, vocally imitating a sound consists of conveying the acoustic features important for recognition, within the constraints of human vocal production. As such, vocal imitations prove to be a phenomenon potentially useful for studying sound identification.
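To make the analysis method concrete, the following hedged sketch shows how a shallow decision tree, as in the study's cluster analysis, might be fit to a handful of acoustic descriptors. The feature names, data, and cluster labels are invented placeholders, not the study's materials.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical acoustic descriptors per imitation (columns) and the
# cluster each imitation was sorted into by listeners (labels).
feature_names = ["pitch_median", "spectral_centroid", "attack_time", "noisiness"]
X = np.random.rand(60, 4)            # placeholder data for illustration
clusters = np.random.randint(0, 4, 60)

tree = DecisionTreeClassifier(max_depth=3)  # shallow: a few features suffice
tree.fit(X, clusters)
print(export_text(tree, feature_names=feature_names))  # inspect split rules
```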
Machine Learning for Human Activity Detection in Smart Homes
Recognizing human activities in domestic environments from audio and active power consumption sensors is a challenging task: on the one hand, environmental sound signals are multi-source, heterogeneous, and varying in time; on the other hand, active power consumption varies significantly across electrical appliances of a similar type.
Many systems have been proposed to process environmental sound signals for event detection in ambient assisted living applications. Typically, these systems use feature extraction, selection, and classification. However, despite major advances, several important questions remain unanswered, especially in real-world settings. A part of this thesis contributes to the body of knowledge in the field by addressing the following problems for ambient sounds recorded in various real-world kitchen environments: 1) which features and which classifiers are most suitable in the presence of background noise? 2) what is the effect of signal duration on recognition accuracy? 3) how do the signal-to-noise ratio (SNR) and the distance between the microphone and the audio source affect the recognition accuracy in an environment in which the system was not trained? We show that for systems that use traditional classifiers, it is beneficial to combine gammatone frequency cepstral coefficients and discrete wavelet transform coefficients and to use a gradient boosting classifier. For systems based on deep learning, we consider 1D and 2D CNNs using mel-spectrogram energies and mel-spectrogram images as inputs, respectively, and show that the 2D CNN outperforms the 1D CNN. We obtained competitive classification results for two such systems and validated the performance of our algorithms on public datasets (the Google Brain/TensorFlow Speech Recognition Challenge and the 2017 Detection and Classification of Acoustic Scenes and Events Challenge).
Regarding the problem of energy-based human activity recognition in a household environment, machine learning techniques are applied to infer the state of household appliances from their energy consumption data, and rule-based scenarios that exploit these states are used to detect human activity. Since most activities within a house are related to the operation of an electrical appliance, this unimodal approach has a significant advantage: it uses inexpensive smart plugs and smart meters for each appliance. This part of the thesis proposes the use of unobtrusive and easy-to-install tools (smart plugs) for data collection, together with a decision engine that combines energy signal classification using dominant classifiers (compared in advance via grid search) with a probabilistic measure for appliance usage. It also helps preserve the privacy of the resident, since all the activities are stored in a local database.
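As a rough illustration of comparing candidate classifiers in advance via grid search on windows of smart-plug power readings, consider the following sketch. The feature windows, labels, candidate models, and parameter grids are placeholder assumptions, not the thesis configuration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Illustrative data: windows of active-power readings (watts) from a
# smart plug, with appliance-state labels (e.g., off / standby / active).
X = np.abs(np.random.randn(500, 30))
y = np.random.randint(0, 3, 500)

candidates = {
    "svm": (SVC(), {"clf__C": [1, 10], "clf__gamma": ["scale", 0.1]}),
    "rf": (RandomForestClassifier(), {"clf__n_estimators": [100, 300]}),
}
for name, (estimator, grid) in candidates.items():
    pipe = Pipeline([("scale", StandardScaler()), ("clf", estimator)])
    search = GridSearchCV(pipe, grid, cv=5)  # pick the dominant classifier
    search.fit(X, y)
    print(name, search.best_score_, search.best_params_)
```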
DNNs have received great research interest in the field of computer vision. In this thesis we adapt different architectures to the problem of human activity recognition. We analyze the quality of the extracted features, and more specifically how model architectures and parameters affect the ability of the features automatically extracted by DNNs to separate activity classes in the final feature space. Additionally, the architectures applied to our main problem were also applied to text classification, in which we treat the input text as an image and apply 2D CNNs to learn the local and global semantics of the sentences from the variations of the visual patterns of words. This work serves as a first step towards creating a dialogue agent that would not require any natural language preprocessing.
Finally, since in many domestic environments human speech coexists with other environmental sounds, we developed a Convolutional Recurrent Neural Network to separate the sound sources and applied novel post-processing filters to obtain an end-to-end noise-robust system. Our algorithm ranked first in the Apollo-11 Fearless Steps Challenge. This work was supported by the Horizon 2020 research and innovation programme under Marie Skłodowska-Curie grant agreement No. 676157, project ACROSSING.
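The competition system itself is not described in this abstract, but a generic convolutional recurrent network for frame-wise speech/noise decisions, the family of model the thesis names, can be sketched as follows in PyTorch. The layer sizes, input shape, and output head are assumptions, not the thesis architecture.

```python
import torch
from torch import nn

class CRNN(nn.Module):
    """Convolutional front end over a mel-spectrogram, recurrent layer over
    time, and a per-frame sigmoid output (e.g., speech presence)."""
    def __init__(self, n_mels=64, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((1, 2)),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((1, 2)),
        )  # pooling only along the mel axis preserves time resolution
        self.gru = nn.GRU(32 * (n_mels // 4), hidden,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):                     # x: (batch, 1, time, n_mels)
        z = self.conv(x)                      # (batch, 32, time, n_mels // 4)
        z = z.permute(0, 2, 1, 3).flatten(2)  # (batch, time, features)
        z, _ = self.gru(z)
        return torch.sigmoid(self.head(z)).squeeze(-1)  # (batch, time)

model = CRNN()
probs = model(torch.randn(8, 1, 128, 64))  # 8 clips, 128 frames, 64 mel bands
```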
From Human to Robot Everyday Activity
The mission of the Everyday Activities Science and Engineering (EASE) Collaborative Research Consortium, to enhance the performance of cognition-enabled robots, is founded on the EASE Human Activities Data Analysis Pipeline. Through the collection of diverse human activity information resources, enrichment with contextually relevant annotations, and subsequent multimodal analysis of the combined data sources, the pipeline described here will provide a rich resource for robot planning researchers through its incorporation in the OpenEASE cloud platform.
Human Adaptation to Isolated and Confined Environments
A study was conducted over seven months in a winter Antarctic isolated and confined environment (ICE). Physiological and psychological data were collected several times a week, and information on behavior and the use of physical facilities was collected monthly. The data indicated a significant decrease in epinephrine and norepinephrine during the middle trimester of the winter. No significant changes were found for blood pressure. Self-reports of hostility and anxiety showed a linear increase. There were no significant changes in depression during the ICE. The physiological and psychological data do not move in a synchronous fashion over time. The data also suggest that both the ambient qualities of an ICE and discrete social environmental events, such as the arrival of the summer crew, have an impact on the outcome measures used. It may be most appropriate to develop a model for ICEs that incorporates not only the global chronic stressors common to all ICEs but also the role of discrete environmental effects, which can minimize or enhance the influence of the more chronic stressors. The behavioral adjustment information highlights the importance of developing schedules that balance work and recreational activities.
Detection of acoustic events with application to environment monitoring
The goal of this work is to present different detection techniques and their feasibility for detecting unknown acoustic signals, with general applicability to different noise conditions. These conditions replicate those commonly found in real-world acoustic scenarios, where information about the noise and signal characteristics is frequently lacking. For this purpose, different extensions of the energy detector, and even new structures for improving the robustness of detection, are considered and explained. Furthermore, three different research lines of application are presented in which the energy detector and its extensions are used to improve the localization accuracy and the classification rates of acoustic sounds.
Moragues Escrivá, J.; Serrano Cartagena, A.; Lara Martínez, G.; Gosálbez Castillo, J.; Vergara Domínguez, L. (2012). Detection of acoustic events with application to environment monitoring. Waves. 4:25-33. http://hdl.handle.net/10251/56161
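A basic version of the energy detector that this article extends can be sketched as follows: frame the signal, estimate a noise floor from frames assumed to contain only background noise, and flag frames whose energy exceeds an adaptive threshold. The frame size, hop, and threshold factor are illustrative values, not the article's settings.

```python
import numpy as np

def energy_detector(x, frame_len=1024, hop=512, alpha=3.0, noise_frames=20):
    """Flag frames whose energy exceeds an adaptive noise-floor threshold.

    Assumes the first `noise_frames` frames are noise-only, from which the
    noise energy statistics (mean and standard deviation) are estimated.
    """
    frames = np.lib.stride_tricks.sliding_window_view(x, frame_len)[::hop]
    energy = np.sum(frames ** 2, axis=1)
    mu, sigma = energy[:noise_frames].mean(), energy[:noise_frames].std()
    return energy > mu + alpha * sigma  # boolean mask of detected events

# Illustrative use: noise with a short burst injected mid-signal.
signal = np.random.randn(48000)
signal[24000:26000] += 5.0 * np.random.randn(2000)
print(np.flatnonzero(energy_detector(signal)))  # frames flagged as events
```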
Robust Audio and WiFi Sensing via Domain Adaptation and Knowledge Sharing From External Domains
Recent advancements in machine learning have initiated a revolution in embedded sensing and inference systems. Acoustic and WiFi-based sensing and inference systems have enabled a wide variety of applications, ranging from home activity detection to health vitals monitoring. While many existing solutions paved the way for acoustic event recognition and WiFi-based activity detection, the diverse characteristics of the sensors, systems, and environments used for data capture cause a shift in the distribution of data and thus result in sub-optimal classification performance when a sensor or environment discrepancy occurs between the training and inference stages. Moreover, large-scale acoustic and WiFi data collection is non-trivial and cumbersome. Therefore, current acoustic and WiFi-based sensing systems suffer when there is a lack of labeled samples, as they rely only on the provided training data. In this thesis, we aim to address the performance loss of machine learning-based classifiers for acoustic and WiFi-based sensing systems due to sensor and environment heterogeneity and the lack of labeled examples. We show that discovering latent domains (sensor type, environment, etc.) and removing domain bias from machine learning classifiers makes acoustic and WiFi-based sensing robust and generalized. We also propose a few-shot domain adaptation method that requires only one labeled sample for a new domain, relieving users and developers from the painstaking task of data collection in each new domain. Furthermore, to address the lack of labeled examples, we propose to exploit information or learned knowledge from sources where data already exists in volume, such as textual descriptions and the visual domain. We implemented our algorithms on mobile and embedded platforms and collected data from participants to evaluate our proposed algorithms and frameworks extensively.
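The thesis's own method is not reproduced in this abstract, but one common way to remove domain bias from a classifier, gradient-reversal (DANN-style) training, can be sketched as follows. The layer sizes, class count, and domain count are placeholder assumptions, not the thesis design.

```python
import torch
from torch import nn
from torch.autograd import Function

class GradReverse(Function):
    """Identity on the forward pass; gradient scaled by -lambda backward,
    so the encoder learns features the domain head cannot exploit."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU())  # shared features
label_head = nn.Linear(64, 10)   # activity / event classes (assumed)
domain_head = nn.Linear(64, 4)   # latent domains: sensor, room, ... (assumed)

x = torch.randn(32, 128)         # placeholder feature vectors
z = encoder(x)
y_logits = label_head(z)                            # trained to classify events
d_logits = domain_head(GradReverse.apply(z, 1.0))   # adversarial domain branch
```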
A pervasive body sensor network for monitoring post-operative recovery
Over the past decade, miniaturisation and cost reduction brought about by the semiconductor industry have led to computers smaller in size than a pin head, powerful enough to carry out the processing required, and affordable enough to be disposable. Similar technological advances in wireless communication, sensor design, and energy storage have resulted in the development of wireless "Body Sensor Network" (BSN) platforms comprising tiny integrated microsensors with onboard processing and wireless data transfer capability, offering the prospect of pervasive and continuous home health monitoring. In surgery, the reduced trauma of minimally invasive interventions, combined with initiatives to reduce length of hospital stay and a socioeconomic drive to reduce hospitalisation costs, has resulted in a trend towards earlier discharge from hospital. There is now a real need for objective, pervasive, and continuous post-operative home recovery monitoring systems. Surgical recovery is a multi-faceted and dynamic process involving biological, physiological, functional, and psychological components. Functional recovery (physical independence, activities of daily living, and mobility) is recognised as a good global indicator of a patient's post-operative course, but has traditionally been difficult to objectively quantify. This thesis outlines the development of a pervasive wireless BSN system to objectively monitor the functional recovery of post-operative patients at home. Biomechanical markers were identified as surrogate measures for activities of daily living and mobility impairment, and an ear-worn activity recognition (e-AR) sensor containing a three-axis accelerometer and a pulse oximeter was used to collect this data. A simulated home environment was created to test a Bayesian classifier framework with multivariate Gaussians to model activity classes. A real-time activity index was used to provide information on the intensity of activity being performed. Mobility impairment was simulated with bracing systems, and a multiresolution wavelet analysis and margin-based feature selection framework was used to detect impaired mobility. The e-AR sensor was tested in a home environment before its clinical use in monitoring the post-operative home recovery of real patients who had undergone surgery. Such a system may eventually form part of an objective pervasive home recovery monitoring system tailored to the needs of today's post-operative patient.
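The "multivariate Gaussians per activity class" framework described above corresponds to quadratic discriminant analysis, which fits one Gaussian per class and classifies via Bayes' rule. The following sketch uses scikit-learn's QDA on placeholder features standing in for windowed e-AR accelerometer data; the feature layout and labels are assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# Illustrative stand-in: windowed accelerometer features (e.g., mean,
# variance, and spectral energy per axis) with activity labels such as
# walking, sitting, or climbing stairs.
X = np.random.randn(300, 9)
y = np.random.randint(0, 4, 300)

# QDA fits a multivariate Gaussian per class and applies Bayes' rule,
# matching the classifier framework the abstract describes.
model = QuadraticDiscriminantAnalysis(store_covariance=True)
model.fit(X, y)
print(model.predict_proba(X[:3]))  # posterior probability per activity class
```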
State of the art of audio- and video based solutions for AAL
Working Group 3. Audio- and Video-based AAL Applications
It is a matter of fact that Europe is facing more and more crucial challenges regarding health and social care due to the demographic change and the current economic context. The recent COVID-19 pandemic has stressed this situation even further, thus highlighting the need for taking action. Active and Assisted Living (AAL) technologies come as a viable approach to help facing these challenges, thanks to the high potential they have in enabling remote care and support. Broadly speaking, AAL can be referred to as the use of innovative and advanced Information and Communication Technologies to create supportive, inclusive and empowering applications and environments that enable older, impaired or frail people to live independently and stay active longer in society. AAL capitalizes on the growing pervasiveness and effectiveness of sensing and computing facilities to supply the persons in need with smart assistance, by responding to their necessities of autonomy, independence, comfort, security and safety. The application scenarios addressed by AAL are complex, due to the inherent heterogeneity of the end-user population, their living arrangements, and their physical conditions or impairment. Despite aiming at diverse goals, AAL systems should share some common characteristics. They are designed to provide support in daily life in an invisible, unobtrusive and user-friendly manner. Moreover, they are conceived to be intelligent, to be able to learn and adapt to the requirements and requests of the assisted people, and to synchronise with their specific needs. Nevertheless, to ensure the uptake of AAL in society, potential users must be willing to use AAL applications and to integrate them in their daily environments and lives. In this respect, video- and audio-based AAL applications have several advantages, in terms of unobtrusiveness and information richness. Indeed, cameras and microphones are far less obtrusive with respect to the hindrance other wearable sensors may cause to one's activities. In addition, a single camera placed in a room can record most of the activities performed in the room, thus replacing many other non-visual sensors. Currently, video-based applications are effective in recognising and monitoring the activities, the movements, and the overall conditions of the assisted individuals as well as to assess their vital parameters (e.g., heart rate, respiratory rate). Similarly, audio sensors have the potential to become one of the most important modalities for interaction with AAL systems, as they can have a large range of sensing, do not require physical presence at a particular location and are physically intangible. Moreover, relevant information about individuals' activities and health status can derive from processing audio signals (e.g., speech recordings). Nevertheless, as the other side of the coin, cameras and microphones are often perceived as the most intrusive technologies from the viewpoint of the privacy of the monitored individuals. This is due to the richness of the information these technologies convey and the intimate setting where they may be deployed. Solutions able to ensure privacy preservation by context and by design, as well as to ensure high legal and ethical standards are in high demand. After the review of the current state of play and the discussion in GoodBrother, we may claim that the first solutions in this direction are starting to appear in the literature.
A multidisciplinary debate among experts and stakeholders is paving the way towards AAL that ensures ergonomics, usability, acceptance and privacy preservation. The DIANA, PAAL, and VisuAAL projects are examples of this fresh approach.
This report provides the reader with a review of the most recent advances in audio- and video-based monitoring technologies for AAL. It has been drafted as a collective effort of WG3 to supply an introduction to AAL, its evolution over time and its main functional and technological underpinnings. In this respect, the report contributes to the field with the outline of a new generation of ethical-aware AAL technologies and a proposal for a novel comprehensive taxonomy of AAL systems and applications. Moreover, the report allows non-technical readers to gather an overview of the main components of an AAL system and how these function and interact with the end-users.
The report illustrates the state of the art of the most successful AAL applications and functions based on audio and video data, namely (i) lifelogging and self-monitoring, (ii) remote monitoring of vital signs, (iii) emotional state recognition, (iv) food intake monitoring, activity and behaviour recognition, (v) activity and personal assistance, (vi) gesture recognition, (vii) fall detection and prevention, (viii) mobility assessment and frailty recognition, and (ix) cognitive and motor rehabilitation. For these application scenarios, the report illustrates the state of play in terms of scientific advances, available products, and research projects. The open challenges are also highlighted.
The report ends with an overview of the challenges, the hindrances and the opportunities posed by the uptake of AAL technologies in real-world settings. In this respect, the report illustrates the current procedural and technological approaches to cope with acceptability, usability and trust in AAL technology, by surveying strategies and approaches to co-design, privacy preservation in video and audio data, transparency and explainability in data processing, and data transmission and communication. User acceptance and ethical considerations are also debated. Finally, the potential arising from the silver economy is overviewed.