100 research outputs found

    Attention-based atrous convolutional neural networks: visualisation and understanding perspectives of acoustic scenes

    Get PDF
    The goal of Acoustic Scene Classification (ASC) is to recognise the environment in which an audio waveform has been recorded. Recently, deep neural networks have been applied to ASC and have achieved state-of-the-art performance. However, few works have investigated how to visualise and understand what a neural network has learnt from acoustic scenes. Previous work applied local pooling after each convolutional layer, thereby reducing the size of the feature maps. In this paper, we suggest that local pooling is not necessary, but that the size of the receptive field is important. We apply atrous Convolutional Neural Networks (CNNs) with global attention pooling as the classification model. The internal feature maps of the attention model can be visualised and explained. On the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 dataset, our proposed method achieves an accuracy of 72.7%, significantly outperforming CNNs without dilation at 60.4%. Furthermore, our results demonstrate that the learnt feature maps contain rich information on acoustic scenes in the time-frequency domain.
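
    To make the approach concrete, below is a minimal sketch of an atrous CNN with global attention pooling over the time-frequency feature maps. PyTorch is assumed (the abstract names no framework), and the layer sizes, class count and input shape are illustrative assumptions, not the paper's exact configuration.

    import torch
    import torch.nn as nn

    class AtrousAttentionASC(nn.Module):
        def __init__(self, n_classes=10):
            super().__init__()
            # Dilated (atrous) convolutions enlarge the receptive field
            # without local pooling, preserving feature-map resolution.
            self.features = nn.Sequential(
                nn.Conv2d(1, 64, 3, padding=1, dilation=1), nn.BatchNorm2d(64), nn.ReLU(),
                nn.Conv2d(64, 64, 3, padding=2, dilation=2), nn.BatchNorm2d(64), nn.ReLU(),
                nn.Conv2d(64, 64, 3, padding=4, dilation=4), nn.BatchNorm2d(64), nn.ReLU(),
            )
            self.cla = nn.Conv2d(64, n_classes, 1)  # per-bin class scores
            self.att = nn.Conv2d(64, n_classes, 1)  # per-bin attention logits

        def forward(self, x):
            h = self.features(x)                                 # (B, 64, T, F)
            cla = self.cla(h).flatten(2)                         # (B, C, T*F)
            att = torch.softmax(self.att(h).flatten(2), dim=-1)  # attention over all bins
            logits = (cla * att).sum(-1)                         # global attention pooling
            return logits, att                                   # att can be visualised

    model = AtrousAttentionASC()
    logits, att = model(torch.randn(2, 1, 64, 64))  # (batch, channel, time, mel bins)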

    Towards using Cough for Respiratory Disease Diagnosis by leveraging Artificial Intelligence: A Survey

    Full text link
    Cough acoustics contain a multitude of vital information about pathomorphological alterations in the respiratory system. Reliable and accurate detection of cough events, investigation of the underlying latent cough features, and disease diagnosis can play an indispensable role in revitalizing healthcare practices. The recent application of Artificial Intelligence (AI) and advances in ubiquitous computing for respiratory disease prediction have created an auspicious trend and a myriad of future possibilities in the medical domain. In particular, there is an expeditiously emerging trend of Machine Learning (ML) and Deep Learning (DL)-based diagnostic algorithms exploiting cough signatures. The enormous body of literature on cough-based AI algorithms demonstrates that these models can play a significant role in detecting the onset of a specific respiratory disease. However, it is pertinent to collect the information from all relevant studies in an exhaustive manner for medical experts and AI scientists to analyze the decisive role of AI/ML. This survey offers a comprehensive overview of cough data-driven ML/DL detection and preliminary diagnosis frameworks, along with a detailed list of significant features. We investigate the mechanism that causes cough and the latent cough features of the respiratory modalities. We also analyze customized cough monitoring applications and their AI-powered recognition algorithms. Challenges and prospective future research directions for developing practical, robust, and ubiquitous solutions are also discussed in detail. Comment: 30 pages, 12 figures, 9 tables.
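
    As a hedged illustration of the kind of acoustic features such cough pipelines typically exploit, the sketch below extracts MFCCs and two spectral descriptors with librosa and pools them into one fixed-length vector per recording, a common input format for classical ML classifiers. The file name is hypothetical and the feature set is an assumption, not the survey's definitive list.

    import numpy as np
    import librosa

    # Hypothetical input file; any mono cough recording would do.
    y, sr = librosa.load("cough_recording.wav", sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # (13, frames)
    zcr = librosa.feature.zero_crossing_rate(y)               # (1, frames)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # (1, frames)

    # Summarise frame-level features into a single fixed-length vector.
    feature_vector = np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),
        zcr.mean(axis=1), centroid.mean(axis=1),
    ])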

    Deep sleep: deep learning methods for the acoustic analysis of sleep-disordered breathing

    Get PDF
    Sleep-disordered breathing (SDB) is a serious and prevalent condition that results from the collapse of the upper airway during sleep, which leads to oxygen desaturations, unphysiological variations in intrathoracic pressure, and sleep fragmentation. Its most common form is obstructive sleep apnoea (OSA). This has a considerable impact on quality of life and is associated with cardiovascular morbidity. Polysomnography, the gold standard for diagnosing SDB, is obtrusive, time-consuming and expensive. Alternative diagnostic approaches have been proposed to overcome its limitations. In particular, acoustic analysis of sleep breathing sounds offers an unobtrusive and inexpensive means to screen for SDB, since the condition manifests symptoms with unique acoustic characteristics. These include snoring, loud gasps, chokes, and absence of breathing. This thesis investigates deep learning methods, which have revolutionised speech and audio technology, to robustly screen for SDB in typical sleep conditions using acoustics. To begin with, the desirable characteristics of an acoustic corpus of SDB and the acoustic definition of snoring are considered in order to create corpora for this study. Then three approaches are developed to tackle increasingly complex scenarios. Firstly, with the aim of leveraging a large amount of unlabelled SDB data, unsupervised learning is applied to learn novel feature representations with deep neural networks for the classification of SDB events such as snoring. The incorporation of contextual information to assist the classifier in producing realistic event durations is investigated. Secondly, the temporal pattern of sleep breathing sounds is exploited using convolutional neural networks to screen participants sleeping by themselves for OSA. The integration of acoustic features with physiological data for screening is examined. Thirdly, to achieve robustness to bed partner breathing sounds, recurrent neural networks are used to screen a subject and their bed partner for SDB in the same session. Experiments conducted on the constructed corpora show that the developed systems accurately classify SDB events, screen for OSA with high sensitivity and specificity, and screen a subject and their bed partner for SDB with encouraging performance. In conclusion, this thesis makes promising progress in improving access to SDB diagnosis through low-cost and non-invasive methods.
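
    A minimal sketch of the first of the three approaches, unsupervised feature learning on unlabelled sleep audio: an autoencoder (PyTorch assumed) is trained on spectrogram frames, and its encoder output then serves as the feature representation for a downstream SDB event classifier. Dimensions and the toy training loop are illustrative assumptions, not the thesis's configuration.

    import torch
    import torch.nn as nn

    class FrameAutoencoder(nn.Module):
        def __init__(self, n_bins=64, n_hidden=32):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(n_bins, n_hidden), nn.ReLU())
            self.decoder = nn.Linear(n_hidden, n_bins)

        def forward(self, x):
            z = self.encoder(x)        # learnt feature representation
            return self.decoder(z), z

    ae = FrameAutoencoder()
    opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
    frames = torch.rand(256, 64)       # placeholder unlabelled spectrogram frames
    for _ in range(10):                # toy reconstruction training
        recon, _ = ae(frames)
        loss = nn.functional.mse_loss(recon, frames)
        opt.zero_grad(); loss.backward(); opt.step()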

    Detection of sleep disordered breathing severity using acoustic biomarker and machine learning techniques

    Get PDF
    Purpose: Breathing sounds during sleep are altered and characterized by various acoustic specificities in patients with sleep disordered breathing (SDB). This study aimed to identify acoustic biomarkers indicative of the severity of SDB by analyzing the breathing sounds collected from a large number of subjects during entire overnight sleep.
    Methods: The participants were patients who presented at a sleep center with snoring or cessation of breathing during sleep. They underwent full-night polysomnography (PSG), during which the breathing sound was recorded using a microphone. Audio features were then extracted, and a group of features differing significantly between SDB severity groups was selected as a potential acoustic biomarker. To assess the validity of the acoustic biomarker, classification tasks were performed using several machine learning techniques. Based on the apnea–hypopnea index of the subjects, four-group classification and binary classification were performed.
    Results: Using tenfold cross-validation, we achieved an accuracy of 88.3% in the four-group classification and an accuracy of 92.5% in the binary classification. Experimental evaluation demonstrated that the models trained on the proposed acoustic biomarkers can be used to estimate the severity of SDB.
    Conclusions: Acoustic biomarkers may be useful to accurately predict the severity of SDB based on patients' breathing sounds during sleep, without conducting attended full-night PSG. This study implies that any device with a microphone, such as a smartphone, could potentially be utilized outside specialized facilities as a screening tool for detecting SDB.
    The work was partly supported by the SNUBH Grant #06-2014-157 and the Bio and Medical Technology Development Program of the National Research Foundation (NRF) funded by the Korean government, Ministry of Science, ICT & Future Planning (MSIP) (NRF-2015M3A9D7066972, NRF-2015M3A9D7066980).
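
    A minimal sketch of the validation protocol described above, assuming scikit-learn: tenfold cross-validation of a classifier on per-subject acoustic feature vectors with AHI-derived severity labels. The random forest and the placeholder data are assumptions; the study does not specify its exact models.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    X = np.random.rand(200, 40)        # placeholder acoustic biomarker features
    y = np.random.randint(0, 4, 200)   # four AHI-based severity groups

    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    scores = cross_val_score(RandomForestClassifier(n_estimators=200), X, y, cv=cv)
    print(f"tenfold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")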

    CAA-Net: Conditional Atrous CNNs with attention for explainable device-robust acoustic scene classification

    Get PDF
    Acoustic Scene Classification (ASC) aims to classify the environment in which audio signals are recorded. Recently, Convolutional Neural Networks (CNNs) have been successfully applied to ASC. However, the data distributions of audio signals recorded with multiple devices are different. There has been little research on training robust neural networks on acoustic scene datasets recorded with multiple devices, or on explaining the operation of the internal layers of such networks. In this article, we focus on training and explaining device-robust CNNs on multi-device acoustic scene data. We propose conditional atrous CNNs with attention for multi-device ASC. Our proposed system contains an ASC branch and a device classification branch, both modelled by CNNs. We visualise and analyse the intermediate layers of the atrous CNNs. A time-frequency attention mechanism is employed to analyse the contribution of each time-frequency bin of the feature maps in the CNNs. On the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 ASC dataset, recorded with three devices, our proposed model performs significantly better than CNNs trained on single-device data. Comment: IEEE Transactions on Multimedia.
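
    A hedged sketch (PyTorch assumed) of the two-branch idea: a shared dilated-convolution trunk feeds both an acoustic-scene head and a device-classification head, trained jointly with a weighted multi-task loss. Global average pooling stands in here for the paper's time-frequency attention (sketched after the first abstract above), and all sizes and the loss weight are illustrative.

    import torch
    import torch.nn as nn

    class TwoBranchASC(nn.Module):
        def __init__(self, n_scenes=10, n_devices=3):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=2, dilation=2), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=4, dilation=4), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.scene_head = nn.Linear(32, n_scenes)    # ASC branch
            self.device_head = nn.Linear(32, n_devices)  # device classification branch

        def forward(self, x):
            h = self.trunk(x)
            return self.scene_head(h), self.device_head(h)

    model = TwoBranchASC()
    scene_logits, device_logits = model(torch.randn(4, 1, 64, 64))
    scene_y, device_y = torch.randint(0, 10, (4,)), torch.randint(0, 3, (4,))
    loss = nn.functional.cross_entropy(scene_logits, scene_y) \
         + 0.5 * nn.functional.cross_entropy(device_logits, device_y)  # weight is a free choice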

    State of the art of audio- and video-based solutions for AAL

    Get PDF
    Working Group 3. Audio- and Video-based AAL Applications
    It is a matter of fact that Europe is facing more and more crucial challenges regarding health and social care due to demographic change and the current economic context. The recent COVID-19 pandemic has stressed this situation even further, highlighting the need to take action. Active and Assisted Living (AAL) technologies come as a viable approach to help face these challenges, thanks to their high potential for enabling remote care and support. Broadly speaking, AAL can be referred to as the use of innovative and advanced Information and Communication Technologies to create supportive, inclusive and empowering applications and environments that enable older, impaired or frail people to live independently and stay active longer in society. AAL capitalizes on the growing pervasiveness and effectiveness of sensing and computing facilities to supply persons in need with smart assistance, by responding to their needs for autonomy, independence, comfort, security and safety. The application scenarios addressed by AAL are complex, due to the inherent heterogeneity of the end-user population, their living arrangements, and their physical conditions or impairments. Despite aiming at diverse goals, AAL systems should share some common characteristics. They are designed to provide support in daily life in an invisible, unobtrusive and user-friendly manner. Moreover, they are conceived to be intelligent, able to learn and adapt to the requirements and requests of the assisted people, and to synchronise with their specific needs.
    Nevertheless, to ensure the uptake of AAL in society, potential users must be willing to use AAL applications and to integrate them into their daily environments and lives. In this respect, video- and audio-based AAL applications have several advantages in terms of unobtrusiveness and information richness. Indeed, cameras and microphones are far less obtrusive than the hindrance other wearable sensors may cause to one's activities. In addition, a single camera placed in a room can record most of the activities performed in the room, thus replacing many other non-visual sensors. Currently, video-based applications are effective in recognising and monitoring the activities, the movements, and the overall conditions of the assisted individuals, as well as in assessing their vital parameters (e.g., heart rate, respiratory rate). Similarly, audio sensors have the potential to become one of the most important modalities for interaction with AAL systems, as they can have a large sensing range, do not require physical presence at a particular location and are physically intangible. Moreover, relevant information about individuals' activities and health status can be derived from processing audio signals (e.g., speech recordings). Nevertheless, as the other side of the coin, cameras and microphones are often perceived as the most intrusive technologies from the viewpoint of the privacy of the monitored individuals. This is due to the richness of the information these technologies convey and the intimate settings where they may be deployed. Solutions able to ensure privacy preservation by context and by design, as well as to ensure high legal and ethical standards, are in high demand. After the review of the current state of play and the discussion in GoodBrother, we may claim that the first solutions in this direction are starting to appear in the literature. A multidisciplinary debate among experts and stakeholders is paving the way towards AAL that ensures ergonomics, usability, acceptance and privacy preservation. The DIANA, PAAL, and VisuAAL projects are examples of this fresh approach.
    This report provides the reader with a review of the most recent advances in audio- and video-based monitoring technologies for AAL. It has been drafted as a collective effort of WG3 to supply an introduction to AAL, its evolution over time and its main functional and technological underpinnings. In this respect, the report contributes to the field with the outline of a new generation of ethical-aware AAL technologies and a proposal for a novel comprehensive taxonomy of AAL systems and applications. Moreover, the report allows non-technical readers to gather an overview of the main components of an AAL system and how these function and interact with the end-users. The report illustrates the state of the art of the most successful AAL applications and functions based on audio and video data, namely (i) lifelogging and self-monitoring, (ii) remote monitoring of vital signs, (iii) emotional state recognition, (iv) food intake monitoring, activity and behaviour recognition, (v) activity and personal assistance, (vi) gesture recognition, (vii) fall detection and prevention, (viii) mobility assessment and frailty recognition, and (ix) cognitive and motor rehabilitation. For these application scenarios, the report illustrates the state of play in terms of scientific advances, available products and research projects. The open challenges are also highlighted.
    The report ends with an overview of the challenges, the hindrances and the opportunities posed by the uptake of AAL technologies in real-world settings. In this respect, the report illustrates the current procedural and technological approaches to cope with acceptability, usability and trust in AAL technology, by surveying strategies and approaches to co-design, to privacy preservation in video and audio data, to transparency and explainability in data processing, and to data transmission and communication. User acceptance and ethical considerations are also debated. Finally, the potential arising from the silver economy is overviewed.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    Get PDF
    The proceedings of the MAVEBA Workshop, held biennially, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images, as a support to clinical diagnosis and the classification of vocal pathologies.

    Snoring and arousals in full-night polysomnographic studies from sleep apnea-hypopnea syndrome patients

    Get PDF
    SAHS (Sleep Apnea-Hypopnea Syndrome) is recognized to be a serious disorder with high prevalence in the population. The main clinical triad for SAHS is made up of three symptoms: apneas and hypopneas, chronic snoring, and excessive daytime sleepiness (EDS). The gold standard for diagnosing SAHS is an overnight polysomnographic study performed at the hospital, a laborious, expensive and time-consuming procedure in which multiple biosignals are recorded. In this thesis we offer improvements to the current approaches to the diagnosis and assessment of patients with SAHS. We demonstrate that snoring and arousals, while recognized key markers of SAHS, should be fully appreciated as essential tools for SAHS diagnosis.
    With respect to snoring analysis (applied to a database of 34 subjects with a total of 74439 snores), as an alternative to acoustic analysis we have used less complex approaches mostly based on time-domain parameters. We concluded that key information on SAHS severity can be extracted from the analysis of the time interval between successive snores. To this end, we built a new methodology that consists of applying an adaptive threshold to the whole-night sequence of time intervals between successive snores. This threshold enables the identification of regular and non-regular snores. Finally, we were able to correlate the variability of the time interval between successive snores, both in short 15-minute segments and throughout the whole night, with the subject's SAHS severity. Severe SAHS subjects show a shorter time interval between regular snores (p=0.0036, AHI cut-point: 30 h-1) and less dispersion in the time-interval features across the whole night of sleep. Conversely, lower intra-segment variability (p=0.006, AHI cut-point: 30 h-1) is seen for less severe SAHS subjects. We were also successful in classifying the subjects according to their SAHS severity using the features derived from the time interval between regular snores, with classification accuracy values of 88.2% (90% sensitivity, 75% specificity) and 94.1% (94.4% sensitivity, 93.8% specificity) for AHI severity cut-points of 5 and 30 h-1, respectively.
    Regarding the arousal study, our work focuses on respiratory and spontaneous arousals (45 subjects with a total of 2018 respiratory and 2001 spontaneous arousals). Current beliefs suggest that the former are the main cause of sleep fragmentation. Accordingly, sleep clinicians assign an important role to respiratory arousals when providing a final diagnosis on SAHS. Given that the two types of arousals are triggered by different mechanisms, we hypothesized that there might exist differences in their EEG content. After characterizing our arousal database through spectral analysis, results showed that the content of respiratory arousals in a mild SAHS subject is similar to that of a severe one (p>>0.05). Similar results were obtained for spontaneous arousals. Our findings also revealed that no differences are observed between the features of these two kinds of arousals in the same subject (r=0.8, p<0.01, with concordance confirmed by Bland-Altman analysis). As a result, we verified that each subject has something like a fingerprint or signature for the content of their arousals, and that it is similar for both types of arousals. In addition, this signature has no correlation with SAHS severity, which is confirmed for the three EEG tracings (C3A2, C4A1 and O1A2). Although the trigger mechanisms of the two arousals are known to be different, our results showed that the brain response is essentially the same for both. The impact that respiratory arousals have on the sleep of SAHS patients is unquestionable, but our findings suggest that the impact of spontaneous arousals should not be underestimated.
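
    A minimal sketch of the adaptive-threshold idea: a snore is labelled regular when the interval to the previous snore stays below a threshold adapted to the local interval statistics. The median-based rule and window length are assumptions, not the thesis's exact method.

    import numpy as np

    def regular_snores(snore_times, window=20, k=1.5):
        intervals = np.diff(snore_times)        # seconds between successive snores
        regular = np.zeros(len(intervals), dtype=bool)
        for i in range(len(intervals)):
            local = intervals[max(0, i - window):i + 1]
            threshold = k * np.median(local)    # adaptive threshold
            regular[i] = intervals[i] <= threshold
        return intervals, regular

    times = np.cumsum(np.random.uniform(2, 10, size=100))  # placeholder snore times
    intervals, regular = regular_snores(times)
    print(f"mean interval between regular snores: {intervals[regular].mean():.2f} s")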

    Exploring the Landscape of Ubiquitous In-home Health Monitoring: A Comprehensive Survey

    Full text link
    Ubiquitous in-home health monitoring systems have become popular in recent years due to the rise of digital health technologies and the growing demand for remote health monitoring. These systems enable individuals to increase their independence by allowing them to monitor their health from home and by giving them more control over their well-being. In this study, we perform a comprehensive survey of this topic by reviewing a large body of literature in the area. We investigate these systems from various aspects, namely sensing technologies, communication technologies, intelligent and computing systems, and application areas. Specifically, we provide an overview of in-home health monitoring systems and identify their main components. We then present each component and discuss its role within in-home health monitoring systems. In addition, we provide an overview of the practical use of ubiquitous technologies in the home for health monitoring. Finally, we identify the main challenges and limitations based on the existing literature and provide eight recommendations for potential future research directions. We conclude that, despite extensive research on the various components needed for effective in-home health monitoring systems, their development still requires further investigation. Comment: 35 pages, 5 figures.

    Networking Architecture and Key Technologies for Human Digital Twin in Personalized Healthcare: A Comprehensive Survey

    Full text link
    Digital twin (DT) refers to a promising technique to digitally and accurately represent actual physical entities. One typical advantage of DT is that it can be used not only to virtually replicate a system's detailed operations but also to analyze its current condition, predict future behaviour, and refine control optimization. Although DT has been widely implemented in various fields, such as smart manufacturing and transportation, its conventional paradigm is limited to embodying non-living entities, e.g., robots and vehicles. When adopted in human-centric systems, a novel concept, called the human digital twin (HDT), has thus been proposed. In particular, HDT allows in silico representation of the individual human body with the ability to dynamically reflect molecular status, physiological status, emotional and psychological status, as well as lifestyle evolutions. These capabilities prompt the expected application of HDT in personalized healthcare (PH), which can facilitate remote monitoring, diagnosis, prescription, surgery and rehabilitation. However, despite this large potential, HDT faces substantial research challenges in different aspects, and has recently become an increasingly popular topic. In this survey, with a specific focus on the networking architecture and key technologies for HDT in PH applications, we first discuss the differences between HDT and conventional DTs, followed by the universal framework and essential functions of HDT. We then analyze its design requirements and challenges in PH applications. After that, we provide an overview of the networking architecture of HDT, including the data acquisition layer, the data communication layer, the computation layer, the data management layer, and the data analysis and decision-making layer. Besides reviewing the key technologies for implementing such a networking architecture in detail, we conclude this survey by presenting future research directions for HDT.
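
    As a toy illustration only, the five layers named above can be pictured as a simple data flow; every name, payload and rule in this sketch is a placeholder, not a prescribed HDT implementation.

    from dataclasses import dataclass

    @dataclass
    class Sample:
        sensor: str
        value: float

    def acquire():              # data acquisition layer
        return [Sample("heart_rate", 72.0), Sample("spo2", 0.97)]

    def communicate(samples):   # data communication layer (e.g. BLE/5G transport)
        return samples

    def compute(samples):       # computation layer (edge/cloud processing)
        return {s.sensor: s.value for s in samples}

    def manage(record, store):  # data management layer
        store.append(record)
        return store

    def analyse(store):         # data analysis and decision-making layer
        return "alert" if store[-1]["heart_rate"] > 120 else "ok"

    print(analyse(manage(compute(communicate(acquire())), [])))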