2,305 research outputs found

    I hear you eat and speak: automatic recognition of eating condition and food type, use-cases, and impact on ASR performance

    We propose a new recognition task in the area of computational paralinguistics: automatic recognition of eating conditions in speech, i.e., whether people are eating while speaking, and what they are eating. To this end, we introduce the audio-visual iHEARu-EAT database, featuring 1.6k utterances of 30 subjects (mean age: 26.1 years, standard deviation: 2.66 years, gender balanced, German speakers), six types of food (Apple, Nectarine, Banana, Haribo Smurfs, Biscuit, and Crisps), and read as well as spontaneous speech; the database is made publicly available for research purposes. We start by demonstrating that for automatic speech recognition (ASR), it pays off to know whether speakers are eating or not. We then propose automatic classification using both brute-forced low-level acoustic features and higher-level features related to intelligibility, obtained from an automatic speech recogniser. Prediction of the eating condition was performed with a Support Vector Machine (SVM) classifier in a leave-one-speaker-out evaluation framework. Results show that the binary prediction of eating condition (i.e., eating or not eating) can be solved easily, independently of the speaking condition; the obtained average recalls are all above 90%. Low-level acoustic features provide the best performance on spontaneous speech, reaching up to 62.3% average recall for multi-way classification of the eating condition, i.e., discriminating the six types of food as well as not eating. Early fusion of the intelligibility-related features with the brute-forced acoustic feature set improves performance on read speech, reaching 66.4% average recall for the multi-way classification task. Analysing features and classifier errors leads to a suitable ordinal scale for eating conditions, on which automatic regression can be performed with a determination coefficient of up to 56.2%.
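    The leave-one-speaker-out SVM evaluation described above can be sketched with scikit-learn as follows; the feature matrix, labels, and speaker IDs are random placeholders rather than the iHEARu-EAT data, and "average recall" is computed as the unweighted average recall (UAR).

```python
# Sketch of leave-one-speaker-out SVM classification with unweighted
# average recall (UAR). Data are random placeholders, not iHEARu-EAT features.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))             # acoustic feature vectors (placeholder)
y = rng.integers(0, 7, size=300)           # 6 food classes + "not eating"
speakers = rng.integers(0, 30, size=300)   # speaker ID per utterance

clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
predictions = np.empty_like(y)
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=speakers):
    clf.fit(X[train_idx], y[train_idx])
    predictions[test_idx] = clf.predict(X[test_idx])

# Unweighted mean of per-class recalls (UAR).
uar = recall_score(y, predictions, average="macro")
print(f"UAR: {uar:.3f}")
```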

    DETECTION OF HEALTH-RELATED BEHAVIOURS USING HEAD-MOUNTED DEVICES

    The detection of health-related behaviors is the basis of many mobile-sensing applications for healthcare and can trigger other inquiries or interventions. Wearable sensors have been widely used for mobile sensing due to their ever-decreasing cost, ease of deployment, and ability to provide continuous monitoring. In this dissertation, we develop a generalizable approach to sensing eating-related behavior. First, we developed Auracle, a wearable earpiece that can automatically detect eating episodes. Using an off-the-shelf contact microphone placed behind the ear, Auracle captures the sound of a person chewing as it passes through the head. This audio data is then processed by a custom circuit board. We collected data with 14 participants for 32 hours in free-living conditions and achieved accuracy exceeding 92.8% and an F1 score exceeding 77.5% for eating detection with 1-minute resolution. Second, we adapted Auracle for measuring children’s eating behavior and improved the accuracy and robustness of the eating-activity detection algorithms. We used this improved prototype in a laboratory study with a sample of 10 children for 60 total sessions and collected 22.3 hours of data in both meal and snack scenarios. Overall, we achieved 95.5% accuracy and a 95.7% F1 score for eating detection with 1-minute resolution. Third, we developed a computer-vision approach for eating detection in free-living scenarios. Using a miniature head-mounted camera, we collected data with 10 participants for about 55 hours. The camera was fixed under the brim of a cap, pointing at the mouth of the wearer and continuously recording video (but not audio) throughout their normal daily activities. We evaluated performance for eating detection using four different Convolutional Neural Network (CNN) models. The best model achieved 90.9% accuracy and a 78.7% F1 score for eating detection with 1-minute resolution. Finally, we validated the feasibility of deploying the 3D CNN model on wearable or mobile platforms when considering computation, memory, and power constraints.
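    The accuracy and F1 figures above are reported at 1-minute resolution. One plausible way to obtain minute-level decisions from finer-grained predictions is to majority-vote within each minute, as in the sketch below; the threshold and data are illustrative assumptions, not the actual Auracle pipeline.

```python
# Illustrative aggregation of per-second eating/non-eating predictions into
# 1-minute decisions, then accuracy/F1 -- not the actual Auracle pipeline.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def to_minute_labels(per_second, threshold=0.5):
    """Label a minute as 'eating' if at least `threshold` of its seconds are."""
    n_minutes = len(per_second) // 60
    minutes = per_second[: n_minutes * 60].reshape(n_minutes, 60)
    return (minutes.mean(axis=1) >= threshold).astype(int)

rng = np.random.default_rng(1)
truth_sec = rng.integers(0, 2, size=3600)                             # placeholder ground truth (1 h)
pred_sec = np.clip(truth_sec + rng.integers(-1, 2, size=3600), 0, 1)  # noisy predictions

truth_min = to_minute_labels(truth_sec)
pred_min = to_minute_labels(pred_sec)
print("accuracy:", accuracy_score(truth_min, pred_min))
print("F1:", f1_score(truth_min, pred_min))
```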

    A statistical analysis of cervical auscultation signals from adults with unsafe airway protection

    Background: Aspiration, where food or liquid enters the larynx during a swallow, is recognized as the most clinically salient feature of oropharyngeal dysphagia. This event can lead to short-term harm via airway obstruction or to longer-term effects such as pneumonia. In order to non-invasively identify this event using high-resolution cervical auscultation, there is a need to characterize cervical auscultation signals from subjects with dysphagia who aspirate. Methods: In this study, we collected swallowing sound and vibration data from 76 adults (50 men, 26 women, mean age 62) who underwent a routine videofluoroscopy swallowing examination. The analysis was limited to swallows of liquid with either thin (<5 cps) or viscous (≈300 cps) consistency; swallows were divided, using a standardized scale, into those with deep laryngeal penetration or aspiration (unsafe airway protection) and those with shallow or no laryngeal penetration (safe airway protection). After calculating a selection of time, frequency, and time-frequency features for each swallow, the safe and unsafe categories were compared using Wilcoxon rank-sum tests. Results: Our analysis found that few of the chosen features differed in magnitude between safe and unsafe swallows, with thin-liquid swallows showing no statistically significant differences. We also corroborated our past findings regarding the effects of sex and the presence or absence of stroke on cervical auscultation signals, but noticed certain discrepancies with regard to bolus viscosity. Conclusions: Overall, our results support the necessity of using multiple statistical features concurrently to identify laryngeal penetration of swallowed boluses in future work with high-resolution cervical auscultation.
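    A minimal sketch of the per-feature comparison described in the Methods, using SciPy's Wilcoxon rank-sum test; the feature names, values, and group sizes below are synthetic placeholders, not the recorded swallowing signals.

```python
# Compare each swallow feature between safe and unsafe groups with a
# Wilcoxon rank-sum test. Feature values are synthetic placeholders.
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(2)
features = {
    "duration_s":      (rng.normal(1.0, 0.2, 40), rng.normal(1.1, 0.2, 20)),
    "peak_freq_hz":    (rng.normal(600, 80, 40),  rng.normal(640, 90, 20)),
    "wavelet_entropy": (rng.normal(4.2, 0.5, 40), rng.normal(4.6, 0.5, 20)),
}

alpha = 0.05
for name, (safe, unsafe) in features.items():
    stat, p = ranksums(safe, unsafe)
    flag = "differs" if p < alpha else "no significant difference"
    print(f"{name}: p = {p:.3f} -> {flag}")
```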

    Advances in Management of Voice and Swallowing Disorders

    The Special Issue “Advances in Management of Voice and Swallowing Disorders” is dedicated to innovations in screening and assessment and to the effectiveness of interventions in both dysphonia and dysphagia. In contemporary practice, novel techniques have been introduced in diagnostics and rehabilitative interventions (e.g., machine learning, electrical stimulation). Similarly, advances in methodological approaches to validating measures have been introduced (e.g., item response theory using Rasch analysis), prompting the need to develop new, robust measures for use in clinics and intervention studies. Against this backdrop, this Special Issue focuses on studies aiming to improve early diagnosis of laryngological disorders and their management. The issue also welcomes submissions on the diagnostic accuracy and psychometric performance of existing and newly developed measures. This includes, but is not limited to, studies investigating screening tools with sound diagnostic accuracy and robust psychometric properties. Furthermore, interventions with high levels of evidence for clinical outcomes, supported by robust methodology (e.g., sophisticated meta-analytic approaches), are of great interest. This issue provides an overview of the latest advances in voice and swallowing disorders.

    Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture

    This paper presents a configurable version of the Extreme Bandwidth Extension Network (EBEN), a Generative Adversarial Network (GAN) designed to improve audio captured with body-conduction microphones. We show that although these microphones significantly reduce environmental noise, this insensitivity to ambient noise comes at the expense of the bandwidth of the speech signal acquired from the wearer of the device. The captured signals therefore require signal-enhancement techniques to recover full-bandwidth speech. EBEN leverages a configurable multiband decomposition of the raw captured signal. This decomposition reduces the time-domain dimension of the data and gives better control over the full-band signal. The multiband representation of the captured signal is processed by a U-Net-like model, which combines feature and adversarial losses to generate an enhanced speech signal. We also benefit from this representation in the proposed configurable discriminator architecture. The configurable EBEN approach can achieve state-of-the-art enhancement results on synthetic data with a lightweight generator that allows real-time processing. Comment: Accepted in IEEE/ACM Transactions on Audio, Speech and Language Processing on 14/08/202
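    As a rough illustration of the multiband idea, the toy two-band split below filters a signal and critically downsamples each band, halving the time dimension per band; this is only a generic QMF-style sketch under assumed parameters, not the filterbank or learned layers used in EBEN.

```python
# Toy two-band decomposition: each band is filtered and kept at every other
# sample, so the time axis is halved per band -- a simplified stand-in for
# the multiband front end described above, not EBEN's filterbank.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 300 * t) + 0.3 * np.sin(2 * np.pi * 3000 * t)  # stand-in "speech"

low_sos = butter(8, 0.5, btype="low", output="sos")    # cutoff at half Nyquist (fs/4)
high_sos = butter(8, 0.5, btype="high", output="sos")

# Critically sampled split: filter, then keep every other sample.
# (In a QMF-style bank the high band folds down to baseband via aliasing.)
low_band = sosfiltfilt(low_sos, x)[::2]
high_band = sosfiltfilt(high_sos, x)[::2]

bands = np.stack([low_band, high_band])    # shape: (2 bands, len(x) // 2)
print(bands.shape)                         # time axis halved, band axis added
```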

    Fog Computing in Medical Internet-of-Things: Architecture, Implementation, and Applications

    In an era when the Internet of Things (IoT) market segment tops the chart in various business reports, the field of medicine is expected to gain a large benefit from the explosion of wearables and internet-connected sensors that surround us, acquiring and communicating unprecedented data on symptoms, medication, food intake, and daily-life activities impacting one’s health and wellness. However, IoT-driven healthcare has to overcome many barriers, such as: 1) there is an increasing demand for data storage on cloud servers, where the analysis of medical big data becomes increasingly complex; 2) the data, when communicated, are vulnerable to security and privacy issues; 3) the communication of continuously collected data is not only costly but also energy hungry; 4) operating and maintaining the sensors directly from the cloud servers are non-trivial tasks. This book chapter defines Fog Computing in the context of medical IoT. Conceptually, Fog Computing is a service-oriented intermediate layer in IoT, providing the interfaces between the sensors and cloud servers for facilitating connectivity, data transfer, and a queryable local database. The centerpiece of Fog Computing is a low-power, intelligent, wireless, embedded computing node that carries out signal conditioning and data analytics on raw data collected from wearables or other medical sensors and offers efficient means to serve telehealth interventions. We implemented and tested a fog computing system using the Intel Edison and Raspberry Pi that allows acquisition, computation, storage, and communication of various medical data, such as pathological speech data of individuals with speech disorders, phonocardiogram (PCG) signals for heart rate estimation, and electrocardiogram (ECG)-based Q, R, S detection. Comment: 29 pages, 30 figures, 5 tables. Keywords: Big Data, Body Area Network, Body Sensor Network, Edge Computing, Fog Computing, Medical Cyberphysical Systems, Medical Internet-of-Things, Telecare, Tele-treatment, Wearable Devices. Chapter in Handbook of Large-Scale Distributed Computing in Smart Healthcare (2017), Springer.
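    As one concrete example of the on-node analytics mentioned above, the sketch below estimates heart rate from an ECG trace by simple R-peak picking; the waveform is synthetic and the detector generic, not the code deployed on the Intel Edison or Raspberry Pi.

```python
# Minimal on-node analytic: estimate heart rate from an ECG trace by picking
# R peaks. Synthetic waveform and generic detector, not the actual fog node code.
import numpy as np
from scipy.signal import find_peaks

fs = 250                                                      # ECG sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)                                  # 10 s of signal
ecg = 0.05 * np.random.default_rng(3).normal(size=t.size)     # baseline noise
ecg[np.arange(0, t.size, int(0.8 * fs))] += 1.0               # fake R spikes every 0.8 s

# R peaks: tall and at least 0.4 s apart (refractory period).
peaks, _ = find_peaks(ecg, height=0.5, distance=int(0.4 * fs))
rr = np.diff(peaks) / fs                                      # R-R intervals in seconds
print(f"estimated heart rate: {60.0 / rr.mean():.1f} bpm")    # ~75 bpm
```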

    Communication system for a tooth-mounted RF sensor used for continuous monitoring of nutrient intake

    In this thesis, the communication system of a wearable device that monitors the user’s diet is studied. Based on a novel RF metamaterial-based mouth sensor, different decisions have to be made concerning the system’s technologies, such as the power source options for the device, the wireless technology used for communications, and the method for obtaining data from the sensor. These issues, along with relevant safety rules and regulations, are reviewed as the first stage of development of the Food-Intake Monitoring project.

    Cervical Auscultation for the Identification of Swallowing Difficulties

    Swallowing difficulties, commonly referred to as dysphagia, affect thousands of Americans every year. They have a multitude of causes, but in general they are known to increase the risk of aspiration when swallowing, in addition to other physiological effects. Cervical auscultation has recently been applied to detect such difficulties non-invasively, and various techniques for analysis and processing of the recorded signals have been proposed. We attempted to further this research in three key areas. First, we characterized swallows with respect to a multitude of time, frequency, and time-frequency features, paying special attention to the differences between swallows from healthy adults and safe dysphagic swallows, as well as between safe and unsafe dysphagic swallows. Second, we attempted to utilize deep belief networks to classify these states automatically and without the aid of a concurrent videofluoroscopic examination. Finally, we sought to improve some of the signal processing techniques used in this field: we implemented the DBSCAN algorithm to better segment our physiological signals, and we applied the matched complex wavelet transform to cervical auscultation data in order to improve its quality for mathematical analysis.
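    The DBSCAN-based segmentation step can be illustrated roughly as below: time indices of high-amplitude samples are clustered into contiguous segments. The amplitude threshold and DBSCAN parameters are illustrative assumptions, not those used in the dissertation.

```python
# Illustrative use of DBSCAN to cluster the time indices of high-amplitude
# samples into contiguous segments (e.g., candidate swallows). Signal,
# threshold, and DBSCAN parameters are placeholders.
import numpy as np
from sklearn.cluster import DBSCAN

fs = 4000
t = np.arange(0, 5, 1 / fs)
signal = 0.02 * np.random.default_rng(4).normal(size=t.size)
signal[int(1.0 * fs):int(1.4 * fs)] += np.sin(2 * np.pi * 100 * t[:int(0.4 * fs)])
signal[int(3.0 * fs):int(3.3 * fs)] += np.sin(2 * np.pi * 150 * t[:int(0.3 * fs)])

active = np.flatnonzero(np.abs(signal) > 0.2)      # high-amplitude sample indices
labels = DBSCAN(eps=0.05 * fs, min_samples=50).fit_predict(active.reshape(-1, 1))

for k in set(labels) - {-1}:                        # -1 marks noise points
    seg = active[labels == k]
    print(f"segment {k}: {seg[0] / fs:.2f}s - {seg[-1] / fs:.2f}s")
```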

    Multi-modal and multi-dimensional biomedical image data analysis using deep learning

    There is a growing need for the development of computational methods and tools for automated, objective, and quantitative analysis of biomedical signal and image data to facilitate disease and treatment monitoring, early diagnosis, and scientific discovery. Recent advances in artificial intelligence and machine learning, particularly in deep learning, have revolutionized computer vision and image analysis for many application areas. While processing of non-biomedical signal, image, and video data using deep learning methods has been very successful, high-stakes biomedical applications present unique challenges, such as different image modalities, limited training data, and the need for explainability and interpretability, that must be addressed. In this dissertation, we developed novel, explainable, and attention-based deep learning frameworks for objective, automated, and quantitative analysis of biomedical signal, image, and video data. The proposed solutions involve multi-scale signal analysis for oral diadochokinesis studies; an ensemble of deep learning cascades using global soft attention mechanisms for segmentation of meningeal vascular networks in confocal microscopy; spatial attention and spatio-temporal data fusion for detection of rare and short-term video events in laryngeal endoscopy videos; and a novel discrete Fourier transform driven class activation map for explainable AI and weakly supervised object localization and segmentation for detailed vocal fold motion analysis using laryngeal endoscopy videos. Experiments on the proposed methods showed robust and promising results toward automated, objective, and quantitative analysis of biomedical data, which is of great value for early diagnosis and effective monitoring of disease progression or treatment.
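    For orientation, a standard class activation map (CAM) is a weighted sum of the last convolutional feature maps, with weights taken from the classifier layer for the target class; the sketch below shows only this generic baseline on placeholder arrays, not the proposed DFT-driven variant.

```python
# Standard class activation map (CAM): weight the last convolutional feature
# maps by the classifier weights of the target class and sum. Generic baseline
# on random placeholder arrays, not the DFT-driven CAM proposed above.
import numpy as np

rng = np.random.default_rng(5)
feature_maps = rng.normal(size=(256, 14, 14))    # (channels, H, W) from last conv layer
fc_weights = rng.normal(size=(10, 256))          # (classes, channels) of the final FC layer
target_class = 3

cam = np.tensordot(fc_weights[target_class], feature_maps, axes=1)  # (14, 14)
cam = np.maximum(cam, 0)                          # keep positive evidence only
cam = cam / (cam.max() + 1e-8)                    # normalize to [0, 1]
print(cam.shape, cam.min(), cam.max())
```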