
    Visual to Sound: Generating Natural Sound for Videos in the Wild

    As two of the five traditional human senses (sight, hearing, taste, smell, and touch), vision and sound are basic sources through which humans understand the world. Often correlated during natural events, these two modalities combine to jointly affect human perception. In this paper, we pose the task of generating sound given visual input. Such capabilities could help enable applications in virtual reality (generating sound for virtual scenes automatically) or provide additional accessibility to images or videos for people with visual impairments. As a first step in this direction, we apply learning-based methods to generate raw waveform samples given input video frames. We evaluate our models on a dataset of videos containing a variety of sounds (such as ambient sounds and sounds from people/animals). Our experiments show that the generated sounds are fairly realistic and have good temporal synchronization with the visual inputs.
    Comment: Project page: http://bvision11.cs.unc.edu/bigpen/yipin/visual2sound_webpage/visual2sound.htm
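    A minimal PyTorch sketch of the general idea — a per-frame visual encoder conditioning a recurrent decoder that emits blocks of raw waveform samples — is given below. The module layout, feature sizes, and the 30 fps / 22.05 kHz rates are illustrative assumptions for this sketch, not the authors' architecture.

        # Sketch of frame-conditioned waveform generation (illustrative only).
        import torch
        import torch.nn as nn

        class Frame2Wave(nn.Module):
            def __init__(self, feat_dim=512, hidden=256, samples_per_frame=735):
                super().__init__()
                # Tiny stand-in for a per-frame visual encoder (a pretrained
                # CNN would normally be used here).
                self.encoder = nn.Sequential(
                    nn.Conv2d(3, 32, 5, stride=4), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                    nn.Linear(32, feat_dim),
                )
                self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
                # Each frame feature decodes to a block of raw samples
                # (735 samples/frame ~ 22.05 kHz audio at 30 fps).
                self.to_samples = nn.Linear(hidden, samples_per_frame)

            def forward(self, frames):                # frames: (B, T, 3, H, W)
                B, T = frames.shape[:2]
                feats = self.encoder(frames.flatten(0, 1)).view(B, T, -1)
                h, _ = self.rnn(feats)                # temporal context across frames
                return self.to_samples(h).flatten(1)  # (B, T * samples_per_frame)

        wave = Frame2Wave()(torch.randn(2, 8, 3, 64, 64))  # -> shape (2, 5880)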

    Usefulness of Artificial Neural Networks in the Diagnosis and Treatment of Sleep Apnea-Hypopnea Syndrome

    Sleep apnea-hypopnea syndrome (SAHS) is a chronic and highly prevalent disease considered a major health problem in industrialized countries. The gold standard diagnostic methodology is in-laboratory nocturnal polysomnography (PSG), which is complex, costly, and time-consuming. In order to overcome these limitations, novel and simplified diagnostic alternatives are needed. Over the last decades, sleep scientists have carried out exhaustive research focused on the design of automated expert systems, derived from artificial intelligence, able to help sleep specialists in their daily practice. Among automated pattern recognition techniques, artificial neural networks (ANNs) have proven to be efficient and accurate algorithms for implementing computer-aided diagnosis systems aimed at assisting physicians in the management of SAHS. In this regard, several applications of ANNs have been developed, such as classification of patients suspected of suffering from SAHS, apnea-hypopnea index (AHI) prediction, detection and quantification of respiratory events, apneic event classification, automated sleep staging and arousal detection, alertness monitoring systems, and airflow pressure optimization in positive airway pressure (PAP) devices to fit patients’ needs. In the present work, current applications of ANNs in the framework of SAHS management are thoroughly reviewed.
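    As a hedged illustration of one application the review covers — flagging patients suspected of SAHS — the following scikit-learn sketch trains a small ANN on synthetic overnight-signal features. The feature set and the AHI >= 15 positive-class threshold are assumptions made for the example, not taken from any specific study.

        # Toy suspected-SAHS classifier; data, features, and threshold are
        # synthetic assumptions for illustration.
        import numpy as np
        from sklearn.neural_network import MLPClassifier
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        # Fake features: e.g. oxygen desaturation index, mean SpO2, age, BMI.
        X = rng.normal(size=(500, 4))
        ahi = np.clip(5 * X[:, 0] + rng.normal(scale=3, size=500) + 10, 0, None)
        y = (ahi >= 15).astype(int)     # positive = moderate-to-severe SAHS

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
        ann.fit(X_tr, y_tr)
        print(f"held-out accuracy: {ann.score(X_te, y_te):.2f}")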

    A review of automated sleep disorder detection

    Automated sleep disorder detection is challenging because physiological symptoms can vary widely. These variations make it difficult to create effective sleep disorder detection models which support human experts during diagnosis and treatment monitoring. From 2010 to 2021, the authors of 95 scientific papers have taken up the challenge of automating sleep disorder detection. This paper provides an expert review of this work. We investigated whether digital technology and Artificial Intelligence (AI) can provide automated diagnosis support for sleep disorders. We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines during the content discovery phase. We compared the performance of the proposed sleep disorder detection methods across different datasets and signals. During the review, we found eight sleep disorders, of which sleep apnea and insomnia were the most studied. These disorders can be diagnosed using several kinds of biomedical signals, such as Electrocardiogram (ECG), Polysomnography (PSG), Electroencephalogram (EEG), Electromyogram (EMG), and snore sound. Subsequently, we established areas of commonality and distinctiveness. Common to all reviewed papers was that AI models were trained and tested with labelled physiological signals. Looking deeper, we discovered that 24 distinct algorithms were used for the detection task. The nature of these algorithms has evolved: before 2017, only traditional Machine Learning (ML) was used, whereas from 2018 onward both ML and Deep Learning (DL) methods were used for sleep disorder detection. The strong emergence of DL algorithms has considerable implications for future detection systems, because these algorithms demand significantly more data for training and testing than ML. Based on our review results, we suggest that both the type and the amount of labelled data are crucial for the design of future sleep disorder detection systems, because they steer the choice of AI algorithm that establishes the desired decision support. As a guiding principle, more labelled data will help to represent the variations in symptoms. DL algorithms can extract information from these larger data quantities more effectively; therefore, we predict that the role of these algorithms will continue to expand.
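    The pattern common to all reviewed papers — an AI model trained and tested on labelled physiological-signal epochs — can be sketched as below. The random-forest baseline and hand-crafted epoch statistics stand in for the pre-2018 traditional-ML approaches; the data and features are illustrative placeholders.

        # Shared pipeline sketch: labelled signal epochs -> features -> ML model.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        def epoch_features(epochs):
            """Simple per-epoch statistics of a 1-D signal (e.g. single-lead ECG)."""
            return np.column_stack([
                epochs.mean(axis=1), epochs.std(axis=1),
                np.abs(np.diff(epochs, axis=1)).mean(axis=1),  # crude variability
            ])

        rng = np.random.default_rng(1)
        epochs = rng.normal(size=(600, 3000))  # 600 fake 30 s epochs @ 100 Hz
        labels = rng.integers(0, 2, size=600)  # 0 = normal, 1 = disordered

        scores = cross_val_score(RandomForestClassifier(random_state=0),
                                 epoch_features(epochs), labels, cv=5)
        print("cross-validated accuracy:", scores.mean())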

    Convolutional Neural Networks for Apnea Detection from Smartphone Audio Signals: Effect of Window Size

    Although sleep apnea is one of the most prevalent sleep disorders, most patients remain undiagnosed and untreated. The gold standard for sleep apnea diagnosis, polysomnography, has important limitations such as its high cost and complexity. This leads to a growing need for novel cost-effective systems. Mobile health tools and deep learning algorithms are nowadays being proposed as innovative solutions for automatic apnea detection. In this work, a convolutional neural network (CNN) is trained for the identification of apnea events from the spectrograms of audio signals recorded with a smartphone. A systematic comparison of the effect of different window sizes on the model performance is provided. According to the results, the best models are obtained with 60 s windows (sensitivity = 0.72, specificity = 0.89, AUROC = 0.88). For smaller windows, the model performance can be negatively impacted because the windows become shorter than most apnea events, so that the sound reductions can no longer be appreciated. On the other hand, longer windows tend to include multiple or mixed events, which will confound the model. This careful trade-off demonstrates the importance of selecting a proper window size to obtain models with adequate predictive power. This paper shows that CNNs applied to smartphone audio signals can facilitate sleep apnea detection in a realistic setting and is a first step towards an automated method to assist sleep technicians. Clinical Relevance: The results show the effect of the window size on the predictive power of CNNs for apnea detection. Furthermore, the potential of smartphones, audio signals, and deep neural networks for automatic sleep apnea screening is demonstrated.
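    A sketch of this pipeline — slicing the recording into fixed-size windows, computing a log-spectrogram per window, and classifying each window with a small CNN — is shown below. The 8 kHz sampling rate and the CNN layout are illustrative assumptions; only the 60 s window size comes from the paper's results.

        # Window -> spectrogram -> CNN sketch (architecture is an assumption).
        import numpy as np
        import torch
        import torch.nn as nn
        from scipy.signal import spectrogram

        FS = 8000                                # assumed smartphone sampling rate
        WINDOW_S = 60                            # best-performing window size

        def audio_to_spectrograms(audio):
            """Slice a 1-D recording into windows and return log-spectrograms."""
            n = FS * WINDOW_S
            windows = audio[: len(audio) // n * n].reshape(-1, n)
            specs = [np.log1p(spectrogram(w, fs=FS, nperseg=512)[2]) for w in windows]
            return torch.tensor(np.stack(specs), dtype=torch.float32).unsqueeze(1)

        cnn = nn.Sequential(                     # small illustrative CNN
            nn.Conv2d(1, 8, 3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(16, 2),      # apnea vs. non-apnea logits
        )

        night = np.random.randn(FS * 600)        # 10 minutes of fake audio
        logits = cnn(audio_to_spectrograms(night))
        print(logits.shape)                      # (n_windows, 2)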

    Classification techniques on computerized systems to predict and/or to detect Apnea: A systematic review

    Sleep apnea syndrome (SAS), which can significantly decrease quality of life, is a major risk factor for health implications such as increased cardiovascular disease, sudden death, depression, irritability, hypertension, and learning difficulties. Thus, it is relevant and timely to present a systematic review of significant applications of computational intelligence to SAS, including their performance, benefits and challenges, and modeling for decision-making across multiple scenarios.

    Bag-of-words representations for computer audition

    Computer audition is omnipresent in everyday life, in applications ranging from personalised virtual agents to health care. From a technical point of view, the goal is to robustly classify the content of an audio signal in terms of a defined set of labels, such as the acoustic scene, a medical diagnosis, or, in the case of speech, what is said or how it is said. Typical approaches employ machine learning (ML), which means that task-specific models are trained by means of examples. Despite recent successes in neural network-based end-to-end learning, which takes the raw audio signal as input, models relying on hand-crafted acoustic features are still superior in some domains, especially for tasks where data is scarce. One major issue, nevertheless, is that a sequence of acoustic low-level descriptors (LLDs) cannot be fed directly into many ML algorithms, as they require a static, fixed-length input. Moreover, even for dynamic classifiers, it can be beneficial to compress the information in the LLDs by summarising them over a temporal block. However, the type of instance-level representation has a fundamental impact on the performance of the model. In this thesis, the so-called bag-of-audio-words (BoAW) representation is investigated as an alternative to the standard approach of statistical functionals. BoAW is an unsupervised method of representation learning, inspired by the bag-of-words method in natural language processing, which describes a document as a histogram of the terms it contains. The toolkit openXBOW is introduced, enabling systematic learning and optimisation of these feature representations, unified across arbitrary modalities of numeric or symbolic descriptors. A number of experiments on BoAW are presented and discussed, focussing on a large number of potential applications and corresponding databases, ranging from emotion recognition in speech to medical diagnosis. The evaluations include a comparison of different acoustic LLD sets and configurations of the BoAW generation process. The key findings are that BoAW features are a meaningful alternative to statistical functionals, offering certain benefits while preserving the advantages of functionals, such as data-independence. Furthermore, it is shown that the two representations are complementary and that their fusion improves the performance of a machine listening system.
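    A minimal sketch of the BoAW idea — quantising a clip's frame-level LLDs against a learned codebook and describing the clip by its normalised histogram of codeword assignments — is given below using scikit-learn's k-means. openXBOW itself provides a richer, multi-modal implementation; the codebook size and LLD dimensionality here are illustrative.

        # Bag-of-audio-words sketch: codebook by k-means, clip = word histogram.
        import numpy as np
        from sklearn.cluster import KMeans

        def learn_codebook(lld_frames, n_words=64, seed=0):
            """Cluster pooled LLD frames (n_frames, n_llds) into audio 'words'."""
            return KMeans(n_clusters=n_words, n_init=10, random_state=seed).fit(lld_frames)

        def boaw_histogram(codebook, clip_frames):
            """Fixed-length clip representation: normalised codeword histogram."""
            words = codebook.predict(clip_frames)
            hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
            return hist / max(hist.sum(), 1.0)

        rng = np.random.default_rng(0)
        train_llds = rng.normal(size=(5000, 13))     # e.g. pooled MFCC frames
        codebook = learn_codebook(train_llds)
        clip = rng.normal(size=(300, 13))            # one variable-length clip
        print(boaw_histogram(codebook, clip).shape)  # (64,) for any clip length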

    Towards using Cough for Respiratory Disease Diagnosis by leveraging Artificial Intelligence: A Survey

    Cough acoustics contain a multitude of vital information about pathomorphological alterations in the respiratory system. Reliable and accurate detection of cough events, investigation of the underlying latent cough features, and disease diagnosis can play an indispensable role in revitalizing healthcare practices. The recent application of Artificial Intelligence (AI) and advances in ubiquitous computing for respiratory disease prediction have created an auspicious trend and a myriad of future possibilities in the medical domain. In particular, there is a rapidly emerging trend of Machine Learning (ML)- and Deep Learning (DL)-based diagnostic algorithms exploiting cough signatures. The enormous body of literature on cough-based AI algorithms demonstrates that these models can play a significant role in detecting the onset of a specific respiratory disease. However, it is pertinent to collect the information from all relevant studies in an exhaustive manner so that medical experts and AI scientists can analyze the decisive role of AI/ML. This survey offers a comprehensive overview of cough data-driven ML/DL detection and preliminary diagnosis frameworks, along with a detailed list of significant features. We investigate the mechanism that causes cough and the latent cough features of the respiratory modalities. We also analyze customized cough monitoring applications and their AI-powered recognition algorithms. Challenges and prospective future research directions to develop practical, robust, and ubiquitous solutions are also discussed in detail.
    Comment: 30 pages, 12 figures, 9 tables
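    The generic cough-based pipeline the survey covers — extracting acoustic features from a cough recording and feeding them to a trained classifier — might look like the hedged sketch below. The MFCC statistics and the logistic-regression model are stand-ins for the many feature sets and algorithms reviewed; the data are synthetic.

        # Cough features -> classifier sketch (features/model are stand-ins).
        import numpy as np
        import librosa
        from sklearn.linear_model import LogisticRegression

        def cough_features(waveform, sr=16000):
            """Summarise a cough as the mean/std of its MFCC trajectory."""
            mfcc = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=13)
            return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

        rng = np.random.default_rng(0)
        X = np.stack([cough_features(rng.normal(size=16000).astype(np.float32))
                      for _ in range(40)])   # 40 synthetic one-second "coughs"
        y = rng.integers(0, 2, size=40)      # 0 = healthy, 1 = disease label

        clf = LogisticRegression(max_iter=1000).fit(X, y)
        print(clf.predict(X[:3]))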