51 research outputs found

    A COMPARISON OF EXTENDED SOURCE-FILTER MODELS FOR MUSICAL SIGNAL RECONSTRUCTION

    China Scholarship Council (CSC)/Queen Mary Joint PhD scholarship; Royal Academy of Engineering Research Fellowship

    Recognition of Harmonic Sounds in Polyphonic Audio using a Missing Feature Approach: Extended Report

    A method based on local spectral features and missing-feature techniques is proposed for the recognition of harmonic sounds in mixture signals. A mask estimation algorithm identifies the spectral regions that carry reliable information for each sound source, and bounded marginalization is then employed to treat the feature-vector elements determined to be unreliable. The proposed method is tested on musical instrument sounds because of the extensive availability of data, but it can be applied to other sounds (e.g. animal or environmental sounds) whenever these are harmonic. In simulations the proposed method clearly outperformed a baseline method on mixture signals.
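    The bounded-marginalization step can be sketched as follows. This is a minimal illustration with hypothetical diagonal-Gaussian class models and toy class names, not the paper's exact system: elements flagged unreliable by the mask are scored by integrating the Gaussian over [0, observed], since the clean value cannot exceed the observed mixture energy.

```python
import numpy as np
from math import erf, sqrt, log, pi

def gauss_logpdf(x, mu, s):
    # log-density of a scalar Gaussian
    return -0.5 * log(2 * pi * s * s) - (x - mu) ** 2 / (2 * s * s)

def phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def bounded_marginal_score(x, reliable, mu, sigma):
    """Class log-likelihood of a spectral feature vector x under a
    diagonal-Gaussian model (mu, sigma). Reliable elements use the
    ordinary density; unreliable elements are handled by bounded
    marginalization, integrating the Gaussian over [0, x]."""
    ll = 0.0
    for xi, ri, mi, si in zip(x, reliable, mu, sigma):
        if ri:
            ll += gauss_logpdf(xi, mi, si)
        else:
            p = phi((xi - mi) / si) - phi((0.0 - mi) / si)
            ll += log(max(p, 1e-12))
    return ll

# toy two-class recognition: pick the class with the higher score
x = np.array([0.8, 0.2, 0.9])
reliable = np.array([True, False, True])   # element 1 masked as unreliable
classes = {"flute":  (np.array([0.8, 0.1, 0.9]), np.array([0.1, 0.1, 0.1])),
           "violin": (np.array([0.2, 0.9, 0.1]), np.array([0.1, 0.1, 0.1]))}
best = max(classes, key=lambda c: bounded_marginal_score(x, reliable, *classes[c]))
print(best)  # flute
```

    The masked element contributes an interval probability rather than a point density, so an unreliable observation cannot dominate the class decision.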

    Real-time detection of overlapping sound events with non-negative matrix factorization

    In this paper, we investigate the problem of real-time detection of overlapping sound events by employing non-negative matrix factorization techniques. We consider a setup where audio streams arrive at the system in real time and are decomposed onto a dictionary of event templates learned offline prior to the decomposition. An important drawback of existing approaches in this context is the lack of control over the decomposition. We propose and compare two provably convergent algorithms that address this issue by controlling, respectively, the sparsity of the decomposition and its trade-off between the different frequency components. Sparsity regularization is considered in the framework of convex quadratic programming, while the frequency compromise is introduced by employing the beta-divergence as a cost function. The two algorithms are evaluated on the multi-source detection tasks of polyphonic music transcription, drum transcription and environmental sound recognition. The results show how the proposed approaches can improve detection in such applications while keeping computational costs low enough for real-time operation.
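    The core decomposition can be sketched as follows: a minimal numpy sketch of frame-wise NMF with a fixed, offline-learned dictionary and multiplicative updates under the beta-divergence (the paper's convex-QP sparsity variant and exact algorithms are not reproduced here; the L1 term is an illustrative stand-in).

```python
import numpy as np

def nmf_decompose_frame(v, W, beta=1.0, n_iter=50, sparsity=0.0):
    """Decompose one non-negative spectral frame v onto a fixed
    dictionary W (learned offline) by minimizing the beta-divergence
    D_beta(v | W h), with an optional L1 penalty on h.
    beta=2 is Euclidean, beta=1 Kullback-Leibler, beta=0 Itakura-Saito.
    Multiplicative updates keep h non-negative throughout."""
    eps = 1e-12
    h = np.full(W.shape[1], 1.0 / W.shape[1])
    for _ in range(n_iter):
        v_hat = W @ h + eps
        # standard multiplicative update for the beta-divergence
        num = W.T @ (v * v_hat ** (beta - 2))
        den = W.T @ (v_hat ** (beta - 1)) + sparsity + eps
        h *= num / den
    return h

# toy dictionary of two "event templates" and an exact mixture frame
W = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
v = 0.7 * W[:, 0] + 0.3 * W[:, 1]
h = nmf_decompose_frame(v, W, beta=1.0)
print(np.round(h, 2))  # activations recover the mixing weights [0.7 0.3]
```

    Because only the activations h are updated per incoming frame while W stays fixed, each frame costs a handful of small matrix-vector products, which is what makes this kind of decomposition viable in real time.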

    Classification of Obstructive Sleep Apnea Severity using Sleep Breathing Sounds

    Doctoral dissertation, Department of Transdisciplinary Studies, Graduate School of Convergence Science and Technology, Seoul National University, August 2017 (advisor: Kyogu Lee). Obstructive sleep apnea (OSA) is a common sleep disorder. The symptom has a high prevalence and increases mortality as a risk factor for hypertension and stroke. Because sleep disorders occur during sleep, it is difficult for patients to perceive them on their own, and the actual diagnosis rate is low. Despite the existence of a standard sleep study, the polysomnography (PSG), diagnosing sleep disorders remains difficult due to complicated test procedures and high medical costs. There is therefore an increasing demand for an effective and rational screening test that can determine whether a PSG is warranted. In this thesis, we conducted three studies to classify snoring sounds and OSA severity using only breathing sounds recorded during sleep, without additional biosensors. We first established that snoring sounds related to sleep disorders can be classified using features based on cyclostationary analysis. We then classified patients' OSA severity with features extracted by temporal and cyclostationary analysis from long-term sleep breathing sounds. Finally, partial sleep sound extraction and feature learning with a convolutional neural network (CNN, or ConvNet) were applied to improve the efficiency and performance of the previous snoring sound and OSA severity classification tasks. The sleep breathing sound analysis method using a CNN achieved classification accuracy above 80% (average area under the curve > 0.8) on the multiclass snoring sound and OSA severity classification tasks. The proposed analysis and classification method is expected to serve as a screening tool that improves the efficiency of PSG in future customized healthcare services.

    Chapter 1. Introduction
        1.1 Personal healthcare in sleep
        1.2 Existing approaches and limitations
        1.3 Clinical information related to SRBD
        1.4 Study objectives
    Chapter 2. Overview of Sleep Research using Sleep Breathing Sounds
        2.1 Previous goals of studies
        2.2 Recording environments and related configurations
        2.3 Sleep breathing sound analysis
        2.4 Sleep breathing sound classification
        2.5 Current limitations
    Chapter 3. Multiple SRBD-related Snoring Sound Classification
        3.1 Introduction
        3.2 System architecture
        3.3 Evaluation
        3.4 Results
        3.5 Discussion
        3.6 Summary
    Chapter 4. Patients' OSA Severity Classification
        4.1 Introduction
        4.2 Existing Approaches
        4.3 System Architecture
        4.4 Evaluation
        4.5 Results
        4.6 Discussion
        4.7 Summary
    Chapter 5. Patient OSA Severity Prediction using Deep Learning Techniques
        5.1 Introduction
        5.2 Methods
        5.3 Results
        5.4 Discussion
        5.5 Summary
    Chapter 6. Conclusions and Future Work
        6.1 Conclusions
        6.2 Future work
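    The cyclostationary idea underlying the first study can be sketched as follows: a toy numpy illustration (not the thesis's actual feature set) of the cyclic autocorrelation, which peaks at the cyclic frequency matching the breathing-rate periodicity of the sound's energy envelope.

```python
import numpy as np

def cyclic_autocorrelation(x, alpha, fs, max_lag):
    """Cyclic autocorrelation R_x^alpha(tau): the lag-tau product
    x(t)x(t+tau) demodulated at cyclic frequency alpha (Hz) and
    averaged over time. Periodic energy bursts at the breathing rate
    produce a strong response at the matching alpha, which
    cyclostationary features exploit."""
    n = len(x) - max_lag
    t = np.arange(n) / fs
    carrier = np.exp(-2j * np.pi * alpha * t)
    return np.array([(x[:n] * x[tau:n + tau] * carrier).mean()
                     for tau in range(max_lag)])

# toy "breathing" signal: noise amplitude-modulated at 0.3 Hz
fs, dur = 100, 60
t = np.arange(fs * dur) / fs
rng = np.random.default_rng(1)
env = 1.0 + np.cos(2 * np.pi * 0.3 * t)          # breathing-rate envelope
x = env * rng.standard_normal(len(t))
strength = [np.abs(cyclic_autocorrelation(x, a, fs, 1)).max()
            for a in (0.1, 0.3, 0.5)]
print(np.argmax(strength))  # 1  -> strongest cyclic feature at 0.3 Hz
```

    A plain power spectrum of x shows no line at 0.3 Hz (the carrier is broadband noise), but the cyclic statistic exposes the hidden periodicity, which is exactly the property that makes such features useful for breathing-related sounds.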

    Common Fate Model for Unison Source Separation

    In this paper we present a novel source separation method that aims to overcome the difficulty of modelling non-stationary signals. The method can be applied to mixtures of musical instruments with frequency and/or amplitude modulation, e.g. as typically caused by vibrato. It is based on a signal representation that divides the complex spectrogram into a grid of patches of arbitrary size. These complex patches are then processed by a two-dimensional discrete Fourier transform, forming a tensor representation that reveals spectral and temporal modulation textures. Our representation can be seen as an alternative to modulation transforms computed on magnitude spectrograms. An adapted factorization model makes it possible to decompose different time-varying harmonic sources based on their particular common modulation profile: hence the name Common Fate Model. The method is evaluated on mixtures of musical instruments playing the same fundamental frequency (unison), showing improvement over other state-of-the-art methods.
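    The patch-wise transform can be sketched as follows: a minimal numpy sketch, assuming non-overlapping patches and no windowing (implementation details such as patch overlap and the subsequent factorization are not reproduced here).

```python
import numpy as np

def common_fate_transform(spec, patch_f, patch_t):
    """Tile a complex spectrogram (freq x time) into non-overlapping
    patches of size (patch_f, patch_t) and apply a 2-D DFT to each
    complex patch. The result is a 4-D tensor indexed by
    (modulation axis 1, modulation axis 2, patch row, patch col),
    exposing spectro-temporal modulation textures such as vibrato."""
    F, T = spec.shape
    nf, nt = F // patch_f, T // patch_t
    # crop to a whole number of patches and reshape into a patch grid
    grid = spec[:nf * patch_f, :nt * patch_t]
    grid = grid.reshape(nf, patch_f, nt, patch_t).transpose(1, 3, 0, 2)
    # 2-D DFT over each complex patch (axes 0 and 1)
    return np.fft.fft2(grid, axes=(0, 1))

# toy complex spectrogram: 8 frequency bins x 12 frames
rng = np.random.default_rng(0)
spec = rng.standard_normal((8, 12)) + 1j * rng.standard_normal((8, 12))
cft = common_fate_transform(spec, patch_f=4, patch_t=6)
print(cft.shape)  # (4, 6, 2, 2)
```

    Because the DFT is taken on complex patches, the transform is invertible (via `np.fft.ifft2` and the inverse reshape), so sources separated in this domain can be mapped back to a spectrogram.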

    RECOGNITION OF HARMONIC SOUNDS IN POLYPHONIC AUDIO USING A MISSING FEATURE APPROACH

    A method based on local spectral features and missing-feature techniques is proposed for the recognition of harmonic sounds in mixture signals. A mask estimation algorithm identifies the spectral regions that carry reliable information for each sound source, and bounded marginalization is then employed to treat the feature-vector elements determined to be unreliable. The proposed method is tested on musical instrument sounds because of the extensive availability of data, but it can be applied to other sounds (e.g. animal or environmental sounds) whenever these are harmonic. In simulations the proposed method clearly outperformed a baseline method on mixture signals.

    Single-Microphone Speech Enhancement Inspired by Auditory System

    Enhancing the quality of speech in noisy environments has been an active area of research, owing to the abundance of applications dealing with the human voice and the dependence of their performance on this quality. While early approaches in the field mostly addressed this problem in a purely statistical framework, in which the goal was to estimate speech from its sum with other independent processes (noise), during the last decade the attention of the scientific community has turned to the functionality of the human auditory system. Much effort has been put into bridging the gap between the performance of speech processing algorithms and that of the average human listener by borrowing models suggested for sound processing in the auditory system. In this thesis, we introduce algorithms for speech enhancement inspired by two of these models: the cortical representation of sounds and the hypothesized role of temporal coherence in auditory scene analysis. After an introduction to the auditory system and the speech enhancement framework, we first show how traditional speech enhancement techniques such as Wiener filtering can benefit at the feature extraction level from the discriminatory capabilities of the spectro-temporal representation of sounds in the cortex, i.e. the cortical model. We next focus on feature processing, as opposed to the extraction stage, in speech enhancement systems by taking advantage of models hypothesized for human attention in sound segregation. We demonstrate a mask-based enhancement method in which the temporal coherence of features is used as a criterion to elicit information about their sources and, more specifically, to form the masks needed to suppress the noise. Lastly, we explore how the two blocks for feature extraction and manipulation can be merged into one in a manner consistent with our knowledge of the auditory system. We do this through the use of regularized non-negative matrix factorization, which optimizes the feature extraction while simultaneously accounting for temporal dynamics to separate noise from speech.
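    The classical Wiener-filtering baseline mentioned above can be sketched as follows: a minimal per-bin gain computation assuming a pre-computed noise power estimate (the cortical-feature and temporal-coherence extensions of the thesis are not reproduced here).

```python
import numpy as np

def wiener_gain(noisy_power, noise_power, floor=0.05):
    """Frequency-domain Wiener gain G = SNR / (1 + SNR), estimated per
    time-frequency bin from the noisy power spectrum and a noise power
    estimate. A small spectral floor limits musical-noise artifacts."""
    snr = np.maximum(noisy_power / np.maximum(noise_power, 1e-12) - 1.0, 0.0)
    return np.maximum(snr / (1.0 + snr), floor)

# toy per-bin powers: strong speech, pure noise, moderate speech
noisy_power = np.array([10.0, 1.0, 5.0])   # |X|^2 of the noisy signal
noise_power = np.ones(3)                   # noise estimate (e.g. from silence)
g = wiener_gain(noisy_power, noise_power)
print(g)  # gains near 1 where speech dominates, floored where noise dominates
```

    In a full enhancer the gain multiplies the noisy STFT bin by bin before the inverse transform; the enhancements described above replace or refine how the per-bin signal and noise statistics behind `snr` are obtained.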