13 research outputs found

    Acoustic Analysis of Infant Cry Signals

    Get PDF
    Crying is the first means of communication for an infant, through which it expresses its physiological and psychological needs. Infant cry analysis is the investigation of infant cry vocalizations in order to extract social and communicative information about infant behavior, as well as diagnostic information about infant health. This thesis is part of a larger study whose objective is to analyze the acoustic properties of infant cry signals and use them for early assessment of neurological developmental issues in infants. The thesis addresses two research problems in the context of infant cry signals: audio segmentation of cry recordings to extract the relevant acoustic parts, and fundamental frequency (F0) estimation within the extracted acoustic regions. The extracted regions are used to compute parameters that can be correlated with the infants' developmental outcomes. Fundamental frequency is one such potentially useful parameter, as its variation has been found to correlate with neurological insults in infants. The cry recordings were captured in realistic hospital environments under varied contexts, such as crying out of hunger or pain. A hidden Markov model (HMM) based audio segmentation system is proposed, and its performance is evaluated for different numbers of HMM states, numbers of component Gaussians, and combinations of audio features; a frame-based accuracy of 88.5% is achieved. The popular YIN algorithm is used for fundamental frequency estimation, and a method to discard unreliable F0 estimates is suggested. Statistics of the F0 distributions corresponding to different components of the cry signals are reported. Follow-up work will look for meaningful correlations between the extracted F0 estimates and the infants' developmental outcomes, and will investigate other acoustic parameters for the same purpose.
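    As a concrete illustration of the F0 estimation step, the sketch below implements the core of the standard YIN algorithm (difference function, cumulative mean normalized difference, absolute threshold) in Python. The search range and the 0.15 aperiodicity threshold are illustrative assumptions, not the thesis's configuration; thresholding the normalized difference is also one natural way to discard unreliable F0 estimates, though the thesis's exact reliability criterion is not given in the abstract.

```python
import numpy as np

def yin_f0(frame, fs, fmin=150.0, fmax=1000.0, threshold=0.15):
    """Estimate F0 of one frame with a simplified YIN.

    fmin/fmax and threshold are placeholder values; the refinement
    step of YIN (parabolic interpolation of the lag) is omitted.
    """
    tau_min = int(fs / fmax)
    tau_max = int(fs / fmin)
    n = len(frame)

    # Step 1: difference function d(tau).
    d = np.zeros(tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff = frame[: n - tau] - frame[tau:]
        d[tau] = np.sum(diff * diff)

    # Step 2: cumulative mean normalized difference d'(tau).
    cmnd = np.ones(tau_max + 1)
    cumsum = np.cumsum(d[1:])
    cmnd[1:] = d[1:] * np.arange(1, tau_max + 1) / np.maximum(cumsum, 1e-12)

    # Step 3: first lag below the aperiodicity threshold. Frames where
    # no lag qualifies are treated as unreliable and their F0 discarded.
    for tau in range(tau_min, tau_max + 1):
        if cmnd[tau] < threshold:
            return fs / tau
    return None  # unreliable / unvoiced frame
```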

    Deep Neural Network Based Low-Latency Speech Separation with Asymmetric Analysis-Synthesis Window Pair

    Get PDF
    Time-frequency masking or spectrum prediction computed via short symmetric windows is commonly used in low-latency deep neural network (DNN) based source separation. In this paper, we propose using an asymmetric analysis-synthesis window pair, which allows training with targets of better frequency resolution while retaining low latency during inference, making it suitable for real-time speech enhancement and assisted hearing applications. To assess our approach across model types and datasets, we evaluate it with both a speaker-independent deep clustering (DC) model and a speaker-dependent mask inference (MI) model. We report an improvement in separation performance of up to 1.5 dB in terms of source-to-distortion ratio (SDR) while maintaining an algorithmic latency of 8 ms.
    Comment: Accepted to EUSIPCO-202
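    The sketch below illustrates one way to construct such an asymmetric analysis-synthesis window pair, in the spirit of the paper: a long square-root-Hann rising edge on the analysis side, a synthesis window supported only on the last 2*M samples, and a product window that overlap-adds to one at hop M. The particular design and the frame sizes are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np

def asymmetric_window_pair(N, M):
    """Build an analysis window of length N and a synthesis window
    supported only on the last 2*M samples (hop size M).

    The product of the pair equals a periodic Hann of length 2*M at
    the frame end, which overlap-adds to one at hop M, so perfect
    reconstruction holds while the reconstruction delay is set by the
    short synthesis support, not the long analysis frame.
    """
    assert N > 2 * M
    # Analysis: long sqrt-Hann rising edge, short sqrt-Hann falling edge.
    rise = np.sin(np.pi * np.arange(N - M) / (2 * (N - M)))
    fall = np.cos(np.pi * np.arange(M) / (2 * M))
    h_a = np.concatenate([rise, fall])

    # Target product window: periodic Hann over the last 2*M samples.
    p = np.zeros(N)
    p[N - 2 * M:] = np.sin(np.pi * np.arange(2 * M) / (2 * M)) ** 2

    # Synthesis window: p / h_a on its support (h_a > 0 there).
    h_s = np.zeros(N)
    h_s[N - 2 * M:] = p[N - 2 * M:] / h_a[N - 2 * M:]
    return h_a, h_s

# e.g. 64 ms analysis frame, 4 ms hop, 8 ms synthesis support at 16 kHz
N, M = 1024, 64
h_a, h_s = asymmetric_window_pair(N, M)
prod = h_a * h_s
# Constant overlap-add check over the nonzero region of the product.
assert np.allclose(prod[N - 2 * M:N - M] + prod[N - M:], 1.0)
```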

    Dynamic Processing Neural Network Architecture for Hearing Loss Compensation

    Full text link
    This paper proposes neural networks for compensating sensorineural hearing loss. The aim of the hearing loss compensation task is to transform a speech signal so as to increase its intelligibility after further processing by a person with a hearing impairment, which is modeled by a hearing loss model. We propose an interpretable model called the dynamic processing network, which has a structure similar to a band-wise dynamic range compressor. The network is differentiable, which allows its parameters to be learned to maximize speech intelligibility. More generic models based on convolutional layers were tested as well. The performance of the tested architectures was assessed using the short-time objective intelligibility (STOI) metric with hearing-threshold noise and the hearing-aid speech perception index (HASPI). The dynamic processing network gave a significant improvement in STOI and HASPI compared with Camfit, a popular compressive gain prescription rule. A sufficiently large convolutional network could outperform the interpretable model, at the cost of a larger computational load. Finally, a combination of the dynamic processing network with a convolutional neural network gave the best results in terms of STOI and HASPI.
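    The sketch below shows the band-wise dynamic range compressor structure that the dynamic processing network resembles: a filterbank, a per-band attack/release envelope follower, and a static compressive gain curve. In the paper this structure is made differentiable and its parameters are learned; the band edges, threshold, ratio, and time constants below are placeholder values, not the learned ones.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def bandwise_compressor(x, fs, edges=(250, 1000, 4000),
                        threshold_db=-40.0, ratio=3.0,
                        attack_ms=5.0, release_ms=50.0):
    """Band-wise dynamic range compression of a float signal x."""
    # Split the signal into frequency bands with Butterworth filters.
    cutoffs = list(edges)
    bands = [sosfilt(butter(4, cutoffs[0], btype="lowpass",
                            fs=fs, output="sos"), x)]
    for lo, hi in zip(cutoffs[:-1], cutoffs[1:]):
        bands.append(sosfilt(butter(4, [lo, hi], btype="bandpass",
                                    fs=fs, output="sos"), x))
    bands.append(sosfilt(butter(4, cutoffs[-1], btype="highpass",
                                fs=fs, output="sos"), x))

    a_att = np.exp(-1.0 / (attack_ms * 1e-3 * fs))
    a_rel = np.exp(-1.0 / (release_ms * 1e-3 * fs))
    out = np.zeros_like(x)
    for b in bands:
        # One-pole attack/release envelope follower per band.
        env = np.empty_like(b)
        e = 1e-8
        for n, v in enumerate(np.abs(b)):
            a = a_att if v > e else a_rel
            e = a * e + (1.0 - a) * v
            env[n] = e
        # Static compressive curve: reduce level above the threshold.
        level_db = 20.0 * np.log10(np.maximum(env, 1e-8))
        over = np.maximum(level_db - threshold_db, 0.0)
        gain_db = -over * (1.0 - 1.0 / ratio)
        out += b * 10.0 ** (gain_db / 20.0)
    return out
```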

    A Competing Voices Test for Hearing-Impaired Listeners Applied to Spatial Separation and Ideal Time-Frequency Masks

    Get PDF
    People with hearing impairment find competing-voices scenarios challenging, both with respect to switching attention from one talker to the other and to maintaining attention. The Danish competing voices test (CVT) presented here assesses these dual-attention skills. The CVT provides sentences spoken by three male and three female talkers, played in sentence pairs. The task of the listener is to repeat the target sentence from the pair, based on a cue given either before or after playback. One potential way of assisting segregation of two talkers is to take advantage of spatial unmasking by presenting one talker per ear after applying time-frequency masks to separate the mixture. Using the CVT, this study evaluated four spatial conditions in 14 listeners with moderate-to-severe hearing impairment, to establish benchmark results for this type of algorithm applied to hearing-impaired listeners. The four spatial conditions were: summed (diotic), separate, the ideal ratio mask, and the ideal binary mask. The results show that the test is sensitive to the change in spatial condition. The temporal position of the cue has a large impact: cueing the target talker before playback focuses attention on the target, whereas cueing after playback requires equal attention to both talkers, which is more difficult. Furthermore, both ideal masks yielded scores very close to the ideal separate condition, suggesting that this technique will be useful for future separation algorithms using estimated rather than ideal masks.
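    The sketch below shows how the two oracle masks evaluated in the study are typically computed from the clean talker signals: a soft ideal ratio mask (IRM) and a winner-takes-all ideal binary mask (IBM), applied to the mixture spectrogram so the two separated talkers can be routed one per ear. The STFT settings and the particular IRM definition (magnitude ratio) are illustrative assumptions, not the study's exact configuration.

```python
import numpy as np
from scipy.signal import stft, istft

def ideal_masks(s1, s2, fs, nperseg=512):
    """Separate a two-talker mixture with ideal time-frequency masks.

    Returns {'irm': (talker1, talker2), 'ibm': (talker1, talker2)},
    computed from the clean signals s1 and s2 (oracle information).
    """
    _, _, S1 = stft(s1, fs, nperseg=nperseg)
    _, _, S2 = stft(s2, fs, nperseg=nperseg)
    mix = S1 + S2  # mixture spectrogram; masks are applied to this

    eps = 1e-12
    # Ideal ratio mask: soft magnitude proportion per T-F bin
    # (one common IRM variant among several in the literature).
    irm1 = np.abs(S1) / (np.abs(S1) + np.abs(S2) + eps)
    # Ideal binary mask: winner-takes-all per T-F bin.
    ibm1 = (np.abs(S1) > np.abs(S2)).astype(float)

    out = {}
    for name, m in [("irm", irm1), ("ibm", ibm1)]:
        _, t1 = istft(m * mix, fs, nperseg=nperseg)
        _, t2 = istft((1.0 - m) * mix, fs, nperseg=nperseg)
        out[name] = (t1, t2)  # e.g. t1 to the left ear, t2 to the right
    return out
```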