Acoustic Analysis of Infant Cry Signals
Crying is an infant's first means of communication, through which it expresses its physiological and psychological needs. Infant cry analysis is the investigation of infant cry vocalizations in order to extract social and communicative information about infant behavior, as well as diagnostic information about infant health. This thesis is part of a larger study whose objective is to analyze the acoustic properties of infant cry signals and use them for early assessment of neurological developmental issues in infants.
This thesis deals with two research problems in the context of infant cry signals: audio segmentation of cry recordings in order to extract the relevant acoustic parts, and fundamental frequency (F0) estimation within the extracted acoustic regions. The extracted regions are used to derive parameters that can be correlated with the infants' developmental outcomes. Fundamental frequency (F0) is one such potentially useful parameter, whose variation has been found to correlate with neurological insults in infants. The cry recordings are captured in realistic hospital environments in varied contexts, such as the infant crying out of hunger or pain. A hidden Markov model (HMM) based audio segmentation system is proposed. The performance of the system is evaluated for different numbers of HMM states, different numbers of component Gaussians, and different combinations of audio features. A frame-based accuracy of 88.5% is achieved. The YIN algorithm, a popular F0 estimation method, is used for the fundamental frequency estimation problem, and a method for discarding unreliable F0 estimates is suggested. Statistics of the distribution of F0 estimates for the different components of the cry signals are reported.
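The YIN procedure mentioned above can be sketched in a few lines. The following is a minimal, illustrative implementation, not the thesis code: it computes YIN's difference function and cumulative mean normalized difference for one frame, and uses the absolute threshold as a crude stand-in for discarding unreliable estimates. The function name, default parameter values, and the omission of parabolic interpolation are all simplifications.

```python
import numpy as np

def yin_f0(frame, sr, fmin=75.0, fmax=600.0, threshold=0.1):
    """Minimal YIN-style F0 estimate for one audio frame (illustrative)."""
    tau_min, tau_max = int(sr / fmax), int(sr / fmin)
    n = len(frame)
    # Step 1: difference function d(tau) = sum_t (x[t] - x[t+tau])^2
    d = np.array([np.sum((frame[:n - tau] - frame[tau:]) ** 2)
                  for tau in range(tau_max + 1)])
    # Step 2: cumulative mean normalized difference d'(tau)
    cmnd = np.ones_like(d)
    cmnd[1:] = d[1:] * np.arange(1, tau_max + 1) / np.maximum(
        np.cumsum(d[1:]), 1e-12)
    # Step 3: first dip below the absolute threshold, walked down to its
    # local minimum; frames with no such dip are treated as unreliable.
    tau = tau_min
    while tau < tau_max:
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sr / tau
        tau += 1
    return None

sr = 16000
t = np.arange(1024) / sr
f0 = yin_f0(np.sin(2 * np.pi * 220.0 * t), sr)  # close to 220 Hz
```

Cry signals often have a high and rapidly varying F0, which is why the search range and the reliability check matter more than they would for adult speech.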
This work will be followed up by searching for meaningful correlations between the extracted F0 estimates and the infants' developmental outcomes. Other acoustic parameters will also be investigated for the same purpose.
Deep Neural Network Based Low-Latency Speech Separation with Asymmetric Analysis-Synthesis Window Pair
Time-frequency masking or spectrum prediction computed via short symmetric windows is commonly used in low-latency deep neural network (DNN) based source separation. In this paper, we propose the use of an asymmetric analysis-synthesis window pair, which allows training with targets of better frequency resolution while retaining low latency during inference, suitable for real-time speech enhancement or assisted hearing applications. To assess our approach across model types and datasets, we evaluate it with both a speaker-independent deep clustering (DC) model and a speaker-dependent mask inference (MI) model. We report an improvement in separation performance of up to 1.5 dB in terms of source-to-distortion ratio (SDR) while maintaining an algorithmic latency of 8 ms.
Comment: Accepted to EUSIPCO-202
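The core idea, a long analysis window for frequency resolution paired with a short synthesis window that sets the latency, can be illustrated with a small sketch. The window shapes, lengths, and construction below are illustrative assumptions, not the pair used in the paper: the synthesis window is chosen so that the analysis-synthesis product overlap-adds to one, giving perfect reconstruction under an identity mask while only the last 8 ms of each frame contribute to the output.

```python
import numpy as np

SR = 16000
HOP = 64      # 4 ms frame shift
L_SYN = 128   # synthesis window support: 8 ms -> the algorithmic latency
L_ANA = 512   # 32 ms analysis window -> finer frequency resolution

# Asymmetric analysis window: long slow rise, short strictly positive tail
rise = np.sin(0.5 * np.pi * np.arange(L_ANA - L_SYN) / (L_ANA - L_SYN))
tail = np.linspace(1.0, 0.2, L_SYN)
w_ana = np.concatenate([rise, tail])

# Choose the synthesis window so that w_ana * w_syn equals a periodic Hann
# of length 2*HOP at the frame tail; hopped by HOP, that product sums to
# exactly one, which gives perfect reconstruction with an identity mask.
n = np.arange(L_SYN)
prod = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / L_SYN)
w_syn = np.zeros(L_ANA)
w_syn[-L_SYN:] = prod / w_ana[-L_SYN:]

# Identity-mask round trip: analysis windowing, (STFT / mask / iSTFT
# omitted), then weighted overlap-add with the short synthesis window.
x = np.random.default_rng(0).standard_normal(SR)
y = np.zeros_like(x)
for start in range(0, len(x) - L_ANA + 1, HOP):
    frame = x[start:start + L_ANA] * w_ana
    y[start:start + L_ANA] += frame * w_syn
```

Because only the last L_SYN samples of each frame are weighted into the output, the output at a given time is final once the frame ending there has been processed, so the synthesis window length, not the analysis window length, determines the algorithmic latency.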
Dynamic Processing Neural Network Architecture For Hearing Loss Compensation
This paper proposes neural networks for compensating sensorineural hearing
loss. The aim of the hearing loss compensation task is to transform a speech
signal to increase speech intelligibility after further processing by a person
with a hearing impairment, which is modeled by a hearing loss model. We propose
an interpretable model called dynamic processing network, which has a structure
similar to band-wise dynamic compressor. The network is differentiable, and
therefore allows to learn its parameters to maximize speech intelligibility.
More generic models based on convolutional layers were tested as well. The
performance of the tested architectures was assessed using spectro-temporal
objective index (STOI) with hearing-threshold noise and hearing aid speech
intelligibility (HASPI) metrics. The dynamic processing network gave a
significant improvement of STOI and HASPI in comparison to popular compressive
gain prescription rule Camfit. A large enough convolutional network could
outperform the interpretable model with the cost of larger computational load.
Finally, a combination of the dynamic processing network with convolutional
neural network gave the best results in terms of STOI and HASPI
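The band-wise dynamic compressor that the dynamic processing network resembles can be sketched as a conventional single-band feed-forward compressor; a band-wise version would run one such unit per frequency band and sum the outputs. All parameter values and the helper name below are illustrative, not taken from the paper.

```python
import numpy as np

def compress(x, sr, threshold_db=-30.0, ratio=3.0,
             attack_ms=5.0, release_ms=50.0):
    """Single-band feed-forward dynamic range compressor (illustrative)."""
    a_att = np.exp(-1.0 / (sr * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (sr * release_ms / 1000.0))
    env = 0.0
    y = np.empty_like(x)
    for i, s in enumerate(x):
        mag = abs(s)
        # Envelope follower: fast attack, slow release
        a = a_att if mag > env else a_rel
        env = a * env + (1.0 - a) * mag
        level_db = 20.0 * np.log10(max(env, 1e-9))
        # Static curve: cut (1 - 1/ratio) dB per dB above the threshold
        over_db = max(0.0, level_db - threshold_db)
        gain_db = over_db * (1.0 / ratio - 1.0)
        y[i] = s * 10.0 ** (gain_db / 20.0)
    return y

sr = 16000
t = np.arange(8000) / sr
y_loud = compress(0.5 * np.sin(2 * np.pi * 440 * t), sr)  # gain reduced
x_quiet = 0.001 * np.sin(2 * np.pi * 440 * t)
y_quiet = compress(x_quiet, sr)  # below threshold: passed through
```

What makes the paper's network learnable is that every step of such a structure (smoothing, level detection, static gain curve) is differentiable, so threshold, ratio, and time constants can be optimized per band against an intelligibility objective instead of being prescribed by a rule like Camfit.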
Stochastic landslide vulnerability modeling in space and time in a part of the northern Himalayas, India
A Competing Voices Test for Hearing-Impaired Listeners Applied to Spatial Separation and Ideal Time-Frequency Masks
People with hearing impairment find competing-voices scenarios challenging, both with respect to switching attention from one talker to the other and with respect to maintaining attention. The Danish competing voices test (CVT) presented here assesses these dual-attention skills. The CVT provides sentences spoken by three male and three female talkers, played in sentence pairs. The task of the listener is to repeat the target sentence from the sentence pair, cued either before or after playback. One potential way of assisting the segregation of two talkers is to take advantage of spatial unmasking by presenting one talker per ear after applying time-frequency masks to separate the mixture. Using the CVT, this study evaluated four spatial conditions in 14 moderate-to-severely hearing-impaired listeners to establish benchmark results for this type of algorithm applied to hearing-impaired listeners. The four spatial conditions were: summed (diotic), separate, the ideal ratio mask, and the ideal binary mask. The results show that the test is sensitive to the change in spatial condition. The temporal position of the cue has a large impact: cueing the target talker before playback focuses attention on the target, whereas cueing after playback requires equal attention to the two talkers, which is more difficult. Furthermore, both ideal masks yield test scores very close to those of the ideal separate spatial condition, suggesting that this technique will be useful for future separation algorithms using estimated rather than ideal masks.
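The two ideal masks used in the study are standard constructions from the target and interferer spectrograms. A minimal sketch (the function name and the toy arrays are illustrative, not from the study):

```python
import numpy as np

def ideal_masks(S, N):
    """Ideal binary mask (IBM) and ideal ratio mask (IRM) computed from
    the STFTs of the target (S) and the interfering talker (N)."""
    ps, pn = np.abs(S) ** 2, np.abs(N) ** 2
    ibm = (ps > pn).astype(float)          # 1 where the target dominates
    irm = ps / np.maximum(ps + pn, 1e-12)  # soft power-ratio weighting
    return ibm, irm

# Toy 1x2 "spectrogram": target dominates the first time-frequency cell,
# the interferer dominates the second.
S = np.array([[3.0, 1.0]])
N = np.array([[1.0, 2.0]])
ibm, irm = ideal_masks(S, N)
```

In the study's setup, each mask would be applied to the mixture STFT and the two masked signals presented one per ear, so that spatial unmasking can help the listener segregate the talkers.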