100 research outputs found
Non Intrusive Intelligibility Predictor for Hearing Impaired Individuals using Self Supervised Speech Representations
Self-supervised speech representations (SSSRs) have been successfully applied
to a number of speech-processing tasks, e.g., as feature extractors for speech
quality (SQ) prediction, which is in turn relevant for assessing and
training speech enhancement systems for users with normal or impaired hearing.
However, exactly why and how quality-related information is encoded
in such representations remains poorly understood. In this work,
techniques for non-intrusive prediction of SQ ratings are extended to the
prediction of intelligibility for hearing-impaired users. It is found that
self-supervised representations are useful as input features to non-intrusive
prediction models, achieving performance competitive with more complex systems. A
detailed analysis of performance across Clarity Prediction Challenge 1
listeners and enhancement systems indicates that more data might be needed to
allow generalisation to unknown systems and (hearing-impaired) individuals.
Comment: Accepted at the ASRU 2023 SPARKS workshop.
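As a rough illustration of the non-intrusive set-up described above, the sketch below pools per-frame SSSR feature vectors over time and maps the result to a bounded intelligibility score with a single linear layer. The pooling choice, the linear-plus-sigmoid head, and all names are illustrative assumptions, not the paper's actual architecture:

```python
import math
import random

def mean_pool(features):
    """Average a [T x D] list of per-frame SSSR feature vectors over time."""
    T, D = len(features), len(features[0])
    return [sum(f[d] for f in features) / T for d in range(D)]

def predict_intelligibility(features, w, b):
    """Map a pooled utterance embedding to a 0-100 intelligibility score
    via one linear layer and a sigmoid (a stand-in for a trained head)."""
    pooled = mean_pool(features)
    z = sum(wi * xi for wi, xi in zip(w, pooled)) + b
    return 100.0 / (1.0 + math.exp(-z))

# Toy example: 5 frames of 4-dimensional "SSSR" features, arbitrary weights.
random.seed(0)
feats = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
score = predict_intelligibility(feats, w=[0.5, -0.2, 0.1, 0.3], b=0.0)
assert 0.0 <= score <= 100.0
```

In a real system the features would come from intermediate layers of a pre-trained model and the head would be trained on listener ratings.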
Utterance Weighted Multi-Dilation Temporal Convolutional Networks for Monaural Speech Dereverberation
Speech dereverberation is an important stage in many speech technology
applications. Recent work in this area has been dominated by deep neural
network models. Temporal convolutional networks (TCNs) are deep learning models
that have been proposed for sequence modelling in the task of dereverberating
speech. In this work, a weighted multi-dilation depthwise-separable convolution
is proposed to replace the standard depthwise-separable convolutions in TCN models.
This proposed convolution enables the TCN to dynamically focus on more or less
local information in its receptive field at each convolutional block in the
network. It is shown that this weighted multi-dilation temporal convolutional
network (WD-TCN) consistently outperforms the TCN across various model
configurations, and that using the WD-TCN is a more parameter-efficient way
to improve performance than increasing the number of
convolutional blocks. The best performance improvement over the baseline TCN is
0.55 dB scale-invariant signal-to-distortion ratio (SISDR), and the best
performing WD-TCN model attains 12.26 dB SISDR on the WHAMR dataset.
Comment: Accepted at IWAENC 2022.
Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation
Speech separation models are used for isolating individual speakers in many
speech processing applications. Deep learning models have been shown to lead to
state-of-the-art (SOTA) results on a number of speech separation benchmarks.
One such class of models known as temporal convolutional networks (TCNs) has
shown promising results for speech separation tasks. A limitation of these
models is that they have a fixed receptive field (RF). Recent research in
speech dereverberation has shown that the optimal RF of a TCN varies with the
reverberation characteristics of the speech signal. In this work, deformable
convolution is proposed as a solution, allowing TCN models to have dynamic RFs
that can adapt to varying reverberation times in reverberant speech
separation. The proposed models achieve an 11.1 dB average
scale-invariant signal-to-distortion ratio (SISDR) improvement over the input
signal on the WHAMR benchmark. A relatively small deformable TCN model of 1.3M
parameters is proposed which gives separation performance comparable to larger
and more computationally complex models.
Comment: Accepted for ICASSP 2023.
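The mechanism behind the dynamic receptive field is that each kernel tap reads the input at a learned, possibly fractional, offset, with linear interpolation between samples. A minimal 1-D sketch (single channel, fixed rather than predicted offsets, all names assumed for illustration):

```python
def sample(x, pos):
    """Linearly interpolate sequence x at a fractional position pos."""
    if pos <= 0:
        return x[0]
    if pos >= len(x) - 1:
        return x[-1]
    lo = int(pos)
    frac = pos - lo
    return (1 - frac) * x[lo] + frac * x[lo + 1]

def deformable_conv1d(x, kernel, offsets):
    """1-D deformable convolution: tap k reads at t + k + offsets[k], so
    learned offsets stretch or shrink the effective receptive field."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(kernel):
            acc += wk * sample(x, t + k + offsets[k])
        out.append(acc)
    return out

x = [0.0, 1.0, 2.0, 3.0]
# Zero offsets reduce to a standard convolution; fractional offsets widen
# or narrow the RF smoothly, which is what lets the model adapt to T60.
assert deformable_conv1d(x, [1.0], offsets=[0.0]) == x
assert deformable_conv1d(x, [1.0], offsets=[0.5])[0] == 0.5
```

In the deformable TCN the offsets are themselves predicted from the input, so the receptive field can differ per signal.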
MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data
Training of speech enhancement systems often does not incorporate knowledge
of human perception and thus can lead to unnatural sounding results.
Incorporating psychoacoustically motivated speech perception metrics as part of
model training via a predictor network has recently gained interest. However,
the performance of such predictors is limited by the distribution of metric
scores that appear in the training data. In this work, we propose MetricGAN+/-
(an extension of MetricGAN+, one such metric-motivated system) which introduces
an additional network - a "de-generator" which attempts to improve the
robustness of the prediction network (and by extension of the generator) by
ensuring observation of a wider range of metric scores in training.
Experimental results on the VoiceBank-DEMAND dataset show a relative improvement
in PESQ score of 3.8% (3.05 vs 3.22 PESQ), as well as better
generalisation to unseen noise and speech.
Comment: 5 pages, 4 figures, submitted to EUSIPCO 2022.
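The "de-generator" idea above can be made concrete with a toy data-assembly loop: alongside ordinary (signal, metric-score) training pairs, deliberately degraded copies are scored and added so the predictor network sees a wider range of metric values. Everything here is a stand-in (the fake metric, the attenuation-based degradation, all names); the real system uses PESQ and a learned de-generator network:

```python
import random

def metric_score(signal):
    """Stand-in for a perceptual metric such as PESQ (illustrative only):
    maps mean amplitude to a score clipped to PESQ-like range [1.0, 4.5]."""
    m = sum(abs(s) for s in signal) / len(signal)
    return max(1.0, min(4.5, 1.0 + 3.5 * m))

def degenerate(signal, strength):
    """Toy "de-generator": attenuate the signal so its metric score lands
    in the poorly covered low end of the training distribution."""
    return [s * (1.0 - strength) for s in signal]

def build_training_batch(signals):
    """Pair each signal, and a degraded copy, with its metric score so the
    predictor observes a wider score range during training."""
    batch = []
    for sig in signals:
        batch.append((sig, metric_score(sig)))
        bad = degenerate(sig, strength=0.8)
        batch.append((bad, metric_score(bad)))
    return batch

random.seed(1)
signals = [[random.uniform(-1, 1) for _ in range(16)] for _ in range(4)]
batch = build_training_batch(signals)
scores = [s for _, s in batch]
assert min(scores) < max(scores)  # degraded copies widen score coverage
```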
On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments
Speech separation remains an important topic for multi-speaker technology
researchers. Convolution augmented transformers (conformers) have performed
well for many speech processing tasks but have been under-researched for speech
separation. Most recent state-of-the-art (SOTA) separation models have been
time-domain audio separation networks (TasNets). A number of successful models
have made use of dual-path (DP) networks which sequentially process local and
global information. Time domain conformers (TD-Conformers) are an analogue of
the DP approach in that they also process local and global context sequentially
but have a different time complexity function. It is shown that for realistic
shorter signal lengths, conformers are more efficient when controlling for
feature dimension. Subsampling layers are proposed to further improve
computational efficiency. The best TD-Conformer achieves 14.6 dB and 21.2 dB
SISDR improvement on the WHAMR and WSJ0-2Mix benchmarks, respectively.
Comment: Accepted at the ASRU 2023 Workshop.
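The efficiency argument above rests on how the quadratic cost of global attention scales with sequence length, and on subsampling shortening that sequence before the expensive layers. A back-of-the-envelope sketch (the cost model counts only the dominant T²·d attention term and ignores the subsampling layers themselves, an assumed simplification):

```python
def attention_cost(T, d):
    """Rough dominant cost of full self-attention over T frames, dim d."""
    return T * T * d

def subsampled_cost(T, d, factor):
    """Subsampling the sequence by `factor` before global attention cuts
    the quadratic term by factor**2."""
    return (T // factor) ** 2 * d

# E.g. a 4000-frame utterance with feature dim 256, subsampled 4x:
T, d = 4000, 256
full = attention_cost(T, d)
sub = subsampled_cost(T, d, factor=4)
assert sub * 16 == full  # quadratic saving: 4**2 = 16x fewer attention ops
```

This is why, for realistic shorter signal lengths, trading some temporal resolution for a shorter attended sequence can be a good bargain.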
On sound source localization of speech signals using deep neural networks
In recent years, artificial neural networks have been successfully applied, especially in the context of automatic speech recognition. As information processing systems, neural networks are trained, e.g., by backpropagation or as restricted Boltzmann machines, to classify patterns at the input of the system. The current work presents the implementation of a deep neural network (DNN) architecture for acoustic source localization.
Funding: EC/FP7/318381 (EAR-IT, Experimenting Acoustics in Real environments using Innovative Test-beds); EC/FP7/284628 (S4ECoB, Sounds for Energy Control of Buildings); EC/FP7/609180 (ECOSHOPPING, Energy efficient & Cost competitive retrofitting solutions for Shopping buildings).
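When localization is framed as pattern classification, as described above, the network's output is typically a probability over candidate directions and the estimate is the highest-scoring bin. A minimal read-out sketch (the azimuth grid, logits, and function names are assumptions for illustration, not the paper's design):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def doa_from_logits(logits, azimuths):
    """Pick the azimuth bin with the highest classifier probability --
    the usual read-out when localization is framed as classification."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return azimuths[best]

# Toy read-out over four candidate directions (degrees).
azimuths = [0, 90, 180, 270]
logits = [0.1, 2.3, 0.4, -1.0]
assert doa_from_logits(logits, azimuths) == 90
```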
Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users using Intermediate ASR Features and Human Memory Models
Neural networks have been successfully used for non-intrusive speech
intelligibility prediction. Recently, the use of feature representations
sourced from intermediate layers of pre-trained self-supervised and
weakly-supervised models has been found to be particularly useful for this
task. This work combines the use of Whisper ASR decoder layer representations
as neural network input features with an exemplar-based, psychologically
motivated model of human memory to predict human intelligibility ratings for
hearing-aid users. Substantial performance improvement over an established
intrusive HASPI baseline system is found, including on enhancement systems and
listeners unseen in the training data, with a root mean squared error of 25.3
compared with the baseline of 28.7.
Comment: Accepted paper, IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP), Seoul, Korea, April 2024.
The brain-specific double-stranded RNA-binding protein Staufen2 is required for dendritic spine morphogenesis
Mammalian Staufen2 (Stau2) is a member of the double-stranded RNA-binding protein family. Its expression is largely restricted to the brain. It is thought to play a role in the delivery of RNA to dendrites of polarized neurons. To investigate the function of Stau2 in mature neurons, we interfered with Stau2 expression by RNA interference (RNAi). Mature neurons lacking Stau2 displayed a significant reduction in the number of dendritic spines and an increase in filopodia-like structures. The number of PSD95-positive synapses and miniature excitatory postsynaptic currents were markedly reduced in Stau2 down-regulated neurons. Similar effects were caused by overexpression of dominant-negative Stau2. The observed phenotype could be rescued by overexpression of two RNAi cleavage-resistant Stau2 isoforms. In situ hybridization revealed reduced expression levels of β-actin mRNA and fewer dendritic β-actin mRNPs in Stau2 down-regulated neurons. Thus, our data suggest an important role for Stau2 in the formation and maintenance of dendritic spines of hippocampal neurons.
Proteogenetic drug response profiling elucidates targetable vulnerabilities of myelofibrosis
Myelofibrosis is a hematopoietic stem cell disorder belonging to the myeloproliferative neoplasms. Myelofibrosis patients frequently carry driver mutations in either JAK2 or Calreticulin (CALR) and have limited therapeutic options. Here, we integrate ex vivo drug response and proteotype analyses across myelofibrosis patient cohorts to discover targetable vulnerabilities and associated therapeutic strategies. Drug sensitivities of mutated and progenitor cells were measured in patient blood using high-content imaging and single-cell deep learning-based analyses. Integration with matched molecular profiling revealed three targetable vulnerabilities. First, CALR mutations drive BET and HDAC inhibitor sensitivity, particularly in the absence of high Ras pathway protein levels. Second, an MCM complex-high proliferative signature corresponds to advanced disease and sensitivity to drugs targeting pro-survival signaling and DNA replication. Third, homozygous CALR mutations result in high endoplasmic reticulum (ER) stress, responding to ER stressors and unfolded protein response inhibition. Overall, our integrated analyses provide a molecularly motivated roadmap for individualized myelofibrosis patient treatment.