
    Non Intrusive Intelligibility Predictor for Hearing Impaired Individuals using Self Supervised Speech Representations

    Self-supervised speech representations (SSSRs) have been successfully applied to a number of speech-processing tasks, e.g. as a feature extractor for speech quality (SQ) prediction, which is in turn relevant for assessing and training speech enhancement systems for users with normal or impaired hearing. However, why and how quality-related information is encoded so well in such representations remains poorly understood. In this work, techniques for non-intrusive prediction of SQ ratings are extended to the prediction of intelligibility for hearing-impaired users. Self-supervised representations are found to be useful as input features to non-intrusive prediction models, achieving performance competitive with more complex systems. A detailed analysis of performance across Clarity Prediction Challenge 1 listeners and enhancement systems indicates that more data might be needed to allow generalisation to unknown systems and (hearing-impaired) individuals. Comment: Accepted @ ASRU 2023 SPARKS workshop
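
    To make "SSSRs as input features to a non-intrusive predictor" concrete, the sketch below pools frame-level features over time and maps them to a score with a linear layer and a sigmoid. It assumes the features have already been extracted by some pre-trained self-supervised model (extraction is not shown), and it assumes the target is a 0 to 100 intelligibility score such as percentage of words correct. The helpers `mean_pool` and `predict_intelligibility` and the linear-plus-sigmoid head are hypothetical illustrations, not the architecture used in this work.

```python
import math
from typing import List, Sequence

# Frame-level features (T frames x D dims), assumed to come from some
# pre-trained self-supervised speech model; extraction is not shown here.
Frames = Sequence[Sequence[float]]


def mean_pool(frames: Frames) -> List[float]:
    """Average T x D frame-level features over time into one D-dim vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def predict_intelligibility(frames: Frames, weights: Sequence[float], bias: float) -> float:
    """Hypothetical head: linear layer on pooled features, squashed to (0, 100)."""
    pooled = mean_pool(frames)
    z = sum(w * x for w, x in zip(weights, pooled)) + bias
    # The sigmoid maps z into (0, 1); scale it to a percentage-style score.
    return 100.0 / (1.0 + math.exp(-z))
```

    For example, `predict_intelligibility([[0.2, -0.1], [0.4, 0.3]], [1.0, 0.5], 0.0)` pools the two frames to `[0.3, 0.1]` and returns a score between 0 and 100. Mean pooling is the simplest way to turn a variable-length utterance into a fixed-size vector; a learned model would typically use richer temporal modelling.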

    Utterance Weighted Multi-Dilation Temporal Convolutional Networks for Monaural Speech Dereverberation

    Speech dereverberation is an important stage in many speech technology applications. Recent work in this area has been dominated by deep neural network models. Temporal convolutional networks (TCNs) are deep learning models that have been proposed for sequence modelling in speech dereverberation. In this work, a weighted multi-dilation depthwise-separable convolution is proposed to replace the standard depthwise-separable convolutions in TCN models. The proposed convolution enables the TCN to dynamically focus on more or less local information in its receptive field at each convolutional block in the network. It is shown that this weighted multi-dilation temporal convolutional network (WD-TCN) consistently outperforms the TCN across various model configurations, and that using the WD-TCN model is a more parameter-efficient way to improve performance than increasing the number of convolutional blocks. The best performance improvement over the baseline TCN is 0.55 dB scale-invariant signal-to-distortion ratio (SISDR), and the best-performing WD-TCN model attains 12.26 dB SISDR on the WHAMR dataset. Comment: Accepted at IWAENC 202
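
    The idea of letting a block "focus on more or less local information" can be sketched for a single channel of the depthwise stage: several dilated convolutions run in parallel and their outputs are mixed with softmax weights. This is a minimal sketch under stated assumptions: the weights are computed once per utterance from the mean of the input (one reading of "utterance weighted" in the title), the gating is a hypothetical scalar linear map, and the pointwise (1x1) half of the depthwise-separable convolution is omitted. The function names are illustrative, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Same-length dilated 1-D convolution (cross-correlation form), zero padded.

    Assumes an odd kernel length so the kernel is centred on each output step.
    """
    half = (len(kernel) - 1) // 2 * dilation
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t + i * dilation - half
            if 0 <= idx < len(x):  # positions outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_multi_dilation_conv(
    x: Sequence[float],
    kernels: Sequence[Sequence[float]],
    dilations: Sequence[int],
    gate_weights: Sequence[float],
    gate_bias: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """One channel of a weighted multi-dilation depthwise convolution (sketch).

    Each branch uses its own kernel at its own dilation; branches are mixed with
    softmax weights computed once per utterance (hypothetical gating design).
    """
    utt_mean = sum(x) / len(x)  # utterance-level summary of the input
    alphas = softmax([g * utt_mean + b for g, b in zip(gate_weights, gate_bias)])
    branches = [dilated_conv1d(x, k, d) for k, d in zip(kernels, dilations)]
    y = [sum(a * br[t] for a, br in zip(alphas, branches)) for t in range(len(x))]
    return y, alphas
```

    A large weight on a small-dilation branch keeps the block's focus local, while a large weight on a large-dilation branch spreads it over a wider context; because the weights are soft, the network can interpolate between the two.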

    Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation

    Speech separation models are used for isolating individual speakers in many speech processing applications. Deep learning models have been shown to achieve state-of-the-art (SOTA) results on a number of speech separation benchmarks. One such class of models, temporal convolutional networks (TCNs), has shown promising results for speech separation tasks. A limitation of these models is that they have a fixed receptive field (RF). Recent research in speech dereverberation has shown that the optimal RF of a TCN varies with the reverberation characteristics of the speech signal. In this work, deformable convolution is proposed as a way to give TCN models dynamic RFs that can adapt to various reverberation times for reverberant speech separation. The proposed models achieve an 11.1 dB average scale-invariant signal-to-distortion ratio (SISDR) improvement over the input signal on the WHAMR benchmark. A relatively small deformable TCN model with 1.3M parameters is proposed, which gives separation performance comparable to larger and more computationally complex models. Comment: Accepted for ICASSP 202
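
    A deformable convolution replaces the fixed sampling positions of each kernel tap with positions shifted by fractional offsets, which is what allows the effective receptive field to stretch or shrink. The single-channel sketch below samples the input by linear interpolation at `t + i*dilation - half + offset`. It assumes the offsets are given per output step and per tap; in a real deformable layer they would be predicted by a separate learned sub-network, which is not shown. The names `sample_linear` and `deformable_conv1d` are hypothetical.

```python
import math
from typing import List, Sequence


def sample_linear(x: Sequence[float], pos: float) -> float:
    """Linearly interpolate x at a fractional position; zero outside the signal."""
    lo = math.floor(pos)
    frac = pos - lo

    def at(i: int) -> float:
        return x[i] if 0 <= i < len(x) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


def deformable_conv1d(
    x: Sequence[float],
    kernel: Sequence[float],
    dilation: int,
    offsets: Sequence[Sequence[float]],
) -> List[float]:
    """1-D convolution whose taps are shifted by fractional offsets.

    offsets[t][i] is the shift of tap i at output step t (assumed to come from
    a learned offset predictor). With all offsets zero this reduces to an
    ordinary zero-padded dilated convolution.
    """
    half = (len(kernel) - 1) // 2 * dilation
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, (w, off) in enumerate(zip(kernel, offsets[t])):
            acc += w * sample_linear(x, t + i * dilation - half + off)
        out.append(acc)
    return out
```

    Because linear interpolation is differentiable with respect to the sampling position, the offsets can be learned by gradient descent, which is the property that makes the receptive field trainable rather than fixed.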

    MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data

    Training of speech enhancement systems often does not incorporate knowledge of human perception and can thus lead to unnatural-sounding results. Incorporating psychoacoustically motivated speech perception metrics into model training via a predictor network has recently gained interest. However, the performance of such predictors is limited by the distribution of metric scores that appear in the training data. In this work, we propose MetricGAN+/- (an extension of MetricGAN+, one such metric-motivated system), which introduces an additional network, a "de-generator", that attempts to improve the robustness of the prediction network (and, by extension, of the generator) by ensuring that a wider range of metric scores is observed in training. Experimental results on the VoiceBank-DEMAND dataset show a relative improvement in PESQ score of 3.8% (3.05 vs 3.22), as well as better generalisation to unseen noise and speech. Comment: 5 pages, 4 figures, Submitted to EUSIPCO 202
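
    Why a wider range of training scores matters can be illustrated with a metric-predictor (discriminator) loss: the predictor is trained to regress the true normalised metric score of each input, so it can only learn the parts of the score range it actually sees. The sketch below computes a mean squared error over clean, enhanced and noisy inputs, following a MetricGAN+-style recipe as we understand it, plus an optional extra input standing in for a de-generator output. `toy_metric` is a hypothetical stand-in for a real metric such as PESQ, and the assumption that the de-generator output is a low-scoring, degraded signal is ours.

```python
from typing import Callable, Optional, Sequence

Signal = Sequence[float]
ScoreFn = Callable[[Signal, Signal], float]


def toy_metric(signal: Signal, reference: Signal) -> float:
    """Hypothetical stand-in for a normalised metric such as PESQ (1.0 = identical)."""
    mean_abs_err = sum(abs(s - r) for s, r in zip(signal, reference)) / len(reference)
    return max(0.0, 1.0 - mean_abs_err)


def discriminator_loss(
    predict: ScoreFn,
    metric: ScoreFn,
    clean: Signal,
    enhanced: Signal,
    noisy: Signal,
    degenerated: Optional[Signal] = None,
) -> float:
    """Mean squared error between predicted and true metric scores.

    The optional `degenerated` input stands in for a de-generator output that
    exposes the predictor to a region of the score range it would otherwise
    rarely observe (assumed here to be a low-quality signal).
    """
    inputs = [clean, enhanced, noisy]
    if degenerated is not None:
        inputs.append(degenerated)
    errors = [(predict(s, clean) - metric(s, clean)) ** 2 for s in inputs]
    return sum(errors) / len(errors)
```

    For instance, a predictor that always answers 1.0 ("sounds perfect") receives a small loss when only near-clean examples are seen, but a much larger one once a heavily degraded example is added, so training with such examples pushes the predictor to model low scores as well.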

    On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments

    Speech separation remains an important topic for multi-speaker technology researchers. Convolution-augmented transformers (conformers) have performed well on many speech processing tasks but have been under-researched for speech separation. Most recent state-of-the-art (SOTA) separation models have been time-domain audio separation networks (TasNets). A number of successful models have made use of dual-path (DP) networks, which sequentially process local and global information. Time-domain conformers (TD-Conformers) are an analogue of the DP approach in that they also process local and global context sequentially, but they have a different time complexity function. It is shown that, for realistic shorter signal lengths, conformers are more efficient when controlling for feature dimension. Subsampling layers are proposed to further improve computational efficiency. The best TD-Conformer achieves 14.6 dB and 21.2 dB SISDR improvement on the WHAMR and WSJ0-2Mix benchmarks, respectively. Comment: Accepted at ASRU Workshop 202
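
    Several of these abstracts report results as SISDR or SISDR improvement, so a small reference computation may help. The sketch below uses the common zero-mean formulation: the estimate is projected onto the reference to obtain a scaled target, and the ratio of target energy to residual energy is expressed in dB; the improvement is the SISDR of the separated output minus that of the unprocessed mixture. The helper names and the small `eps` guard are our own choices, not taken from the papers.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-12) -> float:
    """Scale-invariant signal-to-distortion ratio in dB (zero-mean variant)."""
    n = len(reference)
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]
    # Optimal scaling of the reference: projection of the estimate onto it.
    scale = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [scale * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(v * v for v in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


def si_sdr_improvement(estimate: Sequence[float], mixture: Sequence[float],
                       reference: Sequence[float]) -> float:
    """SISDR gain (dB) of the separated estimate over the input mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)
```

    Because the reference is rescaled to best match the estimate, multiplying the estimate by a constant gain leaves the score unchanged, which is why the measure is called scale-invariant.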

    On sound source localization of speech signals using deep neural networks

    In recent years, artificial neural networks have been applied successfully, especially in the context of automatic speech recognition. As information-processing systems, neural networks are trained, e.g. by backpropagation or with restricted Boltzmann machines, to classify patterns at the input of the system. The current work presents the implementation of a deep neural network (DNN) architecture for acoustic source localization.
    Funding: EC/FP7/318381/EU/Experimenting Acoustics in Real environments using Innovative Test-beds/EAR-IT; EC/FP7/284628/EU/Sounds for Energy Control of Buildings/S4ECoB; EC/FP7/609180/EU/Energy efficient & Cost competitive retrofitting solutions for Shopping buildings/ECOSHOPPIN

    Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users using Intermediate ASR Features and Human Memory Models

    Neural networks have been successfully used for non-intrusive speech intelligibility prediction. Recently, feature representations sourced from intermediate layers of pre-trained self-supervised and weakly-supervised models have been found to be particularly useful for this task. This work combines Whisper ASR decoder-layer representations, used as neural network input features, with an exemplar-based, psychologically motivated model of human memory to predict human intelligibility ratings for hearing-aid users. A substantial performance improvement over an established intrusive HASPI baseline system is found, including on enhancement systems and listeners unseen in the training data, with a root-mean-square error (RMSE) of 25.3 compared with 28.7 for the baseline. Comment: Accepted paper. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, April 202
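
    The abstract does not spell out the memory model, so the sketch below assumes a MINERVA 2-style exemplar model as one well-known example of the family: every training item is stored as a (feature vector, intelligibility label) trace, a probe activates each trace by its cosine similarity raised to a power (cubed, as in MINERVA 2), and the prediction is the activation-weighted average of the stored labels. Discarding negative similarities before cubing is a simplification of ours, and the features would in practice be pooled ASR-layer representations rather than the toy vectors used here. An RMSE helper is included because that is the error measure the abstract reports. All names are hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def exemplar_predict(probe: Sequence[float],
                     exemplar_feats: Sequence[Sequence[float]],
                     exemplar_labels: Sequence[float],
                     power: int = 3) -> float:
    """Activation-weighted average of stored labels (MINERVA 2-style sketch)."""
    # Cubing sharpens the contrast: very similar traces dominate the echo.
    activations = [max(cosine(probe, f), 0.0) ** power for f in exemplar_feats]
    total = sum(activations)
    if total == 0.0:
        raise ValueError("probe is not similar to any stored exemplar")
    return sum(a * y for a, y in zip(activations, exemplar_labels)) / total


def rmse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Root-mean-square error between predicted and true ratings."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions))
```

    With stored traces `[[1, 0], [0, 1]]` labelled 80 and 20, a probe of `[1, 0]` returns 80, while a probe of `[1, 1]` activates both traces equally and returns their average. No parameters are trained in this sketch; prediction quality depends entirely on the stored exemplars and the feature space.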

    The brain-specific double-stranded RNA-binding protein Staufen2 is required for dendritic spine morphogenesis

    Mammalian Staufen2 (Stau2) is a member of the double-stranded RNA-binding protein family. Its expression is largely restricted to the brain, and it is thought to play a role in the delivery of RNA to dendrites of polarized neurons. To investigate the function of Stau2 in mature neurons, we interfered with Stau2 expression by RNA interference (RNAi). Mature neurons lacking Stau2 displayed a significant reduction in the number of dendritic spines and an increase in filopodia-like structures. The number of PSD95-positive synapses and the frequency of miniature excitatory postsynaptic currents were markedly reduced in Stau2 down-regulated neurons. Similar effects were caused by overexpression of dominant-negative Stau2. The observed phenotype could be rescued by overexpression of two RNAi cleavage-resistant Stau2 isoforms. In situ hybridization revealed reduced expression levels of β-actin mRNA and fewer dendritic β-actin mRNPs in Stau2 down-regulated neurons. Thus, our data suggest an important role for Stau2 in the formation and maintenance of dendritic spines of hippocampal neurons.

    Proteogenetic drug response profiling elucidates targetable vulnerabilities of myelofibrosis

    Myelofibrosis is a hematopoietic stem cell disorder belonging to the myeloproliferative neoplasms. Myelofibrosis patients frequently carry driver mutations in either JAK2 or Calreticulin (CALR) and have limited therapeutic options. Here, we integrate ex vivo drug response and proteotype analyses across myelofibrosis patient cohorts to discover targetable vulnerabilities and associated therapeutic strategies. Drug sensitivities of mutated and progenitor cells were measured in patient blood using high-content imaging and single-cell deep learning-based analyses. Integration with matched molecular profiling revealed three targetable vulnerabilities. First, CALR mutations drive BET and HDAC inhibitor sensitivity, particularly in the absence of high Ras pathway protein levels. Second, an MCM complex-high proliferative signature corresponds to advanced disease and to sensitivity to drugs targeting pro-survival signaling and DNA replication. Third, homozygous CALR mutations result in high endoplasmic reticulum (ER) stress, responding to ER stressors and unfolded protein response inhibition. Overall, our integrated analyses provide a molecularly motivated roadmap for individualized myelofibrosis patient treatment.