
    Single-Channel Speech Enhancement with Deep Complex U-Networks and Probabilistic Latent Space Models

    In this paper, we propose to extend the deep complex U-Network architecture for speech enhancement by incorporating a probabilistic (i.e., variational) latent space model. The proposed model is evaluated against several ablated versions of itself to study the effects of the variational latent space model, complex-valued processing, and self-attention. Evaluation on the MS-DNS 2020 and Voicebank+Demand datasets yields consistently high performance. For example, the proposed model achieves an SI-SDR of up to 20.2 dB, about 0.5 to 1.4 dB higher than its ablated version without the probabilistic latent space, 2 to 2.4 dB higher than WaveUNet, and 6.7 dB above PHASEN. Compared to real-valued magnitude spectrogram processing with a variational U-Net, the complex U-Net achieves an improvement of up to 4.5 dB SI-SDR. Encoding the complex spectrum as magnitude and phase yields the best performance in anechoic conditions, whereas a real and imaginary part representation generalizes better to (novel) reverberation conditions, possibly due to the underlying physics of sound.
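    The results above are reported in SI-SDR (scale-invariant signal-to-distortion ratio). A minimal sketch of the commonly used definition follows, in plain Python: the reference is rescaled by the least-squares factor that best matches the estimate, and the score is the energy ratio of that scaled reference to the remaining residual, in dB. The function name si_sdr and the optional mean removal are illustrative choices, not taken from the paper's evaluation code.

```python
import math


def si_sdr(estimate, reference, zero_mean=True):
    """Scale-invariant SDR in dB between an estimated and a reference signal.

    The reference is rescaled by alpha = <estimate, reference> / ||reference||^2,
    so multiplying the estimate by any nonzero constant leaves the score unchanged.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    if zero_mean:
        # Remove DC offsets before scoring (a common, but optional, convention).
        me = sum(estimate) / len(estimate)
        mr = sum(reference) / len(reference)
        estimate = [x - me for x in estimate]
        reference = [x - mr for x in reference]
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        raise ValueError("reference signal has zero energy")
    alpha = sum(e * r for e, r in zip(estimate, reference)) / ref_energy
    target = [alpha * r for r in reference]               # scaled reference
    residual = [e - t for e, t in zip(estimate, target)]  # distortion + noise
    target_energy = sum(t * t for t in target)
    residual_energy = sum(d * d for d in residual)
    if residual_energy == 0.0:
        return math.inf  # perfect reconstruction up to scale
    return 10.0 * math.log10(target_energy / residual_energy)


# Usage: a reference plus small noise that is orthogonal to it.
ref = [1.0, 1.0, -1.0, -1.0]
est = [1.1, 0.9, -0.9, -1.1]
print(si_sdr(est, ref))
print(si_sdr([3.0 * x for x in est], ref))  # same score: scale invariance
```

    Here the residual energy (0.04) is 1% of the reference energy (4.0), so both calls give about 20 dB, the same order as the best score reported in the abstract. The second call shows why the metric is preferred over plain SNR for enhancement: a model cannot gain or lose score simply by changing output gain.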

    Complex Independent Component Analysis of Frequency-Domain Electroencephalographic Data

    Independent component analysis (ICA) has proven useful for modeling brain and electroencephalographic (EEG) data. Here, we present a new, generalized method that captures the dynamics of brain signals better than previous ICA algorithms. We regard EEG sources as eliciting spatio-temporal activity patterns, corresponding to, e.g., trajectories of activation propagating across the cortex. This leads to a model of convolutive signal superposition, in contrast with the commonly used instantaneous mixing model. In the frequency domain, convolutive mixing is equivalent to multiplicative mixing of complex signal sources within distinct spectral bands. We decompose the recorded spectral-domain signals into independent components using a complex infomax ICA algorithm. First results from a visual attention EEG experiment reveal (1) sources of spatio-temporal dynamics in the data, (2) links to subject behavior, (3) sources with a limited spectral extent, and (4) a higher degree of independence compared to sources derived by standard ICA.
    Comment: 21 pages, 11 figures. Added final journal reference, fixed minor typo.
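    The key modeling step above, that convolutive mixing in time becomes multiplicative mixing per frequency bin, can be checked numerically. The sketch below is a toy illustration, not the paper's complex infomax algorithm: it mixes two sources through a hypothetical 2x2 set of short filters and verifies that, in every DFT bin k, the mixture spectrum equals a complex mixing matrix H(k) times the source spectra. Assumptions: circular convolution over the whole signal, where the identity is exact; with short-time Fourier analysis of real recordings it holds only approximately, when the filters are short relative to the analysis window. All function names are illustrative.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform; adequate for a toy example."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def circular_convolve(s, h):
    """Circular convolution of signal s with a (shorter) filter h."""
    n = len(s)
    return [sum(h[m] * s[(t - m) % n] for m in range(len(h))) for t in range(n)]


def convolutive_mix(sources, filters):
    """x_i(t) = sum_j (h_ij * s_j)(t): each sensor sees filtered copies of all sources."""
    n = len(sources[0])
    mixtures = []
    for row in filters:  # one row of filters per sensor
        x = [0.0] * n
        for s, h in zip(sources, row):
            x = [a + b for a, b in zip(x, circular_convolve(s, h))]
        mixtures.append(x)
    return mixtures


def per_bin_mixing_matrices(filters, n):
    """H[k][i][j] = DFT of the zero-padded filter h_ij, evaluated at bin k."""
    spectra = [[dft(h + [0.0] * (n - len(h))) for h in row] for row in filters]
    return [[[spectra[i][j][k] for j in range(len(filters[0]))]
             for i in range(len(filters))] for k in range(n)]


# Usage: two toy sources, two sensors, 3-tap filters.
sources = [[1.0, 0.5, -0.3, 0.8, 0.0, -1.0, 0.2, 0.4],
           [0.3, -0.7, 0.9, 0.1, -0.2, 0.6, -0.5, 0.0]]
filters = [[[1.0, 0.4, 0.1], [0.5, -0.2, 0.0]],
           [[0.3, 0.0, 0.6], [1.0, 0.1, -0.3]]]
n = len(sources[0])
X = [dft(x) for x in convolutive_mix(sources, filters)]
S = [dft(s) for s in sources]
H = per_bin_mixing_matrices(filters, n)
max_err = max(abs(sum(H[k][i][j] * S[j][k] for j in range(2)) - X[i][k])
              for k in range(n) for i in range(2))
print(max_err)  # tiny: floating-point round-off only
```

    Because each bin now follows an instantaneous complex model X(k) = H(k) S(k), a complex-valued ICA can be run separately per spectral band, which is the decomposition the abstract describes.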

    Neurophysiologic Markers of Abnormal Brain Activity in Schizophrenia

    Cortical electrophysiologic event-related potentials are multidimensional measures of information processing that are well suited for efficiently parsing the automatic and controlled components of cognition that span the range of deficits evidenced in schizophrenia patients. The information processes they index are key cognitive measures, recognized as informative and valid targets for understanding the neurobiology of schizophrenia. These measures may be used in concert with the Measurement and Treatment Research to Improve Cognition in Schizophrenia (MATRICS) neurocognitive measures in the development of novel treatments for schizophrenia and related neuropsychiatric disorders. Employing novel event-related potential paradigms designed to carefully characterize the early spectrum of perceptual and cognitive information processing allows investigators to identify the neurophysiologic basis of cognitive dysfunction in schizophrenia and to examine the associated clinical and functional impairments.

    Classifier architectures for acoustic scenes and events: implications for DNNs, TDNNs, and perceptual features from DCASE 2016

    This paper evaluates neural network (NN) based systems and compares them to Gaussian mixture model (GMM) and hidden Markov model (HMM) approaches for acoustic scene classification (SC) and polyphonic acoustic event detection (AED), applied to data from task 1 and task 3, respectively, of the “Detection and Classification of Acoustic Scenes and Events 2016” (DCASE'16) challenge. For both tasks, deep neural networks (DNNs) and features based on an amplitude modulation filterbank and a Gabor filterbank (GFB) are evaluated and compared to standard approaches. For SC, a time-delay NN (TDNN) approach is additionally proposed that enables analysis of long contextual information, similar to recurrent NNs, but with training effort comparable to conventional DNNs. The SC system proposed for task 1 of the DCASE'16 challenge attains a recognition accuracy of 77.5%, which is 5.6% higher than the DCASE'16 baseline system. For the AED task, DNNs are adopted in tandem and hybrid approaches, i.e., as part of HMM-based systems. These systems are evaluated on the polyphonic data of task 3 of the DCASE'16 challenge. Several strategies to address the issue of polyphony are considered. DNN-based systems are shown to perform less accurately than the traditional systems for this task. The best results are achieved using GFB features in combination with a multiclass GMM-HMM back end.
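    The time-delay idea mentioned in this abstract can be sketched as follows: each layer computes its output at frame t from a few input frames at fixed relative offsets, with the same weights at every t, so long context builds up by stacking layers while training stays feed-forward (no recurrence) and all frames can be processed in parallel. This is a minimal plain-Python illustration under stated assumptions: the offsets, the ReLU activation, and the function names are hypothetical and do not reflect the configuration used in the paper.

```python
def tdnn_layer(frames, weights, bias, offsets):
    """One time-delay layer with (assumed) ReLU activation.

    frames:  list of T input feature vectors, each of length D_in
    weights: one D_out x D_in matrix (list of rows) per entry in offsets
    bias:    list of D_out values
    offsets: relative frame offsets, e.g. (-2, 0, 2)

    The same weights are applied at every frame t (weight sharing over time);
    outputs are produced only where every input frame t + d exists.
    """
    if len(weights) != len(offsets):
        raise ValueError("need one weight matrix per offset")
    start = -min(offsets)
    stop = len(frames) - max(offsets)
    out = []
    for t in range(start, stop):
        y = list(bias)
        for w, d in zip(weights, offsets):
            x = frames[t + d]
            for o, row in enumerate(w):
                y[o] += sum(wi * xi for wi, xi in zip(row, x))
        out.append([max(0.0, v) for v in y])
    return out


def receptive_field(layer_offsets):
    """Number of input frames seen by one output frame of a stacked TDNN."""
    return 1 + sum(max(offs) - min(offs) for offs in layer_offsets)


# Usage: 10 one-dimensional frames, unit weights so outputs are easy to follow.
frames = [[float(i)] for i in range(10)]
ones = [[[1.0]], [[1.0]], [[1.0]]]
layer1 = tdnn_layer(frames, ones, [0.0], (-1, 0, 1))
layer2 = tdnn_layer(layer1, ones, [0.0], (-2, 0, 2))
print(len(layer1), len(layer2))
print(receptive_field([(-1, 0, 1), (-2, 0, 2)]))
```

    With two layers of only three taps each, one output frame already depends on seven input frames; widening the offsets in deeper layers grows the context quickly, which is the trade-off against recurrent models that the abstract points to.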