
    Complex Neural Networks for Audio

    Audio is represented in two mathematically equivalent ways: the real-valued time domain (i.e., waveform) and the complex-valued frequency domain (i.e., spectrum). There are advantages to the frequency-domain representation; e.g., the human auditory system is known to process sound in the frequency domain. Furthermore, linear time-invariant systems are convolved with sources in the time domain, whereas they may be factorized in the frequency domain. Neural networks have become rather useful when applied to audio tasks such as machine listening and audio synthesis, which are related by their dependence on high-quality acoustic models. They ideally encapsulate fine-scale temporal structure, such as that encoded in the phase of frequency-domain audio, yet there are no authoritative deep learning methods for complex audio. This manuscript is dedicated to addressing this shortcoming. Chapter 2 motivates complex networks by their affinity with complex-domain audio, while Chapter 3 contributes methods for building and optimizing complex networks. We show that the naive implementation of Adam optimization is incorrect for complex random variables and that the selection of input and output representations has a significant impact on the performance of a complex network. Experimental results with novel complex neural architectures are provided in the second half of this manuscript. Chapter 4 introduces a complex model for binaural audio source localization. We show that, like humans, the complex model can generalize to different anatomical filters, which is important in the context of machine listening. The complex model's performance is better than that of real-valued models as well as real- and complex-valued baselines. Chapter 5 proposes a two-stage method for speech enhancement. In the first stage, a complex-valued stochastic autoencoder projects complex vectors to a discrete space. In the second stage, long-term temporal dependencies are modeled in the discrete space. The autoencoder raises the performance ceiling for state-of-the-art speech enhancement, but the dynamic enhancement model does not outperform other baselines. We discuss areas for improvement and note that the complex Adam optimizer improves training convergence over the naive implementation.
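    The abstract does not spell out the complex Adam correction, but the second-moment issue it alludes to can be sketched. The NumPy illustration below assumes the standard Adam recursions; the function names and the exact fix (accumulating |g|^2 = g·conj(g), which is real and nonnegative, instead of the elementwise square g**2, which remains complex) are illustrative assumptions rather than the author's implementation.

        import numpy as np

        def adam_step_naive(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
            # Naive Adam applied verbatim to a complex tensor: the second
            # moment accumulates g**2, which for complex g is itself complex,
            # so the "variance" estimate can rotate or cancel and the
            # effective per-parameter step size is ill-defined.
            m = b1 * m + (1 - b1) * g
            v = b2 * v + (1 - b2) * g**2        # complex-valued: incorrect
            m_hat = m / (1 - b1**t)
            v_hat = v / (1 - b2**t)
            return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

        def adam_step_complex(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
            # Treating the gradient as a proper complex random variable, the
            # scale estimate is E[g * conj(g)] = E[|g|^2], which is real and
            # nonnegative, so sqrt(v_hat) is a well-defined step size.
            m = b1 * m + (1 - b1) * g
            v = b2 * v + (1 - b2) * (g * np.conj(g)).real   # |g|^2, real
            m_hat = m / (1 - b1**t)
            v_hat = v / (1 - b2**t)
            return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

    In the naive update, only the direction estimate m should stay complex; once the scale estimate v does too, dividing by its square root silently rotates the step rather than merely rescaling it.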

    Generalized Splitting 2D Flexible Activation Function

    Abstract. It is well known that in problems where the recovery of both amplitude and phase is essential (as in signal processing for communications), or where nonlinear signal distortions arise (as in control, signal processing, and imaging applications), it is important to account for the complex nature of the data, and thus the intimate relation between its real and imaginary parts. One of the main problems in designing complex neural networks (CpxNNs) is the definition of the complex activation functions (AFs): to ensure the network's universal approximation capability, the AFs should be bounded and differentiable. In the complex domain, these requirements conflict with Liouville's theorem, which asserts that the only bounded entire (everywhere analytic) functions are the constants. In this paper we investigate the use of 2D splines to define a new class of flexible activation functions that are bounded and locally analytic, suitable for defining a new class of complex-domain neural networks (CpxNNs).
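    The paper's 2D spline construction is not reproduced in this abstract; for context, below is a minimal sketch of the fixed "split" activation that such flexible AFs generalize, applying a bounded real nonlinearity (tanh here) to the real and imaginary parts independently. The function name is a hypothetical label for illustration.

        import numpy as np

        def split_tanh(z):
            # Split activation: f(z) = tanh(Re z) + j*tanh(Im z).
            # Bounded, but not analytic (it violates the Cauchy-Riemann
            # equations) -- the trade-off Liouville's theorem forces on any
            # non-constant bounded complex activation.
            return np.tanh(z.real) + 1j * np.tanh(z.imag)

        z = np.array([1.0 + 2.0j, -0.5 - 0.3j])
        print(split_tanh(z))   # outputs stay inside the unit square

    Replacing the fixed tanh with learnable spline curves is what makes the AF "flexible" while preserving boundedness.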