    A Generative Product-of-Filters Model of Audio

    We propose the product-of-filters (PoF) model, a generative model that decomposes audio spectra as sparse linear combinations of "filters" in the log-spectral domain. PoF makes similar assumptions to those used in the classic homomorphic filtering approach to signal processing, but replaces hand-designed decompositions built of basic signal processing operations with a learned decomposition based on statistical inference. This paper formulates the PoF model and derives a mean-field method for posterior inference and a variational EM algorithm to estimate the model's free parameters. We demonstrate PoF's potential for audio processing on a bandwidth expansion task, and show that PoF can serve as an effective unsupervised feature extractor for a speaker identification task. Comment: ICLR 2014 conference-track submission. Added link to the source code
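
    The name "product of filters" follows from the abstract's description: a linear combination in the log-spectral domain becomes a product once exponentiated. The sketch below only illustrates that identity. The filter matrix `U`, the activation vector `a`, and the sizes are hypothetical values chosen for illustration, and the inference and learning steps of the actual model are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins, n_filters = 8, 3

# Hypothetical filters, one column per filter, defined in the log-spectral domain.
U = rng.normal(size=(n_bins, n_filters))
# Hypothetical sparse activations: the second filter is switched off.
a = np.array([1.5, 0.0, 0.7])

# Linear combination in the log-spectral domain.
log_spectrum = U @ a
spectrum = np.exp(log_spectrum)

# Equivalent view: a product of per-filter spectral gains in the linear domain.
per_filter_gain = np.exp(U * a)            # shape (n_bins, n_filters)
product_of_filters = np.prod(per_filter_gain, axis=1)

print(np.allclose(spectrum, product_of_filters))
```

    A filter with zero activation contributes a gain of exactly one in every frequency bin, which is why sparse activations leave most filters out of the product.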

    Advancements of MultiRate Signal processing for Wireless Communication Networks: Current State Of the Art

    With the rapid growth of internet access and of voice- and data-centric communications, many access technologies have been developed to meet the stringent demand for high-speed data transmission and to bridge the wide bandwidth gap between the ever-faster high-data-rate core network and the bandwidth-hungry end-user network. To make efficient use of the limited bandwidth of the available access routes and to cope with difficult channel environments, several standards have been proposed for a variety of broadband access schemes over different access media (twisted pairs, coaxial cables, optical fibers, and fixed or mobile wireless access). These access media may create different channel impairments and dictate unique sets of signal processing algorithms and techniques to combat specific impairments. Many research issues arise in the design and implementation of those systems. In this paper we present advancements in multirate signal processing methodologies that are motivated by this design trend. It also reviews the current literature on interference suppression using multirate signal processing in wireless communication networks.

    Collaborative adaptive filtering for machine learning

    Quantitative performance criteria for the analysis of machine learning architectures and algorithms have long been established. However, qualitative performance criteria, which identify fundamental signal properties and ensure any processing preserves the desired properties, are still emerging. In many cases, whilst offline statistical tests exist such as assessment of nonlinearity or stochasticity, online tests which not only characterise but also track changes in the nature of the signal are lacking. To that end, by employing recent developments in signal characterisation, criteria are derived for the assessment of the changes in the nature of the processed signal. Through the fusion of the outputs of adaptive filters a single collaborative hybrid filter is produced. By tracking the dynamics of the mixing parameter of this filter, rather than the actual filter performance, a clear indication as to the current nature of the signal is given. Implementations of the proposed method show that it is possible to quantify the degree of nonlinearity within both real- and complex-valued data. This is then extended (in the real domain) from dealing with nonlinearity in general, to a more specific example, namely sparsity. Extensions of adaptive filters from the real to the complex domain are non-trivial and the differences between the statistics in the real and complex domains need to be taken into account. In terms of signal characteristics, nonlinearity can be both split- and fully-complex and complex-valued data can be considered circular or noncircular. Furthermore, by combining the information obtained from hybrid filters of different natures it is possible to use this method to gain a more complete understanding of the nature of the nonlinearity within a signal. This also paves the way for building multidimensional feature spaces and their application in data/information fusion. 
To produce online tests for sparsity, adaptive filters for sparse environments are investigated and a unifying framework for the derivation of proportionate normalised least mean square (PNLMS) algorithms is presented. This is then extended to derive variants with an adaptive step-size. In order to create an online test for noncircularity, a study of widely linear autoregressive modelling is presented, from which a proof of the convergence of the test for noncircularity can be given. Applications of this method are illustrated on examples such as biomedical signals, speech and wind data.
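
    The hybrid-filter idea in this abstract can be sketched as a convex combination of two adaptive filters whose mixing parameter is itself adapted online. The code below is a minimal illustration under stated assumptions, not the author's implementation: the sub-filters (a linear LMS filter and a tanh-output gradient-descent filter), the step sizes, the test signal, and all variable names are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_taps = 3000, 4
w_true = np.array([1.0, -0.8, 0.5, 0.3])    # hypothetical linear system
x = rng.normal(size=n_samples + n_taps)

w_lin = np.zeros(n_taps)                    # linear LMS sub-filter
w_nl = np.zeros(n_taps)                     # nonlinear (tanh output) sub-filter
lam = 0.5                                   # mixing parameter, kept in [0, 1]
mu, mu_lam = 0.01, 0.05                     # assumed step sizes
lam_history = np.empty(n_samples)

for n in range(n_samples):
    u = x[n:n + n_taps]
    d = w_true @ u                          # desired signal, purely linear here

    y_lin = w_lin @ u
    y_nl = np.tanh(w_nl @ u)
    y = lam * y_lin + (1.0 - lam) * y_nl    # convex combination of the outputs
    e = d - y

    # Each sub-filter adapts on its own error.
    w_lin += mu * (d - y_lin) * u
    w_nl += mu * (d - y_nl) * (1.0 - y_nl ** 2) * u

    # Gradient step on the mixing parameter, minimising 0.5 * e**2.
    lam = float(np.clip(lam + mu_lam * e * (y_lin - y_nl), 0.0, 1.0))
    lam_history[n] = lam

print(lam_history[-1])
```

    Because the desired signal here is generated by a linear system, the mixing parameter drifts toward the linear sub-filter. Its trajectory, rather than the output error, is what indicates the nature of the signal.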

    Deep Learning for Single Image Super-Resolution: A Brief Review

    Single image super-resolution (SISR) is a notoriously challenging ill-posed problem, which aims to obtain a high-resolution (HR) output from one of its low-resolution (LR) versions. Powerful deep learning algorithms have recently been applied to the SISR problem and have achieved state-of-the-art performance. In this survey, we review representative deep learning-based SISR methods, and group them into two categories according to their major contributions to two essential aspects of SISR: the exploration of efficient neural network architectures for SISR, and the development of effective optimization objectives for deep SISR learning. For each category, a baseline is first established and several critical limitations of the baseline are summarized. Then representative works on overcoming these limitations are presented based on their original contents as well as our critical understandings and analyses, and relevant comparisons are conducted from a variety of perspectives. Finally, we conclude this review with some vital current challenges and future trends in SISR leveraging deep learning algorithms. Comment: Accepted by IEEE Transactions on Multimedia (TMM)

    Complex Neural Networks for Audio

    Audio is represented in two mathematically equivalent ways: the real-valued time domain (i.e., waveform) and the complex-valued frequency domain (i.e., spectrum). There are advantages to the frequency-domain representation, e.g., the human auditory system is known to process sound in the frequency domain. Furthermore, linear time-invariant systems are convolved with sources in the time domain, whereas they may be factorized in the frequency domain. Neural networks have become rather useful when applied to audio tasks such as machine listening and audio synthesis, which are related by their dependencies on high quality acoustic models. They ideally encapsulate fine-scale temporal structure, such as that encoded in the phase of frequency-domain audio, yet there are no authoritative deep learning methods for complex audio. This manuscript is dedicated to addressing the shortcoming. Chapter 2 motivates complex networks by their affinity with complex-domain audio, while Chapter 3 contributes methods for building and optimizing complex networks. We show that the naive implementation of Adam optimization is incorrect for complex random variables and show that selection of input and output representation has a significant impact on the performance of a complex network. Experimental results with novel complex neural architectures are provided in the second half of this manuscript. Chapter 4 introduces a complex model for binaural audio source localization. We show that, like humans, the complex model can generalize to different anatomical filters, which is important in the context of machine listening. The complex model's performance is better than that of the real-valued models, as well as real- and complex-valued baselines. Chapter 5 proposes a two-stage method for speech enhancement. In the first stage, a complex-valued stochastic autoencoder projects complex vectors to a discrete space. In the second stage, long-term temporal dependencies are modeled in the discrete space. The autoencoder raises the performance ceiling for state-of-the-art speech enhancement, but the dynamic enhancement model does not outperform other baselines. We discuss areas for improvement and note that the complex Adam optimizer improves training convergence over the naive implementation.
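
    The abstract does not spell out what makes the naive Adam implementation incorrect for complex variables. One plausible reading, stated here as an assumption, is that the second-moment estimate should accumulate the squared magnitude g * conj(g) rather than the plain square g * g, which is itself complex. The sketch below illustrates that reading on a scalar toy problem. The function `complex_adam`, its defaults, and the target value are hypothetical, not taken from the manuscript.

```python
import numpy as np

def complex_adam(grad_fn, z0, lr=0.05, beta1=0.9, beta2=0.999,
                 eps=1e-8, steps=2000):
    """Adam for one complex scalar, with a real-valued second moment.

    Assumption: the second moment tracks |g|^2 = g * conj(g).
    """
    z = complex(z0)
    m = 0j          # first moment, complex
    v = 0.0         # second moment, real and non-negative
    for t in range(1, steps + 1):
        g = grad_fn(z)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * (g * np.conj(g)).real
        m_hat = m / (1.0 - beta1 ** t)      # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        z = z - lr * m_hat / (np.sqrt(v_hat) + eps)
    return z

# Toy objective |z - target|^2; its descent direction is proportional to (z - target).
target = 1.0 + 2.0j
z_final = complex_adam(lambda z: z - target, 0j)

# The plain square of a complex gradient is complex; the squared magnitude is real.
g0 = 0j - target
naive_second_moment = g0 * g0
magnitude_second_moment = (g0 * np.conj(g0)).real

print(z_final, naive_second_moment, magnitude_second_moment)
```

    With the plain square, the square root in the update would be taken of a complex number, which rotates the step instead of only rescaling it. Using the squared magnitude keeps the scaling factor real.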