Testing the assumptions of linear prediction analysis in normal vowels
This paper develops an improved surrogate data test and uses it to show experimental evidence, for all the simple vowels of US English and for both male and female speakers, that Gaussian linear prediction analysis, a ubiquitous technique in current speech technologies, cannot extract all the dynamical structure of real speech time series. The test provides robust evidence undermining the validity of these linear techniques, supporting the assumptions of dynamical nonlinearity and/or non-Gaussianity common to more recent, more complex efforts at dynamical modelling of speech time series. However, an additional finding is that the classical assumptions cannot be ruled out entirely, and plausible evidence is given to explain the success of the linear Gaussian theory as a weak approximation to the true nonlinear/non-Gaussian dynamics. This supports the use of appropriate hybrid linear/nonlinear/non-Gaussian modelling. With a calibrated calculation of the test statistic and a particular choice of experimental protocol, some of the known systematic problems of the method of surrogate data testing are circumvented, yielding results that support the conclusions to a high level of significance.
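Linear prediction analysis, the technique under test here, models each sample as a weighted sum of the previous p samples plus a residual; under the linear Gaussian model that residual is white Gaussian noise. The sketch below is a minimal illustration of standard linear prediction, not the paper's surrogate data procedure: it estimates the coefficients with the autocorrelation method and the Levinson-Durbin recursion in plain Python. The function names `autocorr`, `levinson_durbin`, `lpc_residual` and `synthetic_ar` are hypothetical, and the AR test signal is synthetic, not speech.

```python
import random


def autocorr(x, max_lag):
    """Biased autocorrelation r[k] = sum_t x[t] * x[t-k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations sum_j a[j] * r[|i-j|] = r[i] for a[1..order].

    Returns (a, err): the predictor is x_hat[t] = sum_j a[j-1] * x[t-j] and
    err is the final prediction-error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i of the recursion
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_residual(x, a):
    """Prediction error e[t] = x[t] - sum_j a[j-1] * x[t-j] for t >= order."""
    p = len(a)
    return [x[t] - sum(a[j] * x[t - 1 - j] for j in range(p)) for t in range(p, len(x))]


def synthetic_ar(coeffs, n, seed=0):
    """Illustrative AR(p) signal x[t] = sum_j coeffs[j-1] * x[t-j] + N(0, 1) noise."""
    rng = random.Random(seed)
    x = [0.0] * len(coeffs)
    for _ in range(n):
        x.append(sum(c * x[-1 - j] for j, c in enumerate(coeffs)) + rng.gauss(0.0, 1.0))
    return x


if __name__ == "__main__":
    signal = synthetic_ar([0.5, -0.3], 5000, seed=1)
    coeffs, _ = levinson_durbin(autocorr(signal, 2), 2)
    print(coeffs)  # close to the true coefficients [0.5, -0.3]
```

A surrogate data test asks whether statistics of the real signal differ from those of signals generated to match only its linear (autocorrelation) structure, which is exactly the structure this recursion captures.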
Recognizing recurrent neural networks (rRNN): Bayesian inference for recurrent neural networks
Recurrent neural networks (RNNs) are widely used in computational neuroscience and machine learning applications. In an RNN, each neuron computes its output as a nonlinear function of its integrated input. While the importance of RNNs, especially as models of brain processing, is undisputed, it is also widely acknowledged that the computations in standard RNN models may be an over-simplification of what real neuronal networks compute. Here, we suggest that the RNN approach may be made both neurobiologically more plausible and computationally more powerful by its fusion with Bayesian inference techniques for nonlinear dynamical systems. In this scheme, we use an RNN as a generative model of dynamic input caused by the environment, e.g. of speech or kinematics. Given this generative RNN model, we derive Bayesian update equations that can decode its output. Critically, these updates define a 'recognizing RNN' (rRNN), in which neurons compute and exchange prediction and prediction error messages. The rRNN has several desirable features that a conventional RNN does not have, such as fast decoding of dynamic stimuli and robustness to initial conditions and noise. Furthermore, it implements a predictive coding scheme for dynamic inputs. We suggest that the Bayesian inversion of recurrent neural networks may be useful both as a model of brain function and as a machine learning tool. We illustrate the use of the rRNN by an application to the online decoding (i.e. recognition) of human kinematics.
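The idea of inverting a generative RNN by exchanging predictions and prediction errors can be illustrated with a deliberately simplified filter. The sketch below assumes a noise-free tanh RNN as the generative model and a fixed correction gain; it is not the Bayesian update scheme derived in the paper, which would set its corrections from the model's uncertainties. It does show one claimed property in miniature: starting from a wrong initial state, the estimate locks onto the true trajectory. The names `rnn_step`, `recognize` and `simulate` are hypothetical.

```python
import math


def rnn_step(x, W):
    """Generative RNN: each unit outputs tanh of its weighted (integrated) input."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def recognize(observations, W, init, gain=0.5):
    """Fixed-gain predictive-coding-style filter (a simplification, not the rRNN).

    At each step the current estimate is pushed through the generative RNN to
    form a prediction; the prediction error then corrects that prediction.
    """
    mu = list(init)
    estimates = []
    for y in observations:
        pred = rnn_step(mu, W)                          # top-down prediction
        err = [yi - pi for yi, pi in zip(y, pred)]      # prediction error
        mu = [pi + gain * ei for pi, ei in zip(pred, err)]
        estimates.append(mu)
    return estimates


def simulate(W, x0, steps):
    """Noise-free trajectory x_t = rnn_step(x_{t-1}) for t = 1..steps."""
    xs, x = [], list(x0)
    for _ in range(steps):
        x = rnn_step(x, W)
        xs.append(x)
    return xs


if __name__ == "__main__":
    W = [[0.9, -0.5], [0.5, 0.9]]
    truth = simulate(W, [0.5, -0.2], 40)
    est = recognize(truth, W, init=[0.0, 0.0])
    print(est[0], truth[0])    # early estimate lags the truth
    print(est[-1], truth[-1])  # later estimates agree closely
```

Convergence here relies on the chosen weights: tanh is 1-Lipschitz and this W has spectral norm about 1.03, so with gain 0.5 the estimation error shrinks by a factor of roughly 0.51 per step. Other weight matrices or gains need not converge.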
Graph Spectral Image Processing
The recent advent of graph signal processing (GSP) has spurred intensive studies of signals that live naturally on irregular data kernels described by graphs (e.g., social networks, wireless sensor networks). Though a digital image contains pixels that reside on a regularly sampled 2D grid, if one can design an appropriate underlying graph connecting pixels with weights that reflect the image structure, then one can interpret the image (or image patch) as a signal on a graph and apply GSP tools for processing and analysis of the signal in the graph spectral domain. In this article, we overview recent graph spectral techniques in GSP specifically for image/video processing. The topics covered include image compression, image restoration, image filtering and image segmentation.
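To make "an image as a signal on a graph" concrete, the sketch below builds a 4-connected pixel graph whose edge weights decay with intensity difference (a common choice, assumed here; the article may use other graph constructions) and applies the polynomial filter (I - alpha L)^K, where L = D - W is the combinatorial graph Laplacian. In the graph spectral domain this scales the component at Laplacian eigenvalue lambda by (1 - alpha lambda)^K, so it attenuates high graph frequencies; because weights across strong intensity edges are nearly zero, smoothing stays within regions. All function names are hypothetical.

```python
import math


def grid_graph(img, sigma=10.0):
    """4-connected pixel graph; edge weight exp(-(difference)^2 / (2 sigma^2))."""
    h, w = len(img), len(img[0])
    edges = {}
    for r in range(h):
        for c in range(w):
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < h and cc < w:
                    d = img[r][c] - img[rr][cc]
                    edges[((r, c), (rr, cc))] = math.exp(-d * d / (2.0 * sigma * sigma))
    return edges


def to_signal(img):
    """Flatten a 2-D list into a graph signal {(row, col): value}."""
    return {(r, c): float(v) for r, row in enumerate(img) for c, v in enumerate(row)}


def laplacian_apply(x, edges):
    """Compute (L x)_i = sum_j w_ij (x_i - x_j) with L = D - W."""
    out = {p: 0.0 for p in x}
    for (p, q), wgt in edges.items():
        diff = x[p] - x[q]
        out[p] += wgt * diff
        out[q] -= wgt * diff
    return out


def graph_smoothness(x, edges):
    """Quadratic form x^T L x = sum over edges of w_ij (x_i - x_j)^2."""
    return sum(wgt * (x[p] - x[q]) ** 2 for (p, q), wgt in edges.items())


def graph_lowpass(x, edges, alpha=0.2, steps=10):
    """Apply the polynomial graph filter (I - alpha L)^steps to signal x.

    On a 4-connected graph with weights <= 1, alpha <= 0.25 keeps every
    diagonal entry of I - alpha L nonnegative, so each step is non-expansive.
    """
    for _ in range(steps):
        lx = laplacian_apply(x, edges)
        x = {p: x[p] - alpha * lx[p] for p in x}
    return x


if __name__ == "__main__":
    # Noisy step edge: dark left half, bright right half (illustrative data)
    img = [[0, 1, 0, 100, 99, 101],
           [1, 0, 2, 101, 100, 99],
           [0, 2, 1, 99, 100, 100]]
    edges = grid_graph(img)
    x0 = to_signal(img)
    x1 = graph_lowpass(x0, edges)
    print(graph_smoothness(x0, edges), graph_smoothness(x1, edges))
```

The same Laplacian, once eigendecomposed, gives the graph Fourier basis used for the compression and restoration tasks the article covers; the polynomial form above avoids computing that eigendecomposition explicitly.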
Gossip Algorithms for Distributed Signal Processing
Gossip algorithms are attractive for in-network processing in sensor networks because they do not require any specialized routing, there is no bottleneck or single point of failure, and they are robust to unreliable wireless network conditions. Recently, there has been a surge of activity in the computer science, control, signal processing, and information theory communities in developing faster and more robust gossip algorithms and deriving theoretical performance guarantees. This article presents an overview of recent work in the area. We describe convergence rate results, which are related to the number of transmitted messages and thus the amount of energy consumed in the network for gossiping. We discuss issues related to gossiping over wireless links, including the effects of quantization and noise, and we illustrate the use of gossip algorithms for canonical signal processing tasks including distributed estimation, source localization, and compression.
Comment: Submitted to Proceedings of the IEEE, 29 pages
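The overview covers many gossip variants without fixing one. As a minimal sketch, assuming the basic randomized pairwise-averaging scheme on a fixed undirected graph (a standard variant, not necessarily any specific algorithm from the article), each iteration wakes one random link and its two endpoints replace their values with their average. The sum of all values never changes, so on a connected graph every node converges to the global mean, and the message counter makes the link between convergence rate and energy explicit. The function name `randomized_gossip` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def randomized_gossip(values: Sequence[float], edges: List[Tuple[int, int]],
                      iterations: int, seed: int = 0) -> Tuple[List[float], int]:
    """Randomized pairwise gossip averaging on a fixed undirected graph.

    Each iteration picks one edge uniformly at random; its two endpoints
    exchange values and both keep the average. Returns the final values and
    the number of point-to-point messages sent (two per exchange).
    """
    rng = random.Random(seed)
    x = list(values)
    messages = 0
    for _ in range(iterations):
        i, j = rng.choice(edges)
        avg = 0.5 * (x[i] + x[j])
        x[i] = x[j] = avg      # the sum of all values is unchanged
        messages += 2          # i -> j and j -> i
    return x, messages


if __name__ == "__main__":
    ring = [(i, (i + 1) % 10) for i in range(10)]   # 10 sensors in a ring
    final, sent = randomized_gossip([float(i) for i in range(10)], ring, 3000)
    print(sent, final)  # every entry is very close to the mean, 4.5
```

On a ring, information moves only between neighbours, so many exchanges (and messages) are needed per node; better-connected graphs or smarter variants reduce that count, which is the kind of trade-off the convergence rate results quantify.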
An Unsupervised Autoregressive Model for Speech Representation Learning
This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations. In contrast to other speech representation learning methods that aim to remove noise or speaker variability, ours is designed to preserve information for a wide range of downstream tasks. In addition, the proposed model does not require any phonetic or word boundary labels, allowing the model to benefit from large quantities of unlabeled data. Speech representations learned by our model significantly improve performance on both phone classification and speaker verification over the surface features and other supervised and unsupervised approaches. Further analysis shows that different levels of speech information are captured by our model at different layers. In particular, the lower layers tend to be more discriminative for speakers, while the upper layers provide more phonetic content.
Comment: Accepted to Interspeech 2019. Code available at:
https://github.com/iamyuanchung/Autoregressive-Predictive-Codin
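The abstract does not spell out the training objective. A common formulation of this kind of autoregressive objective, assumed here rather than taken from the abstract, has a causal model read the feature frames up to time t and predict the frame a fixed number of steps ahead, t + shift, scored with an L1 loss; predicting further ahead discourages the model from simply copying local smoothness. The sketch below only computes that objective for a given predictor; `apc_l1_loss`, `copy_last_frame` and the default `shift=3` are hypothetical, and a real model would be a trained neural network rather than the trivial baseline shown.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]


def apc_l1_loss(frames: List[Frame],
                predict: Callable[[List[Frame]], Frame],
                shift: int = 3) -> float:
    """Mean (over time) L1 error between predict(frames[:t+1]) and frames[t+shift]."""
    if len(frames) <= shift:
        raise ValueError("need more frames than the prediction shift")
    total = 0.0
    for t in range(len(frames) - shift):
        pred = predict(frames[: t + 1])   # causal: only past and current frames
        target = frames[t + shift]
        total += sum(abs(p - y) for p, y in zip(pred, target))
    return total / (len(frames) - shift)


def copy_last_frame(history: List[Frame]) -> Frame:
    """Trivial baseline predictor: repeat the most recent frame."""
    return history[-1]


if __name__ == "__main__":
    # Toy 2-dimensional "feature" frames: first dimension ramps, second is constant
    frames = [[float(t), 10.0] for t in range(5)]
    print(apc_l1_loss(frames, copy_last_frame, shift=1))  # 1.0
    print(apc_l1_loss(frames, copy_last_frame, shift=2))  # 2.0
```

Because the loss needs only the unlabeled frames themselves as targets, it fits the abstract's point that no phonetic or word boundary labels are required.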