Tensor-to-Vector Regression for Multi-channel Speech Enhancement based on Tensor-Train Network
We propose a tensor-to-vector regression approach to multi-channel speech
enhancement in order to address the issue of input size explosion and
hidden-layer size expansion. The key idea is to cast the conventional deep
neural network (DNN) based vector-to-vector regression formulation under a
tensor-train network (TTN) framework. TTN is a recently emerged solution for
compact representation of deep models with fully connected hidden layers. Thus
TTN maintains the expressive power of a DNN while requiring a far smaller number of
trainable parameters. Furthermore, TTN can handle a multi-dimensional tensor
input by design, which exactly matches the desired setting in multi-channel
speech enhancement. We first provide a theoretical extension from DNN to TTN
based regression. Next, we show that TTN can attain speech enhancement quality
comparable to that of a DNN but with far fewer parameters, e.g., a reduction
from 27 million to only 5 million parameters is observed in a single-channel
scenario. TTN also improves PESQ over DNN from 2.86 to 2.96 by slightly
increasing the number of trainable parameters. Finally, in 8-channel
conditions, a PESQ of 3.12 is achieved using 20 million parameters for TTN,
whereas a DNN with 68 million parameters can only attain a PESQ of 3.06. Our
implementation is available online at
https://github.com/uwjunqi/Tensor-Train-Neural-Network.
Comment: Accepted to ICASSP 2020; reproducible code updated.
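As a rough illustration of the parameter savings the abstract describes, a dense weight matrix can be factored into two tensor-train cores; the layer sizes and TT-rank below are made up for this sketch and are not taken from the paper.

```python
import numpy as np

# Hypothetical sizes: a 64x64 dense layer viewed as an (8*8) x (8*8) map,
# factored into two TT-cores with TT-rank r (all values illustrative).
I1, I2, O1, O2, r = 8, 8, 8, 8, 4

rng = np.random.default_rng(0)
G1 = rng.standard_normal((I1, O1, r))   # first TT-core
G2 = rng.standard_normal((r, I2, O2))   # second TT-core

# Contract the cores back into the full dense weight matrix.
W = np.einsum('aor,rip->aiop', G1, G2).reshape(I1 * I2, O1 * O2)

dense_params = (I1 * I2) * (O1 * O2)    # 4096 weights in the dense layer
tt_params = G1.size + G2.size           # 512 weights in the two cores
```

Here the two cores store 8x fewer weights than the dense matrix they parameterize, which mirrors the 27M-to-5M reduction reported above (at a much larger scale).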
Theoretical Error Performance Analysis for Deep Neural Network Based Regression Functional Approximation
Based on Kolmogorov's superposition theorem and universal approximation theorems by Cybenko and Barron, any vector-to-scalar function can be approximated by a multi-layer perceptron (MLP) within certain bounds. The theorems inspire us to exploit deep neural networks (DNN) based vector-to-vector regression. This dissertation aims at establishing theoretical foundations on DNN based vector-to-vector functional approximation, and bridging the gap between DNN based applications and their theoretical understanding in terms of representation and generalization powers.
Concerning the representation power, we extend the classical universal approximation theorems and put forth a new upper bound for vector-to-vector regression. More specifically, we first derive upper bounds for shallow artificial neural networks (ANN), and then generalize these results to DNN based architectures. Our theorems suggest that a wider top hidden layer and a deeper model structure yield greater expressive power for DNN based vector-to-vector regression, which is illustrated with speech enhancement experiments.
As for the generalization power of DNN based vector-to-vector regression, we employ a well-known error decomposition technique, which factorizes an expected loss into the sum of an approximation error, an estimation error, and an optimization error. Since the approximation error is associated with our upper bound on the expressive power, we concentrate on deriving upper bounds for the estimation and optimization errors based on statistical learning theory and non-convex optimization. Moreover, we demonstrate that the mean absolute error (MAE) satisfies Lipschitz continuity and outperforms the mean squared error (MSE). Speech enhancement experiments with DNN models corroborate the aforementioned theorems.
Finally, since an over-parameterized setting for DNN is expected to ensure our theoretical upper bounds on the generalization power, we put forth a novel deep tensor learning framework, namely the tensor-train deep neural network (TT-DNN), to deal with the explosive growth of DNN model size and realize effective deep regression with much smaller model complexity. Our speech enhancement experiments demonstrate that a TT-DNN can match or even surpass the accuracy of an over-parameterized DNN with far fewer model parameters.
Comment: Ph.D. dissertation.
Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement
This paper investigates different trade-offs between the number of model
parameters and enhanced speech qualities by employing several deep
tensor-to-vector regression models for speech enhancement. We find that a
hybrid architecture, namely CNN-TT, is capable of maintaining good enhancement
quality with a reduced number of model parameters. CNN-TT is composed of several
convolutional layers at the bottom for feature extraction to improve speech
quality and a tensor-train (TT) output layer on the top to reduce model
parameters. We first derive a new upper bound on the generalization power of
the convolutional neural network (CNN) based vector-to-vector regression
models. Then, we provide experimental evidence on the Edinburgh noisy speech
corpus to demonstrate that, in single-channel speech enhancement, CNN
outperforms DNN at the expense of a small increase in model size. Moreover,
CNN-TT slightly outperforms its CNN counterpart while using only 32\% of the
CNN model parameters. Furthermore, additional performance gains can be attained
if the number of CNN-TT parameters is increased to 44\% of the CNN model size.
Finally, our experiments of multi-channel speech enhancement on a simulated
noisy WSJ0 corpus demonstrate that our proposed hybrid CNN-TT architecture
achieves better results than both DNN and CNN models, delivering better
enhanced speech quality with smaller parameter sizes.
Comment: Accepted to InterSpeech 2020.
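To show why a TT output layer keeps the dense weight matrix implicit, here is a minimal numpy sketch of a forward pass through two hypothetical TT-cores (shapes and rank are illustrative, not from the paper): contracting the reshaped input with the cores gives the same result as multiplying by the reconstructed dense matrix, without ever forming it.

```python
import numpy as np

# Illustrative shapes: input of size I1*I2 mapped to output of size O1*O2
# through two TT-cores of rank r.
I1, I2, O1, O2, r = 16, 16, 4, 4, 3

rng = np.random.default_rng(1)
G1 = rng.standard_normal((I1, O1, r))
G2 = rng.standard_normal((r, I2, O2))
x = rng.standard_normal(I1 * I2)

# TT forward pass: reshape the input and contract with the cores directly,
# never materializing the (I1*I2) x (O1*O2) dense weight matrix.
y = np.einsum('ai,aor,rip->op', x.reshape(I1, I2), G1, G2).reshape(-1)

# Sanity check against the equivalent dense multiplication.
W = np.einsum('aor,rip->aiop', G1, G2).reshape(I1 * I2, O1 * O2)
y_dense = x @ W
```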
On Mean Absolute Error for Deep Neural Network Based Vector-to-Vector Regression
In this paper, we exploit the properties of mean absolute error (MAE) as a loss function for the deep neural network (DNN) based vector-to-vector regression. The goal of this work is two-fold: (i) presenting performance bounds of MAE, and (ii) demonstrating new properties of MAE that make it more appropriate than mean squared error (MSE) as a loss function for DNN based vector-to-vector regression. First, we show that a generalized upper bound for DNN based vector-to-vector regression can be ensured by leveraging the known Lipschitz continuity property of MAE. Next, we derive a new generalized upper bound in the presence of additive noise. Finally, in contrast to conventional MSE commonly adopted to approximate Gaussian errors for regression, we show that MAE can be interpreted as an error modeled by a Laplacian distribution. Speech enhancement experiments are conducted to corroborate our proposed theorems and validate the performance advantages of MAE over MSE for DNN based regression.
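A small numerical sketch of two of the claimed properties (toy numbers, not the paper's derivation): the per-coordinate MAE gradient is bounded, unlike the MSE gradient, and MAE equals the unit-scale Laplacian negative log-likelihood up to an additive constant.

```python
import numpy as np

y_true = np.array([0.0, 0.0, 0.0])
y_pred = np.array([0.5, -2.0, 10.0])   # illustrative predictions
err = y_pred - y_true

# Gradients of the losses w.r.t. y_pred.
grad_mae = np.sign(err) / err.size     # bounded: |g| <= 1/d (Lipschitz)
grad_mse = 2.0 * err / err.size        # grows linearly with the error

mae = np.mean(np.abs(err))
# Laplace negative log-likelihood with scale b = 1: log(2) + |err|.
laplace_nll = np.mean(np.log(2.0) + np.abs(err))
```

The outlier at 10.0 blows up the MSE gradient but leaves the MAE gradient bounded, which is the intuition behind the Lipschitz-based bound above.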
Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement
Recent studies have highlighted adversarial examples as ubiquitous threats to
deep neural network (DNN) based speech recognition systems. In this work,
we present a U-Net based self-attention model to enhance adversarial
speech signals. Specifically, we evaluate the model with interpretable
speech recognition metrics and analyze its behavior under augmented
adversarial training. Our experiments show that our proposed
U-Net improves the perceptual evaluation of speech quality (PESQ) from
1.13 to 2.78, the speech transmission index (STI) from 0.65 to 0.75, and the
short-time objective intelligibility (STOI) from 0.83 to 0.96 on the task of speech
enhancement with adversarial speech examples. We conduct experiments on the
automatic speech recognition (ASR) task with adversarial audio attacks. We find
that (i) temporal features learned by the attention network are capable of
enhancing the robustness of DNN based ASR models; (ii) the generalization power
of DNN based ASR model could be enhanced by applying adversarial training with
an additive adversarial data augmentation. The word error rate (WER) results
show an absolute decrease of 2.22 under gradient-based perturbation and an
absolute decrease of 2.03 under evolutionary-optimized perturbation, which
suggests that our enhancement models with adversarial training can further
secure a resilient ASR system.
Comment: The first draft was finished in August 2019. Accepted to IEEE ICASSP 2020.
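The additive adversarial data augmentation in point (ii) can be sketched on a toy model: craft an FGSM-style gradient-sign perturbation against a logistic-regression "model" and append it to the training pool. The model, weights, and epsilon below are all illustrative; the paper attacks DNN based ASR systems, not this toy.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(8)             # fixed toy model weights
x = rng.standard_normal(8)             # clean input
eps = 0.1                              # perturbation budget

def loss(x):
    p = 1.0 / (1.0 + np.exp(-w @ x))   # sigmoid prediction
    return -np.log(p)                  # cross-entropy for true label y = 1

# Gradient of the loss w.r.t. the input: (p - y) * w for y = 1.
p = 1.0 / (1.0 + np.exp(-w @ x))
grad_x = (p - 1.0) * w

# Adversarial example: one signed gradient step inside the eps-ball.
x_adv = x + eps * np.sign(grad_x)
augmented_batch = [x, x_adv]           # train on clean + adversarial data
```

The signed step raises the loss on the perturbed input while keeping the perturbation bounded by eps per coordinate; training on `augmented_batch` is the augmentation step.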
Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives
Part 2 of this monograph builds on the introduction to tensor networks and
their operations presented in Part 1. It focuses on tensor network models for
super-compressed higher-order representation of data/parameters and related
cost functions, while providing an outline of their applications in machine
learning and data analytics. A particular emphasis is on the tensor train (TT)
and Hierarchical Tucker (HT) decompositions, and their physically meaningful
interpretations which reflect the scalability of the tensor network approach.
Through a graphical approach, we also elucidate how, by virtue of the
underlying low-rank tensor approximations and sophisticated contractions of
core tensors, tensor networks have the ability to perform distributed
computations on otherwise prohibitively large volumes of data/parameters,
thereby alleviating or even eliminating the curse of dimensionality. The
usefulness of this concept is illustrated over a number of applied areas,
including generalized regression and classification (support tensor machines,
canonical correlation analysis, higher order partial least squares),
generalized eigenvalue decomposition, Riemannian optimization, and in the
optimization of deep neural networks. Part 1 and Part 2 of this work can be
used either as stand-alone separate texts, or indeed as a conjoint
comprehensive review of the exciting field of low-rank tensor networks and
tensor decompositions.
Comment: 232 pages.
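The TT decomposition underlying these models can be sketched with sequential truncated SVDs (the TT-SVD procedure). The tensor below is constructed to have exact TT-ranks (2, 2), so truncating to those ranks is lossless; shapes and ranks are illustrative.

```python
import numpy as np

# Build a 4x5x6 tensor with exact TT-ranks (r1, r2) = (2, 2).
rng = np.random.default_rng(0)
r1, r2 = 2, 2
A = rng.standard_normal((4, r1))
B = rng.standard_normal((r1, 5, r2))
C = rng.standard_normal((r2, 6))
T = np.einsum('ia,ajb,bk->ijk', A, B, C)

# TT-SVD: peel off one core per mode with a truncated SVD.
U, s, Vt = np.linalg.svd(T.reshape(4, 5 * 6), full_matrices=False)
G1 = U[:, :r1]                                    # core 1: (4, r1)
rest = (np.diag(s[:r1]) @ Vt[:r1]).reshape(r1 * 5, 6)
U, s, Vt = np.linalg.svd(rest, full_matrices=False)
G2 = U[:, :r2].reshape(r1, 5, r2)                 # core 2: (r1, 5, r2)
G3 = np.diag(s[:r2]) @ Vt[:r2]                    # core 3: (r2, 6)

# Contract the cores: the low-rank structure is recovered exactly.
T_hat = np.einsum('ia,ajb,bk->ijk', G1, G2, G3)
```

The three cores store 8 + 20 + 12 = 40 numbers versus 120 tensor entries, a small instance of the super-compression the monograph discusses.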