Improvements to deep convolutional neural networks for LVCSR
Deep Convolutional Neural Networks (CNNs) are more powerful than Deep Neural Networks (DNNs), as they are better able to reduce spectral variation in the input signal. This has also been confirmed experimentally, with CNNs showing relative improvements in word error rate (WER) of 4-12% over DNNs across a variety of LVCSR tasks. In this paper, we describe different methods
to further improve CNN performance. First, we conduct a deep analysis comparing
limited weight sharing and full weight sharing with state-of-the-art features.
Second, we apply various pooling strategies that have shown improvements in
computer vision to an LVCSR speech task. Third, we introduce a method to
effectively incorporate speaker adaptation, namely fMLLR, into log-mel
features. Fourth, we introduce an effective strategy to use dropout during
Hessian-free sequence training. We find that with these improvements,
particularly with fMLLR and dropout, we are able to achieve an additional 2-3%
relative improvement in WER on a 50-hour Broadcast News task over our previous
best CNN baseline. On a larger 400-hour BN task, we find an additional 4-5%
relative improvement over our previous best CNN baseline.
Comment: 6 pages, 1 figure
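To make the frequency-pooling and dropout ideas concrete, here is a minimal sketch in PyTorch (an assumption; the paper does not specify this toolkit, and all layer sizes and the dropout placement are illustrative rather than the authors' configuration):

    import torch
    import torch.nn as nn

    # Minimal sketch of a CNN acoustic model over log-mel features.
    # Pooling is applied along the frequency axis only, a common choice
    # for speech; all sizes here are illustrative, not the paper's.
    class SpeechCNN(nn.Module):
        def __init__(self, n_mels=40, n_frames=11, n_targets=512):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 128, kernel_size=9, padding=4),  # full weight sharing
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=(3, 1)),  # pool frequency, keep time
            )
            self.fc = nn.Sequential(
                nn.Flatten(),
                nn.Dropout(p=0.5),  # dropout, as the paper studies for HF training
                nn.Linear(128 * (n_mels // 3) * n_frames, n_targets),
            )

        def forward(self, x):  # x: (batch, 1, n_mels, n_frames)
            return self.fc(self.conv(x))

Pooling in frequency but not time preserves the temporal resolution that downstream sequence training relies on.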
Gated Recurrent Neural Tensor Network
Recurrent Neural Networks (RNNs), a powerful scheme for modeling temporal and sequential data, need to capture long-term dependencies in a dataset and represent them in hidden layers with a model expressive enough to capture more information from the inputs. For modeling long-term dependencies, a gating mechanism can help RNNs remember and forget previous information. Representing the hidden layers of an RNN with more expressive
operations (i.e., tensor products) helps it learn a more complex relationship
between the current input and the previous hidden layer information. These
ideas can generally improve RNN performance. In this paper, we propose a novel RNN architecture that combines a gating mechanism and a tensor product in a single model. Our proposed models learn long-term dependencies through gating units and obtain a more expressive and direct interaction between the input and the hidden layer through a tensor product on 3-dimensional array (tensor) weight parameters. We incorporate the tensor product into the formulations of the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) RNNs, yielding the Long Short-Term Memory Recurrent Neural Tensor Network (LSTMRNTN) and the Gated Recurrent Unit Recurrent Neural Tensor Network (GRURNTN). We conducted experiments with our proposed models on word-level
tensor product. We conducted experiments with our proposed models on word-level
and character-level language modeling tasks and revealed that our proposed
models significantly improved their performance compared to our baseline
models.Comment: Accepted at IJCNN 2016 URL :
http://ieeexplore.ieee.org/document/7727233
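To make the tensor-product idea concrete, here is a minimal sketch of a GRU-style cell whose candidate activation adds a bilinear term computed with a 3-dimensional weight tensor (shapes, initialization, and gate conventions are assumptions, not the authors' exact formulation):

    import torch
    import torch.nn as nn

    # GRU-like cell augmented with a bilinear tensor term x^T T h, where T
    # is a 3-D weight tensor with one slice per hidden unit.
    class TensorGRUCell(nn.Module):
        def __init__(self, n_in, n_hid):
            super().__init__()
            self.gates = nn.Linear(n_in + n_hid, 2 * n_hid)  # update/reset gates
            self.W = nn.Linear(n_in + n_hid, n_hid)          # standard linear part
            self.T = nn.Parameter(torch.randn(n_hid, n_in, n_hid) * 0.01)

        def forward(self, x, h):
            z, r = torch.sigmoid(self.gates(torch.cat([x, h], -1))).chunk(2, -1)
            # bilinear term: for each hidden unit k, x^T T[k] (r * h)
            bilinear = torch.einsum('bi,kij,bj->bk', x, self.T, r * h)
            h_tilde = torch.tanh(self.W(torch.cat([x, r * h], -1)) + bilinear)
            return (1 - z) * h + z * h_tilde

The einsum evaluates x^T T[k] (r * h) for every hidden unit k at once, giving the more direct input-hidden interaction that the abstract describes.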
Automatic differentiation in machine learning: a survey
Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in
machine learning. Automatic differentiation (AD), also called algorithmic
differentiation or simply "autodiff", is a family of techniques similar to but
more general than backpropagation for efficiently and accurately evaluating
derivatives of numeric functions expressed as computer programs. AD is a small
but established field with applications in areas including computational fluid
dynamics, atmospheric sciences, and engineering design optimization. Until very
recently, the fields of machine learning and AD have largely been unaware of
each other and, in some cases, have independently discovered each other's
results. Despite its relevance, general-purpose AD has been missing from the
machine learning toolbox, a situation slowly changing with its ongoing adoption
under the names "dynamic computational graphs" and "differentiable
programming". We survey the intersection of AD and machine learning, cover
applications where AD has direct relevance, and address the main implementation
techniques. By precisely defining the main differentiation techniques and their
interrelationships, we aim to bring clarity to the usage of the terms
"autodiff", "automatic differentiation", and "symbolic differentiation" as
these are encountered more and more in machine learning settings.
Comment: 43 pages, 5 figures
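As a concrete illustration of the technique the survey defines, here is a toy forward-mode AD implementation with dual numbers (illustrative only, not any library's API): each value carries its derivative, and arithmetic propagates both exactly, avoiding the expression swell of symbolic differentiation and the truncation error of numerical differencing.

    # Each Dual carries a value and its derivative; overloaded operators
    # apply the sum and product rules exactly alongside the computation.
    class Dual:
        def __init__(self, val, dot=0.0):
            self.val, self.dot = val, dot
        def __add__(self, o):
            o = o if isinstance(o, Dual) else Dual(o)
            return Dual(self.val + o.val, self.dot + o.dot)
        __radd__ = __add__
        def __mul__(self, o):
            o = o if isinstance(o, Dual) else Dual(o)
            return Dual(self.val * o.val,
                        self.dot * o.val + self.val * o.dot)  # product rule
        __rmul__ = __mul__

    def f(x):
        return x * x * x + 2 * x  # f(x) = x^3 + 2x, so f'(x) = 3x^2 + 2

    x = Dual(3.0, 1.0)            # seed dx/dx = 1
    y = f(x)
    print(y.val, y.dot)           # 33.0 29.0  (f(3) = 33, f'(3) = 29)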
ZOO: Zeroth Order Optimization based Black-box Attacks to Deep Neural Networks without Training Substitute Models
Deep neural networks (DNNs) are one of the most prominent technologies of our
time, as they achieve state-of-the-art performance in many machine learning
tasks, including but not limited to image classification, text mining, and
speech processing. However, recent research on DNNs has indicated ever-increasing concern about their robustness to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of well-trained DNNs by demonstrating the ability to generate barely noticeable (to both humans and machines) adversarial images that lead to misclassification. Furthermore,
researchers have shown that these adversarial images are highly transferable: by simply training and attacking a substitute model built upon the target model, an adversary can mount what is known as a black-box attack on DNNs.
Similar to the setting of training substitute models, in this paper we
propose an effective black-box attack that also only has access to the input
(images) and the output (confidence scores) of a targeted DNN. However, rather than leveraging attack transferability from substitute models, we
propose zeroth order optimization (ZOO) based attacks to directly estimate the
gradients of the targeted DNN for generating adversarial examples. We use
zeroth order stochastic coordinate descent along with dimension reduction,
hierarchical attack and importance sampling techniques to efficiently attack
black-box models. By exploiting zeroth order optimization, improved attacks on the targeted DNN can be mounted, sparing the need to train substitute models and avoiding the loss in attack transferability. Experimental results on
MNIST, CIFAR10 and ImageNet show that the proposed ZOO attack is as effective
as the state-of-the-art white-box attack and significantly outperforms existing
black-box attacks via substitute models.
Comment: Accepted by the 10th ACM Workshop on Artificial Intelligence and Security (AISEC) with the 24th ACM Conference on Computer and Communications Security (CCS)
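The core primitive is a derivative-free gradient estimate computed from function values alone; a minimal sketch follows (f, x, and all step sizes are placeholders, and the paper's dimension reduction, hierarchical attack, and importance sampling are omitted):

    import numpy as np

    # Symmetric-difference estimate of one coordinate of the gradient,
    # using only queries to the black-box loss f.
    def zoo_coordinate_grad(f, x, i, h=1e-4):
        e = np.zeros_like(x)
        e.flat[i] = h
        # g_i ~= (f(x + h*e_i) - f(x - h*e_i)) / (2h)
        return (f(x + e) - f(x - e)) / (2 * h)

    def zoo_step(f, x, lr=0.01, h=1e-4, n_coords=128, rng=None):
        # Zeroth order stochastic coordinate descent: estimate the gradient
        # on a random subset of coordinates and update only those.
        rng = rng or np.random.default_rng()
        for i in rng.choice(x.size, size=n_coords, replace=False):
            x.flat[i] -= lr * zoo_coordinate_grad(f, x, i, h)
        return x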