Machine Translation: From Statistical to Modern Deep-Learning Practices
Machine translation (MT) is an area of study in natural language processing
that deals with the automatic translation of human language from one language
to another by computer. With a rich research history spanning nearly three
decades, machine translation is one of the most sought-after areas of research
in the linguistics and computational community. In this paper, we investigate
deep-learning based models that have achieved substantial progress in recent
years and are becoming the prominent method in MT. We discuss the two main
deep-learning based machine translation approaches: component- or domain-level
methods, which leverage deep learning models to enhance the efficacy of
statistical machine translation (SMT), and end-to-end deep learning models,
which use neural networks with an encoder-decoder architecture to find
correspondences between source and target languages. We conclude by providing
a timeline of the major research problems solved to date and a comprehensive
overview of present areas of research in neural machine translation.
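As an aside, the encoder-decoder idea mentioned above can be illustrated with a minimal sequence-to-sequence sketch in PyTorch. The vocabulary sizes, embedding width, and hidden size below are illustrative placeholders, not values taken from the paper.

```python
# Minimal encoder-decoder sketch (PyTorch); all sizes are illustrative.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=8000, tgt_vocab=8000, emb=256, hidden=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source sentence into a fixed-size state.
        _, state = self.encoder(self.src_emb(src_ids))
        # Decode the target sequence conditioned on that state (teacher forcing).
        out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.proj(out)  # per-position logits over the target vocabulary

model = Seq2Seq()
src = torch.randint(0, 8000, (2, 7))   # batch of 2 source sentences, length 7
tgt = torch.randint(0, 8000, (2, 9))   # corresponding target prefixes, length 9
logits = model(src, tgt)               # shape: (2, 9, 8000)
```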
Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks
Popular deep learning frameworks require users to fine-tune their memory
usage so that the training data of a deep neural network (DNN) fits within the
GPU physical memory. Prior work tries to address this restriction by
virtualizing the memory usage of DNNs, enabling both CPU and GPU memory to be
utilized for memory allocations. Despite its merits, virtualizing memory can
incur significant performance overheads when the time needed to copy data back
and forth from CPU memory is higher than the latency to perform the
computations required for DNN forward and backward propagation. We introduce a
high-performance virtualization strategy based on a "compressing DMA engine"
(cDMA) that drastically reduces the size of the data structures that are
targeted for CPU-side allocations. The cDMA engine offers an average 2.6x
(maximum 13.8x) compression ratio by exploiting the sparsity inherent in
offloaded data, improving the performance of virtualized DNNs by an average 32%
(maximum 61%).
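The compression gain claimed above comes from the sparsity of offloaded activations (e.g., post-ReLU tensors that are mostly zero). Below is a minimal sketch of one generic zero-value compression layout, a presence bitmask plus the non-zero values; the actual encoding used by the cDMA engine is not specified here and may differ.

```python
# Sketch of zero-value compression for a sparse activation buffer: store a
# presence bitmask plus the non-zero values. Generic illustration only, not
# the specific cDMA encoding.
import numpy as np

def compress(acts: np.ndarray):
    flat = acts.ravel()
    mask = flat != 0
    return np.packbits(mask), flat[mask], acts.shape

def decompress(packed_mask, values, shape):
    mask = np.unpackbits(packed_mask)[: int(np.prod(shape))].astype(bool)
    out = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    out[mask] = values
    return out.reshape(shape)

acts = np.maximum(np.random.randn(64, 128).astype(np.float32), 0)  # ReLU-like, ~50% zeros
packed, vals, shape = compress(acts)
ratio = acts.nbytes / (packed.nbytes + vals.nbytes)
assert np.array_equal(decompress(packed, vals, shape), acts)
print(f"compression ratio: {ratio:.2f}x")
```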
Recent Advances in Convolutional Neural Network Acceleration
In recent years, convolutional neural networks (CNNs) have shown great
performance in various fields such as image classification, pattern
recognition, and multimedia compression. Two of their characteristic
properties, local connectivity and weight sharing, reduce the number of
parameters and increase processing speed during training and inference.
However, as the dimensionality of the data grows and CNN architectures become
more complicated, the end-to-end approach, or the combined use of CNNs,
becomes computationally intensive, which limits their further deployment. It
is therefore necessary and urgent to make CNNs run faster. In this paper, we
first summarize the acceleration methods that contribute to, but are not
limited to, CNNs by reviewing a broad variety of research papers. We propose a
taxonomy of acceleration methods in terms of three levels: structure level,
algorithm level, and implementation level. We also analyze the acceleration
methods in terms of CNN architecture compression, algorithm optimization, and
hardware-based improvement. Finally, we discuss different perspectives on
these acceleration and optimization methods within each level. The discussion
shows that the methods at each level still leave ample room for exploration.
By incorporating such a wide range of disciplines, we aim to provide a
comprehensive reference for researchers interested in CNN acceleration.
Comment: Submitted to Neurocomputing
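To make the parameter-reduction claim concrete, here is a small back-of-the-envelope comparison of a fully connected layer and a convolutional layer on the same feature map; the sizes are arbitrary example values, not figures from the survey.

```python
# Illustrative parameter counts: full connectivity vs. a shared convolutional
# filter bank on the same feature map (sizes are arbitrary examples).
H, W, C_in, C_out, k = 32, 32, 64, 64, 3

fully_connected = (H * W * C_in) * (H * W * C_out)  # every input to every output
convolutional = k * k * C_in * C_out                 # one shared k x k filter bank

print(f"fully connected: {fully_connected:,} parameters")  # ~4.3 billion
print(f"convolutional:   {convolutional:,} parameters")    # 36,864
```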
Text normalization using memory augmented neural networks
We perform text normalization, i.e. the transformation of words from the
written to the spoken form, using a memory augmented neural network. With the
addition of dynamic memory access and storage mechanism, we present a neural
architecture that will serve as a language-agnostic text normalization system
while avoiding the kind of unacceptable errors made by the LSTM-based recurrent
neural networks. By successfully reducing the frequency of such mistakes, we
show that this novel architecture is indeed a better alternative. Our proposed
system requires significantly lesser amounts of data, training time and compute
resources. Additionally, we perform data up-sampling, circumventing the data
sparsity problem in some semiotic classes, to show that sufficient examples in
any particular class can improve the performance of our text normalization
system. Although a few occurrences of these errors still remain in certain
semiotic classes, we demonstrate that memory augmented networks with
meta-learning capabilities can open many doors to a superior text normalization
system.Comment: 9 pages, 10 tables, 3 figure
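The up-sampling step mentioned in the abstract can be sketched as simple duplication of examples from under-represented semiotic classes until each class reaches a target size. The class names, example pairs, and target count below are made up for illustration; the paper's actual sampling scheme may differ.

```python
# Sketch of up-sampling rare semiotic classes in a text normalization
# training set (written form -> spoken form). All values are illustrative.
import random
from collections import defaultdict

examples = [
    ("DATE",     "12/01/2017", "the twelfth of january twenty seventeen"),
    ("CARDINAL", "42",         "forty two"),
    ("MEASURE",  "3 kg",       "three kilograms"),
]

def upsample(data, target_per_class=100, seed=0):
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for cls, written, spoken in data:
        by_class[cls].append((cls, written, spoken))
    balanced = []
    for cls, items in by_class.items():
        balanced.extend(items)
        # Duplicate random examples until the class reaches the target size.
        need = max(target_per_class - len(items), 0)
        balanced.extend(rng.choice(items) for _ in range(need))
    return balanced

print(len(upsample(examples)))  # 300: each class padded to 100 examples
```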
Robust X-ray Sparse-view Phase Tomography via Hierarchical Synthesis Convolutional Neural Networks
Convolutional neural network (CNN) based image reconstruction methods have
been used intensively for X-ray computed tomography (CT) reconstruction
applications. Despite great success, the good performance of this data-based
approach relies critically on a representative, large training data set and a
dense convolutional deep network. The indiscriminate convolution connections
across all dense layers can be prone to over-fitting, where sampling biases
are wrongly integrated as features for the reconstruction. In this paper, we
report a robust hierarchical synthesis reconstruction approach, in which the
training data is pre-processed to separate the information into the domains
where sampling biases are suspected. These split bands are then trained
separately and combined successively through a hierarchical synthesis network.
We apply hierarchical synthesis reconstruction to two important and classical
tomography reconstruction scenarios: sparse-view reconstruction and phase
reconstruction. Our simulated and experimental results show that comparable or
improved performance is achieved with a dramatic reduction in network
complexity and computational cost. This method can be generalized to a wide
range of applications including material characterization, in-vivo monitoring,
and dynamic 4D imaging.
Comment: 9 pages, 6 figures, 2 tables
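One simple way to picture the band-splitting pre-processing is a two-band decomposition of each training image into a smooth low-frequency component and a residual high-frequency component, so the bands can be trained separately. This is only a sketch of the idea; the decomposition actually used in the paper may differ.

```python
# Sketch of splitting an image into two bands before band-wise training:
# a low-frequency component from Gaussian smoothing plus the residual detail.
# The paper's actual decomposition may differ.
import numpy as np
from scipy.ndimage import gaussian_filter

def split_bands(image: np.ndarray, sigma: float = 3.0):
    low = gaussian_filter(image, sigma=sigma)  # smooth, low-frequency content
    high = image - low                         # residual / high-frequency detail
    return low, high

image = np.random.rand(256, 256).astype(np.float32)
low, high = split_bands(image)
assert np.allclose(low + high, image, atol=1e-5)  # bands recombine to the original
```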
CSB-RNN: A Faster-than-Realtime RNN Acceleration Framework with Compressed Structured Blocks
Recurrent neural networks (RNNs) have been widely adopted in temporal
sequence analysis, where real-time performance is often in demand. However,
RNNs suffer from heavy computational workloads, as the models often come with
large weight matrices. Pruning schemes have been proposed for RNNs to
eliminate redundant (close-to-zero) weight values. On one hand, non-structured
pruning methods achieve a high pruning rate but introduce computational
irregularity (random sparsity), which is unfriendly to parallel hardware. On
the other hand, hardware-oriented structured pruning suffers from a low
pruning rate due to the restrictive constraints on the allowable pruning
structure. This paper presents CSB-RNN, an optimized full-stack RNN framework
with a novel compressed structured block (CSB) pruning technique. The
CSB-pruned RNN model combines a fine pruning granularity that facilitates a
high pruning rate with a regular structure that benefits hardware parallelism.
To address the challenges in parallelizing inference of the CSB-pruned model
with fine-grained structural sparsity, we propose a novel hardware
architecture with a dedicated compiler. Benefiting from the
architecture-compiler co-design, the hardware not only supports various RNN
cell types, but is also able to address the challenging workload-imbalance
issue and therefore significantly improves hardware efficiency.
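For intuition, block-structured pruning can be sketched as partitioning a weight matrix into fixed-size blocks and zeroing the blocks with the smallest magnitude, which yields a regular sparsity pattern. The block size and keep ratio below are illustrative; the CSB format in the paper is more elaborate than this.

```python
# Sketch of block-structured weight pruning: zero whole blocks with the
# smallest L1 norm. Block size and keep ratio are illustrative choices.
import numpy as np

def block_prune(weights: np.ndarray, block=(4, 4), keep_ratio=0.25):
    rows, cols = weights.shape
    br, bc = block
    pruned = weights.copy()
    # L1 norm of each (br x bc) block decides which blocks survive.
    norms = np.abs(weights).reshape(rows // br, br, cols // bc, bc).sum(axis=(1, 3))
    threshold = np.quantile(norms, 1.0 - keep_ratio)
    for i in range(rows // br):
        for j in range(cols // bc):
            if norms[i, j] < threshold:
                pruned[i * br:(i + 1) * br, j * bc:(j + 1) * bc] = 0.0
    return pruned

w = np.random.randn(64, 64)
w_pruned = block_prune(w)
print(f"sparsity: {np.mean(w_pruned == 0):.2%}")  # roughly 75% of entries zeroed
```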
Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks
A low precision deep neural network training technique for producing sparse,
ternary neural networks is presented. The technique incorporates hardware
implementation costs during training to achieve significant model compression
for inference. Training involves three stages: network training using L2
regularization and a quantization threshold regularizer, quantization pruning,
and finally retraining. The resulting networks achieve improved accuracy,
reduced memory footprint, and reduced computational complexity compared with
conventional methods on the MNIST and CIFAR10 datasets. Our networks are up to
98% sparse and 5 and 11 times smaller than equivalent binary and ternary
models, translating to significant resource and speed benefits for hardware
implementations.
Comment: To appear as a conference paper at the 24th International Conference
on Neural Information Processing (ICONIP 2017)
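The core quantization step can be illustrated with a simple threshold-based ternarizer: weights near zero are pruned and the rest are mapped to a single positive or negative scale. The threshold and scale rule below are illustrative choices; the paper's full pipeline (regularized training, pruning, retraining) is not reproduced here.

```python
# Sketch of threshold-based ternary quantization of a trained weight tensor.
# Threshold and scaling rule are illustrative, not the paper's exact method.
import numpy as np

def ternarize(weights: np.ndarray, threshold: float):
    mask = np.abs(weights) > threshold          # weights below threshold -> 0
    scale = np.abs(weights[mask]).mean() if mask.any() else 0.0
    return scale * np.sign(weights) * mask      # remaining weights -> +/- scale

w = np.random.randn(256, 256) * 0.05
w_t = ternarize(w, threshold=0.05)
print(f"sparsity: {np.mean(w_t == 0):.2%}")
print(f"levels: {np.unique(w_t)}")  # {-scale, 0, +scale}
```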
Parameter Transfer Unit for Deep Neural Networks
Parameters in deep neural networks that are trained on large-scale databases
can generalize across multiple domains, a property referred to as
"transferability". Unfortunately, transferability is usually defined as a set
of discrete states, and it differs across domains and network architectures.
Existing works usually apply parameter-sharing or fine-tuning heuristically,
and there is no principled approach to learning a parameter transfer strategy.
To address this gap, a parameter transfer unit (PTU) is proposed in this
paper. The PTU learns a fine-grained nonlinear combination of activations from
both the source and the target domain networks, and subsumes hand-crafted
discrete transfer states. In the PTU, transferability is controlled by two
gates, which are artificial neurons that can be learned from data. The PTU is
a general and flexible module that can be used in both CNNs and RNNs.
Experiments are conducted with various network architectures and multiple
transfer domain pairs. The results demonstrate the effectiveness of the PTU,
as it outperforms heuristic parameter-sharing and fine-tuning in most settings.
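The gating idea can be sketched as two learned sigmoid gates that mix activations from the source-domain and target-domain networks into one combined representation. This follows the description in the abstract only; the exact PTU parameterization in the paper may differ.

```python
# Sketch of a gated transfer unit: two learned sigmoid gates mix activations
# from a source-domain network and a target-domain network.
import torch
import torch.nn as nn

class TransferUnit(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate_src = nn.Linear(dim, dim)  # controls the source contribution
        self.gate_tgt = nn.Linear(dim, dim)  # controls the target contribution

    def forward(self, h_src: torch.Tensor, h_tgt: torch.Tensor) -> torch.Tensor:
        g_s = torch.sigmoid(self.gate_src(h_src))
        g_t = torch.sigmoid(self.gate_tgt(h_tgt))
        # Fine-grained, learned combination instead of a hard share/fine-tune choice.
        return g_s * h_src + g_t * h_tgt

ptu = TransferUnit(dim=128)
h_src, h_tgt = torch.randn(8, 128), torch.randn(8, 128)
combined = ptu(h_src, h_tgt)  # shape: (8, 128)
```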
A Survey of Model Compression and Acceleration for Deep Neural Networks
Deep neural networks (DNNs) have recently achieved great success in many
visual recognition tasks. However, existing deep neural network models are
computationally expensive and memory intensive, hindering their deployment on
devices with limited memory resources or in applications with strict latency
requirements. A natural thought, therefore, is to perform model compression
and acceleration in deep networks without significantly decreasing model
performance. Tremendous progress has been made in this area during the past
five years. In this paper, we review recent techniques for compacting and
accelerating DNN models. These techniques are divided into four categories:
parameter pruning and quantization, low-rank factorization,
transferred/compact convolutional filters, and knowledge distillation. Methods
of parameter pruning and quantization are described first; the other
techniques are introduced afterwards. For each category, we also provide
insightful analysis of the performance, related applications, advantages, and
drawbacks. We then go through some very recent successful methods, for
example, dynamic capacity networks and stochastic depth networks. After that,
we survey the evaluation metrics, the main datasets used for evaluating model
performance, and recent benchmark efforts. Finally, we conclude this paper and
discuss the remaining challenges and possible directions for future work.
Comment: Published in IEEE Signal Processing Magazine; updated version
including more recent work
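As a concrete instance of the first category (parameter pruning), here is a minimal sketch of magnitude pruning, which zeroes the smallest-magnitude weights of a layer; the sparsity target is an arbitrary example value and not taken from the survey.

```python
# Sketch of magnitude pruning, the simplest parameter-pruning method:
# zero the smallest-magnitude weights of a layer. Sparsity target is arbitrary.
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.9):
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

layer = np.random.randn(512, 512)
pruned = magnitude_prune(layer)
print(f"kept {np.mean(pruned != 0):.1%} of the weights")  # about 10%
```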
Jet-Images -- Deep Learning Edition
Building on the notion of a particle physics detector as a camera, and of the
collimated streams of high-energy particles, or jets, that it measures as
images, we investigate the potential of machine learning techniques based on
deep learning architectures to identify highly boosted W bosons. Modern deep
learning algorithms trained on jet images can outperform standard
physically-motivated, feature-driven approaches to jet tagging. We develop
techniques for visualizing how these features are learned by the network and
what additional information is used to improve performance. This interplay
between physically-motivated, feature-driven tools and supervised learning
algorithms is general and can be used to significantly increase the
sensitivity to discover new particles and new forces, and to gain a deeper
understanding of the physics within jets.
Comment: 32 pages, 24 figures. Version that is published in JHEP
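The "jet as an image" idea can be sketched as binning a jet's constituent particles on a fixed (eta, phi) grid, with transverse momentum as the pixel intensity, producing a small image a CNN can classify. The grid size, coordinate ranges, and toy jet below are illustrative choices, not the paper's exact preprocessing.

```python
# Sketch of building a "jet image": bin constituents on an (eta, phi) grid
# with transverse momentum pT as the pixel intensity. All values illustrative.
import numpy as np

def jet_image(eta, phi, pt, bins=25, extent=1.25):
    image, _, _ = np.histogram2d(
        eta, phi, bins=bins,
        range=[[-extent, extent], [-extent, extent]],
        weights=pt,                      # each particle deposits its pT
    )
    return image

# Toy jet with 50 constituents clustered around the jet axis.
rng = np.random.default_rng(0)
eta = rng.normal(0.0, 0.3, 50)
phi = rng.normal(0.0, 0.3, 50)
pt = rng.exponential(5.0, 50)
image = jet_image(eta, phi, pt)          # 25 x 25 input for a CNN classifier
print(image.shape, image.sum())
```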