Machine Translation: From Statistical to Modern Deep-Learning Practices
Machine translation (MT) is an area of study in natural language processing
that deals with the automatic translation of human language from one language
to another by computer. With a rich research history spanning nearly three
decades, machine translation is one of the most sought-after areas of research
in the linguistics and computational community. In this paper, we investigate
deep-learning based models that have achieved substantial progress in recent
years and are becoming the prominent method in MT. We discuss the two main
deep-learning based machine translation approaches: component- or domain-level
methods, which leverage deep learning models to enhance the efficacy of
statistical machine translation (SMT), and end-to-end deep learning models,
which use neural networks with an encoder-decoder architecture to find
correspondences between source and target languages. We conclude by providing
a timeline of the major research problems solved to date and a comprehensive
overview of present areas of research in neural machine translation.
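As an aside, the encoder-decoder idea mentioned above can be illustrated with a minimal sequence-to-sequence sketch in PyTorch. The vocabulary sizes, embedding width, and hidden size below are illustrative placeholders, not values taken from the paper.

```python
# Minimal encoder-decoder sketch (PyTorch); all sizes are illustrative.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=8000, tgt_vocab=8000, emb=256, hidden=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source sentence into a fixed-size state.
        _, state = self.encoder(self.src_emb(src_ids))
        # Decode the target sequence conditioned on that state (teacher forcing).
        out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.proj(out)  # per-position logits over the target vocabulary

model = Seq2Seq()
src = torch.randint(0, 8000, (2, 7))   # batch of 2 source sentences, length 7
tgt = torch.randint(0, 8000, (2, 9))   # corresponding target prefixes, length 9
logits = model(src, tgt)               # shape: (2, 9, 8000)
```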
Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks
Popular deep learning frameworks require users to fine-tune their memory
usage so that the training data of a deep neural network (DNN) fits within the
GPU physical memory. Prior work tries to address this restriction by
virtualizing the memory usage of DNNs, enabling both CPU and GPU memory to be
utilized for memory allocations. Despite its merits, virtualizing memory can
incur significant performance overheads when the time needed to copy data back
and forth from CPU memory is higher than the latency to perform the
computations required for DNN forward and backward propagation. We introduce a
high-performance virtualization strategy based on a "compressing DMA engine"
(cDMA) that drastically reduces the size of the data structures that are
targeted for CPU-side allocations. The cDMA engine offers an average 2.6x
(maximum 13.8x) compression ratio by exploiting the sparsity inherent in
offloaded data, improving the performance of virtualized DNNs by an average 32%
(maximum 61%).
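The compression gain claimed above comes from the sparsity of offloaded activations (e.g., post-ReLU tensors that are mostly zero). Below is a minimal sketch of one generic zero-value compression layout, a presence bitmask plus the non-zero values; the actual encoding used by the cDMA engine is not specified here and may differ.

```python
# Sketch of zero-value compression for a sparse activation buffer: store a
# presence bitmask plus the non-zero values. Generic illustration only, not
# the specific cDMA encoding.
import numpy as np

def compress(acts: np.ndarray):
    flat = acts.ravel()
    mask = flat != 0
    return np.packbits(mask), flat[mask], acts.shape

def decompress(packed_mask, values, shape):
    mask = np.unpackbits(packed_mask)[: int(np.prod(shape))].astype(bool)
    out = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    out[mask] = values
    return out.reshape(shape)

acts = np.maximum(np.random.randn(64, 128).astype(np.float32), 0)  # ReLU-like, ~50% zeros
packed, vals, shape = compress(acts)
ratio = acts.nbytes / (packed.nbytes + vals.nbytes)
assert np.array_equal(decompress(packed, vals, shape), acts)
print(f"compression ratio: {ratio:.2f}x")
```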
Recent Advances in Convolutional Neural Network Acceleration
In recent years, convolutional neural networks (CNNs) have shown great
performance in various fields such as image classification, pattern
recognition, and multimedia compression. Two of their characteristic
properties, local connectivity and weight sharing, reduce the number of
parameters and increase processing speed during training and inference.
However, as the dimensionality of the data grows and CNN architectures become
more complicated, the end-to-end approach, or the combined use of CNNs,
becomes computationally intensive, which limits their further deployment. It
is therefore necessary and urgent to make CNNs run faster. In this paper, we
first summarize the acceleration methods that contribute to, but are not
limited to, CNNs by reviewing a broad variety of research papers. We propose a
taxonomy of acceleration methods in terms of three levels: structure level,
algorithm level, and implementation level. We also analyze the acceleration
methods in terms of CNN architecture compression, algorithm optimization, and
hardware-based improvement. Finally, we discuss different perspectives on
these acceleration and optimization methods within each level. The discussion
shows that the methods at each level still leave ample room for exploration.
By incorporating such a wide range of disciplines, we aim to provide a
comprehensive reference for researchers interested in CNN acceleration.
Comment: Submitted to Neurocomputing
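To make the parameter-reduction claim concrete, here is a small back-of-the-envelope comparison of a fully connected layer and a convolutional layer on the same feature map; the sizes are arbitrary example values, not figures from the survey.

```python
# Illustrative parameter counts: full connectivity vs. a shared convolutional
# filter bank on the same feature map (sizes are arbitrary examples).
H, W, C_in, C_out, k = 32, 32, 64, 64, 3

fully_connected = (H * W * C_in) * (H * W * C_out)  # every input to every output
convolutional = k * k * C_in * C_out                 # one shared k x k filter bank

print(f"fully connected: {fully_connected:,} parameters")  # ~4.3 billion
print(f"convolutional:   {convolutional:,} parameters")    # 36,864
```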
Text normalization using memory augmented neural networks
We perform text normalization, i.e. the transformation of words from the
written to the spoken form, using a memory augmented neural network. With the
addition of dynamic memory access and storage mechanism, we present a neural
architecture that will serve as a language-agnostic text normalization system
while avoiding the kind of unacceptable errors made by the LSTM-based recurrent
neural networks. By successfully reducing the frequency of such mistakes, we
show that this novel architecture is indeed a better alternative. Our proposed
system requires significantly lesser amounts of data, training time and compute
resources. Additionally, we perform data up-sampling, circumventing the data
sparsity problem in some semiotic classes, to show that sufficient examples in
any particular class can improve the performance of our text normalization
system. Although a few occurrences of these errors still remain in certain
semiotic classes, we demonstrate that memory augmented networks with
meta-learning capabilities can open many doors to a superior text normalization
system.Comment: 9 pages, 10 tables, 3 figure
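The up-sampling step mentioned in the abstract can be sketched as simple duplication of examples from under-represented semiotic classes until each class reaches a target size. The class names, example pairs, and target count below are made up for illustration; the paper's actual sampling scheme may differ.

```python
# Sketch of up-sampling rare semiotic classes in a text normalization
# training set (written form -> spoken form). All values are illustrative.
import random
from collections import defaultdict

examples = [
    ("DATE",     "12/01/2017", "the twelfth of january twenty seventeen"),
    ("CARDINAL", "42",         "forty two"),
    ("MEASURE",  "3 kg",       "three kilograms"),
]

def upsample(data, target_per_class=100, seed=0):
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for cls, written, spoken in data:
        by_class[cls].append((cls, written, spoken))
    balanced = []
    for cls, items in by_class.items():
        balanced.extend(items)
        # Duplicate random examples until the class reaches the target size.
        need = max(target_per_class - len(items), 0)
        balanced.extend(rng.choice(items) for _ in range(need))
    return balanced

print(len(upsample(examples)))  # 300: each class padded to 100 examples
```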
Robust X-ray Sparse-view Phase Tomography via Hierarchical Synthesis Convolutional Neural Networks
Convolutional neural network (CNN) based image reconstruction methods have
been used intensively for X-ray computed tomography (CT) reconstruction
applications. Despite great success, the good performance of this data-based
approach relies critically on a representative, large training data set and a
dense convolutional deep network. The indiscriminate convolution connections
across all dense layers can be prone to over-fitting, where sampling biases
are wrongly integrated as features for the reconstruction. In this paper, we
report a robust hierarchical synthesis reconstruction approach, in which the
training data is pre-processed to separate the information into the domains
where sampling biases are suspected. These split bands are then trained
separately and combined successively through a hierarchical synthesis network.
We apply hierarchical synthesis reconstruction to two important and classical
tomography reconstruction scenarios: sparse-view reconstruction and phase
reconstruction. Our simulated and experimental results show that comparable or
improved performance is achieved with a dramatic reduction in network
complexity and computational cost. This method can be generalized to a wide
range of applications including material characterization, in-vivo monitoring,
and dynamic 4D imaging.
Comment: 9 pages, 6 figures, 2 tables
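One simple way to picture the band-splitting pre-processing is a two-band decomposition of each training image into a smooth low-frequency component and a residual high-frequency component, so the bands can be trained separately. This is only a sketch of the idea; the decomposition actually used in the paper may differ.

```python
# Sketch of splitting an image into two bands before band-wise training:
# a low-frequency component from Gaussian smoothing plus the residual detail.
# The paper's actual decomposition may differ.
import numpy as np
from scipy.ndimage import gaussian_filter

def split_bands(image: np.ndarray, sigma: float = 3.0):
    low = gaussian_filter(image, sigma=sigma)  # smooth, low-frequency content
    high = image - low                         # residual / high-frequency detail
    return low, high

image = np.random.rand(256, 256).astype(np.float32)
low, high = split_bands(image)
assert np.allclose(low + high, image, atol=1e-5)  # bands recombine to the original
```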
CSB-RNN: A Faster-than-Realtime RNN Acceleration Framework with Compressed Structured Blocks
Recurrent neural networks (RNNs) have been widely adopted in temporal
sequence analysis, where real-time performance is often in demand. However,
RNNs suffer from heavy computational workloads, as the models often come with
large weight matrices. Pruning schemes have been proposed for RNNs to
eliminate redundant (close-to-zero) weight values. On one hand, non-structured
pruning methods achieve a high pruning rate but introduce computational
irregularity (random sparsity), which is unfriendly to parallel hardware. On
the other hand, hardware-oriented structured pruning suffers from a low
pruning rate due to the restrictive constraints on the allowable pruning
structure. This paper presents CSB-RNN, an optimized full-stack RNN framework
with a novel compressed structured block (CSB) pruning technique. The
CSB-pruned RNN model combines a fine pruning granularity that facilitates a
high pruning rate with a regular structure that benefits hardware parallelism.
To address the challenges in parallelizing inference of the CSB-pruned model
with fine-grained structural sparsity, we propose a novel hardware
architecture with a dedicated compiler. Benefiting from the
architecture-compiler co-design, the hardware not only supports various RNN
cell types, but is also able to address the challenging workload-imbalance
issue and therefore significantly improves hardware efficiency.
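For intuition, block-structured pruning can be sketched as partitioning a weight matrix into fixed-size blocks and zeroing the blocks with the smallest magnitude, which yields a regular sparsity pattern. The block size and keep ratio below are illustrative; the CSB format in the paper is more elaborate than this.

```python
# Sketch of block-structured weight pruning: zero whole blocks with the
# smallest L1 norm. Block size and keep ratio are illustrative choices.
import numpy as np

def block_prune(weights: np.ndarray, block=(4, 4), keep_ratio=0.25):
    rows, cols = weights.shape
    br, bc = block
    pruned = weights.copy()
    # L1 norm of each (br x bc) block decides which blocks survive.
    norms = np.abs(weights).reshape(rows // br, br, cols // bc, bc).sum(axis=(1, 3))
    threshold = np.quantile(norms, 1.0 - keep_ratio)
    for i in range(rows // br):
        for j in range(cols // bc):
            if norms[i, j] < threshold:
                pruned[i * br:(i + 1) * br, j * bc:(j + 1) * bc] = 0.0
    return pruned

w = np.random.randn(64, 64)
w_pruned = block_prune(w)
print(f"sparsity: {np.mean(w_pruned == 0):.2%}")  # roughly 75% of entries zeroed
```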
Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks
A low precision deep neural network training technique for producing sparse,
ternary neural networks is presented. The technique incorporates hardware
implementation costs during training to achieve significant model compression
for inference. Training involves three stages: network training using L2
regularization and a quantization threshold regularizer, quantization pruning,
and finally retraining. The resulting networks achieve improved accuracy,
reduced memory footprint, and reduced computational complexity compared with
conventional methods on the MNIST and CIFAR10 datasets. Our networks are up to
98% sparse and 5 and 11 times smaller than equivalent binary and ternary
models, translating to significant resource and speed benefits for hardware
implementations.
Comment: To appear as a conference paper at the 24th International Conference
on Neural Information Processing (ICONIP 2017)
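The core quantization step can be illustrated with a simple threshold-based ternarizer: weights near zero are pruned and the rest are mapped to a single positive or negative scale. The threshold and scale rule below are illustrative choices; the paper's full pipeline (regularized training, pruning, retraining) is not reproduced here.

```python
# Sketch of threshold-based ternary quantization of a trained weight tensor.
# Threshold and scaling rule are illustrative, not the paper's exact method.
import numpy as np

def ternarize(weights: np.ndarray, threshold: float):
    mask = np.abs(weights) > threshold          # weights below threshold -> 0
    scale = np.abs(weights[mask]).mean() if mask.any() else 0.0
    return scale * np.sign(weights) * mask      # remaining weights -> +/- scale

w = np.random.randn(256, 256) * 0.05
w_t = ternarize(w, threshold=0.05)
print(f"sparsity: {np.mean(w_t == 0):.2%}")
print(f"levels: {np.unique(w_t)}")  # {-scale, 0, +scale}
```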
Parameter Transfer Unit for Deep Neural Networks
Parameters in deep neural networks that are trained on large-scale databases
can generalize across multiple domains, a property referred to as
"transferability". Unfortunately, transferability is usually defined as a set
of discrete states, and it differs across domains and network architectures.
Existing works usually apply parameter-sharing or fine-tuning heuristically,
and there is no principled approach to learning a parameter transfer strategy.
To address this gap, a parameter transfer unit (PTU) is proposed in this
paper. The PTU learns a fine-grained nonlinear combination of activations from
both the source and the target domain networks, and subsumes hand-crafted
discrete transfer states. In the PTU, transferability is controlled by two
gates, which are artificial neurons that can be learned from data. The PTU is
a general and flexible module that can be used in both CNNs and RNNs.
Experiments are conducted with various network architectures and multiple
transfer domain pairs. The results demonstrate the effectiveness of the PTU,
as it outperforms heuristic parameter-sharing and fine-tuning in most settings.
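The gating idea can be sketched as two learned sigmoid gates that mix activations from the source-domain and target-domain networks into one combined representation. This follows the description in the abstract only; the exact PTU parameterization in the paper may differ.

```python
# Sketch of a gated transfer unit: two learned sigmoid gates mix activations
# from a source-domain network and a target-domain network.
import torch
import torch.nn as nn

class TransferUnit(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate_src = nn.Linear(dim, dim)  # controls the source contribution
        self.gate_tgt = nn.Linear(dim, dim)  # controls the target contribution

    def forward(self, h_src: torch.Tensor, h_tgt: torch.Tensor) -> torch.Tensor:
        g_s = torch.sigmoid(self.gate_src(h_src))
        g_t = torch.sigmoid(self.gate_tgt(h_tgt))
        # Fine-grained, learned combination instead of a hard share/fine-tune choice.
        return g_s * h_src + g_t * h_tgt

ptu = TransferUnit(dim=128)
h_src, h_tgt = torch.randn(8, 128), torch.randn(8, 128)
combined = ptu(h_src, h_tgt)  # shape: (8, 128)
```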
A Survey of Model Compression and Acceleration for Deep Neural Networks
Deep neural networks (DNNs) have recently achieved great success in many
visual recognition tasks. However, existing deep neural network models are
computationally expensive and memory intensive, hindering their deployment on
devices with limited memory resources or in applications with strict latency
requirements. A natural thought, therefore, is to perform model compression
and acceleration in deep networks without significantly decreasing model
performance. Tremendous progress has been made in this area during the past
five years. In this paper, we review recent techniques for compacting and
accelerating DNN models. These techniques are divided into four categories:
parameter pruning and quantization, low-rank factorization,
transferred/compact convolutional filters, and knowledge distillation. Methods
of parameter pruning and quantization are described first; the other
techniques are introduced afterwards. For each category, we also provide
insightful analysis of the performance, related applications, advantages, and
drawbacks. We then go through some very recent successful methods, for
example, dynamic capacity networks and stochastic depth networks. After that,
we survey the evaluation metrics, the main datasets used for evaluating model
performance, and recent benchmark efforts. Finally, we conclude this paper and
discuss the remaining challenges and possible directions for future work.
Comment: Published in IEEE Signal Processing Magazine; updated version
including more recent work
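As a concrete instance of the first category (parameter pruning), here is a minimal sketch of magnitude pruning, which zeroes the smallest-magnitude weights of a layer; the sparsity target is an arbitrary example value and not taken from the survey.

```python
# Sketch of magnitude pruning, the simplest parameter-pruning method:
# zero the smallest-magnitude weights of a layer. Sparsity target is arbitrary.
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.9):
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

layer = np.random.randn(512, 512)
pruned = magnitude_prune(layer)
print(f"kept {np.mean(pruned != 0):.1%} of the weights")  # about 10%
```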
Jet-Images -- Deep Learning Edition
Building on the notion of a particle physics detector as a camera, and of the
collimated streams of high-energy particles, or jets, that it measures as
images, we investigate the potential of machine learning techniques based on
deep learning architectures to identify highly boosted W bosons. Modern deep
learning algorithms trained on jet images can outperform standard
physically-motivated, feature-driven approaches to jet tagging. We develop
techniques for visualizing how these features are learned by the network and
what additional information is used to improve performance. This interplay
between physically-motivated, feature-driven tools and supervised learning
algorithms is general and can be used to significantly increase the
sensitivity to discover new particles and new forces, and to gain a deeper
understanding of the physics within jets.
Comment: 32 pages, 24 figures. Version that is published in JHEP
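The "jet as an image" idea can be sketched as binning a jet's constituent particles on a fixed (eta, phi) grid, with transverse momentum as the pixel intensity, producing a small image a CNN can classify. The grid size, coordinate ranges, and toy jet below are illustrative choices, not the paper's exact preprocessing.

```python
# Sketch of building a "jet image": bin constituents on an (eta, phi) grid
# with transverse momentum pT as the pixel intensity. All values illustrative.
import numpy as np

def jet_image(eta, phi, pt, bins=25, extent=1.25):
    image, _, _ = np.histogram2d(
        eta, phi, bins=bins,
        range=[[-extent, extent], [-extent, extent]],
        weights=pt,                      # each particle deposits its pT
    )
    return image

# Toy jet with 50 constituents clustered around the jet axis.
rng = np.random.default_rng(0)
eta = rng.normal(0.0, 0.3, 50)
phi = rng.normal(0.0, 0.3, 50)
pt = rng.exponential(5.0, 50)
image = jet_image(eta, phi, pt)          # 25 x 25 input for a CNN classifier
print(image.shape, image.sum())
```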