
Improving accuracy and speeding up Document Image Classification through parallel systems

Authors: Cortada Jordi; Dominguez Juan Luis; Ferrando Javier; Garcia David; Garcia Raul; Garrido Daniel; Torres Jordi; Valero Mateo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

This paper presents a study showing the benefits of EfficientNet models compared with heavier Convolutional Neural Networks (CNNs) in the Document Classification task, an essential problem in the digitalization process of institutions. We show on the RVL-CDIP dataset that we can improve on previous results with a much lighter model, and we present its transfer learning capabilities on a smaller in-domain dataset, Tobacco3482. Moreover, we present an ensemble pipeline that improves on image-only results by combining the image model's predictions with those generated by a BERT model on text extracted by OCR. We also show that the batch size can be increased without hindering accuracy, so that training can be sped up by parallelizing across multiple GPUs, decreasing the computational time needed. Lastly, we expose the training performance differences between the PyTorch and TensorFlow Deep Learning frameworks.
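The abstract does not say how the image and text predictions are combined. As a minimal sketch, assuming a simple weighted average of the two models' class probabilities (the function name `ensemble_predict` and the parameter `image_weight` are hypothetical, not taken from the paper), late fusion could look like this:

```python
def ensemble_predict(image_probs, text_probs, image_weight=0.5):
    """Fuse two class-probability vectors by weighted averaging.

    Assumption: both vectors cover the same classes in the same order.
    Returns (index of the most probable class, fused probabilities).
    """
    if len(image_probs) != len(text_probs):
        raise ValueError("probability vectors must have the same length")
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")

    text_weight = 1.0 - image_weight
    fused = [image_weight * p + text_weight * q
             for p, q in zip(image_probs, text_probs)]
    # Pick the class with the highest fused probability.
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


if __name__ == "__main__":
    # Hypothetical outputs of an image model and a text model for two classes.
    label, probs = ensemble_predict([0.6, 0.4], [0.1, 0.9])
    print(label, probs)
```

With equal weights, a confident text model can overrule a weakly confident image model, which is the kind of gain an ensemble of this sort aims for.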

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

Distributed Training Large-Scale Deep Architectures

Authors: Chang Edward Y.; Chen Chun-Yen; Chou Chun-Nan; Lin Ting-Wei; Sung Cheng-Lung; Tsao Chia-Chin; Tung Kuan-Chieh; Wu Jui-Lin; Zou Shang-Xuan
Publication date: 10/08/2017

The scale of data and the scale of computation infrastructures together enable the current deep learning renaissance. However, training large-scale deep architectures demands both algorithmic improvement and careful system configuration. In this paper, we focus on employing the system approach to speed up large-scale training. Via lessons learned from our routine benchmarking effort, we first identify bottlenecks and overheads that hinder data parallelism. We then devise guidelines that help practitioners configure an effective system and fine-tune parameters to achieve the desired speedup. Specifically, we develop a procedure for setting the minibatch size and choosing computation algorithms. We also derive lemmas for determining the quantity of key components such as the number of GPUs and parameter servers. Experiments and examples show that these guidelines help effectively speed up large-scale deep learning training.

arXiv.org e-Print Archive

Crossref

Assessing hyper parameter optimization and speedup for convolutional neural networks

Authors: A. Krizhevsky; D. L. Tutorial; E. Bochinski; E. Real; J. Bergstra; J. Deng; K. He; L. Xie; N. Srivastava; S. Ioffe; T. Domhan; W. Y. Lee; Z. Zhong
Publication venue: 'IGI Global'
Publication date: 01/01/2020

The increased processing power of graphical processing units (GPUs) and the availability of large image datasets have fostered a renewed interest in extracting semantic information from images. Promising results for complex image categorization problems have been achieved using deep learning, with neural networks composed of many layers. Convolutional neural networks (CNNs) are one such architecture, which provides more opportunities for image classification. Advances in CNNs enable the development of training models using large labelled image datasets, but the hyper parameters need to be specified, which is challenging and complex due to the large number of parameters. A substantial amount of computational power and processing time is required to determine the optimal hyper parameters to define a model yielding good results. This article provides a survey of the hyper parameter search and optimization methods for CNN architectures.
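The abstract does not list the methods the survey covers. As a minimal sketch of one widely used baseline, random search, the snippet below samples configurations uniformly from a discrete search space and keeps the best-scoring one. The names `random_search`, `toy_objective`, and the search space are hypothetical; in practice the objective would train a CNN and return its validation accuracy.

```python
import random


def random_search(objective, space, n_trials=20, seed=0):
    """Sample n_trials configurations at random and return the best one.

    objective: callable mapping a config dict to a score (higher is better).
    space: dict mapping each hyper parameter name to a list of candidate values.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def toy_objective(cfg):
    """Stand-in for validation accuracy; peaks at lr=0.01 and batch_size=64."""
    return -abs(cfg["lr"] - 0.01) - abs(cfg["batch_size"] - 64) / 1000.0


if __name__ == "__main__":
    space = {"lr": [0.1, 0.01, 0.001], "batch_size": [32, 64, 128]}
    print(random_search(toy_objective, space))
```

Each trial is independent of the others, so the loop parallelizes trivially across workers, which matters given the computational cost the abstract describes.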

LSBU Research Open

Crossref

ResearchOnline@GCU

Learning Combinations of Activation Functions

Authors: Manessi Franco; Rozza Alessandro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 25/04/2019

In the last decade, an active area of research has been devoted to designing novel activation functions that help deep neural networks converge and obtain better performance. The training procedure of these architectures usually involves optimizing only the weights of their layers, while non-linearities are generally pre-specified and their (possible) parameters are usually considered hyper-parameters to be tuned manually. In this paper, we introduce two approaches to automatically learn different combinations of base activation functions (such as the identity function, ReLU, and tanh) during the training phase. We present a thorough comparison of our novel approaches with well-known architectures (such as LeNet-5, AlexNet, and ResNet-56) on three standard datasets (Fashion-MNIST, CIFAR-10, and ILSVRC-2012), showing substantial improvements in the overall performance, such as an increase in the top-1 accuracy for AlexNet on ILSVRC-2012 of 3.01 percentage points. Comment: 6 pages, 3 figures. Published as a conference paper at ICPR 2018. Code: https://bitbucket.org/francux/learning_combinations_of_activation_function
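The abstract does not spell out how the base functions are combined. As a minimal sketch, assuming a convex combination whose mixing coefficients come from a softmax over trainable logits (an illustrative choice, not necessarily either of the paper's two approaches; `combined_activation` and `BASE_FUNCTIONS` are hypothetical names), the forward pass could look like this:

```python
import math

# Base activations named in the abstract: identity, ReLU, tanh.
BASE_FUNCTIONS = [
    lambda x: x,            # identity
    lambda x: max(0.0, x),  # ReLU
    math.tanh,              # tanh
]


def softmax(logits):
    """Numerically stable softmax; the result is positive and sums to 1."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_activation(x, logits):
    """Convex combination of the base activations evaluated at scalar x.

    logits: one value per base function. In a real network these would be
    trained by gradient descent together with the layer weights; here they
    are fixed inputs (assumption for illustration).
    """
    alphas = softmax(logits)
    return sum(a * f(x) for a, f in zip(alphas, BASE_FUNCTIONS))


if __name__ == "__main__":
    # Equal logits weight the three base functions equally.
    print(combined_activation(1.0, [0.0, 0.0, 0.0]))
```

Because the coefficients are differentiable functions of the logits, the mix of non-linearities can be learned during training instead of being tuned by hand.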

arXiv.org e-Print Archive

Crossref