Search CORE

14,948 research outputs found

Finding Competitive Network Architectures Within a Day Using UCT

Author: Wistuba Martin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 23/07/2018
Field of study

The design of neural network architectures for a new data set is a laborious task which requires human deep learning expertise. In order to make deep learning available for a broader audience, automated methods for finding a neural network architecture are vital. Recently proposed methods can already achieve human expert level performances. However, these methods have run times of months or even years of GPU computing time, ignoring hardware constraints as faced by many researchers and companies. We propose the use of Monte Carlo planning in combination with two different UCT (upper confidence bound applied to trees) derivations to search for network architectures. We adapt the UCT algorithm to the needs of network architecture search by proposing two ways of sharing information between different branches of the search tree. In an empirical study we are able to demonstrate that this method is able to find competitive networks for MNIST, SVHN and CIFAR-10 in just a single GPU day. Extending the search time to five GPU days, we are able to outperform human architectures and our competitors which consider the same types of layers

arXiv.org e-Print Archive

Crossref

Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input

Author: Guo Junliang
He Di
Liu Tie-Yan
Qin Tao
Tan Xu
Xu Linli
Publication venue
Publication date: 23/12/2018
Field of study

Non-autoregressive translation (NAT) models, which remove the dependence on previous target tokens from the inputs of the decoder, achieve significantly inference speedup but at the cost of inferior accuracy compared to autoregressive translation (AT) models. Previous work shows that the quality of the inputs of the decoder is important and largely impacts the model accuracy. In this paper, we propose two methods to enhance the decoder inputs so as to improve NAT models. The first one directly leverages a phrase table generated by conventional SMT approaches to translate source tokens to target tokens, which are then fed into the decoder as inputs. The second one transforms source-side word embeddings to target-side word embeddings through sentence-level alignment and word-level adversary learning, and then feeds the transformed word embeddings into the decoder as inputs. Experimental results show our method largely outperforms the NAT baseline~\citep{gu2017non} by

5.11

BLEU scores on WMT14 English-German task and

4.72

BLEU scores on WMT16 English-Romanian task.Comment: AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Comparison of Deep Learning and the Classical Machine Learning Algorithm for the Malware Detection

Author: Rathore Hemant
Sahay Sanjay K.
Sewak Mohit
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/09/2018
Field of study

Recently, Deep Learning has been showing promising results in various Artificial Intelligence applications like image recognition, natural language processing, language modeling, neural machine translation, etc. Although, in general, it is computationally more expensive as compared to classical machine learning techniques, their results are found to be more effective in some cases. Therefore, in this paper, we investigated and compared one of the Deep Learning Architecture called Deep Neural Network (DNN) with the classical Random Forest (RF) machine learning algorithm for the malware classification. We studied the performance of the classical RF and DNN with 2, 4 & 7 layers architectures with the four different feature sets, and found that irrespective of the features inputs, the classical RF accuracy outperforms the DNN.Comment: 11 Pages, 1 figur

arXiv.org e-Print Archive

Crossref