1,944 research outputs found

SelfieBoost: A Boosting Algorithm for Deep Learning

Author: Shalev-Shwartz Shai
Publication date: 08/04/2017

We describe and analyze a new boosting algorithm for deep learning called SelfieBoost. Unlike other boosting algorithms, like AdaBoost, which construct ensembles of classifiers, SelfieBoost boosts the accuracy of a single network. We prove a $\log(1/\epsilon)$ convergence rate for SelfieBoost under some "SGD success" assumption which seems to hold in practice.

arXiv.org e-Print Archive

CiteSeerX

A Reinforced Improved Attention Model for Abstractive Text Summarization

Author: Chang Yu; Huang Yiming; Lei Hang; Li Xiaoyu
Publication venue: Waseda Institute for the Study of Language and Information
Publication date: 01/01/2019

Waseda University Repository

Structural Attention Neural Networks for improved sentiment analysis

Author: Kokkinos Filippos; Potamianos Alexandros
Publication date: 01/01/2017

We introduce a tree-structured attention neural network for sentences and small phrases and apply it to the problem of sentiment classification. Our model extends current recursive models by incorporating structural information around a node of a syntactic tree using both bottom-up and top-down information propagation. The model also uses structural attention to identify the most salient representations during the construction of the syntactic tree. To our knowledge, the proposed models achieve state-of-the-art performance on the Stanford Sentiment Treebank dataset.

Comment: Submitted to EACL2017 for review

arXiv.org e-Print Archive

Crossref

Distributed Deep Learning for Question Answering

Author: Bottou L.; Chilimbi T.; Dean J.; Sutskever I.; Zhang S.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 04/08/2016

This paper is an empirical study of distributed deep learning for two question answering subtasks: answer selection and question classification. It compares the SGD, MSGD, ADADELTA, ADAGRAD, ADAM/ADAMAX, RMSPROP, DOWNPOUR and EASGD/EAMSGD algorithms. Experimental results show that the distributed framework, based on the message passing interface, accelerates convergence at a sublinear scale. This paper demonstrates the importance of distributed training. For example, with 48 workers, a 24x speedup is achievable for the answer selection task, and running time decreases from 138.2 hours to 5.81 hours, which significantly increases productivity.

Comment: This paper will appear in the Proceeding of The 25th ACM International Conference on Information and Knowledge Management (CIKM 2016), Indianapolis, US

arXiv.org e-Print Archive

Crossref