Easy over Hard: A Case Study on Deep Learning
While deep learning is an exciting new technique, the benefits of this method
need to be assessed with respect to its computational cost. This is
particularly important since deep learners can need hours (or even weeks) to
train a model. Such long training times limit the ability of (a) a researcher
to test the stability of their conclusions via repeated runs with different
random seeds; and (b) other researchers to repeat, improve, or even refute that
original work.
For example, recently, deep learning was used to find which questions in the
Stack Overflow programmer discussion forum can be linked together. That deep
learning system took 14 hours to execute. We show here that a very simple
optimizer called DE (differential evolution), used to fine-tune an SVM, can
achieve similar (and sometimes better) results. The DE approach terminated in
10 minutes, i.e. 84 times faster than the deep learning method.
We offer these results as a cautionary tale to the software analytics
community and suggest that not every new innovation should be applied without
critical analysis. If researchers deploy some new and expensive process, that
work should be baselined against some simpler and faster alternatives.
Comment: 12 pages, 6 figures, accepted at FSE 2017
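For readers unfamiliar with DE, the sketch below shows the general pattern the abstract alludes to: differential evolution searching over SVM hyper-parameters, scored by cross-validation. It is a minimal illustration using SciPy and scikit-learn; the synthetic dataset, search bounds, and evaluation budget are assumptions for demonstration, not the paper's actual experimental setup.

```python
# Minimal sketch: differential evolution (DE) tuning SVM hyper-parameters.
# Illustrative only -- dataset, bounds, and budget are assumed, not from the paper.
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

def negative_cv_accuracy(params):
    # DE minimises, so return the negated cross-validated accuracy.
    log_c, log_gamma = params
    clf = SVC(C=10 ** log_c, gamma=10 ** log_gamma)
    return -cross_val_score(clf, X, y, cv=3).mean()

# Search C and gamma on a log10 scale.
result = differential_evolution(negative_cv_accuracy,
                                bounds=[(-2, 3), (-4, 1)],
                                maxiter=10, seed=1)
print("best (log10 C, log10 gamma):", result.x, "accuracy:", -result.fun)
```

The key point of the abstract is that this kind of search loop finishes in minutes on a CPU, which is what makes it a cheap baseline against the deep learner.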
Adaptive Neural Compilation
This paper proposes an adaptive neural-compilation framework to address the
problem of efficient program learning. Traditional code optimisation strategies
used in compilers are based on applying a pre-specified set of transformations
that make the code faster to execute without changing its semantics. In
contrast, our work involves adapting programs to make them more efficient while
considering correctness only on a target input distribution. Our approach is
inspired by the recent works on differentiable representations of programs. We
show that it is possible to compile programs written in a low-level language to
a differentiable representation. We also show how programs in this
representation can be optimised to make them efficient on a target distribution
of inputs. Experimental results demonstrate that our approach enables learning
specifically-tuned algorithms for given data distributions with a high success
rate.
Comment: Submitted to NIPS 2016, code and supplementary materials will be
available on the author's page
Deep Semantic Role Labeling with Self-Attention
Semantic Role Labeling (SRL) is believed to be a crucial step towards natural
language understanding and has been widely studied. In recent years, end-to-end
SRL with recurrent neural networks (RNNs) has gained increasing attention.
However, it remains a major challenge for RNNs to handle structural information
and long range dependencies. In this paper, we present a simple and effective
architecture for SRL which aims to address these problems. Our model is based
on self-attention which can directly capture the relationships between two
tokens regardless of their distance. Our single model achieves state-of-the-art
F1 scores on both the CoNLL-2005 and CoNLL-2012 shared task datasets,
outperforming the previous best published results on each. Besides, our model
is computationally efficient: the parsing speed is 50K tokens per second on a
single Titan X GPU.
Comment: Accepted by AAAI-2018
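As a quick illustration of the mechanism this abstract builds on, the sketch below implements plain scaled dot-product self-attention, in which every token scores every other token directly, irrespective of distance. It is a minimal NumPy toy, not the paper's model; the dimensions and random inputs are assumptions.

```python
# Minimal sketch of scaled dot-product self-attention (NumPy toy, not the paper's model).
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, d_model) token representations."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # pairwise token-token scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over tokens
    return weights @ V                                 # each output mixes all tokens

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (5, 8)
```

Because the attention weights connect all token pairs in a single step, long-range dependencies do not have to propagate through a recurrent chain, which is the limitation of RNN-based SRL the abstract points to.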
Dual Language Models for Code Switched Speech Recognition
In this work, we present a simple and elegant approach to language modeling
for bilingual code-switched text. Since code-switching is a blend of two or
more different languages, a standard bilingual language model can be improved
upon by using the structure of the monolingual language models. We propose a novel
technique called dual language models, which involves building two
complementary monolingual language models and combining them using a
probabilistic model for switching between the two. We evaluate the efficacy of
our approach using a conversational Mandarin-English speech corpus. We
demonstrate the robustness of our model through significant improvements in
perplexity over the standard bilingual language model, without the use of any
external information. Similar consistent improvements are also reflected in
automatic speech recognition error rates.
Comment: Accepted at Interspeech 2018
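A minimal sketch of how such a dual language model might combine its two monolingual components is given below. The toy unigram probabilities, the switch probability, and the function name are illustrative assumptions, not the paper's formulation.

```python
# Minimal sketch of the "dual language model" idea: two monolingual LMs plus a
# probabilistic switch between them. All values here are made-up assumptions.
def dual_lm_prob(word, prev_lang, lm_en, lm_zh, p_switch=0.1, floor=1e-6):
    """Probability of `word` given the language of the previous token."""
    stay, switch = (lm_en, lm_zh) if prev_lang == "en" else (lm_zh, lm_en)
    # Either stay in the current language or switch to the other one.
    return (1 - p_switch) * stay.get(word, floor) + p_switch * switch.get(word, floor)

lm_en = {"the": 0.05, "meeting": 0.01}   # toy English unigram LM
lm_zh = {"明天": 0.02, "开会": 0.01}       # toy Mandarin unigram LM
print(dual_lm_prob("开会", prev_lang="en", lm_en=lm_en, lm_zh=lm_zh))
```

The appeal of this decomposition is that each monolingual model can be trained on abundant monolingual data, while only the switching behaviour has to be learned from scarce code-switched text.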