The emergence of number and syntax units in LSTM language models
Recent work has shown that LSTMs trained on a generic language modeling
objective capture syntax-sensitive generalizations such as long-distance number
agreement. We have however no mechanistic understanding of how they accomplish
this remarkable feat. Some have conjectured it depends on heuristics that do
not truly take hierarchical structure into account. We present here a detailed
study of the inner mechanics of number tracking in LSTMs at the single neuron
level. We discover that long-distance number information is largely managed by
two 'number units'. Importantly, the behaviour of these units is partially
controlled by other units independently shown to track syntactic structure. We
conclude that LSTMs are, to some extent, implementing genuinely syntactic
processing mechanisms, paving the way to a more general understanding of
grammatical encoding in LSTMs.
Comment: To appear in Proceedings of NAACL, Minneapolis, MN, 201
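The single-neuron methodology described above can be sketched in a few lines. Everything below is illustrative, not the paper's trained model: the weights are random, the dimensions are toy-sized, and the "sentences" are synthetic vectors. The sketch records each unit's cell state across a sequence and ranks units by how strongly their trace differs between a singular and a plural variant, which is the spirit of identifying candidate number units.

```python
import numpy as np

rng = np.random.default_rng(0)

H, D = 8, 4  # hidden size and embedding size (toy values, not from the paper)

# Random weights standing in for a trained LSTM's parameters.
W = rng.standard_normal((4 * H, D + H)) * 0.3
b = np.zeros(4 * H)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c):
    """One standard LSTM cell update, returning new hidden and cell states."""
    z = W @ np.concatenate([x, h]) + b
    i, f, g, o = z[:H], z[H:2*H], z[2*H:3*H], z[3*H:]
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def cell_trace(tokens):
    """Run the LSTM over a token sequence, recording the cell state of every
    unit at every step -- the per-unit trace that single-neuron analyses inspect."""
    h, c = np.zeros(H), np.zeros(H)
    trace = []
    for x in tokens:
        h, c = lstm_step(x, h, c)
        trace.append(c.copy())
    return np.array(trace)  # shape (T, H)

# Two toy "sentences" differing only in the first token (singular vs. plural).
sing = [rng.standard_normal(D) for _ in range(5)]
plur = [sing[0] + 1.0] + sing[1:]

# Units whose cell-state traces diverge most between the two conditions
# are candidate number units.
diff = np.abs(cell_trace(sing) - cell_trace(plur)).mean(axis=0)
candidate_units = np.argsort(diff)[::-1][:2]
print(candidate_units)
```

With trained weights rather than random ones, the divergence would persist across the long-distance material between subject and verb, which is what distinguishes genuine number tracking from local heuristics.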
Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop
The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques
specifically developed for analyzing and understanding the inner workings and
representations acquired by neural models of language. Approaches included:
systematic manipulation of input to neural networks and investigating the
impact on their performance, testing whether interpretable knowledge can be
decoded from intermediate representations acquired by neural networks,
proposing modifications to neural network architectures to make their knowledge
state or generated output more explainable, and examining the performance of
networks on simplified or formal languages. Here we review a number of
representative studies in each category.
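The second category above, decoding interpretable knowledge from intermediate representations, is commonly operationalized as a diagnostic (probing) classifier. The sketch below uses synthetic stand-ins for hidden states, with a binary property planted in one dimension for illustration; the data and dimensions are assumptions, not from any specific study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for hidden states: 200 vectors in which one dimension
# (hypothetically) encodes a binary linguistic property such as number.
N, H = 200, 16
y = rng.integers(0, 2, N)
states = rng.standard_normal((N, H))
states[:, 3] += 3.0 * y  # plant the signal in unit 3 for illustration

def train_probe(X, y, lr=0.1, steps=500):
    """Logistic-regression probe trained by gradient descent: if a linear
    classifier can read the property off the states, the information is
    linearly decodable from the representation."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

w, b = train_probe(states, y)
acc = (((states @ w + b) > 0).astype(int) == y).mean()
print(f"probe accuracy: {acc:.2f}")
```

High probe accuracy shows the property is decodable, though, as several workshop papers caution, decodability alone does not show the network actually uses that information.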
DeepSoft: A vision for a deep model of software
Although software analytics has experienced rapid growth as a research area,
it has not yet reached its full potential for wide industrial adoption. Most of
the existing work in software analytics still relies heavily on costly manual
feature engineering and mainly addresses traditional classification problems,
as opposed to predicting future events. We present a
vision for \emph{DeepSoft}, an \emph{end-to-end} generic framework for modeling
software and its development process to predict future risks and recommend
interventions. DeepSoft, partly inspired by human memory, is built upon the
powerful deep learning-based Long Short-Term Memory (LSTM) architecture that is
capable of learning long-term temporal dependencies that occur in software
evolution. Such deep learned patterns of software can be used to address a
range of challenging problems such as code and task recommendation and
prediction. DeepSoft provides a new approach for research into modeling of
source code, risk prediction and mitigation, developer modeling, and
automatically generating code patches from bug reports.
Comment: FSE 201
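A prerequisite for the kind of sequence model DeepSoft envisions is representing software history as time-ordered event sequences. The sketch below is a minimal assumption of what that preprocessing could look like (the event names, artifact IDs, and vocabulary are hypothetical examples, not DeepSoft's actual schema): events are grouped per artifact, ordered in time, mapped to integer IDs, and left-padded into the fixed-length shape an LSTM would consume.

```python
from datetime import datetime

# Toy event log: each software artifact (e.g. an issue) is a time-ordered
# sequence of development events. All names here are hypothetical examples.
log = [
    ("ISSUE-1", "2016-01-02", "opened"),
    ("ISSUE-1", "2016-01-05", "commit"),
    ("ISSUE-1", "2016-01-09", "reopened"),
    ("ISSUE-2", "2016-01-03", "opened"),
    ("ISSUE-2", "2016-01-04", "closed"),
]

vocab = {"<pad>": 0, "opened": 1, "commit": 2, "closed": 3, "reopened": 4}

def to_sequences(log, max_len=4):
    """Group events by artifact, order them in time, map them to integer IDs,
    and left-pad to max_len -- the input shape an LSTM over software
    evolution would consume."""
    by_id = {}
    for art, ts, ev in sorted(log, key=lambda r: (r[0], datetime.fromisoformat(r[1]))):
        by_id.setdefault(art, []).append(vocab[ev])
    return {art: [0] * (max_len - len(s)) + s[-max_len:] for art, s in by_id.items()}

seqs = to_sequences(log)
print(seqs)
```

Feeding such sequences to an LSTM lets the model learn long-term temporal dependencies in a project's history, e.g. that a reopened issue carries more risk than one closed on the first attempt.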