Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop
The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques
specifically developed for analyzing and understanding the inner workings of, and the
representations acquired by, neural models of language. Approaches included
systematically manipulating the input to neural networks and investigating the
impact on their performance; testing whether interpretable knowledge can be
decoded from the intermediate representations acquired by neural networks;
proposing modifications to neural network architectures that make their knowledge
state or generated output more explainable; and examining the performance of
networks on simplified or formal languages. Here we review a number of
representative studies in each category.
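
To make the second family of approaches concrete, the following sketch trains a simple probing (diagnostic) classifier that tries to decode a linguistic property from frozen intermediate activations; the random arrays, the 256-dimensional hidden size, and the choice of part-of-speech tags as the probed property are illustrative assumptions rather than the setup of any particular workshop paper.

    # Probing-classifier sketch: fit a linear model that predicts a linguistic
    # property (here, part-of-speech tags) from frozen hidden states.
    # The arrays stand in for activations exported from a trained NLP model.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    hidden_states = rng.normal(size=(1000, 256))   # per-token activations (illustrative)
    pos_tags = rng.integers(0, 12, size=1000)      # gold labels for the probed property

    X_train, X_test, y_train, y_test = train_test_split(
        hidden_states, pos_tags, test_size=0.2, random_state=0)

    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("probe accuracy:", accuracy_score(y_test, probe.predict(X_test)))
    # Accuracy well above a majority-class or random-representation baseline
    # suggests the property is linearly decodable from the representation.

Probes of this kind are usually compared against control baselines (random representations or shuffled labels) so that decodability is not confused with classifier capacity.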
Correlating neural and symbolic representations of language
Analysis methods which enable us to better understand the representations and
functioning of neural models of language are increasingly needed as deep
learning becomes the dominant approach in NLP. Here we present two methods
based on Representational Similarity Analysis (RSA) and Tree Kernels (TK) which
allow us to directly quantify how strongly the information encoded in neural
activation patterns corresponds to information represented by symbolic
structures such as syntax trees. We first validate our methods on the case of a
simple synthetic language for arithmetic expressions with clearly defined
syntax and semantics, and show that they exhibit the expected pattern of
results. We then apply our methods to correlate neural representations of
English sentences with their constituency parse trees. Comment: ACL 2019
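
A minimal sketch of the RSA side of the method is given below: it builds one pairwise-similarity matrix from neural sentence embeddings and another from a symbolic source, then correlates their upper triangles. The random embeddings and the placeholder symbolic matrix (standing in for Tree Kernel similarities over parse trees) are assumptions for illustration.

    # RSA sketch: correlate pairwise similarities derived from neural sentence
    # embeddings with pairwise similarities derived from a symbolic source
    # (a placeholder here; the paper uses Tree Kernels over syntax trees).
    import numpy as np
    from scipy.stats import spearmanr
    from scipy.spatial.distance import pdist, squareform

    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(50, 128))        # neural sentence representations (illustrative)
    symbolic_sim = rng.uniform(size=(50, 50))      # stand-in for a tree-kernel similarity matrix
    symbolic_sim = (symbolic_sim + symbolic_sim.T) / 2

    neural_sim = 1 - squareform(pdist(embeddings, metric="cosine"))

    # Compare the two similarity structures on the upper triangle only.
    iu = np.triu_indices(len(embeddings), k=1)
    rho, _ = spearmanr(neural_sim[iu], symbolic_sim[iu])
    print("RSA correlation (Spearman rho):", rho)

A high correlation indicates that sentences the network treats as similar are also structurally similar under the symbolic representation.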
Better, Faster, Stronger Sequence Tagging Constituent Parsers
Sequence tagging models for constituent parsing are faster, but less accurate
than other types of parsers. In this work, we address the following weaknesses
of such constituent parsers: (a) high error rates around closing brackets of
long constituents, (b) large label sets, leading to sparsity, and (c) error
propagation arising from greedy decoding. To effectively close brackets, we
train a model that learns to switch between tagging schemes. To reduce
sparsity, we decompose the label set and use multi-task learning to jointly
learn to predict sublabels. Finally, we mitigate issues from greedy decoding
through auxiliary losses and sentence-level fine-tuning with policy gradient.
Combining these techniques, we clearly surpass the performance of sequence
tagging constituent parsers on the English and Chinese Penn Treebanks, and
reduce their parsing time even further. On the SPMRL datasets, we observe even
greater improvements across the board, including a new state of the art on
Basque, Hebrew, Polish and Swedish. Comment: NAACL 2019 (long papers). Contains corrigendum
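
The label-decomposition idea can be illustrated with a toy example: a tagging-based constituency parser assigns each word a composite label, and splitting it into sublabels predicted by separate heads reduces sparsity. The "~"-separated label format and the specific tags below are invented for illustration, not the paper's exact encoding.

    # Label-decomposition sketch: split composite per-word constituency tags
    # into sublabels so a multi-task tagger can predict each part with its own
    # (much smaller) softmax head.
    composite_labels = ["1~NP", "-2~VP", "0~S", "1~NP", "ROOT~S"]

    def decompose(label: str) -> tuple[str, str]:
        """Split a composite tag into (level, nonterminal) sublabels."""
        level, nonterminal = label.split("~")
        return level, nonterminal

    levels, nonterminals = zip(*(decompose(l) for l in composite_labels))
    print("level sublabels:      ", levels)
    print("nonterminal sublabels:", nonterminals)
    # Two small label sets replace one large, sparse product set; in a neural
    # tagger, each sublabel gets its own output layer and loss term, and the
    # losses are summed during training.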
Generating Focussed Molecule Libraries for Drug Discovery with Recurrent Neural Networks
In de novo drug design, computational strategies are used to generate novel
molecules with good affinity to the desired biological target. In this work, we
show that recurrent neural networks can be trained as generative models for
molecular structures, similar to statistical language models in natural
language processing. We demonstrate that the properties of the generated
molecules correlate very well with the properties of the molecules used to
train the model. In order to enrich libraries with molecules active towards a
given biological target, we propose to fine-tune the model with small sets of
molecules that are known to be active against that target.
Against Staphylococcus aureus, the model reproduced 14% of 6051 hold-out test
molecules that medicinal chemists designed, whereas against Plasmodium
falciparum (Malaria) it reproduced 28% of 1240 test molecules. When coupled
with a scoring function, our model can perform the complete de novo drug design
cycle to generate large sets of novel molecules for drug discovery. Comment: 17 pages, 17 figures
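
A rough sketch of the underlying idea, training a character-level recurrent language model on SMILES strings and reusing the same training step for fine-tuning on known actives, is shown below; the toy alphabet, network sizes, and helper names are assumptions, and the real model is trained on large SMILES corpora rather than a single string.

    # Character-level SMILES language model (sketch). Pretrain on a large
    # general-purpose set of molecules, then fine-tune with the same step on a
    # small set of molecules active against the target of interest.
    import torch
    import torch.nn as nn

    VOCAB = sorted(set("CN(=O)c1ncl[]#@+-0123456789^$"))  # toy alphabet, ^ = start, $ = end
    stoi = {ch: i for i, ch in enumerate(VOCAB)}

    class SmilesLM(nn.Module):
        def __init__(self, vocab_size, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden)
            self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
            self.out = nn.Linear(hidden, vocab_size)

        def forward(self, x):
            h, _ = self.lstm(self.embed(x))
            return self.out(h)                     # next-character logits at each position

    def lm_step(model, opt, smiles):
        """One teacher-forcing step: predict every next character of a SMILES string."""
        ids = torch.tensor([[stoi[c] for c in "^" + smiles + "$"]])
        logits = model(ids[:, :-1])
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), ids[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

    model = SmilesLM(len(VOCAB))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    print(lm_step(model, opt, "CC(=O)Nc1ccccc1"))  # pretraining step; fine-tuning calls lm_step on actives

After training, molecules are generated by sampling characters one at a time from the model's output distribution until the end symbol is produced.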
Recurrent Memory Networks for Language Modeling
Recurrent Neural Networks (RNNs) have obtained excellent results in many
natural language processing (NLP) tasks. However, understanding and
interpreting the source of this success remains a challenge. In this paper, we
propose the Recurrent Memory Network (RMN), a novel RNN architecture that not only
amplifies the power of RNNs but also facilitates our understanding of its
internal functioning and allows us to discover underlying patterns in data. We
demonstrate the power of RMN on language modeling and sentence completion
tasks. On language modeling, RMN outperforms Long Short-Term Memory (LSTM)
networks on three large German, Italian, and English datasets. Additionally, we
perform in-depth analysis of various linguistic dimensions that RMN captures.
On Sentence Completion Challenge, for which it is essential to capture sentence
coherence, our RMN obtains 69.2% accuracy, surpassing the previous
state-of-the-art by a large margin. Comment: 8 pages, 6 figures. Accepted at NAACL 2016
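
One way to picture the memory-block component is as attention over a window of recent positions whose weights remain available for inspection; the sketch below assumes this simplified view, and the window size, dimensions, and the additive combination are illustrative rather than the paper's exact configuration.

    # Memory-block sketch: combine the current recurrent state with an
    # attention-weighted summary of the most recent n positions, so the
    # attention weights can later be inspected to see which context words
    # the model relied on.
    import torch
    import torch.nn.functional as F

    hidden = 64
    n = 15                                    # size of the attention window
    h_t = torch.randn(hidden)                 # current recurrent hidden state
    memory = torch.randn(n, hidden)           # representations of the last n words

    scores = memory @ h_t                     # one relevance score per memory slot
    weights = F.softmax(scores, dim=0)        # interpretable attention distribution
    summary = weights @ memory                # weighted combination of the window
    output = h_t + summary                    # passed on to the next layer / softmax

    print(weights)                            # inspecting these shows which recent words mattered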
Deep Dialog Act Recognition using Multiple Token, Segment, and Context Information Representations
Dialog act (DA) recognition is a task that has been widely explored over the
years. Recently, most approaches to the task have explored different DNN
architectures to combine the representations of the words in a segment and
generate a segment representation that provides cues for intention. In this
study, we explore means to generate more informative segment representations,
not only through different network architectures, but also through different
token representations, considering the character and functional levels in
addition to the word level. At the word level, in addition to the commonly
used uncontextualized embeddings, we explore the use of contextualized
representations, which provide information concerning word sense and segment
structure. Character-level tokenization is important to capture
intention-related morphological aspects that cannot be captured at the word
level. Finally, the functional level provides an abstraction from words, which
shifts the focus to the structure of the segment. We also explore approaches to
enrich the segment representation with context information from the history of
the dialog, both in terms of the classifications of the surrounding segments
and the turn-taking history. This kind of information has already been proved
important for the disambiguation of DAs in previous studies. Nevertheless, we
are able to capture additional information by considering a summary of the
dialog history and a wider turn-taking context. By combining the best
approaches at each step, we achieve results that surpass the previous
state-of-the-art on generic DA recognition on both SwDA and MRDA, two of the
most widely explored corpora for the task. Furthermore, by considering both
past and future context, simulating the annotation scenario, our approach achieves
a performance similar to that of a human annotator on SwDA and surpasses it on
MRDA. Comment: 38 pages, 7 figures, 9 tables, submitted to JAIR
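
A schematic of how the different representation levels and the dialog-history context might be fused before classification is sketched below; all dimensions, the concatenation-based fusion, and the label count are illustrative assumptions rather than the architecture actually evaluated in the paper.

    # Fusion sketch for DA classification: concatenate segment encodings built
    # from word-, character-, and functional-level tokenizations with features
    # summarizing the surrounding DA labels and turn-taking history.
    import torch
    import torch.nn as nn

    class DAClassifier(nn.Module):
        def __init__(self, word_dim=256, char_dim=64, func_dim=32, ctx_dim=16, n_labels=42):
            super().__init__()
            fused = word_dim + char_dim + func_dim + ctx_dim
            self.classifier = nn.Sequential(
                nn.Linear(fused, 128), nn.ReLU(), nn.Linear(128, n_labels))

        def forward(self, word_seg, char_seg, func_seg, context):
            # Each argument is a fixed-size encoding produced upstream by the
            # corresponding encoder (word / character / functional / context).
            fused = torch.cat([word_seg, char_seg, func_seg, context], dim=-1)
            return self.classifier(fused)

    model = DAClassifier()
    logits = model(torch.randn(1, 256), torch.randn(1, 64),
                   torch.randn(1, 32), torch.randn(1, 16))
    print(logits.shape)                        # torch.Size([1, 42]), one score per DA label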