Neural Machine Translation with Word Predictions
In the encoder-decoder architecture for neural machine translation (NMT), the
hidden states of the recurrent structures in the encoder and decoder carry
crucial information about the sentence. These vectors are generated by
parameters that are updated by back-propagating translation errors through
time. We argue that propagating errors through the end-to-end recurrent
structures is not a direct way of controlling the hidden vectors. In this paper,
we propose to use word predictions as a mechanism for direct supervision. More
specifically, we require these vectors to be able to predict the vocabulary of
the target sentence. Our simple mechanism ensures better representations in the
encoder and decoder without using any extra data or annotation. It is also
helpful in reducing the target-side vocabulary and improving decoding
efficiency. Experiments on Chinese-English and German-English machine
translation tasks show BLEU improvements of 4.53 and 1.3, respectively.
Comment: Accepted at EMNLP 2017
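To make the word-prediction supervision described above concrete, here is a minimal sketch, not the authors' code: a hidden state is projected to target-vocabulary logits and trained to predict the bag of words of the target sentence, alongside the usual translation loss. All names (WordPredictor, target_bow, etc.) are illustrative assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class WordPredictor(nn.Module):
    """Hypothetical module: projects a hidden state to target-vocabulary logits."""
    def __init__(self, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, hidden):            # hidden: (batch, hidden_dim)
        return self.proj(hidden)          # logits: (batch, vocab_size)

def word_prediction_loss(logits, target_bow):
    """Multi-label loss: the hidden state should predict every word that
    occurs in the target sentence (target_bow is a 0/1 bag-of-words vector)."""
    return F.binary_cross_entropy_with_logits(logits, target_bow)

# Illustrative training step (names are assumptions, not the paper's code):
#   total_loss = translation_loss \
#              + word_prediction_loss(predictor(encoder_summary), target_bow) \
#              + word_prediction_loss(predictor(decoder_state_t), remaining_bow)
```

The extra loss terms give the encoder and decoder states a direct training signal about the target vocabulary, rather than relying only on errors propagated through the recurrent structure.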
CI-GNN: A Granger Causality-Inspired Graph Neural Network for Interpretable Brain Network-Based Psychiatric Diagnosis
There is a recent trend to leverage the power of graph neural networks (GNNs)
for brain-network-based psychiatric diagnosis, which, in turn, also motivates an
urgent need for psychiatrists to fully understand the decision behavior of the
GNNs used. However, most existing GNN explainers are either post-hoc, in
which another interpretive model needs to be created to explain a well-trained
GNN, or do not consider the causal relationship between the extracted
explanation and the decision, such that the explanation itself contains
spurious correlations and suffers from weak faithfulness. In this work, we
propose a Granger causality-inspired graph neural network (CI-GNN), a built-in
interpretable model that is able to identify the most influential subgraph
(i.e., functional connectivity within brain regions) that is causally related
to the decision (e.g., major depressive disorder patients or healthy controls),
without training an auxiliary interpretive network. CI-GNN learns
disentangled subgraph-level representations α and β that encode,
respectively, the causal and non-causal aspects of the original graph under a graph
variational autoencoder framework, regularized by a conditional mutual
information (CMI) constraint. We theoretically justify the validity of the CMI
regulation in capturing the causal relationship. We also empirically evaluate
the performance of CI-GNN against three baseline GNNs and four state-of-the-art
GNN explainers on synthetic data and three large-scale brain disease datasets.
We observe that CI-GNN achieves the best performance across a wide range of metrics
and provides more reliable and concise explanations that are supported by clinical
evidence.
Comment: 45 pages, 13 figures
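As a rough illustration of the disentanglement idea above, the following sketch splits a graph's latent representation into a causal factor α (used for the decision) and a non-causal factor β (used only for reconstruction) under a VAE-style objective. This is not the authors' implementation: the graph encoder is simplified to an MLP on flattened node features, and the conditional mutual information constraint is only indicated as a comment; DisentangledGraphVAE and all parameters are hypothetical.

```python
import torch
import torch.nn as nn

class DisentangledGraphVAE(nn.Module):
    """Sketch: alpha (causal) drives the prediction, beta (non-causal) only
    helps reconstruct the graph; a real model would use a GNN encoder."""
    def __init__(self, n_nodes: int, feat_dim: int, latent_dim: int, n_classes: int):
        super().__init__()
        in_dim = n_nodes * feat_dim
        self.enc_alpha = nn.Linear(in_dim, 2 * latent_dim)            # outputs (mu, logvar)
        self.enc_beta = nn.Linear(in_dim, 2 * latent_dim)
        self.decoder = nn.Linear(2 * latent_dim, n_nodes * n_nodes)   # adjacency logits
        self.classifier = nn.Linear(latent_dim, n_classes)            # uses alpha only

    @staticmethod
    def reparameterize(mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, node_feats):        # node_feats: (batch, n_nodes, feat_dim)
        x = node_feats.flatten(1)
        mu_a, logvar_a = self.enc_alpha(x).chunk(2, dim=-1)
        mu_b, logvar_b = self.enc_beta(x).chunk(2, dim=-1)
        alpha = self.reparameterize(mu_a, logvar_a)                   # causal factor
        beta = self.reparameterize(mu_b, logvar_b)                    # non-causal factor
        adj_logits = self.decoder(torch.cat([alpha, beta], dim=-1))
        class_logits = self.classifier(alpha)
        return class_logits, adj_logits, (mu_a, logvar_a, mu_b, logvar_b)

# A full training objective would combine: classification loss on class_logits,
# reconstruction loss on adj_logits, KL terms for both posteriors, and a
# conditional mutual information penalty keeping beta uninformative about the
# label given alpha (the CMI estimator is omitted in this sketch).
```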
What Knowledge Is Needed? Towards Explainable Memory for kNN-MT Domain Adaptation
kNN-MT presents a new paradigm for domain adaptation by building an external
datastore, which usually saves all target language token occurrences in the
parallel corpus. As a result, the constructed datastore is usually large and
possibly redundant. In this paper, we investigate the interpretability issue of
this approach: what knowledge does the NMT model need? We propose the notion of
local correctness (LAC) as a new angle, which describes the potential
translation correctness for a single entry and for a given neighborhood.
Our empirical study shows that this investigation successfully identifies the conditions
under which the NMT model could easily fail and needs related knowledge. Experiments
on six diverse target domains and two language pairs show that pruning
according to local correctness yields a lighter and more explainable memory for
kNN-MT domain adaptation.
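As an illustration of pruning a kNN-MT datastore by a local-correctness-style criterion, the sketch below keeps an entry only when a (hypothetical) base-model prediction is wrong for that entry or its neighborhood, i.e. where external knowledge is actually needed. The function name, the exact rule, and the brute-force neighbor search are assumptions for illustration, not the paper's method; a real datastore would use an approximate nearest-neighbor index such as FAISS.

```python
import numpy as np

def prune_datastore(keys, values, model_top1, k=8):
    """keys: (N, d) hidden-state vectors; values: (N,) target token ids;
    model_top1: (N,) token id the base NMT model would predict at each entry."""
    keep = []
    for i in range(len(values)):
        # Entry-level correctness: does the base model already get this token right?
        entry_correct = model_top1[i] == values[i]
        # Neighborhood-level correctness: check the k nearest entries (brute force).
        dists = np.linalg.norm(keys - keys[i], axis=1)
        nbrs = np.argsort(dists)[1:k + 1]
        nbr_correct = np.mean(model_top1[nbrs] == values[nbrs]) > 0.5
        if not (entry_correct and nbr_correct):
            keep.append(i)   # knowledge is needed here, so keep the entry
    return keys[keep], values[keep]
```

Under this kind of rule, entries the base model already handles correctly in a locally correct neighborhood are dropped, shrinking the datastore while keeping the cases where retrieval actually helps.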