Interpretable Neural Predictions with Differentiable Binary Variables
The success of neural networks comes hand in hand with a desire for more
interpretability. We focus on text classifiers and make them more interpretable
by having them provide a justification, a rationale, for their predictions. We
approach this problem by jointly training two neural network models: a latent
model that selects a rationale (i.e. a short and informative part of the input
text), and a classifier that learns from the words in the rationale alone.
Previous work proposed to assign binary latent masks to input positions and to
promote short selections via sparsity-inducing penalties such as L0
regularisation. We propose a latent model that mixes discrete and continuous
behaviour, allowing at the same time for binary selections and for
gradient-based training without REINFORCE. In our formulation, we can tractably compute the
expected value of penalties such as L0, which allows us to directly optimise
the model towards a pre-specified text selection rate. We show that our
approach is competitive with previous work on rationale extraction, and explore
further uses in attention mechanisms.
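As a minimal sketch of the kind of gate the abstract describes (a latent variable that mixes discrete and continuous behaviour and admits a closed-form expected L0), the example below uses the stretched-and-rectified Binary Concrete relaxation of Louizos et al. as a stand-in for the paper's own distribution; the class name HardGate, the stretch limits and the temperature are illustrative assumptions, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn

class HardGate(nn.Module):
    """Stretched-and-rectified binary gate with a closed-form expected L0."""
    def __init__(self, limit_l=-0.1, limit_r=1.1, temperature=0.5):
        super().__init__()
        self.l, self.r, self.t = limit_l, limit_r, temperature

    def forward(self, logits):
        # Sample a continuous relaxation, stretch it beyond [0, 1],
        # then clamp so exact 0s and 1s occur with non-zero probability.
        u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
        s = torch.sigmoid((torch.log(u) - torch.log(1 - u) + logits) / self.t)
        s = s * (self.r - self.l) + self.l
        return s.clamp(0.0, 1.0)  # per-token mask z, mostly exactly 0 or 1

    def expected_l0(self, logits):
        # P(z != 0) per position, summed: the expected number of selected
        # tokens, which a penalty can push towards a target selection rate.
        return torch.sigmoid(logits - self.t * math.log(-self.l / self.r)).sum(-1)
```

A penalty on the gap between this expected L0 (normalised by input length) and a pre-specified selection rate then drives training towards that rate, in the spirit of the abstract.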
DoLFIn: Distributions over Latent Features for Interpretability
Interpreting the inner workings of neural models is a key step in ensuring
the robustness and trustworthiness of the models, but work on neural network
interpretability typically faces a trade-off: either the models are too
constrained to be very useful, or the solutions found by the models are too
complex to interpret. We propose a novel strategy for achieving
interpretability that -- in our experiments -- avoids this trade-off. Our
approach builds on the success of using probability as the central quantity,
as is done, for instance, in the attention mechanism. In our architecture,
DoLFIn (Distributions over Latent Features for Interpretability), we do not
determine beforehand what each feature represents; instead, all features go together
into an unordered set. Each feature has an associated probability ranging from
0 to 1, weighing its importance for further processing. We show that, unlike
attention and saliency-map approaches, this set-up makes it straightforward to
compute the probability with which an input component supports the decision the
neural model makes. To demonstrate the usefulness of the approach, we apply
DoLFIn to text classification, and show that DoLFIn not only provides
interpretable solutions, but even slightly outperforms the classical CNN and
BiLSTM text classifiers on the SST2 and AG-news datasets.
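A rough illustration of the architecture described above is sketched below: an unordered set of latent features, each paired with a probability in [0, 1] that weighs its contribution to the final classification. The shapes, dimensions and module names are assumptions for the example, not the authors' code.

```python
import torch
import torch.nn as nn

class DoLFInLayer(nn.Module):
    """Sketch: K latent features, each with an importance probability."""
    def __init__(self, input_dim, num_features=16, feature_dim=64, num_classes=2):
        super().__init__()
        self.features = nn.Linear(input_dim, num_features * feature_dim)
        self.probs = nn.Linear(input_dim, num_features)   # one probability per feature
        self.classifier = nn.Linear(feature_dim, num_classes)
        self.k, self.d = num_features, feature_dim

    def forward(self, x):
        # x: (batch, input_dim) encoding of the input text.
        h = self.features(x).view(-1, self.k, self.d)     # unordered feature set
        p = torch.sigmoid(self.probs(x)).unsqueeze(-1)    # importance probabilities in [0, 1]
        pooled = (p * h).sum(dim=1)                        # probability-weighted aggregation
        return self.classifier(pooled), p.squeeze(-1)      # logits + per-feature probabilities
```

The returned per-feature probabilities are what the abstract points to for interpretability: they quantify how strongly each latent feature supports the decision.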
Concept Matching for Low-Resource Classification
We propose a model to tackle classification tasks in the presence of very
little training data. To this aim, we approximate the notion of exact match
with a theoretically sound mechanism that computes a probability of matching in
the input space. Importantly, the model learns to focus on elements of the
input that are relevant for the task at hand; by leveraging highlighted
portions of the training data, an error boosting technique guides the learning
process. In practice, it increases the error associated with relevant parts of
the input by a given factor. Remarkable results on text classification tasks
confirm the benefits of the proposed approach in both balanced and unbalanced
cases, which makes the approach of practical use when labeling new examples is
expensive. In addition, by inspecting its weights, it is often possible to
gather insights into what the model has learned.
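The error-boosting idea above can be sketched as a simple loss reweighting: losses that fall on portions of the input highlighted in the training data are scaled by a fixed factor, so the relevant parts contribute more strongly to the gradient. The function name, the tensor layout and the default factor are assumptions for illustration only.

```python
import torch

def boosted_loss(per_token_loss, highlight_mask, boost_factor=2.0):
    """per_token_loss: (batch, seq_len) unreduced losses per input position.
    highlight_mask:  (batch, seq_len), 1.0 where the training example was highlighted.
    The error on highlighted positions is increased by `boost_factor`."""
    weights = torch.where(highlight_mask > 0,
                          torch.full_like(per_token_loss, boost_factor),
                          torch.ones_like(per_token_loss))
    return (weights * per_token_loss).mean()
```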
Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking
Graph neural networks (GNNs) have become a popular approach to integrating
structural inductive biases into NLP models. However, there has been little
work on interpreting them, and specifically on understanding which parts of the
graphs (e.g. syntactic trees or co-reference structures) contribute to a
prediction. In this work, we introduce a post-hoc method for interpreting the
predictions of GNNs which identifies unnecessary edges. Given a trained GNN
model, we learn a simple classifier that, for every edge in every layer,
predicts if that edge can be dropped. We demonstrate that such a classifier can
be trained in a fully differentiable fashion, employing stochastic gates and
encouraging sparsity through the expected L0 norm. We use our technique as
an attribution method to analyze GNN models for two tasks -- question answering
and semantic role labeling -- providing insights into the information flow in
these models. We show that we can drop a large proportion of edges without
deteriorating the performance of the model, while we can analyse the remaining
edges for interpreting model predictions.
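A minimal sketch of such a per-edge probe is given below: a small classifier scores every edge of the trained (frozen) GNN from its endpoint states and the message it would send, and a stochastic gate decides whether that message may be dropped, with sparsity encouraged through the expected L0 term. It reuses the HardGate class from the first sketch above; the module names and the choice of scorer inputs are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EdgeMasker(nn.Module):
    """Post-hoc probe: one gate per edge (and per layer) of a frozen GNN."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(3 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )
        self.gate = HardGate()  # stochastic gate from the earlier sketch

    def forward(self, h_src, h_tgt, message):
        # h_src, h_tgt, message: (num_edges, hidden_dim) states of the frozen GNN.
        logits = self.scorer(torch.cat([h_src, h_tgt, message], dim=-1)).squeeze(-1)
        z = self.gate(logits)                 # ~binary gate per edge
        gated = z.unsqueeze(-1) * message     # a dropped edge contributes (near-)zero
        return gated, self.gate.expected_l0(logits)
```

Training the probe to keep the frozen model's predictions unchanged while minimising the expected L0 term yields the kind of attribution the abstract describes: edges whose gates stay open are the ones the model relies on.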