Neural Skill Transfer from Supervised Language Tasks to Reading Comprehension
Reading comprehension is a challenging task in natural language processing
that requires a set of skills to solve. While current approaches focus on
solving the task as a whole, in this paper we propose a neural network
`skill' transfer approach: we transfer knowledge from several lower-level
language tasks (skills), including textual entailment, named entity
recognition, paraphrase detection, and question type classification, into the
reading comprehension model.
We conduct an empirical evaluation and show that transferring language skill
knowledge leads to significant improvements on the task in far fewer training
steps than the baseline model requires. We also show that the skill transfer
approach is effective even with small amounts of training data. Another finding
of this work is that using token-wise deep label supervision for text
classification improves the performance of transfer learning.
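To make the transfer setup concrete, here is a minimal PyTorch sketch of the
general idea, not the authors' implementation: a shared encoder is first
trained with token-wise supervision on a lower-level skill task and then reused
to initialize a reading comprehension model. All class names, dimensions, and
the toy usage are assumptions for illustration.

    import torch
    import torch.nn as nn

    class SharedEncoder(nn.Module):
        """Bi-LSTM encoder shared between the skill tasks and the target task."""
        def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim,
                                batch_first=True, bidirectional=True)

        def forward(self, token_ids):
            states, _ = self.lstm(self.embed(token_ids))
            return states  # (batch, seq_len, 2 * hidden_dim)

    class SkillTagger(nn.Module):
        """Token-wise head for a lower-level skill, e.g. NER or question typing."""
        def __init__(self, encoder, num_labels):
            super().__init__()
            self.encoder = encoder
            self.head = nn.Linear(256, num_labels)

        def forward(self, token_ids):
            # Per-token logits: "deep" (token-wise) label supervision.
            return self.head(self.encoder(token_ids))

    class SpanReader(nn.Module):
        """Reading comprehension head predicting answer start/end positions."""
        def __init__(self, encoder):
            super().__init__()
            self.encoder = encoder
            self.span_head = nn.Linear(256, 2)  # start and end logits per token

        def forward(self, token_ids):
            return self.span_head(self.encoder(token_ids))

    # Transfer: train the skill model first, then hand its encoder to the reader.
    encoder = SharedEncoder()
    skill_model = SkillTagger(encoder, num_labels=5)
    # ... train skill_model on the auxiliary skill task here ...
    reader = SpanReader(encoder)  # the encoder weights carry the learned skill
    start_end_logits = reader(torch.randint(0, 10000, (2, 40)))

In this sketch the transfer happens simply by sharing the encoder object; the
paper's actual architectures and transfer procedure may differ.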
A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks
Much effort has been devoted to evaluating whether multi-task learning can be
leveraged to learn rich representations that can be used in various Natural
Language Processing (NLP) downstream applications. However, there is still a
lack of understanding of the settings in which multi-task learning has a
significant effect. In this work, we introduce a hierarchical model trained in
a multi-task learning setup on a set of carefully selected semantic tasks. The
model is trained in a hierarchical fashion to introduce an inductive bias by
supervising a set of low-level tasks at the bottom layers of the model and more
complex tasks at the top layers of the model. This model achieves
state-of-the-art results on a number of tasks, namely Named Entity Recognition,
Entity Mention Detection, and Relation Extraction, without hand-engineered
features or external NLP tools like syntactic parsers. The hierarchical
training supervision induces a set of shared semantic representations at lower
layers of the model. We show that as we move from the bottom to the top layers
of the model, the hidden states of the layers tend to represent more complex
semantic information.
Comment: 8 pages, 1 figure, to appear in Proceedings of AAAI 2019.
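The hierarchical supervision scheme can be sketched as follows; this is an
illustrative toy model, not the paper's architecture, and the layer sizes, task
heads, and sentence-level pooling for relation extraction are assumptions.

    import torch
    import torch.nn as nn

    class HierarchicalMTL(nn.Module):
        """Lower layers supervised with a simple task, top layers with a harder one."""
        def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=128,
                     num_ner_labels=9, num_rel_labels=12):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            # Bottom layer: shared representation used for the low-level task (NER).
            self.bottom = nn.LSTM(embed_dim, hidden_dim,
                                  batch_first=True, bidirectional=True)
            # Top layer: reads the bottom layer's states for the more complex task.
            self.top = nn.LSTM(2 * hidden_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
            self.ner_head = nn.Linear(2 * hidden_dim, num_ner_labels)
            self.rel_head = nn.Linear(2 * hidden_dim, num_rel_labels)

        def forward(self, token_ids):
            x = self.embed(token_ids)
            low, _ = self.bottom(x)    # supervised with NER labels
            high, _ = self.top(low)    # supervised with relation labels
            return self.ner_head(low), self.rel_head(high.mean(dim=1))

    # Each training batch comes from a single task, and only that task's loss is
    # backpropagated, so low-level supervision shapes the shared bottom layers.
    model = HierarchicalMTL()
    ner_logits, rel_logits = model(torch.randint(0, 10000, (2, 20)))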
When Do Discourse Markers Affect Computational Sentence Understanding?
The capabilities and use cases of automatic natural language processing (NLP)
have grown significantly over the last few years. While much work has been
devoted to understanding how humans deal with discourse connectives, this
phenomenon is understudied in computational systems. Therefore, it is important
to put NLP models under the microscope and examine whether they can adequately
comprehend, process, and reason within the complexity of natural language. In
this chapter, we introduce the main mechanisms behind automatic sentence
processing systems step by step and then focus on evaluating discourse
connective processing. We assess nine popular systems in their ability to
understand English discourse connectives and analyze how context and language
understanding tasks affect their connective comprehension. The results show
that NLP systems do not process all discourse connectives equally well and that
the computational processing complexity of different connective types does not
always align with the complexity order presumed for human processing. In
addition, whereas humans are influenced by connectives mainly during reading
rather than in their final comprehension performance, discourse connectives
have a significant impact on the final accuracy of NLP systems. The richer a
system's knowledge of connectives, the more an inappropriate connective harms
its performance. This suggests that the correct explicitation of discourse
connectives is important for computational natural language processing.
Comment: Chapter 7 of Discourse Markers in Interaction, published in Trends in
Linguistics. Studies and Monographs.
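A minimal sketch of how connective sensitivity could be probed, assuming a
hypothetical classify function that stands in for any real sentence-pair
system; the chapter's actual evaluation protocol, systems, and data are not
reproduced here.

    def classify(premise: str, hypothesis: str) -> str:
        """Placeholder for a real sentence-pair system (e.g. an NLI classifier)."""
        return "entailment"

    def connective_sensitivity(premise, hypothesis, original, replacement):
        """Compare predictions with the original vs. an inappropriate connective."""
        before = classify(premise, hypothesis)
        after = classify(premise.replace(original, replacement), hypothesis)
        return before, after, before != after

    print(connective_sensitivity(
        "It rained, so the match was cancelled.",
        "The match did not take place.",
        original="so", replacement="although"))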
CausaLM: Causal Model Explanation Through Counterfactual Language Models
Understanding predictions made by deep neural networks is notoriously
difficult, but also crucial to their dissemination. Like all ML-based methods,
they are only as good as their training data, and can also capture unwanted
biases. While there are tools that can help understand whether such biases
exist, they do not distinguish between correlation and causation, and might be
ill-suited for text-based models and for reasoning about high-level language
concepts. A
key problem of estimating the causal effect of a concept of interest on a given
model is that this estimation requires the generation of counterfactual
examples, which is challenging with existing generation technology. To bridge
that gap, we propose CausaLM, a framework for producing causal model
explanations using counterfactual language representation models. Our approach
is based on fine-tuning deep contextualized embedding models with auxiliary
adversarial tasks derived from the causal graph of the problem. Concretely, we
show that by carefully choosing auxiliary adversarial pre-training tasks,
language representation models such as BERT can effectively learn a
counterfactual representation for a given concept of interest, and be used to
estimate its true causal effect on model performance. A byproduct of our method
is a language representation model that is unaffected by the tested concept,
which can be useful in mitigating unwanted bias ingrained in the data.
Comment: Our code and data are available at
https://amirfeder.github.io/CausaLM/. Under review for the Computational
Linguistics journal.
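One common way to realize an auxiliary adversarial task of this kind is
gradient reversal; the sketch below is a generic illustration under that
assumption, not the CausaLM release, and the encoder, dimensions, and toy usage
are hypothetical.

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        """Identity in the forward pass; reverses (and scales) gradients backward."""
        @staticmethod
        def forward(ctx, x, lamb):
            ctx.lamb = lamb
            return x.clone()

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lamb * grad_output, None

    class ConceptAdversarialModel(nn.Module):
        """Main task head plus an adversarial head that tries to predict the concept."""
        def __init__(self, encoder, hidden_dim, num_task_labels, num_concept_labels):
            super().__init__()
            self.encoder = encoder                      # e.g. a BERT-like encoder
            self.task_head = nn.Linear(hidden_dim, num_task_labels)
            self.concept_head = nn.Linear(hidden_dim, num_concept_labels)

        def forward(self, inputs, lamb=1.0):
            h = self.encoder(inputs)                    # pooled (batch, hidden_dim)
            task_logits = self.task_head(h)
            # Reversed gradients push the encoder to *discard* concept information,
            # yielding a representation that is insensitive to the concept.
            concept_logits = self.concept_head(GradReverse.apply(h, lamb))
            return task_logits, concept_logits

    # Toy usage with a stand-in encoder; a real setup would plug in BERT here.
    toy_encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
    model = ConceptAdversarialModel(toy_encoder, hidden_dim=64,
                                    num_task_labels=2, num_concept_labels=2)
    task_logits, concept_logits = model(torch.randn(4, 32))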