Negation and Speculation in NLP: A Survey, Corpora, Methods, and Applications
Negation and speculation are universal linguistic phenomena that affect the performance of Natural Language Processing (NLP) applications, such as those for opinion mining and information retrieval, especially in biomedical data. In this article, we review the corpora annotated with negation and speculation in various natural languages and domains. Furthermore, we discuss ongoing research into recent rule-based, supervised, and transfer learning techniques for the detection of negated and speculative content. Many English corpora for various domains are now annotated with negation and speculation; moreover, the availability of annotated corpora in other languages has started to increase. However, this growth is insufficient to address these important phenomena in languages with limited resources; cross-lingual models and translation from well-resourced languages are acceptable alternatives. We also highlight the lack of consistent annotation guidelines and the shortcomings of existing techniques, and suggest alternatives that may speed up progress in this research direction. Adding more syntactic features may alleviate the limitations of existing techniques, such as cue ambiguity and the detection of discontinuous scopes. In some NLP applications, including a negation- and speculation-aware component improves performance, yet this step is still often not addressed or not considered essential.
Negation-instance based evaluation of end-to-end negation resolution
In this paper, we revisit the task of negation resolution, which includes the subtasks of cue detection (e.g. 'not', 'never') and scope resolution. In the context of previous shared tasks, a variety of evaluation metrics have been proposed. Subsequent works usually use different subsets of these, including variations and custom implementations, rendering meaningful comparisons between systems difficult. Examining the problem both from a linguistic perspective and from a downstream viewpoint, we here argue for a negation-instance based approach to evaluating negation resolution. Our proposed metrics correspond to expectations over per-instance scores and hence are intuitively interpretable. To render research comparable and to foster future work, we provide results for a set of current state-of-the-art systems for negation resolution on three English corpora, and make our implementation of the evaluation scripts publicly available.
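The core idea of instance-based evaluation, scoring each negation instance separately and taking the expectation over instances, can be sketched in a few lines. This is an illustration of the general idea only, not the authors' released evaluation scripts; representing gold and predicted scopes as token-index sets per negation instance is a simplifying assumption.

```python
# Minimal sketch of negation-instance based evaluation: score each
# negation instance on its own, then average over instances.

def instance_f1(gold_scope: set, pred_scope: set) -> float:
    """Token-level F1 for a single negation instance."""
    if not gold_scope and not pred_scope:
        return 1.0  # both empty: perfect match
    tp = len(gold_scope & pred_scope)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_scope)
    recall = tp / len(gold_scope)
    return 2 * precision * recall / (precision + recall)

def negation_instance_score(gold_instances, pred_instances):
    """Expectation of per-instance F1 over all negation instances."""
    scores = [instance_f1(g, p) for g, p in zip(gold_instances, pred_instances)]
    return sum(scores) / len(scores) if scores else 0.0

# Toy example: two negation instances, scopes given as token indices.
gold = [{3, 4, 5}, {7, 8}]
pred = [{3, 4}, {7, 8, 9}]
print(negation_instance_score(gold, pred))  # mean of per-instance F1 scores
```

Because the corpus score is an average of per-instance scores, each number is directly interpretable as expected performance on a single negation instance.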
Learning Disentangled Representations of Negation and Uncertainty
Negation and uncertainty modeling are long-standing tasks in natural language processing. Linguistic theory postulates that expressions of negation and uncertainty are semantically independent from each other and from the content they modify. However, previous work on representation learning does not explicitly model this independence. We therefore attempt to disentangle the representations of negation, uncertainty, and content using a Variational Autoencoder. We find that simply supervising the latent representations results in good disentanglement, but auxiliary objectives based on adversarial learning and mutual information minimization can provide additional disentanglement gains.
Comment: Accepted to ACL 2022. 18 pages, 7 figures. Code and data are available at https://github.com/jvasilakes/disentanglement-va
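The supervision strategy described above can be illustrated with a minimal sketch: partition the VAE latent vector into negation, uncertainty, and content subspaces and attach classifiers to the labelled subspaces so that labels shape those dimensions. All sizes and the bag-of-words decoder below are assumptions made for illustration, not the paper's architecture (for that, see the linked repository); the adversarial and mutual-information objectives are omitted.

```python
# Sketch of a VAE with a partitioned latent space and supervised heads
# on the negation and uncertainty subspaces (illustrative dimensions).

import torch
import torch.nn as nn

class DisentangledVAE(nn.Module):
    def __init__(self, vocab=5000, hid=256, z_neg=8, z_unc=8, z_content=48):
        super().__init__()
        z_total = z_neg + z_unc + z_content
        self.encoder = nn.Sequential(nn.Linear(vocab, hid), nn.ReLU())
        self.to_mu = nn.Linear(hid, z_total)
        self.to_logvar = nn.Linear(hid, z_total)
        self.decoder = nn.Linear(z_total, vocab)  # toy bag-of-words decoder
        # Supervised heads on the designated latent slices.
        self.neg_clf = nn.Linear(z_neg, 2)   # negated vs. not negated
        self.unc_clf = nn.Linear(z_unc, 2)   # uncertain vs. certain
        self.z_neg, self.z_unc = z_neg, z_unc

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        recon = self.decoder(z)
        neg_logits = self.neg_clf(z[:, :self.z_neg])
        unc_logits = self.unc_clf(z[:, self.z_neg:self.z_neg + self.z_unc])
        return recon, mu, logvar, neg_logits, unc_logits

model = DisentangledVAE()
x = torch.rand(4, 5000)  # toy bag-of-words batch
recon, mu, logvar, neg_logits, unc_logits = model(x)
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()
# The full loss would combine reconstruction, KL, and the two
# classification losses on the labelled latent slices.
```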
Computational models for multilingual negation scope detection
Negation is a common property of languages: there are few languages, if any, that lack means to revert the truth value of a statement. A challenge to cross-lingual studies of negation lies in the fact that languages encode and use it in different ways. Although this variation has been extensively researched in linguistics, little has been done in automated language processing. In particular, we lack computational models of processing negation that generalize across languages, and we even lack knowledge of what the development of such models would require. Such models can nonetheless be built by means of existing cross-lingual resources, even when annotated data for a language other than English is not available. This thesis shows this in the context of detecting string-level negation scope, i.e. the set of tokens in a sentence whose meaning is affected by a negation marker (e.g. 'not').

Our contribution has two parts. First, we investigate the scenario where annotated training data is available. We show that Bi-directional Long Short-Term Memory (BiLSTM) networks are state-of-the-art models whose features generalize across languages (a minimal tagger sketch follows this abstract). We also show that these models suffer from genre effects and that, for most of the corpora we experimented with, high performance is simply an artifact of the annotation styles, where negation scope is often a span of text delimited by punctuation.

Second, we investigate the scenario where annotated data is available in only one language, experimenting with model transfer. To test our approach, we first build NEGPAR, a parallel corpus annotated for negation, where pre-existing annotations on English sentences have been edited and extended to their Chinese translations. We then show that transferring a model for negation scope detection across languages is possible by means of structured neural models in which negation scope is detected on top of a cross-linguistically consistent representation, Universal Dependencies; cross-lingual lexical information, on the other hand, was found to help very little. Finally, error analysis shows that performance is better when a negation marker is in the same dependency substructure as its scope, and that some of the phenomena related to negation scope that require lexical knowledge are still not captured correctly.

In the conclusions, we tie together the contributions of this thesis and point future work towards representing negation scope across languages at the level of logical form as well.
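As a rough illustration of the supervised setting in the first part, BiLSTM-based scope detection can be framed as per-token binary classification: each token is labelled in-scope or out-of-scope with respect to a negation marker. The sketch below is a generic tagger under assumed dimensions and a single-layer setup, not the exact configuration studied in the thesis.

```python
# Illustrative BiLSTM token tagger for negation scope detection.

import torch
import torch.nn as nn

class ScopeTagger(nn.Module):
    def __init__(self, vocab=10000, emb=100, hid=128, n_labels=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.bilstm = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hid, n_labels)  # per-token label scores

    def forward(self, token_ids):
        h, _ = self.bilstm(self.emb(token_ids))  # (batch, seq_len, 2*hid)
        return self.out(h)                       # (batch, seq_len, n_labels)

tagger = ScopeTagger()
tokens = torch.randint(0, 10000, (2, 12))  # toy batch of two sentences
logits = tagger(tokens)
pred = logits.argmax(-1)  # 1 = token inside the negation scope (toy labels)
```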
Supervised learning for the detection of negation and of its scope in French and Brazilian Portuguese biomedical corpora
Automatic detection of negated content is often a prerequisite in information extraction systems in various domains. This task is especially important in the biomedical domain, where negation plays a central role. In this work, two main contributions are proposed. First, we work with languages that have been poorly addressed up to now: Brazilian Portuguese and French. We developed new corpora for these two languages, manually annotated with negation cues and their scope. Second, we propose automatic methods based on supervised machine learning for the detection of negation marks and of their scopes. The methods prove robust in both languages (Brazilian Portuguese and French) and in cross-domain (general and biomedical language) contexts. The approach is also validated on English data from the state of the art: it yields very good results and outperforms other existing approaches. Besides, the application is accessible and usable online. We expect that these contributions (new annotated corpora, an application accessible online, and cross-domain robustness) will improve the reproducibility of results and the robustness of NLP applications.
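Supervised cue detection of this kind is commonly cast as sequence labelling. The sketch below uses a CRF (via the sklearn_crfsuite library) over minimal hand-crafted features as one plausible instantiation; the feature set, the tiny cue lexicon, and the toy French example are illustrative assumptions, not the paper's actual setup.

```python
# Hedged sketch: negation cue detection as CRF sequence labelling.

import sklearn_crfsuite

def token_features(sent, i):
    word = sent[i]
    return {
        "lower": word.lower(),
        # Tiny illustrative cue lexicon (French and Portuguese items).
        "in_neg_lexicon": word.lower() in {"ne", "pas", "sans", "não", "nem"},
        "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

# Toy French example: the discontinuous "ne ... pas" marks the negation.
sents = [["Le", "patient", "ne", "présente", "pas", "de", "fièvre"]]
labels = [["O", "O", "B-CUE", "O", "B-CUE", "O", "O"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))  # predicted cue tags per token
```

Scope resolution can then be handled as a second labelling pass over the same tokens, conditioned on the detected cues.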
- …