Anaphora Resolution and Text Retrieval
Empirical approaches based on qualitative or quantitative methods of corpus linguistics have become a central paradigm within linguistics. The series takes account of this fact and provides a platform for approaches within synchronic linguistics as well as for interdisciplinary works with a linguistic focus that devise new ways of working empirically and develop new data-based methods and theoretical models for empirical linguistic analyses.
An authoring tool for decision support systems in context questions of ecological knowledge
Decision support systems (DSS) support business or organizational decision-making activities, which require access to information stored internally in databases or data warehouses, and externally on the Web, where it is accessed through Information Retrieval (IR) or Question Answering (QA) systems. Graphical interfaces for querying these sources make it easy to constrain query formulation dynamically based on user selections, but they lack flexibility, since their expressive power is limited by the design of the user interface. Natural language interfaces (NLIs) are expected to be the optimal solution. However, truly natural communication is the most difficult to realize effectively, especially for non-expert users. In this paper, we propose an NLI that improves the interaction between the user and the DSS by letting users reference previous questions or their answers (i.e. anaphora, such as the pronoun reference in “What traits are affected by them?”) or elide parts of a question (i.e. ellipsis, such as “And to glume colour?” after the question “Tell me the QTLs related to awn colour in wheat”). Moreover, to overcome one of the main problems of NLIs, namely the difficulty of adapting an NLI to a new domain, our proposal is based on ontologies obtained semi-automatically from a framework that integrates internal and external, structured and unstructured information. Therefore, our proposal can interface with databases, data warehouses, QA and IR systems. Because the high ambiguity of natural language complicates the resolution process, our proposal is presented as an authoring tool that helps the user to query efficiently in natural language. Finally, our proposal is tested on a DSS case scenario about Biotechnology and Agriculture, whose knowledge base is the CEREALAB database as internal structured data, and the Web (e.g. PubMed) as external unstructured information.
This paper has been partially supported by the MESOLAP (TIN2010-14860), GEODAS-BI (TIN2012-37493-C03-03), LEGOLANGUAGE (TIN2012-31224) and DIIM2.0 (PROMETEOII/2014/001) projects from the Spanish Ministry of Education and Competitivity. Alejandro Maté is funded by the Generalitat Valenciana under an ACIF grant (ACIF/2010/298).
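To make the ellipsis example concrete, here is a minimal sketch of slot carry-over, assuming each question has already been mapped to a flat frame with a target, a trait, and a species. The names `QuestionFrame`, `resolve_ellipsis`, and `render` and the slot layout are hypothetical; the paper's ontology-based resolution is not reproduced here, only the general idea that an elliptical follow-up inherits every slot it leaves unspecified.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical question frame; slot names are illustrative, not the paper's ontology.
@dataclass(frozen=True)
class QuestionFrame:
    target: str             # what is asked for, e.g. "QTL"
    trait: Optional[str]    # constraining trait, e.g. "awn colour"
    species: Optional[str]  # constraining organism, e.g. "wheat"

def resolve_ellipsis(previous: QuestionFrame, fragment: dict) -> QuestionFrame:
    """Complete an elliptical follow-up by keeping every slot of the previous
    frame that the fragment does not explicitly override."""
    overrides = {slot: value for slot, value in fragment.items() if value is not None}
    return replace(previous, **overrides)

def render(frame: QuestionFrame) -> str:
    """Turn a frame back into a full question (toy template)."""
    return f"Tell me the {frame.target}s related to {frame.trait} in {frame.species}"

prev = QuestionFrame(target="QTL", trait="awn colour", species="wheat")
# "And to glume colour?" is assumed to be parsed into a trait-only fragment.
follow_up = resolve_ellipsis(prev, {"trait": "glume colour"})
print(render(follow_up))  # prints: Tell me the QTLs related to glume colour in wheat
```

A frozen dataclass with `dataclasses.replace` keeps the previous turn unchanged, so a dialogue history can hold every resolved frame for later anaphoric reference.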
Demonstrative anaphora: forms and functions in full-text scientific articles
This study examines the functions and characteristics of demonstrative anaphora (this, these, that, those) in a collection of full-text scientific documents, confirming that they play an important role in maintaining discourse focus and binding together cohesive sections of text. Unlike corpora in other subject domains, the Cystic Fibrosis database contains more demonstrative expressions than any other class of anaphora. As participants in intersentential reference, demonstratives often refer to complex propositions rather than simple noun phrases. While this tendency complicates automated resolution, our results yield some suggestions toward a resolution algorithm. Primarily, we argue for taking demonstrative form into account, since different types of demonstratives show different patterns regarding antecedent length and composition. Although further analysis is necessary, our findings provide groundwork for future exploration.
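As a rough illustration of the counting step behind a study like this, the sketch below tallies each demonstrative form and how often it opens a sentence. Everything here is an assumption: the naive regex sentence splitter, the use of sentence-initial position as a crude proxy for intersentential reference, and the invented sample text. It also does not separate anaphoric "that" from its complementizer and relative-pronoun uses.

```python
import re
from collections import Counter

DEMONSTRATIVES = ("this", "these", "that", "those")

def tally_demonstratives(text: str) -> tuple[Counter, Counter]:
    """Count each demonstrative form overall and when it is sentence-initial.

    Sentence-initial position is only an assumed proxy for intersentential
    reference; "that" will be over-counted because it has non-anaphoric uses.
    """
    overall, initial = Counter(), Counter()
    # Naive sentence split: terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = re.findall(r"[A-Za-z]+", sentence.lower())
        for i, token in enumerate(tokens):
            if token in DEMONSTRATIVES:
                overall[token] += 1
                if i == 0:
                    initial[token] += 1
    return overall, initial

# Invented sample text, for illustration only.
sample = ("The mutant protein was misfolded. This reduced channel activity. "
          "These findings, and those reported earlier, suggest a folding defect.")
overall, initial = tally_demonstratives(sample)
print(overall)
print(initial)
```

A realistic pipeline would replace the regexes with a proper tokenizer and part-of-speech tagger so that pronominal ("This reduced...") and adnominal ("These findings") demonstratives can be counted separately.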
Toward Gender-Inclusive Coreference Resolution
Correctly resolving textual mentions of people fundamentally entails making
inferences about those people. Such inferences raise the risk of systemic
biases in coreference resolution systems, including biases that can harm binary
and non-binary trans and cis stakeholders. To better understand such biases, we
foreground nuanced conceptualizations of gender from sociology and
sociolinguistics, and develop two new datasets for interrogating bias in crowd
annotations and in existing coreference resolution systems. Through these
studies, conducted on English text, we confirm that without acknowledging and
building systems that recognize the complexity of gender, we build systems that
lead to many potential harms.
Comment: 28 pages; ACL version
Resolving Other-Anaphora
Institute for Communicating and Collaborative Systems
Reference resolution is a major component of any natural language system. In the past
30 years significant progress has been made in coreference resolution. However, there
is more anaphora in texts than coreference. I present a computational treatment of
other-anaphora, i.e., referential noun phrases (NPs) with non-pronominal heads modified
by “other” or “another”:
[. . . ] the move is designed to more accurately reflect the value of products
and to put steel on more equal footing with other commodities.
Such NPs are anaphoric (i.e., they cannot be interpreted in isolation), with an antecedent
that may occur in the previous discourse or the speaker’s and hearer’s mutual
knowledge. For instance, in the example above, the NP “other commodities” refers to
a set of commodities excluding steel, and it can be paraphrased as “commodities other
than steel”.
Resolving such cases requires first identifying the correct antecedent(s) of the
other-anaphors. This task is the major focus of this dissertation. Specifically, the
dissertation achieves two goals. First, it describes a procedure by which antecedents
of other-anaphors can be found, including constraints and preferences which narrow
down the search. Second, it presents several symbolic, machine learning and hybrid
resolution algorithms designed specifically for other-anaphora. All the algorithms have
been implemented and tested on a corpus of examples from the Wall Street Journal.
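The idea of constraints that narrow the search, followed by preferences that rank what remains, can be sketched as follows. This is a minimal illustration under stated assumptions: the `IS_A` dictionary is a hypothetical stand-in for a semantic resource (the dissertation uses WordNet and Web-derived knowledge), mentions are reduced to a head noun and a sentence index, and recency is the only preference modelled.

```python
from dataclasses import dataclass

# Hypothetical toy is-a table standing in for a real semantic resource.
IS_A = {
    "steel": {"commodity", "metal"},
    "copper": {"commodity", "metal"},
    "product": {"artifact"},
    "value": {"attribute"},
}

@dataclass
class Mention:
    head: str      # lemmatized head noun
    sentence: int  # index of the sentence containing the mention

def candidate_antecedents(anaphor_head: str, anaphor_sentence: int,
                          mentions: list[Mention], window: int = 2) -> list[Mention]:
    """Return antecedent candidates for "other <anaphor_head>", most recent first.

    Hard constraints: the candidate lies in the anaphor's sentence or at most
    `window` sentences before it, and its head is a kind of the anaphor's head.
    Soft preference: candidates closer to the anaphor are ranked first.
    """
    compatible = [
        m for m in mentions
        if 0 <= anaphor_sentence - m.sentence <= window
        and anaphor_head in IS_A.get(m.head, set())
    ]
    return sorted(compatible, key=lambda m: anaphor_sentence - m.sentence)

# "... put steel on more equal footing with other commodities" (sentence 1),
# preceded by a hypothetical mention of copper in sentence 0.
mentions = [Mention("copper", 0), Mention("value", 1),
            Mention("product", 1), Mention("steel", 1)]
print([m.head for m in candidate_antecedents("commodity", 1, mentions)])
# prints: ['steel', 'copper']
```

Because the semantic check is a hard filter here, any gap in `IS_A` silently removes the correct antecedent; the findings listed below point to the same tension between semantic coverage and overgeneration.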
The major results of this research are the following:
1. Grammatical salience plays a lesser role in resolving other-anaphors than in resolving
pronominal anaphora. Algorithms that solely rely on grammatical features
achieved worse results than algorithms that used semantic features as well.
2. Semantic knowledge (such as “steel is a commodity”) is crucial in resolving
other-anaphors. Algorithms that operate solely on semantic features outperformed
those that operate on grammatical knowledge.
3. The quality and relevance of the semantic knowledge base is important to success.
WordNet proved insufficient as a source of semantic information for resolving
other-anaphora. Algorithms that use the Web as a knowledge base achieved better performance than those using WordNet, because the Web contains domain-specific
and general world knowledge which is not available from WordNet.
4. But semantic information by itself is not sufficient to resolve other-anaphors, as
it seems to overgenerate, leading to many false positives.
5. Although semantic information is more useful than grammatical information,
only integration of semantic and grammatical knowledge sources can handle the
full range of phenomena. The best results were obtained from a combination of
semantic and grammatical resources.
6. A probabilistic framework is best at handling the full spectrum of features, both
because it does not require commitment as to the order in which the features
should be applied, and because it allows features to be treated as preferences,
rather than as absolute constraints.
7. A full resolution procedure for other-anaphora requires both a probabilistic model
and a set of informed heuristics and back-off procedures. Such a hybrid system
achieved the best results so far on other-anaphora.
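Findings 6 and 7 describe a probabilistic model in which features act as preferences rather than constraints, combined with heuristics and back-off. A minimal sketch of that shape, assuming a logistic model with made-up weights and a recency back-off: the feature names, weights, bias, and threshold are hypothetical and not taken from the dissertation.

```python
import math
from typing import Optional

# Hypothetical weights; a real system would learn them from annotated data.
# A larger weight is a stronger preference, never an absolute filter.
WEIGHTS = {
    "semantic_match": 2.5,        # candidate head is a kind of the anaphor head
    "same_sentence": 0.8,         # candidate occurs in the anaphor's sentence
    "grammatical_parallel": 0.6,  # candidate shares the anaphor's grammatical role
}
BIAS = -2.0

def probability(features: dict[str, float]) -> float:
    """Logistic score: each active feature shifts the log-odds additively."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def resolve(candidates: dict[str, dict[str, float]],
            threshold: float = 0.5) -> Optional[str]:
    """Return the most probable antecedent if it clears `threshold`;
    otherwise back off to the first candidate, assumed to be the most recent."""
    if not candidates:
        return None
    scored = {name: probability(feats) for name, feats in candidates.items()}
    best = max(scored, key=scored.get)
    if scored[best] >= threshold:
        return best
    return next(iter(candidates))  # hypothetical recency back-off

candidates = {
    "steel":   {"semantic_match": 1, "same_sentence": 1},
    "value":   {"same_sentence": 1, "grammatical_parallel": 1},
    "product": {"same_sentence": 1},
}
print(resolve(candidates))  # prints: steel
```

No single feature is decisive here: a candidate without a semantic match can still win if enough grammatical evidence accumulates, and no ordering of features has to be fixed in advance, which is the property finding 6 attributes to the probabilistic framework.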
A Survey on Semantic Processing Techniques
Semantic processing is a fundamental research domain in computational
linguistics. In the era of powerful pre-trained language models and large
language models, the advancement of research in this domain appears to be
decelerating. However, the study of semantics is multi-dimensional in
linguistics. The research depth and breadth of computational semantic
processing can be largely improved with new technologies. In this survey, we
analyze five semantic processing tasks, i.e., word sense disambiguation,
anaphora resolution, named entity recognition, concept extraction, and
subjectivity detection. We study relevant theoretical research in these fields,
advanced methods, and downstream applications. We connect the surveyed tasks
with downstream applications because this may inspire future scholars to fuse
these low-level semantic processing tasks with high-level natural language
processing tasks. The review of theoretical research may also inspire new tasks
and technologies in the semantic processing domain. Finally, we compare the
different semantic processing techniques and summarize their technical trends,
application trends, and future directions.
Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN
1566-2535. The equal contribution mark is missing from the published version due
to the publication policies. Please contact Prof. Erik Cambria for details.
Embedding Predications
Written communication is rarely a sequence of simple assertions. More often, in addition to simple assertions, authors express subjectivity, such as beliefs, speculations, opinions, intentions, and desires. Furthermore, they link statements of various kinds to form a coherent discourse that reflects their pragmatic intent. In computational semantics, extraction of simple assertions (propositional meaning) has attracted the greatest attention, while research that focuses on extra-propositional aspects of meaning has remained sparse overall and has been largely limited to narrowly defined categories, such as hedging or sentiment analysis, treated in isolation.
In this thesis, we contribute to the understanding of extra-propositional meaning in natural language understanding, by providing a comprehensive account of the semantic phenomena that occur beyond simple assertions and examining how a coherent discourse is formed from lower level semantic elements. Our approach is linguistically based, and we propose a general, unified treatment of the semantic phenomena involved, within a computationally viable framework. We identify semantic embedding as the core notion involved in expressing extra-propositional meaning. The embedding framework is based on the structural distinction between embedding and atomic predications, the former corresponding to extra-propositional aspects of meaning. It incorporates the notions of predication source, modality scale, and scope. We develop an embedding categorization scheme and a dictionary based on it, which provide the necessary means to interpret extra-propositional meaning with a compositional semantic interpretation methodology. Our syntax-driven methodology exploits syntactic dependencies to construct a semantic embedding graph of a document. Traversing the graph in a bottom-up manner guided by compositional operations, we construct predications corresponding to extra-propositional semantic content, which form the basis for addressing practical tasks. We focus on text from two distinct domains: news articles from the Wall Street Journal, and scientific articles focusing on molecular biology. Adopting a task-based evaluation strategy, we consider the easy adaptability of the core framework to practical tasks that involve some extra-propositional aspect as a measure of its success. The computational tasks we consider include hedge/uncertainty detection, scope resolution, negation detection, biological event extraction, and attribution resolution. Our competitive results in these tasks demonstrate the viability of our proposal.
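As a rough illustration of bottom-up composition over embedding and atomic predications, the sketch below represents embeddings as nested nodes and composes them in post-order. It is a simplification under stated assumptions: it uses a tree in which each embedding scopes over exactly one predication, rather than the thesis's document-level embedding graph; the classes `Atomic` and `Embedding` and the category labels are hypothetical; and predication source and modality scale values are not modelled.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Atomic:
    """An atomic predication, e.g. activate(protein A, gene B)."""
    predicate: str
    args: tuple[str, ...]

@dataclass
class Embedding:
    """An embedding predication scoping over another predication."""
    trigger: str   # word introducing the embedding, e.g. "suggest", "not"
    category: str  # hypothetical category label, e.g. "SPECULATION"
    scope: "Node"  # the predication in its scope

Node = Union[Atomic, Embedding]

def compose(node: Node) -> tuple[str, list[str]]:
    """Bottom-up (post-order) composition: interpret the scoped predication
    first, then wrap it in the embedding above it.

    Returns the nested predication and the extra-propositional categories
    affecting the atomic predication, outermost first.
    """
    if isinstance(node, Atomic):
        return f"{node.predicate}({', '.join(node.args)})", []
    inner, categories = compose(node.scope)
    return f"{node.trigger.upper()}[{inner}]", [node.category] + categories

# Invented example: "The data suggest that protein A does not activate gene B."
tree = Embedding("suggest", "SPECULATION",
                 Embedding("not", "NEGATION",
                           Atomic("activate", ("protein A", "gene B"))))
print(compose(tree))
```

The category list is what a downstream task would consume: here the atomic event is both negated and speculated, which is the kind of extra-propositional information that hedge detection and negation detection need.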