Network analysis of named entity co-occurrences in written texts
The use of methods borrowed from statistics and physics to analyze written
texts has allowed the discovery of unprecedented patterns of human behavior and
cognition by establishing links between model features and language structure.
While current models have been useful to unveil patterns via analysis of
syntactical and semantical networks, only a few works have probed the relevance
of investigating the structure arising from the relationship between relevant
entities such as characters, locations and organizations. In this study, we
represent entities appearing in the same context as a co-occurrence network,
where links are established according to a null model based on random, shuffled
texts. Computational simulations performed on novels revealed that the proposed
model displays interesting topological features, such as the small world
feature, characterized by high values of clustering coefficient. The
effectiveness of our model was verified in a practical pattern recognition task
in real networks. When compared with traditional word adjacency networks, our
model displayed optimized results in identifying unknown references in texts.
Because the proposed representation plays a complementary role in
characterizing unstructured documents via topological analysis of named
entities, we believe that it could be useful to improve the characterization of
written texts (and related systems), especially if combined with traditional
approaches based on statistical and deeper paradigms
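The idea of linking entities only when they co-occur more often than in shuffled text can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the definition of a "context" as a pre-extracted list of entity mentions, and the edge rule (observed count above the mean count over shuffles) are all assumptions made for the example.

```python
import random
from collections import Counter
from itertools import combinations


def cooccurrence_counts(contexts):
    """Count how often each unordered entity pair shares a context."""
    counts = Counter()
    for entities in contexts:
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return counts


def shuffled_contexts(contexts, rng):
    """Null model (assumed): shuffle all mentions, keeping context sizes."""
    flat = [e for ctx in contexts for e in ctx]
    rng.shuffle(flat)
    out, i = [], 0
    for ctx in contexts:
        out.append(flat[i:i + len(ctx)])
        i += len(ctx)
    return out


def build_network(contexts, n_shuffles=200, seed=0):
    """Keep an edge when its observed count exceeds the shuffled mean."""
    rng = random.Random(seed)
    observed = cooccurrence_counts(contexts)
    expected = Counter()
    for _ in range(n_shuffles):
        shuffled = shuffled_contexts(contexts, rng)
        for pair, c in cooccurrence_counts(shuffled).items():
            expected[pair] += c
    return {pair for pair, c in observed.items()
            if c > expected[pair] / n_shuffles}


# Hypothetical toy input: each inner list is the entities of one context.
contexts = [["Alice", "Bob"]] * 5 + [["Carol", "Dave"]] * 5
edges = build_network(contexts)
```

A stricter variant could require the observed count to exceed a high percentile of the shuffled counts rather than the mean; the abstract does not say which criterion the paper uses.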
Focusing for Pronoun Resolution in English Discourse: An Implementation
Anaphora resolution is one of the most active research areas in natural
language processing. This study examines focusing as a tool for the resolution
of pronouns, which are a kind of anaphora. Focusing is a discourse phenomenon
like anaphora. Candy Sidner formalized focusing in her 1979 MIT PhD thesis and
devised several algorithms to resolve definite anaphora including pronouns. She
presented her theory in a computational framework but did not generally
implement the algorithms. Her algorithms related to focusing and pronoun
resolution are implemented in this thesis. This implementation provides a
better comprehension of the theory both from a conceptual and a computational
point of view. The resulting program is tested on different discourse segments,
and evaluation and analysis of the experiments are presented together with the
statistical results.
Comment: iii + 49 pages, compressed, uuencoded Postscript file; revised
version of the first author's Bilkent M.S. thesis, written under the
supervision of the second author; notify Akman via e-mail
([email protected]) or fax (+90-312-266-4126) if you are unable to
obtain hardcopy, he'll work out somethin
Utilizing sub-topical structure of documents for information retrieval.
Text segmentation in natural language processing typically refers to the process of decomposing a document into constituent subtopics. Our work centers on the application of text segmentation techniques within information retrieval (IR) tasks. Examples include scoring a document by combining the retrieval scores of its constituent segments, exploiting the proximity of query terms in documents for ad hoc search, and question answering (QA), where retrieved passages from multiple documents are aggregated and presented as a single document to a searcher. Feedback in the ad hoc IR task is shown to benefit from the use of extracted sentences, instead of terms, from the pseudo-relevant documents for query expansion. Retrieval effectiveness for the patent prior art search task is enhanced by applying text segmentation to the patent queries. Another aspect of our work involves augmenting text segmentation techniques to produce segments which are more readable, with fewer unresolved anaphora. This is particularly useful for QA and snippet generation tasks, where the objective is to aggregate relevant and novel information from multiple documents that satisfies the user's information need on the one hand, while ensuring that the automatically generated content presented to the user is easily readable without reference to the original source document
DRAMATIC ELEMENTS IN DASHNER’S MAZE RUNNER NOVEL AND FILM ADAPTATION
The research aims to reveal how the dramatic elements of Dashner’s Maze Runner are transformed in its film adaptation. To this end, the researcher analyzes the seven dramatic elements of Gustav Freytag’s Pyramid, which consist of exposition, inciting moment, rising action, climax, falling action, resolution, and denouement. This research uses the descriptive qualitative method. The results show that the differences between the dramatic elements in the novel and in the film adaptation are not significant, because only the scenes of exposition and rising action are not similar
A Corpus-Based Investigation of Definite Description Use
We present the results of a study of definite description use in written
texts aimed at assessing the feasibility of annotating corpora with information
about definite description interpretation. We ran two experiments, in which
subjects were asked to classify the uses of definite descriptions in a corpus
of 33 newspaper articles, containing a total of 1412 definite descriptions. We
measured the agreement among annotators about the classes assigned to definite
descriptions, as well as the agreement about the antecedent assigned to those
definites that the annotators classified as being related to an antecedent in
the text. The most interesting result of this study from a corpus annotation
perspective was the rather low agreement (K=0.63) that we obtained using
versions of Hawkins' and Prince's classification schemes; better results
(K=0.76) were obtained using the simplified scheme proposed by Fraurud that
includes only two classes, first-mention and subsequent-mention. The agreement
about antecedents was also not complete. These findings raise questions
concerning the strategy of evaluating systems for definite description
interpretation by comparing their results with a standardized annotation. From
a linguistic point of view, the most interesting observations were the great
number of discourse-new definites in our corpus (in one of our experiments,
about 50% of the definites in the collection were classified as discourse-new,
30% as anaphoric, and 18% as associative/bridging) and the presence of
definites which did not seem to require a complete disambiguation.
Comment: 47 pages, uses fullname.sty and palatino.st
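The agreement figures quoted above (K=0.63, K=0.76) are chance-corrected agreement coefficients. As a rough illustration of how such a value is obtained, the sketch below computes Cohen's kappa for two annotators. This is an assumption made for the example: the abstract does not state which variant of the statistic the study used, and the labels below are hypothetical, not data from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators' label lists."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled independently
    # according to their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used a single identical label
    return (p_o - p_e) / (1 - p_e)


# Hypothetical two-class labels in the style of a first-mention /
# subsequent-mention scheme.
annotator_1 = ["first", "first", "subsequent", "subsequent"]
annotator_2 = ["first", "subsequent", "subsequent", "subsequent"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on 3 of 4 items (0.75) while chance agreement is 0.5, so kappa is 0.5; raw agreement alone would overstate how reliable the annotation is.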
An authoring tool for decision support systems in context questions of ecological knowledge
Decision support systems (DSS) support business or organizational decision-making activities, which require access to information that is stored internally in databases or data warehouses, and externally on the Web, accessed through Information Retrieval (IR) or Question Answering (QA) systems. Graphical interfaces for querying these sources of information make it easy to constrain query formulation dynamically based on user selections, but they lack flexibility in query formulation, since their expressive power is limited by the user interface design. Natural language interfaces (NLI) are expected to be the optimal solution. However, especially for non-expert users, truly natural communication is the most difficult to realize effectively. In this paper, we propose an NLI that improves the interaction between the user and the DSS by means of referencing previous questions or their answers (i.e. anaphora such as the pronoun reference in “What traits are affected by them?”), or by eliding parts of the question (i.e. ellipsis such as “And to glume colour?” after the question “Tell me the QTLs related to awn colour in wheat”). Moreover, in order to overcome one of the main problems of NLIs, the difficulty of adapting an NLI to a new domain, our proposal is based on ontologies that are obtained semi-automatically from a framework that allows the integration of internal and external, structured and unstructured information. Therefore, our proposal can interface with databases, data warehouses, QA and IR systems. Because of the high NL ambiguity of the resolution process, our proposal is presented as an authoring tool that helps the user to query efficiently in natural language. Finally, our proposal is tested on a DSS case scenario about Biotechnology and Agriculture, whose knowledge base is the CEREALAB database as internal structured data, and the Web (e.g. PubMed) as external unstructured information.
This paper has been partially supported by the MESOLAP (TIN2010-14860), GEODAS-BI (TIN2012-37493-C03-03), LEGOLANGUAGE (TIN2012-31224) and DIIM2.0 (PROMETEOII/2014/001) projects from the Spanish Ministry of Education and Competitivity. Alejandro Maté is funded by the Generalitat Valenciana under an ACIF grant (ACIF/2010/298)
A Survey on Semantic Processing Techniques
Semantic processing is a fundamental research domain in computational
linguistics. In the era of powerful pre-trained language models and large
language models, the advancement of research in this domain appears to be
decelerating. However, the study of semantics is multi-dimensional in
linguistics. The research depth and breadth of computational semantic
processing can be largely improved with new technologies. In this survey, we
analyzed five semantic processing tasks, namely word sense disambiguation,
anaphora resolution, named entity recognition, concept extraction, and
subjectivity detection. We study relevant theoretical research in these fields,
advanced methods, and downstream applications. We connect the surveyed tasks
with downstream applications because this may inspire future scholars to fuse
these low-level semantic processing tasks with high-level natural language
processing tasks. The review of theoretical research may also inspire new tasks
and technologies in the semantic processing domain. Finally, we compare the
different semantic processing techniques and summarize their technical trends,
application trends, and future directions.
Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN
1566-2535. The equal contribution mark is missed in the published version due
to the publication policies. Please contact Prof. Erik Cambria for detail