Thematic Annotation: extracting concepts out of documents
Contrary to standard approaches to topic annotation, the technique used in
this work does not rely centrally on any form of keyword extraction,
statistical or otherwise. Instead, the proposed annotation algorithm uses a
large-scale semantic database, the EDR Electronic Dictionary, which provides a
concept hierarchy based on hyponym and hypernym relations. This concept
hierarchy is used to generate a synthetic representation of the document by
aggregating the words present in topically homogeneous document segments into a
set of concepts best preserving the document's content.
This extraction technique takes a previously unexplored approach to topic
selection. Instead of using semantic similarity measures based on a semantic
resource, the resource itself is processed to extract the part of the
conceptual hierarchy relevant to the document content. This conceptual
hierarchy is then searched for the most relevant set of concepts to represent
the topics discussed in the document. Notice that this algorithm is able to
extract generic concepts that are not directly present in the document.
Comment: Technical report EPFL/LIA. 81 pages, 16 figures
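The aggregation step described above, mapping the words of a segment onto a single covering concept in a hypernym hierarchy, can be sketched as follows. This is a minimal illustration assuming a toy hand-built hierarchy; the actual system uses the EDR Electronic Dictionary, whose data is not reproduced here.

```python
# Sketch of concept aggregation over a hypernym hierarchy (toy data, not
# the EDR dictionary): words from a document segment are mapped to leaf
# concepts and then generalized to the most specific hypernym covering
# them all. Note the result can be a generic concept ("animal") that
# never appears in the document itself.

# child -> parent (hypernym) links; "entity" is the root.
HYPERNYMS = {
    "dog": "mammal", "cat": "mammal", "sparrow": "bird",
    "mammal": "animal", "bird": "animal", "animal": "entity",
    "car": "vehicle", "vehicle": "entity",
}

def ancestors(concept):
    """Return the path from a concept up to the root, inclusive."""
    path = [concept]
    while concept in HYPERNYMS:
        concept = HYPERNYMS[concept]
        path.append(concept)
    return path

def aggregate(words):
    """Most specific concept subsuming every word (lowest common hypernym)."""
    paths = [ancestors(w) for w in words]
    # Walk the first path bottom-up; return the first ancestor shared by all.
    for concept in paths[0]:
        if all(concept in p for p in paths[1:]):
            return concept
    return None

print(aggregate(["dog", "cat"]))      # mammal
print(aggregate(["dog", "sparrow"]))  # animal
print(aggregate(["dog", "car"]))      # entity
```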
Event-based Access to Historical Italian War Memoirs
The progressive digitization of historical archives provides new, often
domain specific, textual resources that report on facts and events which have
happened in the past; among these, memoirs are a very common type of primary
source. In this paper, we present an approach for extracting information from
Italian historical war memoirs and turning it into structured knowledge. This
is based on the semantic notions of events, participants and roles. We evaluate
each of the key steps of our approach quantitatively and provide a graph-based
representation of the extracted knowledge, which makes it possible to move
between a Close and a Distant Reading of the collection.
Comment: 23 pages, 6 figures
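A graph built from events, participants, and roles can be sketched as a set of (event, role, participant) triples. This is a minimal illustration only; the event name, role labels, and participants below are invented for the example, not taken from the paper's dataset.

```python
# Minimal sketch of an event-participant-role graph: structured
# knowledge is stored as (event, role, participant) triples, indexed by
# event for retrieval. All names and role labels are illustrative.

from collections import defaultdict

class EventGraph:
    def __init__(self):
        self.triples = []                    # (event, role, participant)
        self.by_event = defaultdict(list)    # event -> [(role, participant)]

    def add(self, event, role, participant):
        self.triples.append((event, role, participant))
        self.by_event[event].append((role, participant))

    def participants(self, event):
        """All (role, participant) pairs attached to an event."""
        return self.by_event[event]

g = EventGraph()
g.add("battle_of_caporetto", "AGENT", "Austro-Hungarian army")
g.add("battle_of_caporetto", "PATIENT", "Italian army")
g.add("battle_of_caporetto", "TIME", "1917")

print(g.participants("battle_of_caporetto"))
```

Distant Reading then amounts to aggregate queries over the triple store, while Close Reading follows a single event back to its source passage.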
Selectional Restrictions in HPSG
Selectional restrictions are semantic sortal constraints imposed on the
participants of linguistic constructions to capture contextually-dependent
constraints on interpretation. Despite their limitations, selectional
restrictions have proven very useful in natural language applications, where
they have been used frequently in word sense disambiguation, syntactic
disambiguation, and anaphora resolution. Given their practical value, we
explore two methods to incorporate selectional restrictions in the HPSG theory,
assuming that the reader is familiar with HPSG. The first method employs HPSG's
Background feature and a constraint-satisfaction component pipe-lined after the
parser. The second method uses subsorts of referential indices, and blocks
readings that violate selectional restrictions during parsing. Although the
second method is theoretically less satisfactory, we have found it
particularly useful in the development of practical systems.
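The second method, sort constraints on referential indices, can be illustrated with a minimal sketch. The sort hierarchy, lexicon, and verb restrictions below are toy assumptions for illustration, not actual HPSG grammar machinery.

```python
# Hedged sketch of selectional restrictions as sortal constraints: each
# verb imposes a required sort on its object's referential index, and a
# reading is blocked when the object's sort is not a subsort of the
# requirement. Hierarchy and lexicon are invented toy data.

SUBSORT = {                       # child sort -> parent sort
    "human": "animate", "animal": "animate",
    "animate": "entity", "beverage": "entity",
}

def is_a(sort, required):
    """True if `sort` equals `required` or is one of its subsorts."""
    while sort is not None:
        if sort == required:
            return True
        sort = SUBSORT.get(sort)
    return False

# Verb -> sort required of its object's referential index.
RESTRICTIONS = {"drink": "beverage", "persuade": "animate"}

def compatible(verb, object_sort):
    """Accept a reading unless it violates the verb's restriction."""
    required = RESTRICTIONS.get(verb)
    return required is None or is_a(object_sort, required)

print(compatible("drink", "beverage"))   # True
print(compatible("drink", "human"))      # False -> reading blocked
print(compatible("persuade", "human"))   # True
```

Blocking incompatible readings this way during parsing, rather than filtering afterwards, is what distinguishes the second method from the pipelined constraint-satisfaction approach of the first.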
Having Your Cake and Eating It Too: Autonomy and Interaction in a Model of Sentence Processing
Is the human language understander a collection of modular processes
operating with relative autonomy, or is it a single integrated process? This
ongoing debate has polarized the language processing community, with two
fundamentally different types of model posited, and with each camp concluding
that the other is wrong. One camp puts forth a model with separate processors
and distinct knowledge sources to explain one body of data, and the other
proposes a model with a single processor and a homogeneous, monolithic
knowledge source to explain the other body of data. In this paper we argue that
a hybrid approach which combines a unified processor with separate knowledge
sources provides an explanation of both bodies of data, and we demonstrate the
feasibility of this approach with the computational model called COMPERE. We
believe that this approach brings the language processing community
significantly closer to offering human-like language processing systems.
Comment: 7 pages, uses aaai.sty macros
Narrative Language as an Expression of Individual and Group Identity
Scientific Narrative Psychology integrates quantitative methodologies into the study of identity. Its methodology, Narrative Categorical Analysis, and its toolkit, NarrCat, were both originally developed by the Hungarian Narrative Psychology Group. NarrCat performs automated transformation of the sentences of self-narratives into psychologically relevant, statistically processable narrative categories. The main body of this flexible and comprehensive system is formed by Psycho-Thematic modules, such as Agency, Evaluation, Emotion, Cognition, Spatiality, and Temporality. The Relational Modules include Social References, Semantic Role Labeling (SRL), and Negation. Certain elements can be combined into Hypermodules, such as Psychological Perspective and Spatio-Temporal Perspective, which allow for even more complex, higher-level exploration of composite psychological processes. Drawing on up-to-date developments in corpus linguistics and Natural Language Processing (NLP), a unique feature of NarrCat is its capacity for SRL. The structure of NarrCat, as well as empirical results in group identity research, is discussed.
Personal named entity linking based on simple partial tree matching and context free grammar
Personal name disambiguation is the task of linking a personal name to a unique comparable
entry in the real world, also known as named entity linking (NEL). Algorithms for NEL
consist of three main components: extractor, searcher, and disambiguator.
Existing approaches for NEL use exact-match look-up over the surface form to
generate a set of candidate entities for each mentioned name. Exact-match
look-up is inadequate for candidate generation because the personal names
within a web page lack uniform representation. In addition, the performance of
a disambiguator in ranking candidate entities is limited by context
similarity, which is an inflexible feature for personal name disambiguation
because natural language is highly variable.
We propose a new approach that both identifies and disambiguates personal
names mentioned on a web page. Our NEL algorithm uses, as an extractor, a
control flow graph and AlchemyAPI; as a searcher, Personal Name Transformation
Modules (PNTM) based on Context Free Grammar and the Jaro-Winkler text
similarity metric; and as a disambiguator, an entity coherence method built on
the Occupation Architecture for Personal Name Disambiguation (OAPnDis),
personal name concepts, and Simple Partial Tree Matching (SPTM).
Experimental results, evaluated on real-world data sets, show that the
accuracy of our NEL is 92%, higher than that of previously used methods.
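The searcher component above relies on the Jaro-Winkler text similarity metric. As a sketch of that one standard ingredient (not the paper's PNTM rules or SPTM matcher), a self-contained implementation:

```python
# Jaro-Winkler similarity: the Jaro score counts characters that match
# within a sliding window and penalizes transpositions; Winkler's
# adjustment boosts pairs that share a common prefix (up to 4 chars),
# which suits name matching.

def jaro(s1, s2):
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    window = max(len1, len2) // 2 - 1
    match1 = [False] * len1
    match2 = [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not match2[j] and s2[j] == c:
                match1[i] = match2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count transpositions between the matched characters, in order.
    t, k = 0, 0
    for i in range(len1):
        if match1[i]:
            while not match2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1, s2, p=0.1):
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

print(jaro_winkler("MARTHA", "MARHTA"))  # ~0.961
```

In a candidate searcher, each transformed surface form would be compared against knowledge-base entries and pairs scoring above a threshold kept as candidates.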
Computational Approaches to Measuring the Similarity of Short Contexts: A Review of Applications and Methods
Measuring the similarity of short written contexts is a fundamental problem
in Natural Language Processing. This article provides a unifying framework by
which short context problems can be categorized both by their intended
application and proposed solution. The goal is to show that various problems
and methodologies that appear quite different on the surface are in fact very
closely related. The axes by which these categorizations are made include the
format of the contexts (headed versus headless), the way in which the contexts
are to be measured (first-order versus second-order similarity), and the
information used to represent the features in the contexts (micro versus macro
views). The unifying thread that binds together many short context applications
and methods is the fact that similarity decisions must be made between contexts
that share few (if any) words in common.
Comment: 23 pages
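The first-order versus second-order distinction above can be made concrete with a minimal sketch. The contexts and co-occurrence counts below are invented toy data standing in for corpus statistics; any real system would derive them from a corpus.

```python
# First-order similarity: cosine over the words the two contexts share
# directly. Second-order similarity: each context is represented by the
# sum of its words' co-occurrence vectors, so two contexts can score
# high without sharing a single word. Toy data throughout.

from collections import Counter
import math

def cosine(u, v):
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def first_order(ctx1, ctx2):
    return cosine(Counter(ctx1), Counter(ctx2))

# Toy co-occurrence vectors standing in for corpus statistics.
COOC = {
    "physician": {"hospital": 3, "patient": 5},
    "doctor":    {"hospital": 4, "patient": 4},
    "visited":   {"hospital": 1, "city": 2},
}

def second_order(ctx1, ctx2):
    def centroid(ctx):
        total = Counter()
        for w in ctx:
            total.update(COOC.get(w, {}))
        return total
    return cosine(centroid(ctx1), centroid(ctx2))

c1 = ["the", "physician", "visited"]
c2 = ["a", "doctor", "visited"]
print(first_order(c1, c2))   # low: only "visited" is shared
print(second_order(c1, c2))  # high: similar co-occurrence profiles
```

This is exactly the situation the review highlights: when contexts share few or no words, first-order scores collapse toward zero while second-order representations can still detect relatedness.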