Thematic Annotation: extracting concepts out of documents
Contrary to standard approaches to topic annotation, the technique used in
this work does not rely centrally on any form of keyword extraction,
statistical or otherwise. Instead, the proposed annotation algorithm uses a
large-scale semantic database, the EDR Electronic Dictionary, which provides a
concept hierarchy based on hyponym and hypernym relations. This concept
hierarchy is used to generate a synthetic representation of the document by
aggregating the words present in topically homogeneous document segments into a
set of concepts best preserving the document's content.
This extraction technique takes a previously unexplored approach to topic
selection. Instead of using semantic similarity measures based on a semantic
resource, the resource itself is processed to extract the part of the
conceptual hierarchy relevant to the document content. This conceptual
hierarchy is then searched for the most relevant set of concepts to represent
the topics discussed in the document. Notice that this algorithm is able to
extract generic concepts that are not directly present in the document.
Comment: Technical report EPFL/LIA. 81 pages, 16 figures
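The aggregation step described above, mapping the words of a segment onto a single covering concept in a hypernym hierarchy, can be sketched as follows. This is a minimal illustration assuming a toy hand-built hierarchy; the actual system uses the EDR Electronic Dictionary, whose data is not reproduced here.

```python
# Sketch of concept aggregation over a hypernym hierarchy (toy data, not
# the EDR dictionary): words from a document segment are mapped to leaf
# concepts and then generalized to the most specific hypernym covering
# them all. Note the result can be a generic concept ("animal") that
# never appears in the document itself.

# child -> parent (hypernym) links; "entity" is the root.
HYPERNYMS = {
    "dog": "mammal", "cat": "mammal", "sparrow": "bird",
    "mammal": "animal", "bird": "animal", "animal": "entity",
    "car": "vehicle", "vehicle": "entity",
}

def ancestors(concept):
    """Return the path from a concept up to the root, inclusive."""
    path = [concept]
    while concept in HYPERNYMS:
        concept = HYPERNYMS[concept]
        path.append(concept)
    return path

def aggregate(words):
    """Most specific concept subsuming every word (lowest common hypernym)."""
    paths = [ancestors(w) for w in words]
    # Walk the first path bottom-up; return the first ancestor shared by all.
    for concept in paths[0]:
        if all(concept in p for p in paths[1:]):
            return concept
    return None

print(aggregate(["dog", "cat"]))      # mammal
print(aggregate(["dog", "sparrow"]))  # animal
print(aggregate(["dog", "car"]))      # entity
```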
Event-based Access to Historical Italian War Memoirs
The progressive digitization of historical archives provides new, often
domain specific, textual resources that report on facts and events which have
happened in the past; among these, memoirs are a very common type of primary
source. In this paper, we present an approach for extracting information from
Italian historical war memoirs and turning it into structured knowledge. This
is based on the semantic notions of events, participants and roles. We evaluate
each of the key steps of our approach quantitatively and provide a graph-based
representation of the extracted knowledge, which makes it possible to move
between a Close and a Distant Reading of the collection.
Comment: 23 pages, 6 figures
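A graph built from events, participants, and roles can be sketched as a set of (event, role, participant) triples. This is a minimal illustration only; the event name, role labels, and participants below are invented for the example, not taken from the paper's dataset.

```python
# Minimal sketch of an event-participant-role graph: structured
# knowledge is stored as (event, role, participant) triples, indexed by
# event for retrieval. All names and role labels are illustrative.

from collections import defaultdict

class EventGraph:
    def __init__(self):
        self.triples = []                    # (event, role, participant)
        self.by_event = defaultdict(list)    # event -> [(role, participant)]

    def add(self, event, role, participant):
        self.triples.append((event, role, participant))
        self.by_event[event].append((role, participant))

    def participants(self, event):
        """All (role, participant) pairs attached to an event."""
        return self.by_event[event]

g = EventGraph()
g.add("battle_of_caporetto", "AGENT", "Austro-Hungarian army")
g.add("battle_of_caporetto", "PATIENT", "Italian army")
g.add("battle_of_caporetto", "TIME", "1917")

print(g.participants("battle_of_caporetto"))
```

Distant Reading then amounts to aggregate queries over the triple store, while Close Reading follows a single event back to its source passage.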
Selectional Restrictions in HPSG
Selectional restrictions are semantic sortal constraints imposed on the
participants of linguistic constructions to capture contextually-dependent
constraints on interpretation. Despite their limitations, selectional
restrictions have proven very useful in natural language applications, where
they have been used frequently in word sense disambiguation, syntactic
disambiguation, and anaphora resolution. Given their practical value, we
explore two methods to incorporate selectional restrictions in the HPSG theory,
assuming that the reader is familiar with HPSG. The first method employs HPSG's
Background feature and a constraint-satisfaction component pipe-lined after the
parser. The second method uses subsorts of referential indices, and blocks
readings that violate selectional restrictions during parsing. Although the
second method is theoretically less satisfactory, we have found it
particularly useful in the development of practical systems.
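The second method, sort constraints on referential indices, can be illustrated with a minimal sketch. The sort hierarchy, lexicon, and verb restrictions below are toy assumptions for illustration, not actual HPSG grammar machinery.

```python
# Hedged sketch of selectional restrictions as sortal constraints: each
# verb imposes a required sort on its object's referential index, and a
# reading is blocked when the object's sort is not a subsort of the
# requirement. Hierarchy and lexicon are invented toy data.

SUBSORT = {                       # child sort -> parent sort
    "human": "animate", "animal": "animate",
    "animate": "entity", "beverage": "entity",
}

def is_a(sort, required):
    """True if `sort` equals `required` or is one of its subsorts."""
    while sort is not None:
        if sort == required:
            return True
        sort = SUBSORT.get(sort)
    return False

# Verb -> sort required of its object's referential index.
RESTRICTIONS = {"drink": "beverage", "persuade": "animate"}

def compatible(verb, object_sort):
    """Accept a reading unless it violates the verb's restriction."""
    required = RESTRICTIONS.get(verb)
    return required is None or is_a(object_sort, required)

print(compatible("drink", "beverage"))   # True
print(compatible("drink", "human"))      # False -> reading blocked
print(compatible("persuade", "human"))   # True
```

Blocking incompatible readings this way during parsing, rather than filtering afterwards, is what distinguishes the second method from the pipelined constraint-satisfaction approach of the first.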
Having Your Cake and Eating It Too: Autonomy and Interaction in a Model of Sentence Processing
Is the human language understander a collection of modular processes
operating with relative autonomy, or is it a single integrated process? This
ongoing debate has polarized the language processing community, with two
fundamentally different types of model posited, and with each camp concluding
that the other is wrong. One camp puts forth a model with separate processors
and distinct knowledge sources to explain one body of data, and the other
proposes a model with a single processor and a homogeneous, monolithic
knowledge source to explain the other body of data. In this paper we argue that
a hybrid approach which combines a unified processor with separate knowledge
sources provides an explanation of both bodies of data, and we demonstrate the
feasibility of this approach with the computational model called COMPERE. We
believe that this approach brings the language processing community
significantly closer to offering human-like language processing systems.
Comment: 7 pages, uses aaai.sty macros
Narrative Language as an Expression of Individual and Group Identity
Scientific Narrative Psychology integrates quantitative methodologies into the study of identity. Its methodology, Narrative Categorical Analysis, and its toolkit, NarrCat, were both originally developed by the Hungarian Narrative Psychology Group. NarrCat performs automated transformation of the sentences of self-narratives into psychologically relevant, statistically processable narrative categories. The main body of this flexible and comprehensive system is formed by Psycho-Thematic modules, such as Agency, Evaluation, Emotion, Cognition, Spatiality, and Temporality. The Relational Modules include Social References, Semantic Role Labeling (SRL), and Negation. Certain elements can be combined into Hypermodules, such as Psychological Perspective and Spatio-Temporal Perspective, which allow for even more complex, higher-level exploration of composite psychological processes. Drawing on up-to-date developments in corpus linguistics and Natural Language Processing (NLP), a unique feature of NarrCat is its capacity for SRL. The structure of NarrCat, as well as empirical results in group identity research, is discussed.
Personal named entity linking based on simple partial tree matching and context free grammar
Personal name disambiguation is the task of linking a personal name to a unique comparable
entry in the real world, also known as named entity linking (NEL). Algorithms for NEL
consist of three main components: extractor, searcher, and disambiguator.
Existing approaches for NEL use exact-match look-up over the surface form to
generate a set of candidate entities for each mentioned name. Exact-match
look-up is inadequate for candidate generation because the personal names
within a web page lack uniform representation. In addition, the performance of
a disambiguator in ranking candidate entities is limited by context
similarity, which is an inflexible feature for personal name disambiguation
because natural language is highly variable.
We propose a new approach that both identifies and disambiguates personal
names mentioned on a web page. Our NEL algorithm uses, as an extractor, a
control flow graph and AlchemyAPI; as a searcher, Personal Name Transformation
Modules (PNTM) based on Context Free Grammar and the Jaro-Winkler text
similarity metric; and as a disambiguator, an entity coherence method built on
the Occupation Architecture for Personal Name Disambiguation (OAPnDis),
personal name concepts, and Simple Partial Tree Matching (SPTM).
Experimental results, evaluated on real-world data sets, show that the
accuracy of our NEL is 92%, higher than that of previously used methods.
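The searcher component above relies on the Jaro-Winkler text similarity metric. As a sketch of that one standard ingredient (not the paper's PNTM rules or SPTM matcher), a self-contained implementation:

```python
# Jaro-Winkler similarity: the Jaro score counts characters that match
# within a sliding window and penalizes transpositions; Winkler's
# adjustment boosts pairs that share a common prefix (up to 4 chars),
# which suits name matching.

def jaro(s1, s2):
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    window = max(len1, len2) // 2 - 1
    match1 = [False] * len1
    match2 = [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not match2[j] and s2[j] == c:
                match1[i] = match2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count transpositions between the matched characters, in order.
    t, k = 0, 0
    for i in range(len1):
        if match1[i]:
            while not match2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1, s2, p=0.1):
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

print(jaro_winkler("MARTHA", "MARHTA"))  # ~0.961
```

In a candidate searcher, each transformed surface form would be compared against knowledge-base entries and pairs scoring above a threshold kept as candidates.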
Computational Approaches to Measuring the Similarity of Short Contexts: A Review of Applications and Methods
Measuring the similarity of short written contexts is a fundamental problem
in Natural Language Processing. This article provides a unifying framework by
which short context problems can be categorized both by their intended
application and proposed solution. The goal is to show that various problems
and methodologies that appear quite different on the surface are in fact very
closely related. The axes by which these categorizations are made include the
format of the contexts (headed versus headless), the way in which the contexts
are to be measured (first-order versus second-order similarity), and the
information used to represent the features in the contexts (micro versus macro
views). The unifying thread that binds together many short context applications
and methods is the fact that similarity decisions must be made between contexts
that share few (if any) words in common.
Comment: 23 pages
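The first-order versus second-order distinction above can be made concrete with a minimal sketch. The contexts and co-occurrence counts below are invented toy data standing in for corpus statistics; any real system would derive them from a corpus.

```python
# First-order similarity: cosine over the words the two contexts share
# directly. Second-order similarity: each context is represented by the
# sum of its words' co-occurrence vectors, so two contexts can score
# high without sharing a single word. Toy data throughout.

from collections import Counter
import math

def cosine(u, v):
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def first_order(ctx1, ctx2):
    return cosine(Counter(ctx1), Counter(ctx2))

# Toy co-occurrence vectors standing in for corpus statistics.
COOC = {
    "physician": {"hospital": 3, "patient": 5},
    "doctor":    {"hospital": 4, "patient": 4},
    "visited":   {"hospital": 1, "city": 2},
}

def second_order(ctx1, ctx2):
    def centroid(ctx):
        total = Counter()
        for w in ctx:
            total.update(COOC.get(w, {}))
        return total
    return cosine(centroid(ctx1), centroid(ctx2))

c1 = ["the", "physician", "visited"]
c2 = ["a", "doctor", "visited"]
print(first_order(c1, c2))   # low: only "visited" is shared
print(second_order(c1, c2))  # high: similar co-occurrence profiles
```

This is exactly the situation the review highlights: when contexts share few or no words, first-order scores collapse toward zero while second-order representations can still detect relatedness.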