The analysis and acquisition of proper names for robust text understanding
In this thesis we consider the problems that Proper Names cause in the analysis of unedited, naturally-occurring text. Proper Names cause problems because of their high frequency in many types of text, their poor coverage in conventional dictionaries, their importance in the text understanding process, and the complexity of their structure and of the structure of the text which describes them. For the most part these problems have been ignored in the field of Natural Language Processing, with the result that Proper Names are one of its most under-researched areas. As a solution to the problem, we present a detailed description of the syntax and semantics of seven major classes of Proper Name, and of their surrounding context. This description leads to the construction of syntactic and semantic rules specifically for the analysis of Proper Names, which capitalise on the wealth of descriptive material which often accompanies a Proper Name when it occurs in a text. Such an approach side-steps the problem of lexical coverage by allowing a text processing system to use the very text it is analysing to construct lexical and knowledge base entries for unknown Proper Names as it encounters them. The information acquired on unknown Proper Names goes considerably beyond a simple syntactic and semantic classification, instead consisting of a detailed genus and differentia description. A complete solution to the 'Proper Name Problem' must include approaches to the handling of apposition, conjunction and ellipsis, abbreviated reference, and many of the far-from-standard phenomena encountered in naturally-occurring text. The thesis advances partial and practical solutions in all of these areas. In order to set the work described in a suitable context, the problems of Proper Names are viewed as a subset of the general problem of lexical inadequacy as it arises in processing real, unedited text. The whole of this field is reviewed, and various methods of lexical acquisition are compared and evaluated. Our approach to coping with lexical inadequacy and to handling Proper Names is implemented in a news text understanding system called FUNES, which is able to automatically acquire detailed genus and differentia information on Proper Names as it encounters them in its processing of news text. We present an assessment of the system's performance on a sample of unseen news text which is held to support the validity of our approach to handling Proper Names.
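A minimal sketch of the kind of context-driven acquisition the abstract describes, assuming a simple appositive pattern; the regex, the example sentence and the genus/differentia heuristic are illustrative and are not taken from FUNES itself:

```python
# Mine an appositive description such as "Vardo, a fishing town in
# northern Norway" for a crude genus-and-differentia entry. This is a
# sketch of the general idea, not the thesis's actual rule set.
import re

APPOSITIVE = re.compile(
    r"(?P<name>[A-Z][\w-]*(?:\s+[A-Z][\w-]*)*)"  # capitalised Proper Name
    r",\s+an?\s+"                                # ", a/an "
    r"(?P<desc>[^,.]+)"                          # the descriptive NP
)

def acquire(sentence: str) -> dict:
    """Build a toy lexical/knowledge-base entry from an appositive."""
    m = APPOSITIVE.search(sentence)
    if not m:
        return {}
    words = m.group("desc").split()
    # Heuristic: the genus is the head noun before any postmodifier.
    cut = next((i for i, w in enumerate(words)
                if w in {"in", "of", "on", "near"}), len(words))
    return {
        "name": m.group("name"),
        "genus": words[cut - 1],                    # head noun, e.g. "town"
        "differentia": {
            "premodifiers": words[:cut - 1],        # e.g. ["fishing"]
            "postmodifier": " ".join(words[cut:]),  # e.g. "in northern Norway"
        },
    }

print(acquire("Vardo, a fishing town in northern Norway, hosted the summit."))
```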
Building a Generation Knowledge Source using Internet-Accessible Newswire
In this paper, we describe a method for automatic creation of a knowledge
source for text generation using information extraction over the Internet. We
present a prototype system called PROFILE which uses a client-server
architecture to extract noun-phrase descriptions of entities such as people,
places, and organizations. The system serves two purposes: as an information
extraction tool, it allows users to search for textual descriptions of
entities; as a utility to generate functional descriptions (FD), it is used in
a functional-unification based generation system. We present an evaluation of
the approach and its applications to natural language generation and
summarization.
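The extraction side of such a system can be pictured with a toy sketch: pull an appositive noun-phrase description of an entity out of newswire text, then recast it as a small feature structure for a unification-based generator. The attribute names below (cat, head, classifier) are stand-ins, not the actual PROFILE/FUF schema:

```python
# Harvest "ENTITY, (the) DESCRIPTION," from raw text and turn the
# description into an FD-like nested dictionary. Illustrative only.
import re

DESCRIPTION = re.compile(
    r"(?P<entity>[A-Z][\w.]*(?:\s+[A-Z][\w.]*)*),\s+(?:the\s+)?(?P<np>[^,]+?),"
)

def extract_description(text: str):
    m = DESCRIPTION.search(text)
    return (m.group("entity"), m.group("np")) if m else None

def to_fd(entity: str, np: str) -> dict:
    """Recast a flat NP description as a small feature structure."""
    *modifiers, head = np.split()
    return {"cat": "np", "head": {"lex": head},
            "classifier": modifiers, "refers-to": entity}

hit = extract_description("Yasser Arafat, the Palestinian leader, met ...")
if hit:
    print(to_fd(*hit))
# {'cat': 'np', 'head': {'lex': 'leader'},
#  'classifier': ['Palestinian'], 'refers-to': 'Yasser Arafat'}
```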
Information extraction
In this paper we present a new approach to extracting relevant information from natural language text using knowledge graphs. We give a multi-level model based on knowledge graphs for describing template information, and investigate the concept of partial structural parsing. Moreover, we point out that the expansion of concepts plays an important role in thinking, so we study the expansion of knowledge graphs to use context information for the reasoning and merging of templates.
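A loose sketch, with invented entity names, of two of the ideas mentioned: a template as a small knowledge graph of (subject, relation, object) edges, and the merging of two templates once context reveals that two node labels denote the same entity. The paper's multi-level graph model is considerably richer than this:

```python
# Merge two edge-set templates, renaming nodes that context identifies.
def merge(graph_a, graph_b, alias):
    """Union two edge sets after applying context-supplied aliases."""
    rename = lambda n: alias.get(n, n)
    return set(graph_a) | {(rename(s), r, rename(o)) for s, r, o in graph_b}

# Two partial templates built from different sentences of one text.
t1 = {("Acme", "is_a", "company"), ("Acme", "based_in", "Boston")}
t2 = {("the firm", "hired", "J. Smith")}

# Context (here, anaphora resolution) expands "the firm" to "Acme".
print(sorted(merge(t1, t2, {"the firm": "Acme"})))
# [('Acme', 'based_in', 'Boston'), ('Acme', 'hired', 'J. Smith'),
#  ('Acme', 'is_a', 'company')]
```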
Ontologies and Information Extraction
This report argues that, even in the simplest cases, IE is an ontology-driven
process. It is not a mere text filtering method based on simple pattern
matching and keywords, because the extracted pieces of text are interpreted
with respect to a predefined partial domain model. This report shows that
depending on the nature and the depth of the interpretation to be done for
extracting the information, more or less knowledge must be involved. This
report is mainly illustrated with examples from biology, a domain in which there are critical
needs for content-based exploration of the scientific literature and which
is becoming a major application domain for IE.
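As a toy illustration of that claim, the sketch below accepts an extracted triple only when its arguments satisfy the type constraints of a small, invented biology ontology; a real partial domain model would of course be far larger:

```python
# A tiny is-a hierarchy standing in for a predefined partial domain model.
ISA = {"p53": "protein", "MDM2": "protein",
       "protein": "molecule", "apoptosis": "process"}

def is_a(entity: str, cls: str) -> bool:
    """Walk up the toy hierarchy to test a type constraint."""
    while entity in ISA:
        entity = ISA[entity]
        if entity == cls:
            return True
    return False

def interpret(subj: str, verb: str, obj: str):
    # In this toy model, "binds" is only defined between two molecules.
    if verb == "binds" and is_a(subj, "molecule") and is_a(obj, "molecule"):
        return ("binds", subj, obj)
    return None  # the pattern matched textually but fails the domain model

print(interpret("MDM2", "binds", "p53"))        # ('binds', 'MDM2', 'p53')
print(interpret("MDM2", "binds", "apoptosis"))  # None: type clash
```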
PACE: Pattern Accurate Computationally Efficient Bootstrapping for Timely Discovery of Cyber-Security Concepts
Public disclosure of important security information, such as knowledge of
vulnerabilities or exploits, often occurs in blogs, tweets, mailing lists, and
other online sources months before proper classification into structured
databases. In order to facilitate timely discovery of such knowledge, we
propose a novel semi-supervised learning algorithm, PACE, for identifying and
classifying relevant entities in text sources. The main contribution of this
paper is an enhancement of the traditional bootstrapping method for entity
extraction by employing a time-memory trade-off that simultaneously circumvents
a costly corpus search while strengthening pattern nomination, which should
increase accuracy. An implementation in the cyber-security domain is discussed
as well as challenges to Natural Language Processing imposed by the security
domain.
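A hedged sketch of the time-memory trade-off named above, with an invented three-sentence corpus: an indexing pass pays memory once so that each bootstrapping iteration is a dictionary lookup instead of a fresh corpus search. PACE's actual pattern scoring and nomination machinery is not reproduced here:

```python
# Index candidate entities by generalised context patterns once, then
# bootstrap over the index. Corpus, regex and masking are illustrative.
import re
from collections import defaultdict

corpus = [
    "a buffer overflow in libfoo allows remote code execution",
    "a use-after-free in libbar allows remote code execution",
    "a race condition in libbaz was patched yesterday",
]

CANDIDATE = re.compile(r"a ([\w-]+(?: [\w-]+)?) in ")

def pattern_of(sent: str, ent: str) -> str:
    # Generalise the context: mask the entity slot and the library name.
    return re.sub(r"lib\w+", "<LIB>", sent.replace(ent, "<X>"))

# One-time indexing pass: the memory cost of the trade-off.
by_entity, by_pattern = defaultdict(set), defaultdict(set)
for sent in corpus:
    for ent in CANDIDATE.findall(sent):
        p = pattern_of(sent, ent)
        by_entity[ent].add(p)
        by_pattern[p].add(ent)

# Bootstrapping loop: seeds nominate patterns, patterns nominate entities,
# all via index lookups rather than renewed corpus searches.
known = {"buffer overflow"}
for _ in range(3):
    patterns = {p for e in known for p in by_entity[e]}
    known |= {e for p in patterns for e in by_pattern[p]}

print(known)  # {'buffer overflow', 'use-after-free'}; 'race condition'
              # is never nominated because its context pattern differs
```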
Web 2.0, language resources and standards to automatically build a multilingual named entity lexicon
This paper proposes to advance the current state of the art in automatic Language Resource (LR) building by taking into consideration three elements: (i) the knowledge available in existing LRs, (ii) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (iii) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) are
extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses interoperability, an important problem that currently affects the Computational Linguistics area, by making use of the ISO LMF standard to encode this lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, in order to check the usefulness of the constructed resource, we apply it in a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system's accuracy by 28.1%. Compared to previous approaches to building NE repositories, the current proposal represents a step forward in terms of automation, language independence, the number of NEs acquired and the richness of the information represented.
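The disambiguation step can be pictured with a small sketch: choose the WordNet sense of an ambiguous word by the overlap between its gloss and the Wikipedia context of the NE. The synset identifiers, glosses and example entity below are invented for illustration and do not come from the paper:

```python
# Toy sense inventory: an ambiguous word mapped to candidate synsets
# with invented glosses standing in for WordNet.
CANDIDATES = {
    "bank": {
        "bank.n.01": "sloping land beside a body of water",
        "bank.n.02": "a financial institution that accepts deposits",
    },
}

def disambiguate(word: str, context: str) -> str:
    """Pick the sense whose gloss overlaps the Wikipedia context most."""
    ctx = set(context.lower().split())
    glosses = CANDIDATES[word]
    return max(glosses, key=lambda s: len(ctx & set(glosses[s].split())))

context = "German financial institution that accepts deposits worldwide"
sense = disambiguate("bank", context)
print(sense)  # bank.n.02

# The NE is then attached to the chosen synset in the lexicon entry.
entry = {"ne": "Deutsche Bank", "language": "en", "synset": sense}
print(entry)
```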