Building a Generation Knowledge Source using Internet-Accessible Newswire
In this paper, we describe a method for automatic creation of a knowledge
source for text generation using information extraction over the Internet. We
present a prototype system called PROFILE which uses a client-server
architecture to extract noun-phrase descriptions of entities such as people,
places, and organizations. The system serves two purposes: as an information
extraction tool, it allows users to search for textual descriptions of
entities; as a utility to generate functional descriptions (FD), it is used in
a functional-unification based generation system. We present an evaluation of
the approach and its applications to natural language generation and
summarization. Comment: 8 pages, uses eps
Ontology population for open-source intelligence: A GATE-based solution
Open-Source INTelligence is intelligence based on publicly available sources such as news sites, blogs, and forums. The Web is the primary source of information, but once data are crawled, they need to be interpreted and structured. Ontologies may play a crucial role in this process, but because of the vast number of documents available, automatic mechanisms are needed to populate them from the crawled text. This paper presents an approach for the automatic population of predefined ontologies with data extracted from text and discusses the design and realization of a pipeline based on the General Architecture for Text Engineering system, which is of interest to both researchers and practitioners in the field. Experimental results, encouraging in terms of correctly extracted ontology instances, are also reported. Furthermore, the paper describes an alternative approach and provides additional experiments for one of the phases of our pipeline, which requires the use of predefined dictionaries for relevant entities. With this variant, the manual workload required in this phase was reduced while still obtaining promising results.
Generating indicative-informative summaries with SumUM
We present and evaluate SumUM, a text summarization system that takes a raw technical text as input and produces an indicative-informative summary. The indicative part of the summary identifies the topics of the document, and the informative part elaborates on some of these topics according to the reader's interest. SumUM motivates the topics, describes entities, and defines concepts. It is a first step toward exploring the issue of dynamic summarization. This is accomplished through a process of shallow syntactic and semantic analysis, concept identification, and text regeneration. Our method was developed through the study of a corpus of abstracts written by professional abstractors. Relying on human judgment, we have evaluated the indicativeness, informativeness, and text acceptability of the automatic summaries. The results thus far indicate good performance when compared with other summarization technologies.
Learning Correlations between Linguistic Indicators and Semantic Constraints: Reuse of Context-Dependent Descriptions of Entities
This paper presents the results of a study on the semantic constraints
imposed on lexical choice by certain contextual indicators. We show how such
indicators are computed and how correlations between them and the choice of a
noun phrase description of a named entity can be automatically established
using supervised learning. Based on this correlation, we have developed a
technique for automatic lexical choice of descriptions of entities in text
generation. We discuss the underlying relationship between the pragmatics of
choosing an appropriate description that serves a specific purpose in the
automatically generated text and the semantics of the description itself. We
present our work in the framework of the more general concept of reuse of
linguistic structures that are automatically extracted from large corpora. We
present a formal evaluation of our approach and we conclude with some thoughts
on potential applications of our method. Comment: 7 pages, uses colacl.sty and acl.bst, uses epsfig. To appear in the
Proceedings of the Joint 17th International Conference on Computational
Linguistics and 36th Annual Meeting of the Association for Computational
Linguistics (COLING-ACL'98)
A Biologically Informed Hylomorphism
Although contemporary metaphysics has recently undergone a neo-Aristotelian revival wherein dispositions, or capacities, are now commonplace in empirically grounded ontologies, being routinely utilised in theories of causality and modality, a central Aristotelian concept has yet to be given serious attention – the doctrine of hylomorphism. The reason for this is clear: while the Aristotelian ontological distinction between actuality and potentiality has proven to be a fruitful conceptual framework with which to model the operation of the natural world, the distinction between form and matter has yet to similarly earn its keep. In this chapter, I offer a first step toward showing that the hylomorphic framework is up to that task. To do so, I return to the birthplace of that doctrine: the biological realm. Utilising recent advances in developmental biology, I argue that the hylomorphic framework is an empirically adequate and conceptually rich explanatory schema with which to model the nature of organisms.
DNA methylation-based classification of central nervous system tumours.
Accurate pathological diagnosis is crucial for optimal management of patients with cancer. For the approximately 100 known tumour types of the central nervous system, standardization of the diagnostic process has been shown to be particularly challenging, with substantial inter-observer variability in the histopathological diagnosis of many tumour types. Here we present a comprehensive approach for the DNA methylation-based classification of central nervous system tumours across all entities and age groups, and demonstrate its application in a routine diagnostic setting. We show that the availability of this method may have a substantial impact on diagnostic precision compared to standard methods, resulting in a change of diagnosis in up to 12% of prospective cases. For broader accessibility, we have designed a free online classifier tool, the use of which does not require any additional onsite data processing. Our results provide a blueprint for the generation of machine-learning-based tumour classifiers across other cancer entities, with the potential to fundamentally transform tumour pathology.
Coping with lists in the ifcOWL ontology
Over the past few years, several suggestions have been made for how to convert an EXPRESS schema into an OWL ontology. The conversion from EXPRESS to OWL is of particular use to the architectural design and construction industry, because one of its key data models, namely the Industry Foundation Classes (IFC), is represented using the EXPRESS information modelling language. In each of these conversion options, the way in which lists are converted (e.g. lists of coordinates, lists of spaces in a floor) is key to the structure and eventual strength of the resulting ontology. In this article, we outline and discuss the main decisions that can be made in converting LIST concepts in EXPRESS to equivalent OWL expressions. This allows one to identify which conversion option is appropriate to support proper and efficient information reuse in the domain of architecture and construction.
Spanish named entity recognition in the biomedical domain
Named Entity Recognition in the clinical domain and in languages other than English is hampered by the absence of complete dictionaries, the informality of texts, the polysemy of terms, the lack of agreement on the boundaries of an entity, and the scarcity of corpora and other available resources. We present a Named Entity Recognition method for poorly resourced languages. The method was tested with Spanish radiology reports and compared with a conditional random fields system. Peer Reviewed. Postprint (author's final draft)