Ontologies and Information Extraction
This report argues that, even in the simplest cases, IE is an ontology-driven
process. It is not a mere text filtering method based on simple pattern
matching and keywords, because the extracted pieces of text are interpreted
with respect to a predefined partial domain model. The report shows that,
depending on the nature and depth of the interpretation needed to extract
the information, more or less knowledge must be involved. The report is
illustrated mainly with examples from biology, a domain with a critical
need for content-based exploration of the scientific literature and one that
is becoming a major application domain for IE.
Natural language processing
Beginning with the basic issues of NLP, this chapter charts the major research activities in the area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.
Template Mining for Information Extraction from Digital Documents
published or submitted for publication
Towards automatic construction of domain ontologies: Application to ISA88 and assessment
Process Systems Engineering (PSE) has shown a growing interest in ontologies as a way to develop knowledge models, organize information, and produce software accordingly. Although software tools that support the structuring of ontologies exist, developing a PSE ontology remains a creative procedure performed by human experts from each specific domain. This work explores the opportunities for automatic construction of domain ontologies. Specialised documentation can be selected and automatically parsed; next, pattern recognition methods can be used to extract concepts and relations; finally, supervision is required to validate the automatic outcome and to complete the task. The bulk of the development of an ontology is expected to result from the application of systematic procedures, so the development time should be significantly reduced. Automatic methods were prepared and applied to the development of an ontology for batch processing based on the ISA88 standard. The methods are described and commented on, and the results are discussed by comparison with a previous, manually developed ontology for the same domain.
Generating indicative-informative summaries with SumUM
We present and evaluate SumUM, a text summarization system that takes a raw technical text as input and produces an indicative-informative summary. The indicative part of the summary identifies the topics of the document, and the informative part elaborates on some of these topics according to the reader's interest. SumUM motivates the topics, describes entities, and defines concepts. It is a first step in exploring the issue of dynamic summarization. This is accomplished through a process of shallow syntactic and semantic analysis, concept identification, and text regeneration. Our method was developed through the study of a corpus of abstracts written by professional abstractors. Relying on human judgment, we have evaluated the indicativeness, informativeness, and text acceptability of the automatic summaries. The results thus far indicate good performance when compared with other summarization technologies.
Automatic extraction of paraphrastic phrases from medium size corpora
This paper presents a versatile system intended to acquire paraphrastic
phrases from a representative corpus. To decrease the time spent on
developing resources for NLP systems (for example Information
Extraction, IE hereafter), we suggest using a machine learning system that
helps define new templates and associated resources. This knowledge is
automatically derived from the text collection, in interaction with a large
semantic network.