Instantiation of relations for semantic annotation
http://www.ieee.org This paper presents a methodology for the semantic annotation of web pages with individuals of a domain ontology. While most semantic annotation systems can recognize knowledge units, they usually do not establish explicit relations between them. The method presented identifies the individuals which should be related among the whole set of individuals and codes them as role instances within an OWL ontology. This is done by using a correspondence between the tree structure of a web page and the semantics of the information it contains.
Web based knowledge extraction and consolidation for automatic ontology instantiation
The Web is probably the largest and richest information repository available today. Search engines are the common access routes to this valuable source. However, the role of these search engines is often limited to the retrieval of lists of potentially relevant documents. The burden of analysing the returned documents and identifying the knowledge of interest is therefore left to the user. The Artequakt system aims to deploy natural language tools to automatically extract and consolidate knowledge from web documents and instantiate a given ontology, which dictates the type and form of knowledge to extract. Artequakt focuses on the domain of artists, and uses the harvested knowledge to generate tailored biographies. This paper describes the latest developments of the system and discusses the problem of knowledge consolidation.
Ontology Population via NLP Techniques in Risk Management
In this paper we propose an NLP-based method for Ontology Population from texts and apply it to semi-automatically instantiate a Generic Knowledge Base (Generic Domain Ontology) in the risk management domain. The approach is semi-automatic and relies on a domain expert's intervention for validation. The proposed approach relies on a set of Instance Recognition Rules based on syntactic structures, and on the predicative power of verbs in the instantiation process. It is not domain dependent since it heavily relies on linguistic knowledge. A description of an experiment performed on a part of the ontology of the PRIMA project (supported by the European community) is given. A first validation of the method is done by populating this ontology with Chemical Fact Sheets from the Environmental Protection Agency. The results of this experiment complete the paper and support the hypothesis that relying on the predicative power of verbs in the instantiation process improves performance. Keywords: Information Extraction, Instance Recognition Rules, Ontology Population, Risk Management, Semantic Analysis
Refining Implicit Argument Annotation for UCCA
Predicate-argument structure analysis is a central component in meaning
representations of text. The fact that some arguments are not explicitly
mentioned in a sentence gives rise to ambiguity in language understanding, and
renders it difficult for machines to interpret text correctly. However, only
few resources represent implicit roles for NLU, and existing studies in NLP
only make coarse distinctions between categories of arguments omitted from
linguistic form. This paper proposes a typology for fine-grained implicit
argument annotation on top of Universal Conceptual Cognitive Annotation's
foundational layer. The proposed implicit argument categorisation is driven by
theories of implicit role interpretation and consists of six types: Deictic,
Generic, Genre-based, Type-identifiable, Non-specific, and Iterated-set. We
exemplify our design by revisiting part of the UCCA EWT corpus, providing a new
dataset annotated with the refinement layer, and making a comparative analysis
with other schemes. Comment: DMR 202
EliXR-TIME: A Temporal Knowledge Representation for Clinical Research Eligibility Criteria.
Effective clinical text processing requires accurate extraction and representation of temporal expressions. Multiple temporal information extraction models have been developed, but a similar need for extracting temporal expressions in eligibility criteria (e.g., for eligibility determination) remains. We identified the temporal knowledge representation requirements of eligibility criteria by reviewing 100 temporal criteria. We developed EliXR-TIME, a frame-based representation designed to support semantic annotation for temporal expressions in eligibility criteria by reusing applicable classes from well-known clinical temporal knowledge representations. We used EliXR-TIME to analyze a training set of 50 new temporal eligibility criteria. We evaluated EliXR-TIME using an additional random sample of 20 eligibility criteria with temporal expressions that have no overlap with the training data, yielding 92.7% (76 / 82) inter-coder agreement on sentence chunking and 72% (72 / 100) agreement on semantic annotation. We conclude that this knowledge representation can facilitate semantic annotation of the temporal expressions in eligibility criteria.
Using Protege for automatic ontology instantiation
This paper gives an overview of the use of Protégé in the Artequakt system, which integrated Protégé with a set of natural language tools to automatically extract knowledge about artists from web documents and instantiate a given ontology. Protégé was also linked to structured templates that generate documents from the knowledge fragments it maintains.