302 research outputs found
Recommended from our members
Using background knowledge for ontology evolution
One of the current bottlenecks for automating ontology evolution is resolving the right links between newly arising information and the existing knowledge in the ontology. Most of existing approaches mainly rely on the user when it comes to capturing and representing new knowledge. Our ontology evolution framework intends to reduce or even eliminate user input through the use of background knowledge. In this paper, we show how various sources of background knowledge could be exploited for relation discovery. We perform a relation discovery experiment focusing on the use of WordNet and Semantic Web ontologies as sources of background knowledge. We back our experiment with a thorough analysis that highlights various issues on how to improve and validate relation discovery in the future, which will directly improve the task of automatically performing ontology changes during evolution
Inferring Concept Hierarchies from Text Corpora via Hyperbolic Embeddings
We consider the task of inferring is-a relationships from large text corpora.
For this purpose, we propose a new method combining hyperbolic embeddings and
Hearst patterns. This approach allows us to set appropriate constraints for
inferring concept hierarchies from distributional contexts while also being
able to predict missing is-a relationships and to correct wrong extractions.
Moreover -- and in contrast with other methods -- the hierarchical nature of
hyperbolic space allows us to learn highly efficient representations and to
improve the taxonomic consistency of the inferred hierarchies. Experimentally,
we show that our approach achieves state-of-the-art performance on several
commonly-used benchmarks
TiFi: Taxonomy Induction for Fictional Domains [Extended version]
Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, such as also enterprise-specific knowledge bases or highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains such as Lord of the Rings, The Simpsons or Greek Mythology with very high precision and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin
Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy
<p>Abstract</p> <p>Background</p> <p>Ontology term labels can be ambiguous and have multiple senses. While this is no problem for human annotators, it is a challenge to automated methods, which identify ontology terms in text. Classical approaches to word sense disambiguation use co-occurring words or terms. However, most treat ontologies as simple terminologies, without making use of the ontology structure or the semantic similarity between terms. Another useful source of information for disambiguation are metadata. Here, we systematically compare three approaches to word sense disambiguation, which use ontologies and metadata, respectively.</p> <p>Results</p> <p>The 'Closest Sense' method assumes that the ontology defines multiple senses of the term. It computes the shortest path of co-occurring terms in the document to one of these senses. The 'Term Cooc' method defines a log-odds ratio for co-occurring terms including co-occurrences inferred from the ontology structure. The 'MetaData' approach trains a classifier on metadata. It does not require any ontology, but requires training data, which the other methods do not. To evaluate these approaches we defined a manually curated training corpus of 2600 documents for seven ambiguous terms from the Gene Ontology and MeSH. All approaches over all conditions achieve 80% success rate on average. The 'MetaData' approach performed best with 96%, when trained on high-quality data. Its performance deteriorates as quality of the training data decreases. The 'Term Cooc' approach performs better on Gene Ontology (92% success) than on MeSH (73% success) as MeSH is not a strict is-a/part-of, but rather a loose is-related-to hierarchy. The 'Closest Sense' approach achieves on average 80% success rate.</p> <p>Conclusion</p> <p>Metadata is valuable for disambiguation, but requires high quality training data. Closest Sense requires no training, but a large, consistently modelled ontology, which are two opposing conditions. Term Cooc achieves greater 90% success given a consistently modelled ontology. Overall, the results show that well structured ontologies can play a very important role to improve disambiguation.</p> <p>Availability</p> <p>The three benchmark datasets created for the purpose of disambiguation are available in Additional file <supplr sid="S1">1</supplr>.</p> <suppl id="S1"> <title> <p>Additional file 1</p> </title> <text> <p><b>Benchmark datasets used in the experiments.</b> The three corpora (High quality/Low quantity corpus; Medium quality/Medium quantity corpus; Low quality/High quantity corpus) are given in the form of PubMed identifiers (PMID) for True/False cases for the 7 ambiguous terms examined (GO/MeSH/UMLS identifiers are also given).</p> </text> <file name="1471-2105-10-28-S1.txt"> <p>Click here for file</p> </file> </suppl
Cross-language Ontology Learning: Incorporating and Exploiting Cross-language Data in the Ontology Learning Process
Hans Hjelm. Cross-language Ontology Learning:
Incorporating and Exploiting Cross-language Data in the Ontology Learning Process.
NEALT Monograph Series, Vol. 1 (2009), 159 pages.
© 2009 Hans Hjelm.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/10126
Completing and Debugging Ontologies: state of the art and challenges
As semantically-enabled applications require high-quality ontologies,
developing and maintaining ontologies that are as correct and complete as
possible is an important although difficult task in ontology engineering. A key
step is ontology debugging and completion. In general, there are two steps:
detecting defects and repairing defects. In this paper we discuss the state of
the art regarding the repairing step. We do this by formalizing the repairing
step as an abduction problem and situating the state of the art with respect to
this framework. We show that there are still many open research problems and
show opportunities for further work and advancing the field.Comment: 56 page
A Latent-Dirichlet-Allocation Based Extension for Domain Ontology of Enterprise’s Technological Innovation
This paper proposed a method for building enterprise's technological innovation domain ontology automatically from plain text corpus based on Latent Dirichlet Allocation (LDA). The proposed method consisted of four modules: 1) introducing the seed ontology for domain of enterprise's technological innovation, 2) using Natural Language Processing (NLP) technique to preprocess the collected textual data, 3) mining domain specific terms from document collections based on LDA, 4) obtaining the relationship between the terms through the defined relevant rules. The experiments have been carried out to demonstrate the effectiveness of this method and the results indicated that many terms in domain of enterprise's technological innovation and the semantic relations between terms are discovered. The proposed method is a process of continuously cycles and iterations, that is the obtained objective ontology can be re-iterated as initial seed ontology. The constant knowledge acquisition in the domain of enterprise's technological innovation to update and perfect the initial seed ontology
- …