682 research outputs found
The horse before the cart: improving the accuracy of taxonomic directions when building tag hierarchies
Content on the Web is huge and constantly growing, and building taxonomies for such content can help with navigation and organisation, but building taxonomies manually is costly and time-consuming. An alternative is to allow users to construct folksonomies: collective social classifications. Yet, folksonomies are inconsistent and their use for searching and browsing is limited. Approaches have been suggested for acquiring implicit hierarchical structures from folksonomies, however, but these approaches suffer from the âpopularity-generalityâ problem, in that popularity is assumed to be a proxy for generality, i.e. high-level taxonomic terms will occur more often than low-level ones. To tackle this problem, we propose in this paper an improved approach. It is based on the HeymannâBenz algorithm, and works by checking the taxonomic directions against a corpus of text. Our results show that popularity works as a proxy for generality in at most 90.91% of cases, but this can be improved to 95.45% using our approach, which should translate to higher-quality tag hierarchy structure
Taxonomy Induction using Hypernym Subsequences
We propose a novel, semi-supervised approach towards domain taxonomy
induction from an input vocabulary of seed terms. Unlike all previous
approaches, which typically extract direct hypernym edges for terms, our
approach utilizes a novel probabilistic framework to extract hypernym
subsequences. Taxonomy induction from extracted subsequences is cast as an
instance of the minimumcost flow problem on a carefully designed directed
graph. Through experiments, we demonstrate that our approach outperforms
stateof- the-art taxonomy induction approaches across four languages.
Importantly, we also show that our approach is robust to the presence of noise
in the input vocabulary. To the best of our knowledge, no previous approaches
have been empirically proven to manifest noise-robustness in the input
vocabulary
A conceptârelationship acquisition and inference approach for hierarchical taxonomy construction from tags
Author name used in this publication: W. M. WangAuthor name used in this publication: C. F. CheungAuthor name used in this publication: Adela S. M. Lau2009-2010 > Academic research: refereed > Publication in refereed journalAccepted ManuscriptPublishe
Recommended from our members
Toward the automation of business process ontology generation
Semantic Business Process Management (SBPM) utilises semantic technologies (e.g., ontology) to model and query process representations. There are times in which such models must be reconstructed from existing textual documentation. In this scenario the automated generation of ontological models would be preferable, however current methods and technology are still not capable of automatically generating accurate semantic process models from textual descriptions. This research attempts to automate the process as much as possible by proposing a method that drives the transformation through the joint use of a foundational ontology and lexico-semantic analysis. The method is presented, demonstrated and evaluated. The original dataset represents 150 business activities related to the procurement processes of a case study company. As the evaluation shows, the proposed method can accurately map the linguistic patterns of the process descriptions to semantic patterns of the foundational ontology to a high level of accuracy, however further research is required in order to reduce the level of human intervention, expand the method so as to recognise further patterns of the foundational ontology and develop a tool to assist the business process modeller in the semi-automated generation of process models
Building WordNet for Afaan Oromoo
WordNet is a lexical database which has many relations to disambiguate the sense of words for natural languages. From the WordNet relations synonyms and hyponym has major role for natural language processing and artificial intelligence applications. In this paper, word embedding (Word2Vec) and lexico-syntactic pattern (LSP) are developed to extract automatically synonyms and hyponyms respectively. For this study, the word embedding is evaluated on two specialized domain algorithms such as a continuous bag of words and Skip Gram algorithms and show superior results. Applying word embedding (Word2Vec) algorithms for Afaan Oromo texts has been registered 80.09% and 85.04% for the continuous bag of words and Skip Gram respectively. According to the result achieved in this study, the skip-gram algorithm does a better job for frequent pairs of words than a continuous bag of words. But, a continuous bag of words algorithm is faster while skip-gram is slower. A lexical syntactic pattern with the combination of Word2Vec and without Word2Vec is also evaluated using information retrieval evaluation metrics such as precision, recall and F-measure to extract hyponym relation from Afaan Oromoo texts. The precision, recall and F-measure have been registered by lexical syntactic patterns without the combination of Word2Vec is 66.73%, 72%, and 69.26% respectively and with the combination of Word2Vec 81.14%, 80.8%, and 81.1% have been registered for precision, recall and F-measure respectively. There are factors that could affect the accuracy of results: 1) the style of writer of Afaan Oromoo i.e. they write a noun phrase with many adjective to express the noun for the reader; and, 2) it is possible that some instances of the LSP are missed due to misspellings and other typographical errors. Keywords: Afaan Oromoo WordNet, Word embedding, Lexico syntactic patterns, Extraction of WordNet relations. DOI: 10.7176/CEIS/11-3-01 Publication date:May 31st 202
Learning Taxonomic Relations from Heterogeneous Evidence
Cimiano P, Schmidt-Thieme L, Pivk A, Staab S. Learning Taxonomic Relations from Heterogeneous Evidence. In: Proceedings of the ECAI 2004 Ontology Learning and Population Workshop. 2004
- âŠ